Source: MS Access on Windows network share
Target: MySQL/MariaDB on Ubuntu
Tools: mdb-export, mysqlimport
Record count: 1.5 million+
I wonder if there is a fast and reliable way of comparing the imported data records.
Is there a standard SQL equivalent to e.g. MD5 fingerprint hashes of files? Right now I am building different import routines, and I only want a fast check for similarity and (if it fails) search for the detailed differences later on.
A somewhat quick-and-dirty approach for individual columns can be implemented using stored aggregate functions, which should be close to standard SQL.
This is how you'd do it with MariaDB:
DELIMITER //
CREATE AGGREGATE FUNCTION IF NOT EXISTS my_checksum(x TEXT) RETURNS CHAR(40)
DETERMINISTIC
BEGIN
  -- running checksum, seeded with the SHA1 of the empty string
  DECLARE cksum CHAR(40) DEFAULT SHA1('');
  -- fires when the group has no more rows: return the final checksum
  DECLARE CONTINUE HANDLER FOR NOT FOUND
    RETURN cksum;
  LOOP
    FETCH GROUP NEXT ROW; -- pulls the next value of the group into x
    SET cksum = SHA1(CONCAT(cksum, x)); -- chain the hashes row by row
  END LOOP;
END //
DELIMITER ;
You can then calculate a checksum of a column like this:
MariaDB [test]> create or replace table t1(data varchar(20));
Query OK, 0 rows affected (0.063 sec)
MariaDB [test]> create or replace table t2(data varchar(20));
Query OK, 0 rows affected (0.064 sec)
MariaDB [test]> insert into t1 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.011 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> insert into t2 values ('Hello'), ('World'), ('!');
Query OK, 3 rows affected (0.015 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select my_checksum(data) from t1;
+------------------------------------------+
| my_checksum(data)                        |
+------------------------------------------+
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
+------------------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> select my_checksum(data) from t2;
+------------------------------------------+
| my_checksum(data)                        |
+------------------------------------------+
| 5f683ea3674e33ce24bff5f68f53509566ad4da2 |
+------------------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> delete from t2;
Query OK, 3 rows affected (0.011 sec)
MariaDB [test]> insert into t2 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.012 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select my_checksum(data) from t2;
+------------------------------------------+
| my_checksum(data)                        |
+------------------------------------------+
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
+------------------------------------------+
1 row in set (0.001 sec)
This of course relies on the SHA1 of the column data being computed identically on all databases. Converting the values into strings first should make it mostly compatible, but there may be differences in how those conversions are implemented in different databases.
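To fingerprint more than one column at a time, a natural extension is to hash a delimited concatenation of the columns per row. Below is a minimal sketch, reusing my_checksum from above on hypothetical columns id and data; note that the chained SHA1 depends on the order in which rows are fed in, so force a deterministic order on both servers (and verify that your optimizer honors the derived-table ORDER BY):
SELECT my_checksum(CONCAT_WS('|', COALESCE(id, ''), COALESCE(data, '')))
FROM (SELECT id, data FROM t1 ORDER BY id) AS ordered_rows;
-- CONCAT_WS joins the columns with a '|' separator;
-- COALESCE maps NULL to '' because CONCAT_WS would silently skip NULL arguments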
The Percona Toolkit has the tools you need.
https://docs.percona.com/percona-toolkit/
See pt-table-checksum and pt-table-sync
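As a rough sketch of an invocation against a local server (the flags and DSN details vary by setup, so check the linked docs before relying on this):
# hypothetical invocation: checksum every table in the mydb database
pt-table-checksum --databases=mydb h=localhost,u=root --ask-pass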
I found it.
It's very simple and very fast.
CHECKSUM TABLE tbl_name
It gives you a single numeric value to compare.
Note, though, that CHECKSUM TABLE is MySQL/MariaDB-specific syntax rather than standard SQL or Transact-SQL, so it won't run on MS Access; the reported value can also depend on the table's row format, so only compare checksums computed on comparable servers.
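A minimal usage sketch against the two tables from the first answer (the result set has Table and Checksum columns; matching numbers suggest matching contents):
CHECKSUM TABLE t1, t2;
-- compare the Checksum value reported for each table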
Related
I have a MySQL table with the following structure:
create table data(
....
name char(40) NULL,
...
)
But I could insert names with more than 40 characters into the name field. Can someone explain the actual meaning of char(40)?
You cannot insert a string of more than 40 characters in a column defined with the type CHAR(40).
If you run MySQL in strict mode, you will get an error if you try to insert a longer string.
mysql> create table mytable ( c char(40) );
Query OK, 0 rows affected (0.01 sec)
mysql> insert into mytable (c) values ('Now is the time for all good men to come to the aid of their country.');
ERROR 1406 (22001): Data too long for column 'c' at row 1
If you run MySQL in non-strict mode, the insert will succeed, but only the first 40 characters of your string are stored in the column. The characters beyond 40 are lost, and you get no error, just a warning.
mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into mytable (c) values ('Now is the time for all good men to come to the aid of their country.');
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'c' at row 1 |
+---------+------+----------------------------------------+
1 row in set (0.00 sec)
mysql> select c from mytable;
+------------------------------------------+
| c                                        |
+------------------------------------------+
| Now is the time for all good men to come |
+------------------------------------------+
1 row in set (0.00 sec)
I recommend operating MySQL in strict mode (strict mode is the default since MySQL 5.7). I would prefer to get an error instead of losing data.
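If your server is running in non-strict mode, a minimal sketch for opting in per session (pick the mode list that fits your setup):
SET SESSION sql_mode = 'STRICT_TRANS_TABLES';
-- from now on in this session, oversized inserts fail with error 1406
-- instead of being silently truncated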
I have a column in a database table with values formatted like "000011", and an SQL query like:
SELECT *
FROM table
WHERE a = 000010 OR a = 000001 OR a = 000011
But if the value is 111111, the query ends up with a lot of OR conditions.
With a 3-digit format like 001 I can use the wildcard ( _ ) for this, but I'm stuck trying to do the same with 6 digits.
Is there another way?
First, you can use in:
SELECT *
FROM table
WHERE a in (000010, 000001, 000011)
But I suspect your "data" is actually an integer bit mask and you want bitwise & or |:
WHERE (a & 000011)
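One caveat worth making explicit: a bare literal like 000011 is parsed as the decimal number eleven, not as a binary mask. If the column holds 6-character strings of 0s and 1s, a hedged sketch is to convert both sides from base 2 with CONV() before the bitwise test:
SELECT *
FROM table
WHERE CONV(a, 2, 10) & CONV('000011', 2, 10);
-- CONV(x, 2, 10) interprets x as base-2 and returns its decimal value,
-- so the & here tests whether either of the two low bits is set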
If you want to show rows whose data contains a 1, use LIKE:
SELECT *
FROM table
WHERE a LIKE '%1%'
Or, equivalently, using POSITION:
SELECT *
FROM table
WHERE position('1' in a) > 0
From what I understand you want to find all rows where the binary representation is less than your input. If that is the case, you could use the BINARY function to get the result you want:
mysql> create table bintab (a varchar(10));
Query OK, 0 rows affected (0.07 sec)
mysql> insert into bintab values('000001');
Query OK, 1 row affected (0.04 sec)
mysql> insert into bintab values('000010');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000011');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000100');
Query OK, 1 row affected (0.00 sec)
mysql> insert into bintab values('000101');
Query OK, 1 row affected (0.00 sec)
mysql> insert into bintab values('000110');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000111');
Query OK, 1 row affected (0.00 sec)
mysql> select * from bintab where binary(a) < binary('000100');
+--------+
| a      |
+--------+
| 000001 |
| 000010 |
| 000011 |
+--------+
3 rows in set (0.00 sec)
There is quite a lot of confusion in the descriptions of these variables in the official MySQL documentation.
According to it, max_binlog_cache_size means,
If a transaction requires more than this many bytes of memory, the
server generates a Multi-statement transaction required more than
'max_binlog_cache_size' bytes of storage error.
max_binlog_cache_size sets the size for the transaction cache only
and binlog_cache_size means,
The size of the cache to hold changes to the binary log during a
transaction.
binlog_cache_size sets the size for the transaction cache only
On reading the documentation, I observed there is no difference between these two. There is also something very confusing in the documentation, like:
In MySQL 5.7, the visibility to sessions of max_binlog_cache_size
matches that of the binlog_cache_size system variable; in other words,
changing its value affects only new sessions that are started after
the value is changed.
When I query the server variables, it shows both. I have a MySQL 5.6 and a MySQL 5.7. All I need to know is which variable I should consider and configure for which server:
binlog_cache_size for MySQL 5.6 and max_binlog_cache_size for MySQL 5.7?
There are also the additional, confusingly similar variables max_binlog_stmt_cache_size and binlog_stmt_cache_size.
Both variables can be configured in both versions, and they have different meanings. The definitions in the manual and in the built-in help are confusing; here is a much better explanation: http://dev.mysql.com/doc/refman/5.6/en/binary-log.html
binlog_cache_size defines the maximum amount of memory that the buffer can use. If a transaction grows above this value, it uses a temporary disk file. Please note that the buffer is allocated per connection.
max_binlog_cache_size defines the maximum total size of a transaction. If the transaction grows above this value, it fails.
Below is a simple demonstration of the difference.
Setup:
MariaDB [test]> select @@binlog_cache_size, @@max_binlog_cache_size, @@binlog_format;
+---------------------+-------------------------+-----------------+
| @@binlog_cache_size | @@max_binlog_cache_size | @@binlog_format |
+---------------------+-------------------------+-----------------+
|               32768 |                   65536 | ROW             |
+---------------------+-------------------------+-----------------+
1 row in set (0.01 sec)
MariaDB [test]> show create table t1 \G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`a` text
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
1. Transaction size is below @@binlog_cache_size
(transaction succeeds, uses the cache, does not use the disk)
MariaDB [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> begin;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.01 sec)
MariaDB [test]> insert into t1 values (repeat('a',10000));
Query OK, 1 row affected (0.04 sec)
MariaDB [test]> commit;
Query OK, 0 rows affected (0.05 sec)
MariaDB [test]> show status like 'Binlog_cache%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Binlog_cache_disk_use | 0     |
| Binlog_cache_use      | 1     |
+-----------------------+-------+
2 rows in set (0.01 sec)
2. Transaction size is above @@binlog_cache_size, but below @@max_binlog_cache_size
(transaction uses the cache, and the cache uses the disk)
MariaDB [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> begin;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.10 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.10 sec)
MariaDB [test]> commit;
Query OK, 0 rows affected (0.03 sec)
MariaDB [test]> show status like 'Binlog_cache%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Binlog_cache_disk_use | 1     |
| Binlog_cache_use      | 1     |
+-----------------------+-------+
2 rows in set (0.01 sec)
3. Transaction size exceeds @@max_binlog_cache_size
(transaction fails)
MariaDB [test]> flush status;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> begin;
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.12 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.15 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
Query OK, 1 row affected (0.12 sec)
MariaDB [test]> insert into t1 values (repeat('a',20000));
ERROR 1197 (HY000): Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again
So, if your transactions are big, but you don't have too many connections, you might want to increase @@binlog_cache_size to avoid excessive disk writes.
If you have many concurrent connections, you should be careful to avoid connections trying to allocate too much memory for the caches simultaneously.
If you want to make sure that transactions don't grow too big, you might want to limit @@max_binlog_cache_size.
@@binlog_stmt_cache_size and @@max_binlog_stmt_cache_size should work in a similar way; the difference is that the %binlog_cache% values apply to transactional updates and the %binlog_stmt_cache% ones to non-transactional updates.
While experimenting, please note that the values are not 100% precise; there are some hidden subtleties with initially allocated sizes. It shouldn't matter for practical purposes, but it can be confusing when you play with low values.
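For reference, a hypothetical my.cnf excerpt applying this advice (the values are purely illustrative, not recommendations; size them against your own transaction profile and connection count):
[mysqld]
# per-connection in-memory cache; transactions larger than this spill to a temp disk file
binlog_cache_size = 1M
# hard upper limit on a transaction's binlog cache; larger transactions fail with error 1197
max_binlog_cache_size = 512M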
I notice a very weird problem in my code. I am inserting a value of 128 but in my database it says 127.
I'd like to look at the MySQL general/query logs; however, I don't ever see any log files produced, no matter what I do. I tried -l, -l with an absolute path, and --general_log_file. No luck. I also used mysqladmin flush-logs. Still nothing.
Are you using a signed TINYINT datatype by any chance?
CREATE TABLE my_table (id TINYINT);
Query OK, 0 rows affected (0.03 sec)
INSERT INTO my_table VALUES (128);
Query OK, 1 row affected, 1 warning (0.00 sec)
SELECT * FROM my_table;
+------+
| id   |
+------+
|  127 |
+------+
1 row in set (0.00 sec)
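A signed TINYINT tops out at 127, which is why 128 gets clamped in non-strict mode. If the column must hold 128, here is a sketch of two possible fixes for the table above:
ALTER TABLE my_table MODIFY id TINYINT UNSIGNED;  -- range 0 to 255
-- or, if negative values are also needed:
ALTER TABLE my_table MODIFY id SMALLINT;          -- range -32768 to 32767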
We are using MySQL 5.0 on Ubuntu 9.04. The full version is: 5.0.75-0ubuntu10.
I created a test database and a test table in it. I see the following output from an insert statement:
mysql> CREATE TABLE test (floaty FLOAT(8,2)) engine=InnoDb;
Query OK, 0 rows affected (0.02 sec)
mysql> insert into test value(858147.11);
Query OK, 1 row affected (0.01 sec)
mysql> SELECT * FROM test;
+-----------+
| floaty    |
+-----------+
| 858147.12 |
+-----------+
1 row in set (0.00 sec)
There seems to be a problem with the scale/precision setup in MySQL... or did I miss something?
UPDATE:
Found a boundary for one of the numbers we were inserting, here is the code:
mysql> CREATE TABLE test (floaty FLOAT(8,2)) engine=InnoDb;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into test value(131071.01);
Query OK, 1 row affected (0.01 sec)
mysql> insert into test value(131072.01);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM test;
+-----------+
| floaty |
+-----------+
| 131071.01 |
| 131072.02 |
+-----------+
2 rows in set (0.00 sec)
mysql>
Face Palm!!!!
Floats are 32-bit numbers stored as a mantissa and an exponent. I am not 100% sure how MySQL splits the storage, but taking Java as an example, it uses 24 bits for a signed mantissa and 8 bits for an exponent (scientific notation). That means the mantissa can only represent integers up to 8388608 exactly, which works out to about 7 significant decimal digits, and my FLOAT definition used 8.
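A worked check of that boundary, assuming MySQL's FLOAT is an IEEE-754 single with a 24-bit significand: for values in [2^17, 2^18) = [131072, 262144), adjacent representable floats are 2^(17-23) = 1/64 = 0.015625 apart, so 131072.01 is stored as the nearest representable value 131072.015625, which prints as 131072.02 at two decimals. Just below the boundary, in [2^16, 2^17), the spacing is 1/128 = 0.0078125, and 131071.01 is stored as 131071.0078125, which still rounds back to 131071.01. That matches the output above exactly.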
We are going to switch all of these FLOAT(8,2) columns to DOUBLE.
The MySQL docs mention that "MySQL performs rounding when storing values", and I suspect this is the issue here. I duplicated your issue but changed the storage type to DOUBLE:
CREATE TABLE test (val DOUBLE);
and the retrieved value matched the test value you provided.
My suggestion, for what it's worth, is to use DOUBLE or maybe DECIMAL. I tried the same original test with:
CREATE TABLE test (val DECIMAL(8,2));
and it retrieved the value I gave it: 858147.11.
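A quick end-to-end sketch of the DECIMAL variant (hypothetical table name, chosen to avoid clashing with the earlier test table):
CREATE TABLE test_decimal (val DECIMAL(8,2));
INSERT INTO test_decimal VALUES (858147.11);
SELECT val FROM test_decimal;
-- returns exactly 858147.11: DECIMAL stores exact decimal digits,
-- so no binary rounding occurs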