Improve ORDER BY RAND() in large table [duplicate] - mysql

I have a very large table with about 6.6 million records, and I want to select a random sample of 100,000 records:
SELECT column FROM table
ORDER BY RAND()
LIMIT 100000
This query is extremely slow.
I have not found a solution that works with MySQL/MariaDB to extract a random sample of 100,000 records.
Please advise.
Thank you.

You can try to reduce the number of rows that have to be sorted.
Example:
Create source data
mysql> create table test (id int auto_increment primary key, val int);
Query OK, 0 rows affected (0.05 sec)
mysql> set @@cte_max_recursion_depth := 10000000;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into test (val) with recursive cte as ( select 1 num union all select num+1 from cte where num < 6600000 )select rand() * 1000000000 from cte;
Query OK, 6600000 rows affected (1 min 48.62 sec)
Records: 6600000 Duplicates: 0 Warnings: 0
mysql> create table tmp (val int);
Query OK, 0 rows affected (0.05 sec)
Insert without sorting
mysql> insert into tmp select val from test limit 100000;
Query OK, 100000 rows affected (1.93 sec)
Records: 100000 Duplicates: 0 Warnings: 0
Insert with random sorting
mysql> insert into tmp select val from test order by rand() limit 100000;
Query OK, 100000 rows affected (26.31 sec)
Records: 100000 Duplicates: 0 Warnings: 0
Insert with random selection (1.1 is overage coefficient)
mysql> insert into tmp select val from test where rand() < 1.1 * 100000 / 6600000 limit 100000;
Query OK, 100000 rows affected (15.89 sec)
Records: 100000 Duplicates: 0 Warnings: 0
Insert with random selection (1.1 is overage coefficient) and random sorting
mysql> insert into tmp select val from test where rand() < 1.1 * 100000 / 6600000 order by rand() limit 100000;
Query OK, 100000 rows affected (19.26 sec)
Records: 100000 Duplicates: 0 Warnings: 0
The overage coefficient can be adjusted. Decreasing it speeds up the query slightly, but increases the probability that fewer than the required 100,000 rows are returned.
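To get a feel for how the overage coefficient trades speed against the risk of a short result, the shortfall probability can be estimated. The sketch below is in Python rather than SQL. It assumes that each row passes the `rand() < p` filter independently, so the number of selected rows is binomial and can be approximated by a normal distribution. The function name `shortfall_probability` is made up for this illustration.

```python
import math

def shortfall_probability(total_rows, wanted_rows, overage):
    """Estimate P(fewer than wanted_rows pass the filter rand() < p).

    Assumption: rows are selected independently with
    p = overage * wanted_rows / total_rows (normal approximation).
    """
    p = min(1.0, overage * wanted_rows / total_rows)
    mean = total_rows * p
    std = math.sqrt(total_rows * p * (1.0 - p))
    if std == 0.0:
        return 0.0 if mean >= wanted_rows else 1.0
    # Continuity correction: P(X <= wanted_rows - 1)
    z = (wanted_rows - 0.5 - mean) / std
    # Standard normal CDF expressed through erfc
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Table size and sample size taken from the question
for coefficient in (1.1, 1.01, 1.0):
    risk = shortfall_probability(6_600_000, 100_000, coefficient)
    print(coefficient, risk)
```

Under these assumptions a coefficient of 1.1 leaves essentially no risk of a short result for this table size, while a coefficient of 1.0 falls short about half the time. Note also that the variant without ORDER BY stops scanning as soon as LIMIT is satisfied, so rows late in the scan order may never be considered.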


Comparing imported databases (fingerprinting possible?)

Source: MS Access on Windows network share
Target: MySQL/MariaDB on Ubuntu
Tools: mdb-export, mysqlimport
Record count: 1.5 million+
I wonder if there is a fast and reliable way of comparing the imported data records.
Is there an SQL standard equivalent to, e.g., md5 fingerprint hashes of files? Right now I am building different import routines, and I only want a fast check for similarity; if it fails, I will search for the detailed differences later on.
A somewhat quick-and-dirty approach for individual columns can be implemented using stored aggregate functions, which should be SQL standard.
This is how you'd do it with MariaDB:
CREATE AGGREGATE FUNCTION IF NOT EXISTS my_checksum(x TEXT) RETURNS CHAR(40)
DETERMINISTIC
BEGIN
DECLARE cksum CHAR(40) DEFAULT SHA1('');
DECLARE CONTINUE HANDLER FOR NOT FOUND
RETURN cksum;
LOOP
FETCH GROUP NEXT ROW;
SET cksum = SHA1(CONCAT(cksum, x));
END LOOP;
END
You can then calculate a checksum of a column like this:
MariaDB [test]> create or replace table t1(data varchar(20));
Query OK, 0 rows affected (0.063 sec)
MariaDB [test]> create or replace table t2(data varchar(20));
Query OK, 0 rows affected (0.064 sec)
MariaDB [test]> insert into t1 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.011 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> insert into t2 values ('Hello'), ('World'), ('!');
Query OK, 3 rows affected (0.015 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select my_checksum(data) from t1;
+------------------------------------------+
| my_checksum(data) |
+------------------------------------------+
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
+------------------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> select my_checksum(data) from t2;
+------------------------------------------+
| my_checksum(data) |
+------------------------------------------+
| 5f683ea3674e33ce24bff5f68f53509566ad4da2 |
+------------------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> delete from t2;
Query OK, 3 rows affected (0.011 sec)
MariaDB [test]> insert into t2 values ('hello'), ('world'), ('!');
Query OK, 3 rows affected (0.012 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select my_checksum(data) from t2;
+------------------------------------------+
| my_checksum(data) |
+------------------------------------------+
| 7f6fb9a61c2097f70a36254c332c47364c496e07 |
+------------------------------------------+
1 row in set (0.001 sec)
This of course relies on the SHA1 of the column values being the same on all databases. Converting values to strings should make it mostly compatible, but databases may differ in how they implement those conversions.
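The loop in `my_checksum` can be mirrored outside the database to see what the fingerprint is sensitive to. The following Python sketch folds each value into the running SHA1 in the same way. The helper name `chained_checksum` is invented here, and the sketch is not claimed to reproduce the exact hex digests shown above, since that depends on how the server encodes the strings.

```python
import hashlib

def chained_checksum(values):
    """Start from SHA1('') and fold each value into the running checksum."""
    cksum = hashlib.sha1(b"").hexdigest()
    for value in values:
        cksum = hashlib.sha1((cksum + value).encode("utf-8")).hexdigest()
    return cksum

same_a = chained_checksum(["hello", "world", "!"])
same_b = chained_checksum(["hello", "world", "!"])
changed_case = chained_checksum(["Hello", "World", "!"])
reordered = chained_checksum(["world", "hello", "!"])

print(same_a == same_b)        # True: identical data, identical order
print(same_a == changed_case)  # False: a single changed character is detected
print(same_a == reordered)     # False: same rows in a different order
```

The last comparison shows that a chained hash depends on row order, so both tables need to be read in the same order for the comparison to be meaningful.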
The percona toolkit has the tool you need.
https://docs.percona.com/percona-toolkit/
See pt-table-checksum and pt-table-sync
I found it.
It's very simple and very fast.
CHECKSUM TABLE tbl_name
It gives you a numeric value to compare.
Note that CHECKSUM TABLE is a MySQL/MariaDB statement rather than Transact-SQL, so it can be used on the MySQL and MariaDB side but is not available in MS Access.

Why does reducing a FLOAT column's precision in MySQL and then restoring it not change the number?

Here is the test:
First, I create a field with float(11, 9) and insert the number 0.123456789; when selecting this row, it shows 0.123456791.
Second, I alter the field to float(11, 4) and select it; it shows 0.1235.
Third, I alter the field back to float(11, 9) and select it again; it shows 0.123456791.
Why does changing the field's precision not lose the data's precision?
mysql> create table `test`.`test_float`(id int primary key auto_increment, `num` float(11, 9));
Query OK, 0 rows affected, 1 warning (0.03 sec)
mysql> insert into test.test_float(num) values(0.123456789);
Query OK, 1 row affected (0.00 sec)
mysql> select * from test.test_float\G
*************************** 1. row ***************************
id: 1
num: 0.123456791
1 row in set (0.00 sec)
mysql> alter table test.test_float modify num float(11, 4);
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 1
mysql> select * from test.test_float\G
*************************** 1. row ***************************
id: 1
num: 0.1235
1 row in set (0.00 sec)
mysql> alter table test.test_float modify num float(11, 9);
Query OK, 0 rows affected, 1 warning (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 1
mysql> select * from test.test_float\G
*************************** 1. row ***************************
id: 1
num: 0.123456791
1 row in set (0.00 sec)
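One reading that is consistent with the `Records: 0` lines in the ALTER output is that the stored 4-byte value was not rewritten, and that the (M, D) part only changed how the value is displayed. This is an assumption, not something the output proves. The Python sketch below illustrates the idea with a single-precision round trip; `to_float32` is a helper name invented here.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

stored = to_float32(0.123456789)  # what a 4-byte FLOAT can hold

# The same stored value, formatted with different numbers of decimals.
nine_decimals = f"{stored:.9f}"
four_decimals = f"{stored:.4f}"
print(nine_decimals)  # 0.123456791
print(four_decimals)  # 0.1235

# If the value had really been rounded to 4 decimals and stored again,
# the extra digits could not come back.
rewritten = to_float32(round(stored, 4))
print(f"{rewritten:.9f}" == nine_decimals)  # False
```

The first two lines match the values shown in the session, which fits the idea that only the display changed between the ALTER statements.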

SQL LIKE and OR conditions

I have a column in the database's table with data format like this "000011" and an SQL query like:
SELECT *
FROM table
WHERE a = 000010 OR a = 000001 or a = 000011
But if the value is 111111, the query will need a lot of OR conditions.
If the data format were 3 digits, like 001, I could use the wildcard (_) to do this, but I'm stuck when trying to use it in the 6-digit case.
Please help me find another way.
First, you can use in:
SELECT *
FROM table
WHERE a in (000010, 000001, 000011)
But I suspect your "data" is actually an integer and you want bitwise & or |:
WHERE (a & 000011)
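If the intent is to match every non-zero value whose 1 bits all lie inside the given pattern (an assumption about the question, which lists 000001, 000010 and 000011 for the input 000011), the bitwise test can be sketched as follows. The example is in Python, and `submask_strings` is a hypothetical helper.

```python
def submask_strings(mask):
    """List the non-zero bit patterns whose 1 bits are all set in mask.

    mask is a string of 0s and 1s, e.g. "000011"; results keep its width.
    """
    width = len(mask)
    mask_value = int(mask, 2)
    matches = []
    for candidate in range(1, mask_value + 1):
        # No bit of candidate may fall outside the mask.
        if candidate & ~mask_value == 0:
            matches.append(format(candidate, f"0{width}b"))
    return matches

print(submask_strings("000011"))       # ['000001', '000010', '000011']
print(len(submask_strings("111111")))  # 63
```

The 63 patterns for 111111 are the long OR chain the question wants to avoid. On an integer column a single bitwise test could replace that list, while for a string column the generated values could feed an IN (...) clause.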
If you want to show rows whose value contains a 1, use LIKE:
SELECT *
FROM table
WHERE a LIKE '%1%'
or POSITION:
SELECT *
FROM table
WHERE position('1' in a) > 0
From what I understand you want to find all rows where the binary representation is less than your input. If that is the case, you could use the BINARY function to get the result you want:
mysql> create table bintab (a varchar(10));
Query OK, 0 rows affected (0.07 sec)
mysql> insert into bintab values('000001');
Query OK, 1 row affected (0.04 sec)
mysql> insert into bintab values('000010');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000011');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000100');
Query OK, 1 row affected (0.00 sec)
mysql> insert into bintab values('000101');
Query OK, 1 row affected (0.00 sec)
mysql> insert into bintab values('000110');
Query OK, 1 row affected (0.01 sec)
mysql> insert into bintab values('000111');
Query OK, 1 row affected (0.00 sec)
mysql> select * from bintab where binary(a) < binary('000100');
+--------+
| a |
+--------+
| 000001 |
| 000010 |
| 000011 |
+--------+
3 rows in set (0.00 sec)

Should OFFSET be specified at the end of an SQL query?

SELECT * FROM CUSTOMERS LIMIT 5 OFFSET 0;
Assume CUSTOMERS is a table of customer details. The above query works fine, but if I specify OFFSET anywhere other than at the end of the query, I get an error.
I created a table with the following details.
The table name is sms_view.
Query:
SELECT SMS FROM sms_view WHERE read=2 LIMIT 5 OFFSET 0;
Result is
The above result is expected and is based on the read value: the rows are first filtered by read, and then OFFSET and LIMIT are applied to the filtered rows.
But my requirement is that OFFSET and LIMIT should apply to the entire table, and the read filter should then apply to the rows they return.
Expected result is:
I need a query that produces the expected result.
Yes, it should be at the end. See https://dev.mysql.com/doc/refman/5.7/en/select.html
SELECT * FROM CUSTOMERS
ORDER BY somecolumn -- important to get consistent results
LIMIT 5 OFFSET 0
Another way to do the same thing is:
SELECT * FROM CUSTOMERS
ORDER BY somecolumn
LIMIT 0, 5
or in this case (as the offset is 0):
SELECT * FROM CUSTOMERS
ORDER BY somecolumn
LIMIT 5
MariaDB [sandbox]> Drop table if exists sms_view;
Query OK, 0 rows affected (0.10 sec)
MariaDB [sandbox]> create table sms_view(SMS int,db_id int, `read` int);
Query OK, 0 rows affected (0.28 sec)
MariaDB [sandbox]> insert into sms_view values
-> (1, 2, 3) ,
-> (2, 2, 3),
-> (3, 2, 2) ,
-> (4, 2, 2) ,
-> (5, 2, 2) ,
-> (6, 2, 2) ,
-> (7, 2, 2) ,
-> (8, 2, 2) ,
-> (9, 2, 2) ,
-> (10, 2, 2);
Query OK, 10 rows affected (0.04 sec)
Records: 10 Duplicates: 0 Warnings: 0
MariaDB [sandbox]>
MariaDB [sandbox]> select sms from
-> (
-> SELECT * FROM sms_view LIMIT 5 OFFSET 0
-> ) s
-> WHERE `read` = 2;
+------+
| sms |
+------+
| 3 |
| 4 |
| 5 |
+------+
3 rows in set (0.00 sec)
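The difference between filtering before and after LIMIT can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for MariaDB (an assumption made only so the example is self-contained) and adds ORDER BY so the page is deterministic, as recommended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sms_view (sms INTEGER, db_id INTEGER, "read" INTEGER)')
# Same ten rows as in the MariaDB session above
rows = [(1, 2, 3), (2, 2, 3)] + [(n, 2, 2) for n in range(3, 11)]
conn.executemany("INSERT INTO sms_view VALUES (?, ?, ?)", rows)

# Filter first, then limit: the first five rows that have read = 2.
filter_then_limit = [r[0] for r in conn.execute(
    'SELECT sms FROM sms_view WHERE "read" = 2 ORDER BY sms LIMIT 5 OFFSET 0')]

# Limit first (inside the derived table), then filter that page.
limit_then_filter = [r[0] for r in conn.execute(
    'SELECT sms FROM (SELECT * FROM sms_view ORDER BY sms LIMIT 5 OFFSET 0) AS s '
    'WHERE "read" = 2 ORDER BY sms')]

print(filter_then_limit)  # [3, 4, 5, 6, 7]
print(limit_then_filter)  # [3, 4, 5]
```

The second query has the same shape as the derived-table query above: the page is cut from the whole table first, so fewer than five rows can survive the filter.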

MySQL Update Query, Rows Matched But Not Changed

Why will this not update 501 records? What's wrong with my query?
MariaDB [contacts]> UPDATE history h, phone_corrections t SET h.contact = t.new_nmbr WHERE h.contact = t.old_nmbr;
Query OK, 0 rows affected (0.03 sec)
Rows matched: 501 Changed: 0 Warnings: 0
MariaDB [contacts]>
FIXED! There were several records where the wrong_nmbr field was the same value as the right_nmbr field. Sorry for the post.
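"Rows matched: 501 Changed: 0" fits that explanation: an UPDATE that sets a column to the value it already holds matches the row but does not change it. A quick way to check for this before running the update is to count how many matching corrections would actually alter the value. The sketch below uses Python's sqlite3 module as a stand-in for MariaDB, with invented sample data and the column names from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (contact TEXT)")
conn.execute("CREATE TABLE phone_corrections (old_nmbr TEXT, new_nmbr TEXT)")
# Hypothetical data: two corrections are no-ops, one is a real change.
conn.executemany("INSERT INTO history VALUES (?)", [("111",), ("222",), ("333",)])
conn.executemany("INSERT INTO phone_corrections VALUES (?, ?)",
                 [("111", "111"), ("222", "222"), ("333", "444")])

# Rows the UPDATE's WHERE clause would match
matched = conn.execute(
    "SELECT COUNT(*) FROM history h "
    "JOIN phone_corrections t ON h.contact = t.old_nmbr"
).fetchone()[0]

# Matched rows whose value would actually be different afterwards
would_change = conn.execute(
    "SELECT COUNT(*) FROM history h "
    "JOIN phone_corrections t ON h.contact = t.old_nmbr "
    "WHERE t.new_nmbr <> t.old_nmbr"
).fetchone()[0]

print(matched, would_change)  # 3 1
```

If the second count is zero while the first is not, the update has nothing to change even though every row matches.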