I've found a difference in how a query is interpreted between these two databases and wondered if anyone could shed any light on what is happening here. The query looks like this:
SELECT t1.id, t2.album_id
FROM t1
LEFT OUTER JOIN t2
ON t1.data_id = t2.id
AND t1.event_type IN (1002, 1001, 1000)
WHERE
t1.event_type IN (1000, 1001, 1002, 1200, 1201, 1202, 1203)
GROUP BY t1.id
ORDER BY t1.id DESC
LIMIT 0, 20;
The MariaDB result looks like this:
+-----+----------+
| id | album_id |
+-----+----------+
| 623 | NULL |
| 622 | NULL |
| 621 | NULL |
| 620 | NULL |
| 619 | NULL |
| 618 | NULL |
| 617 | NULL |
| 616 | NULL |
| 615 | NULL |
| 614 | NULL |
| 613 | NULL |
| 612 | 194 |
| 611 | NULL |
| 610 | NULL |
| 609 | NULL |
| 608 | 193 |
| 607 | NULL |
| 606 | NULL |
| 605 | NULL |
| 604 | NULL |
+-----+----------+
And the Oracle MySQL result looks like this:
+-----+----------+
| id | album_id |
+-----+----------+
| 623 | NULL |
| 622 | NULL |
| 621 | NULL |
| 620 | NULL |
| 619 | NULL |
| 618 | NULL |
| 617 | NULL |
| 616 | 196 |<-- different
| 615 | NULL |
| 614 | NULL |
| 613 | NULL |
| 612 | 194 |
| 611 | 194 |<-- different
| 610 | NULL |
| 609 | NULL |
| 608 | 193 |
| 607 | 193 |<-- different
| 606 | NULL |
| 605 | NULL |
| 604 | NULL |
+-----+----------+
Additionally, when I EXPLAIN the queries, I can see that the two databases are interpreting the query differently. (See the "Extra" column)
MariaDB
+------+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 20 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | foo.t1.data_id | 1 | Using where |
+------+-------------+-------+--------+---------------+---------+---------+------------------------+------+-------------+
Oracle MySQL
+----+-------------+-------+--------+---------------+---------+---------+---------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------+------+-------------+
| 1 | SIMPLE | t1 | index | NULL | PRIMARY | 4 | NULL | 20 | Using where |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | foo.t1.data_id | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------+------+-------------+
I have found workarounds for this but would really like to know what is going on here. Does anyone have any ideas?
If you want to try it yourself, a dump of the data I used in this example can be found here.
Thanks.
edit: It has been pointed out in the comments that this query is invalid SQL in most databases, but that MySQL allows it and the database is then free to return any value from each group. I'd just like to point out that what appears to be happening here is different, because the values are not ambiguous: there is only a single matching row, and it does not correspond to the value that MariaDB is returning.
SELECT t1.id, t2.album_id
FROM t1
JOIN t2
ON t1.data_id = t2.id
WHERE
t1.id = 616
;
+-----+----------+
| id | album_id |
+-----+----------+
| 616 | 196 |
+-----+----------+
1 row in set (0.00 sec)
It turns out this is actually a bug in MariaDB, which can produce wrong results when using GROUP BY and a LEFT JOIN on two conditions.
This query uses a so-called MySQL extension to GROUP BY.
See this link for details: http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
The documentation clearly states:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Taking the above into account, this behaviour is in accordance with the specification.
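To avoid depending on this indeterminate behaviour at all, the bare column can be wrapped in an aggregate. A sketch of such a rewrite of the original query, keeping the table and column names from the question (MAX() is one deterministic choice; MIN() would also work):

```sql
-- With an aggregate, both servers must return the same value per group,
-- so the MariaDB/MySQL discrepancy disappears.
SELECT t1.id, MAX(t2.album_id) AS album_id
FROM t1
LEFT OUTER JOIN t2
  ON t1.data_id = t2.id
 AND t1.event_type IN (1002, 1001, 1000)
WHERE t1.event_type IN (1000, 1001, 1002, 1200, 1201, 1202, 1203)
GROUP BY t1.id
ORDER BY t1.id DESC
LIMIT 0, 20;
```

Since each t1.id joins to at most one t2 row here, the aggregate simply makes the single matching value explicit instead of leaving the server free to choose.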
Related
I have a table with the following properties:
+-----------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | <null> | auto_increment |
| c2 | varchar(255) | YES | MUL | <null> | |
| c3 | int(11) | YES | | <null> | |
| c4 | varchar(255) | YES | | <null> | |
| c5 | varchar(255) | YES | | <null> | |
| c6 | int(11) | YES | MUL | <null> | |
| c7 | int(11) | YES | | <null> | |
| c8 | int(11) | YES | | <null> | |
| c9 | datetime | YES | | <null> | |
| c10 | datetime | YES | | <null> | |
| c11 | char(40) | YES | UNI | <null> | |
| c12 | tinyint(1) | NO | MUL | 1 | |
| c13 | text | YES | | <null> | |
| c14 | int(11) | YES | MUL | <null> | |
| c15 | varchar(64) | YES | MUL | <null> | |
+-----------------------+--------------+------+-----+---------+----------------+
show index from table_one; shows the following output:
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table_one | 0 | PRIMARY | 1 | id | A | 1621972 | NULL | NULL | | BTREE | | |
| table_one | 0 | c11 | 1 | c11 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 1 | c2 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 2 | c6 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 3 | c8 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 0 | c2_c6_c8_and_c14_unique | 4 | c14 | A | 1621972 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c12 | 1 | c12 | A | 1 | NULL | NULL | | BTREE | | |
| table_one | 1 | c6 | 1 | c6 | A | 20794 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c14 | 1 | c14 | A | 577 | NULL | NULL | YES | BTREE | | |
| table_one | 1 | c15 | 1 | c15 | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------------+------------+--------------------------------------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Now, when I run the following query, it takes around 5.8 seconds on average:
select * from table_one
where c6 = 12345 and c14 = 12
limit 10 offset 0;
When I run explain on the above query, it says it has used index_merge:
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
| 1 | SIMPLE | table_one | index_merge | ....................... | c14, c6 | 5,5 | NULL | 9 | Using intersect(c14,c6);|
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+------+-------------------------+
But if I force the table to use the index on c6 only, it returns results in 0.6 seconds on average:
select * from table_one force index(c6) where c6 = 12345 and c14 = 12 limit 10 offset 0;
Why is MySQL choosing index_merge on its own and making the query slow? I am aware that I don't have a composite index on (c6, c14), but the indexes exist individually.
Also, the EXPLAIN for the forced-index query shows a higher count of rows accessed to perform the query, yet it is still 10x faster.
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| 1 | SIMPLE | table_one | ref | ....................... | c6 | 5 | const | 22388 | Using where; |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
This is causing our production system to go down when someone hits the APIs at a high rate: MySQL just doesn't return any results for 59 seconds and the query times out.
Plus, I can't really add composite indexes or change the schema without downtime, since we already have 2 million entries in the table.
The current temporary fix is to add force index(c6) to the query, but I am not sure how scalable that is or whether we might run into problems later on.
EDIT 1
Could the slowness be because of the order in which the index_merge is done?
More information regarding c6 and c14: Consider c6 as countries and c14 as states.
EDIT 2 : 2020-06-15 07:52:35 UTC :
I tried running the query by forcing the index of c14 and it turns out to be slower by roughly 3x:
select * from table_one force index(c14) where c6 = 12345 and c14 = 12 limit 10 offset 0;
The query took 2.1 seconds.
And the explain query gives the following output:
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
| 1 | SIMPLE | table_one | ref | ....................... | c14 | 5 | const | 730 | Using where; |
+----+-------------+---------------------+-------------+-----------------------------+---------+---------+-------------------------------+--------+-------------------------+
The number of rows the query needs to access is 30x lower than when the index is forced on c6: here it is 730 rows, whereas the previous query scanned 22k. What factors can make this index slower even with fewer rows to access?
Some more information, in case it helps:
mysql> select count(*) from table_one where c14 is null;
+----------+
| count(*) |
+----------+
| 7490 |
+----------+
1 row in set (0.02 sec)
mysql> select count(*) from table_one;
+----------+
| count(*) |
+----------+
| 1936278 |
+----------+
1 row in set (1.68 sec)
mysql> select count(*) from table_one where c6 is null;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
Consider
USE dbname;
ALTER TABLE table_one ADD INDEX table_one_c6_and_c14 (c6,c14);
Then remove the FORCE INDEX hint from your query.
And let us know how long it took to create the multi column index, please.
And the time to complete your query, now that an appropriate index is available.
select *
from table_one force index(c14)
where c6 = 12345
and c14 = 12
limit 10 offset 0;
Discussion:
You forced it to use INDEX(c14)
It will reach into that index at the first entry for c14 = 12.
Then it will scan, quite efficiently, across the rows with "12".
Each will be checked for c6 = 12345. This is not as efficient as if there were INDEX(c14, c6).
It might find 10 (cf limit) such rows quickly, or it might have to step over thousands of rows with the 'wrong' value of c6.
That is, the query time (for this query) depends a lot on the distribution of the data.
With INDEX(c14, c6), only 10 index rows need be touched -- much faster, and (relatively) consistent speed.
INDEX(c6, c14) would be just as fast as INDEX(c14, c6) for this query.
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
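As an alternative to FORCE INDEX while a proper composite index is being rolled out, the intersect strategy can be disabled via the optimizer_switch system variable (available in MySQL 5.6+ and MariaDB; check your version's documentation for the exact flag). A sketch:

```sql
-- Turn off only the index_merge intersect strategy for this session;
-- other index_merge variants and all other optimizations stay enabled.
SET SESSION optimizer_switch = 'index_merge_intersection=off';

SELECT * FROM table_one
WHERE c6 = 12345 AND c14 = 12
LIMIT 10 OFFSET 0;
```

This avoids hard-coding an index name into the query, but the composite index on (c6, c14) remains the durable fix.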
This is driving me nuts.
I have the exact same database on two different machines, one Arch and one Debian. I'm running a query on the table below:
describe wellness;
+--------------+-------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------------------+----------------+
| wellness_id | int(11) | NO | PRI | NULL | auto_increment |
| people_id | int(11) | NO | MUL | NULL | |
| time_checked | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| check_type | varchar(1) | NO | MUL | NULL | |
| username | varchar(16) | NO | MUL | NULL | |
| return_date | timestamp | NO | MUL | 0000-00-00 00:00:00 | |
| seen_by | varchar(16) | YES | MUL | NULL | |
+--------------+-------------+------+-----+---------------------+----------------+
7 rows in set (0.00 sec)
and the query:
mysql> explain select * from wellness where wellness_id in ( select max(wellness_id) from wellness group by people_id) and time_checked < (now() - interval 48 hour);
+----+--------------------+----------+-------+------------------+---------------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------+-------+------------------+---------------+---------+------+-------+-------------+
| 1 | PRIMARY | wellness | ALL | time_checked_key | NULL | NULL | NULL | 62546 | Using where |
| 2 | DEPENDENT SUBQUERY | wellness | index | NULL | people_id_key | 4 | NULL | 231 | Using index |
+----+--------------------+----------+-------+------------------+---------------+---------+------+-------+-------------+
2 rows in set (0.00 sec)
On my Debian server, where I'm migrating the application that uses this database, the query takes 7 minutes to run. On my Arch server, it takes less than a second. The weird thing is, the EXPLAIN is different on my Arch box, where I grabbed the SQL data from in the first place:
MariaDB [redacted]> explain select * from wellness where wellness_id in ( select max(wellness_id) from wellness group by people_id) and time_checked < (now() - interval 48 hour);
+------+--------------+-------------+--------+--------------------------+---------------+---------+------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------+--------+--------------------------+---------------+---------+------------------------------+------+--------------------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 221 | |
| 1 | PRIMARY | wellness | eq_ref | PRIMARY,time_checked_key | PRIMARY | 4 | <subquery2>.max(wellness_id) | 1 | Using where |
| 2 | MATERIALIZED | wellness | range | NULL | people_id_key | 4 | NULL | 221 | Using index for group-by |
+------+--------------+-------------+--------+--------------------------+---------------+---------+------------------------------+------+--------------------------+
3 rows in set (0.00 sec)
Any thoughts on what I need to adjust to get this working properly? As far as I can tell the Apache and PHP settings are the exact same on both servers, so I feel this is likely a database issue.
Compare the output of
show variables LIKE 'sql_mode';
and verify the settings in your my.cnf.
Also, if you dump the database from one server and import it into the other, the data files are not identical; the internal order of the rows can be different.
You can also run ANALYZE TABLE to refresh the optimizer statistics for your table:
ANALYZE TABLE YOUR_TABLE;
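If the sql_mode and statistics checks don't explain the difference, one common workaround for MySQL versions that execute IN (SELECT ...) as a DEPENDENT SUBQUERY is to materialize the subquery yourself with a derived-table join, which forces it to run once. A sketch against the wellness table from the question:

```sql
-- The derived table "latest" is computed once (one max id per person),
-- then joined back by primary key instead of re-running per outer row.
SELECT w.*
FROM wellness w
JOIN (
    SELECT MAX(wellness_id) AS wellness_id
    FROM wellness
    GROUP BY people_id
) latest ON latest.wellness_id = w.wellness_id
WHERE w.time_checked < (NOW() - INTERVAL 48 HOUR);
```

This mirrors what the MariaDB plan (MATERIALIZED subquery) is already doing automatically on the fast server.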
I'm trying to optimize my query, however, MySQL seems to be utilizing non-optimal indexes on the query and I can't seem to figure out what is wrong. My query is as follows:
SELECT SQL_CALC_FOUND_ROWS deal_ID AS ID,dealTitle AS dealSaving,
storeName AS title,deal_URL AS dealURL,dealDisclaimer,
dealType, providerName,providerLogo AS providerIMG,createDate,
latitude AS lat,longitude AS lng,'local' AS type,businessType,
address1,city,dealOriginalPrice,NULL AS dealDiscountPercent,
dealPrice,scoringBase, smallImage AS smallimage,largeImage AS image,
storeURL AS storeAlias,
exp(-power(greatest(0,
abs(69.0*DEGREES(ACOS(0.82835377099147 *
COS(RADIANS(latitude)) * COS(RADIANS(-118.4-longitude)) +
0.56020534635454*SIN(RADIANS(latitude)))))-2),
2)/(5.7707801635559)) *
scoringBase * IF(submit_ID IN (18381),
IF(businessType = 1,1.3,1.2),IF(submit_ID IN (54727),1.19, 1)
) AS distance
FROM local_deals
WHERE latitude BETWEEN 33.345362318841 AND 34.794637681159
AND longitude BETWEEN -119.61862872928 AND -117.18137127072
AND state = 'CA'
AND country = 'US'
ORDER BY distance DESC
LIMIT 48 OFFSET 0;
Listing the indexes on the table reveals:
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| local_deals | 0 | PRIMARY | 1 | id | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 0 | unique_deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | deal_ID | 1 | deal_ID | A | 193893 | NULL | NULL | | BTREE | | |
| local_deals | 1 | store_ID | 1 | store_ID | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeOnline_ID | 1 | storeOnline_ID | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | storeChain_ID | 1 | storeChain_ID | A | 117 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | userProvider_ID | 1 | userProvider_ID | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | expirationDate | 1 | expirationDate | A | 3127 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | createDate | 1 | createDate | A | 96946 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | city | 1 | city | A | 17626 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | state | 1 | state | A | 138 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | zip | 1 | zip | A | 38778 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | country | 1 | country | A | 39 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | latitude | 1 | latitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | longitude | 1 | longitude | A | 193893 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | eventDate | 1 | eventDate | A | 4215 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | isNowDeal | 1 | isNowDeal | A | 3 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | businessType | 1 | businessType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | dealType | 1 | dealType | A | 5 | NULL | NULL | YES | BTREE | | |
| local_deals | 1 | submit_ID | 1 | submit_ID | A | 5 | NULL | NULL | YES | BTREE | | |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Running explain extended reveals:
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | ref | state,country,latitude,longitude | state | 35 | const | 52472 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
There are around 200k rows in the table. What is strange is that it is ignoring the latitude and longitude indexes, as those should filter the table more. Running a query with the "state" and "country" conditions removed produces the following EXPLAIN:
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| 1 | SIMPLE | local_deals | range | latitude,longitude | longitude | 5 | NULL | 30662 | 100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
This shows that the longitude index alone would filter the table down to 30,662 rows. Am I missing something here? How can I get MySQL to use all of these indexes? Note that the table is InnoDB and I'm using MySQL 5.5.
The best index for your query is a composite index on (country, state, latitude, longitude); country and state could be swapped. MySQL has good documentation on multi-column indexes.
Basically, latitude and longitude are not particularly selective individually. Unfortunately, the standard B-tree index only supports one inequality, and your query has two.
Actually, if you want GIS processing, then you should use a spatial extension to MySQL.
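A sketch of the composite index suggested above (the index name is made up; only the column order matters, with the equality columns first):

```sql
-- Equality columns (country, state) lead; the range column follows.
-- Only the first inequality (latitude) can be used for range pruning,
-- but longitude is still available for index-condition filtering.
ALTER TABLE local_deals
  ADD INDEX country_state_lat_lng (country, state, latitude, longitude);
```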
Depending on the size of your table, Gordon's suggested index may be "good enough". If you need to get even more performance, you need to go to a 2D partitioning technique, wherein you partition on latitude and arrange for the InnoDB PRIMARY KEY to begin with longitude. More details, and sample code, are available in my article.
A generic technique for problems like this is to build a subquery with these properties:
It returns no more than LIMIT rows; and those are all that you need.
There is a "covering index" for the columns involved, plus the PRIMARY KEY.
You are using InnoDB.
Something like
SELECT b. ..., a.distance
FROM local_deals b
JOIN (
    SELECT id,
           (...) AS distance
    FROM local_deals
    WHERE latitude BETWEEN 33.34536 AND 34.79464
      AND longitude BETWEEN -119.61863 AND -117.18137
      AND state = 'CA'
      AND country = 'US'
    ORDER BY distance DESC
    LIMIT 48 OFFSET 0
) AS a ON b.id = a.id
ORDER BY a.distance DESC;
INDEX(country, state, latitude, longitude, id) -- `id` is the PK
-- country and state first (because of '='); id last.
Why this helps...
The index is "covering", so the lengthy scan (of a lot more than 48 rows) is done entirely in the index's BTree. This cuts down on I/O for huge tables.
All the other fields (b.*) are not hauled around through tmp tables, etc. Only 48 sets of those fields are dealt with.
The 48 lookups by id are especially efficient in InnoDB due to the "clustered PK".
When working with "huge" tables, where I/O dominates, the cost of this technique can be counted thus:
1, or a small number of, blocks in the index are needed for the subquery. Note that the desired records are consecutive, or nearly so. (OK, if there are 30K to look through, it could be more than 100 blocks; hence my comment about shrinking the bounding box to start with.)
Then 48 (LIMIT) random fetches via id get the 48 rows.
Without the subquery, the bulky rows need to be fetched. And, depending on the index used, that could be up to 30K blocks fetched. That's orders of magnitude slower.
Also, 48 rows versus 30K rows will be written to a tmp table for sorting (ORDER BY).
I have two tables like this:
logbook:
+------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date_added | timestamp | NO | | CURRENT_TIMESTAMP | |
| username | varchar(16) | NO | | NULL | |
| entry | longtext | NO | MUL | NULL | |
+------------+-------------+------+-----+-------------------+----------------+
and
read_logbook:
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| logbook_id | int(11) | NO | | NULL | |
| username | varchar(16) | NO | | NULL | |
+------------+-------------+------+-----+---------+----------------+
What I'd like to do is to select EVERYTHING from logbook, but only if the logbook.id AND logbook.username DO NOT appear in read_logbook.logbook_id and read_logbook.username, respectively.
I've experimented with some left-union-right queries, as well as some not in queries, and keep getting either thousands more results than expected, or no results at all.
Any thoughts?
EDIT - I'd run this for a specific username... so basically, if my username was jmd9qs, I'd want all results from logbook where read_logbook.id != logbook.id and read_logbook.username != jmd9qs
I hope that's clear enough...
EDIT - TEST DATA
The logbook:
mysql> select id, date_added, username from logbook order by id desc limit 10;
+----+---------------------+-----------+
| id | date_added | username |
+----+---------------------+-----------+
| 94 | 2013-09-03 14:54:25 | tluce |
| 93 | 2013-09-03 13:12:02 | tluce |
| 92 | 2013-09-03 11:42:14 | tluce |
| 91 | 2013-09-03 08:28:20 | jmd9qs |
| 90 | 2013-09-03 07:13:36 | jmd9qs |
| 89 | 2013-09-03 07:05:19 | jmd9qs |
| 88 | 2013-09-03 06:57:47 | jsawtelle |
| 87 | 2013-09-03 06:15:42 | jsawtelle |
| 86 | 2013-09-03 05:21:14 | jsawtelle |
| 85 | 2013-09-03 03:52:25 | jsawtelle |
+----+---------------------+-----------+
Logbook entries that have been "marked" as read:
mysql> select logbook_id, username from read_logbook group by logbook_id desc limit 10;
+------------+----------+
| logbook_id | username |
+------------+----------+
| 94 | jmd9qs |
| 93 | jmd9qs |
| 92 | jmd9qs |
| 91 | jmd9qs |
| 90 | jmd9qs |
| 89 | jmd9qs |
| 88 | jmd9qs |
| 87 | jmd9qs |
| 86 | jmd9qs |
| 85 | jmd9qs |
+------------+----------+
10 rows in set (0.00 sec)
So when I run the query for jmd9qs, nothing should come up because in read_logbook, his username and the logbook id show up.
CLARIFICATION -
So in the logbook, the username is just the person who wrote logbook.entry. In read_logbook, username is the person who READ that entry. So if I'm logged in as jmd9qs, and I try to view the logbook, since I've read everything no logbook.entry's should come up. But for another user who HASN'T read that specific entry, the entry WOULD show up.
SELECT *
FROM logbook
WHERE logbook.id NOT IN
    (SELECT read_logbook.logbook_id
     FROM read_logbook
     WHERE read_logbook.username = 'jmd9qs');
If I understand your needs, you should try
SELECT t1.* FROM logbook t1
LEFT JOIN read_logbook t2
ON t1.id = t2.logbook_id AND t2.username = 'jmd9qs'
WHERE t2.id IS NULL
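The same anti-join can also be written with NOT EXISTS, which states the intent ("no read record for this entry by this user") directly. A sketch, with the reader's username as a literal placeholder:

```sql
-- Keep a logbook entry only if this particular user has no
-- matching row in read_logbook for it.
SELECT l.*
FROM logbook l
WHERE NOT EXISTS (
    SELECT 1
    FROM read_logbook r
    WHERE r.logbook_id = l.id
      AND r.username = 'jmd9qs'
);
```

In practice you would bind the username as a parameter rather than interpolating it into the SQL.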
I'm using the following query to extract the frequent short values from a mediumblob column:
select bytes, count(*) as n
from pr_value
where bytes is not null && length(bytes)<11 and variable_id=5783
group by bytes order by n desc limit 10;
The problem I have is that this query takes too much time (about 10 seconds with fewer than 1 million records):
mysql> select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+-------+----+
| bytes | n |
+-------+----+
| 32 | 21 |
| 27 | 20 |
| 52 | 20 |
| 23 | 19 |
| 25 | 19 |
| 26 | 19 |
| 28 | 19 |
| 29 | 19 |
| 30 | 19 |
| 31 | 19 |
+-------+----+
The table is as follows (unrelated columns not shown):
mysql> describe pr_value;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| product_id | int(11) | NO | PRI | NULL | |
| variable_id | int(11) | NO | PRI | NULL | |
| author_id | int(11) | NO | PRI | NULL | |
| bytes | mediumblob | YES | MUL | NULL | |
+-------------+---------------+------+-----+---------+-------+
The type is mediumblob because most values are big. Less than 10% are as short as the ones I'm looking for with this specific query.
I have the following indexes:
mysql> show index from pr_value;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pr_value | 0 | PRIMARY | 1 | product_id | A | 8961 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 3 | author_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 1 | bytes | 1 | bytes | A | 842402 | 10 | NULL | YES | BTREE | | |
| pr_value | 1 | bytes | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
MySQL explains my query like this:
mysql> explain select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | pr_value | range | bytes | bytes | 13 | NULL | 421201 | Using where; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
Note that the condition on the length of the bytes column can be removed without changing the duration.
What can I do to make this query fast ?
Of course I'd prefer not to have to add columns.
Your index on (bytes, variable_id) is not very smart. If you always have a variable_id clause in your queries, you should add an index with variable_id first:
(variable_id, bytes)
How much it helps depends on how selective variable_id is, but it should help.
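A sketch of that index; note that a blob column used in a key needs a prefix length, so the 10-byte prefix from the existing index is kept (the index name is made up):

```sql
-- variable_id first, so the equality filter prunes before
-- the grouped bytes prefix is scanned in index order.
ALTER TABLE pr_value
  ADD INDEX variable_bytes (variable_id, bytes(10));
```

The old (bytes, variable_id) index can then optionally be dropped once the query plan confirms the new one is used.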
Another tip is to add a new indexed column holding the result of length(bytes)<11:
ALTER TABLE pr_value ADD COLUMN small TINYINT(1);
UPDATE pr_value SET small = length(bytes)<11;
Then add a new index on (small, variable_id).
Why are you GROUP BY'ing the blob column? I'd imagine that's the bottleneck, as the query then has to compare all the blob values against each other. Is it because you want unique values for the BLOB? The DISTINCT keyword might perform better than GROUP BY.