I have a MariaDB table that looks like this:
+-------+------+--------+---------------------+
| realm | key2 | userId | date                |
+-------+------+--------+---------------------+
| AB3   | 123  | 1      | 2017-08-04 17:30:00 |
| AB3   | 124  | 1      | 2017-08-04 17:30:00 |
| AB3   | 125  | 1      | 2017-08-04 17:30:00 |
| XY7   | 97   | 2      | 2017-08-04 17:35:00 |
| XY7   | 98   | 2      | 2017-08-04 17:35:00 |
| XY7   | 99   | 2      | 2017-08-04 17:35:00 |
| AB3   | 110  | 3      | 2017-08-04 17:40:00 |
| AB3   | 111  | 3      | 2017-08-04 17:40:00 |
+-------+------+--------+---------------------+
PRIMARY KEY (realm, key2)
INDEX (realm, userId)
INDEX (date)
This table operates as some sort of queue for processing user actions. Basically a server always takes the oldest data from this table, processes it and deletes it from this table. Each realm has its own server processing this queue.
Now I want to find out a user's position in queue for that realm. So, using the example above, when I request the position for userId 3 in realm 'AB3', I want to get the result 2 because only one other user (userId 1) is to be processed earlier for realm AB3.
(The column key2 might be irrelevant in this example; I only included it because it is part of the primary key, which may make it relevant for finding a good solution.)
Here is the SQL schema:
CREATE TABLE `queue` (
`realm` varchar(5) NOT NULL,
`key2` int(10) UNSIGNED NOT NULL,
`userId` int(10) UNSIGNED NOT NULL,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `queue` (`realm`, `key2`, `userId`, `date`) VALUES
('AB3', 110, 3, '2017-08-04 17:40:00'),
('AB3', 111, 3, '2017-08-04 17:40:00'),
('AB3', 123, 1, '2017-08-04 17:30:00'),
('AB3', 124, 1, '2017-08-04 17:30:00'),
('AB3', 125, 1, '2017-08-04 17:30:00'),
('XY7', 97, 2, '2017-08-04 17:35:00'),
('XY7', 98, 2, '2017-08-04 17:35:00'),
('XY7', 99, 2, '2017-08-04 17:35:00');
ALTER TABLE `queue`
ADD PRIMARY KEY (`realm`,`key2`),
ADD KEY `ru` (`realm`,`userId`) USING BTREE,
ADD KEY `date` (`date`);
I came up with this query that seems to work but is pretty slow (~3 seconds) on a table with 10,000,000 entries:
SELECT (COUNT(DISTINCT `realm`, `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT `date`
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
GROUP BY `realm`, `userId`
)
SQL Fiddle: http://sqlfiddle.com/#!9/fb04fd/9/0
EXPLAIN EXTENDED of this query:
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
| id | select_type | table | type        | possible_keys   | key        | key_len | ref   | rows    | filtered | Extra                                    |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
|  1 | PRIMARY     | queue | ref         | PRIMARY,ru,date | PRIMARY    | 767     | const | 5266123 |   100.00 | Using where                              |
|  2 | SUBQUERY    | queue | index_merge | PRIMARY,ru      | ru,PRIMARY | 771,767 | NULL  |     496 |    75.00 | Using intersect(ru,PRIMARY); Using where |
+----+-------------+-------+-------------+-----------------+------------+---------+-------+---------+----------+------------------------------------------+
Do you have any ideas how I can optimize this query to run faster on a table with like 10,000,000 entries?
Other queries that are run on this table:
SELECT `m`.*
FROM `queue` `m`
JOIN (
SELECT `m`.*
FROM `queue` `m`
WHERE `m`.`realm` = ?
ORDER BY `date` ASC
LIMIT 1
) `mm` ON `m`.`realm` = `mm`.`realm` AND `m`.`userId` = `mm`.`userId`;
and
DELETE FROM `queue` WHERE `realm` = ? AND `userId` = ?;
How could I optimize my indexes?
I feel like there is something wrong with the table DDL. Anyway, I would have rewritten your query like:
SELECT (COUNT(DISTINCT `userId`)+1) `position`
FROM `queue`
WHERE `realm` = 'AB3'
AND `date` < (
SELECT min(`date`)
FROM `queue`
WHERE `realm` = 'AB3' AND `userId` = 3
)
and perhaps have a really specific index for this query like :
index (realm, date)
You can also try a wider covering index for this query:
index (realm, date, userId)
but I am not even sure it will be faster than the previous one.
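By the way, if your MariaDB is 10.2 or newer, the position can also be computed with a window function. This is only a sketch to benchmark against the COUNT(DISTINCT ...) approach, not a guaranteed win:
SELECT `position`
FROM (
    -- rank each user in the realm by their earliest queued row
    SELECT `userId`,
           DENSE_RANK() OVER (ORDER BY MIN(`date`)) AS `position`
    FROM `queue`
    WHERE `realm` = 'AB3'
    GROUP BY `userId`
) AS `ranked`
WHERE `userId` = 3;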
I have a MySQL table that holds about 8 million records, and I need to run some analytics on it to get averages, as shown in the table definition and query below. The result contains hourly analytics (the average of a parameter value) for the last year of data.
MySQL Server Version : 8.0.15
Table:
create table `temp_data` (
`dateLogged` datetime NOT NULL,
`paramName` varchar(30) NOT NULL,
`paramValue` float NOT NULL,
`sensorId` varchar(20) NOT NULL,
`locationCode` varchar(30) NOT NULL,
PRIMARY KEY (`sensorId`,`paramName`,`dateLogged`),
KEY `summary` (`locationCode`,`paramName`,`dateLogged`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
Query: The query below transposes row-based parameters into columns and, while doing so, computes the average of the param values:
SELECT dateLogged,
ROUND(avg( ROUND(IF(paramName = 'temp1', paramValue, NULL),2) ),2) AS T1,
ROUND(avg( ROUND(IF(paramName = 'temp2', paramValue, NULL),2) ),2) AS T2,
ROUND(avg( ROUND(IF(paramName = 'temp3', paramValue, NULL),2) ),2) AS T3,
ROUND(avg( ROUND(IF(paramName = 'temp4', paramValue, NULL),2) ),2) as T4
FROM temp_data where locationCode='A123' and paramName in ('temp1','temp2','temp3','temp4')
group by dateLogged order by dateLogged;
Result:
+---------------------+--------+--------+-------+-------+
| date                | T1     | T2     | T3    | T4    |
+---------------------+--------+--------+-------+-------+
| 2018-12-01 00:00:00 | 95.46  | 99.12  | 96.44 | 95.86 |
| 2018-12-01 01:00:00 | 100.38 | 101.09 | 99.56 | 99.70 |
| 2018-12-01 02:00:00 | 101.41 | 102.08 | 99.47 | 99.88 |
| 2018-12-01 03:00:00 | 98.79  | 100.47 | 98.59 | 99.75 |
| 2018-12-01 04:00:00 | 98.23  | 100.58 | 98.38 | 98.93 |
| 2018-12-01 05:00:00 | 101.03 | 101.80 | 99.37 | 99.88 |
| ...                 | ...    | ...    | ...   | ...   |
+---------------------+--------+--------+-------+-------+
Problem:
Now there are over 8 million records in the table, and the query takes approximately 35 to 40 seconds to execute.
Looking for suggestions on how to improve the query performance and, hopefully, bring it down to under 10 seconds.
Note:
The table has data for up to 1 year and data beyond that is archived and deleted
Result of describe:
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
| id | select_type | table     | partitions | type | possible_keys   | key     | key_len | ref   | rows    | filtered | Extra                                                  |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | temp_data | NULL       | ref  | PRIMARY,summary | summary | 53      | const | 3524800 |    50.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+-----------+------------+------+-----------------+---------+---------+-------+---------+----------+--------------------------------------------------------+
As temp1 -> temp4 are fixed, we can use a generated column to index this:
alter table temp_data add p1234 bool as (paramName IN ('temp1','temp2','temp3','temp4')) NOT NULL,
ADD KEY s1234 (locationCode, p1234, paramName, paramValue, dateLogged)
Then change the query to:
SELECT dateLogged, paramName,
ROUND(avg( ROUND(paramValue,2) ),2)
FROM temp_data where locationCode='A123' and p1234
group by dateLogged, paramName
order by dateLogged, paramName;
Handle the T1 -> T4 paramName formatting in the application code.
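If the pivoted T1 -> T4 layout is still needed in SQL rather than in application code, the grouped result above can be re-pivoted in an outer query (a sketch built on the rewritten query; avgVal is an illustrative alias, and this variant is not separately benchmarked):
SELECT dateLogged,
       MAX(IF(paramName = 'temp1', avgVal, NULL)) AS T1,
       MAX(IF(paramName = 'temp2', avgVal, NULL)) AS T2,
       MAX(IF(paramName = 'temp3', avgVal, NULL)) AS T3,
       MAX(IF(paramName = 'temp4', avgVal, NULL)) AS T4
FROM (
    -- one averaged row per (hour, parameter), served by the s1234 index
    SELECT dateLogged, paramName,
           ROUND(AVG(ROUND(paramValue, 2)), 2) AS avgVal
    FROM temp_data
    WHERE locationCode = 'A123' AND p1234
    GROUP BY dateLogged, paramName
) AS per_param
GROUP BY dateLogged
ORDER BY dateLogged;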
I want to know how I can define a limited range (min and max) for values stored in a table.
I want this range: [min: -99, max: 999]
Also, here is an example:
CREATE TABLE MyTable(
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
code INT(11) NOT NULL
)
INSERT INTO MyTable (id, code)
VALUES (1, 42),
       (2, -332),
       (3, -83),
       (4, 44324),
       (5, 0),
       (6, 999);
So, I want this output:
// MyTable
+----+------+
| id | code |
+----+------+
|  1 |   42 |
|  2 |  -99 |
|  3 |  -83 |
|  4 |  999 |
|  5 |    0 |
|  6 |  999 |
+----+------+
Well, how can I restrict my table to accept only valid (in-range) values, or convert out-of-range values to the max/min before storing them?
Or, if that is not possible, how can I show the results clamped to the valid range at query time, given that the table stores the raw values:
// MyTable
+----+-------+
| id | code  |
+----+-------+
|  1 | 42    |
|  2 | -332  |
|  3 | -83   |
|  4 | 44324 |
|  5 | 0     |
|  6 | 999   |
+----+-------+
I want something like this query:
SELECT id, IF(code > 999, 999, code) AS FilteredCode FROM MyTable
// but I can't figure out how to also clamp negative values (below -99) in my query
You can use the CHECK constraint like this:
CONSTRAINT code_fk CHECK (code BETWEEN -99 AND 999)
So your table will be created as:
CREATE TABLE MyTable(
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
code INT(11) NOT NULL,
CONSTRAINT code_fk CHECK (code BETWEEN -99 AND 999)
)
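Note that MySQL only enforces CHECK constraints from version 8.0.16 (MariaDB from 10.2.1); older versions parse them but silently ignore them. A CHECK constraint also rejects out-of-range rows rather than converting them. If you want values clamped to the range instead, as in your first desired output, a BEFORE INSERT trigger or query-time clamping with LEAST/GREATEST are options (a sketch against the MyTable definition above; the trigger name is illustrative):

-- Clamp at insert time:
DELIMITER //
CREATE TRIGGER clamp_code BEFORE INSERT ON MyTable
FOR EACH ROW
BEGIN
    -- force code into [-99, 999] before it is stored
    SET NEW.code = GREATEST(-99, LEAST(999, NEW.code));
END//
DELIMITER ;

-- Or clamp at read time, leaving the stored values untouched:
SELECT id, GREATEST(-99, LEAST(999, code)) AS FilteredCode FROM MyTable;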
I have the following two MySQL tables which I need to join:
CREATE TABLE `tbl_L` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tbl_R` (
`datetime` datetime NOT NULL,
`value` decimal(14,8) DEFAULT NULL,
`series_id` int(11) NOT NULL,
PRIMARY KEY (`series_id`,`datetime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I need to select all the dates and values from tbl_L, but also the values in tbl_R that have the same datetime as an entry in tbl_L. A trivial join, like so:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
WHERE
tbl_L.series_id = 1 AND tbl_R.series_id = 2 ORDER BY tbl_L.datetime ASC
Won't work, because it will only return datetimes that exist in both tbl_L and tbl_R (filtering the right table in the WHERE clause effectively turns the LEFT JOIN into an inner join).
Modifying the query to look like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
AND tbl_L.series_id = 1
ORDER BY tbl_L.datetime ASC;
Significantly slows it down (from a few milliseconds to a few long seconds).
Edit: and also doesn't actually work. I will clarify what I need to achieve:
Assume the following data in the tables:
mysql> SELECT * FROM tbl_R;
+---------------------+------------+-----------+
| datetime            | value      | series_id |
+---------------------+------------+-----------+
| 2013-02-20 19:21:00 | 5.87000000 |         2 |
| 2013-02-20 19:22:00 | 5.90000000 |         2 |
| 2013-02-20 19:23:00 | 5.80000000 |         2 |
| 2013-02-20 19:25:00 | 5.65000000 |         2 |
+---------------------+------------+-----------+
4 rows in set (0.00 sec)
mysql> SELECT * FROM tbl_L;
+---------------------+-------------+-----------+
| datetime            | value       | series_id |
+---------------------+-------------+-----------+
| 2013-02-20 19:21:00 | 13.16000000 |         1 |
| 2013-02-20 19:23:00 | 13.22000000 |         1 |
| 2013-02-20 19:24:00 | 13.14000000 |         1 |
| 2013-02-20 19:25:00 | 13.04000000 |         1 |
+---------------------+-------------+-----------+
4 rows in set (0.00 sec)
Again, I need all entries in tbl_L joined with the entries in tbl_R that match in terms of datetime, otherwise NULL.
My output should look like this:
+---------------------+-------------+-------------+
| datetime            | val_L       | val_R       |
+---------------------+-------------+-------------+
| 2013-02-20 19:21:00 | 13.16000000 | 5.870000000 |
| 2013-02-20 19:23:00 | 13.22000000 | 5.800000000 |
| 2013-02-20 19:24:00 | 13.14000000 | NULL        |
| 2013-02-20 19:25:00 | 13.04000000 | 5.650000000 |
+---------------------+-------------+-------------+
Thanks again!
You can get the data you want by moving only the condition for tbl_R into the join's ON clause like this:
SELECT tbl_L.datetime AS datetime, tbl_L.value AS val_L, tbl_R.value AS val_R
FROM tbl_L
LEFT JOIN tbl_R
ON tbl_L.datetime = tbl_R.datetime
AND tbl_R.series_id = 2
WHERE
tbl_L.series_id = 1 ORDER BY tbl_L.datetime ASC
Also, tbl_L's primary key on (series_id, datetime) already provides an index this query can use: series_id is its leftmost column, and it matches the ORDER BY on datetime as well, so no extra index should be needed there.
I'm using MySQL 5 and currently have a query that gets me the info I need, but I feel like it could be improved in terms of performance.
Here's the query I built (roughly following this guide):
SELECT d.*, dc.date_change, dc.cwd, h.name as hub
FROM livedata_dom AS d
LEFT JOIN ( SELECT dc1.*
FROM livedata_domcabling as dc1
LEFT JOIN livedata_domcabling AS dc2
ON dc1.dom_id = dc2.dom_id AND dc1.date_change < dc2.date_change
WHERE dc2.dom_id IS NULL
ORDER BY dc1.date_change desc) AS dc ON (d.id = dc.dom_id)
LEFT JOIN livedata_hub AS h ON (d.id = dc.dom_id AND dc.hub_id = h.id)
WHERE d.cluster = 'localhost'
GROUP BY d.id;
EDIT: Using ORDER BY + GROUP BY to avoid getting multiple dom entries in case 'domcabling' has an entry with null date_change and another one with a date for the same 'dom'.
I feel like I'm killing a mouse with a bazooka. This query takes more than 3 seconds with only about 5k entries in 'livedata_dom' and 'livedata_domcabling'. Also, EXPLAIN tells me that 2 filesorts are used:
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
| id | select_type | table      | type   | possible_keys               | key                         | key_len | ref             | rows | Extra                                        |
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
|  1 | PRIMARY     | d          | ALL    | NULL                        | NULL                        | NULL    | NULL            |    3 | Using where; Using temporary; Using filesort |
|  1 | PRIMARY     | <derived2> | ALL    | NULL                        | NULL                        | NULL    | NULL            |    3 |                                              |
|  1 | PRIMARY     | h          | eq_ref | PRIMARY                     | PRIMARY                     | 4       | dc.hub_id       |    1 |                                              |
|  2 | DERIVED     | dc1        | ALL    | NULL                        | NULL                        | NULL    | NULL            |    4 | Using filesort                               |
|  2 | DERIVED     | dc2        | ref    | livedata_domcabling_dc592d9 | livedata_domcabling_dc592d9 | 4       | live.dc1.dom_id |    2 | Using where; Not exists                      |
+----+-------------+------------+--------+-----------------------------+-----------------------------+---------+-----------------+------+----------------------------------------------+
How could I change this query to make it more efficient?
Using the dummy data (provided below), this is the expected result:
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
| id  | mb_id | prod_id | string | position | name       | cluster   | date_change         | cwd  | hub       |
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
| 249 | 47    | 47      | 47     | 47       | SuperDOM47 | localhost | NULL                | NULL | NULL      |
| 250 | 48    | 48      | 48     | 48       | SuperDOM48 | localhost | 2014-04-16 05:23:00 | 32A  | megahub01 |
| 251 | 49    | 49      | 49     | 49       | SuperDOM49 | localhost | NULL                | 22B  | megahub01 |
+-----+-------+---------+--------+----------+------------+-----------+---------------------+------+-----------+
Basically I need one row for every 'dom' entry, with:
- the 'domcabling' record with the highest date_change
- null fields if no such record exists
- at most ONE entry per dom may have a null date_change (a null datetime is considered older than any other datetime)
- the name of the 'hub' when a 'domcabling' entry is found, null otherwise
CREATE TABLE + dummy INSERT for the 3 tables:
livedata_dom (about 5000 entries)
CREATE TABLE `livedata_dom` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`mb_id` varchar(12) NOT NULL,
`prod_id` varchar(8) NOT NULL,
`string` int(11) NOT NULL,
`position` int(11) NOT NULL,
`name` varchar(30) NOT NULL,
`cluster` varchar(9) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `mb_id` (`mb_id`),
UNIQUE KEY `prod_id` (`prod_id`),
UNIQUE KEY `name` (`name`),
UNIQUE KEY `livedata_domgood_string_7bff074107b0e5a0_uniq` (`string`,`position`,`cluster`)
) ENGINE=InnoDB AUTO_INCREMENT=5485 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_dom` VALUES (251,'49','49',49,49,'SuperDOM49','localhost'),(250,'48','48',48,48,'SuperDOM48','localhost'),(249,'47','47',47,47,'SuperDOM47','localhost');
livedata_domcabling (about 10000 entries and growing slowly)
CREATE TABLE `livedata_domcabling` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`dom_id` int(11) NOT NULL,
`hub_id` int(11) NOT NULL,
`cwd` varchar(3) NOT NULL,
`date_change` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `livedata_domcabling_dc592d9` (`dom_id`),
KEY `livedata_domcabling_4366aa6e` (`hub_id`),
CONSTRAINT `dom_id_refs_id_73e89ce0c50bf0a6` FOREIGN KEY (`dom_id`) REFERENCES `livedata_dom` (`id`),
CONSTRAINT `hub_id_refs_id_179c89d8bfd74cdf` FOREIGN KEY (`hub_id`) REFERENCES `livedata_hub` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5397 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_domcabling` VALUES (1,251,1,'22B',NULL),(2,250,1,'33A',NULL),(6,250,1,'32A','2014-04-16 05:23:00'),(5,250,1,'22B','2013-05-22 00:00:00');
livedata_hub (about 100 entries)
CREATE TABLE `livedata_hub` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(14) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=98 DEFAULT CHARSET=latin1;
INSERT INTO `livedata_hub` VALUES (1,'megahub01');
Try this rewriting (tested in SQL-Fiddle):
SELECT
d.*, dc.date_change, dc.cwd, h.name as hub
FROM
livedata_dom AS d
LEFT JOIN
livedata_domcabling as dc
ON dc.id =
( SELECT id
FROM livedata_domcabling AS dcc
WHERE dcc.dom_id = d.id
ORDER BY date_change DESC
LIMIT 1
)
LEFT JOIN
livedata_hub AS h
ON dc.hub_id = h.id
WHERE
d.cluster = 'localhost' ;
An index on (dom_id, date_change) would help efficiency.
I'm not sure about the selectivity of d.cluster = 'localhost' (how many rows of the livedata_dom table match this condition?), but adding an index on (cluster) might help as well.
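For reference, those two indexes could be added like this (the index names are illustrative):
ALTER TABLE livedata_domcabling ADD INDEX dom_date (dom_id, date_change);
ALTER TABLE livedata_dom ADD INDEX cluster_idx (cluster);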
set @rn := 0, @dom_id := 0;
select d.*, dc.date_change, dc.cwd, h.name as hub
from
livedata_dom d
left join (
select
hub_id, date_change, cwd, dom_id,
if(@dom_id = dom_id, @rn := @rn + 1, @rn := 1) as rn,
@dom_id := dom_id as dm_id
from
livedata_domcabling
order by dom_id, date_change desc
) dc on d.id = dc.dom_id
left join
livedata_hub h on h.id = dc.hub_id
where rn = 1 or rn is null
order by dom_id
The data you posted does not include dom_id 249. And dom_id 250 has one null date, so it comes first. So your result does not reflect what I understand from your question.
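As an aside, on MySQL 8.0+ or MariaDB 10.2+ the same greatest-row-per-group logic can be written with ROW_NUMBER() instead of user variables (a sketch, untested against the dummy data):
select d.*, dc.date_change, dc.cwd, h.name as hub
from livedata_dom d
left join (
    -- number each dom's cabling rows, newest date_change first
    select dom_id, hub_id, date_change, cwd,
           row_number() over (partition by dom_id
                              order by date_change desc) as rn
    from livedata_domcabling
) dc on d.id = dc.dom_id and dc.rn = 1
left join livedata_hub h on h.id = dc.hub_id
where d.cluster = 'localhost'
order by d.id;
Note that in MySQL a descending sort puts NULLs last, which matches the rule that a null date_change is older than any real datetime.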
I'm currently trying to optimize a MySQL query which runs a little slow on tables with 10,000+ rows.
CREATE TABLE IF NOT EXISTS `person` (
`_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`firstname` varchar(255) NOT NULL,
`lastname` varchar(255) NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `person_cars` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`_oid` char(8) NOT NULL,
`idx` varchar(255) NOT NULL,
`val` blob NOT NULL,
PRIMARY KEY (`_id`),
KEY `_oid` (`_oid`),
KEY `idx` (`idx`),
KEY `val` (`val`(64))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
# Insert some 10000+ rows…
INSERT INTO `person` (`_oid`,`firstname`,`lastname`)
VALUES
('1', 'John', 'Doe'),
('2', 'Jack', 'Black'),
('3', 'Jim', 'Kirk'),
('4', 'Forrest', 'Gump');
INSERT INTO `person_cars` (`_oid`,`idx`,`val`)
VALUES
('1', '0', 'BMW'),
('1', '1', 'PORSCHE'),
('2', '0', 'BMW'),
('3', '1', 'MERCEDES'),
('3', '0', 'TOYOTA'),
('3', '1', 'NISSAN'),
('4', '0', 'OLDMOBILE');
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`,
`_person_cars`.`cars[0]`,
`_person_cars`.`cars[1]`
FROM `person` `_person`
LEFT JOIN (
SELECT `_person`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person` `_person`
JOIN `person_cars` `_person_cars` ON `_person`.`_oid` = `_person_cars`.`_oid`
GROUP BY `_person`.`_oid`
) `_person_cars` ON `_person_cars`.`_oid` = `_person`.`_oid`
WHERE `cars[0]` = 'BMW' OR `cars[1]` = 'BMW';
The above SELECT query takes ~170ms on my virtual machine running MySQL 5.1.53. with approx. 10,000 rows in each of the two tables.
When I EXPLAIN the above query, results differ depending on how many rows are in each table:
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
| id | select_type | table        | type  | possible_keys | key  | key_len | ref  | rows | Extra                                       |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL   | NULL          | NULL | NULL    | NULL |    4 | Using where                                 |
|  1 | PRIMARY     | _person      | ALL   | _oid          | NULL | NULL    | NULL |    4 | Using where; Using join buffer              |
|  2 | DERIVED     | _person_cars | ALL   | _oid          | NULL | NULL    | NULL |    7 | Using temporary; Using filesort             |
|  2 | DERIVED     | _person      | index | _oid          | _oid | 24      | NULL |    4 | Using where; Using index; Using join buffer |
+----+-------------+--------------+-------+---------------+------+---------+------+------+---------------------------------------------+
Some 10,000 rows give the following result:
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref                    | rows | Extra                           |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>   | ALL  | NULL          | NULL | NULL    | NULL                   | 6613 | Using where                     |
|  1 | PRIMARY     | _person      | ref  | _oid          | _oid | 24      | _person_cars._oid      |   10 |                                 |
|  2 | DERIVED     | _person_cars | ALL  | _oid          | NULL | NULL    | NULL                   | 9913 | Using temporary; Using filesort |
|  2 | DERIVED     | _person      | ref  | _oid          | _oid | 24      | test._person_cars._oid |   10 | Using index                     |
+----+-------------+--------------+------+---------------+------+---------+------------------------+------+---------------------------------+
Things get worse when I leave out the WHERE clause or when I LEFT JOIN another table similar to person_cars.
Does anyone have an idea how to optimize the SELECT query to make things a little faster?
It's slow because this will force three full table scans on persons that then get joined together:
LEFT JOIN (
...
GROUP BY `_person`.`_oid` -- the group by here
) `_person_cars` ...
WHERE ... -- and the where clauses on _person_cars.
Considering the WHERE clauses, the LEFT JOIN is really an inner join, for one. And you could push the conditions down so they apply before the join with person actually occurs. That join is also needlessly applied twice.
This will make it faster, but if you have an ORDER BY/LIMIT clause it will still lead to a full table scan on person (i.e. still not good) because of the GROUP BY in the subquery:
JOIN (
SELECT `_person_cars`.`_oid`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) AS `cars[0]`,
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) AS `cars[1]`
FROM `person_cars` `_person_cars`
GROUP BY `_person_cars`.`_oid`
HAVING IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=0, `_person_cars`.`val`, NULL)), NULL) = 'BMW' OR
IFNULL(GROUP_CONCAT(IF(`_person_cars`.`idx`=1, `_person_cars`.`val`, NULL)), NULL) = 'BMW'
) `_person_cars` ... -- smaller number of rows
If you apply an order by/limit, you'll get better results with two queries, i.e.:
SELECT `_person`.`_oid`,
`_person`.`firstname`,
`_person`.`lastname`
FROM `person` `_person`
JOIN `person_cars` `_person_cars`
ON `_person_cars`.`_oid` = `_person`.`_oid`
AND `_person_cars`.`val` = 'BMW'
GROUP BY -- pre-sort the result before grouping, so as to not do the work twice
`_person`.`lastname`,
`_person`.`firstname`,
-- eliminate users with multiple BMWs
`_person`.`_oid`
ORDER BY `_person`.`lastname`,
`_person`.`firstname`,
`_person`.`_oid`
LIMIT 10
And then select the cars with an IN () clause using the resulting ids.
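That follow-up query might look like this (a sketch; the _oid list is whatever the first query returned):
SELECT `_oid`, `idx`, `val`
FROM `person_cars`
WHERE `_oid` IN ('1', '3')  -- the _oids returned by the first query
ORDER BY `_oid`, `idx`;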
Oh, and your val column should probably be a varchar.
Check this:
SELECT
p._oid AS oid,
p.firstname AS firstname,
p.lastname AS lastname,
pc.val AS car1,
pc2.val AS car2
FROM person AS p
LEFT JOIN person_cars AS pc
ON pc._oid = p._oid
AND pc.idx = 0
LEFT JOIN person_cars AS pc2
ON pc2._oid = p._oid
AND pc2.idx = 1
WHERE pc.val = 'BMW'
OR pc2.val = 'BMW'