How to select column from 2nd level depth subquery? - mysql

I have found a few answers on how to select a column from a subquery, but none of them seem to work when there is a subquery inside a subquery.
My table relationships are as follows (simplified, with example data):
Plans
| id | role_id |
| -- | ------- |
| 4  | 2       |
Roles
| id | name     |
| -- | -------- |
| 2  | Operator |
Assignments
| role_name | user_id |
| --------- | ------- |
| Operator  | 12      |
Table Plans has a foreign key role_id referencing Roles.id, and table Roles is linked to table Assignments through its name column, which matches Assignments.role_name.
I'm writing a query that selects Plans, but shows the user_id from table Assignments instead of role_id.
I'm pretty close with my current attempt, but I'm stuck at the final step:
SELECT assigned_user, plan.id, plan.role_id
FROM `Plans` AS plan
LEFT JOIN (
SELECT roleTable.id, roleTable.name
FROM `Roles` AS roleTable
INNER JOIN (
SELECT base_assign.user_id AS assigned_user, base_assign.role_name
FROM `Assignments` AS base_assign) AS roleAssign
ON roleTable.name = roleAssign.role_name
) AS role
ON (plan.assignee_role = role.id)
The column assigned_user in the outer SELECT is not found, even though I added that alias inside the subquery's SELECT. How can I select base_assign.user_id? Note that I (most likely) cannot modify the database structure, as it was given like this in the first place.

You are creating subqueries excessively, and you don't even need subqueries for this. Subqueries can be materialized by the optimizer in MySQL, which can cause performance issues.
See https://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html
Try this using simple left joins:
select a.user_id as assigned_user,
p.id,
p.role_id
from `Plans` p
left join `Roles` r on p.assignee_role = r.id
left join `Assignments` a on r.name = a.role_name;
Your query wasn't working because you were not selecting the assigned_user column in the outer subquery's SELECT list while joining.
Your somewhat "fixed" code:
select assigned_user,
plan.id,
plan.role_id
from `Plans` as plan
left join (
select roleAssign.assigned_user, roleTable.id,
roleTable.name
from `Roles` as roleTable
inner join (
select base_assign.user_id as assigned_user,
base_assign.role_name
from `Assignments` as base_assign
) as roleAssign on roleTable.name = roleAssign.role_name
) as role on (plan.assignee_role = role.id)

You don't need subqueries for this. It is a little confusing what you are trying to do. The following will return all plans, even those with no roles and no assignments:
SELECT b.user_id as assigned_user, p.id, p.role_id
FROM Plans p LEFT JOIN
Roles r
ON p.role_id = r.id LEFT JOIN
Assignments a
ON a.role_name = r.name;
Subqueries are bad for two reasons. In MySQL, they are materialized. That means extra overhead for reading and writing the tables. Plus, you lose any indexes on them.
In your case, the subqueries just add layer upon layer of naming, and the names really do not help make your query more understandable.
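Both answers converge on the same shape: chain plain LEFT JOINs instead of nesting derived tables. A quick way to convince yourself is to run it against the question's sample rows; the sketch below does so with SQLite from Python (an illustration only; the join semantics here match MySQL's):

```python
import sqlite3

# In-memory database with the question's three sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Plans (id INTEGER, role_id INTEGER);
    CREATE TABLE Roles (id INTEGER, name TEXT);
    CREATE TABLE Assignments (role_name TEXT, user_id INTEGER);
    INSERT INTO Plans VALUES (4, 2);
    INSERT INTO Roles VALUES (2, 'Operator');
    INSERT INTO Assignments VALUES ('Operator', 12);
""")

# The flattened query: no derived tables, just two LEFT JOINs.
rows = conn.execute("""
    SELECT a.user_id AS assigned_user, p.id, p.role_id
    FROM Plans p
    LEFT JOIN Roles r ON p.role_id = r.id
    LEFT JOIN Assignments a ON r.name = a.role_name
""").fetchall()

print(rows)  # [(12, 4, 2)]
```

The single plan row comes back with user_id 12 substituted in, exactly what the question asked for.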

Related

Understanding use of multiple SUMs with LEFT JOINS in mysql

Using the GROUP BY command, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MySQL is doing a cartesian product on the tables, if you include more than one SUM command you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for total # of authors of the Python Cookbook is 2. However, because there are two joins and the overall dataset is expanded by the join on number of orders, SUM(a.id) will be 4.
You are correct that by joining multiple tables you would not get the expected results.
But in this case you should use COUNT() instead of SUM() and count the distinct orders or authors.
Also by your design you should count the names of the authors and not the ids of the table authors:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |
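The results above can be reproduced end-to-end with the question's sample data. The sketch below uses SQLite from Python purely for illustration (COUNT(DISTINCT ...) behaves the same way in MySQL); it also makes the fan-out visible: Python Cookbook produces 2 orders × 2 authors = 4 joined rows, which is what inflates any naive aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER, title TEXT, authorid INTEGER);
    CREATE TABLE orders (id INTEGER, bookid INTEGER);
    CREATE TABLE authors (id INTEGER, author TEXT);
    INSERT INTO books VALUES (1,'Huck Finn',1),(2,'Tom Sawyer',1),(3,'Python Cookbook',2);
    INSERT INTO orders VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
    INSERT INTO authors VALUES (1,'Twain'),(2,'Beazley'),(2,'Jones');
""")

# Fan-out: Python Cookbook (id 3) joins to 2 orders x 2 authors = 4 rows.
fanout = conn.execute("""
    SELECT COUNT(*)
    FROM books b
    LEFT JOIN orders o ON o.bookid = b.id
    LEFT JOIN authors a ON b.authorid = a.id
    WHERE b.id = 3
""").fetchone()[0]

# COUNT(DISTINCT ...) collapses those duplicates back out.
rows = conn.execute("""
    SELECT b.title,
           COUNT(DISTINCT o.id) AS sales,
           COUNT(DISTINCT a.author) AS authors
    FROM books b
    LEFT JOIN orders o ON o.bookid = b.id
    LEFT JOIN authors a ON b.authorid = a.id
    GROUP BY b.id, b.title
    ORDER BY b.id
""").fetchall()
```

fanout comes back as 4, and rows matches the result table above.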
When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;
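The aggregate-before-join variant yields the same numbers as the COUNT(DISTINCT) approach. Here is a quick check with the question's rows, again using SQLite from Python as a stand-in for MySQL (COALESCE covers books with no orders or authors, though this sample has none):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER, title TEXT, authorid INTEGER);
    CREATE TABLE orders (id INTEGER, bookid INTEGER);
    CREATE TABLE authors (id INTEGER, author TEXT);
    INSERT INTO books VALUES (1,'Huck Finn',1),(2,'Tom Sawyer',1),(3,'Python Cookbook',2);
    INSERT INTO orders VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
    INSERT INTO authors VALUES (1,'Twain'),(2,'Beazley'),(2,'Jones');
""")

# Counts are computed per group BEFORE joining, so no fan-out can occur.
rows = conn.execute("""
    SELECT b.title,
           COALESCE(o.order_count, 0) AS sales,
           COALESCE(a.author_count, 0) AS authors
    FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
    LEFT JOIN (SELECT id AS author_group_id, COUNT(*) AS author_count
               FROM authors GROUP BY id) a ON a.author_group_id = b.author_group_id
    LEFT JOIN (SELECT bookid AS book_id, COUNT(*) AS order_count
               FROM orders GROUP BY bookid) o ON o.book_id = b.book_id
    ORDER BY b.title
""").fetchall()
```

Sorted by title, Python Cookbook again shows 2 sales and 2 authors.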
I don't think your query would work like you expected.
Assume one book could have 3 authors.
For Authors:
So you would have three rows for that book in your books table, one for each author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.

SQL join, group and concat to get matching data and also view (null)

I'm having trouble getting the desired output.
This is the SQLfiddle I put together of my scenario:
http://sqlfiddle.com/#!9/9beffe/111
The output I get is as seen in the SQLfiddle.
The desired output should be:
user_name participants_project participants_user
Username 1 1 1
Username 2 1 (null)
Username 3 1 3
Username 4 1 (null)
So I want it to always show all 4 users matching the specified project; participants_user should be set if the user/project combination exists in the participants table, and otherwise the user should still appear, with null data for the participants/projects columns.
How do I achieve it?
All you need is below -
SELECT DISTINCT u.user_name, pp.participants_project, pp.participants_user
FROM `users` u
LEFT JOIN `participants` pp ON u.user_id = pp.participants_user
AND pp.participants_project = 1
ORDER BY u.user_name
Here is the fiddle.
Since Username 2 and Username 4 don't have any project, the participants_project column also holds null values for them.
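The trick in this answer is that the project filter lives in the ON clause, not in WHERE, so non-matching users survive the LEFT JOIN with NULLs. A minimal reproduction in Python/SQLite (the four users and two participations are assumed toy data matching the desired output; MySQL behaves the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, user_name TEXT);
    CREATE TABLE participants (participants_project INTEGER, participants_user INTEGER);
    INSERT INTO users VALUES (1,'Username 1'),(2,'Username 2'),(3,'Username 3'),(4,'Username 4');
    INSERT INTO participants VALUES (1, 1), (1, 3);
""")

# Project filter in the ON clause: users 2 and 4 are kept with NULLs.
rows = conn.execute("""
    SELECT DISTINCT u.user_name, pp.participants_project, pp.participants_user
    FROM users u
    LEFT JOIN participants pp ON u.user_id = pp.participants_user
                             AND pp.participants_project = 1
    ORDER BY u.user_name
""").fetchall()
```

All four users come back; the two without a participation row show NULL (None in Python) in both participant columns.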
I think what you need is this:
SELECT `users`.`user_name`, GROUP_CONCAT(`participants_project`), `participants_user` from `users`
LEFT JOIN `participants` on users.user_id = `participants`.`participants_user`
LEFT JOIN `projects` on `participants`.`participants_project` =`projects`.`project_id`
GROUP BY `users`.`user_id`
In the SQL snippet I joined from the users table through the participants table; this will always list the users whether they have a project or not. In case you want to omit the users if they are not participating in a project, just change the first 'LEFT JOIN' to a 'RIGHT JOIN', like this:
SELECT `users`.`user_name`, GROUP_CONCAT(`participants_project`), `participants_user` from `users`
RIGHT JOIN `participants` on users.user_id = `participants`.`participants_user`
LEFT JOIN `projects` on `participants`.`participants_project` =`projects`.`project_id` GROUP BY `users`.`user_id`
I suggest that you include primary and foreign keys in the schema, as they speed up query execution and give you more consistent data persistence in your database tables.

How to optimize execution plan for query with multiple outer joins to huge tables, group by and order by clauses?

I have the following database (simplified):
CREATE TABLE `tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`manufacture` varchar(100) NOT NULL,
`date_last_activity` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `manufacture` (`manufacture`),
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`),
KEY `date_last_activity` (`date_last_activity`)
) ENGINE=InnoDB AUTO_INCREMENT=401353 DEFAULT CHARSET=utf8
CREATE TABLE `tracking_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tracking_id` int(11) NOT NULL,
`tracking_object_id` varchar(100) NOT NULL,
`tracking_type` int(11) NOT NULL COMMENT 'Its used to specify the type of each item, e.g. car, bike, etc',
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `tracking_id` (`tracking_id`),
KEY `tracking_object_id` (`tracking_object_id`),
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1299995 DEFAULT CHARSET=utf8
CREATE TABLE `cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`car_id` varchar(255) NOT NULL COMMENT 'It must be VARCHAR, because the data is coming from external source.',
`manufacture` varchar(255) NOT NULL,
`car_text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`date_order` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`deleted` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `car_id` (`car_id`),
KEY `sort_field` (`date_order`)
) ENGINE=InnoDB AUTO_INCREMENT=150000025 DEFAULT CHARSET=utf8
This is my "problematic" query, that runs extremely slow.
SELECT sql_no_cache `t`.*,
count(`t`.`id`) AS `cnt_filtered_items`
FROM `tracking` AS `t`
INNER JOIN `tracking_items` AS `ti` ON (`ti`.`tracking_id` = `t`.`id`)
LEFT JOIN `cars` AS `c` ON (`c`.`car_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 1)
LEFT JOIN `bikes` AS `b` ON (`b`.`bike_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 2)
LEFT JOIN `trucks` AS `tr` ON (`tr`.`truck_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 3)
WHERE (`t`.`manufacture` IN('1256703406078',
'9600048390403',
'1533405067830'))
AND (`c`.`car_text` LIKE '%europe%'
OR `b`.`bike_text` LIKE '%europe%'
OR `tr`.`truck_text` LIKE '%europe%')
GROUP BY `t`.`id`
ORDER BY `t`.`date_last_activity` ASC,
`t`.`id` ASC
LIMIT 15
This is the result of EXPLAIN for above query:
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | t | index | PRIMARY,manufacture,manufacture_date_last_activity,date_last_activity | PRIMARY | 4 | NULL | 400,000 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ti | ref | tracking_id,tracking_object_id,tracking_id_tracking_object_id | tracking_id | 4 | table.t.id | 1 | NULL |
| 1 | SIMPLE | c | eq_ref | car_id | car_id | 767 | table.ti.tracking_object_id | 1 | Using where |
| 1 | SIMPLE | b | eq_ref | bike_id | bike_id | 767 | table.ti.tracking_object_id | 1 | Using where |
| 1 | SIMPLE | tr | eq_ref | truck_id | truck_id | 767 | table.ti.tracking_object_id | 1 | Using where |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
What is the problem this query is trying to solve?
Basically, I need to find all records in tracking table that may be associated with records in tracking_items (1:n) where each record in tracking_items may be associated with record in left joined tables. The filtering criteria is crucial part in the query.
What is the problem I have with the query above?
When there's order by and group by clauses the query runs extremely slow, e.g. 10-15 seconds to complete for the above configuration. However, if I omit any of these clauses, the query is running pretty quick (~0.2 seconds).
What I've already tried?
I've tried to use a FULLTEXT index, but it didn't help much, as the results evaluated by the LIKE statement are narrowed by the JOINs using indexes.
I've tried to use WHERE EXISTS (...) to find if there are records in left joined tables, but unfortunately without any luck.
Few notes about relations between these tables:
tracking -> tracking_items (1:n)
tracking_items -> cars (1:1)
tracking_items -> bikes (1:1)
tracking_items -> trucks (1:1)
So, I'm looking for a way to optimize that query.
Bill Karwin suggests the query might perform better if it used an index with a leading column of manufacture. I second that suggestion. Especially if that's very selective.
I also note that we're doing a GROUP BY t.id, where id is the PRIMARY KEY of the table.
No columns from any tables other than tracking are referenced in the SELECT list.
This suggests we're really only interested in returning rows from t, and not on creating duplicates due to multiple outer joins.
Seems like the COUNT() aggregate has the potential to return an inflated count, if there are multiple matching rows in tracking_items and bikes, cars, trucks. If there are three matching rows from cars, and four matching rows from bikes, ... the COUNT() aggregate is going to return a value of 12, rather than 7. (Or maybe there is some guarantee in the data such that there won't ever be multiple matching rows.)
If the manufacture is very selective, and that returns a reasonably small set of rows from tracking, if the query can make use of an index ...
And since we aren't returning any columns from any tables other than tracking, apart from a count or related items ...
I would be tempted to test correlated subqueries in the SELECT list, to get the count, and filter out the zero count rows using a HAVING clause.
Something like this:
SELECT SQL_NO_CACHE `t`.*
, ( ( SELECT COUNT(1)
FROM `tracking_items` `tic`
JOIN `cars` `c`
ON `c`.`car_id` = `tic`.`tracking_object_id`
AND `c`.`car_text` LIKE '%europe%'
WHERE `tic`.`tracking_id` = `t`.`id`
AND `tic`.`tracking_type` = 1
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tib`
JOIN `bikes` `b`
ON `b`.`bike_id` = `tib`.`tracking_object_id`
AND `b`.`bike_text` LIKE '%europe%'
WHERE `tib`.`tracking_id` = `t`.`id`
AND `tib`.`tracking_type` = 2
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tit`
JOIN `trucks` `tr`
ON `tr`.`truck_id` = `tit`.`tracking_object_id`
AND `tr`.`truck_text` LIKE '%europe%'
WHERE `tit`.`tracking_id` = `t`.`id`
AND `tit`.`tracking_type` = 3
)
) AS cnt_filtered_items
FROM `tracking` `t`
WHERE `t`.`manufacture` IN ('1256703406078', '9600048390403', '1533405067830')
HAVING cnt_filtered_items > 0
ORDER
BY `t`.`date_last_activity` ASC
, `t`.`id` ASC
We'd expect that the query could make effective use of an index on tracking with leading column of manufacture.
And on the tracking_items table, we want an index with leading columns of type and tracking_id. And including tracking_object_id in that index would mean the query could be satisfied from the index, without visiting the underlying pages.
For the cars, bikes and trucks tables the query should make use of an index with leading column of car_id, bike_id, and truck_id respectively. There's no getting around a scan of the car_text, bike_text, truck_text columns for the matching string... best we can do is narrow down the number rows that need to have that check performed.
This approach (just the tracking table in the outer query) should eliminate the need for the GROUP BY, the work required to identify and collapse duplicate rows.
BUT this approach, replacing joins with correlated subqueries, is best suited to queries where there is a SMALL number of rows returned by the outer query. Those subqueries get executed for every row processed by the outer query. It's imperative that those subqueries have suitable indexes available. Even with those tuned, there is still potential for horrible performance for large sets.
This does still leave us with a "Using filesort" operation for the ORDER BY.
If the count of related items should be the product of a multiplication, rather than addition, we could tweak the query to achieve that. (We'd have to muck with the return of zeros, and the condition in the HAVING clause would need to be changed.)
If there wasn't a requirement to return a COUNT() of related items, then I would be tempted to move the correlated subqueries from the SELECT list down into EXISTS predicates in the WHERE clause.
Additional notes: seconding the comments from Rick James regarding indexing... there appears to be redundant indexes defined. i.e.
KEY `manufacture` (`manufacture`)
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`)
The index on the singleton column isn't necessary, since there is another index that has the column as the leading column.
Any query that can make effective use of the manufacture index will be able to make effective use of the manufacture_date_last_activity index. That is to say, the manufacture index could be dropped.
The same applies for the tracking_items table, and these two indexes:
KEY `tracking_id` (`tracking_id`)
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
The tracking_id index could be dropped, since it's redundant.
For the query above, I would suggest adding a covering index:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`,`tracking_object_id`)
-or- at a minimum, a non-covering index with those two columns leading:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`)
The EXPLAIN shows you are doing an index-scan ("index" in the type column) on the tracking table. An index-scan is pretty much as costly as a table-scan, especially when the index scanned is the PRIMARY index.
The rows column also shows that this index-scan is examining > 355K rows (since this figure is only a rough estimate, it's in fact examining all 400K rows).
Do you have an index on t.manufacture? I see two indexes named in the possible keys that might include that column (I can't be sure solely based on the name of the index), but for some reason the optimizer isn't using them. Maybe the set of values you search for is matched by every row in the table anyway.
If the list of manufacture values is intended to match a subset of the table, then you might need to give a hint to the optimizer to make it use the best index. https://dev.mysql.com/doc/refman/5.6/en/index-hints.html
Using LIKE '%word%' pattern-matching can never utilize an index, and must evaluate the pattern-match on every row. See my presentation, Full Text Search Throwdown.
How many items are in your IN(...) list? MySQL sometimes has problems with very long lists. See https://dev.mysql.com/doc/refman/5.6/en/range-optimization.html#equality-range-optimization
P.S.: When you ask a query optimization question, you should always include the SHOW CREATE TABLE output for each table referenced in the query, so folks who answer don't have to guess at what indexes, data types, constraints you currently have.
First of all: your query makes assumptions about string contents, which it shouldn't. What may car_text LIKE '%europe%' indicate? Something like 'Sold in Europe only', maybe? Or 'Sold outside Europe only'? Two possible strings with contradictory meanings. So if you assume a certain meaning once you find europe in the string, you should be able to make this knowledge explicit in the database, with a Europe flag or a region code for instance.
Anyway, you are showing certain trackings with their Europe transportation count. So select trackings, select transportation counts. You can either have the aggregation subquery for transportation counts in your SELECT clause or in your FROM clause.
Subquery in SELECT clause:
select
t.*,
(
select count(*)
from tracking_items ti
where ti.tracking_id = t.id
and (tracking_type, tracking_object_id) in
(
select 1, car_id from cars where car_text like '%europe%'
union all
select 2, bike_id from bikes where bike_text like '%europe%'
union all
select 3, truck_id from trucks where truck_text like '%europe%'
)
) as total
from tracking t
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
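The row-value (tuple) IN construct is the heart of this approach. As a sanity check, here is a toy run of the same idea in Python/SQLite (SQLite ≥ 3.15 also supports row values; the tracking rows and europe strings below are made-up illustration data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracking (id INTEGER, manufacture TEXT, date_last_activity TEXT);
    CREATE TABLE tracking_items (tracking_id INTEGER, tracking_object_id TEXT, tracking_type INTEGER);
    CREATE TABLE cars   (car_id TEXT,   car_text TEXT);
    CREATE TABLE bikes  (bike_id TEXT,  bike_text TEXT);
    CREATE TABLE trucks (truck_id TEXT, truck_text TEXT);
    INSERT INTO tracking VALUES (1, '1256703406078', '2019-01-01');
    INSERT INTO tracking_items VALUES (1, 'c1', 1), (1, 'b1', 2), (1, 'c2', 1);
    INSERT INTO cars  VALUES ('c1', 'sold in europe only'), ('c2', 'asia');
    INSERT INTO bikes VALUES ('b1', 'europe tour edition');
""")

# The scalar subquery counts items whose (type, object id) pair appears
# in the unioned "matches europe" lists: c1 and b1 match, c2 does not.
rows = conn.execute("""
    SELECT t.id,
           (SELECT COUNT(*)
            FROM tracking_items ti
            WHERE ti.tracking_id = t.id
              AND (tracking_type, tracking_object_id) IN
                  (SELECT 1, car_id   FROM cars   WHERE car_text   LIKE '%europe%'
                   UNION ALL
                   SELECT 2, bike_id  FROM bikes  WHERE bike_text  LIKE '%europe%'
                   UNION ALL
                   SELECT 3, truck_id FROM trucks WHERE truck_text LIKE '%europe%')
           ) AS total
    FROM tracking t
    WHERE manufacture IN ('1256703406078', '9600048390403', '1533405067830')
    ORDER BY date_last_activity, id
""").fetchall()
```

The single tracking row reports a count of 2: the car and the bike match, the Asia-only car does not.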
Subquery in FROM clause:
select
t.*, agg.total
from tracking t
left join
(
select tracking_id, count(*) as total
from tracking_items ti
where (tracking_type, tracking_object_id) in
(
select 1, car_id from cars where car_text like '%europe%'
union all
select 2, bike_id from bikes where bike_text like '%europe%'
union all
select 3, truck_id from trucks where truck_text like '%europe%'
)
group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
Indexes:
tracking(manufacture, date_last_activity, id)
tracking_items(tracking_id, tracking_type, tracking_object_id)
cars(car_text, car_id)
bikes(bike_text, bike_id)
trucks(truck_text, truck_id)
Sometimes MySQL is stronger on simple joins than on anything else, so it may be worth a try to blindly join transportation records and only later see whether each is a car, bike or truck:
select
t.*, agg.total
from tracking t
left join
(
select
tracking_id,
sum((ti.tracking_type = 1 and c.car_text like '%europe%')
or
(ti.tracking_type = 2 and b.bike_text like '%europe%')
or
(ti.tracking_type = 3 and tr.truck_text like '%europe%')
) as total
from tracking_items ti
left join cars c on c.car_id = ti.tracking_object_id
left join bikes b on b.bike_id = ti.tracking_object_id
left join trucks tr on tr.truck_id = ti.tracking_object_id
group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
If my guess is correct and cars, bikes, and trucks are independent from each other (i.e. a particular pre-aggregated result would only have data from one of them), you might be better off UNIONing three simpler sub-queries (one for each).
While you cannot do much index-wise about LIKEs involving leading wildcards, splitting it into UNIONed queries could avoid evaluating the car_text condition for all the bike and truck matches, the bike_text condition for all the car and truck matches, and so on.
When there's order by and group by clauses the query runs extremely slow, e.g. 10-15 seconds to complete for the above configuration. However, if I omit any of these clauses, the query is running pretty quick (~0.2 seconds).
This is interesting... generally the best optimization technique I know is to make good use of temporary tables, and it sounds like it will work really well here. So you would first create the temporary table:
create temporary table tracking_ungrouped (
key (id)
)
select sql_no_cache `t`.*
from `tracking` as `t`
inner join `tracking_items` as `ti` on (`ti`.`tracking_id` = `t`.`id`)
left join `cars` as `c` on (`c`.`car_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 1)
left join `bikes` as `b` on (`b`.`bike_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 2)
left join `trucks` as `tr` on (`tr`.`truck_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 3)
where
(`t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')) and
(`c`.`car_text` like '%europe%' or `b`.`bike_text` like '%europe%' or `tr`.`truck_text` like '%europe%');
and then query it for the results you need:
select t.*, count(`t`.`id`) as `cnt_filtered_items`
from tracking_ungrouped t
group by `t`.`id`
order by `t`.`date_last_activity` asc, `t`.`id` asc
limit 15;
ALTER TABLE cars ADD FULLTEXT(car_text)
then try
select sql_no_cache
`t`.*, -- If you are not using all, spell out the list
count(`t`.`id`) as `cnt_filtered_items` -- This does not make sense
-- and is possibly delivering an inflated value
from `tracking` as `t`
inner join `tracking_items` as `ti` ON (`ti`.`tracking_id` = `t`.`id`)
join -- not LEFT JOIN
`cars` as `c` ON `c`.`car_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 1
where `t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')
AND MATCH(c.car_text) AGAINST('+europe' IN BOOLEAN MODE)
group by `t`.`id` -- I don't know if this is necessary
order by `t`.`date_last_activity` asc, `t`.`id` asc
limit 15;
to see if it will correctly give you a suitable 15 cars.
If that looks OK, then combine the three together:
SELECT sql_no_cache
t2.*,
-- COUNT(*) -- this is probably broken
FROM (
( SELECT t.id FROM ... cars ... ) -- the query above
UNION ALL -- unless you need UNION DISTINCT
( SELECT t.id FROM ... bikes ... )
UNION ALL
( SELECT t.id FROM ... trucks ... )
) AS u
JOIN tracking AS t2 ON t2.id = u.id
ORDER BY t2.date_last_activity, t2.id
LIMIT 15;
Note that the inner SELECTs only deliver t.id, not t.*.
Another index needed:
ti: (tracking_type, tracking_object_id) -- in either order
Indexes
When you have INDEX(a,b), you don't also need INDEX(a). (This won't help the query in question, but it will help disk space and INSERT performance.)
When I see PRIMARY KEY(id), UNIQUE(x), I look for any good reason not to get rid of id and change to PRIMARY KEY(x). Unless there is something significant in the 'simplification' of the schema, such a change would help. Yeah, car_id is bulky, etc, but it is a big table and the extra lookup (from index BTree to data BTree) is hurting, etc.
I think it is very unlikely that KEY `sort_field` (`date_order`) will ever be used. Either drop it (saving a few GB) or combine it in some useful way. Let's see the query in which you think it might be useful. (Again, a suggestion that is not directly relevant to this Question.)
re Comment(s)
I made some substantive changes to my formulation.
My formulation has 4 GROUP BYs, 3 in the 'derived' table (ie, FROM ( ... UNION ... )), and one outside. Since the outer part is limited to 3*15 rows, I do not worry about performance there.
Further note that the derived table delivers only t.id, then re-probes tracking to get the other columns. This lets the derived table run much faster, but at a small expense of the extra JOIN outside.
Please elaborate on the intent of the COUNT(t.id); it won't work in my formulation, and I don't know what it is counting.
I had to get rid of the ORs; they are the secondary performance killer. (The first killer is LIKE '%...'.)
SELECT t.*
FROM (SELECT * FROM tracking WHERE manufacture
IN('1256703406078','9600048390403','1533405067830')) t
INNER JOIN (SELECT tracking_id, tracking_object_id, tracking_type FROM tracking_items
WHERE tracking_type IN (1,2,3)) ti
ON (ti.tracking_id = t.id)
LEFT JOIN (SELECT car_id FROM cars WHERE car_text LIKE '%europe%') c
ON (c.car_id = ti.tracking_object_id AND ti.tracking_type = 1)
LEFT JOIN (SELECT bike_id FROM bikes WHERE bike_text LIKE '%europe%') b
ON (b.bike_id = ti.tracking_object_id AND ti.tracking_type = 2)
LEFT JOIN (SELECT truck_id FROM trucks WHERE truck_text LIKE '%europe%') tr
ON (tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3)
ORDER BY t.date_last_activity ASC, t.id ASC
Subqueries perform faster in a join when they filter out a lot of records.
The subquery on the tracking table filters out a lot of unwanted manufacture values, resulting in a smaller table t to be joined.
A similar condition is applied to the tracking_items table, as we are interested only in tracking_types 1, 2 and 3, to create a smaller table ti. If there are a lot of tracking objects, you can even add the tracking-object filter to this subquery.
Similar subqueries on the cars, bikes and trucks tables, with the condition that their respective text contains europe, give us the smaller tables c, b and tr.
I also removed the GROUP BY t.id: t.id is unique and we are performing inner and left joins on derived tables, so there is no need for it.
Lastly, I am selecting only the required columns from each table, which also reduces memory usage and runtime.
Hope this helps. Please let me know your feedback and run statistics.
I'm not sure it will work, but how about applying the filter on each table (cars, bikes, and trucks) in the ON clause, before joining? It should filter out rows early.

SQL genius needed: complex MySQL query

I am trying to optimise my php by doing as much work on the MySQL server as possible. I have this sql query which is pulling data out of a leads table, but at the same time joining two tags tables to combine the result. I am looking to add a company which is linked through a relations table.
So the table that holds the relationship between the two is relations_values, which simply contains (example data in parentheses):
parenttable (companies) | parentrecordid (10) | childtable (leads) | childrecordid (1)
the companies table has quite a few columns but the only two relevant are;
id (10) | companyname (my company name)
So this query currently grabs everything I need but I want to bring the companyname into the query:
SELECT leads.id,
GROUP_CONCAT(c.tag ORDER BY c.tag) AS tags,
leads.status,
leads.probability
FROM `gs_db_1002`.leads
LEFT JOIN ( SELECT *
FROM tags_module
WHERE tagid IN ( SELECT id
FROM tags
WHERE moduleid = 'leads' ) ) as b
ON leads.id = b.recordid
LEFT JOIN `gs_db_1002`.tags as c
ON b.tagid = c.id
GROUP BY leads.id,
leads.status,
leads.probability
I need to be able to go into the relations_values table and pull parenttable and parentrecordid by selecting childtable = 'leads' and childrecordid = 1, and somehow join these so that I can get companyname as a column in the above query...
Is this possible?
I have created a sqlfiddle: sqlfiddle.com/#!2/023fa/2 So I am looking to add companies.companyname as column to the query.
I don't know what your primary keys and foreign keys are that link each table together. If you could give a better understanding of which IDs are linked to each other, it would make this a lot easier. However, I did something that does return the correct result, but since all of the IDs are 1, it could be incorrect.
SELECT
leads.id, GROUP_CONCAT(c.tag ORDER BY c.tag) AS tags,
leads.status, leads.probability, companyname
FROM leads
LEFT JOIN (
SELECT * FROM tags_module WHERE tagid IN (
SELECT id FROM tags WHERE moduleid = 'leads' )
) as b ON leads.id = b.recordid
LEFT JOIN tags as c ON b.tagid = c.id
LEFT JOIN relations_values rv on rv.childrecordid = leads.id and rv.childtable = 'leads'
LEFT JOIN companies c1 on c1.id = rv.parentrecordid
GROUP BY leads.id,leads.status, leads.probability

SQL query using outer join and limiting child records for each parent

I'm having trouble figuring out how to structure a SQL query. Let's say we have a User table and a Pet table. Each user can have many pets and Pet has a breed column.
User:
id | name
______|________________
1 | Foo
2 | Bar
Pet:
id | owner_id | name | breed |
______|________________|____________|_____________|
1 | 1 | Fido | poodle |
2 | 2 | Fluffy | siamese |
The end goal is to provide a query that will give me all the pets for each user that match the given where clause while allowing sort and limit parameters to be used. So the ability to limit each user's pets to say 5 and sorted by name.
I'm working on building these queries dynamically for an ORM so I need a solution that works in MySQL and Postgresql (though it can be two different queries).
I've tried something like this which doesn't work:
SELECT "user"."id", "user"."name", "pet"."id", "pet"."owner_id", "pet"."name",
"pet"."breed"
FROM "user"
LEFT JOIN "pet" ON "user"."id" = "pet"."owner_id"
WHERE "pet"."id" IN
(SELECT "pet"."id" FROM "pet" WHERE "pet"."breed" = 'poodle' LIMIT 5)
In Postgres (8.4 or later), use the window function row_number() in a subquery:
SELECT user_id, user_name, pet_id, owner_id, pet_name, breed
FROM (
SELECT u.id AS user_id, u.name AS user_name
, p.id AS pet_id, owner_id, p.name AS pet_name, breed
, row_number() OVER (PARTITION BY u.id ORDER BY p.name, pet_id) AS rn
FROM "user" u
LEFT JOIN pet p ON p.owner_id = u.id
AND p.breed = 'poodle'
) sub
WHERE rn <= 5
ORDER BY user_name, user_id, pet_name, pet_id;
When using a LEFT JOIN, you can't combine it with WHERE conditions on the left-joined (right-hand) table. That forcibly converts the LEFT JOIN to a plain [INNER] JOIN (and possibly removes rows from the result you did not want removed). Pull such conditions up into the join clause.
The way I have it, users without pets are included in the result - as opposed to your query stub.
The additional id columns in the ORDER BY clauses are supposed to break possible ties between non-unique names.
Never use a reserved word like user as identifier.
Work on your naming convention. id or name are terrible, non-descriptive choices, even if some ORMs suggest this nonsense. As you can see in the query, it leads to complications when joining a couple of tables, which is what you do in SQL.
Should be something like pet_id, pet, user_id, username etc. to begin with.
With a proper naming convention we could just SELECT * in the subquery.
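The ON-vs-WHERE point is easy to verify with the question's sample rows. A sketch in Python/SQLite for illustration (the semantics are the same in Postgres and MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (id INTEGER, name TEXT);
    CREATE TABLE pet (id INTEGER, owner_id INTEGER, name TEXT, breed TEXT);
    INSERT INTO "user" VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO pet VALUES (1, 1, 'Fido', 'poodle'), (2, 2, 'Fluffy', 'siamese');
""")

# Breed filter in the ON clause: Bar is kept, with a NULL pet.
kept = conn.execute("""
    SELECT u.name, p.name FROM "user" u
    LEFT JOIN pet p ON p.owner_id = u.id AND p.breed = 'poodle'
    ORDER BY u.id
""").fetchall()

# Same filter in WHERE: Bar's row has p.breed = NULL, which fails the
# test, so the LEFT JOIN silently degrades to an INNER JOIN.
dropped = conn.execute("""
    SELECT u.name, p.name FROM "user" u
    LEFT JOIN pet p ON p.owner_id = u.id
    WHERE p.breed = 'poodle'
    ORDER BY u.id
""").fetchall()
```

With the filter in the ON clause both users survive; with it in WHERE, Bar disappears from the result.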
MySQL (before version 8.0) does not support window functions; there are fidgety substitutes ...
SELECT user.id, user.name,
SUBSTRING_INDEX(GROUP_CONCAT(pet.id ORDER BY pet.name), ',', 5) AS first_five_pet_ids
FROM user
LEFT JOIN pet on user.id = pet.owner_id GROUP BY user.id
Above is rough/untested, but this source has exactly what you need, see step 4. Also, you don't need any of those " quotes.