Using the GROUP BY clause, it is possible to LEFT JOIN multiple tables and still get the desired number of rows from the first table.
For example,
SELECT b.title
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
However, since behind the scenes MySQL is doing a cartesian product on the tables, if you include more than one SUM() aggregate you get incorrect values based on all the hidden rows. (The problem is explained fairly well here.)
SELECT b.title,SUM(o.id) as sales,SUM(a.id) as authors
FROM books `b`
LEFT JOIN orders `o`
ON o.bookid = b.id
LEFT JOIN authors `a`
ON b.authorid = a.id
GROUP BY b.id
There are a number of answers on SO about this, most using sub-queries in the JOINS but I am having trouble applying them to this fairly simple case.
How can you adjust the above so that you get the correct SUMs?
Edit
Example
books
id|title|authorid
1|Huck Finn|1
2|Tom Sawyer|1
3|Python Cookbook|2
orders
id|bookid
1|1
2|1
3|2
4|2
5|3
6|3
authors
id|author
1|Twain
2|Beazley
2|Jones
The "correct answer" for the total # of authors of the Python Cookbook is 2. However, because there are two joins, the overall dataset is expanded by the join on orders: the Python Cookbook matches 2 orders × 2 author rows = 4 joined rows, so COUNT(a.id) comes out as 4 (and SUM(a.id) as 8) rather than 2.
You are correct that joining multiple tables this way does not give the expected results.
In this case you should use COUNT() instead of SUM(), and count the distinct orders or authors.
Also, given your design, you should count the authors' names rather than the ids of the authors table:
SELECT b.title,
COUNT(DISTINCT o.id) as sales,
COUNT(DISTINCT a.author) as authors
FROM books `b`
LEFT JOIN orders `o` ON o.bookid = b.id
LEFT JOIN authors `a` ON b.authorid = a.id
GROUP BY b.id, b.title
See the demo.
Results:
| title | sales | authors |
| --------------- | ----- | ------- |
| Huck Finn | 2 | 1 |
| Tom Sawyer | 2 | 1 |
| Python Cookbook | 2 | 2 |
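If it helps to see both behaviors side by side, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (the join and COUNT(DISTINCT ...) semantics are the same for this example):

```python
import sqlite3

# Rebuild the sample data from the question in an in-memory database.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE books   (id INTEGER, title TEXT, authorid INTEGER);
CREATE TABLE orders  (id INTEGER, bookid INTEGER);
CREATE TABLE authors (id INTEGER, author TEXT);
INSERT INTO books   VALUES (1,'Huck Finn',1),(2,'Tom Sawyer',1),(3,'Python Cookbook',2);
INSERT INTO orders  VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
INSERT INTO authors VALUES (1,'Twain'),(2,'Beazley'),(2,'Jones');
""")

# Naive SUM over the joined rows: the two joins multiply each other's
# matches, so the aggregate runs over the inflated row set.
naive = con.execute("""
    SELECT b.title, SUM(a.id)
    FROM books b
    LEFT JOIN orders o  ON o.bookid = b.id
    LEFT JOIN authors a ON b.authorid = a.id
    GROUP BY b.id
    ORDER BY b.id
""").fetchall()

# COUNT(DISTINCT ...) collapses the duplicates the joins introduced.
fixed = con.execute("""
    SELECT b.title,
           COUNT(DISTINCT o.id)     AS sales,
           COUNT(DISTINCT a.author) AS authors
    FROM books b
    LEFT JOIN orders o  ON o.bookid = b.id
    LEFT JOIN authors a ON b.authorid = a.id
    GROUP BY b.id, b.title
    ORDER BY b.id
""").fetchall()

print(naive)  # Python Cookbook: 2 orders x 2 author rows -> SUM(a.id) = 8
print(fixed)  # [('Huck Finn', 2, 1), ('Tom Sawyer', 2, 1), ('Python Cookbook', 2, 2)]
```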
When dealing with separate aggregates, it is good style to aggregate before joining.
Your data model is horribly confusing, making it look like a book is written by one author only (referenced by books.authorid), while this "ID" is not an author's ID at all.
Your main problem is: You don't count! We count with COUNT. But you are mistakenly adding up ID values with SUM.
Here is a proper query, where I am aggregating before joining and using alias names to fight confusion and thus enhance the query's readability and maintainability.
SELECT
b.title,
COALESCE(o.order_count, 0) AS sales,
COALESCE(a.author_count, 0) AS authors
FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
LEFT JOIN
(
SELECT id as author_group_id, COUNT(*) as author_count
FROM authors
GROUP BY id
) a ON a.author_group_id = b.author_group_id
LEFT JOIN
(
SELECT bookid AS book_id, COUNT(*) as order_count
FROM orders
GROUP BY bookid
) o ON o.book_id = b.book_id
ORDER BY b.title;
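If you want to see the aggregate-before-join query run, it can be checked against the sample data from the question with Python's sqlite3 module (a stand-in for MySQL; nothing MySQL-specific is used here):

```python
import sqlite3

# Sanity check of the aggregate-before-join query on the question's data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE books   (id INTEGER, title TEXT, authorid INTEGER);
CREATE TABLE orders  (id INTEGER, bookid INTEGER);
CREATE TABLE authors (id INTEGER, author TEXT);
INSERT INTO books   VALUES (1,'Huck Finn',1),(2,'Tom Sawyer',1),(3,'Python Cookbook',2);
INSERT INTO orders  VALUES (1,1),(2,1),(3,2),(4,2),(5,3),(6,3);
INSERT INTO authors VALUES (1,'Twain'),(2,'Beazley'),(2,'Jones');
""")

# Each side is aggregated down to one row per key BEFORE joining,
# so the joins can no longer multiply each other's matches.
rows = con.execute("""
    SELECT b.title,
           COALESCE(o.order_count, 0)  AS sales,
           COALESCE(a.author_count, 0) AS authors
    FROM (SELECT title, id AS book_id, authorid AS author_group_id FROM books) b
    LEFT JOIN (SELECT id AS author_group_id, COUNT(*) AS author_count
               FROM authors GROUP BY id) a ON a.author_group_id = b.author_group_id
    LEFT JOIN (SELECT bookid AS book_id, COUNT(*) AS order_count
               FROM orders GROUP BY bookid) o ON o.book_id = b.book_id
    ORDER BY b.title
""").fetchall()
print(rows)
# [('Huck Finn', 2, 1), ('Python Cookbook', 2, 2), ('Tom Sawyer', 2, 1)]
```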
I don't think your query would work like you expected.
Assume one book could have 3 authors.
For Authors:
So you would have three rows for that book in your books table, one for each author.
So a
SUM(b.authorid)
gives you the correct answer in your case.
For Orders:
you must use a subselect like
LEFT JOIN (SELECT SUM(id) o_sum,bookid FROM orders GROUP BY bookid) `o`
ON o.bookid = b.id
You should really reconsider your approach with books and authors.
I'm having trouble getting the desired output.
This is the SQLfiddle I put together of my scenario:
http://sqlfiddle.com/#!9/9beffe/111
The output I get is as seen in the SQLfiddle.
The desired output should be:
user_name participants_project participants_user
Username 1 1 1
Username 2 1 (null)
Username 3 1 3
Username 4 1 (null)
So I want it to always show all 4 users that match the specified project; participants_user should be defined if the user/project combination exists in the participants table, and if not, it should return the user info with null data for the participants/projects columns.
How do I achieve it?
All you need is the query below:
SELECT DISTINCT u.user_name, pp.participants_project, pp.participants_user
FROM `users` u
LEFT JOIN `participants` pp ON u.user_id = pp.participants_user
AND pp.participants_project = 1
ORDER BY u.user_name
Here is the fiddle.
Since Username 2 and Username 4 don't have any project, the participants_project column will also contain null values.
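The key point is that the project filter lives in the ON clause, not in a WHERE clause. Here is a minimal sketch (Python's sqlite3 as a stand-in for MySQL, with sample rows invented to match the desired output above: only users 1 and 3 participate in project 1):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (user_id INTEGER, user_name TEXT);
CREATE TABLE participants (participants_project INTEGER, participants_user INTEGER);
INSERT INTO users VALUES (1,'Username 1'),(2,'Username 2'),(3,'Username 3'),(4,'Username 4');
INSERT INTO participants VALUES (1,1),(1,3),(2,2);
""")

# Filter in the ON clause: unmatched users are kept, padded with NULLs.
on_filter = con.execute("""
    SELECT u.user_name, pp.participants_project, pp.participants_user
    FROM users u
    LEFT JOIN participants pp ON u.user_id = pp.participants_user
                             AND pp.participants_project = 1
    ORDER BY u.user_name
""").fetchall()

# Same filter in the WHERE clause: the NULL-padded rows fail the test,
# so the LEFT JOIN silently degrades to an INNER JOIN.
where_filter = con.execute("""
    SELECT u.user_name, pp.participants_project, pp.participants_user
    FROM users u
    LEFT JOIN participants pp ON u.user_id = pp.participants_user
    WHERE pp.participants_project = 1
    ORDER BY u.user_name
""").fetchall()

print(on_filter)     # all four users, Username 2/4 with NULLs
print(where_filter)  # only Username 1 and Username 3 survive
```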
I think what you need is this:
SELECT `users`.`user_name`, GROUP_CONCAT(`participants_project`), `participants_user` from `users`
LEFT JOIN `participants` on users.user_id = `participants`.`participants_user`
LEFT JOIN `projects` on `participants`.`participants_project` =`projects`.`project_id`
GROUP BY `users`.`user_id`
In the SQL snippet I joined through the participants table, which will always list the users whether they have a project or not. In case you want to omit the users that are not participating in any project, just change the 'LEFT JOIN' on participants to a 'RIGHT JOIN', like this:
SELECT `users`.`user_name`, GROUP_CONCAT(`participants_project`), `participants_user` from `users`
RIGHT JOIN `participants` on users.user_id = `participants`.`participants_user`
LEFT JOIN `projects` on `participants`.`participants_project` = `projects`.`project_id`
GROUP BY `users`.`user_id`
I suggest that you include primary and foreign keys in the schema, as they speed up query execution and give you more consistent data persistence in your database tables.
I have the following database (simplified):
CREATE TABLE `tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`manufacture` varchar(100) NOT NULL,
`date_last_activity` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `manufacture` (`manufacture`),
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`),
KEY `date_last_activity` (`date_last_activity`)
) ENGINE=InnoDB AUTO_INCREMENT=401353 DEFAULT CHARSET=utf8
CREATE TABLE `tracking_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tracking_id` int(11) NOT NULL,
`tracking_object_id` varchar(100) NOT NULL,
`tracking_type` int(11) NOT NULL COMMENT 'Its used to specify the type of each item, e.g. car, bike, etc',
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `tracking_id` (`tracking_id`),
KEY `tracking_object_id` (`tracking_object_id`),
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1299995 DEFAULT CHARSET=utf8
CREATE TABLE `cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`car_id` varchar(255) NOT NULL COMMENT 'It must be VARCHAR, because the data is coming from external source.',
`manufacture` varchar(255) NOT NULL,
`car_text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`date_order` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`deleted` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `car_id` (`car_id`),
KEY `sort_field` (`date_order`)
) ENGINE=InnoDB AUTO_INCREMENT=150000025 DEFAULT CHARSET=utf8
This is my "problematic" query, that runs extremely slow.
SELECT sql_no_cache `t`.*,
count(`t`.`id`) AS `cnt_filtered_items`
FROM `tracking` AS `t`
INNER JOIN `tracking_items` AS `ti` ON (`ti`.`tracking_id` = `t`.`id`)
LEFT JOIN `cars` AS `c` ON (`c`.`car_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 1)
LEFT JOIN `bikes` AS `b` ON (`b`.`bike_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 2)
LEFT JOIN `trucks` AS `tr` ON (`tr`.`truck_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 3)
WHERE (`t`.`manufacture` IN('1256703406078',
'9600048390403',
'1533405067830'))
AND (`c`.`car_text` LIKE '%europe%'
OR `b`.`bike_text` LIKE '%europe%'
OR `tr`.`truck_text` LIKE '%europe%')
GROUP BY `t`.`id`
ORDER BY `t`.`date_last_activity` ASC,
`t`.`id` ASC
LIMIT 15
This is the result of EXPLAIN for above query:
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| 1 | SIMPLE | t | index | PRIMARY,manufacture,manufacture_date_last_activity,date_last_activity | PRIMARY | 4 | NULL | 400,000 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | ti | ref | tracking_id,tracking_object_id,tracking_id_tracking_object_id | tracking_id | 4 | table.t.id | 1 | NULL |
| 1 | SIMPLE | c | eq_ref | car_id | car_id | 767 | table.ti.tracking_object_id | 1 | Using where |
| 1 | SIMPLE | b | eq_ref | bike_id | bike_id | 767 | table.ti.tracking_object_id | 1 | Using where |
| 1 | SIMPLE | t | eq_ref | truck_id | truck_id | 767 | table.ti.tracking_object_id | 1 | Using where |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
What is the problem this query is trying to solve?
Basically, I need to find all records in tracking table that may be associated with records in tracking_items (1:n) where each record in tracking_items may be associated with record in left joined tables. The filtering criteria is crucial part in the query.
What is the problem I have with the query above?
When the ORDER BY and GROUP BY clauses are both present, the query runs extremely slow, e.g. 10-15 seconds to complete for the above configuration. However, if I omit either of these clauses, the query runs pretty quickly (~0.2 seconds).
What I've already tried?
I've tried to use a FULLTEXT index, but it didn't help much, as the results evaluated by the LIKE statement are narrowed by the JOINs using indexes.
I've tried to use WHERE EXISTS (...) to find if there are records in left joined tables, but unfortunately without any luck.
Few notes about relations between these tables:
tracking -> tracking_items (1:n)
tracking_items -> cars (1:1)
tracking_items -> bikes (1:1)
tracking_items -> trucks (1:1)
So, I'm looking for a way to optimize that query.
Bill Karwin suggests the query might perform better if it used an index with a leading column of manufacture. I second that suggestion. Especially if that's very selective.
I also note that we're doing a GROUP BY t.id, where id is the PRIMARY KEY of the table.
No columns from any tables other than tracking are referenced in the SELECT list.
This suggests we're really only interested in returning rows from t, and not on creating duplicates due to multiple outer joins.
Seems like the COUNT() aggregate has the potential to return an inflated count, if there are multiple matching rows in tracking_items and bikes, cars, trucks. If there are three matching rows from cars, and four matching rows from bikes, ... the COUNT() aggregate is going to return a value of 12, rather than 7. (Or maybe there is some guarantee in the data such that there won't ever be multiple matching rows.)
If the manufacture is very selective, and that returns a reasonably small set of rows from tracking, if the query can make use of an index ...
And since we aren't returning any columns from any tables other than tracking, apart from a count or related items ...
I would be tempted to test correlated subqueries in the SELECT list, to get the count, and filter out the zero count rows using a HAVING clause.
Something like this:
SELECT SQL_NO_CACHE `t`.*
, ( ( SELECT COUNT(1)
FROM `tracking_items` `tic`
JOIN `cars` `c`
ON `c`.`car_id` = `tic`.`tracking_object_id`
AND `c`.`car_text` LIKE '%europe%'
WHERE `tic`.`tracking_id` = `t`.`id`
AND `tic`.`tracking_type` = 1
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tib`
JOIN `bikes` `b`
ON `b`.`bike_id` = `tib`.`tracking_object_id`
AND `b`.`bike_text` LIKE '%europe%'
WHERE `tib`.`tracking_id` = `t`.`id`
AND `tib`.`tracking_type` = 2
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tit`
JOIN `trucks` `tr`
ON `tr`.`truck_id` = `tit`.`tracking_object_id`
AND `tr`.`truck_text` LIKE '%europe%'
WHERE `tit`.`tracking_id` = `t`.`id`
AND `tit`.`tracking_type` = 3
)
) AS cnt_filtered_items
FROM `tracking` `t`
WHERE `t`.`manufacture` IN ('1256703406078', '9600048390403', '1533405067830')
HAVING cnt_filtered_items > 0
ORDER
BY `t`.`date_last_activity` ASC
, `t`.`id` ASC
We'd expect that the query could make effective use of an index on tracking with leading column of manufacture.
And on the tracking_items table, we want an index with leading columns of type and tracking_id. And including tracking_object_id in that index would mean the query could be satisfied from the index, without visiting the underlying pages.
For the cars, bikes and trucks tables the query should make use of an index with leading column of car_id, bike_id, and truck_id respectively. There's no getting around a scan of the car_text, bike_text, truck_text columns for the matching string... best we can do is narrow down the number of rows that need to have that check performed.
This approach (just the tracking table in the outer query) should eliminate the need for the GROUP BY, the work required to identify and collapse duplicate rows.
BUT this approach, replacing joins with correlated subqueries, is best suited to queries where there is a SMALL number of rows returned by the outer query. Those subqueries get executed for every row processed by the outer query. It's imperative that those subqueries to have suitable indexes available. Even with those tuned, there is still potential for horrible performance for large sets.
This does still leave us with a "Using filesort" operation for the ORDER BY.
If the count of related items should be the product of a multiplication, rather than addition, we could tweak the query to achieve that. (We'd have to muck with the return of zeros, and the condition in the HAVING clause would need to be changed.)
If there wasn't a requirement to return a COUNT() of related items, then I would be tempted to move the correlated subqueries from the SELECT list down into EXISTS predicates in the WHERE clause.
Additional notes: seconding the comments from Rick James regarding indexing... there appears to be redundant indexes defined. i.e.
KEY `manufacture` (`manufacture`)
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`)
The index on the singleton column isn't necessary, since there is another index that has the column as the leading column.
Any query that can make effective use of the manufacture index will be able to make effective use of the manufacture_date_last_activity index. That is to say, the manufacture index could be dropped.
The same applies for the tracking_items table, and these two indexes:
KEY `tracking_id` (`tracking_id`)
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
The tracking_id index could be dropped, since it's redundant.
For the query above, I would suggest adding a covering index:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`,`tracking_object_id`)
-or- at a minimum, a non-covering index with those two columns leading:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`)
The EXPLAIN shows you are doing an index-scan ("index" in the type column) on the tracking table. An index-scan is pretty much as costly as a table-scan, especially when the index scanned is the PRIMARY index.
The rows column also shows that this index-scan is examining > 355K rows (since this figure is only a rough estimate, it's in fact examining all 400K rows).
Do you have an index on t.manufacture? I see two indexes named in the possible keys that might include that column (I can't be sure solely based on the name of the index), but for some reason the optimizer isn't using them. Maybe the set of values you search for is matched by every row in the table anyway.
If the list of manufacture values is intended to match a subset of the table, then you might need to give a hint to the optimizer to make it use the best index. https://dev.mysql.com/doc/refman/5.6/en/index-hints.html
Using LIKE '%word%' pattern-matching can never utilize an index, and must evaluate the pattern-match on every row. See my presentation, Full Text Search Throwdown.
How many items are in your IN(...) list? MySQL sometimes has problems with very long lists. See https://dev.mysql.com/doc/refman/5.6/en/range-optimization.html#equality-range-optimization
P.S.: When you ask a query optimization question, you should always include the SHOW CREATE TABLE output for each table referenced in the query, so folks who answer don't have to guess at what indexes, data types, constraints you currently have.
First of all: your query makes assumptions about string contents, which it shouldn't. What may car_text like '%europe%' indicate? Something like 'Sold in Europe only' maybe? Or 'Sold outside Europe only'? Two possible strings with contradictory meanings. So if you assume a certain meaning once you find europe in the string, then you should be able to introduce this knowledge in the database - with a Europe flag or a region code, for instance.
Anyway, you are showing certain trackings with their Europe transportation count. So select trackings, select transportation counts. You can either have the aggregation subquery for transportation counts in your SELECT clause or in your FROM clause.
Subquery in SELECT clause:
select
t.*,
(
select count(*)
from tracking_items ti
where ti.tracking_id = t.id
and (tracking_type, tracking_object_id) in
(
select 1, car_id from cars where car_text like '%europe%'
union all
select 2, bike_id from bikes where bike_text like '%europe%'
union all
select 3, truck_id from trucks where truck_text like '%europe%'
)
) as total
from tracking t
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
Subquery in FROM clause:
select
t.*, agg.total
from tracking t
left join
(
select tracking_id, count(*) as total
from tracking_items ti
where (tracking_type, tracking_object_id) in
(
select 1, car_id from cars where car_text like '%europe%'
union all
select 2, bike_id from bikes where bike_text like '%europe%'
union all
select 3, truck_id from trucks where truck_text like '%europe%'
)
group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
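As a quick check of the FROM-clause variant, here is a sketch on invented sample rows (the question gives no data); SQLite stands in for MySQL and has supported the (a, b) IN (SELECT ...) row-value syntax since 3.15:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tracking (id INTEGER, manufacture TEXT, date_last_activity TEXT);
CREATE TABLE tracking_items (tracking_id INTEGER, tracking_object_id TEXT, tracking_type INTEGER);
CREATE TABLE cars   (car_id TEXT,   car_text TEXT);
CREATE TABLE bikes  (bike_id TEXT,  bike_text TEXT);
CREATE TABLE trucks (truck_id TEXT, truck_text TEXT);
-- invented rows: tracking 10 has one 'europe' car and one 'europe' bike
INSERT INTO tracking VALUES (10,'1256703406078','2018-01-01'),(11,'other','2018-01-02');
INSERT INTO tracking_items VALUES (10,'c1',1),(10,'b1',2),(11,'c2',1);
INSERT INTO cars  VALUES ('c1','sold in europe only'),('c2','asia only');
INSERT INTO bikes VALUES ('b1','europe edition');
""")

# Count the matching items per tracking row in a derived table,
# then left join the counts back onto tracking.
rows = con.execute("""
    SELECT t.id, COALESCE(agg.total, 0) AS total
    FROM tracking t
    LEFT JOIN (
        SELECT tracking_id, COUNT(*) AS total
        FROM tracking_items ti
        WHERE (tracking_type, tracking_object_id) IN (
            SELECT 1, car_id   FROM cars   WHERE car_text   LIKE '%europe%'
            UNION ALL
            SELECT 2, bike_id  FROM bikes  WHERE bike_text  LIKE '%europe%'
            UNION ALL
            SELECT 3, truck_id FROM trucks WHERE truck_text LIKE '%europe%'
        )
        GROUP BY tracking_id
    ) agg ON agg.tracking_id = t.id
    WHERE manufacture IN ('1256703406078')
    ORDER BY date_last_activity, id
""").fetchall()
print(rows)  # tracking 10 counted one europe car plus one europe bike
```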
Indexes:
tracking(manufacture, date_last_activity, id)
tracking_items(tracking_id, tracking_type, tracking_object_id)
cars(car_text, car_id)
bikes(bike_text, bike_id)
trucks(truck_text, truck_id)
Sometimes MySQL is stronger on simple joins than on anything else, so it may be worth a try to blindly join transportation records and only later see whether it's car, bike or truck:
select
t.*, agg.total
from tracking t
left join
(
select
tracking_id,
sum((ti.tracking_type = 1 and c.car_text like '%europe%')
or
(ti.tracking_type = 2 and b.bike_text like '%europe%')
or
(ti.tracking_type = 3 and t.truck_text like '%europe%')
) as total
from tracking_items ti
left join cars c on c.car_id = ti.tracking_object_id
left join bikes b on b.bike_id = ti.tracking_object_id
left join trucks t on t.truck_id = ti.tracking_object_id
group by tracking_id
) agg on agg.tracking_id = t.id
where manufacture in ('1256703406078', '9600048390403', '1533405067830')
order by date_last_activity, id;
If my guess is correct and cars, bikes, and trucks are independent of each other (i.e. a particular pre-aggregate result would only have data from one of them), you might be better off UNIONing three simpler sub-queries (one for each).
While you cannot do much index-wise about LIKEs involving leading wildcards, splitting it into UNIONed queries could avoid evaluating the car_text LIKE '%europe%' condition for all the bikes and trucks matches, the bike_text condition for all the cars and trucks matches, and so on.
When the ORDER BY and GROUP BY clauses are both present, the query runs extremely slow, e.g. 10-15 seconds to complete for the above configuration. However, if I omit either of these clauses, the query runs pretty quickly (~0.2 seconds).
This is interesting... generally the best optimization technique I know is to make good use of temporary tables, and it sounds like it will work really well here. So you would first create the temporary table:
create temporary table tracking_ungrouped (
key (id)
)
select sql_no_cache `t`.*
from `tracking` as `t`
inner join `tracking_items` as `ti` on (`ti`.`tracking_id` = `t`.`id`)
left join `cars` as `c` on (`c`.`car_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 1)
left join `bikes` as `b` on (`b`.`bike_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 2)
left join `trucks` as `tr` on (`tr`.`truck_id` = `ti`.`tracking_object_id` AND `ti`.`tracking_type` = 3)
where
(`t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')) and
(`c`.`car_text` like '%europe%' or `b`.`bike_text` like '%europe%' or `tr`.`truck_text` like '%europe%');
and then query it for the results you need:
select t.*, count(`t`.`id`) as `cnt_filtered_items`
from tracking_ungrouped t
group by `t`.`id`
order by `t`.`date_last_activity` asc, `t`.`id` asc
limit 15;
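The two-step flow can be sketched on invented sample rows (the question gives no data). Note that SQLite's CREATE TEMPORARY TABLE ... AS SELECT stands in here for MySQL's CREATE TEMPORARY TABLE ... (KEY (id)) SELECT syntax:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tracking (id INTEGER, manufacture TEXT, date_last_activity TEXT);
CREATE TABLE tracking_items (tracking_id INTEGER, tracking_object_id TEXT, tracking_type INTEGER);
CREATE TABLE cars   (car_id TEXT,   car_text TEXT);
CREATE TABLE bikes  (bike_id TEXT,  bike_text TEXT);
CREATE TABLE trucks (truck_id TEXT, truck_text TEXT);
-- invented rows: tracking 10 has one 'europe' car and one 'europe' bike
INSERT INTO tracking VALUES (10,'1256703406078','2018-01-01'),(11,'other','2018-01-02');
INSERT INTO tracking_items VALUES (10,'c1',1),(10,'b1',2),(11,'c2',1);
INSERT INTO cars  VALUES ('c1','sold in europe only'),('c2','asia only');
INSERT INTO bikes VALUES ('b1','europe edition');
""")

# Step 1: materialize the filtered, still-ungrouped rows.
con.execute("""
    CREATE TEMPORARY TABLE tracking_ungrouped AS
    SELECT t.*
    FROM tracking t
    JOIN tracking_items ti ON ti.tracking_id = t.id
    LEFT JOIN cars   c  ON c.car_id    = ti.tracking_object_id AND ti.tracking_type = 1
    LEFT JOIN bikes  b  ON b.bike_id   = ti.tracking_object_id AND ti.tracking_type = 2
    LEFT JOIN trucks tr ON tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3
    WHERE t.manufacture IN ('1256703406078')
      AND (c.car_text LIKE '%europe%' OR b.bike_text LIKE '%europe%'
           OR tr.truck_text LIKE '%europe%')
""")

# Step 2: group, order and limit the small materialized set.
rows = con.execute("""
    SELECT t.*, COUNT(t.id) AS cnt_filtered_items
    FROM tracking_ungrouped t
    GROUP BY t.id
    ORDER BY t.date_last_activity ASC, t.id ASC
    LIMIT 15
""").fetchall()
print(rows)  # tracking 10 matched one europe car and one europe bike
```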
ALTER TABLE cars ADD FULLTEXT(car_text)
then try
select sql_no_cache
`t`.*, -- If you are not using all, spell out the list
count(`t`.`id`) as `cnt_filtered_items` -- This does not make sense
-- and is possibly delivering an inflated value
from `tracking` as `t`
inner join `tracking_items` as `ti` ON (`ti`.`tracking_id` = `t`.`id`)
join -- not LEFT JOIN
`cars` as `c` ON `c`.`car_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 1
where `t`.`manufacture` in('1256703406078', '9600048390403', '1533405067830')
AND MATCH(c.car_text) AGAINST('+europe' IN BOOLEAN MODE)
group by `t`.`id` -- I don't know if this is necessary
order by `t`.`date_last_activity` asc, `t`.`id` asc
limit 15;
to see if it will correctly give you a suitable 15 cars.
If that looks OK, then combine the three together:
SELECT sql_no_cache
t2.*,
-- COUNT(*) -- this is probably broken
FROM (
( SELECT t.id FROM ... cars ... ) -- the query above
UNION ALL -- unless you need UNION DISTINCT
( SELECT t.id FROM ... bikes ... )
UNION ALL
( SELECT t.id FROM ... trucks ... )
) AS u
JOIN tracking AS t2 ON t2.id = u.id
ORDER BY t2.date_last_activity, t2.id
LIMIT 15;
Note that the inner SELECTs only deliver t.id, not t.*.
Another index needed:
ti: (tracking_type, tracking_object_id) -- in either order
Indexes
When you have INDEX(a,b), you don't also need INDEX(a). (This won't help the query in question, but it will help disk space and INSERT performance.)
When I see PRIMARY KEY(id), UNIQUE(x), I look for any good reason not to get rid of id and change to PRIMARY KEY(x). Unless there is something significant in the 'simplification' of the schema, such a change would help. Yeah, car_id is bulky, etc, but it is a big table and the extra lookup (from index BTree to data BTree) is hurting, etc.
I think it is very unlikely that KEY `sort_field` (`date_order`) will ever be used. Either drop it (saving a few GB) or combine it in some useful way. Let's see the query in which you think it might be useful. (Again, a suggestion that is not directly relevant to this Question.)
re Comment(s)
I made some substantive changes to my formulation.
My formulation has 4 GROUP BYs, 3 in the 'derived' table (ie, FROM ( ... UNION ... )), and one outside. Since the outer part is limited to 3*15 rows, I do not worry about performance there.
Further note that the derived table delivers only t.id, then re-probes tracking to get the other columns. This lets the derived table run much faster, but at a small expense of the extra JOIN outside.
Please elaborate on the intent of the COUNT(t.id); it won't work in my formulation, and I don't know what it is counting.
I had to get rid of the ORs; they are the secondary performance killer. (The first killer is LIKE '%...'.)
SELECT t.*
FROM (SELECT * FROM tracking WHERE manufacture
IN('1256703406078','9600048390403','1533405067830')) t
INNER JOIN (SELECT tracking_id, tracking_object_id, tracking_type FROM tracking_items
WHERE tracking_type IN (1,2,3)) ti
ON (ti.tracking_id = t.id)
LEFT JOIN (SELECT car_id FROM cars WHERE car_text LIKE '%europe%') c
ON (c.car_id = ti.tracking_object_id AND ti.tracking_type = 1)
LEFT JOIN (SELECT bike_id FROM bikes WHERE bike_text LIKE '%europe%') b
ON (b.bike_id = ti.tracking_object_id AND ti.tracking_type = 2)
LEFT JOIN (SELECT truck_id FROM trucks WHERE truck_text LIKE '%europe%') tr
ON (tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3)
ORDER BY t.date_last_activity ASC, t.id ASC
Subqueries perform faster in joins when they filter out a lot of records.
The subquery on the tracking table filters out all the unwanted manufacture values and results in a smaller table t to be joined.
The same approach is applied to the tracking_items table, as we are interested only in tracking_types 1, 2 and 3, creating a smaller table ti. If there are a lot of tracking_objects, you can even add the tracking object filter in this subquery.
Similar filters on the cars, bikes and trucks tables, requiring their respective text to contain europe, give us the smaller tables c, b and tr.
I also removed the GROUP BY t.id: t.id is unique and we are inner/left joining on it, so there is no need for it.
Lastly, I am selecting only the required columns from each table, which also reduces the memory footprint and the runtime.
Hope this helps. Please let me know your feedback and run statistics.
I'm not sure it will work, but how about applying a filter on each table (cars, bikes, and trucks) in the ON clause, before joining? It should filter out rows.
I'm having trouble figuring out how to structure a SQL query. Let's say we have a User table and a Pet table. Each user can have many pets and Pet has a breed column.
User:
id | name
______|________________
1 | Foo
2 | Bar
Pet:
id | owner_id | name | breed |
______|________________|____________|_____________|
1 | 1 | Fido | poodle |
2 | 2 | Fluffy | siamese |
The end goal is to provide a query that will give me all the pets for each user that match the given where clause while allowing sort and limit parameters to be used. So the ability to limit each user's pets to say 5 and sorted by name.
I'm working on building these queries dynamically for an ORM so I need a solution that works in MySQL and Postgresql (though it can be two different queries).
I've tried something like this which doesn't work:
SELECT "user"."id", "user"."name", "pet"."id", "pet"."owner_id", "pet"."name",
"pet"."breed"
FROM "user"
LEFT JOIN "pet" ON "user"."id" = "pet"."owner_id"
WHERE "pet"."id" IN
(SELECT "pet"."id" FROM "pet" WHERE "pet"."breed" = 'poodle' LIMIT 5)
In Postgres (8.4 or later), use the window function row_number() in a subquery:
SELECT user_id, user_name, pet_id, owner_id, pet_name, breed
FROM (
SELECT u.id AS user_id, u.name AS user_name
, p.id AS pet_id, owner_id, p.name AS pet_name, breed
, row_number() OVER (PARTITION BY u.id ORDER BY p.name, pet_id) AS rn
FROM "user" u
LEFT JOIN pet p ON p.owner_id = u.id
AND p.breed = 'poodle'
) sub
WHERE rn <= 5
ORDER BY user_name, user_id, pet_name, pet_id;
When using a LEFT JOIN, you can't combine it with WHERE conditions on the left-joined table. That forcibly converts the LEFT JOIN to a plain [INNER] JOIN (and possibly removes rows from the result you did not want removed). Pull such conditions up into the join clause.
The way I have it, users without pets are included in the result - as opposed to your query stub.
The additional id columns in the ORDER BY clauses are supposed to break possible ties between non-unique names.
Never use a reserved word like user as identifier.
Work on your naming convention. id or name are terrible, non-descriptive choices, even if some ORMs suggest this nonsense. As you can see in the query, it leads to complications when joining a couple of tables, which is what you do in SQL.
Should be something like pet_id, pet, user_id, username etc. to begin with.
With a proper naming convention we could just SELECT * in the subquery.
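The row_number() approach can be sketched with SQLite 3.25+ (which, like Postgres, supports window functions); a couple of extra poodles are invented here so the per-user limit actually bites:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "user" (id INTEGER, name TEXT);
CREATE TABLE pet (id INTEGER, owner_id INTEGER, name TEXT, breed TEXT);
INSERT INTO "user" VALUES (1,'Foo'),(2,'Bar');
-- extra poodles invented so user Foo exceeds the per-user limit
INSERT INTO pet VALUES
    (1,1,'Fido','poodle'),(2,2,'Fluffy','siamese'),
    (3,1,'Rex','poodle'),(4,1,'Spot','poodle');
""")

# Number each user's matching pets, then keep only the first N per user.
# The breed filter sits in the join clause, so petless users survive.
rows = con.execute("""
    SELECT user_id, user_name, pet_id, pet_name, breed
    FROM (
        SELECT u.id AS user_id, u.name AS user_name,
               p.id AS pet_id, p.name AS pet_name, p.breed,
               row_number() OVER (PARTITION BY u.id
                                  ORDER BY p.name, p.id) AS rn
        FROM "user" u
        LEFT JOIN pet p ON p.owner_id = u.id
                       AND p.breed = 'poodle'
    ) sub
    WHERE rn <= 2              -- at most 2 pets per user for this demo
    ORDER BY user_name, pet_name
""").fetchall()

print(rows)  # Bar kept with NULLs; Foo limited to Fido and Rex (Spot cut)
```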
MySQL (before 8.0) does not support window functions; there are fidgety substitutes ...
SELECT user.id, user.name, pet.id, pet.name, pet.breed, pet.owner_id,
SUBSTRING_INDEX(group_concat(pet.owner_id order by pet.owner_id DESC), ',', 5)
FROM user
LEFT JOIN pet on user.id = pet.owner_id GROUP BY user.id
Above is rough/untested, but this source has exactly what you need, see step 4. Also, you don't need any of those double quotes.