My Odd SubSelect, Need a LEFT JOIN Improvement - mysql

Here is a sample SQL dump: https://gist.github.com/JREAM/99287d033320b2978728
I have a SELECT that grabs a bundle of users.
I then do a foreach loop to attach all the associated tree_processes to that user.
So I end up doing X Queries: users * tree.
Wouldn't it be much more efficient to fetch the two together?
I've thought about doing a LEFT JOIN Subselect, but I'm having a hard time getting it correct.
Below I've done a query to select the correct data in the SELECT, however I would have to do this for all 15 rows and it seems like a TERRIBLE waste of memory.
This is my dirty Ateempt:
-
SELECT
s.id,
s.firstname,
s.lastname,
s.email,
(
SELECT tp.id FROM tree_processes AS tp
JOIN tree AS t ON (
t.id = tp.tree_id
)
WHERE subscribers_id = s.id
ORDER BY tp.id DESC
LIMIT 1
) AS newest_tree_id,
#
# Don't want to have to do this below for every row
(
SELECT t.type FROM tree_processes AS tp
JOIN tree AS t ON (
t.id = tp.tree_id
)
WHERE subscribers_id = s.id
ORDER BY tp.id DESC
LIMIT 1
) AS tree_type
FROM subscribers AS s
INNER JOIN scenario_subscriptions AS ss ON (
ss.subscribers_id = s.id
)
WHERE ss.scenarios_id = 1
AND ss.completed != 1
AND ss.purchased_exit != 1
AND deleted != 1
GROUP BY s.id
LIMIT 0, 100
This is my LEFT JOIN attempt, but I am having trouble getting the SELECT values
SELECT
s.id,
s.firstname,
s.lastname,
s.email,
freshness.id,
# freshness.subscribers_id < -- Cant get multiples out of the LEFT join
FROM subscribers AS s
INNER JOIN scenario_subscriptions AS ss ON (
ss.subscribers_id = s.id
)
LEFT JOIN ( SELECT tp.id, tp.subscribers_id AS tp FROM tree_processes AS tp
JOIN tree AS t ON (
t.id = tp.tree_id
)
ORDER BY tp.id DESC
LIMIT 1 ) AS freshness
ON (
s.id = subscribers_id
)
WHERE ss.scenarios_id = 1
AND ss.completed != 1
AND ss.purchased_exit != 1
AND deleted != 1
GROUP BY s.id
LIMIT 0, 100

In the LEFT JOIN you are using 'freshness' as the table alias. This in you select you need to additionally state what column(s) you want from it. Since there is only one column (id) you need to add:
freshness.id
to the select clause.
Your ON clause of the left join looks pretty dodgy too. Maybe freshness.id = ss.subscribers_id?
Cheers -

Related

MySQL LEFT JOIN only last row painfully slow

I have three tables, which looks like this
forts
|id|lat|lon|
fort_sightings
|id|fort_id|team|
fort_raids
|id|fort_id|raid_level|
I need a query that fetches all the rows from forts and then select the latest information from fort_sightings and fort_raids, if any. There might be several rows where fort_id has the same value, so I need the latest information.
Currently, I have this, which might not be the prettiest
SELECT
*
FROM
forts c
LEFT JOIN fort_sightings o ON o.id = (
SELECT
id
FROM
fort_sightings
WHERE
fort_id = c.id
ORDER BY
id DESC
LIMIT 1
)
LEFT JOIN fort_raids r ON r.id = (
SELECT
id
FROM
fort_raids
WHERE
fort_id = c.id
ORDER BY
id DESC
LIMIT 1
)
But it's painfully slow, the query takes over 10 seconds. There's only ~350 rows in forts, so it really shouldn't take this long. I believe it's from all the SELECT queries in the JOIN, but I don't know any alternative.
EXPLAIN
This is your query:
SELECT *
FROM forts c LEFT JOIN
fort_sightings o
ON o.id = (SELECT fs2.id
FROM fort_sightings fs2
WHERE fs.fort_id = c.id
ORDER BY fs2.id DESC
LIMIT 1
) LEFT JOIN
fort_raids r
ON r.id = (SELECT fr2.id
FROM fort_raids fr2
WHERE fr2.fort_id = c.id
ORDER BY fr2.id DESC
LIMIT 1
);
I find it strangely structured. But, see if this works:
fort_sightings(fort_id, id)
fort_raids(fort_id, id)
It is important that the fort_id be the first key in the indexes.
If this doesn't work, then you might need to modify the query.
Aside from the index recommendation from Gordon, you are doing a correlated subquery which means that for every record in the forts table, you are re-running the query to the sightings and raids. I would change slightly to pull all max() per fort from each THEN join... like
SELECT
c.id,
c.lat,
c.lon,
fs2.team,
fr2.raid_level
FROM
forts c
LEFT JOIN
( select fs.fort_id, max( fs.id ) as MaxSightID
from fort_sightings fs
group by fs.fort_id ) S
on c.id = s.fort_id
LEFT JOIN fort_sightings fs2
on s.MaxSightID = fs2.id
LEFT JOIN
( select fr.fort_id, max( fr.id ) as MaxRaidID
from fort_raids fs
group by fr.fort_id ) R
on c.ID = r.fort_id
LEFT JOIN fort_raids fr2
on r.MaxRaidID = fr2.id
The second part of the left-joins goes back to the original raid and sighting tables to pull the corresponding team and raid level in the final results if any such are found.
If you need the most efficient query, you should add columns latest_fort_sighting_id latest_fort_raid_id to table forts. MySQL does not have powerful features such as Materialized Views or Hash Joins like PostgreSQL, we need to handle them manually. Do not forget using transaction for updates.
If you limit range of forts, alternatively, you can run optimized query only using LEFT JOIN.
select - SQL join: selecting the last records in a one-to-many relationship - Stack Overflow
SELECT forts.*, fs1.team, fr1.raid_level FROM forts
LEFT JOIN fort_sightings fs1 ON fs1.fort_id = forts.id
LEFT JOIN fort_sightings fs2 ON fs2.fort_id = forts.id AND fs1.id < fs2.id
LEFT JOIN fort_raids fr1 ON fr1.fort_id = forts.id
LEFT JOIN fort_raids fr2 ON fr2.fort_id = forts.id AND fr1.id < fr2.id
WHERE fs2.id IS NULL AND fr2.id IS NULL AND forts.id > 5 ORDER BY forts.id LIMIT 5;

How can I optimise an extremely slow MySQL query that uses COUNT DISTINCT

I have a very slow MySQL query that I would like to optimise.
The query is taking 66.2070 seconds to return 5 results from tables containing around 200 rows.
The database tables store users, experiments (A/B tests), goals (page URLs), visits (page visits) and conversions (clicks a goal's URL). The visit and conversion tables both have a combination column that records if version A or B of a page was visited or a conversion came from version A or B. Combinations are stored in the db as 1 or 2.
I'm trying to get a list of a user's experiments with the number of visits and conversions for each combination.
For some relationships I'm using composite primary keys, which does make the joins more complicated. I doubt it but could this be the cause of the problem?
How can I rewrite this query to make it run in a reasonable time, at least less than a second?
Here's my database schema:
and her's my query:
SELECT e.id AS id,
e.name AS name,
e.status AS status,
e.created AS created,
Count(DISTINCT v1.id) AS visits1,
Count(DISTINCT v2.id) AS visits2,
Count(DISTINCT c1.id) AS conversions1,
Count(DISTINCT c2.id) AS conversions2
FROM experiment e
LEFT JOIN visit v1
ON ( v1.experiment_id = e.id
AND v1.user_id = e.user_id
AND v1.combination = 1 )
LEFT JOIN visit v2
ON ( v2.experiment_id = e.id
AND v2.user_id = e.user_id
AND v2.combination = 2 )
LEFT JOIN goal g
ON ( g.experiment_id = e.id
AND g.user_id = e.user_id
AND g.principal = 1 )
LEFT JOIN conversion c1
ON ( c1.experiment_id = e.id
AND c1.user_id = e.user_id
AND c1.goal_id = g.id
AND c1.combination = 1 )
LEFT JOIN conversion c2
ON ( c2.experiment_id = e.id
AND c2.user_id = e.user_id
AND c2.goal_id = g.id
AND c2.combination = 2 )
WHERE e.user_id = 25
GROUP BY e.id
ORDER BY e.created DESC
LIMIT 5
The resulting table should look something like this:
You should do the aggregations before doing the joins, to avoid getting large intermediate results. I think the logic is
SELECT e.id, e.name, e.status, e.created,
v.visits1, v.visits2, g.conversions1, g.conversions2
FROM experiment e LEFT JOIN
(SELECT experiment_id, user_id,
SUM(combination = 1) as visits1,
SUM(combination = 2) as visits2
FROM visits
WHERE combination IN (1, 2)
GROUP BY experiment_id, user_id
) v
ON v.experiment_id = e.id AND
v.user_id = e.user_id LEFT JOIN
(SELECT g.experiment_id, g.user_id,
SUM(c.combination = 1) as conversions1,
SUM(c.combination = 2) as conversions2
FROM goal g LEFT JOIN
conversion c
ON c.experiment_id = g.experiment_id AND
c.user_id = g.user_id AND
c.goal_id = g.id
WHERE g.principal = 1
GROUP BY g.experiment_id, g.user_id
) g
ON g.experiment_id = e.id AND
g.user_id = e.user_id LEFT JOIN
WHERE e.user_id = 25
ORDER BY e.created DESC
LIMIT 5 ;
There are further optimizations for this. For instance, an index on experiment(user_id, created, id).
For your question about the drawback of using composite keys I found this:
Drawback of composite keys
I can't currently test ur database but use the EXPLAIN syntax in mysql to find out what is wrong with the perfomance of ur query:
MySQL docs about EXPLAIN and optimizing ur query with EXPLAIN

mysql query optimization steps or how to optimze query

I don't know much about query optimization but I know the order in which queries get executed
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
This the query I had written
SELECT
`main_table`.forum_id,
my_topics.topic_id,
(
SELECT MAX(my_posts.post_id) FROM my_posts WHERE my_topics.topic_id = my_posts.topic_id
) AS `maxpostid`,
(
SELECT my_posts.admin_user_id FROM my_posts WHERE my_topics.topic_id = my_posts.topic_id ORDER BY my_posts.post_id DESC LIMIT 1
) AS `admin_user_id`,
(
SELECT my_posts.user_id FROM my_posts WHERE my_topics.topic_id = my_posts.topic_id ORDER BY my_posts.post_id DESC LIMIT 1
) AS `user_id`,
(
SELECT COUNT(my_topics.topic_id) FROM my_topics WHERE my_topics.forum_id = main_table.forum_id ORDER BY my_topics.forum_id DESC LIMIT 1
) AS `topicscount`,
(
SELECT COUNT(my_posts.post_id) FROM my_posts WHERE my_topics.topic_id = my_posts.topic_id ORDER BY my_topics.topic_id DESC LIMIT 1
) AS `postcount`,
(
SELECT CONCAT(admin_user.firstname,' ',admin_user.lastname) FROM admin_user INNER JOIN my_posts ON my_posts.admin_user_id = admin_user.user_id WHERE my_posts.post_id = maxpostid ORDER BY my_posts.post_id DESC LIMIT 1
) AS `adminname`,
(
SELECT forum_user.nick_name FROM forum_user INNER JOIN my_posts ON my_posts.user_id = forum_user.user_id WHERE my_posts.post_id = maxpostid ORDER BY my_posts.post_id DESC LIMIT 1
) AS `nickname`,
(
SELECT CONCAT(ce1.value,' ',ce2.value) AS fullname FROM my_posts INNER JOIN customer_entity_varchar AS ce1 ON ce1.entity_id = my_posts.user_id INNER JOIN customer_entity_varchar AS ce2 ON ce2.entity_id=my_posts.user_id WHERE (ce1.attribute_id = 1) AND (ce2.attribute_id = 2) AND my_posts.post_id = maxpostid ORDER BY my_posts.post_id DESC LIMIT 1
) AS `fullname`
FROM `my_forums` AS `main_table`
LEFT JOIN `my_topics` ON main_table.forum_id = my_topics.forum_id
WHERE (forum_status = '1')
And now I want to know if there is any way to optimize it ? Because all the logic is written in Select section not From, but I don't know how to write the same logic in From section of the query ?
Does it make any difference or both are same ?
Thanks
Correlated subqueries should really be a last resort, they often end up being executed RBAR, and given that a number of your subqueries are very similar, trying to get the same result using joins is going to result in a lot less table scans.
The first thing I note is that all of your subqueries include the table my_posts, and most contain ORDER BY my_posts.post_id DESC LIMIT 1, those that don't have a count with no group by so the order and limit are redundant anyway, so my first step would be to join to my_posts:
SELECT *
FROM my_forums AS f
LEFT JOIN my_topics AS t
ON f.forum_id = t.forum_id
LEFT JOIN
( SELECT topic_id, MAX(post_id) AS post_id
FROM my_posts
GROUP BY topic_id
) AS Maxp
ON Maxp.topic_id = t.topic_id
LEFT JOIN my_posts AS p
ON p.post_id = Maxp.post_id
WHERE forum_status = '1';
Here the subquery just ensures you get the latest post per topic_id. I have shortened your table aliases here for my convenience, I am not sure why you would use a table alias that is longer than the actual table name?
Now you have the bulk of your query you can start adding in your columns, in order to get the post count, I have added a count to the subquery Maxp, I have also had to add a few more joins to get some of the detail out, such as names:
SELECT f.forum_id,
t.topic_id,
p.post_id AS `maxpostid`,
p.admin_user_id,
p.user_id,
t2.topicscount,
maxp.postcount,
CONCAT(au.firstname,' ',au.lastname) AS adminname,
fu.nick_name AS nickname
CONCAT(ce1.value,' ',ce2.value) AS fullname
FROM my_forums AS f
LEFT JOIN my_topics AS t
ON f.forum_id = t.forum_id
LEFT JOIN
( SELECT topic_id,
MAX(post_id) AS post_id,
COUNT(*) AS postcount
FROM my_posts
GROUP BY topic_id
) AS Maxp
ON Maxp.topic_id = t.topic_id
LEFT JOIN my_posts AS p
ON p.post_id = Maxp.post_id
LEFT JOIN admin_user AS au
ON au.admin_user_id = p.admin_user_id
LEFT JOIN forum_user AS fu
ON fu.user_id = p.user_id
LEFT JOIN customer_entity_varchar AS ce1
ON ce1.entity_id = p.user_id
AND ce1.attribute_id = 1
LEFT JOIN customer_entity_varchar AS ce2
ON ce2.entity_id = p.user_id
AND ce2.attribute_id = 2
LEFT JOIN
( SELECT forum_id, COUNT(*) AS topicscount
FROM my_topics
GROUP BY forum_id
) AS t2
ON t2.forum_id = f.forum_id
WHERE forum_status = '1';
I am not familiar with your schema so the above may need some tweaking, but the principal remains - use JOINs over sub-selects.
The next stage of optimisation I would do is to get rid of your customer_entity_varchar table, or at least stop using it to store things as basic as first name and last name. The Entity-Attribute-Value model is an SQL antipattern, if you added two columns, FirstName and LastName to your forum_user table you would immediately lose two joins from your query. I won't get too involved in the EAV vs Relational debate as this has been extensively discussed a number of times, and I have nothing more to add.
The final stage would be to add appropriate indexes, you are in the best decision to decide what is appropriate, I'd suggest you probably want indexes on at least the foreign keys in each table, possibly more.
EDIT
To get one row per forum_id you would need to use the following:
SELECT f.forum_id,
t.topic_id,
p.post_id AS `maxpostid`,
p.admin_user_id,
p.user_id,
MaxT.topicscount,
maxp.postcount,
CONCAT(au.firstname,' ',au.lastname) AS adminname,
fu.nick_name AS nickname
CONCAT(ce1.value,' ',ce2.value) AS fullname
FROM my_forums AS f
LEFT JOIN
( SELECT t.forum_id,
COUNT(DISTINCT t.topic_id) AS topicscount,
COUNT(*) AS postCount,
MAX(t.topic_ID) AS topic_id
FROM my_topics AS t
INNER JOIN my_posts AS p
ON p.topic_id = p.topic_id
GROUP BY t.forum_id
) AS MaxT
ON MaxT.forum_id = f.forum_id
LEFT JOIN my_topics AS t
ON t.topic_ID = Maxt.topic_ID
LEFT JOIN
( SELECT topic_id, MAX(post_id) AS post_id
FROM my_posts
GROUP BY topic_id
) AS Maxp
ON Maxp.topic_id = t.topic_id
LEFT JOIN my_posts AS p
ON p.post_id = Maxp.post_id
LEFT JOIN admin_user AS au
ON au.admin_user_id = p.admin_user_id
LEFT JOIN forum_user AS fu
ON fu.user_id = p.user_id
LEFT JOIN customer_entity_varchar AS ce1
ON ce1.entity_id = p.user_id
AND ce1.attribute_id = 1
LEFT JOIN customer_entity_varchar AS ce2
ON ce2.entity_id = p.user_id
AND ce2.attribute_id = 2
WHERE forum_status = '1';

SQL query to check if value doesn't exist in another table

I have a SQL query which does most of what I need it to do but I'm running into a problem.
There are 3 tables in total. entries, entry_meta and votes.
I need to get an entire row from entries when competition_id = 420 in the entry_meta table and the ID either doesn't exist in votes or it does exist but the user_id column value isn't 1.
Here's the query I'm using:
SELECT entries.* FROM entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
WHERE 1=1
AND ( ( entry_meta.meta_key = 'competition_id' AND CAST(entry_meta.meta_value AS CHAR) = '420') )
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
LIMIT 0, 25;
The votes table has 4 columns. vote_id, entry_id, user_id, value.
One option I was thinking of was to SELECT entry_id FROM votes WHERE user_id = 1 and include it in an AND clause in my query. Is this acceptable/efficient?
E.g.
AND entries.ID NOT IN (SELECT entry_id FROM votes WHERE user_id = 1)
A left join with an appropriate where clause might be useful:
SELECT
entries.*
FROM
entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
LEFT JOIN votes ON entries.ID = votes.entry_id
WHERE 1=1
AND (
entry_meta.meta_key = 'competition_id'
AND CAST(entry_meta.meta_value AS CHAR) = '420')
AND votes.entry_id IS NULL -- This will remove any entry with votes
)
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
Here's an implementation of Andrew's suggestion to use exists / not exists.
select
e.*
from
entries e
join entry_meta em on e.ID = em.entry_id
where
em.meta_key = 'competition_id'
and cast(em.meta_value as char) = '420'
and (
not exists (
select 1
from votes v
where
v.entry_id = e.ID
)
or exists (
select 1
from votes v
where
v.entry_id = e.ID
and v.user_id != 1
)
)
group by e.ID
order by e.submission_date desc
limit 0, 25;
Note: it's generally not a good idea to put a function inside a where clause (due to performance reasons), but since you're also joining on IDs you should be OK.
Also, The left join suggestion by Barranka may cause the query to return more rows than your are expecting (assuming that there is a 1:many relationship between entries and votes).

mysql: multiple join problem

Im trying to select a table with multiple joins, one for the number of comments using COUNT and one to select the total vote value using SUM, the problem is that the two joins affect each other, instead of showing:
3 votes 2 comments
I get 3 * 2 = 6 votes and 2 * 3 comments
This is the query I'm using:
SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
What you're doing is an SQL antipattern that I call Goldberg Machine. Why make the problem so much harder by forcing it to be done in a single SQL query?
Here is how I would really solve this problem:
SELECT t.*, COUNT(c.id) as comments
FROM topics t
LEFT JOIN comments c ON c.topic_id = t.id
WHERE t.id = 9;
SELECT t.*, SUM(v.vote) as votes
FROM topics t
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9;
As you have found, combining these two into one query results in a Cartesian product. There may be clever and subtle ways to force it to give you the correct answer in one query, but what happens when you need a third statistic? It's much simpler to do it in two queries.
SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
GROUP BY t.id
or perhaps
SELECT `topics`.*,
(
SELECT COUNT(*)
FROM `comments`
WHERE `topic_id` = `topics`.`id`
) AS `num_comments`,
(
SELECT IFNULL(SUM(`vote`), 0)
FROM `votes`
WHERE `topic_id` = `topics`.`id`
) AS `vote_total`
FROM `topics`
WHERE `id` = 9
SELECT t.*, COUNT(DISTINCT c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9