WHERE IN pulls all rows in a table - mysql

I'm trying to use WHERE IN in a query and it's running very slowly. When I run EXPLAIN, it turns out the query is actually pulling all the rows first and only then filtering by the IN list.
Here's the query.
SELECT a.*, b.*
FROM table_a a
INNER JOIN table_b b
ON b.name = a.name
WHERE a.id IN (1,2,3,4,5);
In real life, there are 40-50 ids in the IN list, but when I run EXPLAIN, it shows the query pulling hundreds of thousands of rows at first.
What alternative can I use?

Related

MySQL performance comparison between joining table and derived table

I have the following two queries and noticed a huge performance difference between them.
Query1
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN tableB as b on a.id = b.aId
GROUP BY a.id
Query2
SELECT count(distinct b.id) FROM tableA as a
LEFT JOIN (SELECT * FROM tableB) as b on a.id = b.aId
GROUP BY a.id
Both queries simply join one table to another. Query1 takes about 80 ms, whereas Query2 takes about 2 s with a few thousand rows in my system. Could anyone explain why this happens? Is it wise to use the Query2 style only when I am forced to? Or is there a better way to do the same thing than Query2?
When you replace tableB with (SELECT * FROM tableB), you are forcing the query engine to materialize a subquery, that is, to build an intermediate result table. In other words, in the second query you aren't joining directly to tableB; you are joining to that intermediate table. As a result, any indexes that exist on tableB and could make the join faster are not available. (MySQL 5.7 and later can often merge such a trivial derived table back into the outer query, so the gap may be smaller there.) Based on your current example, I see no reason to use the second version.
Under certain conditions you might be forced to use the second version though. For example, if you needed to transform tableB in some way, you might need a subquery to do that.
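To check that the two forms differ only in how they are executed, not in what they return, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, with made-up sample rows and a hypothetical index name, so it says nothing about MySQL timings or plans; it only compares the results.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, aId INTEGER);
    CREATE INDEX idx_tableB_aId ON tableB (aId);  -- hypothetical index on the join column
    INSERT INTO tableA (id) VALUES (1), (2), (3);
    INSERT INTO tableB (id, aId) VALUES (10, 1), (11, 1), (12, 2);
""")

# Query1 style: join directly to tableB (a.id added to the output for readability).
direct = conn.execute("""
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Query2 style: join to a derived table built from tableB.
derived = conn.execute("""
    SELECT a.id, COUNT(DISTINCT b.id)
    FROM tableA AS a
    LEFT JOIN (SELECT * FROM tableB) AS b ON a.id = b.aId
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(direct)
print(derived)
```

Both lists come out identical, so on a real MySQL server the place to look for the difference is the EXPLAIN output of each query.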

Can't understand. Is this a subquery?

I have something in a query that I have to edit, that I don't understand.
There are 4 tables that are joined: tickets, tasks, tickets_users, users. The whole query is not important, but you have an example at the end of the post. What bugs me is this kind of code used many times in relation to other tables:
(SELECT name
FROM users
WHERE users.id=tickets_users.users_id
) AS RequesterName,
Is this a subquery with the tables users and tickets_users joined? What is this?
WHERE users.id=tickets_users.users_id
If this was a join I would have expected to see:
ON users.id = tickets_users.users_id
And how is this different from a typical join? Why not just select users.name and join with the users table?
Can anyone enlighten me on the advanced SQL querying prowess of the original author?
The query looks like this:
SELECT
description,
(SELECT name
FROM users
WHERE users.id = tickets_users.users_id) AS RequesterName,
(SELECT description
FROM tickets
WHERE tickets.id = ticket_tasks.tickets_id) AS TicketDescription,
ticket_tasks.content AS TaskDescription
FROM
ticket_tasks
RIGHT JOIN
tickets ON ticket_tasks.tickets_id = tickets.id
INNER JOIN
tickets_users ON tickets_users.tickets_id = ticket_tasks.tickets_id
Thanks,
This is called a correlated subquery. In simple terms, it is a SELECT inside a SELECT that refers to a column of the outer query.
However, doing this repeatedly in one query is generally not recommended, because the performance cost can be large.
A correlated subquery is evaluated once for each row of the outer SELECT. If that doesn't make sense, think of it this way:
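SELECT
id,
-- a subquery in the SELECT list must return a single value, hence the COUNT(*)
(SELECT COUNT(*) FROM tableA AS ta WHERE ta.id > t.id) AS higher_ids
FROM
tableB AS t;
For each row in tableB, this runs the inner SELECT again and compares the rows of tableA against that tableB row's id.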
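NOTE:
The cost grows with the number of outer rows. With 100 rows in each of the 4 tables and no usable indexes, each correlated subquery in the SELECT list costs about 100*100 = 10,000 row comparisons, so three of them cost about 30,000. If the subqueries are nested inside one another instead, the cost multiplies: 100*100*100*100 is 100,000,000 (one hundred million) comparisons!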
A correlated subquery is NOT a join, but rather a subquery. A plain (non-correlated) subquery looks like this:
SELECT *
FROM
(SELECT id FROM t -- this is a subquery
) AS temp
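For the RequesterName column from the question, here is a minimal sketch comparing the correlated subquery with a join. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows are invented. A LEFT JOIN is used because the subquery form yields NULL when no user matches, and it assumes users.id is unique so that each row matches at most one user.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tickets_users (tickets_id INTEGER, users_id INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    -- users_id 99 has no matching user on purpose
    INSERT INTO tickets_users VALUES (100, 1), (101, 2), (102, 1), (103, 99);
""")

# Correlated subquery: the inner SELECT refers to the outer row's users_id.
with_subquery = conn.execute("""
    SELECT tickets_users.tickets_id,
           (SELECT name FROM users
            WHERE users.id = tickets_users.users_id) AS RequesterName
    FROM tickets_users
    ORDER BY tickets_users.tickets_id
""").fetchall()

# Same column obtained through a join.
with_join = conn.execute("""
    SELECT tickets_users.tickets_id, users.name AS RequesterName
    FROM tickets_users
    LEFT JOIN users ON users.id = tickets_users.users_id
    ORDER BY tickets_users.tickets_id
""").fetchall()

print(with_subquery)
print(with_join)
```

Both queries return the same rows here, including the NULL name for the ticket whose user is missing.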
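However, JOINs are different. An inner join can generally be written in one of these two ways.
Explicit JOIN syntax (preferred):
SELECT *
FROM t
JOIN t1 ON t1.id = t.id
Implicit (comma) join:
SELECT *
FROM t, t1
WHERE t1.id = t.id
Logically, the second form describes the Cartesian product of the two tables, filtered by the WHERE clause, whereas the first states the join condition as part of the join. In practice MySQL's optimizer treats the two as equivalent for inner joins, so they should perform the same. The explicit form is still preferable because it keeps join conditions separate from filters and makes it harder to create an accidental Cartesian product by forgetting a condition.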
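Whichever syntax is used, the two forms return the same rows. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL and invented sample data (the label and note columns are hypothetical); it compares results only and does not measure speed:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, note TEXT);
    INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t1 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Explicit JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT t.id, t.label, t1.note
    FROM t
    JOIN t1 ON t1.id = t.id
    ORDER BY t.id
""").fetchall()

# Comma join with the condition in WHERE.
implicit = conn.execute("""
    SELECT t.id, t.label, t1.note
    FROM t, t1
    WHERE t1.id = t.id
    ORDER BY t.id
""").fetchall()

print(explicit)
print(implicit)
```

To compare how MySQL actually executes the two, run EXPLAIN on each against your own tables.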
There are a few different types of joins, and each is useful in its respective situation:
INNER JOIN (same as JOIN)
LEFT JOIN (same as LEFT OUTER JOIN)
RIGHT JOIN (same as RIGHT OUTER JOIN)
In MySQL, FULL JOIN or FULL OUTER JOIN does not exist, so to do a FULL join you need to combine a LEFT and a RIGHT join with UNION. See this link for a better understanding of what joins do, with Venn diagrams: LINK
Remember that the linked page covers SQL in general, so it includes the FULL joins as well; those don't work in MySQL.
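A FULL join can be emulated by combining the two outer joins with UNION. Here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL, with invented sample data. One assumption to flag: the second half is written as a LEFT JOIN with the tables swapped, which is equivalent to the RIGHT JOIN you would write in MySQL, because older SQLite builds lack RIGHT JOIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    INSERT INTO t VALUES (1), (2), (3);
    INSERT INTO t1 VALUES (2), (3), (4);
""")

# First half: all rows of t, with matches from t1 where they exist.
# Second half: all rows of t1, with matches from t where they exist
# (in MySQL this half could be written as t RIGHT JOIN t1).
# UNION removes the rows that appear in both halves.
full_join = set(conn.execute("""
    SELECT t.id AS t_id, t1.id AS t1_id
    FROM t
    LEFT JOIN t1 ON t1.id = t.id
    UNION
    SELECT t.id AS t_id, t1.id AS t1_id
    FROM t1
    LEFT JOIN t ON t.id = t1.id
""").fetchall())

print(sorted(full_join, key=str))
```

Note that UNION also collapses rows that are genuinely duplicated in the data, so this emulation is safest when the selected columns identify each row uniquely.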

What happens first in MySQL: join or where

Let's say I have two tables A and B and the following query:
select *
from A
inner join B on A.id = B.id
Where A.id = 5
Does MySQL perform the join first, or the where?
Edit:
I ask because if, for example, A contains 1000 rows, after the where condition it will contain only 1 row.
Performing a join on a 1-row table is much more efficient, so it seems that applying the where first and only then the join would be faster.
Logically, the join happens before the where. However...
The where clause is a filter over all rows returned by the join, but the optimizer will recognise that if an index exists on A.id, it can use that index to retrieve only the rows from A that match, and then perform the join. In theory the where clause would then filter the joined rows, but the optimizer recognises that the condition is already met and skips it as a filter.
All that said, the optimizer always returns the same result as would be returned without optimization.
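The "same result either way" point can be checked with a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the a_val and b_val columns are hypothetical. One query joins and then filters; the other filters A in a derived table first and then joins.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; a_val and b_val are invented columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, a_val TEXT)")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, b_val TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?)", [(i, f"a{i}") for i in range(1, 1001)])
conn.executemany("INSERT INTO B VALUES (?, ?)", [(i, f"b{i}") for i in range(1, 1001)])

# Written as "join, then where", as in the question.
join_then_where = conn.execute("""
    SELECT A.id, A.a_val, B.b_val
    FROM A
    INNER JOIN B ON A.id = B.id
    WHERE A.id = 5
""").fetchall()

# Written as "where, then join": A is filtered in a derived table first.
where_then_join = conn.execute("""
    SELECT f.id, f.a_val, B.b_val
    FROM (SELECT * FROM A WHERE id = 5) AS f
    INNER JOIN B ON f.id = B.id
""").fetchall()

print(join_then_where)
print(where_then_join)
```

Because the results are identical, the optimizer is free to pick whichever order is cheaper, and the simpler first form is usually the one to write.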

With ActiveRecord, how do I get all records from database with only one associated record?

For example: I have two tables, table_a has many related records in table_b.
I want to make an ActiveRecord query that will fetch only the table_a objects that have *only one associated record in table_b*.
How would I go about doing this?
Thanks!
Well, I'm not aware of any nice and easy way to do this in Rails, but I have put together a raw SQL query that you can run using ActiveRecord::Base.connection.execute. It's probably the ugliest query I have ever built, so sorry about that :-)
SELECT ta.*
FROM table_a AS ta
INNER JOIN table_b AS tb ON ta.id = tb.table_a_id
WHERE tb.id IN (
SELECT temp.id FROM (
-- MIN(id) keeps this valid under ONLY_FULL_GROUP_BY; for a group of one row it is that row's id
SELECT MIN(id) AS id, COUNT(*) AS count FROM table_b GROUP BY table_a_id
) AS temp WHERE temp.count = 1
)
GROUP BY ta.id;
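A possibly simpler alternative, offered as a sketch rather than a tested replacement: group the join by the table_a row and keep only the groups with exactly one table_b row, using HAVING. The example below uses Python's built-in sqlite3 as a stand-in for MySQL, with invented sample rows and a hypothetical name column.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, table_a_id INTEGER);
    INSERT INTO table_a VALUES (1, 'none'), (2, 'one'), (3, 'two');
    INSERT INTO table_b VALUES (10, 2), (11, 3), (12, 3);
""")

# Keep only the table_a rows that join to exactly one table_b row.
rows = conn.execute("""
    SELECT ta.id, ta.name
    FROM table_a AS ta
    INNER JOIN table_b AS tb ON tb.table_a_id = ta.id
    GROUP BY ta.id, ta.name
    HAVING COUNT(tb.id) = 1
""").fetchall()

print(rows)
```

Only the row with a single associated record is returned; rows with zero or two associated records are excluded.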

MySQL query calculated column

I have these queries:
SELECT a.Id, a.Name, (SELECT COUNT(Id) FROM b WHERE b.IdTableA = a.Id) AS Num
FROM a
ORDER BY a.Name
table b has a FK on table a (IdTableA)
In this case, is it efficient? Is there any other way to do this?
The other question is:
SELECT client.Id, client.Name
,(SELECT SUM(projects) FROM projects WHERE IdClient = client.Id) AS projects
FROM client
What about this one?
Sometimes we need to use more than one calculated column (SELECT SUM), even 10 or 15.
We are very worried about performance, since the table projects could have more than 500K records.
I've read that storing those sums in a table and updating that table when the data changes could be better for performance. But this goes against normalization...
Please help me with both queries...
Thanks
SELECT a.Id, a.Name, (SELECT COUNT(Id) FROM b WHERE b.IdTableA = a.Id) AS Num
FROM a
ORDER BY a.Name
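can possibly be rewritten as
SELECT a.Id, a.Name, COUNT(b.Id) AS Num
FROM a LEFT JOIN b ON b.IdTableA = a.Id
GROUP BY a.Id, a.Name
ORDER BY a.Name
which carries less risk of MySQL choosing a poor execution plan. The LEFT JOIN keeps the rows of a that have no matching rows in b (with Num = 0), as the original subquery does; a plain JOIN would drop them.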
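One detail worth checking when doing this kind of rewrite is what happens to rows of a with no rows in b. A minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL with invented sample rows, compares the correlated subquery against grouped LEFT JOIN and inner JOIN versions:

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE b (Id INTEGER PRIMARY KEY, IdTableA INTEGER REFERENCES a (Id));
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    -- row 3 of a has no rows in b on purpose
    INSERT INTO b VALUES (10, 1), (11, 1), (12, 2);
""")

# Original form: one correlated COUNT per row of a.
correlated = conn.execute("""
    SELECT a.Id, a.Name,
           (SELECT COUNT(Id) FROM b WHERE b.IdTableA = a.Id) AS Num
    FROM a
    ORDER BY a.Name
""").fetchall()

# LEFT JOIN keeps rows of a without matches; COUNT(b.Id) ignores the NULLs.
left_join = conn.execute("""
    SELECT a.Id, a.Name, COUNT(b.Id) AS Num
    FROM a LEFT JOIN b ON b.IdTableA = a.Id
    GROUP BY a.Id, a.Name
    ORDER BY a.Name
""").fetchall()

# Inner JOIN drops rows of a that have no rows in b.
inner_join = conn.execute("""
    SELECT a.Id, a.Name, COUNT(b.Id) AS Num
    FROM a JOIN b ON b.IdTableA = a.Id
    GROUP BY a.Id, a.Name
    ORDER BY a.Name
""").fetchall()

print(correlated)
print(left_join)
print(inner_join)
```

The LEFT JOIN version matches the correlated subquery row for row, while the inner JOIN version loses the row whose count is zero.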
Storing sums of data for easy retrieval is acceptable when you have a lot more reads than writes (or when writes are allowed to be slow, but reads have to be fast). Usually you use a data-warehouse for this though: the warehouse stores the aggregate data, and your OLTP database stores the individual rows.