Improve simple SQL select performance - mysql

I have two tables with some data (> 300,000 rows) and this simple query takes about 1 second.
Any ideas on how to make it faster?
SELECT a.*
FROM a
INNER JOIN b on (a.b_id = b.id)
WHERE b.some_int_column = 2
ORDER BY a.id DESC
LIMIT 0,10
Both a.b_id and b.some_int_column have indexes. Also, a.id and b.id are integer primary keys.
When I run EXPLAIN, it shows that it first uses the some_int_column index, with Using temporary and Using filesort.
If I run the same query but order by b.id ASC, it takes ~0.2 ms instead (I know this is because I'm then ordering by the table in the first EXPLAIN row), but I really need to order by table a.
Is there something I am missing?

For this query:
SELECT a.*
FROM a INNER JOIN
b
ON a.b_id = b.id
WHERE b.some_int_column = 2
ORDER BY a.id DESC
LIMIT 0, 10;
The optimal indexes are likely to be b(some_int_column, id) and a(b_id, id).
You might find that this version has better performance with these indexes:
SELECT a.*
FROM a
WHERE EXISTS (SELECT 1
FROM b
WHERE a.b_id = b.id AND b.some_int_column = 2
)
ORDER BY a.id DESC
LIMIT 0, 10;
For this query, the indexes should be a(id, b_id) and b(id, some_int_column).
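To check that the JOIN and EXISTS forms really return the same rows before comparing their speed, here is a minimal sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, the sample data is invented, and the index names are hypothetical. It says nothing about how MySQL's optimizer will use the indexes; run EXPLAIN on your own server for that.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER PRIMARY KEY, some_int_column INTEGER);
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES b(id));
    -- Indexes suggested for the JOIN form (index names are hypothetical).
    CREATE INDEX b_some_int_id ON b (some_int_column, id);
    CREATE INDEX a_b_id_id ON a (b_id, id);
""")
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [(i, i % 5) for i in range(1, 101)])
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, (i * 7) % 100 + 1) for i in range(1, 1001)])

join_sql = """
    SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id
    WHERE b.some_int_column = 2
    ORDER BY a.id DESC LIMIT 10
"""
exists_sql = """
    SELECT a.* FROM a
    WHERE EXISTS (SELECT 1 FROM b
                  WHERE a.b_id = b.id AND b.some_int_column = 2)
    ORDER BY a.id DESC LIMIT 10
"""
join_rows = conn.execute(join_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()

# b.id is unique, so the join cannot duplicate rows of a and both
# forms should agree row for row.
print(join_rows == exists_rows)
```

The two forms are interchangeable here only because b.id is a primary key; if the join column on b were not unique, the JOIN form could return duplicates that EXISTS would not.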

SELECT a.*
FROM b
INNER JOIN a on (b.id = a.b_id)
WHERE b.some_int_column = 2
ORDER BY a.id DESC
LIMIT 0,10
Try this, because you are filtering on a column in table b, not a column in table a. It may reduce the volume of data read. Depending on the SQL optimizer, the original query may match up all records in the join and only then keep those where some_int_column = 2. With the tables reversed, the optimizer may match only the records in table b that satisfy your WHERE clause (= 2) against table a.

Related

How to optimize limit offset when I join multiple tables?

Here's the general form of my MySQL query:
select a,b,c
from table1
left join table2 on x=y
left join table3 on m=n
limit 100000, 10
I know how to optimize LIMIT when I have a large offset, but I couldn't find a solution for a query that joins multiple tables. Is there any way to make my query faster?
First of all, offsets and limits are unpredictable unless you include an ORDER BY clause in your query. Without ORDER BY, your SQL server is allowed to return result rows in any order it chooses.
Second, large offsets with small limits are a notorious query-performance antipattern: the server still has to produce and discard every skipped row. There's not much you can do to make that problem go away.
To get decent performance, it helps to rethink why you want this kind of access pattern, and then use WHERE filters on an indexed column value instead.
For example, let's say you're doing this kind of thing.
select a.user_id, b.user_email, c.user_account
from table1 a
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
limit whatever
Let's say you're paginating the query so you get fifty users at a time. Then you can start with a last_seen_user_id variable in your program, initialized to -1.
Your query looks like this:
select a.user_id, b.user_email, c.user_account
from (
select user_id
from table1
where user_id > ?last_seen_user_id?
order by user_id
limit 50
) u
join table1 a on u.user_id = a.user_id
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
order by a.user_id
Then, when you retrieve that result, set your last_seen_user_id to the value from the last row in the result.
Run the query again to get the next fifty users. If table1.user_id is a primary key or a unique index, this will be fast.
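The paging loop described above can be sketched like this. It is a hedged illustration, not a drop-in implementation: it assumes Python's built-in sqlite3 module with an in-memory database in place of MySQL, invented sample rows, and a hypothetical helper named fetch_pages. The ?last_seen_user_id? placeholder becomes an ordinary bound parameter.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (user_id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (user_id INTEGER, user_email TEXT, account_id INTEGER);
    CREATE TABLE table3 (account_id INTEGER PRIMARY KEY, user_account TEXT);
""")
for uid in range(1, 121):
    conn.execute("INSERT INTO table1 VALUES (?)", (uid,))
    conn.execute("INSERT INTO table2 VALUES (?, ?, ?)",
                 (uid, f"user{uid}@example.com", uid))
    conn.execute("INSERT INTO table3 VALUES (?, ?)", (uid, f"account-{uid}"))

PAGE_SQL = """
    SELECT a.user_id, b.user_email, c.user_account
    FROM (SELECT user_id FROM table1
          WHERE user_id > ? ORDER BY user_id LIMIT ?) u
    JOIN table1 a ON u.user_id = a.user_id
    LEFT JOIN table2 b ON a.user_id = b.user_id
    LEFT JOIN table3 c ON b.account_id = c.account_id
    ORDER BY a.user_id
"""

def fetch_pages(conn, page_size=50):
    """Hypothetical helper: walk the whole result set one page at a time."""
    last_seen_user_id = -1  # smaller than any real user_id
    pages = []
    while True:
        rows = conn.execute(PAGE_SQL, (last_seen_user_id, page_size)).fetchall()
        if not rows:
            break
        pages.append(rows)
        # Remember where this page ended; the next page starts after it.
        last_seen_user_id = rows[-1][0]
    return pages

pages = fetch_pages(conn)
print([len(p) for p in pages])
```

Each page costs roughly the same no matter how deep it is, because the subquery seeks straight to user_id > last_seen_user_id instead of counting past skipped rows. The trade-off is that you can only move to the next page, not jump to an arbitrary page number.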

MySQL, limit one table in join

I have tables A and B. A has one column, a_id. B has two columns, b_id and a_id (a_id is a foreign key here). A-B is a 1-n relationship. I want to SELECT a_id from A with a LIMIT and, at the same time, return all b_id values associated with the selected a_id values. Without LIMIT it can be done with
SELECT A.a_id, B.b_id FROM A LEFT JOIN B ON A.a_id = B.a_id;
But how can I apply the LIMIT only to A, without limiting the final result?
How about:
SELECT * FROM
(SELECT A.a_id FROM A LIMIT 10) AS ALIMIT
LEFT JOIN B ON ALIMIT.a_id = B.a_id;
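Here is a small sketch of what that derived-table approach returns. It rests on a few assumptions: Python's built-in sqlite3 module with an in-memory database stands in for MySQL, the sample rows are invented, and an ORDER BY a_id is added inside the derived table so that "the first rows of A" is well defined (without it, the server may pick any rows).

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a_id INTEGER PRIMARY KEY);
    CREATE TABLE B (b_id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES A(a_id));
""")
# a_id 1 has no children, a_id 2 has one, a_id 3 has two, and so on.
for a_id in range(1, 6):
    conn.execute("INSERT INTO A VALUES (?)", (a_id,))
    for _ in range(a_id - 1):
        conn.execute("INSERT INTO B (a_id) VALUES (?)", (a_id,))

rows = conn.execute("""
    SELECT ALIMIT.a_id, B.b_id
    FROM (SELECT a_id FROM A ORDER BY a_id LIMIT 3) AS ALIMIT
    LEFT JOIN B ON ALIMIT.a_id = B.a_id
    ORDER BY ALIMIT.a_id, B.b_id
""").fetchall()

# Three rows of A were selected, but the result has four rows because
# a_id 3 has two children; a_id 1 has none and shows up with NULL.
print(rows)
```

The LIMIT applies to the derived table only, so the number of rows in the final result depends on how many B rows each selected a_id has.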

how to change this query to use join?

SELECT breakgame, Streak,
((SELECT (maxGameId - gameId) as gameGap
FROM game_result
WHERE game_result.breakgame >= kokopam.game_streak.breakgame
ORDER BY gameId DESC LIMIT 1)/ Streak) as nowWeight
FROM kokopam.game_streak, (SELECT max(gameId) as maxGameId FROM game_result ORDER BY gameId DESC LIMIT 1) maxGameId
WHERE breakgame>= 2
How can I change this query to use a join?
Please help me.
First of all, you should have a condition in the WHERE clause that matches the ID the rows share; without one, the comma syntax produces every combination of rows from the two tables.
Apart from that, the comma syntax you're using works the same as an inner join.
Select *
From tableA a, tableB b
Where a.id=b.id
Is the same as
Select *
From tableA a
Inner join tableB b on b.id=a.id
I could help you a bit more if you specify what you were trying to do in the query and which columns the tables have.
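A quick way to convince yourself that the two forms above are equivalent is to run both on the same data. This is a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, and the columns name and val plus the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO tableA VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO tableB VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# Comma syntax with the join condition in WHERE.
comma_rows = conn.execute("""
    SELECT a.id, a.name, b.val
    FROM tableA a, tableB b
    WHERE a.id = b.id
    ORDER BY a.id
""").fetchall()

# Explicit INNER JOIN with the same condition in ON.
join_rows = conn.execute("""
    SELECT a.id, a.name, b.val
    FROM tableA a
    INNER JOIN tableB b ON b.id = a.id
    ORDER BY a.id
""").fetchall()

# Comma syntax with no condition at all: every pairing of rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM tableA a, tableB b"
).fetchone()[0]

print(comma_rows == join_rows, cross_count)
```

The last count shows why the WHERE condition matters: with three rows in each table and no condition, the comma form pairs every row with every row.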

Speeding up select where column condition exists in another table without duplicates

If I have the following two tables:
Table "a" with 2 columns: id (int) [Primary Index], column1 [Indexed]
Table "b" with 3 columns: id_table_a (int),condition1 (int),condition2 (int) [all columns as Primary Index]
I can run the following query to select rows from Table a where Table b condition1 is 1
SELECT a.id FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id_table_a=a.id && condition1=1 LIMIT 1) ORDER BY a.column1 LIMIT 50
With a couple hundred million rows in both tables this query is very slow. If I do:
SELECT a.id FROM a INNER JOIN b ON a.id=b.id_table_a && b.condition1=1 ORDER BY a.column1 LIMIT 50
It is pretty much instant but if there are multiple matching rows in table b that match id_table_a then duplicates are returned. If I do a SELECT DISTINCT or GROUP BY a.id to remove duplicates the query becomes extremely slow.
Here is an SQLFiddle showing the example queries: http://sqlfiddle.com/#!9/35eb9e/10
Is there a way to make a join without duplicates fast in this case?
*Edited to show that INNER instead of LEFT join didn't make much of a difference
*Edited to show moving condition to join did not make much of a difference
*Edited to add LIMIT
*Edited to add ORDER BY
You can try an inner join with DISTINCT:
SELECT distinct a.id
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
But if you use DISTINCT with SELECT *, make sure the extra columns don't make otherwise duplicate ids look distinct and give the wrong result. In that case, list only the columns you need:
SELECT distinct col1, col2, col3 ....
FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
You could also add a composite index that includes condition1 as well, e.g. key(id, condition1).
If you can, you could also run
ANALYZE TABLE table_name;
on both tables.
Another technique is to try reversing the lead table:
SELECT distinct a.id
FROM b INNER JOIN a ON a.id=b.id_table_a AND b.condition1=1
That is, use the most selective table to lead the query.
With this version the index usage appears to be different: http://sqlfiddle.com/#!9/35eb9e/15 (the last query adds a Using where).
# USING DISTINCT TO REMOVE DUPLICATES without col and order
EXPLAIN
SELECT DISTINCT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
;
It looks like I found the answer.
SELECT a.id FROM a
INNER JOIN b ON
b.id_table_a=a.id &&
b.condition1=1 &&
b.condition2=(select b.condition2 from b WHERE b.id_table_a=a.id && b.condition1=1 LIMIT 1)
ORDER BY a.column1
LIMIT 5;
I don't know if there is a flaw in this or not, please let me know if so. If anyone has a way to compress this somehow I will gladly accept your answer.
SELECT id FROM a INNER JOIN b ON a.id=b.id_table_a AND b.condition1=1
Move the condition into the ON clause of the join; that way the index on table b can be used to filter. Also use INNER JOIN rather than LEFT JOIN.
Then you should have fewer results that have to be grouped.
Wrap the fast version in a query that handles de-duping and limit:
SELECT DISTINCT * FROM (
SELECT a.id, a.column1 -- column1 must be selected so the outer ORDER BY can see it
FROM a
JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
) x
ORDER BY column1
LIMIT 50
We know the inner query is fast. The de-duping and ordering has to happen somewhere. This way it happens on the smallest rowset possible.
See SQLFiddle.
Option 2:
Try the following:
Create indexes as follows:
create index a_id_column1 on a(id, column1);
create index b_id_table_a_condition1 on b(id_table_a, condition1);
These are covering indexes - ones that contain all the columns you need for the query, which in turn means that index-only access to data can achieve the result.
Then try this:
SELECT * FROM (
SELECT a.id, MIN(a.column1) column1
FROM a
JOIN b ON a.id = b.id_table_a
AND b.condition1 = 1
GROUP BY a.id) x
ORDER BY column1
LIMIT 50
Use your fast query in a subselect and remove the duplicates in the outer select:
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
WHERE b.id_table_a > :offset
ORDER BY a.column1
LIMIT 50
) sub
Because duplicates are removed, you might get fewer than 50 rows. Just repeat the query until you have enough rows. Start with :offset = 0, and use the last ID from the previous result as :offset in the following queries.
If you know your statistics, you can also use two limits. The limit in the inner query should be high enough to return 50 distinct rows with a probability which is high enough for you.
SELECT DISTINCT sub.id
FROM (
SELECT a.id
FROM a
INNER JOIN b ON a.id=b.id_table_a && b.condition1=1
ORDER BY a.column1
LIMIT 1000
) sub
LIMIT 50
For example: if you have an average of 10 duplicates per ID, LIMIT 1000 in the inner query will return an average of 100 distinct rows. It's very unlikely that you will get fewer than 50 rows.
If the condition2 column is a boolean, you know that you can have a maximum of two duplicates. In this case LIMIT 100 in the inner query would be enough.
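The arithmetic behind the two-limit version can be checked on synthetic data. The sketch below is an illustration under stated assumptions: Python's built-in sqlite3 module with an in-memory database stands in for MySQL, the index name is hypothetical, and the data is invented so that every id has exactly 10 matching rows in b (real data will only match that on average). `&&` is written as AND because SQLite does not accept `&&`.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the data is synthetic,
# with exactly 10 matching rows in b for every row in a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, column1 INTEGER);
    CREATE INDEX a_column1 ON a (column1);
    CREATE TABLE b (id_table_a INTEGER, condition1 INTEGER, condition2 INTEGER,
                    PRIMARY KEY (id_table_a, condition1, condition2));
""")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [(i, i * 3) for i in range(1, 501)])
conn.executemany("INSERT INTO b VALUES (?, 1, ?)",
                 [(i, k) for i in range(1, 501) for k in range(10)])

INNER_SQL = """
    SELECT a.id
    FROM a
    INNER JOIN b ON a.id = b.id_table_a AND b.condition1 = 1
    ORDER BY a.column1
    LIMIT 1000
"""
inner_ids = [row[0] for row in conn.execute(INNER_SQL)]
# 1000 joined rows at 10 duplicates per id cover 100 distinct ids.
distinct_inner = len(set(inner_ids))

page = [row[0] for row in conn.execute(
    f"SELECT DISTINCT sub.id FROM ({INNER_SQL}) sub LIMIT 50"
)]

print(len(inner_ids), distinct_inner, len(page))
```

Note that the outer query has no ORDER BY of its own, so the sketch only checks how many ids come back, not which ones or in what order. If the order matters, one option is to also select column1 in the inner query and sort by it again in the outer query.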

Limiting rows from Joined Subquery

If I run an inline select subquery, I can filter the rows to line up with the root query, e.g.
SELECT A.Field1, A.Field2, (SELECT B.Field1 FROM tblB as B WHERE B.AID = A.ID ORDER BY B.DateAdded LIMIT 1)
FROM tblA as A
But if I try to move that select subquery to a joined subquery, I can't use the WHERE criteria (WHERE B.AID = A.ID), and there's no way to limit the tblB rows to only those matching tblA's row.
So what is the correct way to modify the query so I can select B.Field1, B.Field2, etc. when dealing with a 1:M?