MySQL: query performance with multiple joins and conditions on multiple tables

Suppose we have these tables:
main (1M rows):
id, a_id, b_id, c_id, d_id, amounts, year, main_col
a(500 rows):
id, a_col1, a_col2
b(2000 rows):
id, b_col1, b_col2, b_col3
c(500 rows):
id, c_col1, c_col2
d(1000 rows):
id, d_col1, d_col2
And we have a query like:
select sum(amounts), main_col
from main
join a on a.id = main.a_id
join b on b.id = main.b_id
join c on c.id = main.c_id
join d on d.id = main.d_id
where a.a_col1 in (...)
and a.a_col2 in (..)
and b.b_col1 in (...)
and b.b_col2 in (...)
and b.b_col3 in (..)
and c.c_col1 in (..)
and c.c_col2 in (..)
and d.d_col1 in (..)
and d.d_col2 in (..)
and year = 2011
group by main.main_col
Any idea how to create an index on the main table to improve the query performance?
thanks
Update:
Indexes have been added to the a, b, c, and d tables for the columns that appear in the WHERE clause.
I've tried multi-column indexes on the main table, such as (a_id, b_id, c_id, d_id, main_col), which performs better than alternatives like individual indexes on a_id, b_id, and so on. It is still not fast enough for our requirements: one query takes 7 seconds to run in the worst case.

CREATE INDEX id_main ON main(id)
You could create multiple indexes on additional columns and see how that works for you.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html

It really depends on specifics like how selective each join is, how big each table is, etc. In general it might be wise to add one or more indices on the foreign keys of your main table, but who could say from that limited info?
It might also help to get an idea of how your query is executing: try looking at MySQL's EXPLAIN or EXPLAIN EXTENDED.
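As a rough sketch of what that could look like for the query in the question: the index below puts the filtered column (year) first, then the join keys, then the two columns the query reads, so that main might be answered from the index alone. The index name and the literal values in the IN lists are placeholders, not taken from the question. Whether the optimizer actually picks this index depends on your data, so compare the EXPLAIN output before and after.

```sql
-- Assumes the tables main, a, b, c and d from the question already exist.
-- idx_main_year_cover is a hypothetical name.
CREATE INDEX idx_main_year_cover
    ON main (year, a_id, b_id, c_id, d_id, main_col, amounts);

-- Placeholder filter values; substitute the real IN lists.
EXPLAIN
SELECT SUM(main.amounts), main.main_col
FROM main
JOIN a ON a.id = main.a_id
JOIN b ON b.id = main.b_id
JOIN c ON c.id = main.c_id
JOIN d ON d.id = main.d_id
WHERE a.a_col1 IN (1, 2)
  AND b.b_col1 IN (1, 2)
  AND c.c_col1 IN (1, 2)
  AND d.d_col1 IN (1, 2)
  AND main.year = 2011
GROUP BY main.main_col;
```

In the EXPLAIN output, "Using index" in the Extra column of the row for main indicates that the index covers the query for that table.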

Since the primary keys are already indexed by default, the join cannot be optimized by adding more indexes on them. That leaves your WHERE clauses, so it could help to add indexes on the x.x_colN columns.

You should also have an index on your GROUP BY column, I think.

Related

how to avoid full table scan when joining on one foreign key

How can I avoid a full table scan when joining on one foreign key? Below is my SQL query. When I use EXPLAIN SELECT, it shows that the query scans the whole table even with a WHERE clause:
SELECT message_recipients.id, message_recipients.user_type,
COALESCE(guardians.firstname, students.firstname)
FROM message_recipients
LEFT JOIN students ON message_recipients.user_id = students.student_id
LEFT JOIN guardians ON message_recipients.user_id = guardians.guardian_id
WHERE message_recipients.message_id = 2
I also added an index on the message_id column, but the result is still the same. Below is the image of the EXPLAIN SELECT output; maybe I am reading it wrong.
The table has 8 rows in total, of which only 6 have message_id = 2, yet the image shows all 8 rows being scanned, which is not supposed to happen. The big question is: how do I optimize this to avoid the full table scan? Thanks.
One way to avoid a full scan is to use clustered queries with a "with" clause like this:
WITH TB_A AS (
SELECT
A.*
FROM
some_table A
WHERE
A.field_1 = 'some condition to filter first table'
)
, TB_B AS (
SELECT
B.*
FROM
other_table B
WHERE
B.field_2 = 'some condition to filter second table if you need that too'
)
SELECT
A.*
, B.*
FROM
TB_A A
INNER JOIN
TB_B B
ON A.field_1 = B.field_1 -- Running without full scan on join and much faster
;
;D
A FOREIGN KEY creates an INDEX, but that INDEX may not be optimal. See if these 'composite' indexes are better:
message_recipients: INDEX(message_id, user_id, id, user_type)
guardians: INDEX(guardian_id, firstname)
students: INDEX(student_id, firstname)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
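Spelled out as statements, the suggestion above might look like the following. The index names are made up for illustration, and the statements assume firstname is a short VARCHAR column (a long or TEXT column would need a prefix length).

```sql
-- Hypothetical index names; tables and columns are from the question.
ALTER TABLE message_recipients
    ADD INDEX idx_mr_message_user (message_id, user_id, id, user_type);
ALTER TABLE guardians
    ADD INDEX idx_guardians_id_name (guardian_id, firstname);
ALTER TABLE students
    ADD INDEX idx_students_id_name (student_id, firstname);

-- Re-check the plan afterwards.
EXPLAIN
SELECT message_recipients.id, message_recipients.user_type,
       COALESCE(guardians.firstname, students.firstname)
FROM message_recipients
LEFT JOIN students ON message_recipients.user_id = students.student_id
LEFT JOIN guardians ON message_recipients.user_id = guardians.guardian_id
WHERE message_recipients.message_id = 2;
```

With only 8 rows in the table, the optimizer may still choose a scan, so it is worth repeating the check on a realistic amount of data.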

What is better way to join in mysql?

I want to join 3 or more tables:
table1 - 1 thousand records
table2 - 100 thousand records
table3 - 10 million records
Which of the following is best (in terms of speed)?
Note: pk and fk are the primary and foreign keys of the respective tables, and FILTER_CONDITION1 and FILTER_CONDITION2 are the row-restricting conditions normally found in the WHERE clause.
Case 1: taking smaller tables first and joining the larger one later
Select table1.*,table2.*,table3.*
from table1
join table2
on table1.fk = table2.pk and FILTER_CONDITION1
join table3
on table2.fk = table3.pk and FILTER_CONDITION2
Case 2
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk and FILTER_CONDITION2
join table1
on table1.fk = table2.pk and FILTER_CONDITION1
Case 3
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk
join table1
on table1.fk = table2.pk
where FILTER_CONDITION1 and FILTER_CONDITION2
The cases you show are equivalent. What you are describing is in the end the same query and will be seen by the database as such: the database will make a query plan.
The best thing you can do is use EXPLAIN and check what your query actually does: this way you can see that they will probably be run the same way, and whether there might be a bottleneck in there.
As @Nanne noted in his answer, MySQL normally chooses the right join order on its own, but in rare cases it can read the tables in the wrong order and kill query performance. In that case you can follow the approach below.
If you can filter data from your bulky tables such as table2 and table3 (suppose only 500 records remain after joining these tables and applying the filter), filter that data first and then join the filtered result with your small table. This can improve performance, but there are various possible combinations, so you have to check which join filters the most. EXPLAIN will help you find out, and indexes will help you get the filtered data.
After that, you can tell MySQL to use the join order written in your query with the syntax "SELECT STRAIGHT_JOIN ...", in the same way that MySQL sometimes does not use the proper index and we have to use FORCE INDEX.
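A minimal sketch of both hints applied to Case 3 from the question is shown below. The index name idx_table1_fk is hypothetical (it stands for an index on table1(fk)), and the FILTER_CONDITION placeholders are left out because the question does not spell them out.

```sql
-- STRAIGHT_JOIN makes MySQL read the tables in the order they are written:
-- table3, then table2, then table1.
-- FORCE INDEX asks MySQL to use the named index for table1.
SELECT STRAIGHT_JOIN table1.*, table2.*, table3.*
FROM table3
JOIN table2 ON table2.fk = table3.pk
JOIN table1 FORCE INDEX (idx_table1_fk) ON table1.fk = table2.pk;
```

Both hints override the optimizer, so it is probably best to keep them only where EXPLAIN shows they actually improve the plan.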

Query is slow, which fields to index

I have this query:
select * from
When I execute it, it takes ~45 sec with 35k records. Every day I add 5k+ new records to the gps_unit_location table, so the table will grow.
I currently have indexes on all the IDs. Would adding any additional indexes help me improve the performance of this query?
Thanks.
So,
be sure you have NOT NULL columns and indices on:
INDEX ON gps_unit_location.idgps_unit_location
INDEX ON user.iduser
INDEX ON user_to_gps_unit.iduser
INDEX ON user_to_gps_unit.idgps_unit
INDEX ON gps_unit.idgps_unit
INDEX ON gps_unit_location.idgps_unit
be sure you really need to select all the fields with that star *
try this query:
SELECT
`gps_unit`.`idgps_unit`,
`gps_unit`.`name` as name,
`gps_unit`.`notes` as notes,
`gps_unit`.`serial`,
`gps_unit_location`.`dt` as dt,
`gps_unit_location`.`idgps_unit_location`,
`gps_unit_location`.`lat`,
`gps_unit_location`.`long`,
`ip`,
`unique_id`,
`loc_age`,
`reason_code`,
`speed_kmh`,
`VehHdg`,
`Odometer`,
`event_time_gmt_unix`,
`switches`,
`engine_on_off`
FROM user
INNER JOIN user_to_gps_unit ON user.iduser = user_to_gps_unit.iduser
INNER JOIN gps_unit ON user_to_gps_unit.idgps_unit = gps_unit.idgps_unit
INNER JOIN gps_unit_location ON gps_unit.idgps_unit = gps_unit_location.idgps_unit
INNER JOIN
(SELECT
`gps_unit_location`.`idgps_unit`,
MAX(`gps_unit_location`.`dt`) dtmax
FROM `gps_unit_location`
GROUP BY 1
) r1 ON r1.idgps_unit = gps_unit_location.idgps_unit AND r1.dtmax = gps_unit_location.dt
WHERE
user.iduser = 14
On a side note, I think you don't need the unique indexes on the columns that are defined as primary keys, this causes write overhead on insert/update statements.
The generic answer is to index those columns that are used to join and constrain (ON and WHERE clauses). Use composite indexes (joins first, then constraints next with the lowest cardinality constraints first).
Oh, and make all your IDs 'unsigned'.
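For the query above, composite indexes along these lines could be a starting point. The index names are hypothetical; the table and column names come from the question. The (idgps_unit, dt) index is meant to serve both the MAX(dt) per unit subquery and the join back on unit and dt.

```sql
-- Lookup by user, then by unit, for the user_to_gps_unit join.
CREATE INDEX idx_utg_user_unit
    ON user_to_gps_unit (iduser, idgps_unit);

-- Join key first, then dt, so the latest dt per unit can be read from the index.
CREATE INDEX idx_gul_unit_dt
    ON gps_unit_location (idgps_unit, dt);
```

Running EXPLAIN on the query before and after adding them should show whether the optimizer uses them.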

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you are relatively new to SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier and helps anyone AFTER you know which columns come from which table, especially when the same column name appears in different tables. It prevents ambiguity in the query. Your left join, judging from the sample, may be ambiguous, so please confirm that B.ID really joins to TB.ID. Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that the posting was from, in addition to its OWN "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, use the SQL "OR" rather than the logical "||".
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TeamJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is this:
I added an extra column called team_id to my blabbing table and set it to null, and another column called mem_id to my team_blabbing table.
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.

MySQL column definition

Is there a way, using MySQL 5.0, to define a column in a table that is to be calculated whenever a select is executed on that particular row? For example say I have two tables, A and B:
A:
ID
COMPUTED_FIELD = SUM(SELECT B.SOME_VALUE FROM B WHERE B.A_ID = ID)
B:
ID
A_ID
SOME_VALUE
In the example, when I run a select on A for a particular ID, I want it to return the sum of all values in Table B for that particular value of A.ID. I know how to do this using multiple separate queries and doing a group by A_ID, but I'm trying to streamline the process a little.
Yes. You cannot do that inside a table, but you can do it in a view.
CREATE VIEW A
AS
SELECT SUM(B.SOME_VALUE) AS COMPUTED_FIELD
FROM B
WHERE B.A_ID = 'id';
Obviously id needs to be whatever you are searching for.
You don't need table A in this case.
In MySQL 5.0, tables cannot contain calculated values. Try using views. The manual entry has full details. Your end result will come out looking something like: CREATE VIEW A AS SELECT SUM(B.SOME_VALUE), B.A_ID FROM B GROUP BY B.A_ID; (Not tested)
That said, I'm not sure how important it is for you to have an independent, unique ID for table A -- this isn't possible unless you add another table C to hold the foreign keys referenced by B.A_ID, and use table C as a reference in creating your view.
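If table A does need to exist in its own right, one possible sketch is to keep the IDs in a base table and join it to B inside the view. A_BASE and A_WITH_TOTAL are hypothetical names, not from the question. The LEFT JOIN keeps IDs that have no rows in B, reported here with a total of 0.

```sql
-- A_BASE is an assumed table holding one row per ID.
CREATE VIEW A_WITH_TOTAL AS
SELECT A_BASE.ID,
       COALESCE(SUM(B.SOME_VALUE), 0) AS COMPUTED_FIELD
FROM A_BASE
LEFT JOIN B ON B.A_ID = A_BASE.ID
GROUP BY A_BASE.ID;

-- The sum is computed each time the view is queried.
SELECT ID, COMPUTED_FIELD
FROM A_WITH_TOTAL
WHERE ID = 10;
```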
As txwikinger suggests, the best way to do this is to set up A as a view, not a table. Views are, for all intents and purposes, a streamlined, reusable query. They're generally used a) when a common query has a computed column, or b) to abstract away complex joins that are often used.
To expand on the previous answer, in order to be able to query A for any ID, try this view:
CREATE VIEW A
AS
SELECT B.A_ID AS ID, SUM(B.SOME_VALUE) AS COMPUTED_FIELD
FROM B
GROUP BY B.A_ID;
Then you can select from it normally, for example:
SELECT A.ID, A.COMPUTED_FIELD
FROM A
WHERE A.ID IN (10, 30);
or
SELECT A.ID, A.COMPUTED_FIELD
FROM A
WHERE COMPUTED_FIELD < 5;