I'm happily taking any advice on this - be it rewriting the query, or setting up the tables differently.
What I have basically is three tables - a product table, a location table, and a condition table. The location table stores all the information about location throughout time, and the condition table does the same for condition. The trick of this massive query is to pluck out the products with only their latest conditions and locations.
I took the general idea from this question: MySQL MIN/MAX returning proper value, but not the related record info
Is the answer just to store the current location and condition in the main product table, and keep these history tables, but not use them to search by? I like the idea of keeping them separate, but of course this query takes 50 seconds to run, which is not practical at all.
SELECT
'$table' AS tablename,
$table.id,
product_name,
$table.status,
CL.event AS last_event,
CONCAT_WS(' ', CL.location, CL.floor, CL.bin, CL.bay) AS current_loc,
CC.status AS current_cond
FROM $table
LEFT OUTER JOIN
(SELECT DISTINCT
C.work_type,
C.work_id,
C.status,
C.inspected_timestamp
FROM
(SELECT
CONCAT(work_type, work_id) AS condition_id,
status,
MAX(inspected_timestamp) as current
FROM conditions
GROUP BY condition_id
) XC
JOIN conditions C
on CONCAT(C.work_type, C.work_id) = XC.condition_id
and C.inspected_timestamp = XC.current
) CC ON
$table.id = CC.work_id AND
CC.work_type = '$table'
LEFT OUTER JOIN
(SELECT DISTINCT
L.work_type,
L.work_id,
L.event,
L.location,
L.floor,
L.bin,
L.bay,
L.timestamp
FROM
(SELECT
CONCAT(work_type, work_id) AS location_id,
location,
MAX(timestamp) as current
FROM locations
GROUP BY location_id
) XL
JOIN locations L
on CONCAT(L.work_type, L.work_id) = XL.location_id
and L.timestamp = XL.current
) CL ON
$table.id = CL.work_id AND
CL.work_type = '$table'
HAVING last_event = 'Received'
I am adding here the results of EXPLAIN EXTENDED.
[0] => Array (
[id] => 1
[select_type] => PRIMARY
[table] => paintings
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 1159
[filtered] => 100.00
[Extra] => )
[1] => Array (
[id] => 1
[select_type] => PRIMARY
[table] => <derived2>
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 3211
[filtered] => 100.00
[Extra] => )
[2] => Array (
[id] => 1
[select_type] => PRIMARY
[table] => <derived4>
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 1870
[filtered] => 100.00
[Extra] => )
[3] => Array (
[id] => 4
[select_type] => DERIVED
[table] => <derived5>
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 1868
[filtered] => 100.00
[Extra] => Using temporary )
[4] => Array (
[id] => 4
[select_type] => DERIVED
[table] => L
[type] => ref
[possible_keys] => timestamp
[key] => timestamp
[key_len] => 8
[ref] => XL.current
[rows] => 5
[filtered] => 100.00
[Extra] => Using where )
[5] => Array (
[id] => 5
[select_type] => DERIVED
[table] => locations
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 3913
[filtered] => 100.00
[Extra] => Using temporary; Using filesort )
[6] => Array (
[id] => 2
[select_type] => DERIVED
[table] => <derived3>
[type] => ALL
[possible_keys] =>
[key] =>
[key_len] =>
[ref] =>
[rows] => 3191
[filtered] => 100.00
[Extra] => Using temporary )
[7] => Array (
[id] => 2
[select_type] => DERIVED
[table] => C
[type] => ref
[possible_keys] => inspected_timestamp
[key] => inspected_timestamp
[key_len] => 8
[ref] => XC.current
[rows] => 45
[filtered] => 100.00
[Extra] => Using where )
[8] => Array (
[id] => 3
[select_type] => DERIVED
[table] => conditions
[type] => index
[possible_keys] =>
[key] => work_type_2
[key_len] => 316
[ref] =>
[rows] => 3986
[filtered] => 100.00
[Extra] => Using index; Using temporary; Using filesort )
There are a few things you can do:
Run EXPLAIN on the query. See if there's a full table scan in there somewhere (type ALL in MySQL's output). That's the killer.
See if rearranging the query makes a difference in the EXPLAIN results. Filtering more records out early will decrease the time needed.
Check to make sure that columns in every WHERE clause have an index.
The more records involved, the longer the query. How much history are you retaining? How many rows are you talking about? You should have a policy that removes records older than your retention cutoff and puts them in a history or reporting schema.
Can you take advantage of triggers or views to pre-calculate any of these values?
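As a rough illustration of the trigger idea, here is a minimal sketch that keeps the current location on the product row while the history table stays intact. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the columns current_event and current_loc, the trigger name, and the sample rows are assumptions made for the example, and it assumes history rows are inserted in chronological order.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paintings (
    id INTEGER PRIMARY KEY,
    product_name TEXT,
    current_event TEXT,   -- hypothetical denormalized columns
    current_loc TEXT
);
CREATE TABLE locations (work_id INTEGER, event TEXT, location TEXT, timestamp TEXT);

-- Hypothetical trigger: copy every new history row onto the product row.
-- This is only correct if history rows arrive in chronological order.
CREATE TRIGGER locations_after_insert AFTER INSERT ON locations
BEGIN
    UPDATE paintings
       SET current_event = NEW.event,
           current_loc = NEW.location
     WHERE id = NEW.work_id;
END;

INSERT INTO paintings (id, product_name) VALUES (1, 'Sunrise'), (2, 'Harbour');
INSERT INTO locations VALUES (1, 'Shipped', 'Dock', '2012-01-05');
INSERT INTO locations VALUES (1, 'Received', 'Warehouse', '2012-03-02');
INSERT INTO locations VALUES (2, 'Shipped', 'Gallery', '2012-02-20');
""")

# The search now touches only the product table; no grouping or self-join.
received = conn.execute(
    "SELECT id, product_name, current_loc FROM paintings "
    "WHERE current_event = 'Received' ORDER BY id"
).fetchall()
print(received)
```

The history table is still there for reporting; only the search path changes.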
I'm putting this in an answer purely due to the limit on comment length.
I looked at your query for quite a while, and I think it is largely the nature of the query and the way it has been written that makes it take so much time, but I don't see anything that seems obviously wrong either.
I don't fully understand the design of your tables or the data, but the places where you GROUP BY to get a summary row and then join those results back to the same table are going to be costly, as the EXPLAIN shows: it is table scanning. You are also right that making temp tables and sorting them is even more costly.
So having those values pre-summarized and accessible in a summary table would help quite a bit if the time taken is simply unacceptable. When you look at the EXPLAIN, please note the row counts, as they should give you a good idea of whether or not what the query is doing is reasonable.
Also, the HAVING clause at the end is by definition not going to be optimized, because it is applied only after the joined rows have been produced. If there is a way to move that to a WHERE clause or to a condition in one of the joins, then you have a chance to improve the query plan significantly, but considering the cost of the summaries it will still take some time.
The only thing I can advise at this point is to break it down into small pieces and see if you can optimize the individual components and then reassemble.
As @gview explained, there are numerous things that contribute to this query being brutally slow. Besides those mentioned in his answer, there is also the use of the CONCAT() function in both derived tables, whose results are then used to JOIN them. A join on a computed value cannot use an index on the underlying columns.
If you just want to show the rows of the product table with only the latest related row in location and the latest related row in condition, you can use something like the following (this has only the logic for the latest condition; you'll need another similar LEFT JOIN for the latest location):
SELECT
t.id,
t.product_name,
t.status,
cc.status AS current_cond
FROM
$table AS t
LEFT OUTER JOIN
( SELECT c.*
FROM
conditions AS c
JOIN
( SELECT
work_id,
MAX(inspected_timestamp) as current_ts
FROM conditions mc
WHERE work_type = '$table'
GROUP BY work_id
) AS mc
ON mc.work_id = c.work_id
AND mc.current_ts = c.inspected_timestamp
WHERE c.work_type = '$table'
) AS cc
ON cc.work_id = t.id
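To make the pattern concrete, here is a minimal runnable sketch that applies the same join-to-grouped-MAX idea to both history tables and filters with WHERE instead of HAVING. It runs on Python's built-in sqlite3 module as a stand-in for MySQL; the table name paintings comes from the EXPLAIN output, while the trimmed-down columns, the index names, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paintings (id INTEGER PRIMARY KEY, product_name TEXT, status TEXT);
CREATE TABLE conditions (work_type TEXT, work_id INTEGER, status TEXT,
                         inspected_timestamp TEXT);
CREATE TABLE locations (work_type TEXT, work_id INTEGER, event TEXT,
                        location TEXT, timestamp TEXT);
-- Hypothetical composite indexes covering the grouping and join columns.
CREATE INDEX idx_cond ON conditions (work_type, work_id, inspected_timestamp);
CREATE INDEX idx_loc ON locations (work_type, work_id, timestamp);

INSERT INTO paintings VALUES (1, 'Sunrise', 'ok'), (2, 'Harbour', 'ok'),
                             (3, 'Study', 'ok');
INSERT INTO conditions VALUES
  ('paintings', 1, 'Good', '2012-01-01'),
  ('paintings', 1, 'Fair', '2012-03-01'),
  ('paintings', 2, 'Poor', '2012-02-01');
INSERT INTO locations VALUES
  ('paintings', 1, 'Shipped',  'Dock',      '2012-01-05'),
  ('paintings', 1, 'Received', 'Warehouse', '2012-03-02'),
  ('paintings', 2, 'Received', 'Warehouse', '2012-01-10'),
  ('paintings', 2, 'Shipped',  'Gallery',   '2012-02-20');
""")

query = """
SELECT t.id, t.product_name, cl.event AS last_event,
       cl.location AS current_loc, cc.status AS current_cond
FROM paintings AS t
LEFT JOIN (
    SELECT c.work_id, c.status
    FROM conditions AS c
    JOIN (SELECT work_id, MAX(inspected_timestamp) AS current_ts
          FROM conditions
          WHERE work_type = 'paintings'
          GROUP BY work_id) AS mc
      ON mc.work_id = c.work_id AND mc.current_ts = c.inspected_timestamp
    WHERE c.work_type = 'paintings'
) AS cc ON cc.work_id = t.id
LEFT JOIN (
    SELECT l.work_id, l.event, l.location
    FROM locations AS l
    JOIN (SELECT work_id, MAX(timestamp) AS current_ts
          FROM locations
          WHERE work_type = 'paintings'
          GROUP BY work_id) AS ml
      ON ml.work_id = l.work_id AND ml.current_ts = l.timestamp
    WHERE l.work_type = 'paintings'
) AS cl ON cl.work_id = t.id
WHERE cl.event = 'Received'   -- plain WHERE instead of HAVING on the alias
ORDER BY t.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only painting 1 is returned: painting 2 was received earlier but its latest event is a shipment, and painting 3 has no location history at all.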
I'm pretty out of practice with MySQL and PHP, but I have a project I'm working on for a friend that involves selecting data from two tables, and combining them into one result - seems simple.
Table 1 has 13 fields, but the important ones are id (auto-increment, primary key) and serial (unique). The rest are just ones like customer, description, etc. etc.
Pictures has 3 fields, picID (auto-increment, primary key), imagePath and serial
I need to retrieve all data from Table 1, and if there is a matching photo (identified by the same serial - only ever 1 photo possible per serial) in Pictures, then retrieve that data too. I then output the data from Table1, and use imagePath from Pictures to build an image in HTML if one has been uploaded.
The query I've been using is:
$sql = "SELECT * FROM Table1 LEFT JOIN Pictures ON Table1.serial = Pictures.serial ORDER BY Table1.serial";
Which seems perfect, EXCEPT if any row from Table1 does not have a photo match in Pictures, the serial is no longer returned with the rest of the data, although the remainder of the row is all correct.
I have looked into the different types of JOIN, and whether it's just UNION that I need, but I am a bit stumped. How should I query to get each row of Table1 plus Pictures.imagePath added on to the matching Table1 row, if it exists?
Thank you for your time!!!! :)
EDIT with dumped array output
Array
(
[0] => Array
(
[id] => 51
[0] => 51
[client] => Test Client
[1] => Test Client
[location] => Ayreford House
[2] => Ayreford House
[description] => Ceiling cavity of building XYZ
[3] => Ceiling cavity of building XYZ
[serial] =>
[4] => 18001
[blah] => 123456
[5] => 123456
[fw] => Wall
[6] => Wall
[pm] => Plasterboard
[7] => Plasterboard
[stuff] => ventilation ducting
[8] => ventilation ducting
[ref] => S1000-2018
[9] => S1000-2018
[otheref] => XTX-1325
[10] => XTX-1325
[notes] => Updated photo
[11] => Updated photo
[date] => 2018-06-28 21:37:49
[12] => 2018-06-28 21:37:49
[picID] =>
[13] =>
[imagePath] =>
[14] =>
[15] =>
)
It's doing that because both Table1 and Pictures have a column called serial, and the table names are dropped when the array keys are generated. Internally it is probably doing something like this:
$result = array();
$result[0] = Table1.serial;
$result['serial'] = Table1.serial;
$result[1] = Table1.client;
$result['client'] = Table1.client;
....
$result[14] = Pictures.serial;
$result['serial'] = Pictures.serial;
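So you end up with only Pictures.serial as the value for the key 'serial' in the resulting array. When there is no matching photo, the LEFT JOIN makes Pictures.serial NULL, which is why the serial appears empty.
One way to fix this is to specify your columns explicitly and leave out Pictures.serial, like this: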
SELECT
Table1.id,
Table1.client,
Table1.location,
Table1.description,
Table1.serial,
Pictures.imagePath
FROM
Table1
LEFT JOIN
Pictures ON Table1.serial = Pictures.serial
ORDER BY
Table1.serial
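The key collision can be reproduced outside PHP. The sketch below is a minimal illustration using Python's built-in sqlite3 module rather than MySQL; the helper fetch_assoc is a hypothetical stand-in for an associative fetch, and the sample rows are made up. It builds name-keyed rows so that a later duplicate column name overwrites an earlier one, then shows that listing the columns explicitly avoids the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (id INTEGER PRIMARY KEY, client TEXT, serial TEXT UNIQUE);
CREATE TABLE Pictures (picID INTEGER PRIMARY KEY, imagePath TEXT, serial TEXT);
INSERT INTO Table1 VALUES (51, 'Test Client', '18001'), (52, 'Other Client', '18002');
INSERT INTO Pictures VALUES (1, 'img/18002.jpg', '18002');  -- no photo for 18001
""")

def fetch_assoc(sql):
    """Hypothetical helper: key each row by column name; later duplicates win."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    return [dict(zip(names, row)) for row in cur]

# SELECT * yields two columns named "serial"; the NULL from Pictures wins.
star = fetch_assoc(
    "SELECT * FROM Table1 LEFT JOIN Pictures "
    "ON Table1.serial = Pictures.serial ORDER BY Table1.serial"
)

# Naming the columns explicitly leaves only one "serial" in the result.
explicit = fetch_assoc(
    "SELECT Table1.id AS id, Table1.client AS client, "
    "Table1.serial AS serial, Pictures.imagePath AS imagePath "
    "FROM Table1 LEFT JOIN Pictures "
    "ON Table1.serial = Pictures.serial ORDER BY Table1.serial"
)

print(star[0]["serial"], explicit[0]["serial"])
```

If both serial values were ever needed, aliasing one of them (for example Pictures.serial AS picture_serial) would serve the same purpose.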
I have three tables:
Student - UPN, Name, Year, House
Seclusion_Status - ID, Arrived, FTE, Rebuild, DateTimeAdded, Staff, Student_UPN (fk), Comment
Period_Rating - ID, Slot_ID, Rating, Date, Seclusion_ID (fk)
Each student can have many entries in the Seclusion_Status table, and then there are also many entries in the Period_rating table, which is linked to the Seclusion_status table with the Seclusion_ID
I am running the following query to return a record from Seclusion_Status based on a date, and then all the records in the Period_rating table that relate to the Seclusion_status record.
$sql="SELECT * FROM Seclusion_Status
INNER JOIN Students ON Seclusion_Status.Student_UPN=Students.UPN
JOIN Period_Rating ON Seclusion_Status.ID=period_rating.Seclusion_ID
WHERE period_rating.Date = '$start'
GROUP BY period_rating.Seclusion_ID
ORDER BY Seclusion_Status.DateTimeAdded ASC";
$result=mysql_query($sql);
// Start looping rows in mysql database.
while($rows=mysql_fetch_array($result)){
The query is returning the Seclusion_Status record, and then the first record in Period_rating, but not the others.
Array
[0] => 348
[ID] => 157
[1] => Y
[Arrived] => Y
[2] => N
[FTE] => N
[3] =>
[Rebuild] =>
[4] =>
[Text] =>
[5] => 2016-03-04 09:30:50
[DateTimeAdded] => 2016-03-04 09:30:50
[6] => Mr S Holland
[Staff] => Mr S Holland
[7] => K80222800
[Student_UPN] => K8022280
[8] => Refusing instructions
[Incident] => Refusing instructions
[9] =>
[Period] =>
[10] =>
[Period_In_ID] =>
[11] => Not sitting properly in class despite being asked
[Comment] => Not sitting properly in class despite being asked
[12] => K80222800
[UPN] => K80222800
[13] => Student Name
[Name] => Student Name
[14] => Year 9
[Year] => Year 9
[15] => Acer
[House] => Acer
[16] => 157
[17] => P2
[Slot_ID] => P2
[18] =>
[Rating] =>
[19] => 2016-03-04
[Date] => 2016-03-04
[20] => 348
[Seclusion_ID] => 348
[21] => 1
[Status] => 1
You have a GROUP BY period_rating.Seclusion_ID that instructs MySQL to return one record per Seclusion_ID. Take the GROUP BY clause out, and the query will return every matching record.
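The effect is easy to reproduce on a small data set. This is a minimal sketch using Python's built-in sqlite3 module in place of MySQL, with cut-down versions of the two tables and made-up rows; like MySQL in its permissive mode, SQLite accepts non-aggregated columns alongside GROUP BY and returns one arbitrary row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Seclusion_Status (ID INTEGER PRIMARY KEY, Student_UPN TEXT);
CREATE TABLE Period_Rating (ID INTEGER PRIMARY KEY, Slot_ID TEXT, Date TEXT,
                            Seclusion_ID INTEGER);
INSERT INTO Seclusion_Status VALUES (348, 'K80222800');
INSERT INTO Period_Rating VALUES
  (157, 'P2', '2016-03-04', 348),
  (158, 'P3', '2016-03-04', 348),
  (159, 'P4', '2016-03-04', 348);
""")

base = """
SELECT s.ID, p.Slot_ID
FROM Seclusion_Status AS s
JOIN Period_Rating AS p ON s.ID = p.Seclusion_ID
WHERE p.Date = ?
"""

# With GROUP BY the three ratings collapse into a single row.
grouped = conn.execute(base + " GROUP BY p.Seclusion_ID", ("2016-03-04",)).fetchall()

# Without it, every matching Period_Rating row comes back.
ungrouped = conn.execute(base + " ORDER BY p.ID", ("2016-03-04",)).fetchall()

print(len(grouped), ungrouped)
```

The sketch also passes the date as a bound parameter rather than interpolating it into the SQL string.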
The thing is, an inner join only returns a row when matching data exists in all three tables.
If you want to fetch all the entries in a table whether or not they have a match, you need an outer join. MySQL supports LEFT and RIGHT outer joins but not FULL OUTER JOIN.
You can emulate a full outer join by taking the UNION of the LEFT JOIN results and the RIGHT JOIN results.
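Here is a minimal sketch of that emulation, using Python's built-in sqlite3 module as a stand-in for MySQL. The two cut-down tables and their rows are assumptions for illustration. Because older SQLite versions lack RIGHT JOIN, the second half is written as a LEFT JOIN with the tables swapped, which returns the same rows as a RIGHT JOIN would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (UPN TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Seclusion_Status (ID INTEGER PRIMARY KEY, Student_UPN TEXT);
INSERT INTO Students VALUES ('A', 'Has a record'), ('B', 'No record');
-- Record 2 points at a student that is not in Students.
INSERT INTO Seclusion_Status VALUES (1, 'A'), (2, 'Z');
""")

full_outer = conn.execute("""
SELECT st.UPN, ss.ID
FROM Students AS st
LEFT JOIN Seclusion_Status AS ss ON ss.Student_UPN = st.UPN
UNION
SELECT st.UPN, ss.ID
FROM Seclusion_Status AS ss
LEFT JOIN Students AS st ON ss.Student_UPN = st.UPN
""").fetchall()

# Matched pair, student without a record, record without a student.
print(sorted(full_outer, key=repr))
```

UNION also removes duplicate rows, so this only matches a true full outer join when the selected columns identify each row uniquely.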
I'm using CakePHP 2.5.2 and having a bit of trouble searching for data efficiently.
In my application I've 3 tables, teams, players, skills... In teams there are 80 records, players 2400 records, skills 2400 records... I want to calculate the average skill of a team...
//Team model
public $actsAs = array('Containable');
public $hasMany = array('Player');
//Player model
public $actsAs = array('Containable');
public $hasOne = array('Skill');
public $belongsTo = array('Team');
//Skill model
public $actsAs = array('Containable');
public $belongsTo = array('Player');
My search is:
$team = $this->Team->find('all', array(
'contain' => array(
'Player' => array(
'Skill'
)
),
));
$this->set('team', $team);
that gives the expected result:
Array
(
[0] => Array
(
[Team] => Array
(
[id] => 1
[name] => my_team_name
)
[Player] => Array
(
[0] => Array
(
[id] => 000000419
[name] => Name
[surname] => Surname
[age] => 21
[team_id] => 1
[Team_id] => 1
[Skill] => Array
(
[id] => 20
[player_id] => 000000419
[skill] => 599
)
), etc.
This structure uses at least 1680 queries, which is too many for me.
I've tried another way that involves just one query. It returns an awkward data structure, but it has all the information that I need (some of it redundant). Unfortunately, with this approach I cannot iterate in the view to display what I need.
$player = $this->Team->Player->find('all', array(
'contains' => array(
'Skill',
),
));
that returns
Array
(
[0] => Array
(
[Player] => Array
(
[id] => 000000400
[nome] => my_player_name
[cognome] => my_player_surname
[nation_id] => 380
[age] => 29
[team_id] => 2
)
[Team] => Array
(
[id] => 2
[nome] => my_team_name
)
[Skill] => Array
(
[id] => 1
[player_id] => 000000400
[average] => 632
)
)
etc.
Is there a way to iterate in the view to get the average skill of every team? Any other solutions?
Thanks!
You can use my plugin to solve this issue if you can upgrade CakePHP to 2.6 or later. The plugin is highly compatible with ContainableBehavior, but generates better queries.
I think the find operation will then execute only 2 queries.
I would be happy if you tried it.
https://github.com/chinpei215/cakephp-eager-loader
Usage
1. Enable EagerLoader plugin
// In your model
public $actsAs = ['EagerLoader.EagerLoader'];
If you are afraid that loading my plugin breaks something somewhere, you can also enable it on the fly.
// On the fly
$this->Team->Behaviors->load('EagerLoader.EagerLoader');
2. Execute the same find operation
$this->Team->find('all', ['contain' => ['Player' => ['Skill']]]);
3. See the query log
You will see the query log such as the following:
SELECT ... FROM teams AS Team WHERE 1 = 1;
SELECT ... FROM players AS Player LEFT JOIN skills AS Skill ON Player.id = Skill.player_id WHERE Player.id IN ( ... );
If you feel that the query is searching too many tables (i.e., models), you can unbind those models before performing the search with find().
If you only want to fetch particular columns of a table, leave out the others by listing the ones you need under "fields" in find().
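None of the answers above shows it, but if only the per-team average is needed, one alternative is to let the database aggregate instead of loading every player and skill. The following is a minimal sketch of that idea using Python's built-in sqlite3 module; the table and column names follow the dumped arrays in the question, the sample rows are made up, and how to issue such a query through CakePHP's find() is left out on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER);
CREATE TABLE skills (id INTEGER PRIMARY KEY, player_id INTEGER, skill INTEGER);
INSERT INTO teams VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO players VALUES (419, 'P1', 1), (420, 'P2', 1), (400, 'P3', 2);
INSERT INTO skills VALUES (20, 419, 600), (21, 420, 500), (1, 400, 632);
""")

# One query: join the three tables and average the skill per team.
averages = conn.execute("""
SELECT t.id, t.name, AVG(s.skill) AS avg_skill
FROM teams AS t
LEFT JOIN players AS p ON p.team_id = t.id
LEFT JOIN skills AS s ON s.player_id = p.id
GROUP BY t.id, t.name
ORDER BY t.id
""").fetchall()

for team_id, name, avg_skill in averages:
    print(team_id, name, avg_skill)
```

The LEFT JOINs keep teams without players in the result, with a NULL average.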
At the moment "matrix_mct_versions" is a table with 73 entries. When I run this query, "version_count" always returns 73, i.e. the full number of rows. When I run the sub-select on its own, I get the real count for the com_ID param sent. I cannot see what I am doing wrong with this. Can anyone help?
SELECT
a_ID as com_ID,
option_number,
comment,
word_count,
gender,
sample,
(
SELECT
count(a_ID)
FROM
matrix_mct_versions
WHERE
com_ID = com_ID
) as version_count
FROM
matrix_mct
WHERE
attribute_number = :attribute_number AND
grade_number = :grade_number AND
attribute_type = :attribute_type
ORDER BY
option_number
Returns results like this:
[0] => Array
(
[com_ID] => 678
[option_number] => 1
[comment] => TODO primary function missing for controller y
[word_count] => 7
[gender] => 2
[sample] => 0
[version_count] => 73
)
[1] => Array
(
[com_ID] => 679
[option_number] => 2
[comment] => TODO make this green
[word_count] => 4
[gender] => 2
[sample] => 0
[version_count] => 73
)
[2] => Array
(
[com_ID] => 680
[option_number] => 3
[comment] => TODO make this better
[word_count] => 4
[gender] => 2
[sample] => 0
[version_count] => 73
)
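At least one problem is your subquery. It is not correlated: inside the subquery, both sides of com_ID = com_ID resolve to matrix_mct_versions.com_ID, so the condition is true for every row with a non-NULL com_ID. In the outer query com_ID is only an alias for matrix_mct.a_ID, so I think you mean:
(SELECT count(a_ID)
FROM matrix_mct_versions
WHERE matrix_mct_versions.com_ID = matrix_mct.a_ID
) as version_count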
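The difference between the two forms can be checked on a toy data set. This is a minimal sketch using Python's built-in sqlite3 module rather than MySQL; the two tables are cut down to the columns that matter and the rows are made up. SQLite resolves the unqualified names the same way here: the innermost table wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matrix_mct (a_ID INTEGER PRIMARY KEY, comment TEXT);
CREATE TABLE matrix_mct_versions (a_ID INTEGER PRIMARY KEY, com_ID INTEGER);
INSERT INTO matrix_mct VALUES (678, 'first'), (679, 'second'), (680, 'third');
INSERT INTO matrix_mct_versions (com_ID) VALUES (678), (678), (679);
""")

# Uncorrelated: com_ID = com_ID compares the inner column with itself,
# so every row of matrix_mct_versions is counted for every outer row.
uncorrelated = conn.execute("""
SELECT a_ID AS com_ID,
       (SELECT COUNT(a_ID) FROM matrix_mct_versions
        WHERE com_ID = com_ID) AS version_count
FROM matrix_mct
ORDER BY a_ID
""").fetchall()

# Correlated: qualifying both sides ties the count to the outer row.
correlated = conn.execute("""
SELECT m.a_ID AS com_ID,
       (SELECT COUNT(v.a_ID) FROM matrix_mct_versions AS v
        WHERE v.com_ID = m.a_ID) AS version_count
FROM matrix_mct AS m
ORDER BY m.a_ID
""").fetchall()

print(uncorrelated)
print(correlated)
```

With three version rows in total, the first query reports 3 everywhere, mirroring the constant 73 in the question.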
The MySQL query I'm currently trying to perform is functionally equivalent to this:
SELECT small_table.A, small_table.B, small_table.C, huge_table.X, huge_table.Y
FROM small_table LEFT JOIN huge_table
ON small_table.D = huge_table.Z
WHERE small_table.E = 'blah'
except that the query doesn't appear to terminate (at least not within a reasonable amount of time), probably because the second table is huge (i.e. 7500 rows with a total size of 3 MB). Can I perform a functionally equivalent join in a reasonable amount of time, or do I need to introduce redundancy by adding columns from the huge table into the small table? (I'm a total beginner to SQL.)
The clause WHERE small_table.E = 'blah' is static and 'blah' never changes.
Here is the EXPLAIN output as requested:
Array (
[0] => Array (
[id] => 1
[select_type] => SIMPLE
[table] => small_table
[type] => ref
[possible_keys] => E
[key] => E
[key_len] => 1
[ref] => const
[rows] => 1064
[Extra] => Using where )
[1] => Array (
[id] => 1
[select_type] => SIMPLE
[table] => huge_table
[type] => eq_ref
[possible_keys] => PRIMARY
[key] => PRIMARY
[key_len] => 4
[ref] => my_database.small_table.D
[rows] => 1
[Extra] => )
)
A few things ...
1) Are you executing this query directly in MySQL (either Workbench GUI or command line), or is this query embedded in PHP code? Your EXPLAIN output seems to suggest PHP. If you haven't done so already, try executing the query directly in MySQL and take PHP out of the mix.
2) Your EXPLAIN output looks OK, except I'm wondering about your WHERE clause with small_table.E = 'blah'. The EXPLAIN output shows that there's an index on column E but the key length = 1, which is not consistent with a comparison against 'blah'. What data type did you use for the column definition for small_table.E?
3) MySQL is estimating that it needs to scan 1064 rows in small_table. How many total rows are in small_table, and how many do you expect should match this particular query?