MySQL SELECT query that counts left joined rows takes too long - mysql

Does anyone know how to optimize this query?
SELECT planbook.*,
COUNT(pb_unit_id) AS total_units,
COUNT(pb_lsn_id) AS total_lessons
FROM planbook
LEFT JOIN planbook_unit ON pb_unit_pb_id = pb_id
LEFT JOIN planbook_lesson ON pb_lsn_pb_id = pb_id
WHERE pb_site_id = 1
GROUP BY pb_id
The slow part is getting the total number of matching units and lessons. I have indexes on the following fields (and others):
planbook.pb_id
planbook_unit.pb_unit_pb_id
planbook_lesson.pb_lsn_pb_id
My only objective is to get the total number of matching units and lessons along with the details of each planbook row.
However, this query is taking around 35 seconds. I have 1625 records in planbook, 13,693 records in planbook_unit, and 122,950 records in planbook_lesson.
Any suggestions?
Edit: Explain Results

SELECT planbook.*,
( SELECT COUNT(*) FROM planbook_unit
WHERE pb_unit_pb_id = planbook.pb_id ) AS total_units,
( SELECT COUNT(*) FROM planbook_lesson
WHERE pb_lsn_pb_id = planbook.pb_id ) AS total_lessons
FROM planbook
WHERE pb_site_id = 1
planbook: INDEX(pb_site_id)
planbook_unit: INDEX(pb_unit_pb_id)
planbook_lesson: INDEX(pb_lsn_pb_id)

Looking to your query
You should add and index for
table planbook column pb_site_id
and eventually a composite one for
table planbook column (pb_site_id, pd_id)

Related

MYSQL GROUP BY on 2 Tables with MAX

I have 2 mysql tables:
record table:
and
race table:
I want to select the records from the 1st table group by id_Race but only the MAX from column "secs".
I tried the following but didnt work:
$query = "SELECT rec.RecordsID,rec.id_Athlete,rec.date_record,rec.id_Race,rec.placeevent,rec.mins,rec.secs,rec.huns,rec.distance,rec.records_text,r.name,MAX(rec.secs)
FROM records AS rec INNER JOIN race AS r ON r.RaceID=rec.id_Race WHERE (id_Athlete=$u_athlete) GROUP BY rec.id_Race;";
($u_athlete is a variable i get from _SESSION)
Can you help me about that?
Thank you.
When you use an aggregation function like MAX and select all fields, you are forced to include all selected fields that are not affected by the MAX inside the GROUP BY clause.
Though you can use a window function like ROW_NUMBER that will group by specifically on id_Race and order by the secs column in a descendent way (so that the highest value of secs will be associated with row_number=1).
Afterwards you can select the rows which have row_number=1 and the id_Athlete you pass using the variable.
SELECT
rec.RecordsID,
rec.id_Athlete,
rec.date_record,
rec.id_Race,
rec.placeevent,
rec.mins,
rec.secs,
rec.huns,
rec.distance,
rec.records_text,
race.name,
FROM
(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY id_race ORDER BY secs) rank
FROM
record
) rec
INNER JOIN
race race
ON
race.RaceID=rec.id_Race
WHERE
rec.rank = 1
AND
rec.id_Athlete = $u_athlete;

Calculating unmatching rows in partitioned table in hive

I have a use-case where I have to calculate unmatching rows(excluding matching records) from two different partition's from a partitioned hive table.
Let's suppose there is a partitioned table called test which is partitioned on column as_of_date. Now to get the unmatching rows I tried with two option-
1.)
select count(x.item_id)
from
(select coalesce(test_new.item_id, test_old.item_id) as item_id
from
(select item_id from test where as_of_date = '2019-03-10') test_new
full outer join
(select item_id from test where as_of_date = '2019-03-09') test_old
on test_new.item_id = test_old.item_id
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0)) as x;
2.) I am creating a view first and then querying on that
create view test_diff as
select coalesce(test_new.item_id, test_old.item_id) as item_id, coalesce(test_new.as_of_date, date_add(test_old.as_of_date, 1)) as as_of_date
from test test_new
full outer join test test_old
on (test_new.item_id = test_old.item_id and date_sub(test_new.as_of_date, 1) = test_old.as_of_date)
where coalesce(test_new.item_id,0) != coalesce(test_old.item_id,0);
Then I am using query
select count(distinct item_id) from test_diff where as_of_date = '2019-03-10';
Both the case are returning different count. In second option I am getting lesser count. Please provide any suggestion on why counts are different.
Assuming you taken care of test_new,test_old tables(filtered with as_of_date = '2019-03-10') in 2nd option.
1st option , you are using select clause count(X.item_id), where as 2nd option count(distinct). distinct might have reduced your item count in later option.

MySQL queries stuck in "sending data" for 30 seconds after migrating to RDS

This query (along with a few others I think have a related issue) did not take 30 seconds when MySQL was local on the same EC2 instance as the rest of the website. More like milliseconds.
Does anything look off?
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
LEFT JOIN chv_follows ON chv_follows.follow_followed_user_id =
chv_images.image_user_id
LEFT JOIN chv_follows_projects ON
chv_follows_projects.follows_project_project_id =
chv_images.image_project_id LEFT JOIN chv_projects ON
chv_projects.project_id = follows_project_project_id WHERE
chv_follows.follow_user_id='1' OR (follows_project_user_id = 1 AND
chv_projects.project_privacy = "public" AND
chv_projects.project_is_public_upload = 1) GROUP BY chv_images.image_id
ORDER BY chv_images.image_id DESC
LIMIT 0,15
And this is what EXPLAIN shows:
Thank you
Update: This query has the same issue. It does not have a GROUP BY.
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
ORDER BY chv_images.image_id DESC
LIMIT 0,15
That EXPLAIN shows several table-scans (type: ALL), so it's not surprising that it takes over 30 seconds.
Here's your EXPLAIN:
Notice the column rows shows an estimated 14420 rows read from the first table chv_images. It's doing a table-scan of all the rows.
In general, when you do a series of JOINs, you can multiple together all the values in the rows column of the EXPLAIN, and the final result is how many row-reads MySQL has to do. In this case it's 14420 * 2 * 1 * 1 * 2 * 1 * 916, or 52,834,880 row-reads. That should put into perspective the high cost of doing several table-scans in the same query.
You might help avoid those table-scans by creating some indexes on these tables:
ALTER TABLE chv_storages
ADD INDEX (storage_id);
ALTER TABLE chv_categories
ADD INDEX (category_id);
ALTER TABLE chv_likes
ADD INDEX (like_content_id, like_content_type, like_user_id);
Try creating those indexes and then run the EXPLAIN again.
The other tables are already doing lookups by primary key (type: eq_ref) or by secondary key (type: ref) so those are already optimized.
Your EXPLAIN shows your query uses a temporary table and filesort. You should reconsider whether you need the GROUP BY, because that's probably causing the extra work.
Another tip is to avoid using SELECT * because it might be forcing the query to read many extra columns that you don't need. Instead, explicitly name only the columns you need.
Is there any indexes in chv_images?
I propose:
CREATE INDEX idx_image_id ON chv_images (image_id);
(Bill's ideas are good. I'll take the discussion a different way...)
Explode-Implode -- If the LEFT JOINs match no more than 1 row, change, for example,
SELECT
...
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
into
SELECT ...,
( SELECT foo FROM chv_meta WHERE image_id = chv_images.image_id ) AS foo, ...
If that can be done for all the JOINs, you can get rid of GROUP BY. This will avoid the costly "explode-implode" where JOINs lead to more rows, then GROUP BY gets rid of the dups. (I suspect you can't move all the joins in.)
OR -> UNION -- OR is hard to optimize. Your query looks like a good candidate for turning into UNION, then making more indexes that will become useful.
WHERE chv_follows.follow_user_id='1'
OR (follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1
)
Assuming that follows_project_user_id is in `chv_images,
( SELECT ...
WHERE chv_follows.follow_user_id='1' )
UNION DISTINCT -- or ALL, if you are sure there won't be dups
( SELECT ...
WHERE follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1 )
Indexes needed:
chv_follows: (follow_user_id)
chv_projects: (project_privacy, project_is_public_upload) -- either order
But this has not yet handled the ORDER BY and LIMIT. The general pattern for such:
( SELECT ... ORDER BY ... LIMIT 15 )
UNION
( SELECT ... ORDER BY ... LIMIT 15 )
ORDER BY ... LIMIT 15
Yes, the ORDER BY and LIMIT are repeated.
That works for page 1. If you want the next 15 rows, see http://mysql.rjweb.org/doc.php/pagination#pagination_and_union
After building those two sub-selects, look at them; I think you will be able to optimize each one, and may need new indexes because the Optimizer will start with a different 'first' table.

How to Display 1 Column sum in other Column

I have a table that has different columns display different values.
I need to add a new column that displays sum of 1 column in each row of other column.
This is what i need to display.
I have written following query but its only displaying 1 in each row of last column.
select inStation.name TapInStation , outStation.name TapOutStation,
count(trx.passengerCount) PassengerCount, sum(trx.amount) Fare,
(select sum(passengerCount) from transactions iTrx
where iTrx.ID = trx.ID) PassengerPercent
from transactions trx
inner join
station inStation on inStation.ID = trx.fromStation
inner join
station outStation on outStation.ID = trx.toStation
GROUP BY
TapInStation, TapOutStation
If you want the total, then remove the correlation clause. This may do what you want:
select inStation.name as TapInStation , outStation.name as TapOutStation,
count(trx.passengerCount) as PassengerCount,
sum(trx.amount) as Fare,
(select sum(passengerCount) from transactions iTrx) as PassengerPercent
I'm not sure why you would called the "total" something like PassengerPercent, but this should return the overall total.
I also suspect that you might want a sum() for the previous expression.

Retrieving rows that have 2 columns matching and 1 different

Below is my table called 'datapoints'. I am trying to retrieve instances where there are different instances of 'sensorValue' for the same 'timeOfReading' and 'sensorNumber'.
For example:
sensorNumber sensorValue timeOfReading
5 5 6
5 5 6
5 6 10 <----same time/sensor diff value!
5 7 10 <----same time/sensor diff value!
Should output: sensorNumber:5, timeOfReading: 10 as a result.
I understand this is a duplicate question, in fact I have one of the links provided below for references - however none of the solutions are working as my query simply never ends.
Below is my SQL code:
SELECT table1.sensorNumber, table1.timeOfReading
FROM datapoints table1
WHERE (SELECT COUNT(*)
FROM datapoints table2
WHERE table1.sensorNumber = table2.sensorNumber
AND table1.timeOfReading = table1.timeOfReading
AND table1.sensorValue != table2.sensorValue) > 1
AND table1.timeOfReading < 20;
Notice I have placed a bound for timeOfReading as low as 20. I also tried setting a bound for both table1 and table 2 as well but the query just runs until timeout without displaying results no matter what I put...
The database contains about 700mb of data, so I do not think I can just run this on the entire DB in a reasonable amount of time, I am wondering if this is the culprit?
If so how could I properly limit my query to run a search efficiently? If not what am doing wrong that this is not working?
Select rows having 2 columns equal value
EDIT:
Error Code: 2013. Lost connection to MySQL server during query 600.000 sec
When I try to run the query again I get this error unless I restart
Error Code: 2006. MySQL server has gone away 0.000 sec
You can use a self-JOIN to match related rows in the same table.
SELECT DISTINCT t1.sensorNumber, t1.timeOfReading
FROM datapoints AS t1
JOIN datapoints AS t2
ON t1.sensorNumber = t2.sensorNumber
AND t1.timeOfReading = t2.timeOfReading
AND t1.sensorValue != t2.sensorValue
WHERE t1.timeOfReading < 20
DEMO
To improve performance, make sure you have a composite index on sensorNumber and timeOfReading:
CREATE INDEX ix_sn_tr on datapoints (sensorNumber, timeOfReading);
I think you have missed a condition. Add a not condition also to retrieve only instances with different values.
SELECT *
FROM new_table a
WHERE EXISTS (SELECT * FROM new_table b
WHERE a.num = b.num
AND a.timeRead = b.timeRead
AND a.value != b.value);
you can try this query
select testTable.* from testTable inner join (
SELECT sensorNumber,timeOfReading
FROM testTable
group by sensorNumber , timeOfReading having Count(distinct sensorValue) > 1) t
on
t.sensorNumber = testTable.sensorNumber and t.timeOfReading = testTable.timeOfReading;
here is sqlFiddle
This query will return the sensorNumber and the timeOfReading where there are different values of sensorValue:
select sensorNumber, timeOfReading
from tablename
group by sensorNumber, timeOfReading
having count(distinct sensorValue)>1
and this will return the actual records:
select t.*
from
tablename t inner join (
select sensorNumber, timeOfReading
from tablename
group by sensorNumber, timeOfReading
having count(distinct sensorValue)>1
) d on t.sensorNumber=d.sensorNumber and t.timeOfReading=d.timeOfReading
I would suggest you to add an index on sensorNumber, timeOfReading
alter table tablename add index idx_sensor_time (sensorNumber, timeOfReading)