How to optimize huge left join in mysql? - mysql

I have two tables in my mysql database:
1) Videos
+----+--------+----------+------+----------+
| id | title | category | year | director |
+----+--------+----------+------+----------+
| 1 | Title1 | Cat1 | 2021 | A.K. |
| 2 | Title2 | Cat2 | 2020 | B.C. |
| 3 | Title3 | Cat3 | 2000 | E.A. |
+----+--------+----------+------+----------+
2) Videos_insights
+----------+------------+-------+-------+----------+--------+
| video_id | date | views | likes | dislikes | shares |
+----------+------------+-------+-------+----------+--------+
| 1 | 2021-03-20 | 13 | 2 | 3 | 1 |
| 1 | 2021-03-19 | 35 | 1 | 3 | 3 |
| 1 | 2021-03-18 | 68 | 5 | 6 | 5 |
| 1 | 2021-03-15 | 86 | 3 | 0 | 1 |
| 2 | 2021-02-13 | 234 | 15 | 1 | 34 |
| 2 | 2021-02-12 | 55 | 15 | 2 | 4 |
| 2 | 2021-02-10 | 331 | 255 | 0 | 0 |
+----------+------------+-------+-------+----------+--------+
I want to get the videos that were watched between 2021-03-01 and 2021-03-31. So the result table should look like this:
+--------+-------------------------------------------+
| title | date_range |
+--------+-------------------------------------------+
| Title1 | ["2021-03-20 - 2021-03-18", "2021-03-15"] |
+--------+-------------------------------------------+
In my MySQL database, I have about 100,000 videos, and each video has about 100 video_insights rows.
What is the best way to produce the result table?
How can I optimize this? I do not want to run a left join on every GET request; it would take too long and overload my server.

I would express the dates individually:
select v.id, v.title,
group_concat(date) as dates
from videos v join
video_insights vi
on vi.video_id = v.id
where vi.date >= '2021-03-01' and
vi.date < '2021-04-01'
group by v.id;
Note that a left join is not appropriate here: the where clause filters on vi.date, so videos without matching insight rows are discarded anyway.
If you really want to get the ranges, then you can use window functions with a gaps-and-islands approach:
select v.id, v.title, group_concat(date_range)
from videos v join
(select vi.video_id,
concat_ws(' - ', min(vi.date), nullif(max(date), min(date))) as date_range
from (select vi.*,
dense_rank() over (partition by vi.video_id order by vi.date) as seqnum
from video_insights vi
where vi.date >= '2021-03-01' and
vi.date < '2021-04-01'
) vi
group by vi.video_id, date - interval seqnum day
) vi
on vi.video_id = v.id
group by v.id;
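On the optimization part of the question: the date-range filter above can only be fast if video_insights has an index that starts with date. A minimal sketch follows, assuming no such index exists yet and that the table is named video_insights as in the queries above (the index name is made up for illustration). Whether the optimizer actually picks it depends on your data, so check the plan with EXPLAIN.

```sql
-- Hypothetical index name. (date, video_id) lets MySQL range-scan the month
-- and read video_id from the index without touching the table rows.
CREATE INDEX idx_video_insights_date_video
    ON video_insights (date, video_id);

-- Inspect the plan of the simple per-date query to see whether the index is used.
-- Assumes videos.id is the primary key, so selecting v.title is allowed
-- under ONLY_FULL_GROUP_BY.
EXPLAIN
SELECT v.id, v.title, GROUP_CONCAT(vi.date ORDER BY vi.date) AS dates
FROM videos v
JOIN video_insights vi ON vi.video_id = v.id
WHERE vi.date >= '2021-03-01'
  AND vi.date <  '2021-04-01'
GROUP BY v.id;
```

If the same month is requested on most GET requests, caching the result in the application is another option, but that is outside what SQL alone can tell you.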

Related

Adding a moving average column to a table using values from previous 2 entries

I currently have the following simplified tables in my database. The points table contains rows of points awarded to each user for every bid form they have voted in.
I would like to add a column to this table that shows, for each row, the AVERAGE of the previous TWO points awarded to THAT user.
Users
+----+----------------------+
| id | name |
+----+----------------------+
| 1 | Flossie Schamberger |
| 2 | Lawson Graham |
| 3 | Hadley Reilly |
+----+----------------------+
Bid Forms
+----+-----------------+
| id | name |
+----+-----------------+
| 1 | Summer 2017 |
| 2 | Winter 2017 |
| 3 | Summer 2018 |
| 4 | Winter 2019 |
| 5 | Summer 2019 |
+----+-----------------+
Points
+-----+---------+--------------------+------------+------------+
| id | user_id | leave_bid_forms_id | bid_points | date |
+-----+---------+--------------------+------------+------------+
| 1 | 1 | 1 | 6 | 2016-06-19 |
| 2 | 2 | 1 | 8 | 2016-06-19 |
| 3 | 3 | 1 | 10 | 2016-06-19 |
| 4 | 1 | 2 | 4 | 2016-12-18 |
| 5 | 2 | 2 | 8 | 2016-12-18 |
| 6 | 3 | 2 | 4 | 2016-12-18 |
| 7 | 1 | 3 | 10 | 2017-06-18 |
| 8 | 2 | 3 | 12 | 2017-06-18 |
| 9 | 3 | 3 | 4 | 2017-06-18 |
| 10 | 1 | 4 | 4 | 2017-12-17 |
| 11 | 2 | 4 | 4 | 2017-12-17 |
| 12 | 3 | 4 | 2 | 2017-12-17 |
| 13 | 1 | 5 | 16 | 2018-06-17 |
| 14 | 2 | 5 | 12 | 2018-06-17 |
| 15 | 3 | 5 | 10 | 2018-06-17 |
+-----+---------+--------------------+------------+------------+
For each row in the points table I would like an average_points column to be calculated as follows.
The average points column is the average of that user's PREVIOUS 2 points. So for the first entry in the table for each user, the average is obviously 0 because there were no previous points awarded to them.
The previous 2 points for each user should be determined using the date column.
The table below is what I would like to have as the final output.
For clarity, to the side of the table, I have added the calculation and numbers used to arrive at the value in the averaged_points column.
+-----+---------+--------------------+------------+-----------------+
| id | user_id | leave_bid_forms_id | date | averaged_points |
+-----+---------+--------------------+------------+-----------------+
| 1 | 1 | 1 | 2016-06-19 | 0 | ( 0 + 0 ) / 2
| 2 | 2 | 1 | 2016-06-19 | 0 | ( 0 + 0 ) / 2
| 3 | 3 | 1 | 2016-06-19 | 0 | ( 0 + 0 ) / 2
| 4 | 1 | 2 | 2016-12-18 | 3 | ( 6 + 0 ) / 2
| 5 | 2 | 2 | 2016-12-18 | 4 | ( 8 + 0 ) / 2
| 6 | 3 | 2 | 2016-12-18 | 5 | ( 10 + 0) / 2
| 7 | 1 | 3 | 2017-06-18 | 5 | ( 4 + 6 ) / 2
| 8 | 2 | 3 | 2017-06-18 | 8 | ( 8 + 8 ) / 2
| 9 | 3 | 3 | 2017-06-18 | 7 | ( 4 + 10) / 2
| 10 | 1 | 4 | 2017-12-17 | 7 | ( 10 + 4) / 2
| 11 | 2 | 4 | 2017-12-17 | 10 | ( 12 + 8) / 2
| 12 | 3 | 4 | 2017-12-17 | 4 | ( 4 + 4 ) / 2
| 13 | 1 | 5 | 2018-06-17 | 7 | ( 4 + 10) / 2
| 14 | 2 | 5 | 2018-06-17 | 8 | ( 4 + 12) / 2
| 15 | 3 | 5 | 2018-06-17 | 3 | ( 2 + 4 ) / 2
+-----+---------+--------------------+------------+-----------------+
I've been trying to use subqueries to solve this issue as AVG doesn't seem to be affected by any LIMIT clause I have.
So far I have come up with
select id, user_id, leave_bid_forms_id, `date`,
(
SELECT
AVG(bid_points)
FROM (
Select `bid_points`
FROM points as p2
ORDER BY p2.date DESC
Limit 2
) as thing
) AS average_points
from points as p1
This is in this sqlfiddle but to be honest I'm out of my depth here.
Am I on the right path? Wondering if someone would be able to show me where I need to tweak things please!
Thanks.
EDIT
Using the answer below as a basis I was able to tweak the sql to work with the tables provided in the original sqlfiddle.
I have added that to this sqlfiddle to show it working
The corrected sql to match the code above is
select p.*,
IFNULL(( (coalesce(points_1, 0) + coalesce(points_2, 0)) /
( (points_1 is not null) + (points_2 is not null) )
),0) as prev_2_avg
from (select p.*,
(select p2.bid_points
from points p2
where p2.user_id = p.user_id and
p2.date < p.date
order by p2.date desc
limit 1
) as points_1,
(select p2.bid_points
from points p2
where p2.user_id = p.user_id and
p2.date < p.date
order by p2.date desc
limit 1, 1
) as points_2
from points as p
) p;
I am about to ask another question, though, about the best way to make the number of previous points to average dynamic.
You can use window functions, which were introduced in MySQL 8.
select p.*,
avg(p.bid_points) over (partition by p.user_id
order by p.date
rows between 2 preceding and 1 preceding
) as prev_2_avg
from points p;
In earlier versions, this is a real pain, because MySQL does not support nested correlation clauses. One method is with a separate column for each one:
select p.*,
( (coalesce(points_1, 0) + coalesce(points_2, 0)) /
( (points_1 is not null) + (points_2 is not null) )
) as prev_2_avg
from (select p.*,
(select p2.bid_points
from points p2
where p2.user_id = p.user_id and
p2.date < p.date
order by p2.date desc
limit 1
) as points_1,
(select p2.bid_points
from points p2
where p2.user_id = p.user_id and
p2.date < p.date
order by p2.date desc
limit 1, 1
) as points_2
from points p
) p;
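One thing to watch: the expected table in the question treats a missing previous entry as 0 and always divides by 2 (for example, row 4 is (6 + 0) / 2 = 3), whereas avg() over a window frame averages only the rows that exist, giving 6 for that row and NULL for each user's first row. If you want the question's numbers exactly, a sketch with LAG() and a default of 0 could look like this. It assumes MySQL 8.0+ and the points table from the question; the window name w is arbitrary.

```sql
-- LAG(expr, n, 0) returns the value n rows back within the user's partition,
-- or 0 when there is no such row, so the divisor can stay fixed at 2.
SELECT p.id, p.user_id, p.leave_bid_forms_id, p.date,
       ( LAG(p.bid_points, 1, 0) OVER w
       + LAG(p.bid_points, 2, 0) OVER w ) / 2 AS averaged_points
FROM points p
WINDOW w AS (PARTITION BY p.user_id ORDER BY p.date)
ORDER BY p.id;
```

For user 1 the points in date order are 6, 4, 10, 4, 16, which gives 0, 3, 5, 7, 7, the same values as in the question's table.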

Subtract two columns of different tables with different number of rows

How can I write a single query that will give me SUM(Entrance.quantity) - SUM(Buying.quantity) group by product_id.
The problem is rows that do not exist in the first or the second table. Is it possible to do this?
Entrance:
+----+------------+----------+
| id | product_id | quantity |
+----+------------+----------+
| 1 | 234 | 15 |
| 2 | 234 | 35 |
| 3 | 237 | 12 |
| 4 | 237 | 18 |
| 5 | 101 | 10 |
| 6 | 150 | 12 |
+----+------------+----------+
Buying:
+----+------------+----------+
| id | product_id | quantity |
+----+------------+----------+
| 1 | 234 | 10 |
| 2 | 234 | 20 |
| 3 | 237 | 10 |
| 4 | 237 | 10 |
| 5 | 120 | 15 |
+----+------------+----------+
Desired result:
+--------------+-----------------------+
| product_id | quantity_balance |
+--------------+-----------------------+
| 234 | 20 |
| 237 | 10 |
| 101 | 10 |
| 150 | 12 |
| 120 | -15 |
+--------------+-----------------------+
This is tricky, because products could be in one table but not the other. One method uses union all and group by:
select product_id, sum(quantity) as quantity_balance
from ((select e.product_id, quantity
from entrance e
) union all
(select b.product_id, - b.quantity
from buying b
)
) eb
group by product_id;
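To see the union all approach work on the sample data, here is a small self-contained script. The table definitions are an assumption (the question does not show column types); the rows are copied from the question.

```sql
-- Assumed column types; adjust to your real schema.
CREATE TABLE entrance (id INT PRIMARY KEY, product_id INT, quantity INT);
CREATE TABLE buying   (id INT PRIMARY KEY, product_id INT, quantity INT);

INSERT INTO entrance VALUES
  (1, 234, 15), (2, 234, 35), (3, 237, 12),
  (4, 237, 18), (5, 101, 10), (6, 150, 12);
INSERT INTO buying VALUES
  (1, 234, 10), (2, 234, 20), (3, 237, 10),
  (4, 237, 10), (5, 120, 15);

-- Buying quantities are negated, so one SUM per product yields the balance,
-- including products that appear in only one of the two tables.
SELECT product_id, SUM(quantity) AS quantity_balance
FROM (SELECT product_id, quantity FROM entrance
      UNION ALL
      SELECT product_id, -quantity FROM buying) eb
GROUP BY product_id;
-- Balances: 234 -> 20, 237 -> 10, 101 -> 10, 150 -> 12, 120 -> -15
-- (row order is not guaranteed without an ORDER BY)
```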
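Another approach uses CROSS APPLY, which is SQL Server syntax. Note that it starts from entrance, so products that appear only in buying (such as 120) are not returned:
SELECT e1.product_id ,
( Tmp1.enterquantity - COALESCE(Tmp2.buyquantity, 0) ) AS Quantity_balance
FROM entrance e1
CROSS APPLY ( SELECT SUM(quantity) AS enterquantity
FROM Entrance e2
WHERE e1.product_id = e2.product_id
) Tmp1
CROSS APPLY ( SELECT SUM(quantity) AS buyquantity
FROM Buying b2
WHERE e1.product_id = b2.product_id
) Tmp2
GROUP BY e1.product_id,( Tmp1.enterquantity - COALESCE(Tmp2.buyquantity, 0) )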

Select most recent MAX() and MIN() - WebSQL

I'm building an exercise web app and I'm working with two tables like this:
Table 1: weekly_stats
| id | code | type | date | time |
|----|--------------|--------------------|------------|----------|
| 1 | CC | 1 | 2015-02-04 | 19:15:00 |
| 2 | CC | 2 | 2015-01-28 | 19:15:00 |
| 3 | CPC | 1 | 2015-01-26 | 19:15:00 |
| 4 | CPC | 1 | 2015-01-25 | 19:15:00 |
| 5 | CP | 1 | 2015-01-24 | 19:15:00 |
| 6 | CC | 1 | 2015-01-23 | 19:15:00 |
| .. | ... | ... | ... | ... |
Table 2: global_stats
| id | exercise_number |correct | wrong |
|----|-----------------|--------|-----------|
| 1 | 138 | 1 | 0 |
| 2 | 246 | 1 | 0 |
| 3 | 988 | 1 | 10 |
| 4 | 13 | 5 | 0 |
| 5 | 5 | 4 | 7 |
| 6 | 5 | 4 | 7 |
| .. | ... | ... | ... |
What I would like is to get MAX(correct-wrong) and MIN(correct-wrong), and I'm currently using this query:
SELECT
exercise_number,
date,
time
FROM weekly_stats AS w JOIN global_stats AS g
ON w.id=g.id
WHERE correct - wrong = (SELECT MAX(correct - wrong) from global_stats)
UNION
SELECT
exercise_number,
date,
time
FROM weekly_stats AS w JOIN global_stats AS g
ON w.id=g.id
WHERE correct - wrong = (SELECT MIN(correct - wrong) from global_stats);
This query works well, except for one thing: when "WHERE correct - wrong = (SELECT MIN(correct - wrong)[...]" matches more than one row, the first row is returned, but I would like the most recent one (in other words, ordered by datetime(date, time)). Is that possible?
Thanks!
I think you can solve it like this:
SELECT * FROM (
SELECT
1 as sort_column,
exercise_number,
date,
time
FROM weekly_stats AS w JOIN global_stats AS g
ON w.id=g.id
WHERE correct - wrong = (SELECT MAX(correct - wrong) from global_stats)
ORDER BY date DESC, time DESC
LIMIT 1 ) as a
UNION
SELECT * FROM (
SELECT
2 as sort_column,
exercise_number,
date,
time
FROM weekly_stats AS w JOIN global_stats AS g
ON w.id=g.id
WHERE correct - wrong = (SELECT MIN(correct - wrong) from global_stats)
ORDER BY date DESC, time DESC
LIMIT 1) as b
ORDER BY sort_column;
Here is the documentation about how UNION works.

Get distinct records of two columns in a join with 6 columns

I have two MySQL tables SPONSORSHIPS and EVENTS. I want to display a list of SPONSORSHIPS sorted by the category of the events they sponsor, but to only show a sponsorship once under each event. Sample join table:
SPONSORSHIPS
sponsorhipid | sponsorid | eventid | date |
-------------|-----------|---------|------------|
1 | 3 | 20 | 06/01/2013 |
2 | 2 | 20 | 06/02/2013 |
3 | 3 | 20 | 06/03/2013 |
4 | 2 | 21 | 06/04/2013 |
EVENTS
eventid | name | premium |
--------|-----------|------------|
20 | Lunch | 0 |
21 | Dinner | 1 |
What I'd like to have as a result of the JOIN is:
sponsorhipid | sponsorid | eventid | date | name | premium |
-------------|-----------|---------|------------|---------| ---------|
1 | 3 | 20 | 06/01/2013 | Lunch | 0 |
2 | 2 | 20 | 06/02/2013 | Lunch | 0 |
4 | 2 | 21 | 06/04/2013 | Dinner | 1 |
I tried DISTINCT and GROUP BY but these collapse the events so if sponsor #2 sponsors two different events they'd still be shown only once. How can I achieve this? Here is my last SQL query:
SELECT DISTINCT (sponsorships.sponsorshipid), sponsorships.*, events.*
FROM events
INNER JOIN sponsorships
ON events.eventid = sponsorships.eventid
Thanks so much for any pointers!
You need to use nested sub-queries like this:
SELECT s.sponsorhipid, s.sponsorid, s.eventid, s.date
,e.name, e.premium
FROM EVENTS e
JOIN
(
SELECT s1.* FROM SPONSORSHIPS s1
JOIN
(
SELECT eventid, sponsorid, MIN(Date) As minDate
FROM SPONSORSHIPS
GROUP BY eventid,sponsorid
) s2
ON s1.eventid = s2.eventid
AND s1.sponsorid = s2.sponsorid
AND s1.date = s2.minDate
) s
ON e.eventid = s.eventid;
Output:
| SPONSORHIPID | SPONSORID | EVENTID | DATE | NAME | PREMIUM |
|--------------|-----------|---------|------------|--------|---------|
| 1 | 3 | 20 | 06/01/2013 | Lunch | 0 |
| 2 | 2 | 20 | 06/02/2013 | Lunch | 0 |
| 4 | 2 | 21 | 06/04/2013 | Dinner | 1 |
See this SQLFiddle
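If you are on MySQL 8.0 or later, the same "first sponsorship per sponsor and event" rule can also be written with ROW_NUMBER(). This is only a sketch: it assumes date is a DATE column (or at least sorts chronologically), and rn is just an illustrative alias.

```sql
-- Number each sponsor's rows within an event, earliest first;
-- sponsorhipid breaks ties between rows on the same date.
SELECT s.sponsorhipid, s.sponsorid, s.eventid, s.date, e.name, e.premium
FROM (SELECT s1.*,
             ROW_NUMBER() OVER (PARTITION BY s1.eventid, s1.sponsorid
                                ORDER BY s1.date, s1.sponsorhipid) AS rn
      FROM SPONSORSHIPS s1) s
JOIN EVENTS e ON e.eventid = s.eventid
WHERE s.rn = 1
ORDER BY s.eventid, s.sponsorhipid;
```

On the sample data this keeps sponsorships 1, 2 and 4. Unlike the MIN(date) join, it cannot return two rows when a sponsor has two sponsorships for the same event on the same date.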

Select rows with alternate ordered field from another table

Given a students_exam_rooms table:
+------------+---------+---------+
| student_id | room_id | seat_no |
+------------+---------+---------+
| 1 | 30 | 1001 |
| 2 | 30 | 1002 |
| 3 | 31 | 2001 |
| 4 | 32 | 2002 |
| 5 | 33 | 3001 |
| 6 | 33 | 3002 |
| 7 | 34 | 4001 |
| 8 | 34 | 4002 |
+------------+---------+---------+
And students_tbl:
+------------+-------------+------+
| student_id | studen_name | year |
+------------+-------------+------+
| 1 | Eric | 1 |
| 2 | Mustafa | 1 |
| 3 | Michael | 2 |
| 4 | Andy | 2 |
| 5 | Rafael | 3 |
| 6 | Mark | 3 |
| 7 | Jack | 4 |
| 8 | peter | 4 |
+------------+-------------+------+
How can I select from students_exam_rooms ordered by students_tbl.year, but taking one student from each year in turn, like this:
+--------------+------+
| student_name | year |
+--------------+------+
| Eric | 1 |
| Michael | 2 |
| Rafael | 3 |
| Jack | 4 |
| Mustafa | 1 |
| Andy | 2 |
| Mark | 3 |
| Peter | 4 |
+--------------+------+
I'm assuming that you want to order by the "occurrence-count" of the year then the year, e.g. all the first-occurrences of all years first, sorted by year, then all second-occurrences of all years also sorted by year, and so on. That would be a perfect case for emulating other RDBMS' analytic / windowing functions:
select *
from (
select
s.studen_name,
s.year,
ser.*,
(
select 1 + count(*)
from students_tbl s2
where s.year = s2.year
and s.student_id > s2.student_id
) as year_rank
from students_tbl s
JOIN students_exam_rooms ser
ON s.student_id = ser.student_id
) i_dont_really_want_to_name_this
order by year_rank, year
Here it is against a slightly tweaked version of JW's fiddle: http://www.sqlfiddle.com/#!2/27c91/1
Emulating Analytic (AKA Ranking) Functions with MySQL is a good article that gives more background and explanation.
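On MySQL 8.0 or later the emulation is no longer needed, because ROW_NUMBER() is available directly. A sketch of the same "occurrence within the year, then year" ordering, assuming the tables from the question (occurrence is just an illustrative alias):

```sql
-- occurrence is 1 for the first student of each year, 2 for the second, and so on.
SELECT t.studen_name, t.year
FROM (SELECT s.studen_name, s.year,
             ROW_NUMBER() OVER (PARTITION BY s.year
                                ORDER BY s.student_id) AS occurrence
      FROM students_tbl s
      JOIN students_exam_rooms ser ON ser.student_id = s.student_id) t
ORDER BY t.occurrence, t.year;
```

With the sample rows this lists Eric, Michael, Rafael, Jack first and then Mustafa, Andy, Mark, peter.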
Try either of these:
SELECT a.studen_name, a.year
FROM students_tbl a
INNER JOIN students_exam_rooms b
ON a.student_id = b.student_id
ORDER BY REVERSE(b.seat_no),
a.year
SQLFiddle Demo
Or, by using modulo:
SELECT a.studen_name, a.year
FROM students_tbl a
INNER JOIN students_exam_rooms b
ON a.student_id = b.student_id
ORDER BY CASE WHEN MOD(b.seat_no, 2) <> 0 THEN 0 ELSE 1 END,
a.year
SQLFiddle Demo
Looks to me like you're trying to sort first by seat and then by year. Looking at your students_exam_rooms table, it looks like you started with a simple seat number and prepended year * 1000. So, if we omit the year, it looks like this:
> select * from fixed_students_exam_rooms;
+------------+---------+---------+
| student_id | room_id | seat_no |
+------------+---------+---------+
| 1 | 30 | 1 |
| 2 | 30 | 2 |
| 3 | 31 | 1 |
| 4 | 32 | 2 |
| 5 | 33 | 1 |
| 6 | 33 | 2 |
| 7 | 34 | 1 |
| 8 | 34 | 2 |
+------------+---------+---------+
And if you had that table, your query would be simple:
select
studen_name, year
from
fixed_students_exam_rooms
left join students_tbl using (student_id)
order by
seat_no, year
;
Using the table as you currently have it, it's only slightly more complicated, assuming the "core seat number" doesn't exceed 999.
select
studen_name, year
from
students_exam_rooms
left join students_tbl using (student_id)
order by
convert(substr(seat_no, 2), unsigned),
year
;