How to improve complex SQL query - mysql

I have a complex SQL query. It is an analytics query over conversations from customers of a Facebook fanpage, as below:
SELECT
SeriesTime AS Time,
FP.PageID AS PageID,
COALESCE(MAX(FC.Customers), 0) AS Customers,
COALESCE(MAX(FC.Conversations), 0) AS Conversations,
COALESCE(MAX(FCM.Conversations), 0) AS UpdatedConversations,
COALESCE(MAX(Phones), 0) AS Phones,
COALESCE(MAX(Missed), 0) AS Missed,
COALESCE(MAX(FCM.MessageTypes), 0) AS MessageConversations,
COALESCE(MAX(Total), 0) AS TotalMessage,
COALESCE(AVG(ResponseTime), 0) AS ResponseTime
FROM
GENERATE_SERIES(:Start, :End, :Interval :: INTERVAL) S (SeriesTime)
CROSS JOIN (
SELECT DISTINCT PageID FROM FacebookConversations
) FP
LEFT JOIN (
SELECT
FCM.PageID,
DATE_TRUNC(:Trunc, NULLIF(CreatedTime, '')::TIMESTAMP AT TIME ZONE 'Etc/GMT+7') AS Time,
COUNT(DISTINCT FCM.ConversationID) FILTER (WHERE TotalReplied = 0) AS Missed,
COUNT(DISTINCT FCM.ConversationID) AS Conversations,
COUNT(DISTINCT CASE WHEN FCM."type" = 'message' THEN FCM.ConversationID ELSE NULL END) AS MessageTypes,
COUNT(FCM.ID) AS Total,
AVG(EXTRACT(EPOCH FROM ResponseTime)) FILTER (WHERE IsReplied) AS ResponseTime,
COUNT(DISTINCT PhoneNumber) AS Phones
FROM (
SELECT
*,
COUNT(IsReplied) FILTER (WHERE IsReplied) OVER (PARTITION BY ConversationID) AS TotalReplied
FROM (
SELECT
ID,
PageID,
type,
ConversationID,
CreatedTime,
CreatedTime::TIMESTAMP AT TIME ZONE 'Etc/GMT+7' - LAG(CreatedTime::TIMESTAMP AT TIME ZONE 'Etc/GMT+7') OVER Ordered AS ResponseTime,
COALESCE((LAG("from") OVER Ordered <> "from") AND "from" = PageID, FALSE) AS IsReplied
FROM
FacebookConversationMessages
WINDOW Ordered AS (
PARTITION BY ConversationID ORDER BY CreatedTime::TIMESTAMP AT TIME ZONE 'Etc/GMT+7'
)
) FCM
) FCM
LEFT JOIN
ConversationPhones CP
ON
CP.ConversationMessageID = FCM.ID
GROUP BY
Time,
FCM.PageID
) FCM
ON
FCM.PageID = FP.PageID
AND
Time >= SeriesTime
AND
Time < SeriesTime + :Interval :: INTERVAL
LEFT JOIN (
SELECT
PageID,
DATE_TRUNC(:Trunc, NULLIF(CreatedTime, '')::TIMESTAMP AT TIME ZONE 'Etc/GMT+7') AS CreatedAt,
COUNT(DISTINCT "from") AS customers,
COUNT(*) AS Conversations
FROM
FacebookConversations
GROUP BY
CreatedAt,
PageID,
Type
) FC
ON
FC.PageID = FP.PageID
AND
CreatedAt >= SeriesTime
AND
CreatedAt < SeriesTime + :Interval :: INTERVAL
WHERE
FP.PageID = :PageID
GROUP BY
SeriesTime,
FP.PageID
ORDER BY
FP.PageID,
SeriesTime
On my localhost (with less data) it runs quite fast and returns exactly what I want. But on the server it runs very, very slowly (it normally takes about 5 minutes to complete :(). Can anyone tell me which parts make it slow?
Thank you very much!

Related

Subtracting or Adding data based on logtime of another table

So currently I have 2 tables called listings and logs. The listings table holds a product's reference number and its current status. So suppose its status is currently Publish and it gets sold, the status updates to Sold. The refno in this table is unique, since only the status changes for a given product.
Now I have another table called the logs table, which records all the status changes that have happened for a particular product (referenced by refno) in a particular timeframe. Suppose the product with refno 5 was set to Publish on 1st October and Sold on 2nd October; the logs table will show:
|Refno|status_from|status_to|logtime|
|5|Stock|Publish|2021-10-01|
|5|Publish|Sold|2021-10-02|
This is what my tables currently look like:
Listings table:('D'=>'Draft','N'=>'Action','Y'=>'Publish')
Logs Table which I'm getting using the following statement:
SELECT refno, logtime, status_from, status_to FROM (
SELECT refno, logtime, status_from, status_to, ROW_NUMBER() OVER(PARTITION BY refno ORDER BY logtime DESC)
AS RN FROM crm_logs WHERE logtime < '2021-10-12 00:00:00' ) r
WHERE r.RN = 1 UNION SELECT refno, logtime, status_from, status_to
FROM crm_logs WHERE logtime <= '2021-10-12 00:00:00' AND logtime >= '2015-10-02 00:00:00'
ORDER BY `refno` ASC
The logs table gets a new record for every status change, with the current timestamp as the logtime, and the listings table updates the status and its update_date. Now to get the total listings as of today I'm using the following statement:
SELECT SUM(status_to = 'D') AS draft, SUM(status_to = 'N') AS action, SUM(status_to = 'Y') AS publish FROM `crm_listings`
And this returns all the count data for status as of the current day.
Now this is where it gets confusing for me. Suppose today the count under Action is 10 and tomorrow it will be 15, and tomorrow I want to retrieve the total that was present the day before (10). For this I would have to take the current total (15) and subtract every log entry where a product was changed to Action between the two days (total count today in the listings table - count(*) where status_to = 'Action' in the logs table). Or vice versa: if yesterday it was 10 under Action and today it is 5, it should add back the entries from the status_from column in the logs table.
Note: Refno isn't unique in my logs table since a product with the same refno can be marked as publish 1 day and unpublish another, but it is unique in my listings table.
Link to dbfiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=01cb3ccfda09f6ddbbbaf02ec92ca894
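The arithmetic described above (current count, minus every transition into a status since the cutoff, plus every transition out of it) can be sketched in one query. This is only a minimal sketch under stated assumptions: it runs against an in-memory SQLite database through Python so it is self-contained, the demo rows are made up, and it assumes both tables store the same status labels (the real listings table stores codes such as 'D' and 'Y', which would first need a CASE mapping) and that the logs are complete. The function names counts_as_of and demo_connection are hypothetical.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listings (refno INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE logs (refno INTEGER, status_from TEXT,
                           status_to TEXT, logtime TEXT);
        INSERT INTO listings VALUES
            (1, 'Publish'), (2, 'Action'), (3, 'Action'), (4, 'Draft');
        INSERT INTO logs VALUES
            (1, 'Stock',   'Publish', '2021-09-20 09:00:00'),
            (3, 'Publish', 'Draft',   '2021-10-03 10:00:00'),
            (2, 'Draft',   'Action',  '2021-10-05 11:00:00'),
            (3, 'Draft',   'Action',  '2021-10-06 12:00:00');
    """)
    return conn


def counts_as_of(conn, cutoff):
    """Return {status: count} as it stood just before `cutoff`."""
    sql = """
        WITH deltas AS (
            -- start from the current totals in listings
            SELECT status, 1 AS delta FROM listings
            UNION ALL
            -- undo every transition INTO a status since the cutoff
            SELECT status_to, -1 FROM logs WHERE logtime >= :cutoff
            UNION ALL
            -- restore every transition OUT OF a status since the cutoff
            SELECT status_from, 1 FROM logs WHERE logtime >= :cutoff
        )
        SELECT status, SUM(delta) FROM deltas GROUP BY status ORDER BY status
    """
    return dict(conn.execute(sql, {"cutoff": cutoff}).fetchall())


if __name__ == "__main__":
    print(counts_as_of(demo_connection(), "2021-10-01 00:00:00"))
```

Intermediate changes cancel out on their own: a product that went Publish, then Draft, then Action contributes -1 and +1 to Draft, so only its first status_from and last status_to survive in the sum.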
I am sure it can be simplified or improved, but here is my query and logic:
I found the status changes per refno and calculated the total changes from the desired day to the present:
select status_logs, sum(cnt_status) to_add from (
SELECT
status_to as status_logs, -1*count(*) cnt_status
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT
status_from, count(*) cnt_status_from
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_from ) total_changes
group by status_logs
I matched the keys between the listings table and the logs table by converting the listings table keys:
select
case status
when 'D' THEN 'Draft'
when 'A' THEN 'Action'
when 'Y' THEN 'Publish'
when 'S' THEN 'Sold'
when 'N' THEN 'Let'
END status_l ,COUNT(*) c
from listings
group by status
I joined them and added the calculated changes to the totals of the current data.
MySQL has no full outer join, so I emulated one with a left join and a right join over the same subqueries.
Lastly I used distinct, since both joined queries produce the same rows for matching statuses, and ifnull to fall back to the other table's status when one side is missing.
select distinct IFNULL(status_l, status_logs) status, counts_at_2021_10_01
from (select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
left join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l
union all
select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
right join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l) l

Why integer cast is not working with integer group_concat() list?

I'm stuck on a query where I need to concatenate the IDs of a table and then, in a subquery, fetch the rows matching that group of IDs. But when I try to do so, MySQL treats the group_concat() result as a single string, so the condition is false.
select count(*)
from rides r
where r.ride_status = 'cancelled'
and r.id IN (group_concat(rides.id))
*************** Original Query Below **************
-- Daily Earnings for 7 days [Final]
select
group_concat(rides.id) as ids,
group_concat(ride_category.name) as rideType,
group_concat(ride_cars.amount + ride_cars.commission) as rideAmount ,
group_concat(ride_types.name) as carType,
count(*) as numberOfRides,
(
select count(*) from rides r where r.ride_status = 'cancelled' and r.id IN (group_concat(rides.id) )
) as cancelledRides,
(
select count(*) from rides r where r.`ride_status` = 'completed' and r.id IN (group_concat(rides.id))
) as completedRides,
group_concat(ride_cars.status) as status,
sum(ride_cars.commission) + sum(ride_cars.amount) as amount,
date_format(from_unixtime(rides.requested_at/1000 + rides.offset*60), '%Y-%m-%d') as requestedDate,
date_format(from_unixtime(rides.requested_at/1000 + rides.offset*60), '%V') as week
from
ride_cars,
rides,
ride_category,
ride_type_cars,
ride_types
where
ride_cars.user_id = 166
AND (rides.ride_status = 'completed' or rides.ride_status = 'cancelled')
AND ride_cars.ride_id = rides.id
AND (rides.requested_at >= 1559347200000 AND requested_at < 1561852800000)
AND rides.ride_category = ride_category.id
AND ride_cars.car_model_id = ride_type_cars.car_model_id
AND ride_cars.ride_type_id = ride_types.id
group by
requestedDate;
Any solutions will be appreciated.
Try replacing the sub-query
(select count(*) from rides r where r.ride_status = 'cancelled' and r.id IN (group_concat(rides.id) )) as cancelledRides,
with the expression below, which counts using SUM and CASE and so makes use of the GROUP BY:
SUM(CASE WHEN rides.ride_status = 'cancelled' THEN 1 ELSE 0 END) as cancelledRides
Do the same for completedRides.
Also move to explicit JOINs instead of implicit (comma) joins.
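Both suggestions together (conditional aggregation plus an explicit JOIN) might look roughly like the sketch below. It is a cut-down illustration, not the full original query: it runs on an in-memory SQLite database through Python so it is self-contained, the rows are invented, and requested_date is a hypothetical precomputed column standing in for the date_format(from_unixtime(...)) expression. The function names are hypothetical too.

```python
import sqlite3


def demo_connection():
    """Two-table subset of the schema with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE rides (id INTEGER PRIMARY KEY, ride_status TEXT,
                            requested_date TEXT);
        CREATE TABLE ride_cars (ride_id INTEGER, amount INTEGER,
                                commission INTEGER);
        INSERT INTO rides VALUES
            (1, 'completed', '2019-06-01'),
            (2, 'cancelled', '2019-06-01'),
            (3, 'completed', '2019-06-02'),
            (4, 'pending',   '2019-06-02');
        INSERT INTO ride_cars VALUES
            (1, 100, 10), (2, 50, 5), (3, 80, 8), (4, 20, 2);
    """)
    return conn


def daily_summary(conn):
    """One row per day: total, cancelled and completed rides, and amount."""
    sql = """
        SELECT r.requested_date,
               COUNT(*) AS numberOfRides,
               -- each CASE yields 1 only for matching rows, so SUM counts them
               SUM(CASE WHEN r.ride_status = 'cancelled' THEN 1 ELSE 0 END)
                   AS cancelledRides,
               SUM(CASE WHEN r.ride_status = 'completed' THEN 1 ELSE 0 END)
                   AS completedRides,
               SUM(rc.amount + rc.commission) AS amount
        FROM ride_cars AS rc
        JOIN rides AS r ON r.id = rc.ride_id   -- explicit join condition
        WHERE r.ride_status IN ('completed', 'cancelled')
        GROUP BY r.requested_date
        ORDER BY r.requested_date
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in daily_summary(demo_connection()):
        print(row)
```

Because the counts are computed inside the same GROUP BY, there is no need to pass a list of IDs to a correlated subquery at all.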

user defined variable to store ranking gives wrong values if order by

mysql table: work
|id|user_id|created_at|realization|
I have been working on an SQL query that calculates performance (realization today / realization on the first day of the month) and sorts records by performance.
Expected result:
|ranking|performance|user|
|1|0.88|36|
|2|0.712444111|444|
|3|0.711|1|
|4|0.33333|9|
|5|0.1006|29|
returned result:
|ranking|performance|user|
|4|0.88|36|
|2|0.712444111|444|
|5|0.711|1|
|3|0.33333|9|
|1|0.1006|29|
Here is my query:
SET @ranking := 0;
SELECT
@ranking := @ranking + 1 as ranking,
w1.user_id,
IFNULL(ROUND(w2.realization / w1.realization, 4), 0) AS performance
FROM work w1
JOIN (
SELECT min(created_at) AS first_month, max(created_at) AS last_month, user_id
FROM work
WHERE DATE_FORMAT(NOW(), '%Y-%m') = DATE_FORMAT(created_at, '%Y-%m')
GROUP BY user_id
ORDER BY user_id
) AS w ON w1.user_id = w.user_id AND w1.created_at = w.first_month
JOIN work AS w2 ON w1.user_id = w2.user_id AND w2.created_at = w.last_month
ORDER BY performance DESC
UPDATE
Even if I try to wrap it this way, the rankings are not right
SET @ranking := 0;
SELECT @ranking := @ranking + 1 as ranking, a.user_id, a.performance
FROM (
SELECT
w1.user_id,
IFNULL(ROUND(w2.realization / w1.realization, 4), 0) AS performance
FROM work w1
JOIN (
SELECT min(created_at) AS first_month, max(created_at) AS last_month, user_id
FROM work
WHERE DATE_FORMAT(NOW(), '%Y-%m') = DATE_FORMAT(created_at, '%Y-%m')
GROUP BY user_id
ORDER BY user_id
) AS w ON w1.user_id = w.user_id AND w1.created_at = w.first_month
JOIN work AS w2 ON w1.user_id = w2.user_id AND w2.created_at = w.last_month
ORDER BY performance DESC
) AS a
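A likely cause is that MySQL does not guarantee the order in which user-variable assignments are evaluated relative to ORDER BY, so the counter can be incremented before the rows are sorted. One way to sidestep this, assuming MySQL 8.0+ (or any engine with window functions), is to swap the counter for ROW_NUMBER(). The sketch below is not the original query: it runs on an in-memory SQLite database through Python so it is self-contained, the rows are invented, the month filter is written as a date range instead of DATE_FORMAT, and it assumes one row per user per created_at and a non-zero first realization. The name monthly_ranking is hypothetical.

```python
import sqlite3

RANKING_SQL = """
WITH bounds AS (
    -- first and last entry of each user inside the month
    SELECT user_id, MIN(created_at) AS first_day, MAX(created_at) AS last_day
    FROM work
    WHERE created_at >= :month_start AND created_at < :next_month_start
    GROUP BY user_id
),
perf AS (
    SELECT b.user_id,
           ROUND(1.0 * w2.realization / w1.realization, 4) AS performance
    FROM bounds AS b
    JOIN work AS w1 ON w1.user_id = b.user_id AND w1.created_at = b.first_day
    JOIN work AS w2 ON w2.user_id = b.user_id AND w2.created_at = b.last_day
)
-- the rank is computed from the sorted values, not from a running counter
SELECT ROW_NUMBER() OVER (ORDER BY performance DESC) AS ranking,
       user_id, performance
FROM perf
ORDER BY ranking
"""


def demo_connection():
    """Made-up rows for three users (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work (id INTEGER PRIMARY KEY, user_id INTEGER,"
                 " created_at TEXT, realization REAL)")
    conn.executemany(
        "INSERT INTO work (user_id, created_at, realization) VALUES (?, ?, ?)",
        [(36, "2024-05-01", 100), (36, "2024-05-20", 88),
         (1, "2024-05-01", 1000), (1, "2024-05-15", 900), (1, "2024-05-20", 711),
         (9, "2024-04-30", 999), (9, "2024-05-02", 300), (9, "2024-05-20", 100)])
    return conn


def monthly_ranking(conn, month_start, next_month_start):
    """Return (ranking, user_id, performance) rows, best performance first."""
    params = {"month_start": month_start, "next_month_start": next_month_start}
    return conn.execute(RANKING_SQL, params).fetchall()


if __name__ == "__main__":
    for row in monthly_ranking(demo_connection(), "2024-05-01", "2024-06-01"):
        print(row)
```

If ties should share a rank, RANK() or DENSE_RANK() can replace ROW_NUMBER() in the final SELECT.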

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
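For comparison, the same streak can be computed without user variables by using the "gaps and islands" trick: number the qualifying days in order and subtract that number from the day number, which stays constant within a run of consecutive days. This is a different technique from the query above, sketched under assumptions: it needs window functions (MySQL 8.0+), it runs here on an in-memory SQLite database through Python so it is self-contained, the rows are invented, and julianday() is SQLite-specific (in MySQL, TO_DAYS(d) would play the same role). The function names are hypothetical.

```python
import sqlite3

STREAK_SQL = """
WITH good_days AS (
    -- days on which the user logged the beverage often enough
    SELECT DATE(timestamp) AS d
    FROM beverages_log
    WHERE users_id = :user AND beverages_id = :bev
    GROUP BY DATE(timestamp)
    HAVING COUNT(*) >= :min_per_day
),
numbered AS (
    -- day number minus row number is constant within consecutive days
    SELECT d,
           CAST(julianday(d) AS INTEGER)
             - ROW_NUMBER() OVER (ORDER BY d) AS grp
    FROM good_days
)
SELECT COALESCE(MAX(streak_len), 0)
FROM (SELECT COUNT(*) AS streak_len FROM numbered GROUP BY grp) AS streaks
"""


def demo_connection():
    """User 1 logs beverage 1 on six days; only 2021-01-04 falls short of 5."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE beverages_log (id INTEGER PRIMARY KEY,"
                 " users_id INTEGER, beverages_id INTEGER, timestamp TEXT)")
    per_day = [("2021-01-01", 5), ("2021-01-02", 6), ("2021-01-03", 5),
               ("2021-01-04", 4), ("2021-01-05", 5), ("2021-01-06", 5)]
    for day, n in per_day:
        conn.executemany(
            "INSERT INTO beverages_log (users_id, beverages_id, timestamp)"
            " VALUES (1, 1, ?)",
            [(f"{day} 12:00:00",)] * n)
    return conn


def longest_streak(conn, user_id, beverage_id, min_per_day=5):
    """Length in days of the longest run of qualifying consecutive days."""
    params = {"user": user_id, "bev": beverage_id, "min_per_day": min_per_day}
    return conn.execute(STREAK_SQL, params).fetchone()[0]


if __name__ == "__main__":
    print(longest_streak(demo_connection(), 1, 1))
```

The user and beverage are plain parameters here, so nothing has to be recreated per user.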
Why not include user_id in the daycounts view and group by user_id and date?
Also include user_id in view t.
Then, when you are querying against t, add the user_id to the where clause.
That way you don't have to recreate your views for every single user; you just need to remember to include the user in your where clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID

Union with Count OR Join with Sum - MySQL

I want to combine three tables - date, lead and click - in a query.
The tables look like this:
date:
|date|
lead:
id|time|commission
click:
id|time|commission
The table date is just storing dates and is used when getting dates with no click or lead.
So if we have the following data in the tables:
date:
2009-06-01
2009-06-02
2009-06-03
lead:
1|2009-06-01|400
2|2009-06-01|300
3|2009-06-03|350
click:
1|2009-06-01|1
2|2009-06-03|2
3|2009-06-03|2
4|2009-06-03|0
I would like to get date, number of click, commission generated by clicks (there are clicks that don't give commission), number of leads, commission generated by leads and total commission. So with the tables above I would like to get:
2009-06-01|1|1|2|700|701
2009-06-02|0|0|0|0|0
2009-06-03|3|4|1|350|354
I have tried with the following union:
SELECT
campaign_id,
commission_date,
SUM( click_commission ) AS click_commission,
click,
SUM( lead_commission ) AS lead_commission ,
lead,
SUM( total_commission ) as total_commission
FROM(
SELECT
click.campaign_id AS campaign_id,
DATE( click.time ) AS commission_date,
click.commission AS click_commission,
(SELECT count(click.id) from click GROUP BY date(click.time)) as click,
0 as lead_commission,
0 as lead,
click.commission AS total_commission
FROM click
UNION ALL
SELECT
lead.campaign_id AS campaign_id,
DATE( lead.time ) AS commission_date,
0 as click_commission,
0 as click,
lead.commission AS lead_commission,
lead.id as lead,
lead.commission AS total_commission
FROM lead
UNION ALL
SELECT
0 AS campaign_id,
date.date AS commission_date,
0 AS click_commission,
0 as click,
0 AS lead_commission,
0 as lead,
0 AS total_commission
FROM date
) AS foo
WHERE commission_date BETWEEN '2009-06-01' AND '2009-07-25'
GROUP BY commission_date
ORDER BY commission_date LIMIT 0, 10
But this does not count both the clicks and the leads: the code above gives the right number of clicks but 0 for all leads. If I move the code around and put the select from the lead table first, I get the leads right but 0 for all clicks. I have not been able to find a way to get both counts from the query.
So I tried a left-join instead:
SELECT
date.date as date,
count( DISTINCT click.id ) AS clicks,
sum(click.commission) AS click_commission,
count( lead.id ) AS leads,
sum(lead.commission) AS lead_commission
FROM date
LEFT JOIN click ON ( date.date = date( click.time ) )
LEFT JOIN lead ON ( date.date = date( lead.time ) )
GROUP BY date.date
LIMIT 0 , 30
The problem with this query is that if there is more than one click or lead on a date, it will return the expected value * 2. So on 2009-06-01 it will return 1400 instead of the expected 700 for lead commission.
So in the UNION I have problems with the count and in the left join it is the SUM that is not working.
I would really like to stick to the UNION if possible, but I haven't found a way to get both counts from it.
(This is a follow up to this earlier question, but since I didn't ask for the count in that I posted a new question.)
SELECT d.date,
COALESCE(lcomm, 0), COALESCE(lcnt, 0),
COALESCE(ccomm, 0), COALESCE(ccnt, 0),
COALESCE(ccomm, 0) + COALESCE(lcomm, 0),
COALESCE(ccnt, 0) + COALESCE(lcnt, 0)
FROM date d
LEFT JOIN
(
SELECT date, SUM(commission) AS lcomm, COUNT(*) AS lcnt
FROM leads
GROUP BY
date
) l
ON l.date = d.date
LEFT JOIN
(
SELECT date, SUM(commission) AS ccomm, COUNT(*) AS ccnt
FROM clicks
GROUP BY
date
) c
ON c.date = d.date
The code that I used, built from the suggestion from Quassnoi:
SELECT date,
COALESCE(ccomm, 0) AS click_commission, COALESCE(ccnt, 0) AS click_count,
COALESCE(lcomm, 0) AS lead_commision, COALESCE(lcnt, 0) AS lead_count,
COALESCE(ccomm, 0) + COALESCE(lcomm, 0) as total_commission
FROM date d
LEFT JOIN
(
SELECT DATE(time) AS lead_date, SUM(commission) AS lcomm, COUNT(*) AS lcnt
FROM lead
GROUP BY
lead_date
) l
ON lead_date = date
LEFT JOIN
(
SELECT DATE(time) AS click_date, SUM(commission) AS ccomm, COUNT(*) AS ccnt
FROM click
GROUP BY
click_date
) c
ON click_date = date
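To see why the two-LEFT-JOIN attempt inflates the sums while the pre-aggregated version does not, here is a small reproduction using the sample rows from the question. It is a sketch under assumptions: it runs on an in-memory SQLite database through Python so it is self-contained, and the tables are named dates, leads and clicks (with a day column on dates) purely for this illustration.

```python
import sqlite3

# Joining both detail tables to the date pairs every click with every lead
# of the same day, so SUM and COUNT see each row several times.
NAIVE_SQL = """
SELECT d.day, COUNT(DISTINCT c.id), SUM(c.commission),
       COUNT(l.id), SUM(l.commission)
FROM dates AS d
LEFT JOIN clicks AS c ON DATE(c.time) = d.day
LEFT JOIN leads  AS l ON DATE(l.time) = d.day
GROUP BY d.day
ORDER BY d.day
"""

# Aggregating each table to one row per day first keeps the joins one-to-one.
PREAGG_SQL = """
SELECT d.day,
       COALESCE(c.ccnt, 0), COALESCE(c.ccomm, 0),
       COALESCE(l.lcnt, 0), COALESCE(l.lcomm, 0),
       COALESCE(c.ccomm, 0) + COALESCE(l.lcomm, 0)
FROM dates AS d
LEFT JOIN (SELECT DATE(time) AS day, COUNT(*) AS ccnt, SUM(commission) AS ccomm
           FROM clicks GROUP BY DATE(time)) AS c ON c.day = d.day
LEFT JOIN (SELECT DATE(time) AS day, COUNT(*) AS lcnt, SUM(commission) AS lcomm
           FROM leads GROUP BY DATE(time)) AS l ON l.day = d.day
ORDER BY d.day
"""


def demo_connection():
    """Load the sample rows listed in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dates (day TEXT);
        CREATE TABLE leads (id INTEGER, time TEXT, commission INTEGER);
        CREATE TABLE clicks (id INTEGER, time TEXT, commission INTEGER);
        INSERT INTO dates VALUES ('2009-06-01'), ('2009-06-02'), ('2009-06-03');
        INSERT INTO leads VALUES
            (1, '2009-06-01', 400), (2, '2009-06-01', 300), (3, '2009-06-03', 350);
        INSERT INTO clicks VALUES
            (1, '2009-06-01', 1), (2, '2009-06-03', 2),
            (3, '2009-06-03', 2), (4, '2009-06-03', 0);
    """)
    return conn


def run(conn, sql):
    """Execute one of the queries above and return all rows."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = demo_connection()
    print("naive:", run(conn, NAIVE_SQL))
    print("pre-aggregated:", run(conn, PREAGG_SQL))
```

With these rows the naive query reports a lead commission of 1050 for 2009-06-03, because the single lead is repeated once per click, while the pre-aggregated query returns the three rows the question asks for.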