MySQL time calculation with join

I have two tables: sales, actions
Sales table:
id, datetime, status
--------------------
Actions table:
id, datetime, sales_id, action
------------------------------
There's a many-to-one relationship between the actions and sales tables. For each sales record, there could be numerous actions. I am trying to determine, for each hour of the day, the average time difference between when a sales record is created and when the first action record associated with that sales record is created.
In other words, how fast (in hours) are sales agents responding to leads, based on what hour of the day the lead came in.
Here's what I tried:
SELECT
FROM_UNIXTIME(sales.datetime, '%H') as Hour,
count(actions.id) AS actions,
(MIN(actions.datetime) - sales.datetime) / 3600 as Lag
FROM
actions
INNER JOIN sales ON actions.sales_id = sales.id
group by Hour
I get what looks like reasonable hours numbers for 'Lag', but I am not convinced they're accurate:
Hour Actions Lag
00 66 11.0442
01 30 11.2758
02 50 8.2900
03 25 5.7492
.
.
.
23 77 34.4744
My question is: is this the correct way to get the values for the first action that was recorded for a given sales record?
(MIN(actions.datetime) - sales.datetime) / 3600 as Lag

It should be:
MIN(actions.datetime - sales.datetime) / 3600 AS Lag
Your way is getting the first action from any sale within the hour, and subtracting each sale's timestamp from that timestamp. You need to do the subtraction only between actions and sales that are joined by the ID.

This query has two layers, and it's helpful to crawl through them both.
The lowest layer should compute the lag time from sales.datetime to the earliest action.datetime for each row of sales. That will probably use a MIN() function.
The next layer will compute the statistics for those lag times, worked out in the lowest layer, by hour of the day. That will use an AVG() function.
Here's the lowest layer:
SELECT s.id, s.datetime, s.status,
TIMESTAMPDIFF(SECOND, s.datetime, MIN(a.datetime)) AS lag_seconds
FROM sales AS s
JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
The second part of that ON clause makes sure that you only consider actions taken after the sales order was entered. It may be unnecessary, but I thought I'd throw it in.
Here's the second layer.
SELECT HOUR(datetime) AS hour_Sale_entered,
COUNT(*) AS number_in_that_hour,
AVG(lag_seconds) / 3600.0 AS Lag_to_first_action
FROM (
SELECT s.id, s.datetime, s.status,
TIMESTAMPDIFF(SECOND, s.datetime, MIN(a.datetime)) AS lag_seconds
FROM sales AS s
JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
) AS d
GROUP BY HOUR(datetime)
ORDER BY HOUR(datetime)
See how there are two nested aggregation (GROUP BY) operations? The inner one identifies the first action, and the outer one does the hourly averaging.
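The two layers can be tried out on a toy data set. The following is a minimal sketch, not the exact MySQL query: it assumes SQLite (through Python's sqlite3 module) as a stand-in, stores the datetime columns as Unix timestamps as in the question, and uses made-up rows. The inner query finds the first-action lag per sale; the outer query averages those lags per hour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales   (id INTEGER PRIMARY KEY, datetime INTEGER, status TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, datetime INTEGER, sales_id INTEGER, action TEXT);
-- Hypothetical data: Unix timestamps in seconds (UTC).
INSERT INTO sales VALUES (1, 3600, 'open'), (2, 5400, 'open'), (3, 7200, 'open');
INSERT INTO actions VALUES
  (1, 3600 + 1800, 1, 'call'),   -- sale 1: first action after 0.5 h
  (2, 3600 + 9000, 1, 'email'),  -- sale 1: later action, must not count
  (3, 5400 + 5400, 2, 'call'),   -- sale 2: first action after 1.5 h
  (4, 7200 + 7200, 3, 'call');   -- sale 3: first action after 2.0 h
""")

lag_by_hour = conn.execute("""
    SELECT strftime('%H', d.datetime, 'unixepoch') AS hour_sale_entered,
           COUNT(*)                                AS number_in_that_hour,
           AVG(d.lag_seconds) / 3600.0             AS lag_to_first_action
    FROM (
        -- Inner layer: one row per sale with the lag to its earliest action.
        SELECT s.id, s.datetime, MIN(a.datetime) - s.datetime AS lag_seconds
        FROM sales AS s
        JOIN actions AS a ON a.sales_id = s.id AND a.datetime > s.datetime
        GROUP BY s.id, s.datetime
    ) AS d
    -- Outer layer: average those per-sale lags by hour of day.
    GROUP BY hour_sale_entered
    ORDER BY hour_sale_entered
""").fetchall()

print(lag_by_hour)
```

Sales 1 and 2 both arrive in hour 01 with first-action lags of 0.5 and 1.5 hours, so that hour averages 1.0; the later email on sale 1 does not affect the result.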
One more tidbit. If you want to include sales items that have not yet been acted on, you can do this:
SELECT HOUR(datetime) AS hour_Sale_entered,
COUNT(*) AS number_in_that_hour,
SUM(no_action) AS not_acted_upon_yet,
AVG(lag_seconds) / 3600.0 AS Lag_to_first_action
FROM (
SELECT s.id, s.datetime, s.status,
TIMESTAMPDIFF(SECOND, s.datetime, MIN(a.datetime)) AS lag_seconds,
IF(COUNT(a.id) = 0, 1, 0) AS no_action
FROM sales AS s
LEFT JOIN actions AS a ON s.id = a.sales_id AND a.datetime > s.datetime
GROUP BY s.id, s.datetime, s.status
) AS d
GROUP BY HOUR(datetime)
ORDER BY HOUR(datetime)
The average of lag_seconds will still be correct, because the sales rows with no action rows will have NULL values for that, and AVG() ignores nulls.
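To see the NULL handling in action, here is a small sketch under the same assumptions as before (SQLite via Python's sqlite3 as a stand-in for MySQL, Unix-timestamp columns, invented rows). One sale has an action and one has none; the sale without an action is counted but does not pull the average down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales   (id INTEGER PRIMARY KEY, datetime INTEGER, status TEXT);
CREATE TABLE actions (id INTEGER PRIMARY KEY, datetime INTEGER, sales_id INTEGER, action TEXT);
-- Hypothetical data: both sales arrive in hour 01; only sale 1 has an action.
INSERT INTO sales VALUES (1, 3600, 'open'), (2, 3700, 'open');
INSERT INTO actions VALUES (1, 5400, 1, 'call');  -- 1800 s after sale 1
""")

with_unacted = conn.execute("""
    SELECT strftime('%H', d.datetime, 'unixepoch') AS hour_sale_entered,
           COUNT(*)                    AS number_in_that_hour,
           SUM(d.no_action)            AS not_acted_upon_yet,
           AVG(d.lag_seconds) / 3600.0 AS lag_to_first_action
    FROM (
        SELECT s.id, s.datetime,
               MIN(a.datetime) - s.datetime AS lag_seconds,  -- NULL when no action
               COUNT(a.id) = 0              AS no_action     -- 1 when no action, else 0
        FROM sales AS s
        LEFT JOIN actions AS a ON a.sales_id = s.id AND a.datetime > s.datetime
        GROUP BY s.id, s.datetime
    ) AS d
    GROUP BY hour_sale_entered
""").fetchall()

print(with_unacted)
```

The average is 0.5 hours (1800 s from the one acted-upon sale), not 0.25, because AVG() skips the NULL lag of the second sale.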

Related

Retrieving top company for each quarter and corresponding revenue

Company_name Quarter Year Revenue
TCS          Q1      2001 50
CTS          Q2      2010 60
ZOHO         Q2      2007 70
CTS          Q4      2015 90
This is my sample table where I store the names of the companies, quarters of the years, years and revenue for each year per a certain quarter.
I want to find the company with top revenue for each quarter, regardless of the year, and display its revenue too.
In the above case the resultant output should be something like this:
QUARTER COMPANY_NAME REVENUE
Q1      TCS          50
Q2      ZOHO         70
Q4      CTS          90
Here's what I've tried:
SELECT DISTINCT(C1.QUARTER),
C1.REVENUE
FROM COMPANY_REVENUE C1,
COMPANY_REVENUE C2
WHERE C1.REVENUE = GREATEST(C1.REVENUE, C2.REVENUE);
There are a couple of problems in your query, among them:
the DISTINCT keyword applies to whole rows, not to single fields,
the self join should be explicit and, most importantly, it requires a matching condition defined by an ON clause (e.g. SELECT ... FROM tab1 JOIN tab2 ON tab1.field = tab2.field WHERE ...).
That said, you can probably solve your problem in another way.
Approach for MySQL 8.0
One way of computing values on partitions (in your case you want to partition on quarters only) is using window functions. In this specific case you can use ROW_NUMBER, which numbers the rows of each partition in descending order of revenue. Since you want the highest revenue for each quarter, you can select the rows whose row number equals 1.
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY Quarter
ORDER BY Revenue DESC
) AS rn
FROM tab
)
SELECT Quarter,
Company_name,
Revenue
FROM cte
WHERE rn = 1
Approach for MySQL 5.7
In this case you can use an aggregation function. As long as you want your max "Revenue" for each "Quarter", you need first to select the maximum value for each "Quarter", then you need to join back to your original table on two conditions:
table's quarter matches subquery quarter,
table's revenue matches subquery max revenue
SELECT tab.Quarter,
tab.Company_name,
tab.Revenue
FROM tab
INNER JOIN (SELECT Quarter,
MAX(Revenue) AS Revenue
FROM tab
GROUP BY Quarter ) max_revenues
ON tab.Quarter = max_revenues.Quarter
AND tab.Revenue = max_revenues.Revenue
Note: the second solution finds, for each quarter, every company that has the maximum revenue for that quarter, which means that if two or more companies tie for the maximum, all of them will be returned. This won't happen with the first solution, because the ranking keeps only one row per quarter (the one ranked 1).
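The difference on ties can be checked with a quick sketch. It assumes SQLite 3.25 or newer (for window functions) through Python's sqlite3 as a stand-in for MySQL 8.0, and it adds one hypothetical row (INFY, Q2, 70) to the sample data so that Q2 has a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (Company_name TEXT, Quarter TEXT, Year INTEGER, Revenue INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("TCS", "Q1", 2001, 50),
        ("CTS", "Q2", 2010, 60),
        ("ZOHO", "Q2", 2007, 70),
        ("CTS", "Q4", 2015, 90),
        ("INFY", "Q2", 2012, 70),  # hypothetical extra row: ties with ZOHO in Q2
    ],
)

# Window-function approach: exactly one row per quarter, tie broken arbitrarily.
rows_ranked = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY Quarter ORDER BY Revenue DESC) AS rn
        FROM tab
    )
    SELECT Quarter, Company_name, Revenue FROM cte WHERE rn = 1 ORDER BY Quarter
""").fetchall()

# MAX + join-back approach: every company that reaches the quarter's maximum.
rows_joined = conn.execute("""
    SELECT tab.Quarter, tab.Company_name, tab.Revenue
    FROM tab
    INNER JOIN (SELECT Quarter, MAX(Revenue) AS Revenue
                FROM tab
                GROUP BY Quarter) AS max_revenues
            ON tab.Quarter = max_revenues.Quarter
           AND tab.Revenue = max_revenues.Revenue
    ORDER BY tab.Quarter, tab.Company_name
""").fetchall()

print(len(rows_ranked), len(rows_joined))
```

If all tied companies should appear with the window-function approach too, RANK() in place of ROW_NUMBER() gives tied rows the same rank.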
You can just use a cte:
with x as (
select Quarter, max(Revenue) as Revenue
from test
group by Quarter
)
select t.Company_name, x.Quarter, x.Revenue
from x
join test t
on x.Revenue = t.Revenue
and t.Quarter = x.Quarter;
First I select the max Revenue grouped by Quarter, then I join back to the table on the returned max(Revenue). As @lemon pointed out in the comments, that alone is not enough: if the same revenue value also appears in a different quarter, the join returns extra rows.
That's why I need to add the join on Quarter, so that only rows from the matching quarter are returned.
If you're using a version of MySQL that doesn't support CTEs, you can use a subquery instead:
select t.Company_name, x.Quarter, x.Revenue
from
(
select Quarter, max(Revenue) as Revenue
from test
group by Quarter
) x
join test t
on x.Quarter = t.Quarter
and x.Revenue = t.Revenue;
Try this,
SELECT quarter, company_name,max(revenue) FROM table_name GROUP BY quarter

MYSQL Finding how many users transacted per month

I have been tasked to find how many users performed a transaction in every month in 2020.
I know I have two tables to work with.
Table Name: Receipts|Columns: receipt_id, collection_id, user_id, amount
Table Name: Games |Columns: game_id, collection_id, game_date_time
I tried this, but I don't think it makes sense or works:
select month(games.game_date_time) AS Month, sum(receipts.id) from bills
join games on bills.collection_id = games.collection_id
WHERE YEAR(games.game_date_time) = 2020
group by receipts.user_id, month(games.game_date_time)
order by month(games.game_date_time)
Use COUNT() to get a count, not SUM(). And if you want a count of users, without counting the same user twice, use COUNT(DISTINCT user_id), don't put user_id in the grouping.
SELECT MONTH(g.game_date_time) AS month, COUNT(DISTINCT r.user_id) AS users
FROM receipts AS r
JOIN games AS g ON r.collection_id = g.collection_id
WHERE YEAR(g.game_date_time) = 2020
GROUP BY month
ORDER BY month
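A small sketch shows why DISTINCT matters here. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL (so strftime replaces MONTH() and YEAR()), and the rows are invented: user 100 has two receipts in January, which a plain row count would count twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (receipt_id INTEGER, collection_id INTEGER, user_id INTEGER, amount REAL);
CREATE TABLE games    (game_id INTEGER, collection_id INTEGER, game_date_time TEXT);
-- Hypothetical data.
INSERT INTO games VALUES
  (1, 10, '2020-01-05 20:00:00'),
  (2, 11, '2020-01-20 20:00:00'),
  (3, 12, '2020-02-01 20:00:00'),
  (4, 13, '2019-12-31 20:00:00');  -- outside 2020, must be filtered out
INSERT INTO receipts VALUES
  (1, 10, 100, 5.0), (2, 11, 100, 5.0),  -- user 100: two January receipts
  (3, 10, 101, 5.0),                     -- user 101: one January receipt
  (4, 12, 100, 5.0),                     -- user 100: one February receipt
  (5, 13, 102, 5.0);                     -- user 102: 2019 only
""")

QUERY = """
    SELECT CAST(strftime('%m', g.game_date_time) AS INTEGER) AS month,
           {counter} AS users
    FROM receipts AS r
    JOIN games AS g ON r.collection_id = g.collection_id
    WHERE strftime('%Y', g.game_date_time) = '2020'
    GROUP BY month
    ORDER BY month
"""

distinct_users = conn.execute(QUERY.format(counter="COUNT(DISTINCT r.user_id)")).fetchall()
receipt_rows = conn.execute(QUERY.format(counter="COUNT(*)")).fetchall()

print(distinct_users)  # users per month
print(receipt_rows)    # receipts per month, which overcounts repeat users
```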
find how many users performed a transaction in every month in 2020
SELECT COUNT(*) AS users
FROM (
SELECT r.user_id
FROM receipts AS r
JOIN games AS g USING (collection_id)
WHERE YEAR(g.game_date_time) = 2020
GROUP BY r.user_id
HAVING COUNT(DISTINCT MONTH(g.game_date_time)) = MONTH(CURRENT_DATE)
) AS t
This query:
Selects rows for 2020 only.
For each user, calculates the number of distinct months in which this user has payments and compares it with the current month number. If the user has payments in each month (including the current one!), these values are equal.
Counts the number of users that match this condition.
PS. The query will give wrong results from 2021 onward. To get correct results in the future, use:
HAVING COUNT(DISTINCT MONTH(g.game_date_time)) = CASE YEAR(CURRENT_DATE)
WHEN 2020
THEN MONTH(CURRENT_DATE)
ELSE 12
END
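For a completed year the condition simply becomes "12 distinct months". The sketch below illustrates that reading of the task, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and generated, hypothetical data: user 1 pays in all twelve months of 2020, user 2 only in eleven.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receipts (receipt_id INTEGER, collection_id INTEGER, user_id INTEGER, amount REAL);
CREATE TABLE games    (game_id INTEGER, collection_id INTEGER, game_date_time TEXT);
""")

# Hypothetical data: one game (and collection) per month of 2020.
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?)",
    [(m, m, f"2020-{m:02d}-15 20:00:00") for m in range(1, 13)],
)
receipts = [(m, m, 1, 5.0) for m in range(1, 13)]         # user 1: every month
receipts += [(100 + m, m, 2, 5.0) for m in range(1, 12)]  # user 2: January to November
conn.executemany("INSERT INTO receipts VALUES (?, ?, ?, ?)", receipts)

users_in_every_month = conn.execute("""
    SELECT COUNT(*)
    FROM (
        -- One row per user who has receipts in 12 distinct months of 2020.
        SELECT r.user_id
        FROM receipts AS r
        JOIN games AS g USING (collection_id)
        WHERE strftime('%Y', g.game_date_time) = '2020'
        GROUP BY r.user_id
        HAVING COUNT(DISTINCT strftime('%m', g.game_date_time)) = 12
    ) AS t
""").fetchone()[0]

print(users_in_every_month)
```

The outer COUNT(*) turns the per-user rows into a single number, which is what the question asks for.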

How can I optimize the query below which uses three levels of select statements?

How to optimize the below query:
I have two tables, 'calendar_table' and 'consumption'. I use this query to calculate monthly consumption for each year.
The calendar table has day, month and year for the years 2005 - 2009, and the consumption table has billed consumption data for a monthly bill cycle. This query counts the number of days each bill covers in each month and uses that to find the consumption for each month.
SELECT id,
date_from as bill_start_date,
theYear as Year,
MONTHNAME(STR_TO_DATE(theMonth, '%m')) as month,
sum(DaysOnBill),
TotalDaysInTheMonth,
sum(perDayConsumption * DaysOnBill) as EstimatedConsumption
FROM
(
SELECT
id,
date_from,
theYear,
theMonth, # use theMonth for displaying the month as a number
COUNT(*) AS DaysOnBill,
TotalDaysInTheMonth,
perDayConsumption
FROM
(
SELECT
c.id,
c.date_from as date_from,
ct.dt,
y AS theYear,
month AS theMonth,
DAY(LAST_DAY(ct.dt)) as TotalDaysInTheMonth,
perDayConsumption
FROM
consumption AS c
INNER JOIN
calendar_table AS ct
ON ct.dt >= c.date_from
AND ct.dt<= c.date_to
) AS allDates
GROUP BY
id,
date_from,
theYear,
theMonth ) AS estimates
GROUP BY
id,
theYear,
theMonth;
It is taking around 1000 seconds to go through around 1 million records. Can something be done to make it faster?
The query is a bit dubious: it pretends to do one grouping first and then build on that with another, which isn't actually the case.
First the bill gets joined with all its days. Then we group by bill plus month and year, which gives a monthly view of the data. This could be done in one pass, but the query joins first and then aggregates the result as a derived table. Finally the results are taken again and "another" group is built, which is actually the same as before (bill plus month and year), and some pseudo-aggregations are done (e.g. sum(perDayConsumption * DaysOnBill), which is the same as perDayConsumption * DaysOnBill, as SUM sums only one record here).
This can simply be written as:
SELECT
c.id,
c.date_from as bill_start_date,
ct.y AS Year,
MONTHNAME(STR_TO_DATE(ct.month, '%m')) as month,
COUNT(*) AS DaysOnBill,
DAY(LAST_DAY(ct.dt)) as TotalDaysInTheMonth,
SUM(c.perDayConsumption) as EstimatedConsumption
FROM consumption AS c
INNER JOIN calendar_table AS ct ON ct.dt BETWEEN c.date_from AND c.date_to
GROUP BY
c.id,
ct.y,
ct.month;
I don't know whether this will be faster, or whether MySQL's optimizer already sees through your query and boils it down to this anyway.
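The single-pass version can be checked on a tiny example. This is a sketch only, assuming SQLite via Python's sqlite3 as a stand-in for MySQL, a calendar table reduced to a single dt column, and one hypothetical bill that straddles a month boundary; it says nothing about performance on the real data.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar_table (dt TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE consumption (id INTEGER, date_from TEXT, date_to TEXT, perDayConsumption REAL)"
)

# Hypothetical calendar: the first 90 days of 2005 as ISO dates.
first_day = date(2005, 1, 1)
conn.executemany(
    "INSERT INTO calendar_table VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(90)],
)
# Hypothetical bill: 30 Jan to 2 Feb, i.e. two days in each month.
conn.execute("INSERT INTO consumption VALUES (1, '2005-01-30', '2005-02-02', 1.5)")

monthly = conn.execute("""
    SELECT c.id,
           strftime('%Y', ct.dt)    AS bill_year,
           strftime('%m', ct.dt)    AS bill_month,
           COUNT(*)                 AS DaysOnBill,
           SUM(c.perDayConsumption) AS EstimatedConsumption
    FROM consumption AS c
    INNER JOIN calendar_table AS ct ON ct.dt BETWEEN c.date_from AND c.date_to
    GROUP BY c.id, bill_year, bill_month
    ORDER BY c.id, bill_year, bill_month
""").fetchall()

print(monthly)
```

Summing perDayConsumption over the joined days gives the same figure as perDayConsumption * DaysOnBill, so one GROUP BY is enough.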

MySQL calculating time difference when one data point is null

I have a query that runs on a patient record system:
SELECT WardTransactions.Id ID,
Genders.Description Gender,
Wards.Code Ward,
TIME_TO_SEC(TIMEDIFF(DischargeDateTime, AdmissionDateTime)) Duration
from WardTransactions
JOIN Wards on WardTransactions.WardId=Wards.Id
JOIN Demographics on WardTransactions.DemographicId=Demographics.Id
JOIN Genders on Demographics.GenderId=Genders.Id
JOIN Visits on WardTransactions.VisitId=Visits.Id
The issue is that at the time the query is run, DischargeDateTime may be null because the patient is still in the ward. I need to include that record in the calculation, but with DischargeDateTime set to the current time. I intend to use the Duration data in a JasperReports variable to calculate total, max, min and average times.
I am not sure how to build a query to resolve this issue.
SELECT WardTransactions.Id ID,
Genders.Description Gender,
Wards.Code Ward,
TIME_TO_SEC(TIMEDIFF(
(case WHEN DischargeDateTime IS NULL THEN NOW() ELSE DischargeDateTime END), AdmissionDateTime)) Duration
from WardTransactions
JOIN Wards on WardTransactions.WardId=Wards.Id
JOIN Demographics on WardTransactions.DemographicId=Demographics.Id
JOIN Genders on Demographics.GenderId=Genders.Id
JOIN Visits on WardTransactions.VisitId=Visits.Id
If I understand you, you want NOW() to be used as the discharge timestamp if the patient isn't yet discharged.
Use this expression:
TIME_TO_SEC(TIMEDIFF(IFNULL(DischargeDateTime,NOW()), AdmissionDateTime)) Duration
and you'll get what you need.
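Here is a minimal sketch of the same idea, assuming SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite has no TIMEDIFF or TIME_TO_SEC, so the duration is computed from strftime('%s', ...) instead, and a fixed :now parameter replaces NOW() to keep the output reproducible. The table is reduced to the three relevant columns and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WardTransactions (Id INTEGER, AdmissionDateTime TEXT, DischargeDateTime TEXT)"
)
conn.executemany(
    "INSERT INTO WardTransactions VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 08:00:00", "2024-01-01 10:00:00"),  # discharged after 2 h
        (2, "2024-01-01 09:00:00", None),                   # still in the ward
    ],
)

durations = conn.execute(
    """
    SELECT Id,
           CAST(strftime('%s', IFNULL(DischargeDateTime, :now)) AS INTEGER)
         - CAST(strftime('%s', AdmissionDateTime) AS INTEGER) AS Duration
    FROM WardTransactions
    ORDER BY Id
    """,
    {"now": "2024-01-01 12:00:00"},  # stand-in for NOW()
).fetchall()

print(durations)
```

The still-admitted patient gets a duration measured up to the supplied "now" (3 hours here) instead of NULL, so aggregates over Duration include that record.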

Calculate salary of tutor based on distinct sittings using mysql

I have the following table denoting a tutor teaching pupils in small groups. Each pupil has an entry in the database. A pupil may be alone or in a group. I wish to calculate the tutor's "salary" as follows: payment is based on time spent, which means that each sitting (with one or more pupils) is counted only once - distinct sittings! The start and end times are Unix times.
start end attendance
1359882000 1359882090 1
1359867600 1359867690 0
1359867600 1359867690 1
1359867600 1359867690 0
1360472400 1360477800 1
1360472400 1360477800 1
1359867600 1359867690 1
1359914400 1359919800 1
1360000800 1360006200 1
1360000800 1360006200 0
1360000800 1360006200 1
This is what I tried, with no success: I can't get the right duration (number of hours for all distinct sittings).
SELECT YEAR(FROM_UNIXTIME(start)) AS year,
MONTHNAME(STR_TO_DATE(MONTH(FROM_UNIXTIME(start)), '%m')) AS month,
COUNT(DISTINCT start) AS sittings,
SUM(TRUNCATE((end-start)/3600, 1)) as duration
FROM schedules
GROUP BY
YEAR(FROM_UNIXTIME(start)),
MONTH(FROM_UNIXTIME(start))
Thanks for your proposals / support!
EDIT: Required results
Rate = 25
Year Month Sittings Duration Bounty
2013 February 2 2.2 2.2*25
2013 April 4 12.0 12.0*25
You could probably do something with subqueries. I've had a play with SQL Fiddle; how does this look to you? Link to SQL Fiddle: http://sqlfiddle.com/#!2/50718c/3
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration_secs
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
end-start AS duration
FROM schedules s
) d
GROUP BY
year,
month
I couldn't really see where attendance comes into this at present, you didn't specify. The inner query is responsible for taking the schedules, extracting a start date, and a duration (in seconds).
The outer query then uses these derived values but groups them up to get the sums. You could elaborate from here i.e. maybe you only want to select where attendance > 0, or maybe you want to multiply by attendance.
In this next example I have done that: it calculates the duration in hours instead, and calculates the applicable duration for sessions with attendance > 0, along with the appropriate bounty, assuming bounty == hours * rate: http://sqlfiddle.com/#!2/50718c/21
SELECT
YEAR(d.date) AS year,
MONTH(d.date) AS month,
COUNT(*) AS sittings,
SUM(d.duration) AS duration,
SUM(
IF(d.attendance>0,1,0)
) AS sittingsWorthBounty,
SUM(
IF(d.attendance>0,d.duration,0)
) AS durationForBounty,
SUM(
IF(d.attendance>0,d.bounty,0)
) AS bounty
FROM (
SELECT
DATE(FROM_UNIXTIME(s.start)) AS date,
s.attendance,
(end-start)/3600 AS duration,
(end-start)/3600 * @rate AS bounty
FROM schedules s,
(SELECT @rate := 25) v
) d
GROUP BY
year,
month
The key point here, is that in the subquery you do all the calculation per-row. The main query then is responsible for grouping up the results and getting your totals. The IF statements in the outer query could easily be moved into the subquery instead, for example. I just included them like this so you could see where the values came from.
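The question asks for distinct sittings, which the per-row subquery does not yet handle. One possible way, sketched here, is to deduplicate on (start, end) in the subquery before summing. The sketch assumes SQLite via Python's sqlite3 as a stand-in for MySQL, renames the columns to start_ts and end_ts (hypothetical names, chosen because END is a keyword in SQLite), and treats two rows with the same start and end as the same sitting, which is an assumption about the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedules (start_ts INTEGER, end_ts INTEGER, attendance INTEGER)")
conn.executemany(
    "INSERT INTO schedules VALUES (?, ?, ?)",
    [  # sample rows from the question
        (1359882000, 1359882090, 1),
        (1359867600, 1359867690, 0),
        (1359867600, 1359867690, 1),
        (1359867600, 1359867690, 0),
        (1360472400, 1360477800, 1),
        (1360472400, 1360477800, 1),
        (1359867600, 1359867690, 1),
        (1359914400, 1359919800, 1),
        (1360000800, 1360006200, 1),
        (1360000800, 1360006200, 0),
        (1360000800, 1360006200, 1),
    ],
)

RATE = 25  # rate from the question

per_month = conn.execute("""
    SELECT strftime('%Y', d.start_ts, 'unixepoch') AS year,
           strftime('%m', d.start_ts, 'unixepoch') AS month,
           COUNT(*)                                AS sittings,
           SUM(d.end_ts - d.start_ts) / 3600.0     AS hours
    FROM (
        -- One row per sitting, however many pupils attended it.
        SELECT DISTINCT start_ts, end_ts FROM schedules
    ) AS d
    GROUP BY year, month
""").fetchall()

for year, month, sittings, hours in per_month:
    print(year, month, sittings, hours, hours * RATE)
```

On the sample rows this finds five distinct sittings, all in February 2013 (UTC), totalling 16380 seconds. An attendance filter could be added inside the DISTINCT subquery if only attended sittings should be paid.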