Displaying two sums from different unrelated tables - mysql

I have 2 tables. One contains expenses, the other contains earnings.
my_costs:
my_earnings:
From these two tables, I want to get data:
Grouping a period by month and year
The amount of expenses in the period
Average expenses per day in the period
Amount of earnings in the period
I am using the following query:
SELECT DATE_FORMAT(my_costs.DATE, "%M-%Y") AS Period, SUM(my_costs.sum) AS Costs,
ROUND(SUM(my_costs.sum)/DAY(LAST_DAY(my_costs.date)), 0) AS Average,
SUM(my_earnings.sum)
FROM my_costs
LEFT JOIN my_earnings ON DATE_FORMAT(my_costs.DATE, "%M-%Y") = DATE_FORMAT(my_earnings.date, "%M-%Y")
GROUP BY DATE_FORMAT(my_costs.DATE, "%M-%Y")
ORDER BY Period DESC
This query successfully returns the period, costs and earnings, but the amount of earnings shown is incorrect. The numbers are much higher than expected; they should be no more than 10,000 per month.

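Aggregating after a join can give wrong totals, because the join duplicates or drops rows. Here every cost row is paired with every earnings row from the same month, so each earning is counted once per cost row. I would recommend using UNION ALL and then aggregating: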
SELECT DATE_FORMAT(ce.date, '%M-%Y') AS Period,
SUM(ce.costs) AS Costs, SUM(ce.earnings) AS Earnings,
-- MIN() keeps the expression valid under ONLY_FULL_GROUP_BY
ROUND(SUM(ce.costs)/DAY(LAST_DAY(MIN(ce.date))), 0) AS daily_average
FROM ((SELECT date, sum AS costs, 0 AS earnings
FROM my_costs
) UNION ALL
(SELECT date, 0, sum
FROM my_earnings
)
) ce
GROUP BY Period
ORDER BY MIN(ce.date) DESC;
Note that I changed the ORDER BY so the results are in time order.
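To make the duplication concrete, here is a minimal sketch that runs both shapes of query side by side. It uses Python's sqlite3 module as a stand-in for MySQL (so strftime replaces DATE_FORMAT), and the rows are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_costs (date TEXT, sum INTEGER);
    CREATE TABLE my_earnings (date TEXT, sum INTEGER);
    INSERT INTO my_costs VALUES
        ('2020-01-05', 100), ('2020-01-20', 200), ('2020-01-25', 300);
    INSERT INTO my_earnings VALUES
        ('2020-01-10', 4000), ('2020-01-28', 6000);
""")

# Join first, aggregate second: 3 cost rows x 2 earnings rows = 6 joined rows,
# so every amount is counted more than once.
joined = conn.execute("""
    SELECT strftime('%Y-%m', c.date) AS period, SUM(c.sum), SUM(e.sum)
    FROM my_costs c
    LEFT JOIN my_earnings e
           ON strftime('%Y-%m', c.date) = strftime('%Y-%m', e.date)
    GROUP BY period
""").fetchall()

# Stack the two tables with UNION ALL, then aggregate: each row counted once.
unioned = conn.execute("""
    SELECT strftime('%Y-%m', ce.date) AS period, SUM(ce.costs), SUM(ce.earnings)
    FROM (SELECT date, sum AS costs, 0 AS earnings FROM my_costs
          UNION ALL
          SELECT date, 0, sum FROM my_earnings) ce
    GROUP BY period
""").fetchall()

print("join then aggregate:", joined)
print("union all then aggregate:", unioned)
```

With this sample data the joined version reports 1200 and 30000 for the month, while the UNION ALL version reports the real totals of 600 and 10000.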

SQL query to find the cancellation rate of requests made between two dates using WITH

I'm trying to understand the right way to divide the count sums from two queries.
I'm teaching myself SQL and practising it online.
Question:
Write a SQL query to find the cancellation rate of requests made between 2017-08-01 and 2017-08-03. The cancellation rate is given by dividing the number of cancelled requests by the total number of rides each day. The result table should have 2 Columns, namely Day that shows each day and Cancellation Rate that provides the cancellation rate of that day.
Table is:
What I tried was:
count cancelled ride rates per date
count all ride requests per date
divide both the counts per date
with
cancelled_rides as
(select count(*) cancel_count, status, Request_id
from TRIPS
where status = 'cncld_driver'
group by state, Request_id)
all_rides as (
select count(*) day_count, status, Request_id
from TRIPS
group by state, Request_id) ,
select cancelled_rides.Request_id as DAY,
(cancelled_rides.cancel_count/all_rides.day_count) as 'Cancellation Rate'
FROM cancelled_rides, all_rides;
Does this look right? Note I purposefully ignored including date ranges as the table has only limited entries.
I do not see that a CTE helps at all for this query. Just use conditional aggregation:
select t.Request_id as day, count(*) as total,
sum( status = 'cncld_driver' ) as num_cancelled,
avg( status = 'cncld_driver' ) as cancellation_rate
from trips t
where request_id >= '2017-08-01' and
request_id < '2017-08-04'
group by request_id;
Calling a date "request_id" is rather confusing. You should have a request id that is unique for each row and a separate column with the date/time.
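As a rough illustration of conditional aggregation, the sketch below runs the same pattern with Python's sqlite3 module standing in for MySQL. The column name request_day, the status value 'completed' and all rows are assumptions made up for the example; only 'cncld_driver' and the date range come from the question.

```python
import sqlite3

# SQLite stands in for MySQL; request_day and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (request_day TEXT, status TEXT);
    INSERT INTO trips VALUES
        ('2017-08-01', 'completed'), ('2017-08-01', 'cncld_driver'),
        ('2017-08-02', 'completed'), ('2017-08-02', 'completed'),
        ('2017-08-03', 'cncld_driver'), ('2017-08-03', 'completed'),
        ('2017-08-03', 'completed'), ('2017-08-03', 'cncld_driver'),
        ('2017-08-04', 'cncld_driver');
""")

# A comparison evaluates to 1 or 0, so SUM counts the matches
# and AVG gives the fraction of matching rows per group.
rates = conn.execute("""
    SELECT request_day,
           COUNT(*) AS total,
           SUM(status = 'cncld_driver') AS num_cancelled,
           AVG(status = 'cncld_driver') AS cancellation_rate
    FROM trips
    WHERE request_day >= '2017-08-01'
      AND request_day < '2017-08-04'
    GROUP BY request_day
    ORDER BY request_day
""").fetchall()

for row in rates:
    print(row)
```

The half-open range (>= start, < day after end) keeps the whole of 2017-08-03 and excludes the 2017-08-04 row.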

MySQL limit 5 per month

I am trying to show the 'top 5' per month by hours worked.
I have the following query:
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
ORDER BY
year,
month,
activity
I tried limit 0,5 but it gives me only the first 5 records overall. How can I show the top 5 records for each month?
In MySQL 8.0.2 and above, we can use window functions. The ROW_NUMBER() window function assigns row numbers within each partition, where the partition is the concatenation of year and month. Rows within a partition are ordered by activity in descending order.
We can then use this result set as a derived table and keep only rows whose row number is at most 5. This gives up to 5 rows per month, those with the highest activity values.
SELECT dt.*
FROM
(
SELECT
concat(m.firstname, " ",m.lastname) AS name,
SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) AS activity,
month(start_activity) AS month,
year(start_activity) AS year,
ROW_NUMBER() OVER (PARTITION BY CONCAT(year(start_activity), month(start_activity))
ORDER BY SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(pl.end_activity,pl.start_activity)))) DESC) AS row_no
FROM
log AS pl
INNER JOIN
employee AS m
ON
m.employee = pl.employee
GROUP BY
name,
year,
month
) AS dt
WHERE dt.row_no <= 5
ORDER BY
dt.year,
dt.month,
dt.activity
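The same top-N-per-group pattern can be tried out without a MySQL server. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) with a simplified, hypothetical table worked(name, year, month, seconds) and invented rows. Unlike the query above, it aggregates in an inner subquery first and numbers the rows in a second step; that is only to keep the sketch simple, not a claim that the single-level form is wrong.

```python
import sqlite3

# SQLite (3.25+) stands in for MySQL 8; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worked (name TEXT, year INTEGER, month INTEGER, seconds INTEGER);
    INSERT INTO worked VALUES
        ('Ann', 2018, 1, 3600), ('Ann', 2018, 1, 1800),
        ('Bob', 2018, 1, 7200), ('Cid', 2018, 1, 600),
        ('Ann', 2018, 2, 900), ('Bob', 2018, 2, 300), ('Cid', 2018, 2, 1200);
""")

TOP_N = 2  # the question asks for 5; 2 keeps the sample data small

top_rows = conn.execute("""
    SELECT name, year, month, total_seconds
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY year, month
                                  ORDER BY total_seconds DESC) AS row_no
        FROM (
            -- total time per person per month
            SELECT name, year, month, SUM(seconds) AS total_seconds
            FROM worked
            GROUP BY name, year, month
        ) AS t
    ) AS dt
    WHERE row_no <= ?
    ORDER BY year, month, total_seconds DESC
""", (TOP_N,)).fetchall()

for row in top_rows:
    print(row)
```

Partitioning by year and month as two separate expressions avoids building a concatenated key, and ROW_NUMBER() breaks ties arbitrarily; RANK() would be the choice if tied rows should all be kept.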

How can I optimize the query below which uses three levels of select statements?

How to optimize the below query:
I have two tables, 'calendar_table' and 'consumption'. I use this query to calculate monthly consumption for each year.
The calendar table has the day, month and year for the years 2005 - 2009, and the consumption table has billed consumption data for each monthly bill cycle. The query counts the number of days on each bill and uses that to find the consumption for each month.
SELECT id,
date_from as bill_start_date,
theYear as Year,
MONTHNAME(STR_TO_DATE(theMonth, '%m')) as month,
sum(DaysOnBill),
TotalDaysInTheMonth,
sum(perDayConsumption * DaysOnBill) as EstimatedConsumption
FROM
(
SELECT
id,
date_from,
theYear,
theMonth, # use theMonth for displaying the month as a number
COUNT(*) AS DaysOnBill,
TotalDaysInTheMonth,
perDayConsumption
FROM
(
SELECT
c.id,
c.date_from as date_from,
ct.dt,
y AS theYear,
month AS theMonth,
DAY(LAST_DAY(ct.dt)) as TotalDaysInTheMonth,
perDayConsumption
FROM
consumption AS c
INNER JOIN
calendar_table AS ct
ON ct.dt >= c.date_from
AND ct.dt<= c.date_to
) AS allDates
GROUP BY
id,
date_from,
theYear,
theMonth ) AS estimates
GROUP BY
id,
theYear,
theMonth;
It is taking around 1000 seconds to go through around 1 million records. Can something be done to make it faster?
The query is a bit dubious: it pretends to do one grouping first and then build on it with another, which is not actually the case.
First each bill is joined with all of its days. Then the rows are grouped by bill plus month and year, giving a monthly view of the data. This could be done in one pass, but the query joins first and then aggregates the result as a derived table. Finally the results are grouped "again", but this group is the same as before (bill plus month and year), so the outer aggregations are pseudo-aggregations (e.g. sum(perDayConsumption * DaysOnBill) is the same as perDayConsumption * DaysOnBill, because SUM sums only one record here).
This can simply be written as:
SELECT
c.id,
c.date_from as bill_start_date,
ct.y AS Year,
MONTHNAME(STR_TO_DATE(ct.month, '%m')) as month,
COUNT(*) AS DaysOnBill,
DAY(LAST_DAY(MIN(ct.dt))) as TotalDaysInTheMonth,
SUM(c.perDayConsumption) as EstimatedConsumption
FROM consumption AS c
INNER JOIN calendar_table AS ct ON ct.dt BETWEEN c.date_from AND c.date_to
GROUP BY
c.id,
ct.y,
ct.month;
I don't know whether this will be faster, or whether MySQL's optimizer already sees through your query and boils it down to this anyhow.
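For anyone who wants to check what the single-pass version produces, here is a small sketch using Python's sqlite3 module in place of MySQL. The table and column names follow the question, but the single bill and the two-month calendar are invented sample data.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL; one hypothetical bill and a two-month calendar.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar_table (dt TEXT, y INTEGER, month INTEGER);
    CREATE TABLE consumption (id INTEGER, date_from TEXT, date_to TEXT,
                              perDayConsumption REAL);
    INSERT INTO consumption VALUES (1, '2005-01-20', '2005-02-10', 2.0);
""")

# Fill the calendar with one row per day for January and February 2005.
day = date(2005, 1, 1)
while day <= date(2005, 2, 28):
    conn.execute("INSERT INTO calendar_table VALUES (?, ?, ?)",
                 (day.isoformat(), day.year, day.month))
    day += timedelta(days=1)

# One join, one GROUP BY: days on the bill and consumption per month.
monthly = conn.execute("""
    SELECT c.id, ct.y, ct.month,
           COUNT(*) AS DaysOnBill,
           SUM(c.perDayConsumption) AS EstimatedConsumption
    FROM consumption AS c
    INNER JOIN calendar_table AS ct
            ON ct.dt BETWEEN c.date_from AND c.date_to
    GROUP BY c.id, ct.y, ct.month
    ORDER BY ct.y, ct.month
""").fetchall()

for row in monthly:
    print(row)
```

The bill covers 12 days of January and 10 days of February, so summing perDayConsumption over the joined days gives the same figures as multiplying it by the day count.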

Moving average in SQL

I have a MySQL database populated with power consumption over 20 years.
I want to query the average of the power consumption over every month, from a given month.
For example with this database,
date power_consumption
2014/03/30 30
2014/04/30 40
2014/05/30 50
2014/06/30 20
The result would be, from 2014/04
month average_so_far_from_april_2014
2014/04 40.0
2014/05 45.0
2014/06 36.667
If I cannot achieve this in one query, what query should I go for to retrieve the most useful data for this task? (My naive approach is to query the whole table out and calculate the average in my application.)
Use a subquery to get the distinct months from the consumption table, then join it back to the consumption table with the condition that the row's year/month is less than or equal to the month from the subquery. Grouping by the subquery's year/month and applying the AVG aggregate function to the power consumption then gives the average up to each month.
Something like this:-
SELECT consumption_month,
AVG(b.power_consumption)
FROM
(
SELECT DISTINCT DATE_FORMAT(`date`, '%Y%m') AS consumption_month FROM consumption_table a
) a
INNER JOIN consumption_table b
ON consumption_month >= DATE_FORMAT(b.`date`, '%Y%m')
WHERE b.`date` >= '2014/04/01'
GROUP BY consumption_month
SQL fiddle:-
http://www.sqlfiddle.com/#!2/16588/2
If you only had one record per month you could simplify it more by just doing a join of the table against itself without the need for the sub query.
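As a quick check of the approach against the sample data in the question, the sketch below runs the same self-join with Python's sqlite3 module standing in for MySQL (strftime replaces DATE_FORMAT, and the dates are written in ISO format so SQLite can parse them).

```python
import sqlite3

# SQLite stands in for MySQL; rows are the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE consumption_table (date TEXT, power_consumption INTEGER);
    INSERT INTO consumption_table VALUES
        ('2014-03-30', 30), ('2014-04-30', 40),
        ('2014-05-30', 50), ('2014-06-30', 20);
""")

# Each distinct month is joined to every row up to and including that month;
# the WHERE clause drops rows before the starting month.
running = conn.execute("""
    SELECT a.consumption_month, AVG(b.power_consumption)
    FROM (SELECT DISTINCT strftime('%Y%m', date) AS consumption_month
          FROM consumption_table) a
    INNER JOIN consumption_table b
            ON a.consumption_month >= strftime('%Y%m', b.date)
    WHERE b.date >= '2014-04-01'
    GROUP BY a.consumption_month
    ORDER BY a.consumption_month
""").fetchall()

for month, average in running:
    print(month, round(average, 3))
```

This reproduces the 40.0, 45.0 and 36.667 from the question. March drops out of the result because no row on or before March survives the WHERE filter, so the inner join has nothing to match.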
GROUP BY is meant for this kind of problem. The average is calculated for each distinct value of the expression in the GROUP BY clause.
SELECT DATE_FORMAT(date, '%Y/%m'), AVG(power_consumption)
FROM table_name
WHERE date > ...
GROUP BY DATE_FORMAT(date, '%Y/%m')
ORDER BY DATE_FORMAT(date, '%Y/%m')
You get the average for each month; DATE_FORMAT(date, '%Y/%m') is the year and month in the format YYYY/MM.

MySQL Aggregate function in other aggregate function

I have a table of posts, like (id int, date datetime).
How can I select the average number of posts per day for each month with one SQL query?
Thank you!
This should do it for you:
select month, avg(posts_per_day)
from (select day(date), month(date) as month, count(*) as posts_per_day
from posts group by 1,2) x
group by 1
Explanation: Because you are doing an aggregate on an aggregate, there is no getting around doing a query on a query:
The inner query calculates the number per day and captures the month.
The outer query averages this count, grouping by month.
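The two-level aggregation can be sketched as follows, with Python's sqlite3 module standing in for MySQL (strftime replaces the day() and month() functions) and invented sample posts.

```python
import sqlite3

# SQLite stands in for MySQL; the post counts per date are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, date TEXT)")

posts_per_date = {
    "2020-01-01": 3, "2020-01-02": 1,
    "2020-02-10": 2, "2020-02-11": 4, "2020-02-12": 3,
}
for day, count in posts_per_date.items():
    conn.executemany("INSERT INTO posts (date) VALUES (?)",
                     [(day + " 12:00:00",)] * count)

# Inner query: posts per day. Outer query: average of those counts per month.
averages = conn.execute("""
    SELECT month, AVG(posts_per_day)
    FROM (SELECT strftime('%d', date) AS day,
                 strftime('%m', date) AS month,
                 COUNT(*) AS posts_per_day
          FROM posts
          GROUP BY 1, 2) x
    GROUP BY 1
    ORDER BY 1
""").fetchall()

for row in averages:
    print(row)
```

Note that this averages only over days that have at least one post, since days with no rows never appear in the inner query.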
You can get the number of posts per month like this:
SELECT COUNT(*) AS num_posts_per_month FROM table GROUP BY MONTH(date);
Now we need the number of days in a month:
SELECT COUNT(*) / DAY(LAST_DAY(MIN(date))) AS avg_over_month
FROM table GROUP BY MONTH(date);
This will get the average number of posts per day during the calendar month of the post. That is, averages during the current month will continue to rise until the end of the month. If you want real averages during the current month, you have to put in a conditional to get the true number of elapsed days.