I am working on a project where I need to write validation query to validate data.
so I have two tables 1. Input table(raw data) 2. Output table(Harmonized data)
Currently, as a validation query, I am using below two queries to fetch results & then copy both results into excel file to validate if there is any difference in data or not.
1 Query
Select Date,sum(Val),sum(Vol)
From Input_table
Group by Date
2 Query
Select Date,sum(Val),sum(Vol)
From Output_table
Group by Date
Is there any way where I can put both these results in one query and also create one calculated column like.... (sum(Input_table.VAL)-sum(Output_table.VAL)) as Validation_Check.
So output will be like:
Date | sum(Input_table.Val) | sum(Output_table.Val) | Validation_Check
thanks.
It looks like you need to full join your results like:
select
ifnull(I.[Date], O.[Date]) as Date,
I.Val as Input_Val,
O.Val as Output_Val,
ifnull(I.Val, 0) - ifnull(O.Val, 0) as Validation_Check
from
(
Select Date,sum(Val) as Val,sum(Vol) as Vol
From Input_table
Group by Date
) as I
full outer join
(
Select Date,sum(Val) as Val,sum(Vol) as Vol
From Output_table
Group by Date
) as O on O.[Date] = I.[Date]
Use UNION. This will join two query on the condition that the two query have the same datatypes in the columns.
Use this Statement:
select Date, sum(Val), sum(Vol) from (
Select Date,Val,Vol
From Input_table
union
Select Date,Val,Vol
From Input_table
)
Group by Date
This will concat the data of both tables in the inner select and then Group it to one result
SELECT Date, SUM(VAL) as SUM_VAL, SUM(VOL) as SUM_VOL, SUM(VAL-VOL) as Validation_Check from
(Select Date,val,vol
From Input_table
UNION ALL
Select Date,val, vol
From Output_table
) X
group by Date
I suggest using a UNION ALL instead of a UNION here since there may be similar results fetched from both queries.
For example, your query 1 has a result like
May 01, 2017 | 5 | 5
and your query 2 has a result with the same values
May 01, 2017 | 5 | 5
If you use union, you'd only get 1 instance of
May 01, 2017 | 5 | 5
instead of 2 instances of
May 01, 2017 | 5 | 5
May 01, 2017 | 5 | 5
If your MySQL Supports FULL JOIN then you can use
SELECT
IFNULL(a.Date, b.Date) AS Date,
SUM(IFNULL(a.Val, 0)) AS Input_Val_Sum,
SUM(IFNULL(b.Val, 0)) AS Output_Val_Sum,
SUM(IFNULL(a.Val, 0) - IFNULL(b.Val, 0)) AS Validation_Check
FROM Input_table AS a
FULL OUTER JOIN Output_table AS b
ON a.Date = b.Date
GROUP BY IFNULL(a.Date, b.Date)
Related
i have an orders table, and i need to fetch the orders record by month. but i have terms if there is no data in a month it should still show the data but forcing to zero like this:
what i have done is using my query:
select sum(total) as total_orders, DATE_FORMAT(created_at, "%M") as date
from orders
where is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
group by DATE_FORMAT(created_at, "%m")
but the result is only fetched the existed data:
can anyone here help me to create the exactly query?
Thank you so much
Whenever you're trying to use a value that doesn't exist in the table, one option is to use a reference; whether it's from a table or a query-generated value.
I'm guessing that in terms of date data, the column created_at in table orders may have a complete list all the 12 months in a year regardless of which year.
Let's assume that the table data for orders spans from 2019 to present date. With that you can simply create a 12 months reference table for a LEFT JOIN operation. So:
SELECT MONTHNAME(created_at) mnt FROM orders GROUP BY MONTHNAME(created_at);
You can append that into your query like:
SELECT IFNULL(SUM(total),0) as total_orders, mnt
from (SELECT MONTHNAME(created_at) mnt FROM orders GROUP BY MONTHNAME(created_at)) mn
LEFT JOIN orders o
ON mn.mnt=MONTHNAME(created_at)
AND is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
GROUP BY mnt;
Apart from adding the 12 months sub-query and a LEFT JOIN, there are 3 other changes from your original query:
IFNULL() is added to the SUM() operation in SELECT to return 0 if the value is non-existent.
All the WHERE conditions has been switched to ON since remaining it as WHERE will make the LEFT JOIN becoming a normal JOIN.
GROUP BY is using the sub-query generated month (mnt) value instead.
Taking consideration of table orders might not have the full 12 months, you can generate it from query. There are a lot of ways of doing it but here I'm only going to show the UNION method that works with most MySQL version.
SELECT MONTHNAME(CONCAT_WS('-',YEAR(NOW()),mnt,'01')) dt
FROM
(SELECT 1 AS mnt UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION
SELECT 9 UNION SELECT 10 UNION SELECT 11 UNION SELECT 12) mn
If you're using MariaDB version that supports SEQUENCE ENGINE, the same query above is much shorter:
SELECT MONTHNAME(CONCAT_WS('-',YEAR(NOW()),mnt,'01'))
FROM (SELECT seq AS mnt FROM seq_1_to_12) mn
I'm using MariaDB 10.5 in this demo fiddle however it seems like the month name ordering is based on the name value rather than the month itself so it looks un-ordered. It's in the correct order if it's in MySQL 8.0 fiddle though.
Thanks all for the answers & comments i really appreciate it.
i solved it by create table helper for static months then use union and aliasing, since i need the months in indonesia, i create case-when function too.
so, the query is like this:
SELECT total_orders,
(CASE date WHEN 01 THEN 'Januari'
WHEN 02 THEN 'Februari'
WHEN 03 THEN 'Maret'
WHEN 04 THEN 'April'
WHEN 05 THEN 'Mei'
WHEN 06 THEN 'Juni'
WHEN 07 THEN 'Juli'
WHEN 08 THEN 'Agustus'
WHEN 09 THEN 'September'
WHEN 10 THEN 'Oktober'
WHEN 11 THEN 'November'
WHEN 12 THEN 'Desember'
ELSE date END ) AS date
FROM (SELECT SUM(total) AS total_orders,
DATE_FORMAT(created_at, "%m") AS date
FROM orders
WHERE is_active = 1
AND tenant_id = 2
AND created_at like '%2021%'
GROUP BY DATE_FORMAT(created_at, "%m")
UNION
SELECT 0 AS total_orders,
code AS date
FROM quantum_default_months ) as Q
GROUP BY date
I still don't know if this query is fully correct or not, but I get my exact result.
cmiiw.
thanks all
If I have the following table in MySQL:
date type amount
2017-12-01 3 2
2018-01-01 1 100
2018-02-01 1 50
2018-03-01 2 2000
2018-04-01 2 4000
2018-05-01 3 2
2018-06-01 3 1
...is there a way to find the sum of the amounts corresponding to the latest dates of each type? There are guaranteed to be no duplicate dates for any given type.
The answer I'd be looking to get from the data above could broken down like this:
The latest date for type 1 is 2018-02-01, where the amount is 50;
The latest date for type 2 is 2018-04-01, where the amount is 4000;
The latest date for type 3 is 2018-06-01, where the amount is 1;
50 + 4000 + 1 = 4051
Is there a way to arrive directly at 4051 in a single query? This is for a Django project using MySQL if that makes a difference; I wasn't able to find an ORM-related solution either, so figured a raw SQL query might be a better place to start.
Thanks!
Not sure for Django but in raw sql you could use a self join to pick latest row for each type based on latest date and then aggregate your results to get the sum of amounts for each type
select sum(a.amount)
from your_table a
left join your_table b on a.type = b.type
and a.date < b.date
where b.type is null
Demo
Or
select sum(a.amount)
from your_table a
join (
select type, max(date) max_date
from your_table
group by type
) b on a.type = b.type
and a.date = b.max_date
Demo
Or by using a correlated subuery
select sum(a.amount)
from your_table a
where a.date = (
select max(date)
from your_table
where type = a.type
)
Demo
For Mysql 8 you can use window functions to get you desired result as
select sum(amount)
from (select *, row_number() over (partition by type order by date desc) as seq
from your_table
) t
where seq = 1;
Demo
Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every months. This might come from a "calendar table" you already have, OR, it MIGHT be possible using a subquery if every month is represented in the source data)
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select #rownum:=#rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select #rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
First we need to get all the months between start date and end_date in a temporary table then need do a left outer join with the payments table on paid month filtering all non matching months ( payment employee name is null )
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35
I have a table of records (lets call them TV shows) with an air_date field.
I have another table of advertisements that are related by a show_id field.
I am trying to get the average number of advertisements per show for each date (with a where clause specifying the shows).
I currently have this:
SELECT
`air_date`,
(SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`) AS `num_commercials`,
FROM `show`
WHERE ...
This gives me a result like so:
air_date | num_commercials
2015-6-30 | 6
2015-6-30 | 3
2015-6-30 | 8
2015-6-30 | 2
2015-6-31 | 9
2015-6-31 | 4
When I do a GROUP_BY, it only gives me one of the records, but I want the average for each air_date.
Not too sure I am clear on what you want - but does this do it
SELECT `air_date`,
AVG((SELECT COUNT(*) FROM `commercial` WHERE `show_id` = `show`.`id`)) AS `num_commercials`,
FROM `show`
WHERE .....
GROUP BY `air_date`
(Note double parentheses for AVG function is required)
You can use a sub-query to select count of commercials by air_date/show, then use an outer query to select the average commercials count per air_date.
Something like this should work:
select air_date, avg(num_commercials)
from
(
select show.air_date as air_date,
show.id as show_id,
count(*) as num_commercials
from show
inner join commercial on commercial.show_id = show.id
group by show.air_date, show.id
where ...
) sub
group by air_date
I want to make a MySQL to get daily differential values from a table who looks like this:
Date | VALUE
--------------------------------
"2011-01-14 19:30" | 5
"2011-01-15 13:30" | 6
"2011-01-15 23:50" | 9
"2011-01-16 9:30" | 10
"2011-01-16 18:30" | 15
I have made two subqueries. The first one is to get the last daily value, because I want to compute the difference values from this data:
SELECT r.Date, r.VALUE
FROM table AS r
JOIN (
SELECT DISTINCT max(t.Date) AS Date
FROM table AS t
WHERE t.Date < CURDATE()
GROUP BY DATE(t.Date)
) AS x USING (Date)
The second one is made to get the differential values from the result of the first one (I show it with "table" name):
SELECT Date, VALUE - IFNULL(
(SELECT MAX( VALUE )
FROM table
WHERE Date < t1.table) , 0) AS diff
FROM table AS t1
ORDER BY Date
At first, I tried to save the result of first query in a temporary table but it's not possible to use temporary tables with the second query. If I use the first query inside the FROM of second one between () with an alias, the server complaints about table alias doesn't exist. How can get a something like this:
Date | VALUE
---------------------------
"2011-01-15 00:00" | 4
"2011-01-16 00:00" | 6
Try this query -
SELECT
t1.dt AS date,
t1.value - t2.value AS value
FROM
(SELECT DATE(date) dt, MAX(value) value FROM table GROUP BY dt) t1
JOIN
(SELECT DATE(date) dt, MAX(value) value FROM table GROUP BY dt) t2
ON t1.dt = t2.dt + INTERVAL 1 DAY