Let's consider this example:
Employee | Function | Start_dept | End_dept
-------------------------------------------
A        | dev      | 10         | 13
A        | dev      | 11         | 12
A        | test     | 9          | 9
A        | dev      | 13         | 11
What I want to select is each employee, their function, and the number of distinct departments across BOTH the "start" and "end" department columns. It should give this result:
Employee | Function | count_distinct_dept
-----------------------------------------
A        | dev      | 4
A        | test     | 1
For employee A with function dev, there are only 4 distinct departments (10, 11, 12 and 13), because duplicate values across the 2 columns (start and end) should not be counted.
How can I do this? (I'm using MySQL.)
Is it possible to do this in one query without any JOIN or UNION? Or is one of them required? Since I am working with a huge database (more than 3 billion rows), I am not sure a JOIN or UNION query would be optimal...
Use a union all and aggregation:
select Employee, Function, count(distinct dept)
from ((select Employee, Function, Start_dept as dept
from e
) union all
(select Employee, Function, End_dept
from e
)
) e
group by Employee, Function;
If you want performance, I would suggest starting with two indexes on (Employee, Function, Start_Dept) and (Employee, Function, End_Dept). Then:
select Employee, Function, count(distinct dept)
from ((select distinct Employee, Function, Start_dept as dept
from e
) union all
(select distinct Employee, Function, End_dept
from e
)
) e
group by Employee, Function;
The subqueries should be scanning the index rather than the overall table. You will still need to do the final GROUP BY. I am guessing that COUNT(DISTINCT) is a better approach than UNION in the subquery, but you could test that.
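A quick way to check the first query's logic is to run it on the question's sample rows. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption: the query uses only standard SQL, so the result should be the same); the table name e follows the answer.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE e (Employee TEXT, Function TEXT, Start_dept INT, End_dept INT)")
conn.executemany(
    "INSERT INTO e VALUES (?, ?, ?, ?)",
    [("A", "dev", 10, 13), ("A", "dev", 11, 12), ("A", "test", 9, 9), ("A", "dev", 13, 11)],
)

# Stack both department columns into one "dept" column with UNION ALL,
# then count the distinct values per (Employee, Function) group.
rows = conn.execute("""
    SELECT Employee, Function, COUNT(DISTINCT dept) AS count_distinct_dept
    FROM (SELECT Employee, Function, Start_dept AS dept FROM e
          UNION ALL
          SELECT Employee, Function, End_dept FROM e) AS stacked
    GROUP BY Employee, Function
    ORDER BY Employee, Function
""").fetchall()

print(rows)  # [('A', 'dev', 4), ('A', 'test', 1)]
```

Duplicates inside a group (13 appears as both a start and an end department for dev) are removed by COUNT(DISTINCT), which is why UNION ALL is enough in the subquery.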
It was hard to find a good title for my question.
I have 3 tables: materials, orders and order_contents.
There are 5 different types of materials in the materials table.
The orders table contains the dates of the orders. Orders currently span over 4 months.
The orders are filled with materials in the table called order_contents.
I am trying to get the overall cost of materials per month and display it in a Highcharts chart.
Here's the query I run:
SELECT m.name, CONCAT(MONTH(o.order_date), '/', YEAR(o.order_date)) as `month`, SUM(oc.weight * m.price) AS cost
FROM order_contents oc
INNER JOIN orders o ON oc.order_id = o.id
INNER JOIN materials m ON oc.material_id = m.id
GROUP BY MONTH(o.order_date), m.id
ORDER BY m.name, order_date ASC
Here are the results:
The problem is that if a material isn't used in a particular month, no record is generated for it (obviously). So when I loop through the results to build the Highcharts data series, the missing month is not filled with zero. For example, the material Big Bag is only consumed in January 2022, but since that is the only entry in its data series, it is mapped to the first month, which is August. I could add logic to fix this, but I thought I'd ask here first whether this query can be rewritten to yield the results I'm looking for.
Here's what I'd like to get:
I'm way out of my league here on SQL capabilities for this sort of problem.
Here is a (probably weird) idea:
For a SQL table:
create table temp
(
month int,
year int,
name varchar(16), -- something like material-type
count int
)
We could run:
select const_year.year, const_month.month, const_name.name, ifnull(statistic.count, 0) as count
from (select 1 month union
select 2 union
select 3 union
select 4 union
select 5 union
select 6 union
select 7 union
select 8 union
select 9 union
select 10 union
select 11 union
select 12) const_month -- now we have a list of the 12 months
left join (select 2020 year union
select 2021 union
select 2022 ) const_year
on true -- now we have every month of those years
left join (select distinct temp.name as name
from temp) const_name
on true -- join with all distinct names/types
left join (select temp.name as name, temp.year as year, temp.month as month, sum(count) as count
from temp
group by temp.year, temp.month, temp.name -- the actual aggregation
) statistic
on statistic.year = const_year.year
and statistic.month = const_month.month
and statistic.name = const_name.name
order by name, year, month -- order the results if needed
There is almost certainly a better solution than this, though it works in some cases.
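The idea above (build every (year, month, name) combination, LEFT JOIN the real aggregates, and replace missing values with 0) can be sketched as follows. Assumptions: SQLite via Python's sqlite3 stands in for MySQL; the table is renamed to the hypothetical usage_stats (TEMP is a keyword in SQLite) with an amount column in place of count; the sample rows and the second material name "Pallet" are invented; and the 12 months come from a recursive CTE (which MySQL 8.0 also supports) instead of twelve UNIONs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the answer's "temp" table.
conn.execute("CREATE TABLE usage_stats (month INT, year INT, name TEXT, amount INT)")
conn.executemany(
    "INSERT INTO usage_stats VALUES (?, ?, ?, ?)",
    [(1, 2022, "Big Bag", 5), (11, 2021, "Pallet", 3),
     (12, 2021, "Pallet", 4), (12, 2021, "Pallet", 2)],
)

rows = conn.execute("""
    WITH RECURSIVE const_month(month) AS (          -- months 1..12
        SELECT 1 UNION ALL SELECT month + 1 FROM const_month WHERE month < 12
    ),
    const_year(year) AS (SELECT 2021 UNION ALL SELECT 2022),
    const_name(name) AS (SELECT DISTINCT name FROM usage_stats),
    statistic AS (                                  -- the real aggregation
        SELECT year, month, name, SUM(amount) AS total
        FROM usage_stats
        GROUP BY year, month, name
    )
    SELECT y.year, m.month, n.name, COALESCE(s.total, 0) AS total
    FROM const_month m
    CROSS JOIN const_year y                         -- every month of every year
    CROSS JOIN const_name n                         -- ... for every material
    LEFT JOIN statistic s
      ON s.year = y.year AND s.month = m.month AND s.name = n.name
    ORDER BY n.name, y.year, m.month
""").fetchall()

print(len(rows))  # 48 rows: 12 months x 2 years x 2 names
print(rows[12])   # (2022, 1, 'Big Bag', 5)
```

Every combination appears exactly once, with 0 where nothing was recorded, so each chart series has one value per month in a fixed order.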
I'd like to ask a question about SQL.
Let's consider this EMPLOYEE table:
Employee | Department
---------------------
A        | 10
A        | 10
A        | 11
A        | 12
B        | 13
B        | 13
What I want to display is, for each employee, all distinct departments (without duplicates) AND the total number of those distinct departments. So, something like this:
Employee | Department | total_dept
----------------------------------
A        | 10         | 3
A        | 11         | 3
A        | 12         | 3
B        | 13         | 1
If possible, I would even prefer something like this:
Employee | Department | total_dept
----------------------------------
A        | 10         | 3
A        | 11         | null
A        | 12         | null
B        | 13         | 1
I have a very big table (with many columns and a lot of data), so I thought this could be an "optimisation": there is no need to store total_dept in every row, and putting it once is sufficient. It's no problem if the column is left empty elsewhere. But I don't know whether such a thing is possible in SQL.
So, how can I fix this please ? I tried but it seems impossible to combine count(column) with the same column...
Thank you in advance
This might be what you are looking for:
SELECT
emp,
dept,
(select count(distinct dept) from TB as tbi where tb.emp = tbi.emp ) x
FROM TB tb
group by emp, dept;
MySQL 8.0 supports windowed COUNT:
SELECT *,COUNT(*) OVER (PARTITION BY Employee) AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
You could even produce the second result set (though I recommend leaving presentation matters to the application layer):
SELECT *, CASE WHEN ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY Department) = 1
THEN COUNT(*) OVER (PARTITION BY Employee) END AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
ORDER BY Employee, Department;
For the 2nd version:
SELECT
DISTINCT e.Employee, e.Department,
CASE
WHEN e.Department =
(SELECT MIN(Department) FROM Employees WHERE Employees.Employee = e.Employee)
THEN
(SELECT COUNT(DISTINCT Department) FROM Employees WHERE Employees.Employee = e.Employee)
END AS total_dept
FROM Employees e
ORDER BY e.Employee, e.Department;
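The correlated-subquery version can be checked against the question's sample data. This sketch runs it in an in-memory SQLite database through Python's sqlite3 module, as a stand-in for MySQL; because it needs no window functions, the same SQL should also run on MySQL versions older than 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee TEXT, Department INT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("A", 10), ("A", 10), ("A", 11), ("A", 12), ("B", 13), ("B", 13)],
)

# Show the distinct-department count only on each employee's lowest
# department; the CASE yields NULL on every other row.
rows = conn.execute("""
    SELECT DISTINCT e.Employee, e.Department,
           CASE
             WHEN e.Department =
                  (SELECT MIN(Department) FROM Employees WHERE Employees.Employee = e.Employee)
             THEN (SELECT COUNT(DISTINCT Department) FROM Employees
                   WHERE Employees.Employee = e.Employee)
           END AS total_dept
    FROM Employees e
    ORDER BY e.Employee, e.Department
""").fetchall()

print(rows)  # [('A', 10, 3), ('A', 11, None), ('A', 12, None), ('B', 13, 1)]
```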
Is there an efficient way to find missing data not just in one sequence, but in many sequences?
This is probably unavoidably O(N**2), so "efficient" here means relatively few queries, using MySQL.
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, OR it MIGHT be possible to use a subquery if every month is represented in the source data.
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
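Running the query above on the question's sample data shows what it reports. Sketch assumptions: SQLite through Python's sqlite3 module stands in for MySQL, and months are stored as 'YYYY-MM' text as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT)")
conn.execute("CREATE TABLE payments (employee TEXT, paid_month TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Jane", "2017-05", "2017-07"), ("Bob", "2017-10", "2017-12")])
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [("Jane", "2017-05"), ("Jane", "2017-07"),
                  ("Bob", "2017-11"), ("Bob", "2017-12")])

# Cartesian product of (months seen in payments) x (employees), then an
# anti-join: keep only the pairs with no matching payment row.
rows = conn.execute("""
    SELECT m.paid_month, e.employee
    FROM (SELECT DISTINCT paid_month FROM payments) m
    CROSS JOIN (SELECT employee FROM employees) e
    LEFT JOIN payments p
      ON m.paid_month = p.paid_month AND e.employee = p.employee
    WHERE p.employee IS NULL
    ORDER BY e.employee, m.paid_month
""").fetchall()

print(rows)
# [('2017-05', 'Bob'), ('2017-07', 'Bob'), ('2017-11', 'Jane'), ('2017-12', 'Jane')]
```

Because 2017-06 and 2017-10 never occur in payments, a month list derived from the data cannot surface those gaps, while months outside each employee's employment range are reported. That is why a calendar table, and extra conditions on e, matter in practice.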
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
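First, generate all the months between each employee's start and end dates in a derived table, then left outer join it to the payments table on the paid month and keep only the non-matching months (where the payment's employee is null). Note that this query is written in Oracle syntax (TO_DATE, ADD_MONTHS, CONNECT BY ... LEVEL), not MySQL.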
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35
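On MySQL 8.0, which supports recursive common table expressions, the per-employee month expansion that CONNECT BY performs can be written with WITH RECURSIVE instead. The sketch runs it on the question's sample data in SQLite through Python's sqlite3 module; SQLite's strftime(..., '+1 month') does the month step here, and a MySQL version would need DATE_FORMAT with INTERVAL 1 MONTH instead (an assumption, not tested here). The anti-join matches on both the month and the employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT)")
conn.execute("CREATE TABLE payments (employee TEXT, paid_month TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Jane", "2017-05", "2017-07"), ("Bob", "2017-10", "2017-12")])
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [("Jane", "2017-05"), ("Jane", "2017-07"),
                  ("Bob", "2017-11"), ("Bob", "2017-12")])

rows = conn.execute("""
    WITH RECURSIVE emp_months(employee, ym, end_month) AS (
        -- anchor: each employee's first month
        SELECT employee, start_month, end_month FROM employees
        UNION ALL
        -- step one month forward until the end month is reached
        SELECT employee, strftime('%Y-%m', ym || '-01', '+1 month'), end_month
        FROM emp_months
        WHERE ym < end_month
    )
    SELECT em.employee, em.ym AS missing_paid_month
    FROM emp_months em
    LEFT JOIN payments p
      ON p.paid_month = em.ym AND p.employee = em.employee
    WHERE p.employee IS NULL
    ORDER BY em.employee, em.ym
""").fetchall()

print(rows)  # [('Bob', '2017-10'), ('Jane', '2017-06')]
```

Comparing 'YYYY-MM' strings with < works because that format sorts chronologically as text.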
I'm wondering whether there is a way to write the following in ONE MySQL query.
I have a table:
cust_ID | rpt_name | req_secs
In the query I'd like to get:
the AVG req_secs when grouped by cust_ID
the AVG req_secs when grouped by rpt_name
the total req_secs AVG
I know I can do separate grouping queries on the same table then UNION the results into one. But I was hoping there was some way to do it in one query.
Thanks.
Well, the following does two out of three:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end);
This essentially "doubles" the data and then does the aggregation for each group. This assumes that cust_id and rpt_name are of compatible types. (The query could be tweaked if this is not the case.)
Actually, you can get the overall average by using rollup:
select n,
(case when n = 1 then cast(cust_id as char(255)) else rpt_name end) as grp,
avg(req_secs)
from t cross join
(select 1 as n union all select 2
) n
group by n, (case when n = 1 then cast(cust_id as char(255)) else rpt_name end) with rollup
This works for average because the average is the same on the "doubled" data as for the original data. It wouldn't work for sum() or count().
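The doubling trick, and the claim that only the average survives it, can be checked on a few hypothetical rows. The sketch uses SQLite via Python's sqlite3 module (SQLite has no WITH ROLLUP, so the overall figures come from a plain aggregate over the doubled rows); the table t and its data are invented for illustration, and CAST(... AS TEXT) is SQLite's spelling of MySQL's CAST(... AS CHAR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (cust_id INT, rpt_name TEXT, req_secs REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "daily", 2.0), (1, "weekly", 4.0), (2, "daily", 6.0)])

# Two groupings in one pass: n = 1 groups by cust_id, n = 2 by rpt_name.
per_group = conn.execute("""
    SELECT n,
           CASE WHEN n = 1 THEN CAST(cust_id AS TEXT) ELSE rpt_name END AS grp,
           AVG(req_secs)
    FROM t CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS d
    GROUP BY n, grp
    ORDER BY n, grp
""").fetchall()

# Every row appears twice in the doubled data: AVG is unchanged, SUM doubles.
orig_avg, orig_sum = conn.execute("SELECT AVG(req_secs), SUM(req_secs) FROM t").fetchone()
dbl_avg, dbl_sum = conn.execute("""
    SELECT AVG(req_secs), SUM(req_secs)
    FROM t CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS d
""").fetchone()

print(per_group)  # [(1, '1', 3.0), (1, '2', 6.0), (2, 'daily', 4.0), (2, 'weekly', 4.0)]
print(orig_avg, dbl_avg, orig_sum, dbl_sum)  # 4.0 4.0 12.0 24.0
```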
No, there is not. You can group by a combination of cust_ID and rpt_name at the same time (i.e. two levels of grouping), but you are not going to be able to do separate top-level groupings and then a non-grouped aggregation at the same time.
Because of the way GROUP BY works, the SQL to do this is a little tricky. One way to get the result is to get three copies of the rows, and group each set of rows separately.
SELECT g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,'')) AS gval
, AVG(t.req_secs) AS avg_req_secs
FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
CROSS
JOIN mytable t
GROUP
BY g.gkey
, IF(g.gkey='cust_id',t.cust_ID,IF(g.gkey='rpt_name',t.rpt_name,''))
The inline view aliased as "g" doesn't have to use UNION ALL operators, you just need a rowset that returns exactly 3 rows with distinct values. I just used the UNION ALL as a convenient way to return three literal values as a rowset, so I could join that to the original table.
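The three-row version can be sketched the same way. Assumptions: SQLite through Python's sqlite3 module stands in for MySQL, the rows in mytable are hypothetical, and the nested IF() is written as a CASE expression (standard SQL that MySQL also accepts).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (cust_ID INT, rpt_name TEXT, req_secs REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, "daily", 2.0), (1, "weekly", 4.0), (2, "daily", 6.0)])

# Three copies of every row, one per grouping key; the 'total' copy
# collapses into a single group because its grouping value is ''.
rows = conn.execute("""
    SELECT g.gkey,
           CASE g.gkey WHEN 'cust_id'  THEN CAST(t.cust_ID AS TEXT)
                       WHEN 'rpt_name' THEN t.rpt_name
                       ELSE '' END AS gval,
           AVG(t.req_secs) AS avg_req_secs
    FROM (SELECT 'cust_id' AS gkey UNION ALL SELECT 'rpt_name' UNION ALL SELECT 'total') g
    CROSS JOIN mytable t
    GROUP BY g.gkey, gval
    ORDER BY g.gkey, gval
""").fetchall()

for row in rows:
    print(row)
```

Unlike the two-copy approach, each copy is grouped on its own key, so the overall average is one ordinary group rather than a rollup row.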
I have a table with these columns:
id
user_id
player_in
player_out
date
I need to make a report that counts how many times each player appears, in either the player_in field or the player_out field.
For example, if I have these 2 rows in the table (in this order):
id | user_id | player_in | player_out
-------------------------------------
1  | 1       | 88        | 56
2  | 7       | 77        | 88
The result for the player 88 will be 2, and for the players 56 and 77, just 1
Use a subquery that employs union all to get the two columns into one column, then use a standard count(*):
Note: This query includes individual totals for ins and outs, as requested in the comments on this answer.
select
player_id,
count(*) as total,
sum(ins) as ins,
sum(outs) as outs
from (
select
player_in as player_id,
1 as ins,
0 as outs
from mytable
union all
select player_out, 0, 1
from mytable
) x
group by player_id
Note: you must use union all (not just union), because union removes duplicates whereas union all does not.
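The query, and the reason for union all, can be checked on the two example rows from the question. The sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL and leaves out the date column, which the report does not use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, user_id INT, player_in INT, player_out INT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [(1, 1, 88, 56), (2, 7, 77, 88)])

# The answer's query: stack ins and outs with UNION ALL, then aggregate.
rows = conn.execute("""
    SELECT player_id, COUNT(*) AS total, SUM(ins) AS ins, SUM(outs) AS outs
    FROM (SELECT player_in AS player_id, 1 AS ins, 0 AS outs FROM mytable
          UNION ALL
          SELECT player_out, 0, 1 FROM mytable) x
    GROUP BY player_id
    ORDER BY player_id
""").fetchall()

# With plain UNION on the player column alone, the duplicate 88 is removed.
plain_union = conn.execute("""
    SELECT player_id, COUNT(*)
    FROM (SELECT player_in AS player_id FROM mytable
          UNION
          SELECT player_out FROM mytable) x
    GROUP BY player_id
    ORDER BY player_id
""").fetchall()

print(rows)         # [(56, 1, 0, 1), (77, 1, 1, 0), (88, 2, 1, 1)]
print(plain_union)  # [(56, 1), (77, 1), (88, 1)]
```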
You could use a cross-join to a 2-row virtual table to unpivot the player_* columns, then group the results, like this:
SELECT
player,
COUNT(*) AS total_count
FROM (
SELECT
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
That is, every row of the original table is essentially duplicated, and each copy of the row supplies either player_in or player_out, depending on whether the derived table's is_in column is TRUE or FALSE, to form a single player column. This method of unpivoting might perform better than the UNION method suggested by @Bohemian, because this way the (physical) table is read just once (but you'd need to test and compare both methods to determine whether there's any substantial benefit to this approach in your particular situation).
To calculate in and out counts, as you have requested in one of your comments to the above mentioned answer, you could extend my original suggestion like this:
SELECT
player,
COUNT( is_in OR NULL) AS in_count,
COUNT(NOT is_in OR NULL) AS out_count,
COUNT(*) AS total_count
FROM (
SELECT
x.is_in,
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
As you can see, the derived table now additionally returns the is_in column in its own right, and that column is used in two conditional aggregations that count how many times a player was in and out. (The OR NULL trick works because COUNT ignores NULLs: is_in OR NULL is TRUE when is_in is TRUE and NULL when it is FALSE, so only the matching rows are counted.)
You could also rewrite the COUNT(condition OR NULL) entries as SUM(condition). That would shorten both expressions, and some people also find the SUM method of counting clearer or more elegant. Either way, there would likely be no difference in performance, so choose whichever method suits your taste.
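The conditional counts and their SUM equivalents can be compared side by side on the question's two rows. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL, and writes the derived table's flags as 1/0 instead of TRUE/FALSE so that older SQLite versions accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, user_id INT, player_in INT, player_out INT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", [(1, 1, 88, 56), (2, 7, 77, 88)])

rows = conn.execute("""
    SELECT player,
           COUNT(is_in OR NULL)     AS in_count,   -- 1 OR NULL = 1, 0 OR NULL = NULL
           COUNT(NOT is_in OR NULL) AS out_count,
           SUM(is_in)               AS in_sum,     -- same counts via SUM
           SUM(NOT is_in)           AS out_sum,
           COUNT(*)                 AS total_count
    FROM (
        -- cross join to a 2-row table: each source row yields one "in"
        -- copy and one "out" copy, unpivoting the two player columns
        SELECT x.is_in,
               CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
        FROM mytable t
        CROSS JOIN (SELECT 1 AS is_in UNION ALL SELECT 0) x
    ) s
    GROUP BY player
    ORDER BY player
""").fetchall()

for row in rows:
    print(row)
# (56, 0, 1, 0, 1, 1)
# (77, 1, 0, 1, 0, 1)
# (88, 1, 1, 1, 1, 2)
```

The COUNT and SUM columns agree row for row, which matches the point above that the choice between them is a matter of taste.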