How can one group by ranges and perform aggregation in MySQL?

I have a table (shown below) with a year and the quantity of goods sold in that year. I want to group the year column into ten-year ranges ("decades") and sum the quantity sold in each one. Note that the ranges overlap: the first decade is 1980-1989, the second is 1981-1990, and so on. The expected result is shown in the second table below.
sample:
+------+-----+
| year | qty |
+------+-----+
| 1980 |   2 |
| 1981 |   1 |
| 1983 |   8 |
| 1989 |   2 |
| 1990 |   1 |
| 1992 |   1 |
| 1994 |   4 |
+------+-----+
expected_result:
+-----------+------------+
| Decades   | Total_qnty |
+-----------+------------+
| 1980-1989 |         13 |
| 1981-1990 |         12 |
| 1982-1991 |         11 |
| 1983-1992 |         12 |
| ...       | ...        |
+-----------+------------+
Below is one of the queries I tried, but the result is not as expected:
SELECT t.decade_range AS Decades, SUM(t.qty) AS Total_qnty
FROM (
    SELECT CASE
               -- CASE returns only the first matching branch, so each year lands
               -- in a single range; overlapping decades cannot be produced this way
               WHEN s.year BETWEEN 1980 AND 1989 THEN '1980 - 1989'
               WHEN s.year BETWEEN 1981 AND 1990 THEN '1981 - 1990'
               WHEN s.year BETWEEN 1982 AND 1991 THEN '1982 - 1991'
               WHEN s.year BETWEEN 1983 AND 1992 THEN '1983 - 1992'
               ELSE '1993 - above'
           END AS decade_range,   -- RANGE is a reserved word in MySQL 8
           s.qty
    FROM sample s
) t
GROUP BY t.decade_range;
I tried a couple of other approaches as well but still could not get the expected result. I would also prefer not to hardcode the ranges. Any help would be appreciated.

After getting insight from xObert's answer to hank99's question, I was able to solve the problem with a self join, as shown below. Note: the raw table contains the product name and the year it was sold, with one row per sale, so product names and years repeat. That is why COUNT(*) gives the total number of products sold in each decade. Thank you all!
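SELECT CONCAT(year1, ' - ', year2) AS Decades, Count_of_qnty  -- in MySQL, || is logical OR by default
FROM (
    SELECT s1.year AS year1, s1.year + 9 AS year2,
           COUNT(*) AS Count_of_qnty  -- one row per sale; use SUM(s2.qty) on the aggregated sample table
    FROM (SELECT DISTINCT year FROM sample) s1
    JOIN sample s2
      ON s2.year >= s1.year AND s2.year <= s1.year + 9  -- select-list aliases are not visible in ON
    GROUP BY s1.year
) t;  -- MySQL requires an alias on every derived table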
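The self join above only starts a window at years that actually occur in the data, so a row such as 1982-1991 from the expected result would not appear. The sketch below is a variant, not the author's exact query: it generates every start year between MIN(year) and MAX(year) with a recursive CTE (MySQL supports WITH RECURSIVE from 8.0) and sums qty over the sample table shown in the question. It runs against Python's built-in sqlite3 as a stand-in for MySQL; in MySQL you would build the label with CONCAT() instead of ||.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `sample` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (year INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO sample (year, qty) VALUES (?, ?)",
    [(1980, 2), (1981, 1), (1983, 8), (1989, 2), (1990, 1), (1992, 1), (1994, 4)],
)

# bounds: first and last year in the data.
# starts: every year from lo to hi, so each one can open a window [y, y + 9],
# including years (like 1982) that have no rows of their own.
ROLLING_DECADES = """
WITH RECURSIVE
  bounds(lo, hi) AS (SELECT MIN(year), MAX(year) FROM sample),
  starts(y) AS (
      SELECT lo FROM bounds
      UNION ALL
      SELECT starts.y + 1 FROM starts, bounds WHERE starts.y < bounds.hi
  )
SELECT starts.y || '-' || (starts.y + 9) AS decades,
       COALESCE(SUM(s.qty), 0)           AS total_qnty
FROM starts
LEFT JOIN sample AS s
       ON s.year BETWEEN starts.y AND starts.y + 9
GROUP BY starts.y
ORDER BY starts.y
"""

rows = conn.execute(ROLLING_DECADES).fetchall()
for decades, total_qnty in rows:
    print(decades, total_qnty)
```

The LEFT JOIN plus COALESCE keeps a window in the output with a total of 0 even if no sales fall inside it.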

Related

How to check if a group has three consecutive values in a column?

I have a table games with values such as:
+----------+------+
| game | year |
+----------+------+
| Football | 1999 |
| Football | 2000 |
| Football | 2001 |
| Football | 2002 |
| Cricket | 1996 |
| Tennis | 2001 |
| Tennis | 2002 |
| Tennis | 2003 |
| Tennis | 2009 |
| Golf | 1994 |
| Golf | 1996 |
| Golf | 1997 |
+----------+------+
I am trying to find which games have entries for at least three consecutive years in the table. My expected output is:
+----------+
| game |
+----------+
| Football |
| Tennis |
+----------+
Because:
Football has four entries, all in consecutive years => 1999, 2000, 2001, 2002
Tennis has four entries, three of which are in consecutive years => 2001, 2002, 2003
To find the rows with at least three consecutive entries, I first partitioned the table by game and then computed the difference between the current row and the previous row:
SELECT game, year,
       -- the first row of each game has no previous row, so default the difference to 1
       COALESCE(year - LAG(year) OVER (PARTITION BY game ORDER BY year), 1) AS diff
FROM games;
Output of the above query:
+----------+------+------+
| game | year | diff |
+----------+------+------+
| Football | 1999 | 1 |
| Football | 2000 | 1 |
| Football | 2001 | 1 |
| Football | 2002 | 1 |
| Cricket | 1996 | 1 |
| Tennis | 2001 | 1 |
| Tennis | 2002 | 1 |
| Tennis | 2003 | 1 |
| Tennis | 2009 | 6 |
| Golf | 1994 | 1 |
| Golf | 1996 | 2 |
| Golf | 1997 | 1 |
+----------+------+------+
I don't know how to proceed from here: how do I filter each game's rows by this difference to get the output?
Could anyone let me know if I am on the right track? If not, how should I write the query to get the expected output?
You could use a self join approach here:
SELECT DISTINCT g1.Game
FROM games g1
INNER JOIN games g2
ON g2.Game = g1.Game AND g2.Year = g1.Year + 1
INNER JOIN games g3
ON g3.Game = g2.Game AND g3.Year = g2.Year + 1;
The above query requires a matching game to have at least one record for which records also exist in the following year and in the year after that.
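To try the self join without a MySQL server, here is a minimal sketch that loads the question's games table into Python's built-in sqlite3 (an in-memory stand-in for MySQL) and runs the same join. The ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game TEXT, year INTEGER)")
conn.executemany("INSERT INTO games (game, year) VALUES (?, ?)", [
    ("Football", 1999), ("Football", 2000), ("Football", 2001), ("Football", 2002),
    ("Cricket", 1996),
    ("Tennis", 2001), ("Tennis", 2002), ("Tennis", 2003), ("Tennis", 2009),
    ("Golf", 1994), ("Golf", 1996), ("Golf", 1997),
])

# g1 is the first year of a run; g2 and g3 must exist one and two years later
# for the same game, otherwise the inner joins drop the row.
SELF_JOIN = """
SELECT DISTINCT g1.game
FROM games g1
JOIN games g2 ON g2.game = g1.game AND g2.year = g1.year + 1
JOIN games g3 ON g3.game = g2.game AND g3.year = g2.year + 1
ORDER BY g1.game
"""

consecutive_games = [row[0] for row in conn.execute(SELF_JOIN)]
print(consecutive_games)
```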
You can use lag() and lead() and compare them to the current Year:
with u as
(select *, case
when lag(Year) over(partition by Game order by Year) = Year - 1
and lead(Year) over(partition by Game order by Year) = Year + 1
then 1 else 0
end as consec
from games)
select distinct Game
from u
where consec = 1;
Yes, your initial approach is correct. You were actually really close to fully figuring it out yourself.
What I would do is alter LAG a bit:
year - LAG(year, 2) OVER (
    PARTITION BY game
    ORDER BY year
)  -- LAG() does not use a window frame, so no ROWS BETWEEN clause is needed
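For each row, this computes the difference between the current row's year and the year two rows earlier in the same game.
If the current row completes a run of three consecutive years, the difference is 2. Window functions can't be referenced directly in a WHERE clause, so compute the difference in a subquery or CTE and filter on it in the outer query.
If your data contains duplicate (game, year) rows, deduplicate them first (for example with GROUP BY game, year).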
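The answer shows only the LAG(year, 2) expression. A complete query built around it might look like the sketch below: deduplicate (game, year), compute the two-row difference in a derived table, and filter in the outer query. It runs on Python's sqlite3 as a stand-in for MySQL 8 and assumes SQLite 3.25 or newer, the first release with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game TEXT, year INTEGER)")
conn.executemany("INSERT INTO games (game, year) VALUES (?, ?)", [
    ("Football", 1999), ("Football", 2000), ("Football", 2001), ("Football", 2002),
    ("Cricket", 1996),
    ("Tennis", 2001), ("Tennis", 2002), ("Tennis", 2003), ("Tennis", 2009),
    ("Golf", 1994), ("Golf", 1996), ("Golf", 1997),
])

# d: one row per (game, year), so duplicates can't shrink the difference.
# t: year minus the year two rows earlier; exactly 2 means three consecutive years.
# The filter sits in the outer query because WHERE can't see window functions.
LAG_TWO = """
SELECT DISTINCT game
FROM (
    SELECT game,
           year - LAG(year, 2) OVER (PARTITION BY game ORDER BY year) AS diff2
    FROM (SELECT DISTINCT game, year FROM games) AS d
) AS t
WHERE diff2 = 2
ORDER BY game
"""

lag_games = [row[0] for row in conn.execute(LAG_TWO)]
print(lag_games)
```

For the first two rows of each game LAG(year, 2) returns NULL, so diff2 is NULL and those rows are filtered out.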
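Using a CTE (Common Table Expression) and the ROW_NUMBER window function, this can be solved as a gaps-and-islands problem: within a run of consecutive years, year - ROW_NUMBER() stays constant.
WITH cte AS (
    SELECT game, year,
           -- consecutive years of the same game share the same grp value
           year - ROW_NUMBER() OVER (PARTITION BY game ORDER BY year) AS grp
    FROM games
)
SELECT DISTINCT game
FROM cte
GROUP BY game, grp
HAVING COUNT(*) >= 3;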

Group Rows in group by though it contains NULL value in mysql / postgres

I have a table from which I get month names and some quantity measures.
Table Name = Month_Name
SELECT month_name,q1,q2 FROM month_name;
mysql> SELECT * FROM MONTH;
+------------+------+------+
| month_name | q1 | q2 |
+------------+------+------+
| January | 10 | 20 |
| March | 30 | 40 |
| March | 10 | 5 |
+------------+------+------+
Expected Output:
mysql> SELECT month_name ,SUM(q1),SUM(q2) FROM MONTH GROUP BY month_name;
+------------+---------+---------+
| month_name | sum(q1) | sum(q2) |
+------------+---------+---------+
| January | 10 | 20 |
| February | 0 | 0 |
| March | 40 | 45 |
| April | 0 | 0 |
+------------+---------+---------+
GROUP BY month_name will not return February and April, since these two months are not present in the base table. I do not want to use UNION ALL because of performance concerns. Is there another, more optimised approach?
You can use a calendar table which keeps track of all the month names which you want to appear in your report.
SELECT
m1.month_name,
COALESCE(SUM(m2.q1), 0) AS q1_sum,  -- months with no rows give NULL; report 0 instead
COALESCE(SUM(m2.q2), 0) AS q2_sum
FROM
(
SELECT 'January' AS month_name UNION ALL
SELECT 'February' UNION ALL
SELECT 'March' UNION ALL
...
SELECT 'December'
) m1
LEFT JOIN month m2
ON m1.month_name = m2.month_name
GROUP BY
m1.month_name;
Note that while this solves your immediate problem, it is still not ideal, because there is no easy way to sort the months in calendar order. A much better table design would be to store a date column, from which the month name is easily derived.
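One way to address the sorting concern while keeping the calendar-table idea is to store the month number next to the name in a permanent helper table, so no UNION ALL runs per query. The sketch below assumes a hypothetical table named calendar_months (not part of the question) and uses Python's sqlite3 as a stand-in for MySQL; the LEFT JOIN and COALESCE logic is the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE month (month_name TEXT, q1 INTEGER, q2 INTEGER)")
conn.executemany("INSERT INTO month VALUES (?, ?, ?)",
                 [("January", 10, 20), ("March", 30, 40), ("March", 10, 5)])

# Hypothetical helper table: one row per month, with a number to sort by.
conn.execute("CREATE TABLE calendar_months (month_num INTEGER PRIMARY KEY, month_name TEXT)")
month_names = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]
conn.executemany("INSERT INTO calendar_months VALUES (?, ?)",
                 list(enumerate(month_names, start=1)))

# LEFT JOIN keeps months with no data; COALESCE turns their NULL sums into 0.
QUERY = """
SELECT c.month_name,
       COALESCE(SUM(m.q1), 0) AS q1_sum,
       COALESCE(SUM(m.q2), 0) AS q2_sum
FROM calendar_months AS c
LEFT JOIN month AS m ON m.month_name = c.month_name
GROUP BY c.month_num, c.month_name
ORDER BY c.month_num
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```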

How to join two tables with average function and where clause? SQL

I have two tables below with the following information
project.analytics
| proj_id | list_date | state |
| 1 | 03/05/10 | CA |
| 2 | 04/05/10 | WA |
| 3 | 03/05/10 | WA |
| 4 | 04/05/10 | CA |
| 5 | 03/05/10 | WA |
| 6 | 04/05/10 | CA |
employees.analytics
| employee_id | proj_id | worked_date |
| 20 | 1 | 3/12/10 |
| 30 | 1 | 3/11/10 |
| 40 | 2 | 4/15/10 |
| 50 | 3 | 3/16/10 |
| 60 | 3 | 3/17/10 |
| 70 | 4 | 4/18/10 |
What query can I write to determine, by listing month and state, the average number of unique employees who worked on a project within the first 7 days after it was listed?
Desired output:
| list_date | state | # Unique Employees of projects first 7 day list
| March | CA | 1
| April | WA | 2
| July | WA | 2
| August | CA | 1
My Attempt
select
    month(list_date),
    state,
    count(*) as Projects   -- no trailing comma before FROM
from project.analytics
group by
    month(list_date),
    state;
I understand the next steps are to compute worked_date - list_date and, where the difference is less than 7 days, average the count of employees from the second table, but I'm not sure which query functions to use.
You could use a CASE with a DISTINCT to COUNT the unique employees that worked within the first 7 days of the list_date.
Once you have that total of employees per project, then you can calculate those averages per month & state.
SELECT
MONTHNAME(list_date) as `ListMonth`,
state,
AVG(TotalUniqEmp7Days) AS `Average Unique Employees of projects first 7 day list`
FROM
(
SELECT
proj.proj_id,
proj.list_date,
proj.state,
COUNT(DISTINCT CASE
WHEN emp.worked_date BETWEEN proj.list_date and DATE_ADD(proj.list_date, INTERVAL 6 DAY)
THEN emp.employee_id
END) AS TotalUniqEmp7Days
-- , COUNT(DISTINCT emp.employee_id) AS TotalUniqEmp
FROM project.analytics proj
LEFT JOIN employees.analytics emp ON emp.proj_id = proj.proj_id
GROUP BY proj.proj_id, proj.list_date, proj.state
) AS ProjectTotals
GROUP BY YEAR(list_date), MONTH(list_date), MONTHNAME(list_date), state;
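To check the two-level aggregation (distinct employees per project first, then the average per month and state), here is a sketch of the same query on Python's sqlite3. Several things are assumptions for the stand-in: dates are stored as ISO 'YYYY-MM-DD' text, the question's 03/05/10-style values are read as MM/DD/YY in 2010, the tables are renamed project_analytics and employees_analytics (SQLite would read "project.analytics" as schema.table), and SQLite's date() and strftime() replace MySQL's DATE_ADD() and MONTHNAME().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_analytics (proj_id INTEGER, list_date TEXT, state TEXT);
CREATE TABLE employees_analytics (employee_id INTEGER, proj_id INTEGER, worked_date TEXT);
INSERT INTO project_analytics VALUES
  (1, '2010-03-05', 'CA'), (2, '2010-04-05', 'WA'), (3, '2010-03-05', 'WA'),
  (4, '2010-04-05', 'CA'), (5, '2010-03-05', 'WA'), (6, '2010-04-05', 'CA');
INSERT INTO employees_analytics VALUES
  (20, 1, '2010-03-12'), (30, 1, '2010-03-11'), (40, 2, '2010-04-15'),
  (50, 3, '2010-03-16'), (60, 3, '2010-03-17'), (70, 4, '2010-04-18');
""")

QUERY = """
SELECT strftime('%Y-%m', list_date) AS list_month,
       state,
       AVG(uniq_emp_7_days)         AS avg_uniq_emp_7_days
FROM (
    -- per project: distinct employees whose worked_date falls in
    -- [list_date, list_date + 6 days], i.e. the first 7 days of the listing
    SELECT p.proj_id, p.list_date, p.state,
           COUNT(DISTINCT CASE
                   WHEN e.worked_date BETWEEN p.list_date AND date(p.list_date, '+6 days')
                   THEN e.employee_id
                 END) AS uniq_emp_7_days
    FROM project_analytics AS p
    LEFT JOIN employees_analytics AS e ON e.proj_id = p.proj_id
    GROUP BY p.proj_id, p.list_date, p.state
) AS per_project
GROUP BY strftime('%Y-%m', list_date), state
ORDER BY list_month, state
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The LEFT JOIN keeps projects with no qualifying employees (they count as 0), so they still pull their month's average down, which is usually what an "average per project" should do.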
I think this is the code that you want:
select
    p.list_date, p.state,
    emp.no_of_unique_emp
from project.analytics p
inner join (
    -- unique employees per project who worked within the first 7 days of listing;
    -- these are per-project counts, not yet averaged by month and state
    select
        e.proj_id,
        count(distinct e.employee_id) as no_of_unique_emp
    from employees.analytics e
    inner join project.analytics p2 on p2.proj_id = e.proj_id
    where datediff(e.worked_date, p2.list_date) between 0 and 6
    group by e.proj_id
) emp
    on emp.proj_id = p.proj_id;

2 columns. Merge duplicate values of A, sum duplicate values of B

I have a table that has 2 columns with data like this (from 1950 to 2015):
| Year | Count |
| 1994 | 10 |
| 1994 | 49 |
| 1994 | 2 |
| 1995 | 13 |
| 1995 | 6 |
I want my query result to be:
| Year | Count |
| 1994 | 61 |
| 1995 | 19 |
Things I have tried:
I began with a simple query like SELECT SUM(`Count`) FROM `population` WHERE `Year` = 1994, which worked for a specific year, but I want to fill an array with the population of every year in the database.
Doing something like SELECT `Year`, SUM(`Count`) FROM `population` is closer to what I want, except that it only shows the first year.
I'm not sure what terms to search for to get closer to an answer. UNION? I tried applying it but got nowhere.
Use GROUP BY so that SUM is computed once per year:
SELECT `Year`, SUM(`Count`) AS `Count` FROM `population` GROUP BY `Year`;
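Since the goal is to fill an array with every year's total, here is a minimal sketch that runs the GROUP BY on Python's sqlite3 (a stand-in for MySQL) and collects the result into a dict keyed by year. SQLite quotes identifiers with double quotes where MySQL uses backticks; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE population ("Year" INTEGER, "Count" INTEGER)')
conn.executemany("INSERT INTO population VALUES (?, ?)",
                 [(1994, 10), (1994, 49), (1994, 2), (1995, 13), (1995, 6)])

# One output row per distinct Year, with that year's counts summed.
rows = conn.execute(
    'SELECT "Year", SUM("Count") AS "Count" FROM population GROUP BY "Year" ORDER BY "Year"'
).fetchall()

populations = dict(rows)  # {year: total}
print(populations)
```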

Best way to query for most recent based on datetime (MySQL)

If I had a table that stored a person's name, the time it took them to run a mile, and when they ran that mile, what would be the best way to get a person's most recent lap time?
LAPS
+--------+----------+---------------------+
| name   | lap_time | date                |
+--------+----------+---------------------+
| George |     20.3 | 2013-01-17 09:17:14 |
| Alex   |     32.2 | 2013-02-17 14:24:32 |
| Mike   |     16.6 | 2013-01-17 07:57:54 |
| Alex   |     28.5 | 2013-01-17 19:50:21 |
| Mike   |     15.1 | 2013-02-17 12:37:12 |
| Mike   |     14.8 | 2013-03-17 06:58:34 |
+--------+----------+---------------------+
I've been doing it this way, and it has worked for me so far, but I'm curious to know if there is a better way.
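SELECT l.lap_time
FROM laps l
INNER JOIN (
    -- select only name plus the aggregate so this is valid under ONLY_FULL_GROUP_BY
    SELECT name, MAX(date) AS most_recent
    FROM laps
    WHERE name = 'Alex'
    GROUP BY name
) AS temp ON (
    l.date = temp.most_recent
    AND l.name = temp.name
);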
The actual table I'm using this type of query on is huge, so I'm looking for the most time-efficient way of doing it.
This will work for a single name:
SELECT name, lap_time, date
FROM laps
WHERE name = 'Alex'
ORDER BY date DESC
LIMIT 1
Result
| NAME | LAP_TIME | DATE |
-----------------------------------------------------
| Alex | 32.2 | February, 17 2013 14:24:32+0000 |
This should work for all names:
SELECT a.name, b.lap_time, a.date
FROM
(SELECT name, MAX(date) AS date FROM laps GROUP BY name) a
LEFT JOIN laps b
ON b.date = a.date AND b.name = a.name
Result
| NAME | LAP_TIME | DATE |
-------------------------------------------------------
| Alex | 32.2 | February, 17 2013 14:24:32+0000 |
| George | 20.3 | January, 17 2013 09:17:14+0000 |
| Mike | 14.8 | March, 17 2013 06:58:34+0000 |
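Both queries from this answer can be checked side by side with a small sketch on Python's sqlite3, standing in for MySQL. Dates are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts chronologically, so MAX() and ORDER BY ... DESC behave as they do on a MySQL DATETIME. The join uses INNER JOIN because every (name, MAX(date)) pair comes from an existing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE laps (name TEXT, lap_time REAL, date TEXT)")
conn.executemany("INSERT INTO laps VALUES (?, ?, ?)", [
    ("George", 20.3, "2013-01-17 09:17:14"),
    ("Alex", 32.2, "2013-02-17 14:24:32"),
    ("Mike", 16.6, "2013-01-17 07:57:54"),
    ("Alex", 28.5, "2013-01-17 19:50:21"),
    ("Mike", 15.1, "2013-02-17 12:37:12"),
    ("Mike", 14.8, "2013-03-17 06:58:34"),
])

# Single person: newest row first, keep one.
latest_alex = conn.execute(
    "SELECT name, lap_time, date FROM laps WHERE name = ? ORDER BY date DESC LIMIT 1",
    ("Alex",),
).fetchone()

# Everyone: find each name's latest date, then join back to fetch that row's lap_time.
latest_all = conn.execute("""
SELECT a.name, b.lap_time, a.date
FROM (SELECT name, MAX(date) AS date FROM laps GROUP BY name) AS a
JOIN laps AS b ON b.name = a.name AND b.date = a.date
ORDER BY a.name
""").fetchall()

print(latest_alex)
print(latest_all)
```

If one person has two laps with the exact same timestamp, the join version returns both rows for that person, while the LIMIT 1 version returns only one of them.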
I would test it with this query and with MAX to see which is faster.
Select * from laps where name = 'Alex' order by date desc limit 1;