SQL: How to make a conditional COUNT and SUM - mysql

Let's say I have the following table:
id | letter | date
--------------------------------
1 | A | 2011-01-01
2 | A | 2011-04-01
3 | A | 2011-04-01
4 | B | 2011-01-01
5 | B | 2011-01-01
6 | B | 2011-01-01
I would like to make a count of the rows broken down by letter and date, and sum the count of all the previous dates. Every letter should have a row for every date in the table (i.e., letter B doesn't have a 2011-04-01 date, but still appears in the result).
The resulting table would look like this
letter| date | total
--------------------------------
A | 2011-01-01 | 1
A | 2011-04-01 | 3
B | 2011-01-01 | 3
B | 2011-04-01 | 3
How to achieve this in a SQL query?
Thank you for your help!

NOTE
I didn't notice it was MySQL, which did not support CTEs before version 8.0. You may be able to define temporary tables instead.
This is an interesting problem. You need to join all letters with all dates and then count the preceding rows. If you weren't concerned with having rows for letters that have no entry on a given date, you could probably just do something like this:
SELECT DISTINCT letter, date,
(SELECT COUNT(*)
FROM tbl tbl2
WHERE tbl2.letter = tbl1.letter
AND tbl2.date <= tbl1.date) AS total
FROM tbl tbl1
ORDER BY date, letter
/deleted CTE solution/
Solution without CTE
SELECT tblDates.`date`, tblLetters.letter,
(SELECT COUNT(*)
FROM tblData tbl2
WHERE tbl2.letter = tblLetters.letter
AND tbl2.`date` <= tblDates.`date`) AS total
FROM (SELECT DISTINCT `date` FROM tblData) tblDates
CROSS JOIN (SELECT DISTINCT letter FROM tblData) tblLetters
ORDER BY tblDates.`date`, tblLetters.letter
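To check the cross join plus correlated count against the sample data from the question, here is a minimal sketch that runs the same idea in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL here only for convenience; the table name tblData is taken from the query above, and the variable names are my own.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblData (id INTEGER, letter TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO tblData VALUES (?, ?, ?)",
    [
        (1, "A", "2011-01-01"),
        (2, "A", "2011-04-01"),
        (3, "A", "2011-04-01"),
        (4, "B", "2011-01-01"),
        (5, "B", "2011-01-01"),
        (6, "B", "2011-01-01"),
    ],
)

# Every distinct letter is paired with every distinct date; the correlated
# subquery counts that letter's rows dated on or before the paired date.
running_totals = conn.execute(
    """
    SELECT l.letter, d.date,
           (SELECT COUNT(*)
              FROM tblData t
             WHERE t.letter = l.letter
               AND t.date <= d.date) AS total
      FROM (SELECT DISTINCT date FROM tblData) d
     CROSS JOIN (SELECT DISTINCT letter FROM tblData) l
     ORDER BY l.letter, d.date
    """
).fetchall()

for row in running_totals:
    print(row)
```

Dates are stored as ISO strings here, so the <= comparison orders them correctly; with a real DATE column in MySQL the comparison works the same way.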

The requirement
every letter should have a row to every date of the table
requires a cross join of the distinct dates and letters. Once you do that, it's pretty straightforward:
SELECT letterdate.letter,
letterdate.DATE,
COUNT(yt.id) total
FROM (SELECT letter,
date
FROM (SELECT DISTINCT DATE
FROM yourtable) dates,
(SELECT DISTINCT letter
FROM yourtable) letter) letterdate
LEFT JOIN yourtable yt
ON letterdate.letter = yt.letter
AND yt.DATE <= letterdate.DATE
GROUP BY letterdate.letter,
letterdate.DATE

A slight variation on the previous:
declare @table1 table (id int, letter char, date smalldatetime)
insert into @table1 values (1, 'A', '1/1/2011')
insert into @table1 values (2, 'A', '4/1/2011')
insert into @table1 values (3, 'A', '4/1/2011')
insert into @table1 values (4, 'B', '1/1/2011')
insert into @table1 values (5, 'B', '1/1/2011')
insert into @table1 values (6, 'B', '1/1/2011')
select b.letter, b.date, count(0) AS count_
from (
select distinct letter, a.date from @table1
cross join (select distinct date from @table1 ) a
) b
join @table1 t1
on t1.letter = b.letter
and t1.date <= b.date
group by b.letter, b.date
order by b.letter

Related

SQL query for grouping by 2 columns and taking the first occurrence of 3rd column

I am not great at SQL queries so I thought I'd ask here. I have a table my_table:
NOTE : Consider all the columns as strings. I just represent them as numbers here for a better understanding.
A B C
-----
1 2 3
2 2 3
2 5 6
3 5 6
I want the result to be-
A B C
-----
1 2 3
2 5 6
So basically, dropping duplicate pairs for B, C, and taking the first occurrence of A for that pair of B, C.
It seems you need the minimum of column A, grouped by B and C:
select min(cast(A as unsigned)) as A, cast(B as unsigned) as B, cast(C as unsigned) as C
from my_table
group by B , C
The cast(<column> as unsigned) conversion is used to make the values numeric.
You seem to want aggregation:
select min(a) as a, b, c
from t
group by b, c;
I assume "first" means the minimum value of a. SQL tables represent unordered sets, so that seems like the most sensible interpretation.
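As a quick check of the aggregation, here is a minimal sketch using Python's sqlite3 module with an in-memory database (SQLite is a stand-in for MySQL, and the variable names are mine). Since the question says the columns are really strings, the sketch also shows why a numeric cast can matter: MIN on text compares character by character.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("1", "2", "3"), ("2", "2", "3"), ("2", "5", "6"), ("3", "5", "6")],
)

# One row per (b, c) pair, keeping the smallest a of each pair.
first_per_pair = conn.execute(
    "SELECT MIN(a) AS a, b, c FROM my_table GROUP BY b, c ORDER BY b, c"
).fetchall()
print(first_per_pair)

# On text values MIN is lexicographic, so '10' sorts before '9';
# casting to a number gives the numeric minimum instead.
text_min = conn.execute(
    "SELECT MIN(x) FROM (SELECT '10' AS x UNION ALL SELECT '9' AS x)"
).fetchone()[0]
numeric_min = conn.execute(
    "SELECT MIN(CAST(x AS INTEGER)) FROM (SELECT '10' AS x UNION ALL SELECT '9' AS x)"
).fetchone()[0]
print(text_min, numeric_min)
```

SQLite spells the cast as CAST(x AS INTEGER); in MySQL the equivalent in this thread is cast(x as unsigned).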
First, you should post what you have tried.
In MySQL 8.0 you can use row_number() over (partition by B, C order by A) to solve this.
CREATE TABLE Table1
(`A` int, `B` int, `C` int)
;
INSERT INTO Table1
(`A`, `B`, `C`)
VALUES
(1, 2, 3),
(2, 2, 3),
(2, 5, 6),
(3, 5, 6)
;
select `A`, `B`, `C` from (
select *,row_number() over (partition by `B`, `C` order by `A`) rnk from Table1
) T
where rnk = 1;
A | B | C
-: | -: | -:
1 | 2 | 3
2 | 5 | 6
If your MySQL version is older than 8.0, you can follow the answer to "ROW_NUMBER() in MySQL".
Update:
As @forpas points out, taking the first occurrence of A for each pair of B, C is not solved by ORDER BY A.
You have to number the rows first and order by that number:
CREATE TABLE Table1
(`A` int, `B` int, `C` int)
;
INSERT INTO Table1
(`A`, `B`, `C`)
VALUES
(2, 2, 3),
(1, 2, 3),
(2, 5, 6),
(3, 5, 6)
;
SET @rownum:=0;
select `A`, `B`, `C` from (
select *,row_number() over (partition by `B`, `C` order by rownum) rnk from (
select *,@rownum:=@rownum+1 AS rownum from Table1
) T
) T
where rnk = 1;
A | B | C
-: | -: | -:
2 | 2 | 3
2 | 5 | 6
select min(A) as A,B,C
from Table1
group by B,C
As you requested, this gets the minimum value of A for each combination of B and C.

Select duplicates while concatenating every one except the first

I am trying to write a query that will select all of the numbers in my table, but for numbers with duplicates I want to append something on the end that marks them as duplicates. However, I am not sure how to do this.
Here is an example of the table
TableA
ID Number
1 1
2 2
3 2
4 3
5 4
SELECT statement output would be like this.
Number
1
2
2-dup
3
4
Any insight on this would be appreciated.
If your MySQL version doesn't support window functions, you can use a correlated subquery to produce a row number, then use CASE WHEN to mark rows with rn > 1 as duplicates.
create table T (ID int, Number int);
INSERT INTO T VALUES (1,1);
INSERT INTO T VALUES (2,2);
INSERT INTO T VALUES (3,2);
INSERT INTO T VALUES (4,3);
INSERT INTO T VALUES (5,4);
Query 1:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,(SELECT COUNT(*)
FROM T tt
where tt.Number = t1.Number and tt.id <= t1.id
) rn
FROM T t1
)t1
Results:
| id | Number |
|----|--------|
| 1 | 1 |
| 2 | 2 |
| 3 | 2-dup |
| 4 | 3 |
| 5 | 4 |
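For anyone who wants to try the correlated-subquery numbering without a MySQL server, here is a minimal sketch using Python's sqlite3 module (SQLite as a stand-in, which is an assumption on my part). Older SQLite builds have no CONCAT function, so the sketch uses the || operator and casts both branches to text; the variable name labelled is mine.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER, Number INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 2), (4, 3), (5, 4)],
)

# The subquery counts rows with the same Number and an ID up to the current
# one, which acts as a per-Number row number; anything past 1 is a duplicate.
labelled = conn.execute(
    """
    SELECT t1.ID,
           CASE WHEN (SELECT COUNT(*)
                        FROM T tt
                       WHERE tt.Number = t1.Number
                         AND tt.ID <= t1.ID) > 1
                THEN CAST(t1.Number AS TEXT) || '-dup'
                ELSE CAST(t1.Number AS TEXT)
           END AS Number
      FROM T t1
     ORDER BY t1.ID
    """
).fetchall()

for row in labelled:
    print(row)
```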
If you can use window functions, use row_number() partitioned by Number to generate the row number:
select t1.id,
(CASE WHEN rn > 1 then CONCAT(Number,'-dup') ELSE Number END) Number
from (
SELECT *,row_number() over(partition by Number order by id) rn
FROM T t1
)t1
I made a list of all the IDs that weren't dups (left join select) and then compared them to the entire list(case when):
select
case when a.id <> b.min_id then cast(a.Number as varchar(6)) + '-dup' else cast(a.Number as varchar(6)) end as Number
from table_a a
left join (select MIN(b.id) min_id, Number from table_a b group by b.number)b on b.number = a.number
I did this in MS SQL 2016, hope it works for you.
This populates the table used:
insert into table_a (ID, Number)
select 1,1
union all
select 2,2
union all
select 3,2
union all
select 4,3
union all
select 5,4

same column calculation / percentage

How does someone in MySQL compare a user's score on one date with their score on another date, effectively returning the user's percentage increase from one date to the other?
I have been trying to wrap my head around this question for a few days and am running out of ideas and feel my sql knowledge is limited. Not sure if I'm supposed to use a join or a subquery? The MYSQL tables consist of 3 fields, name, score, and date.
TABLE: userData
name score date
joe 5 2014-01-01
bob 10 2014-01-01
joe 15 2014-01-08
bob 12 2014-01-08
returned query idea
user %inc last date
joe 33% 2014-01-08
bob 17% 2014-01-08
It seems like such a simple function a database would serve yet trying to understand this is out of my grasp?
You need to use SUBQUERIES. Something like:
SELECT name,
((SELECT score
FROM userData as u2
WHERE u2.name = u1.name
ORDER BY date desc
LIMIT 1
)
/
(
SELECT score
FROM userData as u3
WHERE u3.name = u1.name
ORDER BY date desc
LIMIT 1,1
)
* 100.0
) as inc_perc,
max(date) as last_date
FROM userData as u1
GROUP BY name
Simple solution assuming that the formula for %Inc column = total/sum *100
select name,total/sum * 100, date from (
select name,sum(score) as total,count(*) as num,date from table group by name
)as resultTable
select a.name as [user],(cast(cast(b.score as float)-a.score as float)/cast(a.score as float))*100 as '% Inc',b.[date] as lastdate
from userdata a inner join userdata b on a.name = b.name and a.date < b.date
I guess you are looking for the % increase in the score compared to the past date.
Another way. Note that I get a different result: based on the name "percinc" (percentage increase), I believe my calculation is the correct one. If you want your result instead, just calculate it with t1.score / t2.score * 100:
Sample data:
CREATE TABLE t
(`name` varchar(3), `score` int, `date` varchar(10))
;
INSERT INTO t
(`name`, `score`, `date`)
VALUES
('joe', 5, '2014-01-01'),
('bob', 10, '2014-01-01'),
('joe', 15, '2014-01-08'),
('bob', 12, '2014-01-08')
;
Query:
select
t1.name,
t1.score first_score,
t1.date first_date,
t2.score last_score,
t2.date last_date,
t2.score / t1.score * 100 percinc
from
t t1
join t t2 on t1.name = t2.name
where
t1.date = (select min(date) from t where t.name = t1.name)
and t2.date = (select max(date) from t where t.name = t1.name);
Result:
| NAME | FIRST_SCORE | FIRST_DATE | LAST_SCORE | LAST_DATE | PERCINC |
|------|-------------|------------|------------|------------|---------|
| joe | 5 | 2014-01-01 | 15 | 2014-01-08 | 300 |
| bob | 10 | 2014-01-01 | 12 | 2014-01-08 | 120 |
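The answers in this thread disagree on what "percentage increase" means, so here is a minimal sketch that computes both readings side by side: the last score as a percentage of the first, and the change relative to the first. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the column names ratio_pct and increase_pct are mine.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, score INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("joe", 5, "2014-01-01"),
        ("bob", 10, "2014-01-01"),
        ("joe", 15, "2014-01-08"),
        ("bob", 12, "2014-01-08"),
    ],
)

# t1 is each user's earliest row and t2 the latest one. Multiplying by 100.0
# before dividing avoids SQLite's integer division on INTEGER columns.
rows = conn.execute(
    """
    SELECT t1.name,
           t2.score * 100.0 / t1.score              AS ratio_pct,
           (t2.score - t1.score) * 100.0 / t1.score AS increase_pct,
           t2.date                                  AS last_date
      FROM t t1
      JOIN t t2 ON t1.name = t2.name
     WHERE t1.date = (SELECT MIN(date) FROM t WHERE t.name = t1.name)
       AND t2.date = (SELECT MAX(date) FROM t WHERE t.name = t1.name)
     ORDER BY t1.name
    """
).fetchall()

for row in rows:
    print(row)
```

On this data joe goes from 5 to 15, which is 300% of the first score, or a 200% increase over it; bob goes from 10 to 12, which is 120% or a 20% increase.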

Find Missing numbers in SQL query

I need to find the missing numbers between 0 and 16.
My table is like this:
CarId FromCity_Id ToCity_Id Ran_Date RunId
1001 0 2 01-08-2013 1
1001 5 9 02-08-2013 2
1001 11 16 03-08-2013 3
1002 0 11 02-08-2013 4
1002 11 16 08-08-2013 5
I need to find out:
In the past three months from now(), between which cities has the car not run?
For example, in the above records:
Car 1001 has not run between 02-05 and 09-11.
Car 1002 has run fully (i.e., between 0-11 and 11-16).
Overall, I need a query that shows the sections in which the car has not run in the past 3 months, along with the last run date.
How can I write such a query? If a stored procedure is needed, please advise.
God help me. This uses a doubly-correlated subquery, a table that might not exist in your system, and too much caffeine. But hey, it works.
Right, here goes.
SELECT CarId, GROUP_CONCAT(DISTINCT missing) missing
FROM MyTable r,
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid)
GROUP BY CarID;
Produces (changing the first row for CarID 1002 to 0-9 to open up 10 and give us better test data):
+-------+---------+
| CarId | missing |
+-------+---------+
| 1001 | 3,4,10 |
| 1002 | 10 |
+-------+---------+
2 rows in set (0.00 sec)
And how does it all work?
Firstly...
The inner query gives us a list of numbers from 0 to 16:
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
It does that by starting at -1, and then displaying the result of adding 1 to that number for each row in some sacrificial table. I'm using mysql.help_relation because it's got over a thousand rows and most basic systems have it. YMMV.
Then we cross join that with MyTable:
SELECT CarId, ...
FROM MyTable r,
(...) y
This gives us every possible combination of rows, so we have each CarId and To/From IDs mixed with every number from 0-16.
Filtering...
This is where it gets interesting. We need to find rows that don't match the numbers, and we need to do so per CarID. This sort of thing would do it (as long as y.missing exists, which it will when we correlate the subquery):
SELECT m.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND m.CarID = 1001;
Remember: y.missing is set to a number between 0-16, cross-joined with the rows in MyTable. This gives us a list of all numbers from 0-16 where CarID 1001 is busy. We can invert that set with a NOT EXISTS, and while we're at it, correlate (again) with CarId so we can get all such IDs.
Then it's an easy matter of filtering the rows that don't fit:
SELECT CarId, ...
FROM MyTable r,
(...) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid)
Output
To give a sensible result (attempt 1), we could then get distinct combinations. Here's that version:
SELECT DISTINCT CarId, missing
FROM MyTable r,
(SELECT @a := @a + 1 missing
FROM mysql.help_relation, (SELECT @a := -1) t
WHERE @a < 16 ) y
WHERE NOT EXISTS
(SELECT r.CarID FROM MyTable m
WHERE y.missing BETWEEN FromCity_Id AND ToCity_Id
AND r.carid = m.carid);
This gives:
+-------+---------+
| CarId | missing |
+-------+---------+
| 1001 | 3 |
| 1001 | 4 |
| 1001 | 10 |
| 1002 | 10 |
+-------+---------+
4 rows in set (0.01 sec)
The simple addition of a GROUP BY and a GROUP_CONCAT gives the pretty result you get at the top of this answer.
I apologise for the inconvenience.
select * from carstable where CarId not in
(select distinct CarId from ranRecordTable where DATEDIFF(NOW(), Ran_Date) <= 90)
Hope this helps.
Here is the idea. Create a list of all cars and all numbers. Then, return all combinations that are not covered by the data. This is hard because there is more than one row for each car.
Here is one method:
select cars.CarId, n.n
from (select distinct CarId from t) cars cross join
(select 0 as n union all select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all select 7 union all
select 8 union all select 9 union all select 10 union all select 11 union all
select 12 union all select 13 union all select 14 union all select 15 union all
select 16
) n
where not exists (select 1
from t t2
where t2.ran_date >= now() - interval 90 day and
t2.CarId = cars.CarId and
n.n between t2.FromCity_id and t2.ToCity_id
);
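Here is a minimal sketch of the "all cars times all numbers, minus what is covered" idea, run against the sample rows with Python's sqlite3 module (SQLite as a stand-in for MySQL). The table names runs and numbers and the fixed reference date AS_OF are assumptions of mine; AS_OF replaces now() only so the 2013 sample rows still fall inside the 90-day window.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (CarId INTEGER, FromCity_Id INTEGER, "
    "ToCity_Id INTEGER, Ran_Date TEXT)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [
        (1001, 0, 2, "2013-08-01"),
        (1001, 5, 9, "2013-08-02"),
        (1001, 11, 16, "2013-08-03"),
        (1002, 0, 11, "2013-08-02"),
        (1002, 11, 16, "2013-08-08"),
    ],
)

# Helper table holding the city numbers 0..16.
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(17)])

# Hypothetical reference date standing in for now().
AS_OF = "2013-09-01"

# Keep a (car, number) pair only if no recent run of that car covers it.
missing = conn.execute(
    """
    SELECT cars.CarId, numbers.n
      FROM (SELECT DISTINCT CarId FROM runs) cars
     CROSS JOIN numbers
     WHERE NOT EXISTS (SELECT 1
                         FROM runs r
                        WHERE r.CarId = cars.CarId
                          AND r.Ran_Date >= date(?, '-90 days')
                          AND numbers.n BETWEEN r.FromCity_Id AND r.ToCity_Id)
     ORDER BY cars.CarId, numbers.n
    """,
    (AS_OF,),
).fetchall()

print(missing)
```

With the original sample rows only car 1001 has gaps (cities 3, 4 and 10); car 1002 covers 0 through 16 and so produces no rows.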
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE Table1
(`CarId` int, `FromCity_Id` int, `ToCity_Id` int, `Ran_Date` datetime, `RunId` int)
;
INSERT INTO Table1
(`CarId`, `FromCity_Id`, `ToCity_Id`, `Ran_Date`, `RunId`)
VALUES
(1001, 0, 2, '2013-08-01 00:00:00', 1),
(1001, 5, 9, '2013-08-02 00:00:00', 2),
(1001, 11, 16, '2013-08-03 00:00:00', 3),
(1002, 0, 11, '2013-08-02 00:00:00', 4),
(1002, 11, 16, '2013-08-08 00:00:00', 5)
;
Query 1:
SELECT r1.CarId,r1.ToCity_Id as Missing_From, r2.FromCity_Id as Missing_To,
max(t.Ran_Date) as Last_Run_Date
FROM (
SELECT @i1:=@i1+1 AS rownum, t.*
FROM Table1 as t, (SELECT @i1:=0) as foo
ORDER BY CarId, Ran_Date) as r1
INNER JOIN (
SELECT @i2:=@i2+1 AS rownum, t.*
FROM Table1 as t, (SELECT @i2:=0) as foo
ORDER BY CarId, Ran_Date) as r2 ON r1.CarId = r2.CarId AND
r1.ToCity_Id != r2.FromCity_Id AND
r2.rownum = (r1.rownum + 1)
INNER JOIN Table1 as t ON r1.CarId = t.CarId
WHERE r1.Ran_Date >= now() - interval 90 day
GROUP BY r1.CarId, r1.ToCity_Id, r2.FromCity_Id
Results:
| CARID | MISSING_FROM | MISSING_TO | LAST_RUN_DATE |
|-------|--------------|------------|-------------------------------|
| 1001 | 2 | 5 | August, 03 2013 00:00:00+0000 |
| 1001 | 9 | 11 | August, 03 2013 00:00:00+0000 |

SQL statement for querying with multiple conditions including 3 most recent dates

I need help in finding the rows that correspond to the most recent date, the next most recent and the one after that, where some condition ABC is "Y" and group it by a column name XYZ ASC but XYZ can appear multiple times. So, say XYZ is 50, then for the rows in the three years, the XYZ will be 50. I have the following code that executes but returns only two rows out of thousands which is impossible. I tried executing just the date condition but it returned dates that were less than or equal to MAX(DATE)-3 as well. Don't know where I am going wrong.
select * from money.cash where DATE =(
select
MAX(DATE)
from
money.cash
where
DATE > (select MAX(DATE)-3 from money.cash)
)
GROUP BY XYZ ASC
having ABC = "Y";
The structure of the table is as follows (only a schematic, not the real thing).
Comp_ID DATE XYZ ABC $$$$ ....
1 2012-1-1 10 Y SOME-AMOUNT
2 2011-1-1 10 Y
3 2006-1-1 10 Y
4 2011-1-1 20 Y
5 2002-1-1 20 Y
6 2000-1-1 20 Y
7 1998-1-1 20 Y
The desired o/p would be the first three rows for XYZ=10 in ascending order and the most recent 3 dates for XYZ=20.
LAST AND IMPORTANT-This table's values keeps changing as new data comes in. So, the o/p(which will be in a new table) must reflect the dynamics in the 1st/original/above TABLE.
MySQL (before 8.0, which added window functions) doesn't have functionality that is friendly to greatest-n-per-group queries.
One option would be...
- Find the MAX(Date) per group (XYZ)
- Then use that result to find the MAX(Date) of all records before that date
- Then do it again for all records before that date
It's really inefficient, but MySQL hasn't got the functionality required to do this efficiently. Sorry...
CREATE TABLE yourTable
(
comp_id INT,
myDate DATE,
xyz INT,
abc VARCHAR(1)
)
;
INSERT INTO yourTable SELECT 1, '2012-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 2, '2011-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 3, '2006-01-01', 10, 'Y';
INSERT INTO yourTable SELECT 4, '2011-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 5, '2002-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 6, '2000-01-01', 20, 'Y';
INSERT INTO yourTable SELECT 7, '1998-01-01', 20, 'Y';
SELECT
yourTable.*
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
lookup.XYZ,
COALESCE(MAX(yourTable.myDate), lookup.MaxDate) AS MaxDate
FROM
(
SELECT
yourTable.XYZ,
MAX(yourTable.myDate) AS MaxDate
FROM
yourTable
WHERE
yourTable.ABC = 'Y'
GROUP BY
yourTable.XYZ
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
LEFT JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate < lookup.MaxDate
AND yourTable.ABC = 'Y'
GROUP BY
lookup.XYZ,
lookup.MaxDate
)
AS lookup
INNER JOIN
yourTable
ON yourTable.XYZ = lookup.XYZ
AND yourTable.myDate >= lookup.MaxDate
WHERE
yourTable.ABC = 'Y'
ORDER BY
yourTable.comp_id
;
DROP TABLE yourTable;
There are other options, but they're all a bit hacky. Search SO for greatest-n-per-group mysql.
My results using your example data:
Comp_ID | DATE | XYZ | ABC
------------------------------
1 | 2012-1-1 | 10 | Y
2 | 2011-1-1 | 10 | Y
3 | 2006-1-1 | 10 | Y
4 | 2011-1-1 | 20 | Y
5 | 2002-1-1 | 20 | Y
6 | 2000-1-1 | 20 | Y
Here's another way, hopefully more efficient than Dems' answer.
Test it with an index on (abc, xyz, date):
SELECT m.xyz, m.date -- for all columns: SELECT m.*
FROM
( SELECT DISTINCT xyz
FROM money.cash
WHERE abc = 'Y'
) AS dm
JOIN
money.cash AS m
ON m.abc = 'Y'
AND m.xyz = dm.xyz
AND m.date >= COALESCE(
( SELECT im.date
FROM money.cash AS im
WHERE im.abc = 'Y'
AND im.xyz = dm.xyz
ORDER BY im.date DESC
LIMIT 1
OFFSET 2 -- to get 3 latest rows per xyz
), DATE('1000-01-01') ) ;
If you have more than one row with the same (abc, xyz, date), the query may return more than 3 rows per xyz (all rows tied for third place will be shown).
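To see the LIMIT 1 OFFSET 2 cutoff at work on the sample rows from the question, here is a minimal sketch using Python's sqlite3 module (SQLite as a stand-in for MySQL, which is an assumption). The table is called cash without the money schema, and the variable name latest_three is mine.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cash (comp_id INTEGER, date TEXT, xyz INTEGER, abc TEXT)"
)
conn.executemany(
    "INSERT INTO cash VALUES (?, ?, ?, ?)",
    [
        (1, "2012-01-01", 10, "Y"),
        (2, "2011-01-01", 10, "Y"),
        (3, "2006-01-01", 10, "Y"),
        (4, "2011-01-01", 20, "Y"),
        (5, "2002-01-01", 20, "Y"),
        (6, "2000-01-01", 20, "Y"),
        (7, "1998-01-01", 20, "Y"),
    ],
)

# For each xyz the subquery finds the third most recent date; rows on or
# after that date are kept. COALESCE covers groups with fewer than 3 rows.
latest_three = conn.execute(
    """
    SELECT m.comp_id, m.xyz, m.date
      FROM (SELECT DISTINCT xyz FROM cash WHERE abc = 'Y') AS dm
      JOIN cash AS m
        ON m.abc = 'Y'
       AND m.xyz = dm.xyz
       AND m.date >= COALESCE(
             (SELECT im.date
                FROM cash AS im
               WHERE im.abc = 'Y'
                 AND im.xyz = dm.xyz
               ORDER BY im.date DESC
               LIMIT 1 OFFSET 2),
             '1000-01-01')
     ORDER BY m.xyz, m.date DESC
    """
).fetchall()

for row in latest_three:
    print(row)
```

Row 7 (1998-01-01) is the fourth most recent row for xyz = 20, so it is the only one left out.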