MySQL query GROUP BY two columns - mysql

I need help with a MySQL query.
I have this scenario:
table A
- page_id
- rev_id
- date
One page_id can have multiple rev_id values.
table B
- rev_id
- words
Table B holds the words that appear in each revision.
For each date, I need to return the number of words in the
latest rev_id of each page_id as of that date.
Example:
table A
page_id | rev_id | date
---------------------------------
1 | 231 | 2002-01-01
2 | 345 | 2002-10-12
1 | 324 | 2002-10-13
3 | 348 | 2003-01-01
--
table B
rev_id | words
---------------
231 | 'ask'
231 | 'the'
231 | 'if'
345 | 'ask'
324 | 'ask'
324 | 'if'
348 | 'who'
Expected result (edited to show how it is calculated, as {page_id : [words]}):
date | count(words)
--------------------------
2002-01-01 | 3 { 1:[ask, the, if] }
2002-10-12 | 4 { 1:[ask, the, if], 2:[ask] }
2002-10-13 | 3 { 1:[ask, if], 2:[ask] }
2003-01-01 | 4 { 1:[ask, if], 2:[ask], 3:[who] }
I wrote this query, but the date is hard-coded, and I need the result for every date in the revision table:
SELECT SUM(q)
FROM (
SELECT COUNT(equation) q
FROM revision r, equation e
WHERE r.rev_id in (
SELECT max(rev_id)
FROM revision
WHERE date < '2006-01-01'
GROUP BY page_id
)
AND r.rev_id = e.rev_id
GROUP BY date
) q;
Solved
A friend helped me write a query that solves my problem:
select s.date, count(words) from
(select d.date, r.page_id, max(r.rev_id) as rev_id
from revision r, (select distinct(date) from revision) d
where d.date >= r.date group by d.date, r.page_id) s
join words e on e.rev_id = s.rev_id
group by s.date;
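As a quick check of the query above, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database through Python's sqlite3 module and runs the same logic. SQLite stands in for MySQL here, and the table names a and b follow the example schema rather than the revision/words names used in the query; both choices are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (page_id INTEGER, rev_id INTEGER, date TEXT);
    CREATE TABLE b (rev_id INTEGER, words TEXT);
    INSERT INTO a VALUES
        (1, 231, '2002-01-01'), (2, 345, '2002-10-12'),
        (1, 324, '2002-10-13'), (3, 348, '2003-01-01');
    INSERT INTO b VALUES
        (231, 'ask'), (231, 'the'), (231, 'if'), (345, 'ask'),
        (324, 'ask'), (324, 'if'), (348, 'who');
""")

QUERY = """
    SELECT s.date, COUNT(e.words)
    FROM (
        -- for every date, the highest rev_id each page had up to that date
        SELECT d.date, r.page_id, MAX(r.rev_id) AS rev_id
        FROM a r, (SELECT DISTINCT date FROM a) d
        WHERE d.date >= r.date
        GROUP BY d.date, r.page_id
    ) s
    JOIN b e ON e.rev_id = s.rev_id
    GROUP BY s.date
    ORDER BY s.date
"""

rows = conn.execute(QUERY).fetchall()
for day, word_count in rows:
    print(day, word_count)
```

Note that MAX(rev_id) only identifies the latest revision if rev_id values grow with the date within a page, which holds for the sample data.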

I think this is a basic join and group by:
select a.date, count(*)
from a join
b
on a.rev_id = b.rev_id
group by a.date;
EDIT:
Oh, I think I get it. This is a cumulative thing. That makes it more complicated.
select d.date,
(select count(*)
from a join
b
on a.rev_id = b.rev_id
where a.date <= d.date and
a.rev_id = (select max(a2.rev_id) from a a2 where a2.page_id = a.page_id and a2.date <= d.date)
) as cnt
from (select date from a) d;
But that won't work in MySQL because of the nesting of the correlation clause. So, we can restructure the logic as:
select a.date, count(*)
from (select a.*,
(select max(a2.rev_id)
from a a2
where a2.date <= a.date and a2.page_id = a.page_id
) as last_rev_id
from a
) a join
b
on a.last_rev_id = b.rev_id
group by a.date;

Related

Get the latest record of datetime field by date value

I have a table like
id | start | Value | Value2 | Value3
1 | 2019-01-01 22:15:02 | A | P | C
2 | 2019-01-01 22:35:23 | B | O | G
4 | 2019-01-02 22:35:36 | C | D | H
5 | 2019-01-02 22:37:15 | D | C | F
7 | 2019-01-03 17:26:36 | C | K | M
10 | 2019-01-03 12:05:15 | D | J | L
I have a lot of records for the same day, but different time.
I need to select the latest of each day from a DateTime field.
It should return the records of IDs:
id: 2 for Jan 1
id: 5 for Jan 2nd
id: 7 for January 3rd
Tried without success:
SELECT value, value2, value3
FROM myTable AS mt
INNER JOIN (
SELECT id, MAX(start)
FROM myTable
GROUP BY start
) AS b ON mt.id = b.id
I get no errors, but the data are mixed up. It shows the latest dateTime value, but the rest of the fields (Value, value2, value3) are wrong. They don't match with the latest row.
There are several possible solutions:
SELECT mt.<columns>
FROM myTable AS mt
INNER JOIN (
SELECT DATE(start) as start_date, MAX(start) AS start
FROM myTable
GROUP BY DATE(start)
) AS b ON mt.start = b.start;
I like to use an exclusion join. Look for another row with a greater start datetime on the same date. If no such row exists, then mt must have the greatest time for a given date.
SELECT mt.<columns>
FROM myTable AS mt
LEFT OUTER JOIN myTable AS mt2
ON DATE(mt.start) = DATE(mt2.start) AND mt.start < mt2.start
WHERE mt2.start IS NULL;
You can also use a window function if you're using MySQL 8.0:
SELECT * FROM (
SELECT mt.<columns>,
ROW_NUMBER() OVER (PARTITION BY DATE(start) ORDER BY start DESC) AS rownum
FROM myTable AS mt
) AS b
WHERE b.rownum = 1;
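To see the first two approaches above return the expected rows, here is a minimal sketch against the sample data. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, and it collapses the three value columns into a single hypothetical column named val; both are assumptions for illustration. The window-function variant is left out because it depends on the engine version.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (id INTEGER, start TEXT, val TEXT);
    INSERT INTO myTable VALUES
        (1, '2019-01-01 22:15:02', 'A'), (2, '2019-01-01 22:35:23', 'B'),
        (4, '2019-01-02 22:35:36', 'C'), (5, '2019-01-02 22:37:15', 'D'),
        (7, '2019-01-03 17:26:36', 'C'), (10, '2019-01-03 12:05:15', 'D');
""")

# Variant 1: join each row back to the per-day maximum of start.
GROUPWISE_MAX = """
    SELECT mt.id, mt.val
    FROM myTable AS mt
    INNER JOIN (
        SELECT DATE(start) AS start_date, MAX(start) AS max_start
        FROM myTable
        GROUP BY DATE(start)
    ) AS b ON mt.start = b.max_start
    ORDER BY mt.id
"""

# Variant 2: exclusion join, keeping rows that have no later row on the same day.
EXCLUSION_JOIN = """
    SELECT mt.id, mt.val
    FROM myTable AS mt
    LEFT OUTER JOIN myTable AS mt2
        ON DATE(mt.start) = DATE(mt2.start) AND mt.start < mt2.start
    WHERE mt2.start IS NULL
    ORDER BY mt.id
"""

latest_by_max = conn.execute(GROUPWISE_MAX).fetchall()
latest_by_exclusion = conn.execute(EXCLUSION_JOIN).fetchall()
print(latest_by_max)
print(latest_by_exclusion)
```

Both variants return the whole matching row, so the other columns stay consistent with the latest start value. If two rows share the same latest timestamp on a day, both are returned.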

Mysql Count row data by every date but only have few data date

I have a table "activity" like this
idEmployee | activity | Date
1 | a | 2019/01/01
1 | b | 2019/01/01
2 | c | 2019/01/01
2 | d | 2019/01/01
1 | e | 2019/01/02
2 | f | 2019/01/03
1 | f | 2019/01/03
3 | c | 2019/01/01
4 | d | 2019/01/03
1 | e | 2019/01/02
2 | f | 2019/01/03
I want to count, for every idEmployee, the dates from 2019/01/01 to 2019/01/03 that have no activity (as total_no_activity), like this:
idEmployee | total_no_activity
1 | 0
2 | 1 (2019/01/02)
3 | 2 (2019/01/02,2019/01/03)
4 | 2 (2019/01/01,2019/01/02)
But I can only select the idEmployee values that have no activity, without counting total_no_activity.
SELECT idEmployee, namaLengkap, date
FROM account LEFT JOIN timesheet USING (idEmployee)
WHERE NOT EXISTS (SELECT idEmployee
FROM timesheet
WHERE account.idEmployee = timesheet.idEmployee AND weekday(date) AND date between '2019/08/05' and '2019/08/09' AND idrole = '4' AND statusaktif = '1' )
ORDER BY idEmployee ASC
Is it possible to count total_no_activity using the table "activity" only?
SELECT idEmployee,
3 - COUNT(DISTINCT `Date`) total_no_activity
FROM account
WHERE `Date` BETWEEN '2019/01/01' AND '2019/01/03'
GROUP BY idEmployee
where 3 is the number of days in the period of interest, inclusive.
If an idEmployee has no records at all in the period of interest, it will not be listed in the output.
Unfortunately, I need the idEmployee values that have no records to be listed in the output as well.
Assuming that you need all idEmployee values that appear in the source table at least once (possibly outside the period of interest), use
SELECT all_employees.idEmployee,
3 - COUNT(DISTINCT account.`Date`) total_no_activity
FROM (SELECT DISTINCT idEmployee FROM account) all_employees
LEFT JOIN account
ON account.idEmployee = all_employees.idEmployee
AND account.`Date` BETWEEN '2019/01/01' AND '2019/01/03'
GROUP BY all_employees.idEmployee
I would suggest:
select a.idEmployee,
(datediff(params.date2, params.date1) + 1 -
count(distinct ac.date)
) as missing_days
from (select date('2019-01-01') as date1, date('2019-01-03') as date2
) params cross join -- a convenience so we don't have to retype the constants
accounts a left join
activity ac
on ac.idEmployee = a.idEmployee and
ac.date >= params.date1 and
ac.date <= params.date2
group by a.idEmployee;
To prevent typos and to allow the dates to change easily, this introduces a subquery, params, that has the date values.
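The LEFT JOIN idea from the answers above can be checked against the sample data with a minimal sketch. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, takes the employee list from the activity table itself, and computes the number of days in Python instead of with DATEDIFF; the helper name missing_days is hypothetical. The key point is that the date filter sits in the ON clause, so employees with no rows in the period are still listed.

```python
import sqlite3
from datetime import datetime

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (idEmployee INTEGER, activity TEXT, Date TEXT);
    INSERT INTO activity VALUES
        (1, 'a', '2019/01/01'), (1, 'b', '2019/01/01'), (2, 'c', '2019/01/01'),
        (2, 'd', '2019/01/01'), (1, 'e', '2019/01/02'), (2, 'f', '2019/01/03'),
        (1, 'f', '2019/01/03'), (3, 'c', '2019/01/01'), (4, 'd', '2019/01/03'),
        (1, 'e', '2019/01/02'), (2, 'f', '2019/01/03');
""")

QUERY = """
    SELECT e.idEmployee, ? - COUNT(DISTINCT act.Date) AS total_no_activity
    FROM (SELECT DISTINCT idEmployee FROM activity) AS e
    LEFT JOIN activity AS act
        ON act.idEmployee = e.idEmployee
       AND act.Date BETWEEN ? AND ?   -- filter in ON keeps unmatched employees
    GROUP BY e.idEmployee
    ORDER BY e.idEmployee
"""

def missing_days(first_day, last_day):
    """Return (idEmployee, days without activity) for an inclusive period."""
    fmt = "%Y/%m/%d"
    span = datetime.strptime(last_day, fmt) - datetime.strptime(first_day, fmt)
    n_days = span.days + 1
    return conn.execute(QUERY, (n_days, first_day, last_day)).fetchall()

print(missing_days("2019/01/01", "2019/01/03"))
print(missing_days("2019/01/02", "2019/01/02"))
```

The second call narrows the period to a single day on which only employee 1 was active; the other employees still appear, each with one missing day.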

How to Select First Date, Previous Date, Latest Date where first date is higher than a reference date

I want to SELECT the latest date, the second-latest date and the first date from table1, where the first date is later than a reference date found in table2. That reference date should itself be the latest one in table2. I have what is supposed to be a solution, but it returns no output if there is ONLY 1 record in table1. Example of the tables:
table1
Reg ID | DateOfAI | byTechnician
2GP001 | 2015-01-13 | 31
2GP001 | 2015-02-18 | 31
2GP001 | 2017-11-10 | 45
2GP001 | 2017-11-30 | 32
2GP044 | 2017-11-30 | 28
2GP001 | 2017-12-23 | 32
table2
Reg ID | DateOfCalving | DryOffDate
2GP001 | 2016-01-14 |
2GP070 | 2016-01-14 |
2GP065 | 2017-04-08 |
2GP001 | 2017-04-12 |
my expected output would be:
Reg ID | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LastestAIDate
2GP001 | 2017-04-12 | 2017-11-10 | 2017-11-30 | 2017-12-23
I have searched everywhere, to the moon and back, and still no luck. These are the queries that I have used.
The first:
SELECT b.actualDam,COUNT(x.actualDam) AS ilanba, max(b.breedDate) AS huli, max(x.breedDate) AS nex,MIN(x.breedDate) AS una,IFNULL(c.calvingDate,NULL) AS nganak,r.*,h.herdID,a.animalID,a.regID, IFNULL(a.dateOfBirth,NULL) AS buho
FROM x_animal_breeding_rec b
LEFT JOIN x_animal_calving_rec c ON b.recID=c.brecID
LEFT JOIN x_herd_animal_rel r ON b.actualDam=r.animal
LEFT JOIN x_herd h ON r.herd=h.herdID
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
JOIN x_animal_breeding_rec x ON b.actualDam = x.actualDam AND x.breedDate < b.breedDate
WHERE h.herdID = ? AND x.mateType = ? AND x.recFlag = ? GROUP BY b.actualDam
and the Second one that I've tried is this code:
SELECT b.recID
, b.actualDam
, b.breedDate
, min(b.breedDate) AS una
, max(b.breedDate) AS huli
, COUNT(b.actualDam) AS sundot
, b.mateType
, b.recFlag
, a.animalID
, a.regID
, h.*
FROM
( SELECT c.recID, c.actualDam
, c.breedDate
, c.mateType
, c.recFlag
, CASE WHEN @prev=c.recID THEN @i:=@i+1 ELSE @i:=1 END i
, @prev:=c.recID prev
FROM x_animal_breeding_rec c
, ( SELECT @prev:=null,@i:=0 ) vars
ORDER BY c.recID,c.breedDate DESC
) b
LEFT JOIN x_animal_main_info a ON b.actualDam=a.animalID
LEFT JOIN x_herd_animal_rel h ON b.actualDam=h.animal
WHERE i <= 2 GROUP BY b.actualDam HAVING h.herd = ? AND b.mateType = ? AND b.recFlag = ? ORDER BY b.breedDate DESC
Another problem is that the first solution returns a WRONG COUNT. The second solution returns a CORRECT COUNT, but the wrong dates. I hope you can give me an idea. Thanks in advance.
The following query answers your question:
SELECT
RegID,
LatestDateOfCalving,
MIN(DateOfAI) AS 1stDateOfAI,
REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(DateOfAI ORDER BY DateOfAI DESC), ',', 2), CONCAT(MAX(DateOfAI), ','), '') AS PreviousAIDate,
MAX(DateOfAI) AS LatestAIDate
FROM (
SELECT
t1.RegID,
LatestDateOfCalving,
DateOfAI,
IF(DateOfAI >= LatestDateOfCalving, 1, 0) AS dates
FROM table1 AS t1
INNER JOIN (
SELECT
RegID,
MAX(DateOfCalving) AS LatestDateOfCalving
FROM table2 GROUP BY RegID
) AS tt2 ON t1.RegID = tt2.RegID) AS x
WHERE dates = 1
GROUP BY RegID
HAVING COUNT(dates) >= 3;
Output:
+--------+---------------------+-------------+----------------+--------------+
| RegID | LatestDateOfCalving | 1stDateOfAI | PreviousAIDate | LatestAIDate |
+--------+---------------------+-------------+----------------+--------------+
| 2GP001 | 2017-04-12 | 2017-11-10 | 2017-11-30 | 2017-12-23 |
+--------+---------------------+-------------+----------------+--------------+
In a subquery we select RegID and LatestDateOfCalving from table2 in order to have a reference date. Then join it to table1 and flag the record whether DateOfAI is greater or equal to LatestDateOfCalving (IF(DateOfAI >= LatestDateOfCalving, 1, 0)). We use this subquery in the outer query (SELECT RegID, LatestDateOfCalving, MIN(DateOfAI) AS 1stDateOfAI, MAX(DateOfAI) AS LatestAIDate, ...) and select only those records where the DateOfAI are at or after LatestDateOfCalving (WHERE dates = 1, where 1 is the flag where the condition was true) and have at least 3 records (HAVING COUNT(dates) >= 3). In the outer query I use the REPLACE(SUBSTRING_INDEX(GROUP_CONCAT(...))) structure in order to extract the previousAIDate from a comma (,) separated list of dates.
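Because of HAVING COUNT(dates) >= 3, the query above returns nothing for an animal with fewer than three qualifying records, which is the case the question struggles with. As one possible workaround, and not the approach of the answer itself, the sketch below keeps the same join and date filter in SQL but picks the first, previous and latest dates in Python, leaving the previous date empty when only one record exists. SQLite through Python's sqlite3 module stands in for MySQL, and the helper name ai_summary is hypothetical.

```python
import sqlite3
from collections import defaultdict

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (RegID TEXT, DateOfAI TEXT, byTechnician INTEGER);
    CREATE TABLE table2 (RegID TEXT, DateOfCalving TEXT, DryOffDate TEXT);
    INSERT INTO table1 VALUES
        ('2GP001', '2015-01-13', 31), ('2GP001', '2015-02-18', 31),
        ('2GP001', '2017-11-10', 45), ('2GP001', '2017-11-30', 32),
        ('2GP044', '2017-11-30', 28), ('2GP001', '2017-12-23', 32);
    INSERT INTO table2 VALUES
        ('2GP001', '2016-01-14', NULL), ('2GP070', '2016-01-14', NULL),
        ('2GP065', '2017-04-08', NULL), ('2GP001', '2017-04-12', NULL);
""")

QUERY = """
    SELECT t1.RegID, tt2.LatestDateOfCalving, t1.DateOfAI
    FROM table1 AS t1
    INNER JOIN (
        SELECT RegID, MAX(DateOfCalving) AS LatestDateOfCalving
        FROM table2
        GROUP BY RegID
    ) AS tt2 ON t1.RegID = tt2.RegID
    WHERE t1.DateOfAI >= tt2.LatestDateOfCalving
    ORDER BY t1.RegID, t1.DateOfAI
"""

def ai_summary(connection):
    """Map RegID -> (latest calving, first AI, previous AI or None, latest AI)."""
    calving = {}
    ai_dates = defaultdict(list)
    for reg_id, latest_calving, date_of_ai in connection.execute(QUERY):
        calving[reg_id] = latest_calving
        ai_dates[reg_id].append(date_of_ai)  # rows arrive in ascending date order
    summary = {}
    for reg_id, dates in ai_dates.items():
        previous = dates[-2] if len(dates) >= 2 else None
        summary[reg_id] = (calving[reg_id], dates[0], previous, dates[-1])
    return summary

print(ai_summary(conn))
```

With exactly two qualifying records the previous date equals the first date; whether that is the desired behaviour depends on how the report is meant to read.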

MySQL count daily new users VS returned users (cohort analysis)

The table structure is: user_id, Date (I'm used to working with timestamps)
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need to know how to write a single query that finds, for each day, the number of "returned" users, counting from the very beginning of their activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/11)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/11)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I didn't manage to adapt his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better), the key is to calculate the MinDate for each user separately, calculate the total number of users on each day, and then declare a returning user to be any user who wasn't new:
SELECT DateTable.Date, NewUsers, NumUsers - NewUsers AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM Table
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM Table
GROUP BY CAST(date AS Date)
) C ON DateTable.Date = C.Date
Thanks to Stephen, I made a small fix to his query. It works well, even if it is a bit slow on a large database:
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)
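The new-versus-returning logic above can be tried on the sample rows with a minimal sketch. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, and it makes two changes that are assumptions of this sketch rather than part of the answers: the outer query starts from the per-day aggregate instead of the raw Stats rows, and COALESCE turns a missing NewUsers value into 0.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stats (user_id TEXT, Created TEXT);
    INSERT INTO Stats VALUES
        ('A', '2014-08-10 14:02:53'), ('A', '2014-08-12 14:03:25'),
        ('A', '2014-08-13 14:04:47'), ('B', '2014-08-13 04:04:47'),
        ('A', '2014-08-17 09:02:53'), ('B', '2014-08-17 10:04:47'),
        ('B', '2014-08-18 10:04:47'), ('A', '2014-08-19 10:04:22'),
        ('C', '2014-08-19 11:04:47');
""")

QUERY = """
    SELECT c.Day,
           COALESCE(b.NewUsers, 0) AS NewUsers,
           c.NumUsers - COALESCE(b.NewUsers, 0) AS ReturningUsers
    FROM (
        -- distinct active users per day
        SELECT DATE(Created) AS Day, COUNT(DISTINCT user_id) AS NumUsers
        FROM Stats
        GROUP BY DATE(Created)
    ) AS c
    LEFT JOIN (
        -- users whose first-ever activity falls on that day
        SELECT MinDate, COUNT(*) AS NewUsers
        FROM (
            SELECT user_id, MIN(DATE(Created)) AS MinDate
            FROM Stats
            GROUP BY user_id
        ) AS a
        GROUP BY MinDate
    ) AS b ON b.MinDate = c.Day
    ORDER BY c.Day
"""

rows = conn.execute(QUERY).fetchall()
for day, new_users, returning_users in rows:
    print(day, new_users, returning_users)
```

Days with no activity at all, such as 2014-08-11, do not appear in this output; listing them would still need a separate date table, as the first answer assumes.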

MySQL count rows within the same intervals to eachother

I have a table where one column is the date:
+----------+---------------------+
| id | date |
+----------+---------------------+
| 5 | 2012-12-10 10:12:37 |
+----------+---------------------+
| 4 | 2012-12-10 09:09:55 |
+----------+---------------------+
| 3 | 2012-12-09 21:12:35 |
+----------+---------------------+
| 2 | 2012-12-09 20:15:07 |
+----------+---------------------+
| 1 | 2012-12-09 20:01:42 |
+----------+---------------------+
What I need is to count the rows that are, for example, within 3 hours of each other. In this example I want to join the top row with the 2nd row, and the 3rd row with the 4th and 5th rows. So my output should look like this:
+----------+---------------------+---------+
| id | date | count |
+----------+---------------------+---------+
| 5 | 2012-12-10 10:12:37 | 2 |
+----------+---------------------+---------+
| 3 | 2012-12-09 21:12:35 | 3 |
+----------+---------------------+---------+
How could I do this?
I think you need a self-join for this:
select t.id, t.date, COUNT(t2.id)
from t left outer join
t t2
on t.date between t2.date - interval 3 hour and t2.date + interval 3 hour
group by t.id, t.date
(This is untested code so it might have a syntax error.)
If you are trying to divide everything into 3-hour intervals, you can do something like:
select max(t.date), t.id, count(*)
from (select t.*,
(date(date)*100 + floor(hour(date)/3)*3) as `interval`
from t
) t
group by `interval`
I am not sure how to do this with MySQL, but I was able to build a set of queries in SQL Server 2005 that provide the intended results. Here is the working sample. It may be overly complex, but that's how I was able to get the desired result:
WITH BaseData AS
(
SELECT 5 AS ID, '2012-12-10 10:12:37' AS Date
UNION ALL
SELECT 4 AS ID, '2012-12-10 09:09:55' AS Date
UNION ALL
SELECT 3 AS ID, '2012-12-09 21:12:35' AS Date
UNION ALL
SELECT 2 AS ID, '2012-12-09 20:15:07' AS Date
UNION ALL
SELECT 1 AS ID, '2012-12-09 20:01:42' AS Date
),
BaseDataWithRowNum AS
(
SELECT ID,DATE, ROW_NUMBER() OVER (ORDER BY Date DESC) AS RowNum
FROM BaseData
),
InterRelatedDates AS
(
SELECT B1.RowNum AS RowNum1,B2.RowNum AS RowNum2
FROM BaseDataWithRowNum B1
INNER JOIN BaseDataWithRowNum B2
ON B1.Date BETWEEN B2.Date AND DATEADD(hh,3,B2.Date)
AND B1.RowNum < B2.RowNum
AND B1.ID != B2.ID
),
InterRelatedDatesWithinMultipleGroups AS
(
SELECT G1.RowNum1,G2.RowNum2
FROM InterRelatedDates G1
LEFT JOIN InterRelatedDates G2
ON G1.RowNum2 = G2.RowNum2
AND G1.RowNum1 != G2.RowNum1
)
SELECT BN.ID,
BN.Date,
CountExcludingOriginalGrouppingRecord +1 AS C
FROM
(
SELECT RowNum1 AS RowNum,COUNT(1) AS CountExcludingOriginalGrouppingRecord
FROM
(
-- If a row was used in only one group then it is ok. use as it is
SELECT D1.RowNum1
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NULL
UNION ALL
-- In case a row was selected in two groups, choose the one with higher date
SELECT Min(D1.RowNum1)
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NOT NULL
GROUP BY D1.RowNum2
) T
GROUP BY RowNum1
) T2
INNER JOIN BaseDataWithRowNum BN
ON BN.RowNum = T2.RowNum
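For comparison, the grouping the question asks for can also be done outside SQL. The sketch below is one possible reading, not a translation of either answer: it assumes that "within 3 hours of each other" means a row joins the current group when it is at most 3 hours after the previous row in time order, and it reports the newest row of each group with the group size. The function name group_by_gap is hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question as (id, date) pairs.
rows = [
    (5, "2012-12-10 10:12:37"),
    (4, "2012-12-10 09:09:55"),
    (3, "2012-12-09 21:12:35"),
    (2, "2012-12-09 20:15:07"),
    (1, "2012-12-09 20:01:42"),
]

FMT = "%Y-%m-%d %H:%M:%S"

def group_by_gap(rows, max_gap=timedelta(hours=3)):
    """Group rows whose gap to the previous row (in time order) is <= max_gap."""
    parsed = sorted((datetime.strptime(ts, FMT), row_id) for row_id, ts in rows)
    groups = []
    for ts, row_id in parsed:
        if groups and ts - groups[-1][-1][0] <= max_gap:
            groups[-1].append((ts, row_id))   # close enough: extend current group
        else:
            groups.append([(ts, row_id)])     # gap too large: start a new group
    # Newest row of each group plus the group size, newest group first.
    return [
        (group[-1][1], group[-1][0].strftime(FMT), len(group))
        for group in reversed(groups)
    ]

result = group_by_gap(rows)
for row_id, ts, count in result:
    print(row_id, ts, count)
```

Under this reading a long chain of closely spaced rows forms a single group even if its first and last rows are more than 3 hours apart; fixed 3-hour buckets, as in the second query of the first answer, behave differently and would split ids 1 to 3 on the sample data.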