Count number of rows in each day grouped by another field - mysql

For the purpose of drawing an activity chart, how can we count the number of rows for each type (distinct field value) on each day?
Consider a table with a datetime field and a type field:
CREATE TABLE TableName
(`PK` int, `type` varchar(1), `timestamp` datetime)
;
INSERT INTO TableName
(`PK`, `type`, `timestamp`)
VALUES
(11, 'Q', '2013-01-04 22:23:56'),
(7, 'A', '2013-01-03 22:23:41'),
(8, 'C', '2013-01-04 22:23:42'),
(10, 'Q', '2013-01-05 22:23:56'),
(5, 'C', '2013-01-03 22:23:25'),
(12, 'Q', '2013-01-05 22:23:57'),
(6, 'Q', '2013-01-07 22:23:40'),
(4, 'Q', '2013-01-02 22:23:23'),
(9, 'A', '2013-01-05 22:23:55'),
(1, 'A', '2013-01-08 21:29:38'),
(2, 'Q', '2013-01-02 21:31:59'),
(3, 'C', '2013-01-04 21:32:22')
;
For example, the output could be (the last field is the count of rows with that type on that day):
'Q', 2013-01-04, 1
'C', 2013-01-04, 2
'A', 2013-01-03, 1
'C', 2013-01-03, 1
and so on...

You just need a GROUP BY on the type and on the date part of the timestamp:
select `type`, date(`timestamp`), count(*)
from TableName
group by `type`, date(`timestamp`)

The same query with column aliases:
select `type`, date(`timestamp`) as the_date, count(*) as counter
from TableName
group by `type`, date(`timestamp`)
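
As a quick way to check the grouping against the sample data, here is a minimal sketch that runs the same query through Python's sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption made so the example is self-contained); its date() function plays the role of MySQL's DATE(), and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (PK INTEGER, type TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        (11, "Q", "2013-01-04 22:23:56"),
        (7, "A", "2013-01-03 22:23:41"),
        (8, "C", "2013-01-04 22:23:42"),
        (10, "Q", "2013-01-05 22:23:56"),
        (5, "C", "2013-01-03 22:23:25"),
        (12, "Q", "2013-01-05 22:23:57"),
        (6, "Q", "2013-01-07 22:23:40"),
        (4, "Q", "2013-01-02 22:23:23"),
        (9, "A", "2013-01-05 22:23:55"),
        (1, "A", "2013-01-08 21:29:38"),
        (2, "Q", "2013-01-02 21:31:59"),
        (3, "C", "2013-01-04 21:32:22"),
    ],
)

# One result row per (type, calendar day), with the number of matching rows.
daily_counts = conn.execute(
    """
    SELECT type, date(timestamp) AS the_date, COUNT(*) AS counter
    FROM TableName
    GROUP BY type, date(timestamp)
    ORDER BY the_date, type
    """
).fetchall()

for row in daily_counts:
    print(row)
```

Days on which a type has no rows simply do not appear in the result; a chart that needs explicit zeros would have to fill those in separately.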

Related

MySQL: extract unique dates

I have created a table with
CREATE TABLE visits (
user_id int,
event_date timestamp
);
INSERT INTO visits (user_id, event_date)
VALUES
(1, '2021-12-22 12:12:00'),
(1, '2021-12-23 12:12:05'),
(1, '2021-12-24 12:13:00'),
(1, '2021-12-24 12:14:00'),
(1, '2022-03-10 12:14:00'),
(1, '2022-03-11 12:14:00'),
(2, '2021-12-23 12:12:00'),
(1, '2022-03-12 12:14:00'),
(2, '2021-12-23 13:12:00'),
(1, '2022-03-13 12:14:00'),
(1, '2022-03-14 12:14:00'),
(3, '2021-12-25 12:12:00'),
(1, '2022-03-15 12:14:00'),
(1, '2022-03-20 12:14:00'),
(1, '2022-03-21 12:14:00'),
(1, '2022-03-23 12:14:00'),
(1, '2022-03-24 12:14:00'),
(1, '2022-03-25 12:14:00'),
(3, '2021-12-30 12:12:00'),
(3, '2021-12-31 12:12:00'),
(3, '2021-12-31 12:12:00'),
(4, '2022-03-21 12:12:00'),
(4, '2022-03-22 12:12:00'),
(4, '2022-03-23 12:12:00'),
(4, '2022-03-24 12:12:00');
And then I try to extract unique dates with
select
user_id,
distinct cast(event_date as date) as event_date
from visits;
And I get
ERROR 1064 (42000) at line 111: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'distinct cast(event_date as date) as event_date
from visits' at line 3
What did I do wrong?
The keyword DISTINCT must come immediately after SELECT, because it applies to the whole row of selected columns rather than to a single column:
select distinct cast(event_date as date) as event_date, user_id
from visits;
OR
select distinct user_id, cast(event_date as date) as event_date
from visits;
You can also use GROUP BY, which additionally lets you count the visits per user and day.
(The COUNT column is optional.)
SELECT
count(*) "number",
user_id,
DATE(event_date) "event date"
FROM visits
GROUP BY
user_id,
DATE(event_date)
ORDER BY
user_id,
DATE(event_date);
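
To see that SELECT DISTINCT and GROUP BY produce the same (user, day) pairs, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability). It uses only a subset of the rows from the question, and the alias names event_day and visits_that_day are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, event_date TEXT)")
# A subset of the question's rows, chosen to include same-day duplicates.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        (1, "2021-12-22 12:12:00"),
        (1, "2021-12-24 12:13:00"),
        (1, "2021-12-24 12:14:00"),
        (2, "2021-12-23 12:12:00"),
        (2, "2021-12-23 13:12:00"),
        (3, "2021-12-31 12:12:00"),
        (3, "2021-12-31 12:12:00"),
    ],
)

# DISTINCT directly after SELECT de-duplicates whole (user_id, day) rows.
distinct_pairs = conn.execute(
    """
    SELECT DISTINCT user_id, date(event_date) AS event_day
    FROM visits
    ORDER BY user_id, event_day
    """
).fetchall()

# GROUP BY yields the same pairs and can also report how many rows each has.
grouped = conn.execute(
    """
    SELECT user_id, date(event_date) AS event_day, COUNT(*) AS visits_that_day
    FROM visits
    GROUP BY user_id, date(event_date)
    ORDER BY user_id, event_day
    """
).fetchall()

print(distinct_pairs)
print(grouped)
```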

Find users with activities in all the last 6 months

I'm looking for the best way to retrieve the list of user IDs with activity in every one of the last 6 months.
Table structure and data, simplified, is the following:
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
I need to retrieve the list of users that have at least one activity in each of the last 6 months. In the example above, that is just client_id 1.
I thought about doing a join, but it seems too expensive. I won't suggest possible solutions, so as not to steer the answers; I'll accept whatever you have in mind.
Please consider that I have to manage a really big data source (more than 50 million rows).
Any quick ideas?
I make no claims for the supremacy of this solution, partly because I find such requests disingenuous, but it should work, at least...
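SELECT a.client_id
FROM activities a
-- window: the six complete calendar months before the current month
WHERE a.created_at >= LAST_DAY(CURDATE() - INTERVAL 7 MONTH) + INTERVAL 1 DAY
AND a.created_at < LAST_DAY(CURDATE() - INTERVAL 1 MONTH) + INTERVAL 1 DAY
GROUP
BY a.client_id
-- exactly six distinct months inside a six-month window means no month was missed
HAVING COUNT(DISTINCT(DATE_FORMAT(a.created_at,'%Y-%m'))) = 6;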
+-----------+
| client_id |
+-----------+
|         1 |
+-----------+
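
Because CURDATE() makes the result depend on the day the query is run, the following sketch passes the reference date in explicitly so the outcome is reproducible. It is only an illustration under stated assumptions: SQLite (via Python's sqlite3) stands in for MySQL, strftime('%Y-%m', ...) replaces DATE_FORMAT, "the last 6 months" is taken to mean the six complete calendar months before the reference date, and the helper names month_window and clients_active_every_month are hypothetical.

```python
import sqlite3
from datetime import date


def month_window(today, months=6):
    """Return (start, end) dates covering the `months` complete calendar
    months before the month that contains `today`; `end` is exclusive."""
    end = today.replace(day=1)
    total = end.year * 12 + (end.month - 1) - months
    start = date(total // 12, total % 12 + 1, 1)
    return start, end


def clients_active_every_month(conn, today, months=6):
    """Clients with at least one activity in every month of the window."""
    start, end = month_window(today, months)
    rows = conn.execute(
        """
        SELECT client_id
        FROM activities
        WHERE created_at >= ? AND created_at < ?
        GROUP BY client_id
        HAVING COUNT(DISTINCT strftime('%Y-%m', created_at)) = ?
        ORDER BY client_id
        """,
        (start.isoformat(), end.isoformat(), months),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id INTEGER, client_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        (1, 1, "2019-06-01 00:00:00"),
        (2, 2, "2019-06-01 00:00:00"),
        (3, 1, "2019-07-01 00:00:00"),
        (4, 1, "2019-08-01 00:00:00"),
        (5, 1, "2019-09-01 00:00:00"),
        (6, 1, "2019-10-01 00:00:00"),
        (7, 1, "2019-11-01 00:00:00"),
        (8, 2, "2019-11-01 00:00:00"),
        (9, 3, "2019-11-01 00:00:00"),
    ],
)

# Reference date in December 2019, so the window is June through November 2019.
active = clients_active_every_month(conn, date(2019, 12, 15))
print(active)
```

On a table with tens of millions of rows, an index that starts with created_at (or with client_id and created_at) would probably matter more than the exact shape of the query, but that has to be measured on the real data.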

Select rows grouped by a column having max aggregate

Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref11#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that several email addresses could be tied for the most "Accepted" applications, and all but one of them would be cut off by LIMIT 1.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10#test.org | 2
ref11#test.org | 2
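For example, compute the total per email, then join it to the single highest total so that every tied email is kept: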
SELECT a.*
FROM
( SELECT x.email
, COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
) a
JOIN
( SELECT COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
ORDER
BY total DESC
LIMIT 1
) b
ON b.total = a.total;
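
The question asks what MAX(CountDecision) should be compared to. One possible variation on the join above (not the answerer's exact query) compares each group's count to the maximum count in the HAVING clause. The sketch below runs it on the question's data through Python's sqlite3 module, with SQLite standing in for MySQL as an assumption; the alias names t and total are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmailReferences (ApplicationID INTEGER, Email TEXT)")
conn.execute("CREATE TABLE FinalDecision (ApplicationID INTEGER, Decision TEXT)")
conn.executemany(
    "INSERT INTO EmailReferences VALUES (?, ?)",
    [
        (1, "ref10#test.org"), (1, "ref11#test.org"), (1, "ref12#test.org"),
        (2, "ref20#test.org"), (2, "ref21#test.org"), (2, "ref22#test.org"),
        (3, "ref11#test.org"), (3, "ref31#test.org"), (3, "ref32#test.org"),
        (4, "ref40#test.org"), (4, "ref41#test.org"), (4, "ref42#test.org"),
        (5, "ref50#test.org"), (5, "ref51#test.org"), (5, "ref52#test.org"),
        (6, "ref60#test.org"), (6, "ref11#test.org"), (6, "ref62#test.org"),
        (7, "ref70#test.org"), (7, "ref71#test.org"), (7, "ref72#test.org"),
        (8, "ref10#test.org"), (8, "ref81#test.org"), (8, "ref82#test.org"),
    ],
)
conn.executemany(
    "INSERT INTO FinalDecision VALUES (?, ?)",
    [
        (1, "Accepted"), (2, "Denied"), (3, "Accepted"), (4, "Denied"),
        (5, "Denied"), (6, "Denied"), (7, "Denied"), (8, "Accepted"),
    ],
)

# Keep every email whose accepted count equals the highest accepted count,
# so ties survive (unlike ORDER BY ... LIMIT 1).
top_references = conn.execute(
    """
    SELECT er.Email, COUNT(*) AS CountDecision
    FROM EmailReferences AS er
    JOIN FinalDecision AS fd ON fd.ApplicationID = er.ApplicationID
    WHERE fd.Decision = 'Accepted'
    GROUP BY er.Email
    HAVING COUNT(*) = (
        SELECT MAX(total)
        FROM (
            SELECT COUNT(*) AS total
            FROM EmailReferences AS er2
            JOIN FinalDecision AS fd2 ON fd2.ApplicationID = er2.ApplicationID
            WHERE fd2.Decision = 'Accepted'
            GROUP BY er2.Email
        ) AS t
    )
    ORDER BY er.Email
    """
).fetchall()

print(top_references)
```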
MySQL versions before 8.0 lack window functions. From MySQL 8.0 on, and in databases such as MariaDB (10.2 and later) that also have window functions, this becomes easier:
select email, CountDecision
from (
SELECT er.email, COUNT(fd.Decision) AS CountDecision
, max(COUNT(fd.Decision)) over() maxCountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
) d
where CountDecision = maxCountDecision
Result with the question's data:
email | CountDecision
:------------- | ------------:
ref10#test.org | 2
ref11#test.org | 2

Age range computation where each country has at least two distinct ages of the individuals

For the dataset below, I want to compute the age range for each country that has at least two distinct ages.
CREATE TABLE Employees(
ID int (3) NOT NULL,
Name varchar (50) NOT NULL,
Age int (3) NOT NULL,
Nationality varchar (50) NOT NULL
);
INSERT INTO Employees
(ID, Name, Age, Nationality)
VALUES
(1, 'CHIN YEN', '19', 'China'),
(2, 'MIKE PEARL', '21', 'United Kingdom'),
(3, 'GREEN FIELD', '45', 'Nethernalnds'),
(4, 'DEWANE PAUL', '57', 'Canada'),
(5, 'MATTS', '32', 'Australia'),
(6, 'PLANK OTO', '51', 'France'),
(7, 'Manish Kumar', '42', 'India'),
(8, 'Matts', '55', 'USA'),
(9, 'Mahesh Kumar', '32', 'USA'),
(10, 'Chin Yen', '21', 'Japan');
And what I was trying to do is:
SELECT Nationality,
Max(Age) - Min(Age) AS Age_Range
FROM Employees;
I think you just need a group by:
SELECT Nationality,
Max(Age) - Min(Age) AS Age_Range
FROM Employees
GROUP BY Nationality;
To keep only the countries with at least two distinct ages, add HAVING Age_Range > 0.
To return the age range only for countries that have at least two individuals with distinct non-zero ages, the following approach can be used:
SELECT Nationality,
Max(Age) - Min(NULLIF(Age,0)) AS Age_Range
FROM Employees
GROUP BY Nationality
having Max(Age) - Min(NULLIF(Age,0)) > 0
For any individual with Age = 0, NULLIF converts the age to NULL, which the aggregate function MIN then ignores.
I changed the sample data as follows so that it includes zero ages:
INSERT INTO Employees
(ID, Name, Age, Nationality)
VALUES
(1, 'CHIN YEN', 0, 'United Kingdom'),
(2, 'MIKE PEARL', 21, 'United Kingdom'),
(3, 'GREEN FIELD', 45, 'Nethernalnds'),
(4, 'DEWANE PAUL', 57, 'Nethernalnds'),
(5, 'MATTS', 0, 'Nethernalnds'),
(6, 'PLANK OTO', 51, 'France'),
(7, 'Manish Kumar', 42, 'India'),
(8, 'Matts', 55, 'USA'),
(9, 'Mahesh Kumar', 32, 'USA'),
(10, 'Chin Yen', 21, 'Japan');
With this data the query returns two rows: Nethernalnds with an Age_Range of 12 and USA with an Age_Range of 23. United Kingdom is excluded because its only non-zero age is 21.
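
The NULLIF approach can be tried on the modified data with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption); NULLIF, MIN and MAX behave the same way for this case, and the ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER, Name TEXT, Age INTEGER, Nationality TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "CHIN YEN", 0, "United Kingdom"),
        (2, "MIKE PEARL", 21, "United Kingdom"),
        (3, "GREEN FIELD", 45, "Nethernalnds"),
        (4, "DEWANE PAUL", 57, "Nethernalnds"),
        (5, "MATTS", 0, "Nethernalnds"),
        (6, "PLANK OTO", 51, "France"),
        (7, "Manish Kumar", 42, "India"),
        (8, "Matts", 55, "USA"),
        (9, "Mahesh Kumar", 32, "USA"),
        (10, "Chin Yen", 21, "Japan"),
    ],
)

# NULLIF(Age, 0) turns zero ages into NULL so MIN skips them; the HAVING
# clause then drops countries whose remaining ages are all identical.
age_ranges = conn.execute(
    """
    SELECT Nationality,
           MAX(Age) - MIN(NULLIF(Age, 0)) AS Age_Range
    FROM Employees
    GROUP BY Nationality
    HAVING MAX(Age) - MIN(NULLIF(Age, 0)) > 0
    ORDER BY Nationality
    """
).fetchall()

print(age_ranges)
```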

MySQL: Select first row with value in interval

With the following table:
CREATE TABLE table1 (`id` INT, `num` INT);
INSERT INTO table1 (`id`, `num`) VALUES
(1, 1),
(1, 5),
(1, 7),
(1, 12),
(1, 22),
(1, 23),
(1, 24),
(2, 1),
(2, 6);
How do I select a row for each num interval of width 5 (i.e., the first row for [0,5), the first for [5,10), the first for [10,15), etc.) for a given id? Is this possible with a MySQL query, or must I process it in a programming language later?
For reference, the output I'd want for id=1:
(1, 1), (1, 5), (1, 12), (1, 22)
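Here is a short query. For integer num, ceiling((num + 1)/5) gives the same value for every num in one interval of width 5, and min(num) picks the first value in each interval: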
select min(num), ceiling((num + 1)/5)
from table1
where id = 1
group by ceiling((num + 1)/5);
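
The bucketing can be checked against the expected output with the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL (an assumption). In SQLite, dividing two integers already truncates, so num / 5 serves as the bucket number; in MySQL the equivalent would be FLOOR(num / 5) or num DIV 5, which numbers the same intervals as the ceiling expression, only starting from 0 instead of 1. The helper name first_per_bucket is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, 1), (1, 5), (1, 7), (1, 12), (1, 22), (1, 23), (1, 24), (2, 1), (2, 6)],
)


def first_per_bucket(conn, row_id):
    """Return (id, smallest num) for each interval [0,5), [5,10), ... that
    contains at least one row with the given id."""
    rows = conn.execute(
        """
        SELECT id, MIN(num) AS first_num
        FROM table1
        WHERE id = ?
        GROUP BY id, num / 5      -- integer division: the bucket number
        ORDER BY first_num
        """,
        (row_id,),
    ).fetchall()
    return rows


print(first_per_bucket(conn, 1))
print(first_per_bucket(conn, 2))
```

Intervals that contain no rows (here [15,20) for id=1) produce no output row, which matches the output asked for in the question.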