SQL: Union or Self Join - mysql

I have a simple table: user(id, date, task)
The task field contains either "download" or "upload"
I want to figure out the number of users who do each action per day.
Output: date, # of users who downloaded, # of users who uploaded
I first ran into the issue of using a subquery in the aggregate count function of the select, so I thought I should be using a self join here to break apart the data in the "task" column.
I thought I could create two tables, one for each case, and then combine and count them, but I am having trouble finishing this out:
SELECT id, date, task as task_download
FROM user
WHERE task = 'download';
SELECT id, date, task as task_upload
FROM user
WHERE task = 'upload';

select `date`,
COUNT( distinct CASE WHEN task = 'download' then id end ) 'download',
COUNT( distinct CASE WHEN task = 'upload' then id end ) 'upload'
from user
group by `date`

I would say neither. Just a query like this will do the job:
select `date`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by `date`
This assumes that date is a column containing only the date part, not the complete timestamp, and that id is the user ID. You can use the DISTINCT keyword within aggregate functions; that's what I did here.
To have this query run appropriately fast, I recommend an index on (task, date).
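Here is a minimal runnable sketch of the query above. It uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL. SQLite accepts the same backtick quoting, `CASE` and `COUNT(DISTINCT ...)`. The rows are hypothetical: user 1 downloads twice on the first day but is counted once.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (id INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO `user` VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "download"),
        (1, "2020-01-01", "download"),  # repeat action by the same user
        (2, "2020-01-01", "download"),
        (2, "2020-01-01", "upload"),
        (1, "2020-01-02", "upload"),
        (3, "2020-01-02", "download"),
    ],
)

# CASE yields the user id only for matching rows (NULL otherwise);
# COUNT(DISTINCT ...) skips NULLs and counts each user once per day.
rows = conn.execute(
    """
    SELECT `date`,
           COUNT(DISTINCT CASE WHEN task = 'download' THEN id END) AS downloads,
           COUNT(DISTINCT CASE WHEN task = 'upload' THEN id END) AS uploads
    FROM `user`
    WHERE task IN ('download', 'upload')
    GROUP BY `date`
    ORDER BY `date`
    """
).fetchall()
print(rows)  # [('2020-01-01', 2, 1), ('2020-01-02', 1, 1)]
```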
If, however, date contains the complete timestamp (i.e. including the time-part) you would want to group differently:
select date(`date`) as `day`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by date(`date`)
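You can check the difference between grouping on the raw column and grouping on its date part with a small sketch. It again uses SQLite through Python's `sqlite3` as a stand-in for MySQL. SQLite's `date()` plays the role of MySQL's `DATE()`, and the timestamps are hypothetical.

```python
import sqlite3

# Hypothetical rows where `date` holds full timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (id INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO `user` VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 09:15:00", "download"),
        (1, "2020-01-01 17:40:00", "download"),
        (2, "2020-01-01 23:59:59", "upload"),
        (2, "2020-01-02 00:00:01", "upload"),
    ],
)

query = """
    SELECT {key} AS day,
           COUNT(DISTINCT CASE WHEN task = 'download' THEN id END) AS downloads,
           COUNT(DISTINCT CASE WHEN task = 'upload' THEN id END) AS uploads
    FROM `user`
    GROUP BY {key}
    ORDER BY {key}
"""
# Grouping on the raw timestamp gives one group per distinct instant...
raw = conn.execute(query.format(key="`date`")).fetchall()
# ...while truncating with date() (DATE() in MySQL) gives one group per day.
by_day = conn.execute(query.format(key="date(`date`)")).fetchall()
print(len(raw))  # 4
print(by_day)    # [('2020-01-01', 1, 1), ('2020-01-02', 0, 1)]
```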

You can do it with sub-queries, e.g.:
SELECT a.`date` AS `day`,
(SELECT COUNT(*) FROM activity b WHERE b.`date` = a.`date` AND b.activity = 'upload') AS upload_count,
(SELECT COUNT(*) FROM activity b WHERE b.`date` = a.`date` AND b.activity = 'download') AS download_count
FROM activity a
GROUP BY a.`date`;
Here's the SQL Fiddle.
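Here is a runnable sketch of the sub-query approach. It uses SQLite via Python's `sqlite3` in place of MySQL, and the question's `user(id, date, task)` layout instead of the `activity` table, with hypothetical rows. Note that `COUNT(*)` counts actions. Counting distinct users, as the question asks, needs `COUNT(DISTINCT ...)`, which is shown here for uploads.

```python
import sqlite3

# Hypothetical data using the question's user(id, date, task) layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (id INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO `user` VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "upload"),
        (1, "2020-01-01", "upload"),
        (2, "2020-01-01", "download"),
        (3, "2020-01-02", "upload"),
    ],
)

# Each scalar subquery is correlated to the outer group through d.`date`.
# COUNT(*) counts actions; COUNT(DISTINCT u.id) counts users.
rows = conn.execute(
    """
    SELECT d.`date` AS day,
           (SELECT COUNT(*) FROM `user` u
             WHERE u.`date` = d.`date` AND u.task = 'upload') AS upload_count,
           (SELECT COUNT(DISTINCT u.id) FROM `user` u
             WHERE u.`date` = d.`date` AND u.task = 'upload') AS upload_users,
           (SELECT COUNT(*) FROM `user` u
             WHERE u.`date` = d.`date` AND u.task = 'download') AS download_count
    FROM `user` d
    GROUP BY d.`date`
    ORDER BY d.`date`
    """
).fetchall()
print(rows)  # [('2020-01-01', 2, 1, 1), ('2020-01-02', 1, 1, 0)]
```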

First count distinct users by date and task, then sum those counts per task for each date:
select date,
sum(case when task = 'upload' then num_users else 0 end) as "upload",
sum(case when task = 'download' then num_users else 0 end) as "download"
from (
select date, task, count(distinct id) num_users
from usert
group by date, task
) x
group by date
;
Check it here: http://rextester.com/ZACFB64945
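The two-step approach can be sketched the same way. This version uses SQLite via Python's `sqlite3` in place of MySQL, with hypothetical rows in `usert`. It should return the same per-day user counts as a single `COUNT(DISTINCT CASE ...)` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usert (id INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO usert VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "download"),
        (1, "2020-01-01", "download"),
        (2, "2020-01-01", "download"),
        (2, "2020-01-01", "upload"),
        (1, "2020-01-02", "upload"),
        (3, "2020-01-02", "download"),
    ],
)

rows = conn.execute(
    """
    SELECT `date`,
           SUM(CASE WHEN task = 'upload' THEN num_users ELSE 0 END) AS upload,
           SUM(CASE WHEN task = 'download' THEN num_users ELSE 0 END) AS download
    FROM (
        -- step 1: one row per (date, task) with its distinct-user count
        SELECT `date`, task, COUNT(DISTINCT id) AS num_users
        FROM usert
        GROUP BY `date`, task
    ) x
    -- step 2: pivot the task rows into one column per task
    GROUP BY `date`
    ORDER BY `date`
    """
).fetchall()
print(rows)  # [('2020-01-01', 1, 2), ('2020-01-02', 1, 1)]
```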

If you want the distinct users, then that suggests count(distinct):
SELECT date,
COUNT(DISTINCT CASE WHEN task = 'upload' THEN userid END) as uploads,
COUNT(DISTINCT CASE WHEN task = 'download' THEN userid END) as downloads
FROM user
GROUP BY date
ORDER BY date;
If you want to count actions rather than distinct users, you can do this:
SELECT date,
SUM( (task = 'upload')::int ) as uploads,
SUM( (task = 'download')::int) as downloads
FROM user
GROUP BY date
ORDER BY date;
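This uses a convenient Postgres shorthand for counting boolean expressions. In MySQL, a comparison already evaluates to 1 or 0, so SUM(task = 'upload') works without the ::int cast.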
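Here is a small sketch contrasting the two counts on hypothetical rows. It uses SQLite via Python's `sqlite3`, where a comparison also evaluates to 1 or 0. User 1 uploads twice, so there is one uploading user but two uploads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (userid INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO `user` VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "upload"),
        (1, "2020-01-01", "upload"),  # same user uploads twice
        (2, "2020-01-01", "download"),
    ],
)

rows = conn.execute(
    """
    SELECT `date`,
           -- distinct users per task
           COUNT(DISTINCT CASE WHEN task = 'upload' THEN userid END) AS upload_users,
           -- every action: the comparison is already 0 or 1, so no cast
           SUM(task = 'upload') AS uploads,
           SUM(task = 'download') AS downloads
    FROM `user`
    GROUP BY `date`
    ORDER BY `date`
    """
).fetchall()
print(rows)  # [('2020-01-01', 1, 2, 1)]
```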

I'd use conditional aggregation.
To get a count of the number of users that performed at least one upload on a given date (but only increment the count by one for that user for that date, even if that user performed more than one upload on the same date), we can use a COUNT(DISTINCT user) expression.
To get a count of the total number of uploads, we can use a COUNT or SUM.
SELECT DATE(t.date) AS `date`
, COUNT(DISTINCT IF(t.task='upload' ,t.user,NULL)) AS cnt_users_who_uploaded
, COUNT(DISTINCT IF(t.task='download',t.user,NULL)) AS cnt_users_who_downloaded
, SUM(IF(t.task='upload' ,1,0)) AS cnt_uploads
, SUM(IF(t.task='download',1,0)) AS cnt_downloads
FROM user t
GROUP BY DATE(t.date)
ORDER BY DATE(t.date)
Note: this will not return rows with zero counts for dates that do not appear in the table.
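One common way to get those zero rows is to generate the dates in a recursive CTE and `LEFT JOIN` the data to it. This is a sketch of that technique, not part of the answer above. It uses SQLite via Python's `sqlite3` in place of MySQL, with a hypothetical date range and hypothetical rows. In MySQL 8.0, the recursive step would use date arithmetic such as `d + INTERVAL 1 DAY` instead of SQLite's `date(d, '+1 day')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (`user` INTEGER, `date` TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO `user` VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 08:00:00", "upload"),
        (1, "2020-01-01 09:30:00", "upload"),
        (2, "2020-01-03 12:00:00", "upload"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE days(d) AS (
        SELECT '2020-01-01'                                        -- range start
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2020-01-03'  -- range end
    )
    SELECT days.d AS `date`,
           COUNT(DISTINCT CASE WHEN t.task = 'upload' THEN t.`user` END) AS cnt_users_who_uploaded,
           SUM(CASE WHEN t.task = 'upload' THEN 1 ELSE 0 END) AS cnt_uploads
    FROM days
    -- a day with no data still yields one all-NULL row, counted as 0
    LEFT JOIN `user` t ON date(t.`date`) = days.d
    GROUP BY days.d
    ORDER BY days.d
    """
).fetchall()
print(rows)  # [('2020-01-01', 1, 2), ('2020-01-02', 0, 0), ('2020-01-03', 1, 1)]
```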

Related

Calculating consecutive occurences in MySQL

I have a quick question in relation to windowing in MySQL
SELECT
Client,
User,
Date,
Flag,
lag(Date) over (partition by Client,User order by Date asc) as last_date,
lag(Flag) over (partition by Client,User order by Date asc) as last_flag,
case when Flag = 1 and last_flag = 1 then 1 else 0 end as consecutive
FROM db.tbl
This query returns something like the below. I am trying to work out the number of consecutive times, most recently, that the Flag column was 1 for each user: if they had 11110000111, then we should take the final three occurrences of 1 to determine that they had a consecutive flag of 3.
I need to extract the start and end date for the consecutive flag.
How would I go about doing this? Can anyone help me? :)
If we use the example of 11110000111, then we should extract only the final 111 and therefore the 3 most recent dates for that customer. So in the below, we would need to take 10.01.2023 as the first date and 24.01.2023 as the last date. The consecutive count should be 3.
Output:
Use aggregation and string functions:
WITH cte AS (
SELECT Client, User,
GROUP_CONCAT(CASE WHEN Flag THEN Date END ORDER BY Date) AS dates,
CHAR_LENGTH(SUBSTRING_INDEX(GROUP_CONCAT(Flag ORDER BY Date SEPARATOR ''), '0', -1)) AS consecutive
FROM tablename
GROUP BY Client, User
)
SELECT Client, User,
NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(dates, ',', -consecutive), ',', 1), '') AS first_date,
CASE WHEN consecutive > 0 THEN SUBSTRING_INDEX(dates, ',', -1) END AS last_date,
consecutive
FROM cte;
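You can trace the string trick step by step outside the database. This Python sketch uses a hypothetical `substring_index` helper that mimics MySQL's `SUBSTRING_INDEX` for positive, negative and zero counts. The weekly dates are hypothetical and match the 11110000111 example.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Hypothetical helper mimicking MySQL SUBSTRING_INDEX for a non-empty delim."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # right of the |count|-th delimiter from the end
    return ""

# Hypothetical history for one (Client, User), ordered by Date: 11110000111
dates = ["2022-11-15", "2022-11-22", "2022-11-29", "2022-12-06",
         "2022-12-13", "2022-12-20", "2022-12-27", "2023-01-03",
         "2023-01-10", "2023-01-17", "2023-01-24"]
flags = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

# GROUP_CONCAT(Flag ORDER BY Date SEPARATOR '')
flag_string = "".join(str(f) for f in flags)
# Everything after the last '0' is the trailing run of 1s.
consecutive = len(substring_index(flag_string, "0", -1))

# GROUP_CONCAT(CASE WHEN Flag THEN Date END ORDER BY Date) skips the NULLs.
dates_csv = ",".join(d for d, f in zip(dates, flags) if f)
# `or None` plays the role of NULLIF(..., '').
first_date = substring_index(substring_index(dates_csv, ",", -consecutive), ",", 1) or None
last_date = substring_index(dates_csv, ",", -1) if consecutive > 0 else None

print(flag_string, consecutive, first_date, last_date)
# 11110000111 3 2023-01-10 2023-01-24
```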
Another solution with window functions and conditional aggregation:
WITH
cte1 AS (SELECT *, SUM(NOT Flag) OVER (PARTITION BY Client, User ORDER BY Date) AS grp FROM tablename),
cte2 AS (SELECT *, MAX(grp) OVER (PARTITION BY Client, User) AS max_grp FROM cte1)
SELECT Client, User,
MIN(CASE WHEN Flag THEN Date END) AS first_date,
MAX(CASE WHEN Flag THEN Date END) AS last_date,
SUM(Flag) AS consecutive
FROM cte2
WHERE grp = max_grp
GROUP BY Client, User;
See the demo.
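Here is a runnable sketch of the window-function version. It assumes SQLite 3.25 or later (needed for window functions) through Python's `sqlite3` as a stand-in for MySQL 8.0. The rows are hypothetical: user `U1` follows the 11110000111 pattern and user `U2` ends on a 0. The `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (Client TEXT, User TEXT, Date TEXT, Flag INTEGER)")
history = [
    ("2022-11-15", 1), ("2022-11-22", 1), ("2022-11-29", 1), ("2022-12-06", 1),
    ("2022-12-13", 0), ("2022-12-20", 0), ("2022-12-27", 0), ("2023-01-03", 0),
    ("2023-01-10", 1), ("2023-01-17", 1), ("2023-01-24", 1),
]
conn.executemany("INSERT INTO tablename VALUES ('C1', 'U1', ?, ?)", history)
# A second hypothetical user whose most recent flag is 0.
conn.executemany(
    "INSERT INTO tablename VALUES ('C1', 'U2', ?, ?)",
    [("2023-01-01", 1), ("2023-01-02", 0)],
)

rows = conn.execute(
    """
    WITH
    -- grp goes up by one at every 0, so each run of 1s shares a grp value
    cte1 AS (SELECT *, SUM(NOT Flag) OVER (PARTITION BY Client, User ORDER BY Date) AS grp
             FROM tablename),
    -- the most recent run is the one with the highest grp
    cte2 AS (SELECT *, MAX(grp) OVER (PARTITION BY Client, User) AS max_grp FROM cte1)
    SELECT Client, User,
           MIN(CASE WHEN Flag THEN Date END) AS first_date,
           MAX(CASE WHEN Flag THEN Date END) AS last_date,
           SUM(Flag) AS consecutive
    FROM cte2
    WHERE grp = max_grp
    GROUP BY Client, User
    ORDER BY Client, User
    """
).fetchall()
print(rows)
# [('C1', 'U1', '2023-01-10', '2023-01-24', 3), ('C1', 'U2', None, None, 0)]
```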
I made an attempt to get the result with simpler queries; here is my approach, which also takes advantage of the LastDate and LastFlag columns.
Run here
WITH eTT
AS
( SELECT Client, User, NULLIF(MAX(Date),
(SELECT MAX(Date) FROM tt t2 WHERE t1.Client=t2.Client AND t1.User=t2.User)) as endDate
FROM tt t1 WHERE LastFlag=0 OR LastFlag IS NULL GROUP BY Client, User
)
SELECT Client, User,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MIN(Date) END) as first_date,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MAX(Date) END) as last_date,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE COUNT(endDate) END) as consecutive
FROM tt LEFT JOIN eTT USING (Client, User)
WHERE Date >= endDate OR endDate IS null GROUP BY Client, User;
EDIT
The original table doesn't have LastDate and LastFlag columns; they were created using the OP's initial query.
The method used there apparently isn't supported (last_flag is an alias and can't be referenced in the same SELECT list), but I get the impression that the OP somehow manages to do it on their side.
Hence, another CTE called tt, containing that query, can be added before eTT.

Summing of count result at same level

I'm trying to sum the results of count(id) at the same level, in order to find out the relative portion of the count(id) from the overall count.
The count is grouped by the respective previous number, and I want to stay at the same table and have it all together.
select totalattempts, count(totalattempts) allattempts, count(case when success>0 then totalattempts else null end) successfulattempts
from (
select *, case when success> 0 then attemptspresuccess+1 else attemptspresuccess end totalattempts
from (select orderid, count(orderid) attemptspresuccess, count(case when recoveredPaymentId is not null then recoveredPaymentId end ) success from (
select orderid, recoveredPaymentId
from errors
where platform = 'woo'
) alitable
group by orderid) minitable ) finaltable
group by totalattempts
order by totalattempts asc
I need to add another column that, to put it simply, would hold count(totalattempts) / sum(count(totalattempts)).
I'm basically running out of ideas.
I can't use window functions, as this is a Retool app, which doesn't support them.
Assuming some test data here (SQL Server syntax):
DECLARE @table TABLE (AttemptNumber INT IDENTITY, Success BIT)
INSERT INTO @table (Success) VALUES
(0),(0),(0),(0),(1),(1),(0),(0),(0),(0),(0),(1),(0),(1),(0),(0),
(0),(0),(1),(0),(0),(0),(0),(1),(0),(1),(0),(0),(0),(1),(0),(0)
It sounds like you want to know how many attempts there were, how many were successful, and what that is as a percentage:
SELECT COUNT(Success) AS TotalCount,
COUNT(CASE WHEN Success = 1 THEN 1 END) AS SuccessCount,
COUNT(CASE WHEN Success = 1.0 THEN 1 END)/(COUNT(Success)+.0) AS SuccessPct
FROM @table
TotalCount SuccessCount SuccessPct
--------------------------------------
32 8 0.2500000000000
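Since the question rules out window functions, one alternative is to compute the grand total once in a derived table and `CROSS JOIN` it to the grouped rows. This is a different technique from the answer above. The sketch uses SQLite via Python's `sqlite3` and a hypothetical `finaltable` holding one `totalattempts` value per order. In the real query, the question's nested subquery would have to appear both in the `FROM` and inside the `CROSS JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the question's derived table: one row per order.
conn.execute("CREATE TABLE finaltable (orderid INTEGER, totalattempts INTEGER)")
conn.executemany(
    "INSERT INTO finaltable VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)

rows = conn.execute(
    """
    SELECT f.totalattempts,
           COUNT(*) AS allattempts,
           -- 1.0 * forces decimal division (MySQL's / already does this)
           1.0 * COUNT(*) / t.total AS share
    FROM finaltable f
    -- the grand total is computed once and attached to every row
    CROSS JOIN (SELECT COUNT(*) AS total FROM finaltable) t
    GROUP BY f.totalattempts, t.total
    ORDER BY f.totalattempts
    """
).fetchall()
print([(a, n, round(s, 3)) for a, n, s in rows])
# [(1, 3, 0.5), (2, 2, 0.333), (3, 1, 0.167)]
```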

SQL query to get percentages within a grouping

I've looked over similar questions and I just can't seem to get this right.
I have a table with three columns: ID, Date, and Method. None are unique.
I want to be able to see for any given date, how many rows match a certain pattern on Method.
So, for example, if the table has 100 rows, and 8 of them have the date "01-01-2020" and of those 8, two of them have a method of "A", I would want a return row that says "01-01-2020", "8", "2", and "25%".
My SQL is pretty rudimentary. I have been able to make a query to get me the count of each method by date:
select Date, count(*) from mytable WHERE Method="A" group by Date;
But I haven't been able to figure out how to put together the results that I am needing. Can someone help me out?
You could perform a count over a case expression for that method, and then divide the two counts:
SELECT date,
COUNT(*),
COUNT(CASE method WHEN 'A' THEN 1 END),
COUNT(CASE method WHEN 'A' THEN 1 END) / COUNT(*) * 100
FROM mytable
GROUP BY date
I'm assuming you're interested in all methods rather than just 'A', so you could do the following:
with ptotals as
(
SELECT
thedate,
count(*) as NumRows
FROM
mytable
group by
thedate
)
select
mytable.thedate,
mytable.themethod,
count(*) as method_count,
100 * count(*) / max(ptotals.NumRows) as Pct
from
mytable
inner join
ptotals
on
mytable.thedate = ptotals.thedate
group by
mytable.thedate,
mytable.themethod
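Here is a sketch of the CTE-and-join version. It uses SQLite via Python's `sqlite3` as a stand-in for MySQL and the question's example of 8 rows on one date, 2 of them with method 'A'. The other methods are hypothetical. `100.0` is used instead of `100` so the division stays decimal in SQLite, which truncates integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, thedate TEXT, themethod TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, '2020-01-01', ?)",
    enumerate(["A", "A", "B", "B", "B", "C", "C", "C"], start=1),
)

rows = conn.execute(
    """
    WITH ptotals AS (
        -- rows per date, joined back to every row of that date
        SELECT thedate, COUNT(*) AS NumRows FROM mytable GROUP BY thedate
    )
    SELECT mytable.thedate,
           mytable.themethod,
           COUNT(*) AS method_count,
           100.0 * COUNT(*) / MAX(ptotals.NumRows) AS Pct
    FROM mytable
    INNER JOIN ptotals ON mytable.thedate = ptotals.thedate
    GROUP BY mytable.thedate, mytable.themethod
    ORDER BY mytable.themethod
    """
).fetchall()
print(rows)
# [('2020-01-01', 'A', 2, 25.0), ('2020-01-01', 'B', 3, 37.5), ('2020-01-01', 'C', 3, 37.5)]
```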
You can use AVG() for the ratio/percentage:
SELECT date, COUNT(*),
SUM(CASE WHEN method = 'A' THEN 1 ELSE 0 END),
AVG(CASE WHEN method = 'A' THEN 100.0 ELSE 0 END)
FROM t
GROUP BY date;
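You can check the `AVG()` trick and the division pitfall side by side. This sketch uses SQLite via Python's `sqlite3` and the question's example of 8 rows with 2 of method 'A'. The other methods are hypothetical. In MySQL, `/` always returns a decimal, so `COUNT(...) / COUNT(*) * 100` gives 25.0000. SQLite, like several other databases, truncates integer division and gives 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, method TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, '2020-01-01', ?)",
    enumerate(["A", "A", "B", "B", "B", "C", "C", "C"], start=1),
)

row = conn.execute(
    """
    SELECT date,
           COUNT(*) AS total,
           SUM(CASE WHEN method = 'A' THEN 1 ELSE 0 END) AS a_count,
           -- mean of a 100/0 indicator = percentage of rows with method 'A'
           AVG(CASE WHEN method = 'A' THEN 100.0 ELSE 0 END) AS a_pct,
           -- integer / integer: SQLite truncates this to 0, MySQL would not
           COUNT(CASE method WHEN 'A' THEN 1 END) / COUNT(*) * 100 AS int_div_pct
    FROM t
    GROUP BY date
    """
).fetchone()
print(row)  # ('2020-01-01', 8, 2, 25.0, 0)
```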

Select column(s) corresponding to max/min of another column without joins

I have a table (id, employee_id, device_id, logged_time) [simplified] that logs attendances of employees from biometric devices.
I generate reports showing the first in and last out time of each employee by date.
Currently, I am able to fetch the first in and last out time of each employee by date, but I also need to fetch the first in and last out device_ids of each employee. The entries are not in sequential order of the logged time.
I do not want to (and probably cannot) use joins as in one of the reports the columns are dynamically generated and can lead to thousands of joins. Furthermore, these are subqueries and are joined to other queries to get further details.
A sample setup of the table and queries are at http://sqlfiddle.com/#!9/3bc755/4
The first one lists the entry and exit times by date for every employee:
select
attendance_logs.employee_id,
DATE(attendance_logs.logged_time) as date,
TIME(MIN(attendance_logs.logged_time)) as entry_time,
TIME(MAX(attendance_logs.logged_time)) as exit_time
from attendance_logs
group by date, attendance_logs.employee_id
The second one builds an attendance chart for a given date range:
select
`attendance_logs`.`employee_id`,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_18,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_18,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_18,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_19,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_19,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_19
/*
* dynamically generated columns for dates in date range
*/
from `attendance_logs`
where `attendance_logs`.`logged_time` >= '2017-09-18 00:00:00' and `attendance_logs`.`logged_time` <= '2017-09-19 23:59:59'
group by `attendance_logs`.`employee_id`;
Tried:
Similar to getting the max and min logged_time of each date using CASE, I tried to select the device_id where logged_time is the max/min:
MIN(case
when
`attendance_logs`.`logged_time` = MIN(
case when DATE(`attendance_logs`.`logged_time`)
= '2017-09-18' THEN `attendance_logs`.`logged_time` END
)
then `attendance_logs`.`device_id` end) as entry_device_2017_09_18
This results in an "Invalid use of group function" error (aggregate functions cannot be nested).
A quick hack for your query: pick the device IDs for in and out by wrapping GROUP_CONCAT inside SUBSTRING_INDEX, ordered by the log time (descending for the exit device, ascending for the entry device):
SUBSTRING_INDEX(GROUP_CONCAT(case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END ORDER BY `l`.`logged_time` desc),',',1) exit_device_2017_09_18,
Or, if the device ID will be the same for each in and its out, it can simply be written with GROUP_CONCAT alone:
GROUP_CONCAT(DISTINCT case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END)
DEMO
To avoid joins I suggest you try "correlated subqueries" instead:
select
employee_id
, logdate
, TIME(entry_time) entry_time
, (select MIN(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.entry_time) entry_device
, TIME(exit_time) exit_time
, (select MAX(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.exit_time) exit_device
from (
select
attendance_logs.employee_id
, DATE(attendance_logs.logged_time) as logdate
, MIN(attendance_logs.logged_time) as entry_time
, MAX(attendance_logs.logged_time) as exit_time
from attendance_logs
group by
attendance_logs.employee_id
, DATE(attendance_logs.logged_time)
) d
;
see: http://sqlfiddle.com/#!9/06e0e2/3
Note: I have used MIN() and MAX() on those subqueries only to avoid any possibility that these return more than one value. You could use limit 1 instead if you prefer.
Note also: I do not normally recommend correlated subqueries as they can cause performance issues, but they do supply the data you need.
Oh, and please try to avoid using date as a column name; it isn't good practice, since DATE is also a keyword and a function name in MySQL.
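Here is a runnable sketch of the correlated-subquery query. It uses SQLite via Python's `sqlite3` in place of MySQL. SQLite also has `DATE()` and `TIME()` for 'YYYY-MM-DD HH:MM:SS' strings. The log rows are hypothetical and deliberately not in time order. The `ORDER BY` is added for a predictable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance_logs "
    "(id INTEGER, employee_id INTEGER, device_id INTEGER, logged_time TEXT)"
)
# Hypothetical logs, deliberately not in time order.
conn.executemany(
    "INSERT INTO attendance_logs VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, "2017-09-18 17:30:00"),
        (2, 1, 2, "2017-09-18 08:05:00"),
        (3, 1, 1, "2017-09-18 12:00:00"),
        (4, 2, 2, "2017-09-18 18:00:00"),
        (5, 2, 1, "2017-09-18 09:00:00"),
    ],
)

rows = conn.execute(
    """
    SELECT employee_id,
           logdate,
           TIME(entry_time) AS entry_time,
           -- device of the row whose timestamp equals the day's first log
           (SELECT MIN(l.device_id) FROM attendance_logs l
             WHERE l.employee_id = d.employee_id
               AND l.logged_time = d.entry_time) AS entry_device,
           TIME(exit_time) AS exit_time,
           -- device of the row whose timestamp equals the day's last log
           (SELECT MAX(l.device_id) FROM attendance_logs l
             WHERE l.employee_id = d.employee_id
               AND l.logged_time = d.exit_time) AS exit_device
    FROM (
        SELECT employee_id,
               DATE(logged_time) AS logdate,
               MIN(logged_time) AS entry_time,
               MAX(logged_time) AS exit_time
        FROM attendance_logs
        GROUP BY employee_id, DATE(logged_time)
    ) d
    ORDER BY employee_id, logdate
    """
).fetchall()
print(rows)
# [(1, '2017-09-18', '08:05:00', 2, '17:30:00', 3),
#  (2, '2017-09-18', '09:00:00', 1, '18:00:00', 2)]
```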

MySql - Exclude default date from max inside case

I am working with a table of items with expiration dates; these items are assigned to users.
For each user, I want to get the highest expiration date. The issue is that default items are initialized with a '3000/01/01' expiration date, which should be ignored if another item exists for that user.
I've got a query doing that:
SELECT
user_id as UserId,
CASE WHEN (YEAR(MAX(date_expiration)) = 3000)
THEN (
SELECT MAX(temp.date_expiration)
FROM user_items temp
WHERE YEAR(temp.date_expiration) <> 3000 and temp.user_id = UserId
)
ELSE MAX(date_expiration)
END as date_expiration
FROM user_items GROUP BY user_id
This works, but the subquery inside the THEN block is hurting performance, and it is a huge table.
So, is there a better way to ignore the default date in the MAX operation when entering the CASE condition?
SELECT user_id,
COALESCE(
MAX(CASE WHEN YEAR(date_expiration) = 3000 THEN NULL ELSE date_expiration END),
MAX(date_expiration)
)
FROM user_items
GROUP BY
user_id
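Here is a quick check of the `COALESCE` approach. It uses SQLite via Python's `sqlite3` as a stand-in for MySQL. SQLite has no `YEAR()`, so `strftime('%Y', ...) = '3000'` replaces `YEAR(...) = 3000`. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_items (user_id INTEGER, date_expiration TEXT)")
conn.executemany(
    "INSERT INTO user_items VALUES (?, ?)",
    [
        (1, "3000-01-01"),  # default item
        (1, "2024-05-01"),
        (1, "2023-12-31"),
        (2, "3000-01-01"),  # user with only the default item
    ],
)

rows = conn.execute(
    """
    SELECT user_id,
           COALESCE(
               -- max over real dates only (YEAR(...) = 3000 in MySQL)
               MAX(CASE WHEN strftime('%Y', date_expiration) = '3000'
                        THEN NULL ELSE date_expiration END),
               -- fall back to the default when no real date exists
               MAX(date_expiration)
           ) AS max_expiration
    FROM user_items
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(rows)  # [(1, '2024-05-01'), (2, '3000-01-01')]
```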
If there are few users but lots of entries per user in your table, you can try improving the query a little more:
SELECT user_id,
COALESCE(
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
AND date_expiration < '3000-01-01'
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
),
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
)
)
FROM (
SELECT DISTINCT
user_id
FROM user_items
) uid
You need an index on (user_id, date_expiration) for this to work fast.