MySQL aggregate data IN, OUT times - mysql

I got in table something like this:
ID | UID | ACTION | URL | TIMESTAMP
Where ...
ID - primary key
UID - user id
ACTION - IN or OUT
URL - action URL
TIMESTAMP - action TIMESTAMP
How to aggregate all data with one query?
I mean... as output I would like table with UID,URL,TOTAL_TIME where TOTAL_TIME would be a sum of all times between IN and OUT of given URL...
I tried some custom functions, but without luck...
Example Input (timestamp simplified to show what I mean):
1|13|IN|http://www.gógle.koń|1
2|13|OUT|http://www.gógle.koń|5
...
13454|13|IN|http://www.gógle.koń|550
...
13465|13|OUT|http://www.gógle.koń|600
...
243252|13|IN|http://www.pr0nstaff.meh|tiny_leg_finger|1200
...
245431|13|OUT|http://www.pr0nstaff.meh/tiny_leg_finger|2200
PLEASE NOTE THAT THERE MAY BE A CASE (AND SURELY WILL BE) WHERE IN - OUT OF ONE URL WOULD BE BROKEN BY IN OR IN - OUT OR OUT OF OTHER
... so we canno't simply count from IN to OUT without checking the site match.
Output for example input (for UUID = 13) should be:
13|www.gógle.koń|14
13|http://www.pr0nstaff.meh/tiny_leg_finger|1000

Try this, but I'm not shure, if there IN/OUT is not always double. So please check..
CREATE TABLE test1 (
id INT NOT NULL,
uid INT NOT NULL,
action VARCHAR(3),
url varchar(100),
timestamp1 TIMESTAMP
);
INSERT INTO test1 VALUES
( 1 , 13 , 'IN', 'www.go.com', '2015-01-07 08:00:00'),
( 2 , 13 , 'OUT', 'www.go.com', '2015-01-07 09:00:00'),
( 3 , 14 , 'IN', 'www.go2.com', '2015-01-07 08:30:00'),
( 4 , 14 , 'OUT', 'www.go2.com', '2015-01-07 09:00:00'),
( 5 , 15 , 'IN', 'www.go3.com', '2015-01-07 09:00:00'),
( 6 , 16 , 'OUT', 'www.go3.com', '2015-01-07 09:00:00');
SELECT i.uid,i.url,SUM(TIMESTAMPDIFF(minute, i.timestamp1, o.timestamp1)) AS diff_hour
FROM (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'IN') i
JOIN (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'OUT') o
ON i.uid = o.uid
AND i.url = o.url
AND i.id < o.id
GROUP BY i.uid,i.url
ORDER BY i.uid,i.url;

Try this:
SELECT UID, URL, TIMESTAMPDIFF(HOUR, InTime, OutTime) AS TOTAL_TIME
FROM (SELECT UID, URL,
MAX(CASE WHEN ACTION = 'IN' THEN TIMESTAMP ELSE NULL END) InTime,
MAX(CASE WHEN ACTION = 'OUT' THEN TIMESTAMP ELSE NULL END) OutTime
FROM tableA
GROUP BY UID, URL
) AS A;

Related

Displayed values are not what they should be

There are 2 tables ost_ticket and ost_ticket_action_history.
create table ost_ticket(
ticket_id int not null PRIMARY KEY,
created timestamp,
staff bool,
status varchar(50),
city_id int
);
create table ost_ticket_action_history(
ticket_id int not null,
action_id int not null PRIMARY KEY,
action_name varchar(50),
started timestamp,
FOREIGN KEY(ticket_id) REFERENCES ost_ticket(ticket_id)
);
In the ost_ticket_action_history table the data is:
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (1, 1, 'Consultation', '2022-01-06 18:30:29');
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (2, 2, 'Bank Application', '2022-02-06 18:30:45');
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (3, 3, 'Consultation', '2022-05-06 18:42:48');
In the ost_ticket table the data is:
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (1, '2022-04-04 18:26:41', 1, 'open', 2);
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (2, '2022-05-05 18:30:48', 0, 'open', 3);
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (3, '2022-04-06 18:42:53', 1, 'open', 4);
My task is to get the conversion from the “Consultation” stage to the “Bank Application” stage broken down by months (based on the start date of the “Bank Application” stage).Conversion is calculated according to the following formula: (number of applications with the “Bank Application” stage / number of applications with the “Consultation” stage) * 100%.
My request is like this:
select SUM(action_name='Bank Application')/SUM(action_name='Consultation') * 2 as 'Conversion' from ost_ticket_action_history JOIN ost_ticket ot on ot.ticket_id = ost_ticket_action_history.ticket_id where status = 'open' and created > '2020 -01-01 00:00:00' group by action_name,started having action_name = 'Bank Application';
As a result I get:
Another query:
SELECT
SUM(CASE
WHEN b.ticket_id IS NOT NULL THEN 1
ELSE 0
END) / COUNT(*) conversion,
YEAR(a.started) AS 'year',
MONTH(a.started) AS 'month'
FROM
ost_ticket_action_history a
LEFT JOIN
ost_ticket_action_history b ON a.ticket_id = b.ticket_id
AND b.action_name = 'Bank Application'
WHERE
a.action_name = 'Consultation'
AND a.status = 'open'
AND a.created > '2020-01-01 00:00:00'
GROUP BY YEAR(a.started) , MONTH(a.started)
I apologize if I didn't write very clearly. Please explain what to do.
Like I explained in my comment, you exclude rows with your having clause.
I will show you in the next how to debug.
First check what the raw result of the select query is.
As you see, when you remove the GROUP BY and see what you actually get is only 1 row with bank application, because the having clause excludes all other rows
SELECT
*
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY
action_name, started
HAVING
action_name = 'Bank Application';
Output:
ticket_id
action_id
action_name
started
ticket_id
created
staff
status
city_id
2
2
Bank Application
2022-02-06 18:30:45
2
2022-05-05 18:30:48
0
open
3
Second step, see what the result set is without calculating anything.
As you can see you make a division with 0, what you have learned in school, is forbidden, hat is why you have as result set NULL
SELECT
SUM(action_name = 'Bank Application')
#/
,SUM(action_name = 'Consultation') * 2 AS 'Conversion'
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY action_name , started
HAVING action_name = 'Bank Application';
SUM(action_name = 'Bank Application') | Conversion
------------------------------------: | ---------:
1 | 0
db<>fiddle here
#Third what you can do exclude a division with 0, here i didn't remove all othe rows as this is only for emphasis
SELECT
SUM(action_name = 'Bank Application')
/
SUM(action_name = 'Consultation') * 2 AS 'Conversion'
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY action_name , started
HAVING SUM(action_name = 'Consultation') > 0;
| Conversion |
| ---------: |
| 0.0000 |
| 0.0000 |
db<>fiddle here
Final words,
If you get a strange result, simply go back remove everything that doesn't matter and try to get all values, so hat you can check your math

SQL multi query

I need some help to do it right in one query (if it possible).
(this is a theoretical example and I assume the presence of events in event_name(like registration/action etc)
I have 3 colums:
-user_id
-event_timestamp
-event_name
From this 3 columns we need to create new table with 4 new columns:
-user year and month registration time
-number of new user registration in this month
-number of users who returned to the second calendar month after registration
-return probability
Result must be looks like this:
2019-1 | 1 | 1 | 100%
2019-2 | 3 | 2 | 67%
2019-3 | 2 | 0 | 0%
What I've done now:
I'm use this toy example of my possible main table:
CREATE TABLE `main` (
`event_timestamp` timestamp,
`user_id` int(10),
`event_name` char(12)
) DEFAULT CHARSET=utf8;
INSERT INTO `main` (`event_timestamp`, `user_id`, `event_name`) VALUES
('2019-01-23 20:02:21.550', '1', 'registration'),
('2019-01-24 20:03:21.550', '2', 'action'),
('2019-02-21 20:04:21.550', '3', 'registration'),
('2019-02-22 20:05:21.550', '4', 'registration'),
('2019-02-23 20:06:21.550', '5', 'registration'),
('2019-02-23 20:06:21.550', '1', 'action'),
('2019-02-24 20:07:21.550', '6', 'action'),
('2019-03-20 20:08:21.550', '3', 'action'),
('2019-03-21 20:09:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '9', 'action'),
('2019-03-23 20:11:21.550', '10', 'registration'),
('2019-03-22 20:10:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '5', 'action'),
('2019-03-24 20:11:21.550', '11', 'registration');
I'm trying to test some queries to create 4 new columns:
This is for column #1, we select month and year from timestamp where action is registration (as I guess), but I need to sum it for month (like 2019-11, 2019-12)
SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS column_1 FROM main
WHERE event_name='registration';
For column #2 we need to sum users with even_name registration in this month for every month, or.. we can trying for searching first time activity by user_id, but I don't know how to do this.
Here is some thinks about it...
SELECT COUNT(DISTINCT user_id) AS user_count
FROM main
GROUP BY MONTH(event_timestamp);
SELECT COUNT(DISTINCT user_id) AS user_count FROM main
WHERE event_name='registration';
For column #3 we need to compare user_id with the event_name registration and last month event with any event of the second month so we get users who returned for the next month.
Any idea how to create this query?
This is how to calc column #4
SELECT *,
ROUND ((column_3/column_2)*100) AS column_4
FROM main;
I hope you will find the following answer helpful.
The first column is the extraction of year and month. The new_users column is the COUNT of the unique user ids when the action is 'registration' since the user can be duplicated from the JOIN as a result of taking multiple actions the following month. The returned_users column is the number of users who have an action in the next month from the registration. The returned_users column needs a DISTINCT clause since a user can have multiple actions during one month. The final column is the probability that you asked from the two previous columns.
The JOIN clause is a self-join to bring the users that had at least one action the next month of their registration.
SELECT CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp)),
COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END)) AS new_users,
COUNT(DISTINCT B.user_id) AS returned_users,
CASE WHEN COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))=0 THEN 0 ELSE COUNT(DISTINCT B.user_id)/COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))*100 END AS My_Ratio
FROM main AS A
LEFT JOIN main AS B
ON A.user_id=B.user_id AND MONTH(A.event_timestamp)+1=MONTH(B.event_timestamp)
AND A.event_name='registration' AND B.event_name='action'
GROUP BY CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp))
What we will do is to use window functions and aggregation -- window functions to get the earliest registration date. Then some conditional aggregation.
One challenge is the handling of calendar months. To handle this, we will truncate the dates to the beginning of the month to facilitate the date arithmetic:
select yyyymm_reg, count(*) as regs_in_month,
sum( month_2 > 0 ) as visits_2months,
avg( month_2 > 0 ) as return_rate_2months
from (select m.user_id, m.yyyymm_reg,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 1) ) as month_1,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 2) ) as month_2,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 3) ) as month_3
from (select m.*,
cast(concat(extract(year_month from event_timestamp), '01') as date) as yyyymm,
cast(concat(extract(year_month from min(case when event_name = 'registration' then event_timestamp end) over (partition by user_id)), '01') as date) as yyyymm_reg
from main m
) m
where m.yyyymm_reg is not null
group by m.user_id, m.yyyymm_reg
) u
group by u.yyyymm_reg;
Here is a db<>fiddle.
Here you go, done in T-SQL:
;with cte as(
select a.* from (
select form,user_id,sum(count_regs) as count_regs,sum(count_action) as count_action from (
select FORMAT(event_timestamp,'yyyy-MM') as form,user_id,event_name,
CASE WHEN event_name = 'registration' THEN 1 ELSE 0 END as count_regs,
CASE WHEN event_name = 'action' THEN 1 ELSE 0 END as count_action from main) a
group by form,user_id) a)
select final.form,final.count_regs,final.count_action,((CAST(final.count_action as float)/(CASE WHEN final.count_regs = '0' THEN '1' ELSE final.count_regs END))*100) as probability from (
select a.form,sum(a.count_regs) count_regs,CASE WHEN sum(b.count_action) is null then '0' else sum(b.count_action) end count_action from cte a
left join
cte b
ON a.user_id = b.user_id and
DATEADD(month,1,CONVERT(date,a.form+'-01')) = CONVERT(date,b.form+'-01')
group by a.form ) final where final.count_regs != '0' or final.count_action != '0'

Go back in time with a SQL request with a join or group by

I'm working on an old DB and I have to create new columns based on old ones. This is a log table, each lines is an event. So uid is a User ID and a user can make several validation requests.
Here's my table
event_id uid event
1 1 REQUEST
2 1 VALIDATION
3 2 REQUEST
4 3 REQUEST
5 3 VALIDATION
6 2 VALIDATION
7 1 REQUEST
8 1 VALIDATION
Here's what I'd like to have
event_id uid event last_event_id request_nb
1 1 REQUEST 8 1
2 1 VALIDATION 8 1
3 2 REQUEST 6 1
4 3 REQUEST 5 1
5 3 VALIDATION 5 1
6 2 VALIDATION 6 1
7 1 REQUEST 8 2
8 1 VALIDATION 8 2
I assume that I need a group by on uid, a sum like 1 + SUM(CASE WHEN EVENT = 'VALIDATION' THEN 1 ELSE 0 END) AS request_nb. And a MAX somewhere for last_event_id. But I'm not familiar with these kinds of join.
Here's my sample data set:
CREATE TABLE IF NOT EXISTS `docs` (
`event_id` int(6) unsigned NOT NULL,
`uid` int(3) unsigned NOT NULL,
`event` varchar(200) NOT NULL,
PRIMARY KEY (`event_id`)
) DEFAULT CHARSET=utf8;
INSERT INTO `docs` (`event_id`, `uid`, `event`) VALUES
('1', '1', 'REQUEST'),
('2', '1', 'VALIDATION'),
('3', '2', 'REQUEST'),
('4', '3', 'REQUEST'),
('5', '3', 'VALIDATION'),
('6', '2', 'VALIDATION'),
('7', '1', 'REQUEST'),
('8', '1', 'VALIDATION');
and fiddle of same: http://sqlfiddle.com/#!9/404dcf/1
Does anyone have an idea ?
Thanks!
E.g.:
SELECT a.event_id
, a.uid
, a.event
, a.i
, b.last_event_id
FROM
( SELECT x.*
, CASE WHEN #prev_uid = uid THEN CASE WHEN #prev_event = event THEN #i:=#i+1 ELSE #i:=1 END ELSE #i:=1 END i
, #prev_uid := uid prev_uid
, #prev_event := event prev_event FROM docs x
, (SELECT #prev_uid := null, #prev_event := null, #i:=1) vars
ORDER
BY uid
, event
, event_id
) a
JOIN
( SELECT uid, MAX(event_id) last_event_id FROM docs GROUP BY uid ) b
ON b.uid = a.uid
ORDER
BY a.event_id;
You can use ANSI-standard window functions:
select t.*,
max(event_id) over (partition by uid) as last_event_id,
row_number() over (partition by uid, event order by event_id) as request_nb
from t;

SQL query - credit , debit , balance

DISCLAIMER : I Know this has been asked numerous times, but all I want is an alternative.
The table is as below :
create table
Account
(Name varchar(20),
TType varchar(5),
Amount int);
insert into Account Values
('abc' ,'c', 500),
('abc', 'c', 700),
('abc', 'd', 100),
('abc', 'd', 200),
('ab' ,'c', 300),
('ab', 'c', 700),
('ab', 'd', 200),
('ab', 'd', 200);
Expected result is simple:
Name Balance
------ -----------
ab 600
abc 900
The query that worked is :
select Name, sum(case TType when 'c' then Amount
when 'd' then Amount * -1 end) as balance
from Account a1
group by Name.
All I want is, is there any query sans the 'case' statement (like subquery or self join ) for the same result?
Sure. You can use a second query with a where clause and a union all:
select name
, sum(Amount) balance
from Account a1
where TType when 'c'
group
by Name
union
all
select name
, sum(Amount * -1) balance
from Account a1
where TType when 'd'
group
by Name
Or this, using a join with an inline view:
select name
, sum(Amount * o.mult) balance
from Account a1
join ( select 'c' cd
, 1 mult
from dual
union all
select 'd'
, -1
from dual
) o
on o.cd = a1.TType
group
by Name
To be honest, I would suggest to use case...
Use the ASCII code of the char and try to go from there. It is 100 for 'd' and 99 for 'c'. Untested example:
select Name, sum((ASCII(TType) - 100) * Amount * (-1)) + sum((ASCII(TType) - 99) * Amount * (-1)))) as balance from Account a1 group by Name.
I would not recommend using this method but it is a way of achieving what you want.
select t.Name, sum(t.cr) - sum(t.dr) as balance from (select Name, case TType when 'c' then sum(Amount) else 0 end as cr, case TType when 'd' then sum(Amount) else 0 end as dr from Account group by Name, TType) t group by t.Name;
This will surely help you!!
The following worked for me on Microsoft SQL server. It has the Brought Forward balance as well
WITH tempDebitCredit AS (
Select 0 As Details_ID, null As Creation_Date, null As Reference_ID, 'Brought
Forward' As Transaction_Kind, null As Amount_Debit, null As Amount_Credit,
isNull(Sum(Amount_Debit - Amount_Credit), 0) 'diff'
From _YourTable_Name
where Account_ID = #Account_ID
And Creation_Date < #Query_Start_Date
Union All
SELECT a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit, a.Amount_Debit - a.Amount_Credit 'diff'
FROM _YourTable_Name a
where Account_ID = #Account_ID
And Creation_Date >= #Query_Start_Date And Creation_Date <= #Query_End_Date
)
SELECT a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit, SUM(b.diff) 'Balance'
FROM tempDebitCredit a, tempDebitCredit b
WHERE b.Details_ID <= a.Details_ID
GROUP BY a.Details_ID, a.Creation_Date, a.Reference_ID, a.Transaction_Kind,
a.Amount_Debit, a.Amount_Credit
Order By a.Details_ID Desc

Multiple queries into one (Report)?

How do I combine multiple queries into one?
For example:
//Successful Sales:
SELECT username, count(*) as TotalSales, sum(point) as Points FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 1 group by username
/Return Sales:
SELECT username, count(*) as Return FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 2 group by username
//Unsuccessful Sales:
SELECT username, count(*) as UnsuccessfulSales FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND (status = 3 OR status = 6) group by username
So the report look something like this:
Also How do I add percentage of return?
Note: Fixed SQL queries
I have tried doing this but couldn't get it to work?
SELECT username, TotalSales, Points, Return
FROM (
SELECT username, count(*) as TotalSales, sum(point) as Points FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 1 group by username
UNION
SELECT count(*) as Return FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 4 group by username
)
..
// Example Data Structure
CREATE TABLE IF NOT EXISTS `sales2` (
`salesid` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`point` int(11) NOT NULL,
`status` int(11) NOT NULL,
PRIMARY KEY (`salesid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `sales2` (`salesid`, `username`, `point`, `status`) VALUES
(1, 'User1', 2, 1),
(2, 'User1', 2, 1),
(3, 'User2', 11, 1),
(4, 'User2', 1, 2),
(5, 'User3', 5, 6);
field status = 1, successful Sales and show point
status 2 - return sales
status 3/6 - unsuccessful sales:
UPDATE:
For your first question, I think this will do what you want (but be warned, this query is dog slow, full of table scans... You should ask for a more experienced stack overflow user to optimize that for you):
SELECT
distinct(outer_sales.username),
(SELECT count(*) as Points FROM sales where status = 1 AND username = outer_sales.username) as TotalSales,
(SELECT sum(point) as Points FROM sales where status = 1 AND username = outer_sales.username) as Points,
(SELECT count(*) FROM sales where status = 2 AND username = outer_sales.username) as Return,
(SELECT count(*) FROM sales where (status = 3 OR status = 6) AND username = outer_sales.username) as UnsuccessfulSales
FROM
sales outer_sales
ORDER BY
outer_sales.username;
And for the second question, if you just want to add a percent sign to the Return column, you can USE the CONCAT function: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat.
Try:
SELECT CONCAT(CAST(COUNT(*) AS CHAR), '%') AS Return ...
If the number (and types) of columns match in the queries, you can use UNION to combine the results of the 3 queries.