Displayed values are not what they should be - mysql

There are two tables: ost_ticket and ost_ticket_action_history.
create table ost_ticket(
ticket_id int not null PRIMARY KEY,
created timestamp,
staff bool,
status varchar(50),
city_id int
);
create table ost_ticket_action_history(
ticket_id int not null,
action_id int not null PRIMARY KEY,
action_name varchar(50),
started timestamp,
FOREIGN KEY(ticket_id) REFERENCES ost_ticket(ticket_id)
);
In the ost_ticket_action_history table the data is:
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (1, 1, 'Consultation', '2022-01-06 18:30:29');
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (2, 2, 'Bank Application', '2022-02-06 18:30:45');
INSERT INTO newdb.ost_ticket_action_history (ticket_id, action_id, action_name, started) VALUES (3, 3, 'Consultation', '2022-05-06 18:42:48');
In the ost_ticket table the data is:
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (1, '2022-04-04 18:26:41', 1, 'open', 2);
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (2, '2022-05-05 18:30:48', 0, 'open', 3);
INSERT INTO newdb.ost_ticket (ticket_id, created, staff, status, city_id) VALUES (3, '2022-04-06 18:42:53', 1, 'open', 4);
My task is to get the conversion from the “Consultation” stage to the “Bank Application” stage, broken down by month (based on the start date of the “Bank Application” stage). Conversion is calculated with the following formula: (number of applications with the “Bank Application” stage / number of applications with the “Consultation” stage) * 100%.
My query is:
select SUM(action_name='Bank Application')/SUM(action_name='Consultation') * 2 as 'Conversion'
from ost_ticket_action_history
JOIN ost_ticket ot on ot.ticket_id = ost_ticket_action_history.ticket_id
where status = 'open' and created > '2020-01-01 00:00:00'
group by action_name, started
having action_name = 'Bank Application';
As a result I get:
Another query:
SELECT
SUM(CASE
WHEN b.ticket_id IS NOT NULL THEN 1
ELSE 0
END) / COUNT(*) conversion,
YEAR(a.started) AS 'year',
MONTH(a.started) AS 'month'
FROM
ost_ticket_action_history a
LEFT JOIN
ost_ticket_action_history b ON a.ticket_id = b.ticket_id
AND b.action_name = 'Bank Application'
WHERE
a.action_name = 'Consultation'
AND a.status = 'open'
AND a.created > '2020-01-01 00:00:00'
GROUP BY YEAR(a.started) , MONTH(a.started)
I apologize if I didn't write very clearly. Please explain what to do.

As I explained in my comment, your HAVING clause excludes rows.
Below I show how to debug this.
First, check the raw result of the SELECT query.
As you can see, when you replace the calculation with SELECT * and keep everything else, you get only one row, the Bank Application row, because the HAVING clause excludes all other rows:
SELECT
*
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY
action_name, started
HAVING
action_name = 'Bank Application';
Output:
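ticket_id | action_id | action_name | started | ticket_id | created | staff | status | city_id
--------: | --------: | :--------------- | :------------------ | --------: | :------------------ | ----: | :----- | ------:
2 | 2 | Bank Application | 2022-02-06 18:30:45 | 2 | 2022-05-05 18:30:48 | 0 | open | 3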
Second, look at the two sums on their own, with the division commented out.
As you can see, you divide by 0, which, as you learned in school, is not allowed; MySQL returns NULL for a division by zero, which is why your result is NULL.
SELECT
SUM(action_name = 'Bank Application')
#/
,SUM(action_name = 'Consultation') * 2 AS 'Conversion'
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY action_name , started
HAVING action_name = 'Bank Application';
SUM(action_name = 'Bank Application') | Conversion
------------------------------------: | ---------:
1 | 0
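To reproduce steps one and two without a fiddle, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption: SQLite, like MySQL, evaluates a comparison to 0/1 inside SUM() and returns NULL for a division by zero, so this particular query behaves the same. The foreign key is omitted for brevity. It lists every (action_name, started) group with its two counts, then runs the original calculation.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ost_ticket (
    ticket_id INT NOT NULL PRIMARY KEY, created TIMESTAMP, staff BOOL,
    status VARCHAR(50), city_id INT);
CREATE TABLE ost_ticket_action_history (
    ticket_id INT NOT NULL, action_id INT NOT NULL PRIMARY KEY,
    action_name VARCHAR(50), started TIMESTAMP);
INSERT INTO ost_ticket VALUES
    (1, '2022-04-04 18:26:41', 1, 'open', 2),
    (2, '2022-05-05 18:30:48', 0, 'open', 3),
    (3, '2022-04-06 18:42:53', 1, 'open', 4);
INSERT INTO ost_ticket_action_history VALUES
    (1, 1, 'Consultation', '2022-01-06 18:30:29'),
    (2, 2, 'Bank Application', '2022-02-06 18:30:45'),
    (3, 3, 'Consultation', '2022-05-06 18:42:48');
""")

FROM_WHERE = """
FROM ost_ticket_action_history h
JOIN ost_ticket ot ON ot.ticket_id = h.ticket_id
WHERE ot.status = 'open' AND ot.created > '2020-01-01 00:00:00'
"""

# Every (action_name, started) group with both counts, before HAVING filters anything.
groups = conn.execute(
    "SELECT action_name, SUM(action_name = 'Bank Application'),"
    " SUM(action_name = 'Consultation')" + FROM_WHERE +
    "GROUP BY action_name, started ORDER BY started").fetchall()

# The original calculation: the only group kept by HAVING computes 1 / 0.
original = conn.execute(
    "SELECT SUM(action_name = 'Bank Application')"
    " / SUM(action_name = 'Consultation') * 2" + FROM_WHERE +
    "GROUP BY action_name, started HAVING action_name = 'Bank Application'"
).fetchall()

print(groups)    # three groups, each containing only one kind of action
print(original)  # one row whose value is NULL (None in Python)
```

Because each group holds a single action type, no group can ever have both a non-zero numerator and a non-zero denominator.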
Third, you can exclude the division by 0 by keeping only groups whose Consultation count is greater than 0. I didn't filter out the other rows here, since this is only for emphasis:
SELECT
SUM(action_name = 'Bank Application')
/
SUM(action_name = 'Consultation') * 2 AS 'Conversion'
FROM
ost_ticket_action_history
JOIN
ost_ticket ot ON ot.ticket_id = ost_ticket_action_history.ticket_id
WHERE
status = 'open'
AND created > '2020-01-01 00:00:00'
GROUP BY action_name , started
HAVING SUM(action_name = 'Consultation') > 0;
| Conversion |
| ---------: |
| 0.0000 |
| 0.0000 |
Final words: if you get a strange result, go back, remove everything that doesn't matter, and look at all the intermediate values so that you can check your math.
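One way to act on this debugging is to count both stages over all matching rows instead of per (action_name, started) group. The sketch below again uses Python's sqlite3 as a stand-in for MySQL (an assumption). It drops GROUP BY/HAVING, multiplies by 100 as in the formula stated in the question, and wraps the denominator in NULLIF so an empty Consultation count yields NULL rather than a division by zero. The monthly variant is a hypothetical definition that counts each stage in the month it started; in MySQL the month key would be DATE_FORMAT(started, '%Y-%m') instead of strftime.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ost_ticket (
    ticket_id INT NOT NULL PRIMARY KEY, created TIMESTAMP, staff BOOL,
    status VARCHAR(50), city_id INT);
CREATE TABLE ost_ticket_action_history (
    ticket_id INT NOT NULL, action_id INT NOT NULL PRIMARY KEY,
    action_name VARCHAR(50), started TIMESTAMP);
INSERT INTO ost_ticket VALUES
    (1, '2022-04-04 18:26:41', 1, 'open', 2),
    (2, '2022-05-05 18:30:48', 0, 'open', 3),
    (3, '2022-04-06 18:42:53', 1, 'open', 4);
INSERT INTO ost_ticket_action_history VALUES
    (1, 1, 'Consultation', '2022-01-06 18:30:29'),
    (2, 2, 'Bank Application', '2022-02-06 18:30:45'),
    (3, 3, 'Consultation', '2022-05-06 18:42:48');
""")

FROM_WHERE = """
FROM ost_ticket_action_history h
JOIN ost_ticket ot ON ot.ticket_id = h.ticket_id
WHERE ot.status = 'open' AND ot.created > '2020-01-01 00:00:00'
"""

# Overall conversion: no GROUP BY/HAVING; NULLIF turns a zero denominator into NULL.
overall = conn.execute(
    "SELECT SUM(action_name = 'Bank Application') * 100.0"
    " / NULLIF(SUM(action_name = 'Consultation'), 0)" + FROM_WHERE).fetchall()

# Hypothetical monthly split: each stage is counted in the month it started.
monthly = conn.execute(
    "SELECT strftime('%Y-%m', started) AS month,"
    " SUM(action_name = 'Bank Application') * 100.0"
    " / NULLIF(SUM(action_name = 'Consultation'), 0)" + FROM_WHERE +
    "GROUP BY month ORDER BY month").fetchall()

print(overall)  # 1 Bank Application / 2 Consultations * 100
print(monthly)
```

The monthly result shows why the month definition needs care: with these three rows, the only month containing a Bank Application has no Consultation in it, so its ratio is NULL.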

Related

Is it possible to fetch needed data in one query?

I have a database containing tickets. Each ticket has a unique number, but the number is not unique in the table: ticket #1000, for example, can appear several times with different values in other columns (which I have removed for this example).
create table countries
(
isoalpha varchar(2),
pole varchar(50)
);
insert into countries values ('DE', 'EMEA'),('FR', 'EMEA'),('IT', 'EMEA'),('US','USCAN'),('CA', 'USCAN');
create table tickets
(
id int primary key auto_increment,
number int,
isoalpha varchar(2),
created datetime
);
insert into tickets (number, isoalpha, created) values
(1000, 'DE', '2021-01-01 00:00:00'),
(1001, 'US', '2021-01-01 00:00:00'),
(1002, 'FR', '2021-01-01 00:00:00'),
(1003, 'CA', '2021-01-01 00:00:00'),
(1000, 'DE', '2021-01-01 00:00:00'),
(1000, 'DE', '2021-01-01 00:00:00'),
(1004, 'DE', '2021-01-02 00:00:00'),
(1001, 'US', '2021-01-01 00:00:00'),
(1002, 'FR', '2021-01-01 00:00:00'),
(1005, 'IT', '2021-01-02 00:00:00'),
(1006, 'US', '2021-01-02 00:00:00'),
(1007, 'DE', '2021-01-02 00:00:00');
Here is an example:
http://sqlfiddle.com/#!9/3f4ba4/6
What I need as output is the number of newly created tickets for each day, split into tickets from USCAN and the rest of the world.
So for this example, the output should be:
Date | USCAN | Other
'2021-01-01' | 2 | 2
'2021-01-02' | 1 | 3
At the moment I use these two queries to fetch all new tickets and then count the rows with the same date in my application code:
SELECT MIN(ti.created) AS date
FROM tickets ti
LEFT JOIN countries ct ON (ct.isoalpha = ti.isoalpha)
WHERE ct.pole = 'USCAN'
GROUP BY ti.number
ORDER BY date;
SELECT MIN(ti.created) AS date
FROM tickets ti
LEFT JOIN countries ct ON (ct.isoalpha = ti.isoalpha)
WHERE ct.pole <> 'USCAN'
GROUP BY ti.number
ORDER BY date;
But that doesn't look like a very clean method. How can I improve the query to get the needed data with less overhead?
Ideally, it should work with MySQL 5.7.
You may logically combine the queries using conditional aggregation:
SELECT
MIN(CASE WHEN ct.pole = 'USCAN' THEN ti.created END) AS date_uscan,
MIN(CASE WHEN ct.pole <> 'USCAN' THEN ti.created END) AS date_other
FROM tickets ti
LEFT JOIN countries ct ON ct.isoalpha = ti.isoalpha
GROUP BY ti.number
ORDER BY COALESCE(date_uscan, date_other);
You can create unique entries for each date/country then use that value to count USCAN and non-USCAN
SELECT created,
SUM(1) as total,
SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) as uscan,
SUM(CASE WHEN pole != 'USCAN' THEN 1 ELSE 0 END) as nonuscan
FROM (
SELECT created, t.isoalpha, MIN(pole) AS pole
FROM tickets t JOIN countries c ON t.isoalpha = c.isoalpha
GROUP BY created,isoalpha
) AS uniqueTickets
GROUP BY created
Results:
created total uscan nonuscan
2021-01-01T00:00:00Z 4 2 2
2021-01-02T00:00:00Z 3 1 2
http://sqlfiddle.com/#!9/3f4ba4/45/0
Based on SQL Hacks' answer, I found the right solution:
SELECT SUBSTR(created, 1, 10) AS created_day,
SUM(1) as total,
SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) as uscan,
SUM(CASE WHEN pole != 'USCAN' THEN 1 ELSE 0 END) as nonuscan
FROM (
SELECT MIN(t.created) AS created, MIN(c.pole) AS pole
FROM tickets t JOIN countries c ON t.isoalpha = c.isoalpha
GROUP BY t.number
) AS uniqueTickets
GROUP BY SUBSTR(created, 1, 10)
ORDER BY created_day;
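To check this approach against the sample data, here is a sketch that runs it through Python's built-in sqlite3 module as a stand-in for MySQL 5.7 (an assumption; SUBSTR, MIN and CASE behave the same way for this data). The inner query collapses each ticket number to its first creation time and its pole; the outer query counts those tickets per day.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL 5.7 (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (isoalpha VARCHAR(2), pole VARCHAR(50));
INSERT INTO countries VALUES
    ('DE', 'EMEA'), ('FR', 'EMEA'), ('IT', 'EMEA'), ('US', 'USCAN'), ('CA', 'USCAN');
CREATE TABLE tickets (id INTEGER PRIMARY KEY AUTOINCREMENT, number INT,
                      isoalpha VARCHAR(2), created DATETIME);
INSERT INTO tickets (number, isoalpha, created) VALUES
    (1000, 'DE', '2021-01-01 00:00:00'), (1001, 'US', '2021-01-01 00:00:00'),
    (1002, 'FR', '2021-01-01 00:00:00'), (1003, 'CA', '2021-01-01 00:00:00'),
    (1000, 'DE', '2021-01-01 00:00:00'), (1000, 'DE', '2021-01-01 00:00:00'),
    (1004, 'DE', '2021-01-02 00:00:00'), (1001, 'US', '2021-01-01 00:00:00'),
    (1002, 'FR', '2021-01-01 00:00:00'), (1005, 'IT', '2021-01-02 00:00:00'),
    (1006, 'US', '2021-01-02 00:00:00'), (1007, 'DE', '2021-01-02 00:00:00');
""")

QUERY = """
SELECT SUBSTR(created, 1, 10) AS created_day,
       SUM(1) AS total,
       SUM(CASE WHEN pole = 'USCAN' THEN 1 ELSE 0 END) AS uscan,
       SUM(CASE WHEN pole != 'USCAN' THEN 1 ELSE 0 END) AS nonuscan
FROM (
    -- one row per ticket number: its first creation time and its pole
    SELECT MIN(t.created) AS created, MIN(c.pole) AS pole
    FROM tickets t JOIN countries c ON t.isoalpha = c.isoalpha
    GROUP BY t.number
) AS uniqueTickets
GROUP BY SUBSTR(created, 1, 10)
ORDER BY created_day
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # (day, total, uscan, nonuscan) per day
```

Grouping the inner query by ticket number, rather than by date and country, is what keeps two different tickets from the same country on the same day (1004 and 1007 here) from being merged.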

SQL multi query

I need some help doing this correctly in one query (if it is possible).
(This is a theoretical example, and I assume event_name contains events such as registration, action, etc.)
I have 3 columns:
-user_id
-event_timestamp
-event_name
From these 3 columns I need to create a new table with 4 columns:
-year and month of user registration
-number of new user registrations in that month
-number of users who returned in the second calendar month after registration
-return probability
The result should look like this:
2019-1 | 1 | 1 | 100%
2019-2 | 3 | 2 | 67%
2019-3 | 2 | 0 | 0%
What I've done so far:
I'm using this toy example of my possible main table:
CREATE TABLE `main` (
`event_timestamp` timestamp,
`user_id` int(10),
`event_name` char(12)
) DEFAULT CHARSET=utf8;
INSERT INTO `main` (`event_timestamp`, `user_id`, `event_name`) VALUES
('2019-01-23 20:02:21.550', '1', 'registration'),
('2019-01-24 20:03:21.550', '2', 'action'),
('2019-02-21 20:04:21.550', '3', 'registration'),
('2019-02-22 20:05:21.550', '4', 'registration'),
('2019-02-23 20:06:21.550', '5', 'registration'),
('2019-02-23 20:06:21.550', '1', 'action'),
('2019-02-24 20:07:21.550', '6', 'action'),
('2019-03-20 20:08:21.550', '3', 'action'),
('2019-03-21 20:09:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '9', 'action'),
('2019-03-23 20:11:21.550', '10', 'registration'),
('2019-03-22 20:10:21.550', '4', 'action'),
('2019-03-22 20:10:21.550', '5', 'action'),
('2019-03-24 20:11:21.550', '11', 'registration');
I'm trying to test some queries to create 4 new columns:
For column #1, I select the year and month from the timestamp where the event is registration (I think), but I need to aggregate it per month (like 2019-11, 2019-12):
SELECT DATE_FORMAT(event_timestamp, '%Y-%m') AS column_1 FROM main
WHERE event_name='registration';
For column #2, I need to count the users with event_name registration in each month; alternatively, I could look for each user_id's first activity, but I don't know how to do this.
Here are some thoughts about it:
SELECT COUNT(DISTINCT user_id) AS user_count
FROM main
GROUP BY MONTH(event_timestamp);
SELECT COUNT(DISTINCT user_id) AS user_count FROM main
WHERE event_name='registration';
For column #3, I need to match each user_id's registration event with any event of that user in the following month, so that we get the users who returned in the next month.
Any idea how to create this query?
This is how I would calculate column #4:
SELECT *,
ROUND ((column_3/column_2)*100) AS column_4
FROM main;
I hope you will find the following answer helpful.
The first column extracts the year and month. The new_users column is the COUNT of unique user ids where the event is 'registration'; DISTINCT is needed because the JOIN can duplicate a user who took several actions in the following month. The returned_users column is the number of users who have an action in the month after their registration; it also needs DISTINCT, since a user can have several actions during one month. The final column is the probability you asked for, computed from the two previous columns.
The JOIN clause is a self-join to bring the users that had at least one action the next month of their registration.
SELECT CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp)) AS reg_month,
COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END)) AS new_users,
COUNT(DISTINCT B.user_id) AS returned_users,
CASE WHEN COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))=0 THEN 0 ELSE COUNT(DISTINCT B.user_id)/COUNT(DISTINCT(CASE WHEN A.event_name LIKE 'registration' THEN A.user_id END))*100 END AS My_Ratio
FROM main AS A
LEFT JOIN main AS B
ON A.user_id=B.user_id
-- PERIOD_DIFF counts calendar months, so a December registration also matches a January action
AND PERIOD_DIFF(EXTRACT(YEAR_MONTH FROM B.event_timestamp), EXTRACT(YEAR_MONTH FROM A.event_timestamp)) = 1
AND A.event_name='registration' AND B.event_name='action'
GROUP BY CONCAT(YEAR(A.event_timestamp),'-',MONTH(A.event_timestamp));
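Matching a registration with an action in the following calendar month needs a year-aware comparison, otherwise a December registration never matches a January return. Below is a minimal sketch of the same self-join idea, run through Python's sqlite3 as a stand-in for MySQL (an assumption). It converts each timestamp to a running month number (year * 12 + month), starts from registration rows only, and left-joins actions whose month number is exactly one higher. User 12 and its two rows are hypothetical, added only to exercise the year boundary.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (event_timestamp TIMESTAMP, user_id INT, event_name CHAR(12))")
conn.executemany("INSERT INTO main VALUES (?, ?, ?)", [
    ("2019-01-23 20:02:21.550", 1, "registration"), ("2019-01-24 20:03:21.550", 2, "action"),
    ("2019-02-21 20:04:21.550", 3, "registration"), ("2019-02-22 20:05:21.550", 4, "registration"),
    ("2019-02-23 20:06:21.550", 5, "registration"), ("2019-02-23 20:06:21.550", 1, "action"),
    ("2019-02-24 20:07:21.550", 6, "action"), ("2019-03-20 20:08:21.550", 3, "action"),
    ("2019-03-21 20:09:21.550", 4, "action"), ("2019-03-22 20:10:21.550", 9, "action"),
    ("2019-03-23 20:11:21.550", 10, "registration"), ("2019-03-22 20:10:21.550", 4, "action"),
    ("2019-03-22 20:10:21.550", 5, "action"), ("2019-03-24 20:11:21.550", 11, "registration"),
    # hypothetical extra pair that crosses a year boundary
    ("2019-12-30 10:00:00.000", 12, "registration"), ("2020-01-05 10:00:00.000", 12, "action"),
])

# Running month number: 2019-12 and 2020-01 differ by exactly 1.
MONTH_NO = ("CAST(strftime('%Y', event_timestamp) AS INTEGER) * 12"
            " + CAST(strftime('%m', event_timestamp) AS INTEGER)")

QUERY = f"""
SELECT r.reg_month,
       COUNT(DISTINCT r.user_id) AS new_users,
       COUNT(DISTINCT a.user_id) AS returned_users,
       ROUND(100.0 * COUNT(DISTINCT a.user_id) / COUNT(DISTINCT r.user_id)) AS return_pct
FROM (SELECT user_id, strftime('%Y-%m', event_timestamp) AS reg_month,
             {MONTH_NO} AS month_no
      FROM main WHERE event_name = 'registration') AS r
LEFT JOIN (SELECT user_id, {MONTH_NO} AS month_no
           FROM main WHERE event_name = 'action') AS a
  ON a.user_id = r.user_id AND a.month_no = r.month_no + 1
GROUP BY r.reg_month
ORDER BY r.reg_month
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Starting from registration rows only means months without registrations never appear, so the denominator is never zero.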
What we will do is use window functions and aggregation: a window function gets the earliest registration date, and then conditional aggregation does the rest (window functions require MySQL 8.0 or later).
One challenge is the handling of calendar months. To handle this, we will truncate the dates to the beginning of the month to facilitate the date arithmetic:
select yyyymm_reg, count(*) as regs_in_month,
sum( month_2 > 0 ) as visits_2months,
avg( month_2 > 0 ) as return_rate_2months
from (select m.user_id, m.yyyymm_reg,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 1) ) as month_1,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 2) ) as month_2,
max( (timestampdiff(month, m.yyyymm_reg, m.yyyymm) = 3) ) as month_3
from (select m.*,
cast(concat(extract(year_month from event_timestamp), '01') as date) as yyyymm,
cast(concat(extract(year_month from min(case when event_name = 'registration' then event_timestamp end) over (partition by user_id)), '01') as date) as yyyymm_reg
from main m
) m
where m.yyyymm_reg is not null
group by m.user_id, m.yyyymm_reg
) u
group by u.yyyymm_reg;
Here you go, done in T-SQL:
;WITH cte AS (
    SELECT form, user_id,
           SUM(count_regs) AS count_regs,
           SUM(count_action) AS count_action
    FROM (
        SELECT FORMAT(event_timestamp, 'yyyy-MM') AS form, user_id,
               CASE WHEN event_name = 'registration' THEN 1 ELSE 0 END AS count_regs,
               CASE WHEN event_name = 'action' THEN 1 ELSE 0 END AS count_action
        FROM main
    ) a
    GROUP BY form, user_id
)
SELECT final.form, final.count_regs, final.count_action,
       CAST(final.count_action AS float)
         / (CASE WHEN final.count_regs = 0 THEN 1 ELSE final.count_regs END) * 100 AS probability
FROM (
    SELECT a.form,
           SUM(a.count_regs) AS count_regs,
           ISNULL(SUM(b.count_action), 0) AS count_action
    FROM cte a
    LEFT JOIN cte b
      ON a.user_id = b.user_id
     AND DATEADD(month, 1, CONVERT(date, a.form + '-01')) = CONVERT(date, b.form + '-01')
    GROUP BY a.form
) final
WHERE final.count_regs != 0 OR final.count_action != 0;

MySQL aggregate data IN, OUT times

I have a table like this:
ID | UID | ACTION | URL | TIMESTAMP
where:
ID - primary key
UID - user id
ACTION - IN or OUT
URL - action URL
TIMESTAMP - action TIMESTAMP
How can I aggregate all the data with one query?
I mean: as output I would like a table with UID, URL, TOTAL_TIME, where TOTAL_TIME is the sum of all times between IN and OUT for a given URL.
I tried some custom functions, but without luck.
Example Input (timestamp simplified to show what I mean):
1|13|IN|http://www.gógle.koń|1
2|13|OUT|http://www.gógle.koń|5
...
13454|13|IN|http://www.gógle.koń|550
...
13465|13|OUT|http://www.gógle.koń|600
...
243252|13|IN|http://www.pr0nstaff.meh/tiny_leg_finger|1200
...
245431|13|OUT|http://www.pr0nstaff.meh/tiny_leg_finger|2200
Please note that there may be (and surely will be) cases where the IN/OUT pair of one URL is interrupted by an IN, an OUT, or a whole IN/OUT pair of another URL, so we cannot simply count from IN to OUT without checking that the site matches.
Output for the example input (for UID = 13) should be:
13|www.gógle.koń|14
13|http://www.pr0nstaff.meh/tiny_leg_finger|1000
Try this, but I'm not sure it works if IN/OUT rows don't always come in pairs, so please check.
CREATE TABLE test1 (
id INT NOT NULL,
uid INT NOT NULL,
action VARCHAR(3),
url varchar(100),
timestamp1 TIMESTAMP
);
INSERT INTO test1 VALUES
( 1 , 13 , 'IN', 'www.go.com', '2015-01-07 08:00:00'),
( 2 , 13 , 'OUT', 'www.go.com', '2015-01-07 09:00:00'),
( 3 , 14 , 'IN', 'www.go2.com', '2015-01-07 08:30:00'),
( 4 , 14 , 'OUT', 'www.go2.com', '2015-01-07 09:00:00'),
( 5 , 15 , 'IN', 'www.go3.com', '2015-01-07 09:00:00'),
( 6 , 16 , 'OUT', 'www.go3.com', '2015-01-07 09:00:00');
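SELECT i.uid, i.url, SUM(TIMESTAMPDIFF(MINUTE, i.timestamp1, o.timestamp1)) AS diff_minutes
FROM (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'IN') i
JOIN (SELECT id,uid,url,timestamp1
FROM test1
WHERE action = 'OUT') o
ON i.uid = o.uid
AND i.url = o.url
-- pair each IN with the first OUT that follows it for the same uid and url
AND o.id = (SELECT MIN(o2.id) FROM test1 o2
WHERE o2.action = 'OUT' AND o2.uid = i.uid AND o2.url = i.url AND o2.id > i.id)
GROUP BY i.uid,i.url
ORDER BY i.uid,i.url;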
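A join that matches every later OUT (i.id < o.id) over-counts as soon as a URL is visited more than once, because the first IN also pairs with the second visit's OUT. Below is a small sketch comparing that with pairing each IN to the first following OUT, run through Python's sqlite3 as a stand-in for MySQL (an assumption). The URLs a.example/b.example and the integer column ts are hypothetical; integer timestamps stand in for TIMESTAMPDIFF, mirroring the simplified timestamps in the question.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (id INT, uid INT, action TEXT, url TEXT, ts INT)")
conn.executemany("INSERT INTO test1 VALUES (?, ?, ?, ?, ?)", [
    (1, 13, "IN", "a.example", 1),
    (2, 13, "IN", "b.example", 3),     # b is opened while a is still open
    (3, 13, "OUT", "a.example", 5),
    (4, 13, "OUT", "b.example", 10),
    (5, 13, "IN", "a.example", 550),   # second visit to a
    (6, 13, "OUT", "a.example", 600),
])

# Pair every IN with the first OUT that follows it for the same uid and url.
PAIRED = """
SELECT i.uid, i.url, SUM(o.ts - i.ts) AS total_time
FROM test1 i
JOIN test1 o
  ON o.uid = i.uid AND o.url = i.url
 AND o.id = (SELECT MIN(o2.id) FROM test1 o2
             WHERE o2.action = 'OUT' AND o2.uid = i.uid
               AND o2.url = i.url AND o2.id > i.id)
WHERE i.action = 'IN'
GROUP BY i.uid, i.url
ORDER BY i.uid, i.url
"""

# For comparison: matching every later OUT counts the first IN twice for a.example.
NAIVE = """
SELECT i.uid, i.url, SUM(o.ts - i.ts) AS total_time
FROM test1 i
JOIN test1 o
  ON o.uid = i.uid AND o.url = i.url AND o.action = 'OUT' AND i.id < o.id
WHERE i.action = 'IN'
GROUP BY i.uid, i.url
ORDER BY i.uid, i.url
"""

paired = conn.execute(PAIRED).fetchall()
naive = conn.execute(NAIVE).fetchall()
print(paired)
print(naive)
```

In this sketch an IN without any later OUT is dropped by the inner join, and two consecutive INs for the same URL would both pair with the same OUT, so unpaired rows still need checking, as the answer above warns.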
Try this:
SELECT UID, URL, TIMESTAMPDIFF(HOUR, InTime, OutTime) AS TOTAL_TIME
FROM (SELECT UID, URL,
MAX(CASE WHEN ACTION = 'IN' THEN TIMESTAMP ELSE NULL END) InTime,
MAX(CASE WHEN ACTION = 'OUT' THEN TIMESTAMP ELSE NULL END) OutTime
FROM tableA
GROUP BY UID, URL
) AS A;

SQL: find all items on left side where right side items all have specific field value

There is a table job that contains data as shown below:
Id Status
-----------
1 NEW
2 NEW
There is a table item that contains data as shown below:
Id Status JobId
---------------------
1 NEW 1
2 PROCESSED 1
3 NEW 1
4 PROCESSED 2
5 PROCESSED 2
I want to run a query that will return all jobs whose "children" all have a status of X.
Pseudo-SQL:
SELECT * FROM Job WHERE status = 'NEW' AND Items for Job WHERE all items status = PROCESSED
That should return
Id Status
-----------
2 NEW
Because all of Job 2 items have status = PROCESSED.
Job 1 does not appear because it has items with the unwanted status NEW
SELECT * from job where Id not in (SELECT JobId from item where Status <> 'PROCESSED');
This returns every job whose Id is not among the JobIds of items with a status other than 'PROCESSED'.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE job
(`Id` int, `Status` varchar(3))
;
INSERT INTO job
(`Id`, `Status`)
VALUES
(1, 'NEW'),
(2, 'NEW')
;
CREATE TABLE item
(`Id` int, `Status` varchar(9), `JobId` int)
;
INSERT INTO item
(`Id`, `Status`, `JobId`)
VALUES
(1, 'NEW', 1),
(2, 'PROCESSED', 1),
(3, 'NEW', 1),
(4, 'PROCESSED', 2),
(5, 'PROCESSED', 2)
;
Query 1:
SELECT *
FROM job
WHERE NOT EXISTS
(SELECT 1
FROM item
WHERE job.Id = item.JobId AND item.Status <> 'PROCESSED')
Results:
| ID | STATUS |
|----|--------|
| 2 | NEW |
SELECT j.* FROM Job j
WHERE not exists (select 1 from item i where i.JobId = j.id and i.Status != 'PROCESSED')
and exists (select 1 from item i where i.JobId = j.id and i.Status = 'PROCESSED')
and j.status = 'NEW';
Or
SELECT j.* FROM Job j
WHERE j.id in
(select jobId from (
select jobId, count(distinct status) n_all,
count(distinct case when status = 'PROCESSED'
then status else null
end) n_processed
from item group by jobId
) t
where n_all = n_processed
)
and j.status = 'NEW';
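The difference between the plain NOT EXISTS query and the version with the extra EXISTS check only shows up for a job that has no items at all. Below is a sketch run through Python's sqlite3 as a stand-in for MySQL (an assumption), with a hypothetical job 3 that has no items added to the sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job (Id INT, Status VARCHAR(3));
-- job 3 is hypothetical and has no items
INSERT INTO job VALUES (1, 'NEW'), (2, 'NEW'), (3, 'NEW');
CREATE TABLE item (Id INT, Status VARCHAR(9), JobId INT);
INSERT INTO item VALUES
    (1, 'NEW', 1), (2, 'PROCESSED', 1), (3, 'NEW', 1),
    (4, 'PROCESSED', 2), (5, 'PROCESSED', 2);
""")

# NOT EXISTS alone: a job without items has no unprocessed item, so it qualifies.
not_exists_only = conn.execute("""
SELECT * FROM job
WHERE NOT EXISTS (SELECT 1 FROM item
                  WHERE job.Id = item.JobId AND item.Status <> 'PROCESSED')
ORDER BY Id""").fetchall()

# The extra EXISTS requires at least one processed item.
guarded = conn.execute("""
SELECT j.* FROM job j
WHERE NOT EXISTS (SELECT 1 FROM item i WHERE i.JobId = j.Id AND i.Status != 'PROCESSED')
  AND EXISTS (SELECT 1 FROM item i WHERE i.JobId = j.Id AND i.Status = 'PROCESSED')
  AND j.Status = 'NEW'
ORDER BY j.Id""").fetchall()

print(not_exists_only)
print(guarded)
```

Which behaviour is right depends on whether a job with no children should count as "all children processed".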

Multiple queries into one (Report)?

How do I combine multiple queries into one?
For example:
//Successful Sales:
SELECT username, count(*) as TotalSales, sum(point) as Points FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 1 group by username
//Return Sales:
SELECT username, count(*) as Return FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 2 group by username
//Unsuccessful Sales:
SELECT username, count(*) as UnsuccessfulSales FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND (status = 3 OR status = 6) group by username
So the report looks something like this:
Also, how do I add the return percentage?
Note: Fixed SQL queries
I have tried this, but couldn't get it to work:
SELECT username, TotalSales, Points, Return
FROM (
SELECT username, count(*) as TotalSales, sum(point) as Points FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 1 group by username
UNION
SELECT count(*) as Return FROM sales where submit_date >= 1301612400 AND submit_date <= 1304204400 AND status = 4 group by username
)
..
// Example Data Structure
CREATE TABLE IF NOT EXISTS `sales2` (
`salesid` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`point` int(11) NOT NULL,
`status` int(11) NOT NULL,
PRIMARY KEY (`salesid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;
INSERT INTO `sales2` (`salesid`, `username`, `point`, `status`) VALUES
(1, 'User1', 2, 1),
(2, 'User1', 2, 1),
(3, 'User2', 11, 1),
(4, 'User2', 1, 2),
(5, 'User3', 5, 6);
status = 1: successful sales (also show points)
status = 2: returned sales
status = 3 or 6: unsuccessful sales
UPDATE:
For your first question, I think this will do what you want (but be warned: this query is very slow and full of table scans; you may want to ask a more experienced Stack Overflow user to optimize it for you):
SELECT
DISTINCT outer_sales.username,
(SELECT count(*) FROM sales where status = 1 AND username = outer_sales.username) as TotalSales,
(SELECT sum(point) FROM sales where status = 1 AND username = outer_sales.username) as Points,
(SELECT count(*) FROM sales where status = 2 AND username = outer_sales.username) as `Return`,
(SELECT count(*) FROM sales where (status = 3 OR status = 6) AND username = outer_sales.username) as UnsuccessfulSales
FROM
sales outer_sales
ORDER BY
outer_sales.username;
And for the second question, if you just want to add a percent sign to the Return column, you can use the CONCAT function: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat.
Try:
SELECT CONCAT(CAST(COUNT(*) AS CHAR), '%') AS `Return` ...
If the number (and types) of columns match in the queries, you can use UNION to combine the results of the 3 queries.
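As an alternative to one correlated subquery per column, conditional aggregation builds the whole report in a single pass over the table. Below is a sketch using the sales2 sample data, run through Python's sqlite3 as a stand-in for MySQL (an assumption). The submit_date filter is left out because the sample table has no such column; in MySQL it would go in a WHERE clause. The return percentage is a hypothetical definition (returned rows as a share of all of the user's rows), and the column name return_sales avoids RETURN, which is a reserved word in MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales2 (salesid INTEGER PRIMARY KEY, username VARCHAR(50),
                     point INT, status INT);
INSERT INTO sales2 VALUES
    (1, 'User1', 2, 1), (2, 'User1', 2, 1), (3, 'User2', 11, 1),
    (4, 'User2', 1, 2), (5, 'User3', 5, 6);
""")

# One pass over the table; each CASE picks out one status group.
REPORT = """
SELECT username,
       SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) AS total_sales,
       SUM(CASE WHEN status = 1 THEN point ELSE 0 END) AS points,
       SUM(CASE WHEN status = 2 THEN 1 ELSE 0 END) AS return_sales,
       SUM(CASE WHEN status IN (3, 6) THEN 1 ELSE 0 END) AS unsuccessful_sales,
       -- assumed definition: returns as a share of all of the user's rows
       ROUND(100.0 * SUM(CASE WHEN status = 2 THEN 1 ELSE 0 END) / COUNT(*), 1)
         AS return_pct
FROM sales2
GROUP BY username
ORDER BY username
"""

report = conn.execute(REPORT).fetchall()
for row in report:
    print(row)
```

In MySQL, CONCAT(return_pct, '%') would add the percent sign for display while keeping the numeric value available for sorting.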