Select quantity of record instances separated by weeks - mysql

I have a table like the below:
CompanyID | Logged | UniqueID
A | 2014-06-24 | 8
B | 2014-06-24 | 7
A | 2014-06-16 | 6
B | 2014-06-16 | 5
A | 2014-06-08 | 4
B | 2014-06-08 | 3
A | 2014-06-01 | 2
B | 2014-06-01 | 1
I'm stuck trying to create an SQL statement that will return the quantity of rows found for each unique CompanyID, separated into 4 one-week periods, so something like the below:
CompanyID | Period (week) | Quantity
A | 0 | 1
B | 0 | 1
A | 1 | 1
B | 1 | 1
A | 2 | 1
B | 2 | 1
A | 3 | 1
B | 3 | 1
I have done something similar before, except by the last 7 days instead of last 4 weeks, but am not sure if this can be reworked:
select CompanyID,
case DATE_FORMAT(Logged, '%Y%m%d')
when '20140618' then '0'
when '20140619' then '1'
when '20140620' then '2'
when '20140621' then '3'
when '20140622' then '4'
when '20140623' then '5'
when '20140624' then '6'
end as period ,
count(UniqueID) as quantity from TABLE
where DATE_FORMAT(Logged, '%Y%m%d')
in (20140618,20140619,20140620,20140621,20140622,20140623,20140624) group by CompanyID,
DATE_FORMAT(Logged, '%Y%m%d')
Is there a more straightforward way to obtain the output desired above?

Maybe something like this?
The original query (shown further down) avoids hard-coded dates, which are generally bad practice. Its Period value starts at 1, and you want it to start at 0. To fix this, wrap the original query in an outer SELECT that subtracts 1 from Period and leaves out the helper column that assigns the user-defined variable:
SELECT CompanyID, Period - 1 as Period, Quantity FROM(
    SELECT
        CompanyID,
        if(@a = Logged, @b, @b := @b + 1) as Period,
        COUNT(*) as Quantity,
        @a := Logged
    FROM test
    JOIN (SELECT @a := '', @b := 0) as temp
    GROUP BY UniqueID
    ORDER BY Period
) as subQuery
ORIGINAL QUERY
SELECT
    CompanyID,
    if(@a = Logged, @b, @b := @b + 1) as Period,
    COUNT(*) as Quantity,
    @a := Logged
FROM test
JOIN (SELECT @a := '', @b := 0) as temp
GROUP BY UniqueID
ORDER BY Period
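For comparison, here is a minimal sketch of a different technique from the variable-based answer above: computing the week bucket arithmetically from the age of each row. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it can be tried without a server. The names weekly_counts and ROWS are hypothetical, and the reference date is passed in as an assumption. In MySQL the bucket expression would presumably be something like FLOOR(DATEDIFF(CURDATE(), Logged) / 7).

```python
import sqlite3

# Sample rows from the question: (CompanyID, Logged, UniqueID)
ROWS = [
    ("A", "2014-06-24", 8), ("B", "2014-06-24", 7),
    ("A", "2014-06-16", 6), ("B", "2014-06-16", 5),
    ("A", "2014-06-08", 4), ("B", "2014-06-08", 3),
    ("A", "2014-06-01", 2), ("B", "2014-06-01", 1),
]


def weekly_counts(rows, today, weeks=4):
    """Count rows per company per 7-day bucket; bucket 0 is the most recent week."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (CompanyID TEXT, Logged TEXT, UniqueID INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT CompanyID,
               -- age in days divided by 7, truncated to a whole week number
               CAST((julianday(:today) - julianday(Logged)) / 7 AS INTEGER) AS Period,
               COUNT(UniqueID) AS Quantity
        FROM test
        WHERE julianday(:today) - julianday(Logged) >= 0
          AND julianday(:today) - julianday(Logged) < 7 * :weeks
        GROUP BY CompanyID, Period
        ORDER BY Period, CompanyID
        """,
        {"today": today, "weeks": weeks},
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in weekly_counts(ROWS, "2014-06-24"):
        print(row)
```

With 2014-06-24 as the reference date, each company gets one row in each of the periods 0 to 3, which matches the output requested in the question.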

Related

mysql group by but only group if second row is the same

I'm wondering what the smartest way is to group my MySQL results. I have the following table structure:
- id
- userId
- status (values from 1-100)
Let's say it has the following content:
1 | 1 | 10
2 | 1 | 10
3 | 1 | 15
4 | 2 | 15
5 | 3 | 10
Now I want to group all results by user, but separately for each status. So the results I'm looking for should be:
1 | 1 | 10
3 | 1 | 15
4 | 2 | 15
5 | 3 | 10
Hope you understand what I'm looking for...
Best
Tassilo
If you need the id, then a GROUP BY query is needed; this will produce the results you showed:
SELECT MIN(id), userId, status
FROM your_table
GROUP BY userId, status
;
If you don't need the id, then grouping is not the best tool; use DISTINCT instead, like so:
SELECT DISTINCT userId, status
FROM your_table
;
The topic of this question says "Group only if next row is the same"; in that case I would do something like this:
create table USER_(id integer, UserId integer, status integer);
insert into USER_ values(1,1,10);
insert into USER_ values(2,1,10);
insert into USER_ values(3,1,15);
insert into USER_ values(4,2,15);
insert into USER_ values(5,3,10);
insert into USER_ values(6,1,10);
select min(a.id) as id, a.userId, a.status, count(*)
from USER_ a join USER_ b
on a.userid = b.userid and a.id = b.id-1
group by a.userId, a.status;
id | userid | status | count
-----+--------+--------+-------
1 | 1 | 10 | 2
If I go by the explanation in the question body instead, I would do something like this:
select min(a.id) as id, a.userId, a.status from USER_ a
group by a.userId,a.status order by a.userid,status;
id | userid | status
----+--------+--------
1 | 1 | 10
3 | 1 | 15
4 | 2 | 15
5 | 3 | 10
Please correct me if I have misunderstood the question.

MySQL top 2 records per group

Basically I need to get only the last 2 records for each user, considering the last created_datetime:
id | user_id | created_datetime
1 | 34 | '2015-09-10'
2 | 34 | '2015-10-11'
3 | 34 | '2015-05-23'
4 | 34 | '2015-09-13'
5 | 159 | '2015-10-01'
6 | 159 | '2015-10-02'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
Returns (expected output):
2 | 34 | '2015-10-11'
4 | 34 | '2015-09-13'
7 | 159 | '2015-10-03'
8 | 159 | '2015-10-06'
I was trying with this idea:
select user_id, created_datetime,
@num := if(@user_id = user_id, @num + 1, 1) as row_number,
@id := user_id as dummy
from logs group by user_id
having row_number <= 2
The idea is to keep only these top 2 rows and remove all the others.
Any ideas?
Your idea is close. I think this will work better:
-- the alias is rn because row_number is a reserved word in MySQL 8.0
select u.*
from (select id, user_id, created_datetime,
             (@num := if(@user_id = user_id, @num + 1,
                         if(@user_id := user_id, 1, 1)
                        )
             ) as rn
      from logs cross join
           (select @user_id := 0, @num := 0) params
      order by user_id, created_datetime desc
     ) u
where rn <= 2 ;
Here are the changes:
The variables are set in only one expression. MySQL does not guarantee the order of evaluation of expressions, so this is important.
The work is done in a subquery, which is then processed in the outer query.
The subquery uses order by, not group by.
The outer query uses where instead of having (actually, in MySQL having would work, but where is more appropriate).
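As a point of comparison with the variable-based query above, here is a minimal sketch of the same "latest N rows per user" idea using the ROW_NUMBER() window function, which MySQL supports from version 8.0. It is run here against in-memory SQLite (3.25 or later) through Python's sqlite3 module only so that it is self-contained. The names latest_per_user and LOGS are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, user_id, created_datetime)
LOGS = [
    (1, 34, "2015-09-10"), (2, 34, "2015-10-11"),
    (3, 34, "2015-05-23"), (4, 34, "2015-09-13"),
    (5, 159, "2015-10-01"), (6, 159, "2015-10-02"),
    (7, 159, "2015-10-03"), (8, 159, "2015-10-06"),
]


def latest_per_user(rows, n=2):
    """Return the n most recent rows for each user, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (id INTEGER, user_id INTEGER, created_datetime TEXT)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, user_id, created_datetime
        FROM (SELECT id, user_id, created_datetime,
                     -- number each user's rows from newest to oldest
                     ROW_NUMBER() OVER (PARTITION BY user_id
                                        ORDER BY created_datetime DESC) AS rn
              FROM logs) AS ranked
        WHERE rn <= ?
        ORDER BY user_id, rn
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_user(LOGS):
        print(row)
```

On the sample rows this keeps ids 2 and 4 for user 34 and ids 8 and 7 for user 159. The window function states the ordering explicitly, so it does not depend on the order in which variable assignments are evaluated.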

SQL: transform rows into columns in MySQL (SELECT statement)

I have two tables, orders and order_comments. Each order can have from 0 to n comments. I would like to get a list of all orders with their comments in a specific order.
Table orders:
order_id | order_nr
1 | 5252
4 | 6783
5 | 6785
Table order_comments
id_order_comments | order_fk | created_at | email | content
1 | 4 | 2015-01-12 | jack | some text here
2 | 5 | 2015-01-13 | marta | some text here
3 | 5 | 2015-01-14 | beata | some text here
4 | 4 | 2015-01-16 | julia | some text here
As a result, I would like to get 1 row for each order. Comments should be shown in separate columns, starting from the oldest comment. So desired output in this case is:
order_id | 1_comment_created_at | 1_comment_author | 1_comment_content | 2_comment_created_at | 2_comment_author | 2_comment_content
1 | NULL | NULL | NULL | NULL | NULL | NULL
4 | 2015-01-12 | jack | some text here | 2015-01-16 | julia | some text here
5 | 2015-01-13 | marta | some text here | 2015-01-14 | beata | some text here
I found this: MySQL - Rows to Columns - but I cannot use 'create view'.
I found this: http://dev.mysql.com/doc/refman/5.5/en/while.html - but I cannot create procedure in this db.
What I got:
SELECT @c := (SELECT count(*) FROM order_comments GROUP BY order_fk ORDER BY count(*) DESC LIMIT 1);
SET @rank=0;
SET @test=0;
SELECT
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.created_at END AS created,
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.author END AS author,
CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.content END AS content
/*But I cannot set @test as +1. And I cannot name column with variable - like CONCAT(@test, '_created')*/
FROM (
SELECT @rank := @rank +1 AS comment_id, created_at, author, content
FROM order_comments
WHERE order_fk = 4
ORDER BY created_at
) AS temp
Problem: I would like to search more than 1 order. I should get orders with no comments too.
What can I do?
You can use variables for this type of pivot, but the query is a bit more complicated, because you need to enumerate the values for each order:
-- the alias is rn because rank is a reserved word in MySQL 8.0
SELECT o.order_id,
       MAX(case when rn = 1 then created_at end) as created_at_1,
       MAX(case when rn = 1 then email end) as email_1,
       MAX(case when rn = 1 then content end) as content_1,
       MAX(case when rn = 2 then created_at end) as created_at_2,
       MAX(case when rn = 2 then email end) as email_2,
       MAX(case when rn = 2 then content end) as content_2
FROM orders o LEFT JOIN
     (SELECT oc.*,
             (@rn := if(@o = order_fk, @rn + 1,
                        if(@o := order_fk, 1, 1)
                       )
             ) as rn
      FROM order_comments oc CROSS JOIN
           (SELECT @rn := 0, @o := 0) vars
      ORDER BY order_fk, created_at
     ) oc
     ON o.order_id = oc.order_fk
GROUP BY o.order_id;
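The pivot above has two parts: number each order's comments from oldest to newest, then fold the numbered rows into columns with MAX(CASE ...). Here is a minimal sketch of that same conditional-aggregation idea, with ROW_NUMBER() swapped in for the variables (a substitution, not the answer's method). It uses in-memory SQLite through Python's sqlite3 module as a stand-in so it can be run directly. The function name pivot_first_two_comments is hypothetical.

```python
import sqlite3


def pivot_first_two_comments():
    """Return one row per order with its two oldest comments spread into columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, order_nr INTEGER);
        CREATE TABLE order_comments (
            id_order_comments INTEGER, order_fk INTEGER,
            created_at TEXT, email TEXT, content TEXT
        );
        INSERT INTO orders VALUES (1, 5252), (4, 6783), (5, 6785);
        INSERT INTO order_comments VALUES
            (1, 4, '2015-01-12', 'jack',  'some text here'),
            (2, 5, '2015-01-13', 'marta', 'some text here'),
            (3, 5, '2015-01-14', 'beata', 'some text here'),
            (4, 4, '2015-01-16', 'julia', 'some text here');
        """
    )
    rows = conn.execute(
        """
        WITH ranked AS (
            -- number each order's comments, oldest first
            SELECT order_fk, created_at, email, content,
                   ROW_NUMBER() OVER (PARTITION BY order_fk
                                      ORDER BY created_at) AS rn
            FROM order_comments
        )
        SELECT o.order_id,
               MAX(CASE WHEN r.rn = 1 THEN r.created_at END) AS created_at_1,
               MAX(CASE WHEN r.rn = 1 THEN r.email END)      AS email_1,
               MAX(CASE WHEN r.rn = 1 THEN r.content END)    AS content_1,
               MAX(CASE WHEN r.rn = 2 THEN r.created_at END) AS created_at_2,
               MAX(CASE WHEN r.rn = 2 THEN r.email END)      AS email_2,
               MAX(CASE WHEN r.rn = 2 THEN r.content END)    AS content_2
        FROM orders o
        LEFT JOIN ranked r ON r.order_fk = o.order_id  -- keeps orders without comments
        GROUP BY o.order_id
        ORDER BY o.order_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in pivot_first_two_comments():
        print(row)
```

Order 1 has no comments, so all of its comment columns come back as NULL (None in Python). As in the answer, the number of column groups is fixed in the query text, so supporting a third comment means adding three more MAX(CASE ...) expressions.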

Sql to find timediff between two rows based on ID

The subject of the question is not very explanatory, sorry for that.
So the question is as follows:
I have a table structure as below, where pk is the primary key and id is a value shared by multiple rows.
+------+------+---------------------+
| pk | id | value |
+------+------+---------------------+
| 99 | 1 | 2013-08-06 11:10:00 |
| 100 | 1 | 2013-08-06 11:15:00 |
| 101 | 1 | 2013-08-06 11:20:00 |
| 102 | 1 | 2013-08-06 11:25:00 |
| 103 | 2 | 2013-08-06 15:10:00 |
| 104 | 2 | 2013-08-06 15:15:00 |
| 105 | 2 | 2013-08-06 15:20:00 |
+------+------+---------------------+
What I really need is the difference in value between the first two rows (ordered by value) of each group (grouped by id). So for the structure above I need
timediff(value100, value99) [for the id 1 group]
and timediff(value104, value103) [for the id 2 group]
i.e. the time difference between the first two rows of each group, ordered by value.
One way I can think of is to use 3 self joins (or 3 subqueries): two of them find the first two rows, and the third subtracts them. Any suggestions?
Try this; CTEs are pretty powerful!
-- Needs CTE and window function support (MySQL 8.0 or later)
WITH CTE AS (
    SELECT
        value, pk, id,
        -- number the rows of each id group by value, earliest first
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rnk
    FROM test
)
SELECT
    curr.id, prev.pk AS first_pk, curr.pk AS second_pk,
    prev.value AS first_value, curr.value AS second_value,
    TIMEDIFF(curr.value, prev.value) AS diff
FROM CTE curr
INNER JOIN CTE prev ON curr.id = prev.id
    AND prev.rnk = 1
WHERE curr.rnk = 2
It looks a bit weird, but you can try it this way:
SET @previous = 0;
SET @temp = 0;
SET @previousID = 0;
The step above may not be needed, but it makes sure nothing goes wrong.
SELECT pk, id, diff, `value` FROM (
SELECT IF(@previousID = id, @temp := @temp + 1, @temp := 1) occ, @previousID := id,
TIMEDIFF(`value`, @previous) diff, pk, id, `value`, @previous := `value`
FROM testtable ORDER BY id, `value`) a WHERE occ = 2

MySQL: group by consecutive days and count groups

I have a database table which holds each user's checkins in cities. I need to know how many days a user has been in a city, and then, how many visits a user has made to a city (a visit consists of consecutive days spent in a city).
So, consider I have the following table (simplified, containing only the DATETIMEs - same user and city):
datetime
-------------------
2011-06-30 12:11:46
2011-07-01 13:16:34
2011-07-01 15:22:45
2011-07-01 22:35:00
2011-07-02 13:45:12
2011-08-01 00:11:45
2011-08-05 17:14:34
2011-08-05 18:11:46
2011-08-06 20:22:12
The number of days this user has been to this city would be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).
I thought of doing this using SELECT COUNT(id) FROM table GROUP BY DATE(datetime)
Then, for the number of visits this user has made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).
The problem is that I have no idea how shall I build this query.
Any help would be highly appreciated!
You can find the first day of each visit by finding checkins where there was no checkin the day before.
select count(distinct date(start_of_visit.datetime))
from checkin start_of_visit
left join checkin previous_day
on start_of_visit.user = previous_day.user
and start_of_visit.city = previous_day.city
and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
where previous_day.id is null
There are several important parts to this query.
First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.
Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)
Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:
select count(distinct date(datetime))
from checkin
where user='some user' and city='some city'
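To make the LEFT JOIN / IS NULL reasoning above concrete, here is a minimal sketch that runs both counts on the question's sample datetimes. It uses in-memory SQLite through Python's sqlite3 module as a stand-in, so the date arithmetic is written as date(dt, '-1 day') rather than MySQL's interval syntax. The column names user_id and dt, the city value 'X', and the function name count_days_and_visits are assumptions made for the example.

```python
import sqlite3

# Sample check-in datetimes from the question (same user, same city)
CHECKINS = [
    "2011-06-30 12:11:46", "2011-07-01 13:16:34", "2011-07-01 15:22:45",
    "2011-07-01 22:35:00", "2011-07-02 13:45:12", "2011-08-01 00:11:45",
    "2011-08-05 17:14:34", "2011-08-05 18:11:46", "2011-08-06 20:22:12",
]


def count_days_and_visits(datetimes):
    """Return (distinct days, visits) for one user in one city."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE checkin (id INTEGER PRIMARY KEY, user_id INTEGER, city TEXT, dt TEXT)"
    )
    conn.executemany(
        "INSERT INTO checkin (user_id, city, dt) VALUES (1, 'X', ?)",
        [(d,) for d in datetimes],
    )
    days = conn.execute("SELECT COUNT(DISTINCT date(dt)) FROM checkin").fetchone()[0]
    visits = conn.execute(
        """
        SELECT COUNT(DISTINCT date(s.dt))
        FROM checkin s
        LEFT JOIN checkin p
               ON p.user_id = s.user_id
              AND p.city = s.city
              AND date(p.dt) = date(s.dt, '-1 day')
        WHERE p.id IS NULL  -- no check-in on the previous day: a visit starts here
        """
    ).fetchone()[0]
    conn.close()
    return days, visits


if __name__ == "__main__":
    print(count_days_and_visits(CHECKINS))
```

On the sample data the visit starts are 30.06, 01.08 and 05.08, so the function returns 6 days and 3 visits, as the question expects.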
Try to apply this code to your task -
CREATE TABLE visits(
user_id INT(11) NOT NULL,
dt DATETIME DEFAULT NULL
);
INSERT INTO visits VALUES
(1, '2011-06-30 12:11:46'),
(1, '2011-07-01 13:16:34'),
(1, '2011-07-01 15:22:45'),
(1, '2011-07-01 22:35:00'),
(1, '2011-07-02 13:45:12'),
(1, '2011-08-01 00:11:45'),
(1, '2011-08-05 17:14:34'),
(1, '2011-08-05 18:11:46'),
(1, '2011-08-06 20:22:12'),
(2, '2011-08-30 16:13:34'),
(2, '2011-08-31 16:13:41');
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT v.user_id,
  COUNT(DISTINCT(DATE(dt))) number_of_days,
  MAX(days) number_of_visits
FROM
  (SELECT user_id, dt,
    @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
    @last_dt := DATE(dt),
    @last_user := user_id
  FROM
    visits
  ORDER BY
    user_id, dt
  ) v
GROUP BY
  v.user_id;
----------------
Output:
+---------+----------------+------------------+
| user_id | number_of_days | number_of_visits |
+---------+----------------+------------------+
| 1 | 6 | 3 |
| 2 | 2 | 1 |
+---------+----------------+------------------+
Explanation:
To understand how it works let's check the subquery, here it is.
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT user_id, dt,
  @i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
  @last_dt := DATE(dt) lt,
  @last_user := user_id lu
FROM
  visits
ORDER BY
  user_id, dt;
As you can see, the query returns all rows and ranks them by visit number. This is a well-known ranking method based on variables; note that the rows are ordered by the user and date fields. The query calculates user visits and outputs the following data set, where the days column holds the visit rank -
+---------+---------------------+------+------------+----+
| user_id | dt | days | lt | lu |
+---------+---------------------+------+------------+----+
| 1 | 2011-06-30 12:11:46 | 1 | 2011-06-30 | 1 |
| 1 | 2011-07-01 13:16:34 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 15:22:45 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 22:35:00 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-02 13:45:12 | 1 | 2011-07-02 | 1 |
| 1 | 2011-08-01 00:11:45 | 2 | 2011-08-01 | 1 |
| 1 | 2011-08-05 17:14:34 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-05 18:11:46 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-06 20:22:12 | 3 | 2011-08-06 | 1 |
| 2 | 2011-08-30 16:13:34 | 1 | 2011-08-30 | 2 |
| 2 | 2011-08-31 16:13:41 | 1 | 2011-08-31 | 2 |
+---------+---------------------+------+------------+----+
Then we group this data set by user and use aggregate functions:
'COUNT(DISTINCT(DATE(dt)))' - counts the number of days
'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.
That is all;)
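For comparison, the same per-user result can be sketched without variables using a different technique, often called "gaps and islands": within a run of consecutive days, the date minus its row number is constant, so counting the distinct values of that difference counts the visits. This is a minimal sketch run against in-memory SQLite (3.25 or later) through Python's sqlite3 module as a stand-in. The names days_and_visits and VISITS are hypothetical, and a MySQL 8.0 version would need its own date arithmetic in place of julianday().

```python
import sqlite3

# Sample rows from the answer above: (user_id, dt)
VISITS = [
    (1, "2011-06-30 12:11:46"), (1, "2011-07-01 13:16:34"),
    (1, "2011-07-01 15:22:45"), (1, "2011-07-01 22:35:00"),
    (1, "2011-07-02 13:45:12"), (1, "2011-08-01 00:11:45"),
    (1, "2011-08-05 17:14:34"), (1, "2011-08-05 18:11:46"),
    (1, "2011-08-06 20:22:12"), (2, "2011-08-30 16:13:34"),
    (2, "2011-08-31 16:13:41"),
]


def days_and_visits(rows):
    """Return (user_id, number_of_days, number_of_visits) per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user_id INTEGER NOT NULL, dt TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    result = conn.execute(
        """
        WITH days AS (
            -- one row per user per calendar day
            SELECT DISTINCT user_id, date(dt) AS day FROM visits
        ),
        islands AS (
            -- consecutive days share the same (day number - row number) value
            SELECT user_id, day,
                   julianday(day) - ROW_NUMBER() OVER (PARTITION BY user_id
                                                       ORDER BY day) AS island
            FROM days
        )
        SELECT user_id,
               COUNT(*) AS number_of_days,
               COUNT(DISTINCT island) AS number_of_visits
        FROM islands
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in days_and_visits(VISITS):
        print(row)
```

On the sample data this gives 6 days and 3 visits for user 1 and 2 days and 1 visit for user 2, the same figures as the output table above.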
Using the sample data provided by Devart, the inner "PreQuery" works with SQL variables. By defaulting @LUser to -1 (a user ID that probably does not exist), the IF() test detects any difference between the last user and the current one. As soon as a new user appears, the row gets a value of 1. Likewise, if the last date is more than 1 day before the date of the new check-in, the row gets a value of 1; otherwise it gets 0. The subsequent columns then reset @LUser and @LDate to the values of the record just tested, ready for the next row. Finally, the outer query sums the flags and counts the rows per user. Note that count(*) counts check-in rows, not distinct dates, which is why user 1 shows 9 rather than 6. For the Devart data set the result is:
User ID | Distinct Visits | Total Days
1 | 3 | 9
2 | 1 | 2
select PreQuery.User_ID,
       sum( PreQuery.NextVisit ) as DistinctVisits,
       count(*) as TotalDays
   from
      ( select v.user_id,
               if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
               @LUser := v.user_id,
               @LDate := date( v.dt )
           from
              visits v,
              ( select @LUser := -1, @LDate := date(now()) ) AtVars
           order by
              v.user_id,
              v.dt ) PreQuery
   group by
      PreQuery.User_ID
For the first sub-task (the number of days):
select count(*)
from (
select TO_DAYS(p.d)
from p
group by TO_DAYS(p.d)
) t
I think you should consider changing the database structure. You could add a visits table and a visit_id column to your checkins table. Each time you register a new checkin, check whether there is a checkin from the previous day. If there is, add the new checkin with the visit_id from yesterday's checkin. If not, add a new visit to visits and a new checkin with the new visit_id.
Then you could get your data in one query with something like this:
SELECT COUNT(DISTINCT DATE(datetime)) AS number_of_days, COUNT(DISTINCT visit_id) AS number_of_visits FROM checkin GROUP BY user, city
It's not optimal, but it is still better than working around the current structure, and it will work. If the two results can come from separate queries, it will also be very fast.
The drawbacks, of course, are that you need to change the database structure, do some more scripting, and convert the current data to the new structure (i.e. you will need to add visit_id to the existing rows).