First of all: sorry for the title, but maybe I will find a better one later.
I asked this a few minutes ago, but since I wasn't able to describe what I want, I'm trying again :)
Here is my table structure:
http://sqlfiddle.com/#!2/b25f9/37
The table is used to store user sessions.
From this I would like to generate a stacked bar chart that shows how many active users I have. My idea was to group the users based on their online times over the last few days, like this:
Let's say it's Friday:
Group B: Users that were online Thursday (and today)
Group C: Users that were not online Thursday but Wednesday (and today)
Group D: Users that were not online Thursday or Wednesday but Tuesday (and today)
Group E: Users that were not online Thursday, Wednesday or Tuesday but last Monday, Sunday or Saturday (and today)
Group A: Users that do not match any of the other groups (but were online today)
I only want to know the number of users in each of these groups (for a specific day).
A user can only be in ONE of these groups (for the same day).
Another update: I accidentally (by copy & paste) had starttime = ... or starttime = ..., but it should be starttime = ... or endtime = ...
UPDATE:
To explain my query in more detail (in the final query there are even more comments):
First we simply have:
SELECT
...
FROM gc_sessions s
WHERE DATE(starttime) = CURDATE() OR DATE(endtime) = CURDATE()
That's nothing more than saying "give me all users whose session started today or ended today". Having to consider those two times again and again makes the query a bit clumsy, but it's actually not that complicated.
Usually we would use the COUNT() function to count something, but since we want "conditional counting", we use the SUM() function instead and tell it when to add 1 and when to add 0.
SUM (CASE WHEN ... THEN 1 ELSE 0 END) AS a_column_name
The SUM() function now examines each row in the result set of today's sessions. For each user in this result set, we check whether this user was online on the date we specify. It doesn't matter how many times he/she was online, so for performance reasons we use EXISTS. EXISTS takes a subquery and is true as soon as that subquery returns at least one row, so the database can stop searching at the first match; it doesn't matter what the subquery selects (even a row containing NULL counts as found). So don't be confused that I selected 1. In the subquery we have to connect the user currently examined in the outer query with the user in the inner query (subquery) and specify the time window. If all criteria are met, we count 1, else 0, as explained before.
SUM(CASE WHEN
EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) = CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS todayAndYesterday,
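To see the conditional counting in isolation, here is a minimal sketch that runs the same SUM(CASE WHEN EXISTS ...) pattern on an in-memory SQLite database from Python. It assumes a cut-down gc_sessions(user, starttime, endtime) table with made-up rows, and it passes "today" as a parameter because SQLite has no CURDATE(); SQLite's date(x, '-1 day') stands in for MySQL's CURDATE() - INTERVAL 1 DAY. The function name count_today_and_yesterday is hypothetical.

```python
import sqlite3

# Hypothetical, cut-down version of gc_sessions with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gc_sessions (user TEXT, starttime TEXT, endtime TEXT)")
conn.executemany(
    "INSERT INTO gc_sessions VALUES (?, ?, ?)",
    [
        ("alice", "2013-01-09 10:00:00", "2013-01-09 11:00:00"),  # yesterday
        ("alice", "2013-01-10 09:00:00", "2013-01-10 09:30:00"),  # today
        ("bob", "2013-01-10 08:00:00", "2013-01-10 08:15:00"),    # today only
        ("carol", "2013-01-09 23:30:00", "2013-01-10 00:30:00"),  # spans midnight
    ],
)

def count_today_and_yesterday(conn, today):
    # Outer query: every session that started or ended on `today`.
    # SUM(CASE WHEN EXISTS ...) adds 1 only if that user also has a session
    # that started or ended on the previous day.
    sql = """
        SELECT SUM(CASE WHEN EXISTS (
                   SELECT 1 FROM gc_sessions sub_s
                   WHERE sub_s.user = s.user
                     AND (date(sub_s.starttime) = date(:today, '-1 day')
                          OR date(sub_s.endtime) = date(:today, '-1 day')))
                   THEN 1 ELSE 0 END) AS todayAndYesterday
        FROM gc_sessions s
        WHERE date(s.starttime) = :today OR date(s.endtime) = :today
    """
    return conn.execute(sql, {"today": today}).fetchone()[0]

print(count_today_and_yesterday(conn, "2013-01-10"))  # 2 (alice and carol)
```

One caveat that applies to the MySQL query as well: the outer query returns one row per session, so a user with two sessions today is counted twice. If that matters, select from a derived table such as (SELECT DISTINCT user FROM gc_sessions WHERE ...) instead of from gc_sessions directly.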
Then we make a column for each condition and voilà, you have everything you need in one query. Since your updated question changed the criteria, we just have to add more rules:
SELECT
/*this is like before*/
SUM(CASE WHEN
EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) = CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS FridayAndThursday,
SUM(CASE WHEN
EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 2 DAY)
OR (date(endtime) = CURDATE() - INTERVAL 2 DAY)))
/*this one here is a new addition, since you don't want to count the users that were online yesterday*/
AND NOT EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) = CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS FridayAndWednesdayButNotThursday,
SUM(CASE WHEN
EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 3 DAY) /* minus 3 days to get tuesday*/
OR (date(endtime) = CURDATE() - INTERVAL 3 DAY)))
/*same idea as before: we check that the user was not online on Wednesday or Thursday (the days between Tuesday and today), but this time we use BETWEEN for convenience*/
AND NOT EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) BETWEEN CURDATE() - INTERVAL 2 DAY AND CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) BETWEEN CURDATE() - INTERVAL 2 DAY AND CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS FridayAndTuesdayButNotThursdayAndNotWednesday,
.../*and so on*/
FROM gc_sessions s
WHERE DATE(starttime) = CURDATE() OR DATE(endtime) = CURDATE()
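The groups stay mutually exclusive because each user is assigned by the most recent earlier day on which he/she was online. Here is a small Python sketch of just that rule, outside the database, assuming you already have the set of dates on which each user had a session (for example, derived from starttime and endtime in gc_sessions). classify_user and the sample users are hypothetical; this only illustrates the grouping logic, not the SQL above.

```python
from collections import Counter
from datetime import date, timedelta

def classify_user(online_days, today):
    """Return the group ("A" to "E") for a user, or None if not online today.

    online_days: set of datetime.date values on which the user had a session.
    """
    if today not in online_days:
        return None
    # B, C, D: the most recent earlier visit was 1, 2 or 3 days ago.
    for days_back, group in ((1, "B"), (2, "C"), (3, "D")):
        if today - timedelta(days=days_back) in online_days:
            return group
    # E: no visit in the last 3 days, but one 4 to 6 days ago
    # (Monday, Sunday or Saturday when today is a Friday).
    if any(today - timedelta(days=n) in online_days for n in range(4, 7)):
        return "E"
    return "A"  # online today only, as far as the last 6 days go

friday = date(2013, 1, 11)  # 11 January 2013 was a Friday
users = {  # made-up sample data
    "alice": {friday, date(2013, 1, 10)},  # Thursday
    "bob": {friday, date(2013, 1, 7)},     # Monday
    "carol": {friday},
    "dave": {date(2013, 1, 10)},           # not online today
}
groups = (classify_user(days, friday) for days in users.values())
counts = Counter(g for g in groups if g is not None)
print(counts)
```

Each user ends up in exactly one bucket (or none), which is the property the stacked bar chart needs.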
So, I hope you get the idea now. Any more questions? Feel free to ask.
end of update
Answer to previous version of question:
select
SUM(CASE WHEN EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) = CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) = CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS todayAndYesterday,
SUM(CASE WHEN EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) BETWEEN CURDATE() - INTERVAL 2 DAY AND CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) BETWEEN CURDATE() - INTERVAL 2 DAY AND CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS todayAndYesterdayOrTheDayBeforeYesterday,
SUM(CASE WHEN EXISTS (SELECT 1 FROM gc_sessions sub_s WHERE s.user = sub_s.user
AND ((date(starttime) BETWEEN CURDATE() - INTERVAL 7 DAY AND CURDATE() - INTERVAL 1 DAY)
OR (date(endtime) BETWEEN CURDATE() - INTERVAL 7 DAY AND CURDATE() - INTERVAL 1 DAY)))
THEN 1 ELSE 0 END) AS todayAndWithinTheLastWeek
from gc_sessions s
where date(starttime) = CURDATE()
or date(endtime) = CURDATE()
Instead of relying on the session table, I suggest you create a separate table that stores two fields, date and user_id.
Every time a user logs in, you need to insert a new entry into this table.
This way you will be able to answer all three of your requirements.
Example table:
CREATE TABLE `test`.`user_login_history` (
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`userid` INTEGER UNSIGNED NOT NULL,
`date` DATETIME NOT NULL,
PRIMARY KEY (`id`)
)
ENGINE = InnoDB;
When a user logs in, check whether he/she has already logged in today:
select count(*) from user_login_history where
userid = 1 and `date` = '2013-01-28 00:00:00';
If the returned value is 1, he/she has already logged in today, so no change is needed.
But if the returned value is 0, he/she has not logged in today yet, so record it:
insert into user_login_history(userid,`date`)values(1,'2013-01-28 00:00:00');
Q1. How many users were online TODAY that were also online YESTERDAY?
select count(*) from user_login_history u where
u.`date` = '2013-01-28 00:00:00' and
(
select count(*) from user_login_history v where
v.`date` = '2013-01-27 00:00:00' and
v.userid = u.userid
) = 1;
Q2. How many users were online TODAY that were also online within the last TWO DAYS?
select count(*) from user_login_history u where
u.`date` = '2013-01-28 00:00:00' and
(
select count(*) from user_login_history v where
v.`date` >= '2013-01-26 00:00:00' and
v.`date` <= '2013-01-27 00:00:00' and
v.userid = u.userid
) > 0;
Q3. How many users were online TODAY that were also online within the last 7 DAYS?
select count(*) from user_login_history u where
u.`date` = '2013-01-28 00:00:00' and
(
select count(*) from user_login_history v where
v.`date` >= '2013-01-21 00:00:00' and
v.`date` <= '2013-01-27 00:00:00' and
v.userid = u.userid
) > 0;
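A self-contained sketch of this design, using SQLite from Python so it runs without a MySQL server. Two things differ from the answer and are assumptions of the sketch: the date is stored as a plain 'YYYY-MM-DD' value, and a UNIQUE (userid, date) constraint together with SQLite's INSERT OR IGNORE replaces the count-then-insert check (MySQL's counterpart is INSERT IGNORE on a unique key). The Q1 to Q3 counts use EXISTS rather than comparing a correlated COUNT(*). record_login and online_today_and_between are hypothetical helper names, and the logins are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_login_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        userid INTEGER NOT NULL,
        "date" TEXT NOT NULL,          -- 'YYYY-MM-DD', one row per user per day
        UNIQUE (userid, "date")
    )""")

def record_login(userid, day):
    # The UNIQUE key turns a second login on the same day into a no-op.
    conn.execute('INSERT OR IGNORE INTO user_login_history (userid, "date") VALUES (?, ?)',
                 (userid, day))

def online_today_and_between(today, first_day, last_day):
    # Users online on `today` who were also online on any day in [first_day, last_day].
    sql = '''
        SELECT COUNT(*) FROM user_login_history u
        WHERE u."date" = ?
          AND EXISTS (SELECT 1 FROM user_login_history v
                      WHERE v.userid = u.userid
                        AND v."date" BETWEEN ? AND ?)'''
    return conn.execute(sql, (today, first_day, last_day)).fetchone()[0]

for userid, day in [(1, "2013-01-28"), (1, "2013-01-28"), (1, "2013-01-27"),
                    (2, "2013-01-28"), (2, "2013-01-26"),
                    (3, "2013-01-28"), (3, "2013-01-22"),
                    (4, "2013-01-28")]:
    record_login(userid, day)

print(online_today_and_between("2013-01-28", "2013-01-27", "2013-01-27"))  # Q1
print(online_today_and_between("2013-01-28", "2013-01-26", "2013-01-27"))  # Q2
print(online_today_and_between("2013-01-28", "2013-01-21", "2013-01-27"))  # Q3
```

Because each row is a single day, BETWEEN on the date strings is exact here; with DATETIME values you would need to watch the upper bound.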
For yesterday:
select distinct user from gc_sessions where user in
(
select user
from gc_sessions
where starttime >= subdate(current_date, 1)
and starttime < current_date
)
and starttime >= current_date;
For 2 days:
select distinct user from gc_sessions where user in
(
select user
from gc_sessions
where starttime >= subdate(current_date, 2)
and starttime < current_date
)
and starttime >= current_date;
For 7 days:
select distinct user from gc_sessions where user in
(
select user
from gc_sessions
where starttime >= subdate(current_date, 7)
and starttime < current_date
)
and starttime >= current_date;
You need to add a subquery that loads the data from the specified range (e.g. 1 day/2 days/7 days) and compares it with the data for the current day.
set @range = 7;
select * from gc_sessions
WHERE user in (SELECT user from gc_sessions
where starttime >= subdate(current_date, @range) AND starttime < current_date)
AND starttime >= current_date;
Here @range holds the number of days. See your expanded SQL Fiddle at http://sqlfiddle.com/#!2/9584b/24
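Here is a runnable sketch of the same "online today and within the previous N days" idea, using SQLite from Python with made-up sessions. It uses half-open windows ([start, end) with >= and <) so that a session late on the last day of the window is not dropped; that can happen with a BETWEEN whose upper bound is a bare date, because MySQL compares it to a DATETIME as midnight. The comparisons rely on starttime being stored as 'YYYY-MM-DD HH:MM:SS' text, which is an assumption of the sketch; returning_users is a hypothetical name.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gc_sessions (user TEXT, starttime TEXT)")
conn.executemany("INSERT INTO gc_sessions VALUES (?, ?)", [
    ("alice", "2013-01-09 18:00:00"), ("alice", "2013-01-10 09:00:00"),
    ("bob", "2013-01-10 10:00:00"),
    ("carol", "2013-01-05 12:00:00"), ("carol", "2013-01-10 11:00:00"),
    ("dave", "2013-01-09 20:00:00"),
])

def returning_users(today, range_days):
    # Earlier visit: [today - range_days, today); today's visit: [today, today + 1 day).
    window_start = (today - timedelta(days=range_days)).isoformat()
    day_start = today.isoformat()
    day_end = (today + timedelta(days=1)).isoformat()
    sql = """
        SELECT DISTINCT s.user FROM gc_sessions s
        WHERE s.starttime >= ? AND s.starttime < ?
          AND s.user IN (SELECT user FROM gc_sessions
                         WHERE starttime >= ? AND starttime < ?)
        ORDER BY s.user"""
    rows = conn.execute(sql, (day_start, day_end, window_start, day_start)).fetchall()
    return [r[0] for r in rows]

print(returning_users(date(2013, 1, 10), 1))  # ['alice']
print(returning_users(date(2013, 1, 10), 7))  # ['alice', 'carol']
```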
SELECT today.user
, GROUP_CONCAT(DISTINCT today.ip) ip
FROM gc_sessions today
JOIN gc_sessions yesterday
ON DATE(yesterday.starttime) = DATE(today.starttime) - INTERVAL 1 DAY
AND today.user = yesterday.user
WHERE DATE(today.starttime) = '2013-01-10'
GROUP
BY today.user;
I am trying to run this query against the tickets table. The ticket_updates table contains rows matching tickets.ticketnumber = ticket_updates.ticketnumber.
I want to check for rows in tickets where the last row in ticket_updates.datetime is >= 1 hour ago.
The problem with the query below is that it picks up rows from ticket_updates where datetime is more than 1 hour ago, because that condition is in my WHERE clause, so it completely ignores the most recent row, which is in fact only 10 minutes old.
So I think I need to remove the datetime condition from my WHERE clause, but I'm not sure what to add to make it work.
SELECT * FROM tickets WHERE
(
status = 'Pending Response' AND
ticketnumber IN
(
SELECT ticketnumber FROM ticket_updates WHERE
type = 'customer_reminder_flag' AND
datetime < NOW() - INTERVAL 2 DAY
)
) OR
(
status = 'Pending Completion' AND
ticketnumber IN (
SELECT ticketnumber FROM ticket_updates WHERE
type = 'update' AND
datetime < NOW() - INTERVAL 1 HOUR
ORDER BY datetime DESC
)
)
You can rewrite your query using NOT EXISTS as follows:
SELECT t.*
FROM tickets t join ticket_updates tu on t.ticketnumber = tu.ticketnumber
WHERE t.status = 'Pending Completion'
AND tu.type = 'update'
AND tu.datetime < NOW() - INTERVAL 1 HOUR
AND NOT EXISTS
(SELECT 1 FROM ticket_updates tuu
WHERE tu.ticketnumber = tuu.ticketnumber
AND tuu.type = 'update'
AND tuu.datetime < NOW() - INTERVAL 1 HOUR
AND tuu.datetime > tu.datetime
)
If you are running MySQL 8.0+, you can use an analytic (window) function instead:
SELECT * FROM
(SELECT t.*, row_number() over (partition by tu.ticketnumber order by tu.datetime desc) as rn
FROM tickets t join ticket_updates tu on t.ticketnumber = tu.ticketnumber
WHERE t.status = 'Pending Completion'
AND tu.type = 'update'
AND tu.datetime < NOW() - INTERVAL 1 HOUR) t
WHERE RN = 1
I want to check for rows in tickets where the last row in ticket_updates.datetime is >= 1 hour ago.
For this problem statement, the code would use not exists:
select t.*
from tickets t
where not exists (select 1
from ticket_updates tu
where tu.ticketnumber = t.ticketnumber and
tu.datetime > now() - interval 1 hour
);
This returns tickets that have had more than one hour since the last update.
It is unclear to me what this problem statement has to do with the code you have shown.
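A minimal SQLite sketch of the NOT EXISTS idea, run from Python with made-up rows. SQLite has no NOW(), so the current time is passed in as a parameter and datetime(?, '-1 hour') plays the role of now() - interval 1 hour; stale_tickets is a hypothetical name. Note that, as written, a ticket with no updates at all also qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticketnumber INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE ticket_updates (ticketnumber INTEGER, type TEXT, datetime TEXT);
    INSERT INTO tickets VALUES (1, 'Pending Completion'), (2, 'Pending Completion');
    -- Ticket 1: last update at 09:00. Ticket 2: an old update plus one at 11:50.
    INSERT INTO ticket_updates VALUES
        (1, 'update', '2024-01-01 09:00:00'),
        (2, 'update', '2024-01-01 08:00:00'),
        (2, 'update', '2024-01-01 11:50:00');
""")

def stale_tickets(now):
    # A ticket qualifies only if NO update is newer than one hour before `now`.
    sql = """
        SELECT t.ticketnumber FROM tickets t
        WHERE NOT EXISTS (SELECT 1 FROM ticket_updates tu
                          WHERE tu.ticketnumber = t.ticketnumber
                            AND tu.datetime > datetime(?, '-1 hour'))
        ORDER BY t.ticketnumber"""
    return [r[0] for r in conn.execute(sql, (now,))]

print(stale_tickets("2024-01-01 12:00:00"))  # [1]: ticket 2 was updated 10 minutes ago
```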
I have a complex query that I wrote partly as a MySQL database view and partly as ActiveRecord logic in Rails. Each record has its own priority from 0-4, where 4 is the top priority.
I'm using Kaminari for pagination and I'm wondering if there's a way to show per-page sets of records with some extra rules:
Show all #4 priority rows on the first page
Take the per_page number and show priority 3 rows using this formula: 0.3*per_page
Then do the same with priority 2
Then, if these three steps didn't fill 100% of per_page, show the rest with priority 0 and 1
How could I achieve this result using Rails? Or is it better to implement it directly in SQL?
Here is a sample of my DB view:
select *
from (
select
s.id as source_id,
'Spree::Store' as source_type,
(case when (s.created_at >= curdate() - INTERVAL DAYOFWEEK(curdate())+6 DAY AND s.created_at < curdate() - INTERVAL DAYOFWEEK(curdate())-1 DAY)
then
'new'
else
'old'
end) as sub_type,
1 as priority,
s.created_at as created_at,
s.updated_at as updated_at,
null as owner_id
from spree_stores as s
where s.image_id is not NULL and s.is_hidden = false
union
select
e.id as source_id,
'Event' as source_type,
(case
when (e.status = 1 and e.is_featured is false)
then
'live'
when (e.is_featured = true)
then
'featured'
else
case when (e.created_at >= curdate() - INTERVAL DAYOFWEEK(curdate())+6 DAY AND e.created_at < curdate() - INTERVAL DAYOFWEEK(curdate())-1 DAY)
then
'new'
else
'old'
end
end) as sub_type,
(case
when (e.status = 1 or e.is_featured is true)
then
3
else
1
end) as priority,
e.created_at as created_at,
e.updated_at as updated_at,
null as owner_id
from events as e
where e.status >= 1 and e.expires_at >= curdate()
union
select
o.id as source_id,
'Spree::Order' as source_type,
(case when (o.created_at >= curdate() - INTERVAL DAYOFWEEK(curdate())+6 DAY AND o.created_at < curdate() - INTERVAL DAYOFWEEK(curdate())-1 DAY)
then
'new'
else
'old'
end) as sub_type,
1 as priority,
o.created_at as created_at,
o.updated_at as updated_at,
o.user_id as owner_id
from spree_orders as o
where o.user_id is not NULL and o.share is true and o.state = 'complete' and o.completed_at is not NULL
union
select
p.id as source_id,
'Spree::Product' as source_type,
(case when (p.created_at >= curdate() - INTERVAL DAYOFWEEK(curdate())+6 DAY AND p.created_at < curdate() - INTERVAL DAYOFWEEK(curdate())-1 DAY)
then
'new'
else
'old'
end) as sub_type,
1 as priority,
p.created_at as created_at,
p.updated_at as updated_at,
null as owner_id
from spree_products as p
join spree_variants as sv on (sv.product_id = p.id and sv.is_master = true)
join spree_assets as sa on (sa.viewable_id = sv.id and sa.viewable_type = 'Spree::Variant')
where p.deleted_at is NULL
group by p.id
) a
order by priority desc, created_at desc;
This is the result I'm getting (only a few lines, not all 200 results):
This sounds like more complex logic than Kaminari is built for and probably worth doing it yourself. Kaminari is certainly convenient for knocking out a quick pagination UI, but it really doesn't add a huge amount of value compared to rolling your own solution. You might be able to hack it to fit your needs, but that's probably more headache than just doing it yourself.
I'm also a little skeptical that the complex algorithm you want will really benefit users. Only you know that for sure, but you might want to consider a simple "score" or "rank" column and then just use Kaminari with a query sorted by score desc.
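If you do roll your own, the page-composition rules from the question are easy to express in plain code once the rows are loaded (for example, from the view ordered by priority desc, created_at desc). The sketch below is in Python rather than Ruby and makes several assumptions the question leaves open: priority 4 rows appear only on page one (so page one may exceed per_page), the 30% quota is rounded down, priority 1 is used before priority 0 for the remainder, and any slots still empty are topped up from whatever priorities remain so that every row lands on some page. build_pages is a hypothetical helper, not a Kaminari API.

```python
import math
from collections import deque

def build_pages(rows, per_page, share=0.3):
    """Split (id, priority) rows into pages following the question's rules.

    Assumes `rows` is already in display order within each priority.
    """
    pools = {p: deque(r for r in rows if r[1] == p) for p in range(5)}
    quota = math.floor(share * per_page)  # slots for priority 3, and again for priority 2
    pages = []

    def take(pool, n, page):
        # Move up to n rows from the front of `pool` onto `page`.
        for _ in range(max(0, n)):
            if not pool:
                break
            page.append(pool.popleft())

    while any(pools.values()):
        page = []
        if not pages:
            take(pools[4], len(pools[4]), page)  # rule 1: all priority 4 rows on page one
        for p in (3, 2):                          # rules 2 and 3: 30% quota each
            take(pools[p], min(quota, per_page - len(page)), page)
        for p in (1, 0):                          # rule 4: fill the rest with priority 1, then 0
            take(pools[p], per_page - len(page), page)
        for p in (4, 3, 2, 1, 0):                 # assumption: top up from any leftovers
            take(pools[p], per_page - len(page), page)
        pages.append(page)
    return pages

rows = ([(i, 4) for i in range(2)] + [(10 + i, 3) for i in range(5)]
        + [(20 + i, 2) for i in range(5)] + [(30 + i, 1) for i in range(5)])
pages = build_pages(rows, per_page=10)
print([[p for _, p in page] for page in pages])
# [[4, 4, 3, 3, 3, 2, 2, 2, 1, 1], [3, 3, 2, 2, 1, 1, 1]]
```

The same loop translates directly to Ruby; the open questions (rounding, what happens when priority 4 alone overflows page one) are the ones you would have to settle either way.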
I have a table ORDERS that stores orders with their status and order date. I would like to find all orders with a specified status that were made between yesterday after 3pm and today 4pm. The query will run at different times (10am, 3pm, 5pm, ... it could be any time).
For example, if I run the query today (13.05.2014), I would like to get all orders made from 2014-05-12 15:00:00 until 2014-05-13 16:00:00.
The date is stored in format: YYYY-MM-DD HH:MM:SS
What I got is:
select *
from orders
where status = 'new'
and (
(
date_add(created_at, INTERVAL 1 day) = CURRENT_DATE()
and hour(created_at) >= 15
) /*1*/
or (
date(created_at) = CURRENT_DATE()
and hour(created_at) <= 16
) /*2*/
)
But I only get orders made today, as if only the second condition were taken into account.
I prefer not to use a hardcoded value like created >= '2014-05-12 16:00:00' (I will not be the one using this query; someone else will).
When you add an interval of 1 day to the date/time, you still keep the time component. Use date() for the first condition:
where status = 'new' and
((date(date_add(created_at, INTERVAL 1 day)) = CURRENT_DATE() and
hour(created_at) >= 15
) /*1*/ or
(date(created_at) = CURRENT_DATE() and
hour(created_at) <= 16
) /*2*/
)
An alternative method is:
where status = 'new' and
(created_at >= date_add(CURRENT_DATE(), interval 15-24 hour) and
created_at <= date_add(CURRENT_DATE(), interval 16 hour)
)
The advantage of this approach is that the functions are applied only to CURRENT_DATE(), not to created_at. This allows MySQL to take advantage of an index on created_at.
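The same boundaries can also be computed once in application code and passed to the query as parameters, which keeps the comparison on the bare created_at column just like the alternative above. A minimal Python sketch; order_window is a hypothetical helper, and the orders list is made-up data used only to show which endpoints are included.

```python
from datetime import datetime, time, timedelta

def order_window(now):
    """Return (start, end): yesterday 15:00 to today 16:00, relative to `now`."""
    today = now.date()
    start = datetime.combine(today - timedelta(days=1), time(15, 0))
    end = datetime.combine(today, time(16, 0))
    return start, end

start, end = order_window(datetime(2014, 5, 13, 10, 0))
print(start, end)  # 2014-05-12 15:00:00 2014-05-13 16:00:00

# Same >= / <= as the query above.
orders = [
    ("new", datetime(2014, 5, 12, 14, 59)),  # too early
    ("new", datetime(2014, 5, 12, 15, 0)),   # first included instant
    ("new", datetime(2014, 5, 13, 16, 0)),   # last included instant
    ("new", datetime(2014, 5, 13, 16, 1)),   # too late
]
matches = [o for o in orders if o[0] == "new" and start <= o[1] <= end]
print(len(matches))  # 2
```

The window depends only on the calendar date of `now`, so running it at 10am or 5pm gives the same result, which is what the question asks for.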
I'm using MySQL 5.0 and I need to tune this query. Can anyone tell me what tuning I can do on it?
SELECT DISTINCT(alert_master_id) FROM alert_appln_header
WHERE created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
AND alert_master_id NOT IN (
SELECT DISTINCT(alert_master_id) FROM alert_details
WHERE end_date IS NULL AND created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
UNION
SELECT DISTINCT(alert_master_id) FROM alert_sara_header
WHERE sara_master_id IN
(SELECT alert_sara_master_id FROM alert_sara_lines
WHERE end_date IS NULL) AND created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
) LIMIT 5000;
The first thing that I'd do is rewrite the subqueries as joins:
SELECT h.alert_master_id
FROM alert_appln_header h
JOIN schedule_config c
ON c.schedule_name = 'Purging_Config'
LEFT JOIN alert_details d
ON d.alert_master_id = h.alert_master_id
AND d.end_date IS NULL
AND d.created_date < CURRENT_DATE - INTERVAL c.parameters DAY
LEFT JOIN (
alert_sara_header s
JOIN alert_sara_lines l
ON l.alert_sara_master_id = s.sara_master_id
)
ON s.alert_master_id = h.alert_master_id
AND l.end_date IS NULL
AND s.created_date < CURRENT_DATE - INTERVAL c.parameters DAY
WHERE h.created_date < CURRENT_DATE - INTERVAL c.parameters DAY
AND d.alert_master_id IS NULL
AND s.alert_master_id IS NULL
GROUP BY h.alert_master_id
LIMIT 5000
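The LEFT JOIN ... IS NULL form above is an anti-join: it keeps header rows that have no matching row in the joined table. It returns the same rows as NOT IN as long as the subquery never yields NULL; if it can (for example, a NULL alert_master_id in alert_details), NOT IN returns nothing at all, while the anti-join still behaves as expected. A tiny SQLite sketch with hypothetical tables named header and details shows the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE header (alert_master_id INTEGER);
    CREATE TABLE details (alert_master_id INTEGER);
    INSERT INTO header VALUES (1), (2), (3);
    INSERT INTO details VALUES (2), (NULL);
""")

not_in = conn.execute(
    "SELECT alert_master_id FROM header "
    "WHERE alert_master_id NOT IN (SELECT alert_master_id FROM details)").fetchall()

anti_join = conn.execute(
    "SELECT h.alert_master_id FROM header h "
    "LEFT JOIN details d ON d.alert_master_id = h.alert_master_id "
    "WHERE d.alert_master_id IS NULL ORDER BY h.alert_master_id").fetchall()

print(not_in)     # [] because the NULL makes every NOT IN test unknown
print(anti_join)  # [(1,), (3,)]
```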
If it's still slow after that, re-examine your indexing strategy. I'd suggest indexes over:
alert_appln_header(alert_master_id,created_date)
schedule_config(schedule_name)
alert_details(alert_master_id,end_date,created_date)
alert_sara_header(alert_master_id,created_date,sara_master_id)
alert_sara_lines(alert_sara_master_id,end_date)
OK, this may be just a shot in the dark, but I think you don't need so many DISTINCTs here.
SELECT DISTINCT(alert_master_id) FROM alert_appln_header
WHERE created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
AND alert_master_id NOT IN (
-- removed distinct here --
SELECT alert_master_id FROM alert_details
WHERE end_date IS NULL AND created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
UNION
-- removed distinct here --
SELECT alert_master_id FROM alert_sara_header
WHERE sara_master_id IN
(SELECT alert_sara_master_id FROM alert_sara_lines
WHERE end_date IS NULL)
AND created_date < DATE_SUB(CURDATE(), INTERVAL (SELECT parameters FROM schedule_config WHERE schedule_name = "Purging_Config") DAY)
) LIMIT 5000;
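Since DISTINCT can be very costly, try to avoid it where it isn't needed. In the first WHERE clause you are checking for ids that are NOT in some result set, so it doesn't matter if some ids appear in that result more than once. For the same reason, UNION ALL could replace UNION here, since UNION also removes duplicates.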
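A quick way to convince yourself that duplicates inside the NOT IN list do not change the result: the SQLite sketch below runs the same NOT IN query with and without DISTINCT in the subquery, on a cut-down copy of two of the tables (only the alert_master_id column, made-up rows).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alert_appln_header (alert_master_id INTEGER);
    CREATE TABLE alert_details (alert_master_id INTEGER);
    INSERT INTO alert_appln_header VALUES (1), (2), (3), (4);
    INSERT INTO alert_details VALUES (2), (2), (2), (4), (4);
""")

query = """
    SELECT alert_master_id FROM alert_appln_header
    WHERE alert_master_id NOT IN ({sub})
    ORDER BY alert_master_id"""
with_distinct = conn.execute(
    query.format(sub="SELECT DISTINCT alert_master_id FROM alert_details")).fetchall()
without_distinct = conn.execute(
    query.format(sub="SELECT alert_master_id FROM alert_details")).fetchall()

print(with_distinct, without_distinct)  # [(1,), (3,)] [(1,), (3,)]
```

Whether dropping DISTINCT is actually faster depends on the optimizer and MySQL version, so it is worth checking with EXPLAIN on the real data.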
There is a query which brings back sales data for the last 7 days.
How can I get the sales for the last 30 days as well, so that the results show the sales for both the last 7 days AND the last 30 days?
SELECT
a.row_id,
MAX(ad.new_value) - MIN(ad.new_value) AS sales7days
FROM
_audit a
LEFT JOIN _audit_data ad
ON a.audit_id = ad.audit_id
WHERE ad.col = 'sales'
AND a.triggered_datetime > NOW() - INTERVAL 7 DAY
GROUP BY a.row_id
ORDER BY sales7days DESC;
Perhaps with a CASE expression:
SELECT a.row_id
, MAX(case when a.triggered_datetime > NOW() - INTERVAL 7 DAY
then ad.new_value else NULL end)
- MIN(case when a.triggered_datetime > NOW() - INTERVAL 7 DAY
then ad.new_value else NULL end) AS sales7days
, MAX(case when a.triggered_datetime > NOW() - INTERVAL 30 DAY
then ad.new_value else NULL end)
- MIN(case when a.triggered_datetime > NOW() - INTERVAL 30 DAY
then ad.new_value else NULL end) AS sales30days
FROM _audit a, _audit_data ad
WHERE a.audit_id = ad.audit_id AND ad.col = 'sales'
GROUP BY a.row_id;
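The CASE trick works because rows outside a window become NULL, and MAX() and MIN() ignore NULLs. Here is a runnable SQLite version from Python with made-up audit rows; "now" is passed as a parameter because SQLite has no NOW(), and datetime(:now, '-7 days') stands in for NOW() - INTERVAL 7 DAY. Two changes relative to the query above are assumptions of this sketch: an inner JOIN (the WHERE ad.col = 'sales' condition already discards unmatched LEFT JOIN rows), and an extra 30-day filter in WHERE so older rows are skipped before grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _audit (audit_id INTEGER, row_id INTEGER, triggered_datetime TEXT);
    CREATE TABLE _audit_data (audit_id INTEGER, col TEXT, new_value INTEGER);
    INSERT INTO _audit VALUES
        (1, 100, '2024-01-05 12:00:00'),
        (2, 100, '2024-01-26 12:00:00'),
        (3, 100, '2024-01-30 12:00:00');
    INSERT INTO _audit_data VALUES (1, 'sales', 10), (2, 'sales', 40), (3, 'sales', 55);
""")

def sales_windows(now):
    # MAX(CASE ...) - MIN(CASE ...): rows outside a window become NULL and are ignored.
    sql = """
        SELECT a.row_id,
               MAX(CASE WHEN a.triggered_datetime > datetime(:now, '-7 days') THEN ad.new_value END)
             - MIN(CASE WHEN a.triggered_datetime > datetime(:now, '-7 days') THEN ad.new_value END) AS sales7days,
               MAX(CASE WHEN a.triggered_datetime > datetime(:now, '-30 days') THEN ad.new_value END)
             - MIN(CASE WHEN a.triggered_datetime > datetime(:now, '-30 days') THEN ad.new_value END) AS sales30days
        FROM _audit a JOIN _audit_data ad ON a.audit_id = ad.audit_id
        WHERE ad.col = 'sales' AND a.triggered_datetime > datetime(:now, '-30 days')
        GROUP BY a.row_id
        ORDER BY sales7days DESC"""
    return conn.execute(sql, {"now": now}).fetchall()

print(sales_windows("2024-01-31 12:00:00"))  # [(100, 15, 45)]
```

With "now" at 2024-01-31 12:00, the 7-day window sees the values 40 and 55 (difference 15) and the 30-day window sees 10, 40 and 55 (difference 45).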
SELECT
d7.row_id,
d7.salesdays AS sales7days, d30.salesdays AS sales30days
FROM
(
Select a.row_id, MAX(ad.new_value) - MIN(ad.new_value) AS salesdays
From _audit a
LEFT JOIN _audit_data ad ON a.audit_id = ad.audit_id
WHERE ad.col = 'sales' AND a.triggered_datetime > NOW() - INTERVAL 7 DAY
GROUP BY a.row_id
) d7,
(
Select a.row_id, MAX(ad.new_value) - MIN(ad.new_value) AS salesdays
From _audit a
LEFT JOIN _audit_data ad ON a.audit_id = ad.audit_id
WHERE ad.col = 'sales' AND a.triggered_datetime > NOW() - INTERVAL 30 DAY
GROUP BY a.row_id
) d30
where d7.row_id = d30.row_id
ORDER BY sales7days DESC;
This assumes you want the same row_id in both, with both values shown. You may or may not want an inner or outer join and/or to COALESCE the value fields (I don't know enough about your data).