MySQL version: 5.7
Here is the users table:
+---------------------+-----+
| date                | uid |
+---------------------+-----+
| 2020-06-29 05:00:00 | 352 |
| 2020-06-29 08:00:00 | 354 |
| 2020-06-29 09:25:53 | 354 |
| 2020-06-30 08:00:00 | 863 |
| 2020-06-30 09:00:01 | 352 |
| 2020-06-30 09:59:59 | 352 |
| 2020-07-01 07:00:00 | 358 |
| 2020-07-01 09:00:00 | 358 |
+---------------------+-----+
I want to count the number of new visitors per day, with one important condition: a visitor counts as new on a given day only if they have never visited before that day.
Expected result:
+------------+----------------+
| date       | new_user_count |
+------------+----------------+
| 2020-06-29 |              2 |
| 2020-06-30 |              1 |
| 2020-07-01 |              1 |
+------------+----------------+
The result above is equivalent to running these three SQL queries:
2020-06-29 (352,354) : select count( distinct uid ) as new_user_count from users where DATE(date) = '2020-06-29' and uid not in ( select distinct uid from users where date < '2020-06-29 05:00:00'); #2
2020-06-30 (863): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-06-30' and uid not in ( select distinct uid from users where date < '2020-06-30 08:00:00'); # 1
2020-07-01 (358): select count( distinct uid ) as new_user_count from users where DATE(date)= '2020-07-01' and uid not in ( select distinct uid from users where date < '2020-07-01 07:00:00'); # 1
I haven't been able to work out a single query for this so far. Thanks.
Here is Online users table
You could try using a correlated subquery to check whether each user visit is that user's first:
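SELECT
    u1.visit_date AS date,
    -- count a user only when no row for the same uid exists before this day
    SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM users u2
                              WHERE u2.date < u1.visit_date AND u2.uid = u1.uid)
             THEN 1 ELSE 0 END) AS new_user_count
FROM
    -- one row per (day, user); DATE() strips the time part of the datetime
    (SELECT DISTINCT DATE(date) AS visit_date, uid FROM users) u1
GROUP BY
    u1.visit_date;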
The logic above reads fairly directly: count a user record only if we cannot find that same user appearing in the table on some earlier date. Note that I use a DISTINCT select because, in your data, a given user might appear more than once on the same date. Such duplicates would throw off the correlated subquery, so we ensure that a given user appears only once on a given date (and besides, one user can only be counted once per day anyway).
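SELECT
    u0.visit_date AS date,
    (
        SELECT COUNT(DISTINCT u1.uid)
        FROM users u1
        WHERE NOT EXISTS (
                  -- any visit by the same user before this day
                  SELECT * FROM users u2
                  WHERE u2.uid = u1.uid AND u2.date < u0.visit_date
              ) AND DATE(u1.date) = u0.visit_date
    ) AS new_user_count
FROM
    -- one row per calendar day present in the table
    (SELECT DISTINCT DATE(date) AS visit_date FROM users) u0
;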
-- get the day and the number of distinct users
SELECT DATE(users.date) AS date, COUNT(DISTINCT uid) AS new_user_count
-- from users table
FROM users
-- only when there does not exist a row
WHERE NOT EXISTS ( SELECT NULL -- may use any literal value instead of NULL
                   -- in the table
                   FROM users u
                   -- with this user id
                   WHERE users.uid = u.uid
                   -- but an earlier (smaller) date
                   AND users.date > u.date )
-- one output row per calendar day (date is a DATETIME, so strip the time part)
GROUP BY DATE(users.date);
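The NOT EXISTS idea used in the answers above can be checked against the sample data from the question. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for MySQL 5.7, and the column alias day and the variable names are illustrative, not part of the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 5.7 table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (date TEXT, uid INTEGER)")
conn.executemany(
    "INSERT INTO users (date, uid) VALUES (?, ?)",
    [
        ("2020-06-29 05:00:00", 352),
        ("2020-06-29 08:00:00", 354),
        ("2020-06-29 09:25:53", 354),
        ("2020-06-30 08:00:00", 863),
        ("2020-06-30 09:00:01", 352),
        ("2020-06-30 09:59:59", 352),
        ("2020-07-01 07:00:00", 358),
        ("2020-07-01 09:00:00", 358),
    ],
)

# A row passes the filter only if the same uid has no strictly earlier row,
# i.e. it is that user's first visit; the surviving rows are counted per day.
query = """
SELECT DATE(u1.date) AS day, COUNT(DISTINCT u1.uid) AS new_user_count
FROM users u1
WHERE NOT EXISTS (
    SELECT 1 FROM users u2
    WHERE u2.uid = u1.uid AND u2.date < u1.date
)
GROUP BY DATE(u1.date)
ORDER BY day
"""
new_users_per_day = conn.execute(query).fetchall()
for day, new_user_count in new_users_per_day:
    print(day, new_user_count)
```

On this data the sketch should print 2, 1 and 1 for the three days, matching the expected result in the question.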
Related
I am looking to take two tables I have and perform a data transformation to create a single table. I have an events table and a users table:
Events: {id, user_id, start_date, end_date, cost...}
Users: {id, name, ...}
I am trying to create a table that shows user spend at a daily level, assuming each user starts with a cost of zero that goes up after every event.
The intended output would be:
{date, userid, beginning_balance, sum(cost), num_of_events}
I need some direction on how to tackle this one, as I am not very familiar with data transformation within SQL.
Your requirement is a bit unclear but you may be after something like this
drop table if exists event;
create table event(id int auto_increment primary key, user_id int,start_date date, end_date date, cost int);
insert into event (user_id,start_date , end_date, cost) values
(1,'2017-01-01','2017-01-01',10),(1,'2017-01-01','2017-01-01',10),
(1,'2017-02-01','2017-01-01',10),
(2,'2017-01-01','2017-01-01',10);
select e.user_id, e.start_date,
       ifnull(
           (select sum(e1.cost)
            from event e1
            where e1.user_id = e.user_id and e1.start_date < e.start_date
           ), 0) as beginning_balance, -- total cost of this user's earlier events
       sum(cost), count(*) as num_of_events
from users u
join event e on e.user_id = u.id -- Users.id, as in the question's schema
group by e.user_id, e.start_date;
+---------+------------+-------------------+-----------+---------------+
| user_id | start_date | beginning_balance | sum(cost) | num_of_events |
+---------+------------+-------------------+-----------+---------------+
| 1 | 2017-01-01 | 0 | 20 | 2 |
| 1 | 2017-02-01 | 20 | 10 | 1 |
| 2 | 2017-01-01 | 0 | 10 | 1 |
+---------+------------+-------------------+-----------+---------------+
3 rows in set (0.03 sec)
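The beginning-balance idea (sum the cost of all of the user's strictly earlier events, defaulting to zero) can be tried on the same sample rows. This is a rough sketch that assumes Python's sqlite3 module instead of MySQL; it skips the join to the users table, and the alias day_cost is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "start_date TEXT, end_date TEXT, cost INTEGER)"
)
conn.executemany(
    "INSERT INTO event (user_id, start_date, end_date, cost) VALUES (?, ?, ?, ?)",
    [
        (1, "2017-01-01", "2017-01-01", 10),
        (1, "2017-01-01", "2017-01-01", 10),
        (1, "2017-02-01", "2017-01-01", 10),
        (2, "2017-01-01", "2017-01-01", 10),
    ],
)

# For each (user, day) group, the correlated subquery adds up the cost of that
# user's events on earlier days; COALESCE turns "no earlier events" into 0.
query = """
SELECT e.user_id,
       e.start_date,
       COALESCE((SELECT SUM(e1.cost)
                 FROM event e1
                 WHERE e1.user_id = e.user_id
                   AND e1.start_date < e.start_date), 0) AS beginning_balance,
       SUM(e.cost) AS day_cost,
       COUNT(*) AS num_of_events
FROM event e
GROUP BY e.user_id, e.start_date
ORDER BY e.user_id, e.start_date
"""
daily_spend = conn.execute(query).fetchall()
for row in daily_spend:
    print(row)
```

The rows should agree with the result table shown in the answer above.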
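Can you try this query:
SELECT
    Events.start_date AS date,
    Users.id AS userid,
    0 AS beginning_balance,
    SUM(Events.cost) AS cost,
    COUNT(Events.id) AS num_of_events -- counts 0 for users without events
FROM
    Users
    LEFT JOIN Events ON (Events.user_id = Users.id)
GROUP BY
    Users.id,
    Events.start_date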
The table structure is: user_id, Date (I usually work with timestamps).
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need a single query that returns, for each day, the number of new users and the number of "returned" users, i.e. users who have already been active on any earlier day since their first activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/10)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/10)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I didn't manage to adapt his query to produce my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better...), the key is to calculate the min date for each user separately, calculate the total number of users on each day, and then simply declare a returning user to be any user who wasn't new on that day:
SELECT DateTable.Date,
       COALESCE(NewUsers, 0) AS NewUsers,
       COALESCE(NumUsers, 0) - COALESCE(NewUsers, 0) AS ReturningUsers
FROM
    DateTable
    LEFT JOIN
    (
        -- number of users whose first activity falls on each date
        SELECT MinDate, COUNT(user_id) AS NewUsers
        FROM (
            SELECT user_id, min(CAST(date AS Date)) as MinDate
            FROM Table
            GROUP BY user_id
        ) A
        GROUP BY MinDate
    ) B ON DateTable.Date = B.MinDate
    LEFT JOIN
    (
        -- number of distinct users active on each date
        SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
        FROM Table
        GROUP BY CAST(date AS Date)
    ) C ON DateTable.Date = C.Date
Thanks to Stephen. I made a small fix to his query, which works well, even if it is a bit time-consuming on a large database:
SELECT
    DATE(Stats.Created),
    -- days with no first-time users have no row in B, so NewUsers is NULL there
    COALESCE(NewUsers, 0) AS NewUsers,
    NumUsers - COALESCE(NewUsers, 0) AS ReturningUsers
FROM
    Stats
    LEFT JOIN
    (
        SELECT
            MinDate,
            COUNT(user_id) AS NewUsers
        FROM (
            SELECT
                user_id,
                MIN(DATE(Created)) as MinDate
            FROM Stats
            GROUP BY user_id
        ) A
        GROUP BY MinDate
    ) B
    ON DATE(Stats.Created) = B.MinDate
    LEFT JOIN
    (
        SELECT
            DATE(Created) AS Date,
            COUNT(DISTINCT user_id) AS NumUsers
        FROM Stats
        GROUP BY DATE(Created)
    ) C
    ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)
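To see the "new users = users whose first day is this day, returning = active users minus new users" idea in action, here is a small sketch on the sample rows from the question. It assumes Python's sqlite3 module rather than MySQL or T-SQL, uses a hypothetical table named stats with columns user_id and created, and has no calendar table, so days without any activity (such as 2014-08-11) simply do not appear.

```python
import sqlite3

# In-memory SQLite database standing in for the real log table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (user_id TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO stats (user_id, created) VALUES (?, ?)",
    [
        ("A", "2014-08-10 14:02:53"),
        ("A", "2014-08-12 14:03:25"),
        ("A", "2014-08-13 14:04:47"),
        ("B", "2014-08-13 04:04:47"),
        ("A", "2014-08-17 09:02:53"),
        ("B", "2014-08-17 10:04:47"),
        ("B", "2014-08-18 10:04:47"),
        ("A", "2014-08-19 10:04:22"),
        ("C", "2014-08-19 11:04:47"),
    ],
)

# d: distinct active users per day.
# n: users per "first day seen" (each user's MIN day, then grouped by that day).
# COALESCE covers days on which nobody was new, where the LEFT JOIN yields NULL.
query = """
SELECT d.day,
       COALESCE(n.new_users, 0) AS new_users,
       d.num_users - COALESCE(n.new_users, 0) AS returning_users
FROM (SELECT DATE(created) AS day, COUNT(DISTINCT user_id) AS num_users
      FROM stats
      GROUP BY DATE(created)) d
LEFT JOIN (SELECT first_seen.min_day, COUNT(*) AS new_users
           FROM (SELECT user_id, MIN(DATE(created)) AS min_day
                 FROM stats
                 GROUP BY user_id) first_seen
           GROUP BY first_seen.min_day) n
       ON n.min_day = d.day
ORDER BY d.day
"""
new_vs_returning = conn.execute(query).fetchall()
for row in new_vs_returning:
    print(row)
```

Joining from a calendar table instead of the derived table d, as the first answer suggests, would add the missing days with zeros in both columns.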
I have table like this:
CreateDate | UserID
2012-01-1 | 1
2012-01-10 | 2
2012-01-20 | 3
2012-02-2 | 4
2012-02-11 | 1
2012-02-22 | 2
2012-03-5 | 3
2012-03-13 | 4
2012-03-17 | 5
I need a query that shows the UserIDs created on or after 1 February 2012 that did not exist in the database before 1 February 2012.
From the above example the result must be:
CreateDate | UserID
2012-02-2 | 4
2012-03-13 | 4
2012-03-17 | 5
Can this be done in a single query, without a stored procedure?
You can use a subquery that separately gets the UserIDs that exist before Feb. 1, 2012; the result of that subquery is then joined back to the table itself using a LEFT JOIN, keeping only the rows with no match.
SELECT a.*
FROM tableName a
     LEFT JOIN
     (
         SELECT UserID
         FROM tableName
         WHERE CreateDate < '2012-02-01'
     ) b ON a.userID = b.userID
WHERE a.CreateDate >= '2012-02-01' AND
      b.userID IS NULL
For faster performance, add an INDEX on column userID.
SQL JOIN vs IN performance?
This is one way to do it:
select
CreateDate,
UserID
from Users
where CreateDate >= '2012-02-01'
and UserId not in (
select UserId
from Users
where CreateDate < '2012-02-01'
)
SqlFiddle link: http://www.sqlfiddle.com/#!2/7efae/2
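The two approaches above (LEFT JOIN ... IS NULL and NOT IN) can be compared side by side. The sketch below is only an illustration: it assumes Python's sqlite3 module instead of MySQL, zero-pads the sample dates to ISO format so that they compare correctly as text, uses 2012-02-01 as the cutoff to match the sample data, and the index name idx_users_userid is made up.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (CreateDate TEXT, UserID INTEGER)")
# Hypothetical index name; indexing UserID is the suggestion from the answer.
conn.execute("CREATE INDEX idx_users_userid ON users (UserID)")
conn.executemany(
    "INSERT INTO users (CreateDate, UserID) VALUES (?, ?)",
    [
        ("2012-01-01", 1), ("2012-01-10", 2), ("2012-01-20", 3),
        ("2012-02-02", 4), ("2012-02-11", 1), ("2012-02-22", 2),
        ("2012-03-05", 3), ("2012-03-13", 4), ("2012-03-17", 5),
    ],
)

# Anti-join: keep rows from a that have no partner among the "old" users.
anti_join_sql = """
SELECT a.CreateDate, a.UserID
FROM users a
LEFT JOIN (SELECT DISTINCT UserID
           FROM users
           WHERE CreateDate < '2012-02-01') b
       ON a.UserID = b.UserID
WHERE a.CreateDate >= '2012-02-01'
  AND b.UserID IS NULL
ORDER BY a.CreateDate
"""

# Same filter expressed with NOT IN.
not_in_sql = """
SELECT CreateDate, UserID
FROM users
WHERE CreateDate >= '2012-02-01'
  AND UserID NOT IN (SELECT UserID FROM users WHERE CreateDate < '2012-02-01')
ORDER BY CreateDate
"""

anti_join_rows = conn.execute(anti_join_sql).fetchall()
not_in_rows = conn.execute(not_in_sql).fetchall()
print(anti_join_rows)
print(not_in_rows)
```

Both queries should return the same three rows here. One caveat worth keeping in mind: NOT IN returns no rows at all if the subquery ever yields a NULL UserID, while the anti-join does not have that problem.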
I have a table containing the logging of a web app which tracks when people log in. An example of my table is:
| user_id | date_time |
+---------+------------------+
| 0033 | 2012-11-22 10:33 | <- first login of 0033 on 2012-11-22
| 0034 | 2012-11-22 10:38 | <- first login of 0034 on 2012-11-22
| 0052 | 2012-11-22 10:43 | <- first login of 0052 on 2012-11-22
| 0052 | 2012-11-23 09:23 |
| 0066 | 2012-11-23 15:58 | <- first login of 0066 on 2012-11-23
| 0033 | 2012-11-23 16:14 |
The thing I want is a table with the amount of people that logged in for the first time on each date, i.e.:
| count | date |
+-------+------------+
| 3 | 2012-11-22 | <- there were 3 users that logged in for the first time on 2012-11-22
| 1 | 2012-11-23 |
I know I can get the date only, by doing
SELECT DATE(`date_time`) AS `date`
FROM `logging`
GROUP BY `date`
ORDER BY `date` ASC
I would like to get the second table in one query. I know it's possible, I just don't know how. Thanks in advance.
You can use an uncorrelated subquery to get the first login date for every user and then group those dates together to get the number of first logins per day.
SELECT dd, COUNT(*)
FROM (SELECT MIN(DATE(`date_time`)) AS dd
FROM `logging`
GROUP BY `user_id`) a
GROUP BY dd
ORDER BY dd;
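As a quick check of the "first login date per user, then count per date" approach, here is a minimal sketch on the sample rows from the question. It assumes Python's sqlite3 module in place of MySQL and stores user_id as text to keep the leading zeros.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logging (user_id TEXT, date_time TEXT)")
conn.executemany(
    "INSERT INTO logging (user_id, date_time) VALUES (?, ?)",
    [
        ("0033", "2012-11-22 10:33"),
        ("0034", "2012-11-22 10:38"),
        ("0052", "2012-11-22 10:43"),
        ("0052", "2012-11-23 09:23"),
        ("0066", "2012-11-23 15:58"),
        ("0033", "2012-11-23 16:14"),
    ],
)

# Inner query: one row per user holding that user's earliest login date.
# Outer query: how many users share each earliest date.
query = """
SELECT dd, COUNT(*) AS first_logins
FROM (SELECT MIN(DATE(date_time)) AS dd
      FROM logging
      GROUP BY user_id) a
GROUP BY dd
ORDER BY dd
"""
first_logins_per_day = conn.execute(query).fetchall()
for dd, first_logins in first_logins_per_day:
    print(first_logins, dd)
```

Because each user contributes exactly one row to the inner result, repeated logins on later days cannot inflate the counts.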
Count the number of logins per day, for user_ids that do not have a previous login record:
select DATE(`date_time`) as `date`,
count(user_id)
from `logging` l1
where user_id not in (
select user_id from `logging` l2 where l1.user_id = l2.user_id and l2.date_time < l1.date_time)
group by DATE(`date_time`)
I think you need this:
SELECT count(1) ,
DATE(`date_time`)
from my_table
group by DATE(`date_time`)
If you need the users who logged in on each day:
Select
user_id, DATE(`date_time`) AS `date` from my_table group by DATE(`date_time`), user_id
I have the following table with messages:
+--------+---------+------------+----------+
| msg_id | user_id | m_date     | m_time   |
+--------+---------+------------+----------+
|      1 |       1 | 2011-01-22 | 06:23:11 |
|      2 |       1 | 2011-01-23 | 16:17:03 |
|      3 |       1 | 2011-01-23 | 17:05:45 |
|      4 |       2 | 2011-01-22 | 23:58:13 |
|      5 |       2 | 2011-01-23 | 23:59:32 |
|      6 |       2 | 2011-01-24 | 21:02:41 |
|      7 |       3 | 2011-01-22 | 13:45:00 |
|      8 |       3 | 2011-01-23 | 13:22:34 |
|      9 |       3 | 2011-01-23 | 18:22:34 |
|     10 |       3 | 2011-01-24 | 02:22:22 |
|     11 |       3 | 2011-01-24 | 13:12:00 |
+--------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+
Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is; I hope it doesn't require any explanation.
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns the Cartesian product (CROSS JOIN) of the two tables, which gives me the full required date range for each user. As I'm interested in each pair only once, I use the DISTINCT clause. Try this query with and without it.
Then I use a LEFT JOIN on the two sub-selects.
This join means: first, an INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, for each row in the left-side relation of the join that has no match on the right side, a row is returned with NULLs in the right-side columns (hence the name LEFT JOIN: the left relation is always there, and the right one is expected to have NULLs). This join does what you expect: it returns user_id + date combinations even if there were no messages on the given date for a given user. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right).
coalesce() is used to replace NULL with zero.
I hope this clarifies how this query works.
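The steps described above (build every user/date pair, LEFT JOIN the per-day aggregates, replace NULL with zero) can be run on the small test bed. This is a sketch only, assuming Python's sqlite3 module instead of MySQL; the alias m0 is introduced here to avoid reusing m inside the sub-select.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT)"
)
conn.execute("CREATE TABLE date_range (date_id INTEGER, _date TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2011-01-22", "06:23:11"),
        (2, 1, "2011-01-23", "16:17:03"),
        (3, 1, "2011-01-23", "17:05:05"),
    ],
)
conn.executemany(
    "INSERT INTO date_range VALUES (?, ?)",
    [(1, "2011-01-21"), (1, "2011-01-22"), (1, "2011-01-23"), (1, "2011-01-24")],
)

# p: every (user, date) pair that must appear in the output.
# m: per-user, per-day counts; missing pairs come back as NULL from the
#    LEFT JOIN and are turned into 0 by COALESCE.
query = """
SELECT p.user_id, p._date,
       COALESCE(m.before16, 0) AS b16,
       COALESCE(m.after16, 0) AS a16
FROM (SELECT DISTINCT m0.user_id, dr._date
      FROM messages m0
      CROSS JOIN date_range dr) p
LEFT JOIN (SELECT user_id, _date,
                  SUM(_time <= '16:00') AS before16,
                  SUM(_time > '16:00') AS after16
           FROM messages
           GROUP BY user_id, _date) m
       ON p.user_id = m.user_id AND p._date = m._date
ORDER BY p.user_id, p._date
"""
per_user_per_day = conn.execute(query).fetchall()
for row in per_user_per_day:
    print(row)
```

Dropping the two COALESCE calls would show None instead of 0 for 2011-01-21 and 2011-01-24, which is exactly the gap the LEFT JOIN leaves behind.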
Give this a shot:
select u.user_id, u._date,
       -- sum() is NULL for user/date pairs without messages, so default to 0
       coalesce(sum(m._time <= '16:00'), 0) as before16,
       coalesce(sum(m._time > '16:00'), 0) as after16
from (
    select m.user_id, d._date
    from messages m
    cross join date_range d
    group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
    and u._date=m._date
group by u.user_id, u._date
The inner query is just building a set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume. Otherwise, you just need the left join so that the non-joined records are not removed.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
       -- sum() is NULL for user/date pairs without messages, so default to 0
       coalesce(sum(m.m_time <= '16:00'), 0) as before16,
       coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
    select distinct u.user_id, d.d_date
    from users u
    cross join date_range d
) t
left join messages m on t.user_id = m.user_id
    and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straightforward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the messages table. This is probably where my first explanation fell a little short, since I used the messages table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.
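It is not neat, but if you have a users table, then maybe something like this:
SELECT
    user_id,
    _date,
    SUM(before16) AS before16,
    SUM(after16) AS after16
FROM (
    SELECT
        user_id,
        _date,
        SUM(_time <= '16:00') AS before16,
        SUM(_time > '16:00') AS after16
    FROM messages
    GROUP BY user_id, _date
    UNION ALL
    -- one zero row for every user/date pair, so that no pair is missing
    SELECT
        users.user_id,
        date_range._date,
        0 AS before16,
        0 AS after16
    FROM
        users
        CROSS JOIN date_range
) t
GROUP BY user_id, _date
ORDER BY user_id, _date ASC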
chezy525's solution works great. I ported it to PostgreSQL and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
I ran it on my machine and it worked perfectly.