Retrieve running-total record growth over time in MySQL

I have a Drupal site which has a table that keeps track of users. What I want to do is graph membership growth over time. So I want to massage mysql into returning something like this:
date | # of users (total who have registered up to the given date)
1/1/2014 | 0
1/2/2014 | 2
1/3/2014 | 10
Where '# of users' is the total number of users that have registered accounts up to the given date (running-total)--NOT the number of users who registered on that particular day (which is trivial to retrieve).
Each row of my {users} table has a uid column, a name column, and a created (timestamp) column.
So a sample record from my {users} table would be:
name: John Smith
uid: 526
created: 1365844220

Try:
-- created is a Unix timestamp, so convert it before taking the date
select u.created, count(*)
from (select distinct date(from_unixtime(created)) created from `users`) u
join `users` u2 on u.created >= date(from_unixtime(u2.created))
group by u.created
SQLFiddle here.

I ended up using a solution that incorporates variables, based on a Stack Overflow answer posted here. This solution appears to be a bit more flexible and efficient than other answers provided.
SELECT u.date,
@running_total := @running_total + u.count AS count
FROM (
SELECT COUNT(*) AS count, DATE_FORMAT(FROM_UNIXTIME(created), '%b %d %Y') AS date
FROM {users}
WHERE created >= :start_time AND created <= :end_time
GROUP BY YEAR(FROM_UNIXTIME(created)), MONTH(FROM_UNIXTIME(created)), DAY(FROM_UNIXTIME(created))
) u
JOIN (
SELECT @running_total := u2.starting_total
FROM (
SELECT COUNT(*) as starting_total
FROM {users}
WHERE created < :start_time
) u2
) initialize;
Note that the group by, date formatting, and range requirements are simply specifics of my particular project. A more generic form of this solution (as per the original question) would be:
SELECT u.date,
@running_total := @running_total + u.count AS count
FROM (
SELECT COUNT(*) AS count, DATE(FROM_UNIXTIME(created)) AS date
FROM {users}
GROUP BY date
) u
JOIN (
SELECT @running_total := 0
) initialize;
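MySQL 8.0 and later also support window functions, which express the same running total without user variables. The following is only a minimal sketch of that idea, not the method used above: it runs against an in-memory SQLite database as a stand-in, the sample rows are invented, and the SQLite call date(created, 'unixepoch') plays the role of MySQL's DATE(FROM_UNIXTIME(created)).

```python
import sqlite3

# Hypothetical sample rows: (uid, created), with created as a Unix timestamp (UTC).
SAMPLE_USERS = [
    (1, 1388620800),  # 2014-01-02 00:00:00 UTC
    (2, 1388660400),  # 2014-01-02 11:00:00 UTC
    (3, 1388707200),  # 2014-01-03 00:00:00 UTC
]

# Inner query: registrations per day. Outer query: cumulative sum ordered by day.
RUNNING_TOTAL_SQL = """
SELECT day, new_users,
       SUM(new_users) OVER (ORDER BY day) AS running_total
FROM (
    SELECT date(created, 'unixepoch') AS day, COUNT(*) AS new_users
    FROM users
    GROUP BY date(created, 'unixepoch')
) AS per_day
ORDER BY day
"""


def running_totals(rows):
    """Load the rows into a throwaway table and return (day, new_users, running_total)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, created INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    result = conn.execute(RUNNING_TOTAL_SQL).fetchall()
    conn.close()
    return result


growth = running_totals(SAMPLE_USERS)
for day, new_users, total in growth:
    print(day, new_users, total)
```

Unlike the variable-based query, the ordering of the running total is stated explicitly in the OVER clause, so it does not depend on the order in which the derived table happens to be read.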

I don't know the table structure, so adjust the query to your needs:
SELECT DATE(created), COUNT(*) AS Users FROM users GROUP BY DATE(created)
If you only want to show the dates that have registered users, add
HAVING COUNT(*) > 0
at the end of the query.

Related

How to query GHTorrent's (SQL-like language) for most common languages per country

Based on this question How to query GHTorrent's (SQL-like language) for country/city/users number/repositories number? and the first query here https://ghtorrent.org/gcloud.html, I am trying to write an SQL query that returns the most common coding language per country, ideally per month/year, from the GHTorrent BigQuery database. I have tried to edit the code from this answer https://stackoverflow.com/a/65460166/10624798/, but I fail to get the join right. My ideal outcome would look something like this:
country | Year | Month | Language | Number of commits | total_bytes
US | 2016 | Jan | Python | 10000 | 46789390
CH | 2016 | Jan | Java | 20000 | 5679304
Basically, I am not very good at creating SQL queries.
I checked the two example queries you linked and found that they share a common key, project_id. I modified the second example so that it also returns the project_id and the created_at date of the commits. Then, as you suggested, I formatted created_at into a year and a month so that the results can be grouped by them.
Then I joined the two examples as CTEs and selected only the columns that are needed.
Finally, I used ROW_NUMBER to keep, for every country/year/month, only the row with the largest total_bytes.
WITH ltb as(
select pl3.lang, sum(pl3.size) as total_bytes, pl3.project_id
from (
select pl2.bytes as size, pl2.language as lang, pl2.project_id
from (
select pl.language as lang, max(pl.created_at) as latest, pl.project_id as project_id
from `ghtorrent-bq.ght.project_languages` pl
join `ghtorrent-bq.ght.projects` p on p.id = pl.project_id
where p.deleted is false
and p.forked_from is null
group by lang, project_id
) pl1 join `ghtorrent-bq.ght.project_languages` pl2 on pl1.project_id = pl2.project_id
and pl1.latest = pl2.created_at
and pl1.lang = pl2.language
) pl3
group by pl3.lang, pl3.project_id
order by total_bytes desc
), fprt as(
SELECT country_code, count(*) AS NoOfCommits, c.project_id,
FORMAT_TIMESTAMP("%m", c.created_at)
AS formattedmonth,FORMAT_TIMESTAMP("%b", c.created_at)
AS formattedmonthname, FORMAT_TIMESTAMP("%Y", c.created_at)
AS formattedyear,
FROM `ghtorrent-bq.ght.commits` AS c
JOIN `ghtorrent-bq.ght.users` AS u
ON c.Committer_Id = u.id
WHERE NOT u.fake and country_code is not null
GROUP BY country_code, c.project_id, formattedmonth, formattedyear, formattedmonthname
ORDER BY NoOfCommits DESC
), almst as(
SELECT country_code,formattedmonth, formattedmonthname, formattedyear, lang, NoOfCommits, total_bytes FROM fprt JOIN ltb
on ltb.project_id=fprt.project_id
where country_code is not null
)
SELECT country_code, formattedyear as year, formattedmonthname as month, lang, NoOfCommits, total_bytes
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY country_code, formattedyear, formattedmonth ORDER BY total_bytes DESC) rn
FROM almst
) t
WHERE rn = 1
ORDER BY formattedyear asc, formattedmonth asc
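The last step of that query, keeping one row per group with ROW_NUMBER, can be tried in isolation. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database instead of BigQuery, a hypothetical table named stats, and invented figures. Like the query above, it ranks languages by total_bytes.

```python
import sqlite3

# Hypothetical, pre-aggregated rows: (country_code, lang, total_bytes).
SAMPLE_STATS = [
    ("US", "Python", 300),
    ("US", "Java", 200),
    ("CH", "Java", 500),
    ("CH", "Go", 100),
]

# Number the rows inside each country from largest to smallest total_bytes,
# then keep only the first row of every country.
TOP_LANGUAGE_SQL = """
SELECT country_code, lang, total_bytes
FROM (
    SELECT country_code, lang, total_bytes,
           ROW_NUMBER() OVER (
               PARTITION BY country_code
               ORDER BY total_bytes DESC
           ) AS rn
    FROM stats
) AS ranked
WHERE rn = 1
ORDER BY country_code
"""


def top_language_per_country(rows):
    """Return one (country_code, lang, total_bytes) row per country."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stats (country_code TEXT, lang TEXT, total_bytes INTEGER)")
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP_LANGUAGE_SQL).fetchall()
    conn.close()
    return result


top_languages = top_language_per_country(SAMPLE_STATS)
print(top_languages)
```

Adding year and month columns to the PARTITION BY list gives one winner per country and month, which is the shape the full query uses.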

How to get the number of users who have had one transaction per day?

Here is my query:
select count(1) from
(select count(1) num, user_id from pos_transactions pt
where date(created_at) <= '2020-6-21'
group by user_id
having num = 1) x
It gives me the number of users who have had 1 transaction up to 2020-6-21. Now I also want to group it by date(created_at). That is, I want a list of dates (such as 2020-6-21, 2020-6-22, etc.) together with the number of users who have had 1 transaction on that date (day).
Any idea how I can do that?
EDIT: The result of the query above is correct; the issue is that it is manual right now, because I have to increase 2020-6-21 by hand. I want to automate it. In other words, I want a list of all dates (from 2020-6-21 until now), each with the number of users who have had 1 transaction up to that date.
If you want the number of users who had one transaction on each day, then you need to aggregate by the date as well:
select dte, count(*)
from (select date(created_at) as dte, user_id
from pos_transactions pt
where date(created_at) <= '2020-6-21'
group by dte, user_id
having count(*) = 1
) du
group by dte;
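The EDIT in the question asks for a slightly different figure: for each date, the number of users with exactly one transaction up to and including that date. One possible way to get it is to join every distinct date against all transactions on or before it. The sketch below only illustrates that idea with an in-memory SQLite database and invented rows; the table and column names follow the question, and the join can be expensive on a large table.

```python
import sqlite3

# Hypothetical sample rows: (user_id, created_at).
SAMPLE_TRANSACTIONS = [
    (1, "2020-06-21 09:00:00"),
    (1, "2020-06-23 10:00:00"),
    (2, "2020-06-22 12:00:00"),
    (3, "2020-06-21 08:00:00"),
    (3, "2020-06-21 18:00:00"),
]

# For every date d, count each user's transactions with date <= d,
# keep the users whose count is exactly 1, then count those users per date.
ONE_TRANSACTION_SQL = """
SELECT dte, COUNT(*) AS users_with_one
FROM (
    SELECT d.dte AS dte, pt.user_id AS user_id
    FROM (SELECT DISTINCT date(created_at) AS dte FROM pos_transactions) AS d
    JOIN pos_transactions AS pt ON date(pt.created_at) <= d.dte
    GROUP BY d.dte, pt.user_id
    HAVING COUNT(*) = 1
) AS single
GROUP BY dte
ORDER BY dte
"""


def users_with_one_transaction(rows):
    """Return (date, number of users with exactly one transaction so far)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_transactions (user_id INTEGER, created_at TEXT)")
    conn.executemany("INSERT INTO pos_transactions VALUES (?, ?)", rows)
    result = conn.execute(ONE_TRANSACTION_SQL).fetchall()
    conn.close()
    return result


per_day = users_with_one_transaction(SAMPLE_TRANSACTIONS)
print(per_day)
```

Only dates that have at least one transaction appear in the result, and a date on which no user has exactly one transaction is omitted rather than reported as 0.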

How to write a query to return daily returning users?

I have a very simple table phone_calls with these columns:
id, timestamp, phone_number
How to write a query which returns a daily count of returning users? It should have two columns:
Date, Number of Returning Users
Returning user:
A returning user for any given day D is the one, who has called at least once before D.
A user who has called multiple times on D, but hasn't called before D won't be counted as a returning user.
UPDATE
So here is what I have tried:
SELECT DATE(timestamp) AS date, COUNT(DISTINCT phone_number) AS user_count
FROM phone_calls
WHERE phone_number IN (SELECT phone_number FROM phone_calls GROUP BY phone_number HAVING COUNT(consumer_id) > 1)
GROUP BY DATE(timestamp)
But it's not a correct solution, because it doesn't comply with definition of Returning User mentioned above.
What I am struggling with?
For any given date, how do I filter out those phone numbers from the count, who never dialed in before that day?
SELECT
DATE(timestamp) AS date,
COUNT(DISTINCT phone_number) AS user_count
FROM
phone_calls pc
WHERE EXISTS (
SELECT *
FROM phone_calls pc1
WHERE
pc1.phone_number = pc.phone_number AND
DATE(pc1.timestamp) < DATE(pc.timestamp)
)
GROUP BY DATE(pc.timestamp)
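An equivalent way to state the definition is that a caller is returning on day D when D is later than the day of that caller's first call. The sketch below tries this variant on an in-memory SQLite database with invented rows; it is an alternative formulation for illustration, not a replacement for the EXISTS query above.

```python
import sqlite3

# Hypothetical sample rows: (id, timestamp, phone_number).
SAMPLE_CALLS = [
    (1, "2021-01-01 09:00:00", "A"),
    (2, "2021-01-01 15:00:00", "A"),  # second call on the first day: still not returning
    (3, "2021-01-02 10:00:00", "A"),
    (4, "2021-01-02 11:00:00", "B"),
    (5, "2021-01-03 08:00:00", "A"),
    (6, "2021-01-03 09:00:00", "B"),
    (7, "2021-01-03 10:00:00", "C"),
]

# first_calls holds each number's first calling day; a call counts as
# "returning" only if it happens on a later day than that.
RETURNING_SQL = """
SELECT date(pc.timestamp) AS day,
       COUNT(DISTINCT pc.phone_number) AS returning_users
FROM phone_calls AS pc
JOIN (
    SELECT phone_number, MIN(date(timestamp)) AS first_day
    FROM phone_calls
    GROUP BY phone_number
) AS first_calls ON first_calls.phone_number = pc.phone_number
WHERE date(pc.timestamp) > first_calls.first_day
GROUP BY date(pc.timestamp)
ORDER BY day
"""


def returning_users_per_day(rows):
    """Return (day, number of distinct returning callers) for days that have any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE phone_calls (id INTEGER, timestamp TEXT, phone_number TEXT)")
    conn.executemany("INSERT INTO phone_calls VALUES (?, ?, ?)", rows)
    result = conn.execute(RETURNING_SQL).fetchall()
    conn.close()
    return result


returning = returning_users_per_day(SAMPLE_CALLS)
print(returning)
```

As with the EXISTS version, days without any returning caller (here 2021-01-01) do not appear in the output.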
Update: try this query
Select date(pc.timestamp) AS RDate ,count(*)
from phone_calls pc INNER JOIN phone_calls pcc
on pcc.phone_number=pc.phone_number
and date(DATE_ADD(pcc.timestamp, INTERVAL -1 DAY))= DATE (pc.timestamp) group by DATE (pc.timestamp);

How to write an extensible SQL query to find the users who log in continuously for n days

I have a table (Oracle or MySQL) which stores the dates on which users log in.
How can I write an SQL query (or something else) to find the users who have logged in continuously for n days?
For example:
userID | logindate
1000 2014-01-10
1000 2014-01-11
1000 2014-02-01
1000 2014-02-02
1001 2014-02-01
1001 2014-02-02
1001 2014-02-03
1001 2014-02-04
1001 2014-02-05
1002 2014-02-01
1002 2014-02-03
1002 2014-02-05
.....
We can see that user 1000 has logged in continuously for two days in 2014, user 1001 has logged in continuously for 5 days, and user 1002 has never logged in on consecutive days.
The SQL should be extensible, which means I can pick any value of n and, by modifying the query a little or passing a new parameter, get the expected results.
Thank you!
As we don't know what DBMS you are using (you named both MySQL and Oracle), here are two solutions, both doing the same thing: order the rows and subtract the row number, in days, from the login date (so if the 6th record is 2014-02-12 and the 7th is 2014-02-13, they both result in 2014-02-06). Consecutive days therefore share the same groupday. We group by user and that groupday and count the days. Then we group by user to find the longest series.
Here is a solution for a dbms with analytic window functions (e.g. Oracle):
select userid, max(days)
from
(
select userid, groupday, count(*) as days
from
(
select
userid, logindate - row_number() over (partition by userid order by logindate) as groupday
from mytable
)
group by userid, groupday
)
group by userid
--having max(days) >= 3
And here is a MySQL query (untested, because I don't have MySQL available):
select
userid, max(days)
from
(
select
userid, date_add(logindate, interval -rn day) as groupday, count(*) as days
from
(
select
userid, logindate,
@row_num := @row_num + 1 as rn -- "row_number" is a reserved word in MySQL 8.0
from mytable
cross join (select @row_num := 0) r
order by userid, logindate
) numbered -- MySQL requires an alias on every derived table
group by userid, groupday
) series
group by userid
-- having max(days) >= 3
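The row-number trick used by both queries can be checked outside the database. The sketch below replays it in plain Python on the sample data from the question; the function names longest_streaks and users_with_streak are made up for this illustration, and, unlike the SQL, it assumes that duplicate logins on the same day should be counted once.

```python
from datetime import date, timedelta

# Sample data from the question: (userID, logindate).
LOGINS = [
    (1000, date(2014, 1, 10)), (1000, date(2014, 1, 11)),
    (1000, date(2014, 2, 1)), (1000, date(2014, 2, 2)),
    (1001, date(2014, 2, 1)), (1001, date(2014, 2, 2)),
    (1001, date(2014, 2, 3)), (1001, date(2014, 2, 4)),
    (1001, date(2014, 2, 5)),
    (1002, date(2014, 2, 1)), (1002, date(2014, 2, 3)),
    (1002, date(2014, 2, 5)),
]


def longest_streaks(logins):
    """Map each user to the length of their longest run of consecutive login days."""
    days_by_user = {}
    for user_id, login_day in logins:
        days_by_user.setdefault(user_id, set()).add(login_day)

    result = {}
    for user_id, days in days_by_user.items():
        streak_sizes = {}
        for row_number, day in enumerate(sorted(days), start=1):
            # Consecutive days minus a consecutive row number give the same date.
            group_day = day - timedelta(days=row_number)
            streak_sizes[group_day] = streak_sizes.get(group_day, 0) + 1
        result[user_id] = max(streak_sizes.values())
    return result


def users_with_streak(logins, n):
    """Return the users whose longest streak is at least n days (the HAVING clause)."""
    return sorted(u for u, longest in longest_streaks(logins).items() if longest >= n)


streaks = longest_streaks(LOGINS)
print(streaks)
print(users_with_streak(LOGINS, 3))
```

The parameter n plays the role of the commented-out having max(days) >= 3 line, which is what makes the approach extensible.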
I think the following query will give you a very extensible parametrization:
select z.userid, count(*) continuous_login_days
from
(
with max_dates as
( -- Get max date for every user ID
select t.userid, max(t.logindate) max_date
from test t
group by t.userid
),
ranks as
( -- Get ranks for login dates per user
select t.*,
row_number() over
(partition by t.userid order by t.logindate desc) rnk
from test t
)
-- So here, we select continuous days by checking if rank inside group
-- (per user ID) matches login date compared to max date
select r.userid, r.logindate, r.rnk, m.max_date
from ranks r, max_dates m
where m.userid = r.userid
and r.logindate + r.rnk - 1 = m.max_date -- here is the key
) z
-- Then we only group by user ID to get the number of continuous days
group by z.userid
;
Here is the result:
USERID CONTINUOUS_LOGIN_DAYS
1 1000 2
2 1001 5
3 1002 1
So you can just choose by querying field CONTINUOUS_LOGIN_DAYS.
EDIT : If you want to choose from all ranges (not only the last one), my query structure no longer works because it relied on the last range. But here is a workaround:
with w as
( -- Parameter
select 2 nb_cont_days from dual
)
select *
from
(
select t.*,
-- Get number of days around
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days + 1
and t.logindate) m1,
-- Get also number of days more in the past, and in the future
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days
and t.logindate + 1) m2,
w.nb_cont_days
from w, test t
) x
-- If these 2 fields match, then we have what we want
where x.m1 = x.nb_cont_days
and x.m2 = x.nb_cont_days
order by 1, 2
You just have to change the parameter in the WITH clause, so you can even create a function from this query to call it with this parameter.
SELECT userID,count(userID) as numOfDays FROM LOGINTABLE WHERE logindate between '2014-01-01' AND '2014-02-28'
GROUP BY userID
In this case you can check the login days per user, in a specific period

How to get Cumulative count since month begin in mysql

ID int(11) (NULL) NO PRI (NULL)
CREATED_DATE datetime (NULL) YES (NULL)
Above are some of the fields of my table 'User'. I want the number of users per date and the cumulative count grouped by date. I used the query below in MySQL.
SELECT q1.CREATED_DATE,q1.NO_OF_USER, (@runtot := @runtot + q1.NO_OF_USER) AS CUMM_REGISTRATION FROM (SELECT date(CREATED_DATE) AS CREATED_DATE,
COUNT(ID) AS NO_OF_USER FROM USER,(SELECT @runtot:=0) AS n GROUP BY CREATED_DATE ORDER BY CREATED_DATE) AS q1
This works fine. Now I want one additional column: the cumulative user count since 1 August. Is it possible to fetch this by modifying the query above, or is it better to handle it in code? Please suggest.
You can do this in the query itself by adding another variable:
SELECT q1.CREATED_DATE, q1.NO_OF_USER,
(@runtot := @runtot + q1.NO_OF_USER) AS CUMM_REGISTRATION,
-- assign only inside the IF branch, so earlier rows do not reset @Aug1tot to NULL
if(q1.CREATED_DATE >= date('2013-08-01'), @Aug1tot := @Aug1tot + q1.NO_OF_USER, NULL) as CUMM_SINCE_Aug1
FROM (SELECT date(CREATED_DATE) AS CREATED_DATE,
COUNT(ID) AS NO_OF_USER
FROM USER cross join
(SELECT @runtot:=0, @Aug1tot := 0) n
GROUP BY date(CREATED_DATE)
ORDER BY date(CREATED_DATE)
) AS q1;
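On MySQL 8.0 or later, the same two cumulative columns can also be written with window functions and a CASE expression instead of variables. The sketch below is only an illustration under stated assumptions: it runs on an in-memory SQLite database, the sample dates are invented, and rows before the cut-off show 0 rather than NULL in the second column.

```python
import sqlite3

# Hypothetical sample rows: (ID, CREATED_DATE).
SAMPLE_USERS = [
    (1, "2013-07-30 10:00:00"),
    (2, "2013-07-30 11:00:00"),
    (3, "2013-07-31 09:00:00"),
    (4, "2013-08-01 08:00:00"),
    (5, "2013-08-01 12:00:00"),
    (6, "2013-08-01 20:00:00"),
    (7, "2013-08-02 07:00:00"),
]

# Two running sums over the per-day counts: one over every day, and one in
# which days before the :since date contribute 0.
CUMULATIVE_SQL = """
SELECT day, no_of_user,
       SUM(no_of_user) OVER (ORDER BY day) AS cumm_registration,
       SUM(CASE WHEN day >= :since THEN no_of_user ELSE 0 END)
           OVER (ORDER BY day) AS cumm_since
FROM (
    SELECT date(created_date) AS day, COUNT(id) AS no_of_user
    FROM user
    GROUP BY date(created_date)
) AS q1
ORDER BY day
"""


def cumulative_counts(rows, since):
    """Return (day, daily count, cumulative count, cumulative count since `since`)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, created_date TEXT)")
    conn.executemany("INSERT INTO user VALUES (?, ?)", rows)
    result = conn.execute(CUMULATIVE_SQL, {"since": since}).fetchall()
    conn.close()
    return result


counts = cumulative_counts(SAMPLE_USERS, "2013-08-01")
for row in counts:
    print(row)
```

Because the cut-off date is a bound parameter, the same query serves any "since" date without editing the SQL.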
MySQL has a MONTH(yourDate) function; for example, MONTH('2013-08-15') returns 8.
You could either group your results by MONTH(yourDate), or filter your original data for MONTH(yourDate) = 8. Be careful if your data spans multiple years, as all dates where the month is 8 will be accumulated together. In that case, add a sorting/filtering criterion based on YEAR(yourDate).