Select distinct but group them by 24 hours? - mysql

Alright so I'm creating articles. Everytime an user opens the article, their IP, article id and date gets saved into MySQL.
$date = time();
$sql = "INSERT INTO views(ip,article,date) VALUES(?,?,?)";
$stmt = $mysql->prepare($sql);
$stmt->bind_param('sss', $_SERVER['REMOTE_ADDR'],$_GET['id'],$date);
$stmt->execute();
Now, I want to get unique views and I know how to do this with DISTINCT. But this isn't entirely what I want to do. It will get unique views of all time, but I want each user's view to count for every 24 hours.
Say I'm person A and there's person B. We both view the article. Total unique views will be 2. Tomorrow I view the article again, it'll still show 2 views, but I want it to show 3 views because it has been 24 hours.
I can do this by checking if they have viewed the article therefore not adding their view count to the database, but I don't want to do this. I need an alternative way.
Thank you.

You need a GROUP BY with COUNT, but if the date is of data type datetime, then you need to group by the date part only, like this:
SELECT CAST(date AS DATE) AS Date, article, COUNT(DISTINCT ip) AS TotalViews
FROM views
WHERE article = $articleId
AND DATE>= now() - INTERVAL 1 DAY
GROUP BY CAST(date AS DATE), article
LIMIT 0, 25;

I want each user's view to count for every 24 hours.
Just use two arguments to count(distinct):
select article, count(distinct ip, date(date))
from v
group by article;
If you want a separate result for each day, then put the date logic in the group by:
select article, date(date), count(distinct ip)
from v
group by article, date(date);
However, that doesn't seem to be the question you are asking.

Related

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get array of users (array of userid) who placed their 25th order during specified period (for example in May 2019), date of 25th order for each user, number of days to place 25th order (difference between registration date for user and date of 25th order placed).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How I can do this with MySQL request?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
One request is not required - if this can't be done in one request, few is possible and allowed.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND o2.date < o1.date
OR o2.date = o1.date
AND o2.id < o1.id) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
Select all rows, order by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
from this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1)
from this, select all plus a field that equals itself plus one in case the first added field is 1, else 0
from this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "an user's 25th command" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET #PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
#RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE #RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN #PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
#PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
Now sometimes you will have to use an API or something and it wont let you keep variable values in memory unless you commit it or some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
#RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE #RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN #PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
#PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (#PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in mysql. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work thinks out.

Get amount of active user of the last n days grouped by date

Suppose I have a Hive table logins with the following columns:
user_id | login_timestamp
I'm now interested in getting some activity KPIs. For instance, daily active user:
SELECT
to_date(login_timestamp) as date,
COUNT(DISTINCT user_id) daily_active_user
FROM
logins
GROUP BY to_date(login_timestamp)
ORDER BY date asc
Changing it from daily active to weekly/monthly active is not a great deal because I can just exchange the to_date() function to get the month and then group by that value.
What I now want to get is the distinct amount of user who were active in the last n days (e.g. 3) grouped by date. Additionally, what I'm looking for is a solution that works for a variable time window and not only for one day (getting the amount of active user of the last 3 days on day x only would be easy).
The result is supposed to like somewhat like this:
date, 3d_active_user
2017-12-01, 111
2017-12-02, 234
2017-12-03, 254
2017-12-04, 100
2017-12-05, 103
2017-12-06, 103
2017-12-07, 230
Using a subquery in the first select (e.g. select x, (select max(x) from x) as y from z) building a workaround for the moving time window is not possible because it is not supported by the Hive version I'm using.
I tried my luck something like COUNT(DISTINCT IF(DATEDIFF(today,login_date)<=3,user_id,null)) but everything I tried so far is not working.
Do you have any idea on how to solve this issue?
Any help appreciated!
You can user "BETWEEN" function.
If you want to find the active users, log in from the particular date to till now.
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN startDate_timeStamp AND now()
GROUP BY to_date(login_timestamp)
ORDER BY date asc
If you want the active users, who are log in users for specific date range then:
NOTE:-
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN to_date(startDate_timeStamp) AND to_date(endDate_timeStamp)
GROUP BY to_date(login_timestamp)
ORDER BY date asc

Trying to figure out SQL query for monthly user churn based on an activity threshold

I have a table (we're on InfoBright columnar storage and I use MySQL Workbench as my interface) that essentially tracks users and a count of activities with a datestamp. It's a daily aggregate table. Schema is essentially
userid (int)
activity_count (int)
date (date)
What I'm trying to find is how many of my users are churning from month to month, with a basis of an active user defined as one with a monthly activity count that sums up to > 10
To find how many users are active in a given month I am currently using
select year, month, count(distinct user) as users
from
(
select YEAR(date) as year, MONTH(date) as month, userid as user, sum(activity_count) as activity
from table
group by YEAR(date), MONTH(date), userid
having activity > 10
order by YEAR(date), MONTH(date)
) t1
group by year, month
Not being a SQL expert, I am sure this can be improved and would appreciate the input on that.
My bigger goal though is to figure out from month to month, how many of the users who are in this count are new or repeat from the previous month. I don't know how to do that without what feels like ugly nesting or joining, and I feel like it should be fairly simple.
Thanks in advance.
I think that further nesting is the best way to achieve this. I would look to do something like selecting the user for the min concatenated Year & Month as a middle layer to the above (i.e. between outer and inner queries) so that you can establish the first month that the user became active. You can then add a where clause to the outer query to filter so that only the months you require are showing. Let me know if you need help with the syntax.

MySQL group by calculated field

I have members that have to pay for their membership. And I store: payment date, and membership length (they can pay for 1 month or several).
Now I'd like to know which payments are overdue, or soon to be.
My logic was: get an expiration date for each membership (last payment date + membership length) and then just look at the highest value of that for each member.
Here's my query, but I did want to explain my reasoning, as you may want to question that or even the format of the DB (but please don't ask me to store the expiration date).
SELECT tbl.company AS company,
MAX(ADDDATE( paydate, INTERVAL paylength MONTH ))) AS expiration,
tbl.id
FROM tblPayments
JOIN tbl ON tblPayments.comp_id = tbl.id
GROUP BY expiration
ORDER BY expiration ASC
I've read that grouping by calculated fields may not be possible, but my knowledge of MySQL is not strong enough to understand the workarounds. I'd appreciate any help you can provide! Thanks!
You are grouping by the aggregated result of your group which is not possible and not what you intended. Based on your explanation I would think you are trying to do this
SELECT
tbl.company AS company,
MAX(ADDDATE( paydate, INTERVAL paylength MONTH )) AS expiration,
tbl.id
FROM tblPayments
JOIN tbl
ON tblPayments.comp_id = tbl.id
GROUP BY tbl.company , tbl.id
HAVING expiration < NOW()
ORDER BY expiration ASC
And #dleiftah brings up a good point, you cannot always use the alias although MySQL seems to let you in a GROUP BY whereas MSSQL never does. I forget exactly when...

Group results by period

I have some data which I want to retrieve, but I want to have it grouped by a specific number of seconds. For example if my table looks like this:
| id | user | pass | created |
The created column is INT and holds a timestamp (number of seconds from 1970).
I would want the number of users that are created between last month and the current date, but show them grouped by let's say 7*24*3600 (a week). So if in the range there are 1000 new users, have them show up how many registered each week (100 the first week, 450 the second, 50 the third and 400 the 4th week -- something like this).
I've tried grouping the results by created / 7*24*3600, but that's not working.
How should my query look like?
You need to use integer division div otherwise the result will turn into a real and none of the weeks will resolve to the same value.
SELECT
(created div (7*24*60*60)) as weeknumber
, count(*) as NewUserCount
FROM users
WHERE weeknumber > 1
GROUP BY weeknumber
See: http://dev.mysql.com/doc/refman/5.0/en/arithmetic-functions.html
You've got to keep the integer part only of that division. You can do it with the floor() function.
Have you tried select floor(created/604800) as week_no, count(*) from users group by floor(created/604800) ?
I assume you've got the "select users created in the last month" part sorted out.
Okay here are the possible options you may try:
GROUP BY DAY
select count(*), DATE_FORMAT(created_at,"%Y-%m-%d") as created_day FROM widgets GROUP BY created_day
GROUP BY MONTH
select count(*), DATE_FORMAT(created_at,"%Y-%m") as created_month FROM widgets GROUP BY created_month
GROUP BY YEAR
select count(*), DATE_FORMAT(created_at,"%Y") as created_year FROM widgets GROUP BY created_year