MySql Sum of Count If - mysql

I have a table with every login by all users.
I want to run a query that will pull the number of times each user logs in but limit it to 4 if the user logged in more than 4 times on a day.
And then do a sum to get the total number of logins.
Beyond that, I want to find the time frame that a given total covers. If I specify a total of 100 logins, the query should count back from today, capping each user at 4 logins per day, and return the earliest date needed to reach that total.
My query so far to get the list of totals limited to 4 per user:
SELECT (case when (count(l.user_id) > 4) then 4 else count(l.user_id) end) as cappedcount
FROM `logins` l
where l.store_id = 908
and l.login_dt > '2018-04-17 00:00:00' and l.login_dt < '2018-04-18 23:59:59'
group by l.user_id order by cappedcount desc
I'm specifying the date range at the moment but don't want to do that in the final query.

If I understand correctly, you only want to look at the last four logins per user and day and ignore their earlier logins. From this set you want the last 100 logins.
So the first task is to get the four last logins per user and day. This would usually be solved with window functions, but MySQL versions before 8.0 don't support them. So count in a subquery instead (which may take long):
select *
from logins
where
(
select count(*)
from logins later
where later.user_id = logins.user_id
and date(later.login_dt) = date(logins.login_dt)
and later.login_dt > logins.login_dt
) < 4
order by login_dt desc
limit 100;
I suggest creating the following index for this query:
create index idx_logins on logins (user_id, login_dt);
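To check the per-user, per-day cutoff on a small data set, the query above can be run almost unchanged against SQLite from Python. This is only a sketch: SQLite stands in for MySQL here, and the function name, parameters and sample rows are made up for illustration.

```python
import sqlite3

def latest_logins(rows, per_day=4, limit=100):
    """Return the newest `limit` logins, keeping at most `per_day` per user and day.

    rows: iterable of (user_id, 'YYYY-MM-DD HH:MM:SS') tuples (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_dt TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT user_id, login_dt
        FROM logins
        WHERE (
            -- how many later logins does this user have on the same day?
            SELECT COUNT(*)
            FROM logins AS later
            WHERE later.user_id = logins.user_id
              AND date(later.login_dt) = date(logins.login_dt)
              AND later.login_dt > logins.login_dt
        ) < ?
        ORDER BY login_dt DESC
        LIMIT ?
        """,
        (per_day, limit),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample: user 1 logs in six times on one day, user 2 once.
sample = [(1, f"2018-04-17 0{h}:00:00") for h in range(6)]
sample.append((2, "2018-04-17 12:00:00"))
for row in latest_logins(sample):
    print(row)
```

Only the four most recent logins of user 1 survive the filter, because each of the two earliest rows has four or more later logins on the same day.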

Which version of MySQL do you use? As far as I know, the WITH clause is only supported in recent versions of MySQL.
I believe the answer to your first request is something like :
select sum(cntx) from (
select user_id, date(login_time), least(count(*), 4) cntx
from logins
where login_time between '2018-04-10 00:00:00' and '2018-04-17 00:00:00'
group by user_id, date(login_time)
) x
as you can see on sqlfiddle.com.
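For a quick check without a MySQL server, the capped sum can be reproduced with Python's sqlite3 module. This is a sketch under assumptions: SQLite's two-argument min() stands in for MySQL's least(), the date filter is left out, and the function name and sample rows are invented for the example.

```python
import sqlite3

def capped_login_total(rows, cap=4):
    """Sum logins per user and day, counting at most `cap` per user per day.

    rows: iterable of (user_id, 'YYYY-MM-DD HH:MM:SS') tuples (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_time TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    total = conn.execute(
        """
        SELECT COALESCE(SUM(cntx), 0) FROM (
            -- scalar min(a, b) in SQLite plays the role of least(a, b) in MySQL
            SELECT min(COUNT(*), ?) AS cntx
            FROM logins
            GROUP BY user_id, date(login_time)
        ) AS x
        """,
        (cap,),
    ).fetchone()[0]
    conn.close()
    return total

# Hypothetical sample: six logins by user 1 on one day, plus two single logins.
sample = [(1, f"2018-04-17 0{h}:00:00") for h in range(6)]
sample += [(1, "2018-04-18 09:00:00"), (2, "2018-04-17 10:00:00")]
print(capped_login_total(sample))
```

The six same-day logins contribute 4 to the total and the other two rows contribute 1 each.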
For your second question I have the following answer. I believe it's not the best solution, but it works on MySQL 5.6. In MySQL 8 you can use the WITH clause, which provides a better solution for this question. I use views here to avoid repeating queries:
create view xlogins as
select user_id, date(login_time) xdt, least(count(*), 4) xcnt
from logins
group by user_id, date(login_time);
create view xxlogins as
select distinct xdt, (select sum(x2.xcnt)
from xlogins x2
where x2.xdt >= x1.xdt) sumx
from xlogins x1;
select min(x1.xdt)
from xxlogins x1
join xxlogins x2 on x1.xdt < x2.xdt
where x1.sumx >= 100
and x2.sumx <= 100
You can find the solution on sqlfiddle.com; I've just changed the 100 to 10 there.
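If the counting can happen in application code, the "walk back from the latest day until the capped total is reached" idea might be sketched in plain Python as below. It is not a translation of the views above: the function name, the (user_id, datetime) input layout and the choice to return None when the target is never reached are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date, datetime

def earliest_date_for_target(logins, target, cap=4):
    """Walk back from the most recent day, adding capped per-user daily counts,
    and return the day on which the running total first reaches `target`.

    logins: iterable of (user_id, datetime) pairs (assumed layout).
    Returns None if all logins together stay below `target`.
    """
    # Raw count per (user, day).
    per_user_day = Counter((user_id, ts.date()) for user_id, ts in logins)
    # Capped count per day, summed over users.
    per_day = Counter()
    for (_user_id, day), n in per_user_day.items():
        per_day[day] += min(n, cap)
    running = 0
    for day in sorted(per_day, reverse=True):
        running += per_day[day]
        if running >= target:
            return day
    return None

# Hypothetical sample data.
sample = [(1, datetime(2018, 4, 18, h)) for h in range(5)]   # capped to 4
sample += [(2, datetime(2018, 4, 18, 9))]
sample += [(1, datetime(2018, 4, 17, h)) for h in range(3)]
print(earliest_date_for_target(sample, target=7))
```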

Related

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get an array of users (an array of userid values) who placed their 25th order during a specified period (for example, May 2019), along with the date of the 25th order for each user and the number of days it took to place it (the difference between the user's registration date and the date of the 25th order).
For example, if a user registered in April 2018, placed 20 orders in 2018, and then placed orders 21 to 30 in January to May 2019, this user should be in the array if the 25th order (overall for the account) was placed in May 2019.
How can I do this with a MySQL request?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can look for the 3rd order instead of the 25th, to avoid adding a lot of sample records).
A single request is not required: if this can't be done in one request, a few are fine.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
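The counting logic can be tried on a handful of rows with Python's sqlite3 module. This is a hedged sketch, not the MySQL query itself: SQLite has no datediff(), so julianday() arithmetic is used instead, and the function name, parameters and sample rows are hypothetical. Note the parentheses around the date/id comparison, which keep the user_id filter applied to both branches of the OR.

```python
import sqlite3

def nth_order_in_period(users, orders, n, start, end):
    """Return (user_id, order_date, days_since_registration) for every user
    whose n-th order falls in the half-open range [start, end).

    users:  iterable of (userid, 'YYYY-MM-DD')           (assumed layout)
    orders: iterable of (id, 'YYYY-MM-DD', user_id)      (assumed layout)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (userid INTEGER, registered TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, date TEXT, user_id INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    rows = conn.execute(
        """
        SELECT o1.user_id, o1.date,
               CAST(julianday(o1.date) - julianday(u1.registered) AS INTEGER)
        FROM orders o1
        JOIN users u1 ON u1.userid = o1.user_id
        WHERE (SELECT COUNT(*)
               FROM orders o2
               WHERE o2.user_id = o1.user_id
                 AND (o2.date < o1.date
                      OR (o2.date = o1.date AND o2.id < o1.id))) = ?
          AND o1.date >= ? AND o1.date < ?
        ORDER BY o1.user_id
        """,
        (n - 1, start, end),   # n-1 earlier orders means this is the n-th
    ).fetchall()
    conn.close()
    return rows

# Hypothetical sample: look for each user's 3rd order in May 2019.
users = [(1, "2019-04-30"), (2, "2019-01-01")]
orders = [(1, "2019-05-01", 1), (2, "2019-05-02", 1), (3, "2019-05-10", 1),
          (4, "2019-05-10", 2), (5, "2019-05-10", 2), (6, "2019-06-03", 2)]
print(nth_order_in_period(users, orders, 3, "2019-05-01", "2019-06-01"))
```

In this sample, user 2's third order falls in June, so only user 1 is returned.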
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
1. Select all rows, ordered by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
2. From this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1).
3. From this, select all plus a field that equals itself plus one in case the first added field is 1, else 0.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
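Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start with the innermost SELECT and work your way outward; this version fuses points 1 & 2 of my explanation. Note that RANK is a reserved word as of MySQL 8.0, so on those versions the alias needs backticks or a different name.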
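For anyone who would rather do the ranking in application code, as suggested earlier in this answer, the same "sort, then number the rows within each user" idea can be sketched in Python. The function name and the (id, date_, user_id) tuple layout are assumptions for the example, not part of the original schema.

```python
from itertools import groupby
from operator import itemgetter

def nth_orders(orders, n):
    """Return [(user_id, date_)] holding each user's n-th order.

    orders: iterable of (id, date_, user_id) tuples (assumed layout).
    Users with fewer than n orders are skipped.
    """
    # Same ordering as the SQL: user_id, then date_, then id as a tie-breaker.
    ordered = sorted(orders, key=lambda o: (o[2], o[1], o[0]))
    result = []
    for user_id, group in groupby(ordered, key=itemgetter(2)):
        # enumerate() plays the role of the per-user rank counter variable.
        for rank, (_order_id, date_, _uid) in enumerate(group, start=1):
            if rank == n:
                result.append((user_id, date_))
                break
    return result

# Hypothetical sample rows.
sample = [(1, "2019-04-01", 7), (2, "2019-04-02", 7), (3, "2019-04-09", 7),
          (4, "2019-04-03", 8), (5, "2019-04-04", 8)]
print(nth_orders(sample, 3))
```

The date-range condition can then be applied to the returned pairs, just as the outer WHERE does in the query.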
Sometimes you will have to go through an API or similar that won't keep variable values in memory between statements (unless you commit, or because of some other restriction), and you'll need to do it all in one query. To do that, put the initialization one level lower and make sure it does not affect the outer statements. IMO the best way to do this is a UNION with a fake table whose only row is excluded. You avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.

create a ranking and statistics with repeated database records

Today I want to get a help in creating scores per user in my database. I have this query:
SELECT
r1.id,
r1.nickname,
r1.fecha,
r1.bestia1,
r1.bestia2,
r1.bestia3,
r1.bestia4,
r1.bestia5
FROM
reporte AS r1
INNER JOIN
( SELECT
nickname, MAX(fecha) AS max_date
FROM
reporte
GROUP BY
nickname ) AS latests_reports
ON latests_reports.nickname = r1.nickname
AND latests_reports.max_date = r1.fecha
ORDER BY
r1.fecha DESC
That's from a friend on this site who helped me get "the last record per user in each day". Based on this, I am looking for a way to total the results into a daily, weekly or monthly ranking, in order to use them in statistics charts or Google Data Studio. I've tried the following:
select id, nickname, sum(bestia1), sum(bestia2), etc...
But it's not giving the complete result that I want, which is why I am looking for help. I also know about Data Studio filters, which let me show many charts, but I still can't get complete counts.
For example, one player reported 265 monsters killed in the last 30 days, but when I use my query in Data Studio it counts only the latest value (which can be 12). So I want to count correctly in order to use the result in charts.
SQL records filtered with my query:
One general approach to get the total monsters killed by each user over the latest X days, and to calculate a score like the one you propose in the comments, looks like this:
SET @daysOnHistory = X; -- Where X should be a positive integer (like 10).
SELECT
nickname,
SUM(bestia1) AS total_bestia1_killed,
SUM(bestia2) AS total_bestia2_killed,
SUM(bestia3) AS total_bestia3_killed,
SUM(bestia4) AS total_bestia4_killed,
SUM(bestia5) AS total_bestia5_killed,
SUM(bestia1 + bestia2 + bestia3 + bestia4 + bestia5) AS total_monsters_killed,
SUM(bestia1 + 2 * bestia2 + 3 * bestia3 + 4 * bestia4 + 5 * bestia5) AS total_score
FROM
reporte
WHERE
fecha >= DATE_ADD(DATE(NOW()), INTERVAL -@daysOnHistory DAY)
GROUP BY
nickname
ORDER BY
total_score DESC
Now, if you want the same calculation but only taking into account the days of the current week (assuming a week starts on Monday), replace the previous WHERE clause with this one:
WHERE
fecha >= DATE_ADD(DATE(NOW()), INTERVAL -WEEKDAY(NOW()) DAY)
Furthermore, if you want the same calculation restricted to the days of the current month, replace the WHERE clause with the following (the YEAR condition keeps the same month of earlier years out):
WHERE
MONTH(fecha) = MONTH(NOW()) AND YEAR(fecha) = YEAR(NOW())
To evaluate the statistics over the days of the current year, replace the WHERE clause with:
WHERE
YEAR(fecha) = YEAR(NOW())
And finally, for evaluation on a specific range of days you can use, for example:
WHERE
DATE(fecha) BETWEEN CAST('2018-10-15' AS DATE) AND CAST('2018-11-10' AS DATE)
I hope this guide will help you and clarify your outlook.
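To see the totals and the weighted score on a few rows, the aggregation can be run through Python's sqlite3 module. This is only a sketch: SQLite stands in for MySQL, the cutoff date is passed in explicitly instead of being derived from NOW(), and the function name and sample rows are invented for the example.

```python
import sqlite3

def ranking(rows, since):
    """Return (nickname, total_monsters_killed, total_score) per player,
    using only reports with fecha >= since, best score first.

    rows: iterable of (nickname, fecha, bestia1..bestia5) tuples (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reporte (nickname TEXT, fecha TEXT, bestia1 INTEGER, "
        "bestia2 INTEGER, bestia3 INTEGER, bestia4 INTEGER, bestia5 INTEGER)"
    )
    conn.executemany("INSERT INTO reporte VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT nickname,
               SUM(bestia1 + bestia2 + bestia3 + bestia4 + bestia5),
               SUM(bestia1 + 2 * bestia2 + 3 * bestia3 + 4 * bestia4 + 5 * bestia5)
                   AS total_score
        FROM reporte
        WHERE fecha >= ?
        GROUP BY nickname
        ORDER BY total_score DESC
        """,
        (since,),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample reports; the September row falls outside the cutoff.
sample = [
    ("ana", "2018-11-01", 1, 0, 0, 0, 1),
    ("ana", "2018-11-02", 2, 1, 0, 0, 0),
    ("bob", "2018-11-02", 0, 0, 0, 0, 3),
    ("bob", "2018-09-01", 9, 9, 9, 9, 9),
]
for row in ranking(sample, "2018-10-15"):
    print(row)
```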
This will give you the number of monsters killed in the last 30 days per user:
SELECT
nickname,
sum(bestia1) as bestia1,
sum(bestia2) as bestia2,
sum(bestia3) as bestia3,
sum(bestia4) as bestia4,
sum(bestia5) as bestia5
FROM
reporte
WHERE fecha >= DATE_ADD(curdate(), interval -30 day)
GROUP BY nickname;

Get amount of active user of the last n days grouped by date

Suppose I have a Hive table logins with the following columns:
user_id | login_timestamp
I'm now interested in getting some activity KPIs. For instance, daily active user:
SELECT
to_date(login_timestamp) as date,
COUNT(DISTINCT user_id) daily_active_user
FROM
logins
GROUP BY to_date(login_timestamp)
ORDER BY date asc
Changing it from daily active to weekly/monthly active is no big deal because I can just exchange the to_date() function to get the month and then group by that value.
What I now want to get is the number of distinct users who were active in the last n days (e.g. 3), grouped by date. Additionally, I'm looking for a solution that works for a variable time window and not only for one day (getting the number of users active in the last 3 days on day x only would be easy).
The result is supposed to look something like this:
date, 3d_active_user
2017-12-01, 111
2017-12-02, 234
2017-12-03, 254
2017-12-04, 100
2017-12-05, 103
2017-12-06, 103
2017-12-07, 230
Using a subquery in the first select (e.g. select x, (select max(x) from x) as y from z) building a workaround for the moving time window is not possible because it is not supported by the Hive version I'm using.
I tried my luck with something like COUNT(DISTINCT IF(DATEDIFF(today,login_date)<=3,user_id,null)), but nothing I have tried so far works.
Do you have any idea on how to solve this issue?
Any help appreciated!
You can use the BETWEEN operator.
If you want the active users who logged in from a particular date until now:
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN startDate_timeStamp AND now()
GROUP BY to_date(login_timestamp)
ORDER BY date asc
If you want the active users who logged in within a specific date range:
SELECT to_date(login_timestamp) as date,COUNT(DISTINCT user_id) daily_active_user
FROM logins
WHERE login_timestamp BETWEEN to_date(startDate_timeStamp) AND to_date(endDate_timeStamp)
GROUP BY to_date(login_timestamp)
ORDER BY date asc
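The queries above still count per calendar day. The rolling window asked about in the question (distinct users over the last n days, for every date) can at least be pinned down in plain Python, which may help when checking whatever Hive query is eventually used. This is a sketch under assumptions: the function name and the (user_id, date) input layout are hypothetical, and only dates that have at least one login get a row.

```python
from datetime import date, timedelta

def rolling_active_users(logins, window_days=3):
    """For each date with logins, count the distinct users seen on that date
    or on the window_days - 1 days before it.

    logins: iterable of (user_id, date) pairs (assumed layout).
    Returns a dict mapping date -> distinct user count.
    """
    users_by_day = {}
    for user_id, day in logins:
        users_by_day.setdefault(day, set()).add(user_id)
    result = {}
    for day in sorted(users_by_day):
        active = set()
        # Union the user sets of the current day and the preceding days.
        for offset in range(window_days):
            active |= users_by_day.get(day - timedelta(days=offset), set())
        result[day] = len(active)
    return result

# Hypothetical sample data.
sample = [(1, date(2017, 12, 1)), (2, date(2017, 12, 1)), (1, date(2017, 12, 2)),
          (3, date(2017, 12, 3)), (4, date(2017, 12, 5))]
for day, count in rolling_active_users(sample).items():
    print(day, count)
```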

How to use group by, count, and where in MYSQL

I have a table, described like so:
Table1
id (int),
link (varchar512),
text (varchar80),
status (varchar10),
created (timestamp),
updated (timestamp),
user (varchar)
What I need to do is get the total count of rows per user between two timestamps.
So, for example, let's say I want to get the total number of rows for users in the database. That is just a simple
SELECT user, COUNT(*) FROM table_name GROUP BY user;
If I want to get all the rows, for say October, I can do:
SELECT * FROM table_name WHERE created > "2016-10-01 00:00:00" and created < "2016-11-31 23:59:59"
My problem is that I cannot combine the two. I try, and I get syntax errors. I think that I need to run the WHERE query and then do a count based on that, but I'm not sure how to do that.
Hope this helps.
SELECT user, count(*)
FROM table_name
WHERE created > "2016-10-01 00:00:00" and created < "2016-11-31 23:59:59"
GROUP BY user;
SELECT user, COUNT(*)
FROM table_name
WHERE created >= '2016-10-01'
and created < '2016-12-01'
GROUP BY user;
BTW there is no date 2016-11-31 since November has only 30 days.
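The half-open range (>= start, < end) can be tried out with Python's sqlite3 module. This is a sketch, assuming October only (an end bound of '2016-11-01') and timestamps stored as 'YYYY-MM-DD HH:MM:SS' text, which SQLite compares as strings; the function name and sample rows are made up.

```python
import sqlite3

def rows_per_user(rows, start, end):
    """Count rows per user with start <= created < end.

    rows: iterable of (user, 'YYYY-MM-DD HH:MM:SS') tuples (assumed layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (user TEXT, created TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT user, COUNT(*)
        FROM table_name
        WHERE created >= ? AND created < ?   -- half-open: end is excluded
        GROUP BY user
        ORDER BY user
        """,
        (start, end),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample rows around the month boundaries.
sample = [
    ("a", "2016-10-01 00:00:00"),
    ("a", "2016-10-31 23:59:59"),
    ("a", "2016-11-01 00:00:00"),
    ("b", "2016-09-30 23:59:59"),
    ("b", "2016-10-15 12:00:00"),
]
print(rows_per_user(sample, "2016-10-01", "2016-11-01"))
```

With the upper bound written as the first day of the next month, there is no need to know how many days the month has or to spell out 23:59:59.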

MySQL checking if user last logins timestamps are always less than session timestamps

Hopefully, the image/diagram helps explain what I'm trying to do...
I've been going round and round and nothing seems to work. This is my most recent attempt:
SELECT * FROM sessions
(
SELECT sessions.timestamp AS stimestamp
users.last_login AS ulastlogin
FROM sessions, users
WHERE sessions.user_id = users.user_id
ORDER BY sessions.timestamp DESC LIMIT 1
)
WHERE ulastlogin < stimestamp;
I'd like to have a SQL query to check to make sure that users' last_login timestamps are always larger (more recent) than the actual user sessions...
Someone at work helped me and I ended up just using this... Thanks D.B.!!!
I did have to de-dupe in Excel, but other than that, it seemed to do the trick:
SELECT * FROM users, sessions
WHERE sessions.timestamp > unix_timestamp() - 3600*24
AND users.user_id = sessions.user_id
AND users.last_login < sessions.timestamp;
It grabs what I need for the last 24 hours, so I can do sub-selections by date range.
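One way to avoid the manual de-duplication might be to compare each user's last_login with only their latest session, using GROUP BY and HAVING. The sketch below runs that idea in SQLite through Python. It assumes both columns hold Unix epoch integers (the query above compares against unix_timestamp()), leaves out the 24-hour filter, and uses a made-up function name and sample rows.

```python
import sqlite3

def stale_last_logins(users, sessions):
    """Return (user_id, last_login, latest_session) for users whose last_login
    is older than their most recent session, one row per user.

    users:    iterable of (user_id, last_login)   epoch seconds (assumed)
    sessions: iterable of (user_id, timestamp)    epoch seconds (assumed)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER, last_login INTEGER)")
    conn.execute("CREATE TABLE sessions (user_id INTEGER, timestamp INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", sessions)
    rows = conn.execute(
        """
        SELECT u.user_id, u.last_login, MAX(s.timestamp) AS latest_session
        FROM users u
        JOIN sessions s ON s.user_id = u.user_id
        GROUP BY u.user_id, u.last_login
        HAVING u.last_login < MAX(s.timestamp)
        ORDER BY u.user_id
        """
    ).fetchall()
    conn.close()
    return rows

# Hypothetical sample: user 1 has sessions newer than last_login, user 2 does not.
users = [(1, 100), (2, 500)]
sessions = [(1, 90), (1, 150), (1, 160), (2, 400)]
print(stale_last_logins(users, sessions))
```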