How do I get this mysql query to include null values? - mysql

I'm writing a pruning script to delete content from my site that was uploaded over a week ago and has been accessed 0 or 1 times in the last week.
There are 2 tables:
daily_hits - which stores the item id, date, and number of hits that item got on that date.
videos - stores actual content
I came up with this.
$last_week_date = date('Y-m-d',$now-(60*60*24*7));
$last_week_timestamp = $now-(60*60*24*7);
SQL
SELECT
vid_id,
COALESCE(sum(hit_hits),0) as total_hits
FROM videos
LEFT JOIN daily_hits
ON vid_id = hit_itemid
WHERE (hit_date >= '$last_week_date') AND vid_posttime <= '$last_week_timestamp'
GROUP BY hit_itemid
HAVING total_hits < 2
This does output the items that were accessed once in the last week, but not the ones that haven't been accessed at all. If an item wasn't accessed at all in that last week, there won't be any entries in the daily_hits table. I figured COALESCE should take care of that, but that didn't work.
How can I fix this?

total_hits < 2
This guarantees "null hits" won't show up.
Make a second query that finds the nulls (it will show records from videos that have no corresponding key in daily_hits).
Make a UNION query to present the two datasets as one.

HAVING total_hits < 2 or total_hits is null
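
An alternative that avoids the second query: the condition on hit_date sits in the WHERE clause, and a WHERE condition on the joined table discards the NULL rows that the LEFT JOIN produced for videos without hits. Moving that condition into the ON clause keeps those rows. The following is a minimal sketch, not the asker's actual setup: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the sample rows, dates, and timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (vid_id INTEGER PRIMARY KEY, vid_posttime INTEGER);
CREATE TABLE daily_hits (hit_itemid INTEGER, hit_date TEXT, hit_hits INTEGER);
-- Hypothetical data: three old videos.
INSERT INTO videos VALUES (1, 100), (2, 100), (3, 100);
INSERT INTO daily_hits VALUES
  (1, '2023-12-01', 9),   -- video 1: only hits from before the window
  (2, '2024-01-10', 1),   -- video 2: one hit inside the window
  (3, '2024-01-10', 5);   -- video 3: five hits inside the window
""")

last_week_date = "2024-01-08"   # assumed value of $last_week_date
last_week_timestamp = 200       # assumed value of $last_week_timestamp

query = """
SELECT vid_id, COALESCE(SUM(hit_hits), 0) AS total_hits
FROM videos
LEFT JOIN daily_hits
       ON vid_id = hit_itemid
      AND hit_date >= ?          -- filter on the joined table goes in ON
WHERE vid_posttime <= ?          -- filter on the base table stays in WHERE
GROUP BY vid_id                  -- group by the column that is never NULL
HAVING COALESCE(SUM(hit_hits), 0) < 2
ORDER BY vid_id
"""
rows = conn.execute(query, (last_week_date, last_week_timestamp)).fetchall()
print(rows)
```

Video 1 has no hits inside the window, so the join yields a NULL row for it and COALESCE turns the sum into 0. Grouping by vid_id rather than hit_itemid matters too, since hit_itemid is NULL for every video without hits.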

Related

MySQL: SUM in WHERE and WHERE last relationship row is more than 2 weeks ago

Frontend dev here trying to get a query working.
A bit of context, we have a site where users can keep track of time and our goal is to get them to 1000 hours of time tracked.
For this we have:
a pretty default users table, with a column track_outdoors (0 or 1, since they can enable or disable it) and a meta column (json field)
A timers table, where each row has a total_time column
What I want to do is select all users who:
Have tracking enabled (track_outdoors = 1),
Do not have MORE than 1000 hours total_time tracked,
Have not received the reminder email (check if meta column has 'ac_outdoors_outdoors_reminder_sent_at')
Where the last time they tracked time was more than 2 weeks ago
I've got the basic part done, which is retrieving the users who have enabled tracking, together with their total time tracked:
SELECT
u.id,
u.firstname,
u.track_outdoors,
SUM(t.total_time) AS total
FROM
users AS u
LEFT JOIN timers AS t ON u.id = t.user_id
WHERE
u.track_outdoors = 1
AND JSON_EXTRACT(u.meta, '$.ac_outdoors_outdoors_reminder_sent_at') IS NULL
GROUP BY
u.id
Now the two parts I'm having trouble with are using the sum to check whether the total_time is smaller than 1000, and checking whether the last time tracking happened was more than two weeks ago.
Apparently I can't use SUM inside the WHERE clause.
I tried searching on how to do a where last relationship is x time ago, but all I find is how to query records x days ago. (It needs to be the latest record x days ago, if that makes sense).
I think for the SUM in the WHERE statement I might need to use a subquery, but I'm not sure if that's true or how I'm supposed to do that. For the 2 weeks ago check, I understand how to check where the date is two weeks ago but not how to check that for the latest record for the user.
Any help would be much appreciated!
Thanks to the comment/answer provided by @Akina I was able to finish my query.
The result is:
SELECT
u.id,
u.firstname,
u.track_outdoors,
SUM(t.total_time) AS total
FROM
users AS u
LEFT JOIN timers AS t ON u.id = t.user_id
WHERE
u.track_outdoors = 1
AND JSON_EXTRACT(u.meta, '$.ac_outdoors_outdoors_reminder_sent_at') IS NULL
GROUP BY
u.id
HAVING total < 1000 AND MAX( t.created_at ) < CURRENT_DATE - INTERVAL 2 WEEK
So I needed to use HAVING for checking the total and MAX to check for the date of the tracker to be more than two weeks ago.
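
A small runnable sketch of that HAVING pattern, using Python's built-in sqlite3 as a stand-in for MySQL. The sample users and timers are hypothetical, the JSON meta check is left out, and a fixed cutoff string replaces CURRENT_DATE - INTERVAL 2 WEEK so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, track_outdoors INTEGER);
CREATE TABLE timers (user_id INTEGER, total_time INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 1), (4, 'Dee', 0), (5, 'Eve', 1);
INSERT INTO timers VALUES
  (1, 300, '2024-01-01'),   -- Ann: under 1000, last tracked long ago
  (2, 600, '2024-01-01'),
  (2, 600, '2024-01-02'),   -- Bob: 1200 in total, over the limit
  (3, 100, '2024-03-01');   -- Cy: tracked after the cutoff
  -- Dee has tracking disabled; Eve has no timers at all.
""")

cutoff = "2024-02-20"  # assumption: stands in for CURRENT_DATE - INTERVAL 2 WEEK

query = """
SELECT u.id, u.firstname, SUM(t.total_time) AS total
FROM users AS u
LEFT JOIN timers AS t ON u.id = t.user_id
WHERE u.track_outdoors = 1           -- row-level filter, before grouping
GROUP BY u.id, u.firstname
HAVING SUM(t.total_time) < 1000      -- aggregate filter, after grouping
   AND MAX(t.created_at) < ?         -- latest timer is older than the cutoff
"""
rows = conn.execute(query, (cutoff,)).fetchall()
print(rows)
```

Note that Eve, who has never tracked anything, is not returned: her SUM and MAX are NULL, and a comparison with NULL is never true. If such users should receive the reminder as well, wrapping the sum in COALESCE and adding an IS NULL check on the MAX would be one way to include them.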

group some records and Ignore some differences in grouping records in mysql (PROBLEM IN GROUPING)

I want to group my records by the time they are modified. (time means the minute)
A screenshot from my table:
And the code is:
SELECT GROUP_CONCAT(text) as final_text
FROM main WHERE HSA = 'YES'
GROUP BY SUBSTRING(timestamps,1,16);
the result is:
You can see that 4 of my records were modified at 12:23, 4 of them at 12:35 and one of them at 12:36.
But I want MySQL to group records together if they are less than two minutes apart
(ignore differences of less than 2 minutes!)
Please help me add my last record to the second group.
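
One possible way to think about this is as a "gap" problem: sort the records by time and start a new group whenever the gap to the previous record is two minutes or more. The sketch below does that in plain Python rather than in MySQL. The function name, the sample timestamps, and the choice to measure the gap from the previous record (not from the start of the group) are all assumptions, since the original table is not shown.

```python
from datetime import datetime, timedelta

def group_by_gap(records, max_gap=timedelta(minutes=2)):
    """Concatenate the texts of (timestamp, text) pairs into groups.

    A new group starts when the gap to the previous record
    is max_gap or more.
    """
    groups = []
    prev_ts = None
    for ts, text in sorted(records):
        if prev_ts is None or ts - prev_ts >= max_gap:
            groups.append([])
        groups[-1].append(text)
        prev_ts = ts
    return [",".join(g) for g in groups]

def at(hms):
    # Helper for the hypothetical sample data below.
    return datetime.strptime("2020-01-01 " + hms, "%Y-%m-%d %H:%M:%S")

records = [
    (at("12:23:05"), "a"), (at("12:23:40"), "b"),
    (at("12:35:10"), "c"), (at("12:35:50"), "d"),
    (at("12:36:20"), "e"),   # a different minute, but only 30 s after "d"
]
final_text = group_by_gap(records)
print(final_text)
```

Here the 12:36 record joins the 12:35 group because it is less than two minutes after the record before it.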

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get array of users (array of userid) who placed their 25th order during specified period (for example in May 2019), date of 25th order for each user, number of days to place 25th order (difference between registration date for user and date of 25th order placed).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How can I do this with a MySQL query?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
A single query is not required: if this can't be done in one query, a few queries are fine.
You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
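
To try the counting idea on a small scale, here is a sketch using Python's built-in sqlite3 as a stand-in for MySQL. It looks for the 3rd order instead of the 25th, restricts the range to May 2019, and replaces MySQL's datediff with a julianday subtraction, which is SQLite-specific. The sample rows are hypothetical. The parentheses around the OR are deliberate: AND binds tighter than OR, so without them the same-date branch would count orders from every user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, registered TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER);
INSERT INTO users VALUES (1, '2019-01-01'), (2, '2019-01-01');
INSERT INTO orders VALUES
  (1, '2019-02-01', 1), (2, '2019-03-01', 1), (3, '2019-05-10', 1),
  (4, '2019-05-10', 2), (5, '2019-05-10', 2), (6, '2019-06-02', 2);
""")

n = 3  # which order to look for (25 in the question)

query = """
SELECT o1.user_id,
       o1.date,
       CAST(julianday(o1.date) - julianday(u1.registered) AS INTEGER) AS days
FROM orders o1
INNER JOIN users u1 ON u1.userid = o1.user_id
WHERE (SELECT COUNT(*)
       FROM orders o2
       WHERE o2.user_id = o1.user_id
         AND (o2.date < o1.date
              OR (o2.date = o1.date AND o2.id < o1.id))) = ?
  AND o1.date >= '2019-05-01'
  AND o1.date < '2019-06-01'
"""
# An order is the n-th one when exactly n - 1 earlier orders exist.
rows = conn.execute(query, (n - 1,)).fetchall()
print(rows)
```

User 2's third order falls in June, so only user 1 is returned. User 2's second order shares its date with an order from user 1, which is the case the parentheses guard against.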
The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
Select all rows, order by user_id asc, date asc, union-ed to nothing from a table made of two vars you'll initialize at 0.
from this, select all while updating a var to know if a row's user_id is the same as the last, and adding a field that will report so (so for each user_id the first line in order will have a specific value like 0 while the other rows for the same user_id will have a 1)
from this, select all plus a field that equals itself plus one in case the first added field is 1, else 0
from this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated, you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
-- Note: RANK is a reserved word from MySQL 8.0 on; rename the alias or quote it with backticks there.
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start by the innermost SELECT then work your way high; this version fuses the points 1 & 2 of my explanation.
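
If the nested SELECTs are hard to follow, the same running-counter idea can be written as an ordinary loop. This is a plain-Python sketch of the logic, not a replacement for the query: the function name and the sample orders are hypothetical, and the two local variables play the roles of the two user variables.

```python
def nth_orders(orders, n):
    """Return (user_id, date) for each user's n-th order.

    orders: iterable of (id, date, user_id) tuples.
    """
    result = []
    prev_user = None   # plays the role of @PREV_USR
    rank = 0           # plays the role of @RANK_USR
    # Same ordering as the innermost SELECT: user, then date, then id.
    for order_id, date, user_id in sorted(orders, key=lambda o: (o[2], o[1], o[0])):
        # Restart the counter whenever the user changes.
        rank = rank + 1 if user_id == prev_user else 1
        prev_user = user_id
        if rank == n:
            result.append((user_id, date))
    return result

orders = [
    (1, "2019-02-01", 1), (2, "2019-03-01", 1), (3, "2019-05-10", 1),
    (4, "2019-05-10", 2), (5, "2019-05-10", 2), (6, "2019-06-02", 2),
]
third = nth_orders(orders, 3)
# The outer WHERE: keep only n-th orders placed in the target period.
in_may = [(u, d) for u, d in third if "2019-05-01" <= d < "2019-06-01"]
print(third, in_may)
```

Each row is visited once after sorting, which is why this approach scales better than counting earlier orders for every row.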
Now sometimes you will have to use an API or something similar that won't let you keep variable values in memory unless you commit, or that has some other restriction, and you'll need to do it in one query. To do that, you put the initialization one step lower and make it so it does not affect the higher statements. IMO the best way to do this is in a UNION with a fake table where the only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the amount and the type of columns in your basic table. Here orders' first field is an int, so I put INIT_PREV_USR in first then there are two more fields so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually do anything, but I wouldn't try this when your first field is a blob (worst comes to worst you can use a JOIN).
You'll note this is derived from a method of pagination in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.

Stop query from skipping over null values

I have a query that shows me the number of calls per day for the last 14 days within my app.
The query:
SELECT count(id) as count, DATE(FROM_UNIXTIME(timestamp)) as date FROM calls GROUP BY DATE(FROM_UNIXTIME(timestamp)) DESC LIMIT 14
On days where there were 0 calls, this query does not show those days. Rather than skip those days, I'd like to have a 0 or NULL in that spot.
Any ideas for how I can achieve this? If you have any questions as to what I'm asking please let me know.
Thanks
I don't believe your query is "skipping over NULL values", as your title suggests. Rather, your data probably looks something like this:
id | timestamp
----+------------
1 | 2014-01-01
2 | 2014-01-02
3 | 2014-01-04
As a result, there are no rows that contain the missing date, so there are no rows to be counted. The answer is that you need to generate a list of all the dates you want and then do a LEFT or RIGHT JOIN to it.
Unfortunately, MySQL doesn't make this as easy as other databases. There doesn't seem to be an effective way of generating a list of anything inline. So you'll need some sort of table.
I think I would create a static table containing a set of integers to be subtracted from the current date. Then you can use this table to generate your list of dates inline and JOIN to it.
CREATE TABLE days_ago_list (days_ago INTEGER);
INSERT INTO days_ago_list VALUES
(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13)
;
Then:
SELECT COUNT(id), list_date
FROM (SELECT SUBDATE(CURDATE(), days_ago) AS list_date FROM days_ago_list) dates_to_list
LEFT JOIN (SELECT id, DATE(FROM_UNIXTIME(timestamp)) call_date FROM calls) calls_with_date
ON calls_with_date.call_date = dates_to_list.list_date
GROUP BY list_date
It is very important that you group by list_date; call_date will be NULL for any days without calls. It is also important to COUNT on id since NULL ids will not be counted. (That ensures you get a correct count of 0 for days with no calls.) If you need to change the dates listed, you simply update the table containing the integer list.
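
The difference between counting id and counting rows is easy to see on a tiny dataset. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, so the date arithmetic is SQLite's date() modifier rather than SUBDATE. It also assumes, for brevity, that calls already stores a plain date column, and it fixes "today" so the output is reproducible. All sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE days_ago_list (days_ago INTEGER);
INSERT INTO days_ago_list VALUES (0), (1), (2), (3);
CREATE TABLE calls (id INTEGER PRIMARY KEY, call_date TEXT);
INSERT INTO calls (call_date) VALUES ('2014-01-04'), ('2014-01-04'), ('2014-01-02');
""")

today = "2014-01-04"  # fixed stand-in for CURDATE()

query = """
SELECT list_date,
       COUNT(id) AS n_calls,   -- ignores the NULL ids of unmatched days
       COUNT(*)  AS n_rows     -- counts the placeholder row as well
FROM (SELECT date(?, '-' || days_ago || ' days') AS list_date
      FROM days_ago_list) AS dates_to_list
LEFT JOIN calls ON calls.call_date = dates_to_list.list_date
GROUP BY list_date
ORDER BY list_date
"""
rows = conn.execute(query, (today,)).fetchall()
for row in rows:
    print(row)
```

For the two days without calls, n_calls is 0 while n_rows is 1, which is why the count has to be taken on a column from the calls side of the join.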
Here is a SQL Fiddle demonstrating this.
Alternatively, if this is for a web application, you could generate the list of dates code side and match up the counts with the dates after the query is done. This would make your web app logic somewhat more complicated, but it would also simplify the query and eliminate the need for the extra table.
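
A minimal sketch of that code-side approach, written in Python as an assumption about the application language. The function name and the shape of the input (a dict mapping each date to its count, as the original GROUP BY query would return it) are hypothetical.

```python
from datetime import date, timedelta

def fill_missing_days(counts, today, days=14):
    """Return (date, count) for each of the last `days` days, newest first.

    counts: {date: number_of_calls} built from the query result.
    Days that are absent from counts get a count of 0.
    """
    wanted = [today - timedelta(days=i) for i in range(days)]
    return [(d, counts.get(d, 0)) for d in wanted]

# Hypothetical query result: calls on two of the last four days.
counts = {date(2014, 1, 4): 2, date(2014, 1, 2): 1}
filled = fill_missing_days(counts, today=date(2014, 1, 4), days=4)
for day, n in filled:
    print(day, n)
```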
Create a table that contains a row for each date you want to ensure is in the results, then LEFT OUTER JOIN it with the results of your current query. Use that table's date and the count from your query, substituting 0 where the count is NULL.

sum up multiple datediffs of datetimes in mysql

I have a table with one user and one day's worth of punches (clockin, breakout, breakin, clockout). Now say the user takes 2 or more breaks. I need to sum up the total time of all breaks taken. I have created a sqlfiddle to make it easier to show what I am trying to do. Here is my example: http://sqlfiddle.com/#!2/21542/6 Now I need to take (12:30:21 - 12:04:44) + (12:36:00 - 12:34:00) to get the total of all breaks taken. How can I do that in my query? Now pretend I have 10 users and 10 days in my table. I know I would need to group by day and user.
I would start by finding some way to link the punch-out records with the punch-in records from the same table. We can then put this data into a table and use it for querying against.
CREATE TEMPORARY TABLE breakPunchInOut (
SELECT
DATE(punchout.PunchDateTime) AS ShiftDate,
punchout.EmpId,
punchout.PunchId AS PunchOutID,
(SELECT
PunchId
FROM
timeclock
WHERE
timeclock.EmpId = punchout.EmpId
AND
timeclock.`In-Out` = 1
AND
timeclock.PunchDateTime > punchout.PunchDateTime
AND
DATE(timeclock.PunchDateTime) = DATE(punchout.PunchDateTime)
ORDER BY
timeclock.PunchDateTime ASC
LIMIT 1
) AS PunchInID
FROM
timeclock AS punchout
WHERE
punchout.`In-Out` = 0
HAVING
PunchInID IS NOT NULL
);
The way this query works is looking for all the "punch-outs" in a specific day, for each of these it then looks for the next "punch-in" which happened on the same day, by the same person. The HAVING clause filters out records where there is no punch-in after a punch-out - so maybe where the employee goes home for the day. This is something to remember because if someone goes home halfway through a shift then their break time will not be added to the total.
It's important to point out that this approach will only work for shifts which start and end on the same day. If you have a night shift which starts in the evening and finishes in the morning the next day, then you'll have to alter the way that you join the punch outs and punch ins together.
Now that we have this linking table, it's relatively simple to use it to create a summary report for each employee and each shift:
SELECT
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId,
SUM(
TIMESTAMPDIFF(MINUTE, punchOut.PunchDateTime, punchIn.PunchDateTime)
) AS TotalBreakLengthMins
FROM
breakPunchInOut
INNER JOIN
timeclock AS punchOut
ON
punchOut.PunchId = breakPunchInOut.PunchOutId
INNER JOIN
timeclock AS punchIn
ON
punchIn.PunchId = breakPunchInOut.PunchInId
GROUP BY
breakPunchInOut.ShiftDate,
breakPunchInOut.EmpId
;
Notice we use the TIMESTAMPDIFF function, not the DATEDIFF. DATEDIFF only calculates the number of days between two dates - it's not used for time.
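
To check the arithmetic on the two breaks from the question, here is a plain-Python sketch of the same pairing logic for one employee and one day. The function name and the clock-in and clock-out times are hypothetical. Each break is truncated to whole minutes before summing, which is how I understand TIMESTAMPDIFF(MINUTE, ...) to behave.

```python
from datetime import datetime

def total_break_minutes(punches):
    """Sum break lengths in whole minutes.

    punches: list of (datetime, in_out) for one employee on one day,
    where in_out is 0 for a punch-out and 1 for a punch-in.
    """
    punches = sorted(punches)
    total = 0
    for ts, in_out in punches:
        if in_out != 0:
            continue
        # Earliest punch-in strictly after this punch-out, if any.
        next_in = next((t for t, io in punches if io == 1 and t > ts), None)
        if next_in is not None:
            # Truncate each break to whole minutes before adding it.
            total += int((next_in - ts).total_seconds()) // 60
        # A punch-out with no later punch-in (going home) adds nothing.
    return total

def at(hms):
    return datetime.strptime("2013-01-01 " + hms, "%Y-%m-%d %H:%M:%S")

punches = [
    (at("08:00:00"), 1),  # clock in (hypothetical time)
    (at("12:04:44"), 0),  # break out
    (at("12:30:21"), 1),  # break in: 25 min 37 s
    (at("12:34:00"), 0),  # break out
    (at("12:36:00"), 1),  # break in: 2 min
    (at("17:00:00"), 0),  # clock out (hypothetical time)
]
minutes = total_break_minutes(punches)
print(minutes)
```

The first break counts as 25 minutes and the second as 2, giving 27 in total. The final clock-out is skipped because no punch-in follows it.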