I am trying to generate a daily sales report for a particular user based on this tutorial: Using MySQL to generate daily sales reports with filled gaps.
To do this, I have three tables: a records table, a user table and a calendar table.
records      user    calendar
id           id      datefield
user_id
timestamp
The query below returns 0 as total and NULL as the user_id when data is not available for a particular day which is great:
SELECT calendar.datefield AS DATE,
IFNULL(COUNT(records.id),0) AS total, records.user_id
FROM records
RIGHT JOIN calendar ON (DATE(records.timestamp) = calendar.datefield)
WHERE (
calendar.datefield BETWEEN (NOW() - INTERVAL 14 DAY) AND NOW()
)
GROUP BY DATE DESC
The idea is to generate this report for a particular user so I modified the above query to what follows:
SELECT calendar.datefield AS DATE,
IFNULL(COUNT(records.id),0) AS total, records.user_id
FROM records
RIGHT JOIN calendar ON (DATE(records.timestamp) = calendar.datefield)
WHERE (
calendar.datefield BETWEEN (NOW() - INTERVAL 14 DAY) AND NOW()
AND records.user_id = SOME_EXISTING_USER_ID
)
GROUP BY DATE DESC
This returns an empty result when there are no records, but the idea is to return 0 for any day that has no data.
How can I modify the first query to work for a particular user?
Wow. Been a while since I've seen a RIGHT JOIN in the wild! Anyway, try moving the user predicate from the WHERE clause into the RIGHT JOIN condition, like this:
SELECT calendar.datefield AS DATE,
IFNULL(COUNT(records.id),0) AS total, records.user_id
FROM records
RIGHT JOIN calendar ON (DATE(records.timestamp) = calendar.datefield)
AND records.user_id = SOME_EXISTING_USER_ID
WHERE (
calendar.datefield BETWEEN (NOW() - INTERVAL 14 DAY) AND NOW()
)
GROUP BY DATE DESC;
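This works because a condition in the WHERE clause is applied after the join and discards the calendar rows whose records.user_id is NULL, whereas a condition in the ON clause only restricts which records are matched. For me this is one of the great benefits of explicit joins vs implicit joins...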
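To see why the placement of the predicate matters, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption, made so the example runs without a server). The sample rows, the `ts` column name and the `daily_counts` helper are hypothetical, and the query is written as calendar LEFT JOIN records, which is the mirror image of the RIGHT JOIN above.

```python
import sqlite3


def daily_counts(user_id, filter_in_on):
    """Count one user's records per calendar day (hypothetical sample data)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE calendar (datefield TEXT);
        CREATE TABLE records (id INTEGER, user_id INTEGER, ts TEXT);
        INSERT INTO calendar VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
        INSERT INTO records VALUES
            (1, 7, '2024-01-01 09:00:00'),
            (2, 7, '2024-01-01 10:00:00'),
            (3, 8, '2024-01-02 11:00:00');
    """)
    if filter_in_on:
        # User filter is part of the join condition: unmatched days survive.
        sql = """
            SELECT c.datefield, COUNT(r.id)
            FROM calendar c
            LEFT JOIN records r
              ON DATE(r.ts) = c.datefield AND r.user_id = ?
            GROUP BY c.datefield
            ORDER BY c.datefield"""
    else:
        # User filter runs after the join: rows with a NULL user_id are dropped.
        sql = """
            SELECT c.datefield, COUNT(r.id)
            FROM calendar c
            LEFT JOIN records r ON DATE(r.ts) = c.datefield
            WHERE r.user_id = ?
            GROUP BY c.datefield
            ORDER BY c.datefield"""
    rows = con.execute(sql, (user_id,)).fetchall()
    con.close()
    return rows


print(daily_counts(7, filter_in_on=True))
print(daily_counts(7, filter_in_on=False))
```

With the filter in the ON clause every calendar day is returned, with a count of 0 where user 7 has nothing; with the filter in WHERE only the day that has matching records remains.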
I have a Users table (id, name, created_at) and a Transaction table(id, user_id, created_at, amount).
For each month, I would like to know the number of users who did not have any transaction in the 3 months interval before that month.
For example, for April 2022, the query would return number of users who did not have a transaction in January 2022, February 2022 and March 2022. And so on for every month.
Can I do this with a single MySQL query, and without PHP loop?
If I wanted it for April 2022 only, then I guess this would do the trick:
SELECT count(distinct(users.id)) FROM users
INNER JOIN transactions
on users.id = transactions.user_id
WHERE transactions.user_id NOT IN
(SELECT user_id FROM transactions WHERE created_at > "2022-01-01" AND created_at < "2022-04-01" );
How to get it for all months?
In a normal situation, you would have a calendar table that, for example, stores all starts of months over a wide period of time, like calendar(start_of_month).
From there on, you can cross join the calendar with the users table to generate all possible combinations of months and customers (with respect to the user's creation time). The last step is to check each user/month tuple for transactions in the last 3 months.
select c.start_of_month, count(*) as cnt_inactive_users
from calendar c
cross join users u
where not exists (
select 1
from transactions t
where t.user_id = u.id
and t.created_at >= c.start_of_month - interval 3 month
and t.created_at < c.start_of_month
)
and c.start_of_month >= '2021-01-01' and c.start_of_month < '2022-01-01'
group by c.start_of_month
order by c.start_of_month
This gives you one row per month that has at least one "inactive" customer, with the corresponding count.
You control the range of months over which the query applies with the where clause to the query (as an example, the above gives you all year 2021).
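The calendar cross join can be tried out with a small sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption). The sample rows and the `inactive_users_per_month` helper are hypothetical, and SQLite's `date(x, '-3 months')` takes the place of MySQL's `x - interval 3 month`.

```python
import sqlite3


def inactive_users_per_month():
    """Per calendar month, count users with no transaction in the prior 3 months."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE calendar (start_of_month TEXT);
        CREATE TABLE users (id INTEGER);
        CREATE TABLE transactions (user_id INTEGER, created_at TEXT);
        INSERT INTO calendar VALUES ('2022-04-01'), ('2022-05-01');
        INSERT INTO users VALUES (1), (2), (3);
        -- Users 1 and 2 were last active in January; user 3 never was.
        INSERT INTO transactions VALUES (1, '2022-01-15'), (2, '2022-01-20');
    """)
    rows = con.execute("""
        SELECT c.start_of_month, COUNT(*) AS cnt_inactive_users
        FROM calendar c
        CROSS JOIN users u
        WHERE NOT EXISTS (
            SELECT 1
            FROM transactions t
            WHERE t.user_id = u.id
              AND t.created_at >= date(c.start_of_month, '-3 months')
              AND t.created_at <  c.start_of_month
        )
        GROUP BY c.start_of_month
        ORDER BY c.start_of_month
    """).fetchall()
    con.close()
    return rows


print(inactive_users_per_month())
```

For April the window is January through March, so only user 3 is inactive; for May the window starts in February, so all three users count as inactive.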
SELECT count(*)
FROM users
WHERE NOT EXISTS (
SELECT NULL
FROM transactions
WHERE users.id = transactions.user_id
AND created_at > '2022-01-01' AND created_at < '2022-04-01'
);
I have two tables: Processes and Validations; p and v respectively.
For each process there are many validations.
The aim is to:
Retrieve the latest validation for each process.
Generate a dynamic date (Due_Date) as to when the next validation is due (being 365 days after the latest validation date).
Filter the results to any due dates that fall in the current month.
In short, I want to see which processes are due to be validated in the current month.
I'm 99% there with the query code. Having read through some posts on here I'm fairly certain I'm on the right track. My problem is that my query still returns all of the results for each process, instead of the top 1.
FYI: The processes table uses "Process_ID" as a primary key; whereas the Validations Table uses "Validation_Process_ID" as a foreign key.
Code at present :
Select p.Process_ID,
p.Process_Name,
v.Validation_Date,
Date_Add(v.Validation_Date, Interval 365 Day) as Due_Date
From processes_active p
left JOIN processes_validations v
on p.Process_ID = (select v.validation_process_id
from processes_validations
order by validation_date desc
limit 1)
Having Month(Due_Date) = Month(Now()) and Year(Due_Date) = Year(Now())
Any help would be thoroughly appreciated! I'm probably pretty close just can't sort that final section!
Thanks
Your current query is wrong: the subquery returns the very latest record in your validation table instead of the latest per process id.
You should decompose to get what you need.
1) compute the latest validation for each process in the validation table:
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
2) For each process in the process table, get the latest validation, and compute the next validation date (use interval 1 YEAR and not 365 DAY... think leap years)
SELECT p.Process_ID, p.Process_Name, v.maxdate,
Date_Add(v.maxdate, Interval 1 year) as Due_Date
FROM processes_active p
LEFT JOIN
(
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
) v
ON p.Process_ID = v.validation_process_id
3) Filter to keep only the due_date this month. This can be done with a WHERE on query 2, I just make a nested query for your understanding
SELECT * FROM
(
SELECT p.Process_ID, p.Process_Name, v.maxdate,
Date_Add(v.maxdate, Interval 1 year) as Due_Date
FROM processes_active p
LEFT JOIN
(
SELECT validation_process_id, MAX(validation_date) AS maxdate
FROM processes_validations
GROUP BY validation_process_id
) v
ON p.Process_ID = v.validation_process_id
) T
WHERE Month(Due_Date) = Month(Now()) and Year(Due_Date) = Year(Now())
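The three steps can be checked end to end with a short sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption). The sample rows and the `due_in_month` helper are hypothetical; the target month is passed in as a 'YYYY-MM' string instead of being read from Now() so that the result is reproducible, and SQLite's `date(x, '+1 year')` replaces Date_Add.

```python
import sqlite3


def due_in_month(year_month):
    """Return processes whose next validation falls in the given 'YYYY-MM' month."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE processes_active (Process_ID INTEGER, Process_Name TEXT);
        CREATE TABLE processes_validations (
            validation_process_id INTEGER, validation_date TEXT);
        INSERT INTO processes_active VALUES (1, 'Welding'), (2, 'Packing');
        INSERT INTO processes_validations VALUES
            (1, '2022-03-10'), (1, '2023-05-04'),
            (2, '2023-05-20'), (2, '2023-08-01');
    """)
    rows = con.execute("""
        SELECT p.Process_ID, v.maxdate, date(v.maxdate, '+1 year') AS Due_Date
        FROM processes_active p
        JOIN (
            -- Step 1: latest validation per process
            SELECT validation_process_id, MAX(validation_date) AS maxdate
            FROM processes_validations
            GROUP BY validation_process_id
        ) v ON p.Process_ID = v.validation_process_id
        -- Step 3: keep only due dates in the requested month
        WHERE strftime('%Y-%m', date(v.maxdate, '+1 year')) = ?
        ORDER BY p.Process_ID
    """, (year_month,)).fetchall()
    con.close()
    return rows


print(due_in_month("2024-05"))
```

Process 2 has an older validation from May 2023, but because only its latest validation (August 2023) is considered, it does not show up as due in May 2024.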
I have a table, activity that looks like the following:
date | user_id |
Thousands of users and multiple dates and activity for all of them. I want to pull a query that will, for every day in the result, give me the total active users in the last 30 days. The query I have now looks like the following:
select date, count(distinct user_id) from activity where date > date_sub(date, interval 30 day) group by date
This gives me total unique users on only that day; I can't get it to give me the last 30 for each date. Help is appreciated.
To do this you need a list of the dates and join that against the activities.
As such this should do it. A sub query to get the list of dates and then a count of user_id (or you could use COUNT(*) as I presume user_id cannot be null):-
SELECT date_ranges.date, COUNT(user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date_ranges.date
However if there can be multiple records for a user_id on any particular date but you only want the count of unique user_ids on a date you need to count DISTINCT user_id (although note that if a user id occurs on 2 different dates within the 30 day date range they will only be counted once):-
SELECT date_ranges.date, COUNT(DISTINCT user_id)
FROM
(
SELECT DISTINCT date, DATE_ADD(date, INTERVAL -30 DAY) AS date_minus_30
FROM activity
) date_ranges
INNER JOIN activity
ON activity.date BETWEEN date_ranges.date_minus_30 AND date_ranges.date
GROUP BY date_ranges.date
A bit cruder would be to just join the activity table against itself based on the date range and use COUNT(DISTINCT ...) to just eliminate the duplicates:-
SELECT b.date, COUNT(DISTINCT a.user_id)
FROM activity a
INNER JOIN activity b
ON a.date BETWEEN DATE_ADD(b.date, INTERVAL -30 DAY) AND b.date
GROUP BY b.date
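The rolling-window idea can be sketched in Python with the standard-library sqlite3 module standing in for MySQL (an assumption). The sample rows and the `rolling_active_users` helper are hypothetical, the date column is named `day` here to keep it visually apart from SQLite's `date()` function, and `date(x, '-30 days')` replaces DATE_ADD with a negative interval.

```python
import sqlite3


def rolling_active_users(window_days=30):
    """For each activity day, count distinct users seen in the preceding window."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE activity (day TEXT, user_id INTEGER);
        INSERT INTO activity VALUES
            ('2024-01-01', 1), ('2024-01-01', 2),
            ('2024-01-15', 1),
            ('2024-02-20', 3);
    """)
    rows = con.execute("""
        SELECT d.day, COUNT(DISTINCT a.user_id)
        FROM (SELECT DISTINCT day FROM activity) d
        JOIN activity a
          ON a.day BETWEEN date(d.day, ?) AND d.day
        GROUP BY d.day
        ORDER BY d.day
    """, (f"-{window_days} days",)).fetchall()
    con.close()
    return rows


print(rolling_active_users())
```

User 2 is only active on 2024-01-01 but is still counted for 2024-01-15, because that day falls inside the 30-day window; by 2024-02-20 both earlier users have dropped out.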
I've been at this for a few hours now to no avail, pulling my hair out.
Edit: I want to calculate the difference in the overall_exp column between a row and the same user's data from 1 day earlier, to find the greatest 'gain' for each user.
Currently I take a row, select a row from 1 day earlier based on the first row's timestamp, subtract the overall_exp values of the 2 rows, and order by that result whilst grouping by user_id.
SQL Fiddle: http://sqlfiddle.com/#!2/501c8
Here is what I currently have; however, the logic is completely wrong, so I'm getting 0 results:
SELECT rsn, ts.timestamp, #original_ts := SUBDATE( ts.timestamp, INTERVAL 1 DAY), ts.overall_exp, ts.overall_exp - previous.overall_exp AS gained_exp
FROM tracker AS ts
INNER JOIN (
SELECT user_id, MIN( TIMESTAMP ) , overall_exp
FROM tracker
WHERE TIMESTAMP >= #original_ts
GROUP BY user_id
) previous
ON ts.user_id = previous.user_id
JOIN users
ON ts.user_id = users.id
GROUP BY ts.user_id
ORDER BY gained_exp DESC
You can do this with a self-join:
select t.user_id, max(t.overall_exp - tprev.overall_exp)
from tracker t join
tracker tprev
on tprev.user_id = t.user_id and
date(tprev.timestamp) = date(SUBDATE(t.timestamp, INTERVAL 1 DAY))
group by t.user_id
A key here is converting the timestamps to dates, so the comparison is exact.
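A minimal sketch of this self-join in Python, with the standard-library sqlite3 module standing in for MySQL (an assumption). The sample rows, the `ts` column name and the `max_daily_gain` helper are hypothetical, and SQLite's `date(x, '-1 day')` takes the place of SUBDATE.

```python
import sqlite3


def max_daily_gain():
    """Largest day-over-day increase in overall_exp for each user."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tracker (user_id INTEGER, ts TEXT, overall_exp INTEGER);
        INSERT INTO tracker VALUES
            (1, '2024-01-01 08:00:00', 100),
            (1, '2024-01-02 08:30:00', 160),
            (1, '2024-01-03 09:00:00', 180),
            (2, '2024-01-01 10:00:00', 50),
            (2, '2024-01-02 10:00:00', 55);
    """)
    rows = con.execute("""
        SELECT t.user_id, MAX(t.overall_exp - tprev.overall_exp) AS gain
        FROM tracker t
        JOIN tracker tprev
          ON tprev.user_id = t.user_id
         -- Compare calendar dates, not raw timestamps, so times of day may differ.
         AND date(tprev.ts) = date(t.ts, '-1 day')
        GROUP BY t.user_id
        ORDER BY gain DESC
    """).fetchall()
    con.close()
    return rows


print(max_daily_gain())
```

The rows for user 1 are logged at different times of day, yet they still pair up because both sides of the join are reduced to dates first.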
Try:
select u.*, max(t.`timestamp`)-min(t.`timestamp`) gain
from users u
left join tracker t
on u.id = t.user_id and
t.`timestamp` >= date_sub(date(now()), interval 1 day) and
t.`timestamp` < date_add(date(now()), interval 1 day)
group by u.id
order by gain desc
SQLFiddle here.
I have a table structure that looks like this:
I have a perfectly working query that counts how many records there have been per day over the last 30 days. It looks like this:
SELECT DATE(timestamp) AS date, COUNT(id) AS emails FROM emails WHERE timestamp >= now() - interval 1 month GROUP BY DATE(timestamp)
This outputs the following which is perfectly fine:
However, the next thing seems too difficult for me to imagine. Now I want to count how many records there have been per day the last 30 days BUT only where newsletter = 1.
I've tried to put a WHERE statement looking like this:
SELECT DATE(timestamp) AS date, COUNT(*) AS emails, nyhedsbrev FROM emails WHERE timestamp >= now() - interval 1 month AND nyhedsbrev = 1 GROUP BY DATE(timestamp)
... And that outputs the following:
The problem is that it's omitting the records with newsletter = 0, so I can't compare my first query against the new one, as the dates don't match. I know that is because I use WHERE newsletter = 1.
Instead of omitting the record, I want a query that just puts a "0" for that date. How can I do this? The final query should be outputting this:
You should be able to simply use SUM() and IF() to get the desired output:
SELECT
DATE(timestamp) AS date,
COUNT(*) AS emails,
SUM(IF(nyhedsbrev > 0, 1, 0)) as nyhedsbrev_count
FROM
emails
WHERE
timestamp >= now() - interval 1 month
GROUP BY
DATE(timestamp)
SQLFiddle DEMO
Edit: You might even be able to simplify it, since it's a boolean, and simply use SUM(nyhedsbrev), but this REQUIRES that nyhedsbrev is only 0 or 1:
SELECT
DATE(timestamp) AS date,
COUNT(*) AS emails,
SUM(nyhedsbrev) as nyhedsbrev_count
FROM
emails
WHERE
timestamp >= now() - interval 1 month
GROUP BY
DATE(timestamp)
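The conditional aggregation can be tried out with a small sketch in Python, using the standard-library sqlite3 module as a stand-in for MySQL (an assumption). The sample rows, the `ts` column name and the `emails_per_day` helper are hypothetical, the 1-month filter is left out to keep the result reproducible, and a portable CASE expression is used in place of MySQL's IF().

```python
import sqlite3


def emails_per_day():
    """Per day: total emails and how many of them have nyhedsbrev = 1."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE emails (id INTEGER, ts TEXT, nyhedsbrev INTEGER);
        INSERT INTO emails VALUES
            (1, '2024-01-01 09:00:00', 1),
            (2, '2024-01-01 10:00:00', 0),
            (3, '2024-01-02 09:00:00', 0);
    """)
    rows = con.execute("""
        SELECT date(ts) AS day,
               COUNT(*) AS emails,
               -- Adds 1 per newsletter row and 0 otherwise, so no day is dropped.
               SUM(CASE WHEN nyhedsbrev = 1 THEN 1 ELSE 0 END) AS nyhedsbrev_count
        FROM emails
        GROUP BY date(ts)
        ORDER BY day
    """).fetchall()
    con.close()
    return rows


print(emails_per_day())
```

The second day has no newsletter sign-ups, but it still appears in the result with a count of 0 because no row was filtered out before grouping.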
Possibly best to get a list of the dates and then left join that against sub queries to get the counts you require.
Something like this
SELECT Sub1.date, Sub2.emails, IFNULL(Sub3.emails, 0)
FROM (SELECT DISTINCT DATE(timestamp) AS date
FROM emails
WHERE timestamp >= now() - interval 1 month) Sub1
LEFT OUTER JOIN (SELECT DATE(timestamp) AS date, COUNT(id) AS emails
FROM emails WHERE timestamp >= now() - interval 1 month
GROUP BY DATE(timestamp)) Sub2
ON Sub1.date = Sub2.date
LEFT OUTER JOIN (SELECT DATE(timestamp) AS date, COUNT(*) AS emails
FROM emails
WHERE timestamp >= now() - interval 1 month AND nyhedsbrev = 1
GROUP BY DATE(timestamp)) Sub3
ON Sub1.date = Sub3.date
(you can probably optimise one subselect of this away, but I have done it in full to make it obvious how it is working)
Assuming newsletter is boolean 1/0 values then this might give you the table that you want:
SELECT DATE(timestamp) AS date, COUNT(*) AS emails, nyhedsbrev
FROM emails WHERE timestamp >= now() - interval 1 month GROUP BY DATE(timestamp),nyhedsbrev ;
Just adding another GROUP BY parameter.