So currently I have 2 tables: listings and logs. The listings table holds each product's reference number and its current status. For example, if a product's status is currently Publish and it gets sold, the status is updated to Sold. The refno is unique in this table, since each product has a single row whose status changes over time.
Now the other table, logs, records all the status changes that have happened for a particular product (referenced by refno) in a particular timeframe. Suppose the product with refno 5 was set to Publish on 1st October and sold on 2nd October; the logs table will then contain:
Refno | status_from | status_to | logtime
5     | Stock       | Publish   | 2021-10-01
5     | Publish     | Sold      | 2021-10-02
This is what my tables currently look like:
Listings table: ('D' => 'Draft', 'N' => 'Action', 'Y' => 'Publish')
Logs table, which I'm getting using the following statement:
SELECT refno, logtime, status_from, status_to FROM (
SELECT refno, logtime, status_from, status_to, ROW_NUMBER() OVER(PARTITION BY refno ORDER BY logtime DESC)
AS RN FROM crm_logs WHERE logtime < '2021-10-12 00:00:00' ) r
WHERE r.RN = 1 UNION SELECT refno, logtime, status_from, status_to
FROM crm_logs WHERE logtime <= '2021-10-12 00:00:00' AND logtime >= '2015-10-02 00:00:00'
ORDER BY `refno` ASC
The logs table inserts a new record for every status change, using the current timestamp as the logtime, while the listings table updates the status in place along with its update_date. To get the total listings as of today I'm using the following statement:
SELECT SUM(status = 'D') AS draft, SUM(status = 'N') AS action, SUM(status = 'Y') AS publish FROM `crm_listings`
This returns the count for each status as of today.
This is where it gets confusing for me. Suppose the count under Action was 10 yesterday and is 15 today, and I want to retrieve yesterday's total (10). To do that, I would take today's total (15) and subtract every change to Action made between yesterday and today (the Action count in the listings table minus count(*) where status_to = 'Action' in the logs table). Conversely, if it was 10 under Action yesterday and is 5 today, I need to add back the rows with status_from = 'Action' from the logs table.
Note: refno isn't unique in my logs table, since a product with the same refno can be marked as Publish one day and unpublished another, but it is unique in my listings table.
Link to dbfiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=01cb3ccfda09f6ddbbbaf02ec92ca894
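The rollback arithmetic described above (take today's total per status, subtract the changes into that status since the cutoff, add back the changes out of it) can be sketched with Python's built-in sqlite3 module so it runs anywhere. This is a minimal sketch, not the asker's schema: the sample rows are invented, both tables are assumed to use the same status names, and every listing is assumed to have existed at the cutoff. Stacking signed counts with UNION ALL and grouping once also avoids needing a FULL OUTER JOIN, because a status that appears on only one side still lands in the final GROUP BY.

```python
import sqlite3

# Hypothetical schema and sample rows. Both tables are assumed to store the
# same status names, i.e. the listings codes ('D', 'N', 'Y') are already mapped.
SCHEMA = """
CREATE TABLE listings (refno INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE logs (refno INTEGER, status_from TEXT, status_to TEXT, logtime TEXT);
"""

ROLLBACK_SQL = """
SELECT status, SUM(n) AS total FROM (
    -- current totals
    SELECT status, COUNT(*) AS n FROM listings GROUP BY status
    UNION ALL
    -- a change INTO a status added a listing to it after the cutoff, so remove it
    SELECT status_to, -COUNT(*) FROM logs WHERE logtime >= :since GROUP BY status_to
    UNION ALL
    -- a change OUT OF a status took a listing away after the cutoff, so add it back
    SELECT status_from, COUNT(*) FROM logs WHERE logtime >= :since GROUP BY status_from
) AS changes
GROUP BY status
"""


def counts_as_of(conn, since):
    """Per-status totals as they stood just before `since`."""
    return dict(conn.execute(ROLLBACK_SQL, {"since": since}).fetchall())


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO listings VALUES (?, ?)",
                     [(1, "Publish"), (2, "Action"), (3, "Draft"), (4, "Action")])
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", [
        (4, "Draft", "Action", "2021-09-30 08:00:00"),
        (1, "Draft", "Action", "2021-10-01 10:00:00"),
        (1, "Action", "Publish", "2021-10-02 09:00:00"),
        (2, "Draft", "Action", "2021-10-02 12:00:00"),
    ])
    return conn


if __name__ == "__main__":
    # Totals at the end of 2021-10-01, rolled back from the current listings
    print(counts_as_of(build_demo(), "2021-10-02 00:00:00"))
```

The same SQL should carry over to MySQL 8 with the cutoff substituted for :since, although that has not been tested here.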
I am sure this can be simplified or improved, but here is my query and logic.
First, I found the status changes per refno and calculated the total change per status from the desired day to the present:
select status_logs, sum(cnt_status) to_add from (
SELECT
status_to as status_logs, -1*count(*) cnt_status
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT
status_from, count(*) cnt_status_from
FROM logs lm
where
id = (select max(id) from logs l where l.refno = lm.refno) and
logtime >= '2021-10-01 00:00:00'
group by status_from ) total_changes
group by status_logs
Then I matched the keys between the listings table and the logs table by converting the listings status codes to the names used in the logs:
select
case status
when 'D' THEN 'Draft'
when 'A' THEN 'Action'
when 'Y' THEN 'Publish'
when 'S' THEN 'Sold'
when 'N' THEN 'Let'
END status_l ,COUNT(*) c
from listings
group by status
Finally, I joined the two results and added the calculated changes to the current totals.
I needed a full outer join, which MySQL does not support, so I emulated it with a LEFT JOIN and a RIGHT JOIN over the same subqueries, combined with UNION ALL.
Lastly, I used DISTINCT, because rows that match on both sides come out of both halves of the union, and IFNULL to take the status name from whichever table has it.
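MySQL's usual FULL OUTER JOIN workaround can also be written without DISTINCT: keep all left rows with a LEFT JOIN, then add only the right rows that have no match (an anti-join with IS NULL). DISTINCT can silently merge two genuinely identical result rows, which the anti-join form avoids. This sketch uses sqlite3, and the tables cur and chg are hypothetical stand-ins for the l and logs subqueries in the query that follows; the sample values are invented.

```python
import sqlite3

FULL_OUTER_SQL = """
SELECT status, COALESCE(c, 0) + COALESCE(to_add, 0) AS total
FROM (
    -- every current status, with its change if there is one
    SELECT cur.status AS status, cur.c AS c, chg.to_add AS to_add
    FROM cur LEFT JOIN chg ON chg.status = cur.status
    UNION ALL
    -- plus changed statuses that have no current rows (anti-join)
    SELECT chg.status, NULL, chg.to_add
    FROM chg LEFT JOIN cur ON cur.status = chg.status
    WHERE cur.status IS NULL
) AS joined
ORDER BY status
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cur (status TEXT, c INTEGER);       -- stand-in for subquery l
        CREATE TABLE chg (status TEXT, to_add INTEGER);  -- stand-in for subquery logs
        INSERT INTO cur VALUES ('Draft', 3), ('Action', 2);
        INSERT INTO chg VALUES ('Action', -1), ('Sold', 1);
    """)
    return conn


def totals(conn):
    """Current count plus change for every status found on either side."""
    return conn.execute(FULL_OUTER_SQL).fetchall()


if __name__ == "__main__":
    print(totals(build_demo()))
```

'Sold' only exists on the change side and still appears, which is exactly what the RIGHT JOIN half was for.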
select distinct IFNULL(status_l, status_logs) status, counts_at_2021_10_01
from (select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
left join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l
union all
select l.*,
logs.*,
l.c + ifnull(logs.to_add, 0) counts_at_2021_10_01
from (select case status
when 'D' THEN
'Draft'
when 'A' THEN
'Action'
when 'Y' THEN
'Publish'
when 'S' THEN
'Sold'
when 'N' THEN
'Let'
END status_l,
COUNT(*) c
from listings
group by status) l
right join (
select status_logs, sum(cnt_status) to_add
from (SELECT status_to as status_logs,
-1 * count(*) cnt_status
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_to
union all
SELECT status_from, count(*) cnt_status_from
FROM logs lm
where id = (select max(id)
from logs l
where l.refno = lm.refno)
and logtime >= '2021-10-01 00:00:00'
group by status_from) total_changes
group by status_logs) logs
on logs.status_logs = l.status_l) l
So I have 2 tables called listings and logs with the following headers:
Logs Table:
The logs table inserts a new record for every status change, using the current timestamp as the logtime, while the listings table updates the status in place along with its update_date. To get the total listings as of today I'm using the following statement:
SELECT SUM(status = 'D') AS draft, SUM(status = 'N') AS action, SUM(status = 'Y') AS publish
FROM `crm_listings` where updated_date between '2021-05-29' and '2021-05-29'
Basically, I want to return all the records within a particular timeframe.
Note: refno isn't unique in my logs table, since a product with the same refno can be marked as Publish one day and unpublished another, but it is unique in my listings table.
I assume every listing appears in log_table, even one that was created and has had no status change since.
So for each refno you only need the log record with the latest logtime up to the query date; the status_to of that record is the status that stood on that date.
To achieve this, sort log_table by refno and by logtime in descending order, and take the first record for each refno.
SELECT SUM(status_to = 'Action'), SUM(status_to = 'Publish') FROM (
SELECT refno, status_to, ROW_NUMBER() OVER(PARTITION BY refno ORDER BY logtime DESC) AS RN
FROM log_table
WHERE logtime <= '2021-05-29 23:59:59'
) r
WHERE r.RN = 1
Or this, for older MySQL servers (before 8.0) that lack window functions:
SELECT SUM(status_to = 'Action'), SUM(status_to = 'Publish') FROM (
SELECT refno, status_to, CASE WHEN @RF != refno THEN @RN := 1 ELSE @RN := @RN + 1 END AS RN, @RF := refno
FROM log_table
JOIN ( SELECT @RN := 0 , @RF := '' ) f
WHERE logtime <= '2021-05-29 23:59:59'
ORDER BY refno ASC, logtime DESC
) r
WHERE r.RN = 1
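The ROW_NUMBER() approach can be checked on toy data with Python's sqlite3 module. This is a sketch under assumptions: the log_table rows are invented, logtime is stored as 'YYYY-MM-DD HH:MM:SS' text so string comparison orders correctly, and the SQLite build must be 3.25 or newer for window functions. The query is the same window-function query given earlier, with the cutoff passed as a parameter.

```python
import sqlite3  # ROW_NUMBER() requires SQLite 3.25 or newer

LATEST_STATUS_SQL = """
SELECT SUM(status_to = 'Action'), SUM(status_to = 'Publish') FROM (
    SELECT refno, status_to,
           ROW_NUMBER() OVER (PARTITION BY refno ORDER BY logtime DESC) AS rn
    FROM log_table
    WHERE logtime <= :cutoff        -- ignore changes made after the cutoff
) AS r
WHERE r.rn = 1                      -- newest remaining row per refno = standing status
"""


def status_counts_at(conn, cutoff):
    """(Action count, Publish count) as they stood at `cutoff`."""
    return conn.execute(LATEST_STATUS_SQL, {"cutoff": cutoff}).fetchone()


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log_table (refno INTEGER, status_to TEXT, logtime TEXT)")
    conn.executemany("INSERT INTO log_table VALUES (?, ?, ?)", [
        (1, "Action", "2021-05-01 10:00:00"),
        (1, "Publish", "2021-05-20 10:00:00"),
        (1, "Action", "2021-06-01 10:00:00"),   # after the 2021-05-29 cutoff
        (2, "Action", "2021-05-28 10:00:00"),
        (3, "Publish", "2021-05-30 09:00:00"),  # first logged after the cutoff
    ])
    return conn


if __name__ == "__main__":
    print(status_counts_at(build_demo(), "2021-05-29 23:59:59"))
```

Refno 1 counts as Publish on 2021-05-29 even though it is back to Action later, and refno 3 is not counted at all yet.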
If log_table is huge and this query is important, consider creating another table to store such daily statistics.
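One way to read that suggestion, sketched here with sqlite3: a hypothetical daily_status_counts table filled once a day from the listings table, so historical totals become a plain lookup instead of a scan of the logs. The table and function names are assumptions, not from the answer; in MySQL the snapshot step could be run from a cron job or a CREATE EVENT schedule. Deleting the day's rows before inserting makes a rerun on the same day replace, rather than duplicate, that day's snapshot.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE listings (refno INTEGER PRIMARY KEY, status TEXT);
-- hypothetical summary table: one row per (day, status)
CREATE TABLE daily_status_counts (
    snapshot_date TEXT,
    status TEXT,
    n INTEGER,
    PRIMARY KEY (snapshot_date, status)
);
"""


def take_snapshot(conn, day=None):
    """Store the current per-status totals under `day` (default: today)."""
    day = day or date.today().isoformat()
    with conn:  # one transaction: drop any old snapshot for the day, then insert
        conn.execute("DELETE FROM daily_status_counts WHERE snapshot_date = ?", (day,))
        conn.execute(
            "INSERT INTO daily_status_counts (snapshot_date, status, n) "
            "SELECT ?, status, COUNT(*) FROM listings GROUP BY status",
            (day,),
        )


def counts_on(conn, day):
    """Totals recorded for `day`, as a {status: count} dict."""
    rows = conn.execute(
        "SELECT status, n FROM daily_status_counts WHERE snapshot_date = ?", (day,))
    return dict(rows)


def make_conn():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```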
I am unable to understand the purpose of one line in this code. Can someone please explain it to me, or suggest a different way to approach this question?
Link to the question: https://www.hackerrank.com/challenges/15-days-of-learning-sql
Code:
select
submission_date ,
( SELECT COUNT(distinct hacker_id)
FROM Submissions s2
WHERE s2.submission_date = s1.submission_date
AND ( SELECT COUNT(distinct s3.submission_date)
FROM Submissions s3
WHERE
s3.hacker_id = s2.hacker_id
AND s3.submission_date < s1.submission_date
) = dateDIFF(s1.submission_date , '2016-03-01'))
, ( select hacker_id
from submissions s2
where s2.submission_date = s1.submission_date
group by hacker_id
order by count(submission_id) desc , hacker_id limit 1
) as shit
, ( select name
from hackers where hacker_id = shit
)
FROM
( select distinct submission_date
from submissions) s1
group by submission_date
I can't understand why this line is used in this part of the code:
(s3.submission_date < s1.submission_date) = dateDIFF(s1.submission_date , '2016-03-01'))
CREATE TABLE #max_submissions (
submission_date date,
hacker_id integer,
submission_count integer,
ordering_row integer
)
insert into #max_submissions
select
submission_date,
hacker_id,
submission_count,
row_number() over(partition by submission_date order by submission_count desc, hacker_id) as ordering_row
from (
select submission_date,
hacker_id,
count(hacker_id) as submission_count
from submissions
group by submission_date, hacker_id
) tbl_submission_count
CREATE TABLE #hacker_counts (
submission_date date,
hacker_count integer
)
insert into #hacker_counts
select tbl.submission_date,
COUNT(distinct tbl.hacker_id) as cc
from (
select *,
(case when (
(select count(*)
from (select distinct *
from (select s1.hacker_id,
s1.submission_date
from Submissions s1
where s1.hacker_id = s.hacker_id and
(s1.submission_date >= '2016-03-01' and
s1.submission_date <= s.submission_date)) t1
) t2
) >= (DATEDIFF(day, '2016-03-01', s.submission_date) + 1) )
then 1
else 0
end) as logic
from Submissions s
) tbl
where tbl.logic = 1
group by tbl.submission_date
select max_submissions.submission_date,
hacker_counts.hacker_count,
max_submissions.hacker_id,
h.name
from #max_submissions max_submissions
inner join hackers h on max_submissions.hacker_id = h.hacker_id
left join #hacker_counts hacker_counts on max_submissions.submission_date = hacker_counts.submission_date
where max_submissions.ordering_row = 1
order by max_submissions.submission_date
drop table #max_submissions
drop table #hacker_counts
To understand this line:
( SELECT COUNT(distinct s3.submission_date)
FROM Submissions s3
WHERE
s3.hacker_id = s2.hacker_id
AND s3.submission_date < s1.submission_date)
= dateDIFF(s1.submission_date , '2016-03-01')
First, understand the left-hand side:
(SELECT COUNT(distinct s3.submission_date) FROM Submissions s3 WHERE s3.hacker_id = s2.hacker_id AND s3.submission_date < s1.submission_date)
This counts the distinct submission dates for the hacker_id before the current date.
So if the date for a row is 2016-03-05, it counts the distinct days before that date on which this hacker_id submitted something (several submissions by one hacker on the same day count only once).
In other words, it takes a hacker_id and checks, for each day from the first day up to the day before the current one, whether this hacker_id made a submission; it does this for each submission date.
Then understand the right-hand side:
dateDIFF(s1.submission_date , '2016-03-01')
This takes the difference in days between the current date, 2016-03-05, and the first day, 2016-03-01, which is 4.
Now the whole statement:
If a hacker made at least one submission on each day from 2016-03-01 to 2016-03-04, both sides of the comparison are equal: the date difference between the 5th and the 1st is 4 (right-hand side), and the number of distinct earlier submission dates for that hacker is also 4 (left-hand side).
Combined with the outer condition s2.submission_date = s1.submission_date, which requires a submission on the 5th itself, this selects the hackers who submitted on every day from the 1st to the 5th.
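The condition can be tried on a handful of rows with Python's sqlite3 module. SQLite has no DATEDIFF, so this sketch uses a julianday() difference instead; for these dates that gives the same whole-day count as MySQL's DATEDIFF('2016-03-05', '2016-03-01'), which is 4. The submission rows are made up: hacker 1 submits every day, hacker 2 skips the 3rd, and hacker 3 first appears on the 5th.

```python
import sqlite3

CONTEST_START = "2016-03-01"

STREAK_SQL = """
SELECT COUNT(DISTINCT s2.hacker_id)
FROM submissions s2
WHERE s2.submission_date = :day
  AND (SELECT COUNT(DISTINCT s3.submission_date)      -- distinct earlier days
       FROM submissions s3
       WHERE s3.hacker_id = s2.hacker_id
         AND s3.submission_date < :day)
      = CAST(julianday(:day) - julianday(:start) AS INTEGER)  -- days elapsed
"""


def streak_hackers(conn, day):
    """Hackers who submitted on `day` and on every contest day before it."""
    return conn.execute(STREAK_SQL, {"day": day, "start": CONTEST_START}).fetchone()[0]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE submissions (hacker_id INTEGER, submission_date TEXT)")
    rows = [(1, f"2016-03-0{d}") for d in range(1, 6)]     # hacker 1: every day
    rows += [(2, f"2016-03-0{d}") for d in (1, 2, 4, 5)]   # hacker 2: misses the 3rd
    rows += [(3, "2016-03-05"),                            # hacker 3: joins on the 5th
             (1, "2016-03-05")]                            # second submission, same day
    conn.executemany("INSERT INTO submissions VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    for d in range(1, 6):
        print(f"2016-03-0{d}", streak_hackers(conn, f"2016-03-0{d}"))
```

Hacker 2 drops out from the 3rd onward, and hacker 3 never qualifies because they have no earlier days.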
I'm trying to select users whose rating (user.rating) is at least 6 or who have at least 100 transactions (counted from the transaction table). Basically, I want to count how many transactions each user has and then filter on (transaction count >= 100 OR user rating >= 6).
SELECT *
FROM `user`
JOIN (SELECT COUNT(*)
FROM transaction
WHERE transaction.user_id=user.id
AND type='L'
AND status='S') AS tcount
WHERE (user.rating >= '6' OR tcount >= '100')
Just another possible answer. I've created simplified schemas to test it; please try it and let me know the result.
SELECT *
FROM user
WHERE user.rating >= 6 OR (SELECT COUNT(*) FROM transaction WHERE user_id = user.id and type = 'L' and status = 'S') >= 100;
Use an alias on COUNT(*), group the subquery by user_id, and join on it:
SELECT *
FROM `user`
LEFT JOIN (SELECT user_id, COUNT(*) cnt -- LEFT JOIN keeps users with no matching transactions
FROM transaction
WHERE type='L'
AND status='S'
GROUP BY user_id) AS tcount
ON user.id = tcount.user_id
WHERE (user.rating >= 6 OR tcount.cnt >= 100)
You can also write it without the subquery, like this:
SELECT u.id
FROM `user` u
LEFT JOIN `transaction` t
ON t.user_id = u.id
AND t.type = 'L' AND t.status = 'S' -- keep these filters in ON so users without transactions are not dropped
GROUP BY u.id
HAVING MAX(u.rating) >= 6 OR COUNT(t.user_id) >= 100
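A quick way to see why the join type matters, sketched with sqlite3: with an inner join, a user rated 6 or higher who has no qualifying transactions never reaches the WHERE or HAVING clause. The table names users and transactions are renamed from the question's user and transaction (an assumption, to avoid quoting keywords), and the rows are invented.

```python
import sqlite3

QUALIFYING_SQL = """
SELECT u.id
FROM users u
LEFT JOIN transactions t                 -- LEFT JOIN keeps users with no transactions
       ON t.user_id = u.id
      AND t.type = 'L' AND t.status = 'S'  -- in ON, not WHERE, so unmatched users survive
GROUP BY u.id, u.rating
HAVING u.rating >= :min_rating OR COUNT(t.user_id) >= :min_tx
ORDER BY u.id
"""


def qualifying_users(conn, min_rating=6, min_tx=100):
    """Ids of users passing either the rating test or the transaction-count test."""
    params = {"min_rating": min_rating, "min_tx": min_tx}
    return [row[0] for row in conn.execute(QUALIFYING_SQL, params)]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, rating INTEGER);
        CREATE TABLE transactions (user_id INTEGER, type TEXT, status TEXT);
    """)
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 7), (2, 3), (3, 2)])
    tx = [(2, "L", "S")] * 100                        # user 2: exactly 100 qualifying
    tx += [(3, "L", "S")] * 99                        # user 3: 99 qualifying ...
    tx += [(3, "X", "S")] * 5 + [(3, "L", "P")] * 3   # ... plus rows that do not count
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", tx)
    return conn


if __name__ == "__main__":
    print(qualifying_users(build_demo()))
```

User 1 qualifies on rating alone with zero transactions, which an inner join would have silently dropped.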
I have a table CONTACT with a field opt_out.
The field opt_out may have values 'Y', 'N' and NULL.
I have a table CONTACT_AUDIT with the fields date, contact_id, field_name, value_before and value_after.
When I add a new contact, a new row is added to the CONTACT table and nothing is added to the CONTACT_AUDIT table.
When I edit a contact, for example by changing the opt_out value from NULL to 'Y', the opt_out value in the CONTACT table is updated and a new row is added to the CONTACT_AUDIT table with these values:
date=NOW()
contact_id=<my contact's id>
field_name='opt_out'
value_before=NULL
value_after='Y'
I need to know the contacts who had opt_out='Y' at a given date.
I tried this:
SELECT count(*) AS nb
FROM contacts c
WHERE
( -- contact is optout now and has never been modified before
c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
)
OR ( -- we consider contacts where the last row before date in contacts_audit is optout = 'Y'
c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
ORDER BY date_created
LIMIT 1
)
)
But MySQL does not support LIMIT in an IN (...) subquery.
So I tried HAVING instead:
SELECT count(*) AS nb
FROM contacts c
WHERE
( -- contact is optout now and has never been modified before
c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
)
OR ( -- we consider contacts where the last row before date in contacts_audit is optout = 'Y'
c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
HAVING MAX(date_created)
)
)
The query runs, but now I don't know how to tell whether the value for the row the subquery picks is 'Y' or 'N'. If I add a WHERE clause that keeps only 'Y' values, the 'N' values are filtered out, and I can no longer tell whether the last value at that date was 'Y' or 'N'...
Thank you for your help
If I understand your problem correctly, you may want to use a UNION. I don't have MySQL to test it right now, but the code could be something like this. Tell me if this helps.
select c.id, c.optout
from contacts c
where c.optout = 'Y'
AND c.id NOT IN (SELECT DISTINCT contact_id FROM contacts_audit WHERE field_name = 'optout')
UNION
select c.id, c.optout from contacts c where c.id IN (
SELECT ca.contact_id
FROM contacts_audit ca
WHERE date_created BETWEEN '2014-07-24' AND DATE_ADD( '2014-07-24', INTERVAL 1 DAY )
AND field_name = 'optout'
HAVING MAX(date_created)
)
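A different approach to the opt-out question, sketched with sqlite3 and not taken from the answers here: if every change is audited, the value at a given moment is the value_before of the first change after that moment, or the current value when nothing changed afterwards. A CASE with EXISTS is used rather than COALESCE because value_before can legitimately be NULL. The table and column names follow the asker's query (contacts, contacts_audit, optout, date_created); the rows are invented, and contacts created after the moment are not excluded because the question gives no creation date. MySQL rejects LIMIT inside IN (...) subqueries but accepts it in a scalar subquery like this one.

```python
import sqlite3

OPTED_OUT_SQL = """
SELECT c.id
FROM contacts c
WHERE (CASE
         WHEN EXISTS (SELECT 1 FROM contacts_audit a
                      WHERE a.contact_id = c.id AND a.field_name = 'optout'
                        AND a.date_created > :moment)
         -- changed later: value_before of the first later change held at the moment
         THEN (SELECT a.value_before FROM contacts_audit a
               WHERE a.contact_id = c.id AND a.field_name = 'optout'
                 AND a.date_created > :moment
               ORDER BY a.date_created
               LIMIT 1)
         -- no change since the moment: the current value still applies
         ELSE c.optout
       END) = 'Y'
ORDER BY c.id
"""


def opted_out_at(conn, moment):
    """Ids of contacts whose optout was 'Y' at `moment` ('YYYY-MM-DD HH:MM:SS')."""
    return [row[0] for row in conn.execute(OPTED_OUT_SQL, {"moment": moment})]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contacts (id INTEGER PRIMARY KEY, optout TEXT);
        CREATE TABLE contacts_audit (date_created TEXT, contact_id INTEGER,
                                     field_name TEXT, value_before TEXT, value_after TEXT);
    """)
    conn.executemany("INSERT INTO contacts VALUES (?, ?)",
                     [(1, "Y"), (2, "N"), (3, "Y"), (4, "Y"), (5, None)])
    conn.executemany("INSERT INTO contacts_audit VALUES (?, ?, ?, ?, ?)", [
        ("2014-07-20 10:00:00", 2, "optout", None, "Y"),
        ("2014-07-30 10:00:00", 2, "optout", "Y", "N"),
        ("2014-07-10 10:00:00", 3, "optout", "Y", None),
        ("2014-07-28 10:00:00", 3, "optout", None, "Y"),
        ("2014-07-22 10:00:00", 4, "optout", None, "Y"),
    ])
    return conn


if __name__ == "__main__":
    print(opted_out_at(build_demo(), "2014-07-24 23:59:59"))
```

Contact 3 is 'Y' today but was NULL on 2014-07-24, and contact 2 is the reverse, which is the case a filter on the current value alone cannot handle.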
I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on (users_id, beverages_id):
SELECT *
FROM (
SELECT t.*, IF(@prev + INTERVAL 1 DAY = t.d, @c := @c + 1, @c := 1) AS streak, @prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT @prev := NULL, @c := 1) AS vars
ORDER BY t.d -- the running @prev/@c logic needs the days in date order
) AS t
ORDER BY streak DESC LIMIT 1;
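On MySQL 8 or other engines with window functions, the user-variable trick can be replaced by the common gaps-and-islands technique: number the qualifying days in order and subtract that number from the date, so consecutive days share the same value. This sketch uses sqlite3, where julianday() stands in for MySQL date arithmetic; the timestamp column is renamed to logged_at (an assumption) and the rows are invented.

```python
import sqlite3  # ROW_NUMBER() requires SQLite 3.25 or newer

STREAK_SQL = """
WITH days AS (          -- one row per day with enough logs
    SELECT DATE(logged_at) AS d
    FROM beverages_log
    WHERE users_id = :u AND beverages_id = :b
    GROUP BY DATE(logged_at)
    HAVING COUNT(*) >= :m
), islands AS (         -- consecutive days share the same (day - row number)
    SELECT julianday(d) - ROW_NUMBER() OVER (ORDER BY d) AS island
    FROM days
)
SELECT COALESCE(MAX(streak_len), 0)
FROM (SELECT COUNT(*) AS streak_len FROM islands GROUP BY island) AS streaks
"""


def max_streak(conn, user_id, beverage_id, min_per_day=5):
    """Longest run of consecutive days with at least `min_per_day` logs."""
    params = {"u": user_id, "b": beverage_id, "m": min_per_day}
    return conn.execute(STREAK_SQL, params).fetchone()[0]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE beverages_log (id INTEGER PRIMARY KEY, users_id INTEGER, "
                 "beverages_id INTEGER, logged_at TEXT)")
    per_day = {1: 5, 2: 6, 3: 5, 4: 4, 5: 5, 6: 5}  # user 1, beverage 1, January 2021
    rows = [(1, 1, f"2021-01-{day:02d} {hour:02d}:00:00")
            for day, n in per_day.items() for hour in range(n)]
    rows += [(2, 1, f"2021-01-04 {hour:02d}:00:00") for hour in range(5)]  # another user
    conn.executemany("INSERT INTO beverages_log (users_id, beverages_id, logged_at) "
                     "VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(max_streak(build_demo(), 1, 1))
```

Days 1 to 3 form a streak of 3, day 4 falls below the threshold, and days 5 and 6 form a streak of 2. MySQL 8 supports the same ROW_NUMBER() window function; there the date arithmetic would use DATE_SUB or DATEDIFF instead of julianday.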
Why not include users_id in the daycounts view and group by users_id and date?
Also include users_id in view t.
Then, when you are querying against t, add users_id to the WHERE clause.
That way you don't have to recreate your views for every single user; you just need to remember to include users_id in your WHERE clause.
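That suggestion can be sketched with sqlite3: a single daycounts view that keeps users_id and beverages_id as columns and is filtered per call instead of recreated. The column name logged_at and the sample rows are assumptions for illustration.

```python
import sqlite3

SETUP = """
CREATE TABLE beverages_log (id INTEGER PRIMARY KEY, users_id INTEGER,
                            beverages_id INTEGER, logged_at TEXT);
-- one view for everyone: user and beverage stay columns instead of being baked in
CREATE VIEW daycounts AS
    SELECT users_id, beverages_id, DATE(logged_at) AS d, COUNT(*) AS n
    FROM beverages_log
    GROUP BY users_id, beverages_id, DATE(logged_at);
"""


def qualifying_days(conn, user_id, beverage_id, min_per_day=5):
    """Days on which the user logged the beverage at least `min_per_day` times."""
    sql = ("SELECT d FROM daycounts "
           "WHERE users_id = ? AND beverages_id = ? AND n >= ? ORDER BY d")
    return [row[0] for row in conn.execute(sql, (user_id, beverage_id, min_per_day))]


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = [(1, 1, f"2021-01-01 {h:02d}:00:00") for h in range(5)]
    rows += [(1, 1, f"2021-01-02 {h:02d}:00:00") for h in range(3)]
    rows += [(2, 1, f"2021-01-02 {h:02d}:00:00") for h in range(6)]
    rows += [(1, 2, f"2021-01-03 {h:02d}:00:00") for h in range(5)]
    conn.executemany(
        "INSERT INTO beverages_log (users_id, beverages_id, logged_at) VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(qualifying_days(build_demo(), 1, 1))
```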
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(EndDate.Date, StartDate.Date) + 1 AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID
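The answer assumes a Dates table already exists. One way to fill such a calendar table, sketched with sqlite3, is a recursive CTE that steps one day at a time; the table and column names (Dates, Date) match the answer, and everything else is an assumption. MySQL 8 also supports WITH RECURSIVE, though its cte_max_recursion_depth setting (1000 by default) caps how many days one statement can generate unless it is raised.

```python
import sqlite3


def fill_dates(conn, start, end):
    """Insert one row per day from `start` to `end` inclusive ('YYYY-MM-DD')."""
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS Dates (Date TEXT PRIMARY KEY)")
        conn.execute(
            """
            WITH RECURSIVE seq(day) AS (
                SELECT date(:start)
                UNION ALL
                SELECT date(day, '+1 day') FROM seq WHERE day < :end
            )
            INSERT OR IGNORE INTO Dates (Date) SELECT day FROM seq
            """,
            {"start": start, "end": end},
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    fill_dates(conn, "2020-02-27", "2020-03-02")
    print([row[0] for row in conn.execute("SELECT Date FROM Dates ORDER BY Date")])
```

INSERT OR IGNORE lets overlapping ranges be filled repeatedly without duplicate-key errors.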