I need to store tariffs connected to a port.
So, the table can look like this:
create table tariffs(
id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
price decimal(12,2),
expiry bigint(11)
)
expiry represents a timestamp when that particular tariff will expire.
So I might have data like this:
id | price | expiry
1 | 11.00 | 30/Jan/2022
2 | 12.00 | 28/Feb/2022
3 | 13.00 | 30/Mar/2022
4 | 14.00 | 30/Apr/2022
5 | 15.00 | null
In this case, ID 5 isn't expired yet, meaning that it's current.
(I realise I put dates, there, rather than timestamps; I did so that it's easier to read)
The problem I have is in the logic to figure out which tariff to use given a specific date.
In an ideal world, if the expiry of 5 were "infinite", I could just do WHERE expiry > date_apply LIMIT 1. However, I don't have that luxury, since a row whose expiry is NULL is never returned by that comparison.
I COULD assign a very big number to expiry for the "current" entry. It would make the query work regardless. But... it feels wrong.
Somebody recommended using TWO fields for each tariff, a "from" and a "to", telling me that otherwise querying will be a nightmare. I am beginning to see what they mean... but then I fear operators might unwillingly have "holes" in the timeframes for tariffs, which would be difficult to prevent.
How should I organise my table, and how should I query it? What are the best practices here?
SELECT COALESCE(t2.price, t1.price) AS price
FROM (SELECT price FROM tariffs WHERE expiry IS NULL LIMIT 1) AS t1
LEFT OUTER JOIN (SELECT price FROM tariffs WHERE expiry > ? ORDER BY expiry ASC LIMIT 1) AS t2 ON TRUE
Demo: https://www.db-fiddle.com/f/wykqR5X7B9S424AWkA4aQy/1
The first subquery is bound to return 1 row if you have at least one unexpired tariff.
The second subquery may return no rows if the date you pass is later than every expiry, which is why the join is a LEFT OUTER JOIN. If no row matches the condition on expiry, the subquery returns nothing and the outer join fills the t2 columns with NULLs.
So if t2.* is NULL, then the COALESCE() defaults to the unexpired value in t1.price.
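The lookup described above can be tried end to end with a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for MySQL, ISO date strings instead of timestamps, and a hypothetical helper named tariff_price. It orders the dated tariffs ascending so that the nearest future expiry wins.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tariffs (id INTEGER PRIMARY KEY, price REAL, expiry TEXT)"
)
conn.executemany(
    "INSERT INTO tariffs (price, expiry) VALUES (?, ?)",
    [
        (11.0, "2022-01-30"),
        (12.0, "2022-02-28"),
        (13.0, "2022-03-30"),
        (14.0, "2022-04-30"),
        (15.0, None),  # NULL expiry marks the current, open-ended tariff
    ],
)


def tariff_price(conn, date_apply):
    """Return the price that applies on date_apply (hypothetical helper)."""
    row = conn.execute(
        """
        SELECT COALESCE(t2.price, t1.price) AS price
        FROM (SELECT price FROM tariffs WHERE expiry IS NULL LIMIT 1) AS t1
        LEFT OUTER JOIN (
            SELECT price FROM tariffs
            WHERE expiry > ?
            ORDER BY expiry ASC
            LIMIT 1
        ) AS t2 ON 1 = 1
        """,
        (date_apply,),
    ).fetchone()
    return row[0] if row else None


print(tariff_price(conn, "2022-01-15"))  # dated tariff 1 applies
print(tariff_price(conn, "2022-06-01"))  # falls back to the NULL-expiry tariff
```

If the table holds no row with a NULL expiry, t1 is empty and the helper returns None, so the sketch relies on exactly one open-ended tariff being present.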
You can leave the final price with a null expiry and use coalesce to substitute a value according to the logic needed at query time.
Here we start with only one expired tariff and the tariff whose expiry is null. We create a view that reports the null expiry as 'undefined'. We then add a tariff with a future expiry, and the same view correctly returns it.
create table tariffs(
id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
price decimal(12,2),
expiry date);
insert into tariffs (price,expiry) values (11,'2022-01-30'),(12,null);
create view current_tarif as
select id, price, coalesce(expiry,'undefined') expiry
from tariffs
where coalesce(expiry,'3000-12-31') > curdate()
order by coalesce(expiry,'3000-12-31')
limit 1;
select * from current_tarif;
id | price | expiry
-: | ----: | :--------
2 | 12.00 | undefined
insert into tariffs (price,expiry) values (15,'2022-12-30');
select * from current_tarif;
id | price | expiry
-: | ----: | :---------
3 | 15.00 | 2022-12-30
Build your validity and expiry dates/timestamps as you go, using OLAP functions.
WITH
indata(id,price,expiry) AS (
SELECT 1,11.00,DATE '30-Jan-2022'
UNION ALL SELECT 2,12.00,DATE '28-Feb-2022'
UNION ALL SELECT 3,13.00,DATE '30-Mar-2022'
UNION ALL SELECT 4,14.00,DATE '30-Apr-2022'
UNION ALL SELECT 5,15.00,NULL
)
,
enriched AS (
SELECT
id
, price
, LAG(NVL(expiry, '9999-12-31'),1,'0001-01-01') OVER(ORDER BY id) AS validity
, NVL(expiry, '9999-12-31') AS expiry
FROM indata
-- chk id | price | validity | expiry
-- chk ----+-------+------------+------------
-- chk 1 | 11.00 | 0001-01-01 | 2022-01-30
-- chk 2 | 12.00 | 2022-01-30 | 2022-02-28
-- chk 3 | 13.00 | 2022-02-28 | 2022-03-30
-- chk 4 | 14.00 | 2022-03-30 | 2022-04-30
-- chk 5 | 15.00 | 2022-04-30 | 9999-12-31
)
SELECT
price
FROM enriched
WHERE '2022-04-22' >= validity
AND '2022-04-22' < expiry
;
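The same range-building idea can be run locally with a small sketch. It assumes SQLite 3.25 or newer (for window functions) as a stand-in engine, COALESCE in place of NVL, ISO date strings, and a hypothetical helper named price_on.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tariffs (id INTEGER PRIMARY KEY, price REAL, expiry TEXT)"
)
conn.executemany(
    "INSERT INTO tariffs (id, price, expiry) VALUES (?, ?, ?)",
    [
        (1, 11.0, "2022-01-30"),
        (2, 12.0, "2022-02-28"),
        (3, 13.0, "2022-03-30"),
        (4, 14.0, "2022-04-30"),
        (5, 15.0, None),
    ],
)

# Each row's validity start is the previous row's expiry; the first row
# starts at a far-past sentinel and a NULL expiry becomes a far-future one.
RANGES_SQL = """
WITH enriched AS (
    SELECT id,
           price,
           LAG(COALESCE(expiry, '9999-12-31'), 1, '0001-01-01')
               OVER (ORDER BY id) AS validity,
           COALESCE(expiry, '9999-12-31') AS expiry
    FROM tariffs
)
SELECT price
FROM enriched
WHERE ? >= validity
  AND ? < expiry
"""


def price_on(conn, day):
    """Return the price valid on the given ISO date (hypothetical helper)."""
    row = conn.execute(RANGES_SQL, (day, day)).fetchone()
    return row[0] if row else None


print(price_on(conn, "2022-04-22"))
```

Because the ranges are derived from consecutive expiry values, they cannot contain holes, which addresses the concern raised about separate "from" and "to" columns.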
You can write a CASE WHEN expression that replaces a null expiry with a maximum value such as 9999-12-31 (253402270022). Then you can apply the condition expiry > date_apply, sort by expiry, and take the first row.
WITH maxTariffs AS
(SELECT id,
price,
(CASE
WHEN expiry IS NULL
THEN 253402270022
ELSE expiry
END) AS expiry
FROM tariffs)
-- DATE_APPLY is a placeholder for the timestamp being looked up
SELECT id, price FROM maxTariffs WHERE expiry > DATE_APPLY ORDER BY expiry ASC LIMIT 1
I'm trying to understand the logic behind the syntax below. Based on the following question, table and syntax:
Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
Column + Data Type:
id: int | user_id: int | item: varchar |created_at: datetime | revenue: int
SELECT DISTINCT(a1.user_id)
FROM amazon_transactions a1
JOIN amazon_transactions a2 ON a1.user_id=a2.user_id
AND a1.id <> a2.id
AND a2.created_at::date-a1.created_at::date BETWEEN 0 AND 7
ORDER BY a1.user_id
Why does the table need to be joined to itself in this case?
How does 'AND a1.id <> a2.id' portion of syntax contribute to the join?
You are looking for users that have 2 records in that table whose dates are at most 7 days apart.
To accomplish this, you treat the table as if it were 2 different (but identical) tables, because you have to match a row in the first table with a row in the second table.
Of course, you don't want to match a row with itself, so
AND a1.id <> a2.id
accomplishes that.
The table needs to be joined with itself because you have just one table and you want to find returning users by comparing the time between transaction dates for the same user.
The AND a1.id <> a2.id portion of the syntax removes self-matches, i.e. it prevents a transaction from being paired with itself in the joined result.
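To see what the id condition contributes, here is a small sketch that runs the self-join with and without it. It assumes SQLite as a stand-in engine (julianday() replaces the PostgreSQL date subtraction), a few hypothetical sample rows, and a hypothetical helper named returning_users.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazon_transactions "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO amazon_transactions VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-05 15:33:22"),
        (2, 2, "2020-01-05 16:33:22"),
        (3, 1, "2020-01-08 18:33:22"),
        (4, 1, "2020-01-22 17:33:22"),
        (5, 2, "2020-02-05 15:33:22"),
        (6, 2, "2020-03-05 15:33:22"),
    ],
)


def returning_users(conn, exclude_self=True):
    """Self-join on user_id; optionally drop the a1.id <> a2.id condition."""
    id_filter = "AND a1.id <> a2.id" if exclude_self else ""
    sql = f"""
        SELECT DISTINCT a1.user_id
        FROM amazon_transactions a1
        JOIN amazon_transactions a2
          ON a1.user_id = a2.user_id
         {id_filter}
         AND julianday(date(a2.created_at))
             - julianday(date(a1.created_at)) BETWEEN 0 AND 7
        ORDER BY a1.user_id
    """
    return [row[0] for row in conn.execute(sql)]


print(returning_users(conn))                      # only genuine repeat buyers
print(returning_users(conn, exclude_self=False))  # every user matches itself
```

Without the id condition, each transaction pairs with itself at a distance of 0 days, so every user with at least one purchase would be reported.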
There are two scenarios I can think of, depending on the id column values. Are the id values generated in chronological order? If so, to answer your first question: we can, but don't have to, use join syntax. Here is how to achieve your goal using a correlated subquery, with sample data created.
create table amazon_transactions(id int , user_id int , item varchar(20),created_at datetime , revenue int);
insert amazon_transactions (id,user_id,created_at) values
(1,1,'2020-01-05 15:33:22'),
(2,2,'2020-01-05 16:33:22'),
(3,1,'2020-01-08 18:33:22'),
(4,1,'2020-01-22 17:33:22'),
(5,2,'2020-02-05 15:33:22'),
(6,2,'2020-03-05 15:33:22');
select * from amazon_transactions;
-- sample set:
| id | user_id | item | created_at | revenue |
+------+---------+------+---------------------+---------+
| 1 | 1 | NULL | 2020-01-05 15:33:22 | NULL |
| 2 | 2 | NULL | 2020-01-05 16:33:22 | NULL |
| 3 | 1 | NULL | 2020-01-08 18:33:22 | NULL |
| 4 | 1 | NULL | 2020-01-22 17:33:22 | NULL |
| 5 | 2 | NULL | 2020-02-05 15:33:22 | NULL |
| 6 | 2 | NULL | 2020-03-05 15:33:22 | NULL |
-- Here is the answer using a correlated subquery:
select distinct user_id
from amazon_transactions t
where datediff(
-- next transaction of the same user, assuming ids increase with time
(select created_at from amazon_transactions where user_id=t.user_id and id>t.id order by id limit 1),
created_at
)<=7
;
-- result:
| user_id |
+---------+
| 1 |
However, what if the id values are NOT based on transaction time? Then they do not help with our requirement at all. In this case a JOIN is more capable than a correlated subquery, and we need to order the transactions by time for each user in order to build the necessary join condition. To answer your second question, the AND a1.id <> a2.id portion of the syntax contributes by preventing a transaction from being paired with itself. However, to my understanding, it matches far more pairs than necessary. We only care whether CONSECUTIVE transactions are within 7 days of each other, but AND a1.id <> a2.id compares every pair. For instance, we want to check the gap between transaction1 and transaction2, and between transaction2 and transaction3, NOT between transaction1 and transaction3.
Note: by using the user-variable row_id trick, we can produce a row number that is used to match consecutive transactions for each user, thus eliminating the wasteful comparison of arbitrary transaction pairs.
select distinct t1.user_id
from
(select user_id,created_at,@row_id:=@row_id+1 as row_id
from amazon_transactions ,(select @row_id:=0) t
order by user_id,created_at)t1
join
(select user_id,created_at,@row_num:=@row_num+1 as row_num
from amazon_transactions ,(select @row_num:=0) t
order by user_id,created_at)t2
on t1.user_id=t2.user_id and t2.row_num-t1.row_id=1 and datediff(t2.created_at,t1.created_at)<=7
;
-- result
| user_id |
+---------+
| 1 |
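On engines with window functions (MySQL 8.0 and later, or SQLite 3.25 and later), the consecutive-transaction comparison can be written without user variables. The following is a sketch of that swapped-in technique, not the method used above. It assumes SQLite as a stand-in engine and a hypothetical helper named consecutive_returning_users.

```python
import sqlite3

# In-memory SQLite database with the sample rows shown earlier in this answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazon_transactions "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO amazon_transactions VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-05 15:33:22"),
        (2, 2, "2020-01-05 16:33:22"),
        (3, 1, "2020-01-08 18:33:22"),
        (4, 1, "2020-01-22 17:33:22"),
        (5, 2, "2020-02-05 15:33:22"),
        (6, 2, "2020-03-05 15:33:22"),
    ],
)


def consecutive_returning_users(conn):
    """Users whose consecutive purchases are at most 7 calendar days apart."""
    sql = """
        SELECT DISTINCT user_id
        FROM (
            SELECT user_id,
                   created_at,
                   -- previous purchase of the same user, NULL for the first one
                   LAG(created_at) OVER (
                       PARTITION BY user_id ORDER BY created_at
                   ) AS prev_at
            FROM amazon_transactions
        ) AS t
        WHERE julianday(date(created_at)) - julianday(date(prev_at)) <= 7
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql)]


print(consecutive_returning_users(conn))
```

Each row is compared only with its immediate predecessor, so the work grows with the number of transactions rather than with the number of pairs per user.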
I am trying to retrieve records by first summing their time_spent and then using MAX to get the record with the largest total time, which seems to be working.
I now need to handle the case where the sum of time_spent is the same for several users. In the example below both users have a total time_spent of 10, so the query should select the user with the latest post, using the created_at column. I just don't know what to use for that check: is it a CASE expression or the IF function? And if so, where would it go in my query?
Here is a sql fiddle link: http://sqlfiddle.com/#!9/f24985/2
Table1 layout
+----+---------+-----------+---------+------------+------------+
| id | user_id | member_id | item_id | time_spent | created_at |
+----+---------+-----------+---------+------------+------------+
| 1 | 1 | 1 | 1 | 5 | 2019-06-01 |
| 2 | 2 | 1 | 1 | 1 | 2019-06-07 |
| 3 | 2 | 1 | 1 | 5 | 2019-06-08 |
| 4 | 2 | 1 | 2 | 4 | 2019-06-01 |
| 5 | 1 | 1 | 2 | 5 | 2019-06-07 |
+----+---------+-----------+---------+------------+------------+
Current SQL:
SELECT
MAX(attribute_time.sum_time), attribute_time.user_id
FROM (
SELECT
SUM(time_spent) AS sum_time, user_id
FROM
table1
WHERE
member_id = 1
AND item_id IN (1, 2)
AND (created_at BETWEEN '2019-06-1' AND '2019-06-30')
GROUP BY
user_id
ORDER BY
sum_time desc
) AS attribute_time;
In this example, both users have a total time of 10. The query currently returns the first of the 2 records rather than choosing by the created_at date; in this case it should return user 2.
Expected
+---------+
| user_id |
+---------+
| 2 |
+---------+
This is what you are looking for. http://sqlfiddle.com/#!9/a5306c/4/0
The MAX function is problematic for sub-queries involving aggregated quantities unless you use some repetitive and verbose queries (DRY!), as seen in the answer here: MySQL: Select MAX() from sub-query with COUNT(). It seems to decouple the rows, so you get the highest (max) sum_time with the wrong id. (I thought I was seeing things; it seemed so simple.)
I used LIMIT to get around it. Sorting descending (the highest on top) and then limiting the result to 1 row achieves the same thing as MAX.
Also, I'm not sure whether, in the event of a tie in max time, you wanted to pick the earliest or the latest record, but this picks the latest. I use MAX to pick the last day/time for each user, and order by the summed time, then by that date. If you want the opposite, substitute MIN for MAX and/or ASC for DESC in the ORDER BY. Regards! Thanks for the exercise.
SELECT
SUM(time_spent) AS sum_time, user_id, MAX(created_at) AS last_created_at
FROM
Table1
WHERE
member_id = 1
AND item_id IN (1, 2)
AND (created_at BETWEEN '2019-06-1' AND '2019-06-30')
GROUP BY
user_id
ORDER BY
sum_time DESC, last_created_at DESC
LIMIT 1
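The tie-break can be checked against the question's sample rows with a short sketch. It assumes SQLite as a stand-in engine, an alias named last_created_at for the per-user latest date, and a hypothetical helper named top_user.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "member_id INTEGER, item_id INTEGER, time_spent INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 5, "2019-06-01"),
        (2, 2, 1, 1, 1, "2019-06-07"),
        (3, 2, 1, 1, 5, "2019-06-08"),
        (4, 2, 1, 2, 4, "2019-06-01"),
        (5, 1, 1, 2, 5, "2019-06-07"),
    ],
)


def top_user(conn):
    """Highest total time_spent; ties go to the most recent created_at."""
    sql = """
        SELECT user_id,
               SUM(time_spent) AS sum_time,
               MAX(created_at) AS last_created_at
        FROM table1
        WHERE member_id = 1
          AND item_id IN (1, 2)
          -- SQLite compares dates as text, so the bounds are zero-padded
          AND created_at BETWEEN '2019-06-01' AND '2019-06-30'
        GROUP BY user_id
        ORDER BY sum_time DESC, last_created_at DESC
        LIMIT 1
    """
    return conn.execute(sql).fetchone()


print(top_user(conn))
```

Both users total 10, and user 2's latest row (2019-06-08) is newer than user 1's (2019-06-07), so the second sort key decides the result.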
Try this; it should give you user id 2.
SELECT
MAX(attribute_time.sum_time), attribute_time.user_id
FROM (
SELECT
SUM(time_spent) AS sum_time, user_id
FROM
table1
WHERE
member_id = 1
AND item_id IN (1, 2)
AND (created_at BETWEEN '2019-06-1' AND '2019-06-30')
GROUP BY
user_id
ORDER BY
sum_time,user_id desc
) AS attribute_time;
Using MySQL, I have a table that keep track of user visit:
USER_ID | TIMESTAMP
--------+----------------------
1 | 2014-08-11 14:37:36
2 | 2014-08-11 12:37:36
3 | 2014-08-07 16:37:36
1 | 2014-07-14 15:34:36
1 | 2014-07-09 14:37:36
2 | 2014-07-03 14:37:36
3 | 2014-05-23 15:37:36
3 | 2014-05-13 12:37:36
The time of day is not important; I am more concerned with the answer to "how many days between entries".
How do I go about figuring out the average number of days between entries through SQL queries?
For example, the output should look like something like:
(output is just a sample, not reflection of the data table above)
USER_ID | AVG TIME (days)
--------+----------------------
1 | 2
2 | 3
3 | 1
MySQL has no direct "get something from a previous row" capabilities. Easiest workaround is to use a variable to store that "previous" value:
SET @last = null;
SELECT user_id, AVG(diff)
FROM (
SELECT user_id, IF(@last IS NULL, 0, timestamp - @last) AS diff, @last := timestamp
FROM yourtable
ORDER BY user_id, timestamp ASC
) AS foo
GROUP BY user_id
The inner query does your "difference from previous row" calculations, and the outer query does the averaging.
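Engines with window functions (MySQL 8.0 and later, SQLite 3.25 and later) do offer a previous-row lookup through LAG(). The sketch below uses that swapped-in technique rather than a user variable. It assumes SQLite as a stand-in engine, a table named visits with a column named ts, calendar-day gaps, and a hypothetical helper named avg_days_between.

```python
import sqlite3

# In-memory SQLite database with the rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        (1, "2014-08-11 14:37:36"),
        (2, "2014-08-11 12:37:36"),
        (3, "2014-08-07 16:37:36"),
        (1, "2014-07-14 15:34:36"),
        (1, "2014-07-09 14:37:36"),
        (2, "2014-07-03 14:37:36"),
        (3, "2014-05-23 15:37:36"),
        (3, "2014-05-13 12:37:36"),
    ],
)


def avg_days_between(conn):
    """Average gap in calendar days between consecutive visits per user."""
    sql = """
        SELECT user_id, AVG(gap_days)
        FROM (
            SELECT user_id,
                   -- NULL for each user's first visit, so AVG skips it
                   julianday(date(ts)) - julianday(date(
                       LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)
                   )) AS gap_days
            FROM visits
        ) AS gaps
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql).fetchall()


for user_id, avg_gap in avg_days_between(conn):
    print(user_id, avg_gap)
```

Partitioning by user_id keeps one user's last visit from being compared with the next user's first visit, and the first visit of each user contributes no gap instead of a 0.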
Sorry for the specific title. I tried to think of a way to generalize it more, but I'm not that knowledgeable, which I guess is why I'm asking here.
I've got a table with millions of transactions and one of the columns is the ID of the department that performed that particular transaction:
+-----------------------------------+
| ID | DeptID | Amount | Date |
+-----------------------------------+
| 1 | 46 | 4.99 | 2010-01-01 |
+-----------------------------------+
| 2 | 46 | 2.99 | 2010-03-07 |
+-----------------------------------+
| 3 | 57 | 9.99 | 2010-04-04 |
+-----------------------------------+
I want to perform a query that will return any 1 department ID that contains at least 1 transaction for every month in the last year (today is 2011-07-28, so I want it to start with 2010-08-01 and end with 2011-07-28).
Is there a way to do this without multiple queries?
SELECT DeptID, COUNT(DISTINCT MONTH(`date`)) AS month_count
FROM Transactions
WHERE `date` >= CURDATE() - INTERVAL 1 YEAR
GROUP BY DeptID
HAVING month_count = 12;
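A variation on this query counts distinct year-month pairs over an explicit date range, so that the same month number from two different years is not merged. It is a sketch under assumptions: SQLite as a stand-in engine, a column named tx_date, hypothetical sample rows, and a hypothetical helper named depts_active_every_month.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions "
    "(ID INTEGER PRIMARY KEY, DeptID INTEGER, Amount REAL, tx_date TEXT)"
)

rows = []
# Department 46: one transaction on the 15th of every month, 2010-08..2011-07.
for i in range(12):
    year = 2010 + (7 + i) // 12
    month = (7 + i) % 12 + 1
    rows.append((46, 4.99, f"{year:04d}-{month:02d}-15"))
# Department 57: only a few scattered transactions.
rows += [(57, 9.99, "2010-08-15"), (57, 9.99, "2010-09-15"), (57, 9.99, "2011-04-04")]
conn.executemany(
    "INSERT INTO Transactions (DeptID, Amount, tx_date) VALUES (?, ?, ?)", rows
)


def depts_active_every_month(conn, start, end, months=12):
    """Departments with at least one transaction in each month of the range."""
    sql = """
        SELECT DeptID
        FROM Transactions
        WHERE tx_date >= ? AND tx_date <= ?
        GROUP BY DeptID
        HAVING COUNT(DISTINCT strftime('%Y-%m', tx_date)) = ?
        ORDER BY DeptID
    """
    return [row[0] for row in conn.execute(sql, (start, end, months))]


print(depts_active_every_month(conn, "2010-08-01", "2011-07-28"))
```

Appending LIMIT 1 to the query would return just one qualifying department, as the question asks.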
I believe you would need a column that is updated when transactions are made, or a table with id, transaction time and transactions columns, where id is the id, transaction time is whenever a transaction is made, and the transactions column is incremented by 1 every time a transaction is made. Then create a single query where transactions is greater than or equal to 1 and transaction time is greater than or equal to the current time minus 1 year (in seconds).
OK, have a variable that holds the current date in the same format as in the table you have now, then check whether the row's date + 1 year is greater than or equal to the current date. If not, the row is more than a year old.
I have a database table which holds each user's checkins in cities. I need to know how many days a user has been in a city, and then, how many visits a user has made to a city (a visit consists of consecutive days spent in a city).
So, consider I have the following table (simplified, containing only the DATETIMEs - same user and city):
datetime
-------------------
2011-06-30 12:11:46
2011-07-01 13:16:34
2011-07-01 15:22:45
2011-07-01 22:35:00
2011-07-02 13:45:12
2011-08-01 00:11:45
2011-08-05 17:14:34
2011-08-05 18:11:46
2011-08-06 20:22:12
The number of days this user has been to this city would be 6 (30.06, 01.07, 02.07, 01.08, 05.08, 06.08).
I thought of doing this using SELECT COUNT(id) FROM table GROUP BY DATE(datetime)
Then, for the number of visits this user has made to this city, the query should return 3 (30.06-02.07, 01.08, 05.08-06.08).
The problem is that I have no idea how shall I build this query.
Any help would be highly appreciated!
You can find the first day of each visit by finding checkins where there was no checkin the day before.
select count(distinct date(start_of_visit.datetime))
from checkin start_of_visit
left join checkin previous_day
on start_of_visit.user = previous_day.user
and start_of_visit.city = previous_day.city
and date(start_of_visit.datetime) - interval 1 day = date(previous_day.datetime)
where previous_day.id is null
There are several important parts to this query.
First, each checkin is joined to any checkin from the previous day. But since it's an outer join, if there was no checkin the previous day the right side of the join will have NULL results. The WHERE filtering happens after the join, so it keeps only those checkins from the left side where there are none from the right side. LEFT OUTER JOIN/WHERE IS NULL is really handy for finding where things aren't.
Then it counts distinct checkin dates to make sure it doesn't double-count if the user checked in multiple times on the first day of the visit. (I actually added that part on edit, when I spotted the possible error.)
Edit: I just re-read your proposed query for the first question. Your query would get you the number of checkins on a given date, instead of a count of dates. I think you want something like this instead:
select count(distinct date(datetime))
from checkin
where user='some user' and city='some city'
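Both counts can be verified against the question's sample data with a small sketch. It assumes SQLite as a stand-in engine (date(x, '-1 day') replaces the MySQL interval arithmetic), columns named user_id, city and dt, a hypothetical city value, and hypothetical helpers named count_days and count_visits.

```python
import sqlite3

# In-memory SQLite database; the single user and city are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checkin "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, city TEXT, dt TEXT)"
)
conn.executemany(
    "INSERT INTO checkin (user_id, city, dt) VALUES (1, 'Lisbon', ?)",
    [
        ("2011-06-30 12:11:46",),
        ("2011-07-01 13:16:34",),
        ("2011-07-01 15:22:45",),
        ("2011-07-01 22:35:00",),
        ("2011-07-02 13:45:12",),
        ("2011-08-01 00:11:45",),
        ("2011-08-05 17:14:34",),
        ("2011-08-05 18:11:46",),
        ("2011-08-06 20:22:12",),
    ],
)


def count_days(conn, user_id, city):
    """Number of distinct calendar days with at least one check-in."""
    sql = "SELECT COUNT(DISTINCT date(dt)) FROM checkin WHERE user_id = ? AND city = ?"
    return conn.execute(sql, (user_id, city)).fetchone()[0]


def count_visits(conn, user_id, city):
    """Number of visit start days: days with no check-in on the day before."""
    sql = """
        SELECT COUNT(DISTINCT date(s.dt))
        FROM checkin s
        LEFT JOIN checkin p
          ON s.user_id = p.user_id
         AND s.city = p.city
         AND date(s.dt, '-1 day') = date(p.dt)
        WHERE p.id IS NULL
          AND s.user_id = ? AND s.city = ?
    """
    return conn.execute(sql, (user_id, city)).fetchone()[0]


print(count_days(conn, 1, "Lisbon"), count_visits(conn, 1, "Lisbon"))
```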
Try to apply this code to your task -
CREATE TABLE visits(
user_id INT(11) NOT NULL,
dt DATETIME DEFAULT NULL
);
INSERT INTO visits VALUES
(1, '2011-06-30 12:11:46'),
(1, '2011-07-01 13:16:34'),
(1, '2011-07-01 15:22:45'),
(1, '2011-07-01 22:35:00'),
(1, '2011-07-02 13:45:12'),
(1, '2011-08-01 00:11:45'),
(1, '2011-08-05 17:14:34'),
(1, '2011-08-05 18:11:46'),
(1, '2011-08-06 20:22:12'),
(2, '2011-08-30 16:13:34'),
(2, '2011-08-31 16:13:41');
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT v.user_id,
COUNT(DISTINCT(DATE(dt))) number_of_days,
MAX(days) number_of_visits
FROM
(SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS days,
@last_dt := DATE(dt),
@last_user := user_id
FROM
visits
ORDER BY
user_id, dt
) v
GROUP BY
v.user_id;
----------------
Output:
+---------+----------------+------------------+
| user_id | number_of_days | number_of_visits |
+---------+----------------+------------------+
| 1 | 6 | 3 |
| 2 | 2 | 1 |
+---------+----------------+------------------+
Explanation:
To understand how it works let's check the subquery, here it is.
SET @i = 0;
SET @last_dt = NULL;
SET @last_user = NULL;
SELECT user_id, dt,
@i := IF(@last_user IS NULL OR @last_user <> user_id, 1, IF(@last_dt IS NULL OR (DATE(dt) - INTERVAL 1 DAY) > DATE(@last_dt), @i + 1, @i)) AS
days,
@last_dt := DATE(dt) lt,
@last_user := user_id lu
FROM
visits
ORDER BY
user_id, dt;
As you can see, the query returns all rows and ranks them by visit number. This is a well-known ranking method based on variables; note that the rows are ordered by the user and date fields. The query calculates user visits and outputs the following data set, where the days column holds the rank of the visit:
+---------+---------------------+------+------------+----+
| user_id | dt | days | lt | lu |
+---------+---------------------+------+------------+----+
| 1 | 2011-06-30 12:11:46 | 1 | 2011-06-30 | 1 |
| 1 | 2011-07-01 13:16:34 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 15:22:45 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-01 22:35:00 | 1 | 2011-07-01 | 1 |
| 1 | 2011-07-02 13:45:12 | 1 | 2011-07-02 | 1 |
| 1 | 2011-08-01 00:11:45 | 2 | 2011-08-01 | 1 |
| 1 | 2011-08-05 17:14:34 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-05 18:11:46 | 3 | 2011-08-05 | 1 |
| 1 | 2011-08-06 20:22:12 | 3 | 2011-08-06 | 1 |
| 2 | 2011-08-30 16:13:34 | 1 | 2011-08-30 | 2 |
| 2 | 2011-08-31 16:13:41 | 1 | 2011-08-31 | 2 |
+---------+---------------------+------+------------+----+
Then we group this data set by user and use aggregate functions:
'COUNT(DISTINCT(DATE(dt)))' - counts the number of days
'MAX(days)' - the number of visits, it is a maximum value for the days field from our subquery.
That is all;)
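For readers who find the variable bookkeeping hard to follow, the same pass over the ordered rows can be written as a plain loop. This is an illustrative sketch in Python, not part of the SQL solution; the function name summarize_visits is hypothetical.

```python
from datetime import datetime

# Sample rows from the answer: (user_id, check-in timestamp).
ROWS = [
    (1, "2011-06-30 12:11:46"),
    (1, "2011-07-01 13:16:34"),
    (1, "2011-07-01 15:22:45"),
    (1, "2011-07-01 22:35:00"),
    (1, "2011-07-02 13:45:12"),
    (1, "2011-08-01 00:11:45"),
    (1, "2011-08-05 17:14:34"),
    (1, "2011-08-05 18:11:46"),
    (1, "2011-08-06 20:22:12"),
    (2, "2011-08-30 16:13:34"),
    (2, "2011-08-31 16:13:41"),
]


def summarize_visits(rows):
    """Return {user_id: (number_of_days, number_of_visits)}."""
    summary = {}
    last_user = None  # plays the role of @last_user
    last_day = None   # plays the role of @last_dt
    # Sorting by (user_id, timestamp) mirrors ORDER BY user_id, dt.
    for user_id, stamp in sorted(rows):
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").date()
        if user_id != last_user:
            # A new user restarts both counters at 1.
            days, visits = 1, 1
        else:
            days, visits = summary[user_id]
            if day != last_day:
                days += 1
                # A gap of more than one day starts a new visit.
                if (day - last_day).days > 1:
                    visits += 1
        summary[user_id] = (days, visits)
        last_user, last_day = user_id, day
    return summary


print(summarize_visits(ROWS))
```

The visits counter corresponds to the days column of the subquery, and its final value per user is what MAX(days) picks up in the outer query.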
Using the sample data provided by Devart, the inner "PreQuery" works with SQL variables. By defaulting @LUser to -1 (a probably non-existent user ID), the IF() test checks for any difference between the last user and the current one. As soon as a new user appears, the row gets a value of 1. Likewise, if the last date is more than 1 day before the new check-in date, the row gets a value of 1. The subsequent columns then reset @LUser and @LDate to the values of the record just tested, ready for the next row. Finally, the outer query sums and counts them to produce the results for the Devart data set:
User ID | Distinct Visits | Total Days
1 | 3 | 9
2 | 1 | 2
select PreQuery.User_ID,
sum( PreQuery.NextVisit ) as DistinctVisits,
count(*) as TotalDays
from
( select v.user_id,
if( @LUser <> v.User_ID OR @LDate < ( date( v.dt ) - Interval 1 day ), 1, 0 ) as NextVisit,
@LUser := v.user_id,
@LDate := date( v.dt )
from
Visits v,
( select @LUser := -1, @LDate := date(now()) ) AtVars
order by
v.user_id,
v.dt ) PreQuery
group by
PreQuery.User_ID
For the first sub-task:
select count(*)
from (
select TO_DAYS(p.d)
from p
group by TO_DAYS(p.d)
) t
I think you should consider changing the database structure. You could add a visits table and a visit_id column to your checkins table. Each time you register a new checkin, you check whether there is a checkin from the day before. If there is, you add the new checkin with the visit_id from yesterday's checkin. If not, you add a new visit to visits and a new checkin with the new visit_id.
Then you could get your data in one query with something like this:
SELECT COUNT(id) AS number_of_days, COUNT(DISTINCT visit_id) number_of_visits FROM checkin GROUP BY user, city
It's not optimal, but it is still better than working with the current structure, and it will work. If the results can come from separate queries, it will also be very fast.
The drawbacks, of course, are that you will need to change the database structure, do some more scripting, and convert the current data to the new structure (i.e. you will need to add visit_id to the existing rows).