SQL query that checks by week - MySQL

I need an SQL query that checks whether a person has been active in two consecutive weeks of the year.
For example,
Table1:
Name  | Activity   | Date
Name1 | Basketball | 08-08-2014
Name2 | Volleyball | 08-09-2014
Name3 | None       | 08-10-2014
Name1 | Tennis     | 08-14-2014
I want to retrieve Name1 because that person has been active for two consecutive weeks in the year.
This is my query so far:
SELECT DISTINCT Name
FROM Table1
WHERE YEAR(Date) = 2014 AND
Activity <> 'None' AND
This is where I need the logic that checks for activity in two consecutive weeks. The next week can be defined as 7 to 14 days later. I am working with MySQL.

I have deliberately avoided using YEAR(Date) in the WHERE clause, and I recommend you do too. Applying a function to every row just to match a single criterion (2014) never makes sense to me, and it also prevents MySQL from using an index on Date (see "sargable" on Wikipedia). It's much easier to filter on a date range, IMHO.
I've used a correlated subquery to derive nxt_date, which might not scale very well, but overall performance will most probably depend on your indexes.
select distinct
    name
from (
    select
          t.name
        , t.Activity
        , t.`Date`
        , (
            select min(table1.`Date`) from table1
            where t.name = table1.name
              and table1.Activity <> 'None'
              and table1.`Date` > t.`Date`
          ) as nxt_date
    from table1 as t
    where ( t.`Date` >= '2014-01-01' and t.`Date` < '2015-01-01' )
      and t.Activity <> 'None'
) as sq
where datediff(sq.nxt_date, sq.`Date`) <= 14
;
see: http://sqlfiddle.com/#!9/cbbb3/9
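To try the nxt_date approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than MySQL (so julianday() subtraction replaces DATEDIFF()), ISO 'YYYY-MM-DD' dates, and the question's sample rows re-entered in that format; make_db and names_with_next_activity are hypothetical helper names.

```python
import sqlite3

# Sample rows from the question, re-entered with ISO dates (YYYY-MM-DD).
SAMPLE_ROWS = [
    ("Name1", "Basketball", "2014-08-08"),
    ("Name2", "Volleyball", "2014-08-09"),
    ("Name3", "None", "2014-08-10"),
    ("Name1", "Tennis", "2014-08-14"),
]


def make_db(rows):
    """Build an in-memory SQLite table shaped like Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, activity TEXT, date TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return conn


def names_with_next_activity(conn, start, end, max_gap_days=14):
    """Names whose next non-'None' activity follows within max_gap_days."""
    sql = """
        SELECT DISTINCT name FROM (
            SELECT t.name, t.date,
                   -- correlated subquery: next later activity date (nxt_date)
                   (SELECT MIN(u.date) FROM table1 AS u
                     WHERE u.name = t.name
                       AND u.activity <> 'None'
                       AND u.date > t.date) AS nxt_date
              FROM table1 AS t
             WHERE t.date >= ? AND t.date < ?   -- date range instead of YEAR()
               AND t.activity <> 'None'
        ) AS sq
        -- julianday() difference stands in for MySQL's DATEDIFF();
        -- a NULL nxt_date yields NULL here, so such rows are filtered out
        WHERE julianday(sq.nxt_date) - julianday(sq.date) <= ?
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, (start, end, max_gap_days))]


if __name__ == "__main__":
    db = make_db(SAMPLE_ROWS)
    print(names_with_next_activity(db, "2014-01-01", "2015-01-01"))
```

With the sample data this lists only Name1, whose two activities are 6 days apart.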

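You can do the logic using an exists subquery. Note that in MySQL, adding an integer to a DATE (date + 7) produces a number rather than a date, so use INTERVAL arithmetic:
select t.*
from table1 t
where exists (select 1
              from table1 t2
              where t2.name = t.name and
                    t2.date between t.date + interval 7 day and t.date + interval 14 day
             );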
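A minimal sketch of the EXISTS version, again using Python's sqlite3 in place of MySQL: SQLite's date(x, '+7 days') stands in for MySQL's x + INTERVAL 7 DAY, and make_db and names_with_followup are hypothetical helper names. With the question's own sample data, Name1's activities are only 6 days apart, so a strict 7-to-14-day window matches nothing.

```python
import sqlite3

# Sample rows from the question, re-entered with ISO dates (YYYY-MM-DD).
SAMPLE_ROWS = [
    ("Name1", "Basketball", "2014-08-08"),
    ("Name2", "Volleyball", "2014-08-09"),
    ("Name3", "None", "2014-08-10"),
    ("Name1", "Tennis", "2014-08-14"),
]


def make_db(rows):
    """Build an in-memory SQLite table shaped like Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, activity TEXT, date TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return conn


def names_with_followup(conn):
    """Names with another non-'None' activity 7 to 14 days after an earlier one."""
    sql = """
        SELECT DISTINCT t.name
          FROM table1 AS t
         WHERE t.activity <> 'None'
           AND EXISTS (SELECT 1
                         FROM table1 AS t2
                        WHERE t2.name = t.name
                          AND t2.activity <> 'None'
                          -- date(x, '+N days') plays the role of x + INTERVAL N DAY
                          AND t2.date BETWEEN date(t.date, '+7 days')
                                          AND date(t.date, '+14 days'))
         ORDER BY t.name
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(names_with_followup(make_db(SAMPLE_ROWS)))
```

Adding a hypothetical row such as ("Name2", "Tennis", "2014-08-19"), 10 days after Name2's first activity, makes Name2 match.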

I don't know whether it matters for performance, but I like concise queries:
SELECT DISTINCT t1.Name
FROM Table1 t1, Table1 t2
WHERE t1.Name = t2.Name AND
      t1.Date >= '2014-01-01' AND t1.Date < '2015-01-01' AND
      t1.Activity <> 'None' AND
      t2.Activity <> 'None' AND
      t1.Date < t2.Date AND
      datediff(t2.Date, t1.Date) <= 14;
I liked @user2067753's hint about YEAR(Date).
I used the SQL Fiddle from the answer above to check performance with EXPLAIN. It seems that avoiding subqueries, as in VACN's answer or mine, is beneficial (see join vs. subquery).

Off the top of my head, I suggest this query:
SELECT DISTINCT t1.Name
FROM Table1 AS t1, Table1 AS t2
WHERE t1.Name = t2.Name
  AND t1.Activity <> 'None'
  AND t2.Activity <> 'None'
  AND t2.Date > t1.Date
  AND t2.Date <= t1.Date + INTERVAL 7 DAY;
The idea is basically: you join the table to itself, keep the pairs of rows whose names match, and then keep only those where the second date is later than the first but no more than 7 days after it. Requiring t2.Date > t1.Date stops a row from matching itself.
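A self-join can also pair a row with itself, which makes every active name qualify. This small sketch compares a symmetric plus-or-minus 7-day window with a strictly-later window; it assumes Python's sqlite3 in place of MySQL (date(x, '+7 days') for x + INTERVAL 7 DAY), and make_db and names_within_week are hypothetical helper names.

```python
import sqlite3

# Sample rows from the question, re-entered with ISO dates (YYYY-MM-DD).
SAMPLE_ROWS = [
    ("Name1", "Basketball", "2014-08-08"),
    ("Name2", "Volleyball", "2014-08-09"),
    ("Name3", "None", "2014-08-10"),
    ("Name1", "Tennis", "2014-08-14"),
]


def make_db(rows):
    """Build an in-memory SQLite table shaped like Table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT, activity TEXT, date TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return conn


def names_within_week(conn, exclude_self=True):
    """Self-join on name; optionally require the second date to be strictly later."""
    if exclude_self:
        # second date strictly later, at most 7 days after the first
        window = "t2.date > t1.date AND t2.date <= date(t1.date, '+7 days')"
    else:
        # symmetric window: t2 can be the very same row as t1
        window = ("t2.date BETWEEN date(t1.date, '-7 days') "
                  "AND date(t1.date, '+7 days')")
    # window is one of two fixed strings above, never user input
    sql = f"""
        SELECT DISTINCT t1.name
          FROM table1 AS t1, table1 AS t2
         WHERE t1.name = t2.name
           AND t1.activity <> 'None'
           AND t2.activity <> 'None'
           AND {window}
         ORDER BY t1.name
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    db = make_db(SAMPLE_ROWS)
    print(names_within_week(db, exclude_self=False))  # Name2 matches itself
    print(names_within_week(db, exclude_self=True))
```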

Related

How to create SQL based on complex rule?

I have 3 columns (id, date, amount) and I'm trying to calculate a 4th column (calculated_column).
How do I create an SQL query to do the following?
The calculation looks at an ID (e.g. 1) and at all rows with the same ID in that month. For example, the first occurrence (1-Sep) should be calculated as 5, and the second occurrence as 5+6=11, i.e. the sum of all amounts from the beginning of that month up to and including the current amount.
Then, for the next month (Oct), it finds the first occurrence of id=1 and stores 3 in calculated_column; for the second occurrence of id=1 in Oct, it sums from the beginning of that month for the same id (3+2=5).
Assuming I've understood correctly, I would suggest a correlated subquery such as:
select t.*,
       (
         select sum(u.amount) from table1 u
         where
           u.id = t.id and
           date_format(u.date, '%Y-%m') = date_format(t.date, '%Y-%m') and u.date <= t.date
       ) as calculated_column
from table1 t;
(Change the table name table1 to suit your data)
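Here is the correlated-subquery idea as a runnable sketch using Python's sqlite3, assuming SQLite in place of MySQL (strftime('%Y-%m', ...) replaces date_format(..., '%Y-%m')). The question's table isn't shown, so the rows below are hypothetical, chosen to reproduce the 5, 11 and 3, 5 totals it describes; make_db and month_to_date_totals are hypothetical helper names.

```python
import sqlite3

# Hypothetical rows: id 1 has 5 and 6 in Sep, 3 and 2 in Oct; id 2 is unrelated.
ROWS = [
    (1, "2018-09-01", 5),
    (2, "2018-09-03", 4),
    (1, "2018-09-10", 6),
    (1, "2018-10-02", 3),
    (1, "2018-10-20", 2),
]


def make_db(rows):
    """Build an in-memory SQLite table with id, date, amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    return conn


def month_to_date_totals(conn):
    """Per id, sum the amounts from the start of the row's month up to the row."""
    sql = """
        SELECT t.id, t.date, t.amount,
               (SELECT SUM(u.amount) FROM table1 AS u
                 WHERE u.id = t.id
                   -- same calendar month (stands in for date_format '%Y-%m')
                   AND strftime('%Y-%m', u.date) = strftime('%Y-%m', t.date)
                   AND u.date <= t.date) AS calculated_column
          FROM table1 AS t
         ORDER BY t.id, t.date
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in month_to_date_totals(make_db(ROWS)):
        print(row)
```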
In Oracle and MySQL 8+, you can use window functions. The date functions differ between them (to_char() below is Oracle's), but here is the idea:
select t.*,
       (case when date = max(date) over (partition by to_char(date, 'YYYY-MM')) and
                  id = 1
             then sum(amount) over (partition by to_char(date, 'YYYY-MM'))
        end) as calculated_column
from t;
The outer case is simply to put the value on the appropriate row of the result set. The code would be simpler if all rows in the month had the same value.
Here is a solution for Oracle. Since you did not give the table name, I named it my_table; change it to the real name.
select
    t1.id,
    t1.date,
    t1.amount,
    decode(t1.id, 1, sum(nvl(t2.amount, 0)), null) calculated_column
from my_table t1
left join my_table t2
    on t2.id = t1.id
    and trunc(t2.date, 'month') = trunc(t1.date, 'month')
    and t2.date <= t1.date
    and t1.id = 1
group by t1.id, t1.date, t1.amount
If your version supports window functions (e.g. MySQL 8 and later):
# MySQL 8+
select
t.*
, sum(amount) over (partition by id, date_format(date, '%Y-%m-01') order by date) as calculated_column
from t
;
-- Oracle
select
t.*
, sum(amount) over (partition by id, trunc(date, 'MM') order by date) as calculated_column
from t
;
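The same running total with a window function, sketched in SQLite through Python's sqlite3. This assumes SQLite 3.25 or later (the first release with window functions), uses strftime('%Y-%m', ...) in place of date_format() or trunc(), and works on hypothetical rows chosen to reproduce the totals described in the question; make_db and month_to_date_totals_window are hypothetical names.

```python
import sqlite3

# Hypothetical rows: id 1 has 5 and 6 in Sep, 3 and 2 in Oct; id 2 is unrelated.
ROWS = [
    (1, "2018-09-01", 5),
    (2, "2018-09-03", 4),
    (1, "2018-09-10", 6),
    (1, "2018-10-02", 3),
    (1, "2018-10-20", 2),
]


def make_db(rows):
    """Build an in-memory SQLite table with id, date, amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    return conn


def month_to_date_totals_window(conn):
    """Running SUM per (id, month), ordered by date within each partition."""
    sql = """
        SELECT id, date, amount,
               SUM(amount) OVER (
                   PARTITION BY id, strftime('%Y-%m', date)
                   ORDER BY date
               ) AS calculated_column
          FROM t
         ORDER BY id, date
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in month_to_date_totals_window(make_db(ROWS)):
        print(row)
```

With ORDER BY and no explicit frame, the default frame runs from the start of the partition to the current row, and rows that share the same date are peers and get the same total.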

Use JOIN instead of WHERE OR... IN sub query

Is there a better way to write the query below using JOIN?
SELECT
t1.id
FROM
t1
WHERE
(
t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.id2 IN (SELECT id FROM t2)
OR t1.id3 IN (SELECT id FROM t2)
OR t1.id4 IN (SELECT id FROM t2)
);
Update: Only date2 can be NULL
I can't think of a way to use a join here, but if the intent is to make the query more elegant and/or easy to maintain, as hinted by the comments, there are several things you can do.
First, the date condition. Instead of repeating the same term twice, you can check whether the greater of the two dates is greater than or equal to the given term. Since date2 can be NULL, and GREATEST() returns NULL if any argument is NULL, substitute date1 for a NULL date2 with COALESCE.
Second, the IDs condition. Instead of repeating the same term three times, you can use a single EXISTS condition. I don't have a MySQL database handy to check it, but chances are it will actually run faster than your original version and not just look cleaner:
SELECT
    t1.id
FROM
    t1
WHERE
    GREATEST(t1.date1, COALESCE(t1.date2, t1.date1)) >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
    OR EXISTS (SELECT * FROM t2 WHERE t2.id IN (t1.id2, t1.id3, t1.id4));
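One caveat with GREATEST(): because date2 can be NULL, a row whose date1 is recent but whose date2 is NULL would be dropped. The sketch below shows this in SQLite through Python's sqlite3, whose multi-argument max() likewise returns NULL when any argument is NULL. The fixed cutoff stands in for UTC_TIMESTAMP() - 20 hours, and all rows plus the make_db and matching_ids names are hypothetical.

```python
import sqlite3


def make_db():
    """Hypothetical t1/t2 tables with dates stored as 'YYYY-MM-DD HH:MM' text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, date1 TEXT, date2 TEXT,"
                 " id2 INTEGER, id3 INTEGER, id4 INTEGER)")
    conn.execute("CREATE TABLE t2 (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "2024-01-02 10:00", None, 99, 99, 99),               # recent date1, NULL date2
        (2, "2023-12-01 00:00", "2023-12-02 00:00", 7, 99, 99),  # old dates, id2 is in t2
        (3, "2023-12-01 00:00", None, 99, 99, 99),               # matches nothing
    ])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(7,), (8,)])
    return conn


def matching_ids(conn, cutoff, null_safe=True):
    """GREATEST-style date test plus a single EXISTS over the three id columns."""
    # SQLite's max(a, b) is NULL if either argument is NULL, like MySQL GREATEST().
    date2 = "COALESCE(t1.date2, t1.date1)" if null_safe else "t1.date2"
    sql = f"""
        SELECT t1.id FROM t1
         WHERE max(t1.date1, {date2}) >= ?
            OR EXISTS (SELECT 1 FROM t2 WHERE t2.id IN (t1.id2, t1.id3, t1.id4))
         ORDER BY t1.id
    """
    return [row[0] for row in conn.execute(sql, (cutoff,))]


if __name__ == "__main__":
    db = make_db()
    print(matching_ids(db, "2024-01-01 00:00", null_safe=False))  # row 1 lost
    print(matching_ids(db, "2024-01-01 00:00", null_safe=True))
```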
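The OR operations are probably what hurts your speed the most; ORs often prevent efficient index use.
This converts it to the most succinct join equivalent. It has to be a LEFT JOIN, so that rows satisfying only a date condition are kept, and it needs DISTINCT, because one t1 row can match several t2 rows:
SELECT DISTINCT t1.id
FROM t1 LEFT JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
WHERE t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
   OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
   OR t2.id IS NOT NULL
;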
IN is logically an OR, but I think more recent MySQL versions optimize it somewhat; if not, this might perform better:
SELECT t1.id
FROM t1
LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2
LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3
LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4
WHERE t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
   OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
   OR t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL
;
It avoids the ORs in the join conditions on the id fields, so it may take advantage of indexes on those fields; but it joins three times as much, so it could still end up being slower.
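To check that join rewrites return the same ids as the original OR ... IN query, here is a small comparison in SQLite through Python's sqlite3. The rows and the cutoff (standing in for UTC_TIMESTAMP() - 20 hours) are hypothetical, as are the make_db and run_all names. Both join versions use LEFT JOIN and OR so that rows matching only a date condition survive; the single-join version uses DISTINCT because row 2 matches two t2 rows.

```python
import sqlite3

QUERIES = {
    "original": """
        SELECT t1.id FROM t1
         WHERE t1.date1 >= :cutoff OR t1.date2 >= :cutoff
            OR t1.id2 IN (SELECT id FROM t2)
            OR t1.id3 IN (SELECT id FROM t2)
            OR t1.id4 IN (SELECT id FROM t2)
         ORDER BY t1.id""",
    "single_join": """
        SELECT DISTINCT t1.id
          FROM t1 LEFT JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
         WHERE t1.date1 >= :cutoff OR t1.date2 >= :cutoff
            OR t2.id IS NOT NULL
         ORDER BY t1.id""",
    "three_joins": """
        SELECT t1.id
          FROM t1
          LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2
          LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3
          LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4
         WHERE t1.date1 >= :cutoff OR t1.date2 >= :cutoff
            OR t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL
         ORDER BY t1.id""",
}


def make_db():
    """Hypothetical rows; dates are 'YYYY-MM-DD HH:MM' text, date2 may be NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, date1 TEXT, date2 TEXT,"
                 " id2 INTEGER, id3 INTEGER, id4 INTEGER)")
    conn.execute("CREATE TABLE t2 (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "2024-01-02 10:00", None, 99, 99, 99),               # date1 only
        (2, "2023-12-01 00:00", "2023-12-02 00:00", 7, 8, 99),   # two t2 matches
        (3, "2023-12-01 00:00", None, 99, 99, 99),               # no match
        (4, "2023-11-01 00:00", "2024-01-05 00:00", 99, 8, 99),  # date2 and id3
    ])
    conn.executemany("INSERT INTO t2 VALUES (?)", [(7,), (8,)])
    return conn


def run_all(conn, cutoff):
    """Run every formulation with the same named :cutoff parameter."""
    return {name: [row[0] for row in conn.execute(sql, {"cutoff": cutoff})]
            for name, sql in QUERIES.items()}


if __name__ == "__main__":
    for name, ids in run_all(make_db(), "2024-01-01 00:00").items():
        print(name, ids)
```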

SQL query Compare two WHERE clauses using same table

I am looking to compare two sets of data that are stored in the same table. I am sorry if this is a duplicate SO post, I have read some other posts but have not been able to implement it to solve my problem.
I am running a query to show all Athletes and times for the most recent date (2017-05-20):
SELECT `eventID`,
       `location`,
       `date`,
       `barcode`,
       `runner`,
       `Gender`,
       `time`
FROM `TableName`
WHERE `date` = '2017-05-20'
I would like to compare the time achieved on the 20th May with the previous time for each athlete.
SELECT `time` FROM `TableName` WHERE `date`='2017-05-13'
How can I structure my query to show all of the athletes with their time on the 13th and their time on the 20th?
I have tried some methods, such as UNION ALL.
You can get the previous time using a correlated subquery:
SELECT t.*,
(SELECT t2.time
FROM TableName t2
WHERE t2.runner = t.runner AND t2.eventId = t.eventId AND
t2.date < t.date
ORDER BY t2.date DESC
LIMIT 1
) prev_time
FROM `TableName` t
WHERE t.date = '2017-05-20';
For performance, you want an index on (runner, eventid, date, time).
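A runnable sketch of the correlated subquery with ORDER BY ... LIMIT 1, using SQLite through Python's sqlite3 instead of MySQL. The table is trimmed to the columns the query needs, the runners and times are hypothetical, the index mirrors the one suggested above, and make_db and times_with_previous are hypothetical names.

```python
import sqlite3


def make_db():
    """Hypothetical results table trimmed to the columns the query uses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableName"
                 " (eventID INTEGER, runner TEXT, date TEXT, time TEXT)")
    # Index on (runner, eventid, date, time), as suggested in the answer
    conn.execute("CREATE INDEX idx_runner_event_date"
                 " ON TableName (runner, eventID, date, time)")
    conn.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", [
        (1, "A", "2017-05-06", "25:10"),
        (1, "A", "2017-05-13", "24:50"),
        (1, "A", "2017-05-20", "24:30"),
        (1, "B", "2017-05-20", "30:00"),  # first run: no previous time
    ])
    return conn


def times_with_previous(conn, day):
    """Each runner's time on `day` plus their most recent earlier time."""
    sql = """
        SELECT t.runner, t.time,
               (SELECT t2.time FROM TableName AS t2
                 WHERE t2.runner = t.runner AND t2.eventID = t.eventID
                   AND t2.date < t.date
                 ORDER BY t2.date DESC
                 LIMIT 1) AS prev_time
          FROM TableName AS t
         WHERE t.date = ?
         ORDER BY t.runner
    """
    return conn.execute(sql, (day,)).fetchall()


if __name__ == "__main__":
    print(times_with_previous(make_db(), "2017-05-20"))
```

Runners with no earlier result get NULL (None in Python) as prev_time.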

Use a sum of two values from different tables in MYSQL WHERE clause

I know there are similar questions, but I've been struggling with this for almost two days and can't get the solutions given to those questions to work.
I have two tables; for clarity, let's say they have the following structure:
table1:
id, timestamp
1, 1481631111
2, 1481632222
3, 1481633333
table2:
id, extra_days
2, 2
3, 1
I need to select the most recent entry from table1 where timestamp has not passed CURRENT_TIMESTAMP even if added extra_days from table2. In other words, exclude rows if timestamp + extra_days * 68400 is greater than NOW.
My latest attempt looks like this, but the WHERE clause seems to be ignored:
SELECT t1.id, t1.timestamp FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id WHERE t1.timestamp + (68400 * COALESCE(t2.extra_days,0)) < CURRENT_TIMESTAMP ORDER BY t1.timestamp DESC LIMIT 1
You are comparing an integer (the Unix timestamp plus the added number of seconds) to a date and time. Also note that a day has 86400 seconds, not 68400.
Try using the UNIX_TIMESTAMP() function instead of CURRENT_TIMESTAMP:
SELECT t1.id, t1.timestamp
FROM table1 t1
LEFT JOIN table2 t2 ON t1.id = t2.id
WHERE t1.timestamp + (86400 * COALESCE(t2.extra_days,0)) < UNIX_TIMESTAMP()
ORDER BY t1.timestamp DESC LIMIT 1;
Depending on the database software you use, look at its functions for date arithmetic. It may provide a handy function for this, as MariaDB does for dates:
MariaDB ADDDATE Function
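The fix comes down to comparing integers with integers. Here is a sketch in SQLite via Python's sqlite3 using the question's rows; a fixed, hypothetical now value replaces UNIX_TIMESTAMP() so the result is reproducible, and seconds_per_day is a hypothetical parameter that shows how 68400 versus 86400 changes which row comes back. make_db and most_recent_unexpired are hypothetical names.

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400  # 24 * 60 * 60


def make_db():
    """The question's two tables, with Unix timestamps stored as integers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, timestamp INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, extra_days INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                     [(1, 1481631111), (2, 1481632222), (3, 1481633333)])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(2, 2), (3, 1)])
    return conn


def most_recent_unexpired(conn, now=None, seconds_per_day=SECONDS_PER_DAY):
    """Newest row whose timestamp plus its extra days is still before `now`."""
    if now is None:
        now = int(time.time())  # plays the role of UNIX_TIMESTAMP()
    sql = """
        SELECT t1.id, t1.timestamp
          FROM table1 AS t1
          LEFT JOIN table2 AS t2 ON t1.id = t2.id
         WHERE t1.timestamp + (? * COALESCE(t2.extra_days, 0)) < ?
         ORDER BY t1.timestamp DESC
         LIMIT 1
    """
    return conn.execute(sql, (seconds_per_day, now)).fetchone()


if __name__ == "__main__":
    db = make_db()
    print(most_recent_unexpired(db, now=1481705000))
    print(most_recent_unexpired(db, now=1481705000, seconds_per_day=68400))
```

At now = 1481705000, row 3 plus one 86400-second day ends at 1481719733, which is still in the future, so row 1 is returned; with 68400 it would wrongly count row 3 as expired-safe.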

Select the first element in each day of the month

How do I select the first element of each day in a month with a MySQL query?
I have a table of offers with a startdate column, so I can check the day, month, and year of each element, but I'm wondering how to get only the first element of each day in a given month.
Assume the following
Table is called mytable
Table has id as primary key
Table has dt as DATETIME
You want the first id of every day in February 2012
Try this:
SELECT B.id FROM
(
SELECT DATE(dt) date_dt,MIN(dt) dt
FROM mytable
WHERE dt >= '2012-02-01 00:00:00'
AND dt < '2012-03-01 00:00:00'
GROUP BY DATE(dt)
) A
LEFT JOIN mytable B USING (dt);
If any dt has multiple B.id values, try this:
SELECT dt,MIN(id) id FROM
(
SELECT B.id,B.dt FROM
(
SELECT DATE(dt) date_dt,MIN(dt) dt
FROM mytable
WHERE dt >= '2012-02-01 00:00:00'
AND dt < '2012-03-01 00:00:00'
GROUP BY DATE(dt)
) A
LEFT JOIN mytable B USING (dt)
) AA GROUP BY dt;
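The derived-table approach as a SQLite sketch via Python's sqlite3. The rows are hypothetical (including a tie on 2012-02-02), date() stands in for MySQL's DATE(), a plain JOIN is used since every a.dt comes from mytable itself, and make_db and first_id_per_day are hypothetical names.

```python
import sqlite3

# Hypothetical rows; ids 3 and 4 share the earliest time on 2012-02-02.
ROWS = [
    (1, "2012-02-01 09:00:00"),
    (2, "2012-02-01 08:30:00"),
    (3, "2012-02-02 12:00:00"),
    (4, "2012-02-02 12:00:00"),
    (5, "2012-03-01 00:00:00"),  # outside February
]


def make_db(rows):
    """In-memory mytable with id as primary key and dt as text datetime."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, dt TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    return conn


def first_id_per_day(conn, start, end):
    """(day, first id) for each day in [start, end), ties broken by MIN(id)."""
    sql = """
        SELECT date(b.dt) AS day, MIN(b.id) AS id
          FROM (SELECT MIN(dt) AS dt          -- earliest timestamp of each day
                  FROM mytable
                 WHERE dt >= ? AND dt < ?
                 GROUP BY date(dt)) AS a
          JOIN mytable AS b ON b.dt = a.dt    -- rows holding that timestamp
         GROUP BY b.dt                        -- MIN(id) picks one per tie
         ORDER BY b.dt
    """
    return conn.execute(sql, (start, end)).fetchall()


if __name__ == "__main__":
    db = make_db(ROWS)
    print(first_id_per_day(db, "2012-02-01 00:00:00", "2012-03-01 00:00:00"))
```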
Assuming startdate is a DATETIME type, and the earliest entry is the one with the earliest DATETIME value, for February 2012:
SELECT DISTINCT t1.*
FROM tbl t1
LEFT JOIN tbl t2
    ON DATE(t2.startdate) = DATE(t1.startdate)
    AND t2.startdate < t1.startdate
WHERE (t1.startdate BETWEEN '2012-02-01 00:00:00' AND '2012-02-29 23:59:59')
    AND t2.startdate IS NULL
DISTINCT only collapses rows that are identical in every column; if several offers share the earliest startdate of a day, all of them are returned.
This query works by joining each row with any earlier record from the same day, so if nothing was joined, it's the earliest, through process of elimination.
This technique is explained in detail in the book SQL Antipatterns.
This could also be solved with subqueries, but this type of JOIN is supposed to be easier for MySQL to optimize than subqueries, which often defeat the use of indexes.
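A per-day version of the exclusion join (keep a row only if no earlier row exists on the same day), sketched in SQLite through Python's sqlite3. The rows are hypothetical, date() stands in for MySQL's DATE(), and make_db and earliest_rows_per_day are hypothetical names; the tie on 2012-02-02 shows that every row sharing a day's earliest startdate comes back.

```python
import sqlite3

# Hypothetical rows; ids 3 and 4 share the earliest time on 2012-02-02.
ROWS = [
    (1, "2012-02-01 09:00:00"),
    (2, "2012-02-01 08:30:00"),
    (3, "2012-02-02 12:00:00"),
    (4, "2012-02-02 12:00:00"),
    (5, "2012-03-01 00:00:00"),  # outside February
]


def make_db(rows):
    """In-memory tbl with id and startdate stored as text datetimes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, startdate TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    return conn


def earliest_rows_per_day(conn, start, end):
    """Rows in [start, end) with no strictly earlier row on the same day."""
    sql = """
        SELECT t1.id, t1.startdate
          FROM tbl AS t1
          LEFT JOIN tbl AS t2
            ON date(t2.startdate) = date(t1.startdate)  -- same day
           AND t2.startdate < t1.startdate              -- strictly earlier
         WHERE t1.startdate >= ? AND t1.startdate < ?
           AND t2.id IS NULL                            -- nothing earlier that day
         ORDER BY t1.startdate, t1.id
    """
    return conn.execute(sql, (start, end)).fetchall()


if __name__ == "__main__":
    db = make_db(ROWS)
    for row in earliest_rows_per_day(db, "2012-02-01 00:00:00", "2012-03-01 00:00:00"):
        print(row)
```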
Without knowing the exact structure of your table, something like this should work:
SELECT MIN(offerId) FROM offers WHERE startdate <= '2012-03-06' AND startdate >= '2012-02-06' GROUP BY date(startdate)
It sounds like you are trying to do something like the following:
SELECT col_1, date_col, col_3 FROM tbl
WHERE
date_col = ( SELECT min(date_col) FROM tbl
WHERE
year(date_col) = 2006 AND
month(date_col) = 02
);
This can also be used to find max(date_col). Hope this helps.
Just to offer a different way to skin this cat (much easier in SQL Server for once actually)
SELECT
    t0.offerId
FROM
    offers AS t0 LEFT JOIN
    offers AS t1 ON DATE(t1.startDate) = DATE(t0.startDate) AND
                    t1.startDate < t0.startDate
WHERE
    t0.startDate >= '2012-02-01' AND t0.startDate < '2012-03-01' AND
    t1.offerId IS NULL;
If you have multiple rows with exactly the same time, you will get multiple values back, which you can weed out in your application logic or with a subquery. BTW, this is called a groupwise minimum/maximum.