Is there a better way to write the query below using JOIN?
SELECT
t1.id
FROM
t1
WHERE
(
t1.date1 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.date2 >= (UTC_TIMESTAMP() + INTERVAL - 20 Hour)
OR t1.id2 IN (SELECT id FROM t2)
OR t1.id3 IN (SELECT id FROM t2)
OR t1.id4 IN (SELECT id FROM t2)
);
Update: Only date2 can be NULL
I can't think of a way to use a join here, but if the intent is to make the query more elegant and/or easier to maintain, as hinted by the comments, there are several things you can do.
First, the date condition. Instead of repeating the same term twice, you can check whether the greater of the two dates is greater than or equal to the cutoff. Because date2 can be NULL and GREATEST() returns NULL if any of its arguments is NULL, wrap date2 in COALESCE() so that a NULL date2 falls back to date1.
Second, the IDs condition. Instead of repeating the same term three times, you can use a single EXISTS condition. I don't have a MySQL database handy to check it, but it may well run faster than your original version, not just look cleaner:
SELECT
t1.id
FROM
t1
WHERE
GREATEST(t1.date1, COALESCE(t1.date2, t1.date1)) >= (UTC_TIMESTAMP() - INTERVAL 20 HOUR)
OR EXISTS (SELECT * FROM t2 WHERE id IN (t1.id2, t1.id3, t1.id4));
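A quick way to convince yourself the rewrite is equivalent is to run both forms on a few rows. The sketch below is an assumption-laden stand-in, not the answer's own test: it uses SQLite instead of MySQL, so SQLite's multi-argument max() plays the role of GREATEST(), datetime(:now, '-20 hours') replaces UTC_TIMESTAMP() - INTERVAL 20 HOUR with a fixed "now", and the rows are hypothetical. It also shows what happens without COALESCE when date2 is NULL.

```python
import sqlite3

# Hypothetical t1/t2 rows; timestamps are ISO-8601 text, which SQLite
# compares correctly as strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, date1 TEXT NOT NULL, date2 TEXT,
                     id2 INTEGER, id3 INTEGER, id4 INTEGER);
    INSERT INTO t2 VALUES (100), (200);
    INSERT INTO t1 VALUES
        (1, '2024-01-10 08:00:00', NULL, 1, 2, 3),                   -- recent date1, NULL date2
        (2, '2024-01-01 00:00:00', '2024-01-10 09:00:00', 1, 2, 3),  -- recent date2
        (3, '2024-01-01 00:00:00', NULL, 1, 200, 3),                 -- id3 is in t2
        (4, '2024-01-01 00:00:00', NULL, 1, 2, 3);                   -- no reason to match
""")

# Fixed "now" keeps the output deterministic; it stands in for UTC_TIMESTAMP().
params = {"now": "2024-01-10 12:00:00"}
CUTOFF = "datetime(:now, '-20 hours')"

original = f"""
    SELECT id FROM t1
    WHERE date1 >= {CUTOFF} OR date2 >= {CUTOFF}
       OR id2 IN (SELECT id FROM t2) OR id3 IN (SELECT id FROM t2)
       OR id4 IN (SELECT id FROM t2)
    ORDER BY id
"""

def rewritten(date2_expr):
    # SQLite's multi-argument max() plays the role of MySQL's GREATEST();
    # both return NULL when any argument is NULL.
    return f"""
        SELECT t1.id FROM t1
        WHERE max(t1.date1, {date2_expr}) >= {CUTOFF}
           OR EXISTS (SELECT 1 FROM t2 WHERE t2.id IN (t1.id2, t1.id3, t1.id4))
        ORDER BY t1.id
    """

def ids(sql):
    return [row[0] for row in conn.execute(sql, params)]

original_ids = ids(original)
naive_ids = ids(rewritten("t1.date2"))                          # row 1 is lost
coalesced_ids = ids(rewritten("COALESCE(t1.date2, t1.date1)"))  # matches original
print(original_ids, naive_ids, coalesced_ids)  # [1, 2, 3] [2, 3] [1, 2, 3]
```

Row 1 disappears from the naive version because max(date1, NULL) is NULL, and NULL >= cutoff is not true.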
The OR operations are probably what hurts performance the most; ORs often prevent MySQL from making good use of indexes.
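A join version has to keep the OR logic: a row qualifies either by date or by having at least one matching t2 row. That calls for a LEFT JOIN (an INNER JOIN would drop rows that qualify only by date) and DISTINCT (a row whose IDs match several t2 rows would otherwise be returned more than once). This is the most succinct join equivalent:
SELECT DISTINCT t1.id
FROM t1 LEFT JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
WHERE t1.date1 >= (UTC_TIMESTAMP() - INTERVAL 20 HOUR)
OR t1.date2 >= (UTC_TIMESTAMP() - INTERVAL 20 HOUR)
OR t2.id IS NOT NULL
;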
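The difference between join shapes is easy to miss, so here is a small sketch that runs the original OR/IN query next to an INNER JOIN filtered by date and a LEFT JOIN that keeps a row when it matches by date or by id (with and without DISTINCT). It assumes SQLite in place of MySQL (datetime(:now, '-20 hours') with a fixed "now" replaces UTC_TIMESTAMP() - INTERVAL 20 HOUR) and uses hypothetical rows chosen to expose the pitfalls.

```python
import sqlite3

# Hypothetical rows chosen to expose the two pitfalls of a join rewrite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, date1 TEXT NOT NULL, date2 TEXT,
                     id2 INTEGER, id3 INTEGER, id4 INTEGER);
    INSERT INTO t2 VALUES (100), (200);
    INSERT INTO t1 VALUES
        (1, '2024-01-10 08:00:00', NULL, 1, 2, 3),      -- qualifies by date only
        (2, '2024-01-01 00:00:00', NULL, 100, 200, 3),  -- matches two t2 rows
        (3, '2024-01-01 00:00:00', NULL, 1, 2, 3);      -- qualifies by nothing
""")
params = {"now": "2024-01-10 12:00:00"}  # stands in for UTC_TIMESTAMP()
DATE_OK = ("(t1.date1 >= datetime(:now, '-20 hours') "
           "OR t1.date2 >= datetime(:now, '-20 hours'))")

queries = {
    "original": f"""SELECT t1.id FROM t1 WHERE {DATE_OK}
        OR t1.id2 IN (SELECT id FROM t2) OR t1.id3 IN (SELECT id FROM t2)
        OR t1.id4 IN (SELECT id FROM t2) ORDER BY t1.id""",
    # INNER JOIN + date filter: requires an id match AND a recent date.
    "inner_join": f"""SELECT t1.id FROM t1
        JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
        WHERE {DATE_OK} ORDER BY t1.id""",
    # LEFT JOIN + OR keeps date-only rows; without DISTINCT row 2 repeats.
    "left_join_no_distinct": f"""SELECT t1.id FROM t1
        LEFT JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
        WHERE {DATE_OK} OR t2.id IS NOT NULL ORDER BY t1.id""",
    "left_join": f"""SELECT DISTINCT t1.id FROM t1
        LEFT JOIN t2 ON t2.id IN (t1.id2, t1.id3, t1.id4)
        WHERE {DATE_OK} OR t2.id IS NOT NULL ORDER BY t1.id""",
}
results = {name: [row[0] for row in conn.execute(sql, params)]
           for name, sql in queries.items()}
for name, found in results.items():
    print(name, found)
```

On these rows the INNER JOIN returns nothing, while only the LEFT JOIN with DISTINCT reproduces the original result exactly.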
IN is logically an OR, but I think more recent MySQL versions optimize it somewhat; if not, this might perform better:
SELECT t1.id
FROM t1
LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2
LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3
LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4
WHERE t1.date1 >= (UTC_TIMESTAMP() - INTERVAL 20 HOUR)
OR t1.date2 >= (UTC_TIMESTAMP() - INTERVAL 20 HOUR)
OR t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL
;
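It replaces the IN on the id columns with three equality joins, so each join may be able to use an index on t2.id; but it joins three times instead of once, so it could still end up slower. It also assumes t2.id is unique (for example, the primary key); otherwise a t1 row can be returned more than once.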
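As a sanity check of the three-LEFT-JOIN idea, the sketch below combines the two date tests and the three IS NOT NULL tests with OR (matching the original query's logic) and compares the result with the original OR/IN query. It assumes SQLite instead of MySQL (datetime(:now, '-20 hours') with a fixed "now"), hypothetical rows, and t2.id as a primary key, so each LEFT JOIN finds at most one row and no DISTINCT is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY);  -- unique id: each LEFT JOIN finds <= 1 row
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, date1 TEXT NOT NULL, date2 TEXT,
                     id2 INTEGER, id3 INTEGER, id4 INTEGER);
    INSERT INTO t2 VALUES (100), (200), (300);
    INSERT INTO t1 VALUES
        (1, '2024-01-01 00:00:00', '2024-01-10 09:00:00', 1, 2, 3),  -- recent date2
        (2, '2024-01-01 00:00:00', NULL, 1, 2, 300),                  -- id4 matches
        (3, '2024-01-01 00:00:00', NULL, 100, 200, 300),              -- all three match
        (4, '2024-01-01 00:00:00', NULL, 1, 2, 3);                    -- no match
""")
params = {"now": "2024-01-10 12:00:00"}  # stands in for UTC_TIMESTAMP()

three_joins = """
    SELECT t1.id
    FROM t1
    LEFT JOIN t2 AS t2_2 ON t2_2.id = t1.id2
    LEFT JOIN t2 AS t2_3 ON t2_3.id = t1.id3
    LEFT JOIN t2 AS t2_4 ON t2_4.id = t1.id4
    WHERE t1.date1 >= datetime(:now, '-20 hours')
       OR t1.date2 >= datetime(:now, '-20 hours')
       OR t2_2.id IS NOT NULL OR t2_3.id IS NOT NULL OR t2_4.id IS NOT NULL
    ORDER BY t1.id
"""
original = """
    SELECT id FROM t1
    WHERE date1 >= datetime(:now, '-20 hours') OR date2 >= datetime(:now, '-20 hours')
       OR id2 IN (SELECT id FROM t2) OR id3 IN (SELECT id FROM t2)
       OR id4 IN (SELECT id FROM t2)
    ORDER BY id
"""
joined_ids = [row[0] for row in conn.execute(three_joins, params)]
original_ids = [row[0] for row in conn.execute(original, params)]
print(joined_ids, original_ids)  # [1, 2, 3] [1, 2, 3]
```

Joining the two groups with AND instead would return only rows that are both recent and id-matched, which is a different question.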
I have two tables, T1 and T2 (shown in a screenshot I pasted, not included here).
My query on these tables works for fetching data from the last 24 hours; it returns the 1st device in the table.
Now I need a new query in which the 2nd device in T1 also appears, because its create time in T1 is within 24 hours of its insert time in T2.
The 3rd device in T1 should not appear in the result, because its create time in T1 is more than 24 hours away from its insert time in T2.
I am looking for a query that handles the last two points.
Select a.device,[a.create time], b.device, [b.insert time]
from T1 a, T2 b
where a.device = b.device and a.time >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
Make a.device = b.device an explicit join condition, then add a second condition that compares the create time in T1 with the insert time in T2:
Select a.device, a.`create time`, b.device, b.`insert time`
from T1 a inner join T2 b on a.device = b.device
where
a.time >= DATE_SUB(NOW(), INTERVAL 24 HOUR) -- first condition
or a.time between DATE_SUB(b.time, INTERVAL 24 HOUR) and DATE_ADD(b.time, INTERVAL 24 HOUR) -- second condition
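To check which second condition matches the description (device 2 in, device 3 out), the sketch below compares a one-sided test (create time at least 24 hours before the insert time) with a two-sided window (create time within 24 hours of the insert time). Assumptions: SQLite instead of MySQL (datetime(x, '-24 hours') replaces DATE_SUB(x, INTERVAL 24 HOUR)), a fixed "now", and hypothetical columns create_time/insert_time and rows, since the real column names and data are only in the missing screenshot.

```python
import sqlite3

# Hypothetical tables; create_time/insert_time stand in for the question's
# time columns, whose exact names are unclear.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (device TEXT, create_time TEXT);
    CREATE TABLE T2 (device TEXT, insert_time TEXT);
    INSERT INTO T1 VALUES
        ('device1', '2024-03-10 06:00:00'),  -- created within the last 24 hours
        ('device2', '2024-03-08 12:00:00'),  -- old, but 12 hours before its T2 insert
        ('device3', '2024-03-05 12:00:00');  -- old, and 72 hours before its T2 insert
    INSERT INTO T2 VALUES
        ('device1', '2024-03-10 07:00:00'),
        ('device2', '2024-03-09 00:00:00'),
        ('device3', '2024-03-08 12:00:00');
""")
params = {"now": "2024-03-10 12:00:00"}  # fixed stand-in for NOW()

def devices(second_condition):
    sql = f"""
        SELECT a.device FROM T1 AS a
        JOIN T2 AS b ON a.device = b.device
        WHERE a.create_time >= datetime(:now, '-24 hours')  -- first condition
           OR {second_condition}                           -- second condition
        ORDER BY a.device
    """
    return [row[0] for row in conn.execute(sql, params)]

# One-sided: create time at least 24 hours BEFORE the insert time.
one_sided = devices("a.create_time <= datetime(b.insert_time, '-24 hours')")
# Two-sided window: create time within 24 hours of the insert time.
window = devices("a.create_time BETWEEN datetime(b.insert_time, '-24 hours') "
                 "AND datetime(b.insert_time, '+24 hours')")
print(one_sided, window)  # ['device1', 'device3'] ['device1', 'device2']
```

The one-sided `<=` test picks device 3 instead of device 2, which is the opposite of what the question asks for.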
I have a table
id, date
a , 2017-01-01
a , 2017-01-02
b , 2017-02-03
...
and I'd like to compute, for each day D, how many distinct users appeared exactly 7 days earlier (on day D-7) but not in between D-7 and D. I don't care whether they appeared earlier than that.
And the output shall be
date, count
2017-01-01, 23
2017-01-02, 33
etc
I've been thinking about this for quite a while, but can't figure out the D to D+7 part. It would be easy to do in Python, but I'd like to sharpen my SQL skills :)
I know basic SELECT and GROUP BY clauses, but I'm wondering if there are any advanced techniques I should know about.
Any help would be appreciated.
You can check whether the user appeared both on that day and 7 days earlier:
SELECT DDate,
COUNT(DISTINCT id) cnt
FROM tablename a
WHERE id IN (SELECT id
FROM tablename
WHERE DDate = DATE_SUB(a.DDate, INTERVAL 7 DAY)
)
GROUP BY DDate
This is based on my understanding of your question.
Just from the documentation:
select count(date), date from tablename where date<=CURDATE() + interval 7 day group by date
You can use a left join on the same table for the 7 days in the future to see if the ID shows up. If it doesn't show up, the left joined table's id will be null.
select count(distinct t1.id), t1.date + interval 7 day
from tablename t1
left join tablename t2 on t2.id = t1.id and t2.date > t1.date and t2.date < t1.date + interval 7 day
where t2.id is null
group by t1.date;
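Two details make this anti-join work: the lower bound on t2.date must be strict (with >=, every row joins to itself, so t2.id is never NULL and nothing is counted), and the count needs a GROUP BY per starting day. The sketch below shows both in SQLite rather than MySQL, with date(x, '+7 days') replacing x + interval 7 day; the table name visits and its rows are hypothetical (table itself is a reserved word).

```python
import sqlite3

# Hypothetical visits; "visits" replaces "table", which is a reserved word.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (id TEXT, date TEXT);
    INSERT INTO visits VALUES
        ('a', '2017-01-01'), ('a', '2017-01-03'),  -- a returns within the week
        ('b', '2017-01-01'),                       -- b never returns
        ('c', '2017-01-01'), ('c', '2017-01-08');  -- c returns exactly 7 days later
""")

def counts(lower_bound_op):
    # For each visit on day X, look for a later visit by the same id before
    # X + 7 days; if none exists (t2.id IS NULL), count the id for day X + 7.
    sql = f"""
        SELECT date(t1.date, '+7 days') AS d, COUNT(DISTINCT t1.id)
        FROM visits AS t1
        LEFT JOIN visits AS t2
               ON t2.id = t1.id
              AND t2.date {lower_bound_op} t1.date
              AND t2.date < date(t1.date, '+7 days')
        WHERE t2.id IS NULL
        GROUP BY t1.date
        ORDER BY d
    """
    return conn.execute(sql).fetchall()

strict = counts(">")      # a row cannot match itself
inclusive = counts(">=")  # every row matches itself, so nothing is counted
print(strict)
print(inclusive)
```

With the strict bound, user c still counts for 2017-01-08, because a visit on D itself is not "in between".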
Similar to Ferdinand Gasper's answer, but this also excludes users who appeared in the days between D-7 and D:
SELECT date, COUNT(DISTINCT id)
FROM yourTable AS t1
WHERE id IN (SELECT id
FROM yourTable AS t2
WHERE t2.date = DATE_SUB(t1.date, INTERVAL 7 DAY))
AND id NOT IN (SELECT id
FROM yourTable AS t2
WHERE t2.date BETWEEN DATE_SUB(t1.date, INTERVAL 6 DAY) AND DATE_SUB(t1.date, INTERVAL 1 DAY))
GROUP BY date
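Note that this query counts users seen on day D who were also seen on D-7 and not in the six days between. Here is a small sketch of the same logic in SQLite (not MySQL), where date(x, '-7 days') replaces DATE_SUB(x, INTERVAL 7 DAY); the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (id TEXT, date TEXT);
    INSERT INTO yourTable VALUES
        ('a', '2017-01-01'), ('a', '2017-01-08'),                       -- counted on 01-08
        ('b', '2017-01-01'), ('b', '2017-01-04'), ('b', '2017-01-08'),  -- seen in between
        ('c', '2017-01-01'),                                            -- not seen on D
        ('d', '2017-01-02'), ('d', '2017-01-09');                       -- counted on 01-09
""")

query = """
    SELECT t1.date, COUNT(DISTINCT t1.id)
    FROM yourTable AS t1
    WHERE t1.id IN (SELECT t2.id FROM yourTable AS t2
                    WHERE t2.date = date(t1.date, '-7 days'))
      AND t1.id NOT IN (SELECT t2.id FROM yourTable AS t2
                        WHERE t2.date BETWEEN date(t1.date, '-6 days')
                                          AND date(t1.date, '-1 days'))
    GROUP BY t1.date
    ORDER BY t1.date
"""
result = conn.execute(query).fetchall()
print(result)  # [('2017-01-08', 1), ('2017-01-09', 1)]
```

Days with no qualifying users simply do not appear in the output; producing explicit zero rows would need a separate calendar of dates.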
I need to LEFT JOIN Table1 to Table2 ON Table1.userid = Table2.id and delete the users from Table1 for whom more than 90 days have passed since their register date in Table2.registerDate (a DATETIME column). How do I build the SQL query for this?
It would look something like this:
delete t1
from table1 t1 join
table2 t2
on t1.userid = t2.id
where t1.date > t2.registerdate + interval 90 day;
I am not sure whether "90 days since" means before or after. The query above matches dates more than 90 days after the register date; for "before", use t1.date < t2.registerdate - interval 90 day instead.
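SQLite has no multi-table DELETE, so to try the idea locally the join can be expressed as a correlated EXISTS. This is a sketch under assumptions: SQLite instead of MySQL, datetime(x, '+90 days') in place of x + interval 90 day, and hypothetical rows, including a date column on table1 as used in the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, registerDate TEXT NOT NULL);
    CREATE TABLE table1 (userid INTEGER, date TEXT NOT NULL);
    INSERT INTO table2 VALUES (1, '2024-01-01 10:00:00'), (2, '2024-03-01 10:00:00');
    INSERT INTO table1 VALUES
        (1, '2024-05-01 00:00:00'),  -- more than 90 days after user 1 registered: deleted
        (2, '2024-05-01 00:00:00'),  -- less than 90 days after user 2 registered: kept
        (3, '2024-05-01 00:00:00');  -- no table2 row: kept, as with the inner join
""")

# SQLite has no multi-table DELETE, so the join becomes a correlated EXISTS.
conn.execute("""
    DELETE FROM table1
    WHERE EXISTS (SELECT 1 FROM table2 AS t2
                  WHERE t2.id = table1.userid
                    AND table1.date > datetime(t2.registerDate, '+90 days'))
""")
remaining = [row[0] for row in conn.execute("SELECT userid FROM table1 ORDER BY userid")]
print(remaining)  # [2, 3]
```

User 3 survives because it has no matching table2 row; a LEFT JOIN would not change that, since a NULL registerDate never satisfies the comparison.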
I need an SQL query that checks for whether a person is active for two consecutive weeks in the year.
For example,
Table1:
Name | Activity | Date
Name1|Basketball| 08-08-2014
Name2|Volleyball| 08-09-2014
Name3|None | 08-10-2014
Name1|Tennis | 08-14-2014
I want to retrieve Name1 because that person has been active for two consecutive weeks in the year.
This is my query so far:
SELECT DISTINCT Name
FROM Table1
Where YEAR(Date) = 2014 AND
Activity <> 'None' AND
This is where I would need the logic that checks for activity in two consecutive weeks. "The next week" can be defined as 7 to 14 days later. I am working with MySQL.
I have deliberately avoided using YEAR(Date) in the WHERE clause, and I recommend you do too. Applying a function to every row just to compare it with a single criterion (2014) never makes sense to me, and it prevents MySQL from using an index on Date (see "sargable" on Wikipedia). It is much easier to filter by a date range, IMHO.
I've used a correlated subquery to derive nxt_date, which might not scale very well, but overall the performance will most probably depend on your indexes.
select distinct
name
from (
select
t.name
, t.Activity
, t.`Date`
, (
select min(table1.`Date`) from table1
where t.name = table1.name
and table1.Activity <> 'None'
and table1.`Date` > t.`Date`
) as nxt_date
from table1 as t
where ( t.`Date` >= '2014-01-01' and t.`Date` < '2015-01-01' )
and t.Activity <> 'None'
) as sq
where datediff(sq.nxt_date, sq.`Date`) <= 14
;
see: http://sqlfiddle.com/#!9/cbbb3/9
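For readers without access to the fiddle, here is a sketch of the same correlated-subquery approach run on the question's sample rows. It assumes SQLite instead of MySQL, so the dates are rewritten as ISO-8601 text and julianday() differences stand in for DATEDIFF().

```python
import sqlite3

# The question's sample rows, with dates rewritten as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Name TEXT, Activity TEXT, Date TEXT);
    INSERT INTO table1 VALUES
        ('Name1', 'Basketball', '2014-08-08'),
        ('Name2', 'Volleyball', '2014-08-09'),
        ('Name3', 'None',       '2014-08-10'),
        ('Name1', 'Tennis',     '2014-08-14');
""")

# julianday() differences stand in for MySQL's DATEDIFF().
query = """
    SELECT DISTINCT Name
    FROM (
        SELECT t.Name, t.Date,
               (SELECT min(t2.Date) FROM table1 AS t2
                 WHERE t2.Name = t.Name
                   AND t2.Activity <> 'None'
                   AND t2.Date > t.Date) AS nxt_date
        FROM table1 AS t
        WHERE t.Date >= '2014-01-01' AND t.Date < '2015-01-01'
          AND t.Activity <> 'None'
    ) AS sq
    WHERE julianday(sq.nxt_date) - julianday(sq.Date) <= 14
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Name1']
```

Rows without a later activity get a NULL nxt_date, so the outer comparison quietly filters them out.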
You can do the logic using an EXISTS subquery:
select distinct t.name
from table1 t
where t.activity <> 'None' and
exists (select 1
from table1 t2
where t2.name = t.name and
t2.activity <> 'None' and
t2.date between t.date + interval 7 day and t.date + interval 14 day
);
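In MySQL, an expression like t.date + 7 treats the date as a number (YYYYMMDD) rather than adding seven days, so interval arithmetic is the safer way to express the 7-to-14-day window. The sketch below runs the window check in SQLite (date(x, '+7 days') stands in for x + interval 7 day) on the question's rows plus one hypothetical extra row for Name2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Name TEXT, Activity TEXT, Date TEXT);
    INSERT INTO table1 VALUES
        ('Name1', 'Basketball', '2014-08-08'),
        ('Name2', 'Volleyball', '2014-08-09'),
        ('Name3', 'None',       '2014-08-10'),
        ('Name1', 'Tennis',     '2014-08-14'),
        ('Name2', 'Volleyball', '2014-08-18');  -- hypothetical extra row
""")

# date(x, '+N days') is SQLite's date arithmetic, like x + INTERVAL N DAY in MySQL.
query = """
    SELECT DISTINCT t.Name
    FROM table1 AS t
    WHERE t.Activity <> 'None'
      AND EXISTS (SELECT 1 FROM table1 AS t2
                   WHERE t2.Name = t.Name
                     AND t2.Activity <> 'None'
                     AND t2.Date BETWEEN date(t.Date, '+7 days')
                                     AND date(t.Date, '+14 days'))
    ORDER BY t.Name
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Name2']
```

Name1 is not returned here: its two activities are 6 days apart, which falls outside a strict 7-to-14-day window even though the question's example expects it.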
I don't know if it is performance relevant, but I like concise queries:
SELECT DISTINCT t1.Name
FROM Table1 t1, Table1 t2
WHERE t1.Name=t2.Name AND
t1.Date >= '2014-01-01' AND t1.Date < '2015-01-01' AND
t1.Activity <> 'None' AND
t2.Activity <> 'None' AND
t1.Date < t2.Date AND
datediff(t2.Date, t1.Date) <= 14;
I liked the hint of #user2067753 about the YEAR(date).
I used the SQL Fiddle from the answer above to compare performance with EXPLAIN. It seems that avoiding subqueries, as in VACN's answer or mine, is beneficial (see join vs. subquery).
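Off the top of my head, I suggest this query:
SELECT DISTINCT t1.Name
FROM Table1 AS t1, Table1 AS t2
WHERE t1.Name = t2.Name
AND t1.Activity <> 'None' AND t2.Activity <> 'None'
AND t2.Date > t1.Date
AND t2.Date <= t1.Date + INTERVAL 7 DAY;
The idea is basically: you use the table twice, match rows with the same name, and keep only the pairs whose second date is later than the first but at most 7 days after it. The second date must be strictly later; with a symmetric window such as t1.Date - 7 to t1.Date + 7, every row matches itself and every name is returned.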
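To see why the self-join needs a strictly later second date, the sketch below runs a symmetric plus-or-minus-7-day window and a later-only window on the question's rows. It assumes SQLite instead of MySQL, with date(x, '+7 days') in place of x + INTERVAL 7 DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Name TEXT, Activity TEXT, Date TEXT);
    INSERT INTO Table1 VALUES
        ('Name1', 'Basketball', '2014-08-08'),
        ('Name2', 'Volleyball', '2014-08-09'),
        ('Name3', 'None',       '2014-08-10'),
        ('Name1', 'Tennis',     '2014-08-14');
""")

# Symmetric +/- 7 day window: a row always lies within 0 days of itself.
symmetric = """
    SELECT DISTINCT t1.Name
    FROM Table1 AS t1, Table1 AS t2
    WHERE t1.Name = t2.Name
      AND t2.Date BETWEEN date(t1.Date, '-7 days') AND date(t1.Date, '+7 days')
    ORDER BY t1.Name
"""
# Strictly later second date (so never the same row), both rows active.
later_only = """
    SELECT DISTINCT t1.Name
    FROM Table1 AS t1, Table1 AS t2
    WHERE t1.Name = t2.Name
      AND t1.Activity <> 'None' AND t2.Activity <> 'None'
      AND t2.Date > t1.Date
      AND t2.Date <= date(t1.Date, '+7 days')
    ORDER BY t1.Name
"""
sym_names = [row[0] for row in conn.execute(symmetric)]
later_names = [row[0] for row in conn.execute(later_only)]
print(sym_names)    # ['Name1', 'Name2', 'Name3']
print(later_names)  # ['Name1']
```

The symmetric version returns Name2 and Name3 even though each has only one row, which is exactly the self-match problem.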
What's the best practice: using subqueries, or repeating the same calculation several times? I've used subqueries until now, but they seem excessive when all you need is a value calculated in the previous query (the following example is a query with a subquery nested inside another subquery).
So which is the right, best-practice method? As a programmer, everything in me tells me to use method A, since copy-pasting calculations seems wrong; at the same time, subqueries aren't always good, since they can make the query use a filesort instead of an index sort (please correct me if I'm wrong).
Method a - subqueries:
SELECT
tmp2.*
FROM
(
SELECT
tmp.*,
(NOW() < tmp.expire_time) as `active`
FROM
(
SELECT
tr.orderid,
tr.transactiontime,
pa.months as `months`,
DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) as `expire_time`
FROM
`transactions` as `tr`
INNER JOIN
`packages` as `pa`
ON
tr.productid = pa.productid
WHERE
tr.isprocessed = '1'
ORDER BY
tr.transactiontime ASC
) as `tmp`
) as `tmp2`
WHERE
tmp2.active = 1
Explain: (EXPLAIN output for method A not included)
Method b - reusing calculations:
SELECT
tr.orderid,
tr.transactiontime,
pa.months as `months`,
DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) as `expire_time`,
(NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH)) as `active`
FROM
`transactions` as `tr`
INNER JOIN
`packages` as `pa`
ON
tr.productid = pa.productid
WHERE
tr.isprocessed = '1'
AND
(NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH))
ORDER BY
tr.transactiontime ASC
Explain: (EXPLAIN output for method B not included)
Notice how DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH) is repeated 3 times, and (NOW() < DATE_ADD(tr.transactiontime, INTERVAL pa.months MONTH)) is repeated 2 times.
Based on the EXPLAIN output, method B seems much better, but I still dislike that it has to do the same calculation three times (I'm assuming it does, rather than computing the result once and reusing it for every occurrence).
You should look at MySQL's EXPLAIN command:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
which tells you how MySQL executes the queries.
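A possible middle ground between the two methods is a single derived table that computes expire_time once, with the filter applied in the outer query. Newer MySQL versions can often merge such a derived table into the outer query instead of materializing it, but EXPLAIN on your own version is the way to confirm that. The sketch below checks that this shape returns the same rows as method B; it assumes SQLite instead of MySQL (datetime(x, '+N months') replaces DATE_ADD(x, INTERVAL N MONTH)), a fixed "now", and hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages (productid INTEGER PRIMARY KEY, months INTEGER NOT NULL);
    CREATE TABLE transactions (orderid INTEGER PRIMARY KEY, productid INTEGER,
                               transactiontime TEXT, isprocessed TEXT);
    INSERT INTO packages VALUES (1, 1), (2, 12);
    INSERT INTO transactions VALUES
        (10, 1, '2024-01-15 00:00:00', '1'),  -- 1-month package, expired by the fixed "now"
        (11, 2, '2024-02-01 00:00:00', '1'),  -- 12-month package, still active
        (12, 2, '2024-03-01 00:00:00', '0');  -- not processed
""")
params = {"now": "2024-06-01 00:00:00"}  # fixed stand-in for NOW()

# One derived table computes expire_time once; the outer query filters on it.
one_level = """
    SELECT * FROM (
        SELECT tr.orderid, tr.transactiontime, pa.months,
               datetime(tr.transactiontime, '+' || pa.months || ' months') AS expire_time
        FROM transactions AS tr
        JOIN packages AS pa ON tr.productid = pa.productid
        WHERE tr.isprocessed = '1'
    ) AS tmp
    WHERE :now < tmp.expire_time
    ORDER BY tmp.transactiontime
"""
# Method B's shape: the expression is repeated in the WHERE clause.
repeated = """
    SELECT tr.orderid, tr.transactiontime, pa.months,
           datetime(tr.transactiontime, '+' || pa.months || ' months') AS expire_time
    FROM transactions AS tr
    JOIN packages AS pa ON tr.productid = pa.productid
    WHERE tr.isprocessed = '1'
      AND :now < datetime(tr.transactiontime, '+' || pa.months || ' months')
    ORDER BY tr.transactiontime
"""
one_level_rows = conn.execute(one_level, params).fetchall()
repeated_rows = conn.execute(repeated, params).fetchall()
print(one_level_rows == repeated_rows, [row[0] for row in one_level_rows])  # True [11]
```

The active flag is dropped here because, after filtering, it would be 1 on every remaining row.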