I have the following query:
SELECT count(*) as aggregate
FROM cars
INNER JOIN snapshots ON cars.snapshot_id=snapshots.id
WHERE cars.snapshot_id=194340
That works correctly - counting all cars with a specific snapshot_id.
But now I want to count only cars that had been there for more than 1 hour when the snapshot was created.
I've tried this, but it doesn't work:
SELECT count(*) as aggregate
FROM cars
INNER JOIN snapshots ON cars.snapshot_id=snapshots.id
WHERE cars.snapshot_id=194340
AND time_arrive_destination <= DATE_SUB('snapshots.created_at', INTERVAL 1 HOUR)
I know I'm close, because if I hard-code the timestamp from snapshots.created_at, it works:
SELECT count(*) as aggregate
FROM cars
INNER JOIN snapshots ON cars.snapshot_id=snapshots.id
WHERE cars.snapshot_id=194340
AND time_arrive_destination <= DATE_SUB("2015-08-31 20:29:49", INTERVAL 1 HOUR)
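So how do I use the joined column snapshots.created_at as the argument to DATE_SUB()?
The problem is the single quotes: 'snapshots.created_at' is a string literal, not a column reference, so DATE_SUB() is handed a string that is not a valid date and returns NULL, and the comparison never matches. Remove the quotes. If you need (or want) to escape an identifier (e.g. a column name, because it's a reserved word), enclose the identifier in backtick characters, not single quotes, e.g.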
AND time_arrive_destination <= DATE_SUB(`snapshots`.`created_at`, INTERVAL 1 HOUR)
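A minimal sketch of why the quoted form matches nothing, using Python's built-in sqlite3 module as a stand-in for MySQL. Assumptions: SQLite's datetime(x, '-1 hour') plays the role of DATE_SUB(x, INTERVAL 1 HOUR), timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, and the rows are made up. Quoted, 'snapshots.created_at' is just text the date function cannot parse, so it yields NULL and no row passes the comparison.

```python
import sqlite3

def make_db():
    # Made-up data: one snapshot and three cars; column names follow the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE snapshots (id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE cars (id INTEGER PRIMARY KEY, snapshot_id INTEGER,
                           time_arrive_destination TEXT);
        INSERT INTO snapshots VALUES (194340, '2015-08-31 20:29:49');
        INSERT INTO cars (snapshot_id, time_arrive_destination) VALUES
            (194340, '2015-08-31 18:00:00'),
            (194340, '2015-08-31 19:15:00'),
            (194340, '2015-08-31 20:00:00');
    """)
    return conn

def count_long_stays(conn, snapshot_id, quoted):
    # quoted=True reproduces the bug: a string literal instead of a column reference.
    created = "'snapshots.created_at'" if quoted else "snapshots.created_at"
    sql = f"""
        SELECT COUNT(*) AS aggregate
        FROM cars
        INNER JOIN snapshots ON cars.snapshot_id = snapshots.id
        WHERE cars.snapshot_id = ?
          AND cars.time_arrive_destination <= datetime({created}, '-1 hour')
    """
    return conn.execute(sql, (snapshot_id,)).fetchone()[0]

conn = make_db()
print(count_long_stays(conn, 194340, quoted=False))  # 2: column reference
print(count_long_stays(conn, 194340, quoted=True))   # 0: string literal -> NULL
```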
I have a table that lists an ID, a month, and a value. I'd like to query this table to find the min(month) where value <= 0.
I'm having trouble writing this in a way that doesn't query the same table multiple times, as the table has about 10 million rows.
So far, I've written a HAVING clause that checks whether the month is between min(month) and min(month) + 11 months, but it isn't working correctly: the query returns no data.
select id, month from table
group by id
having month between min(month) and date_add(min(month), interval 11 month)
Is there a clean way to do this without nesting queries and calling the same table multiple times?
You basically need to scan the table twice. The query is something like this:
select t.*
from t join
(select id, min(yyyymm) as minyyyymm
from t
where val <= 0
group by id
) tt
on t.id = tt.id and t.yyyymm >= minyyyymm and
t.yyyymm <= minyyyymm + interval 11 month;
One option for making this faster is to materialize the subquery and add an index on id.
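A minimal sketch of the join against the per-id minimum, run through Python's sqlite3 as a stand-in for MySQL. Assumptions: months are stored as first-of-month date strings ('2014-03-01'), SQLite's date(x, '+11 months') replaces + interval 11 month, the filter is the question's value <= 0, and the rows are invented.

```python
import sqlite3

def first_year_rows(conn):
    # Derived table tt finds, per id, the first month with val <= 0;
    # the outer join keeps that month and the following 11 months.
    sql = """
        SELECT t.id, t.yyyymm
        FROM t
        JOIN (SELECT id, MIN(yyyymm) AS minyyyymm
              FROM t
              WHERE val <= 0
              GROUP BY id) tt
          ON t.id = tt.id
         AND t.yyyymm >= tt.minyyyymm
         AND t.yyyymm <= date(tt.minyyyymm, '+11 months')
        ORDER BY t.id, t.yyyymm
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, yyyymm TEXT, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2014-01-01", 5), (1, "2014-03-01", 0), (1, "2014-06-01", -2),
    (1, "2015-02-01", 3), (1, "2015-03-01", 1),  # 2015-03 falls outside the window
    (2, "2014-01-01", 7), (2, "2014-02-01", 4),  # id 2 never reaches <= 0
])
print(first_year_rows(conn))
# [(1, '2014-03-01'), (1, '2014-06-01'), (1, '2015-02-01')]
```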
I have the following fairly complex query. I've tried to work out how to rewrite it using various sources online, but all the examples use simple queries, while mine is more complex, so I haven't found a solution.
Here's my current query:
SELECT id, category_id, name
FROM orders AS u1
WHERE added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (executed IS NULL OR executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND category_id NOT IN (SELECT category_id
FROM orders AS u2
WHERE executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
GROUP BY category_id)
GROUP BY category_id
ORDER BY added ASC
LIMIT 10;
The orders table has these columns: id, category_id, name, added, executed.
The purpose of the query is to list n orders (here, 10) that belong to different categories (I have hundreds of categories), i.e. 10 distinct category_id values. The orders shown must have been added more than a minute ago (INTERVAL 60 SECOND) and must either never have been executed (IS NULL) or have been executed more than a month ago.
The NOT IN subquery avoids processing a category_id that was already processed less than 5 seconds ago, so the result excludes every category processed within the last 5 seconds.
I've tried replacing the NOT IN with a LEFT JOIN or a NOT EXISTS, but the switch returns a different set of rows, so I believe it's not correct.
Here's what I have so far:
SELECT u1.id, u1.category_id, u1.name, u1.added
FROM orders AS u1
LEFT JOIN orders AS u2
ON u1.category_id = u2.category_id
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
WHERE u1.added < (UTC_TIMESTAMP() - INTERVAL 60 SECOND)
AND (u1.executed IS NULL OR u1.executed < (UTC_DATE() - INTERVAL 1 MONTH))
AND u2.category_id IS NULL
GROUP BY u1.category_id
LIMIT 10
Thank you for your help.
Here's some sample data to try. It contains no "older than 5 seconds" case, since it's nearly impossible to get a correct value, but it gives you some data to work with :)
Your query is using a column which doesn't exist in the table as a join condition.
ON u1.domain = u2.category_id
There is no column in your example data called "domain"
Your query is also using the incorrect operator for your 2nd join condition.
AND u2.executed > (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
should be
AND u2.executed < (UTC_TIMESTAMP() - INTERVAL 5 SECOND)
as is used in your first query
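On the question's observation that switching away from NOT IN changed the results: as long as category_id is never NULL, NOT IN, NOT EXISTS and a LEFT JOIN ... IS NULL anti-join keep the same categories, so a difference more likely comes from elsewhere. For example, the LEFT JOIN attempt drops ORDER BY added ASC, and with ONLY_FULL_GROUP_BY disabled, GROUP BY category_id with non-aggregated id and name lets MySQL return values from any row in each group. A minimal sketch using Python's sqlite3 as a stand-in for MySQL; assumptions: the fixed :recent timestamp replaces UTC_TIMESTAMP() - INTERVAL 5 SECOND, the comparison direction is kept as in the question's first query, the added/executed filters on u1 are left out to isolate the anti-join, and the rows are invented.

```python
import sqlite3

# Three anti-join spellings; :recent stands in for UTC_TIMESTAMP() - INTERVAL 5 SECOND.
FORMS = {
    "not_in": """
        SELECT DISTINCT u1.category_id FROM orders u1
        WHERE u1.category_id NOT IN (
            SELECT u2.category_id FROM orders u2 WHERE u2.executed > :recent)""",
    "not_exists": """
        SELECT DISTINCT u1.category_id FROM orders u1
        WHERE NOT EXISTS (
            SELECT 1 FROM orders u2
            WHERE u2.category_id = u1.category_id AND u2.executed > :recent)""",
    "left_join": """
        SELECT DISTINCT u1.category_id FROM orders u1
        LEFT JOIN orders u2
               ON u2.category_id = u1.category_id AND u2.executed > :recent
        WHERE u2.category_id IS NULL""",
}

def kept_categories(conn, form, recent):
    rows = conn.execute(FORMS[form], {"recent": recent}).fetchall()
    return sorted(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
# NOT NULL matters: a NULL category_id is where NOT IN and the other forms diverge.
conn.execute("""CREATE TABLE orders (id INTEGER PRIMARY KEY,
                category_id INTEGER NOT NULL, name TEXT, executed TEXT)""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "a", None),
    (2, 1, "b", "2015-01-01 12:00:03"),  # category 1 was executed recently
    (3, 2, "c", "2014-11-01 00:00:00"),
    (4, 3, "d", None),
])
for form in FORMS:
    print(form, kept_categories(conn, form, "2015-01-01 12:00:00"))  # [2, 3] each
```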
The table db has 3 columns (thing1, thing2, datetime). I want to pull all the records for each thing1 that has more than one distinct thing2 value.
SELECT thing1,thing2 FROM db WHERE datetime >= DATE_SUB(NOW(), INTERVAL 1 HOUR) GROUP BY thing1 HAVING COUNT(DISTINCT(thing2)) > 1;
This gets me almost what I need, but of course the GROUP BY means it returns only one row per thing1, and I need all the thing1, thing2 rows.
Any suggestions would be greatly appreciated.
I think you should use GROUP BY this way:
SELECT thing1,thing2
FROM db WHERE datetime >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY thing1, thing2 HAVING COUNT(*) > 1;
Shamelessly copying Matt S' original answer as a starting point to provide an alternative...
SELECT db.thing1, db.thing2
FROM db
INNER JOIN (
SELECT thing1, MIN(`datetime`) As `datetime`
FROM db
WHERE `datetime` >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY thing1
HAVING COUNT(DISTINCT thing2) > 1
) AS subQ ON db.thing1 = subQ.thing1 AND db.`datetime` >= subQ.`datetime`
;
MySQL is very finicky, performance-wise, when it comes to subqueries in WHERE clauses; this JOIN alternative may perform faster than such a query.
It may also run faster than in its current form if the MIN is removed from the subquery (and from the join condition) and a redundant datetime condition is added to the outer WHERE instead.
Which is best will depend on data, hardware, configuration, etc...
Sidenote: I would caution against using keywords such as datetime as field (or table) names; they tend to bite their users when least expected, and at the very least they should always be escaped with ` as in the example.
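A minimal sketch of the JOIN-to-derived-table version, using Python's sqlite3 as a stand-in for MySQL (SQLite also accepts backtick-quoted identifiers). Assumptions: a fixed :now replaces NOW(), datetime(:now, '-1 hour') replaces DATE_SUB(NOW(), INTERVAL 1 HOUR), the MIN alias is renamed first_seen for readability, and the rows are invented.

```python
import sqlite3

SQL = """
    SELECT db.thing1, db.thing2, db.`datetime`
    FROM db
    INNER JOIN (
        SELECT thing1, MIN(`datetime`) AS first_seen
        FROM db
        WHERE `datetime` >= datetime(:now, '-1 hour')
        GROUP BY thing1
        HAVING COUNT(DISTINCT thing2) > 1
    ) AS subQ ON db.thing1 = subQ.thing1 AND db.`datetime` >= subQ.first_seen
    ORDER BY db.thing1, db.`datetime`
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (thing1 TEXT, thing2 TEXT, `datetime` TEXT)")
conn.executemany("INSERT INTO db VALUES (?, ?, ?)", [
    ("A", "x", "2015-09-01 10:00:00"),  # older than the hour: dropped by the join
    ("A", "x", "2015-09-01 11:10:00"),
    ("A", "y", "2015-09-01 11:20:00"),
    ("B", "z", "2015-09-01 11:30:00"),  # B has only one distinct thing2
    ("B", "z", "2015-09-01 11:40:00"),
])
rows = conn.execute(SQL, {"now": "2015-09-01 12:00:00"}).fetchall()
print(rows)  # only A's rows from the last hour
```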
If I'm understanding what you're looking for, you'll want to use your current query as a sub-query:
SELECT thing1, thing2 FROM db WHERE thing1 IN (
SELECT thing1 FROM db
WHERE datetime >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY thing1
HAVING COUNT(DISTINCT thing2) > 1
);
The subquery is already getting the thing1s you want, so this lets you get the original rows back from the table, limited to just those thing1s.
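A minimal sketch of the IN version via Python's sqlite3 (a stand-in for MySQL; the fixed :now replaces NOW() and the rows are invented). One thing it makes visible: the outer query has no time filter, so older rows for a qualifying thing1 come back too. The hypothetical last_hour_only flag repeats the datetime condition in the outer WHERE for when only the last hour is wanted.

```python
import sqlite3

def rows_for_multi_thing2(conn, now, last_hour_only=False):
    # Optional outer filter; a fixed SQL fragment, not user input.
    outer_filter = "AND `datetime` >= datetime(:now, '-1 hour')" if last_hour_only else ""
    sql = f"""
        SELECT thing1, thing2, `datetime` FROM db
        WHERE thing1 IN (
            SELECT thing1 FROM db
            WHERE `datetime` >= datetime(:now, '-1 hour')
            GROUP BY thing1
            HAVING COUNT(DISTINCT thing2) > 1
        ) {outer_filter}
        ORDER BY thing1, `datetime`
    """
    return conn.execute(sql, {"now": now}).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (thing1 TEXT, thing2 TEXT, `datetime` TEXT)")
conn.executemany("INSERT INTO db VALUES (?, ?, ?)", [
    ("A", "x", "2015-09-01 10:00:00"),
    ("A", "x", "2015-09-01 11:10:00"),
    ("A", "y", "2015-09-01 11:20:00"),
    ("B", "z", "2015-09-01 11:30:00"),
    ("B", "z", "2015-09-01 11:40:00"),
])
print(rows_for_multi_thing2(conn, "2015-09-01 12:00:00"))
print(rows_for_multi_thing2(conn, "2015-09-01 12:00:00", last_hour_only=True))
```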
Is it possible to make this query more efficient?
SELECT DISTINCT(static.template.name)
FROM probedata.probe
INNER JOIN static.template ON probedata.probe.template_fk = static.template.pk
WHERE creation_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
Thanks.
First, I'm going to rewrite it using table aliases, so I can read it:
SELECT DISTINCT(t.name)
FROM probedata.probe p INNER JOIN
static.template t
ON p.template_fk = t.pk
WHERE creation_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH);
Let me make two assumptions:
name is unique in static.template
creation_time comes from probe
The first assumption is particularly useful. You can rewrite the query as:
SELECT t.name
FROM static.template t
WHERE EXISTS (SELECT 1
FROM probedata.probe p
WHERE p.template_fk = t.pk AND
p.creation_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
);
The second assumption only affects the indexing. For this query, you want an index on probe(template_fk, creation_time).
If template has wide records, then an index on template(pk, name) might also prove useful.
This will change the execution plan to a scan of the template table with a fast lookup into the probe table using the index. There will be no additional processing to remove duplicates.
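A minimal sketch comparing the original DISTINCT join with the EXISTS rewrite, using Python's sqlite3 as a stand-in for MySQL. Assumptions: there are no static/probedata schemas here, so plain tables template and probe are used; a fixed :now replaces NOW(); datetime(:now, '-6 months') replaces DATE_SUB(NOW(), INTERVAL 6 MONTH); the rows are invented. With name unique, as the answer assumes, both queries return the same names.

```python
import sqlite3

DISTINCT_JOIN = """
    SELECT DISTINCT t.name
    FROM probe p INNER JOIN template t ON p.template_fk = t.pk
    WHERE p.creation_time >= datetime(:now, '-6 months')
    ORDER BY t.name
"""
EXISTS_REWRITE = """
    SELECT t.name
    FROM template t
    WHERE EXISTS (SELECT 1 FROM probe p
                  WHERE p.template_fk = t.pk
                    AND p.creation_time >= datetime(:now, '-6 months'))
    ORDER BY t.name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE template (pk INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE probe (id INTEGER PRIMARY KEY, template_fk INTEGER, creation_time TEXT);
    -- Composite index from the answer: look up by template, then by time range.
    CREATE INDEX probe_tpl_time ON probe (template_fk, creation_time);
    INSERT INTO template VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO probe (template_fk, creation_time) VALUES
        (1, '2015-08-01 00:00:00'), (1, '2015-08-02 00:00:00'),
        (2, '2014-12-01 00:00:00'),
        (3, '2015-07-15 00:00:00');
""")
params = {"now": "2015-09-01 00:00:00"}
names_join = [r[0] for r in conn.execute(DISTINCT_JOIN, params)]
names_exists = [r[0] for r in conn.execute(EXISTS_REWRITE, params)]
print(names_join, names_exists)  # ['alpha', 'gamma'] ['alpha', 'gamma']
```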
Could help:
If you use this statement in a script, assign the result of DATE_SUB(NOW(), INTERVAL 6 MONTH) to a variable before the SELECT statement and use that variable in the WHERE condition (so the date calculation executes just once).
Instead of DISTINCT, see whether selecting just the column (no DISTINCT) and adding GROUP BY static.template.name gives an improvement.
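A minimal sketch of both suggestions in a Python script, using sqlite3 as a stand-in for a MySQL driver: the six-month cutoff is computed once in application code and bound as a parameter, and GROUP BY t.name replaces DISTINCT. The helper months_ago is hypothetical; like MySQL's month arithmetic, it clamps the day to the target month's length (31 August minus six months gives 28 February). Placeholder syntax depends on the driver; sqlite3 uses ?.

```python
import calendar
import sqlite3
from datetime import datetime

def months_ago(moment, months):
    # Step back whole months, clamping the day (e.g. Aug 31 -> Feb 28).
    total = moment.year * 12 + (moment.month - 1) - months
    year, month0 = divmod(total, 12)
    day = min(moment.day, calendar.monthrange(year, month0 + 1)[1])
    return moment.replace(year=year, month=month0 + 1, day=day)

def recent_template_names(conn, now):
    cutoff = months_ago(now, 6).strftime("%Y-%m-%d %H:%M:%S")  # computed once
    sql = """
        SELECT t.name
        FROM probe p INNER JOIN template t ON p.template_fk = t.pk
        WHERE p.creation_time >= ?
        GROUP BY t.name
        ORDER BY t.name
    """
    return [row[0] for row in conn.execute(sql, (cutoff,))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE template (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE probe (template_fk INTEGER, creation_time TEXT);
    INSERT INTO template VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO probe VALUES (1, '2015-08-01 00:00:00'), (1, '2015-08-05 00:00:00'),
                             (2, '2015-01-10 00:00:00');
""")
print(recent_template_names(conn, datetime(2015, 9, 1)))  # ['alpha']
```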
I'm looking for a function to return the most common non-numeric value from a table.
My database table records readings from a weather station. Many of these are numeric, but wind direction is recorded in a varchar field as one of 16 text values (N, NNE, NE, ENE, E, etc.). Records are added every 15 minutes, so 96 rows represent a day's weather.
I'm trying to compute the predominant wind direction for the day. Manually, you would count the Ns, NNEs, NEs, etc. and see which occurs most often.
Has MySQL got a neat way of doing this?
Thanks
It's difficult to answer your question without seeing your schema, but this should help you.
Assuming the wind directions are stored in the same column as the numeric values you want to ignore, you can use REGEXP to ignore the numeric values, like this:
select generic_string, count(*)
from your_table
where day = '2014-01-01'
and generic_string not regexp '^[0-9]*$'
group by generic_string
order by count(*) desc
limit 1
If wind direction is the only thing stored in the column then it's a little simpler:
select wind_direction, count(*)
from your_table
where day = '2014-01-01'
group by wind_direction
order by count(*) desc
limit 1
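A minimal sketch of the single-column version with Python's sqlite3 standing in for MySQL; the table and readings are invented. A secondary sort on wind_direction is added here only so that ties are broken deterministically; without it, which of two equally common directions LIMIT 1 returns is not guaranteed.

```python
import sqlite3

def predominant_direction(conn, day):
    # Count each direction for the day and keep the most frequent one.
    sql = """
        SELECT wind_direction, COUNT(*)
        FROM your_table
        WHERE day = ?
        GROUP BY wind_direction
        ORDER BY COUNT(*) DESC, wind_direction
        LIMIT 1
    """
    return conn.execute(sql, (day,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (day TEXT, wind_direction TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [
    ("2014-01-01", "N"), ("2014-01-01", "N"), ("2014-01-01", "NNE"),
    ("2014-01-01", "NE"), ("2014-01-01", "N"), ("2014-01-01", "NE"),
    ("2014-01-02", "E"),
])
print(predominant_direction(conn, "2014-01-01"))  # ('N', 3)
```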
You can do this for multiple days using sub-queries. For example (assuming you don't have any data in the future) this query will give you the most common wind direction for each day in the current month:
select this_month.day,
(
select winddir
from weatherdatanum
where thedate >= this_month.day
and thedate < this_month.day + interval 1 day
group by winddir
order by count(*) desc
limit 1
) as daily_leader
from
(
select distinct date(thedate) as day
from weatherdatanum
where thedate >= concat(left(current_date(),7),'-01')
) this_month
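A minimal sketch of the per-day version using Python's sqlite3 as a stand-in for MySQL. Assumptions: timestamps are 'YYYY-MM-DD HH:MM:SS' text, date(x, '+1 day') replaces + interval 1 day, the month start is passed in as :month_start (in SQLite, date('now', 'start of month') would compute it), ORDER BY this_month.day is added to fix the row order, and the readings are invented.

```python
import sqlite3

SQL = """
    SELECT this_month.day,
           (SELECT winddir
            FROM weatherdatanum
            WHERE thedate >= this_month.day
              AND thedate < date(this_month.day, '+1 day')
            GROUP BY winddir
            ORDER BY COUNT(*) DESC
            LIMIT 1) AS daily_leader
    FROM (SELECT DISTINCT date(thedate) AS day
          FROM weatherdatanum
          WHERE thedate >= :month_start) this_month
    ORDER BY this_month.day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weatherdatanum (thedate TEXT, winddir TEXT)")
conn.executemany("INSERT INTO weatherdatanum VALUES (?, ?)", [
    ("2015-08-31 23:45:00", "W"),  # before the month start: ignored
    ("2015-09-01 10:00:00", "N"), ("2015-09-01 10:15:00", "N"),
    ("2015-09-01 10:30:00", "E"),
    ("2015-09-02 00:00:00", "SW"), ("2015-09-02 00:15:00", "S"),
    ("2015-09-02 00:30:00", "S"),
])
leaders = conn.execute(SQL, {"month_start": "2015-09-01"}).fetchall()
print(leaders)  # [('2015-09-01', 'N'), ('2015-09-02', 'S')]
```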
The following query should return a list of wind directions with their counts, most frequent first:
SELECT wind_dir, COUNT(wind_dir) AS count FROM `mytable` GROUP BY wind_dir ORDER BY count DESC
Hope that helps