Joining two tables by date in MySQL

I have this:
SELECT * FROM history JOIN value WHERE history.the_date >= value.the_date
Is it possible to ask this as: where history.the_date is greater than or equal to the biggest possible value of value.the_date?
HISTORY
the_date amount
2014-02-27 200
2015-02-26 2000
VALUE
the_date interest
2010-02-10 2
2015-01-01 3
I need to pair the correct interest with the amount!

So value.the_date is the date since when the interest is valid. Interest 2 was valid from 2010-02-10 till 2014-12-31, because since 2015-01-01 the new interest 3 applies.
To get the current interest for a date you'd use a subquery where you select all interest records with a valid-from date up to then and only keep the latest:
select
the_date,
amount,
(
select v.interest
from value v
where v.the_date <= h.the_date
order by v.the_date desc
limit 1
) as interest
from history h;
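As a quick check, here is a minimal sketch that runs the subquery above against the sample rows from the question. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL (that substitution is an assumption of this sketch, not part of the original answer); the table and column names are the ones from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (the_date TEXT, amount INTEGER);
    CREATE TABLE value   (the_date TEXT, interest INTEGER);
    INSERT INTO history VALUES ('2014-02-27', 200), ('2015-02-26', 2000);
    INSERT INTO value   VALUES ('2010-02-10', 2),   ('2015-01-01', 3);
""")

# For each history row, pick the interest whose valid-from date is the
# latest one that is not after the history date.
rows = conn.execute("""
    SELECT h.the_date,
           h.amount,
           (SELECT v.interest
              FROM value v
             WHERE v.the_date <= h.the_date
             ORDER BY v.the_date DESC
             LIMIT 1) AS interest
      FROM history h
     ORDER BY h.the_date
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the 2014 amount is paired with interest 2 and the 2015 amount with interest 3.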

Put the join condition in the ON clause, not in the WHERE clause:
SELECT * FROM history JOIN (select max(value.the_date) as d from value) as x on history.the_date >= x.d
WHERE 1=1

Presumably, you want this:
select h.*
from history h
where h.the_date >= (select max(v.the_date) from value v);

Related

mysql finding the sum of subgroup maximums

If I have the following table in MySQL:
date type amount
2017-12-01 3 2
2018-01-01 1 100
2018-02-01 1 50
2018-03-01 2 2000
2018-04-01 2 4000
2018-05-01 3 2
2018-06-01 3 1
...is there a way to find the sum of the amounts corresponding to the latest dates of each type? There are guaranteed to be no duplicate dates for any given type.
The answer I'd be looking to get from the data above could be broken down like this:
The latest date for type 1 is 2018-02-01, where the amount is 50;
The latest date for type 2 is 2018-04-01, where the amount is 4000;
The latest date for type 3 is 2018-06-01, where the amount is 1;
50 + 4000 + 1 = 4051
Is there a way to arrive directly at 4051 in a single query? This is for a Django project using MySQL if that makes a difference; I wasn't able to find an ORM-related solution either, so figured a raw SQL query might be a better place to start.
Thanks!
Not sure about Django, but in raw SQL you could use a self join to pick the latest row for each type based on the latest date, and then aggregate the results to get the sum of those amounts:
select sum(a.amount)
from your_table a
left join your_table b on a.type = b.type
and a.date < b.date
where b.type is null
Or
select sum(a.amount)
from your_table a
join (
select type, max(date) max_date
from your_table
group by type
) b on a.type = b.type
and a.date = b.max_date
Or by using a correlated subquery
select sum(a.amount)
from your_table a
where a.date = (
select max(date)
from your_table
where type = a.type
)
For MySQL 8 you can use window functions to get the desired result:
select sum(amount)
from (select *, row_number() over (partition by type order by date desc) as seq
from your_table
) t
where seq = 1;
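To confirm that the join-based and subquery-based variants agree, here is a small sketch that runs three of them on the sample data from the question. It uses Python's sqlite3 as a stand-in for MySQL (an assumption of this sketch); the window-function variant is left out only because older SQLite builds lack window functions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (date TEXT, type INTEGER, amount INTEGER);
    INSERT INTO your_table VALUES
        ('2017-12-01', 3, 2),
        ('2018-01-01', 1, 100),
        ('2018-02-01', 1, 50),
        ('2018-03-01', 2, 2000),
        ('2018-04-01', 2, 4000),
        ('2018-05-01', 3, 2),
        ('2018-06-01', 3, 1);
""")

queries = {
    # Keep rows for which no later row of the same type exists.
    "anti_join": """
        SELECT SUM(a.amount)
          FROM your_table a
          LEFT JOIN your_table b
            ON a.type = b.type AND a.date < b.date
         WHERE b.type IS NULL
    """,
    # Join back to the per-type maximum date.
    "derived_max": """
        SELECT SUM(a.amount)
          FROM your_table a
          JOIN (SELECT type, MAX(date) AS max_date
                  FROM your_table
                 GROUP BY type) b
            ON a.type = b.type AND a.date = b.max_date
    """,
    # Correlated subquery: compare each row with its type's latest date.
    "correlated": """
        SELECT SUM(a.amount)
          FROM your_table a
         WHERE a.date = (SELECT MAX(b.date)
                           FROM your_table b
                          WHERE b.type = a.type)
    """,
}

totals = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(totals)
```

All three variants return 4051 for the sample rows, matching the breakdown in the question.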

Get closest value lower than a specific value and group by

Is there a possibility to get the closest value lower than a specific value with a group function without a join?
date productId stock
2014-12-27 1 10
2014-12-31 1 20
2015-01-05 1 30
2014-12-28 2 10
2015-01-04 2 20
For example, the value is the date: for each product I want the highest date lower than 2015-01-01, with the result ordered by stock asc, so the result should be:
date productId stock
2014-12-28 2 10
2014-12-31 1 20
Of course, this could be solved with a join, but a join is slower on large tables, isn't it?
You're looking for the last day of 2014, it seems, for each distinct product id.
You do that with
SELECT MAX(date) date, product_id
FROM yourtable
WHERE date < '2015-01-01'
GROUP BY product_id
That gives you a collection of date, product_id. A compound index on (date, product_id) will make this query very efficient to evaluate.
Then you join that to your main table, like so.
SELECT a.*
FROM yourtable AS a
JOIN (
SELECT MAX(date) date, product_id
FROM yourtable
WHERE date < '2015-01-01'
GROUP BY product_id
) AS b USING(date,product_id)
ORDER BY a.product_id, a.date
and that retrieves the detail records for the last item in 2014. The same compound index will accelerate the JOIN.
You're worried about JOIN performance, and that's legitimate. But it can be improved with proper indexing. There really isn't a better way to do it.
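Here is a minimal sketch of the join approach on the sample data from the question, run through Python's sqlite3 as a stand-in for MySQL (an assumption of this sketch). It uses the column name product_id as in the answer, an explicit ON clause instead of USING, and orders by stock as the question asked.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourtable (date TEXT, product_id INTEGER, stock INTEGER);
    INSERT INTO yourtable VALUES
        ('2014-12-27', 1, 10),
        ('2014-12-31', 1, 20),
        ('2015-01-05', 1, 30),
        ('2014-12-28', 2, 10),
        ('2015-01-04', 2, 20);
""")

# Inner query: latest date before the cutoff for each product.
# Outer query: fetch the full row for that (product, date) pair.
rows = conn.execute("""
    SELECT a.date, a.product_id, a.stock
      FROM yourtable AS a
      JOIN (SELECT product_id, MAX(date) AS max_date
              FROM yourtable
             WHERE date < '2015-01-01'
             GROUP BY product_id) AS b
        ON a.product_id = b.product_id
       AND a.date = b.max_date
     ORDER BY a.stock ASC
""").fetchall()

for row in rows:
    print(row)
```

This returns the two rows listed as the expected result in the question.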

mysql : Get latest value and sum of values from previous hour

I would like to return a product together with its latest value and values from last hour.
I have a product-table :
id, name, type (and so on)...
I have a values-table :
id_prod, timestamp, value
Something like :
12:00:00 = 10
12:15:00 = 10
12:30:00 = 10
12:45:00 = 10
13:00:00 = 10
13:15:00 = 10
13:30:00 = 10
I would like a query that returns the latest value (13:30:00) together with the sum of values one hour back. This should return:
time = 13:30:00
latestread = 10
lasthour = 40
What I almost got working was:
SELECT *,
(SELECT value FROM values S WHERE id_prod=P.id
ORDER BY timestamp DESC LIMIT 1) as latestread,
(SELECT sum(value) FROM values WHERE id_prod=D.id and
date_created>SUBTIME(S.date_created,'01:00:00')) as trendread
FROM prod P ORDER BY name
But this fails with "Unknown column 'S.date_created' in 'where clause'"
Any suggestions?
If I understand correctly what you're trying to do, then you would have something like:
SELECT p.id, max(v.date_created), sum(v.value), mv.max_value
FROM product p
JOIN values v ON p.id = v.product_id
JOIN (SELECT v2.product_id, v2.value AS max_value
FROM values v2
WHERE v2.date_created = (SELECT max(v3.date_created) FROM values v3 WHERE v3.product_id = v2.product_id)) mv ON mv.product_id = p.id
WHERE v.date_created BETWEEN DATE_SUB(now(), INTERVAL 1 HOUR) AND now()
GROUP BY p.id, mv.max_value
ORDER BY p.id
Aleks G and mhasan gave solutions, but not the reason why this fails. It fails because the alias S is defined inside the first subquery and is not visible in the second one. A subquery can reference tables of its enclosing query (which is why P.id works), but not aliases declared inside a sibling subquery.
You forgot to give the values table an alias in the second subquery:
SELECT *,
(SELECT value FROM values S WHERE id_prod=P.id
ORDER BY timestamp DESC LIMIT 1) as latestread,
(SELECT sum(value) FROM values S WHERE id_prod=P.id and
date_created>SUBTIME(S.date_created,'01:00:00')) as trendread
FROM prod P ORDER BY name
I think this is the query that you are trying to write:
SELECT p.*,
(SELECT v.value
FROM values v
WHERE v.id_prod = p.id
ORDER BY v.timestamp DESC
LIMIT 1
) as latestread,
(SELECT sum(v.value)
FROM values v
WHERE v.id_prod = p.id and
v.timestamp > SUBTIME(now(), '01:00:00')
) as trendread
FROM prod p
ORDER BY p.name;
This changes all the aliases to be abbreviations for the table name. It also fixes the expression for the last hour by using now() and gets rid of date_created which doesn't seem to be in either table based on the question. The query conveniently assumes that timestamp is a datetime. If it is a unix timestamp, then somewhat different time logic is necessary.
This should be reasonably efficient with an index on values(id_prod, timestamp, value).
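If the hour should be counted back from each product's latest reading rather than from now(), which is what the expected output in the question suggests, one option is to join to the latest row and correlate on its timestamp. The sketch below assumes that interpretation. It runs in Python's sqlite3 as a stand-in for MySQL, so SQLite's datetime(ts, '-1 hour') takes the place of SUBTIME. The table name readings, the column name ts, the product name and the calendar date are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
# "readings" and "ts" are hypothetical names for the values table and its
# timestamp column; the date 2024-01-01 is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE readings (id_prod INTEGER, ts TEXT, value INTEGER);
    INSERT INTO prod VALUES (1, 'sensor');
    INSERT INTO readings VALUES
        (1, '2024-01-01 12:00:00', 10),
        (1, '2024-01-01 12:15:00', 10),
        (1, '2024-01-01 12:30:00', 10),
        (1, '2024-01-01 12:45:00', 10),
        (1, '2024-01-01 13:00:00', 10),
        (1, '2024-01-01 13:15:00', 10),
        (1, '2024-01-01 13:30:00', 10);
""")

# l is the latest reading per product; the scalar subquery sums the
# readings in the hour that ends at that latest reading.
rows = conn.execute("""
    SELECT p.name,
           l.ts    AS time,
           l.value AS latestread,
           (SELECT SUM(r.value)
              FROM readings r
             WHERE r.id_prod = p.id
               AND r.ts > datetime(l.ts, '-1 hour')
               AND r.ts <= l.ts) AS lasthour
      FROM prod p
      JOIN readings l
        ON l.id_prod = p.id
       AND l.ts = (SELECT MAX(m.ts) FROM readings m WHERE m.id_prod = p.id)
     ORDER BY p.name
""").fetchall()

for row in rows:
    print(row)
```

For the sample readings this gives latestread = 10 and lasthour = 40, because the four readings from 12:45 through 13:30 fall inside the window.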

mysql moving average of N rows

I have a simple MySQL table like below, used to compute MPG for a car.
+-------------+-------+---------+
| DATE | MILES | GALLONS |
+-------------+-------+---------+
| JAN 25 1993 | 20.0 | 3.00 |
| FEB 07 1993 | 55.2 | 7.22 |
| MAR 11 1993 | 44.1 | 6.28 |
+-------------+-------+---------+
I can easily compute the Miles Per Gallon (MPG) for the car using a select statement, but because the MPG varies widely from fillup to fillup (i.e. you don't fill the exact same amount of gas each time), I would like to compute a 'MOVING AVERAGE' as well. So for any row the MPG is MILES/GALLONS for that row, and the MOVINGMPG is the SUM(MILES)/SUM(GALLONS) for the last N rows. If fewer than N rows exist by that point, just SUM(MILES)/SUM(GALLONS) up to that point.
Is there a single SELECT statement that will fetch the rows with MPG and MOVINGMPG by substituting N into the select statement?
Yes, it's possible to return the specified resultset with a single SQL statement.
Unfortunately, MySQL versions before 8.0 do not support analytic (window) functions, which would make for a fairly simple statement. Even without that syntax, it is possible to emulate some analytic functions using MySQL user variables.
One of the ways to achieve the specified result set (with a single SQL statement) is to use a JOIN operation, using a unique ascending integer value (rownum, derived by and assigned within the query) to each row.
For example:
SELECT q.rownum AS rownum
, q.date AS latest_date
, q.miles/q.gallons AS latest_mpg
, COUNT(1) AS cnt_rows
, MIN(r.date) AS earliest_date
, SUM(r.miles) AS rtot_miles
, SUM(r.gallons) AS rtot_gallons
, SUM(r.miles)/SUM(r.gallons) AS rtot_mpg
FROM ( SELECT @s_rownum := @s_rownum + 1 AS rownum
, s.date
, s.miles
, s.gallons
FROM mytable s
JOIN (SELECT @s_rownum := 0) c
ORDER BY s.date
) q
JOIN ( SELECT @t_rownum := @t_rownum + 1 AS rownum
, t.date
, t.miles
, t.gallons
FROM mytable t
JOIN (SELECT @t_rownum := 0) d
ORDER BY t.date
) r
ON r.rownum <= q.rownum
AND r.rownum > q.rownum - 2
GROUP BY q.rownum
Your desired value of "n" to specify how many rows to include in each rollup row is specified in the predicate just before the GROUP BY clause. In this example, up to "2" rows in each running total row.
If you specify a value of 1, you will get (basically) the original table returned.
To eliminate any "incomplete" running total rows (consisting of fewer than "n" rows), that value of "n" would need to be specified again, adding:
HAVING COUNT(1) >= 2
sqlfiddle demo: http://sqlfiddle.com/#!2/52420/2
Followup:
Q: I'm trying to understand your SQL statement. Does your solution do a select of twenty rows for each row in the db? In other words, if I have 1000 rows will your statement perform 20000 selects? (I'm worried about performance)...
A: You are right to be concerned with performance.
To answer your question, no, this does not perform 20,000 selects for 1,000 rows.
The performance hit comes from the two (essentially identical) inline views (aliased as q and r). What MySQL does with these (basically) is create temporary MyISAM tables (MySQL calls them "derived tables"), which are basically copies of mytable, with an extra column, each row assigned a unique integer value from 1 to the number of rows.
Once the two "derived" tables are created and populated, MySQL runs the outer query, using those two "derived" tables as a row source. Each row from q, is matched with up to n rows from r, to calculate the "running total" miles and gallons.
For better performance, you could use a column already in the table, rather than having the query assign unique integer values. For example, if the date column is unique, then you could calculate "running total" over a certain period of days.
SELECT q.date AS latest_date
, SUM(q.miles)/SUM(q.gallons) AS latest_mpg
, COUNT(1) AS cnt_rows
, MIN(r.date) AS earliest_date
, SUM(r.miles) AS rtot_miles
, SUM(r.gallons) AS rtot_gallons
, SUM(r.miles)/SUM(r.gallons) AS rtot_mpg
FROM mytable q
JOIN mytable r
ON r.date <= q.date
AND r.date > q.date + INTERVAL -30 DAY
GROUP BY q.date
(For performance, you would want an appropriate index defined with date as a leading column in the index.)
For the first query, any predicates included (in the inline view definition queries) to reduce the number of rows returned (for example, return only date values in the past year) would reduce the number of rows to be processed, and would also likely improve performance.
Again, to your question about running 20,000 selects for 1,000 rows: a nested loops operation is another way to get the same result set. For a large number of rows, this can exhibit slower performance. On the other hand, this approach can be fairly efficient when only a few rows are being returned:
SELECT q.date AS latest_date
, q.miles/q.gallons AS latest_mpg
, ( SELECT SUM(r.miles)/SUM(r.gallons)
FROM mytable r
WHERE r.date <= q.date
AND r.date >= q.date + INTERVAL -90 DAY
) AS rtot_mpg
FROM mytable q
ORDER BY q.date
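For a row-count window (the last N rows) without user variables, the row number can also be derived with a correlated COUNT, then used in the same self-join as the first query. The sketch below shows that idea on the three sample rows with N = 2. It runs in Python's sqlite3 as a stand-in for MySQL (an assumption of this sketch), assumes the date column is unique, and uses ISO dates and the table name fillups, which are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
# "fillups" is a hypothetical table name; dates are rewritten in ISO form.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fillups (date TEXT PRIMARY KEY, miles REAL, gallons REAL);
    INSERT INTO fillups VALUES
        ('1993-01-25', 20.0, 3.00),
        ('1993-02-07', 55.2, 7.22),
        ('1993-03-11', 44.1, 6.28);
""")

n = 2  # window size: how many rows feed each moving average

# numbered: each row gets rownum = count of rows up to and including it.
# The self-join pairs every row q with the up-to-n rows r ending at q.
rows = conn.execute("""
    WITH numbered AS (
        SELECT s.date, s.miles, s.gallons,
               (SELECT COUNT(*) FROM fillups x WHERE x.date <= s.date) AS rownum
          FROM fillups s
    )
    SELECT q.date,
           q.miles / q.gallons             AS mpg,
           SUM(r.miles) / SUM(r.gallons)   AS moving_mpg
      FROM numbered q
      JOIN numbered r
        ON r.rownum <= q.rownum
       AND r.rownum >  q.rownum - ?
     GROUP BY q.rownum, q.date, q.miles, q.gallons
     ORDER BY q.date
""", (n,)).fetchall()

for date, mpg, moving_mpg in rows:
    print(date, round(mpg, 4), round(moving_mpg, 4))
```

The first row has only itself to average, so its moving MPG equals its own MPG; later rows combine the current and previous fillup.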
Something like this should work:
SELECT Date, Miles, Gallons, Miles/Gallons as MilesPerGallon,
@Miles:=@Miles+Miles overallMiles,
@Gallons:=@Gallons+Gallons overallGallons,
@RunningTotal:=@Miles/@Gallons runningTotal
FROM YourTable
JOIN (SELECT @Miles:= 0) t
JOIN (SELECT @Gallons:= 0) s
Which produces the following:
DATE MILES GALLONS MILESPERGALLON RUNNINGTOTAL
January, 25 1993 20 3 6.666667 6.666666666667
February, 07 1993 55.2 7.22 7.645429 7.358121330724
March, 11 1993 44.1 6.28 7.022293 7.230303030303
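The running-total figures above can be reproduced without user variables by summing everything up to the current date in correlated subqueries. The sketch below is only a cross-check of those numbers. It runs in Python's sqlite3 as a stand-in for MySQL (an assumption of this sketch), and the table name fillups and the ISO date format are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fillups (date TEXT PRIMARY KEY, miles REAL, gallons REAL);
    INSERT INTO fillups VALUES
        ('1993-01-25', 20.0, 3.00),
        ('1993-02-07', 55.2, 7.22),
        ('1993-03-11', 44.1, 6.28);
""")

# Cumulative miles divided by cumulative gallons, per row in date order.
rows = conn.execute("""
    SELECT f.date,
           f.miles / f.gallons AS miles_per_gallon,
           (SELECT SUM(x.miles)   FROM fillups x WHERE x.date <= f.date) /
           (SELECT SUM(x.gallons) FROM fillups x WHERE x.date <= f.date)
               AS running_total
      FROM fillups f
     ORDER BY f.date
""").fetchall()

for date, mpg, running in rows:
    print(date, round(mpg, 6), round(running, 6))
```

Rounded to six decimals, the running totals agree with the RUNNINGTOTAL column shown above.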
--EDIT--
In response to the comment, you can add another Row Number to limit your results to the last N rows:
SELECT *
FROM (
SELECT Date, Miles, Gallons, Miles/Gallons as MilesPerGallon,
@Miles:=@Miles+Miles overallmiles,
@Gallons:=@Gallons+Gallons overallGallons,
@RunningTotal:=@Miles/@Gallons runningTotal,
@RowNumber:=@RowNumber+1 rowNumber
FROM (SELECT * FROM YourTable ORDER BY Date DESC) u
JOIN (SELECT @Miles:= 0) t
JOIN (SELECT @Gallons:= 0) s
JOIN (SELECT @RowNumber:= 0) r
) t
WHERE rowNumber <= 3
Just change your ORDER BY clause accordingly.

How to use query results in another query?

I am trying to write a query which will give me the last entry of each month in a table called transactions. I believe I am halfway there as I have the following query which groups all the entries by month then selects the highest id in each group which is the last entry for each month.
SELECT max(id),
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm
Gives the correct results
id yyyymm
100 201006
105 201007
111 201008
118 201009
120 201010
I don’t know how to then run a query on the same table but select the balance column where it matches the id from the first query to give results
id balance date
120 10000 2010-10-08
118 11000 2010-09-29
I've tried subqueries and looked at joins but I'm not sure how to go about using them.
You can make your first select an inline view, and then join to it. Something like this (not tested, but should give you the idea):
SELECT x.id
, t.balance
, t.date
FROM transactions t
/* here, we make your select an inline view, then we can join to it */
, (SELECT max(id) id,
EXTRACT(YEAR_MONTH FROM date) as yyyymm
FROM transactions
GROUP BY yyyymm) x
WHERE t.id = x.id
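Here is a minimal sketch of the inline-view idea written with an explicit JOIN. It runs in Python's sqlite3 as a stand-in for MySQL (an assumption of this sketch), so strftime('%Y%m', date) takes the place of EXTRACT(YEAR_MONTH FROM date). Rows 118 and 120 come from the question; rows 117 and 119 are hypothetical filler so that each month has more than one entry.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, balance INTEGER, date TEXT);
    INSERT INTO transactions VALUES
        (117, 12000, '2010-09-15'),  -- hypothetical filler row
        (118, 11000, '2010-09-29'),
        (119, 10500, '2010-10-01'),  -- hypothetical filler row
        (120, 10000, '2010-10-08');
""")

# x holds the highest id per month; joining on id fetches the full row.
rows = conn.execute("""
    SELECT t.id, t.balance, t.date
      FROM transactions AS t
      JOIN (SELECT MAX(id) AS id
              FROM transactions
             GROUP BY strftime('%Y%m', date)) AS x
        ON t.id = x.id
     ORDER BY t.id DESC
""").fetchall()

for row in rows:
    print(row)
```

Only the last entry of each month survives the join, which gives the two rows shown in the question.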