MySql JOIN on most recent start_date? - mysql

I have two tables: one with transactions (each with a date), the other with a percentage and the date that percentage went into effect (assume 00:00:00). A percentage remains in effect until a new one goes into effect. I need to join each transaction to the percentage that was in effect when the transaction happened.
transactions_table
event_date amount
2011-01-01 230
2011-02-18 194
2011-03-22 56
2011-04-30 874
percent_table
effective percent
2010-12-30 15
2011-03-05 25
2011-04-12 30
The result I'm looking for is:
event_date amount percent
2011-01-01 230 15
2011-02-18 194 15
2011-03-22 56 25
2011-04-30 874 30
I've tried:
SELECT t.event_date, t.amount, p.percent
FROM transactions_table AS t
LEFT JOIN percent_table AS p ON t.event_date >= p.effective
ORDER BY `t`.`event_date` DESC LIMIT 0 , 30;
That gives me seemingly random percentages. It seems I need the row with the greatest p.effective that is <= t.event_date, not just any row with p.effective <= t.event_date.
I tried:
SELECT t.event_date, p.percent
FROM bedic_sixsummits_transactions AS t
LEFT JOIN bedic_sixsummits_percent AS p ON MAX(t.event_date >= p.effective)
ORDER BY `t`.`event_date` DESC LIMIT 0 , 30
but MySQL just laughed at my feeble attempt.
How can I do this?

SELECT t.event_date, t.amount, p.percent
FROM bedic_sixsummits_transactions AS t
LEFT JOIN bedic_sixsummits_percent AS p
ON p.effective =
( SELECT MAX( p2.effective ) FROM bedic_sixsummits_percent AS p2
WHERE p2.effective <= t.event_date
)
ORDER BY t.event_date DESC LIMIT 0 , 30
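To check the correlated-subquery join against the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL and uses the transactions_table / percent_table names from the question rather than the bedic_sixsummits_* names; ISO-formatted date strings are compared as text.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_table (event_date TEXT, amount INTEGER);
CREATE TABLE percent_table (effective TEXT, percent INTEGER);
INSERT INTO transactions_table VALUES
  ('2011-01-01', 230), ('2011-02-18', 194), ('2011-03-22', 56), ('2011-04-30', 874);
INSERT INTO percent_table VALUES
  ('2010-12-30', 15), ('2011-03-05', 25), ('2011-04-12', 30);
""")

# For each transaction, join the percent row whose effective date is the
# latest one on or before the transaction date.
rows = conn.execute("""
SELECT t.event_date, t.amount, p.percent
FROM transactions_table AS t
LEFT JOIN percent_table AS p
  ON p.effective = (SELECT MAX(p2.effective)
                    FROM percent_table AS p2
                    WHERE p2.effective <= t.event_date)
ORDER BY t.event_date
""").fetchall()

for row in rows:
    print(row)
```

With the sample rows this reproduces the result table from the question: 15 for the first two transactions, then 25, then 30.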

An even simpler approach, with no subquery:
SELECT event_date, amount, MAX(percent) AS percent
FROM transactions_table
LEFT JOIN percent_table p1 ON event_date >= effective
GROUP BY event_date, amount
ORDER BY event_date;
http://sqlfiddle.com/#!3/e8ca3/17/0
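Note that this works only because of the business model involved: the percentage never decreases over time, so the largest percentage with an effective date on or before the transaction is also the most recent one. If you want to retrieve other fields from percent_table, it is no longer appropriate.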
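One way to see the limitation of the MAX-based query: MAX(percent) returns the largest percentage seen so far, which coincides with the one in effect only while percentages never decrease. The sketch below is an illustration under assumptions: SQLite via Python's sqlite3 stands in for MySQL, and the last percentage is changed to a made-up value of 10 so that it drops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions_table (event_date TEXT, amount INTEGER);
CREATE TABLE percent_table (effective TEXT, percent INTEGER);
INSERT INTO transactions_table VALUES ('2011-04-30', 874);
-- Hypothetical data: the percentage drops from 25 to 10 on 2011-04-12.
INSERT INTO percent_table VALUES
  ('2010-12-30', 15), ('2011-03-05', 25), ('2011-04-12', 10);
""")

# GROUP BY + MAX: picks the largest percentage seen so far.
max_based = conn.execute("""
SELECT t.event_date, t.amount, MAX(p.percent)
FROM transactions_table AS t
LEFT JOIN percent_table AS p ON t.event_date >= p.effective
GROUP BY t.event_date, t.amount
""").fetchall()

# Correlated subquery: picks the percentage with the latest effective date.
latest = conn.execute("""
SELECT t.event_date, t.amount, p.percent
FROM transactions_table AS t
LEFT JOIN percent_table AS p
  ON p.effective = (SELECT MAX(p2.effective)
                    FROM percent_table AS p2
                    WHERE p2.effective <= t.event_date)
""").fetchall()

print("MAX(percent):    ", max_based)
print("latest effective:", latest)
```

Here the two queries disagree (25 versus 10), so the grouped version is only safe when the data guarantees non-decreasing percentages.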

Related

MySQL : Selecting the rows with the highest group by count

I have a table that gets a new record every minute with a decimal value (10,2). To ignore measurement errors, I want the value that has been inserted most often.
Therefore I tried:
SELECT date_time,max(sensor1),count(ID)
FROM `weigh_data`
group by day(date_time),sensor1
This way I get the number of records
Datetime sensor1 count(ID)
2020-03-19 11:49:12 33.22 3
2020-03-19 11:37:47 33.36 10
2020-03-20 07:32:02 32.54 489
2020-03-20 00:00:43 32.56 891
2020-03-20 14:20:51 32.67 5
2020-03-21 07:54:16 32.50 1
2020-03-21 00:00:58 32.54 1373
2020-03-21 01:15:16 32.56 9
2020-03-22 08:35:12 32.52 2
2020-03-22 00:00:40 32.54 575
2020-03-22 06:50:54 32.58 1
What I actually want is one row per day: the one with the highest count(ID).
Can anyone help me out with this?
With newer MySQL (8.0 and later) you can use the RANK window function to rank the rows according to the count.
Note that this will return all "ties" which means if there are 100 readings of X and 100 readings of Y (and 100 is the max), both X and Y will be returned.
WITH cte AS (
SELECT
DATE(date_time), sensor1,
RANK() OVER (PARTITION BY DATE(date_time) ORDER BY COUNT(*) DESC) rnk
FROM `weigh_data` GROUP BY DATE(date_time), sensor1
)
SELECT * FROM cte WHERE rnk=1
If you just want to pick one of the ties (non-deterministically), you can use ROW_NUMBER in place of RANK.
A DBfiddle to test with.
Here is a solution based on a correlated subquery, that works in all versions of MySQL:
select w.*
from weigh_data w
where w.date_time = (
select w1.date_time
from weigh_data w1
where w1.date_time >= date(w.date_time) and w1.date_time < date(w.date_time) + interval 1 day
order by sensor1 desc
limit 1
)
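As written, the subquery orders by sensor1, so for each day this returns the row whose timestamp belongs to the highest sensor1 reading, not the most frequent reading; more than one row comes back only when several rows share that date_time.
For performance, you want an index on (date_time, sensor1).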
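For the question as asked (the most frequent reading per day), a sketch that avoids window functions is to count per day and value, take the highest count per day, and join the two. This is an illustration under assumptions: SQLite via Python's sqlite3 stands in for MySQL, and the handful of rows below are made up rather than taken from the question's table. Like RANK, it returns all ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE weigh_data (ID INTEGER PRIMARY KEY, date_time TEXT, sensor1 REAL);
-- Illustrative rows only.
INSERT INTO weigh_data (date_time, sensor1) VALUES
  ('2020-03-20 00:00:43', 32.56),
  ('2020-03-20 00:01:43', 32.56),
  ('2020-03-20 07:32:02', 32.54),
  ('2020-03-21 00:00:58', 32.54),
  ('2020-03-21 00:01:58', 32.54),
  ('2020-03-21 00:02:58', 32.54),
  ('2020-03-21 01:15:16', 32.56);
""")

# c: count per (day, value); m: highest count per day; join keeps the winners.
rows = conn.execute("""
SELECT c.day, c.sensor1, c.cnt
FROM (SELECT DATE(date_time) AS day, sensor1, COUNT(*) AS cnt
      FROM weigh_data
      GROUP BY DATE(date_time), sensor1) AS c
JOIN (SELECT day, MAX(cnt) AS max_cnt
      FROM (SELECT DATE(date_time) AS day, sensor1, COUNT(*) AS cnt
            FROM weigh_data
            GROUP BY DATE(date_time), sensor1) AS per_value
      GROUP BY day) AS m
  ON m.day = c.day AND m.max_cnt = c.cnt
ORDER BY c.day
""").fetchall()

for row in rows:
    print(row)
```

Grouping by DATE(date_time) rather than DAY(date_time) keeps the same day-of-month in different months apart.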

Selecting from 2 tables with possibly corresponding dates

I am looking for the correct query for my MySQL db, which has 2 separate tables for heights and weights.
I want to have the result returned as 1 query with 3 columns: datetime, height and weight.
The query should also allow to specify the user.
Eg.:
Table heights:
id user_id created_on height
1 2 2019-01-01 00:00:01 180
2 2 2019-01-02 00:00:01 181
3 3 2019-01-03 00:00:01 182
4 3 2019-01-04 00:00:01 183
5 2 2019-01-07 00:00:01 184
Table weights:
id user_id created_on weight
1 2 2019-01-01 00:00:01 80
2 2 2019-01-04 00:00:01 81
3 3 2019-01-05 00:00:01 82
4 3 2019-01-06 00:00:01 83
5 2 2019-01-07 00:00:01 84
I am looking to get the following result with a single query:
user_id created_on weight height
2 2019-01-01 00:00:01 80 180
2 2019-01-02 00:00:01 null 181
2 2019-01-04 00:00:01 81 null
2 2019-01-07 00:00:01 84 184
I have tried working with JOIN statements but fail to get the required result.
This join statement
SELECT w.* , h.* FROM weight w
JOIN height h
ON w.created_on=h.created_on
AND w.user_id=h.user_id AND w.user_id=2
will return only those results that have both a height and a weight item for the same user_id and created_on.
A full outer join would do the trick, however this is not supported by mysql.
The following query seems to be returning the required result, however it is very slow:
SELECT r.* FROM
(SELECT w.user_id as w_user, w.created_on as weightdate, w.value as weight, h.created_on as heightdate ,h.user_id as h_user, h.value as height FROM weight w
LEFT JOIN height h ON w.user_id = h.user_id
AND w.created_on=h.created_on
UNION
SELECT w.user_id as w_user, w.created_on as weightdate, w.value as weight, h.created_on as heightdate ,h.user_id as h_user, h.value as height FROM weight w
RIGHT JOIN height h ON w.user_id = h.user_id
AND w.created_on=h.created_on ) r
WHERE h_user=2 OR w_user =2
The query takes more than 3 seconds if the 2 tables have around 3000 entries.
Is there a way to speed this up, possibly using a different approach?
For extra bonus points: is it possible to allow for a small time discrepancy between the two created_on datetimes (e.g. 10 minutes, or within the same hour)? For example, if table weights has an entry at 2019-01-01 00:00:00 and table heights has an entry at 2019-01-01 00:04:00, they should appear in the same row.
Instead of using a calendar table to select dates of interest, you can use a UNION to select all the distinct dates from the heights and weights tables. To deal with matching times within an hour of each other, you can compare the times using TIMESTAMPDIFF and truncate the created_on time to the hour. Since this might create duplicate entries, we add the DISTINCT qualifier to the query:
SELECT DISTINCT COALESCE(h.user_id, w.user_id) AS user_id,
DATE_FORMAT(COALESCE(h.created_on, w.created_on), '%y-%m-%d %H:00:00') AS created_on,
w.weight,
h.height
FROM (SELECT created_on FROM heights
UNION
SELECT created_on FROM weights) d
LEFT JOIN heights h ON ABS(TIMESTAMPDIFF(HOUR, h.created_on, d.created_on)) = 0 AND h.user_id = 2
LEFT JOIN weights w ON ABS(TIMESTAMPDIFF(HOUR, w.created_on, d.created_on)) = 0 AND w.user_id = 2
WHERE h.user_id IS NOT NULL OR w.user_id IS NOT NULL
ORDER BY created_on
Output (from my demo, where I've modified your times to allow for matching within the hour):
user_id created_on weight height
2 19-01-01 01:00:00 80 180
2 19-01-02 00:00:00 NULL 181
2 19-01-04 04:00:00 81 NULL
2 19-01-07 06:00:00 84 184
Demo on dbfiddle
This is probably best handled using a calendar table, containing all dates of interest for the query. We can start the query with the calendar table, then left join to the heights and weights tables:
SELECT
COALESCE(h.user_id, w.user_id) AS user_id,
d.dt AS created_on,
w.weight,
h.height
FROM
(
SELECT '2019-01-01 00:00:01' AS dt UNION ALL
SELECT '2019-01-02 00:00:01' UNION ALL
SELECT '2019-01-03 00:00:01' UNION ALL
SELECT '2019-01-04 00:00:01' UNION ALL
SELECT '2019-01-05 00:00:01' UNION ALL
SELECT '2019-01-06 00:00:01' UNION ALL
SELECT '2019-01-07 00:00:01'
) d
LEFT JOIN heights h
ON d.dt = h.created_on AND h.user_id = 2
LEFT JOIN weights w
ON d.dt = w.created_on AND w.user_id = 2
WHERE
h.user_id IS NOT NULL OR w.user_id IS NOT NULL
ORDER BY
d.dt;
Demo
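The underlying idea in both answers (build the set of keys first, then LEFT JOIN each table onto it) can be sketched with the question's sample rows and exact timestamp matching. This is an illustration under assumptions: SQLite via Python's sqlite3 stands in for MySQL, and the key set is taken from a UNION of the two tables for user 2 instead of a hard-coded calendar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE heights (id INTEGER, user_id INTEGER, created_on TEXT, height INTEGER);
CREATE TABLE weights (id INTEGER, user_id INTEGER, created_on TEXT, weight INTEGER);
INSERT INTO heights VALUES
  (1, 2, '2019-01-01 00:00:01', 180), (2, 2, '2019-01-02 00:00:01', 181),
  (3, 3, '2019-01-03 00:00:01', 182), (4, 3, '2019-01-04 00:00:01', 183),
  (5, 2, '2019-01-07 00:00:01', 184);
INSERT INTO weights VALUES
  (1, 2, '2019-01-01 00:00:01', 80), (2, 2, '2019-01-04 00:00:01', 81),
  (3, 3, '2019-01-05 00:00:01', 82), (4, 3, '2019-01-06 00:00:01', 83),
  (5, 2, '2019-01-07 00:00:01', 84);
""")

# d holds every (user_id, created_on) that occurs in either table for the user;
# UNION removes duplicates, so each key appears once.
rows = conn.execute("""
SELECT d.user_id, d.created_on, w.weight, h.height
FROM (SELECT user_id, created_on FROM heights WHERE user_id = :uid
      UNION
      SELECT user_id, created_on FROM weights WHERE user_id = :uid) AS d
LEFT JOIN heights AS h
  ON h.user_id = d.user_id AND h.created_on = d.created_on
LEFT JOIN weights AS w
  ON w.user_id = d.user_id AND w.created_on = d.created_on
ORDER BY d.created_on
""", {"uid": 2}).fetchall()

for row in rows:
    print(row)
```

Filtering on user_id inside the derived table keeps the key set small, which may also help with the slowness described in the question when (user_id, created_on) is indexed on both tables.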

MySQL: Greatest n per group with joins and conditions

Table Structure
I have a table similar to the following:
venues
The following table describes a list of businesses
id name
50 Nando's
60 KFC
rewards
The table describes a number of rewards, the venue it corresponds to and the number of points needed to redeem the reward.
id venue_id name points
1 50 5% off 10
2 50 10% off 20
3 50 11% off 30
4 50 15% off 40
5 50 20% off 50
6 50 30% off 50
7 60 30% off 70
8 60 60% off 100
9 60 65% off 120
10 60 70% off 130
11 60 80% off 140
points_data
The table describes the number of points a user has remaining for each venue.
venue_id points_remaining
50 30
60 90
Note that this table is actually computed within SQL like so:
select * from (
select venue_id, (total_points - points_redeemed) as points_remaining
from (
select venue_id, sum(total_points) as total_points, sum(points_redeemed) as points_redeemed
from (
(
select venue_id, sum(points) as total_points, 0 as points_redeemed
from check_ins
group by venue_id
)
UNION
(
select venue_id, 0 as total_points, sum(points) as points_redeemed
from reward_redemptions rr
join rewards r on rr.reward_id = r.id
group by venue_id
)
) a
group by venue_id
) b
GROUP BY venue_id
) points_data
but for this question you can probably just ignore that massive query and assume the table is just called points_data.
Desired Output
I want to get a single query that gets:
The top 2 rewards the user is eligible for, per venue
The lowest 2 rewards the user is not yet eligible for, per venue
So for the above data, the output would be:
id venue_id name points
2 50 10% off 20
3 50 11% off 30
4 50 15% off 40
5 50 20% off 50
7 60 30% off 70
8 60 60% off 100
9 60 65% off 120
What I got so far
The best solution I found so far is first getting the points_data, and then using code (i.e. PHP) to dynamically write the following:
(
select * from rewards
where venue_id = 50
and points > 30
ORDER BY points asc
LIMIT 2
)
union all
(
select * from rewards
where venue_id = 50
and points <= 30
ORDER BY points desc
LIMIT 2
)
UNION ALL
(
select * from rewards
where venue_id = 60
and points <= 90
ORDER BY points desc
LIMIT 2
)
UNION ALL
(
select * from rewards
where venue_id = 60
and points > 90
ORDER BY points asc
LIMIT 2
)
ORDER BY venue_id, points asc;
However, I feel the query can get too long and inefficient. For example, if a user has points in 400 venues, that is 800 sub-queries.
I tried also doing a join like so, but can't really get better than:
select * from points_data
INNER JOIN rewards on rewards.venue_id = points_data.venue_id
where points > points_remaining;
which is far from what I want.
Correlated subqueries counting the number of higher or lower rewards to determine the top or bottom entries are one way.
SELECT r1.*
FROM rewards r1
INNER JOIN points_data pd1
ON pd1.venue_id = r1.venue_id
WHERE r1.points <= pd1.points_remaining
AND (SELECT count(*)
FROM rewards r2
WHERE r2.venue_id = r1.venue_id
AND r2.points <= pd1.points_remaining
AND (r2.points > r1.points
OR r2.points = r1.points
AND r2.id > r1.id)) < 2
OR r1.points > pd1.points_remaining
AND (SELECT count(*)
FROM rewards r2
WHERE r2.venue_id = r1.venue_id
AND r2.points > pd1.points_remaining
AND (r2.points < r1.points
OR r2.points = r1.points
AND r2.id < r1.id)) < 2
ORDER BY r1.venue_id,
r1.points;
SQL Fiddle
Since MySQL 8.0 a solution using the row_number() window function would be an alternative. But I suppose you are on a lower version.
SELECT x.id,
x.venue_id,
x.name,
x.points
FROM (SELECT r.id,
r.venue_id,
r.name,
r.points,
pd.points_remaining,
row_number() OVER (PARTITION BY r.venue_id,
r.points <= pd.points_remaining
ORDER BY r.points DESC) rntop,
row_number() OVER (PARTITION BY r.venue_id,
r.points > pd.points_remaining
ORDER BY r.points ASC) rnbottom
FROM rewards r
INNER JOIN points_data pd
ON pd.venue_id = r.venue_id) x
WHERE x.points <= x.points_remaining
AND x.rntop <= 2
OR x.points > x.points_remaining
AND x.rnbottom <= 2
ORDER BY x.venue_id,
x.points;
db<>fiddle
The tricky part here is to also partition each venue's rewards into two subsets: those the user has enough points to redeem and those they don't. Since logical expressions in MySQL evaluate to 0 or 1 (in a non-Boolean context), the respective expressions can be used directly in PARTITION BY.
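The counting approach (the correlated-subquery answer) can be checked against the sample data from the question. This is a minimal sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, points_data is created as a plain table, and parentheses are added around the two OR branches purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rewards (id INTEGER PRIMARY KEY, venue_id INTEGER, name TEXT, points INTEGER);
CREATE TABLE points_data (venue_id INTEGER, points_remaining INTEGER);
INSERT INTO rewards VALUES
  (1, 50, '5% off', 10),  (2, 50, '10% off', 20),  (3, 50, '11% off', 30),
  (4, 50, '15% off', 40), (5, 50, '20% off', 50),  (6, 50, '30% off', 50),
  (7, 60, '30% off', 70), (8, 60, '60% off', 100), (9, 60, '65% off', 120),
  (10, 60, '70% off', 130), (11, 60, '80% off', 140);
INSERT INTO points_data VALUES (50, 30), (60, 90);
""")

# A reward is kept if fewer than 2 rewards in the same venue and the same
# eligibility group rank ahead of it (ties on points are broken by id).
query = """
SELECT r1.id
FROM rewards AS r1
JOIN points_data AS pd1 ON pd1.venue_id = r1.venue_id
WHERE (r1.points <= pd1.points_remaining
       AND (SELECT COUNT(*) FROM rewards AS r2
            WHERE r2.venue_id = r1.venue_id
              AND r2.points <= pd1.points_remaining
              AND (r2.points > r1.points
                   OR (r2.points = r1.points AND r2.id > r1.id))) < 2)
   OR (r1.points > pd1.points_remaining
       AND (SELECT COUNT(*) FROM rewards AS r2
            WHERE r2.venue_id = r1.venue_id
              AND r2.points > pd1.points_remaining
              AND (r2.points < r1.points
                   OR (r2.points = r1.points AND r2.id < r1.id))) < 2)
ORDER BY r1.venue_id, r1.points
"""
reward_ids = [row[0] for row in conn.execute(query)]
print(reward_ids)
```

The id tie-break is what keeps only one of the two 50-point rewards for venue 50, matching the desired output in the question.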

How to determine a query for a specific time interval in MySQL?

I'm running some crontabs which trigger R scripts where I load Google Analytics data for a specific time interval. Usually it's the interval
Today - 14 days to Today - 1 day, which corresponds to the following statement:
subset(mydata, date >= Sys.Date()-14 & date <= Sys.Date()-1)
I would like to add a MySQL query to that R script in order to get some data for the same time interval. My tables have the following form:
`pictures` `music` `likes`
id date_of_upload id pictures_id id pictures_id
1 2012-01-16 50 1283 287 12
2 2012-02-17 25 736 2366 39
... ... ... ... ... ...
6000 2016-01-23
My query has the following form, and I would like it to cover the time interval above:
SELECT
COUNT(p.id) AS pictures,
COUNT(m.id) AS songs,
COUNT(l.id) AS likes,
CAST(p.date_of_upload AS DATE) AS Posted
FROM pictures p
LEFT JOIN
music m ON p.id = m.pictures_id
LEFT JOIN
likes l ON p.id = l.pictures_id
WHERE p.date_of_upload > DATE_ADD(CURRENT_DATE(), INTERVAL - 14 DAY)
But that doesn't seem to be the right implementation for the time interval.
The required output may look like the following:
posted songs likes picture
2016-01-23 20 30 3
2016-01-22 10 8 1
2016-01-21
...
2016-01-07
I think the simplest solution is to use COUNT(DISTINCT):
SELECT COUNT(DISTINCT p.id) AS pictures,
COUNT(DISTINCT m.id) AS songs,
COUNT(DISTINCT l.id) AS likes,
CAST(p.date_of_upload AS DATE) AS Posted
FROM pictures p LEFT JOIN
music m
ON p.id = m.pictures_id LEFT JOIN
likes l
ON p.id = l.pictures_id
WHERE p.date_of_upload >= CURRENT_DATE() - INTERVAL 14 DAY
AND p.date_of_upload < CURRENT_DATE()
GROUP BY CAST(p.date_of_upload AS DATE)
The problem is probably that you are getting a Cartesian product between music and likes for each picture: a separate row for each combination of picture, song, and like.
COUNT(DISTINCT) is the easiest way, but it is inefficient on large amounts of data.
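The row multiplication is easy to reproduce on a tiny made-up data set: one picture with 2 songs and 3 likes yields 2 x 3 = 6 joined rows. The sketch below is an illustration under assumptions: SQLite via Python's sqlite3 stands in for MySQL, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pictures (id INTEGER PRIMARY KEY, date_of_upload TEXT);
CREATE TABLE music (id INTEGER PRIMARY KEY, pictures_id INTEGER);
CREATE TABLE likes (id INTEGER PRIMARY KEY, pictures_id INTEGER);
-- Hypothetical rows: one picture, two songs, three likes.
INSERT INTO pictures VALUES (1, '2016-01-23');
INSERT INTO music VALUES (10, 1), (11, 1);
INSERT INTO likes VALUES (20, 1), (21, 1), (22, 1);
""")

joins = """
FROM pictures AS p
LEFT JOIN music AS m ON p.id = m.pictures_id
LEFT JOIN likes AS l ON p.id = l.pictures_id
"""

# Plain COUNT counts joined rows, so every column reports 2 * 3 = 6.
plain = conn.execute(
    "SELECT COUNT(p.id), COUNT(m.id), COUNT(l.id) " + joins
).fetchone()

# COUNT(DISTINCT) collapses the repeated ids back to the real totals.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT p.id), COUNT(DISTINCT m.id), COUNT(DISTINCT l.id) " + joins
).fetchone()

print("COUNT:          ", plain)
print("COUNT(DISTINCT):", distinct)
```

An alternative that avoids the intermediate blow-up is to aggregate music and likes per picture in separate subqueries before joining, at the cost of a longer query.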

Missing records from one table in SQL Server 2008R2

Table 1:
Date PlacementID CampaignID Impressions
04/01/2014 100 10 1000
04/01/2014 101 10 1500
04/01/2014 100 11 500
Table 2:
Date PlacementID CampaignID Cost
04/01/2014 100 10 5000
04/01/2014 101 10 6000
04/01/2014 100 11 7000
04/01/2014 103 10 8000
When I join these tables using FULL JOIN or LEFT JOIN, I do not get the unmatched record, which is the last row in Table 2 (PlacementID 103, CampaignID 10, Cost 8000). I have checked the raw data and files, and this record exists in only one of the two sources. I still want to include it in the final table. How can I do that? The two tables come from two different sources, and so far my results contain only the records common to both.
Moreover, the missing value is exactly what is needed to make the final figure correct, so I want to include everything. My SQL script is below:
SELECT A.placementid,
A.campaignid,
A.date,
Sum(A.impressions) AS Impressions,
Sum(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.placementid = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT Count(placementid) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.placementid = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY A.placementid,
A.campaignid,
A.date
I am dividing Cost by the placement count because in the source the cost is allocated only once per placement, while in the actual table the same PlacementID can repeat more than once on the same date.
As you didn't provide any expected output I'm guessing here, but if the result you want is this:
PlacementID CampaignID Date Impressions Cost
----------- ----------- ----------------------- ----------- -----------
100 10 2014-04-01 02:00:00.000 1000 5000
100 11 2014-04-01 02:00:00.000 500 7000
101 10 2014-04-01 02:00:00.000 1500 6000
103 10 2014-04-01 02:00:00.000 NULL 8000
Then the following query should do it:
SELECT COALESCE(A.PlacementID,b.placementid) AS PlacementID,
COALESCE(A.campaignid, b.campaignid) AS CampaignID,
COALESCE(A.date, b.date) AS [Date],
SUM(A.impressions) AS Impressions,
SUM(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END ) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.[PlacementID] = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT COUNT(PlacementID) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.[PlacementID] = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY COALESCE(A.PlacementID, B.PlacementID),
COALESCE(A.campaignid, b.campaignid),
COALESCE(A.date, b.date)
Sample SQL Fiddle