BigQuery: Join single latest row from second table - mysql

I have two tables. One is a list of Orders, and one is a list of Events.
For each Order, I want to join the single last Event that happened (using clicked_at) before the created_at of the Order.
I have tried numerous ways to get this to work and tried several other answers on Stack Overflow but I am struggling to return the correct data.
The pseudo logic for the subquery in my mind is something like:
SELECT campaign, user_id, clicked_at
FROM `Events`
WHERE order.user_id = user_id AND clicked_at < order.created_at
ORDER BY clicked_at DESC
LIMIT 1
Please see the example data below:
# Orders
| order_id | user_id | created_at |
-----------------------------------
| 123 | abc | 2020-07-04 |
| 456 | abc | 2020-05-01 |
# Events
| campaign | keyword | user_id | clicked_at |
----------------------------------------------
| facebook | shoes | abc | 2020-07-03 |
| google | hair | abc | 2020-07-01 |
My desired result
# Orders with campaign attribution
| order_id | user_id | created_at | campaign | keyword |
---------------------------------------------------------
| 123 | abc | 2020-07-04 | facebook | shoes |
| 456 | abc | 2020-05-01 | null | null |
Thanks!
Alex

Below is for BigQuery Standard SQL
#standardSQL
SELECT a.*, campaign, keyword
FROM `project.dataset.orders` a
LEFT JOIN (
SELECT
ANY_VALUE(o).*,
ARRAY_AGG(STRUCT(campaign, keyword) ORDER BY clicked_at DESC LIMIT 1)[OFFSET(0)].*
FROM `project.dataset.orders` o
JOIN `project.dataset.events` e
ON o.user_id = e.user_id
AND clicked_at < created_at
GROUP BY FORMAT('%t', o)
)
USING(order_id)
If applied to the sample data from your question, the result is
Row order_id user_id created_at campaign keyword
1 123 abc 2020-07-04 facebook shoes
2 456 abc 2020-05-01 null null

with orders as (
select 123 as order_id, 'abc' as user_id, cast('2020-07-04' as date) as created_at union all
select 456, 'abc', '2020-05-01'
),
events as (
select 'facebook' as campaign, 'shoes' as keyword, 'abc' as user_id, cast('2020-07-03' as date) as clicked_at union all
select 'google', 'hair', 'abc', '2020-07-01'
),
logic as (
select
orders.order_id,
orders.user_id,
orders.created_at,
events.clicked_at,
events.campaign,
events.keyword,
row_number() over (partition by orders.order_id order by events.clicked_at desc) as rn
from orders
left join events
on orders.user_id = events.user_id and events.clicked_at < orders.created_at
)
select * except(rn)
from logic
where rn = 1
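For anyone who wants to try the row_number() approach without a BigQuery project, here is a minimal sketch that runs the same idea against SQLite through Python's sqlite3 module. SQLite (3.25 or newer, for window functions) is my assumption, not something from the question; dates are stored as ISO text so that plain string comparison orders them correctly. The data is the sample from the question.

```python
import sqlite3


def latest_event_per_order():
    """Attach to each order the most recent event clicked before it was created."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER, user_id TEXT, created_at TEXT);
        CREATE TABLE events (campaign TEXT, keyword TEXT, user_id TEXT, clicked_at TEXT);
        INSERT INTO orders VALUES (123, 'abc', '2020-07-04'), (456, 'abc', '2020-05-01');
        INSERT INTO events VALUES
            ('facebook', 'shoes', 'abc', '2020-07-03'),
            ('google', 'hair', 'abc', '2020-07-01');
    """)
    rows = conn.execute("""
        SELECT order_id, user_id, created_at, campaign, keyword
        FROM (
            SELECT o.order_id, o.user_id, o.created_at, e.campaign, e.keyword,
                   -- rank the matching events per order, newest click first
                   ROW_NUMBER() OVER (
                       PARTITION BY o.order_id ORDER BY e.clicked_at DESC
                   ) AS rn
            FROM orders o
            LEFT JOIN events e
              ON e.user_id = o.user_id AND e.clicked_at < o.created_at
        ) ranked
        WHERE rn = 1
        ORDER BY order_id
    """).fetchall()
    conn.close()
    return rows


print(latest_event_per_order())
```

Because of the LEFT JOIN, an order with no earlier event still produces one row (with rn = 1 and NULL campaign and keyword), so it is not filtered out.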

Related

Joining 2 SQL SELECT result into one query

I wanted to know if there's a way to join two or more result sets into one.
I have the following two queries.
First query:
SELECT
CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)) as day_month_year,
db.country.country ,
count(concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on))) as count ,
COUNT(DISTINCT db.prod_id.email) AS MAIL
from db.prod_id
left join db.country on db.prod_id.branch_id = db.country.id
where db.prod_id.created_on > '2020-11-17' and (db.country.type = 1 or db.country.type = 2)
group by
concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)),
db.country.country
order by db.prod_id.created_on
The second query:
select
CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)) as day_month_year,
db.country.country,
count(CONCAT(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on))) as count_BUY
from db.prod_id
left join db.prod_evaluations on db.prod_id.id = db.prod_evaluations.id
left join db.country on db.prod_id.branch_id = db.country.id
left join (Select prod_properties.prod_id, prod_properties.value From prod_properties Where prod_properties.property_id = 5) as db3 on db3.prod_id = db.prod_id.id
where db.prod_id.created_on > '2020-11-17'
and db3.value = 'online-buy' and db.prod_id.status_id <> 25
group by
concat(day(db.prod_id.created_on),"-",month(db.prod_id.created_on),"-",year(db.prod_id.created_on)),
db.country.country
order by db.prod_id.created_on
The first query gives the following result:
+------------+---------+-------+------+
| day | Country | Count | Mail |
+------------+---------+-------+------+
| 17-11-2020 | IT | 200 | 100 |
| 17-11-2020 | US | 250 | 100 |
| 18-11-2020 | IT | 350 | 300 |
| 18-11-2020 | US | 200 | 100 |
+------------+---------+-------+------+
The second query gives:
+------------+---------+-----------+
| day | Country | Count_BUY |
+------------+---------+-----------+
| 17-11-2020 | IT | 50 |
| 17-11-2020 | US | 70 |
| 18-11-2020 | IT | 200 |
| 18-11-2020 | US | 50 |
+------------+---------+-----------+
Now I want to merge these two results into one:
+------------+---------+-------+------+-----------+
| day | Country | Count | Mail | Count_BUY |
+------------+---------+-------+------+-----------+
| 17-11-2020 | IT | 200 | 100 | 50 |
| 17-11-2020 | US | 250 | 100 | 70 |
| 18-11-2020 | IT | 350 | 300 | 200 |
| 18-11-2020 | US | 200 | 100 | 50 |
+------------+---------+-------+------+-----------+
How can I perform this query?
I'm using MySQL.
Thanks
The simple way: You can join queries.
select *
from ( <your first query here> ) first_query
join ( <your second query here> ) second_query using (day_month_year, country)
order by day_month_year, country;
This is an inner join. You can also outer join of course. MySQL doesn't support full outer joins, though. If you want that, you'll have to look up how to emulate a full outer join in MySQL.
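To make the difference between the inner and the outer join concrete, here is a small sketch using SQLite through Python's sqlite3 module (SQLite instead of MySQL is my assumption, and the activity table with its rows is made up for illustration). Both derived tables are grouped by the same two columns and joined with USING.

```python
import sqlite3


def join_result_sets():
    """Join two aggregate queries on their shared grouping columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE activity (day TEXT, country TEXT, kind TEXT);
        INSERT INTO activity VALUES
            ('17-11-2020', 'IT', 'view'),
            ('17-11-2020', 'IT', 'buy'),
            ('17-11-2020', 'US', 'view');
    """)
    template = """
        SELECT day, country, first_query.cnt, second_query.cnt_buy
        FROM (SELECT day, country, COUNT(*) AS cnt
              FROM activity GROUP BY day, country) first_query
        {join} (SELECT day, country, COUNT(*) AS cnt_buy
              FROM activity WHERE kind = 'buy' GROUP BY day, country) second_query
        USING (day, country)
        ORDER BY day, country
    """
    # Inner join: only groups present in both result sets survive.
    inner = conn.execute(template.format(join="JOIN")).fetchall()
    # Left join: every group of the first query is kept, missing counts are NULL.
    outer = conn.execute(template.format(join="LEFT JOIN")).fetchall()
    conn.close()
    return inner, outer


inner_rows, outer_rows = join_result_sets()
print(inner_rows)
print(outer_rows)
```

In this made-up data the US group has no 'buy' rows, so it disappears from the inner join and shows up with a NULL count in the left join.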
The hard way ;-) Merge the queries.
If I am not mistaken, your two queries can be reduced to
select
date(created_on),
branch_id as country,
count(*) as count_products,
count(distinct email) as count_emails
from db.prod_id
where created_on >= date '2020-11-17'
and branch_id in (select id from db.country where type in (1, 2))
group by date(created_on), branch_id
order by date(created_on), branch_id;
and
select
date(created_on),
branch_id as country,
count(*) as count_buy
from db.prod_id
where created_on >= date '2020-11-17'
and status_id <> 25
and id in (select prod_id from prod_properties where property_id = 5 and value = 'online-buy')
group by date(created_on), branch_id
order by date(created_on), branch_id;
The two combined should be
select
date(created_on),
branch_id as country,
sum(branch_id in (select id from db.country where type in (1, 2))) as count_products,
count(distinct case when branch_id in (select id from db.country where type in (1, 2)) then email end) as count_emails,
sum(status_id <> 25 and id in (select prod_id from prod_properties where property_id = 5 and value = 'online-buy')) as count_buy
from db.prod_id
where created_on >= date '2020-11-17'
group by date(created_on), branch_id
order by date(created_on), branch_id;
You see, the conditions the queries have in common remain in the where clause and the other conditions go inside the aggregation functions.
sum(boolean) is short for sum(case when boolean then 1 else 0 end), i.e. this counts the rows where the condition is met in MySQL.
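Here is a tiny sketch of the sum(boolean) idea, run against SQLite through Python's sqlite3 module. That choice is my assumption: SQLite, like MySQL, evaluates a comparison to 0 or 1, so the short form works there too. The products table and its rows are hypothetical.

```python
import sqlite3


def count_with_conditions():
    """Count all rows and conditionally matching rows in one grouped pass."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (created_on TEXT, branch_id INTEGER, status_id INTEGER);
        INSERT INTO products VALUES
            ('2020-11-17', 1, 10),
            ('2020-11-17', 1, 25),
            ('2020-11-17', 2, 10),
            ('2020-11-18', 1, 10);
    """)
    rows = conn.execute("""
        SELECT created_on, branch_id,
               COUNT(*) AS count_products,
               -- short form: the comparison yields 1 or 0
               SUM(status_id <> 25) AS count_short,
               -- portable long form of the same count
               SUM(CASE WHEN status_id <> 25 THEN 1 ELSE 0 END) AS count_long
        FROM products
        GROUP BY created_on, branch_id
        ORDER BY created_on, branch_id
    """).fetchall()
    conn.close()
    return rows


print(count_with_conditions())
```

The last two columns always agree here; the CASE form is the one to use on databases that do not treat booleans as integers.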

Subquery select with outer value for inner where clause

Users can have multiple records in the subscriptions table.
What I want to do is return each user's first name, last name, email, and the start date of their first subscription. Something like select start_date from subscriptions order by start_date asc limit 1 would do it, but I need it for that specific user.
// users
id
first_name
last_name
email
// subscriptions
id
email
start_date (TIMESTAMP)
end_date (TIMESTAMP)
status
I thought this would work, but it does not seem to:
select
distinct(users.email), status, first_name, last_name,
(select start_date from subscriptions where subscriptions.email = users.email order by start_date asc limit 1) as start_date
from
subscriptions sub
join
users u on sub.email = u.email
order by
sub.end_date desc
That returns the same start_date for everyone, since it's probably pulling the first one it matches.
SQL fiddle with the schema: http://sqlfiddle.com/#!9/245c05/5
This query:
select s.*
from subscriptions s
where s.start_date = (select min(start_date) from subscriptions where email = s.email)
returns the row for each user's first subscription.
Join it to users:
select u.*, t.status, t.start_date
from users u
left join (
select s.*
from subscriptions s
where s.start_date = (select min(start_date) from subscriptions where email = s.email)
) t on t.email = u.email
Results:
| id | email | first_name | last_name | status | start_date |
| --- | -------------- | ---------- | --------- | -------- | ------------------- |
| 1 | john@aol.com | John | Smith | active | 2018-02-12 23:34:02 |
| 2 | jim@aol.com | Jim | Smith | canceled | 2016-03-02 23:34:02 |
| 3 | jerry@aol.com | Jerry | Smith | active | 2017-12-12 23:34:02 |
| 4 | jackie@aol.com | Jackie | Smith | active | 2018-05-22 23:34:02 |
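If you want to experiment with this locally, the sketch below runs the same join against SQLite through Python's sqlite3 module. The engine and the rows are my own assumptions (the fiddle data is not reproduced here); the example.com users are made up, and one of them has no subscription at all to show what the LEFT JOIN does.

```python
import sqlite3


def first_subscription_per_user():
    """Return each user with the status and start date of their earliest subscription."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER, email TEXT, first_name TEXT, last_name TEXT);
        CREATE TABLE subscriptions (id INTEGER, email TEXT, start_date TEXT,
                                    end_date TEXT, status TEXT);
        INSERT INTO users VALUES
            (1, 'john@example.com', 'John', 'Smith'),
            (2, 'jim@example.com', 'Jim', 'Smith'),
            (3, 'ann@example.com', 'Ann', 'Lee');
        INSERT INTO subscriptions VALUES
            (1, 'john@example.com', '2018-02-12 23:34:02', '2018-03-12 23:34:02', 'canceled'),
            (2, 'john@example.com', '2019-01-01 00:00:00', '2019-02-01 00:00:00', 'active'),
            (3, 'jim@example.com', '2016-03-02 23:34:02', '2016-04-02 23:34:02', 'canceled');
    """)
    rows = conn.execute("""
        SELECT u.id, u.email, t.status, t.start_date
        FROM users u
        LEFT JOIN (
            SELECT s.*
            FROM subscriptions s
            -- keep only the row holding this email's earliest start_date
            WHERE s.start_date = (SELECT MIN(start_date)
                                  FROM subscriptions
                                  WHERE email = s.email)
        ) t ON t.email = u.email
        ORDER BY u.id
    """).fetchall()
    conn.close()
    return rows


print(first_subscription_per_user())
```

Note that if a user had two subscriptions sharing the same earliest start_date, both rows would come back for that user.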

Select top most non-duplicated entry after ordering by other columns [duplicate]

I would like to select the "top most" entry for each row with a duplicated column value.
Performing the following query -
SELECT *
FROM shop
ORDER BY shop.start_date DESC, shop.created_date DESC;
I get the result set -
+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1 | 1 | 2017-02-01 | 2017-01-01 |
| 2 | 1 | 2017-01-01 | 2017-02-01 |
| 3 | 2 | 2017-01-01 | 2017-07-01 |
| 4 | 2 | 2017-01-01 | 2017-01-01 |
+--------+---------+------------+--------------+
Can I modify the SELECT so that I only get back the "top rows" for each unique shop_id? In this case, that would be row_ids 1 and 3. There can be 1..n rows with the same shop_id.
Similarly, if my query above returned the following order, I'd want to SELECT only row_ids 1 and 4, since those would be the "top most" entries for each shop_id.
+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1 | 1 | 2017-02-01 | 2017-01-01 |
| 2 | 1 | 2017-01-01 | 2017-02-01 |
| 4 | 2 | 2017-01-01 | 2017-07-01 |
| 3 | 2 | 2017-01-01 | 2017-01-01 |
+--------+---------+------------+--------------+
You can do this by using a subquery:
select s.*
from shop s
where s.row_id = (
select row_id
from shop
where shop_id = s.shop_id
order by start_date desc, created_date desc
limit 1
)
Note that this query assumes row_id is unique among the rows of each shop_id.
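As a quick check of the correlated-subquery version, here is a sketch that loads both example tables from the question into SQLite through Python's sqlite3 module (using SQLite rather than MySQL is my assumption) and returns the row_ids that survive.

```python
import sqlite3


def top_row_ids(rows):
    """Return the row_id of the top row per shop_id (newest start_date, then created_date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shop (row_id INTEGER, shop_id INTEGER, start_date TEXT, created_date TEXT)"
    )
    conn.executemany("INSERT INTO shop VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT s.row_id
        FROM shop s
        WHERE s.row_id = (
            SELECT row_id
            FROM shop
            WHERE shop_id = s.shop_id
            ORDER BY start_date DESC, created_date DESC
            LIMIT 1
        )
        ORDER BY s.row_id
    """).fetchall()
    conn.close()
    return [r[0] for r in result]


first_example = [
    (1, 1, "2017-02-01", "2017-01-01"),
    (2, 1, "2017-01-01", "2017-02-01"),
    (3, 2, "2017-01-01", "2017-07-01"),
    (4, 2, "2017-01-01", "2017-01-01"),
]
second_example = [
    (1, 1, "2017-02-01", "2017-01-01"),
    (2, 1, "2017-01-01", "2017-02-01"),
    (4, 2, "2017-01-01", "2017-07-01"),
    (3, 2, "2017-01-01", "2017-01-01"),
]
print(top_row_ids(first_example))
print(top_row_ids(second_example))
```

The two calls cover the two orderings described in the question, so the selected rows follow the dates and not the row_id values.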
Or like this:
select t.*
from shop t
join (
select t2.shop_id, t2.start_date, max(t2.created_date) as created_date
from shop t2
join (
select max(start_date) as start_date, shop_id
from shop
group by shop_id
) t3 on t3.shop_id = t2.shop_id and t3.start_date = t2.start_date
group by t2.shop_id, t2.start_date
) t1 on t1.shop_id = t.shop_id and t.start_date = t1.start_date and t.created_date = t1.created_date
Note that if several records can share the same start_date and created_date for the same shop_id, you would need to add a group by t.shop_id, t.start_date, t.created_date to the outer query and select min(row_id) together with the grouped columns.
Try joining to a subquery which finds the "top" rows for each shop_id:
SELECT t1.*
FROM shop t1
INNER JOIN
(
SELECT shop_id, MIN(row_id) AS min_id
FROM shop
GROUP BY shop_id
) t2
ON t1.shop_id = t2.shop_id AND
t1.row_id = t2.min_id
ORDER BY
t1.start_date DESC,
t1.created_date DESC;

Get the duplicate entries from table mysql

I have the table structure as shown below. The database is MariaDB.
+-----------+----------+--------------+-----------------+
| id_object | name | value_double | value_timestamp |
+-----------+----------+--------------+-----------------+
| 1 | price | 1589 | null |
| 1 | payment | 1590 | null |
| 1 | date | null | 2012-04-17 |
| 2 | price | 1589 | null |
| 2 | payment | 1590 | null |
| 2 | date | null | 2012-04-17 |
| 3 | price | 1589 | null |
| 3 | payment | 1590 | null |
| 3 | date | null | 2012-09-25 |
| ... | ... | ... | .. |
+-----------+----------+--------------+-----------------+
1) I need to get the duplicates by three entries: price & payment & date;
For example: the record with id_object=2 is duplicate because price, payment and date are the same as values of the record with id_object=1. Record with id_object = 3 is not the duplicate because the date is different (2012-09-25 != 2012-04-17)
2) I should remove the duplicates except one copy of them.
I thought to do three select operations and join each select on id_object. I can get the duplicates by one entry (price | payment | date). I faced the problem doing the joins
SELECT `id_object`,`name`,{P.`value_double` | P.`value_timestamp`}
FROM record P
INNER JOIN(
SELECT {value_double | value_timestamp}
FROM record
WHERE name = {required_entry}
GROUP BY {value_double | value_timestamp}
HAVING COUNT(id_object) > 1
)temp ON {P.value_double = temp.value_double | P.value_timestamp = temp.value_timestamp}
WHERE name = {required_entry}
Can someone help and show the pure (better) solution?
Though less efficient than certain alternatives, I find an approach along these lines easier to read...
SELECT MIN(id_object) id_object
, price
, payment
, date
FROM
( SELECT id_object
, MAX(CASE WHEN name = 'price' THEN value_double END) price
, MAX(CASE WHEN name = 'payment' THEN value_double END) payment
, MAX(CASE WHEN name = 'date' THEN value_timestamp END) date
FROM eav
GROUP
BY id_object
) x
GROUP
BY price
, payment
, date;
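To see what the pivot-then-group query returns for the sample rows in the question, here is a sketch that runs it in SQLite through Python's sqlite3 module. SQLite instead of MariaDB is my assumption, and I renamed the date alias to obj_date to stay clear of the built-in date function.

```python
import sqlite3


def ids_to_keep():
    """Pivot the EAV rows per object, then keep the smallest id_object per distinct triple."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav (id_object INTEGER, name TEXT, value_double REAL, value_timestamp TEXT)"
    )
    rows = []
    for id_object, date_value in [(1, "2012-04-17"), (2, "2012-04-17"), (3, "2012-09-25")]:
        rows.append((id_object, "price", 1589, None))
        rows.append((id_object, "payment", 1590, None))
        rows.append((id_object, "date", None, date_value))
    conn.executemany("INSERT INTO eav VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT MIN(id_object) AS keep_id, price, payment, obj_date
        FROM (
            -- one row per object, with the three attributes as columns
            SELECT id_object,
                   MAX(CASE WHEN name = 'price' THEN value_double END) AS price,
                   MAX(CASE WHEN name = 'payment' THEN value_double END) AS payment,
                   MAX(CASE WHEN name = 'date' THEN value_timestamp END) AS obj_date
            FROM eav
            GROUP BY id_object
        ) x
        GROUP BY price, payment, obj_date
        ORDER BY keep_id
    """).fetchall()
    conn.close()
    return [r[0] for r in result]


print(ids_to_keep())
```

Object 2 has the same price, payment and date as object 1, so only 1 and 3 are returned; every other id_object is a duplicate that could be removed.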
I would just group_concat() the values together and do the test that way:
select t.*
from t join
(select min(id_object) id_object
from (select id_object,
group_concat(name, ':', coalesce(value_double, ''), ':', coalesce(value_timestamp, '') order by name) pairs
from t
where name in ('price', 'payment', 'date')
group by id_object
) tt
group by pairs
) tt
on t.id_object = tt.id_object;
To actually delete the ones that are not the minimum id for each group of related values:
delete t
from t left join
(select min(id_object) as id_object
from (select id_object, group_concat(name, ':', coalesce(value_double, ''), ':', coalesce(value_timestamp, '') order by name) as pairs
from t
where name in ('price', 'payment', 'date')
group by id_object
) tt
group by pairs
) tt
on t.id_object = tt.id_object
where tt.id_object is null;

How to select only the latest rows for each user?

My table looks like this:
id | user_id | period_id | completed_on
----------------------------------------
1 | 1 | 1 | 2010-01-01
2 | 2 | 1 | 2010-01-10
3 | 3 | 1 | 2010-01-13
4 | 1 | 2 | 2011-01-01
5 | 2 | 2 | 2011-01-03
6 | 2 | 3 | 2012-01-13
... | ... | ... | ...
I want to select only each user's latest period entry, bearing in mind that users will not all have the same period entries.
Essentially (assuming all I have is the above table) I want to get this:
id | user_id | period_id | completed_on
----------------------------------------
3 | 3 | 1 | 2010-01-13
4 | 1 | 2 | 2011-01-01
6 | 2 | 3 | 2012-01-13
Both of the queries below always select the first user_id occurrence, not the latest (because, from what I understand, the ordering happens after the rows are selected):
SELECT
DISTINCT user_id,
period_id,
completed_on
FROM my_table
ORDER BY
user_id ASC,
period_id DESC;
SELECT *
FROM my_table
GROUP BY user_id
ORDER BY
user_id ASC,
period_id DESC
Seems like this should work using MAX and a subquery:
SELECT t.Id, t.User_Id, t.Period_Id, t.Completed_On
FROM my_table t
JOIN (SELECT Max(completed_on) Max_Completed_On, User_Id
FROM my_table
GROUP BY User_Id
) t2 ON
t.User_Id = t2.User_Id AND t.Completed_On = t2.Max_Completed_On
However, if a user can have multiple records with the same completed_on date, this could return multiple records for that user. Depending on your needs, adding a MAX(Id) to the subquery and joining on that instead may work.
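Here is a small sketch of the MAX-and-join approach, run on the sample table from the question in SQLite through Python's sqlite3 module (the engine is my assumption; the question does not say which database is used).

```python
import sqlite3


def latest_row_ids():
    """Return the id of each user's row with the greatest completed_on."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, user_id INTEGER, period_id INTEGER, completed_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?, ?)",
        [
            (1, 1, 1, "2010-01-01"),
            (2, 2, 1, "2010-01-10"),
            (3, 3, 1, "2010-01-13"),
            (4, 1, 2, "2011-01-01"),
            (5, 2, 2, "2011-01-03"),
            (6, 2, 3, "2012-01-13"),
        ],
    )
    result = conn.execute("""
        SELECT t.id
        FROM my_table t
        JOIN (
            -- newest completion date per user
            SELECT user_id, MAX(completed_on) AS max_completed_on
            FROM my_table
            GROUP BY user_id
        ) t2 ON t.user_id = t2.user_id AND t.completed_on = t2.max_completed_on
        ORDER BY t.id
    """).fetchall()
    conn.close()
    return [r[0] for r in result]


print(latest_row_ids())
```

The ids that come back match the three rows listed as the desired result in the question.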
Try this:
SELECT t.Id, t.User_Id, t.Period_Id, t.Completed_On
FROM table1 t
JOIN (SELECT Max(completed_on) Max_Completed_On, t.User_Id
FROM table1 t
GROUP BY t.User_ID) t2 ON t.User_Id = t2.User_Id AND t.Completed_On = t2.Max_Completed_On