SQL: Window function after Join - mysql

I have two tables, Subscriptions and Downloads:
Subscriptions (sub_id PK, user_id, value)
Downloads (download_id PK, user_id, category_id)
My goal is to get a result table of form (user_id, sum_subscription_value, num_download_categories). In other words: each row is unique to a user_id, in which the total value of subscriptions that user has purchased is given alongside the number of categories the user has downloaded things from.
I attempted to solve this with the following code, but the categories aren't being counted correctly. I think the issue might be the join, as the value rows are being repeated, but I'm not sure exactly how to circumvent the issue. Any help is appreciated.
SELECT
DISTINCT subscriptions.user_id,
SUM(value) OVER (PARTITION by subscriptions.user_id, category_id) AS user_purchases,
COUNT(category_id) OVER (PARTITION by subscriptions.user_id) AS user_downloads
FROM subscriptions
LEFT JOIN downloads on subscriptions.user_id = downloads.user_id;

My goal is to get a result table of form (user_id, sum_subscription_value, num_download_categories).
One method is to aggregate before joining. But in this case, I suppose that a user could be missing from either table, so aggregating first would require a full join to avoid losing data (and MySQL has no FULL JOIN).
Instead, you can use union all and group by:
select user_id,
sum(value) as subscription_value,
count(distinct category_id) as num_categories
from ((select user_id, value, null as category_id
from subscriptions
) union all
(select user_id, NULL as value, category_id
from downloads
)
) sd
group by user_id;
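To see why the UNION ALL form avoids the double counting, here is a minimal sketch that runs both queries against a few made-up rows. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the sample data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (sub_id INTEGER PRIMARY KEY, user_id INT, value INT);
    CREATE TABLE downloads (download_id INTEGER PRIMARY KEY, user_id INT, category_id INT);
    INSERT INTO subscriptions VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
    INSERT INTO downloads VALUES (1, 1, 100), (2, 1, 100), (3, 1, 200), (4, 3, 300);
""")

# Joining first repeats each subscription once per download (fan-out),
# and drops user 3, who has downloads but no subscriptions.
joined = conn.execute("""
    SELECT s.user_id, SUM(s.value)
    FROM subscriptions s
    LEFT JOIN downloads d ON s.user_id = d.user_id
    GROUP BY s.user_id
    ORDER BY s.user_id
""").fetchall()

# Stacking the two tables keeps one row per source row, so nothing is repeated.
rows = conn.execute("""
    SELECT user_id,
           SUM(value) AS subscription_value,
           COUNT(DISTINCT category_id) AS num_categories
    FROM (SELECT user_id, value, NULL AS category_id FROM subscriptions
          UNION ALL
          SELECT user_id, NULL AS value, category_id FROM downloads) sd
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(joined)  # [(1, 90), (2, 5)]
print(rows)    # [(1, 30, 2), (2, 5, 0), (3, None, 1)]
```

Note that a user with no subscriptions gets a NULL sum (None in Python); wrapping the sum in COALESCE(..., 0) would turn that into 0 if that is preferred.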

Related

Count Distinct on multiple values within same column in SQL Aggregation

Objective:
I want to show the number of distinct IDs for any combination selected.
In the example below, I have data at a granular level: ID-level data.
For each combination, I use a count distinct, which gives me '1' for the combinations below.
But suppose I want to find the number of IDs that made both E-commerce and Face to face transactions. If I just use this data, I would be showing the sum of E-comm and Face to face, and the result would be '2' instead of '1'.
This is not limited to Ecom/Face to face. I want to apply the same logic to all columns.
Please let me know if you have an alternative approach to address this issue.
First aggregate in your table to get the distinct ids for each TranType:
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
and then join to the table:
SELECT t.*, g.counter_distinct
FROM tablename t
INNER JOIN (
SELECT TranType, COUNT(DISTINCT id) counter_distinct
FROM tablename
GROUP BY TranType
) g ON g.TranType = t.TranType
Or use a correlated subquery:
SELECT t1.*,
(SELECT COUNT(DISTINCT t2.id) FROM tablename t2 WHERE t2.TranType = t1.TranType) counter_distinct
FROM tablename t1
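As a quick check that the join and the correlated subquery give the same per-row counts, here is a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in, and the rows in tablename are invented for illustration.

```python
import sqlite3

# Hypothetical rows: id 1 appears twice under 'Ecomm' and once under 'Face to face'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (id INT, TranType TEXT);
    INSERT INTO tablename VALUES
        (1, 'Ecomm'), (1, 'Ecomm'), (2, 'Ecomm'), (1, 'Face to face');
""")

# Variant 1: aggregate once per TranType, then join back to every row.
via_join = conn.execute("""
    SELECT t.id, t.TranType, g.counter_distinct
    FROM tablename t
    INNER JOIN (SELECT TranType, COUNT(DISTINCT id) AS counter_distinct
                FROM tablename
                GROUP BY TranType) g ON g.TranType = t.TranType
    ORDER BY t.TranType, t.id
""").fetchall()

# Variant 2: recompute the distinct count for each row with a correlated subquery.
via_subquery = conn.execute("""
    SELECT t1.id, t1.TranType,
           (SELECT COUNT(DISTINCT t2.id)
            FROM tablename t2
            WHERE t2.TranType = t1.TranType) AS counter_distinct
    FROM tablename t1
    ORDER BY t1.TranType, t1.id
""").fetchall()

print(via_join)
# [(1, 'Ecomm', 2), (1, 'Ecomm', 2), (2, 'Ecomm', 2), (1, 'Face to face', 1)]
```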
But let's say if I wanted to find the number of IDs who made both E-commerce and Face to face transactions, in
You can get the list of ids using:
select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2;
You can get the count using a subquery:
select count(*)
from (select id
from t
where tran_type in ('Ecomm', 'Face to face')
group by id
having count(distinct tran_type) = 2
) i;
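A small runnable sketch of the HAVING approach, assuming SQLite via Python's sqlite3 in place of MySQL. The table contents are made up: only id 1 has both transaction types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INT, tran_type TEXT);
    INSERT INTO t VALUES
        (1, 'Ecomm'), (1, 'Face to face'), (1, 'Other'),
        (2, 'Ecomm'), (2, 'Ecomm'),
        (3, 'Face to face');
""")

# ids that have at least one row of each of the two requested types
ids_with_both = [row[0] for row in conn.execute("""
    SELECT id
    FROM t
    WHERE tran_type IN ('Ecomm', 'Face to face')
    GROUP BY id
    HAVING COUNT(DISTINCT tran_type) = 2
""")]

# the same query wrapped in a subquery to get just the count
(num_ids,) = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT id
          FROM t
          WHERE tran_type IN ('Ecomm', 'Face to face')
          GROUP BY id
          HAVING COUNT(DISTINCT tran_type) = 2) i
""").fetchone()

print(ids_with_both, num_ids)  # [1] 1
```

The DISTINCT matters: id 2 has two 'Ecomm' rows, and a plain COUNT(*) = 2 would wrongly include it.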

Proper way to use MySQL GROUP BY for returning one result from a referenced table

I often have a situation with two tables in MySQL where I need one record for each foreign key. For example:
table post {id, ...}
table comment {id, post_id, ...}
SELECT * FROM comment GROUP BY post_id ORDER BY id ASC
-- Oldest comment for each post
or
table client {id, ...}
table payment {id, client_id, ...}
SELECT * FROM payment GROUP BY client_id ORDER BY id DESC
-- Most recent payment from each client
These queries often fail because the "SELECT list is not in GROUP BY clause" and contains nonaggregated columns.
Failed Solutions
I can usually work around this with a min()/max() but that creates a very slow query with mis-matched results (row with min(id) isn't equal to row with min(textfield))
SELECT min(id), min(textfield), ... FROM table GROUP BY fk_id
Adding all the columns to GROUP BY results in duplicate records (from the fk_id) which defeats the purpose of GROUP BY.
SELECT id, textfield, ... FROM table GROUP BY fk_id, id, textfield
Same idea as @GurV, but using a join instead of a correlated subquery. The basic idea here is that the subquery finds, for each post which has comments, the id of its oldest comment in the comments table. We then join back to comments to restrict to the records we want.
SELECT t1.*
FROM comments t1
INNER JOIN
(
SELECT post_id, MIN(id) AS min_id
FROM comments
GROUP BY post_id
) t2
ON t1.post_id = t2.post_id AND
t1.id = t2.min_id
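Here is the join-back pattern run on a few hypothetical comments, as a sketch using SQLite through Python's sqlite3 (an assumption; the question is about MySQL). The body column is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INT, body TEXT);
    INSERT INTO comments VALUES
        (1, 10, 'first on post 10'),
        (2, 10, 'second on post 10'),
        (3, 20, 'first on post 20'),
        (4, 20, 'second on post 20');
""")

# The derived table picks one id per post; the join fetches that whole row,
# so every selected column comes from the same comment.
oldest_per_post = conn.execute("""
    SELECT t1.*
    FROM comments t1
    INNER JOIN (SELECT post_id, MIN(id) AS min_id
                FROM comments
                GROUP BY post_id) t2
      ON t1.post_id = t2.post_id AND t1.id = t2.min_id
    ORDER BY t1.post_id
""").fetchall()

print(oldest_per_post)
# [(1, 10, 'first on post 10'), (3, 20, 'first on post 20')]
```

Swapping MIN for MAX gives the most recent row per group, which covers the payment example from the question.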
You can use a correlated query with aggregation to find out the earliest comment for each post:
select *
from comments c1
where id = (
select min(id)
from comments c2
where c1.post_id = c2.post_id
)
A compound index on comments(post_id, id) should be helpful.
If you are querying the whole table with many rows, then it will. This query is more useful and performant if you are querying for a small subset of posts. If you are querying the whole table, then @Tim's answer is better suited, I think.
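For the small-subset case mentioned above, a sketch of the correlated form with an extra filter on post_id might look like this. It assumes SQLite via Python's sqlite3, and oldest_comments is a hypothetical helper name, not something from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INT, body TEXT);
    INSERT INTO comments VALUES
        (1, 10, 'a'), (2, 10, 'b'), (3, 20, 'c'), (4, 20, 'd'), (5, 30, 'e');
""")

def oldest_comments(post_ids):
    """Return the oldest comment for each of the given posts (hypothetical helper)."""
    # One bound parameter per post id; the ids themselves are never inlined.
    placeholders = ", ".join("?" for _ in post_ids)
    sql = f"""
        SELECT c1.id, c1.post_id, c1.body
        FROM comments c1
        WHERE c1.post_id IN ({placeholders})
          AND c1.id = (SELECT MIN(c2.id)
                       FROM comments c2
                       WHERE c2.post_id = c1.post_id)
        ORDER BY c1.post_id
    """
    return conn.execute(sql, list(post_ids)).fetchall()

print(oldest_comments([20, 30]))  # [(3, 20, 'c'), (5, 30, 'e')]
```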

How to combine these two SQL statements to avoid GROUP BY error?

I am trying to combine two SQL statements into one, but I am running into GROUP BY errors from ONLY_FULL_GROUP_BY. I understand why they occur, but not how to solve them.
In the first query, I am simply getting the number of actions per item, and in the second query, I am getting the latest action.
Assume a simple setup as:
actions (id, item_id, description)
items (id, name)
With the two queries
SELECT item_id, COUNT(*) AS actions_number FROM actions GROUP BY item_id
SELECT * FROM actions WHERE id in (SELECT max(id) FROM actions GROUP BY item_id)
How do I easily combine these two statements into one?
Is this what you want?
SELECT
a.*, b.actions_number
FROM
actions a
INNER JOIN
(SELECT
MAX(id) id, COUNT(*) actions_number
FROM
actions
GROUP BY item_id) b ON a.id = b.id;
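A runnable sketch of the combined query, assuming SQLite via Python's sqlite3 as a stand-in for MySQL and a handful of invented actions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (id INTEGER PRIMARY KEY, item_id INT, description TEXT);
    INSERT INTO actions VALUES
        (1, 1, 'created'), (2, 1, 'updated'), (3, 2, 'created'), (4, 1, 'archived');
""")

# One pass over actions yields both the latest id and the count per item;
# joining on that id pulls in the rest of the latest action's columns.
latest_with_count = conn.execute("""
    SELECT a.*, b.actions_number
    FROM actions a
    INNER JOIN (SELECT MAX(id) AS id, COUNT(*) AS actions_number
                FROM actions
                GROUP BY item_id) b ON a.id = b.id
    ORDER BY a.item_id
""").fetchall()

print(latest_with_count)
# [(4, 1, 'archived', 3), (3, 2, 'created', 1)]
```

Because the derived table only selects aggregates, it does not trip the full GROUP BY check, and the outer query has no GROUP BY at all.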

Delete duplicates from `MySQL` and keep single record based on an order

I need to remove all the duplicates and keep only the one with the highest amount. Perhaps I should do some kind of JOIN operation but I'm not very experienced with . I have this query:
SELECT *
FROM invoices
GROUP BY user
ORDER BY amount DESC
It queries all the rows, orders them by amount and "removes" the duplicates as it groups by user, but obviously doesn't delete the duplicates. Any help is appreciated. To be clear, the duplicates must be deleted permanently.
Schema:
user varchar(125), amount int
If you do a SELECT * that's not going to filter out records, even with a GROUP BY.
SELECT user, MAX(amount) amount
FROM invoices
GROUP BY user
ORDER BY amount DESC
For the sake of just finding duplicates, you can try:
SELECT user, COUNT(*) AS cnt, MAX(amount) AS mx
FROM invoices
GROUP BY user HAVING cnt > 1
ORDER BY mx DESC
From there, you can proceed removing these records.
Be aware that you won't get the desired result because of the way you're using GROUP BY. MySQL extends its functionality by allowing selected columns that are not in the GROUP BY. You should always list the selected non-aggregated columns in the GROUP BY:
SELECT col1, col2, AGGREGATE(col3)
FROM table
GROUP BY col1, col2
I need to select all the rows find duplicates
To find the MAX amount for each user:
SELECT user,
Max(amount) AS amount
FROM invoices
GROUP BY user
and keep only the row with the highest amount
Option 1
Use a LEFT JOIN (thanks JW):
DELETE invoices
FROM invoices
LEFT JOIN
(SELECT user, MAX(amount) AS amount
FROM invoices
GROUP BY user) j
ON j.user = invoices.user
AND j.amount = invoices.amount
WHERE j.amount IS NULL
http://sqlfiddle.com/#!2/ce2f8/1
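The anti-join logic in Option 1 can be tried locally with the sketch below. Note the assumptions: it uses SQLite via Python's sqlite3, which does not accept MySQL's multi-table DELETE syntax, so the same LEFT JOIN is moved into a subquery keyed on SQLite's built-in rowid. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (user TEXT, amount INT);
    INSERT INTO invoices VALUES
        ('alice', 100), ('alice', 200), ('alice', 300),
        ('bob', 50), ('bob', 50);
""")

# Rows that fail to match their user's maximum get j.amount IS NULL
# from the LEFT JOIN; those are the rows to delete.
conn.execute("""
    DELETE FROM invoices
    WHERE rowid IN (
        SELECT i.rowid
        FROM invoices i
        LEFT JOIN (SELECT user, MAX(amount) AS amount
                   FROM invoices
                   GROUP BY user) j
          ON j.user = i.user AND j.amount = i.amount
        WHERE j.amount IS NULL)
""")

remaining = conn.execute(
    "SELECT user, amount FROM invoices ORDER BY user, amount"
).fetchall()
print(remaining)  # [('alice', 300), ('bob', 50), ('bob', 50)]
```

One thing the sample data shows: rows that tie for the maximum all match the join, so both of bob's rows survive. If exact ties are possible, this option alone does not leave a single row per user.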
Option 2
Create a staging table:
CREATE TABLE invoices (
user int,
amount decimal(5,2));
INSERT INTO invoices VALUES
(1, 100.00),
(1, 200.00),
(1, 300.00);
CREATE TABLE invoicesStg (
user int,
amount decimal(5,2));
INSERT INTO invoicesStg
(SELECT user, MAX(amount) AS amount
FROM invoices
GROUP BY user);
TRUNCATE invoices;
INSERT INTO invoices
SELECT user, amount
FROM invoicesStg;
DROP TABLE invoicesStg;
http://sqlfiddle.com/#!2/0381e/1
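A sketch of the staging-table steps, assuming SQLite via Python's sqlite3 instead of MySQL. SQLite has no TRUNCATE, so a plain DELETE is swapped in, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (user INT, amount INT);
    INSERT INTO invoices VALUES (1, 100), (1, 200), (1, 300), (2, 50), (2, 50);

    -- Stage one row per user holding that user's highest amount.
    CREATE TABLE invoicesStg (user INT, amount INT);
    INSERT INTO invoicesStg
        SELECT user, MAX(amount) AS amount FROM invoices GROUP BY user;

    -- Empty the original table (DELETE stands in for TRUNCATE), then refill it.
    DELETE FROM invoices;
    INSERT INTO invoices SELECT user, amount FROM invoicesStg;
    DROP TABLE invoicesStg;
""")

remaining = conn.execute(
    "SELECT user, amount FROM invoices ORDER BY user"
).fetchall()
print(remaining)  # [(1, 300), (2, 50)]
```

Because the staging table is filled from a GROUP BY, user 2's two identical rows collapse into one here.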
If you want the row with the highest amount, try this:
select *
from invoices
order by amount desc
limit 1
I'm not sure what you mean by "delete". Do you really want to delete all rows but the one with the highest amount?

Using sql to find duplicate records and delete in same operation

I'm using this SQL statement to find duplicate records:
SELECT id,
user_id,
activity_type_id,
source_id,
source_type,
COUNT(*) AS cnt
FROM activities
GROUP BY id, user_id, activity_type_id, source_id, source_type
HAVING COUNT(*) > 1
However, I want to not only find, but delete in the same operation.
delete from activities where id not in (select max(id) from activities group by ....)
Thanks to @OMG Ponies and his other post, here is a revised solution (but not exactly the same). I assumed here that it does not matter which specific rows are left undeleted. The other assumption is that id is the primary key.
In my example, I just set up one extra column, name, for testing, but it can easily be extended to more columns via the GROUP BY clause.
DELETE a FROM activities a
LEFT JOIN (SELECT MAX(id) AS id FROM activities GROUP BY name) uniqId
ON a.id=uniqId.id WHERE uniqId.id IS NULL;
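To try the keep-the-highest-id idea on sample data, here is a sketch assuming SQLite via Python's sqlite3. SQLite does not accept the DELETE ... LEFT JOIN form, so the equivalent NOT IN form is used instead; the rows are invented, with name as the only duplicate-defining column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO activities VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'a');
""")

# Keep the row with the highest id in each group of duplicates, delete the rest.
deleted = conn.execute("""
    DELETE FROM activities
    WHERE id NOT IN (SELECT MAX(id) FROM activities GROUP BY name)
""").rowcount

remaining = conn.execute("SELECT id, name FROM activities ORDER BY id").fetchall()
print(deleted, remaining)  # 2 [(3, 'b'), (4, 'a')]
```

To match the original question, the GROUP BY would list user_id, activity_type_id, source_id and source_type instead of name.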