SQL join with SUM adds up duplicates - MySQL

1)The sample data set looks like this:
create table user(
user_id int,
name varchar(10),
surname varchar(10)
);
insert into user(user_id, name, surname) values
(1, 'a', 'aa'),
(2, 'b', 'bb'),
(3, 'c', 'cc');
create table books(
user_id int,
book_name varchar(10)
);
insert into books(user_id, book_name) values
(1, 'book1'),
(1, 'book2'),
(1, 'book3'),
(2, 'book1');
create table expanses(
id int,
user_id int,
amount_spent int,
date timestamp
);
insert into expanses(id, user_id, amount_spent, date)
values
(1,1,10, '2020-02-03'),
(2,1,10, '2020-02-03'),
(3,1,30, '2020-02-02'),
(4,1,12, '2020-02-01'),
(5,1,15, '2020-01-31'),
(6,1,13, '2020-01-15'),
(7,2,15, '2020-02-01'),
(8,3,20, '2020-02-01');
2)The result which I want:
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 77 |
Explanation: I want to count
a) how many users have book1 or book2, and
b) how much they spent in total between 2020-02-01 and 2020-02-03.
What should the query look like?
I am using MySQL version 8.
I have tried:
SELECT count(*)
, sum(amount_spent) as total_amount_spent
FROM
( select sum(e.amount_spent) as amount_spent
FROM expanses e
LEFT JOIN books b
ON b.user_id = e.user_id
WHERE (b.book_name ='book1' or b.book_name ='book2')
and e.date between '2020-02-01' and '2020-02-03'
GROUP BY e.user_id) src
The result is wrong because the inner select (slightly modified here to show the problem more clearly):
select expanses.user_id, expanses.amount_spent, books.book_name
FROM expanses
LEFT JOIN books ON books.user_id = expanses.user_id
WHERE (books.book_name ='book1' or books.book_name ='book2') and expanses.date between '2020-02-01' and '2020-02-03'
3)will return something like this:
| user_id | amount_spent | book_name |
|---------|--------------|-----------|
| 1 | 10 | book1 |
| 1 | 10 | book1 |
| 1 | 30 | book1 |
| 1 | 12 | book1 |
| 1 | 10 | book2 |
| 1 | 10 | book2 |
| 1 | 30 | book2 |
| 1 | 12 | book2 |
| 2 | 15 | book1 |
4)So if we sum this up, we get
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 139 |
5)This is wrong because of the duplicates.
If we change it to sum(DISTINCT amount_spent),
the result is also wrong, because DISTINCT drops legitimate repeats too (user 1 really did spend 10 twice):
| CountUsers | amount_spent |
|---------|--------------|
| 2 | 67 |
To summarize, table 3) contains duplicated amount_spent values because each expense row is repeated once per matching book_name (a one-to-many relationship).
How can I avoid duplicating amount_spent while still joining on book_name?

select count(distinct user_id)
, sum(amount_spent)
from expanses
where expanses.date between '2020-02-01' and '2020-02-03'
and user_id in (select user_id from books where book_name in('book1','book2'))
https://www.db-fiddle.com/f/26ifPWyRRKGp9YVQXg1qje/0

Here's an idea for a)
SELECT COUNT(DISTINCT user_id) total FROM books WHERE book_name IN ('book1','book2');
Here's an idea for b)
SELECT SUM(amount_spent) total_spent
FROM expanses e
WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
AND EXISTS
( SELECT *
FROM books b
WHERE b.user_id = e.user_id
AND b.book_name IN ('book1','book2')
);
Here's an idea for combining a) and b)
SELECT SUM(amount_spent) total_spent
, (SELECT COUNT(DISTINCT user_id) total FROM books WHERE book_name IN ('book1','book2')) total_customers
FROM expanses e
WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
AND EXISTS
( SELECT *
FROM books b
WHERE b.user_id = e.user_id
AND b.book_name IN ('book1','book2')
);
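If book_name has to stay in the picture, one option is to collapse books to a single row per user before joining, so the join can no longer multiply expense rows. The following is only a sketch against the question's sample tables; the names book_names, t, CountUsers and the use of GROUP_CONCAT are my own choices, not part of the original answers.

```sql
-- Runs against the question's sample tables (books, expanses).
-- The inner derived table b has at most one row per user, so each
-- expense row is joined exactly once and SUM is not inflated.
SELECT COUNT(*)            AS CountUsers,
       SUM(t.amount_spent) AS amount_spent
FROM (
    SELECT e.user_id,
           b.book_names,                      -- e.g. 'book1,book2'
           SUM(e.amount_spent) AS amount_spent
    FROM expanses e
    JOIN (
        SELECT user_id,
               GROUP_CONCAT(book_name ORDER BY book_name) AS book_names
        FROM books
        WHERE book_name IN ('book1', 'book2')
        GROUP BY user_id
    ) b ON b.user_id = e.user_id
    WHERE e.date BETWEEN '2020-02-01' AND '2020-02-03'
    GROUP BY e.user_id, b.book_names
) t;
```

On the sample data the inner query yields one row for user 1 (62) and one for user 2 (15), so the outer query returns 2 and 77. Note that this counts only users who have a matching book and at least one expense in the date range, which differs slightly from counting all book owners.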

Related

SQL - LEFT JOIN, but I want COUNT(*) to only count the results from the INNER part of the join

I want to display the number of purchases each customer has made. If they've made 0 purchases, I want to display 0.
Desired Output:
-------------------------------------
| customer_name | number_of_purchases |
-------------------------------------
| Marg | 0 |
| Ben | 1 |
| Phil | 4 |
| Steve | 0 |
-------------------------------------
Customer Table:
-----------------------------
| customer_id | customer_name |
-----------------------------
| 1 | Marg |
| 2 | Ben |
| 3 | Phil |
| 4 | Steve |
-----------------------------
Purchases Table:
--------------------------------------------------
| purchase_id | customer_id | purchase_description |
--------------------------------------------------
| 1 | 2 | 500 Reams |
| 2 | 3 | 6 Toners |
| 3 | 3 | 20 Staplers |
| 4 | 3 | 2 Copiers |
| 5 | 3 | 9 Name Plaques |
--------------------------------------------------
My current query is as follows:
SELECT customer_name, COUNT(*) AS number_of_purchases
FROM customer
LEFT JOIN purchases ON customer.customer_id = purchases.customer_id
GROUP BY customer.customer_id
However, since it's a LEFT JOIN, the query results in rows for customers with no purchases, which makes them part of the COUNT(*). In other words, customers who've made 0 purchases are displayed as having made 1 purchase, like so:
LEFT JOIN Output:
-------------------------------------
| customer_name | number_of_purchases |
-------------------------------------
| Marg | 1 |
| Ben | 1 |
| Phil | 4 |
| Steve | 1 |
-------------------------------------
I've also tried an INNER JOIN, but that results in customers with 0 purchases not showing at all:
INNER JOIN Output:
-------------------------------------
| customer_name | number_of_purchases |
-------------------------------------
| Ben | 1 |
| Phil | 4 |
-------------------------------------
How could I achieve my Desired Output where customers with 0 purchases are shown?
Instead of COUNT(*), use COUNT(purchase_id):
SELECT customer_name, COUNT(purchase_id) AS number_of_purchases
FROM customer
LEFT JOIN purchases ON customer.customer_id = purchases.customer_id
GROUP BY customer.customer_id, customer_name
You can try it like this:
Sample Data:
create table customer(customer_id integer, customer_name varchar(20));
create table purchaser(purchaser_id varchar(20), customer_id integer, description varchar(20));
insert into customer values(1, 'Marg');
insert into customer values(2, 'Ben');
insert into customer values(3, 'Phil');
insert into customer values(4, 'Steve');
insert into purchaser values(1, 2, '500 Reams');
insert into purchaser values(2, 3, '6 toners');
insert into purchaser values(3, 3, '20 Staplers');
insert into purchaser values(4, 3, '20 Staplers');
insert into purchaser values(5, 3, '20 Staplers');
SELECT c.customer_id, c.customer_name, COUNT(p.purchaser_id) AS number_of_purchases
FROM customer c
LEFT JOIN purchaser p ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.customer_name;
SQL fiddle: http://sqlfiddle.com/#!9/32ff0a/2
COUNT(*) returns the number of items in a group. This includes NULL values and duplicates.
COUNT(ALL expression) evaluates expression for each row in a group, and returns the number of nonnull values.
CREATE table customer(customer_id integer , customer_name varchar(20));
create table purchases(purchase_id integer , customer_id integer , purchase_description varchar(30));
INSERT INTO customer ( customer_id, customer_name )
VALUES ( 1, 'Marg' )
, ( 2, 'Ben' )
, ( 3, 'Phil' )
, ( 4, 'Steve' );
INSERT INTO purchases ( purchase_id, customer_id, purchase_description )
VALUES ( 1, 2, '500 Reams' )
, ( 2, 3, '6 toners' )
, ( 3, 3, '20 Staplers' )
, ( 4, 3, '2 Copiers' )
, ( 5, 3, '9 Name Plaques' );
SELECT c.customer_name
, COUNT(p.purchase_id) AS number_of_purchases
FROM customer c
LEFT JOIN purchases p
ON c.customer_id = p.customer_id
GROUP BY c.customer_name
COUNT(*) counts rows. You want to count matches, so count a column from the second table, as follows:
select customer.customer_name, a.number_of_purchases from (
SELECT customer.customer_id, COUNT(purchases.purchase_id) AS number_of_purchases
FROM customer
LEFT JOIN purchases ON customer.customer_id = purchases.customer_id
GROUP BY customer.customer_id) as a
inner join customer on customer.customer_id=a.customer_id;
In other words, the LEFT JOIN returns a row when there is no match. That row has a NULL value for all the columns in the purchases table.
SELECT
c.customer_name, COUNT(p.purchase_id) AS number_of_purchases
FROM
customer AS c
LEFT JOIN purchases AS p ON (c.customer_id = p.customer_id)
GROUP BY c.customer_name
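To make the difference visible, here is a small sketch that puts both counts side by side. It assumes the customer and purchases sample tables created earlier in this thread; the alias joined_rows is just an illustrative name.

```sql
-- COUNT(*) counts every joined row, including the NULL-extended row
-- that LEFT JOIN produces for a customer without purchases.
-- COUNT(p.purchase_id) skips rows where purchase_id is NULL.
SELECT c.customer_name,
       COUNT(*)             AS joined_rows,
       COUNT(p.purchase_id) AS number_of_purchases
FROM customer c
LEFT JOIN purchases p ON p.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY c.customer_id;
```

With the sample data, Marg and Steve show joined_rows = 1 but number_of_purchases = 0, while Ben (1, 1) and Phil (4, 4) are the same in both columns.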
Instead of COUNT(*), use COUNT(purchases.customer_id):
SELECT customer_name, COUNT(purchases.customer_id) AS number_of_purchases
FROM customer
LEFT JOIN purchases ON customer.customer_id = purchases.customer_id
GROUP BY customer.customer_id, customer_name
SELECT c.customer_name, COUNT(p.purchase_id) AS number_of_purchases
FROM customer c
LEFT JOIN purchases AS p ON c.customer_id = p.customer_id
GROUP BY c.customer_name

Sum, Group Concat and Join with 3 tables.

I am working on a tool to administer customer and payment data.
I use MySQL and have the following tables: customers and payments:
customers:
ID | invoiceID | supreme_invoiceID
1 | 123 | a123
2 | 124 | a123
3 | 103 | a103
4 | 110 | a110
payments:
ID | supreme_invoiceID | amount | date
1 | a123 | 10 | 10.10.2010
2 | a103 | 105 | 10.11.2017
3 | a123 | 5 | 11.10.2010
And my result should look like this:
view_complete:
ID | supreme_invoiceID | number_invoices | GROUP_CONCAT(invoiceID) | SUM(payments.amount) | GROUP_CONCAT(payments.amount)
1 | a123 | 2 | 123;124 | 15 | 10;5
Unfortunately, I cannot get it directly into one table. Instead I create 2 views and query the payments table separately for aggregate data on payments.
First, I create an auxiliary view:
CREATE VIEW precomplete as
SELECT *, COUNT(supreme_invoiceID) as number_invoices FROM customers
GROUP BY supreme_invoiceID;
Then, a second one:
CREATE VIEW complete AS
SELECT precomplete.*, SUM(p.amount)
FROM precomplete
LEFT JOIN payments p ON precomplete.supreme_invoiceID = p.supreme_invoiceID
GROUP BY precomplete.supreme_invoiceID;
I fetch the concatenated values in an additional query. But I would like to get all my data in one query and, hopefully, without such a view hierarchy. phpMyAdmin is already pretty slow in loading my views, even with few entries.
Any help is greatly appreciated.
Thanks!
The db design forces an approach which builds the aggregates separately, to avoid duplicates, before joining on a common field. For example:
drop table if exists c,p;
create table c(ID int, invoiceID int, supreme_invoiceID varchar(4));
insert into c values
(1 , 123 , 'a123'),
(2 , 124 , 'a123'),
(3, 103 , 'a103'),
(4 , 110 , 'a110');
create table p(ID int, supreme_invoiceID varchar(4), amount int, date varchar(10));
insert into p values
(1 , 'a123' , 10 , '10.10.2010'),
(2 , 'a103' , 105 , '10.11.2017'),
(3 , 'a123' , 5 , '11.10.2010');
select c.*,p.*
from
(select min(c.id) minid,count(*) nofinvoices,group_concat(c.invoiceid) gciid, max(supreme_invoiceid) maxsid
from c
group by supreme_invoiceid
) c
join
(select group_concat(supreme_invoiceid) gcsid, sum(amount),group_concat(amount),max(supreme_invoiceid) maxsid
from p
group by supreme_invoiceid
) p
on p.maxsid = c.maxsid
order by minid
;
+-------+-------------+---------+--------+-----------+-------------+----------------------+--------+
| minid | nofinvoices | gciid | maxsid | gcsid | sum(amount) | group_concat(amount) | maxsid |
+-------+-------------+---------+--------+-----------+-------------+----------------------+--------+
| 1 | 2 | 123,124 | a123 | a123,a123 | 15 | 10,5 | a123 |
| 3 | 1 | 103 | a103 | a103 | 105 | 105 | a103 |
+-------+-------------+---------+--------+-----------+-------------+----------------------+--------+
2 rows in set (0.15 sec)
Much like your view approach. Note there doesn't appear to be a customer in the customer table
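A variation on the same idea, in case invoices without any payment (a110 in the sample) should still be listed: swap the inner join for a LEFT JOIN and use ';' as the separator, as in the desired output. This is a sketch against the tables c and p created above; the aliases ca, pa, first_id, invoice_ids, total_amount and amounts are names I made up for illustration.

```sql
-- Uses the sample tables c (customers) and p (payments) created above.
-- Each side is aggregated on its own first, so joining them cannot
-- duplicate invoices or payments.
SELECT ca.first_id AS ID,
       ca.supreme_invoiceID,
       ca.number_invoices,
       ca.invoice_ids,
       pa.total_amount,
       pa.amounts
FROM (
    SELECT supreme_invoiceID,
           MIN(ID)  AS first_id,
           COUNT(*) AS number_invoices,
           GROUP_CONCAT(invoiceID ORDER BY invoiceID SEPARATOR ';') AS invoice_ids
    FROM c
    GROUP BY supreme_invoiceID
) AS ca
LEFT JOIN (
    SELECT supreme_invoiceID,
           SUM(amount) AS total_amount,
           GROUP_CONCAT(amount ORDER BY ID SEPARATOR ';') AS amounts
    FROM p
    GROUP BY supreme_invoiceID
) AS pa ON pa.supreme_invoiceID = ca.supreme_invoiceID
ORDER BY ca.first_id;
```

For a123 this gives 2 invoices, '123;124', a total of 15 and '10;5'; a110 appears with NULL in both payment columns.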

How to limit a query by column value

Following query...
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2)
...gives me the following result:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 1 | 1 |
| 5 | 1 |
| 4 | 1 |
| 6 | 1 |
| 4 | 2 |
| 2 | 2 |
| 1 | 2 |
| 5 | 2 |
+----------+---------+
Now, I want to modify the above query so that I only get for example two rows for each user_id, eg:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 4 | 2 |
| 5 | 2 |
+----------+---------+
I am thinking about something like this, which of course does not work:
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2) LIMIT 2 by user_id
Ideally, this should work with offsets as well because I want to use it for paginations.
For performance reasons it is essential to use the WHERE user_id IN (1, 2) part of the query.
One method -- assuming you have at least two rows for each user -- would be:
(select min(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
) union all
(select max(event_id) as event_id, user_id
from t
where user_id in (1, 2)
group by user_id
);
Admittedly, this is not a "general" solution, but it might be the simplest solution for what you want.
If you want the two biggest or smallest, then an alternative also works:
select t.*
from t
where t.user_id in (1, 2) and
t.event_id >= (select t2.event_id
from t t2
where t2.user_id = t.user_id
order by t2.event_id desc
limit 1, 1
);
Here is a more general example for this kind of problem. Please note that it was tested on SQL Server; I could not try it on MySQL, so please let me know how it works.
CREATE TABLE mytable
(
number INT,
score INT
);
INSERT INTO mytable VALUES ( 1, 100);
INSERT INTO mytable VALUES ( 2, 100);
INSERT INTO mytable VALUES ( 2, 120);
INSERT INTO mytable VALUES ( 2, 110);
INSERT INTO mytable VALUES ( 3, 120);
INSERT INTO mytable VALUES ( 3, 150);
SELECT *
FROM mytable m
WHERE
(
SELECT COUNT(*)
FROM mytable m2
WHERE m2.number = m.number AND
m2.score >= m.score
) <= 2;
How about this?
SELECT event_id, user_id
FROM (
SELECT event_id, user_id, row_number() OVER (PARTITION BY user_id) AS row_num
FROM EventUser WHERE user_id in (1,2)) AS ranked WHERE row_num <= n;
Here n can be whatever limit you need. This requires MySQL 8.0 or later, which added window functions.
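Since the question also asks about offsets for pagination, the row-number idea can be extended by filtering on a range instead of an upper bound. The sketch below assumes MySQL 8.0+ and a two-column EventUser table; the setup, the CTE name ranked, the descending sort order and the page size of 2 are all my assumptions, not taken from the question.

```sql
-- Hypothetical setup mirroring the question's rows
-- (the real table definition is not shown in the question).
CREATE TABLE EventUser (event_id INT, user_id INT);
INSERT INTO EventUser (event_id, user_id) VALUES
(3,1),(2,1),(1,1),(5,1),(4,1),(6,1),(4,2),(2,2),(1,2),(5,2);

-- Page 2 with a page size of 2: per-user rows 3 and 4,
-- ordered by highest event_id first.
WITH ranked AS (
    SELECT event_id,
           user_id,
           ROW_NUMBER() OVER (PARTITION BY user_id
                              ORDER BY event_id DESC) AS row_num
    FROM EventUser
    WHERE user_id IN (1, 2)
)
SELECT event_id, user_id
FROM ranked
WHERE row_num > 2      -- offset
  AND row_num <= 4     -- offset + page size
ORDER BY user_id, row_num;
```

With these rows, user 1 gets events 4 and 3 and user 2 gets events 2 and 1. An explicit ORDER BY inside OVER matters here, because without it the numbering within each user is arbitrary and pages may overlap.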
A late answer, but it may help: it uses a derived table and a cross join.
For the example in this post the query would be:
SELECT
@row_number:=CASE
WHEN @user_no = user_id
THEN
@row_number + 1
ELSE
1
END AS num,
@user_no:=user_id userid, event_id
FROM
EventUser,
(SELECT @user_no:=0,@row_number:=0) as t
group by user_id,event_id
having num < 3;

SQL JOIN : Prefix fields with table name

I have the following tables
CREATE TABLE `constraints` (
`id` int(11),
`name` varchar(64),
`type` varchar(64)
);
CREATE TABLE `groups` (
`id` int(11),
`name` varchar(64)
);
CREATE TABLE `constraints_to_group` (
`groupid` int(11),
`constraintid` int(11)
);
With the following data :
INSERT INTO `groups` (`id`, `name`) VALUES
(1, 'group1'),
(2, 'group2');
INSERT INTO `constraints` (`id`, `name`, `type`) VALUES
(1, 'cons1', 'eq'),
(2, 'cons2', 'inf');
INSERT INTO `constraints_to_group` (`groupid`, `constraintid`) VALUES
(1, 1),
(1, 2),
(2, 2);
I want to get all constraints for all groups, so I do the following :
SELECT groups.*, t.* FROM groups
LEFT JOIN
(SELECT * FROM constraints
LEFT JOIN constraints_to_group
ON constraints.id=constraints_to_group.constraintid) as t
ON t.groupid=groups.id
And get the following result :
id | name | id | name | type | groupid | constraintid
-----------------------------------------------------
1 | group1 | 1 | cons1 | eq | 1 | 1
1 | group1 | 2 | cons2 | inf | 1 | 2
2 | group2 | 2 | cons2 | inf | 2 | 2
What I'd like to get :
group_id | group_name | cons_id | cons_name | cons_type | groupid | constraintid
-------------------------------------------------------------------------------------
1 | group1 | 1 | cons1 | eq | 1 | 1
1 | group1 | 2 | cons2 | inf | 1 | 2
2 | group2 | 2 | cons2 | inf | 2 | 2
This is an example; in my real case the tables have many more columns, so writing SELECT groups.name as group_name, ... by hand would lead to queries that are very hard to maintain.
Try this way
SELECT groups.id as group_id, groups.name as group_name ,
t.id as cons_id, t.name as cons_name, t.type as cons_type,
a.groupid , a.constraintid
FROM constraints_to_group as a
JOIN groups on groups.id=a.groupid
JOIN constraints as t on t.id=a.constraintid
The only difference I see is the column names. For that, use AS aliases:
SELECT
groups.id AS group_id,
groups.name AS group_name,
t.id AS cons_id,
t.name AS cons_name,
t.groupid, t.constraintid
FROM groups
LEFT JOIN
(SELECT * FROM constraints
LEFT JOIN constraints_to_group
ON constraints.id=constraints_to_group.constraintid) as t
ON t.groupid=groups.id
Besides, a better join construction is:
SELECT G.id AS group_id,
G.name AS group_name,
C.id AS cons_id,
C.name AS cons_name,
CG.groupid, CG.constraintid
FROM constraints_to_group CG
LEFT JOIN constraints C
ON CG.constraintid = C.id
LEFT JOIN groups G
ON CG.groupid = G.id;
Possible duplicate of this issue
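As far as I know, MySQL has no syntax that prefixes every column of t.* automatically, so for wide tables one workaround is to generate the aliased column list from information_schema instead of typing it. This is only a sketch: the table_name_column_name prefix scheme and the select_list alias are my own choices (it yields groups_id rather than the question's group_id), and it assumes the tables live in the current database.

```sql
-- Builds a string such as
--   `constraints`.`id` AS `constraints_id`, ..., `groups`.`name` AS `groups_name`
-- that can be pasted into a SELECT or fed to PREPARE / EXECUTE.
SELECT GROUP_CONCAT(
           CONCAT('`', table_name, '`.`', column_name, '` AS `',
                  table_name, '_', column_name, '`')
           ORDER BY table_name, ordinal_position
           SEPARATOR ', '
       ) AS select_list
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name IN ('groups', 'constraints');
```

GROUP_CONCAT output is limited by group_concat_max_len (1024 bytes by default), so for tables with many columns that session variable may need to be raised before running this.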

MySQL select occurrences in two columns given third column value

I have a table like this:
Sender_id | Receiver_id | Topic
456 | 123 | 1
123 | 456 | 3
123 | 456 | 2
456 | 123 | 2
123 | 789 | 1
123 | 456 | 1
123 | 789 | 4
456 | 123 | 1
I want to know, for any given Topic, who the speakers involved in that conversation are (a speaker can be either sender or receiver).
E.g. for Topic 2, Mr. 123 and Mr. 456 are the only two speakers.
The desired result would be just the values 123 and 456:
Speakers | Topic
123 | 2
456 | 2
This is my first attempt, but I want to merge the two resulting columns:
SELECT *
FROM (
SELECT sender_id
FROM [table]
WHERE topic = '2'
AND sender_id !=0
GROUP BY sender_id) a
INNER JOIN (
SELECT receiver_id
FROM [table]
WHERE topic = '2'
AND receiver_id !=0
GROUP BY receiver_id) b
ON sender_id
You don't need a JOIN, you need a UNION:
SELECT sender_id Speakers
FROM [table]
WHERE topic = '2'
AND sender_id !=0
UNION
SELECT receiver_id
FROM [table]
WHERE topic = '2'
AND receiver_id !=0
You don't need to group to remove duplicate values as the UNION does that for you.
SELECT Sender_id as speakers, Topic from Table1 WHERE Topic ='2'
UNION
SELECT Receiver_id as speakers, Topic FROM Table1 WHERE Topic = '2'
Working fiddle http://sqlfiddle.com/#!9/4cf5e3/2
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,Sender_id INT NOT NULL
,Receiver_id INT NOT NULL
,Topic INT NOT NULL
);
INSERT INTO my_table (sender_id,receiver_id,topic) VALUES
(456,123,1),
(123,456,3),
(123,456,2),
(456,123,2),
(123,789,1),
(123,456,1),
(123,789,4),
(456,123,1);
SELECT speaker FROM
(
SELECT sender_id speaker,topic FROM my_table
UNION
SELECT receiver_id,topic FROM my_table
) x
WHERE topic = 2;
+---------+
| speaker |
+---------+
| 123 |
| 456 |
+---------+
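If the speakers are needed for every topic at once rather than for one given topic, the same UNION can be grouped by topic. A minimal sketch, assuming the my_table sample created above; collapsing each topic's speakers into one comma-separated column with GROUP_CONCAT is my own presentation choice.

```sql
-- UNION (not UNION ALL) removes duplicate (speaker, topic) pairs,
-- so each speaker appears once per topic before grouping.
SELECT topic,
       GROUP_CONCAT(speaker ORDER BY speaker) AS speakers
FROM (
    SELECT sender_id AS speaker, topic FROM my_table
    UNION
    SELECT receiver_id, topic FROM my_table
) x
GROUP BY topic
ORDER BY topic;
```

On the sample rows this returns 123,456,789 for topic 1, 123,456 for topics 2 and 3, and 123,789 for topic 4.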