I'm stuck on a rather complex query.
I'm looking to write a query that shows the "top five customers" as well as some key metrics (counts with conditions) about each of those customers. Each of the different metrics uses a totally different join structure.
+----+-----------+   +----+-------------+   +----+------------+
| customer       |   | metricn          |   | metricn_lineitem|
+----+-----------+   +----+-------------+   +----+------------+
| id | Name      |   | id | customer_id |   | id | metricn_id |
| 1  | Customer1 |   | 1  | 1           |   | 1  | 1          |
| 2  | Customer2 |   | 2  | 2           |   | 2  | 1          |
+----+-----------+   +----+-------------+   +----+------------+
The issue is that I always want to group by this customer table.
I first tried to put all of my joins into the original query, but performance was abysmal. I then tried using subqueries, but I couldn't get them to group by the original customer id.
Here's a sample query:
SELECT
customer.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = customer_id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = customer_id
) as metric_2
FROM customer
GROUP BY customer.name
SORT BY COUNT(metric1.id) DESC
LIMIT 5
Any advice? Thanks!
SELECT name, metric_1, metric_2
FROM customer AS c
LEFT JOIN (SELECT customer_id, COUNT(*) AS metric_1
FROM metric1 AS m
INNER JOIN metric1_lineitem AS l ON m.id = l.metric1_id
GROUP BY customer_id) m1
ON m1.customer_id = c.id
LEFT JOIN (SELECT customer_id, COUNT(*) AS metric_2
FROM metric2 AS m
INNER JOIN metric2_lineitem AS l ON m.id = l.metric2_id
GROUP BY customer_id) m2
ON m2.customer_id = c.id
ORDER BY metric_1 DESC
LIMIT 5
You should also avoid using COUNT(columnname) when you can use COUNT(*) instead. The former has to test every value to see if it's null.
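For anyone who wants to try the pre-aggregation idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite instead of MySQL, the sample rows, the third customer, and the cnt column name are all assumptions made for illustration. Each metric is counted once per customer in its own derived table, then joined to customer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metric1 (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE metric1_lineitem (id INTEGER PRIMARY KEY, metric1_id INTEGER);
    CREATE TABLE metric2 (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE metric2_lineitem (id INTEGER PRIMARY KEY, metric2_id INTEGER);

    -- Hypothetical sample rows; Customer3 has no metrics at all.
    INSERT INTO customer VALUES (1, 'Customer1'), (2, 'Customer2'), (3, 'Customer3');
    INSERT INTO metric1 VALUES (1, 1), (2, 2);
    INSERT INTO metric1_lineitem VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO metric2 VALUES (1, 2);
    INSERT INTO metric2_lineitem VALUES (1, 1), (2, 1), (3, 1);
""")

# Each derived table yields one row per customer, so the outer joins
# cannot multiply rows. COALESCE turns "no match" into a count of 0.
top_customers = conn.execute("""
    SELECT c.name,
           COALESCE(m1.cnt, 0) AS metric_1,
           COALESCE(m2.cnt, 0) AS metric_2
    FROM customer AS c
    LEFT JOIN (SELECT m.customer_id, COUNT(*) AS cnt
               FROM metric1 AS m
               JOIN metric1_lineitem AS l ON l.metric1_id = m.id
               GROUP BY m.customer_id) AS m1 ON m1.customer_id = c.id
    LEFT JOIN (SELECT m.customer_id, COUNT(*) AS cnt
               FROM metric2 AS m
               JOIN metric2_lineitem AS l ON l.metric2_id = m.id
               GROUP BY m.customer_id) AS m2 ON m2.customer_id = c.id
    ORDER BY metric_1 DESC, c.name
    LIMIT 5
""").fetchall()

print(top_customers)
```

With this sample data, Customer1 comes first with two metric1 line items, and Customer3 still appears with zeros because of the LEFT JOINs.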
Although your data structure may be lousy, your query may not be so bad, with two exceptions. I don't think you need the aggregation on the outer level. Also, the correlations in the WHERE clauses (such as metric1.customer_id = customer_id) are not doing anything, because the unqualified customer_id resolves to the local table. You need metric1.customer_id = c.id (the customer table's key column is id):
SELECT c.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN
metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = c.id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN
metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = c.id
) as metric_2
FROM customer c
ORDER BY metric_1 DESC
LIMIT 5;
How can you make this run faster? One way is to introduce indexes. I would recommend metric1(customer_id), metric2(customer_id), metric1_lineitem(metric1_id) and metric2_lineitem(metric2_id).
This may be faster than the aggregation method (proposed by Barmar) because MySQL is inefficient with aggregations. This should allow the aggregations to take place only using indexes instead of the base tables.
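To see why the unqualified customer_id matters, here is a small sketch with Python's built-in sqlite3 module. SQLite and the sample rows are assumptions for illustration; SQLite resolves the unqualified name to the innermost table in the same way described above. The first query compares metric1.customer_id with itself, so every customer gets the total count; the second is properly correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metric1 (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE metric1_lineitem (id INTEGER PRIMARY KEY, metric1_id INTEGER);

    -- Hypothetical rows: Customer1 owns two line items, Customer2 owns one.
    INSERT INTO customer VALUES (1, 'Customer1'), (2, 'Customer2');
    INSERT INTO metric1 VALUES (1, 1), (2, 2);
    INSERT INTO metric1_lineitem VALUES (1, 1), (2, 1), (3, 2);
""")

# Unqualified customer_id binds to metric1.customer_id (the innermost scope),
# so the WHERE clause is always true and the subquery is not correlated.
uncorrelated = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*)
            FROM metric1
            JOIN metric1_lineitem ON metric1_lineitem.metric1_id = metric1.id
            WHERE metric1.customer_id = customer_id) AS metric_1
    FROM customer AS c
    ORDER BY c.id
""").fetchall()

# Qualifying the outer column makes the subquery depend on the current customer.
correlated = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*)
            FROM metric1
            JOIN metric1_lineitem ON metric1_lineitem.metric1_id = metric1.id
            WHERE metric1.customer_id = c.id) AS metric_1
    FROM customer AS c
    ORDER BY c.id
""").fetchall()

print(uncorrelated)  # every customer shows the total number of line items
print(correlated)    # each customer shows only its own line items
```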
I have a test table set up as such: [image: Table Rows setup]
My objective is to get the count of departments that were established before the current department. My SQL is:
SELECT A.Department, IFNULL(COUNT(*), 0)
FROM Departments A
INNER JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
GROUP BY Department
ORDER BY COUNT(*);
I've tried both LEFT JOIN and INNER JOIN, but the department that was established first is never returned, I assume because its count is null. Despite the IFNULL, that department is not shown.
What am I doing wrong here?
I think that this is the query you need:
SELECT A.Department, COUNT(B.Department) AS cnt
FROM Departments A
LEFT JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
GROUP BY A.Department
ORDER BY 2;
See this db fiddle demo.
| Department | cnt |
| ----------------- | --- |
| Office Management | 0 |
| Business | 1 |
| Sales Management | 2 |
| ComputerScience | 3 |
| Liberal Arts | 4 |
| Farming | 4 |
| Communications | 6 |
| Digital Science | 7 |
NB: as commented by @fifonik, IFNULL is not needed, since COUNT already returns 0 when there are no matching records.
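Here is a rough sketch of why the join type and the counted expression both matter, using Python's built-in sqlite3 module. The three rows and their years are made up (the original table was only posted as an image), and SQLite stands in for MySQL. An INNER JOIN drops the earliest department, LEFT JOIN with COUNT(*) counts its all-NULL row as 1, and LEFT JOIN with COUNT(B.Department) gives the expected 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (Department TEXT, YearOfEstablishment INTEGER);
    -- Hypothetical years, chosen only for illustration.
    INSERT INTO Departments VALUES
        ('Office Management', 1990),
        ('Business', 1995),
        ('Sales Management', 2000);
""")

def earlier_counts(join_type, counted):
    """Run the self-join with the given join type and COUNT argument."""
    sql = f"""
        SELECT A.Department, COUNT({counted}) AS cnt
        FROM Departments A
        {join_type} JOIN Departments B
            ON B.YearOfEstablishment < A.YearOfEstablishment
        GROUP BY A.Department
        ORDER BY cnt, A.Department
    """
    return conn.execute(sql).fetchall()

inner_rows = earlier_counts("INNER", "*")             # earliest department vanishes
left_star = earlier_counts("LEFT", "*")               # earliest department counted as 1
left_column = earlier_counts("LEFT", "B.Department")  # earliest department counted as 0

print(inner_rows)
print(left_star)
print(left_column)
```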
In MySQL 8+, this is better done using RANK() or ROW_NUMBER():
SELECT d.Department,
ROW_NUMBER() OVER (ORDER BY YearOfEstablishment) - 1 as seqnum
FROM Departments d
ORDER BY seqnum;
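With no ties, this would be the same as your query. With ties, RANK() is the better fit, because it gives tied departments the same value, which matches the count of strictly earlier departments: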
SELECT d.Department,
RANK() OVER (ORDER BY YearOfEstablishment) - 1 as seqnum
FROM Departments d
ORDER BY seqnum;
This should be the count you are looking for.
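A quick way to check the tie behaviour is the sketch below, again with Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or newer) and uses made-up years with one deliberate tie. It compares RANK() - 1 with the self-join count, and shows that ROW_NUMBER() - 1 never repeats a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Departments (Department TEXT, YearOfEstablishment INTEGER);
    -- Hypothetical years; Liberal Arts and Farming tie on purpose.
    INSERT INTO Departments VALUES
        ('Office Management', 1990),
        ('Business', 1995),
        ('Liberal Arts', 2000),
        ('Farming', 2000),
        ('Communications', 2005);
""")

# RANK() - 1: number of departments with a strictly earlier year.
rank_counts = dict(conn.execute("""
    SELECT Department, RANK() OVER (ORDER BY YearOfEstablishment) - 1
    FROM Departments
"""))

# The same count computed with the LEFT JOIN approach.
join_counts = dict(conn.execute("""
    SELECT A.Department, COUNT(B.Department)
    FROM Departments A
    LEFT JOIN Departments B ON B.YearOfEstablishment < A.YearOfEstablishment
    GROUP BY A.Department
"""))

# ROW_NUMBER() - 1 breaks ties arbitrarily, so the values are always distinct.
row_numbers = sorted(n for (n,) in conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY YearOfEstablishment) - 1
    FROM Departments
"""))

print(rank_counts)
print(join_counts)
print(row_numbers)
```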
You can also do something like this:
SELECT
A.Department
, A.YearOfEstablishment
, COUNT(*) - 1
FROM
Departments A
INNER JOIN Departments B ON (
A.id = B.id
OR B.YearOfEstablishment < A.YearOfEstablishment
)
GROUP BY
A.Department
, A.YearOfEstablishment
ORDER BY
COUNT(*);
I've been trying for two days, without luck.
I have the following simplified tables in my database:
customers:
| id | name |
| 1 | andrea |
| 2 | marco |
| 3 | giovanni |
access:
| id | name_id | date |
| 1 | 1 | 5000 |
| 2 | 1 | 4000 |
| 3 | 2 | 1500 |
| 4 | 2 | 3000 |
| 5 | 2 | 1000 |
| 6 | 3 | 6000 |
| 7 | 3 | 2000 |
I want to return all the names with their last access date.
At first I tried simply with
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id
But I got 7 rows instead of 3 as expected. So I understood that I need to use a GROUP BY statement, as follows:
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id GROUP BY customers.id
As far as I know, GROUP BY picks an arbitrary row from each group. In fact, I got unordered access dates in several tests.
Instead, I need to group every customer id with its corresponding latest access! How can this be done?
You have to get the latest date from the access table with a GROUP BY on name_id, then join this result with the customers table. Here is the query:
select c.id, c.name, a.last_access_date from customers c left join
(select name_id, max(date) as last_access_date from access group by name_id) a
on c.id=a.name_id;
Here is a DEMO on sqlfiddle.
I think this is what you'd like to achieve:
SELECT c.id, c.name, max(a.date) last_access
FROM customers c
LEFT JOIN access a ON c.id = a.name_id
GROUP BY c.id, c.name
The LEFT JOIN will return all entries in the customers table regardless of whether the join criterion (c.id = a.name_id) is satisfied. This means that a customer with no access rows gets NULL as last_access.
Example:
Simply add a new row in the customers table (id: 4, name: manuela). The output will have 4 rows and the newest row will be (id: 4, last_access: null)
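The example above can be tried with the sketch below, which uses Python's built-in sqlite3 module (SQLite instead of MySQL is an assumption) and the rows from the question plus the extra manuela row. It runs both the MAX() with GROUP BY query and the derived-table variant and checks that they agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE access (id INTEGER PRIMARY KEY, name_id INTEGER, date INTEGER);

    INSERT INTO customers VALUES
        (1, 'andrea'), (2, 'marco'), (3, 'giovanni'), (4, 'manuela');
    INSERT INTO access VALUES
        (1, 1, 5000), (2, 1, 4000), (3, 2, 1500), (4, 2, 3000),
        (5, 2, 1000), (6, 3, 6000), (7, 3, 2000);
""")

# Variant 1: join first, then aggregate per customer.
grouped = conn.execute("""
    SELECT c.id, c.name, MAX(a.date) AS last_access
    FROM customers c
    LEFT JOIN access a ON c.id = a.name_id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Variant 2: aggregate access first, then join the one-row-per-customer result.
derived = conn.execute("""
    SELECT c.id, c.name, a.last_access
    FROM customers c
    LEFT JOIN (SELECT name_id, MAX(date) AS last_access
               FROM access
               GROUP BY name_id) a ON c.id = a.name_id
    ORDER BY c.id
""").fetchall()

print(grouped)  # manuela has no access rows, so her last_access is None
```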
I would do this using a correlated subquery in the ON clause:
SELECT a.*, c.*
FROM customers c LEFT JOIN
access a
ON c.id = a.name_id AND
a.DATE = (SELECT MAX(a2.date) FROM access a2 WHERE a2.name_id = a.name_id);
If this statement is true:
I need to group every customer id with its corresponding latest access! How can this be done?
Then you can simply do:
select a.name_id, max(a.date)
from access a
group by a.name_id;
You do not need the customers table because:
All customers are in access, so the left join is not necessary.
You need no columns from customers.
I have two tables: a cost table and a payment table. The cost table contains the cost of each product along with the product name.
Cost Table
id | cost | name
1 | 100 | A
2 | 200 | B
3 | 200 | A
Payment Table
pid | amount | costID
1 | 10 | 1
2 | 20 | 1
3 | 30 | 2
4 | 50 | 1
Now I have to sum the cost for rows with the same name, and also sum the payment amounts by costID, like the result below
totalTable
name | sum(cost) | sum(amount) |
A | 300 | 80 |
B | 200 | 30 |
However I have been working my way around this using the query below but I think I am doing it very wrong.
SELECT
b.name,
b.sum(cost),
a.sum(amount)
FROM
`Payment Table` a
LEFT JOIN
`Cost Table` b
ON
b.id=a.costID
GROUP by b.name,a.costID
I would be grateful if somebody would help me with my queries or better still an idea as to how to go about it. Thank you
This should work:
select t2.name, sum(t2.cost), coalesce(sum(t1.amount), 0) as amount
from (
select id, name, sum(cost) as cost
from `Cost`
group by id, name
) t2
left join (
select costID, sum(amount) as amount
from `Payment`
group by CostID
) t1 on t2.id = t1.costID
group by t2.name
SQLFiddle
You need to do each calculation in a separate query and then join them together.
The first one is straightforward.
For the second one, you need to get the name associated with each payment based on its costID.
SQL Fiddle Demo
SELECT C.`name`, C.`sum_cost`, COALESCE(P.`sum_amount`,0 ) as `sum_amount`
FROM (
SELECT `name`, SUM(`cost`) as `sum_cost`
FROM `Cost`
GROUP BY `name`
) C
LEFT JOIN (
SELECT `Cost`.`name`, SUM(`Payment`.`amount`) as `sum_amount`
FROM `Payment`
JOIN `Cost`
ON `Payment`.`costID` = `Cost`.`id`
GROUP BY `Cost`.`name`
) P
ON C.`name` = P.`name`
OUTPUT
| name | sum_cost | sum_amount |
|------|----------|------------|
| A | 300 | 80 |
| B | 200 | 30 |
A couple of issues. For one thing, the column references should be qualified, not the aggregate functions.
This is invalid:
table_alias.SUM(column_name)
Should be:
SUM(table_alias.column_name)
This query should return the first two columns you are looking for:
SELECT c.name AS `name`
, SUM(c.cost) AS `sum(cost)`
FROM `Cost Table` c
GROUP BY c.name
ORDER BY c.name
When you introduce a join to another table, like Payment Table, where costid is not UNIQUE, you have the potential to produce a (partial) Cartesian product.
To see what that looks like, to see what's happening, remove the GROUP BY and the aggregate SUM() functions, and take a look at the detail rows returned by a query with the join operation.
SELECT c.id AS `c.id`
, c.cost AS `c.cost`
, c.name AS `c.name`
, p.pid AS `p.pid`
, p.amount AS `p.amount`
, p.costid AS `p.costid`
FROM `Cost Table` c
LEFT
JOIN `Payment Table` p
ON p.costid = c.id
ORDER BY c.id, p.pid
That's going to return:
c.id | c.cost | c.name | p.pid | p.amount | p.costid
1 | 100 | A | 1 | 10 | 1
1 | 100 | A | 2 | 20 | 1
1 | 100 | A | 4 | 50 | 1
2 | 200 | B | 3 | 30 | 2
3 | 200 | A | NULL | NULL | NULL
Notice that we are getting three copies of the id=1 row from Cost Table.
So, if we modified that query, adding a GROUP BY c.name, and wrapping c.cost in a SUM() aggregate, we're going to get an inflated value for total cost.
To avoid that, we can aggregate the amount from the Payment Table, so we get only one row for each costid. Then when we do the join operation, we won't be producing duplicate copies of rows from Cost.
Here's a query to aggregate the total amount from the Payment Table, so we get a single row for each costid.
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
That would return:
costid | tot_amount
1 | 80
2 | 30
We can use the results from that query as if it were a table, by making that query an "inline view". In this example, we assign an alias of v to the query results. (In the MySQL vernacular, an "inline view" is called a "derived table".)
SELECT c.name AS `name`
, SUM(c.cost) AS `sum_cost`
, IFNULL(SUM(v.tot_amount),0) AS `sum_amount`
FROM `Cost Table` c
LEFT
JOIN ( -- inline view to return total amount by costid
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
) v
ON v.costid = c.id
GROUP BY c.name
ORDER BY c.name
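The inflation described above can be reproduced with the sketch below. It uses Python's built-in sqlite3 module and the rows from the question; SQLite and the table names cost and payment (without the spaces) are assumptions for illustration. The naive join counts the id=1 cost three times, while the inline view keeps one row per costid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table names shortened to cost/payment for this sketch (an assumption).
    CREATE TABLE cost (id INTEGER PRIMARY KEY, cost INTEGER, name TEXT);
    CREATE TABLE payment (pid INTEGER PRIMARY KEY, amount INTEGER, costID INTEGER);

    INSERT INTO cost VALUES (1, 100, 'A'), (2, 200, 'B'), (3, 200, 'A');
    INSERT INTO payment VALUES (1, 10, 1), (2, 20, 1), (3, 30, 2), (4, 50, 1);
""")

# Naive version: cost id=1 is repeated once per matching payment row,
# so SUM(c.cost) for name A is inflated.
naive = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(p.amount), 0)
    FROM cost c
    LEFT JOIN payment p ON p.costID = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# Pre-aggregated version: the inline view returns one row per costID,
# so joining it cannot duplicate rows from cost.
preaggregated = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(v.tot_amount), 0)
    FROM cost c
    LEFT JOIN (SELECT p.costID, SUM(p.amount) AS tot_amount
               FROM payment p
               GROUP BY p.costID) v ON v.costID = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(naive)          # cost for A is too high
print(preaggregated)  # matches the totals asked for in the question
```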
I have these three tables in my database:
tblCustomer (id,name,address)
tblLoan (id,customerId,LoanAmount,date)
tblPayment (id,customerId,ReceivedAmount,date)
I want to find the total loanAmount for a customer and how much they have paid.
I wrote this query:
SELECT c.fname, SUM(l.amount), SUM(p.amount)
FROM tblCustomer c
JOIN tblLoan l ON (l.customerId = c.id)
JOIN tblPayment p ON (p.customerId = c.id)
WHERE c.id = 3;
It returns results but they are incorrect.
First, as others have mentioned, your syntax is likely incorrect because you do not have matching column names, but you said you got incorrect results, so I assume that's not your problem, since you were able to run your query.
The problem that I think you are most likely having is that by joining the two tables together like that, rows appear twice for each customer. Am I correct in assuming that your 'incorrect' results are double what you would expect? Let me illustrate for those who don't understand. Consider this data set, with shortened column values:
tblCustomer:
| id | name |
+----+------+
| 1 | Adam |
| 2 | John |
| 3 | Jane |
tblLoan, and for simplicity we'll say the payment table looks the same:
| customerID | loanAmount |
+------------+------------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
| 3 | 300 |
| 2 | 200 |
If I perform the following query (without summing values, just getting the values I want):
SELECT c.id, c.name, l.loanAmount, p.receivedAmount
FROM tblCustomer c
JOIN tblLoan l ON l.customerid = c.id
JOIN tblPayment p ON p.customerid = c.id
WHERE c.id = 3;
It returns this result set:
| id | name | loanAmount | receivedAmount |
+----+------+------------+----------------+
| 3 | Jane | 300 | 300 |
| 3 | Jane | 300 | 300 |
| 3 | Jane | 300 | 300 |
| 3 | Jane | 300 | 300 |
So notice that because we're joining two tables based on a relationship to a third table, we're actually creating a Cartesian product, which is causing the problem. So, what I recommend you do is use subqueries for these two tables. One subquery will pull the loan values, one the payment values, and you can join those together on the id value.
It will look like this:
SELECT t.id, t.totalLoan, w.totalReceived
FROM(SELECT c.id, SUM(l.loanAmount) AS totalLoan
FROM tblCustomer c
JOIN tblLoan l ON l.customerid = c.id
WHERE c.id = 3) t
JOIN(SELECT c.id, SUM(p.receivedAmount) AS totalReceived
FROM tblCustomer c
JOIN tblPayment p ON p.customerid = c.id
WHERE c.id = 3) w
ON t.id = w.id;
And this should give you the values you want. Here is what I tested on SQL Fiddle.
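For anyone without SQL Fiddle at hand, here is a sketch of the same effect with Python's built-in sqlite3 module, using the sample rows from this answer. SQLite and the GROUP BY customerId form of the subqueries are assumptions for illustration, not the exact query above. Jane has two loans and two payments, so the naive double join produces four rows and doubles both sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblCustomer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tblLoan (customerId INTEGER, LoanAmount INTEGER);
    CREATE TABLE tblPayment (customerId INTEGER, ReceivedAmount INTEGER);

    INSERT INTO tblCustomer VALUES (1, 'Adam'), (2, 'John'), (3, 'Jane');
    INSERT INTO tblLoan VALUES (1, 100), (2, 200), (3, 300), (3, 300), (2, 200);
    -- As in the answer, the payment table is assumed to hold the same rows.
    INSERT INTO tblPayment VALUES (1, 100), (2, 200), (3, 300), (3, 300), (2, 200);
""")

# Naive: every loan row pairs with every payment row of the same customer.
naive = conn.execute("""
    SELECT c.name, SUM(l.LoanAmount), SUM(p.ReceivedAmount)
    FROM tblCustomer c
    JOIN tblLoan l ON l.customerId = c.id
    JOIN tblPayment p ON p.customerId = c.id
    WHERE c.id = 3
    GROUP BY c.id, c.name
""").fetchone()

# Pre-aggregated: one row per customer from each subquery, then join.
fixed = conn.execute("""
    SELECT c.name, l.totalLoan, p.totalReceived
    FROM tblCustomer c
    LEFT JOIN (SELECT customerId, SUM(LoanAmount) AS totalLoan
               FROM tblLoan GROUP BY customerId) l ON l.customerId = c.id
    LEFT JOIN (SELECT customerId, SUM(ReceivedAmount) AS totalReceived
               FROM tblPayment GROUP BY customerId) p ON p.customerId = c.id
    WHERE c.id = 3
""").fetchone()

print(naive)  # both sums are double the real totals
print(fixed)
```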
FYI, YOUR COLUMN NAMES ARE WRONG!!!
There is no such column named fname in table tblCustomer
There is no such column named amount in table tblLoan
There is no such column named amount in table tblPayment
You won't get the right result if you don't have the appropriate column names. Even when using aliases, your column name should be EXACTLY THE SAME as in your database table. That's because you are aliasing TABLES in JOIN queries, not COLUMNS.
So, re-write your query in the following way:
SELECT c.name, SUM(l.LoanAmount), SUM(p.ReceivedAmount)
FROM tblCustomer c
JOIN tblLoan l ON l.customerId = c.id
JOIN tblPayment p ON p.customerId = c.id
WHERE c.id = 3
Note that there's no need to get brackets around the ON clause in JOIN.
I have 2 tables in my database that look like so:
clients
+-------------+
| id | sms |
|------+------|
| 1 | 0 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
+------+------+
clients_lists_relationships
+----------------------+
| listid | clientid |
|----------+-----------|
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 3 | 1 |
+----------+-----------+
Now what I'm trying to do is get a list of clients who are in a bunch of lists. I do that like so:
SELECT c.id,
l.*
FROM clients AS c,
clients_lists_relationships AS l
WHERE c.id = l.clientid
AND c.sms = '1'
AND ( l.listid = '1'
OR l.listid = '2' );
This does give me a list of the clients that I need. But because a client can be in more than one list I get the same client more than once. How would I limit this to only one row for each client no matter how many lists they are in?
If you just need any client that is in a list, you can just query the relationship table:
SELECT DISTINCT(clientid) FROM clients_lists_relationships
You can also use that distinct on your combined query, but be aware that the "listid" you'll get is just one.
Use GROUP BY:
SELECT c.id,
l.listid
FROM clients c
INNER JOIN clients_lists_relationships l
ON c.id = l.clientid
WHERE c.sms = 1
AND l.listid IN (1,2)
GROUP BY c.id
Note that by doing this you lose information on which lists the client was a member of. This means that you should probably not select anything from clients_lists_relationships, as this information is either redundant (clientid) or incomplete (listid).
First of all, take a look at the MySQL JOIN documentation. It's much better than the WHERE-style joins you use now.
I think you are looking for GROUP BY.
In total, the query looks like:
SELECT
c.id,
l.*
FROM
clients AS c
INNER JOIN
clients_lists_relationships AS l
ON
l.clientid = c.id
AND
c.sms = '1'
AND
( l.listid = '1'
OR l.listid = '2' )
GROUP BY
c.id
To return just the clients participating in more than one list, consider using the HAVING clause:
SELECT c.id
FROM clients c
INNER JOIN clients_lists_relationships l
ON l.clientid = c.id
WHERE c.sms = 1
GROUP BY c.id
HAVING COUNT(l.listid) > 1
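The suggestions in this thread can be compared side by side with the sketch below, which uses Python's built-in sqlite3 module and the rows from the question (SQLite instead of MySQL is an assumption). The first and last queries drop the sms filter on purpose, only to make the duplicate rows and the multi-list client visible in this small data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, sms INTEGER);
    CREATE TABLE clients_lists_relationships (listid INTEGER, clientid INTEGER);

    INSERT INTO clients VALUES (1, 0), (2, 1), (3, 1), (4, 1);
    INSERT INTO clients_lists_relationships VALUES (1, 1), (1, 2), (2, 1), (3, 1);
""")

# Plain join (no sms filter): client 1 appears once per matching list.
joined = conn.execute("""
    SELECT c.id, l.listid
    FROM clients c
    JOIN clients_lists_relationships l ON l.clientid = c.id
    WHERE l.listid IN (1, 2)
    ORDER BY c.id, l.listid
""").fetchall()

# One row per client: DISTINCT on the client id, with the sms filter applied.
sms_clients = [row[0] for row in conn.execute("""
    SELECT DISTINCT c.id
    FROM clients c
    JOIN clients_lists_relationships l ON l.clientid = c.id
    WHERE c.sms = 1 AND l.listid IN (1, 2)
    ORDER BY c.id
""")]

# Clients in more than one list: GROUP BY must come before HAVING.
multi_list = [row[0] for row in conn.execute("""
    SELECT l.clientid
    FROM clients_lists_relationships l
    GROUP BY l.clientid
    HAVING COUNT(l.listid) > 1
""")]

print(joined)
print(sms_clients)
print(multi_list)
```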