I am trying the following query:
SELECT b.name AS batch_name, b.id AS batch_id,
COUNT( s.id ) AS total_students,
COALESCE( sum(s.open_bal), 0 ) AS open_balance,
sum( COALESCE(i.reg_fee,0) + COALESCE(i.tut_fee,0) + COALESCE(i.other_fee,0) ) AS gross_fee
FROM batches b
LEFT JOIN students s on s.batch = b.id
LEFT JOIN invoices i on i.student_id = s.id
GROUP BY b.name, b.id;
Result set:
| batch_name | batch_id | total_students | open_balance | gross_fee |
+------------+-----------+----------------+--------------+-----------+
| ba | 11 | 44 | 0 | 1782750 |
+------------+-----------+----------------+--------------+-----------+
But it is giving unexpected results. If I remove sum( COALESCE(i.reg_fee,0) + COALESCE(i.tut_fee,0) + COALESCE(i.other_fee,0) ) AS gross_fee and LEFT JOIN invoices i on i.student_id = s.id, it gives the expected, correct results, as follows:
| batch_name | batch_id | total_students | open_balance | gross_fee |
+------------+-----------+----------------+--------------+-----------+
| ba | 11 | 34 | 0 | 0 |
+------------+-----------+----------------+--------------+-----------+
I am sure I am doing something wrong. I have been trying every option for the last hour, please help.
I assume your question is something like:
Why does COUNT(s.id) return 44 in the first query and 34 in the second query, and how can I make it count 34 students while I sum the invoices in the same query?
You have multiple invoices for some of your students, and the join results in multiple rows with the same s.id. When you count them, it counts each of these multiple rows.
You should use COUNT(DISTINCT s.id) to make the query count each student id only once, even when it appears multiple times as a consequence of the join to invoices.
Re your question about what to change, just change COUNT(s.id) to COUNT(DISTINCT s.id). The rest of the query looks fine, if I have a correct understanding of what you want it to do.
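To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The tiny data set is made up for illustration (two students, one of whom has two invoices), and only tut_fee is kept from the fee columns. It compares COUNT(s.id) with COUNT(DISTINCT s.id) on the same join.

```python
import sqlite3

# Hypothetical miniature of the batches/students/invoices schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batches  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY, batch INTEGER, open_bal INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, student_id INTEGER, tut_fee INTEGER);
    INSERT INTO batches  VALUES (11, 'ba');
    INSERT INTO students VALUES (1, 11, 50), (2, 11, 0);
    -- student 1 has two invoices, student 2 has none
    INSERT INTO invoices VALUES (1, 1, 100), (2, 1, 200);
""")

row = conn.execute("""
    SELECT COUNT(s.id)                  AS joined_rows,
           COUNT(DISTINCT s.id)         AS total_students,
           COALESCE(SUM(s.open_bal), 0) AS open_balance,
           COALESCE(SUM(i.tut_fee), 0)  AS gross_fee
    FROM batches b
    LEFT JOIN students s ON s.batch = b.id
    LEFT JOIN invoices i ON i.student_id = s.id
    GROUP BY b.name, b.id
""").fetchone()

joined_rows, total_students, open_balance, gross_fee = row
print(joined_rows, total_students, open_balance, gross_fee)  # 3 2 100 300
```

In this toy data the student with two invoices is counted twice by COUNT(s.id) but once by COUNT(DISTINCT s.id). Note that SUM(s.open_bal) is also repeated once per invoice row here (100 instead of 50), so if opening balances are non-zero in the real data it may be worth checking that column as well.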
Related
I have 3 tables:
Users:
id | account_name
-------------------|----------------------|
18 | panic |
Deposits:
id | user_id | amount
-------------------|---------------------------|
1 | 18 | 100
2 | 18 | 100
Withdrawals:
id | user_id | amount
------------------------|--------------------------------|
1 | 18 | 200
2 | 18 | 200
and I'm trying to get a result like:
id | totalDeposits | totalWithdraws
------------------------|---------------------------|
18 | 200 | 400
Now when I try to get the totals, for some reason the deposits and withdrawals are cross-adding into each other's sums. Of course, if there are no rows it should return 0.
SELECT t0.id,IFNULL(SUM(t1.amount),0) AS totalWithdrawals,
IFNULL(SUM(t2.amount),0) AS totalDeposits
FROM users t0
LEFT OUTER JOIN withdrawals t1 ON (t0.id = t1.user_id)
LEFT OUTER JOIN deposits t2 ON (t0.id = t2.user_id)
GROUP BY t0.id
Any idea how to do this cross join, or where I am summing them wrong?
Try this-
SELECT A.id,
(SELECT SUM(amount) FROM Deposits WHERE user_id = A.id) totalDeposits,
(SELECT SUM(amount) FROM Withdrawals WHERE user_id = A.id) totalWithdraws
FROM users A
WHERE A.id = 18 -- WHERE can be removed to get all users details
You can try something along the lines of
SELECT u.id,
COALESCE(d.amount, 0) totalDeposits,
COALESCE(w.amount, 0) totalWithdrawals
FROM users u
LEFT JOIN (
SELECT user_id, SUM(amount) amount
FROM deposits
GROUP BY user_id
) d ON u.id = d.user_id
LEFT JOIN (
SELECT user_id, SUM(amount) amount
FROM withdrawals
GROUP BY user_id
) w ON u.id = w.user_id
Result:
| id | totalDeposits | totalWithdrawals |
|----|---------------|------------------|
| 18 | 200 | 400 |
The problem is that you are generating a Cartesian product. One solution is to aggregate first. Another method is to use UNION ALL and GROUP BY. I would structure this as:
SELECT u.id,
SUM(deposit) as deposits,
SUM(withdrawal) as withdrawal
FROM users u LEFT JOIN
((SELECT d.user_id, d.amount as deposit, 0 as withdrawal
FROM deposits d
) UNION ALL
(SELECT w.user_id, 0, w.amount
FROM withdrawals w
)
) dw
ON u.id = dw.user_id
GROUP BY u.id;
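The Cartesian product can be reproduced with the sample rows from the question. The following is a sketch in Python's sqlite3 (not MySQL); the second user, no_activity, is a made-up row added to exercise the "no rows should give 0" case, and the COALESCE calls around the UNION ALL sums are an addition for that same reason.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE deposits    (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    CREATE TABLE withdrawals (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    -- user 19 is hypothetical: a user with no deposits or withdrawals
    INSERT INTO users       VALUES (18, 'panic'), (19, 'no_activity');
    INSERT INTO deposits    VALUES (1, 18, 100), (2, 18, 100);
    INSERT INTO withdrawals VALUES (1, 18, 200), (2, 18, 200);
""")

# Joining both detail tables directly pairs every deposit with every withdrawal.
naive = conn.execute("""
    SELECT u.id, COALESCE(SUM(d.amount), 0), COALESCE(SUM(w.amount), 0)
    FROM users u
    LEFT JOIN deposits d    ON d.user_id = u.id
    LEFT JOIN withdrawals w ON w.user_id = u.id
    GROUP BY u.id ORDER BY u.id
""").fetchall()

# Stacking the two tables with UNION ALL keeps one row per deposit or withdrawal.
union_all = conn.execute("""
    SELECT u.id, COALESCE(SUM(dw.deposit), 0), COALESCE(SUM(dw.withdrawal), 0)
    FROM users u
    LEFT JOIN (
        SELECT user_id, amount AS deposit, 0 AS withdrawal FROM deposits
        UNION ALL
        SELECT user_id, 0, amount FROM withdrawals
    ) dw ON dw.user_id = u.id
    GROUP BY u.id ORDER BY u.id
""").fetchall()

print(naive)      # [(18, 400, 800), (19, 0, 0)]
print(union_all)  # [(18, 200, 400), (19, 0, 0)]
```

With two deposits and two withdrawals the direct join yields four rows for user 18, which is why both sums double.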
I have two tables, one is the cost table and the other is the payment table, the cost table contains the cost of product with the product name.
Cost Table
id | cost | name
1 | 100 | A
2 | 200 | B
3 | 200 | A
Payment Table
pid | amount | costID
1 | 10 | 1
2 | 20 | 1
3 | 30 | 2
4 | 50 | 1
Now I have to sum the total cost for rows with the same name, and also sum the total amount of payments by costID, as in the table below:
totalTable
name | sum(cost) | sum(amount) |
A | 300 | 80 |
B | 200 | 30 |
However I have been working my way around this using the query below but I think I am doing it very wrong.
SELECT
b.name,
b.sum(cost),
a.sum(amount)
FROM
`Payment Table` a
LEFT JOIN
`Cost Table` b
ON
b.id=a.costID
GROUP by b.name,a.costID
I would be grateful if somebody would help me with my queries or better still an idea as to how to go about it. Thank you
This should work:
select t2.name, sum(t2.cost), coalesce(sum(t1.amount), 0) as amount
from (
select id, name, sum(cost) as cost
from `Cost`
group by id, name
) t2
left join (
select costID, sum(amount) as amount
from `Payment`
group by CostID
) t1 on t2.id = t1.costID
group by t2.name
You need to do each calculation in a separate query and then join them together.
The first one is straightforward.
For the second one, you need to get the name associated with each payment based on its costID.
SELECT C.`name`, C.`sum_cost`, COALESCE(P.`sum_amount`,0 ) as `sum_amount`
FROM (
SELECT `name`, SUM(`cost`) as `sum_cost`
FROM `Cost`
GROUP BY `name`
) C
LEFT JOIN (
SELECT `Cost`.`name`, SUM(`Payment`.`amount`) as `sum_amount`
FROM `Payment`
JOIN `Cost`
ON `Payment`.`costID` = `Cost`.`id`
GROUP BY `Cost`.`name`
) P
ON C.`name` = P.`name`
OUTPUT
| name | sum_cost | sum_amount |
|------|----------|------------|
| A | 300 | 80 |
| B | 200 | 30 |
A couple of issues. For one thing, the column references should be qualified, not the aggregate functions.
This is invalid:
table_alias.SUM(column_name)
Should be:
SUM(table_alias.column_name)
This query should return the first two columns you are looking for:
SELECT c.name AS `name`
, SUM(c.cost) AS `sum(cost)`
FROM `Cost Table` c
GROUP BY c.name
ORDER BY c.name
When you introduce a join to another table, like Payment Table, where costid is not UNIQUE, you have the potential to produce a (partial) Cartesian product.
To see what that looks like, to see what's happening, remove the GROUP BY and the aggregate SUM() functions, and take a look at the detail rows returned by a query with the join operation.
SELECT c.id AS `c.id`
, c.cost AS `c.cost`
, c.name AS `c.name`
, p.pid AS `p.pid`
, p.amount AS `p.amount`
, p.costid AS `p.costid`
FROM `Cost Table` c
LEFT
JOIN `Payment Table` p
ON p.costid = c.id
ORDER BY c.id, p.pid
That's going to return:
c.id | c.cost | c.name | p.pid | p.amount | p.costid
1 | 100 | A | 1 | 10 | 1
1 | 100 | A | 2 | 20 | 1
1 | 100 | A | 4 | 50 | 1
2 | 200 | B | 3 | 30 | 2
3 | 200 | A | NULL | NULL | NULL
Notice that we are getting three copies of the id=1 row from Cost Table.
So, if we modified that query, adding a GROUP BY c.name, and wrapping c.cost in a SUM() aggregate, we're going to get an inflated value for total cost.
To avoid that, we can aggregate the amount from the Payment Table, so we get only one row for each costid. Then when we do the join operation, we won't be producing duplicate copies of rows from Cost.
Here's a query to aggregate the total amount from the Payment Table, so we get a single row for each costid.
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
That would return:
costid | tot_amount
1 | 80
2 | 30
We can use the results from that query as if it were a table, by making that query an "inline view". In this example, we assign an alias of v to the query results. (In the MySQL vernacular, an "inline view" is called a "derived table".)
SELECT c.name AS `name`
, SUM(c.cost) AS `sum_cost`
, IFNULL(SUM(v.tot_amount),0) AS `sum_amount`
FROM `Cost Table` c
LEFT
JOIN ( -- inline view to return total amount by costid
SELECT p.costid
, SUM(p.amount) AS tot_amount
FROM `Payment Table` p
GROUP BY p.costid
ORDER BY p.costid
) v
ON v.costid = c.id
GROUP BY c.name
ORDER BY c.name
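As a quick check of the walkthrough above, here is a sketch in Python's sqlite3 (not MySQL) that loads the sample rows and compares the direct join with the pre-aggregated join. The table names cost and payment are simplified stand-ins for `Cost Table` and `Payment Table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cost/payment are simplified stand-ins for `Cost Table` / `Payment Table`
    CREATE TABLE cost    (id INTEGER PRIMARY KEY, cost INTEGER, name TEXT);
    CREATE TABLE payment (pid INTEGER PRIMARY KEY, amount INTEGER, costID INTEGER);
    INSERT INTO cost    VALUES (1, 100, 'A'), (2, 200, 'B'), (3, 200, 'A');
    INSERT INTO payment VALUES (1, 10, 1), (2, 20, 1), (3, 30, 2), (4, 50, 1);
""")

# Direct join: cost row id=1 is repeated once per payment, inflating SUM(c.cost).
inflated = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(p.amount), 0)
    FROM cost c
    LEFT JOIN payment p ON p.costID = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Pre-aggregated join: the derived table v has at most one row per costID.
correct = conn.execute("""
    SELECT c.name, SUM(c.cost), COALESCE(SUM(v.tot_amount), 0)
    FROM cost c
    LEFT JOIN (
        SELECT costID, SUM(amount) AS tot_amount
        FROM payment
        GROUP BY costID
    ) v ON v.costID = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

print(inflated)  # [('A', 500, 80), ('B', 200, 30)]
print(correct)   # [('A', 300, 80), ('B', 200, 30)]
```

The payment totals happen to be right in both versions. Only the cost side is inflated, because it is the cost rows that get duplicated by the join.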
I'm trying to count two different things from two different tables with one query.
My problem is that one of my joins affects the count from the other join.
I want each join to be counted independently of the other.
Here is the query:
SELECT score.score, u.user_name,
COUNT(mrank.user_id) as rank, COUNT(cr.id) as completedChallenges
FROM user u
LEFT OUTER JOIN challenges_score_user_rel score
ON score.user_id = u.id AND score.challenge_group_id = 0
LEFT OUTER JOIN challenges_score_user_rel mrank
ON mrank.score >= score.score AND mrank.challenge_group_id = 0 AND (SELECT forGym.gym
FROM user forGym WHERE forGym.id = mrank.user_id) = 22
LEFT OUTER JOIN challenges_requests cr
ON u.id = cr.receiver AND cr.status = 3
WHERE u.gym = 22 AND score.score IS NOT NULL GROUP BY u.id ORDER BY score.score DESC LIMIT 20
+-------+-----------+------+---------------------+
| score | user_name | rank | completedChallenges |
+-------+-----------+------+---------------------+
| 999   | A         | 3    | 3                   |
| 155   | B         | 2    | 0                   |
| 130   | C         | 3    | 0                   |
| 24    | D         | 4    | 0                   |
+-------+-----------+------+---------------------+
As you can see from the results, user A is in first place but got rank 3.
The rank should be the user's position in the list ordered by score.
The count for the rank is from this join:
LEFT OUTER JOIN challenges_score_user_rel mrank
ON mrank.score >= score.score AND mrank.challenge_group_id = 0 AND (SELECT forGym.gym
FROM user forGym WHERE forGym.id = mrank.user_id) = 22
If I remove this join:
LEFT OUTER JOIN challenges_requests cr
ON u.id = cr.receiver AND cr.status = 3
The count is fine and I get the correct rank for the all the users.
Why do the two joins affect each other, and how can I make each one count on its own?
The easiest way to solve this is to use count(distinct):
SELECT score.score, u.user_name,
COUNT(distinct mrank.user_id) as rank,
COUNT(distinct cr.id) as completedChallenges
The problem is that you are getting a cartesian product for each user. If the numbers are large, then this is not the best performing solution. In that case, you want to pre-aggregate the data in the from clause or use a correlated subquery in the SELECT clause.
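The correlated-subquery alternative mentioned above could look roughly like this. It is a sketch in Python's sqlite3 with a heavily simplified, hypothetical schema: the gym and challenge_group_id filters are dropped, the score table stands in for challenges_score_user_rel, and the alias user_rank replaces rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified stand-ins for the tables in the question
    CREATE TABLE user  (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE score (user_id INTEGER, score INTEGER);
    CREATE TABLE challenges_requests (id INTEGER PRIMARY KEY, receiver INTEGER, status INTEGER);
    INSERT INTO user  VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO score VALUES (1, 999), (2, 155), (3, 130);
    INSERT INTO challenges_requests VALUES (1, 1, 3), (2, 1, 3), (3, 1, 3);
""")

# Two independent joins multiply each other's rows before COUNT runs.
two_joins = conn.execute("""
    SELECT u.user_name, COUNT(m.user_id), COUNT(cr.id)
    FROM user u
    JOIN score s ON s.user_id = u.id
    LEFT JOIN score m ON m.score >= s.score
    LEFT JOIN challenges_requests cr ON cr.receiver = u.id AND cr.status = 3
    GROUP BY u.id
    ORDER BY s.score DESC
""").fetchall()

# Each correlated subquery counts its own table, so the counts cannot interact.
subqueries = conn.execute("""
    SELECT u.user_name,
           (SELECT COUNT(*) FROM score m WHERE m.score >= s.score) AS user_rank,
           (SELECT COUNT(*) FROM challenges_requests cr
             WHERE cr.receiver = u.id AND cr.status = 3) AS completedChallenges
    FROM user u
    JOIN score s ON s.user_id = u.id
    ORDER BY s.score DESC
""").fetchall()

print(two_joins)   # [('A', 3, 3), ('B', 2, 0), ('C', 3, 0)]
print(subqueries)  # [('A', 1, 3), ('B', 2, 0), ('C', 3, 0)]
```

User A's single rank row is repeated once for each of the three completed challenges in the first query, which is consistent with the rank of 3 reported in the question.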
I'm stuck on a rather complex query.
I'm looking to write a query that shows the "top five customers" as well as some key metrics (counts with conditions) about each of those customers. Each of the different metrics uses a totally different join structure.
+----+-----------+   +----+-------------+   +----+------------+
| customer       |   | metricn          |   | metricn_lineitem|
+----+-----------+   +----+-------------+   +----+------------+
| id | Name      |   | id | customer_id |   | id | metricn_id |
| 1  | Customer1 |   | 1  | 1           |   | 1  | 1          |
| 2  | Customer2 |   | 2  | 2           |   | 2  | 1          |
+----+-----------+   +----+-------------+   +----+------------+
The issue is that I always want to group by this customer table.
I first tried to put all of my joins into the original query, but the performance was abysmal. I then tried using subqueries, but I couldn't get them to group by the original customer id.
Here's a sample query
SELECT
customer.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = customer_id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = customer_id
) as metric_2
FROM customer
GROUP BY customer.name
SORT BY COUNT(metric1.id) DESC
LIMIT 5
Any advice? Thanks!
SELECT c.name, m1.metric_1, m2.metric_2
FROM customer AS c
LEFT JOIN (SELECT m.customer_id, COUNT(*) AS metric_1
FROM metric1 AS m
INNER JOIN metric1_lineitem AS l ON m.id = l.metric1_id
GROUP BY m.customer_id) m1
ON m1.customer_id = c.id
LEFT JOIN (SELECT m.customer_id, COUNT(*) AS metric_2
FROM metric2 AS m
INNER JOIN metric2_lineitem AS l ON m.id = l.metric2_id
GROUP BY m.customer_id) m2
ON m2.customer_id = c.id
ORDER BY m1.metric_1 DESC
LIMIT 5
You should also avoid using COUNT(columnname) when you can use COUNT(*) instead. The former has to test every value to see if it's null.
Although your data structure may be lousy, your query may not be so bad, with two exceptions. I don't think you need the aggregation on the outer level. Also, the "correlation"s in the where clause (such as metric1.customer_id = customer_id) are not doing anything, because the unqualified customer_id resolves to the subquery's own tables. You need to reference the outer table explicitly, as in metric1.customer_id = c.id (the customer table's key is id in your sketch):
SELECT c.name,
(SELECT COUNT(metric1_lineitem.id)
FROM metric1 INNER JOIN
metric1_lineitem
ON metric1_lineitem.metric1_id = metric1.id
WHERE metric1.customer_id = c.id
) as metric_1,
(SELECT COUNT(metric2_lineitem.id)
FROM metric2 INNER JOIN
metric2_lineitem
ON metric2_lineitem.metric2_id = metric2.id
WHERE metric2.customer_id = c.id
) as metric_2
FROM customer c
ORDER BY metric_1 DESC
LIMIT 5;
How can you make this run faster? One way is to introduce indexes. I would recommend metric1(customer_id), metric2(customer_id), metric1_lineitem(metric1_id) and metric2_lineitem(metric2_id).
This may be faster than the aggregation method (proposed by Barmar) because MySQL is inefficient with aggregations. This should allow the aggregations to take place only using indexes instead of the base tables.
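The point about the missing correlation is easy to verify on a toy data set. The sketch below uses Python's sqlite3 rather than MySQL, assumes the customer key is id as in the table sketch from the question, and uses made-up index names. It runs the same subquery once with the unqualified customer_id and once qualified against the outer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer         (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE metric1          (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE metric1_lineitem (id INTEGER PRIMARY KEY, metric1_id INTEGER);
    INSERT INTO customer         VALUES (1, 'Customer1'), (2, 'Customer2');
    INSERT INTO metric1          VALUES (1, 1), (2, 2);
    INSERT INTO metric1_lineitem VALUES (1, 1), (2, 1);
    -- the indexes suggested in the answer (index names are hypothetical)
    CREATE INDEX idx_metric1_customer ON metric1 (customer_id);
    CREATE INDEX idx_lineitem_metric1 ON metric1_lineitem (metric1_id);
""")

# Unqualified customer_id binds to metric1.customer_id, so the filter compares
# the column with itself and every customer gets the overall total.
uncorrelated = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(metric1_lineitem.id)
              FROM metric1
              INNER JOIN metric1_lineitem ON metric1_lineitem.metric1_id = metric1.id
             WHERE metric1.customer_id = customer_id) AS metric_1
    FROM customer c
    ORDER BY c.id
""").fetchall()

# Qualifying the outer column makes the subquery run per customer.
correlated = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(metric1_lineitem.id)
              FROM metric1
              INNER JOIN metric1_lineitem ON metric1_lineitem.metric1_id = metric1.id
             WHERE metric1.customer_id = c.id) AS metric_1
    FROM customer c
    ORDER BY c.id
""").fetchall()

print(uncorrelated)  # [('Customer1', 2), ('Customer2', 2)]
print(correlated)    # [('Customer1', 2), ('Customer2', 0)]
```

This only checks correctness. Whether the indexes make the subquery form faster than the pre-aggregated joins depends on the engine and the data, so it is worth measuring on the real tables.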
I am working on some data from work, and I am trying to come up with a query that makes my life a lot easier (I took the time to import the data into MySQL).
I have a bunch of samples that have different values (area) at different times (a bit like bell curves). For each sample, all I need to do is add up every peak (area) before the biggest peak and every peak (area) after the biggest peak, giving two columns in total. The task sounds really simple, but I am having a very tough time coming up with a query that works.
I came up with something like the query below. The problem is that I can't use "group by" in the where clause, because the subquery would then return multiple rows, so I can't compare the values within each sample. I tried a couple of different approaches, but none of them went anywhere. Any help would be appreciated.
SELECT Sample_name, sum(per_area) As '% area'/*For the areas before the peak.*/
FROM W_data.SEC_results
Where retention between /*retention = time */
0
AND
(( select retention
from W_data.SEC_results
where per_area = (
select max(per_area)
from W_data.SEC_results /* select the largest area in the entire set, instead of a specific samples */
)))
group by vial;
Table:
+----------------------------------+------+-------------+----------+
| Sample_name | vial | retention | per_area |
+----------------------------------+------+-------------+----------+
| a | 74 | 14.146 | 0.08 |
| a | 74 | 16.624 | 99.79 |
| a | 74 | 20.343 | 0.13 |
| b | 75 | 12.438 | 0.16 |
| b | 75 | 13.653 | 1.85 |
| b | 75 | 16.588 | 97.95 |
| b | 75 | 20.316 | 0.04 |
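+----------------------------------+------+-------------+----------+
Expected result:
+-------------+---------------+--------------+
| sample_name | Area (before) | Area (after) |
+-------------+---------------+--------------+
| a           | 0.08          | 0.13         |
| b           | 2.01          | 0.04         |
+-------------+---------------+--------------+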
The logic is:
First, find the maximum per_area for each vial:
select vial, max(per_area) maxarea from sec_results group by vial
| 74 | 99.79 |
| 75 | 97.95 |
Then find the retention time of each of those peaks:
select sr.vial, sr.retention, mt.maxarea from sec_results sr,
(select vial, max(per_area) maxarea from sec_results group by vial) mt
where sr.vial = mt.vial and sr.per_area = mt.maxarea
| 74 | 16.624 | 99.79 |
| 75 | 16.588 | 97.95 |
Finally, sum up the values before and after that time separately:
select a.sample_name, sum(if(a.retention < temp.retention, a.per_area, 0)) Area_before,
sum(if(a.retention > temp.retention, a.per_area, 0)) Area_after
from sec_results a, (select sr.vial, sr.retention, mt.maxarea
from sec_results sr, (select vial, max(per_area) maxarea
from sec_results
group by vial) mt
where sr.vial = mt.vial
and sr.per_area = mt.maxarea
) temp
where a.vial = temp.vial
group by a.vial, a.sample_name;
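The three steps above can be checked against the sample rows from the question. This is a sketch in Python's sqlite3, not MySQL: it uses explicit JOIN syntax and CASE expressions in place of MySQL's IF(), the alias peak is a made-up name, and the results are rounded in Python only to hide floating-point noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sec_results (sample_name TEXT, vial INTEGER, retention REAL, per_area REAL);
    INSERT INTO sec_results VALUES
        ('a', 74, 14.146,  0.08),
        ('a', 74, 16.624, 99.79),
        ('a', 74, 20.343,  0.13),
        ('b', 75, 12.438,  0.16),
        ('b', 75, 13.653,  1.85),
        ('b', 75, 16.588, 97.95),
        ('b', 75, 20.316,  0.04);
""")

rows = conn.execute("""
    SELECT a.sample_name,
           SUM(CASE WHEN a.retention < peak.retention THEN a.per_area ELSE 0 END) AS area_before,
           SUM(CASE WHEN a.retention > peak.retention THEN a.per_area ELSE 0 END) AS area_after
    FROM sec_results a
    JOIN (
        -- one row per vial: the retention time of its largest peak
        SELECT sr.vial, sr.retention
        FROM sec_results sr
        JOIN (SELECT vial, MAX(per_area) AS maxarea
              FROM sec_results
              GROUP BY vial) mt
          ON sr.vial = mt.vial AND sr.per_area = mt.maxarea
    ) peak ON peak.vial = a.vial
    GROUP BY a.vial, a.sample_name
    ORDER BY a.sample_name
""").fetchall()

# Round for display; per_area is stored as a floating-point value.
areas = {name: (round(before, 2), round(after, 2)) for name, before, after in rows}
print(areas)  # {'a': (0.08, 0.13), 'b': (2.01, 0.04)}
```

If two peaks in the same vial tie for the maximum per_area, the inner join returns two rows for that vial and the sums would be counted twice, so that case may need separate handling.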
Just been doing this for interest. Was trying to come up with a way to avoid having an IF inside the SUM to get the correct results.
Have managed it, but don't really think it is efficient. But having gone to the effort I thought I would put them here just for interest.
First way, joining against a pair of sub selects each of which get the sums either before or after:-
SELECT DISTINCT a.sample_name, b.AreaBefore, c.AreaAfter
FROM sec_results a
LEFT OUTER JOIN (
SELECT sr.vial, SUM(sr_max.per_area) AS AreaBefore
FROM sec_results sr
INNER JOIN (
SELECT vial, max(per_area) AS maxarea
FROM sec_results
GROUP BY vial) Sub1
ON sr.vial = Sub1.vial
AND sr.per_area = Sub1.maxarea
INNER JOIN sec_results sr_max
ON sr.vial = sr_max.vial
AND sr.retention > sr_max.retention
GROUP BY sr.vial
) b
ON a.vial = b.vial
LEFT OUTER JOIN (
SELECT sr.vial, SUM(sr_max.per_area) AS AreaAfter
FROM sec_results sr
INNER JOIN (
SELECT vial, max(per_area) AS maxarea
FROM sec_results
GROUP BY vial) Sub1
ON sr.vial = Sub1.vial
AND sr.per_area = Sub1.maxarea
INNER JOIN sec_results sr_max
ON sr.vial = sr_max.vial
AND sr.retention < sr_max.retention
GROUP BY sr.vial
) c
ON a.vial = c.vial
2nd way, which is using the subselects to get the sums of the before (or after) records of every record, then joining that against a sub select to get the max one.
SELECT a_sec_result.vial, a_sec_result.sample_name, area_before.area AS AreaBefore, area_after.area AS AreaAfter
FROM (
SELECT sr.vial, sr.sample_name, sr.retention
FROM sec_results sr
INNER JOIN (
SELECT vial, sample_name, max(per_area) AS maxarea
FROM sec_results
GROUP BY vial, sample_name
) max_area_sub
ON sr.vial = max_area_sub.vial
AND sr.sample_name = max_area_sub.sample_name
AND sr.per_area = max_area_sub.maxarea
) a_sec_result
INNER JOIN(
SELECT sr.vial, sr.retention, SUM(sr2.per_area) AS area
FROM sec_results sr
LEFT OUTER JOIN sec_results sr2
ON sr.vial = sr2.vial
AND sr.retention > sr2.retention
GROUP BY sr.vial, sr.retention
) area_before
ON a_sec_result.vial = area_before.vial
AND a_sec_result.retention = area_before.retention
INNER JOIN(
SELECT sr.vial, sr.retention, SUM(sr2.per_area) AS area
FROM sec_results sr
LEFT OUTER JOIN sec_results sr2
ON sr.vial = sr2.vial
AND sr.retention < sr2.retention
GROUP BY sr.vial, sr.retention
) area_after
ON a_sec_result.vial = area_after.vial
AND a_sec_result.retention = area_after.retention
Both should give the right result.