Sum Distinct Duplicated Values - mysql

I have a dataset as below:
customer    buy            profit
---------------------------------
a           laptop         350
a           mobile         350
b           laptop case    50
c           laptop         200
c           mouse          200
It does not matter how many rows a customer has: the profit stated in each row is already the cumulative sum for that customer. For example, the profit of customer a is 350 and the profit of customer c is 200.
I would like to sum the profit once per customer, so the desired output is 350 + 50 + 200 = 600. However, I need to do this in a single statement (without a subquery, a nested query, or a separate CTE).
I tried SUM(DISTINCT ...), but it does not work. I also tried PARTITION BY, but could not combine MAX and SUM:
MAX(profit) OVER (PARTITION BY customer)
If anyone could give me a hint on an approach to tackle this, it would be highly appreciated.

You really should use a subquery here:
SELECT SUM(profit) AS total_profit
FROM (SELECT DISTINCT customer, profit FROM yourTable) t;
By the way, your table design should probably change such that you are not storing the redundant profit per customer across many different records.
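As a quick check of the subquery approach, here is a minimal sketch that loads the sample rows from the question and runs the same query. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable); the table name yourTable comes from the answer.

```python
import sqlite3

# SQLite stands in for MySQL here; the SQL itself is the query from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (customer TEXT, buy TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        ("a", "laptop", 350),
        ("a", "mobile", 350),
        ("b", "laptop case", 50),
        ("c", "laptop", 200),
        ("c", "mouse", 200),
    ],
)

# Collapse the rows to one (customer, profit) pair per customer, then sum.
total_profit = conn.execute(
    "SELECT SUM(profit) AS total_profit "
    "FROM (SELECT DISTINCT customer, profit FROM yourTable) t"
).fetchone()[0]

print(total_profit)  # 600
```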

You can combine the SUM() window function with the MAX() aggregate function (window functions require MySQL 8.0 or later):
SELECT DISTINCT SUM(MAX(profit)) OVER () total
FROM tablename
GROUP BY customer;

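SELECT SUM(DISTINCT profit) AS total FROM tablename
Note that this only returns the right total when no two customers have the same profit value, because DISTINCT de-duplicates by profit, not by customer.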
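SUM(DISTINCT profit) de-duplicates by profit value rather than by customer, which is probably why it fails on the real data. The sketch below adds one hypothetical customer d (not in the question) whose profit equals customer b's, and compares the two totals. SQLite via Python's sqlite3 is assumed as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (customer TEXT, buy TEXT, profit INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("a", "laptop", 350),
        ("a", "mobile", 350),
        ("b", "laptop case", 50),
        ("c", "laptop", 200),
        ("c", "mouse", 200),
        ("d", "keyboard", 50),  # hypothetical row: same profit as customer b
    ],
)

# De-duplicates on the profit value alone, so the two 50s count once.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT profit) FROM tablename"
).fetchone()[0]

# De-duplicates on (customer, profit), so each customer counts once.
per_customer = conn.execute(
    "SELECT SUM(profit) FROM (SELECT DISTINCT customer, profit FROM tablename) t"
).fetchone()[0]

print(sum_distinct, per_customer)  # 600 650
```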

You can select max profit of each customer like this:
SELECT customer, MAX(profit) AS max_profit
FROM tablename
GROUP BY customer
Then you can sum the result in your application code, or in the query itself using a nested query:
SELECT SUM(max_profit) FROM (
SELECT customer, MAX(profit) AS max_profit
FROM tablename
GROUP BY customer
) AS temptable

Related

Aggregate function in BETWEEN and AND

I have joined 3 tables in my query. In my inventory database, price is taken from table c and quantity is taken from table b. How can I list the users whose order total lies between a given value and the maximum value of that column?
I am using the query below in MySQL to retrieve the records. As expected, it raises an error. Any help will be highly appreciated.
SELECT .... GROUP BY userid HAVING SUM(c.`price` * b.`quantity`) BETWEEN 9000 AND MAX(SUM(c.`price` * b.`quantity`))

If I understand correctly you don't need BETWEEN. Try it this way
SELECT ....
GROUP BY userid
HAVING SUM(c.`price` * b.`quantity`) >= 9000
In case you wondered, you can't nest aggregate functions. Even if you could, it wouldn't make sense here, because you group by userid but are trying to get the MAX of the SUM across all users. To get the maximum value you would need a subquery, e.g.
SELECT ....
GROUP BY userid
HAVING SUM(c.`price` * b.`quantity`) =
(
SELECT MAX(total) total
FROM
(
SELECT SUM(c.`price` * b.`quantity`) total
GROUP BY userid
) q
)
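Because the original query is elided, the following is only a sketch under assumed names: the tables order_items and products and their columns are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL. It shows the plain HAVING ... >= threshold filter, and the subquery form that finds the user whose total equals the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; the question does not show its tables.
conn.executescript("""
CREATE TABLE products (product_id INTEGER, price INTEGER);
CREATE TABLE order_items (userid INTEGER, product_id INTEGER, quantity INTEGER);
INSERT INTO products VALUES (1, 1000), (2, 2500);
INSERT INTO order_items VALUES
    (10, 1, 3),
    (11, 2, 4),
    (12, 1, 5),
    (12, 2, 4);
""")

# Per-user totals: user 10 -> 3000, user 11 -> 10000, user 12 -> 15000.
totals_sql = (
    "SELECT b.userid, SUM(c.price * b.quantity) AS total "
    "FROM order_items b JOIN products c ON c.product_id = b.product_id "
    "GROUP BY b.userid"
)

# No upper bound is needed: every total is at most the maximum by definition.
at_least_9000 = conn.execute(
    totals_sql + " HAVING SUM(c.price * b.quantity) >= 9000 ORDER BY b.userid"
).fetchall()

# The maximum itself has to come from a subquery over the per-user totals.
top_spender = conn.execute(
    totals_sql
    + " HAVING SUM(c.price * b.quantity) = "
    + "(SELECT MAX(total) FROM (" + totals_sql + ") q)"
).fetchall()

print(at_least_9000)  # [(11, 10000), (12, 15000)]
print(top_spender)    # [(12, 15000)]
```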

How can I write a query that aggregate a single row with latest date among multiple set of rows?

I have a MySQL table with many rows for each person, and I want to write a query that aggregates the rows under a special constraint (one row per person).
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table:
The table has tens of millions of rows.
Most people have only one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons, while there are millions of distinct names.
The table is partitioned by date range.

SELECT T.reason, COUNT(*)
FROM
(
SELECT name, MAX(date) AS max_date
FROM tablename
GROUP BY name
) A, tablename T
WHERE T.name = A.name AND T.date = A.max_date
GROUP BY T.reason

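Try this (the name is matched together with the date, so that one person's latest date cannot match another person's older row):
select reason, count(*) from
(select reason from tablename where (name, date) in
(select name, max(date) from tablename group by name)) t
group by reason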

In MySQL versions before 8.0, this kind of query is not very efficient, since you don't have access to window functions (such as RANK() OVER (PARTITION BY ...)) as you do in SQL Server or Oracle.
You can still emulate it with a subquery that retrieves the rows matching the condition you need, here the maximum date:
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
If you are really stuck with MySQL, and the first query is too slow, then you can split the problem.
First, run a query that creates a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create an index on both name and maxDate.
Finally, get the results:
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason

The problem seems to be solved by this query:
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
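It is quite fast and simple. Note, however, that it relies on MySQL's non-standard GROUP BY handling: the row returned for each name is not guaranteed to be the one with the latest date, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.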

Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
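The join-on-MAX(date) pattern used in several of the answers can be checked against the sample data from the question. This is a minimal sketch that assumes SQLite (through Python's sqlite3) as a stand-in for MySQL; the table name absentism is taken from the last answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absentism (name TEXT, date TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO absentism VALUES (?, ?, ?)",
    [
        ("John", "2013-04-01 14:00:00", "Vacation"),
        ("John", "2013-03-31 18:00:00", "Sick"),
        ("Ted", "2012-05-06 20:00:00", "Sick"),
        ("Ted", "2012-02-20 01:00:00", "Vacation"),
        ("John", "2011-12-21 00:00:00", "Sick"),
        ("Bob", "2011-04-02 20:00:00", "Sick"),
    ],
)

# Find each person's latest date, join back to pick that row, then count reasons.
distribution = conn.execute(
    "SELECT x.reason, COUNT(*) "
    "FROM absentism x "
    "JOIN (SELECT name, MAX(date) AS max_date FROM absentism GROUP BY name) y "
    "  ON y.name = x.name AND y.max_date = x.date "
    "GROUP BY x.reason "
    "ORDER BY x.reason"
).fetchall()

print(distribution)  # [('Sick', 2), ('Vacation', 1)]
```

If two rows for the same person share the exact same latest timestamp, both would be counted; whether that can happen depends on the real data.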

MySQL - Calculating Net Dues against two tables

Thanks in advance for any assistance. I am new to SQL and have looked at several related threads on this site, and numerous other sites on Google, but have not been able to figure out what I am doing wrong. I have looked at sub-selects and various JOIN options, and keep bumping into the wrong solution/result.
I have two tables that I am trying to do a query on.
Table:Doctors
idDoctors
PracticeID
FirstName
LastName
Table: Vendor Sales
Id
ProductSales
SalesCommission
DoctorFirstName
DoctorLastName
Here is the Query I am struggling with:
SELECT t1.PracticeID
, SUM( t2.ProductSales ) AS Total_Sales
, COUNT( t1.LastName ) AS Doctor_Count
, COUNT( t1.LastName ) *150 AS Dues
, SUM( t2.ProductSales * t2.SalesCommission ) AS Credit
FROM Doctors AS t1
JOIN VendorSales AS t2 ON t1.Lastname = t2.DoctorLastName
GROUP BY t1.PracticeID
LIMIT 0 , 30
The objective of the Query is to calculate net dues owed by a Practice. I am not yet attempting to calculate the net amount, just trying to get the initial calculations correct.
Result (limited to one result for this example)
PracticeID Total_Sales Doctor_Count Dues Credit
Practice A 16583.04 4 600 304.07360
This is what the result should be:
PracticeID Total_Sales Doctor_Count Dues Credit
Practice A 16583.04 3 450 304.07360
The problem is that Total Sales sums the aggregate sales transactions (in this case 4 sales entries totaling 16583.04). Each of the 4 sales has an associated commission rate. The Credit amount is the total (sum) of the commission.
The sales and credit numbers are accurate. But the Doctor count should be 3 (the number of Doctors in the practice), and Dues should be $450 (150 x 3). As you can see, it is multiplying by 4 instead of 3.
What do I need to change in the query to get the proper calculations (Doctors and Dues multiplied by 3 instead of 4)? Or should I be doing this differently? Thanks again.

There are various odd things about your schema, and you have not provided the sample data to justify your asserted values.
The first oddity is that you have both first and last name for the doctor in the Doctors table, and in the Vendor Sales table - yet you join only on the last name. Next, you have an ID column, it seems, in the Doctors table, yet you do not use that in the Vendor Sales table for the joining column.
It is not clear whether there is one entry in the Vendor Sales table per doctor, or whether there can be several. Given the counting issues you describe, we must assume there can be several entries per doctor in the Vendor Sales table. It also isn't clear where the vendor is identified, but we have to assume that is not germane to the problem.
So, one set of data you need is the number of doctors per practice, and the dues (which is 150 currency units per doctor). Let's deal with that first:
SELECT PracticeID, COUNT(*) AS NumDoctors, COUNT(*) * 150 AS Dues
FROM Doctors
GROUP BY PracticeID
Then we need the total sales per practice, and the credit too:
SELECT t1.PracticeID, SUM(t2.ProductSales) AS Total_Sales,
SUM(t2.ProductSales * t2.SalesCommission) AS Credit
FROM Doctors AS t1
JOIN VendorSales AS t2 ON t1.Lastname = t2.DoctorLastName
GROUP BY t1.PracticeID
These two partial answers need to be joined on the PracticeID to produce your final result:
SELECT r1.PracticeID, r1.NumDoctors, r1.Dues,
r2.Total_Sales, r2.Credit
FROM (SELECT PracticeID, COUNT(*) AS NumDoctors, COUNT(*) * 150 AS Dues
FROM Doctors
GROUP BY PracticeID) AS r1
JOIN (SELECT t1.PracticeID, SUM(t2.ProductSales) AS Total_Sales,
SUM(t2.ProductSales * t2.SalesCommission) AS Credit
FROM Doctors AS t1
JOIN VendorSales AS t2 ON t1.Lastname = t2.DoctorLastName
GROUP BY t1.PracticeID) AS r2
ON r1.PracticeID = r2.PracticeID;
That should get you the result you seek, I believe. But it is untested SQL - not least because you didn't give us appropriate sample data to work with.
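Since no sample data was given, the sketch below uses hypothetical rows (three doctors in one practice, one of whom has two sales) to show why the original join counts 4 and the two-subquery version counts 3. SQLite via Python's sqlite3 is assumed as a stand-in for MySQL; the names and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Adams has two sales rows, so the join duplicates her.
conn.executescript("""
CREATE TABLE Doctors (idDoctors INTEGER, PracticeID TEXT,
                      FirstName TEXT, LastName TEXT);
CREATE TABLE VendorSales (Id INTEGER, ProductSales INTEGER, SalesCommission REAL,
                          DoctorFirstName TEXT, DoctorLastName TEXT);
INSERT INTO Doctors VALUES
    (1, 'Practice A', 'Ann', 'Adams'),
    (2, 'Practice A', 'Bob', 'Baker'),
    (3, 'Practice A', 'Cy', 'Clark');
INSERT INTO VendorSales VALUES
    (1, 4000, 0.02, 'Ann', 'Adams'),
    (2, 5000, 0.02, 'Ann', 'Adams'),
    (3, 3000, 0.02, 'Bob', 'Baker'),
    (4, 2000, 0.02, 'Cy', 'Clark');
""")

# Original shape: counting after the join counts sales rows, not doctors.
naive = conn.execute(
    "SELECT t1.PracticeID, SUM(t2.ProductSales), "
    "       COUNT(t1.LastName), COUNT(t1.LastName) * 150 "
    "FROM Doctors AS t1 "
    "JOIN VendorSales AS t2 ON t1.LastName = t2.DoctorLastName "
    "GROUP BY t1.PracticeID"
).fetchone()

# Answer's shape: aggregate doctors and sales separately, then join the results.
fixed = conn.execute(
    "SELECT r1.PracticeID, r1.NumDoctors, r1.Dues, r2.Total_Sales "
    "FROM (SELECT PracticeID, COUNT(*) AS NumDoctors, COUNT(*) * 150 AS Dues "
    "      FROM Doctors GROUP BY PracticeID) AS r1 "
    "JOIN (SELECT t1.PracticeID, SUM(t2.ProductSales) AS Total_Sales "
    "      FROM Doctors AS t1 "
    "      JOIN VendorSales AS t2 ON t1.LastName = t2.DoctorLastName "
    "      GROUP BY t1.PracticeID) AS r2 "
    "  ON r1.PracticeID = r2.PracticeID"
).fetchone()

print(naive)  # ('Practice A', 14000, 4, 600)
print(fixed)  # ('Practice A', 3, 450, 14000)
```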

MYSQL Sum results of a calculation

I am building a query in mysql 5.0 to calculate a student semester grade. The initial table (studentItemGrades) contains the list of assignments etc which will be used to calculate the final grade. Each assignment has a PossibleScore, Grade and Weight. The calculation should group all similarly weighted items, and provide the SUM(GRADE)/SUM(POSSIBLESCORE) based on a date range of when the assignment was due. The problem I am encountering is the final summation of all the individual weighted grades. For example, the results currently produce the following:
CourseScheduleID sDBID AssignedDate DueDate Weight WeightedGrade
1 519 2010-08-26 2010-08-30 10 0.0783333333333333
1 519 2010-09-01 2010-09-03 20 0.176
1 519 2010-09-01 2010-09-10 70 0.574
from the query:
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades
WHERE DueDate>='2010-08-23'
AND DueDate<='2010-09-10'
AND CourseScheduleID=1
AND sDBID=519
AND Status>0
GROUP BY Weight
The question: How do I now SUM the three results in the WeightedGrade output? And by the way, this is part of a much larger query for calculating all grades for all courses on a particular campus.
Thanks in advance for your help.

You can use a subquery, like so:
SELECT SUM(WeightedGrade) FROM
(
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades
WHERE DueDate>='2010-08-23'
AND DueDate<='2010-09-10'
AND CourseScheduleID=1
AND sDBID=519
AND Status>0
GROUP BY Weight
) t1

In order to sum the three results, you would need to requery the results of this select using another select with a group by. This could be done using a single sql statement by using subqueries.
SELECT sq.CourseScheduleID, sq.sDBID, SUM(sq.WeightedGrade) as FinalGrade
FROM
(
SELECT CourseScheduleID, sDBID, AssignedDate, DueDate, Weight,
((SUM(Grade)/SUM(PossibleScore))*(Weight/100)) AS WeightedGrade
FROM studentItemGrades
WHERE DueDate>='2010-08-23'
AND DueDate<='2010-09-10'
AND CourseScheduleID=1
AND sDBID=519
AND Status>0
GROUP BY Weight
) AS sq
GROUP BY sq.CourseScheduleID, sq.sDBID
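To illustrate the subquery approach end to end, here is a sketch with hypothetical assignment rows chosen so that the three weighted grades match the question's output (about 0.0783, 0.176 and 0.574). It assumes SQLite via Python's sqlite3 in place of MySQL. Two deviations are assumptions of this sketch: the score columns are declared REAL because SQLite, unlike MySQL, performs integer division on integers, and the inner query groups by all selected non-aggregate columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE studentItemGrades ("
    "CourseScheduleID INTEGER, sDBID INTEGER, DueDate TEXT, "
    "Weight REAL, Grade REAL, PossibleScore REAL, Status INTEGER)"
)
# Hypothetical assignments: two items at weight 10, one each at 20 and 70.
conn.executemany(
    "INSERT INTO studentItemGrades VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 519, "2010-08-30", 10, 20, 30, 1),
        (1, 519, "2010-08-30", 10, 27, 30, 1),
        (1, 519, "2010-09-03", 20, 88, 100, 1),
        (1, 519, "2010-09-10", 70, 82, 100, 1),
    ],
)

# Inner query: one weighted grade per weight. Outer query: sum them per student.
row = conn.execute(
    "SELECT sq.CourseScheduleID, sq.sDBID, SUM(sq.WeightedGrade) AS FinalGrade "
    "FROM ( "
    "  SELECT CourseScheduleID, sDBID, Weight, "
    "         (SUM(Grade) / SUM(PossibleScore)) * (Weight / 100) AS WeightedGrade "
    "  FROM studentItemGrades "
    "  WHERE DueDate >= '2010-08-23' AND DueDate <= '2010-09-10' "
    "    AND CourseScheduleID = 1 AND sDBID = 519 AND Status > 0 "
    "  GROUP BY CourseScheduleID, sDBID, Weight "
    ") AS sq "
    "GROUP BY sq.CourseScheduleID, sq.sDBID"
).fetchone()

final_grade = row[2]
print(round(final_grade, 4))  # 0.8283
```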

Do aggregate MySQL functions always return a single row?

I'm sorry if this is really basic, but:
I feel at some point I didn't have this issue, and now I am, so either I was doing something totally different before or my syntax has skipped a step.
I have, for example, a query that I need to return all rows with certain data along with another column that has the total of one of those columns. If things worked as I expected them, it would look like:
SELECT
order_id,
cost,
part_id,
SUM(cost) AS total
FROM orders
WHERE order_date BETWEEN xxx AND yyy
And I would get all the rows with my orders, with the total tacked on to the end of each one. I know the total would be the same each time, but that's expected. Right now to get that to work I'm using:
SELECT
order_id,
cost,
part_id,
(SELECT SUM(cost)
FROM orders
WHERE order_date BETWEEN xxx AND yyy) AS total
FROM orders
WHERE order_date BETWEEN xxx AND yyy
Essentially running the same query twice, once for the total, once for the other data. But if I wanted, say, the SUM and, I dunno, the average cost, I'd then be doing the same query 3 times, and that seems really wrong, which is why I'm thinking I'm making some really basic mistake.
Any help is really appreciated.

You need to use GROUP BY as such to get your desired result:
SELECT
order_id,
part_id,
SUM(cost) AS total
FROM orders
WHERE order_date BETWEEN xxx AND yyy
GROUP BY order_id, part_id
This will group your results. Note that since I assume that order_id and part_id form a compound PK, SUM(cost) in the above will probably equal cost, since you are grouping by a combination of two fields that is guaranteed to be unique. The correlated subquery below overcomes this limitation.
Any non-aggregate column you select needs to be listed in the GROUP BY clause.
EDIT: If you want to use a column as both aggregate and non-aggregate, or if you need to disaggregate your groups, you will need to use a subquery as such:
SELECT
or1.order_id,
or1.cost,
or1.part_id,
(
SELECT SUM(cost)
FROM orders or2
WHERE or1.order_id = or2.order_id
GROUP BY or2.order_id
) AS total
FROM orders or1
WHERE or1.order_date BETWEEN xxx AND yyy
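An alternative that is not in the answer above, offered as a sketch: compute all the aggregates once in a derived table and CROSS JOIN that single row onto every detail row, so adding AVG (or more aggregates) does not add another pass per aggregate. The rows and date range below are hypothetical, and SQLite via Python's sqlite3 is assumed in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, cost INTEGER, "
    "part_id INTEGER, order_date TEXT)"
)
# Hypothetical orders; the last one falls outside the date range.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2020-01-05"),
        (2, 20, 101, "2020-01-10"),
        (3, 30, 102, "2020-01-20"),
        (4, 40, 103, "2020-03-01"),
    ],
)

date_range = ("2020-01-01", "2020-01-31")

# The derived table t has exactly one row, so the CROSS JOIN simply appends
# the total and the average to every order in the range.
rows = conn.execute(
    "SELECT o.order_id, o.cost, o.part_id, t.total, t.avg_cost "
    "FROM orders o "
    "CROSS JOIN (SELECT SUM(cost) AS total, AVG(cost) AS avg_cost "
    "            FROM orders WHERE order_date BETWEEN ? AND ?) t "
    "WHERE o.order_date BETWEEN ? AND ? "
    "ORDER BY o.order_id",
    date_range + date_range,
).fetchall()

for r in rows:
    print(r)
# (1, 10, 100, 60, 20.0)
# (2, 20, 101, 60, 20.0)
# (3, 30, 102, 60, 20.0)
```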
