SUM(DISTINCT) Based on Other Columns


I currently have a table that looks something like this:
+------+-------+------------+------------+
| id | rate | first_name | last_name |
+------+-------+------------+------------+
What I need to do is get the SUM of the rate column, but only once for each name. For example, I have three rows of name John Doe, each with rate 8. I need the SUM of those rows to be 8, not 24, so it counts the rate once for each group of names.
SUM(DISTINCT last_name, first_name) would not work, of course, because I'm trying to sum the rate column, not the names. I know when counting individual records, I can use COUNT(DISTINCT last_name, first_name), and that is the type of behavior I am trying to get from SUM.
How can I get just SUM one rate for each name?
Thanks in advance!

select sum(rate)
from yourTable
group by first_name, last_name
Edit
If you add up all of those per-name sums, you simply get the sum over the whole table:
select sum(rate) from yourTable
But if the two could differ for some reason (for example, because the grouped query uses a WHERE clause) and you need the total of the grouped query above, wrap it in a subquery:
select sum(SumGrouped) from
( select sum(rate) as SumGrouped
from yourTable
group by first_name, last_name) T1

David said he found his answer as such:
SELECT SUM(rate) FROM (SELECT * FROM records GROUP BY last_name, first_name) T1
But when you do the GROUP BY in the inner query, I think you have to use aggregate functions in your SELECT. So, I think the answer is more like:
SELECT SUM(rate) FROM (SELECT MAX(rate) AS rate FROM records GROUP BY last_name, first_name) T1
I picked MAX() to pick only one "rate" for a "last_name, first_name" combination but MIN() should work the same, assuming that the "last_name, first_name" always leads us to the same "rate" even when it happens multiple times in the table. This seems to be David's original assumption - that for a unique name we want to grab the rate only once because we know it will be the same.
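A minimal sketch of the difference between the plain SUM and the SUM over one MAX(rate) per name. It runs the query through Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are made up for illustration, and only the table name `records` comes from the queries above.

```python
import sqlite3

# Hypothetical sample data; SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER, rate INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 8, "John", "Doe"),
        (2, 8, "John", "Doe"),
        (3, 8, "John", "Doe"),
        (4, 5, "Jane", "Roe"),
    ],
)

# Plain SUM counts John Doe's rate three times.
naive_total = conn.execute("SELECT SUM(rate) FROM records").fetchone()[0]

# Collapse each name to a single rate first, then sum those rates.
deduped_total = conn.execute(
    """
    SELECT SUM(rate)
    FROM (SELECT MAX(rate) AS rate
          FROM records
          GROUP BY last_name, first_name) T1
    """
).fetchone()[0]

print(naive_total, deduped_total)
```

This only gives a meaningful answer under the assumption stated above, namely that every row for a given name carries the same rate.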

You can do this by making the values you are summing distinct. This is possible but is very very ugly.
First, you can turn a string into a number by taking a hash. The SQL below does an MD5 hash of the first and last name, which returns 32 hexadecimal digits. SUBSTRING takes the first 8 of these, and CONV turns that into a 10 digit number (it's theoretically possible this won't be unique):
CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)
Then you divide that by a very big number and add it to the rate. You'll end up with a rate like 8.0000019351087950. You have to use FORMAT to avoid MySQL truncating the decimal places. This rate will now be unique for each first name and last name.
FORMAT(rate + CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)/1000000000000000, 16)
And then if you do the SUM DISTINCT over that it will only count the 8 once. Then you need to FLOOR the result to get rid of the extra decimal places:
FLOOR(SUM(DISTINCT FORMAT(rate + CONV(SUBSTRING(MD5(CONCAT(first_name,last_name)), 1, 8), 16, 10)/1000000000000000, 16)))
I found this approach while doing a much more complicated query which joined and grouped several tables. I'm still not sure if I'll use it as it is pretty horrible, but it does work. It's also 6 years too late to be of any use to the person who asked the question.
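To make the arithmetic of this trick easier to follow, here is a rough sketch in Python rather than MySQL. It assumes `hashlib.md5` as the counterpart of MD5(), a set as the counterpart of DISTINCT, and `Decimal` to keep all the decimal places that FORMAT preserves in the SQL version. The sample rows and the helper name `name_offset` are made up for illustration.

```python
import hashlib
from decimal import Decimal
from math import floor


def name_offset(first_name, last_name):
    """Tiny per-name offset: first 8 hex digits of the MD5 hash, scaled down by 10**15."""
    digest = hashlib.md5((first_name + last_name).encode("utf-8")).hexdigest()
    # int(..., 16) plays the role of CONV(..., 16, 10) in the SQL above.
    return Decimal(int(digest[:8], 16)) / Decimal(10**15)


# Hypothetical rows: (rate, first_name, last_name). Both names have the same rate.
rows = [
    (8, "John", "Doe"),
    (8, "John", "Doe"),
    (8, "John", "Doe"),
    (8, "Jane", "Roe"),
]

# The set collapses identical (rate + offset) values, as SUM(DISTINCT ...) would.
tagged_rates = {Decimal(rate) + name_offset(first, last) for rate, first, last in rows}

# FLOOR strips the accumulated offsets, which stay far below 1 for a few names.
total = floor(sum(tagged_rates))
print(total)
```

As in the SQL version, two different names could in theory produce the same 8 hex digits, in which case one of them would be dropped from the sum.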

SELECT SUM(rate)
FROM [TABLE]
GROUP BY first_name, last_name;

Recently, I came across a similar problem, but with the exception that I already had a GROUP BY clause for a different purpose. Here is an example:
SELECT r.name, SUM(r.rate), MIN(e.created_at)
FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
GROUP BY r.id
The problem here is that, because of the JOIN with Events, SUM(r.rate) counts the rate once per matching event, so entries with multiple Events are summed several times. In my case the query was a lot more complicated, so I wanted to avoid having extra subqueries. Luckily, there is an elegant solution:
SELECT r.name, SUM(r.rate) / GREATEST(COUNT(DISTINCT e.event_id), 1), MIN(e.created_at)
FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
GROUP BY r.id
The GREATEST function is used to prevent division by zero for entries without any Events. If you are summing integers, you might also want to CAST the result back to an integer, because the division returns a decimal value.
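A small sketch of the divide-by-event-count idea, run through Python's sqlite3 module as a stand-in for MySQL. SQLite has no GREATEST, so the two-argument scalar MAX(a, b) is swapped in here, and SQLite's integer division means no CAST is needed. The column `event_id` is assumed to be the primary key of Events, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Rates (id INTEGER PRIMARY KEY, name TEXT, rate INTEGER);
    CREATE TABLE Events (event_id INTEGER PRIMARY KEY, rate_id INTEGER, created_at TEXT);
    INSERT INTO Rates VALUES (1, 'John Doe', 8), (2, 'Jane Roe', 5);
    -- John Doe has two events, Jane Roe has none.
    INSERT INTO Events VALUES (10, 1, '2020-01-02'), (11, 1, '2020-01-01');
    """
)

rows = conn.execute(
    """
    SELECT r.name,
           -- MAX(x, 1) is SQLite's scalar counterpart of MySQL's GREATEST(x, 1).
           SUM(r.rate) / MAX(COUNT(DISTINCT e.event_id), 1) AS rate,
           MIN(e.created_at)
    FROM Rates r LEFT JOIN Events e ON r.id = e.rate_id
    GROUP BY r.id
    ORDER BY r.id
    """
).fetchall()

for row in rows:
    print(row)
```

John Doe's rate is joined twice and summed to 16, then divided by his two distinct events; Jane Roe has no events, so her sum is divided by 1.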

SELECT SUM(rate)
FROM [TABLE]
GROUP BY CONCAT_WS(' ', first_name, last_name);

You can use any of the code samples above, because in MySQL a GROUP BY query that selects a column without an aggregate function returns one indeterminate row for each group. See http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html for further reading.

I found this thread while looking for a better way than my own solution, but I still didn't find a better one:
SELECT SUM(rate) FROM (SELECT DISTINCT rate, first_name, last_name FROM records) Q

Related

MySQL How many distinct values are there in a column and what the # of occurences of each one of them?

As the title says. I am already able to count the DISTINCT values in a column, but I am not sure how to count their occurrences so that I get a single query that answers both questions at the same time.
I have solved this by two queries:
SELECT COUNT(DISTINCT GovernmentForm) FROM country;
SELECT GovernmentForm, COUNT(*) FROM country GROUP BY GovernmentForm;
But I want to write one query that answers both questions.

This will work for you:
SELECT t.fieldname, COUNT(*) AS occurrences
FROM YourTable t
GROUP BY t.fieldname;
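One possible way to get both answers in a single query is to attach the distinct count to every group as a scalar subquery. The sketch below is only an illustration, run through Python's sqlite3 module as a stand-in for MySQL; the sample rows and the aliases `occurrences` and `distinct_forms` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (Name TEXT, GovernmentForm TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("A", "Republic"), ("B", "Republic"), ("C", "Monarchy")],
)

rows = conn.execute(
    """
    SELECT GovernmentForm,
           COUNT(*) AS occurrences,
           -- The scalar subquery repeats the overall distinct count on every row.
           (SELECT COUNT(DISTINCT GovernmentForm) FROM country) AS distinct_forms
    FROM country
    GROUP BY GovernmentForm
    ORDER BY GovernmentForm
    """
).fetchall()

for row in rows:
    print(row)
```

The number of rows returned also equals the number of distinct values, so the extra column is mainly a convenience.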

How to remove a subquery from the FROM field in MySQL

I'm new to MySQL and databases and I've seen in many places that it is not considered a good programming practice to use subqueries in the FROM field of SELECT in MySQL, like this:
select userid, avg(num_pages)
from (select userid , pageid , regid , count(regid) as num_pages
from reg_pag
where ativa = 1
group by userid, pageid) as from_query
group by userid;
which calculates the average number of registers per page that the users have.
The reg_pag table looks like this:
Questions:
How to change the query so that it doesn't use the subquery in the FROM field?
And is there a general way to "flatten" queries like this?

The average number of registers per page per user can also be calculated as the number of registers per user divided by the number of pages per user. Use COUNT(DISTINCT ...) to count only distinct pageids per user:
select userid, count(regid) / count(distinct pageid) as avg_regs
from reg_pag
where ativa=1
group by userid;
There is no general way of flattening such queries. Some of them may not be possible to flatten at all, otherwise there would be little point in having this feature in the first place. Do not be afraid of using subqueries in the FROM clause; on some occasions they may even be more efficient than a flattened query. But this is vendor-specific and even version-specific.
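As a quick check that the flattened query matches the subquery version, here is a small sketch run through Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up, and the `1.0 *` factor is only there because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reg_pag (userid INTEGER, pageid INTEGER, regid INTEGER, ativa INTEGER)"
)
conn.executemany(
    "INSERT INTO reg_pag VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, 1), (1, 10, 101, 1), (1, 10, 102, 1),  # user 1, page 10: 3 registers
        (1, 11, 103, 1), (1, 11, 104, 1),                    # user 1, page 11: 2 registers
        (2, 20, 200, 1), (2, 20, 201, 1),                    # user 2, page 20: 2 registers
        (2, 21, 202, 0),                                     # inactive row, filtered out
    ],
)

nested = conn.execute(
    """
    SELECT userid, AVG(num_pages)
    FROM (SELECT userid, pageid, COUNT(regid) AS num_pages
          FROM reg_pag
          WHERE ativa = 1
          GROUP BY userid, pageid) AS from_query
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()

flat = conn.execute(
    """
    SELECT userid, 1.0 * COUNT(regid) / COUNT(DISTINCT pageid) AS avg_regs
    FROM reg_pag
    WHERE ativa = 1
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()

print(nested, flat)
```

The two agree because the average of per-page counts is the total count divided by the number of pages.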

One way is to use count(distinct):
select userid, count(*) / count(distinct pageid)
from reg_pag
where ativa = 1
group by userid;

access get top and last rows

let's say I have a table CData with the columns CName, Amount1, Amount2.
Now I want to use a query to calculate the difference between Amount1 and Amount2 for each distinct CName and, as a result of the query, get the ~1000 rows with the biggest difference and the ~1000 rows with the smallest (or most negative) difference. It doesn't matter if the results come in one table or two.
1) I am aware of the function TOP and so I could do this with two queries and sort by Difference (once ascending, once descending). Is there a way to do this in one query, though? This would save some time.
2) General question: When I define a field in my query (in this example "Difference"), can I somehow use it to, for example, sort the data by it? Like this (well, it's not working, but to give you an idea of what I mean):
SELECT CData.CName, CData.Amount2-CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY Difference
Or do I always have to do the following:
...
ORDER BY CData.Amount2-CData.Amount1
Not much of a difference in this example, I just wanted to know if that's possible in general.

Run the query twice: sort ASC (ascending) the first time and DESC (descending) the second time.
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 ASC;
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 DESC;

Which aggregate function do you want to apply to your differences? Avg? Sum?
SELECT CName, avg(Amount2-Amount1) AS Difference
FROM CData
GROUP BY CName
By the way, to do it in 'one' query, you could use a UNION query on two subqueries, one with the TOP 1000 ascending and one with the TOP 1000 descending.
It looks like Access does not allow you to use an alias in the ORDER BY clause. If you build the query in the QBE grid and then switch the view from the UI to SQL, you will see that it repeats the calculation in the ORDER BY clause.
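The union-of-two-subqueries idea can be sketched as follows. This is not Access SQL: it runs through Python's sqlite3 module, where LIMIT takes the place of TOP and an alias is allowed in ORDER BY, so treat it only as an illustration of the query shape. The sample rows are made up, and 2 stands in for 1000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CData (CName TEXT, Amount1 INTEGER, Amount2 INTEGER)")
conn.executemany(
    "INSERT INTO CData VALUES (?, ?, ?)",
    [("A", 10, 15), ("B", 10, 7), ("C", 0, 20), ("D", 5, 5), ("E", 9, 1)],
)

rows = conn.execute(
    """
    -- Largest differences (LIMIT 2 stands in for TOP 1000).
    SELECT CName, Difference
    FROM (SELECT CName, Amount2 - Amount1 AS Difference
          FROM CData
          ORDER BY Difference DESC
          LIMIT 2)
    UNION ALL
    -- Smallest (most negative) differences.
    SELECT CName, Difference
    FROM (SELECT CName, Amount2 - Amount1 AS Difference
          FROM CData
          ORDER BY Difference ASC
          LIMIT 2)
    ORDER BY Difference DESC
    """
).fetchall()

for row in rows:
    print(row)
```

With fewer rows than twice the limit, the same row would appear in both halves; UNION instead of UNION ALL would remove such duplicates.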
Hi, John.
Check out the SO tour for instructions on how to use options such as formatting code.
Not sure if this will work for you, but you can try something like:
select * from
(SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, 99999-Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale
UNION
SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale) u
order by
srt

Mysql COUNT, GROUP BY and ORDER BY

This sounds quite simple but I just can't figure it out.
I have a table orders (id, username, telephone_number).
I want to get number of orders from one user by comparing the last 8 numbers in telephone_number.
I tried using SUBSTR(telephone_number, -8), I've searched and experimented a lot, but still I can't get it to work.
Any suggestions?

Untested:
SELECT
COUNT(*) AS cnt,
orders.*
FROM
orders
GROUP BY
SUBSTR(telephone_number, -8)
ORDER BY
cnt DESC
The idea:
Select COUNT(*) (i.e., the number of rows in each group) and all fields from orders (orders.*).
GROUP BY the last eight digits of telephone_number (see note 1).
Optionally, ORDER BY the number of rows in each group, descending.
Note 1: If you plan to do this type of query often, some kind of index on the last part of the phone number could be desirable. How this could be best implemented depends on the concrete values stored in the field.
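A minimal sketch of grouping by the last eight characters, run through Python's sqlite3 module as a stand-in for MySQL (SQLite's SUBSTR also accepts a negative start position). The sample phone numbers are made up, and the alias `last8` is only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, username TEXT, telephone_number TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "ann", "+4712345678"),
        (2, "ann", "12345678"),
        (3, "ann", "004712345678"),
        (4, "bob", "87654321"),
    ],
)

rows = conn.execute(
    """
    SELECT SUBSTR(telephone_number, -8) AS last8,
           COUNT(*) AS cnt
    FROM orders
    -- Rows whose numbers end in the same eight characters fall into one group.
    GROUP BY SUBSTR(telephone_number, -8)
    ORDER BY cnt DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Selecting only the grouping expression and the count, rather than every column, avoids depending on which row of each group the engine happens to pick.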

-- Memory intensive.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` REGEXP '12345678$'
OR
-- The same, but better and quicker.
SELECT COUNT(*) FROM `orders` WHERE `telephone_number` LIKE '%12345678'

You can use the below query to get last 8 characters from a column values.
select right(rtrim(First_Name),8) FROM [ated].[dbo].[Employee]

Will grouping an ordered table always return the first row? MYSQL

I'm writing a query where I group a selection of rows to find the MIN value for one of the columns.
I'd also like to return the other column values associated with the MIN row returned.
e.g
ID QTY PRODUCT TYPE
--------------------
1 2 Orange Fruit
2 4 Banana Fruit
3 3 Apple Fruit
If I GROUP this table by the column 'TYPE' and select the MIN qty, it won't return the corresponding product for the MIN row which in the case above is 'Apple'.
Adding an ORDER BY clause before grouping seems to solve the problem. However, before I go ahead and include this query in my application I'd just like to know whether this method will always return the correct value. Is this the correct approach? I've seen some examples where subqueries are used, however I have also read that this is inefficient.
Thanks in advance.

Adding an ORDER BY clause before grouping seems to solve the problem. However, before I go ahead and include this query in my application I'd just like to know whether this method will always return the correct value. Is this the correct approach? I've seen some examples where subqueries are used, however I have also read that this is inefficient.
No, this is not the correct approach.
I believe you are talking about a query like this:
SELECT product.*, MIN(qty)
FROM product
GROUP BY
type
ORDER BY
qty
What you are doing here is using MySQL's extension that allows you to select unaggregated/ungrouped columns in a GROUP BY query.
This is mostly used in the queries containing both a JOIN and a GROUP BY on a PRIMARY KEY, like this:
SELECT `order`.id, `order`.customer, SUM(price)
FROM `order`
JOIN orderline
ON orderline.order_id = `order`.id
GROUP BY
`order`.id
Here, order.customer is neither grouped nor aggregated, but since you are grouping on order.id, it is guaranteed to have the same value within each group.
In your case, qty takes different values within the group.
It is not guaranteed from which record within the group the engine will take the value.
You should do this:
SELECT p.*
FROM (
SELECT DISTINCT type
FROM product
) pd
JOIN product p
ON p.id =
(
SELECT pi.id
FROM product pi
WHERE pi.type = pd.type
ORDER BY
type, qty, id
LIMIT 1
)
If you create an index on product (type, qty, id), this query will work fast.
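The query above can be tried out with a small sketch like the following, run through Python's sqlite3 module as a stand-in for MySQL. The first three rows are taken from the question; the Vegetable rows and the index name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, qty INTEGER, product TEXT, type TEXT)")
conn.execute("CREATE INDEX ix_product_type_qty_id ON product (type, qty, id)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, 2, "Orange", "Fruit"),
        (2, 4, "Banana", "Fruit"),
        (3, 3, "Apple", "Fruit"),
        (4, 5, "Carrot", "Vegetable"),  # hypothetical extra rows
        (5, 1, "Leek", "Vegetable"),
    ],
)

rows = conn.execute(
    """
    SELECT p.*
    FROM (SELECT DISTINCT type FROM product) pd
    JOIN product p
      ON p.id = (
          -- Pick the id of the row with the smallest qty for this type;
          -- ties are broken by the smallest id.
          SELECT pi.id
          FROM product pi
          WHERE pi.type = pd.type
          ORDER BY pi.type, pi.qty, pi.id
          LIMIT 1
      )
    """
).fetchall()

# The join itself does not guarantee an output order, so sort by id here.
rows = sorted(rows)
for row in rows:
    print(row)
```

Each type yields exactly one complete row, so the other columns are guaranteed to belong to the row that holds the minimum qty.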

It's difficult to follow you properly without an example of the query you tried.
From your comments I guess your query is something like:
SELECT ID, COUNT(*) AS QTY, PRODUCT_TYPE
FROM PRODUCTS
GROUP BY PRODUCT_TYPE
ORDER BY COUNT(*) DESC;
My advice: group by the concept (in this case PRODUCT_TYPE) and order by the number of times it appears, COUNT(*). The query above would do what you want.
Subqueries are mostly useful for sorting or for discarding rows you are not interested in.
The MIN you are looking for is not exactly a MIN; it is a number of occurrences, and you want to see first the one with the fewest occurrences (meaning it appears the fewest times, I guess).
Cheers,
