How to GROUP BY 2 different columns together - mysql

I have two columns holding the IDs of the users participating in a transaction, source_id and destination_id. I'm building a function to sum all transactions grouped by any user participating in them, either as source or as destination.
The problem is, when I do:
select count (*) from transactions group by source_id, destination_id
it groups first by source and then by destination, but I want to group the two columns together. Is it possible using only SQL?
Sample Data
source_user_id destination_user_id
1 4
3 4
4 1
3 2
Desired result:
Id Count
4 - 3 (4 appears 3 times in any of the columns)
3 - 2 (3 appears 2 times in any of the columns)
1 - 2 (1 appears 2 times in any of the columns)
2 - 1 (2 appears 1 time in any of the columns)
As you can see on the example result, I want to know the number of times an id will appear in any of the 2 fields.

Use union all to get the ids into one column, then count them.
select id,count(*)
from (select source_id as id from tbl
union all
select destination_id from tbl
) t
group by id
order by count(*) desc,id
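
To check the union all query against the sample data from the question, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL; the table name tbl follows the answer, and the in-memory database is only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the demo runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (source_id INTEGER, destination_id INTEGER)")
# Sample data from the question.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 4), (3, 4), (4, 1), (3, 2)])

# Stack both columns into one id column, then count occurrences per id.
query = """
SELECT id, COUNT(*) AS cnt
FROM (SELECT source_id AS id FROM tbl
      UNION ALL
      SELECT destination_id FROM tbl) t
GROUP BY id
ORDER BY cnt DESC, id
"""
counts = conn.execute(query).fetchall()
print(counts)  # [(4, 3), (1, 2), (3, 2), (2, 1)]
```

The output matches the desired result in the question: 4 appears three times, 1 and 3 twice each, and 2 once.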

edited to add: Thank you for clarifying your question. The following isn't what you need.
Sounds like you want to use the concatenate function.
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
GROUP BY CONCAT(source_id,"_",destination_id)
The underscore is intended to distinguish "source_id=1, destination_id=11" from "source_id=11, destination_id=1". (We want them to be 1_11 and 11_1 respectively.) If you expect these IDs to contain underscores, you'd have to handle this differently, but I assume they're integers.
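
The separator point can be illustrated with a small sketch. It assumes SQLite in place of MySQL, so concatenation is written with the || operator rather than CONCAT; the three rows and the helper name group_counts are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; || is SQLite's concatenation operator.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (source_id INTEGER, destination_id INTEGER)")
# Hypothetical rows chosen so that the two id pairs collide without a separator.
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 11), (11, 1), (1, 11)])

def group_counts(key_expr):
    """Group by the given key expression and return {key: row count}."""
    sql = f"SELECT {key_expr} AS pair_key, COUNT(*) FROM transactions GROUP BY pair_key"
    return dict(conn.execute(sql).fetchall())

without_separator = group_counts("source_id || destination_id")
with_separator = group_counts("source_id || '_' || destination_id")

print(without_separator)  # {'111': 3}: both pairs collapse into one key
print(with_separator)     # two keys: '1_11' (2 rows) and '11_1' (1 row)
```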

It may look like this. Each branch counts one column, union all keeps both partial counts, and the outer query adds them up.
select id, sum(total) as total from
(select source_id as id, count(*) as total from transactions group by source_id
union all
select destination_id as id, count(*) as total from transactions group by destination_id) q
group by id
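
When per-column counts are computed first and then combined, the choice of set operator matters. The sketch below assumes SQLite as a stand-in for MySQL and the column names source_id and destination_id. It runs the same pre-aggregated query once with UNION ALL and once with UNION to show how UNION silently drops a partial count.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column names follow the question's query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (source_id INTEGER, destination_id INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 4), (3, 4), (4, 1), (3, 2)])

# Count each column separately, combine the partial counts, then sum them per id.
template = """
SELECT id, SUM(total) AS total
FROM (SELECT source_id AS id, COUNT(*) AS total FROM transactions GROUP BY source_id
      {set_op}
      SELECT destination_id AS id, COUNT(*) AS total FROM transactions GROUP BY destination_id) q
GROUP BY id
ORDER BY id
"""
with_union_all = conn.execute(template.format(set_op="UNION ALL")).fetchall()
with_union = conn.execute(template.format(set_op="UNION")).fetchall()

print(with_union_all)  # [(1, 2), (2, 1), (3, 2), (4, 3)]
print(with_union)      # [(1, 1), (2, 1), (3, 2), (4, 3)]
```

Id 1 appears once as a source and once as a destination, so both branches produce the row (1, 1). UNION removes one of them as a duplicate, which is why the total for id 1 comes out as 1 instead of 2.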

Related

Possible to count number of occurrences in a "group" in MySQL?

Sorry if the title is misleading, I don't really know the terminology for what I want to accomplish. But let's consider this table:
CREATE TABLE entries (
id INT NOT NULL,
number INT NOT NULL
);
Let's say it contains four numbers associated with each id, like this:
id number
1 0
1 9
1 17
1 11
2 5
2 8
2 9
2 0
.
.
.
Is it possible, with a SQL query only, to count the number of matches for any two given numbers (tuples) associated with an id?
Let's say I want to count how many ids have both 0 and 9 associated with them. In the sample data above, 0 and 9 occur together twice (once where id=1 and once where id=2). I can't think of how to write a SQL query that solves this. Is it possible? Maybe my table structure is wrong, but that's how my data is organized right now.
I have tried sub-queries, unions, joins and everything else, but haven't found a way yet.
You can use GROUP BY and HAVING clauses:
SELECT COUNT(s.id)
FROM(
SELECT t.id
FROM YourTable t
WHERE t.number in(0,9)
GROUP BY t.id
HAVING COUNT(distinct t.number) = 2) s
Or with EXISTS():
SELECT COUNT(distinct t.id)
FROM YourTable t
WHERE EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id and s.number IN(0,9)
HAVING COUNT(distinct s.number) = 2)
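
The GROUP BY / HAVING approach can be tried on the sample data with a short sketch. It assumes SQLite in place of MySQL and uses the entries table from the question. The rows for id 3 are hypothetical, added only to show that an id with just one of the two numbers is not counted.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER NOT NULL, number INTEGER NOT NULL)")
rows = [
    (1, 0), (1, 9), (1, 17), (1, 11),   # from the question
    (2, 5), (2, 8), (2, 9), (2, 0),     # from the question
    (3, 0), (3, 5),                     # hypothetical: has 0 but not 9
]
conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)

# Keep only rows with 0 or 9, then keep ids that have both distinct values.
query = """
SELECT COUNT(*) FROM (
    SELECT id FROM entries
    WHERE number IN (0, 9)
    GROUP BY id
    HAVING COUNT(DISTINCT number) = 2
) s
"""
matching_ids = conn.execute(query).fetchone()[0]
print(matching_ids)  # 2
```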

Select Distinct whilst adding tuples together in SQL

I have a table that contains random data against a key with duplicate entries. I'm looking to remove the duplicates (a projection as it is called in relational algebra), but rather than discarding the attached data, sum it together. For example:
orderID cost
1 5
1 2
1 10
2 3
2 3
3 15
Should remove duplicates from orderID whilst summing each orderID's values:
orderID cost
1 17 (5 + 2 + 10)
2 6
3 15
My assumption is I'd use SELECT DISTINCT somehow, but I don't know how I'd go about doing so. I understand GROUP BY might be able to do something but I am unsure.
This is a very basic aggregation:
SELECT orderId, SUM(cost) AS cost
FROM MyTable
GROUP BY orderId
This says, for each "orderId" grouping, sum the "cost" field and return one value per group.
You can use the group by clause to get one row per distinct value of the column(s) you're grouping by - orderId in this case. You can then apply an aggregate function to the columns you aren't grouping by - sum, in this case:
SELECT orderId, SUM(cost)
FROM mytable
GROUP BY orderId
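
As a quick check of the aggregation on the sample data, here is a minimal sketch. It assumes SQLite as a stand-in for MySQL; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (orderId INTEGER, cost INTEGER)")
# Sample data from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 5), (1, 2), (1, 10), (2, 3), (2, 3), (3, 15)],
)

# One output row per orderId, with the costs of that group summed.
totals = conn.execute(
    "SELECT orderId, SUM(cost) AS cost FROM mytable GROUP BY orderId ORDER BY orderId"
).fetchall()
print(totals)  # [(1, 17), (2, 6), (3, 15)]
```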

Select redundant rows only, not the original

So I'm tasked with cleaning up a system that has generated redundant orders.
Data example of the problem
ORDER ID, SERIAL, ...
1 1
2 1
3 2
4 2
5 3
6 3
7 3
The above data shows that 2 orders were generated with serial 1, 2 orders with serial 2, and 3 orders with serial 3. This is not allowed, and there should be only one order per serial.
So I need a query that can identify the REDUNDANT orders ONLY. I'd like the query to exclude the original order.
So the output from the above data should be:
REDUNDANT ORDER IDS
2
4
6
7
I can easily identify which orders have duplicates using GROUP BY and HAVING COUNT(*) > 1 but the tricky part comes with removing the original.
Is it even possible?
Any help is greatly appreciated.
As posted in the comments, here's one way to achieve this:
SELECT T1.ORDER_ID as redundant
FROM thetable T1
LEFT JOIN
(
SELECT SERIAL, MIN(ORDER_ID) AS firstorder
FROM thetable
GROUP BY SERIAL
) T2 ON T1.ORDER_ID=T2.firstorder
WHERE T2.firstorder IS NULL
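
A runnable sketch of the "first order per serial" idea follows. It assumes SQLite in place of MySQL. Order 8 with serial 4 is a hypothetical extra row, added to check that a serial with a single order is not reported. The subquery here takes MIN(ORDER_ID) for every serial, with no HAVING filter, so that such single orders still match their own first order.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (ORDER_ID INTEGER, SERIAL INTEGER)")
rows = [
    (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 3),  # from the question
    (8, 4),                                                  # hypothetical: no duplicate
]
conn.executemany("INSERT INTO thetable VALUES (?, ?)", rows)

# An order is redundant when it is not the first (lowest) order of its serial.
query = """
SELECT T1.ORDER_ID AS redundant
FROM thetable T1
LEFT JOIN (SELECT SERIAL, MIN(ORDER_ID) AS firstorder
           FROM thetable
           GROUP BY SERIAL) T2 ON T1.ORDER_ID = T2.firstorder
WHERE T2.firstorder IS NULL
ORDER BY T1.ORDER_ID
"""
redundant = [row[0] for row in conn.execute(query)]
print(redundant)  # [2, 4, 6, 7]
```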

Mysql group by two columns and pick the maximum value of third column

I have a table that has user_id, item_id and interaction_type as columns. interaction_type could be 0, 1,2,3,4 or 5. However, for some user_id and item_id pairs, we might have multiple interaction_types. For example, we might have:
user_id item_id interaction_type
2 3 1
2 3 0
2 3 5
4 1 0
5 4 4
5 4 2
What I want is to only keep the maximum interaction_type if there are multiples. So I want this:
user_id item_id interaction_type
2 3 5
4 1 0
5 4 4
Here is the query I wrote for this purpose:
select user_id, item_id, max(interaction_type) as max_type
from mytable
group by user_id, item_id;
But the result is weird. For example, the original table has 100000 rows with interaction_type=5, but the result table has only 2000. How is this possible? MAX should pick 5 in every comparison that contains a 5, so I wouldn't expect fewer 5s in the result table.
Your query is fine. You see fewer rows with interaction_type 5 because the result has only one row for every unique (user_id, item_id) pair, so repeated 5s for the same pair collapse into a single row.
If you want to see the interaction types going into each row then use:
select user_id, item_id, max(interaction_type) as max_type,
group_concat(distinct interaction_type) as interaction_types,
count(*) as cnt
from mytable
group by user_id, item_id;
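
To see how the row count shrinks, here is a minimal sketch using the sample data from the question. It assumes SQLite as a stand-in for MySQL and leaves out group_concat so that the output is fully deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (user_id INTEGER, item_id INTEGER, interaction_type INTEGER)"
)
# Sample data from the question: six rows, three distinct (user_id, item_id) pairs.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(2, 3, 1), (2, 3, 0), (2, 3, 5), (4, 1, 0), (5, 4, 4), (5, 4, 2)],
)

# One row per pair: the maximum type and how many source rows fed into it.
query = """
SELECT user_id, item_id, MAX(interaction_type) AS max_type, COUNT(*) AS cnt
FROM mytable
GROUP BY user_id, item_id
ORDER BY user_id, item_id
"""
per_pair = conn.execute(query).fetchall()
print(per_pair)  # [(2, 3, 5, 3), (4, 1, 0, 1), (5, 4, 4, 2)]
```

Six input rows become three output rows, and the cnt column shows how many input rows each output row represents.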
It occurs to me that you want all rows with the maximum interaction type. If so, calculate the maximum and then find all rows that match that value:
select t.*
from mytable t join
(select max(interaction_type) as maxit from mytable) x
on x.maxit = t.interaction_type;
No group by is needed for this query.

MySQL Conditional count based on a value in another column

I have table that looks like this:
id rank
a 2
a 1
b 4
b 3
c 7
d 1
d 1
e 9
I need to get all the distinct rank values on one column and count of all the unique id's that have reached equal or higher rank than in the first column.
So the result I need would be something like this:
rank count
1 5
2 4
3 3
4 3
7 2
9 1
I've been able to make a table with all the unique id's with their max rank:
SELECT
MAX(rank) AS 'TopRank',
id
FROM myTable
GROUP BY id
I'm also able to get all the distinct rank values and count how many id's have reached exactly that rank:
SELECT
DISTINCT TopRank AS 'rank',
COUNT(id) AS 'count of id'
FROM
(SELECT
MAX(rank) AS 'TopRank',
id
FROM myTable
GROUP BY id) tableDerp
GROUP BY TopRank
ORDER BY TopRank ASC
But I don't know how to get count of id's where the rank is equal OR HIGHER than the rank in column 1. Trying SUM(CASE WHEN TopRank > TopRank THEN 1 END) naturally gives me nothing. So how can I get the count of id's where the TopRank is higher or equal to each distinct rank value? Or am I looking in the wrong way and should try something like running totals instead? I tried to look for similar questions but I think I'm completely on a wrong trail here since I couldn't find any and this seems a pretty simple problem that I'm just overthinking somehow. Any help much appreciated.
One approach is to use a correlated subquery. Just get the list of ranks and then use a correlated subquery to get the count you are looking for. Note that rank is a reserved word as of MySQL 8.0, so it is quoted with backticks where it appears unqualified:
SELECT r.rank,
(SELECT COUNT(DISTINCT t2.id)
FROM myTable t2
WHERE t2.rank >= r.rank
) as cnt
FROM (SELECT DISTINCT `rank` FROM myTable) r;
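
The correlated subquery can be checked against the sample data with the sketch below. It assumes SQLite in place of MySQL, so the rank column is quoted with double quotes rather than backticks, and an ORDER BY is added to fix the output order.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "rank" is double-quoted to be safe.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE myTable (id TEXT, "rank" INTEGER)')
# Sample data from the question.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [("a", 2), ("a", 1), ("b", 4), ("b", 3), ("c", 7), ("d", 1), ("d", 1), ("e", 9)],
)

# For each distinct rank, count the distinct ids that reached that rank or higher.
query = """
SELECT r."rank",
       (SELECT COUNT(DISTINCT t2.id)
        FROM myTable t2
        WHERE t2."rank" >= r."rank") AS cnt
FROM (SELECT DISTINCT "rank" FROM myTable) r
ORDER BY r."rank"
"""
cumulative = conn.execute(query).fetchall()
print(cumulative)  # [(1, 5), (2, 4), (3, 3), (4, 3), (7, 2), (9, 1)]
```

The output matches the desired result table in the question.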