Select redundant rows only, not the original - mysql

So I'm tasked with cleaning up a system that has generated redundant orders.
Example data showing the problem:
ORDER ID, SERIAL, ...
1 1
2 1
3 2
4 2
5 3
6 3
7 3
The above data shows that 2 orders were generated with serial 1, 2 orders with serial 2, and 3 orders with serial 3. This is not allowed, and there should be only one order per serial.
So I need a query that can identify the REDUNDANT orders ONLY. I'd like the query to exclude the original order.
So the output from the above data should be:
REDUNDANT ORDER IDS
2
4
6
7
I can easily identify which orders have duplicates using GROUP BY and HAVING COUNT(*) > 1 but the tricky part comes with removing the original.
Is it even possible?
Any help is greatly appreciated.
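The GROUP BY / HAVING step described in the question can be sketched with Python's built-in sqlite3 module standing in for MySQL (the table name `orders` is an assumption). It lists each duplicated serial and its order count, but on its own it cannot say which order in each group is the original:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
# Assumption: the table is called "orders" and has the question's columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, serial INTEGER)")
conn.executemany(
    "INSERT INTO orders (order_id, serial) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 3)],
)

# Serials that have more than one order, with how many orders each has.
duplicated_serials = conn.execute(
    """
    SELECT serial, COUNT(*) AS n
    FROM orders
    GROUP BY serial
    HAVING COUNT(*) > 1
    ORDER BY serial
    """
).fetchall()
print(duplicated_serials)  # [(1, 2), (2, 2), (3, 3)]
```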

As posted in the comments, here's one way to achieve this:
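SELECT T1.ORDER_ID AS redundant
FROM thetable T1
LEFT JOIN
(
-- lowest ORDER_ID per serial, i.e. the original order
-- (no HAVING filter, so a serial with a single order also has that order excluded)
SELECT SERIAL, MIN(ORDER_ID) AS firstorder
FROM thetable
GROUP BY SERIAL
) T2 ON T1.ORDER_ID = T2.firstorder
WHERE T2.firstorder IS NULL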
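A quick check of the derived-table approach, with and without a `HAVING COUNT(*) > 1` filter inside the derived table. This is a sketch using sqlite3 as a stand-in for MySQL, and it adds a hypothetical order 8 whose serial (4) is unique. With the filter, order 8 is wrongly reported as redundant, because its serial produces no `firstorder` row to match against:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (ORDER_ID INTEGER PRIMARY KEY, SERIAL INTEGER)")
# Sample rows from the question plus hypothetical order 8 with a unique serial.
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 3), (8, 4)],
)

def redundant_ids(having_clause: str = "") -> list[int]:
    """Orders that are not the lowest ORDER_ID of their serial."""
    sql = f"""
        SELECT T1.ORDER_ID
        FROM thetable T1
        LEFT JOIN (
            SELECT SERIAL, MIN(ORDER_ID) AS firstorder
            FROM thetable
            GROUP BY SERIAL
            {having_clause}
        ) T2 ON T1.ORDER_ID = T2.firstorder
        WHERE T2.firstorder IS NULL
        ORDER BY T1.ORDER_ID
    """
    return [row[0] for row in conn.execute(sql)]

print(redundant_ids())                       # [2, 4, 6, 7]
print(redundant_ids("HAVING COUNT(*) > 1"))  # [2, 4, 6, 7, 8]
```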


Possible to count number of occurrences in a "group" in MySQL?

Sorry if the title is misleading; I don't really know the terminology for what I want to accomplish. But let's consider this table:
CREATE TABLE entries (
id INT NOT NULL,
number INT NOT NULL
);
Let's say it contains four numbers associated with each id, like this:
id number
1 0
1 9
1 17
1 11
2 5
2 8
2 9
2 0
.
.
.
Is it possible, with an SQL query only, to count the matches for any two given numbers (a pair) associated with an id?
Let's say I want to count the number of unique ids that are associated with both 0 and 9. In the sample data above, 0 and 9 occur together two times (once where id=1 and once where id=2). I can't think of how to write an SQL query that solves this. Is it possible? Maybe my table structure is wrong, but that's how my data is organized right now.
I have tried sub-queries, unions, joins and everything else, but haven't found a way yet.
You can use GROUP BY and HAVING clauses:
SELECT COUNT(s.id)
FROM(
SELECT t.id
FROM YourTable t
WHERE t.number in(0,9)
GROUP BY t.id
HAVING COUNT(distinct t.number) = 2) s
Or with EXISTS():
SELECT COUNT(distinct t.id)
FROM YourTable t
WHERE EXISTS(SELECT 1 FROM YourTable s
WHERE t.id = s.id AND s.number IN (0,9)
HAVING COUNT(distinct s.number) = 2)
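The GROUP BY / HAVING answer can be checked against the sample data with a small sketch, using sqlite3 in place of MySQL and wrapping the query in a helper (`count_ids_with_both` is a hypothetical name) so any pair can be passed as parameters. `COUNT(DISTINCT number) = 2` is what guarantees that an id has both numbers rather than one of them twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER NOT NULL, number INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(1, 0), (1, 9), (1, 17), (1, 11), (2, 5), (2, 8), (2, 9), (2, 0)],
)

def count_ids_with_both(a: int, b: int) -> int:
    """Number of ids having both a and b (assumes a != b)."""
    (n,) = conn.execute(
        """
        SELECT COUNT(s.id)
        FROM (
            SELECT t.id
            FROM entries t
            WHERE t.number IN (?, ?)
            GROUP BY t.id
            HAVING COUNT(DISTINCT t.number) = 2
        ) s
        """,
        (a, b),
    ).fetchone()
    return n

print(count_ids_with_both(0, 9))   # 2
print(count_ids_with_both(0, 17))  # 1
```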

How to select all instances of an ID from a reference table based on whether it contains at least one instance of another ID from a different column?

I think my question is best put analogously through the example below.
Suppose I have the following table that contains information about classes and students in a school.
class_id student_id
1 3
1 6
1 2
1 6
2 4
2 7
3 6
3 3
3 2
3 5
3 1
I would like to retrieve all rows (students and classes) for the classes that contain both of the students with IDs 3 and 6, i.e. my result would be:
class_id student_id
1 3
1 6
1 2
1 6
3 6
3 3
3 2
3 5
3 1
This can be achieved with the following query...
SELECT
*
FROM
table_name
WHERE
class_id IN (SELECT
class_id
FROM
table_name
WHERE
student_id IN (3 , 6)
GROUP BY class_id
HAVING COUNT(DISTINCT student_id) = 2);
I was wondering whether I could achieve the above without using a subquery. Speed is of the utmost importance, so I would like to minimise the duration of this select query.
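The subquery version can be verified against the expected result with a sketch that uses sqlite3 as a stand-in for MySQL (an ORDER BY is added only to make the comparison deterministic). Note that class 1 lists student 6 twice in the sample data, which is why the HAVING clause uses `COUNT(DISTINCT student_id)` rather than `COUNT(*)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (class_id INTEGER, student_id INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, 3), (1, 6), (1, 2), (1, 6), (2, 4), (2, 7),
     (3, 6), (3, 3), (3, 2), (3, 5), (3, 1)],
)

result = conn.execute(
    """
    SELECT *
    FROM table_name
    WHERE class_id IN (
        SELECT class_id
        FROM table_name
        WHERE student_id IN (3, 6)
        GROUP BY class_id
        HAVING COUNT(DISTINCT student_id) = 2
    )
    ORDER BY class_id, student_id
    """
).fetchall()
print(result)
```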
Attempting to avoid sub-queries can lead you down the wrong path. You can indeed write bad queries with sub-queries, but you can write bad queries without them too, as well as perfectly good queries with them.
In your case, the sub-query is being used as a separate scope to find the list of classes of interest. Doing that in the same scope as you also get those classes' students is likely to be messy and inefficient.
For example, an alternative to using IN() still requires a sub-query...
SELECT
student.*
FROM
table_name AS student
INNER JOIN
(
SELECT
class_id
FROM
table_name
WHERE
student_id IN (3 , 6)
GROUP BY
class_id
HAVING
COUNT(DISTINCT student_id) = 2
)
AS class
ON class.class_id = student.class_id
That, however, may yield a similar (or even identical) plan to the query you already have.
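To confirm that the JOIN form returns the same rows as the IN form, here is a sketch (sqlite3 standing in for MySQL) that builds both queries around the same class-finding subquery and compares the results. It checks equivalence only; it says nothing about which plan MySQL would choose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (class_id INTEGER, student_id INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, 3), (1, 6), (1, 2), (1, 6), (2, 4), (2, 7),
     (3, 6), (3, 3), (3, 2), (3, 5), (3, 1)],
)

# Classes containing both target students, computed in their own scope.
TARGET_CLASSES = """
    SELECT class_id
    FROM table_name
    WHERE student_id IN (3, 6)
    GROUP BY class_id
    HAVING COUNT(DISTINCT student_id) = 2
"""

in_rows = conn.execute(
    f"SELECT * FROM table_name WHERE class_id IN ({TARGET_CLASSES}) "
    "ORDER BY class_id, student_id"
).fetchall()

join_rows = conn.execute(
    "SELECT student.* FROM table_name AS student "
    f"INNER JOIN ({TARGET_CLASSES}) AS class ON class.class_id = student.class_id "
    "ORDER BY student.class_id, student.student_id"
).fetchall()

print(in_rows == join_rows)  # True
```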
The closest I can get to "without a sub-query" is to use a nested query, which you can argue is still a sub-query but is easier for the optimiser to expand.
(Uses MySQL 8+)
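SELECT
*
FROM
(
SELECT
*,
-- one flag per target student, so repeated rows such as the two (1, 6) entries are not double-counted
MAX(CASE WHEN student_id = 3 THEN 1 ELSE 0 END)
OVER (PARTITION BY class_id)
AS has_student_3,
MAX(CASE WHEN student_id = 6 THEN 1 ELSE 0 END)
OVER (PARTITION BY class_id)
AS has_student_6
FROM
table_name
)
scanned
WHERE
has_student_3 = 1
AND has_student_6 = 1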
I would expect that to be slower in the case that most classes don't contain students 3 and/or 6. But you could try it.
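A small check of the window-function idea on the sample data, using sqlite3 (3.25 or later, which supports window functions) as a stand-in for MySQL 8. Because class 1 lists student 6 twice, a single SUM over both target students gives 3 for class 1, so a filter of `= 2` drops it; one MAX flag per target student is not affected by duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (class_id INTEGER, student_id INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, 3), (1, 6), (1, 2), (1, 6), (2, 4), (2, 7),
     (3, 6), (3, 3), (3, 2), (3, 5), (3, 1)],
)

def classes(sql: str) -> list[int]:
    return [row[0] for row in conn.execute(sql)]

# SUM over both target students: the duplicate (1, 6) row makes class 1 count 3.
sum_version = classes("""
    SELECT DISTINCT class_id FROM (
        SELECT *, SUM(CASE WHEN student_id IN (3, 6) THEN 1 END)
                  OVER (PARTITION BY class_id) AS target_member_count
        FROM table_name
    ) scanned
    WHERE target_member_count = 2
    ORDER BY class_id
""")

# One MAX flag per target student: duplicates cannot inflate the result.
flag_version = classes("""
    SELECT DISTINCT class_id FROM (
        SELECT *,
               MAX(CASE WHEN student_id = 3 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY class_id) AS has_student_3,
               MAX(CASE WHEN student_id = 6 THEN 1 ELSE 0 END)
                   OVER (PARTITION BY class_id) AS has_student_6
        FROM table_name
    ) scanned
    WHERE has_student_3 = 1 AND has_student_6 = 1
    ORDER BY class_id
""")

print(sum_version)   # [3]
print(flag_version)  # [1, 3]
```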

How to GROUP BY 2 different columns together

I have two columns holding the IDs of the users participating in a transaction: source_id and destination_id. I'm building a function to count all transactions grouped by any user participating in them, either as source or as destination.
The problem is, when I do:
select count(*) from transactions group by source_id, destination_id
it groups by each distinct (source, destination) pair, but I want a single grouping over both columns together. Is it possible using only SQL?
Sample Data
source_user_id destination_user_id
1 4
3 4
4 1
3 2
Desired result:
Id Count
4 - 3 (4 appears 3 times in any of the columns)
3 - 2 (3 appears 2 times in any of the columns)
1 - 2 (1 appears 2 times in any of the columns)
2 - 1 (2 appears 1 time in any of the columns)
As you can see in the example result, I want to know the number of times an id appears in either of the 2 fields.
Use UNION ALL to get the ids into one column, then count them.
select id,count(*)
from (select source_id as id from tbl
union all
select destination_id from tbl
) t
group by id
order by count(*) desc,id
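A sketch of the UNION ALL answer on the sample data, with sqlite3 standing in for MySQL. It also shows why ALL matters: a plain UNION removes duplicate ids before counting, so every id ends up with a count of 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (source_id INTEGER, destination_id INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 4), (3, 4), (4, 1), (3, 2)])

# Stack both columns into one, keeping duplicates, then count per id.
counts = conn.execute(
    """
    SELECT id, COUNT(*)
    FROM (
        SELECT source_id AS id FROM tbl
        UNION ALL
        SELECT destination_id FROM tbl
    ) t
    GROUP BY id
    ORDER BY COUNT(*) DESC, id
    """
).fetchall()

# Same query with UNION: duplicates are removed before counting.
distinct_only = conn.execute(
    """
    SELECT id, COUNT(*)
    FROM (
        SELECT source_id AS id FROM tbl
        UNION
        SELECT destination_id FROM tbl
    ) t
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(counts)         # [(4, 3), (1, 2), (3, 2), (2, 1)]
print(distinct_only)  # [(1, 1), (2, 1), (3, 1), (4, 1)]
```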
edited to add: Thank you for clarifying your question. The following isn't what you need.
Sounds like you want to use the concatenate function.
https://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_concat
GROUP BY CONCAT(source_id,"_",destination_id)
The underscore is intended to distinguish "source_id=1, destination_id=11" from "source_id=11, destination_id=1". (We want them to be 1_11 and 11_1 respectively.) If you expect these IDs to contain underscores, you'd have to handle this differently, but I assume they're integers.
It may look like this.
SELECT id, SUM(total) AS total FROM
(SELECT source_id AS id, COUNT(*) AS total FROM transactions GROUP BY source_id
UNION ALL
SELECT destination_id AS id, COUNT(*) AS total FROM transactions GROUP BY destination_id) q
GROUP BY id
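When each side is pre-aggregated, the outer query has to SUM the per-side counts over a UNION ALL. A sketch with sqlite3 standing in for MySQL compares that with UNION plus COUNT: UNION merges the identical (1, 1) rows from the two sides, and COUNT tallies rows instead of adding their totals, so both ids 1 and 4 come out too low:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (source_id INTEGER, destination_id INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)", [(1, 4), (3, 4), (4, 1), (3, 2)]
)

def totals(set_op: str, outer_agg: str) -> dict[int, int]:
    """Per-side counts combined with set_op, then outer_agg per id."""
    sql = f"""
        SELECT id, {outer_agg}(total)
        FROM (
            SELECT source_id AS id, COUNT(*) AS total
            FROM transactions GROUP BY source_id
            {set_op}
            SELECT destination_id AS id, COUNT(*) AS total
            FROM transactions GROUP BY destination_id
        ) q
        GROUP BY id
        ORDER BY id
    """
    return dict(conn.execute(sql).fetchall())

print(totals("UNION", "COUNT"))    # {1: 1, 2: 1, 3: 1, 4: 2}
print(totals("UNION ALL", "SUM"))  # {1: 2, 2: 1, 3: 2, 4: 3}
```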

Guidance required for sql query

I have a database with one table, shown below. I'm trying to write a query that displays the names of the medications manufactured by the company that manufactures the most medications.
Looking at the table, the answer would be the medications of companies 1 and 2, because those companies manufacture the most medications (three each), but I'm not sure how to write a query that selects them.
ID | COMPANY_ID | MEDICATION_NAME
1 1 ASPIRIN
2 1 GLUCERNA
3 2 SIBUTRAMINE
4 1 IBUPROFEN
5 2 VENOFER
6 2 AVONEN
7 4 ACETAMINOPHEN
8 3 ACETAMINO
9 3 GLIPIZIDE
Please share your suggestions. Thanks!
Several ways to do this. Here's one which first uses a subquery to get the maximum count, then another subquery to get the companies with that count, and finally the outer query to return the results:
select *
from yourtable
where company_id in (
select company_id
from yourtable
group by company_id
having count(1) = (
select count(1) cnt
from yourtable
group by company_id
order by 1 desc
limit 1
)
)
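The nested-subquery answer can be run on the question's data in a sketch that uses sqlite3 as a stand-in for MySQL (`yourtable` is the answer's placeholder name, and only the names are selected here). Companies 1 and 2 tie at three medications each, so both companies' medications are returned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, company_id INTEGER, medication_name TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "ASPIRIN"), (2, 1, "GLUCERNA"), (3, 2, "SIBUTRAMINE"),
     (4, 1, "IBUPROFEN"), (5, 2, "VENOFER"), (6, 2, "AVONEN"),
     (7, 4, "ACETAMINOPHEN"), (8, 3, "ACETAMINO"), (9, 3, "GLIPIZIDE")],
)

names = [
    row[0]
    for row in conn.execute(
        """
        SELECT medication_name
        FROM yourtable
        WHERE company_id IN (
            SELECT company_id
            FROM yourtable
            GROUP BY company_id
            HAVING COUNT(1) = (
                -- largest per-company count
                SELECT COUNT(1) AS cnt
                FROM yourtable
                GROUP BY company_id
                ORDER BY 1 DESC
                LIMIT 1
            )
        )
        ORDER BY id
        """
    )
]
print(names)
# ['ASPIRIN', 'GLUCERNA', 'SIBUTRAMINE', 'IBUPROFEN', 'VENOFER', 'AVONEN']
```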
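This query might work. I have not tested it, but the idea is to compare each company's medication count with the largest per-company count:
SELECT MEDICATION_NAME
FROM yourtable
WHERE COMPANY_ID IN (
SELECT COMPANY_ID FROM yourtable GROUP BY COMPANY_ID
HAVING COUNT(*) = (SELECT MAX(counted)
FROM (SELECT COUNT(*) AS counted FROM yourtable GROUP BY COMPANY_ID) AS counts)
);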
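Whether the `MAX(counted)` idea works depends on where the counts are grouped. A sketch with sqlite3 in place of MySQL (`yourtable` is a placeholder name): without `GROUP BY COMPANY_ID` the inner count covers the whole table and yields 9, which matches no company, while per-company counts give a maximum of 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (ID INTEGER PRIMARY KEY, COMPANY_ID INTEGER, MEDICATION_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "ASPIRIN"), (2, 1, "GLUCERNA"), (3, 2, "SIBUTRAMINE"),
     (4, 1, "IBUPROFEN"), (5, 2, "VENOFER"), (6, 2, "AVONEN"),
     (7, 4, "ACETAMINOPHEN"), (8, 3, "ACETAMINO"), (9, 3, "GLIPIZIDE")],
)

# Without GROUP BY, COUNT(*) covers the whole table (9 rows).
whole_table_max = conn.execute(
    "SELECT MAX(counted) FROM (SELECT COUNT(*) AS counted FROM yourtable) AS counts"
).fetchone()[0]

# With GROUP BY COMPANY_ID, MAX(counted) is the largest per-company count.
per_company_max = conn.execute(
    "SELECT MAX(counted) FROM "
    "(SELECT COUNT(*) AS counted FROM yourtable GROUP BY COMPANY_ID) AS counts"
).fetchone()[0]

# Every company whose count equals that maximum (ties included).
names = [row[0] for row in conn.execute("""
    SELECT MEDICATION_NAME
    FROM yourtable
    WHERE COMPANY_ID IN (
        SELECT COMPANY_ID FROM yourtable GROUP BY COMPANY_ID
        HAVING COUNT(*) = (SELECT MAX(counted)
            FROM (SELECT COUNT(*) AS counted FROM yourtable GROUP BY COMPANY_ID) AS counts)
    )
    ORDER BY ID
""")]

print(whole_table_max, per_company_max)  # 9 3
print(names)
```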

Multiple rows come back of same ID, just need one

I am trying to list several products on a page. My query returns multiples of the same product and I am trying to figure out how to limit it to one only with my query.
The primary key on the first table that we will call table_one is ID.
The second table has a column of ProductID that references the primary key on table_one.
My query brings back multiple rows for the ProductID equal to 6 below. I just want one result to be brought back, BUT I still want all of my data in DateReserved on table_two to be queried. I'm pretty sure I need to add one more thing to my query, but I have not had much luck.
The results I want back are as follows.
ID Productname Quantity Image Date Reserved SumQuantity
6 productOne 6 'image.jpg' 03-31-2013 3
7 productTwo 1 'product.jpg' 04-04-2013 1
Here is my first table. table_one
ID Productname Quantity Image
6 productOne 6 'image.jpg'
7 productTwo 1 'product.jpg'
Here is my second table. table_two
ID ProductID DateReserved QuantityReserved
1 6 03-31-2013 3
2 6 04-07-2013 2
3 7 04-04-2013 1
Here is my query that I am trying to use.
SELECT *
FROM `table_one`
LEFT JOIN `table_two`
ON `table_one`.`ID` = `table_two`.`ProductID`
WHERE `table_one`.`Quantity` > 0
OR `table_two`.`DateReserved` + INTERVAL 5 DAY <= '2013-03-27'
ORDER BY ProductName
Sorry for posting another answer, but as it seems my first try on it was not so good ;)
To get only one result row per product, you need to sum the reservations up somehow.
First, I suggest you explicitly select the columns you want back in your result instead of using "*".
I suggest you try something like this:
SELECT
`table_one`.`ID`, `table_one`.`Productname`, `table_one`.`Image`, `table_one`.`Quantity`,
SUM(`table_two`.`QuantityReserved`) AS SumQuantity
FROM
`table_one`
LEFT JOIN
`table_two` ON `table_one`.`ID` = `table_two`.`ProductID`
WHERE
`table_one`.`Quantity` > 0
OR `table_two`.`DateReserved` + INTERVAL 5 DAY <= '2013-03-27'
-- group by the table_one key: products without reservations have a NULL ProductID
-- and would otherwise collapse into a single row
GROUP BY `table_one`.`ID`
ORDER BY ProductName
As you see, I used SUM to get a combined quantity; this is called aggregation, and the GROUP BY helps you get rid of multiple occurrences of the same product.
One problem you have now is that you will have to get the reservation date from a separate query (at least, I am not sure how you would get it into the same query).
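One possible way to bring a reservation date into the same grouped query is an aggregate such as MIN(DateReserved). The sketch below assumes the earliest date is the one wanted (the question does not say so), stores dates as ISO strings so MIN sorts them correctly, and uses sqlite3 in place of MySQL, where `date(x, '+5 days')` plays the role of `x + INTERVAL 5 DAY`. Products 8 and 9 are hypothetical, added without reservations to show why grouping by `table_two`.`ProductID` merges them into one NULL group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_one (ID INTEGER PRIMARY KEY, Productname TEXT,
                            Quantity INTEGER, Image TEXT);
    CREATE TABLE table_two (ID INTEGER PRIMARY KEY, ProductID INTEGER,
                            DateReserved TEXT, QuantityReserved INTEGER);
    INSERT INTO table_one VALUES
        (6, 'productOne', 6, 'image.jpg'),
        (7, 'productTwo', 1, 'product.jpg'),
        (8, 'productThree', 2, 'three.jpg'),  -- hypothetical, no reservations
        (9, 'productFour', 4, 'four.jpg');    -- hypothetical, no reservations
    INSERT INTO table_two VALUES
        (1, 6, '2013-03-31', 3),
        (2, 6, '2013-04-07', 2),
        (3, 7, '2013-04-04', 1);
    """
)

def per_product(group_col: str):
    return conn.execute(
        f"""
        SELECT t1.ID, t1.Productname,
               MIN(t2.DateReserved) AS FirstReserved,
               SUM(t2.QuantityReserved) AS SumQuantity
        FROM table_one t1
        LEFT JOIN table_two t2 ON t1.ID = t2.ProductID
        WHERE t1.Quantity > 0
           OR date(t2.DateReserved, '+5 days') <= '2013-03-27'
        GROUP BY {group_col}
        ORDER BY t1.Productname
        """
    ).fetchall()

by_id = per_product("t1.ID")
print(by_id)
print(len(per_product("t2.ProductID")))  # 3: products 8 and 9 share the NULL group
```

Here product 6 sums all of its reservations (3 + 2 = 5); if only some reservations should count, that condition belongs in the WHERE or ON clause.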
Since you are using MySQL,
LIMIT <NUMBER>
should do exactly what you want. You just insert it after your ORDER BY clause, but you should probably also add one more ordering column, so you can be sure that you always get the entity you wanted and not just some "random" one ;)
So without further ordering your query would look like this:
SELECT
*
FROM `table_one`
LEFT JOIN `table_two` ON `table_one`.`ID` = `table_two`.`ProductID`
WHERE
`table_one`.`Quantity` > 0
OR `table_two`.`DateReserved` + INTERVAL 5 DAY <= '2013-03-27'
ORDER BY ProductName
LIMIT 1
SELECT a.member_id, a.member_name, a.gender, a.amount, b.trip_id, c.location
FROM tbl_member a
-- keep only the member(s) with the highest amount on each trip
INNER JOIN (SELECT trip_id, MAX(amount) AS amount FROM tbl_member GROUP BY trip_id) b
ON a.trip_id = b.trip_id AND a.amount = b.amount
LEFT JOIN tbl_trip c ON a.trip_id = c.trip_id
ORDER BY a.member_name
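The derived table of MAX(amount) per trip only reduces the result to one row per trip when it is joined on both trip_id and amount. A sketch of that pattern with sqlite3 standing in for MySQL; the tbl_member and tbl_trip schemas are inferred from the column names in the query, and the rows are hypothetical. Ties on the maximum amount would still return several rows for that trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema inferred from the column names in the answer.
    CREATE TABLE tbl_trip (trip_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE tbl_member (member_id INTEGER PRIMARY KEY, member_name TEXT,
                             gender TEXT, amount INTEGER, trip_id INTEGER);
    INSERT INTO tbl_trip VALUES (1, 'Lake'), (2, 'Coast');
    INSERT INTO tbl_member VALUES
        (1, 'Ann', 'F', 50, 1),
        (2, 'Bob', 'M', 80, 1),
        (3, 'Cid', 'M', 30, 2),
        (4, 'Dee', 'F', 70, 2);
    """
)

rows = conn.execute(
    """
    SELECT a.member_id, a.member_name, a.gender, a.amount, b.trip_id, c.location
    FROM tbl_member a
    INNER JOIN (
        SELECT trip_id, MAX(amount) AS amount
        FROM tbl_member
        GROUP BY trip_id
    ) b ON a.trip_id = b.trip_id AND a.amount = b.amount
    LEFT JOIN tbl_trip c ON a.trip_id = c.trip_id
    ORDER BY a.member_name
    """
).fetchall()
print(rows)  # [(2, 'Bob', 'M', 80, 1, 'Lake'), (4, 'Dee', 'F', 70, 2, 'Coast')]
```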