My table has duplicate values in specific columns. I would like to remove the duplicate rows and keep only the row with the latest id.
The columns I want to check and compare are:
sub_id, spec_id, ex_time
So, for this table:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 1 | 100 | 444 | 09:29 | 2 |
| 2 | 101 | 555 | 10:01 | 10 |
| 3 | 100 | 444 | 09:29 | 23 |
| 4 | 200 | 321 | 05:15 | 5 |
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I would like to get this result:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I was able to build this query, which selects all rows that are duplicated across multiple columns, according to this question:
select t.*
from mytable t join
(select sub_id, spec_id, ex_time, count(*) as NumDuplicates
from mytable
group by sub_id, spec_id, ex_time
having NumDuplicates > 1
) tsum
on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time
But now I'm not sure how to wrap this select in a delete query that removes every duplicate row except the one with the highest id, as shown in the expected result.
You can modify your sub-select to return the maximum id for each combination of duplicated values.
Then, when joining to the main table, add a condition that the id must not equal that maximum id.
You can then delete from this result set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
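As a rough, self-contained check of the "keep the highest id per group" idea, here is a sketch that uses Python's built-in sqlite3 module in place of a MySQL server (an assumption made only so the example runs anywhere). SQLite does not accept the DELETE ... JOIN syntax, so the sketch swaps in an equivalent NOT IN (SELECT MAX(id) ...) filter. MySQL itself typically rejects that form with error 1093, because the subquery reads the table being deleted from, which is one reason the join version above is the one to run there.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable (id INTEGER PRIMARY KEY, sub_id INT, spec_id INT, '
    'ex_time TEXT, "count" INT)'
)
rows = [
    (1, 100, 444, "09:29", 2),
    (2, 101, 555, "10:01", 10),
    (3, 100, 444, "09:29", 23),
    (4, 200, 321, "05:15", 5),
    (5, 100, 444, "09:29", 8),
    (6, 101, 555, "10:01", 1),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)

# Keep only the row with the highest id in each (sub_id, spec_id, ex_time) group.
# This is the SQLite-friendly equivalent of the DELETE ... JOIN shown above.
conn.execute(
    """
    DELETE FROM mytable
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM mytable
        GROUP BY sub_id, spec_id, ex_time
    )
    """
)

remaining_ids = [r[0] for r in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining_ids)
```

Note that row 4 survives as well: it has no duplicate, so neither this filter nor the HAVING NumDuplicates > 1 join ever selects it for deletion.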
Here is the case: I have two tables, tags and customers, with the following structure:
Tags Table
ID Name
1 Tag1
2 Tag2
Customers Table
ID Tag_ID Name
1 1 C1
2 2 C2
3 1 C3
I want a SQL statement that gets the first 10 customers (alphabetically) for each tag. Can this be done in one query?
P.S. The data in the tables is sample data, not the actual data.
Consider the following:
DROP TABLE IF EXISTS tags;
CREATE TABLE tags
(tag_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,name VARCHAR(12) NOT NULL
);
INSERT INTO tags VALUES
(1,'One'),
(2,'Two'),
(3,'Three'),
(4,'Four'),
(5,'Five'),
(6,'Six');
DROP TABLE IF EXISTS customers;
CREATE TABLE customers
(customer_id INT NOT NULL PRIMARY KEY
,customer VARCHAR(12)
);
INSERT INTO customers VALUES
(1,'Dave'),
(2,'Ben'),
(3,'Charlie'),
(4,'Michael'),
(5,'Steve'),
(6,'Clive'),
(7,'Alice'),
(8,'Ken'),
(9,'Petra');
DROP TABLE IF EXISTS customer_tag;
CREATE TABLE customer_tag
(customer_id INT NOT NULL
,tag_id INT NOT NULL
,PRIMARY KEY(customer_id,tag_id)
);
INSERT INTO customer_tag VALUES
(1,1),
(1,2),
(1,4),
(2,3),
(2,2),
(3,1),
(4,4),
(4,2),
(5,2),
(5,5),
(5,6),
(6,6);
The following query returns all customers associated with each tag, and their respective 'rank' when sorted alphabetically...
SELECT t.*, c1.*, COUNT(ct2.tag_id) `rank`
FROM tags t
JOIN customer_tag ct1
ON ct1.tag_id = t.tag_id
JOIN customers c1
ON c1.customer_id = ct1.customer_id
JOIN customer_tag ct2
ON ct2.tag_id = ct1.tag_id
JOIN customers c2
ON c2.customer_id = ct2.customer_id
AND c2.customer <= c1.customer
GROUP
BY t.tag_id, c1.customer_id
ORDER
BY t.tag_id,`rank`;
+--------+-------+-------------+----------+------+
| tag_id | name | customer_id | customer | rank |
+--------+-------+-------------+----------+------+
| 1 | One | 3 | Charlie | 1 |
| 1 | One | 1 | Dave | 2 |
| 2 | Two | 2 | Ben | 1 |
| 2 | Two | 1 | Dave | 2 |
| 2 | Two | 4 | Michael | 3 |
| 2 | Two | 5 | Steve | 4 |
| 3 | Three | 2 | Ben | 1 |
| 4 | Four | 1 | Dave | 1 |
| 4 | Four | 4 | Michael | 2 |
| 5 | Five | 5 | Steve | 1 |
| 6 | Six | 6 | Clive | 1 |
| 6 | Six | 5 | Steve | 2 |
+--------+-------+-------------+----------+------+
If we just want the top 2, say, for each tag, we can rewrite that as follows...
SELECT t.*
, c1.*
FROM tags t
JOIN customer_tag ct1
ON ct1.tag_id = t.tag_id
JOIN customers c1
ON c1.customer_id = ct1.customer_id
JOIN customer_tag ct2
ON ct2.tag_id = ct1.tag_id
JOIN customers c2
ON c2.customer_id = ct2.customer_id
AND c2.customer <= c1.customer
GROUP
BY t.tag_id, c1.customer_id
HAVING COUNT(ct2.tag_id) <=2
ORDER
BY t.tag_id, c1.customer;
+--------+-------+-------------+----------+
| tag_id | name | customer_id | customer |
+--------+-------+-------------+----------+
| 1 | One | 3 | Charlie |
| 1 | One | 1 | Dave |
| 2 | Two | 2 | Ben |
| 2 | Two | 1 | Dave |
| 3 | Three | 2 | Ben |
| 4 | Four | 1 | Dave |
| 4 | Four | 4 | Michael |
| 5 | Five | 5 | Steve |
| 6 | Six | 6 | Clive |
| 6 | Six | 5 | Steve |
+--------+-------+-------------+----------+
This is fine, but where performance is an issue, a solution like the following will be faster - although you may need to run SET NAMES utf8; prior to constructing the tables (as I had to) in order for it to work properly:
SELECT tag_id, name, customer_id,customer
FROM
(
SELECT t.*
, c.*
, CASE WHEN @prev=t.tag_id THEN @i:=@i+1 ELSE @i:=1 END `rank`
, @prev := t.tag_id
FROM tags t
JOIN customer_tag ct
ON ct.tag_id = t.tag_id
JOIN customers c
ON c.customer_id = ct.customer_id
JOIN ( SELECT @i:=1, @prev:=0) vars
ORDER
BY t.tag_id
, c.customer
) x
WHERE `rank` <=2
ORDER
BY tag_id,customer;
+--------+-------+-------------+----------+
| tag_id | name | customer_id | customer |
+--------+-------+-------------+----------+
| 1 | One | 3 | Charlie |
| 1 | One | 1 | Dave |
| 2 | Two | 2 | Ben |
| 2 | Two | 1 | Dave |
| 3 | Three | 2 | Ben |
| 4 | Four | 1 | Dave |
| 4 | Four | 4 | Michael |
| 5 | Five | 5 | Steve |
| 6 | Six | 6 | Clive |
| 6 | Six | 5 | Steve |
+--------+-------+-------------+----------+
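For readers on MySQL 8.0 or later, the same top-2-per-tag result can probably be reached without session variables by using the ROW_NUMBER() window function. The sketch below is not the answer's method: it runs the query through Python's built-in sqlite3 module (an assumption so it works without a MySQL server, and it needs SQLite 3.25 or newer for window functions), using the same sample data as above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0+ (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO tags VALUES
      (1,'One'),(2,'Two'),(3,'Three'),(4,'Four'),(5,'Five'),(6,'Six');

    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer TEXT);
    INSERT INTO customers VALUES
      (1,'Dave'),(2,'Ben'),(3,'Charlie'),(4,'Michael'),(5,'Steve'),
      (6,'Clive'),(7,'Alice'),(8,'Ken'),(9,'Petra');

    CREATE TABLE customer_tag (
      customer_id INT NOT NULL,
      tag_id INT NOT NULL,
      PRIMARY KEY (customer_id, tag_id)
    );
    INSERT INTO customer_tag VALUES
      (1,1),(1,2),(1,4),(2,3),(2,2),(3,1),(4,4),(4,2),(5,2),(5,5),(5,6),(6,6);
    """
)

# Number the customers of each tag alphabetically, then keep the first two.
top2 = conn.execute(
    """
    SELECT tag_id, name, customer_id, customer
    FROM (
      SELECT t.tag_id, t.name, c.customer_id, c.customer,
             ROW_NUMBER() OVER (PARTITION BY t.tag_id ORDER BY c.customer) AS rn
      FROM tags t
      JOIN customer_tag ct ON ct.tag_id = t.tag_id
      JOIN customers c ON c.customer_id = ct.customer_id
    ) ranked
    WHERE rn <= 2
    ORDER BY tag_id, customer
    """
).fetchall()

for row in top2:
    print(row)
```

Changing rn <= 2 to rn <= 10 gives the "first 10 customers per tag" that the question asks for.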
To achieve this, we can use two session variables: one for the row number and one that stores the previous tag ID so it can be compared with the current one, as in the following query:
SELECT tag_name, customer_name
FROM (SELECT t.Name AS tag_name, c.Name AS customer_name,
             @row_number := CASE WHEN @tid = t.ID THEN @row_number + 1 ELSE 1 END AS row_num,
             @tid := t.ID AS tag_id
      FROM Tags t
      JOIN Customers c ON c.Tag_ID = t.ID
      CROSS JOIN (SELECT @row_number := 0, @tid := 0) AS vars
      ORDER BY t.ID, c.Name) AS ranked
WHERE row_num <= 10;
We use a CASE expression in the query. If the tag ID is the same as in the previous row, we increase the row number; otherwise we reset it to 1.
Your question reminds me of this one (see especially the top-voted answer), so I came up with this query:
SELECT Tags.ID,
Tags.Name,
SUBSTRING_INDEX(GROUP_CONCAT(Customers.Name
ORDER BY Customers.Name),
',', 10) AS Customers
FROM Customers
INNER JOIN Tags
ON Tags.ID = Customers.Tag_ID
GROUP BY Tags.ID
ORDER BY Tags.Id;
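It works, but this is clearly a hacky way to do it, because MySQL versions before 8.0 do not offer window functions that would do this more naturally. Note also that GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default).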
I have 3 tables like this
SecretAgents
| id | name |
|----|------|
| 1 | A |
| 2 | B |
Victims
| id | name | agent_id |
|----|------|----------|
| 1 | Z | 1 |
| 2 | Y | 1 |
| 3 | X | 2 |
Data
| id | keys | values | victim_id | form_id |
|----|------|--------|-----------|---------|
| 1 | a1 | x | 1 | 1 |
| 2 | a2 | xx | 1 | 2 |
| 3 | a3 | xxx | 2 | 1 |
| 4 | a5 | xxx | 1 | 1 |
I have to get the count of forms (here victim_id and form_id form a composite primary key) and the count of victims for each agent.
I have managed this for any two of the tables with left joins and GROUP BY, but I am not able to get both counts together in one query. If anyone can be generous enough to offer a pointer or solution, that would be super awesome.
EDIT 1: The query
This is definitely not the right query, but anyway:
SELECT count(DISTINCT v.id) as victimcount, `sa`.`username`, `sa`.`id`, count(DISTINCT d.form_id) as submissions
FROM `SecretAgents` as `sa`
LEFT JOIN `Victims` as `v` ON `v`.`agent_id`=`sa`.`id`
LEFT JOIN `Data` as `d` ON `d`.`victim_id`=`v`.`id`
GROUP BY `v`.`agent_id`
ORDER BY `sa`.`id` ASC
The victimcount is correct but the submissions count becomes wrong. Tried lots of other things too but this is the most relevant...
Thanks
I believe you can count the forms-per-agent like so:
SELECT COUNT(*) as form_count, a.id as id, a.name as agent
FROM Data d
LEFT JOIN Victims v ON v.id = d.victim_id
LEFT JOIN SecretAgents a on v.agent_id = a.id
GROUP BY a.id;
To count the victims, just leave off the Data table.
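One way to get both counts in a single result, sketched here under stated assumptions, is to aggregate each count in its own derived table and then left join both to the agents, so that neither join multiplies the other's rows. The sketch assumes a "form" means a distinct (victim_id, form_id) pair, which is only one reading of the question, and it runs on Python's built-in sqlite3 module rather than MySQL so that it is self-contained. The aliases vc and fc are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SecretAgents (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO SecretAgents VALUES (1,'A'),(2,'B');

    CREATE TABLE Victims (id INTEGER PRIMARY KEY, name TEXT, agent_id INT);
    INSERT INTO Victims VALUES (1,'Z',1),(2,'Y',1),(3,'X',2);

    CREATE TABLE Data (id INTEGER PRIMARY KEY, "keys" TEXT, "values" TEXT,
                       victim_id INT, form_id INT);
    INSERT INTO Data VALUES
      (1,'a1','x',1,1),(2,'a2','xx',1,2),(3,'a3','xxx',2,1),(4,'a5','xxx',1,1);
    """
)

# Aggregate victims and forms separately, then attach both counts to each agent.
counts = conn.execute(
    """
    SELECT sa.id, sa.name,
           COALESCE(vc.victim_count, 0) AS victim_count,
           COALESCE(fc.form_count, 0)   AS form_count
    FROM SecretAgents sa
    LEFT JOIN (SELECT agent_id, COUNT(*) AS victim_count
               FROM Victims
               GROUP BY agent_id) vc ON vc.agent_id = sa.id
    LEFT JOIN (SELECT v.agent_id, COUNT(*) AS form_count
               FROM (SELECT DISTINCT victim_id, form_id FROM Data) d
               JOIN Victims v ON v.id = d.victim_id
               GROUP BY v.agent_id) fc ON fc.agent_id = sa.id
    ORDER BY sa.id
    """
).fetchall()

for row in counts:
    print(row)
```

With the sample data, agent A gets 3 forms under this reading, whereas COUNT(DISTINCT d.form_id) in the question's query yields 2 (form ids 1 and 2) and a plain COUNT(*) over Data yields 4, so it is worth deciding which of the three the report actually needs.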
Sorry....should have said, this is MySQL.
Ok....first and foremost, I don't know if I can actually do what I am looking to do. I have some experience with SQL, but not a ton. Hopefully, someone can help.
I have two tables, one has orders and one has shipments. I can do a join between them and get a proper result.....
Orders Table
Order_ID | Revenue |
1001 | 125.00 |
1002 | 215.31 |
1003 | 654.43 |
Shipments Table
Order_ID | Shipment_ID | Item Count |
1001 | 99001 | 25 |
1001 | 99002 | 5 |
1002 | 99003 | 65 |
1003 | 99004 | 123 |
1003 | 99005 | 20 |
With a straight join on Order_ID, I get back the expected result:
Order_ID | Revenue | Shipment_ID | Item Count |
1001 | 125.00 | 99001 | 25 |
1001 | 125.00 | 99002 | 5 |
1002 | 215.31 | 99003 | 65 |
1003 | 654.43 | 99004 | 123 |
1003 | 654.43 | 99005 | 20 |
I am trying to reconcile revenue and cost in the same output, if possible. I know from a separate table what the cost of each of my shipments was, so that math is simple. However, my revenue is off this way because I have duplication in the revenue column, due to orders going in multiple shipments.
I would like to get something like the following:
Order_ID | Revenue | Shipment_ID | Item Count |
1001 | 125.00 | 99001 | 25 |
1001 | NULL | 99002 | 5 |
1002 | 215.31 | 99003 | 65 |
1003 | 654.43 | 99004 | 123 |
1003 | NULL | 99005 | 20 |
The values for the duplicate revenue numbers could be null, blank, 0, anything other than a value that will calculate. Any ideas?
Thanks in advance!
Matthew
In MySQL you can use the following query:
SELECT s1.Order_ID, s1.Shipment_ID, s1.Item_Count,
IF(s1.Shipment_ID = s2.minS_ID, o.Revenue, 0) AS Revenue
FROM Shipments AS s1
INNER JOIN (
SELECT Order_ID, MIN(Shipment_ID) AS minS_ID
FROM Shipments
GROUP BY Order_ID
) AS s2 ON s1.Order_ID = s2.Order_ID
INNER JOIN Orders AS o ON s1.Order_ID = o.Order_ID
The idea is to perform an additional join with a derived table that contains the minimum Shipment_ID per Order_ID. If the Shipment_ID value of the current row is equal to this value then return Revenue, else return 0.
In SQL Server you can use the window version of MIN to make the comparison:
SELECT s.Order_ID, Shipment_ID, Item_Count,
CASE
WHEN MIN(Shipment_ID) OVER (PARTITION BY s.Order_ID) = Shipment_ID THEN Revenue
ELSE 0
END AS Revenue
FROM Shipments AS s
INNER JOIN Orders AS o ON s.Order_ID = o.Order_ID
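To see the "revenue only on the first shipment" idea run end to end, here is a small sketch using Python's built-in sqlite3 module in place of MySQL or SQL Server (an assumption so the example needs no server). It follows the derived-table approach with MIN(Shipment_ID), but uses CASE instead of IF() because SQLite has no IF() function, and it returns NULL rather than 0 for the repeated rows to match the output the question asks for.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY, Revenue REAL);
    INSERT INTO Orders VALUES (1001,125.00),(1002,215.31),(1003,654.43);

    CREATE TABLE Shipments (Order_ID INT, Shipment_ID INTEGER PRIMARY KEY,
                            Item_Count INT);
    INSERT INTO Shipments VALUES
      (1001,99001,25),(1001,99002,5),(1002,99003,65),
      (1003,99004,123),(1003,99005,20);
    """
)

# Show Revenue only on the shipment with the lowest Shipment_ID of each order.
result = conn.execute(
    """
    SELECT s1.Order_ID,
           CASE WHEN s1.Shipment_ID = s2.minS_ID THEN o.Revenue END AS Revenue,
           s1.Shipment_ID,
           s1.Item_Count
    FROM Shipments AS s1
    JOIN (SELECT Order_ID, MIN(Shipment_ID) AS minS_ID
          FROM Shipments
          GROUP BY Order_ID) AS s2 ON s1.Order_ID = s2.Order_ID
    JOIN Orders AS o ON s1.Order_ID = o.Order_ID
    ORDER BY s1.Order_ID, s1.Shipment_ID
    """
).fetchall()

for row in result:
    print(row)
```

Because SUM() skips NULLs, totalling the Revenue column of this result counts each order once, which is the reconciliation the question is after.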
I have a table that looks like this:
| id | order_id | product_id | category_id | name | cost | returned_product_id |
| 3100 | 900 | 0125 | 3 | Foo | 14 | NULL |
| 3101 | 901 | 0145 | 3 | Bar | 10 | NULL |
| 3102 | 901 | 2122 | 3 | Baz | 11 | NULL |
| 3103 | 900 | 0125 | 3 | Foo | -14 | 3100 |
| 3104 | 902 | 0125 | 3 | Foo | 14 | NULL |
| 3105 | 902 | 0125 | 3 | Foo | -14 | 3104 |
| 3106 | 903 | 0125 | 3 | Foo | 14 | NULL |
The id is a single line item of an order where product_id was included. If the product is returned, a new line item is created with a new id. There is one of each product, and it is possible to repurchase a returned item again, and return it again.
I'm joining the table data with data from other tables given certain conditions. As a final condition, I am attempting to exclude any line items that were later returned. This is an attempt to perform a single query that essentially gives me all product_ids ever purchased that have not been returned, like this:
select product_id
from orders o,
line_items i
where o.state = 'paid'
and o.id = i.order_id
and i.category_id = 3
and i.product_id not in (select li.returned_product_id
from line_items li
where li.returned_product_id is not null
and li.product_id = 3)
Even though I have indexes on both id and returned_product_id, the query above is really slow (thousands of lines), whereas if my subselect queries the id instead, it is fast.
If your query runs against the table whose contents you showed, the line:
and i.id not in (select li.returned_product_id
will compare against the id of the table row and not the id of the product, right?
That should be
and i.product_id not in (select li.returned_product_id
Something like:
select distinct i.product_id from
line_items i left join line_items j
on i.product_id = j.returned_product_id
where j.returned_product_id is null
?
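An alternative worth trying is a NOT EXISTS anti-join, which many engines can satisfy with an index lookup on returned_product_id. The sketch below rests on an assumption taken from the sample rows rather than from the question's wording: returned_product_id appears to hold the id of the original line item (3100, 3104), not a product_id. It runs on Python's built-in sqlite3 module instead of MySQL so that it is self-contained, and it leaves out the orders table and the state = 'paid' filter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE line_items (
      id INTEGER PRIMARY KEY, order_id INT, product_id TEXT, category_id INT,
      name TEXT, cost INT, returned_product_id INT
    );
    CREATE INDEX idx_returned ON line_items (returned_product_id);
    INSERT INTO line_items VALUES
      (3100,900,'0125',3,'Foo', 14,NULL),
      (3101,901,'0145',3,'Bar', 10,NULL),
      (3102,901,'2122',3,'Baz', 11,NULL),
      (3103,900,'0125',3,'Foo',-14,3100),
      (3104,902,'0125',3,'Foo', 14,NULL),
      (3105,902,'0125',3,'Foo',-14,3104),
      (3106,903,'0125',3,'Foo', 14,NULL);
    """
)

# Purchases (rows that are not themselves returns) with no return row pointing at them.
kept = conn.execute(
    """
    SELECT i.id, i.product_id
    FROM line_items i
    WHERE i.category_id = 3
      AND i.returned_product_id IS NULL
      AND NOT EXISTS (SELECT 1
                      FROM line_items r
                      WHERE r.returned_product_id = i.id)
    ORDER BY i.id
    """
).fetchall()

print(kept)
```

Product 0125 still appears once here because its last purchase (line item 3106) was never returned, which matches the description that an item can be repurchased after a return.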