Why does this left outer join include rows twice? - mysql

In the following case:
CREATE TABLE Persons (
groupId int,
age int,
Person varchar(255)
);
insert into Persons (Person, groupId, age) values('Bob' , 1 , 32);
insert into Persons (Person, groupId, age) values('Jill' , 1 , 34);
insert into Persons (Person, groupId, age)values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Jake' , 2 , 29);
insert into Persons (Person, groupId, age) values('Paul' , 2 , 36);
insert into Persons (Person, groupId, age) values('Laura' , 2 , 39);
The following query:
SELECT *
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.groupId = b.groupId AND o.age < b.age
returns (executed in http://sqlfiddle.com/#!9/cae8023/5):
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 36 Paul 2 39 Laura
2 39 Laura (null) (null) (null).
I don't understand the result.
I was expecting
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 39 Laura (null) (null) (null)
The reason I was expecting that is that, in my understanding, the left join picks each row from the left table and tries to match it against each row of the right table; if there is a match, it adds the joined row. If no row matches the condition, it adds the left row with null values for the right columns.
So if that is correct, why do we have in the fiddle output, after
1 34 Jill 1 42 Shawn
rows for Bob and Jill repeated?

Your condition for joining rows is that the groupId is equal and o.age < b.age.
Bob's age is 32. That is less than Jill's age of 34. It is also less than Shawn's age of 42. So the condition is satisfied in two pairings of joined rows.
The joined row has all the columns from the row referenced as o and all the columns from the row referenced as b.
Note that you have entered two rows for Shawn. Bob's row actually matches Jill's row and both rows for Shawn. So you get three rows for Bob.
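To make the pairing visible, here is a minimal sketch that counts how many right-side rows each left-side row matches. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained); the join condition is the one from the question.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); the LEFT JOIN condition is unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (groupId INT, age INT, Person VARCHAR(255))")
conn.executemany(
    "INSERT INTO Persons (Person, groupId, age) VALUES (?, ?, ?)",
    [
        ("Bob", 1, 32), ("Jill", 1, 34),
        ("Shawn", 1, 42), ("Shawn", 1, 42),  # Shawn is inserted twice, as in the question
        ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
    ],
)

# COUNT(b.Person) ignores the NULLs produced for left rows without a match,
# so it reports the number of real matches per left row.
match_counts = conn.execute(
    """
    SELECT o.rowid, o.Person, COUNT(b.Person) AS matches
    FROM Persons o
    LEFT JOIN Persons b
      ON o.groupId = b.groupId AND o.age < b.age
    GROUP BY o.rowid, o.Person
    ORDER BY o.rowid
    """
).fetchall()

for _, person, matches in match_counts:
    print(person, matches)  # Bob 3, Jill 2, Shawn 0, Shawn 0, Jake 2, Paul 1, Laura 0
```

Each left row contributes max(matches, 1) rows to the join result, which gives 3 + 2 + 1 + 1 + 2 + 1 + 1 = 11 rows, the same number as in the fiddle output.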
When I test your query on my local MySQL instance (8.0.31), I get the result in the following order, which is different from your sqlfiddle's result:
+---------+------+--------+---------+------+--------+
| groupId | age | Person | groupId | age | Person |
+---------+------+--------+---------+------+--------+
| 1 | 32 | Bob | 1 | 42 | Shawn |
| 1 | 32 | Bob | 1 | 42 | Shawn |
| 1 | 32 | Bob | 1 | 34 | Jill |
| 1 | 34 | Jill | 1 | 42 | Shawn |
| 1 | 34 | Jill | 1 | 42 | Shawn |
| 1 | 42 | Shawn | NULL | NULL | NULL |
| 1 | 42 | Shawn | NULL | NULL | NULL |
| 2 | 29 | Jake | 2 | 39 | Laura |
| 2 | 29 | Jake | 2 | 36 | Paul |
| 2 | 36 | Paul | 2 | 39 | Laura |
| 2 | 39 | Laura | NULL | NULL | NULL |
+---------+------+--------+---------+------+--------+
Without an explicit ORDER BY clause, the order of the result is not guaranteed; in practice InnoDB returns rows in the order they are read from the index. In this case, it's reading both tables in clustered-index (primary key) order, because there's no other index to optimize the join. Since the table declares no primary key, InnoDB falls back to a hidden, internally generated row ID, which follows insertion order. You can see that the rows from the left table follow that order.
I'm not sure how to explain why the Bob-Shawn rows are before the Bob-Jill row, because that's not primary key order for the joined table. It could be that the order is messed up in the join buffer while doing an unindexed join.
The sqlfiddle might be doing something in the client that reorders rows.
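If a stable row order matters, the usual remedy is an explicit ORDER BY. The following is a small sketch under the same assumption as before (Python's sqlite3 standing in for MySQL); the choice of sort columns is illustrative and not taken from the question.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (groupId INT, age INT, Person VARCHAR(255))")
conn.executemany(
    "INSERT INTO Persons (Person, groupId, age) VALUES (?, ?, ?)",
    [
        ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42), ("Shawn", 1, 42),
        ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
    ],
)

# The ORDER BY makes the output independent of how the engine reads the tables.
ordered_rows = conn.execute(
    """
    SELECT o.Person, o.age, b.Person, b.age
    FROM Persons o
    LEFT JOIN Persons b
      ON o.groupId = b.groupId AND o.age < b.age
    ORDER BY o.groupId, o.age, o.Person, b.age, b.Person
    """
).fetchall()

for row in ordered_rows:
    print(row)
```

With this ordering the Bob-Jill row always comes before the two Bob-Shawn rows, whichever engine or client runs the query.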

You inserted the record for Shawn twice. Your script should be:
CREATE TABLE Persons (
groupId int,
age int,
Person varchar(255)
);
insert into Persons (Person, groupId, age) values('Bob' , 1 , 32);
insert into Persons (Person, groupId, age) values('Jill' , 1 , 34);
insert into Persons (Person, groupId, age)values('Shawn' , 1 , 42);
insert into Persons (Person, groupId, age) values('Jake' , 2 , 29);
insert into Persons (Person, groupId, age) values('Paul' , 2 , 36);
insert into Persons (Person, groupId, age) values('Laura' , 2 , 39);
SELECT *
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.groupId = b.groupId AND o.age < b.age
;
This will give you the following results:
1 32 Bob 1 34 Jill
1 32 Bob 1 42 Shawn
1 34 Jill 1 42 Shawn
1 42 Shawn (null) (null) (null)
2 29 Jake 2 36 Paul
2 29 Jake 2 39 Laura
2 36 Paul 2 39 Laura
2 39 Laura (null) (null) (null)

Related

Query to lookup reference tables on sum the result

I am new to SQL and would like your suggestions on how to solve this problem.
I have the sales information by type.
I want to sum the prices of certain references by type and, based on the resulting sum, fetch the values from another table and populate the Output column.
Group Type 100000 200000 300000
1 A 1 2 3
1 B 0 1 1
2 T 2 2 4
2 U 0 2 2
3 V 2 2 3
4 N 1 1 1
From the above table (Table 2) we find that Types A and B belong to the same group, Group 1. So in the first table, the query should sum the prices of the references belonging to Group 1. If the sum is > 100000 and <= 200000, then the corresponding value (from the 100000 column) must be chosen based on the type.
In case the sum of prices for the group is less than 100000, or the type is not found in Table 2, it should take the values from the table below:
+------+----+---+
| Type | 1  | 2 |
+------+----+---+
| A    | 50 | 2 |
| B    | 60 | 5 |
| C    | 65 | 2 |
| D    | 65 | 3 |
| E    | 65 | 4 |
+------+----+---+
Thus the final output for the above datasheet would be like below,
Order ID Reference Type Price Output
101 AAA A 500000 3
101 AAB B 100000 1
101 ABC C 20000 67
101 DCE B 50000 1
101 BOD D 200000 68
101 ZYZ E 200000 69
102 AAA A 20000 52
So for the first line, the type is A, which is present under Group 1, and Group 1 also contains Type B. For the same Order ID 101, the overall sales of Types A and B are 650000 > 300000, so for Type A we choose the value 3 from Table 2. Since Type C is not present in Table 2, I went to Table 3 and added the two values, and so on.
Sorry for the long post. I hope my question is clear. I would like to have your expert opinion.
Thanks,
SS
Join all the tables, and make sure you use LEFT JOIN, since we want to keep records from the first table even if there is no corresponding data in the second or third table.
For the output value, give priority to the second table: use CASE WHEN to check which range the summed mrp falls into. If it lies within a range, pick the value from the second table; otherwise pick the value from the third table.
SELECT
s.order_id,
s.reference,
s.`type`,
s.mrp,
@a:= IFNULL(g_total.Total, s.mrp) AS MRP_Total, -- @a variable to use it in CASE WHEN clause
CASE
WHEN @a > 100000 AND @a <= 200000 AND sg.`type` IS NOT NULL THEN sg.price_100000
WHEN @a > 200000 AND @a <= 300000 AND sg.`type` IS NOT NULL THEN sg.price_200000
WHEN @a > 300000 AND sg.`type` IS NOT NULL THEN sg.price_300000
ELSE tp.price_1 + tp.price_2
END Total
FROM sales s
LEFT JOIN sales_group sg ON s.`type` = sg.`type`
LEFT JOIN type_prices tp ON s.`type` = tp.`type`
LEFT JOIN (
SELECT
s.order_id, sgg.`group`, SUM(mrp) as Total
FROM sales s
INNER JOIN sales_group sgg ON s.`type` = sgg.`type`
GROUP BY s.order_id, sgg.`group`
) AS g_total -- Temp table to find total MRP, order and group wise
ON s.order_id = g_total.order_id AND sg.`group` = g_total.`group`
ORDER BY s.order_id, s.`type`;
Output:
sales
---
| order_id | reference | type | mrp | MRP_Total | Total |
---------------------------------------------------------
| 101 | AAA | A | 500000 | 650000 | 3 |
| 101 | DCE | B | 50000 | 650000 | 1 |
| 101 | AAB | B | 100000 | 650000 | 1 |
| 101 | ABC | C | 200000 | 200000 | 67 |
| 101 | BOD | D | 200000 | 200000 | 68 |
| 101 | ZYZ | E | 200000 | 200000 | 69 |
| 102 | AAA | A | 20000 | 20000 | 52 |
Note: sg.type IS NOT NULL is added in CASE WHEN clause because if we don't have any mapping in the second table, we should move to ELSE part which refers to the third table.
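The same lookup can be sketched without a user variable by testing the group total directly. The example below is only an illustration under stated assumptions: it runs on Python's sqlite3 rather than MySQL, the column `grp` is a hypothetical rename of `group` (a reserved word), and the data is the one from the question (so ABC has a price of 20000).

```python
import sqlite3

# SQLite stands in for MySQL (an assumption). Table and column names follow the
# answer above, except grp, which is a hypothetical rename of the reserved word group.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (order_id INT, reference TEXT, type TEXT, mrp INT);
    CREATE TABLE sales_group (grp INT, type TEXT,
                              price_100000 INT, price_200000 INT, price_300000 INT);
    CREATE TABLE type_prices (type TEXT, price_1 INT, price_2 INT);
    INSERT INTO sales VALUES
      (101,'AAA','A',500000), (101,'AAB','B',100000), (101,'ABC','C',20000),
      (101,'DCE','B',50000),  (101,'BOD','D',200000), (101,'ZYZ','E',200000),
      (102,'AAA','A',20000);
    INSERT INTO sales_group VALUES
      (1,'A',1,2,3), (1,'B',0,1,1), (2,'T',2,2,4),
      (2,'U',0,2,2), (3,'V',2,2,3), (4,'N',1,1,1);
    INSERT INTO type_prices VALUES
      ('A',50,2), ('B',60,5), ('C',65,2), ('D',65,3), ('E',65,4);
    """
)

# g.total is NULL for types without a group, so every comparison in the CASE
# is NULL for them and the ELSE branch (third table) applies.
result = conn.execute(
    """
    SELECT s.order_id, s.reference, s.type, s.mrp,
           CASE
             WHEN g.total > 300000 THEN sg.price_300000
             WHEN g.total > 200000 THEN sg.price_200000
             WHEN g.total > 100000 THEN sg.price_100000
             ELSE tp.price_1 + tp.price_2
           END AS output
    FROM sales s
    LEFT JOIN sales_group sg ON sg.type = s.type
    LEFT JOIN type_prices tp ON tp.type = s.type
    LEFT JOIN (SELECT s2.order_id, sg2.grp, SUM(s2.mrp) AS total
               FROM sales s2
               JOIN sales_group sg2 ON sg2.type = s2.type
               GROUP BY s2.order_id, sg2.grp) g
      ON g.order_id = s.order_id AND g.grp = sg.grp
    ORDER BY s.order_id, s.reference
    """
).fetchall()

for row in result:
    print(row)
```

Because the CASE branches are checked from the highest threshold down, each one needs only a lower bound, and the separate `IS NOT NULL` test becomes unnecessary.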

Multi-event tournament standings (with arbitrary number of entries)

Suppose you have a multi-event competition where competitors can attempt any event an arbitrary number of times. (Weird, I know.)
How do you pull out a desired player's best time for each event,
and assign it a placing? (1st 2nd 3rd...)
Data example: Desired output:
Name | Event | Score Name | Event | Score | Rank
-------------------- ----------------------------
Bob 1 50 Given input: "Bob"
Bob 1 100 Bob 1 100 1
Bob 2 75 Bob 2 75 3
Bob 3 80 Bob 3 80 2
Bob 3 65
Given input: "Jill"
Jill 2 75 Jill 2 90 1
Jill 2 90 Jill 3 60 3
Jill 3 60
Given input: "Chris"
Chris 1 70 Chris 1 70 2
Chris 2 50 Chris 2 85 2
Chris 2 85 Chris 3 100 1
Chris 3 100
This is a build up of my previous question:
Multi-event tournament standings
I feel I understand that problem much better (thanks!), but I cannot bridge the gap to this version of the problem.
I have MySQL 5.x, so I can't use functions like Rank(). This will also be crunching many thousands of scores.
The desired output can be achieved with this query:
select
IF(event is NULL, CONCAT('Given input: "', name,'"'), name) as name,
IF(event is NULL, '', event) as event,
IF(event is NULL, '', max(score)) as score,
IF(event is NULL, '', (
select count(s2.name) + 1
from (
select name, max(score) as score
from scores es
where es.event = s.event
group by es.name
order by score desc
) s2
where s2.score > max(s.score)
)) as `rank`
from scores s
group by name, event with rollup
having name is not NULL
order by name, event;
And the output (when the query is run in the mysql CLI):
+----------------------+-------+-------+------+
| name | event | score | rank |
+----------------------+-------+-------+------+
| Given input: "Bob" | | | |
| Bob | 1 | 100 | 1 |
| Bob | 2 | 75 | 3 |
| Bob | 3 | 80 | 2 |
| Given input: "Chris" | | | |
| Chris | 1 | 70 | 2 |
| Chris | 2 | 85 | 2 |
| Chris | 3 | 100 | 1 |
| Given input: "Jill" | | | |
| Jill | 2 | 90 | 1 |
| Jill | 3 | 60 | 3 |
+----------------------+-------+-------+------+
11 rows in set, 3 warnings (0.00 sec)
This should work on any MySQL 5.x version.
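You can get each competitor's highest score per event by aggregating and taking max(). To approximate a ranking function, you can use a subquery that counts the per-competitor best scores higher than or equal to the current score within the same event. Without ties this equals rank(); when several competitors tie, every tied row is counted, so all of them get the lower placing, which differs from both rank() and dense_rank().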
For a particular contestant (here Bob) that makes:
SELECT d1.name,
d1.event,
max(d1.score) score,
(SELECT count(*)
FROM (SELECT d2.event,
max(d2.score) score
FROM data d2
GROUP BY d2.event,
d2.name) x1
WHERE x1.score >= max(d1.score)
AND x1.event = d1.event) `rank`
FROM data d1
WHERE d1.name = 'Bob'
GROUP BY d1.event
ORDER BY d1.event;
And for all of them at once:
SELECT d1.name,
d1.event,
max(d1.score) score,
(SELECT count(*)
FROM (SELECT d2.event,
max(d2.score) score
FROM data d2
GROUP BY d2.event,
d2.name) x1
WHERE x1.score >= max(d1.score)
AND x1.event = d1.event) `rank`
FROM data d1
GROUP BY d1.name,
d1.event
ORDER BY d1.name,
d1.event;
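As a variation on the counting idea, the sketch below counts only strictly higher best scores and adds 1, which gives RANK()-style placings when there are ties. It is an illustration, not the query from the answers: it runs on Python's sqlite3 instead of MySQL, and the table name `scores` and alias `place` are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL 5.x here (an assumption); no window functions are used.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, event INT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("Bob", 1, 50), ("Bob", 1, 100), ("Bob", 2, 75), ("Bob", 3, 80), ("Bob", 3, 65),
        ("Jill", 2, 75), ("Jill", 2, 90), ("Jill", 3, 60),
        ("Chris", 1, 70), ("Chris", 2, 50), ("Chris", 2, 85), ("Chris", 3, 100),
    ],
)

# Derived table with one row per (name, event): that competitor's best score.
best = "(SELECT name, event, MAX(score) AS score FROM scores GROUP BY name, event)"

# Placing = 1 + number of competitors whose best score in the event is strictly higher.
standings = conn.execute(
    f"""
    SELECT b.name, b.event, b.score,
           1 + (SELECT COUNT(*)
                FROM {best} o
                WHERE o.event = b.event AND o.score > b.score) AS place
    FROM {best} b
    ORDER BY b.name, b.event
    """
).fetchall()

for row in standings:
    print(row)
```

For a single competitor, add `WHERE b.name = 'Bob'` (ideally as a bound parameter) to the outer query; the inner derived table must stay unfiltered so that the other competitors are still counted.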
db<>fiddle
E.g.:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,name VARCHAR(12) NOT NULL
,event INT NOT NULL
,score INT NOT NULL
);
INSERT INTO my_table (name,event,score) VALUES
('Bob' ,1, 50),
('Bob' ,1,100),
('Bob' ,2, 75),
('Bob' ,3, 80),
('Bob' ,3, 65),
('Jill' ,2, 75),
('Jill' ,2, 90),
('Jill' ,3, 60),
('Chris',1, 70),
('Chris',2, 50),
('Chris',2, 85),
('Chris',3,100);
SELECT a.*
, FIND_IN_SET(a.score,b.scores) my_rank
FROM my_table a -- it's possible that this really needs to be a repeat of the subquery below, so
-- ( SELECT m.* FROM my_table m JOIN (SELECT name,event,MAX(score) score FROM my_table
-- GROUP BY name, event) n ON n.name = m.name AND n.event = m.event AND n.score = m.score) AS a
JOIN
(
SELECT x.event
, GROUP_CONCAT(DISTINCT x.score ORDER BY x.score DESC) scores
FROM my_table x
JOIN
( SELECT name
, event
, MAX(score) score
FROM my_table
GROUP
BY name
, event
) y
ON y.name = x.name
AND y.event = x.event
AND y.score = x.score
GROUP
BY x.event
) b
ON b.event = a.event
WHERE FIND_IN_SET(a.score,b.scores) >0;
+----+-------+-------+-------+------+
| id | name | event | score | rank |
+----+-------+-------+-------+------+
| 2 | Bob | 1 | 100 | 1 |
| 3 | Bob | 2 | 75 | 3 |
| 4 | Bob | 3 | 80 | 2 |
| 6 | Jill | 2 | 75 | 3 |
| 7 | Jill | 2 | 90 | 1 |
| 8 | Jill | 3 | 60 | 3 |
| 9 | Chris | 1 | 70 | 2 |
| 11 | Chris | 2 | 85 | 2 |
| 12 | Chris | 3 | 100 | 1 |
+----+-------+-------+-------+------+

bigquery - filter out unique results only

My table looks like the following:
Entry-Key Name Surname Age
10a Smith Alex 35
11b Finn John 41
10a Smith Al 35
10c Finn Berta 28
11b Fin John 41
I need to get unique rows out of it. GROUP BY does not work properly, since there are sometimes inaccuracies in the Name/Surname columns.
I thought I could group by just the Entry-Key, then find the first appearance of each key in the table and take only that row. I know how to do it in Excel, but since the table has some 100,000 rows, Excel is not a real option.
The idea is to end up with this table:
10a Smith Alex 35
11b Finn John 41
10c Finn Berta 28
Please help!
For your logic you can use the query below (BigQuery legacy SQL, where a comma between subqueries acts as UNION ALL):
select key, first(name), first(surname), first(age) from
(select '10a' as key, 'Smith' as name, 'Alex' as surname, 35 as age),
(select '11b' as key, 'Finn' as name, 'John' as surname, 41 as age),
(select '10a' as key, 'Smith' as name, 'Al' as surname, 35 as age),
(select '10c' as key, 'Finn' as name, 'Berta' as surname, 28 as age),
(select '11b' as key, 'Fin' as name, 'John' as surname, 41 as age),
group by key
This returns:
+-----+-----+-------+-------+-----+
| Row | key | f0_   | f1_   | f2_ |
+-----+-----+-------+-------+-----+
| 1   | 10a | Smith | Alex  | 35  |
| 2   | 11b | Finn  | John  | 41  |
| 3   | 10c | Finn  | Berta | 28  |
+-----+-----+-------+-------+-----+
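The "first appearance per key" idea can also be sketched outside BigQuery. The example below uses Python's sqlite3 and treats insertion order (SQLite's implicit rowid) as the definition of "first"; both the engine and the table name `entries` are assumptions made for illustration.

```python
import sqlite3

# SQLite is used for illustration only; entries and its column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (entry_key TEXT, name TEXT, surname TEXT, age INT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        ("10a", "Smith", "Alex", 35),
        ("11b", "Finn", "John", 41),
        ("10a", "Smith", "Al", 35),
        ("10c", "Finn", "Berta", 28),
        ("11b", "Fin", "John", 41),
    ],
)

# For every key, find the smallest rowid (the earliest inserted row),
# then join back to fetch that complete row.
first_rows = conn.execute(
    """
    SELECT t.entry_key, t.name, t.surname, t.age
    FROM entries t
    JOIN (SELECT entry_key, MIN(rowid) AS first_row
          FROM entries
          GROUP BY entry_key) f
      ON t.rowid = f.first_row
    ORDER BY t.rowid
    """
).fetchall()

for row in first_rows:
    print(row)
```

Joining back on the row identifier keeps all columns from the same source row, whereas taking a separate "first" value per column only does so if every column is resolved against the same ordering.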

Deleting duplicate entries with search criteria

I have a table like this:
table_id item_id vendor_id category_id
1 1 33 4
2 1 33 4
3 1 33 2
4 2 33 4
5 2 33 2
6 3 33 4
7 3 33 4
8 1 34 4
9 1 34 4
10 3 35 4
Here table_id is the primary key, and the table has 98000 entries in total, including 61 duplicate entries, which I found by executing this query:
SELECT * FROM my_table
WHERE vendor_id = 33
AND category_id = 4
GROUP BY item_id having count(item_id)>1
In the above table, table_id 1,2 and 6,7 are duplicates. I need to delete 2 and 7 from my table (61 duplicate entries in total). How can I delete duplicate entries from my table using a query with the WHERE clause vendor_id = 33 AND category_id = 4? I don't want to delete other duplicate entries such as table_id 8,9.
I cannot add a unique index to the table, since I need to keep some duplicate entries that are required. I need to delete only the duplicates matching certain criteria.
Please always take backup before running any deletion query.
Try using LEFT JOIN like this:
DELETE my_table
FROM my_table
LEFT JOIN
(SELECT MIN(table_id) AS IDs FROM my_table
GROUP BY `item_id`, `vendor_id`, `category_id`
)A
ON my_table.table_id = A.IDs
WHERE A.ids IS NULL;
Result after deletion:
| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
| 1 | 1 | 33 | 4 |
| 3 | 1 | 33 | 2 |
| 4 | 2 | 33 | 4 |
| 5 | 2 | 33 | 2 |
| 6 | 3 | 33 | 4 |
See this SQLFiddle
Edit: (after OP's edit)
If you want to add more conditions, you can add it in outer WHERE condition like this:
DELETE my_table
FROM my_table
LEFT JOIN
(SELECT MIN(table_id) AS IDs FROM my_table
GROUP BY `item_id`, `vendor_id`, `category_id`
)A
ON my_table.table_id = A.IDs
WHERE A.ids IS NULL
AND vendor_id = 33 -- additional condition
AND category_id = 4 -- additional condition
See this SQLFiddle
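What about this (the inner subquery is wrapped in a derived table because MySQL does not allow a DELETE to select directly from its own target table):
DELETE FROM my_table
WHERE table_id NOT IN
(SELECT keep_id FROM
(SELECT MIN(table_id) AS keep_id
FROM my_table
GROUP BY item_id, vendor_id, category_id) AS keep)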
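Try the code below. It deletes the row with the highest table_id in each group of duplicates:
DELETE FROM myTable
WHERE table_ID IN (SELECT dup_id FROM
(SELECT MAX(table_ID) AS dup_id
FROM myTable
GROUP BY item_id, vendor_id, category_id -- grouping by table_ID, the primary key, would never give COUNT(*) > 1
HAVING COUNT(*) > 1) AS dups)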
Try
DELETE m
FROM my_table m JOIN
(
SELECT MAX(table_id) table_id
FROM my_table
WHERE vendor_id = 33
AND category_id = 4
GROUP BY item_id, vendor_id, category_id
HAVING COUNT(*) > 1
) q ON m.table_id = q.table_id
After delete you'll have
| TABLE_ID | ITEM_ID | VENDOR_ID | CATEGORY_ID |
------------------------------------------------
| 1 | 1 | 33 | 4 |
| 3 | 1 | 33 | 2 |
| 4 | 2 | 33 | 4 |
| 5 | 2 | 33 | 2 |
| 6 | 3 | 33 | 4 |
| 8 | 1 | 34 | 4 |
| 9 | 1 | 34 | 4 |
| 10 | 3 | 35 | 4 |
Here is SQLFiddle demo
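To check the effect of a filtered duplicate delete before touching real data, it can be rehearsed on a throwaway copy. The sketch below uses Python's sqlite3 (an assumption; SQLite has no multi-table DELETE, so it uses a NOT IN subquery, a form MySQL rejects when the subquery reads the target table directly) and keeps the lowest table_id of each duplicate group.

```python
import sqlite3

# In-memory rehearsal on SQLite (an assumption); the data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (table_id INTEGER PRIMARY KEY, item_id INT, "
    "vendor_id INT, category_id INT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, 1, 33, 4), (2, 1, 33, 4), (3, 1, 33, 2), (4, 2, 33, 4), (5, 2, 33, 2),
        (6, 3, 33, 4), (7, 3, 33, 4), (8, 1, 34, 4), (9, 1, 34, 4), (10, 3, 35, 4),
    ],
)

# Only rows matching the criteria are candidates; within them, every row that is
# not the first (lowest table_id) of its duplicate group is deleted.
deleted = conn.execute(
    """
    DELETE FROM my_table
    WHERE vendor_id = 33
      AND category_id = 4
      AND table_id NOT IN (SELECT MIN(table_id)
                           FROM my_table
                           GROUP BY item_id, vendor_id, category_id)
    """
).rowcount

remaining_ids = [r[0] for r in conn.execute("SELECT table_id FROM my_table ORDER BY table_id")]
print(deleted, remaining_ids)  # 2 [1, 3, 4, 5, 6, 8, 9, 10]
```

Rows 8 and 9 survive because they fall outside the vendor_id = 33 filter, which is the behaviour the question asks for.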
From your question, I guess you need to remove the duplicate rows that have the same values for item_id, vendor_id and category_id, like the rows with table_id 1 and 2. That can be done by making those three columns unique together (ALTER IGNORE TABLE is only available before MySQL 5.7.4). So try the following:
alter ignore table table_name add unique index(item_id, vendor_id, category_id);
Note: I haven't tested this yet; I will post a SQLFiddle shortly.

How to group and concatenate several rows into groups of 20s

I have a table with only numeric IDs
ID
1
2
3
4
5
6
7
8
9
10
And I want to break up and concatenate (group) these IDs into groups of 5 or 20, e.g.
GROUPS
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
How can I do this with SQL?
UPDATE:
SELECT with sorted ids
SELECT GROUP_CONCAT(id ORDER BY id) AS GROUPS
FROM `test`
GROUP BY (id - 1) DIV 5
Result:
GROUPS
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
16,17,18,19,20
21,22,23,24,25
26,27,28,29,30
31,32,33,34,35
SELECT with second unsorted table
SELECT GROUP_CONCAT(id ORDER BY id) AS GROUPS
FROM `test2`
GROUP BY (id - 1) DIV 5
Result:
GROUPS
3,5
10
12
16
23,24,25
32,35
43,44
47
55
61
68,70
77
84
89
91,92,95
97,100
For MySQL:
SELECT GROUP_CONCAT(id ORDER BY id) AS GROUPS
FROM yourtable
GROUP BY (id - 1) DIV 5
See it working online: sqlfiddle
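The bucket arithmetic behind `(id - 1) DIV 5` can be checked with a short sketch. It runs on Python's sqlite3 (an assumption; SQLite writes integer division as `/` instead of `DIV`) and builds the comma-separated strings in Python so the order inside each group is explicit.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); / on integers is integer division.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER)")
conn.executemany("INSERT INTO test (id) VALUES (?)", [(i,) for i in range(1, 16)])

# ids 1..5 map to bucket 0, 6..10 to bucket 1, 11..15 to bucket 2.
rows = conn.execute(
    "SELECT (id - 1) / 5 AS bucket, id FROM test ORDER BY id"
).fetchall()

groups = {}
for bucket, id_ in rows:
    groups.setdefault(bucket, []).append(str(id_))

group_strings = [",".join(ids) for _, ids in sorted(groups.items())]
print(group_strings)  # ['1,2,3,4,5', '6,7,8,9,10', '11,12,13,14,15']
```

Subtracting 1 before dividing is what makes id 5 fall into the first bucket rather than the second.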
For unsorted IDs:
SET @rank=0;
SELECT
id,
@rank:=@rank+1 AS rank,
GROUP_CONCAT(id ORDER BY id) AS GROUPS
FROM `test2`
GROUP BY (@rank ) DIV 5
And the result:
+----+------+----------------+
| id | rank | GROUPS |
+----+------+----------------+
| 1 | 1 | 1,3,5,7,9 |
| 13 | 7 | 11,13,15,17,19 |
| 29 | 15 | 21,23,25,27,29 |
| 31 | 16 | 31,33,35,37,39 |
| 45 | 23 | 41,43,45,47,49 |
| 51 | 26 | 51,53,55,57,59 |
| 61 | 31 | 61,63,65,67,69 |
| 77 | 39 | 71,73,75,77,79 |
| 81 | 41 | 81,83,85,87,89 |
| 93 | 47 | 91,93,95,97,99 |
+----+------+----------------+
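When the IDs have gaps, grouping by position rather than by value keeps every group full. One way to sketch this without user variables is to compute each row's position as the number of smaller IDs. The example assumes unique IDs, uses Python's sqlite3 in place of MySQL, and takes the first few IDs of the `test2` result shown earlier as sample data.

```python
import sqlite3

# SQLite stands in for MySQL (an assumption); ids are assumed to be unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test2 (id INTEGER)")
sample_ids = [3, 5, 10, 12, 16, 23, 24, 25, 32, 35, 43, 44]
conn.executemany("INSERT INTO test2 (id) VALUES (?)", [(i,) for i in sample_ids])

# Position of a row = count of rows with a smaller id (0-based);
# dividing the position by 5 gives the chunk number.
rows = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM test2 s WHERE s.id < t.id) / 5 AS chunk, t.id
    FROM test2 t
    ORDER BY t.id
    """
).fetchall()

chunks = {}
for chunk, id_ in rows:
    chunks.setdefault(chunk, []).append(str(id_))

chunk_strings = [",".join(ids) for _, ids in sorted(chunks.items())]
print(chunk_strings)  # ['3,5,10,12,16', '23,24,25,32,35', '43,44']
```

The correlated count is quadratic in the number of rows, so for large tables a running counter or a window function (where available) is likely to be the better choice.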
Thanks a lot for your help, I couldn't have figured it out without it!