MySQL delete except one - mysql

I've the following database structure:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
5 2 136
6 2 128
7 2 10
8 2 1
9 3 561
10 3 560
11 3 10
12 3 1
13 4 561
14 4 560
15 4 10
16 4 1
17 5 234
18 5 120
19 5 1
20 6 234
21 6 120
22 6 1
Here are the details:
The table refers idproperty with different geographic location. For example:
idgbs
1 refers to United States
10 refers to Alabama with parentid 1 (United States)
128 refers to Alabama Gulf Coast with parentid 10 (Alabama)
136 Dauphin Island with parentid 128 (Alabama Gulf Coast)
So, the structure is:
United States > Alabama > Alabama Gulf Coast > Dauphin Island
I want to delete all entries for idproperty EXCEPT the first with the set of idgbs 136, 128, 10, 1 i.e. leave atleast 1 property in all GBS and delete others.
Also, sometimes it is 4 level of geographic entries, sometimes it is 3 level.
Please share the logic & SQL query to delete all entries except one in every unique GBS.
GBS 1, 10, 128, 136 is one unique, so database should only contain 1 property id with these GBS.
After the query, the table would look like this:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
9 3 561
10 3 560
11 3 10
12 3 1
17 5 234
18 5 120
19 5 1
Rephrasing the question:
I want to keep properties in every root level GBS i.e. there should be only ONE property in Dauphin Island.

Whew... I think I understand what you are after now. I couldn't let this one go ;-)
I had to realize that in the question, you wanted property 2 deleted, because it shared a hierarchy with property 1. Once I realized that, I got the following idea. Basically, we join to an aggregated version of self twice: the first one tells us what our "gbs hierarchy path" is, and the second one matches any previous properties with the same hierarchy. Rows which find that there are no "previous" properties that share their hierarchy are spared, the rest with that hierarchy are deleted. It's possible that this could be further tweaked, but I wanted to share this now. I have tested it with the data you showed, and I got the results you posted.
DELETE
each_row.*
FROM property_gbs AS each_row
JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS mypath
USING(idproperty)
LEFT JOIN ( SELECT
idproperty,
GROUP_CONCAT(idgbs ORDER BY idgbs DESC SEPARATOR "/") AS idgbs_path
FROM property_gbs
GROUP BY idproperty
) AS previous_property
ON mypath.idgbs_path = previous_property.idgbs_path
AND previous_property.idproperty < each_row.idproperty
WHERE previous_property.idproperty
Note that the last line is not a typo, we are just checking if there is a previous property with the same path. If there is, then delete the currently-evaluated row.
Cheers!
note for clarification
The thought here is to associate every row with it's hierarchy, even if it's a row which represents somewhere in the middle of the hierarchy (such as row: {2, 1, 128} in the question). With the first join to the aggregate, each row now "knows" what it's path is (so that row would get "136/128/10/1"). We can then use that value in the second join to find other properties with the same path, but only if they have a LOWER property id. This allows us to check for the existence of a lower-ID property with the same "path", and delete any row which represents a property which does have such a "lower-order path-sibling".

I'm really not sure. But try this.
DELETE a1 FROM table a1, table a2
WHERE a1.id > a2.id
AND a1.idgbs = a2.idgbs
AND a1.idgbs <> 1
if you want to keep the row with the lowest id.

This one was difficult #dang but I enjoyed the challenge.
;With [CTE] as (Select id ,idproperty ,idgbs ,Row_Number() Over(Partition By idgbs order by idproperty Asc) as RN From [TableGBS])
,[CTE2] as (Select * From [CTE] Where RN > 1)
,[CTE3] as (Select idproperty ,count(*) as [Count] From [CTE2] Group by idproperty)
Delete from [TableGBS] Where id in (Select a.id From [CTE] as a Left Join [CTE3] as b on a.idproperty = b.idproperty Where RN > 1 And [Count] > 2);
Since i dont think you can do a delete statement in sqlfiddle here are the rows it will delete showing in a select statement: http://sqlfiddle.com/#!3/08108/40
Edit: I use MySQL linked to Microsoft SQL Server Management Studio so this might not work for you

This technique joins the table against an aggregated version of itself, essentially matching each row in the table to the knowledge of which idproperty is the lowest for it's idgbs, and deletes the row if it does not share that idproperty (i.e. the row with the lowest idproperty, when joined to itself, will not be deleted, but the rest of the rows with that idgbs will).
DELETE
each_row.*
FROM table AS each_row
JOIN (select MIN(idproperty), idgbs FROM table GROUP BY idgbs) as lowest_id
USING(idgbs)
WHERE each_row.idproperty != lowest_id.idproperty;

Related

Limit selected results by unique selected IDs when using left joins

I have a table users and some other tables like images and products
Table users:
user_id user_name
1 andrew
2 lutz
3 sophie
4 michael
5 peter
6 oscor
7 anton
8 billy
9 henry
10 jon
Tables images:
user_id img_type img_url
1 0 url1
1 1 url4
2 0 url5
7 0 url7
8 0 url8
9 1 url9
Table Products
user_id prod_id
1 5
1 55
2 555
8 5555
9 5
9 55
I use this kind of SELECT:
SELECT * FROM
(SELECT user.user_id,user.user_name, img.img_type, prod.prod_id FROM
users
LEFT JOIN images img ON img.user_id = users.user_id
LEFT JOIN products prod ON prod.user_id = users.user_id
WHERE user.user_id <= 5) AS users
ORDER BY user.user_id ASC
The result should be the following output. Due to performance improvements, I use ORDER BY and an inner select. If I put a LIMIT 5 within the inner or outer select, things won't work. MySQL will hard LIMIT the results to 5. However I need the LIMIT of 5 (pagination) found unique user_id results which would lead to 9 in this case.
Can I use maybe an if-statement to push an array with found user_id and break/finish up the select when the array consist of 5 UIDs? Or can I modify somehow the select?
user_id user_name img_type prod_id
1 andrew 0 5
1 andrew 1 5
1 andrew 0 55
1 andrew 1 55
2 lutz 0 5
2 lutz 0 55
3 sophie null null
4 michael null null
5 peter null null
results: 9
LIMIT 5 and user_id <= 5 do not necessarily give you the same results. One reason: There are multiple rows (after the JOINs) for user_id = 1. This is because there can be multiple images and/or multiple products for a given 'user'.
So, first decide which you want.
LIMIT without ORDER BY gives you an arbitrary set of rows. (Yeah, it is somewhat predictable, but you should not depend on it.)
ORDER BY + LIMIT usually implies gathering all the potentially relevant rows, sorting them, then doing the "limit". There are sometimes ways around this sluggishness.
LEFT leads to the NULLs you got; did you want that?
What do you want pagination to do if you are displaying 5 items per page, but user 1 has 6 images? You need to think about this edge case before we can help you with a solution. Maybe you want all of user 1 on a page, even if it exceeds 5? Maybe you want to break in the middle of '1'; but then we need an unambiguous way to know where to continue from for the next page.
Probably any viable solution will not use nested SELECTs. As you are finding out, it leads to "errors". Think of it this way: First find all the rows you need to display on all the pages, then carve out 5 for the current page.
Here are some more musings on pagination: http://mysql.rjweb.org/doc.php/pagination

Create multiple results from single row joining either 2 or 3 tables based off conditional of table1 results?

First let me say that yes, this is a horrible way to have stored data, second, it isn't my fault :) I am trying to integrate with a 3rd party database to extract info which is stored in 3 tables, which really should have been in two AND stored where table 2 had a many to one relationship. Since that isn't the case, I have a puzzle to share.
Table one contains rows in which multiple values can be stored. Each row has codeid1-codeid20. These columns may contain a value, or a 0 (they are never null). They also have a corresponding codetype1-codetype20 which will be either 0 or 1.
If codetype1 is equal to 0, we go to table 2 and select description from the matching table1.codeid1=table2.id. If codetype1 equals 1, we now have to look at table3 and find where table1.codeid1=table3.id and then match table3.table2id=table2.id and return the description.
Here is the data structure:
table1
codeid1,codeid2,codeid3,...codeid20 ... codetype1,codetype2,codetype3,.....codetype20
18 13 1 33 0 0 1 1
13 21 45 0 0 1 0 0
table2
id, description
13 Item 13 Description
15 Item 15 Description
17 Item 17 Description
18 Item 18 Description
21 Item 21 Description
28 Item 28 Description
45 Item 45 Description
table3
id, table2id
1 15
33 17
21 28
The results I would be looking for would look like this:
rowid, description
1 Item 18 Description
1 Item 13 Description
1 Item 15 Description
1 Item 17 Description
2 Item 13 Description
2 Item 28 Description
2 Item 45 Description
I got started working with someone last night, but I had missed part of the complexity of my situation in not integrating table3. Like I said, fun puzzle... This gives me the relationship between the first 2 tables, but I am unsure how I can work in a 3rd table.
SELECT table1.rowid, table2.description
FROM table2
INNER JOIN table1
ON table2.id=table1.codeie1
OR table2.id=table1.codeie2
...
The database is a Faircom C-Tree DB over an ODBC connection, which is generally compatible with Mysql statements including UNION, WITH, INTERSECT, EXISTS, JOIN... There is no PIVOT function.
https://docs.faircom.com/doc/sqlref/sqlref.pdf
Perhaps it will work with or rather than in:
where exists (select 1
from table1 as t1
where t2.id = t1.codeid1 or
t2.id = t2.codeid2 or
. . .
);

Normalizing data from a pair of CSV tables

I'm trying to normalize some data, and can't seem to come up with a solution. What I have is a table like this:
weight position1 position2 position3
1 10 20 30
2 25 35 45
3 17 05 22
and one like this:
location position
6 1
7 1
8 2
9 2
10 2
11 3
12 3
How do I normalize the above so that given a location and a weight, I can find the value for a given position?
I can use Perl, Python, Excel, MySQL or pretty much any tool on the block to do the actual reshuffling of the data; where I'm having a problem is in coming up with a reasonable schema.
The desired outcome here is something like
if location == 11 -> position is 3
therefore,
if weight == 2 -> the value is 45
The only thing to do is "unpivot" your first table to this:
weight position value
1 1 10
1 2 20
1 3 30
2 1 25
2 2 35
2 3 45
3 1 17
3 2 05
3 3 22
The first two columns should contain unique pairs of values. If you have other information that only depends on weight, you would need another table for that. Same for positions.
Converting to the new model
If you already have the tables, then you can create the first table (t1) with this statement:
create table t1_new
select weight, 1 as position, position1 as value
from t1
union all
select weight, 2 as position, position2 as value
from t1
union all
select weight, 3 as position, position3 as value
from t1
Then, after verification of the result, drop t1, and rename t1_new to t1.
Querying from the new model
To query from these tables the value for a given location and weight, you should use a join:
select value
from t1
inner join t2 on t2.weight = t1.weight
where t2.location = 11
and t1.position = 3

How do i get the the combinations from the table?

i have a table with 61 records, where the first are "atomic" identified with their ID, and the other ones are combinations of the first 19 records like this
id pi1 pi2 pi3
1 1
2 2
3 3
...
19 19
20 1 13
21 1 18
...
24 2 6
25 2 7
26 2 7 16
in a webpage i have a form where the user clicks in the buttons that correspond to the atomic records (e.g. choice 2 and 7). when the user submits, the backend should consider that the user pressed 2 and 7 and then the resulting set should be:
id
2
7
25
the 24th row shouldn't be on the result because the user didn't pressed it
to get the basic records, i think this should suffice
SELECT *
FROM table
WHERE id = 2
OR id = 7
if more options are pressed, the query would have more operators. but to access the complex combinations i don't know how to achieve it without getting unwanted records just because one of the options is present in one of the Pi's (pi1, pi2 or pi3)
It is really better to change your data structure to have a junction table -- one row per "new" id and one per related id. However, you have the three columns.
I think this is the logic you want:
select t.*
from table t
where id in (2, 7) or
(2 in (pi1, pi2, pi3) and
7 in (pi1, pi2, pi3)
);
However, this will give you id 26 as well. Excluding that requires a bit more work. In MySQL, this should do the trick:
select t.*
from table t
where id in (2, 7) or
((2 in (pi1, pi2, pi3)) +
(7 in (pi1, pi2, pi3))
) = ((pi1 is not null) + (pi2 is not null) + (pi3 is not null))

Getting count of rows with multiple combinations

I need help with a MySQL query. We have a database (~10K rows) which I have simplified down to this problem.
We have 7 truck drivers who visit 3 out of a possible 9 locations, daily. Each day they visit exactly 3 different locations and each day they can visit different locations than the previous day. Here are representative tables:
Table: Drivers
id name
10 Abe
11 Bob
12 Cal
13 Deb
14 Eve
15 Fab
16 Guy
Table: Locations
id day address driver.id
1 1 Oak 10
2 1 Elm 10
3 1 4th 10
4 1 Oak 16
5 1 4th 16
6 1 Toy 16
7 1 Toy 11
8 1 5th 11
9 1 Law 11
10 2 Oak 11
11 2 4th 11
12 2 Toy 11
.........
We have data for a full year and we need to find out how many times each "route" is visited over a year, sorted from most to least.
From my high school math, I believe there are 9!/(6!3!) route combinations, or 84 in total. I want do something like:
Get count of routes where route addresses = 'Oak' and 'Elm' and '4th'
then run again
where route addresses = 'Oak' and 'Elm' and '5th'
then again and again, etc.Then sort the route counts, descending. But I don't want to do it 84 times. Is there a way to do this?
I'd be looking at GROUP_CONCAT
SELECT t.day
, t.driver
, GROUP_CONCAT(t.address ORDER BY t.address)
FROM mytable t
GROUP
BY t.day
, t.driver
What's not clear here, if there's an order to the stops on the route. Does the sequence make a difference, and how to we tell what the sequence is? To ask that a different way, consider these two routes:
('Oak','Elm','4th') and ('Elm','4th','Oak')
Are these equivalent (because it's the same set of stops) or are they different (because they are in a different sequence)?
If sequence of stops on the route distinguishes it from other routes with the same stops (in a different order), then replace the ORDER BY t.address with ORDER BY t.id or whatever expression gives the sequence of the stops.
Some caveats with GROUP_CONCAT: the maximum length is limited by the setting of group_concat_max_len and max_allowed_packet variables. Also, the comma used as the separator... if we combine strings that contain commas, then in our result, we can't reliably distinguish between 'a,b'+'c' and 'a'+'b,c'
We can use that query as an inline view, and get a count of the the number of rows with identical routes:
SELECT c.route
, COUNT(*) AS cnt
FROM ( SELECT t.day
, t.driver
, GROUP_CONCAT(t.address ORDER BY t.address) AS route
FROM mytable t
GROUP
BY t.day
, t.driver
) c
GROUP
BY c.route
ORDER
BY cnt DESC