Removing somewhat duplicate records in table using MySQL

I have a table that compares the competitiveness of airline routes in the United States. Some of its fields are id, route_id1, route_id2, airline_id1, airline_id2, sources_airport_id, and destination_airport_id.
This table is the result of self-joining the routes table, which consists of route maps.
As a result, the table has somewhat duplicate records.
For example, route1 is competitive with route2 because they have the same source_airport and destination_airport but a different airline_id. But I have two records, one comparing route1 to route2 and one comparing route2 to route1. They are the same comparison, just ordered differently.
I've tried to fetch the duplicates by self-joining:
SELECT t1.*
FROM routes AS t1, routes AS t2
WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
But this query returns the same number of records as the table contains.
How do I get rid of the "duplicate" data?
Thanks in advance.

The problem is that you have no condition to separate t1 and t2. First, you'll get every mirrored pair twice, once with t1 and t2 swapped. Second, if any row has route_id1 = route_id2, it matches itself, so it shows up in the result set as both t1 and t2.
The simplest way to get around this would be:
SELECT t1.* FROM routes AS t1, routes AS t2
WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
AND t2.id > t1.id
The added criterion is that one row must have a larger id than the other. This means that t1, as returned, will always be the row with the lower id. You can of course replace the > with a < or swap the operands to get the row with the higher id.
That will get rid of most of the duplicates. If the table also contains exact duplicates, they will produce repeated rows in the result set of the query above. The reason is that a mirrored row may then be matched against two different counterpart rows that are themselves exact duplicates of each other.
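
The id-ordering filter can be tried out with a minimal sketch like the one below. It is an illustration only: it uses Python's sqlite3 module as a stand-in for MySQL so that it runs without a server, and the sample rows and the follow-up DELETE are assumptions, not part of the answer above.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE routes (id INTEGER PRIMARY KEY, route_id1 INTEGER, route_id2 INTEGER)"
)
conn.executemany(
    "INSERT INTO routes (id, route_id1, route_id2) VALUES (?, ?, ?)",
    [(1, 10, 20), (2, 20, 10), (3, 30, 40), (4, 40, 30), (5, 50, 60)],
)

# Same self-join as above: the id comparison keeps one row per mirrored pair.
kept = conn.execute(
    """
    SELECT t1.id, t1.route_id1, t1.route_id2
    FROM routes AS t1
    JOIN routes AS t2
      ON t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
    WHERE t2.id > t1.id
    ORDER BY t1.id
    """
).fetchall()
print(kept)  # [(1, 10, 20), (3, 30, 40)]

# To actually remove the mirrored copies, delete the higher-id row of each pair.
conn.execute(
    """
    DELETE FROM routes
    WHERE id IN (
        SELECT t2.id
        FROM routes AS t1
        JOIN routes AS t2
          ON t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
        WHERE t2.id > t1.id
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT id FROM routes ORDER BY id")]
print(remaining)  # [1, 3, 5]
```

Note that the SELECT only returns rows that have a mirrored counterpart; row 5 has none, so it is not listed but survives the DELETE. MySQL does not accept a subquery on the table being deleted from in this form, so there the usual equivalent would be the multi-table form DELETE t2 FROM routes AS t1 JOIN routes AS t2 ON ... WHERE t2.id > t1.id.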

In the SELECT, list the actual field names and use DISTINCT instead of t1.*.
In that field list, make sure you do not include the airline_id columns, since they differ between the two rows and would keep the records from counting as duplicates.

Have you tried using "SELECT DISTINCT t1.* FROM ..."?

Related

Determining whether each row exists in another MySQL table

I have two tables that are very similar. For example, let's say that each row has two ID numbers and a data value. The first ID number may occur once, twice, or not at all, and the second ID number is either 1 or -1. The data value is not important, but for the sake of this example, we'll say it's an integer.
For each pair of ID numbers there can only be one data value, so if I have a row where the IDs are 10 and 1, there won't be another 10 and 1 row with a different data value. Similarly, the row with IDs 10 and 1 in the other table will hold the same data value as in the first table. I want to select the rows that exist in both tables so that I can change the data value in all of them. My MySQL command so far is as follows:
SELECT DISTINCT * FROM schema.table1
WHERE EXISTS (SELECT * FROM schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2);
I want to be able to have this code select all the rows in table1 that are also in table2, but allow me to edit table1 values.
I understand that by creating a union of the two tables, I can see the rows that exist in both tables, but would this allow me to make changes to the actual data values if I changed the values in the merged set? For example, if I did:
SELECT DISTINCT * FROM schema.table1 INNER JOIN schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2;
If I call UPDATE on the rows that I get from this query, would the actual values in table1/table2 be changed or is this union just created in dynamic memory and I would just be changing values that get deleted when the query is over?

Update as follows:
UPDATE schema.table1 SET data = whateverupdate
WHERE ID1 IN (SELECT ID1 FROM schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2);
In the inner SELECT you cannot use SELECT * together with IN; you have to select a particular column. This works because the inner SELECT finds the matching rows in table2 and feeds them to the UPDATE statement, which changes the stored rows in table1 directly. That said, the inner SELECT has to return exactly the rows you need; otherwise the wrong rows will be updated. Hope this helps.
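
As a rough illustration of updating only the rows that exist in both tables, here is a sketch that uses Python's sqlite3 module as a stand-in for MySQL. The sample rows and the new value 0 are made up, and it uses a correlated EXISTS, which is one common way to express the same match on both ID columns.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (ID1 INTEGER, ID2 INTEGER, data INTEGER);
    CREATE TABLE table2 (ID1 INTEGER, ID2 INTEGER, data INTEGER);
    INSERT INTO table1 VALUES (10, 1, 5), (10, -1, 6), (11, 1, 7);
    INSERT INTO table2 VALUES (10, 1, 5), (11, 1, 7), (12, -1, 8);
    """
)

# Update table1 rows whose (ID1, ID2) pair also appears in table2.
conn.execute(
    """
    UPDATE table1
    SET data = 0
    WHERE EXISTS (
        SELECT 1 FROM table2
        WHERE table2.ID1 = table1.ID1 AND table2.ID2 = table1.ID2
    )
    """
)

rows = conn.execute(
    "SELECT ID1, ID2, data FROM table1 ORDER BY ID1, ID2"
).fetchall()
print(rows)  # [(10, -1, 6), (10, 1, 0), (11, 1, 0)]
```

The row (10, -1) has no counterpart in table2 and keeps its value, while the two matching rows are changed in the stored table, not in a temporary copy.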

How to write a mysql join query using IN()

Can you help me write my MySQL join query? Here is what I have so far:
SELECT * FROM table1 LEFT JOIN table2 ON table2.id IN
(table1.comma_separated_ids) WHERE table1.id = [some id]
where table1.comma_separated_ids is a VARCHAR column containing a list of comma separated IDs (integers) that relate to IDs in table2.
The above query returns only one row when it should return one row for every ID in table1.comma_separated_ids that has a matching row in table2.
What I'm actually trying to do is a little more complex, but it's hard to explain, so I'm starting here. Any help?

In MySQL, you cannot pass a column that holds a comma-separated list as the argument list of IN. The column value is treated as one single string.
You can use find_in_set():
SELECT *
FROM table1 LEFT JOIN
table2
ON find_in_set(table2.id, table1.comma_separated_ids)
WHERE table1.id = XXX;
However, the bigger issue is that you are storing ids in a comma-separated list. These should be in a separate junction table, with one row per id. It is bad enough to store lists in strings; storing integer ids is even worse.
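
A junction table of the kind suggested above might look like the following sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the table name table1_table2, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; all names and rows below are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical junction table: one row per (table1 row, related table2 id).
    CREATE TABLE table1_table2 (
        table1_id INTEGER NOT NULL REFERENCES table1(id),
        table2_id INTEGER NOT NULL REFERENCES table2(id),
        PRIMARY KEY (table1_id, table2_id)
    );
    INSERT INTO table1 VALUES (1, 'first');
    INSERT INTO table2 VALUES (7, 'seven'), (8, 'eight'), (9, 'nine');
    INSERT INTO table1_table2 VALUES (1, 7), (1, 9);
    """
)

# Plain equality joins replace the string search over a comma-separated column.
rows = conn.execute(
    """
    SELECT table1.id, table2.id, table2.name
    FROM table1
    LEFT JOIN table1_table2 AS j ON j.table1_id = table1.id
    LEFT JOIN table2 ON table2.id = j.table2_id
    WHERE table1.id = ?
    ORDER BY table2.id
    """,
    (1,),
).fetchall()
print(rows)  # [(1, 7, 'seven'), (1, 9, 'nine')]
```

With this layout each relationship is its own row, so the join can use ordinary indexes instead of scanning a string.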

Performing Post SQL query on result set?

Basically, what I want to do can be put neatly in the following two steps.
1. Execute a query on a particular table and get the result set.
2. One of the columns in the result set, say 'id', contains only numbers. I want to take the 'id' in every row of the result set, join it with another table called ID_NAMES, and replace the id in each row with the corresponding name from ID_NAMES.
In a sense, this is like running a post-SQL query on the result set I get from a pre-SQL query.
Is there any way I can accomplish this?

Try this query:
SELECT table1.id, table2.name FROM table1 INNER JOIN table2 ON table1.id = table2.id
Refer at JNevill
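
To make the idea of a "post" query concrete, the sketch below wraps a first query in a derived table and joins its result to ID_NAMES in the same statement. It uses Python's sqlite3 module as a stand-in for MySQL; the results table, its amount column, and the sample rows are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; `results` and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE results (id INTEGER, amount INTEGER);
    CREATE TABLE ID_NAMES (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO results VALUES (1, 100), (2, 250), (1, 75);
    INSERT INTO ID_NAMES VALUES (1, 'alice'), (2, 'bob');
    """
)

# The inner SELECT is the "pre" query; the outer join swaps each id for its name.
rows = conn.execute(
    """
    SELECT n.name, r.amount
    FROM (SELECT id, amount FROM results WHERE amount > 80) AS r
    JOIN ID_NAMES AS n ON n.id = r.id
    ORDER BY r.amount
    """
).fetchall()
print(rows)  # [('alice', 100), ('bob', 250)]
```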

mysql comparing two tables then insert into a new one [duplicate]

I have two identical tables. I want to compare them and collect the result. The conditions are:
Each record in TABLE1, grouped by TID, is compared to all records in TABLE2, which are also grouped by TID.
If the grouped records from TABLE1 are found in TABLE2 as many as N times (N is a user input variable), then that record is inserted into a new table.
For example, ITEM C-F-A grouped by TID 2 has 3 occurrences in table2, so those records are inserted into the new table.
I have already written code for this in VB.NET and it works, but the program takes a ridiculous amount of time to finish. The main cause is that I'm processing a huge database.
My method is to load the two tables into 2D arrays and then compare their elements with an if clause while assigning values to the arrays.
But this method is really expensive. In my real database the first 2D array has 2k records and the second has 800 records, and when I estimated the time the program would need to complete, it came out at about 16 hours.
So I was wondering whether this problem can be solved with a MySQL query, or with some other method that is more efficient than what I have done.

INSERT INTO tbl3
SELECT tbl1.TID, tbl1.ITEM
FROM tbl1
JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
This will insert a record into tbl3 for each record in tbl1 that has a corresponding record in tbl2 identified by TID and ITEM.
This assumes that TID/ITEM is a unique index in both tbl1 and tbl2.
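
The following sketch runs the same INSERT ... SELECT against a few made-up rows and also shows why the uniqueness assumption matters. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample data is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl1 (TID INTEGER, ITEM TEXT);
    CREATE TABLE tbl2 (TID INTEGER, ITEM TEXT);
    CREATE TABLE tbl3 (TID INTEGER, ITEM TEXT);
    INSERT INTO tbl1 VALUES (1, 'A'), (1, 'B'), (2, 'C');
    INSERT INTO tbl2 VALUES (1, 'A'), (2, 'C'), (2, 'F');
    """
)

insert_sql = """
    INSERT INTO tbl3 (TID, ITEM)
    SELECT tbl1.TID, tbl1.ITEM
    FROM tbl1
    JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
"""
conn.execute(insert_sql)
copied = conn.execute("SELECT TID, ITEM FROM tbl3 ORDER BY TID, ITEM").fetchall()
print(copied)  # [(1, 'A'), (2, 'C')]

# If (TID, ITEM) is not unique in tbl2, the join multiplies the matching rows.
conn.execute("DELETE FROM tbl3")
conn.execute("INSERT INTO tbl2 VALUES (1, 'A')")
conn.execute(insert_sql)
count_with_dupe = conn.execute("SELECT COUNT(*) FROM tbl3").fetchone()[0]
print(count_with_dupe)  # 3
```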

Ok, here's a wild, untested, guess (WUG).
The approach goes like this:
You need a list of TIDs from table1, so you build a distinct list (the innermost query).
You use that list in a WHERE clause when selecting from table2, so that you only get rows whose TIDs are in table1. You group that query and use HAVING to limit the result to the TIDs with a count > X.
Now you have a list of TIDs that match those in table1 and have more than X entries in table2. You select those rows from table2.
Those rows are used as the source of an INSERT statement into the new table (called new_table here).
The SQL might look something like:
insert into new_table
select * from table2 where tid in
(select tid
from table2
where tid in (select distinct tid from table1)
group by tid
having count(*) > 10);
This is untested, with 10 standing in for X, and I make no claim it will work off the bat, but it's what my first shot would be if I wanted to do it all in one query.
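
The grouping approach described in this answer can be checked against a small data set with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL; the target table new_table, the threshold n, and the sample rows are assumptions made for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; table contents and `new_table` are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (tid INTEGER, item TEXT);
    CREATE TABLE table2 (tid INTEGER, item TEXT);
    CREATE TABLE new_table (tid INTEGER, item TEXT);
    INSERT INTO table1 VALUES (1, 'A'), (2, 'C');
    INSERT INTO table2 VALUES
        (1, 'A'), (2, 'C'), (2, 'F'), (2, 'A'), (3, 'B'), (3, 'B'), (3, 'B');
    """
)

n = 2  # user-supplied threshold (the X in the description above)

# Copy table2 rows whose tid is in table1 and occurs more than n times in table2.
conn.execute(
    """
    INSERT INTO new_table (tid, item)
    SELECT tid, item
    FROM table2
    WHERE tid IN (
        SELECT tid
        FROM table2
        WHERE tid IN (SELECT DISTINCT tid FROM table1)
        GROUP BY tid
        HAVING COUNT(*) > ?
    )
    """,
    (n,),
)

inserted = conn.execute("SELECT tid, item FROM new_table ORDER BY item").fetchall()
print(inserted)  # [(2, 'A'), (2, 'C'), (2, 'F')]
```

Here tid 1 appears only once in table2 and tid 3 does not appear in table1, so only the three rows for tid 2 are copied.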
