MySQL: getting a result from two identical tables - mysql

I have two identical tables. I want to compare them and derive a result from the comparison. The conditions are:
Each set of records in TABLE1 grouped by TID is compared against all records in TABLE2, which are also grouped by TID.
If a grouped set of records from TABLE1 is found in TABLE2 (among records that are grouped by TID as well) as many as N times (N is a user input variable), then that set is inserted into a new table.
For example, as in the screenshot below, ITEM C-F-A grouped by TID 2 has 3 occurrences in table2, so those records are inserted into the new table:
I've already written code for this in VB.NET and it works, but the program takes a ridiculous amount of time to complete. The main cause is that I'm processing a huge database.
My method is to load the two tables into 2D arrays, then assign values to an array while comparing pairs of elements with an if clause.
Below is the 2D array that I've created:
But this method is really expensive. In my real database (pictured above) the first 2D array has 2k records and the second has 800 records, and when I estimated the time for the run to complete, it came out at a fantastic number: about 16 hours!
So I was wondering whether this problem can be solved with a MySQL query,
or with some other method that is more efficient than what I have done.

INSERT INTO tbl3
SELECT tbl1.TID, tbl1.ITEM
FROM tbl1
JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
This will insert a record into tbl3 for each record in tbl1 that has a corresponding record in tbl2 identified by TID and ITEM.
This assumes that TID/ITEM is a unique index in both tbl1 and tbl2.
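To see what the join-based insert above does on a small data set, here is a minimal sketch. It assumes an in-memory SQLite database standing in for MySQL (only so the example runs on its own), and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the INSERT ... SELECT is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (TID INTEGER, ITEM TEXT, PRIMARY KEY (TID, ITEM));
    CREATE TABLE tbl2 (TID INTEGER, ITEM TEXT, PRIMARY KEY (TID, ITEM));
    CREATE TABLE tbl3 (TID INTEGER, ITEM TEXT);
    INSERT INTO tbl1 VALUES (1, 'A'), (1, 'B'), (2, 'C');
    INSERT INTO tbl2 VALUES (1, 'A'), (2, 'C'), (2, 'D');
""")

# Copy every tbl1 row that has a row with the same TID and ITEM in tbl2.
conn.execute("""
    INSERT INTO tbl3 (TID, ITEM)
    SELECT tbl1.TID, tbl1.ITEM
    FROM tbl1
    JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
""")

copied = conn.execute("SELECT TID, ITEM FROM tbl3 ORDER BY TID, ITEM").fetchall()
print(copied)
```

Only (1, 'A') and (2, 'C') appear in both sample tables, so only those two rows land in tbl3. Note that this matches rows with the same TID in both tables; it does not count occurrences against N.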

Ok, here's a wild, untested, guess (WUG).
The approach goes like this:
You need a list of TIDs from table1, so you build a distinct list (the innermost query).
You use that list in a WHERE clause when selecting from table2, so that you only get rows whose TIDs are in table1. You group that query and use HAVING to limit the rows to only those with a count > X.
Now you have a list of TIDs that match those in table1 and have more than X entries in table2. You select those rows.
Those are used as the source of an insert statement into table1.
The SQL might look something like:
insert into table1
select * from table2
where tid in (select tid
from table2
where tid in (select distinct tid from table1)
group by tid
having count(*) > 10);
I doubt the syntax is exactly right (I can't remember the exact syntax for an insert from a select), and I make no claim it will work right off the bat, but it's what my first shot would be if I wanted to do it all in one query.
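Another way to read the question is as a set-containment problem: a TID group from table1 "occurs" in a TID group of table2 when every item of the former is present in the latter. The sketch below follows that reading. It assumes "as many as N" means "at least N", that (tid, item) is unique in each table, and that a target table named table3 exists; SQLite in memory stands in for MySQL, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL so the sketch runs standalone.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (tid INTEGER, item TEXT, PRIMARY KEY (tid, item));
    CREATE TABLE table2 (tid INTEGER, item TEXT, PRIMARY KEY (tid, item));
    CREATE TABLE table3 (tid INTEGER, item TEXT);  -- hypothetical target table
    INSERT INTO table1 VALUES (1,'A'),(1,'B'),(2,'C'),(2,'F'),(2,'A');
    INSERT INTO table2 VALUES
        (1,'C'),(1,'F'),(1,'A'),(1,'D'),
        (2,'A'),(2,'C'),(2,'F'),
        (3,'F'),(3,'A'),(3,'C'),(3,'B'),
        (4,'A'),(4,'B');
""")

n = 3  # user-supplied threshold N

conn.execute("""
    INSERT INTO table3 (tid, item)
    SELECT t1.tid, t1.item
    FROM table1 t1
    WHERE t1.tid IN (
        SELECT m.tid1
        FROM (
            -- One row per (table1 group, table2 group) pair where the table2
            -- group contains every item of the table1 group.
            SELECT a.tid AS tid1, b.tid AS tid2
            FROM table1 a
            JOIN table2 b ON b.item = a.item
            GROUP BY a.tid, b.tid
            HAVING COUNT(*) = (SELECT COUNT(*) FROM table1 c WHERE c.tid = a.tid)
        ) AS m
        GROUP BY m.tid1
        HAVING COUNT(*) >= ?   -- keep groups found in at least N table2 groups
    )
""", (n,))

inserted = conn.execute("SELECT tid, item FROM table3 ORDER BY tid, item").fetchall()
print(inserted)
```

With this sample data the items of TID 2 (C, F, A) are all present in three table2 groups, so those rows are inserted, while TID 1 (A, B) is contained in only two groups and is skipped. Whether this matches the intended rule depends on the assumptions stated above.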

Related

SQL NOT LIKE returning all rows instead of dissimilarities

I have two tables with 2 PKs. Table 1 has 478 records. Field 1 is a unique ID for that table only, field 2 is an ID (shared with table 2), and the third field is a category field. IDs from field 2 can be repeated within a table, but I cannot have the same ID+category twice.
The second table contains 757 records. It has an ID column and a category column (like table 1), and I want to know which records from table 1 are included in table 2. For the moment I am just checking which IDs are included in both tables (I want to clean up the database so I can later use an AND query to match on ID + category).
My SQL query does not return the desired result. When I do
SELECT DISTINCT(table1.field1) FROM table1, table2 WHERE table1.ID = table2.ID;
I get all the results that do match, but, when I do the opposite
SELECT table1.field1 FROM table1, table2 WHERE table1.ID != table2.ID;
SQL gives all the rows from table 1, when, the expected outcome would be
total rows from table 1 - IDs that do match with the ones at table 2
I've tried to invert the order in which the query is displayed as:
SELECT table1.field1 FROM table1, table2 WHERE table2.ID != table1.ID;
But then a loop seems to occur and I get 36000+ results, which is of course impossible (I imagine that checking the bigger table against the smaller one makes the small one loop over and over, and since I get the full table every time, the loop is X times 478, hence the 36000+ results).
I have checked this matched/unmatched query using R (just for testing) and I got 170 matches (which I can also obtain in SQL) and 308 "not coincident" results (170+308=478, so I imagine it makes sense even though I am using R instead of a proper relational database system).
How can I search for unmatched IDs in a query rather than checking for matched ones and subtracting from the total? How do I get the 308 records that do not match?
If you want values in table 1 that are not in table 2, then use not exists or something similar:
select t1.*
from table1 t1
where not exists (select 1 from table2 t2 where t2.id = t1.id);
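A small sketch may help show why the != version returns so many rows: a comma join pairs each table1 row with every table2 row, and != only drops the pairs where the IDs happen to be equal. The table contents below are invented, and SQLite in memory stands in for MySQL as an assumption.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (field1 INTEGER PRIMARY KEY, id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table2 VALUES (10), (20), (40);
""")

# Comma join with != : every table1 row is paired with each table2 row
# whose id differs, so even matched rows show up (several times).
pairs = conn.execute("""
    SELECT table1.field1
    FROM table1, table2
    WHERE table1.id != table2.id
""").fetchall()

# Anti-join: keep a table1 row only if no table2 row has the same id.
unmatched = conn.execute("""
    SELECT t1.field1
    FROM table1 t1
    WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.id = t1.id)
    ORDER BY t1.field1
""").fetchall()

print(len(pairs), unmatched)
```

Here the != query yields 7 rows from a 3-row table, while NOT EXISTS yields the single row whose id (30) is missing from table2.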

Determining whether each row exists in another MySQL table

I have two tables that are very similar. For example, let's say that each row has two ID numbers, and a data value. The first ID number may occur once, twice, or not be included, and the second ID number is either 1 or -1. The data value is not important, but for the sake of this example, we'll say it's an integer. For each pair of ID numbers, there can only be one data value, so if I have a data point where the ID's are 10 and 1, there won't be another 10 and 1 row with a different data value. Similarly, in the other table, the data point with ID's 10 and 1 will be the same as in the first table. I want to be able to select the rows that exist in both tables for the sake of changing the data value in all of the rows that are in both. My command for MySQL so far is as follows:
SELECT DISTINCT * FROM schema.table1
WHERE EXISTS (SELECT * from schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
and schema.table1.ID2 = schema.table2.ID2);
I want to be able to have this code select all the rows in table1 that are also in table2, but allow me to edit table1 values.
I understand that by creating a union of the two tables, I can see the rows that exist in both tables, but would this allow me to make changes to the actual data values if I changed the values in the merged set? For example, if I did:
SELECT DISTINCT * FROM schema.table1 inner join schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
and schema.table1.ID2 = schema.table2.ID2;
If I call UPDATE on the rows that I get from this query, would the actual values in table1/table2 be changed or is this union just created in dynamic memory and I would just be changing values that get deleted when the query is over?
Update as follows:
UPDATE schema.table1 SET data = whateverupdate
WHERE ID1 IN (SELECT ID1 FROM schema.table2
WHERE schema.table1.ID1 = schema.table2.ID1
AND schema.table1.ID2 = schema.table2.ID2);
In your inner select statement you cannot use SELECT *; you have to select a particular column. This should work because the inner select finds the rows in question and feeds them to the update statement. That said, the inner select has to return exactly the rows you need, otherwise the wrong rows will be updated. Hope this helps.
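As a rough illustration of the point about SELECT versus UPDATE: a join in a SELECT only produces a result set, while an UPDATE with a correlated subquery changes the stored rows. The sketch below assumes SQLite in memory in place of MySQL, a hypothetical integer column named data, and invented rows; it uses EXISTS rather than IN, which is one of several equivalent ways to express the match on both IDs.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; rows and the new value 0 are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID1 INTEGER, ID2 INTEGER, data INTEGER, PRIMARY KEY (ID1, ID2));
    CREATE TABLE table2 (ID1 INTEGER, ID2 INTEGER, data INTEGER, PRIMARY KEY (ID1, ID2));
    INSERT INTO table1 VALUES (10, 1, 5), (10, -1, 6), (11, 1, 7);
    INSERT INTO table2 VALUES (10, 1, 5), (12, 1, 8);
""")

# Update only the table1 rows whose (ID1, ID2) pair also exists in table2.
conn.execute("""
    UPDATE table1
    SET data = 0
    WHERE EXISTS (
        SELECT 1
        FROM table2
        WHERE table2.ID1 = table1.ID1
          AND table2.ID2 = table1.ID2
    )
""")

rows = conn.execute("SELECT ID1, ID2, data FROM table1 ORDER BY ID1, ID2").fetchall()
print(rows)
```

Only the (10, 1) row is present in both tables, so it is the only one whose data changes. MySQL also accepts a multi-table form, UPDATE table1 JOIN table2 ON ... SET ..., which SQLite does not, so the sketch sticks to the subquery form.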

MySQL sub select and return multiple records from the sub select table

I don't know if this is possible, but can mysql do a sub select and retrieve multiple records?
Here is my simplified query:
SELECT table1.*,
(
SELECT table2.*
FROM Table2 table2
WHERE table2.key_id = table1.key_id
)
FROM Table1 table1
Basically, Table2 has X amount of records that I need to pull back in the query and I don't want to have to run a secondary query (for instance get the results from Table1 and then loop over those results and then get all the results from Table2).
Thanks.
No. The subquery in the SELECT clause is called a scalar subquery. A scalar subquery has two important properties:
It can only retrieve one column.
It can only retrieve zero or one rows.
A scalar subquery -- as its name implies -- substitutes for a scalar value in an expression. If the subquery returns no rows, the value used in the expression is NULL.
In your case, you can use a LEFT JOIN instead:
SELECT t1.*, t2.*
FROM Table1 t1 LEFT JOIN
Table2 t2
ON t2.key_id = t1.key_id;
Note that table aliases are a good thing. However, they should make the query simpler, so repeating the table name is not a big win.
MySQL can do a subquery that returns multiple rows or multiple columns, but it's not valid to do that in a scalar context.
You're putting a subquery in a scalar context. In other words, in the select-list, a subquery must return one column and one row (or zero rows), because it will be used for one item on the respective row as it uses the select-list to build a result.
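A minimal sketch of the LEFT JOIN approach, with the "multiple records per Table1 row" regrouped on the client side. It assumes SQLite in memory in place of MySQL, and the columns name, id and val are hypothetical; only key_id comes from the question.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; name, id and val are made-up columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (key_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Table2 (id INTEGER PRIMARY KEY, key_id INTEGER, val TEXT);
    INSERT INTO Table1 VALUES (1, 'one'), (2, 'two');
    INSERT INTO Table2 VALUES (1, 1, 'a'), (2, 1, 'b');
""")

# One query: each Table1 row is repeated once per matching Table2 row,
# and rows without a match still appear with NULLs for the Table2 columns.
rows = conn.execute("""
    SELECT t1.key_id, t1.name, t2.val
    FROM Table1 t1
    LEFT JOIN Table2 t2 ON t2.key_id = t1.key_id
    ORDER BY t1.key_id, t2.id
""").fetchall()

# Regroup the flat result into one list of Table2 values per Table1 key.
grouped = {}
for key_id, name, val in rows:
    grouped.setdefault(key_id, [])
    if val is not None:
        grouped[key_id].append(val)

print(rows)
print(grouped)
```

This avoids the per-row secondary query mentioned in the question: the join comes back flat, and the loop folds it into one entry per key, with an empty list where Table2 has nothing.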

Removing somewhat duplicate records in table using MySQL

I have a table that compares the competitiveness of airline routes in the United States. Some of the fields in the table are id, route_id1, route_id2, airline_id1, airline_id2, sources_airport_id, and destination_airport_id.
This table is the result of self-joining the routes table, which consists of route maps.
As a result, the table has near-duplicate records.
For example,
route1 is competitive with route2 because they have the same source_airport and destination_airport but a different airline_id. But I have two records, one comparing route1 to route2 and one comparing route2 to route1. They are the same comparison, just ordered differently.
I've tried to fetch the duplicates by self-joining:
SELECT t1.*
FROM routes AS t1, routes AS t2
WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
But this query just returns the same number of records as there are in the table.
How do I get rid of the "duplicate" data?
Thanks in advance.
The problem is that you have no condition to separate t1 and t2. First you'll get duplicates where t1 and t2 are swapped. Secondly, if any rows have route_id1 = route_id2, you'll get those rows too, in both t1 and t2 of the result set.
The simplest way to get around this would be:
SELECT t1.* FROM routes AS t1, routes AS t2
WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
AND t2.id > t1.id
The added criterion is that one row must have a larger id than the other. This means that t1, as returned, will always be the row with the lower id. You can of course replace it with a < or swap the parameters to get the row with the upper id.
That will get rid of most of the duplicates. If you have proper duplicates too in the database, those will create some duplicate rows in the result set of the above query. The reason is that a "duplicate" might be detected as being a "duplicate" of two different corresponding rows, which in turn are actual duplicates of each other.
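The effect of the extra id comparison can be checked on a tiny example. The sketch assumes SQLite in memory in place of MySQL and a routes table reduced to three columns with invented values.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routes (id INTEGER PRIMARY KEY, route_id1 INTEGER, route_id2 INTEGER);
    INSERT INTO routes VALUES
        (1, 100, 200), (2, 200, 100),   -- same comparison, swapped
        (3, 100, 300), (4, 300, 100);   -- same comparison, swapped
""")

# Without the id condition, every row finds its mirror image,
# so the whole table comes back.
without_filter = conn.execute("""
    SELECT t1.id
    FROM routes AS t1, routes AS t2
    WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
    ORDER BY t1.id
""").fetchall()

# With t2.id > t1.id, only the lower-id row of each mirrored pair is returned.
with_filter = conn.execute("""
    SELECT t1.id
    FROM routes AS t1, routes AS t2
    WHERE t1.route_id1 = t2.route_id2 AND t1.route_id2 = t2.route_id1
      AND t2.id > t1.id
    ORDER BY t1.id
""").fetchall()

print(without_filter, with_filter)
```

The first query returns all four ids, which matches the "same number of records" symptom in the question; the second returns one id per mirrored pair.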
In the SELECT, use the actual names of the fields and use the DISTINCT clause instead of t1.*.
In the list of fields, make sure you do not include the airline_id columns, as those are different and would make your records not duplicates.
Have you tried using "SELECT DISTINCT t1.* FROM ..."?