Select from one table where not in another - mysql

I'm trying to find the rows that are in one table but not in another. The two tables are in different databases, and the columns I'm matching on have different names.
I've got a query (below) that I think works, but it's far too slow:
SELECT `pm`.`id`
FROM `R2R`.`partmaster` `pm`
WHERE NOT EXISTS (
SELECT *
FROM `wpsapi4`.`product_details` `pd`
WHERE `pm`.`id` = `pd`.`part_num`
)
So the query is trying to do as follows:
Select all the ids from the R2R.partmaster table that are not in the wpsapi4.product_details table. The columns I'm matching are partmaster.id and product_details.part_num.

Expanding on Sjoerd's anti-join, you can also use the easy to understand SELECT WHERE X NOT IN (SELECT) pattern.
SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN (SELECT pd.part_num FROM wpsapi4.product_details pd)
Note that you only need to use ` backticks on reserved words, names with spaces and such, not with normal column names.
On MySQL 5+ this kind of query runs pretty fast.
On MySQL 3/4 it's slow.
Make sure you have indexes on the fields in question: you need one on pm.id and one on pd.part_num.

You can LEFT JOIN the two tables. If there is no corresponding row in the second table, the values will be NULL.
SELECT id FROM partmaster LEFT JOIN product_details ON (...) WHERE product_details.part_num IS NULL
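As a rough illustration of the LEFT JOIN / IS NULL pattern, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The join condition (pm.id = pd.part_num) is taken from the question's description; the sample rows and the index name are made up for the demo.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question,
# the data and the index name are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    CREATE INDEX idx_part_num ON product_details (part_num);
    INSERT INTO partmaster (id) VALUES (1), (2), (3), (4);
    INSERT INTO product_details (part_num) VALUES (2), (4);
""")

# Rows of partmaster with no partner in product_details come back with
# pd.part_num set to NULL, so filtering on IS NULL keeps only those.
missing_ids = [row[0] for row in conn.execute("""
    SELECT pm.id
    FROM partmaster pm
    LEFT JOIN product_details pd ON pd.part_num = pm.id
    WHERE pd.part_num IS NULL
    ORDER BY pm.id
""")]
print(missing_ids)  # [1, 3]
```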

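To expand on Johan's answer: if the part_num column in the sub-select can contain NULL values, the query will break. A single NULL in the list makes NOT IN evaluate to NULL rather than true for every non-matching row, so no rows are returned.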
To correct this, add a null check...
SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN
(SELECT pd.part_num FROM wpsapi4.product_details pd
where pd.part_num is not null)
Sorry but I couldn't add a comment as I don't have the rep!
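The NULL problem is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (both follow the standard three-valued logic for NOT IN); the sample rows are invented for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    INSERT INTO partmaster (id) VALUES (1), (2), (3);
    INSERT INTO product_details (part_num) VALUES (2), (NULL);
""")

# Without the guard, the NULL in the sub-select makes "id NOT IN (...)"
# evaluate to NULL for ids 1 and 3, so nothing is returned.
unguarded = conn.execute("""
    SELECT pm.id FROM partmaster pm
    WHERE pm.id NOT IN (SELECT pd.part_num FROM product_details pd)
""").fetchall()

# With the guard, the NULL never reaches the comparison.
guarded = conn.execute("""
    SELECT pm.id FROM partmaster pm
    WHERE pm.id NOT IN (SELECT pd.part_num FROM product_details pd
                        WHERE pd.part_num IS NOT NULL)
    ORDER BY pm.id
""").fetchall()

print(unguarded)  # []
print(guarded)    # [(1,), (3,)]
```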

There are loads of posts on the web that show how to do this. I found three ways, the same ones pointed out by Johan and Sjoerd. The queries themselves are fine, but on my database they all ran slowly.
So I worked out another way that someone else may find useful:
The basic gist is to create a temporary table, fill it with all the ids from the first table, then remove the rows that ARE in the other table.
I ran these three queries, and they finished quickly (in a few moments).
CREATE TEMPORARY TABLE `database1`.`newRows`
SELECT `t1`.`id` AS `columnID`
FROM `database2`.`table` AS `t1`;
CREATE INDEX `columnID` ON `database1`.`newRows` (`columnID`);
DELETE FROM `database1`.`newRows`
WHERE EXISTS (
SELECT `columnID` FROM `database1`.`product_details` WHERE `columnID` = `database1`.`newRows`.`columnID`
);
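The three steps above (copy, index, delete the matches) can be sketched end to end as follows. This is only an illustration using Python's sqlite3 module in place of MySQL; the table names follow the original question, while new_rows, column_id, the index name and the sample data are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE partmaster (id INTEGER PRIMARY KEY);
    CREATE TABLE product_details (part_num INTEGER);
    INSERT INTO partmaster (id) VALUES (1), (2), (3), (4), (5);
    INSERT INTO product_details (part_num) VALUES (2), (3), (5);

    -- Step 1: copy every candidate id into a temporary table.
    CREATE TEMPORARY TABLE new_rows AS
        SELECT id AS column_id FROM partmaster;

    -- Step 2: index the copy so the delete can look rows up quickly.
    CREATE INDEX idx_new_rows ON new_rows (column_id);

    -- Step 3: remove the ids that DO exist in the other table.
    DELETE FROM new_rows
    WHERE EXISTS (SELECT 1 FROM product_details pd
                  WHERE pd.part_num = new_rows.column_id);
""")

# Whatever is left in the temporary table is missing from product_details.
remaining = [row[0] for row in conn.execute(
    "SELECT column_id FROM new_rows ORDER BY column_id")]
print(remaining)  # [1, 4]
```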

Related

MySQL - Join View and Table

I have a view called "xml_links" which is in the following format
model_number  fk_link           sd_link
10R46         www.fk.com/10R46  www.sd.com/10R46
10R47         www.fk.com/10R47  www.sd.com/10R47
And a table called "prohub"
page_url          fk_price  sd_price
www.fk.com/10R46  $155
www.sd.com/10R46            $161
www.fk.com/10R47  $117
www.sd.com/10R47            $146
I'm trying to join them using the following query and all I get is a blank table.
select
xml_links.model_number,
prohub.fk_price,
prohub.sd_price
from
xml_links, prohub
where
xml_links.fk_link=prohub.page_url
and
xml_links.sd_link=prohub.page_url
I'm looking for the following result:
model_number  fk_price  sd_price
10R46         $155      $161
10R47         $117      $146
Thanks for your help
The query that you are looking for
SELECT model_number,
(SELECT fk_price FROM prohub WHERE page_url = fk_link AND fk_price IS NOT NULL) AS fk_price,
(SELECT sd_price FROM prohub WHERE page_url = sd_link AND sd_price IS NOT NULL) AS sd_price
FROM xml_links
But this is really a patchwork solution for a database that needs to be normalized.
The solution that you are really looking for
Is a redesign of your tables.
In a proper relational design one does not store repetitive data. In your database, values like 'www.fk.com/10R46' repeat not just within one table (which is bad enough) but across two tables.
Secondly, you are storing values in columns rather than in rows. The solution I have given you will work well as long as you have only two vendors, but what happens when you add a third? You need to add a third column to both of your tables. That might take hours if you have millions of records, and during that time the site will be unresponsive.
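To make the "rows rather than columns" point concrete, here is one possible normalized layout, sketched with Python's sqlite3 module in place of MySQL. The vendor_price table and its column names are hypothetical (they are not from the question), and the prices are stored as plain integers for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized layout: one row per (model, vendor) pair.
    CREATE TABLE vendor_price (
        model_number TEXT NOT NULL,
        vendor       TEXT NOT NULL,
        price        INTEGER NOT NULL,
        PRIMARY KEY (model_number, vendor)
    );
    INSERT INTO vendor_price VALUES
        ('10R46', 'fk', 155), ('10R46', 'sd', 161),
        ('10R47', 'fk', 117), ('10R47', 'sd', 146);
""")

# Pivot the rows back into one line per model with conditional aggregation.
rows = conn.execute("""
    SELECT model_number,
           MAX(CASE WHEN vendor = 'fk' THEN price END) AS fk_price,
           MAX(CASE WHEN vendor = 'sd' THEN price END) AS sd_price
    FROM vendor_price
    GROUP BY model_number
    ORDER BY model_number
""").fetchall()
print(rows)  # [('10R46', 155, 161), ('10R47', 117, 146)]
```

With this layout a third vendor means inserting more rows, not altering the table; only the reporting query gains another CASE expression.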
What about this? Is this what you needed?
select
model_number
, sum(fk_price) as fk_price
, sum(sd_price) as sd_price
from xml_links xml
INNER JOIN prohub pro
on right(pro.page_url, 5) = xml.model_number
GROUP BY model_number;

mariadb - LEFT OUTER JOIN

I recently migrated a server and found this "error". I previously had MySQL as the DB, and what I wanted (I'm not an expert on SQL) was to join two tables related 1:N. As an example:
Table 1: Badges_Person
Table 2: Badges
Badges is a table with the badges, and Badges_Person contains a relation like (id_badge, id_person), easy, uh?
Well this SQL query always seemed to work fine:
SELECT id, nombre, descripcion, insignias.time, obtained
FROM insignias LEFT OUTER JOIN
(SELECT *, '1' as obtained
FROM insignias_user
WHERE insignias_user.username = 'Octal'
) as insignias_user_seleccionado
ON insignias.id = insignias_user_seleccionado.id_insignia;
The output of this query was the list of badges with an 'obtained' column (0 or 1) that says whether the user 'Octal' has that badge or not.
Now I have MariaDB as the DB, and it returns a different output, where all the rows are marked with 'obtained' = 1.
I came here because, as far as I can tell, I have ruled out all the silly possible errors.
I cannot speak to why the query is not working. That would seem to be a data issue -- all the rows match.
But, there is a better way to write the query:
SELECT i.id, i.nombre, i.descripcion, i.time, ius.obtained
FROM insignias i LEFT OUTER JOIN
insignias_user ius
ON i.id = ius.id_insignia AND ius.username = 'Octal';
This is much more efficient because the intermediate table does not need to be materialized and the database can make use of appropriate indexes on insignias_user.
Also note: I changed the column references to qualified column names. The table alias may not be correct.
SELECT i.id, i.nombre, i.descripcion, i.time, IF(ius.id_insignia IS NULL, 0, 1) AS obtained
FROM insignias i LEFT OUTER JOIN insignias_user ius
ON i.id = ius.id_insignia AND ius.username = 'Octal';
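The key point is that the username filter sits in the ON clause, so badges the user lacks still appear, with NULLs on the right-hand side. A small sketch using Python's sqlite3 module as a stand-in for MariaDB shows the effect; the sample rows are invented, only a subset of the columns is used, and a portable CASE expression replaces MySQL's IF().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE insignias (id INTEGER PRIMARY KEY, nombre TEXT);
    CREATE TABLE insignias_user (id_insignia INTEGER, username TEXT);
    INSERT INTO insignias VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO insignias_user VALUES (1, 'Octal'), (2, 'Other'), (3, 'Octal');
""")

# The username test is part of the join condition, not a WHERE filter,
# so badge 2 is kept and simply gets no matching row for 'Octal'.
rows = conn.execute("""
    SELECT i.id, i.nombre,
           CASE WHEN ius.id_insignia IS NULL THEN 0 ELSE 1 END AS obtained
    FROM insignias i
    LEFT OUTER JOIN insignias_user ius
      ON i.id = ius.id_insignia AND ius.username = 'Octal'
    ORDER BY i.id
""").fetchall()
print(rows)  # [(1, 'first', 1), (2, 'second', 0), (3, 'third', 1)]
```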
Ok, it works again, thank you.

Optimize "JOIN" query

This is the query from my source code:
SELECT `truyen`.*, MAX(chapter.chapter) AS last_chapter
FROM (`truyen`)
LEFT JOIN `chapter` ON `chapter`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE \'%%\'
GROUP BY `truyen`.`Id`
LIMIT 250
When I install it on an iFastnet host, it causes over 500,000 rows to be examined because of the join, and the query is blocked (it would use over 100% of a CPU, which ultimately would cause server instability).
I also tried adding this line before the query. It fixed the problem above but led to another issue: some functions no longer run correctly.
mysql_query("SET SQL_BIG_SELECTS=1");
How can I fix this problem without buying another hosting plan?
Thanks.
You might be looking for an INNER JOIN. That would remove results that do not match. I find INNER JOINs to be faster than LEFT JOINs.
However, I'm not sure what results you are actually looking for. But because you are using the GROUP BY, it looks like the INNER JOIN might work for you.
One thing I would recommend is to copy the query that it generates and run it in MySQL with DESCRIBE (a synonym for EXPLAIN) in front of it.
So if the query ended up being:
SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
You would type:
DESCRIBE SELECT truyen.*, MAX(chapter.chapter) AS last_chapter FROM truyen
LEFT JOIN chapter ON chapter.truyen = truyen.Id
WHERE truyen.title LIKE '%queryString%'
This will tell you whether you could add an index to your table to make the JOIN faster.
I hope this at least points you in the right direction.
Michael Berkowski seems to agree with the indexing, which you will be able to see from the DESCRIBE.
Please look if you have indexes on chapter.chapter and chapter.truyen. If not, set them and try again. If this is not successful try these suggestions:
Do you have the possibility to flag permanently on insert/update your last chapter in a column of your chapter table? Then you could use it to reduce the joined rows and you could drop out the GROUP BY. Maybe in this way:
SELECT `truyen`.*, `chapter`.`chapter` as `last_chapter`
FROM `truyen`, `chapter`
WHERE `chapter`.`truyen` = `truyen`.`Id`
AND `chapter`.`flag_last_chapter` = 1
AND `truyen`.`title` LIKE '%queryString%'
LIMIT 250
Or create a new table for that instead:
INSERT INTO new_table (truyen, last_chapter)
SELECT truyen, MAX(chapter) FROM chapter GROUP BY truyen;
SELECT `truyen`.*, `new_table`.`last_chapter`
FROM (`truyen`)
LEFT JOIN `new_table` ON `new_table`.`truyen` = `truyen`.`Id`
WHERE `truyen`.`title` LIKE '%queryString%'
GROUP BY `truyen`.`Id`
LIMIT 250
Otherwise you could just fetch the 250 rows of truyen, collect your truyen ids in an array and build another SQL Statement to select the 250 rows of the chapter table. I have seen in your original question that you can use PHP for that. So you could merge the results after that:
SELECT * FROM truyen
WHERE title LIKE '%queryString%'
LIMIT 250
SELECT truyen, MAX(chapter) AS last_chapter
FROM chapter
WHERE truyen IN (comma_separated_ids_from_first_select)
GROUP BY truyen
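The fetch-then-merge idea can be sketched in application code. The original poster uses PHP; the version below is a Python illustration with sqlite3 standing in for MySQL. The function name, the index name and the sample data are assumptions made for the demo, and the second query groups by truyen so that it returns one row per story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE truyen (Id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chapter (truyen INTEGER, chapter INTEGER);
    CREATE INDEX idx_chapter_truyen ON chapter (truyen, chapter);
    INSERT INTO truyen VALUES (1, 'alpha story'), (2, 'beta story'), (3, 'gamma');
    INSERT INTO chapter VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

def stories_with_last_chapter(conn, search, limit=250):
    """Hypothetical helper: two small queries merged in application code."""
    # Query 1: only the stories we actually want to show.
    stories = conn.execute(
        "SELECT Id, title FROM truyen WHERE title LIKE ? ORDER BY Id LIMIT ?",
        (f"%{search}%", limit),
    ).fetchall()
    if not stories:
        return []

    # Query 2: last chapter for just those ids, one placeholder per id.
    ids = [story_id for story_id, _ in stories]
    placeholders = ", ".join("?" for _ in ids)
    last = dict(conn.execute(
        "SELECT truyen, MAX(chapter) FROM chapter "
        f"WHERE truyen IN ({placeholders}) GROUP BY truyen",
        ids,
    ).fetchall())

    # Merge: stories without chapters get None, like a LEFT JOIN would give NULL.
    return [(story_id, title, last.get(story_id)) for story_id, title in stories]

result = stories_with_last_chapter(conn, "story")
print(result)  # [(1, 'alpha story', 3), (2, 'beta story', 1)]
```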

MySQL query for finding rows that are in one table but not another

Let's say I have about 25,000 records in two tables and the data in each should be the same. If I need to find any rows that are in table A but NOT in table B, what's the most efficient way to do this?
We've tried it as a subquery of one table and a NOT IN the result but this runs for over 10 minutes and almost crashes our site.
There must be a better way. Maybe a JOIN?
Hope LEFT OUTER JOIN will do the job
select t1.similar_ID
, case when t2.similar_ID is not null then 1 else 0 end as row_exists
from table1 t1
left outer join (select distinct similar_ID from table2) t2
on t1.similar_ID = t2.similar_ID -- your WHERE goes here
I would suggest you read the following blog post, which goes into great detail on this question:
Which method is best to select values present in one table but missing
in another one?
And after a thorough analysis, arrives at the following conclusion:
However, these three methods [NOT IN, NOT EXISTS, LEFT JOIN]
generate three different plans which are executed by three different
pieces of code. The code that executes EXISTS predicate is about 30%
less efficient than those that execute index_subquery and LEFT JOIN
optimized to use Not exists method.
That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT
EXISTS.
If the performance you're seeing with NOT IN is not satisfactory, you won't improve this performance by switching to a LEFT JOIN / IS NULL or NOT EXISTS, and instead you'll need to take a different route to optimizing this query, such as adding indexes.
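Use EXISTS and NOT EXISTS instead, with the subquery correlated to the outer row (the id column here is a placeholder for whatever key the tables share):
Select * from A where not exists (select * from B where B.id = A.id);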
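A NOT EXISTS subquery only finds missing rows when it refers back to the outer row. The sketch below compares the uncorrelated and correlated forms, using Python's sqlite3 module as a stand-in for MySQL; the id column and the sample rows are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2);
""")

# Uncorrelated: "does B contain any row at all?" is asked once per outer row,
# so the result is empty as soon as B is non-empty.
uncorrelated = conn.execute(
    "SELECT id FROM A WHERE NOT EXISTS (SELECT * FROM B)"
).fetchall()

# Correlated: the subquery looks for a partner of the current outer row.
correlated = conn.execute(
    "SELECT id FROM A "
    "WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.id = A.id) "
    "ORDER BY id"
).fetchall()

print(uncorrelated)  # []
print(correlated)    # [(1,), (3,)]
```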
Left join. From the mysql documentation
If there is no matching row for the right table in the ON or USING
part in a LEFT JOIN, a row with all columns set to NULL is used for
the right table. You can use this fact to find rows in a table that
have no counterpart in another table:
SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id =
right_tbl.id WHERE right_tbl.id IS NULL;
This example finds all rows in left_tbl with an id value that is not
present in right_tbl (that is, all rows in left_tbl with no
corresponding row in right_tbl).

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit, I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you're relatively new to SQL. Use aliases for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes life easier for you and for anyone who comes after you, because it shows which columns come from which table, especially when the same column name exists in different tables, and it prevents ambiguity in the query. Your left join, judging from the sample, may be ambiguous: can you confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could carry the "Team_ID" the posting came from, in addition to its own "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
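Also, prefer the SQL keyword OR over "||". MySQL accepts "||" as a logical OR by default, but in standard SQL (and in MySQL with the PIPES_AS_CONCAT mode enabled) it means string concatenation.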
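One way to build out the IN list without pasting values into the SQL string is to generate one placeholder per team id and pass the values as parameters. The sketch below is in Python with sqlite3 standing in for MySQL (the question itself uses PHP); the function name and sample data are hypothetical, and only a few of the blabbing columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blabbing (id INTEGER PRIMARY KEY, mem_id INTEGER, fromid INTEGER);
    CREATE TABLE team_blabbing (id INTEGER, team_id INTEGER);
    INSERT INTO blabbing VALUES (1, 10, 99), (2, 11, 99), (3, 12, 50), (4, 13, 51);
    INSERT INTO team_blabbing VALUES (2, 4), (3, 7);
""")

def blabs_for(conn, team_ids, mem_id, from_id):
    """Hypothetical helper: expand the team list into '?, ?, ...' placeholders."""
    conditions = ["B.mem_id = ?", "B.fromid = ?"]
    params = [mem_id, from_id]
    # An empty list would yield "IN ()", which MySQL rejects, so skip it.
    if team_ids:
        placeholders = ", ".join("?" for _ in team_ids)
        conditions.append(f"TB.team_id IN ({placeholders})")
        params.extend(team_ids)
    where_clause = " OR ".join(conditions)
    sql = (
        "SELECT B.id FROM blabbing B "
        "LEFT JOIN team_blabbing TB ON B.id = TB.id "
        f"WHERE {where_clause} ORDER BY B.id"
    )
    return [row[0] for row in conn.execute(sql, params)]

matches = blabs_for(conn, [4, 18], mem_id=10, from_id=50)
print(matches)  # [1, 2, 3]
```

Only the placeholders are interpolated into the SQL text; the ids themselves travel as bound parameters, which also avoids injection through the team list.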
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TeamJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is this:
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.