I've got two tables that have one to many associations on a pmid. So if one table has an pmid, the second table should have multiple rows with the same pmid. However, something went sideways and I'm missing my latest batch of pmids in the second table. These queries, should help illustrate the problem, but I can't figure out how to get the ids from the first table that are actually missing in the second table.
select count(*) from abstract_mesh am; #2167101
select count(*) from abstract_mesh am
join abstracts a on am.pmid = a.pmid; #2133848
select 2167101 - 2133848; #33253
select count(*) from abstract_mesh where pmid is NULL; #33253
So as you can see there are 33,253 rows in abstract_mesh that have no pmids. I simply want to identify which pmids I should be interested in from the abstracts table.
You can use NOT EXITS to filter out the records, e.g.
select *
from table1 t1
where not exists
select * from table2 t2 where t1.pmid = t2.pmid;
You need and anti-join. SQL lacks an explicit anti-join operator. Standard SQL has EXCEPT (relational minus) by mySQL lacks this. Here I'm using NOT IN <table expression> to simulate anti-join (though not 100% sure I have the tables the right way round):
SELECT DISTINCT pmid
FROM abstract_mesh
WHERE pmid NOT IN ( SELECT pmid FROM abstracts );
Related
I need to check two tables and find inconsistencies, ie where the value of table T1 is not present in the italy_cities table. I'll explain:
T1: Includes personal data (with place of birth)
italy_city: Includes all the municipalities of Italy.
Table T1 has about 9000 tuples.
T2 has 7,903 tuples.
Using "NOT IN" the query takes approximately 16 seconds to execute.
Here is the query:
SELECT
`T1`.*
FROM
T1
WHERE
(
`T1`.place NOT IN ( SELECT municipality FROM italy_cities )
)
MY QUESTION
what is the best and fast option to check for inconsistencies? to check all the "incorrect" municipalities that do not exist in the official database?
Thanks in advance
I generally recommend NOT EXISTS for this purpose:
SELECT T1.*
FROM T1
WHERE NOT EXISTS (SELECT 1
FROM italy_cities ic
WHERE t1.place = ic.municipality
);
Why? There are two reasons:
NOT IN does not do what you expect if the subquery returns any NULL values. If even one value is NULL all rows end up being filtered out.
This version of the query can take advantage of an index on italy_cities(municipality) which seems like a reasonable index on the table.
Not exists can perform better but there is also another way which is left join as follows:
SELECT T1.*
FROM T1
LEFT JOIN italy_cities I ON I.municipality = T1.PLACE
WHERE I.municipality IS NULL;
I have a table, System, with a bunch of fields including System.serial.
I have a list of serial numbers that I want to get the status of.
Simple enough:
Select * from System where System.serial in ('s1','s2', 'sn');
However the list of serial numbers also has serials NOT IN the System table.
Obviously they are not in the results.
I want the missing serials to show in the results also but with no data.
The best way I can think of doing this is to make a temporary table with one column, serial, and then left join System on it.
How can I do this without creating a temporary table?
Something like:
Select listOfSerials.serial, System.*
from (Select ('s1','s2', 'sn') as serial ) as ListOfSerials
left join System on System.serial = ListOfSerials.serial;
Thanks,
Ryan
You're on the right track with your solution of creating a virtual table with which to do LEFT JOIN against your real data.
You can create a derived table as a series of UNIONed SELECT statements that select literal values with no table reference.
SELECT listOfSerials.serial, System.*
FROM (
SELECT 's1' AS serial
UNION SELECT 's2'
UNION SELECT 'sn'
) AS ListOfSerials
LEFT JOIN System ON System.serial = ListOfSerials.serial;
You only need to define a column alias in the first SELECT in the UNION. The rest are required to use that column alias.
Creating a reference table to store the list of serials is probably your best option. That would allow you to write a query like:
SELECT r.serial reference_serial, s.serial system_serial
FROM reference_table r
LEFT JOIN system_table s ON s.serial = r.serial
With the LEFT JOIN, serials declared in the reference table but unavailable in the system table will have second column set to NULL.
A quick and dirty work around is to use UNIONed subqueries to emulate the reference table:
SELECT r.serial reference_serial, s.serial system_serial
FROM (
SELECT 'serial1' AS serial
UNION ALL SELECT 'serial2'
UNION ALL SELECT 'serial2'
...
) r
LEFT JOIN system_table s ON s.serial = r.serial
I can't understand whats happening...
I use two sql queries which do not return the same thing...
this one :
SELECT * FROM table1 t1 JOIN table1 t2 on t1.attribute1 = t2.attribute1
I get 10 rows
this other :
SELECT * FROM table1 NATURAL JOIN table1
I get 8 rows
With the NATURAL JOIN 2 rows aren't returned... I look for the missing lines and they are the same values for the attribute1 column ...
It's impossible for me.
If anyone has an answer I could sleep better ^^
Best regards
Max
As was pointed out in the comments, the reason you are getting a different row count is that the natural join is connecting your self join using all columns. All columns are being compared because the same table appears on both sides of the join. To test this hypothesis, just check the column values from both tables, which should all match.
The moral of the story here is to avoid natural joins. Besides not being clear as to the join condition, the logic of the join could easily change should table structure change, e.g. if a new column gets added.
Follow the link below for a small demo which tried to reproduce your current results. In a table of 8 records, the natural join returns 8 records, whereas the inner join on one attribute returns 10 records due to some duplicate matching.
Demo
You need to 'project away' the attribute you don't want used in the join e.g. in a derived table (dt):
SELECT *
FROM table1
NATURAL JOIN ( SELECT attribute1 FROM table1 ) dt;
I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.
I'm trying to find the rows that are in one table but not another, both tables are in different databases and also have different column names on the column that I'm using to match.
I've got a query, code below, and I think it probably works but it's way too slow:
SELECT `pm`.`id`
FROM `R2R`.`partmaster` `pm`
WHERE NOT EXISTS (
SELECT *
FROM `wpsapi4`.`product_details` `pd`
WHERE `pm`.`id` = `pd`.`part_num`
)
So the query is trying to do as follows:
Select all the ids from the R2R.partmaster database that are not in the wpsapi4.product_details database. The columns I'm matching are partmaster.id & product_details.part_num
Expanding on Sjoerd's anti-join, you can also use the easy to understand SELECT WHERE X NOT IN (SELECT) pattern.
SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN (SELECT pd.part_num FROM wpsapi4.product_details pd)
Note that you only need to use ` backticks on reserved words, names with spaces and such, not with normal column names.
On MySQL 5+ this kind of query runs pretty fast.
On MySQL 3/4 it's slow.
Make sure you have indexes on the fields in question
You need to have an index on pm.id, pd.part_num.
You can LEFT JOIN the two tables. If there is no corresponding row in the second table, the values will be NULL.
SELECT id FROM partmaster LEFT JOIN product_details ON (...) WHERE product_details.part_num IS NULL
To expand on Johan's answer, if the part_num column in the sub-select can contain null values then the query will break.
To correct this, add a null check...
SELECT pm.id FROM r2r.partmaster pm
WHERE pm.id NOT IN
(SELECT pd.part_num FROM wpsapi4.product_details pd
where pd.part_num is not null)
Sorry but I couldn't add a comment as I don't have the rep!
So there's loads of posts on the web that show how to do this, I've found 3 ways, same as pointed out by Johan & Sjoerd. I couldn't get any of these queries to work, well obviously they work fine it's my database that's not working correctly and those queries all ran slow.
So I worked out another way that someone else may find useful:
The basic jist of it is to create a temporary table and fill it with all the information, then remove all the rows that ARE in the other table.
So I did these 3 queries, and it ran quickly (in a couple moments).
CREATE TEMPORARY TABLE
`database1`.`newRows`
SELECT
`t1`.`id` AS `columnID`
FROM
`database2`.`table` AS `t1`
.
CREATE INDEX `columnID` ON `database1`.`newRows`(`columnID`)
.
DELETE FROM `database1`.`newRows`
WHERE
EXISTS(
SELECT `columnID` FROM `database1`.`product_details` WHERE `columnID`=`database1`.`newRows`.`columnID`
)