Situation Overview
The current question is a problem about selecting values from two tables table A (material) and table B (MaterialRevision). However, The PK of table A might or Might not exist in Table B. When it doesnt exists, the query described in this question wont return the values of table A, but IT SHOULD. So basically here's whats happening :
The query is only returning values when A.id exists in B.id, when In fact, I need it to return values from A when A.id ALSO dont exist in B.id.
Problem:
Suppose two tables. Table Material and Table Material Revision.
Notice that the PK idMaterial is a FK in MaterialRevision.
Current "Mock" Tables
Query Objective
Obs: remember these two tables are a simplification of the real
tables.
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no RevisionDate, print BLANK ("") for the "last revision date".
What is wrongly happening
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no Revision for the Material, doesnt print the Material (SKIP).
Current Code
SELECT
Material.idMaterial,
Material.nextRevisionDate,
Material.obsolete,
lastRevisionDate
FROM Material,
(SELECT MaterialRevision.idMaterial, max(MaterialRevision.revisionDate) as "revisionDate" from MaterialRevision
GROUP BY MaterialRevision.idMaterial
) AS Revision
WHERE (Material.idMaterial = Revision.idMaterial AND Material.obsolete = 0)
References and Links used to reach the state described in this question
Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
MySQL get last date records from multiple
MySQL - How to SELECT based on value of another SELECT
MySQL Query Select where id does not exist in the JOIN table
PS I hope this question is correctly understood since it took me a lot of time to build it. I researched a lot in stackoverflow and after
several failed attempts I had no option but to ask for help.
You should use JOIN :
SELECT m.idMaterial, m.nextRevisionDate, mr.revisionDate AS "lastRevisionDate"
FROM Material m
LEFT JOIN MaterialRevision AS mr ON mr.idMaterial = m.idMaterial AND mr.revisionDate = (
SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE m.obsolete = 0
Here is an explanation of what INNER JOIN, LEFT JOIN and RIGHT JOIN are. (You will love them if you often cross tables in your queries)
As m.obsolete will always be true, I ommited it in the SELECT clause
You should use the left outer join instead of using the cross product.
You're query should be something like this:
SELECT idMaterial, nextRevisionableDate, obsolete,
revisionDate AS lastRevisionDate
FROM Material
LEFT OUTER JOIN MaterialRevision AS mr On
Material.idMaterial = MaterialRevision.id
AND mr.revisionDate = (SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE obsolete = 0;
Here you can find some documentation about types of join.
Related
recently have migrated a server, and I have found this "error", I had mysql as a DB, and what I wanted (I'm not an expert on SQL), was to join 2 related tables by 1:N, as an example,
Table 1: Badges_Person
Table 2: Badges
Badges is a table with the badges, and Badges_Person contains a relation like (id_badge, id_person), easy, uh?
Well this SQL query always seemed to work fine:
SELECT id, nombre, descripcion, insignias.time, obtained
FROM insignias LEFT OUTER JOIN
(SELECT *, '1' as obtained
FROM insignias_user
WHERE insignias_user.username = 'Octal'
) as insignias_user_seleccionado
ON insignias.id = insignias_user_seleccionado.id_insignia;
The output of this query was the list of badges with a 'obtained' column (0 or 1) which says if the user 'Octal' has that badge or not.
So..., now, I have mariadb as DB, and it returns a different output, where all the rows are being marked with 'obtained' = 1.
I came here because as far as I have tried I have discarded all the silly posible errors.
I cannot speak to why the query is not working. That would seem to be a data issue -- all the rows match.
But, there is a better way to write the query:
SELECT i.id, i.nombre, i.descripcion, i.time, ius.obtained
FROM insignias i LEFT OUTER JOIN
insignias_user iu
ON i.id = ius.id_insignia AND ius.username = 'Octal';
This is much more efficient because the intermediate table does not need to be materialized and the database can make use of appropriate indexes on insgnias_user.
Also note: I changed the column references to qualified column names. The table alias may not be correct.
SELECT i.id, i.nombre, i.descripcion, i.time, IF(ius.id_insignia IS NULL, 0, 1)
FROM insignias i LEFT OUTER JOIN insignias_user ius
ON i.id = ius.id_insignia AND ius.username = 'Octal';
Ok, it works again, thank you.
I wonder if any can help me understand something I'm trying to solve.
I'm working on a wordpress site but this is more a sql question as I'm just querying to get some results within a template file.
I have a gallery of pictures which are advert boxes, and I need to pull these in relation to a supplied movie name, to do this Im using some custom fields on the ad pic called 'adlink' (link off ad) and ad
I'm using the nextgen gallery plugin and querying those tables, and I have three tables in total that contain the data I need to query.
ngg_pictures, nggcf_field_values & nggcf_fields.
the nggcf tables are custom fields tables,
I have got so far I can get what I need in two seperate queries, but I can't combine these into one query as it means querying the nggcf_field_values table twice, which I can't seem to sort.
I have hardcoded the search criteria in for the mo, but the 'close-encounters' bit would be a passed var, and the '156' would be the pid from the first query.
SELECT `eg_ngg_pictures`.`filename`, `eg_nggcf_field_values`.`fid`, `eg_nggcf_field_values`.`pid`
FROM eg_ngg_pictures, eg_nggcf_field_values
WHERE ((`eg_nggcf_field_values`.`field_value` LIKE 'close-encounters') AND (`eg_nggcf_field_values`.`pid` = eg_ngg_pictures.pid))
SELECT `eg_nggcf_field_values`.`field_value`
FROM eg_nggcf_field_values, eg_nggcf_fields
WHERE ((`eg_nggcf_fields`.`field_name` = 'adlink') AND (`eg_nggcf_fields`.`id` = eg_nggcf_field_values.fid) AND (`eg_nggcf_field_values`.`pid` = '156'))
any help would be greatly appreciated, I can get the results with what I have, but I like to understand how to combine these two and write better SQl. Thanks MRO
After looking at the Wordpress extension, I think the eg_nggcf_fields is the table that contains the name for a custom field. The eg_nggcf_field_values table contains the values of that custom field per picture.
So if you're looking for two fields called moviename and adlink, you have to look up two rows in the field_values table. You can join a table twice if you give it a different alias:
select pic.filename
, pic.pid
, fv1.field_value as MovieName
, fv2.field_value as Adlink
from eg_ngg_pictures pic
inner join -- Find ID for the field called 'moviename'
eg_nggcf_fields f1
on f1.field_name = 'moviename'
inner join -- Find value for field moviename for this picture
eg_nggcf_field_values as fv1
on fv1.pid = pic.pid
and fv1.fid = f1.fid
inner join -- Find ID for the field called 'adlink'
eg_nggcf_fields f2
on f2.field_name = 'adlink'
inner join -- Find value for field adlink for this picture
eg_nggcf_field_values as fv2
on fv2.pid = pic.pid
and fv2.fid = f2.fid
where fv1.field_value like 'close-encounters'
First of all, I'd recommend sticking to modern ANSI syntax for JOINing tables, which means using the JOIN clause.
Instead of using:
FROM table1, table2 WHERE table1.id = table2.pid
use:
FROM Table 1 JOIN table2 ON table1.id = table2.id
For simplicity's sake, I'd also recommend you to alias tables, as that tends to make the code more readable. Instead of having to write out egg_ngg_pictures every time, you can simply refer to the alias you assign it instead.
Lastly, when you use a LIKE operator, you usually add a wild-card character (typically %. I.e. LIKE '%123' or LIKE '123%'). You seem to look only for complete matches, which means you can just stick to using =, as that should give you slightly better performance.
Now to rewrite your query, I'd use something like the following:
SELECT
pic.filename
, fieldval.fid
, fieldval.pid
, fieldval.field_value
FROM
eg_ngg_pictures pic
JOIN eg_nggcf_field_values fieldval ON fieldval.pid = pic.pid
JOIN eg_nggcf_fields fields ON fields.id = fieldval.fid
WHERE
((fieldval.field_value = 'close-encounters')
AND fields.field_name = 'ad_link'
Note that I am not able to test the query, as I do not have your schema. But by incorporating the two queries into a single query, the join on the field_Values.PID retreieved with the 'close_encounters' value should already exist.
If the query does not work, feel free to create a SQL fiddle with the relevant tables and some data, and I'll try and get it to work with that.
I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.
I just imported a large amount of data into two tables. Let's call them shipments and returns.
When trying to do a simple join (left or inner) based on any criteria in these two tables. query looks like it tries to do a cross join or find every combination instead of what the query should be pulling.
each table has an PK id field, but there is not FK relationship between the two other than some shared field.
I'm currently just trying to related them on shipment_id.
I feel this is a simple answer. Am I missing a reference or something obvious that is causing this? Thanks!
here's an example. This should returned under 100 rows. This instead returns hundreds of thousands.
SELECT r.*
FROM returns as r
left outer join shipments as s
on r.shipment_id = s.shipment_id
where r.date = '2011-06-20'
Here is a query that should work:
SELECT T0.*, T1.*
FROM shipments AS T0 LEFT JOIN returns AS T1 ON T0.shipment_id = T1.shipment_id
ORDER BY T0.shipment_id;
This query join assumes 1:1 on the shipment_id
It would be nice if you included the query you were using
You need to specify what you are joining on, otherwise it will do a cartesian join:
SELECT r.*
FROM returns as r
LEFT JOIN shipments as s ON s.shipment_id = r.shipment_id
where r.date = '2011-06-20'
Josh,
I would be interested in seeing what would happen if you forced a join to a specific record or set of records instead of the whole table. Assuming there is a shipment with an id of 5 in your table, you could try:
SELECT r.* FROM returns as r
left join shipments as s
ON 5 = r.shipment_id
WHERE r.date = '2011-06-20'
While just a fancy where clause, it would at least prove that the join you are attempting will eventually work correctly. The issue is that your on clause is always returning true, no matter what the value is. This could be because it's not interpreting the shipment_id as an integer, but instead as a true/false variable where any value evaluates to true.
Original Rejected Solution:
No Foreign Key relationship should be needed in order to make the joins happen. The PK id fields I'm assuming are an integer (or number, or whatever your rdms equivalent is)?
Can you past a snippet of your sql query?
Updating based on posted query:
I would add your explicit join criteria in order to rule out any funny business (my guess is since no criteria is specified, it's using 1=1, which always joins). So I would change your query to look like:
SELECT r.*
FROM returns as r
left join shipments as s ON
s.ShipId = R.ReturnId
where r.date = '2011-06-20'
The issue turned out to be very simple, just not readily apparent until going through all the columns. It turns out that the shipment ID was duplicated through every row as it hit the upper limit for the int datatype. This is why joins were returning every record.
After switching the datatype to bigint and reimporting, everything worked great. Thanks all for looking into it.
I get out of memory exception in my application, when the condition for IN or NOT IN is very large. I would like to know what is the limitation for that.
Perhaps you would be better off with another way to accomplish your query?
I suggest you load your match values into a single-column table, and then inner-join the column being queried to the single column in the new table.
Rather than
SELECT a, b, c FROM t1 WHERE d in (d1, d2, d3, d4, ...)
build a temp table with 1 column, call it "dval"
dval
----
d1
d2
d3
SELECT a, b, c FROM t1
INNER JOIN temptbl ON t1.d = temptbl.dval
Having to ask about limits when either doing a SQL query or database design is a good indicator that you're doing it wrong.
I only ever use IN and NOT IN when the condition is very small (under 100 rows or so). It performs well in those scenarios. I use an OUTER JOIN when the condition is large as the query doesn't have to look up the "IN" condition for every tuple. You just have to check the table that you want all rows to come from.
For "IN" the join condition IS NOT NULL
For "NOT IN" the join condition IS NULL
e.g.
/* Get purchase orders that have never been rejected */
SELECT po.*
FROM PurchaseOrder po LEFT OUTER JOIN
(/* Get po's that have been rejected */
SELECT po.PurchaesOrderID
FROM PurchaseOrder po INNER JOIN
PurchaseOrderStatus pos ON po.PurchaseOrderID = pos.PurchaseOrderID
WHERE pos.Status = 'REJECTED'
) por ON po.PurchaseOrderID = por.PurchaseOrderID
WHERE por.PurchaseOrderID IS NULL /* We want NOT IN */
I"m having a similar issue but only passing 100 3 digit ids in my IN clause. When I look at the stack trace, it actually cuts off the comma separate values in the IN clause. I don't get an error, I just don't get all the results to return. Has anyone had an issue like this before? If its relevant, I'm using the symfony framework... I'm checking to see if its a propel issue but just wanted to see if it could be sql
I have used IN with quite large lists of IDs - I suspect that the memory problem is not in the query itself. How are you retrieving the results?
This query, for example is from a live site:
SELECT DISTINCT c.id, c.name FROM categories c
LEFT JOIN product_categories pc ON c.id = pc.category_id
LEFT JOIN products p ON p.id = pc.product_id
WHERE p.location_id IN (
955,891,901,877,736,918,900,836,846,914,771,773,833,
893,782,742,860,849,850,812,945,775,784,746,1036,863,
750,763,871,817,749,838,986,794,867,758,923,804,733,
949,808,837,741,747,954,939,865,857,787,820,783,760,
911,745,928,818,887,847,978,852
) ORDER BY c.name ASC
My first pass at the code is terribly naive and there are about 10 of these queries on a single page and the database doesn't blink.
You could, of course, be running a list of 100k values which would be a different story altogether.
I don't know what the limit is, but I've run into this problem before as well. I had to rewrite my query something like this:
select * from foo
where id in (select distinct foo_id from bar where ...)