MySQL query and compare two different tables - mysql

I'm very new to SQL queries, so forgive me if this is a really easy question.
I have 2 database tables HWC and collection, HWC.id is referenced in collection.col
HWC
- id (PRIMARY)
- stuff
- more stuff
- lots more stuff
- Year
collection
- id (PRIMARY)
- userId
- col
Question:
I want to query the collection table for a specific user to see what entries from HWC they are missing.
I don't even know where to start logically, I don't expect anyone to build the query for me, but pointing me in the correct direction would be very much appreciated.

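You want the items from HWC that the user is missing from their collection. This suggests a left outer join. In particular, you want to keep everything in the HWC table and find the rows that have no match in collection for that user. Note that the user condition belongs in the on clause; putting it in the where clause would discard the unmatched rows:
select hwc.*
from hwc left join
collection c
on hwc.id = c.col and c.userId = #UserId
where c.col is null;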
When learning SQL, students often learn this syntax:
select hwc.*
from hwc
where hwc.id not in (select c.col from collection c where c.userId = #UserId);
This is perfectly good SQL. However, some databases don't do a great job optimizing not in, and the query returns no rows at all if the subquery produces a NULL (that is, if c.col is NULL for any of the user's rows). For these reasons, this is often rewritten as a not exists query:
select hwc.*
from hwc
where not exists (select 1
from collection c
where c.col = hwc.id and c.userId = #UserId
);
I offer you these different alternatives because you are learning SQL. It is worth learning how all three work. In the future, you should find each of these mechanisms (left join, not in, and not exists) useful.
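To compare the three mechanisms side by side, here is a minimal sketch that runs them against a small in-memory database. It assumes Python's sqlite3 module as a stand-in for MySQL, and the sample rows and the user id 7 are made up for illustration. It also shows the NULL behaviour of not in mentioned above.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HWC (id INTEGER PRIMARY KEY, stuff TEXT);
    CREATE TABLE collection (id INTEGER PRIMARY KEY, userId INTEGER, col INTEGER);
    INSERT INTO HWC (id, stuff) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    -- user 7 owns items 1 and 3; user 8 owns item 2
    INSERT INTO collection (userId, col) VALUES (7, 1), (7, 3), (8, 2);
""")

QUERIES = {
    "left join": """
        SELECT h.id FROM HWC h
        LEFT JOIN collection c ON h.id = c.col AND c.userId = ?
        WHERE c.col IS NULL
        ORDER BY h.id""",
    "not in": """
        SELECT h.id FROM HWC h
        WHERE h.id NOT IN (SELECT c.col FROM collection c WHERE c.userId = ?)
        ORDER BY h.id""",
    "not exists": """
        SELECT h.id FROM HWC h
        WHERE NOT EXISTS (SELECT 1 FROM collection c
                          WHERE c.col = h.id AND c.userId = ?)
        ORDER BY h.id""",
}

def missing_ids(name, user_id):
    """Run one of the three queries and return the missing HWC ids."""
    return [row[0] for row in conn.execute(QUERIES[name], (user_id,))]

# All three forms agree while c.col contains no NULLs.
results = {name: missing_ids(name, 7) for name in QUERIES}
print(results)

# Add a row with a NULL col for user 7 and run the queries again.
conn.execute("INSERT INTO collection (userId, col) VALUES (7, NULL)")
after_null = {name: missing_ids(name, 7) for name in QUERIES}
print(after_null)
```

With this data, all three queries first report items 2 and 4 as missing. After the NULL row is added, the not in version returns nothing, while the left join and not exists versions are unaffected.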

It sounds like you mean SQL JOINS.
SQL Joins Tutorial:
Let's say you want to query your collection like so:
SELECT collection.userId, HWC.stuff
FROM collection
INNER JOIN HWC ON collection.col = HWC.id
This will pick userId from collection and stuff from HWC for every row where collection.col matches an HWC.id.
Hope I helped, good luck!

Related

mysql - How to perform joining of two junctional tables in case where one of them has foreign key of another?

I am new to database design and still learning, so apologies if I use the wrong terms; I will try to explain my problem in plain language. I have learned how to join two tables through a junction table that sits between them, but I ran into a problem when I wanted to join one "regular" table and one junction table through another junction table.
I have a relational database which has tables and relations between them like this:
I know how to join hgs_transliterations, hgs_gardiners, hgs_meanings, hgs_word_types using hgs_translations, but what I don't know how to do is how to join those 4 tables and the hgs_references table.
This is my code for joining lower 4 tables:
SELECT hgs_transliterations.transliteration, hgs_gardiners.gardiners_code, hgs_meanings.meaning, hgs_word_types.word_type
FROM hgs_translations
JOIN hgs_transliterations ON hgs_translations.transliteration_id = hgs_transliterations.id
JOIN hgs_gardiners ON hgs_translations.gardiners_id = hgs_gardiners.id
JOIN hgs_meanings ON hgs_translations.meaning_id = hgs_meanings.id
JOIN hgs_word_types ON hgs_translations.word_type_id = hgs_word_types.id
I read some tutorials on this subject which mention AS, INNER JOIN, OUTER JOIN, but I didn't quite understand terminology and how I can use this to create what I need. Sorry for maybe basic questions, but as I say, I am just a beginner and I am trying to understand something deeply so I can use it appropriately. Thank you in advance.
P.S. If someone thinks that this is not good database design (design of relations between tables), I would like to hear that.
Just add two more joins:
SELECT hgs_transliterations.transliteration, hgs_gardiners.gardiners_code,
hgs_meanings.meaning, hgs_word_types.word_type,
hgs_references.reference
FROM hgs_translations
JOIN hgs_transliterations ON hgs_translations.transliteration_id = hgs_transliterations.id
JOIN hgs_gardiners ON hgs_translations.gardiners_id = hgs_gardiners.id
JOIN hgs_meanings ON hgs_translations.meaning_id = hgs_meanings.id
JOIN hgs_word_types ON hgs_translations.word_type_id = hgs_word_types.id
JOIN junc_translation_reference ON junc_translation_reference.translation_id = hgs_translations.id
JOIN hgs_references ON hgs_references.id = junc_translation_reference.reference_id

Finding which of an array of IDs has no record with a single query

I'm generating prepared statements with PHP PDO to pull in information from two tables based on an array of IDs.
Then I realized that if an ID passed had no record I wouldn't know.
I'm locating records with
SELECT
r.`DEANumber`,
TRIM(r.`ActivityCode`) AS ActivityCode,
TRIM(r.`ActivitySubCode`) as ActivitySubCode,
-- other fields...
a.Activity
FROM
`registrants` r,
`activities` a
WHERE r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
AND a.Code = ActivityCode
AND a.Subcode = ActivitySubCode
But I am having trouble figuring out the negative join that says which of the IDs has no record.
If two tables were involved I think I could do it like this
SELECT
r.DEAnumber
FROM registrant r
LEFT JOIN registrant2 r2 ON r.DEAnumber = r2.DEAnumber
WHERE r2.DEAnumber IS NULL
But I'm stumped as to how to use the array of IDs here. Obviously I could iterate over the array and track which queries had not result but it seems like such a manual and wasteful way to go...
Obviously I could iterate over the array and track which queries had not result but it seems like such a manual and wasteful way to go.
What could be a real waste is spending time solving this non-existent "problem".
Yes, you could iterate. Either manually, or using a syntax sugar like array_diff() in PHP.
I suggest that instead of making your query more complex (means heavier to support) for little gain, you just move on.
As old man Knuth once said 'premature optimization is the root of all evil'.
The only help I can think of from PDO is a fetch mode that uses the IDs as keys of the returned array, so you can do it without an [explicitly written] loop, like
$stmt->execute($ids);
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE);
$notFound = array_diff($ids, array_keys($data));
Yet a manual loop would have taken only two extra lines, which is, honestly, not that big a deal.
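The same idea (fetch what exists, then diff against the requested IDs in application code) can be sketched outside PHP as well. The following is a minimal illustration in Python with sqlite3, not the PDO API; the table contents and the helper name find_missing are hypothetical.

```python
import sqlite3

# Hypothetical data: only two of the requested ids exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrants (DEAnumber TEXT PRIMARY KEY, ActivityCode TEXT);
    INSERT INTO registrants VALUES ('AB1', 'A'), ('CD2', 'B');
""")

def find_missing(conn, ids):
    """Return the ids that have no row in registrants, in input order."""
    if not ids:
        return []
    # One placeholder per id, as in the prepared statement in the question.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT DEAnumber FROM registrants WHERE DEAnumber IN ({placeholders})"
    found = {row[0] for row in conn.execute(sql, ids)}
    return [i for i in ids if i not in found]

missing = find_missing(conn, ["AB1", "ZZ9", "CD2", "XX0"])
print(missing)  # ['ZZ9', 'XX0']
```

This keeps the SQL unchanged and costs one set lookup per requested id.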
You are on the right track - a left join that filters out matches will give you the missing joins. You just need to move all conditions on the left-joined table up into the join.
If you leave the conditions on the joined table in the where clause you effectively cause an inner join, because the where clause is executed on the rows after the join is made, which is too late if there was no join in the first place.
Change the query to use proper join syntax, specifying a left join, with the conditions on activities moved to the join's ON clause:
SELECT
r.DEANumber,
TRIM(r.ActivityCode) AS ActivityCode,
TRIM(r.ActivitySubCode) as ActivitySubCode,
-- other fields...
a.Activity
FROM registrants r
LEFT JOIN activities a ON a.Code = r.ActivityCode
AND a.Subcode = r.ActivitySubCode
WHERE r.DEAnumber IN (?,?,?,?,?,?,?,?)
In your app code, if Activity is null then you know there was no activity for that id.
This won't affect performance much, other than to return (potentially) more rows.
To just select all registrants without activities:
select r.DEAnumber
from registrants r
left join activities a on a.Code = r.ActivityCode
and a.Subcode = r.ActivitySubCode
where r.`DEAnumber` IN ( ?,?,?,?,?,?,?,? )
and a.Code is null
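The point about conditions in the ON clause versus the WHERE clause can be checked with a small experiment. This is a sketch under assumptions: it uses Python's sqlite3 in place of MySQL, and the two registrants and single activity row are invented.

```python
import sqlite3

# Hypothetical data: CD2 has no matching activity row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrants (DEAnumber TEXT, ActivityCode TEXT, ActivitySubCode TEXT);
    CREATE TABLE activities (Code TEXT, Subcode TEXT, Activity TEXT);
    INSERT INTO registrants VALUES ('AB1', 'A', '1'), ('CD2', 'B', '9');
    INSERT INTO activities VALUES ('A', '1', 'Pharmacy');
""")
ids = ("AB1", "CD2")

# Both join conditions in ON: unmatched registrants survive with NULLs.
cond_in_on = conn.execute("""
    SELECT r.DEAnumber, a.Activity
    FROM registrants r
    LEFT JOIN activities a ON a.Code = r.ActivityCode
                          AND a.Subcode = r.ActivitySubCode
    WHERE r.DEAnumber IN (?, ?)
    ORDER BY r.DEAnumber""", ids).fetchall()

# One condition left in WHERE: it is evaluated after the join, where
# a.Subcode is NULL for unmatched rows, so those rows are filtered out.
cond_in_where = conn.execute("""
    SELECT r.DEAnumber, a.Activity
    FROM registrants r
    LEFT JOIN activities a ON a.Code = r.ActivityCode
    WHERE r.DEAnumber IN (?, ?)
      AND a.Subcode = r.ActivitySubCode
    ORDER BY r.DEAnumber""", ids).fetchall()

print(cond_in_on)     # [('AB1', 'Pharmacy'), ('CD2', None)]
print(cond_in_where)  # [('AB1', 'Pharmacy')]
```

The second query behaves like an inner join even though it says LEFT JOIN, which is the effect described above.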

SQL query to select based on many-to-many relationship

This is really a two-part question, but in order not to mix things up, I'll divide into two actual questions. This one is about creating the correct SQL statement for selecting a row based on values in a many-to-many related table:
Now, the question is: what is the absolute simplest way of getting all resources where e.g metadata.category = subject AND where that category's corresponding metadata.value ='introduction'?
I'm sure this could be done in a lot of different ways, but I'm a novice in SQL, so please provide the simplest way possible... (If you could describe briefly what the statement means in plain English that would be great too. I have looked at introductions to SQL, but none of those I have found (for beginners) go into these many-to-many selections.)
The easiest way is to use the EXISTS clause. I'm more familiar with MSSQL but this should be close
SELECT *
FROM resources r
WHERE EXISTS (
SELECT *
FROM metadata_resources mr
INNER JOIN metadata m ON (mr.metadata_id = m.id)
WHERE mr.resource_id = r.id AND m.category = 'subject' AND m.value = 'introduction'
)
Translated into English it's 'return me all records where this subquery returns one or more rows, without returning the data for those rows'. This subquery is correlated to the outer query by the predicate mr.resource_id = r.id, which uses the outer row as the predicate value.
I'm sure you can google around for more examples of the EXISTS clause.
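To see the EXISTS query in action, here is a minimal sketch using Python's sqlite3 as a stand-in database. The title column and all sample rows are assumptions made for the example; only the column names used in the query come from the answer. It also runs a plain join for comparison, since a resource linked to two matching metadata rows comes back twice from a join but only once from EXISTS.

```python
import sqlite3

# Hypothetical schema and rows; resource 1 is tagged twice with subject=introduction.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE metadata (id INTEGER PRIMARY KEY, category TEXT, value TEXT);
    CREATE TABLE metadata_resources (resource_id INTEGER, metadata_id INTEGER);
    INSERT INTO resources VALUES
        (1, 'Intro to SQL'), (2, 'Advanced joins'), (3, 'Untagged');
    INSERT INTO metadata VALUES
        (1, 'subject', 'introduction'),
        (2, 'subject', 'joins'),
        (3, 'author',  'introduction'),
        (4, 'subject', 'introduction');
    INSERT INTO metadata_resources VALUES (1, 1), (1, 4), (2, 2), (2, 3);
""")

# EXISTS: each resource appears at most once, however many tags match.
exists_rows = conn.execute("""
    SELECT r.id, r.title
    FROM resources r
    WHERE EXISTS (
        SELECT 1
        FROM metadata_resources mr
        INNER JOIN metadata m ON mr.metadata_id = m.id
        WHERE mr.resource_id = r.id
          AND m.category = 'subject' AND m.value = 'introduction'
    )
    ORDER BY r.id""").fetchall()

# Plain join: one output row per matching tag, so resource 1 is repeated.
join_ids = [row[0] for row in conn.execute("""
    SELECT r.id
    FROM resources r
    INNER JOIN metadata_resources mr ON mr.resource_id = r.id
    INNER JOIN metadata m ON mr.metadata_id = m.id
    WHERE m.category = 'subject' AND m.value = 'introduction'
    ORDER BY r.id""")]

print(exists_rows)  # [(1, 'Intro to SQL')]
print(join_ids)     # [1, 1]
```

Resource 2 is excluded because its 'introduction' value belongs to the author category, not subject.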

Getting stuck doing a complicated SQL query for patent research purposes

I am trying to gather data for a research study for my university thesis. Unfortunately I am not a computer science or programming expert and do not have any SQL experience.
For my thesis I need to write a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span". The data I want to extract is stored in a database called PATSTAT (for which I have a one-month trial), and it uses, don't be surprised, SQL.
I tried a lot of queries but all the time I am getting different syntax errors.
This is how the interface looks like:
http://www10.pic-upload.de/07.07.13/7u5bqf7jsow.png
I think I have a really good understanding of what (also from an SQL POV) needs to be done but I cannot execute it.
My idea: As result I want the names of the companies (with reference to the company entered below)
SELECT person_name from tls206_person table
Now because I need a criteria like
WHERE nb_applicants > 1 from tls201_appln table
I need to join these two tables, tls206 and tls201. I read a brief introductory guide on SQL (provided by the European Patent Office), and because the two tables have no common "reference key" we need to use the table tls207_pers_appln as an "intermediate", so to speak. That's the point where I am getting stuck. I tried the following, but it is not working:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id= tls207_pers_appln.person_id
INNER JOIN tls207_pers_appln ON tls201_appln.appln_id=tls201_appln.appln_id
WHERE person_name = "%Samsung%"
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = "PI"
I get the following error: "0:37:11 [SELECT - 0 row(s), 0 secs] [Error Code: 1064, SQL State: 0] Not unique table/alias: 'tls207_pers_appln'"
I think for just 4 hours of SQL my approach is not too bad, but I really need some guidance on how to proceed because I am not making any progress.
Ideally I would like to count (for every company) and for every row respectively how many "nb_applicants" were found.
If you need further information for giving me guidance, just let me know.
Looking forward to your answers.
Best regards
Kendels
Another way of doing the same thing, which you might find easier to understand (if you are new to SQL, it is impressive you have got so far), is:
SELECT tls206_person.person_name, tls201_appln.nb_applicants
FROM tls206_person, tls207_pers_appln, tls201_appln
WHERE tls206_person.person_id = tls207_pers_appln.person_id
AND tls207_pers_appln.appln_id = tls201_appln.appln_id
AND tls206_person.person_name LIKE '%Samsung%'
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = 'PI'
(It's equivalent to the other answer, but instead of using the JOIN syntax, you list all the tables in the FROM clause and write the join conditions in the WHERE clause. This is often called the "old-style" or implicit join syntax; the explicit JOIN ... ON form is the newer ANSI one, if you want to google for more info. Both forms are widely supported, so either should work on the database you are using.)
You are referencing the table tls201_appln, but it is not in the from clause. I am guessing that the second reference to tls207_pers_appln should be to the other table:
SELECT person_name, tls201_appln.nb_applicants
FROM tls206_person
INNER JOIN tls207_pers_appln ON tls206_person.person_id = tls207_pers_appln.person_id
INNER JOIN tls201_appln ON tls201_appln.appln_id = tls207_pers_appln.appln_id
WHERE person_name like '%Samsung%'
AND tls201_appln.nb_applicants > 1
AND tls201_appln.ipr_type = 'PI'
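The corrected join through the intermediate table can be tried on toy data. The sketch below assumes Python's sqlite3 instead of the real PATSTAT database; the table and column names follow the question, but every row, including the 'UM' ipr_type value, is invented for illustration.

```python
import sqlite3

# Hypothetical miniature of the three PATSTAT tables used in the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY,
                               nb_applicants INTEGER, ipr_type TEXT);
    CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
    INSERT INTO tls206_person VALUES
        (1, 'Samsung Electronics'), (2, 'Acme Corp'), (3, 'Samsung SDI');
    INSERT INTO tls201_appln VALUES (10, 2, 'PI'), (11, 1, 'PI'), (12, 3, 'UM');
    -- application 10 is shared by persons 1 and 2
    INSERT INTO tls207_pers_appln VALUES (1, 10), (2, 10), (1, 11), (3, 12);
""")

# person -> link table -> application: each join uses a different table.
rows = conn.execute("""
    SELECT p.person_name, a.appln_id, a.nb_applicants
    FROM tls206_person p
    INNER JOIN tls207_pers_appln pa ON p.person_id = pa.person_id
    INNER JOIN tls201_appln a ON a.appln_id = pa.appln_id
    WHERE p.person_name LIKE '%Samsung%'
      AND a.nb_applicants > 1
      AND a.ipr_type = 'PI'
    ORDER BY a.appln_id""").fetchall()

print(rows)  # [('Samsung Electronics', 10, 2)]
```

Application 11 is dropped because it has a single applicant, and application 12 because its ipr_type is not 'PI'. The short aliases p, pa and a are only a convenience.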
For my thesis I need to do a SQL query answering the question: "Give me all patents of a company X where there is more than one applicant (other company) in a specific time span".
Let me rephrase that for you :
SELECT * FROM patents p -- : "Give me all patents
WHERE p.company = 'X' -- of a company X
AND EXISTS ( -- where there is
SELECT *
FROM applicants x1
WHERE x1.patent_id = p.patent_id
AND x1.company <> 'X' -- another company:: exclude ourselves
AND x1.application_date >= $begin_date -- in a specific time span
AND x1.application_date < $end_date
-- more than one applicant (other company)
-- To avoid aggregation: Just repeat the same subquery
AND EXISTS ( -- where there is
SELECT *
FROM applicants x2
WHERE x2.patent_id = p.patent_id
AND x2.company <> 'X' -- another company:: exclude ourselves
AND x2.company <> x1.company -- :: exclude other other company, too
AND x2.application_date >= $begin_date -- in a specific time span
AND x2.application_date < $end_date
)
)
;
[Note: Since the OP did not give any table definitions, I had to invent these]
This is not the perfect query, but it does express your intentions. Given sane keys/indexes it will perform reasonably, too.

MySQL -- joining then joining then joining again

MySQL setup: step by step.
programs -> linked to --> speakers (by program_id)
At this point, it's easy for me to query all the data:
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
Nice and easy.
The trick for me is this. My speakers table is also linked to a third table, "books." So in the "speakers" table, I have "book_id" and in the "books" table, the book_id is linked to a name.
I've tried this (including a WHERE you'll notice):
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
No results.
My questions:
What am I doing wrong?
What's the most efficient way to make this query?
Basically, I want to get back all the programs data and the books data, but instead of the book_id, I need it to come back as the book name (from the 3rd table).
Thanks in advance for your help.
UPDATE:
(rather than opening a brand new question)
The left join worked for me. However, I have a new problem. Multiple books can be assigned to a single speaker.
Using the left join, returns two rows!! What do I need to add to return only a single row, but separate the two books.
Is there any chance that the books table doesn't have any matching rows for speakers.book_id?
Try using a left join which will still return the program/speaker combinations, even if there are no matches in books.
SELECT *
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
LIMIT 5
Btw, could you post the table schemas for all tables involved, and exactly what output (or reasonable representation) you'd expect to get?
Edit: Response to op author comment
you can use group by and group_concat to put all the books on one row.
e.g.
SELECT speakers.speaker_id,
speakers.speaker_name,
programs.program_id,
programs.program_name,
group_concat(books.book_name)
FROM programs
JOIN speakers on programs.program_id = speakers.program_id
LEFT JOIN books on speakers.book_id = books.book_id
WHERE programs.category_id = 1
GROUP BY speakers.speaker_id, speakers.speaker_name, programs.program_id, programs.program_name
LIMIT 5
Note: since I don't know the exact column names, these may be off
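As a rough check of the group by plus group_concat approach, here is a sketch using Python's sqlite3, which also provides group_concat. The schema is a guess: it assumes speakers holds one row per (speaker, book) pair, and all names and rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: a speaker with two books appears as two speakers rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE programs (program_id INTEGER, program_name TEXT, category_id INTEGER);
    CREATE TABLE speakers (speaker_id INTEGER, speaker_name TEXT,
                           program_id INTEGER, book_id INTEGER);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, book_name TEXT);
    INSERT INTO programs VALUES (1, 'Keynote', 1);
    INSERT INTO speakers VALUES
        (1, 'Ann', 1, 100), (1, 'Ann', 1, 101), (2, 'Bob', 1, NULL);
    INSERT INTO books VALUES (100, 'SQL Basics'), (101, 'Joins in Depth');
""")

rows = conn.execute("""
    SELECT s.speaker_id, s.speaker_name, p.program_id,
           group_concat(b.book_name) AS book_names
    FROM programs p
    JOIN speakers s ON p.program_id = s.program_id
    LEFT JOIN books b ON s.book_id = b.book_id
    WHERE p.category_id = 1
    GROUP BY s.speaker_id, s.speaker_name, p.program_id
    ORDER BY s.speaker_id""").fetchall()

# Split the concatenated names; the order inside group_concat is not guaranteed.
books_by_speaker = {
    name: sorted(book_names.split(",")) if book_names else []
    for _id, name, _program, book_names in rows
}
print(books_by_speaker)
```

Each speaker comes back as a single row, and a speaker without books gets NULL for the concatenated column because of the left join.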
That's typically efficient. There is some kind of assumption you are making that isn't true. Do your speakers have books assigned? If they don't that last JOIN should be a LEFT JOIN.
This kind of query is typically pretty efficient, since you almost certainly have primary keys as indexes. The main issue would be whether your indexes are covering (which is more likely to occur if you don't use SELECT *, but instead select only the columns you need).