I'm not seeing a clean way to write this query without subselects which I avoid because they are generally not portable, and harder to read and debug than individual queries.
Table A has exactly 2 foreign keys to table B, which are always different, but always defined. Sort of like:
MARRIAGE_TABLE
    M_KEY
    LAST_NAME
    PERSON_HUSBAND_FK
    PERSON_WIFE_FK
PERSON_TABLE
    PERSON_KEY
    SEX
    FIRST_NAME
The PERSON_HUSBAND_FK will always point at a SEX=MALE, and the PERSON_WIFE_FK will always point at a SEX=FEMALE. There will always be one of each. (This is in no way a statement on same-sex marriage, BTW; I'm all for it.)
I want to create a result like:
MARRIAGE HUSBAND WIFE
-------- ------- ----
SMITH TOM KATHY
JONES BILL EVE
My current approach is to get all records from the MARRIAGE TABLE and store them in a hash. Then I augment the hash with names {wife_name} and {husband_name} using 2 more queries using the husband and wife FK's. Then I format and print the hash. It works, but I'm not wild about 3 queries per row.
I'm not sure I ever encountered a table having >1 FK to another table. I've done years of table-design, but I'm not really sure this design even meets normalization. It seems like no, to me. Like they created a many-many without an intermediate table; a cheat?
Just join table PERSON_TABLE twice:
SELECT m.last_name AS marriage, p1.first_name AS husband, p2.first_name AS wife
FROM marriage_table m
INNER JOIN person_table p1 ON p1.person_key = m.person_husband_fk
INNER JOIN person_table p2 ON p2.person_key = m.person_wife_fk
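To see the double join in action, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names come from the question; the column types and the sample rows are assumptions, made up to match the desired output.

```python
import sqlite3

# In-memory database. Names follow the question; the column types and
# sample rows are assumptions chosen to reproduce the desired result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_table (
        person_key INTEGER PRIMARY KEY,
        sex        TEXT,
        first_name TEXT
    );
    CREATE TABLE marriage_table (
        m_key             INTEGER PRIMARY KEY,
        last_name         TEXT,
        person_husband_fk INTEGER REFERENCES person_table(person_key),
        person_wife_fk    INTEGER REFERENCES person_table(person_key)
    );
    INSERT INTO person_table VALUES
        (1, 'MALE', 'TOM'),  (2, 'FEMALE', 'KATHY'),
        (3, 'MALE', 'BILL'), (4, 'FEMALE', 'EVE');
    INSERT INTO marriage_table VALUES
        (1, 'SMITH', 1, 2), (2, 'JONES', 3, 4);
""")

# person_table is joined twice under two aliases, one per foreign key.
marriages = conn.execute("""
    SELECT m.last_name AS marriage, p1.first_name AS husband, p2.first_name AS wife
    FROM marriage_table m
    INNER JOIN person_table p1 ON p1.person_key = m.person_husband_fk
    INNER JOIN person_table p2 ON p2.person_key = m.person_wife_fk
    ORDER BY m.m_key
""").fetchall()

for row in marriages:
    print(*row)
```

One query replaces the three-queries-per-row approach; each alias (p1, p2) behaves like an independent copy of the person table.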
I have an SQL question in which I am struggling to understand and find relevant resources to help me.
The question is:
"Write an SQL query to identify data inconsistencies between two tables."
I need to compare the following tables of data:
AssetManager

AssetManagerName
----------------
John Doe
Joe Smith
Dave Grey
Lisa Sparks
Kate Green

Trip

PropertyCode  AssetManagerName  Date
------------  ----------------  ----------
P001          John Doe          2022-01-22
P001          Joe Smith         2022-01-19
P002          Dave Grey         2022-02-25
P002          John Doe          2022-04-23
P003          Kate Greens       2022-02-25
P004          Joe Smith         2022-05-29
P002          Dave Grey         2022-01-25
P001          John Doe          2022-02-24
Image translated to text from Original Source
What are the inconsistencies in this case? Is it maybe that "Kate Green" is in the AssetManager table, and you have "Kate Greens" in the Trip table? That's the only thing I can see.
What MySQL commands could I use that would help me to achieve this query?
In SQL, when we talk about inconsistencies, we generally mean data that would not translate correctly into a normalised form: when we join between the tables, the result has missing data or orphaned rows. Inconsistencies commonly arise when there are no referential constraints in the schema to maintain consistency. In such cases simple spelling mistakes can easily creep into the dataset, but entirely wrong values could also be used. In this case, if there is a table that represents all the possible Asset Managers, then we would expect other tables that refer to Asset Managers to use only values from that table. Spelling mistakes and entirely missing names are treated the same.
In the Trip Table we can identify inconsistency with the AssetManager table by looking for any records in Trip that do not have a match in AssetManager using the AssetManagerName column.
One simple way to do this is to use an OUTER JOIN and to exclude all the matches:
SELECT Trip.*
FROM Trip
LEFT OUTER JOIN AssetManager ON Trip.AssetManagerName = AssetManager.AssetManagerName
WHERE AssetManager.AssetManagerName IS NULL
This returns the following result: (See db-fiddle)
PropertyCode  AssetManagerName  Date
------------  ----------------  ----------
P003          Kate Greens       2022-02-25
The LEFT OUTER JOIN (or LEFT JOIN) will return all the rows from the Trip table, even if there is no corresponding match in the AssetManager table on the AssetManagerName column. For the rows that do not match, all the values for the AssetManager table in the result set will be NULL.
We can then use a WHERE clause to exclude all the matched records and return only those records that DO NOT MATCH. We do this by keeping only the rows where AssetManager.AssetManagerName is NULL.
There are no records in Trip with a legitimate NULL value in AssetManagerName; the NULL only exists in the recordset as a result of the LEFT OUTER JOIN evaluation.
You could also use a NOT EXISTS clause. This syntax is sometimes easier to read and makes the intent clearer: we want to find the records that DO NOT MATCH. Specifically in MySQL, though, its execution plan is generally less efficient than the LEFT OUTER JOIN expression above.
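As a quick check of the LEFT OUTER JOIN / IS NULL approach, the sketch below loads the two tables from the question into an in-memory SQLite database (via Python's sqlite3 module) and runs the same query. The column types are an assumption, and SQLite stands in for MySQL here; the join semantics used are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumed; the data is copied from the question.
    CREATE TABLE AssetManager (AssetManagerName TEXT);
    CREATE TABLE Trip (PropertyCode TEXT, AssetManagerName TEXT, Date TEXT);
    INSERT INTO AssetManager VALUES
        ('John Doe'), ('Joe Smith'), ('Dave Grey'), ('Lisa Sparks'), ('Kate Green');
    INSERT INTO Trip VALUES
        ('P001', 'John Doe',    '2022-01-22'),
        ('P001', 'Joe Smith',   '2022-01-19'),
        ('P002', 'Dave Grey',   '2022-02-25'),
        ('P002', 'John Doe',    '2022-04-23'),
        ('P003', 'Kate Greens', '2022-02-25'),
        ('P004', 'Joe Smith',   '2022-05-29'),
        ('P002', 'Dave Grey',   '2022-01-25'),
        ('P001', 'John Doe',    '2022-02-24');
""")

# Keep every Trip row, then retain only those with no AssetManager match.
orphans = conn.execute("""
    SELECT Trip.*
    FROM Trip
    LEFT OUTER JOIN AssetManager
        ON Trip.AssetManagerName = AssetManager.AssetManagerName
    WHERE AssetManager.AssetManagerName IS NULL
""").fetchall()

print(orphans)
```

Only the 'Kate Greens' trip comes back, because it is the one name with no exact match in AssetManager.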
SELECT Trip.*
FROM Trip
WHERE NOT EXISTS (
SELECT AssetManager.AssetManagerName
FROM AssetManager
WHERE AssetManager.AssetManagerName = Trip.AssetManagerName
)
Another variation of this is to use NOT IN. For this query we first evaluate a list of possible values for AssetManagerName and use that to identify the values that do not match.
Be careful if there might be legitimate NULL values in AssetManagerName, because IN handles NULL differently from EXISTS: if the subquery returns even one NULL, NOT IN is never true and the query returns no rows at all.
SELECT Trip.*
FROM Trip
WHERE Trip.AssetManagerName NOT IN (
SELECT AssetManager.AssetManagerName
FROM AssetManager
)
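The NULL handling of NOT IN and NOT EXISTS is worth checking before relying on either. The sketch below is a cut-down version of the two tables with one NULL manager name added; the NULL row and the reduced data are assumptions for illustration, run in SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AssetManager (AssetManagerName TEXT);
    CREATE TABLE Trip (PropertyCode TEXT, AssetManagerName TEXT);
    -- The NULL manager name is hypothetical; it is not in the question's data.
    INSERT INTO AssetManager VALUES ('John Doe'), ('Kate Green'), (NULL);
    INSERT INTO Trip VALUES ('P001', 'John Doe'), ('P003', 'Kate Greens');
""")

# 'Kate Greens' NOT IN ('John Doe', 'Kate Green', NULL) evaluates to NULL,
# which WHERE treats as not true, so the row is filtered out.
not_in = conn.execute("""
    SELECT PropertyCode FROM Trip
    WHERE AssetManagerName NOT IN (SELECT AssetManagerName FROM AssetManager)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL is harmless.
not_exists = conn.execute("""
    SELECT PropertyCode FROM Trip
    WHERE NOT EXISTS (
        SELECT 1 FROM AssetManager
        WHERE AssetManager.AssetManagerName = Trip.AssetManagerName
    )
""").fetchall()

print("NOT IN:    ", not_in)
print("NOT EXISTS:", not_exists)
```

With a NULL in the subquery, NOT IN finds nothing while NOT EXISTS still reports P003. If NOT IN is preferred, adding WHERE AssetManagerName IS NOT NULL inside the subquery avoids the surprise.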
For an interesting analysis of these options and their performance considerations, have a read of this article:
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL
I have a database with a table called "Relations" that looks as follows:
Relations (PersonId1, PersonId2, RelationTypeId)
The primary key is (PersonId1, PersonId2, RelationTypeId)
There are two other tables that the foreign keys reference, but that does not really matter here.
So a relation is defined as, for example, (Mary, Andre, 3), where 3 references another table and would mean, for example, "a friend".
My requirement is to see all friends of a specific person, but also the friends of that person's friends, so not only the first layer but also the second.
For example this would be the relation table
Andre Mary 3
Mary Carl 3
Chris James 3 (irrelevant in our case)
So I want a query where I supply the PersonId of Andre and the RelationTypeId. The result should be this:
Andre Mary 3
Mary Carl 3
As far as I understand, it is not possible to build a query that would give this result, but I am not sure, which is why I am asking.
Hope you understand my question, thanks in advance.
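The query below returns the friends of a person and the friends of those friends (<person_name> is a placeholder for the person's id). Note that it does not filter on RelationTypeId; add that condition at both levels if you need it.
select distinct personId2
from relations
where personId1 in (select distinct personId2 from relations where personId1 = <person_name>)
   or personId1 = <person_name>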
It's a recursive CTE (common table expression). It processes query results multiple times because the main SELECT query calls the CTE part recursively. A CTE is part of a SELECT query (it starts with WITH). This code will return the data subsets you're looking for.
I use it to boost data access efficiency when I need to, e.g., select, paginate, or display page rows linked with a specific page. It works in MySQL 8.
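As a minimal sketch of what a recursive CTE for the Relations table might look like (this is an illustration, not the answerer's original code): the CTE name reach, the depth column, and the max_depth parameter are all hypothetical, and relations are followed in one direction only, from PersonId1 to PersonId2, as in the question's example. It runs here on SQLite through Python's sqlite3 module; MySQL 8 also supports WITH RECURSIVE, but the parameter syntax would differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Relations (
        PersonId1      TEXT,
        PersonId2      TEXT,
        RelationTypeId INTEGER,
        PRIMARY KEY (PersonId1, PersonId2, RelationTypeId)
    );
    INSERT INTO Relations VALUES
        ('Andre', 'Mary', 3), ('Mary', 'Carl', 3), ('Chris', 'James', 3);
""")

# Anchor part: direct friends of the starting person (depth 1).
# Recursive part: follow PersonId2 -> PersonId1 until max_depth is reached.
friends = conn.execute("""
    WITH RECURSIVE reach (PersonId1, PersonId2, RelationTypeId, depth) AS (
        SELECT PersonId1, PersonId2, RelationTypeId, 1
        FROM Relations
        WHERE PersonId1 = :person AND RelationTypeId = :rel
        UNION
        SELECT r.PersonId1, r.PersonId2, r.RelationTypeId, reach.depth + 1
        FROM Relations r
        JOIN reach ON r.PersonId1 = reach.PersonId2
        WHERE r.RelationTypeId = :rel AND reach.depth < :max_depth
    )
    SELECT PersonId1, PersonId2, RelationTypeId
    FROM reach
    ORDER BY depth
""", {"person": "Andre", "rel": 3, "max_depth": 2}).fetchall()

for row in friends:
    print(*row)
```

Raising max_depth would extend the search to further layers of friends; the depth limit also keeps the recursion from running forever if the data contains cycles.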
Say I have two tables:
Table: customers
Fields: customer_id, first_name, last_name
Table: customer_cars
Fields: car_id, customer_id, car_brand, car_active
Say I am trying to write a query that shows all customers with a first name of "Karl," and the brands of the ** active ** cars they have. Not all customers will have an active car. Some cars are active, some are inactive.
Please keep in mind that this is a representative example that I just made up, for the sake of clarity and simplicity. Please don't reply with questions about why we would do it this way, that I could use table aliases, how it's possible to have an inactive car, or that my field names could be better written. It's a fake example that is intended to be very simple in order to illustrate the point. It has a structure and issue that I encounter all the time.
It seems like this would be best done with a LEFT JOIN and subquery.
SELECT
    customers.customer_id,
    first_name,
    last_name,
    car_brand
FROM
    customers
LEFT JOIN
    (SELECT
        customer_cars.customer_id,
        car_brand
    FROM
        customer_cars
    INNER JOIN customers ON customer_cars.customer_id = customers.customer_id
    WHERE
        first_name = 'Karl' AND
        customer_cars.car_active = '1') car_query ON customers.customer_id = car_query.customer_id
WHERE
    first_name = 'Karl'
The results might look like this:
first_name  last_name  car_brand
Karl        Johnson    Dodge
Karl        Johnson    Jeep
Karl        Smith      NULL
Karl        Davis      Chrysler
Notice the duplication of 'Karl' in both WHERE clauses, and the INNER JOIN in the subquery that is the same table in the outer query. My understanding of mySQL is that this duplication is necessary because it processes the subquery first before processing the outer query. Therefore, the subquery must be properly limited so it doesn't scan all records, then it tries to match on the resulting records.
I am aware that removing the car_active = '1' condition would change things, but this is a requirement.
I am wondering if a query like this can be done in a different way that only causes the criteria and joins to be entered once. Is there a recommended way to prioritize the outer query first, then match to the inner one?
I am aware that two different queries could be written (find all records with Karl, then do another that finds matching cars). However, this would cause multiple connections to the database (one for every record returned) and would be very taxing and inefficient.
I am also aware of correlating subqueries, but from my understanding and experience, this is for returning one field per customer (e.g., an aggregate field such as how much money Karl spent) within the fieldset. I am looking for a similar approach as this, but where one customer could be matched to multiple other records like in the sample output above.
In your response, if you have a recommended query structure that solves this problem, it would be really helpful if you could write a clear example instead of just describing it. I really appreciate your time!
First, is a simple and straight query not enough?
Say I am trying to write a query that shows all customers with a first
name of "Karl," and the brands of the ** active ** cars they have. Not
all customers will have an active car. Some cars are active, some are
inactive.
Following this requirement, I can just do something like:
SELECT C.first_name
     , C.last_name
     , CC.car_brand
FROM customers C
LEFT JOIN customer_cars CC ON CC.customer_id = C.customer_id
                          AND CC.car_active = 1
WHERE C.first_name = 'Karl'
Take a look at the SQL Fiddle sample.
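The key detail is that the car_active test sits in the ON clause rather than in WHERE. The sketch below compares the two placements on an in-memory SQLite database (Python's sqlite3 module); the sample rows, including the inactive Ford and the non-Karl customer, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE customer_cars (
        car_id INTEGER PRIMARY KEY, customer_id INTEGER,
        car_brand TEXT, car_active INTEGER);
    -- Hypothetical sample data.
    INSERT INTO customers VALUES
        (1, 'Karl', 'Johnson'), (2, 'Karl', 'Smith'),
        (3, 'Karl', 'Davis'),   (4, 'Anna', 'Jones');
    INSERT INTO customer_cars VALUES
        (1, 1, 'Dodge', 1), (2, 1, 'Jeep', 1), (3, 2, 'Ford', 0),
        (4, 3, 'Chrysler', 1), (5, 4, 'Fiat', 1);
""")

# Filter in ON: inactive cars are ignored while matching, but every Karl stays.
filter_in_on = conn.execute("""
    SELECT C.first_name, C.last_name, CC.car_brand
    FROM customers C
    LEFT JOIN customer_cars CC ON CC.customer_id = C.customer_id
                              AND CC.car_active = 1
    WHERE C.first_name = 'Karl'
    ORDER BY C.customer_id, CC.car_id
""").fetchall()

# Filter in WHERE: applied after the join, so Karl Smith (whose only car is
# inactive) is removed, and the LEFT JOIN effectively acts like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT C.first_name, C.last_name, CC.car_brand
    FROM customers C
    LEFT JOIN customer_cars CC ON CC.customer_id = C.customer_id
    WHERE C.first_name = 'Karl' AND CC.car_active = 1
    ORDER BY C.customer_id, CC.car_id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

The first result keeps Karl Smith with a NULL brand, matching the sample output in the question; the second silently loses him.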
I am working with learning SQL, I have taken the basics course on pluralsight, and now I am using MySQL through Treehouse, with dummy databases they've set up, through the MySQL server. Once my training is complete I will be using SQLServer daily at work.
I ran into a two-part challenge yesterday that I had some trouble with.
The first question in the challenge was:
"We have a 'movies' table with a 'title' and 'genre_id' column and a
'genres' table which has an 'id' and 'name' column. Use an INNER JOIN
to join the 'movies' and 'genres' tables together only selecting the
movie 'title' first and the genre 'name' second."
Understanding how to properly set up JOINS has been a little confusing for me, because the concepts seem simple but like in cooking, execution is everything ---and I'm doing it wrong. I was able to figure this one out after some trial and error, work, and rewatching the Treehouse explanation a few times; here is how I solved the first question, with a Treehouse-accepted answer:
SELECT movies.title, genres.name FROM movies INNER JOIN genres ON movies.genre_id = genres.id;
--BUT--
The next question of the challenge I have not been so successful with, and I'm not sure where I'm going wrong. I would really like to get better with JOINS, and picking the brains of all you smartypantses is the best way I can think of to get an explanation for this specific (and I'm sure, pitifully simple for you guys) problem. Thanks for your help, here's where I'm stumped:
"Like before, bring back the movie 'title' and genre 'name' but use
the correct OUTER JOIN to bring back all movies, regardless of whether
the 'genre_id' is set or not."
This is the closest (?) solution that I've come up with, but I'm clearly doing something (maybe a lot) wrong here:
SELECT movies.title, genres.name FROM movies LEFT OUTER JOIN genres ON genres.id;
I had initially tried this (below) but when it didn't work, I decided to cut out the last portion of the statement, since it's mentioned in the requirement criteria that I need a dataset that doesn't care if genre_id is set in the movies table or not:
SELECT movies.title, genres.name FROM movies LEFT OUTER JOIN genres ON movies.genre_id = genres.id;
I know this is total noob stuff, but like I said, I'm learning, and the questions I researched on Stack and on the Internet at large were not necessarily geared for the same problem. I am very grateful to have your expertise and help to draw on. Thank you for taking the time to read this and help out if you choose to do so!
Your solution is correct:
SELECT movies.title, genres.name
FROM movies
LEFT OUTER JOIN genres ON movies.genre_id = genres.id
This is my interpretation:
When you say "left join" or "left outer join", what you mean is
not "I don't care if genre_id is set in the movies table or not",
but "I want all genres of each movie to be shown; if genre_id is not set in the movies table for some records, just show the movie in those cases [with genre = NULL]".
In general, with a "left join" you want:
all the records of the left table, with their corresponding records in the other table, if any. Otherwise with NULL.
In your example, these two sets of records will be shown:
1- All the movies which have been set to a genre
(giving movies.title, genres.name)
2- All other movies [which do not have a genre, i.e., genre_id = NULL]
(giving movies.title, NULL)
Example (with left join):
Title, Genre
--------------
Movie1, Comedy
Movie1, Drama
Movie1, Family
Movie2, NULL
Movie3, Comedy
Movie3, Drama
Movie4, Comedy
Movie5, NULL
Example (with inner join):
Title, Genre
--------------
Movie1, Comedy
Movie1, Drama
Movie1, Family
Movie3, Comedy
Movie3, Drama
Movie4, Comedy
Your specific question has already been answered, but
I'd like to add another perspective on JOIN that I think will help you understand how to use it in the future (after that, I also recommend you follow this link: SQL JOINS).
This perspective is from the DB's eyes; the DB is "dumb" and can't guess what you really want it to do for you.
I hope it helps and won't confuse you too much:
Let's first understand what a join does (without using any SQL script), and then we'll better understand how to use it.
Say this is a movie list:
Armageddon
Batman
Cinderella
and a list of genres:
Action
Fantasy
Western
When you join both tables, the DB creates a new table in which, for each row in the movies table, you get all possible rows in the genres table, like this:
Armageddon <-> Action
Armageddon <-> Fantasy
Armageddon <-> Western
Batman <-> Action
Batman <-> Fantasy
Batman <-> Western
Cinderella <-> Action
Cinderella <-> Fantasy
Cinderella <-> Western
You can also see that the NEW table's row count is 3*3 ([table 1 row count] multiplied by [table 2 row count]). Can you explain to yourself why? If so, let's continue to our second step...
In your DB, you keep track of which movie has which genre (identifying the genre by its id), so let's talk about NEW tables that look like this and hold info about each movie's genre:
1 - Armageddon - 1
2 - Armageddon - 2
4 - Batman - 1
5 - Batman - 2
6 - Batman - 3
7 - Cinderella - 2
And the genre:
1 - Action
2 - Fantasy
3 - Western
As we've just explained, joining both tables will get you... 18 rows (6*3=18. why? because for each row in movies table, you'll get all possible rows from genres table). I won't write those 18 rows, I hope you get the point...
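The 18-row claim is easy to confirm. The sketch below uses Python's sqlite3 module with the six movie rows and three genres listed above; the column names id, title, and genre_id follow the original question, and their types are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, genre_id INTEGER);
    INSERT INTO genres VALUES (1, 'Action'), (2, 'Fantasy'), (3, 'Western');
    -- The six rows from the listing above: id, title, genre_id.
    INSERT INTO movies VALUES
        (1, 'Armageddon', 1), (2, 'Armageddon', 2),
        (4, 'Batman', 1), (5, 'Batman', 2), (6, 'Batman', 3),
        (7, 'Cinderella', 2);
""")

# Every movie row paired with every genre row: 6 * 3 combinations.
all_pairs = conn.execute(
    "SELECT COUNT(*) FROM movies CROSS JOIN genres"
).fetchone()[0]

# Only the pairs that satisfy the ON condition survive an INNER JOIN.
matched = conn.execute("""
    SELECT COUNT(*)
    FROM movies
    INNER JOIN genres ON movies.genre_id = genres.id
""").fetchone()[0]

print(all_pairs, matched)
```

The cross join yields 18 combinations, and the ON condition keeps 6 of them, one per movie row. A real database engine normally avoids building all 18 first, but the result is the same as if it had.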
Conceptually, each time you call a join (it doesn't matter which kind: LEFT/RIGHT/OUTER/INNER), the DB creates a new table with all possible combinations ([table 1 row count] multiplied by [table 2 row count]). Now, you're probably thinking: how does the DB erase the rows I don't want?
First, you define an ON condition. You tell your DB: "please mark for me all rows that meet my condition: movies.genre_id = genres.id (But don't drop any unmarked rows yet!!!)".
Second, you tell your DB which kind of rows you want to drop (or edit!!!): now comes the JOIN kind, which is a bit tricky.
INNER JOIN is easy to understand. You just tell the DB: "drop all rows that don't meet my condition: movies.genre_id = genres.id" (and of course show me the updated table, after you've dropped these rows I don't need).
LEFT/RIGHT JOINs are more complicated. Let's start, for example, with LEFT JOIN. You're telling your DB: "well, in case a row of my 1st table doesn't match my condition movies.genre_id = genres.id with any row of my 2nd table, mark the RIGHT part of that row (meaning, the columns that represent my 2nd table) as null, AND LEAVE THE ROW."
That way, you know that this row in table1 doesn't have a matching row in table2.
In RIGHT JOIN, it's the opposite: you tell the DB that if a row of your 2nd table has no match, it should keep the row and mark the LEFT side as null.
FULL JOIN tells your DB: "well, keep the unmatched rows from both sides: rows of table1 without a match get their RIGHT part marked as null, and rows of table2 without a match get their LEFT part marked as null" (it's a bit complicated to understand why the heck you'd need that, and you'll hardly need FULL JOIN in your first steps, so drop it for now).
In conclusion, my advice for you when you design your JOIN query:
first, understand what YOU want to get; see the illustration in the answer: SQL JOINS.
Then comes the part where you need to explain to your DB what it should do:
first, tell it which rows it should mark,
then, tell it which rows it should drop/edit.
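The "mark, then drop or edit" description can be checked against a tiny data set. In the sketch below (Python's sqlite3 module, in-memory SQLite), the Batman row with no genre_id is an assumption added for illustration. Swapping the table order in a LEFT JOIN is used in place of RIGHT JOIN, since it gives the same rows and works on older SQLite versions too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, genre_id INTEGER);
    INSERT INTO genres VALUES (1, 'Action'), (2, 'Fantasy'), (3, 'Western');
    -- Hypothetical rows: Batman has no genre, and no movie is a Western.
    INSERT INTO movies VALUES
        (1, 'Armageddon', 1), (2, 'Batman', NULL), (3, 'Cinderella', 2);
""")

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# INNER JOIN: rows that fail the ON condition are dropped.
inner_rows = run("""
    SELECT movies.title, genres.name
    FROM movies INNER JOIN genres ON movies.genre_id = genres.id
    ORDER BY movies.id
""")

# LEFT JOIN: every movie is kept; the genre side is NULL when nothing matches.
left_rows = run("""
    SELECT movies.title, genres.name
    FROM movies LEFT JOIN genres ON movies.genre_id = genres.id
    ORDER BY movies.id
""")

# Same idea from the other side: every genre is kept instead.
swapped_rows = run("""
    SELECT genres.name, movies.title
    FROM genres LEFT JOIN movies ON movies.genre_id = genres.id
    ORDER BY genres.id
""")

print(inner_rows)
print(left_rows)
print(swapped_rows)
```

Batman appears only in the LEFT JOIN result (with None for the genre), and Western appears only when genres is the left table.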
I've got 3 tables that are something like this (simplified here ofc):
users
    user_id
    user_name
info
    info_id
    user_id
    rate
contacts
    contact_id
    user_id
    contact_data
users has a one-to-one relationship with info, although info doesn't always have a related entry.
users has a one-to-many relationship with contacts, although contacts doesn't always have related entries.
I know I can grab the proper 'users' + 'info' with a left join, is there a way to get all the data I want at once?
For example, one returned record might be:
user_id: 5
user_name: tom
info_id: 1
rate: 25.00
contact_id: 7
contact_data: 555-1212
contact_id: 8
contact_data: 555-1315
contact_id: 9
contact_data: 555-5511
Is this possible with a single query? Or must I use multiple?
It is possible to do what you're asking in one query, but you'd either need a variable number of columns which is evil because SQL isn't designed for that, or you'd have to have a fixed number of columns, which is even more evil because there is no sensible fixed number of columns you could choose.
I'd suggest using one of two alternatives:
1. Return one row for each contact data, repeating the data in other columns:
5 tom 1 25.00 7 555-1212
5 tom 1 25.00 8 555-1315
5 tom 1 25.00 9 555-5511
The problem with this of course is that redundant data is normally a bad idea, but if you don't have too much redundant data it will be OK. Use your judgement here.
2. Use two queries. This means a slightly longer turnaround time, but less data to transfer.
In most cases I'd prefer the second solution.
You should try to avoid making a large number of queries inside a loop. This can almost always be rewritten to a single query. But if using two queries is the most natural way to solve your problem, just use two queries. Don't try to cram all the data you need into a single query just for the sake of reducing the number of queries.
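A minimal sketch of the two-query alternative, using Python's sqlite3 module: the table and column names come from the question, while the column types, the second user 'ann', and the dictionary layout of the final records are assumptions. The second query fetches all contacts in one go rather than once per user, so the query count stays at two however many users there are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE info (info_id INTEGER PRIMARY KEY, user_id INTEGER, rate REAL);
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY, user_id INTEGER, contact_data TEXT);
    -- 'ann' is a hypothetical user with no info row and no contacts.
    INSERT INTO users VALUES (5, 'tom'), (6, 'ann');
    INSERT INTO info VALUES (1, 5, 25.00);
    INSERT INTO contacts VALUES
        (7, 5, '555-1212'), (8, 5, '555-1315'), (9, 5, '555-5511');
""")

# Query 1: one row per user; info columns are NULL when there is no info row.
users = conn.execute("""
    SELECT u.user_id, u.user_name, i.info_id, i.rate
    FROM users u
    LEFT JOIN info i ON i.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()

# Query 2: all contacts at once, grouped by user in application code.
contacts_by_user = {}
for user_id, contact_id, contact_data in conn.execute(
    "SELECT user_id, contact_id, contact_data FROM contacts ORDER BY contact_id"
):
    contacts_by_user.setdefault(user_id, []).append((contact_id, contact_data))

# Stitch the two results together into one record per user.
records = [
    {
        "user_id": uid,
        "user_name": name,
        "info_id": info_id,
        "rate": rate,
        "contacts": contacts_by_user.get(uid, []),
    }
    for uid, name, info_id, rate in users
]

for record in records:
    print(record)
```

Each record carries the user and info columns once, plus a list of contacts of whatever length, which is the shape the question asks for.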
Every row of a result set must have the same columns, so you can't return several contact rows for a user without repeating the other columns as well.
Hopefully, this query would achieve what you need:
SELECT
    u.user_id as user_id,
    u.user_name as user_name,
    i.info_id as info_id,
    i.rate as rate,
    c.contact_id as contact_id,
    c.contact_data as contact_data
FROM users as u
LEFT JOIN info as i ON i.user_id = u.user_id
LEFT JOIN contacts as c ON c.user_id = u.user_id
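To show what this query returns, here is a small sketch with Python's sqlite3 module. The data for user 5 comes from the question; the column types, the ORDER BY, and the second user 'ann' (who has neither info nor contacts) are assumptions added to make the output deterministic and to show the NULL case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE info (info_id INTEGER PRIMARY KEY, user_id INTEGER, rate REAL);
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY, user_id INTEGER, contact_data TEXT);
    INSERT INTO users VALUES (5, 'tom'), (6, 'ann');
    INSERT INTO info VALUES (1, 5, 25.00);
    INSERT INTO contacts VALUES
        (7, 5, '555-1212'), (8, 5, '555-1315'), (9, 5, '555-5511');
""")

# Same query as above, plus an ORDER BY so the row order is predictable.
rows = conn.execute("""
    SELECT
        u.user_id      AS user_id,
        u.user_name    AS user_name,
        i.info_id      AS info_id,
        i.rate         AS rate,
        c.contact_id   AS contact_id,
        c.contact_data AS contact_data
    FROM users AS u
    LEFT JOIN info AS i ON i.user_id = u.user_id
    LEFT JOIN contacts AS c ON c.user_id = u.user_id
    ORDER BY u.user_id, c.contact_id
""").fetchall()

for row in rows:
    print(row)
```

User 5 comes back three times, once per contact, with the user and info columns repeated; the user without info or contacts still appears once, with None in those columns.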