Some books have more than one author. I have a table with book_id and author_id1, author_id2, author_id3, and author_id4, and a table with author_id and author_name.
How can I join these two tables with the main table (keyed by book_id) so that the authors' names appear together in a single data row from a SQL join query?
Example:
SELECT b.book_id, b.title, a.author, b.location
FROM books AS b
JOIN book_authors AS ba ON b.book_id = ba.book_id
JOIN authors AS a ON REGEX ba.authors_id$ = a.authors_id
I'm not sure about the REGEX ($) use in SQL. The result should display id, title, authors, and location.
How do I get each authors_id# column to match authors_id (notice one has a number at the end and the other does not)?
Update: I would like book_authors.authors_id1 to match authors.authors_id, book_authors.authors_id2 to match authors.authors_id, book_authors.authors_id3 to match authors.authors_id, and book_authors.authors_id4 to match authors.authors_id, and to return all the matching authors in a list.
...
# merge book_authors and authors into one dataframe:
# repeatedly rename authors_id{n} to authors_id, map it to the author's name,
# then rename the result to authors_name{n}
ba_df.rename(columns={'authors_id1': 'authors_id'}, inplace=True)
ba_df['authors_id'] = ba_df['authors_id'].map(a_df.set_index('authors_id')['authors_name'])
ba_df.rename(columns={'authors_id': 'authors_name1', 'authors_id2': 'authors_id'}, inplace=True)
ba_df['authors_id'] = ba_df['authors_id'].map(a_df.set_index('authors_id')['authors_name'])
ba_df.rename(columns={'authors_id': 'authors_name2', 'authors_id3': 'authors_id'}, inplace=True)
ba_df['authors_id'] = ba_df['authors_id'].map(a_df.set_index('authors_id')['authors_name'])
ba_df.rename(columns={'authors_id': 'authors_name3', 'authors_id4': 'authors_id'}, inplace=True)
ba_df['authors_id'] = ba_df['authors_id'].map(a_df.set_index('authors_id')['authors_name'])
ba_df.rename(columns={'authors_id': 'authors_name4'}, inplace=True)
...
I was working through another dataframe and got the idea to use map after rename, with set_index the same on both dataframes. The map lines then work; I just have to rename the common column so it doesn't get overwritten. In this case that column was authors_id, which I replaced with authors_name1, 2, 3 and 4, corresponding to authors_id1, 2, 3 and 4. It is not pure SQL, but it works in Python, which is where I had the problem. A more compact version of the same idea is sketched below.
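Here is a minimal sketch of that rename-and-map pattern written as a loop, assuming the dataframes are named ba_df and a_df as above (the name_by_id variable is introduced here just for illustration):
# lookup Series: authors_id -> authors_name
name_by_id = a_df.set_index('authors_id')['authors_name']
# map each authors_id{n} column to its author name
for i in range(1, 5):
    ba_df[f'authors_name{i}'] = ba_df[f'authors_id{i}'].map(name_by_id)
# drop the original id columns now that the names are in place
ba_df = ba_df.drop(columns=[f'authors_id{i}' for i in range(1, 5)])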
Related
I opened this new question because I'm not sure the user's request and wording matched each other: pandas left join where right is null on multiple columns.
What is the equivalent pandas code to this SQL? Context: we're finding entries from a column in table_y that aren't in table_x with respect to several columns.
SELECT
    table_x.column,
    table_x.column2,
    table_x.column3,
    table_y.column,
    table_y.column2,
    table_y.column3
FROM table_x
LEFT JOIN table_y
    ON table_x.column = table_y.column
    AND table_x.column2 = table_y.column2
WHERE
    table_y.column2 IS NULL
Is this it?
import pandas

columns_join = ['column', 'column2']
data_y = data_y.set_index(columns_join)
data_x = data_x.set_index(columns_join)
data_diff = pandas.concat([data_x, data_y]).drop_duplicates(keep=False)  # any row not in both
# Select the diff representative from each dataset - in case datasets are too large
x1 = data_x[data_x.index.isin(data_diff.index)]
x2 = data_y[data_y.index.isin(data_diff.index)]
# Perform an outer join with the joined indices from each set,
# then remove the entries only contributed from table_x
data_compare = x1.merge(x2, how='outer', indicator=True, left_index=True, right_index=True)
data_compare_final = (
    data_compare
    .query('_merge == "left_only"')
    .drop('_merge', axis=1)
)
I don't think that's equivalent because we only removed entries from table_x that aren't in the join based on multiple columns. I think we have to continue and compare the column against table_y.
data_compare = data_compare.reset_index().set_index('column2')
data_y = data_y.reset_index().set_index('column2')
mask_column2 = data_y.index.isin(data_compare.index)
result = data_y[~mask_column2]
Without test data it is a bit difficult to be sure that this helps, but you can try:
# Only if columns to join on in the right dataframe have the same name as columns in left
table_y[['col_join_1', 'col_join_2']] = table_y[['column', 'column2']]  # Else this is not needed
# Merge left (LEFT JOIN)
table_merged = table_x.merge(
    table_y,
    how='left',
    left_on=['column', 'column2'],
    right_on=['col_join_1', 'col_join_2'],
    suffixes=['_x', '_y']
)
# Filter dataframe
table_merged = table_merged.loc[
    table_merged.column2_y.isna(),
    ['column_x', 'column2_x', 'column3_x', 'column_y', 'column2_y', 'column3_y']
]
I found an equivalent that amounts to setting the index to the join column(s), concatenating the tables, dropping the duplicates, and performing an outer join between each table's contribution to that diff. From there, one can select
left_only for this equivalent SQL
SELECT
    table_x.*,
    table_y.*
FROM table_x
LEFT JOIN table_y
    ON table_x.column = table_y.column
    AND table_x.column2 = table_y.column2
WHERE
    table_y.column2 IS NULL
right_only for this equivalent SQL
SELECT
    table_x.*,
    table_y.*
FROM table_y
LEFT JOIN table_x
    ON table_y.column = table_x.column
    AND table_y.column2 = table_x.column2
WHERE
    table_x.column2 IS NULL
import pandas

def create_dataframe_joined_diffs(dataframe_prod, dataframe_new, columns_join):
    """
    Set the indices to columns_join
    Concat the dataframes and remove duplicates
    Select the diff representative from each dataset
    Perform an outer join on the indices
    Pseudo-SQL:
        SELECT
            UNIQUE(*)
        FROM dataframe_prod
        OUTER JOIN dataframe_new
        ON columns_join
    """
    data_new = dataframe_new.set_index(columns_join)
    data_prod = dataframe_prod.set_index(columns_join)
    # Get any row not in both (may be removing too many)
    data_diff = pandas.concat([data_prod, data_new]).drop_duplicates(keep=False)
    # Select the diff representative from each dataset
    x1 = data_prod[data_prod.index.isin(data_diff.index)]
    x2 = data_new[data_new.index.isin(data_diff.index)]
    # Perform an outer join and keep the joined indices from each set
    # Sort the columns to make them easier to compare
    data_compare = x1.merge(x2, how='outer', indicator=True, left_index=True, right_index=True).sort_index(axis=1)
    return data_compare
mask_left = dataframe_compare['_merge'] == 'left_only'
mask_right = dataframe_compare['_merge'] == 'right_only'
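For context, a minimal usage sketch, assuming dataframe_prod and dataframe_new are the two tables being compared (the only_in_prod / only_in_new names are introduced here just for illustration):
columns_join = ['column', 'column2']
dataframe_compare = create_dataframe_joined_diffs(dataframe_prod, dataframe_new, columns_join)
mask_left = dataframe_compare['_merge'] == 'left_only'    # rows with no match in dataframe_new
mask_right = dataframe_compare['_merge'] == 'right_only'  # rows with no match in dataframe_prod
only_in_prod = dataframe_compare[mask_left]
only_in_new = dataframe_compare[mask_right]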
I am having issues pulling in null values in my query. I am looking for patients who have a specific document name in their chart, but I also want to show patients who do not have this document name. Right now my code is only pulling in the patients with the document name History and Physical (Transcription), but I need to see the NULL values as well. Below is a snip of my code:
SELECT CV3ClientVisit.ClientDisplayName, CV3ClientVisit.CurrentLocation, CV3ClientVisit.IDCode,
       CV3ClientVisit.VisitIDCode, CV3ClientVisit.VisitStatus, CV3ClientVisit.TypeCode,
       CV3ClientDocumentCUR.DocumentName
FROM CV3ClientVisit
INNER JOIN CV3ClientDocumentCUR
    ON CV3ClientVisit.GUID = CV3ClientDocumentCUR.ClientVisitGUID
WHERE (CV3ClientVisit.VisitStatus = 'ADM')
  AND (CV3ClientVisit.TypeCode = 'INPATIENT ADMIT')
  AND (CV3ClientDocumentCUR.DocumentName = 'History & Physical (transcription)'
       OR CV3ClientDocumentCUR.DocumentName IS NULL)
Use a LEFT JOIN with the condition in the ON clause:
SELECT cv.ClientDisplayName, cv.CurrentLocation, cv.IDCode,
cv.VisitIDCode, cv.VisitStatus, cv.TypeCode, cd.DocumentName
FROM CV3ClientVisit cv LEFT JOIN
CV3ClientDocumentCUR cd
ON cv.GUID = cd.ClientVisitGUID AND
cd.DocumentName = 'History & Physical (transcription)'
WHERE cv.VisitStatus = 'ADM' AND
cv.TypeCode = 'INPATIENT ADMIT' ;
I also added table aliases to simplify the query.
I have two Django models:
class ModelA(models.Model):
    title = models.CharField(..., db_column='title')
    text_a = models.CharField(..., db_column='text_a')
    other_column = models.CharField(..., db_column='other_column_a')

class ModelB(models.Model):
    title = models.CharField(..., db_column='title')
    text_a = models.CharField(..., db_column='text_b')
    other_column = None
Then I want to merge the two querysets of these models using union:
ModelA.objects.all().union(ModelB.objects.all())
But in the query I see:
(SELECT
`model_a`.`title`,
`model_a`.`text_a`,
`model_a`.`other_column`
FROM `model_a`)
UNION
(SELECT
`model_b`.`title`,
`model_b`.`text_b`
FROM `model_b`)
Of course, I get the exception The used SELECT statements have a different number of columns.
How can I create the aliases and fake columns needed to use a union query?
You can annotate your last column to make up for the column-number mismatch.
from django.db.models import CharField, Value

a = ModelA.objects.values_list('text_a', 'title', 'other_column')
b = ModelB.objects.values_list('text_a', 'title').annotate(
    other_column=Value("Placeholder", CharField())
)

# for a list of tuples
a.union(b)

# or if you want a list of dicts
# (this has to be the values of the base query, in this case a)
a.union(b).values('text_a', 'title', 'other_column')
In the SQL query, we can use NULL to fill in the remaining columns/aliases:
(SELECT
`model_a`.`title`,
`model_a`.`text_a`,
`model_a`.`other_column`
FROM `model_a`)
UNION
(SELECT
`model_b`.`title`,
`model_b`.`text_b`,
NULL
FROM `model_b`)
In Django, union operations need to have the same columns, so with values() you can select just those shared columns, like this:
qsa = ModelA.objects.all().values('text_a', 'title')
qsb = ModelB.objects.all().values('text_a', 'title')
qsa.union(qsb)
But there is no way (that I know of) to mimic NULL in a union in Django, so there are two ways you can proceed here.
First, add an extra field named other_column to your model. You can leave its values empty like this:
other_column = models.CharField(max_length=255, null=True, default=None)
and then use the Django queryset union operation as described here.
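As a rough sketch (assuming other_column has been added to ModelB as above, so both models now expose the same three fields), the union could then look like this:
qsa = ModelA.objects.values('text_a', 'title', 'other_column')
qsb = ModelB.objects.values('text_a', 'title', 'other_column')  # other_column is now a real, nullable field
merged = qsa.union(qsb)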
The second approach is a bit more Pythonic. Try something like this:
a = ModelA.objects.values_list('text_a', 'title', 'other_column')
b = ModelB.objects.values_list('text_a', 'title')

union_list = list(a)
for row in b:
    # pad ModelB rows with None so both sides have the same number of columns
    padded = row + (None,)
    if padded not in union_list:
        union_list.append(padded)
Hope it helps!!
I have a Rails join table between two models, superhero and superpower. Now I have three different superpower ids, and I want all the superheroes which have all of the selected superpowers.
To do that I'm trying the following:
matches = Superhero.all
matches = matches.joins(:superpowers).where('superpowers.id = ?', 17).where('superpowers.id = ?', 12).where('superpowers.id = ?', 6)
But this gives me an empty result, even though my join table contains superheroes which have all of the given superpowers.
The query generated from the above is:
SELECT "superheroes".* FROM "superheroes" INNER JOIN "superheroes_superpowers" ON "superheroes_superpowers"."superhero_id" = "superheroes"."id" INNER JOIN "superpowers" ON "superpowers"."id" = "superheroes_superpowers"."superpower_id" WHERE (superpowers.id = 17) AND (superpowers.id = 17) AND (superpowers.id = 12) AND (superpowers.id = 6)
So, weirdly, it checks for the superpower with id 17 twice (though I don't think that should affect the result), and the rest of the query looks correct to me.
Try using an IN clause:
superpowers_ids = [17,12,6]
matches = Superhero.all
matches = matches.joins(:superpowers).where('superpowers.id in (?)', superpowers_ids)
Superhero.joins(:superpowers).where(superpowers: { id: [17,12,6] } )
This gives the following SQL query (formatted for readability):
SELECT "superheros".*
FROM "superheros"
INNER JOIN "superhero_superpowers" ON "superhero_superpowers"."superhero_id" = "superheros"."id"
INNER JOIN "superpowers" ON "superpowers"."id" = "superhero_superpowers"."superpower_id"
WHERE "superpowers"."id" IN (17, 12, 6)
SELECT F.* FROM FlightSchedule F, Aircrafts A
WHERE F.aircraftType = A.aircraftType
LIKE CONCAT('\"','%', F.aircraftType, '%','\"') AND F.flightNum_arr='3913';
SAMPLE CONTENT OF DB TABLES:
Table "Schedule"
aircraftType = "320"
Table "Aircrafts"
aircraftType = "A320"
aircraftType = "A330"
The expected result of the query is the selection of the entry that has aircraftType = "320" and flightNum_arr = "3913", because "320" should match "A320". The problem is that "320" and "A320" are not treated as a match in this query. How can I fix this?
Either use = for an exact match or LIKE for a pattern match, but don't put both of them into the same expression. And you don't need to concatenate quotes into the LIKE pattern:
SELECT F.*
FROM FlightSchedule F
JOIN Aircrafts A
ON A.aircraftType LIKE CONCAT('%', F.aircraftType, '%')
WHERE F.flightNum_arr = '3913'