Combine table data in MySQL using JOIN - mysql

I'm trying to join two tables in MySQL. In the first I have a set of IDs (of the form GTEX-14BMU-1526-SM-5TDE6) and a set of tissue types (SMTS). I have to select the IDs for the tissue type 'Blood' (SMTS is another column of the same table), then take only the first two hyphen-separated parts of the ID (GTEX-14BMU) and make a list of the distinct ones.
Then I have to compare this to a second table in which I have a list of IDs that already are in the type (GTEX-14BMU) which have to meet the condition that the column sex of this same table is 2.
The expected result is a list with the IDs which are sex type 2 and have tissue type 'Blood', meaning the ones that are coinciding. I'm trying to solve this by using JOIN and all the needed conditions in the same statement, which is:
mysql> SELECT DISTINCT SUBSTRING_INDEX(g.SAMPID,'-',2) AS sampid, m.SUBJID, g.SMTS, m.SEX
-> FROM GTEX_Sample AS g
-> JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
-> WHERE m.SEX=2
-> AND g.SMTS='Blood';
But I'm either getting too many results from the combination of all possibilities or I'm getting an empty set. Is there any other way to do this?

Here:
JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
I suspect that your intent is to refer to the substring_index() expression that is defined in the select clause (which is aliased sampid as well). In SQL, you can't reuse an alias defined in the select clause in the same scope (with a few exceptions, such as the ORDER BY clause, or the GROUP BY clause in MySQL). So the database thinks you are referring to column sampid of the sample table. If you had given it a different alias (say sampid_short) and tried to use that in the ON clause of the join, you would have got an "unknown column" error.
You need to either repeat the expression, or use a subquery:
select distinct substring_index(g.sampid, '-', 2) as sampid, m.subjid, g.smts, m.sex
from GTEX_Sample as g
inner join GTEX_Pheno as m on substring_index(g.sampid, '-', 2) = m.subjid
where m.sex = 2 and g.smts = 'Blood';
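For the subquery route mentioned above, here is a minimal sketch. It runs against an in-memory SQLite database from Python so it can be tried without a MySQL server; SQLite has no SUBSTRING_INDEX, so a small stand-in function is registered (positive counts only). The sample rows are made up for illustration. The point is that the alias subj is defined inside the derived table, so the outer ON clause is allowed to use it.

```python
import sqlite3

def substring_index(s, delim, count):
    # Stand-in for MySQL's SUBSTRING_INDEX (assumption: positive count only).
    return delim.join(s.split(delim)[:count])

conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.executescript("""
CREATE TABLE GTEX_Sample (SAMPID TEXT, SMTS TEXT);
CREATE TABLE GTEX_Pheno (SUBJID TEXT, SEX INTEGER);
-- Made-up rows, only the first SAMPID comes from the question.
INSERT INTO GTEX_Sample VALUES
  ('GTEX-14BMU-1526-SM-5TDE6', 'Blood'),
  ('GTEX-14BMU-0002-SM-AAAAA', 'Blood'),
  ('GTEX-11111-0001-SM-BBBBB', 'Blood'),
  ('GTEX-22222-0001-SM-CCCCC', 'Lung');
INSERT INTO GTEX_Pheno VALUES
  ('GTEX-14BMU', 2), ('GTEX-11111', 1), ('GTEX-22222', 2);
""")

# The alias "subj" lives in the derived table, so the outer join can see it.
query = """
SELECT DISTINCT s.subj, m.SEX
FROM (
    SELECT SUBSTRING_INDEX(SAMPID, '-', 2) AS subj
    FROM GTEX_Sample
    WHERE SMTS = 'Blood'
) AS s
JOIN GTEX_Pheno AS m ON m.SUBJID = s.subj
WHERE m.SEX = 2
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The SQL inside the query string should carry over to MySQL unchanged, since SUBSTRING_INDEX is built in there.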

Related

Create table from joined table in MySQL

I have no problem joining the tables, but when I go to create a new table using the joined tables, I get an error saying that I have duplicate columns.
My code:
SELECT *
FROM field
INNER JOIN race
ON field.raceID = race.raceID;
Updated code:
CREATE TABLE fieldrace AS
SELECT f.*, r.*
FROM field f
INNER JOIN race r
ON f.raceID = r.raceID;
That's true of any select. If there are duplicated column names, you have to reference them somehow. For a .* query this would work:
SELECT f.*, r.*
FROM field f
INNER JOIN race r
ON f.raceID = r.raceID;
Individually you can also add aliases. Maybe you have an id column in both race and field tables.
SELECT f.id as field_id, r.id as race_id, ....
FROM field f
INNER JOIN race r
ON f.raceID = r.raceID;
In the query
CREATE TABLE fieldrace AS
SELECT f.*, r.*
FROM field f
INNER JOIN race r
ON f.raceID = r.raceID;
the SELECT part produces two columns with the same name in the output.
A table's structure cannot contain two columns with the same name, so the whole query fails.
The general solution is to list each output column in the SELECT part separately and assign each one a unique alias.
If the raceID column used for joining is the only column whose name conflicts, then you may use either a USING clause instead of the ON clause, or NATURAL JOIN instead of INNER JOIN.
CREATE TABLE fieldrace AS
SELECT *
FROM field f
INNER JOIN race r USING (raceID);
-- or
CREATE TABLE fieldrace AS
SELECT *
FROM field f
NATURAL INNER JOIN race r;
In both cases the conflicting columns are collapsed into one column, which is placed at the top of the created table's structure. Note that this collapsing applies to SELECT *, not to f.*, r.*.
Of course, when raceID is not the only column whose name conflicts, the 1st of these queries will fail because of the other duplicated column, whereas the 2nd query will use all conflicting columns for joining.
You may specify a complete or partial structure for the newly created table. In this case the number and relative position of the columns coming from the SELECT do not change (they match the SELECT output), but all other column properties (data type, nullability, etc.) and additional objects (indexes, constraints, etc.) listed in the structure are applied. Columns that are absent from the output (including generated ones) are added to the structure with their default values as the very first columns, even before the columns used in USING or by NATURAL JOIN.
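The "list each column with a unique alias" approach can be sketched as follows. This is only an illustration: it uses Python's sqlite3 with an in-memory database instead of MySQL, and the name column in both tables is a hypothetical second conflict that is not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schemas: both tables share raceID and name.
CREATE TABLE field (raceID INTEGER, name TEXT);
CREATE TABLE race (raceID INTEGER, name TEXT);
INSERT INTO field VALUES (1, 'Comet'), (1, 'Dash'), (2, 'Echo');
INSERT INTO race VALUES (1, 'Derby'), (2, 'Oaks');

-- Every output column gets its own unique name.
CREATE TABLE fieldrace AS
SELECT f.raceID AS raceID,
       f.name   AS field_name,
       r.name   AS race_name
FROM field f
INNER JOIN race r ON f.raceID = r.raceID;
""")

# Column names of the new table, in creation order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(fieldrace)")]
rows = conn.execute("SELECT * FROM fieldrace ORDER BY field_name").fetchall()
print(columns)
print(rows)
```

Be aware that SQLite silently renames duplicate columns in CREATE TABLE ... AS SELECT where MySQL raises an error, so only the aliased form is shown here.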
You can also create a view, or name a subquery using WITH.
In both cases, you can refer to it from anywhere in your main query.

SELECT from many different tables and JOIN

I need to combine a few different columns from different tables.
These are listed here. I just can't seem to get the syntax right. I'm a beginner so be patient with me!
The tables are 'report', 'mission' and 'hist_unit'
and the following values are the same
mission.id = report.mission_id
hist_unit.id = report.deployed_unit_id
Tried something along these lines
SELECT
mission_id AS mission_id,
deployed_unit_id AS depl_unit_id,
accepted AS accepted,
character_id AS character_id,
pilot_status AS pilot_status
FROM report
id AS depl_unit_id
faction AS faction
FROM hist_unit
mission.id AS mission_id
hist_date AS hist_date
FROM mission
What I want this query to do is putting together the columns shown above and checking that the values shown at the top correspond to each other.
Then I want it to show me only the lines where faction = 3 and accepted = 1.
Then I want it to show me only the entries
WHERE hist_date BETWEEN '1941-11-15 00:00:00.000' AND '1942-04-15 23:59:59:999'
Output should be something like this
mission_id,depl_unit_id,faction,character_id,pilot_status,accepted,hist_date
Something like this:
SELECT r.id AS report_id
     , r.mission_id AS mission_id
     , r.deployed_unit_id AS depl_unit_id
     , r.accepted AS accepted
     , r.character_id AS character_id
     , r.pilot_status AS pilot_status
     , h.id AS h_depl_unit_id
     , h.faction AS faction
     , m.id AS m_mission_id
     , m.hist_date AS hist_date
  FROM report r
  JOIN hist_unit h
    ON h.id = r.deployed_unit_id
  JOIN mission m
    ON m.id = r.mission_id
    -- hist_date is taken from mission, as listed in the question
 WHERE m.hist_date >= '1941-11-15'
   AND m.hist_date < '1942-04-15' + INTERVAL 1 DAY
   AND h.faction = 3
   AND r.accepted = 1
 ORDER
    BY r.id
     , r.mission_id
After the SELECT keyword, list all of the expressions to be returned.
The FROM clause references the tables. Multiple tables should be separated by the JOIN keyword, each join followed by an ON clause (or a USING clause). (The INNER keyword is not required; the CROSS keyword is not required either, but can be included when there is no join condition, as an aid to the future reader; outer joins require the addition of the LEFT or RIGHT keyword.)
Consider assigning an alias to each table reference.
Qualify all column references with the assigned table alias (or the table name, if an alias is not assigned.)
When the AS column_name is omitted for a column in the SELECT list, the column name is used. For example, we could omit AS accepted and that fourth column would still have the name accepted.
Avoid returning multiple columns with the same name. It's not illegal to do that, but consider modifying the column aliases (column names in the resultset) to be unique, i.e. one of the depl_unit_id columns can be renamed.
When comparing datetime values in ranges, strongly consider using >= and < comparisons. Don't try to muck with "less than or equal to 23:59:59.999": if the precision is microseconds, that leaves a small gap. Let's avoid the gap, and just do a "less than" midnight of the next day, the first datetime value we want to exclude. (Yes, we have to specify the column name / expression twice, once for each comparison, because there is no BETWEEN_GE_LT comparison operator that substitutes a "less than" comparison for the "less than or equal to" comparison of the BETWEEN operator. That's a small price to pay for a more explicitly accurate representation of what we are attempting to achieve.)
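To make the gap concrete, here is a small sketch using Python's sqlite3 with made-up rows (SQLite stores these datetimes as ISO text, so the comparison is textual, but the effect is the same as with a microsecond-precision DATETIME column in MySQL). Row 2 falls in the last half millisecond of 1942-04-15: the BETWEEN version misses it, while the half-open range keeps it. The upper bound is computed in Python because `+ INTERVAL 1 DAY` is MySQL syntax.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mission (id INTEGER, hist_date TEXT)")
conn.executemany("INSERT INTO mission VALUES (?, ?)", [
    (1, "1941-11-15 00:00:00.000000"),  # first instant of the range
    (2, "1942-04-15 23:59:59.999500"),  # inside the last millisecond
    (3, "1942-04-16 00:00:00.000000"),  # first instant to exclude
])

start = date(1941, 11, 15)
last_day = date(1942, 4, 15)
upper = last_day + timedelta(days=1)  # midnight of the next day

# Half-open range: >= start AND < day after the last day.
half_open = [r[0] for r in conn.execute(
    "SELECT id FROM mission WHERE hist_date >= ? AND hist_date < ? ORDER BY id",
    (start.isoformat(), upper.isoformat()))]

# Closed range with a hand-written last instant.
between = [r[0] for r in conn.execute(
    "SELECT id FROM mission WHERE hist_date BETWEEN ? AND ? ORDER BY id",
    ("1941-11-15 00:00:00.000", "1942-04-15 23:59:59.999"))]

print(half_open, between)
```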

SQL: How do add a prefix to column names in a JOIN?

I have sql query like this
SELECT * FROM phlegm WHERE JOIN mucus ON phlegm.id = mucus.id JOIN snot ON phlegm.id = snot.id
The problem is those tables contain several columns with identical names.
For example all 3 tables contain the column named test
If I retrieve the result of the query in PHP, then I will only get one value named test ($query->get_result()->fetch_object()->test;), because the other two get overwritten.
Is there some way to edit that query so that it adds a prefix to all columns from a table? For example, column test from table mucus would be referenced in the query as mucus_test and column test from phlegm would be phlegm_test.
One way would be doing
SELECT phlegm.test as phlegm_test, mucus.test as mucus_test FROM phlegm...
But I have a LOT of columns and tables and it would make the query longer than the Great Wall of China if I had to name each field one by one.
So is there some way to add the prefix en masse?
SELECT *, phlegm.test as phlegm_test, mucus.test as mucus_test FROM phlegm...
The query below uses table aliases to retrieve all values from all three tables. If you want only specific columns, reference them as alias_name.column_name instead of p.*, where * means all columns of the table the alias refers to (i.e. p refers to phlegm).
SELECT p.*, m.*, s.*
FROM phlegm p
JOIN mucus m ON p.id = m.id
JOIN snot s ON p.id = s.id;
I removed the WHERE from your original query above, not sure why it was there.
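SQL itself has no syntax for prefixing every column of a table, so if typing the aliases by hand is the problem, one option is to generate the select list from the table metadata. The sketch below does that with Python's sqlite3 and PRAGMA table_info; with MySQL the column names would instead come from information_schema.columns. The helper name prefixed_select_list and the sample rows are made up for illustration, and the table names are interpolated into the SQL, so they must come from your own code and never from user input.

```python
import sqlite3

def prefixed_select_list(conn, tables):
    """Build 'tbl.col AS tbl_col' for every column of every given table."""
    parts = []
    for table in tables:
        # Row index 1 of PRAGMA table_info is the column name.
        cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        parts += [f"{table}.{col} AS {table}_{col}" for col in cols]
    return ", ".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phlegm (id INTEGER, test TEXT);
CREATE TABLE mucus (id INTEGER, test TEXT);
CREATE TABLE snot (id INTEGER, test TEXT);
INSERT INTO phlegm VALUES (1, 'p');
INSERT INTO mucus VALUES (1, 'm');
INSERT INTO snot VALUES (1, 's');
""")

select_list = prefixed_select_list(conn, ["phlegm", "mucus", "snot"])
cursor = conn.execute(
    f"SELECT {select_list} FROM phlegm "
    "JOIN mucus ON phlegm.id = mucus.id "
    "JOIN snot ON phlegm.id = snot.id"
)
names = [d[0] for d in cursor.description]  # result column names
record = dict(zip(names, cursor.fetchone()))
print(record)
```

Every value now has its own key, so nothing gets overwritten when the row is turned into an object or dictionary.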

MySQL - Left joins without duplicate columns

I have an issue with some of the join statements I'm trying to use. I have two tables that need to be joined, with both featuring all of their information. They're as follows.
INSTITUTION
IName | ALocation_ID | IPicture
ADDRESS
ALocation_ID | AStreet | AZip | ...(other relevant fields)
I've been trying to use:
CREATE VIEW InstitutionView
AS SELECT * FROM INSTITUTION
LEFT JOIN ADDRESS
ON INSTITUTION.ALocation_ID=ADDRESS.ALocation_ID;
but the error I receive says something about duplicate columns. What am I doing wrong?
You will have to select the columns individually. Hopefully this helps you out a little.
CREATE VIEW InstitutionView
AS
SELECT INSTITUTION.IName,INSTITUTION.ALocation_ID,INSTITUTION.IPicture,ADDRESS.AStreet,ADDRESS.AZip
FROM INSTITUTION
LEFT JOIN ADDRESS
ON INSTITUTION.ALocation_ID=ADDRESS.ALocation_ID;
That is because ALocation_ID column is present in both the tables.
Try creating the view explicitly naming the required columns.
CREATE VIEW InstitutionView
AS SELECT Iname,INSTITUTION.ALocation_ID,IPicture,AStreet,AZip ...
FROM INSTITUTION
LEFT JOIN ADDRESS
ON INSTITUTION.ALocation_ID=ADDRESS.ALocation_ID;
Conceptually, JOIN first creates an intermediate cross-product where the columns are referenced by a table name or alias dotted by a column name from that table; then ON and WHERE filter out rows that don't match to give a second intermediate result. If a column name appears only in one table then you can leave out the table & dot to refer to the column.
MySQL 5.6 Reference Manual :: 13.2.9 SELECT Syntax
You can refer to a column as col_name, tbl_name.col_name, or db_name.tbl_name.col_name. You need not specify a tbl_name or db_name.tbl_name prefix for a column reference unless the reference would be ambiguous. See Section 9.2.1, “Identifier Qualifiers”, for examples of ambiguity that require the more explicit column reference forms.
MySQL 5.6 Reference Manual :: 9.2.1 Identifier Qualifiers
Suppose that tables t1 and t2 each contain a column c, and you retrieve c in a SELECT statement that uses both t1 and t2. In this case, c is ambiguous because it is not unique among the tables used in the statement. You must qualify it with a table name as t1.c or t2.c to indicate which table you mean.
Hence:
CREATE VIEW InstitutionView
AS SELECT IName,I.ALocation_ID,IPicture,AStreet,AZip,...
FROM INSTITUTION I
LEFT JOIN ADDRESS A
ON I.ALocation_ID=A.ALocation_ID;
You might think that if a JOIN is ON or WHERE "=" then there would be no ambiguity. However:
In the case of INNER JOIN, if there were no implicit conversions then columns compared equal would have the same value; but otherwise different values can compare "=". So you can't use just the column name to identify one value.
Moreover for LEFT JOIN, unmatched rows in the left table are extended by NULLs and are added to give a third intermediate result; so in a row a non-NULL in a column in one table can appear with a NULL in the same column of the other table. So again you can't use just the column name to identify one value.
Moreover, a JOIN need not test the equality of two columns at all, or even mention either of the columns that share a name. So the result can have two columns (one from each input table) sharing a name where there is no expectation of equality.
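The LEFT JOIN case can be seen in a tiny example. This is a sketch with made-up rows, run through Python's sqlite3 rather than MySQL: the institution with ALocation_ID 7 has no address, so the two ALocation_ID columns hold different values in that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE INSTITUTION (IName TEXT, ALocation_ID INTEGER, IPicture TEXT);
CREATE TABLE ADDRESS (ALocation_ID INTEGER, AStreet TEXT, AZip TEXT);
-- Made-up rows: location 7 has no matching ADDRESS row.
INSERT INTO INSTITUTION VALUES ('Museum', 1, 'm.png'), ('Library', 7, 'l.png');
INSERT INTO ADDRESS VALUES (1, '1 Main St', '00001');
""")

rows = conn.execute("""
SELECT I.IName,
       I.ALocation_ID AS i_loc,   -- value from the left table
       A.ALocation_ID AS a_loc    -- NULL when no address matched
FROM INSTITUTION I
LEFT JOIN ADDRESS A ON I.ALocation_ID = A.ALocation_ID
ORDER BY I.IName
""").fetchall()
print(rows)
```

For 'Library' the left column is 7 and the right one is NULL (None in Python), which is why an unqualified ALocation_ID cannot identify a single value.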

Using an INNER JOIN without returning any columns from the joined table

Running an INNER JOIN type of query, I get duplicate column names, which can pose a problem. This has been covered here extensively, and I was able to find the solution, which is fairly logical: SELECT only the columns I need.
However, I would like to know how I could run such a query without actually returning any of the columns from the joined table.
This is my MySQL query
SELECT * FROM product z
INNER JOIN crosslink__productXmanufacturer a
ON z.id = a.productId
WHERE
(z.title LIKE "%search_term%" OR z.search_keywords LIKE "%search_term%")
AND
z.availability = 1
AND
a.manufacturerId IN (22,23,24)
Question
How would I modify this MySQL query in order to return only columns from product and none of the columns from crosslink__productXmanufacturer?
Add the table name to the *. Replace
SELECT * FROM product z
with
SELECT z.* FROM product z
Often when you are doing this, the intention may be clearer using in or exists rather than a join. The join is being used for filtering, so putting the condition in the where clause makes sense:
SELECT p.*
FROM product p
WHERE (p.title LIKE '%search_term%' OR p.search_keywords LIKE '%search_term%') AND
      p.availability = 1 AND
      EXISTS (SELECT 1
              FROM crosslink__productXmanufacturer pxm
              WHERE pxm.productId = p.id AND pxm.manufacturerId IN (22, 23, 24)
             );
With the proper indexes, this should run at least as fast as the join version (the index is crosslink__productXmanufacturer(productId, manufacturerId)). In addition, you don't have to worry about returning duplicate records if there are multiple matches in crosslink__productXmanufacturer.
You may notice two other small changes I made to the query. First, the table aliases are abbreviations of the table names, making the logic easier to follow. Second, the string constants use single quotes (the ANSI standard) rather than double quotes. Using single quotes only for string and date constants helps prevent inadvertent syntax errors.
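The point about duplicate records can be checked with a small sketch. It uses Python's sqlite3 with made-up rows and only a subset of the product columns: product 1 is linked to two of the listed manufacturers, so the join returns it twice while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER, title TEXT, availability INTEGER);
CREATE TABLE crosslink__productXmanufacturer (productId INTEGER, manufacturerId INTEGER);
-- Made-up rows: product 1 matches manufacturers 22 and 23.
INSERT INTO product VALUES (1, 'red widget', 1), (2, 'blue widget', 1);
INSERT INTO crosslink__productXmanufacturer VALUES (1, 22), (1, 23), (2, 99);
""")

# Join version: one output row per matching crosslink row.
join_rows = conn.execute("""
SELECT z.id
FROM product z
INNER JOIN crosslink__productXmanufacturer a ON z.id = a.productId
WHERE z.title LIKE '%widget%'
  AND z.availability = 1
  AND a.manufacturerId IN (22, 23, 24)
ORDER BY z.id
""").fetchall()

# EXISTS version: each product appears at most once.
exists_rows = conn.execute("""
SELECT p.id
FROM product p
WHERE p.title LIKE '%widget%'
  AND p.availability = 1
  AND EXISTS (SELECT 1
              FROM crosslink__productXmanufacturer pxm
              WHERE pxm.productId = p.id
                AND pxm.manufacturerId IN (22, 23, 24))
ORDER BY p.id
""").fetchall()
print(join_rows, exists_rows)
```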