I'm trying to join two tables in MySQL. The first has a set of IDs (of the form GTEX-14BMU-1526-SM-5TDE6) and, in another column of the same table, a tissue type (SMTS). I have to select the IDs whose tissue type is 'Blood', then take only the first two hyphen-separated parts of each ID (GTEX-14BMU) and make a list of the distinct values.
Then I have to compare this list to a second table, which holds IDs that are already in the short form (GTEX-14BMU), keeping only the rows where the sex column of that table is 2.
The expected result is a list of the IDs that have sex 2 and tissue type 'Blood', meaning the ones that appear in both. I'm trying to solve this with a JOIN and all the needed conditions in the same statement, which is:
mysql> SELECT DISTINCT SUBSTRING_INDEX(g.SAMPID,'-',2) AS sampid, m.SUBJID, g.SMTS, m.SEX
-> FROM GTEX_Sample AS g
-> JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
-> WHERE m.SEX=2
-> AND g.SMTS='Blood';
But I'm either getting too many results from the combination of all possibilities or I'm getting an empty set. Is there any other way to do this?
Here:
JOIN GTEX_Pheno AS m ON sampid=m.SUBJID
I suspect that your intent is to refer to the substring_index() expression that is defined in the select clause (which is aliased sampid as well). In SQL, you can't reuse an alias defined in the select clause in the same scope (with a few exceptions, such as the ORDER BY clause, or the GROUP BY clause in MySQL). So the database thinks you are referring to column sampid of the sample table. If you had given a different alias (say sampid_short) and tried to use it in the ON clause of the join, you would have gotten an unknown-column error.
You need to either repeat the expression, or use a subquery:
select substring_index(g.sampid, '-', 2) as sampid, m.subjid, g.smts, m.sex
from gtex_sample as g
inner join gtex_pheno as m on substring_index(g.sampid, '-', 2) = m.subjid
where m.sex = 2 and g.smts = 'Blood';
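The subquery alternative mentioned above is not shown, so here is a rough sketch of it. It runs on SQLite through Python's sqlite3 module purely so it can be executed without a MySQL server; SQLite has no SUBSTRING_INDEX, so a small Python stand-in is registered for it. The sample rows and the alias subj are made up for illustration. The inner SELECT computes the short ID once, and the outer query can then use that alias in the ON clause.

```python
import sqlite3

def substring_index(s, delim, count):
    # Rough stand-in for MySQL's SUBSTRING_INDEX (SQLite has no such function).
    parts = s.split(delim)
    return delim.join(parts[:count] if count >= 0 else parts[count:])

conn = sqlite3.connect(":memory:")
conn.create_function("SUBSTRING_INDEX", 3, substring_index)
conn.executescript("""
    CREATE TABLE GTEX_Sample (SAMPID TEXT, SMTS TEXT);
    CREATE TABLE GTEX_Pheno  (SUBJID TEXT, SEX INTEGER);
    -- Hypothetical sample rows, not real data.
    INSERT INTO GTEX_Sample VALUES
        ('GTEX-14BMU-1526-SM-5TDE6', 'Blood'),
        ('GTEX-14BMU-0002-SM-AAAAA', 'Blood'),
        ('GTEX-1A2B3-0001-SM-BBBBB', 'Blood'),
        ('GTEX-9XYZ9-0001-SM-CCCCC', 'Lung');
    INSERT INTO GTEX_Pheno VALUES
        ('GTEX-14BMU', 2), ('GTEX-1A2B3', 1), ('GTEX-9XYZ9', 2);
""")

# The derived table s exposes the computed value as a real column (subj),
# so the join condition can refer to it by name.
rows = conn.execute("""
    SELECT DISTINCT s.subj
    FROM ( SELECT SUBSTRING_INDEX(g.SAMPID, '-', 2) AS subj, g.SMTS
           FROM GTEX_Sample AS g ) AS s
    JOIN GTEX_Pheno AS m ON s.subj = m.SUBJID
    WHERE m.SEX = 2 AND s.SMTS = 'Blood'
    ORDER BY s.subj
""").fetchall()
print(rows)
```

The SELECT statement itself should carry over to MySQL unchanged, where SUBSTRING_INDEX is built in.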
I'm trying to find out if the code below is in the right format to retrieve the yearly sum of payments:
select sum(payment)
select mem_type.mtype, member.name, payment.payment_amt
from mem_type, member, payment
where mem_type.mtype = member.mtype
and member.mem_id = payment.mem_id
group by mem_id
having payment.date > '2014-1-1' <'2014-12-31';
There are a few problems with the statement.
The keyword SELECT appears twice, and that's not valid the way you have it. (A SELECT keyword is needed in a subquery or an inline view, but otherwise it's not valid to repeat the keyword SELECT.)
The predicate in the HAVING clause isn't quite right. (MySQL may accept that as valid syntax, but it's not doing what you want it to do.) To return rows that have a payment.date in a specific year, we'd typically specify that as predicates in the WHERE clause:
WHERE payment.date >= '2014-01-01'
AND payment.date < '2015-01-01'
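As a small illustration of why the upper bound is written as < '2015-01-01' rather than <= '2014-12-31', here is a sketch that runs on SQLite via Python's sqlite3 module (a stand-in so it can be executed anywhere). It assumes the date column also carries a time of day; the rows are invented. A payment made on the evening of December 31 falls outside the closed range but inside the half-open one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (mem_id INTEGER, payment_amt REAL, date TEXT);
    -- Hypothetical rows; timestamps stored as ISO-8601 text.
    INSERT INTO payment VALUES
        (1, 10.0, '2013-12-31 23:59:59'),
        (1, 20.0, '2014-01-01 00:00:00'),
        (1, 30.0, '2014-12-31 18:30:00'),
        (1, 40.0, '2015-01-01 00:00:00');
""")

# Half-open range: start of 2014 inclusive, start of 2015 exclusive.
half_open = conn.execute("""
    SELECT SUM(payment_amt) FROM payment
    WHERE date >= '2014-01-01' AND date < '2015-01-01'
""").fetchone()[0]

# Closed range: silently drops anything after midnight on 2014-12-31.
closed = conn.execute("""
    SELECT SUM(payment_amt) FROM payment
    WHERE date >= '2014-01-01' AND date <= '2014-12-31'
""").fetchone()[0]

print(half_open, closed)
```

SQLite compares these values as text while MySQL would compare them as DATETIME values, but the outcome for this data is the same in both: the 18:30 payment is later than '2014-12-31' taken as midnight.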
Also, I'd recommend you ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead, and relocate the join predicates from the WHERE clause to an ON clause. For example:
SELECT ...
FROM member
JOIN mem_type
ON mem_type.mtype = member.mtype
JOIN payment
ON payment.mem_id = member.mem_id
It's good to see that you've qualified all the column references.
Unfortunately, it's not possible to recommend the syntax that will return the resultset you are looking for. There are too many unknowns; we'd just be guessing. An example of the result you want returned, and of the data it should come from, would go a long way towards a specification.
If I had to take a blind "guess" at a query that would meet the ambiguous specification, without any knowledge of the tables, columns, datatypes, et al. my guess would be something like this:
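SELECT m.mem_id
, t.mtype
, m.name
, IFNULL(SUM(p.payment_amt),0) AS total_payments_2014
FROM member m
LEFT
JOIN mem_type t
ON t.mtype = m.mtype
LEFT
JOIN payment p
ON p.mem_id = m.mem_id
-- date range goes in the ON clause; in a WHERE clause it would discard
-- members with no 2014 payments and negate the outer join
AND p.date >= '2014-01-01'
AND p.date < '2014-01-01' + INTERVAL 1 YEAR
GROUP BY m.mem_id, t.mtype, m.name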
This only serves as an example. It is premised on a whole lot of information that isn't provided, e.g.: What is the datatype of the date column in the payment table? Do we want to exclude payments with dates of 1/1 or 12/31? Is the mem_id column unique in the member table? Is the mtype column unique in the mem_type table? Can the mtype column in the member table be NULL? Do we want all rows from the member table returned, or only those that had a payment in 2014? Can the mem_id column in the payment table be NULL? Are there rows in payment that we want included but which aren't related to a member?
I'm trying to return a GROUP_CONCAT across 2 tables, one being my list of schools and the other some numeric data.
For some dates, I have NO DATA at all in the table SimpleData, so my LEFT OUTER JOIN returns 10 results where I have 11 schools (I need 11 rows, in order, for JavaScript processing).
Here is my query (tell me if I need to give more details about the tables):
SELECT A.nomEcole,
A.Compteur,
IFNULL(SUM(B.rendementJour), '0') AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
WHERE jourUS LIKE '2013-07-%'
GROUP BY ecole
In this example, I have no data in SimpleData for this month (no data was recorded at all).
I have to show either NULL or '0' for this missing school, and I'm starting to lose my head over something that is apparently easy :(
Thanks for any help !
olivier
As @Abhik Chakraborty mentioned, the WHERE clause filters out the records that don't match the criteria. Another option is to use a CASE expression:
SELECT A.nomEcole,
A.Compteur,
SUM(CASE WHEN jourUS LIKE '2013-07-%' THEN B.rendementJour ELSE 0 END) AS TOTAL,
B.jourUS,
B.rendementJour
FROM ecoles A
LEFT OUTER JOIN SimpleData B ON A.Compteur = B.compteur
GROUP BY ecole
I suspect you just need to move the where condition to the on clause:
SELECT A.nomEcole, A.Compteur, IFNULL(SUM(B.rendementJour), 0) AS TOTAL,
B.jourUS, B.rendementJour
FROM ecoles A LEFT OUTER JOIN
SimpleData B
ON A.Compteur = B.compteur and B.jourUS >= '2013-07-01' and B.jourUS < '2013-08-01'
GROUP BY A.ecole;
Some other changes:
Don't use single quotes for numeric constants. Single quotes should really only be used for date and string constants.
Don't use like for dates. like is an operation on strings, not dates, and the date has to be implicitly converted to a string. Instead, do direct comparisons on the date ranges you are interested in.
I would also recommend that the table aliases be abbreviations for the tables you are using. This makes the query easier to read. (So e instead of A for ecoles.)
Also note that the values that you are returning for JourUS and RendementJour are indeterminate. If there are multiple rows in the B table that match, then an arbitrary value will be returned. Perhaps you want max() or group_concat() for them.
Your WHERE clause turns the LEFT OUTER JOIN into an INNER JOIN, because the columns of an outer-joined record are NULL, and NULL is never LIKE '2013-07-%'.
This is the reason you must move jourUS LIKE '2013-07-%' to the ON clause, because you only want to join records where jourUS LIKE '2013-07-%' and otherwise outer join a null record.
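To make the difference concrete, here is a minimal sketch with three invented schools, run on SQLite through Python's sqlite3 module (used only as a stand-in for MySQL). The aliases e and d and all of the data are assumptions for the example. The same date filter is applied once in the WHERE clause and once in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ecoles (nomEcole TEXT, Compteur INTEGER);
    CREATE TABLE SimpleData (compteur INTEGER, jourUS TEXT, rendementJour REAL);
    -- Hypothetical data: school A has July rows, B only a June row, C nothing.
    INSERT INTO ecoles VALUES ('Ecole A', 1), ('Ecole B', 2), ('Ecole C', 3);
    INSERT INTO SimpleData VALUES
        (1, '2013-07-01', 5.0), (1, '2013-07-02', 7.0),
        (2, '2013-06-30', 9.0);
""")

# Filter in WHERE: rows where d.jourUS is NULL (or outside July) are removed
# after the join, so schools B and C disappear.
where_version = conn.execute("""
    SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
    FROM ecoles e
    LEFT OUTER JOIN SimpleData d ON e.Compteur = d.compteur
    WHERE d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
    GROUP BY e.nomEcole
    ORDER BY e.nomEcole
""").fetchall()

# Filter in ON: it only decides which SimpleData rows get joined, so every
# school is kept and the ones without July data get a total of 0.
on_version = conn.execute("""
    SELECT e.nomEcole, IFNULL(SUM(d.rendementJour), 0) AS total
    FROM ecoles e
    LEFT OUTER JOIN SimpleData d
      ON e.Compteur = d.compteur
     AND d.jourUS >= '2013-07-01' AND d.jourUS < '2013-08-01'
    GROUP BY e.nomEcole
    ORDER BY e.nomEcole
""").fetchall()

print(where_version)
print(on_version)
```

With this data the WHERE version returns one row and the ON version returns all three schools.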
Is there a way to select rows where one of the columns contains only predefined values, but any number of them?
I've been using this, but it returns any rows where my column contains at least one of the values (which is exactly what it's supposed to do, I know).
But I'm looking for a way to only select rows that have ONLY my keywords in the keyword column.
SELECT *
FROM
`products`.`product`
WHERE
keywords LIKE '%chocolate%'
AND keywords LIKE '%vanilla%';
Example Keywords: chocolate, sugar, milk, oats
Using the above keywords, I would want the first two results returned, but not the last two:
Product1: chocolate, sugar
Product2: chocolate
Product3: chocolate, sugar, milk, oats, bran
Product4: chocolate, sugar, salt
My column contains a comma separated list of all keywords applicable to that product row.
Since you are storing the list as a string containing a comma separated list, rather than as a set, MySQL isn't going to be able to help much with that. When it was inserted into the database, MySQL saw it as a single string. When it's retrieved from the database, MySQL sees it as a single string. When we refer to it in a query, MySQL sees it as a single string.
If the "list" was stored as a standard relational set, with each keyword for a product stored as a separate row in the table, then returning the result set you specified is almost trivial.
For example, if we had this table:
CREATE TABLE product_keyword
( product_id BIGINT UNSIGNED COMMENT 'FK ref products.id'
, keyword VARCHAR(20)
);
With each keyword associated to a particular product as a separate row:
product_id  keyword
----------  ---------
         1  chocolate
         1  sugar
         2  chocolate
         3  bran
         3  chocolate
         3  milk
         3  oats
         3  sugar
         4  chocolate
         4  salt
         4  sugar
Then to find all rows in product that have a keyword other than 'chocolate' or 'vanilla':
SELECT p.id
FROM product p
JOIN product_keyword k
ON k.product_id = p.id
WHERE k.keyword NOT IN ('chocolate','vanilla')
GROUP BY p.id
And to find the opposite set, the rows in product that have no keyword other than those, use the same inner query in an anti-join:
SELECT p.id
FROM product p
LEFT
JOIN ( SELECT j.product_id
FROM product_keyword j
WHERE j.keyword NOT IN ('chocolate','vanilla')
GROUP BY j.product_id
) k
ON k.product_id = p.id
WHERE k.product_id IS NULL
To get products that have at least one of the keywords 'chocolate' or 'vanilla', but that have no other keywords associated, it's the same query as above, but with an additional join:
SELECT p.id
FROM product p
JOIN ( SELECT g.product_id
FROM product_keyword g
WHERE g.keyword IN ('chocolate','vanilla')
GROUP BY g.product_id
) h
ON h.product_id = p.id
LEFT
JOIN ( SELECT j.product_id
FROM product_keyword j
WHERE j.keyword NOT IN ('chocolate','vanilla')
GROUP BY j.product_id
) k
ON k.product_id = p.id
WHERE k.product_id IS NULL
We can unpack those queries, they aren't difficult. Query h returns a list of product_id that have at least one of the keywords, query k returns a list of product_id that have some keyword other than those specified. The "trick" there (if you want to call it that) is the anti-join pattern... doing an outer join to match rows, and include rows that didn't have a match, and a predicate in the WHERE clause that eliminates rows that had a match, leaving the set of rows from product that didn't have a match.
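The h / k combination described above can be tried out with a short sketch. It uses the example rows from the product_keyword table shown earlier, runs on SQLite through Python's sqlite3 module as a stand-in for MySQL, and wraps the query in a hypothetical helper, products_with_only, so the keyword list can be varied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY);
    CREATE TABLE product_keyword (product_id INTEGER, keyword TEXT);
    INSERT INTO product VALUES (1), (2), (3), (4);
    INSERT INTO product_keyword VALUES
        (1, 'chocolate'), (1, 'sugar'),
        (2, 'chocolate'),
        (3, 'bran'), (3, 'chocolate'), (3, 'milk'), (3, 'oats'), (3, 'sugar'),
        (4, 'chocolate'), (4, 'salt'), (4, 'sugar');
""")

def products_with_only(allowed):
    """Ids of products with at least one allowed keyword and no other keyword."""
    marks = ", ".join("?" for _ in allowed)  # one placeholder per keyword
    sql = f"""
        SELECT p.id
        FROM product p
        JOIN ( SELECT g.product_id            -- h: has an allowed keyword
               FROM product_keyword g
               WHERE g.keyword IN ({marks})
               GROUP BY g.product_id ) h
          ON h.product_id = p.id
        LEFT
        JOIN ( SELECT j.product_id            -- k: has some other keyword
               FROM product_keyword j
               WHERE j.keyword NOT IN ({marks})
               GROUP BY j.product_id ) k
          ON k.product_id = p.id
        WHERE k.product_id IS NULL            -- anti-join: keep unmatched rows
        ORDER BY p.id
    """
    # The keyword list is bound twice: once for IN, once for NOT IN.
    return [row[0] for row in conn.execute(sql, list(allowed) * 2)]

narrow = products_with_only(["chocolate", "vanilla"])
wide = products_with_only(["chocolate", "sugar", "milk", "oats"])
print(narrow, wide)
```

With the keyword list from the question (chocolate, sugar, milk, oats), products 1 and 2 come back and products 3 and 4 are excluded, which is the result the question asked for.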
But with the set stored as a "comma separated list" in a single character column, we lose all the advantages of relational algebra; there isn't any easy way to process the list of keywords as a "set".
With the entire list stored as a single string, we've got some horrendous SQL to get the specified result.
One approach to doing the kind of check you specify would be to create a set of all possible "matches", and check those. This is workable for a couple of keywords. For example, to get a list of products that have ONLY the keywords 'vanilla' and/or 'chocolate' (that is, that have at least one of those keywords and do not have any other keyword):
SELECT p.id
FROM product p
WHERE p.keyword_list = 'chocolate'
OR p.keyword_list = 'vanilla'
OR p.keyword_list = 'chocolate,vanilla'
OR p.keyword_list = 'vanilla,chocolate'
But extending that to three, four or five keywords quickly becomes unwieldy (unless the keywords are guaranteed to appear in a particular order). And it's very difficult to check for three out of four keywords.
Another (ugly) approach is to transform the keyword_list into a set, so that we can use queries like the first ones in my answer. But the SQL to do the transformation is limited by an arbitrary maximum number of keywords that can be extracted from the keyword_list.
It's fairly easy to extract the nth element from a comma separated list using some simple SQL string functions. For example, to extract the first five elements from a comma separated list:
SET @l := 'chocolate,sugar,bran,oats';
SELECT NULLIF(SUBSTRING_INDEX(CONCAT(@l,','),',',1),'') AS kw1
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',2),',',-1),'') AS kw2
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',3),',',-1),'') AS kw3
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',4),',',-1),'') AS kw4
, NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(@l,','),',',5),',',-1),'') AS kw5
But those are still on the same row. If we want to do checks on those, we'd have a bit of comparing to do, we'd need to check each one of those to see if it was in the specified list.
If we can get those keywords, on the one row, transformed into a set of rows with one keyword on each row, then we could use queries like the first ones in my answer. As an example:
SELECT t.product_id
, NULLIF(CASE n.i
WHEN 1 THEN SUBSTRING_INDEX(CONCAT(t.l,','),',',1)
WHEN 2 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',2),',',-1)
WHEN 3 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',3),',',-1)
WHEN 4 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',4),',',-1)
WHEN 5 THEN SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(t.l,','),',',5),',',-1)
END,'') AS kw
FROM ( SELECT 4 AS product_id,'fee,fi,fo,fum' AS l
UNION ALL
SELECT 5, 'coffee,sugar,milk'
) t
CROSS
JOIN ( SELECT 1 AS i
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) n
HAVING kw IS NOT NULL
ORDER BY t.product_id, n.i
That gets us individual rows, but it's limited to a row for each of the first 5 keywords. It's easy to see how that would be extended (having n return 6,7,8,...) and extending the WHEN conditions in the CASE to handle 6,7,8...
But there is going to be some arbitrary limit. (I've used an inline view, aliased as t, to return two "example" rows, as a demonstration. That inline view could be replaced with a reference to the table containing the product_id and keyword_list columns.)
So, that query gets us a rowset like would be returned from the product_keyword table I gave as an example above.
In the example queries, references to the product_keyword table could be replaced with this query. But that's a whole lot of ugly SQL, and it's horrendously inefficient, creating and populating temporary MyISAM tables anytime a query is run.
You probably want to set up a fulltext index on keywords for your table. This allows you to search the keywords column and specify what keywords to include or not include. Here's a command which sets up the index:
ALTER TABLE products ADD FULLTEXT index_products_keywords (keywords);
Once you've done that, you can use the MATCH AGAINST phrase and specify keywords. You can use it like WHERE MATCH(keywords) AGAINST ('chocolate') to just search for the term chocolate. Or you can use BOOLEAN MODE to "turn-off" certain keywords.
SELECT * FROM products
WHERE MATCH(keywords) AGAINST ('+chocolate -bran' IN BOOLEAN MODE);
Here's a small tutorial about fulltext indexes
I'm practicing for my internship and I mixed in some baseball as well. Working on a view that will give me things like average and obp. I'm getting an error, can someone tell me what's wrong with my syntax?
CREATE VIEW `Master Batter 2009 - 2013` AS
select (select * from baseball.batting),
(select Sum(H)/Sum(AB) from baseball.batting) as 'Average',
(select (Sum(H)+Sum(BB)+Sum(HBP))/(Sum(AB)+Sum(BB)+Sum(HBP)+Sum(SF))) as 'OBP',
join baseball.master
on baseball.master.playerID = baseball.batting.playerID
where yearID = '2013'
Group By playerID
Here's a list of a few things that are wrong (some syntactically invalid, some just violations of best practice):
using identifiers (e.g. view names) that include spaces, dashes and/or other special characters is syntactically valid, but these are way too problematic to be useful
the query has no FROM clause (the JOIN keyword appears where we expect a FROM keyword)
extra comma following the last item in the SELECT list
ON clause only valid following JOIN, which requires a preceding FROM
predicates in ON clause reference invalid identifier, baseball.batting is not a valid reference to table, view or row source alias (referenced in the FROM clause of the query)
first item in SELECT list can be a subquery, but the subquery can return at most one column, and return at most one row
references to H, AB, playerID are all unqualified; best practice is to qualify ALL column references
identifiers and aliases should be enclosed in backticks, not single quotes
I recommend you get a query developed and tested before you preface it with CREATE VIEW
Absent table definitions (which columns are in which tables), it's nearly impossible to decipher what you are trying to accomplish.
I think you want something like this:
SELECT m.playerID AS `playerID`
, SUM(b.H)/SUM(b.AB) AS `Average`
, (SUM(b.H)+SUM(b.BB)+SUM(b.HBP))/(SUM(b.AB)+SUM(b.BB)+SUM(b.HBP)+SUM(b.SF)) AS `OBP`
FROM baseball.master m
LEFT
JOIN baseball.batting b
ON b.playerID = m.playerID
AND b.yearID = 2013
GROUP BY m.playerID
Note that there are some edge cases you may want to handle... divide by zero, addition of NULL results in NULL, etc.
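One possible way to handle those edge cases, sketched on SQLite through Python's sqlite3 module as a stand-in for MySQL: NULLIF(..., 0) turns a zero denominator into NULL so the guard is explicit, and COALESCE(..., 0) keeps a NULL sum (for example an unrecorded HBP) from making the whole OBP NULL. The table layouts and rows are assumptions for the example, not the real baseball schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (playerID TEXT PRIMARY KEY);
    CREATE TABLE batting (playerID TEXT, yearID INTEGER,
                          AB INTEGER, H INTEGER, BB INTEGER,
                          HBP INTEGER, SF INTEGER);
    -- Hypothetical rows: 'a' has a NULL HBP, 'b' has no 2013 row,
    -- 'c' has a 2013 row with zero plate appearances.
    INSERT INTO master VALUES ('a'), ('b'), ('c');
    INSERT INTO batting VALUES
        ('a', 2013, 100, 30, 10, NULL, 5),
        ('c', 2013, 0, 0, 0, 0, 0);
""")

# The leading 1.0 * is only needed here because SQLite truncates
# integer / integer; MySQL's / operator does not.
rows = conn.execute("""
    SELECT m.playerID
         , 1.0 * SUM(b.H) / NULLIF(SUM(b.AB), 0) AS average
         , 1.0 * ( COALESCE(SUM(b.H), 0) + COALESCE(SUM(b.BB), 0)
                 + COALESCE(SUM(b.HBP), 0) )
           / NULLIF( COALESCE(SUM(b.AB), 0) + COALESCE(SUM(b.BB), 0)
                   + COALESCE(SUM(b.HBP), 0) + COALESCE(SUM(b.SF), 0), 0 ) AS obp
    FROM master m
    LEFT JOIN batting b
      ON b.playerID = m.playerID
     AND b.yearID = 2013
    GROUP BY m.playerID
    ORDER BY m.playerID
""").fetchall()

for row in rows:
    print(row)
```

Players with no at-bats come back with NULL for both columns rather than an error or a misleading 0; whether NULL or 0 is the better display value is a design choice left to the view's consumer.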