I have the following query:
SELECT mutations.id, genes.loc FROM mutations, genes where mutations.id=genes.id;
It outputs this:
| SL2.50ch02_51014904 | intergenic |
| SL2.50ch02_51014907 | upstream |
| SL2.50ch02_51014907 | downstream |
| SL2.50ch02_51014907 | intergenic |
| SL2.50ch02_51014911 | upstream |
| SL2.50ch02_51014911 | downstream |
My desired output is this:
| SL2.50ch02_51014904 | intergenic |
| SL2.50ch02_51014907 | upstream,downstream,intergenic |
| SL2.50ch02_51014911 | upstream,downstream |
I thought GROUP_CONCAT would be useful for this. However, when I run this:
SELECT mutations.id, GROUP_CONCAT(distinct(genes.loc)) FROM mutations, genes WHERE mutations.id=genes.id;
I get a single row like this:
SL2.50ch02_51014904 | downstream,intergenic,upstream
How can I solve this?
You need to add group by:
SELECT m.id, GROUP_CONCAT(DISTINCT g.loc)
FROM mutations m JOIN
genes g
ON m.id = g.id
GROUP BY m.id;
Along the way, you should learn a couple of other things:
Use explicit JOIN syntax. A simple rule: never use commas in the FROM clause.
Use table aliases (the m and g). They make the query easier to write and to read.
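To see the effect of the GROUP BY without a MySQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption on my side); its GROUP_CONCAT behaves similarly for this case, although the order of the concatenated values is not guaranteed. The sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mutations (id TEXT PRIMARY KEY);
    CREATE TABLE genes (id TEXT, loc TEXT);
    INSERT INTO mutations VALUES
        ('SL2.50ch02_51014904'), ('SL2.50ch02_51014907'), ('SL2.50ch02_51014911');
    INSERT INTO genes VALUES
        ('SL2.50ch02_51014904', 'intergenic'),
        ('SL2.50ch02_51014907', 'upstream'),
        ('SL2.50ch02_51014907', 'downstream'),
        ('SL2.50ch02_51014907', 'intergenic'),
        ('SL2.50ch02_51014911', 'upstream'),
        ('SL2.50ch02_51014911', 'downstream');
""")

base = """
    SELECT m.id, GROUP_CONCAT(DISTINCT g.loc)
    FROM mutations m
    JOIN genes g ON m.id = g.id
"""

# Without GROUP BY the aggregate collapses everything into one row.
without_group = conn.execute(base).fetchall()

# With GROUP BY there is one aggregated row per mutation id.
with_group = conn.execute(base + " GROUP BY m.id ORDER BY m.id").fetchall()

print(len(without_group))
for row in with_group:
    print(row)
```

The first query returns a single row, which matches the behaviour described in the question; the second returns one row per id.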
You forgot the GROUP BY:
SELECT
mutations.id,
GROUP_CONCAT(DISTINCT(genes.loc))
FROM
mutations, genes
WHERE
mutations.id=genes.id
GROUP BY
mutations.id
I am extracting a set of record ids from a translations table using a subquery.
I then need to feed this set of ids to several WHERE conditions of another query in order to extract records from a specific table (product_listings) via a series of joins.
Table join structure
product_brands(1) <-> (n)products(1) <-> (n)product_categories(1) <-> (n)product_listings
The set of ids returned by the subquery can be for any of the 4 tables above.
Subquery returning the sets of ids
select
record_id
from
translations
where
translations.locale = 'en_CA'
and (
translations.table = 'product_listings'
or translations.table = 'product_categories'
or translations.table = 'products'
or translations.table = 'product_brands'
)
and MATCH (translations.translation) AGAINST ('+jack*' IN BOOLEAN MODE);
Main query here using the ids in WHERE clauses
select
product_listings.*
from
product_listings
left join product_categories on product_categories.ch_id = product_listings.ch_vintage_id
left join products on products.ch_id = product_categories.ch_product_id
left join product_brands on product_brands.ch_id = products.ch_brand_id
where
product_listings.ch_id in (5951765, 252242) <---| Replace these fixed ids
or product_categories.ch_id in (5951765, 252242) <---| with the "record_id" set
or products.ch_id in (5951765, 252242) <---| returned by the subquery
or product_brands.ch_id in (5951765, 252242); <---|
Both queries work perfectly independently, but I cannot successfully merge them into one.
The only (dirty) way I found is to repeat the subquery in each WHERE condition. I tried it and it works, but it is doubtfully the most effective and optimized way to do it.
I tried using a variable, but only one value can be stored, so unfortunately that is not a viable option.
I spent countless hours researching how to avoid repeating a subquery and rewrote the queries in many ways, but still can't get it to work.
Any suggestions on how to integrate the subquery elegantly and efficiently?
Currently working with MySQL Ver 14.14 Distrib 5.7.37, for Linux (x86_64)
UPDATE 2022/04/16: Adding sample data of translations table and expected results of both queries
Sample of the translations table with those 2 ids
+-----------+----------------+--------+-------------------------------+
| record_id | table | locale | translation |
+-----------+----------------+--------+-------------------------------+
| 5951765 | products | en_CA | Jack Daniel's |
| 252242 | product_brands | en_CA | Dixon's & Jack Daniel's |
+-----------+----------------+--------+-------------------------------+
Here is the subquery response
+-----------+
| record_id |
+-----------+
| 5951765 |
| 252242 |
+-----------+
And here is the main query response (the final expected results) using the set of hardcoded ids. I modified the select clause to return specific columns instead of '*' to make the table readable.
The first 2 columns are the matched ids in the products and product_brands tables, and the other 2 are from the corresponding product_listings records extracted via the joins.
+------------+----------+--------------+-----------------+
| product_id | brand_id | listing_cspc | listing_format |
+------------+----------+--------------+-----------------+
| 5951765 | 5936861 | 798248 | 6x750 |
| 5951765 | 5936861 | 545186 | 6x750 |
| 5951956 | 252242 | 400669 | 12x750 |
| 5951955 | 252242 | 400666 | 12x750 |
| 5951701 | 252242 | 437924 | 12x750 |
| 5951337 | 252242 | 20244 | 6x750 |
| 5950782 | 252242 | 65166 | 12x750 |
| 5950528 | 252242 | 104941 | 12x750 |
| 5949763 | 252242 | 13990091 | 12x750 |
| 5949750 | 252242 | 614064 | 12x750 |
...
| 1729121 | 252242 | 280248 | 12x750 |
| 1729121 | 252242 | 36414 | 12x750 |
+------------+----------+--------------+-----------------+
As you can see, the ids from the subquery match different columns. In this case, 5951765 is a products.ch_id and 252242 is a product_brands.ch_id.
Below is a visual representation of what I'm trying to achieve considering the current (1):(n) relations of the tables
Translations seems to be the driver for this, so I would consider creating a view and driving the query from it:
CREATE VIEW vids AS <your subquery>;
select
product_listings.*
from vids
left join product_listings on product_listings.ch_id = vids.record_id
left join product_categories on product_categories.ch_id = product_listings.ch_vintage_id
left join products on products.ch_id = product_categories.ch_product_id
left join product_brands on product_brands.ch_id = products.ch_brand_id
Sample data would be good..
FINALLY! Got it to work.
Following @P.Salmon's suggestion to store the subquery result in a view, I did a cross join on that view and used the results in the WHERE clause of the main query.
That led me to skip the view altogether: the true final solution is to put the subquery directly in the cross join.
Sleek and VERY performant.
Final query with the subquery in the cross join
select
product_listings.*
from
product_listings
cross join (
select
record_id
from
translations
where
translations.locale = 'en_CA'
and (
translations.table = 'product_listings'
or translations.table = 'product_categories'
or translations.table = 'products'
or translations.table = 'product_brands'
)
and MATCH (translations.translation) AGAINST ('+jack*' IN BOOLEAN MODE)
) as vids
left join product_categories on product_categories.ch_id = product_listings.ch_vintage_id
left join products on products.ch_id = product_categories.ch_product_id
left join product_brands on product_brands.ch_id = products.ch_brand_id
where
product_listings.ch_id = vids.record_id
or product_categories.ch_id = vids.record_id
or products.ch_id = vids.record_id
or product_brands.ch_id = vids.record_id
order by
product_brands.ch_id desc,
products.ch_id desc;
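For anyone who wants to try the idea on a small scale, below is a rough sketch with Python's built-in sqlite3 module. Several things in it are my own assumptions and not part of the original setup: SQLite stands in for MySQL, a plain LIKE replaces the MATCH ... AGAINST full-text search, the tables carry only the key columns, and all rows apart from the two translated names are made up. It also writes the cross join plus WHERE as an inner join with an IN list, which expresses the same condition, and adds DISTINCT because a listing could otherwise be returned once per matching record_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_brands     (ch_id INTEGER PRIMARY KEY);
    CREATE TABLE products           (ch_id INTEGER PRIMARY KEY, ch_brand_id INTEGER);
    CREATE TABLE product_categories (ch_id INTEGER PRIMARY KEY, ch_product_id INTEGER);
    CREATE TABLE product_listings   (ch_id INTEGER PRIMARY KEY, ch_vintage_id INTEGER);
    CREATE TABLE translations (record_id INTEGER, "table" TEXT, locale TEXT, translation TEXT);

    -- Made-up rows: listings 1-3 hang off a matching product or brand, listing 4 does not.
    INSERT INTO product_brands VALUES (252242), (5936861), (888);
    INSERT INTO products VALUES (5951765, 5936861), (5951956, 252242), (777, 888);
    INSERT INTO product_categories VALUES (10, 5951765), (11, 5951956), (12, 777);
    INSERT INTO product_listings VALUES (1, 10), (2, 10), (3, 11), (4, 12);
""")
conn.executemany(
    "INSERT INTO translations VALUES (?, ?, ?, ?)",
    [
        (5951765, "products", "en_CA", "Jack Daniel's"),
        (252242, "product_brands", "en_CA", "Dixon's & Jack Daniel's"),
        (888, "product_brands", "en_CA", "Some Other Brand"),
    ],
)

rows = conn.execute("""
    SELECT DISTINCT pl.ch_id
    FROM product_listings pl
    LEFT JOIN product_categories pc ON pc.ch_id = pl.ch_vintage_id
    LEFT JOIN products p            ON p.ch_id  = pc.ch_product_id
    LEFT JOIN product_brands pb     ON pb.ch_id = p.ch_brand_id
    JOIN (
        -- The id set is computed once and joined as a derived table.
        SELECT record_id
        FROM translations t
        WHERE t.locale = 'en_CA'
          AND t."table" IN ('product_listings', 'product_categories',
                            'products', 'product_brands')
          AND t.translation LIKE '%jack%'   -- stand-in for MATCH ... AGAINST
    ) AS vids
      ON vids.record_id IN (pl.ch_id, pc.ch_id, p.ch_id, pb.ch_id)
    ORDER BY pl.ch_id
""").fetchall()

matching_listing_ids = [r[0] for r in rows]
print(matching_listing_ids)
```

Only the listings reachable from the matching product or the matching brand come back; the listing under the unrelated brand is left out.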
I have a first table called emails with a list of all the emails of my colleagues
| email |
| ----------------------- |
| saramaia#email.com |
| miguelferreira#email.com |
| joaosilva#email.com |
| joanamaia#email.com |
I have a second table called aliases, with a list of all the secondary emails/aliases my colleagues are using
| alias1 | alias2 |
| ------------------------ | ------------------- |
| joanamaia#email.com | maiajoana#email.com |
| maiajoana#email.com | maia#email.com |
| miguelferreira#email.com | miguel#email.com |
| maia#email.com | joana#email.com |
| joanamaia#email.com | jomaia#email.com |
| joana#email.com | jmaia#email.com |
I can see that the users joanamaia#email.com and miguelferreira#email.com are using aliases. But let's focus on the user joanamaia#email.com.
I need to get a list of all the email addresses the user joanamaia#email.com is using. The difficult part is that I need to get a list with the main email address plus all the intersections where the first email and consecutive ones are being used by this user. The end result should look like this
| emails |
| ------------------- |
| joanamaia#email.com |
| jomaia#email.com |
| maiajoana#email.com |
| maia#email.com |
| joana#email.com |
| jmaia#email.com |
If I do WHERE email='joanamaia#email.com' it should look like this, but I also need the same result if I do
WHERE email='jmaia#email.com'
I've been through some days of testing queries and I don't seem to have a solution for this (I've been using right joins, full outer joins and unions, but no luck so far). Is there a good way to do this?
You can use a recursive CTE to walk the graph and get the full list of interconnected aliases. Care needs to be taken to handle cycles; that requires the query to use UNION instead of the traditional UNION ALL to separate the anchor and recursive member of the CTE.
The query can take the form:
with recursive
n as (
select 'joanamaia#email.com' as email
union
select case when a.alias1 = n.email then a.alias2 else a.alias1 end
from n
join aliases a on (a.alias1 = n.email or a.alias2 = n.email)
and a.alias1 <> a.alias2
)
select * from n;
Result:
email
-------------------
joanamaia#email.com
maiajoana#email.com
jomaia#email.com
maia#email.com
joana#email.com
jmaia#email.com
See running example at DB Fiddle.
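If you want to check that the result does not depend on which address you start from, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption on my part; its recursive CTE support accepts this query as written), and the helper name linked_emails is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aliases (alias1 TEXT, alias2 TEXT)")
conn.executemany(
    "INSERT INTO aliases VALUES (?, ?)",
    [
        ("joanamaia#email.com", "maiajoana#email.com"),
        ("maiajoana#email.com", "maia#email.com"),
        ("miguelferreira#email.com", "miguel#email.com"),
        ("maia#email.com", "joana#email.com"),
        ("joanamaia#email.com", "jomaia#email.com"),
        ("joana#email.com", "jmaia#email.com"),
    ],
)

def linked_emails(start):
    """Return every address connected to `start` through the aliases table."""
    rows = conn.execute(
        """
        WITH RECURSIVE n(email) AS (
            SELECT ?
            UNION   -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT CASE WHEN a.alias1 = n.email THEN a.alias2 ELSE a.alias1 END
            FROM n
            JOIN aliases a
              ON (a.alias1 = n.email OR a.alias2 = n.email)
             AND a.alias1 <> a.alias2
        )
        SELECT email FROM n
        """,
        (start,),
    ).fetchall()
    return {r[0] for r in rows}

from_main = linked_emails("joanamaia#email.com")
from_alias = linked_emails("jmaia#email.com")
print(sorted(from_main))
print(from_main == from_alias)
```

Because each alias pair is followed in both directions, starting from the main address or from any alias reaches the same set.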
I asked earlier about a solution to my problem, which worked. However, now that I'm trying to get some information from a second table (which stores more information), I'm running into a few issues.
My tables are as follows
Users
+----+----------------------+---------------+------------------+
| id | username | primary_group | secondary_groups |
+----+----------------------+---------------+------------------+
| 1 | Username1 | 3 | 7,10 |
| 2 | Username2 | 7 | 3,5,10 |
| 3 | LongUsername | 1 | 3,7 |
| 4 | Username3 | 1 | 3,10 |
| 5 | Username4 | 7 | |
| 6 | Username5 | 5 | 3,7,10 |
| 7 | Username6 | 2 | 7 |
| 8 | Username7 | 4 | |
+----+----------------------+---------------+------------------+
Profile
+----+---------------+------------------+
| id | facebook | steam |
+----+---------------+------------------+
| 1 | 10049424151 | 11 |
| 2 | 10051277183 | 55 |
| 3 | 10051281183 | 751 |
| 4 | | 735 |
| 5 | 10051215770 | 4444 |
| 6 | 10020210531 | 50415 |
| 7 | 10021056938 | 421501 |
| 8 | 10011547143 | 761 |
+----+---------------+------------------+
My SQL is as follows (based off the previous thread)
SELECT u.id, u.username, p.id, p.facebook, p.steam
FROM users u, profile p
WHERE p.id=u.id AND FIND_IN_SET( '7', secondary_groups )
OR primary_group = 7
GROUP BY u.id
The problem is my output is displayed as below
+----+----------------------+-------------+-------+
| id | username | facebook | steam |
+----+----------------------+-------------+-------+
| 1 | Username1 | 10049424151 | 11 |
| 2 | Username2 | 10051277183 | 55 |
| 3 | LongUsername | 10051281183 | 751 |
| 4 | Username4 | 10051215770 | 4444 |
| 5 | Username5 | 10049424151 | 11 |
| 6 | Username6 | 10049424151 | 55 |
+----+----------------------+-------------+-------+
I'm guessing that the problem is that users rows with a primary_group of 7 are getting matched to all profile rows. Remove the GROUP BY, and you'll be able to better see what is happening.
But that's just a guess. It's not clear what you are attempting to achieve.
I suspect you are getting tripped up with the order of precedence of the AND and OR. (The AND operator has a higher order of precedence than OR operator. That means the AND will be evaluated before the OR.)
The quick fix is to just add some parens, to override the default order of operations. Something like this:
WHERE p.id=u.id AND ( FIND_IN_SET('7',secondary_groups) OR primary_group = 7 )
-- ^ ^
The parens will cause the OR operation to be evaluated (as either TRUE, FALSE or NULL) and then the result from that will be evaluated in the AND.
Without the parens, it's the same as if the parens were here:
WHERE ( p.id=u.id AND FIND_IN_SET('7',secondary_groups) ) OR primary_group = 7
-- ^ ^
With the AND condition evaluated first, the result from that is then operated on by OR. This is what is causing users rows with a primary_group of 7 to be matched to rows in profile with different id values.
A few pointers on style:
avoid the old-school comma operator for join operations, and use the newer JOIN syntax
place the join predicates (conditions) in the ON clause, other filtering criteria in the WHERE clause
qualify all column references
As an example:
SELECT u.id
, u.username
, p.id
, p.facebook
, p.steam
FROM users u
JOIN profile p
ON p.id = u.id
WHERE u.primary_group = 7
OR FIND_IN_SET('7',u.secondary_groups)
ORDER BY u.id
We only need a GROUP BY clause if we want to "collapse" rows. If the id column is unique in both the users and profile tables, then there's no need for a GROUP BY u.id. We can add an ORDER BY clause if we want rows returned in a particular sequence.
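To make the precedence effect visible, here is a rough sketch that counts the rows returned with and without the parentheses, using Python's built-in sqlite3 module. The assumptions are mine: SQLite stands in for MySQL (AND binds tighter than OR in both), FIND_IN_SET is emulated by a small Python function because SQLite does not ship one, and the profile table is reduced to its id column.

```python
import sqlite3

def find_in_set(needle, csv):
    # Minimal stand-in for MySQL's FIND_IN_SET: 1-based position, or 0 if absent.
    items = (csv or "").split(",")
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.execute(
    "CREATE TABLE users (id INTEGER, username TEXT, "
    "primary_group INTEGER, secondary_groups TEXT)"
)
conn.execute("CREATE TABLE profile (id INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Username1", 3, "7,10"),
        (2, "Username2", 7, "3,5,10"),
        (3, "LongUsername", 1, "3,7"),
        (4, "Username3", 1, "3,10"),
        (5, "Username4", 7, ""),
        (6, "Username5", 5, "3,7,10"),
        (7, "Username6", 2, "7"),
        (8, "Username7", 4, ""),
    ],
)
conn.executemany("INSERT INTO profile VALUES (?)", [(i,) for i in range(1, 9)])

# Original condition: parsed as (p.id = u.id AND FIND_IN_SET(...)) OR primary_group = 7
unparenthesized = conn.execute("""
    SELECT u.id, p.id FROM users u, profile p
    WHERE p.id = u.id AND FIND_IN_SET('7', u.secondary_groups)
       OR u.primary_group = 7
""").fetchall()

# Fixed condition: the join predicate applies to every returned row.
parenthesized = conn.execute("""
    SELECT u.id, p.id FROM users u, profile p
    WHERE p.id = u.id
      AND (FIND_IN_SET('7', u.secondary_groups) OR u.primary_group = 7)
""").fetchall()

mismatched = [(u, p) for (u, p) in unparenthesized if u != p]
print(len(unparenthesized), len(parenthesized), len(mismatched))
```

With the sample data, the two users whose primary_group is 7 get paired with every profile row in the first query, while the second query returns exactly one row per matching user.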
I don't know exactly what you want to do with the output, but you can't group information like this. MySQL isn't really a classic programming language; it's more like a powerful tool for set mathematics. So if you want to get information based on correlations between two or more tables, first write a SELECT statement that returns the raw data you want to work with, like this:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id
GROUP BY u.id;
Now you select the relevant data with a WHERE clause:
SELECT * FROM users u INNER JOIN profile p ON p.id=u.id WHERE
FIND_IN_SET( '7', secondary_groups ) OR primary_group = 7
GROUP BY u.id;
Now you should see the joined profile and users tables grouped by user, and you can start mining the data. For example, if you want to count the items in these groups, just add a COUNT function to the SELECT, and so on.
When debugging SQL, I highly recommend these steps:
1.) First, write down all the relationships between the data and all foreign keys between the tables, so you know whether your selection is fully deterministic. You can then start JOINing tables from left to right.
2.) Try small pieces of the query on a test database. Then you will see which selection works and which doesn't do what you expected.
I think @SIDU has it in the comments: You are experiencing a Boolean order of operations problem. See also SQL Logic Operator Precedence: And and Or
For example:
SELECT 0 AND 0 OR 1 AS test;
+------+
| test |
+------+
| 1 |
+------+
When writing complex conditions with both AND and OR, use parentheses. The operator precedence problem is producing an unintended cross join for some rows, which your GROUP BY then masks. You shouldn't need a GROUP BY for that statement.
Although I don't personally care for the style @spencer7593 suggests in his answer (using INNER JOIN, etc.), it does have the advantage of preventing or identifying errors early for people new to SQL, so it's something to consider.
I have these tables in my MySQL database:
General table:
+----generalTable-----+
+---------------------+
| id | scenario | ... |
+----+----------+-----+
| 1 | facebook | ... |
| 2 | chief | ... |
| 3 | facebook | ... |
| 4 | chief | ... |
Facebook Table:
+----facebookTable-----+
+----------------------+
| id | expiresAt | ... |
+----+-----------+-----+
| 1 | 12345678 | ... |
| 3 | 45832458 | ... |
Chief Table:
+------chiefTable------+
+----------------------+
| id | expiresAt | ... |
+----+-----------+-----+
| 2 | 43547343 | ... |
| 4 | 23443355 | ... |
Basically, the general table holds some (obviously) general data. Based on generalTable.scenario, you can look up more details in the other two tables, which share some columns (expiresAt, for example) but differ in others.
My question is, how to get the joined data of generalTable and the right detailed table in just one query.
So, I would like a query like this:
SELECT id, scenario, expiresAt
FROM generalTable
JOIN facebookTable
ON generalTable.id = facebookTable.id
JOIN chiefTable
ON generalTable.id = chiefTable.id
And an output like this:
| id | scenario | expiresAt |
+----+----------+-----------+
| 1 | facebook | 12345678 |
| 2 | chief | 43547343 |
| 3 | facebook | 45832458 |
| 4 | chief | 23443355 |
However, this doesn't work, because both facebookTable and chiefTable have ambiguous column name "expiresAt". For the ease of use I want to keep it that way. The result table should also only have one column "expiresAt" that is automatically filled with the right values from either facebookTable or chiefTable.
You might want to consider adding expiresAt to your general table, and removing it from the others, to remove duplication in the schema and to make this particular query simpler.
If you need to stick with your current schema, you can use table aliases to resolve the name ambiguity, and use two joins and a union to create the result you are looking for:
SELECT g.id, g.scenario, f.expiresAt
FROM generalTable g
JOIN facebookTable f
ON g.id = f.id
UNION ALL
SELECT g.id, g.scenario, c.expiresAt
FROM generalTable g
JOIN chiefTable c
ON g.id = c.id;
The outer join approach mentioned in another answer would also solve the problem.
One way to accomplish it is with LEFT JOIN. In the select list you can then do something like this for the common fields: IF(fTbl.id IS NULL, cTbl.expiresAt, fTbl.expiresAt) AS expiresAt.
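Spelled out as a full query, the LEFT JOIN idea could look roughly like the sketch below, run here through Python's built-in sqlite3 module. Two assumptions of mine: SQLite stands in for MySQL, and COALESCE is used in place of IF(...) because it works in both; it gives the same result as long as expiresAt is never NULL in a matching detail row. The aliases g, f and c are just for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE generalTable  (id INTEGER PRIMARY KEY, scenario TEXT);
    CREATE TABLE facebookTable (id INTEGER PRIMARY KEY, expiresAt INTEGER);
    CREATE TABLE chiefTable    (id INTEGER PRIMARY KEY, expiresAt INTEGER);
    INSERT INTO generalTable  VALUES (1, 'facebook'), (2, 'chief'),
                                     (3, 'facebook'), (4, 'chief');
    INSERT INTO facebookTable VALUES (1, 12345678), (3, 45832458);
    INSERT INTO chiefTable    VALUES (2, 43547343), (4, 23443355);
""")

rows = conn.execute("""
    SELECT g.id,
           g.scenario,
           -- Take expiresAt from whichever detail table has a matching row.
           COALESCE(f.expiresAt, c.expiresAt) AS expiresAt
    FROM generalTable g
    LEFT JOIN facebookTable f ON f.id = g.id
    LEFT JOIN chiefTable    c ON c.id = g.id
    ORDER BY g.id
""").fetchall()

for row in rows:
    print(row)
```

Each general row appears once, with a single expiresAt column filled from the detail table that matches it.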
I have two tables named oc_users and oc_groups. There is no specific relation between the two tables, as shown below, and I want to map each user to each group:
1)table 1:
select uid from oc_users;
+-----------------+
| uid |
+-----------------+
| manesh#abc.in |
| pankaj |
| sumit |
+-----------------+
2)table 2:
select gid from oc_groups;
+---------+
| gid |
+---------+
| qlc |
| qlc-web |
+---------+
I want output like this:
+---------+-----------------+
| gid | uid |
+---------+-----------------+
| qlc | manesh#abc.in |
| qlc | pankaj |
| qlc | sumit |
| qlc-web | manesh#abc.in |
| qlc-web | pankaj |
| qlc-web | sumit |
+---------+-----------------+
Use this:
select * from oc_users, oc_groups ORDER BY gid, uid
This will return all columns of the Cartesian product of the rows.
You need to use CROSS JOIN (new SQL syntax: ANSI SQL-92)
SELECT gid, uid
FROM oc_users CROSS JOIN oc_groups
ORDER BY gid, uid
Using CROSS JOIN (Cartesian product join) will serve your purpose. For your information:
A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table.
So simply this will do,
SELECT gid, uid
FROM oc_users CROSS JOIN oc_groups;
Always use the explicit CROSS JOIN syntax instead of leaving the join type unspecified. Since it is ANSI standard, you can use it across databases, and it also helps during migration because you don't need to change anything in your query.
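As a quick check of the row count, here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. SQLite is used only as a stand-in (my assumption); the CROSS JOIN statement itself is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oc_users  (uid TEXT);
    CREATE TABLE oc_groups (gid TEXT);
    INSERT INTO oc_users  VALUES ('manesh#abc.in'), ('pankaj'), ('sumit');
    INSERT INTO oc_groups VALUES ('qlc'), ('qlc-web');
""")

rows = conn.execute("""
    SELECT gid, uid
    FROM oc_users CROSS JOIN oc_groups
    ORDER BY gid, uid
""").fetchall()

# 3 users x 2 groups gives 6 combinations.
print(len(rows))
for row in rows:
    print(row)
```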
Hope this helps you!!