I need to get the intersection of some results in a MySQL database, but after searching I found that there is no MySQL INTERSECT keyword available. The following are my sample tables.
gene table
+------+--------+---------+
| id | symbol | test_id |
+------+--------+---------+
| -1 | A | -1 |
| 8 | A | 3 |
| 9 | G | 3 |
| -1 | A | -1 |
| -2 | B | -1 |
| -3 | C | -1 |
| 1 | A | 1 |
| 2 | B | 1 |
| 3 | C | 1 |
| 4 | B | 2 |
| 5 | C | 2 |
| 6 | D | 2 |
| 7 | E | 2 |
| 8 | A | 3 |
| 9 | G | 3 |
| 10 | F | 3 |
| 11 | C | 3 |
| 12 | C | 4 |
| 13 | G | 4 |
| 14 | F | 4 |
| 15 | M | 4 |
| 16 | N | 4 |
+------+--------+---------+
test table
+------+-------+
| id | name |
+------+-------+
| -1 | test0 |
| 3 | test3 |
| -1 | test0 |
| 1 | test1 |
| 2 | test2 |
| 3 | test3 |
| 4 | test4 |
+------+-------+
Now I want a query that returns the tests that are common to all of the provided genes. For example, if I provide genes A, B, and C, I should get the following result:
id name id symbol
---------------------------
-1 | test0 | -1 | A
-1 | test0 | -2 | B
-1 | test0 | -3 | C
1 | test1 | 1 | A
1 | test1 | 2 | B
1 | test1 | 3 | C
I tried the following query, but it didn't work: it returns an empty result set, and if I use 'or' in the where clause I get the tests that match any of the genes rather than all of them.
select distinct t.id, t.name, g.id, g.symbol from tests t
join genes g on t.id = g.test_id
where g.symbol = 'A' and g.symbol='B' and g.symbol='C';
Please help me to construct the query.
The trick is to filter the records with your criteria, then group by test.id to check that it matches all the criteria:
SELECT t.id
FROM tests AS t
INNER JOIN genes AS g
ON t.id = g.test_id
WHERE g.symbol in ('A','B','C')
GROUP BY t.id
HAVING COUNT(DISTINCT g.symbol) = 3;
So the key line is here:
HAVING COUNT(DISTINCT g.symbol) = 3;
If, like test 2, there is only a match on 'B' and 'C', then the count will return 2 and the test will be excluded. The number in the HAVING clause must equal the number of items you are checking for.
If you then need to get the full data out, you just need to join back to your table:
SELECT t.id, t.name, g.id, g.symbol
FROM genes AS g
INNER JOIN
( SELECT t.id, t.name
FROM tests AS t
INNER JOIN genes AS g
ON t.id = g.test_id
WHERE g.symbol in ('A','B','C')
GROUP BY t.id, t.name
HAVING COUNT(DISTINCT g.symbol) = 3
) t
ON t.id = g.test_id;
Example on SQL Fiddle
Change those AND conditions to OR conditions as shown below, because in any given row g.symbol can hold only one value, not several. That is why you are getting an empty result set.
select t.id, t.name, g.id, g.symbol from tests t
join genes g on t.id = g.test_id
where (g.symbol = 'A' or g.symbol='B' or g.symbol='C')
and g.test_id = 1;
Or use an IN operator:
select t.id, t.name, g.id, g.symbol from tests t
join genes g on t.id = g.test_id
where g.symbol in ('A','B','C')
and g.test_id = 1;
Related
What I am trying to do involves 3 tables (pictures, collections, and bridge) with the following columns:
Collections Table:
| id | name |
------------------
| 1 | coll1 |
| 2 | coll2 |
------------------
Pictures Table: (timestamps are unix timestamps)
| id | name | timestamp |
-------------------------
| 5 | Pic5 | 1 |
| 6 | Pic6 | 19 |
| 7 | Pic7 | 3 |
| 8 | Pic8 | 892 |
| 9 | Pic9 | 4 |
-------------------------
Bridge Table:
| id | collection | picture |
-----------------------------
| 1 | 1 | 5 |
| 2 | 1 | 6 |
| 3 | 1 | 7 |
| 4 | 1 | 8 |
| 5 | 2 | 5 |
| 6 | 2 | 9 |
| 7 | 2 | 7 |
-----------------------------
And the result should look like this:
| collection_name | picture_count | newest_picture |
----------------------------------------------------
| coll1 | 4 | 8 |
| coll2 | 3 | 9 |
----------------------------------------------------
newest_picture should always be the picture with the highest timestamp in that collection, and I also want to sort the result by it. picture_count is the number of pictures in that collection.
Can this be done in a single statement with table joins, and if so, what is the best way to do it?
A simple method uses correlated subqueries:
select c.*,
(select count(*)
from bridge b
where b.collection = c.id
) as pic_count,
(select p.id
from bridge b join
pictures p
on b.picture = p.id
where b.collection = c.id
order by p.timestamp desc
limit 1
) as most_recent_picture
from collections c;
A more common approach would use window functions:
select c.id, c.name, count(bp.collection), bp.most_recent_picture
from collections c left join
(select b.*,
first_value(p.id) over (partition by b.collection order by p.timestamp desc) as most_recent_picture
from bridge b join
pictures p
on b.picture = p.id
) bp
on bp.collection = c.id
group by c.id, c.name, bp.most_recent_picture;
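As a rough check of the correlated-subquery approach against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database via Python's sqlite3 module (SQLite is an assumption chosen for portability; the question itself does not name an engine). The extra newest_ts column is a hypothetical helper used only to sort the collections by their newest picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, name TEXT, timestamp INTEGER);
    CREATE TABLE bridge (id INTEGER PRIMARY KEY, collection INTEGER, picture INTEGER);
    INSERT INTO collections VALUES (1,'coll1'),(2,'coll2');
    INSERT INTO pictures VALUES
        (5,'Pic5',1),(6,'Pic6',19),(7,'Pic7',3),(8,'Pic8',892),(9,'Pic9',4);
    INSERT INTO bridge VALUES
        (1,1,5),(2,1,6),(3,1,7),(4,1,8),(5,2,5),(6,2,9),(7,2,7);
""")

rows = conn.execute("""
    SELECT collection_name, picture_count, newest_picture
    FROM (
        SELECT c.name AS collection_name,
               -- number of bridge rows for this collection
               (SELECT COUNT(*)
                FROM bridge b
                WHERE b.collection = c.id) AS picture_count,
               -- id of the picture with the highest timestamp
               (SELECT p.id
                FROM bridge b JOIN pictures p ON b.picture = p.id
                WHERE b.collection = c.id
                ORDER BY p.timestamp DESC
                LIMIT 1) AS newest_picture,
               -- hypothetical helper column, used only for sorting
               (SELECT MAX(p.timestamp)
                FROM bridge b JOIN pictures p ON b.picture = p.id
                WHERE b.collection = c.id) AS newest_ts
        FROM collections c
    ) AS summary
    ORDER BY newest_ts DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that the join inside the subquery must compare b.picture with p.id; joining the bridge table to its own id would pick the wrong picture.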
I have this data in a table called PROD
| Project | Position | Status |
|---------|----------|--------|
| 1 | 1 | A |
| 1 | 2 | A |
| 2 | 1 | A |
| 2 | 2 | B |
| 3 | 1 | B |
| 3 | 2 | B |
| 4 | 1 | A |
| 4 | 2 | A |
I'm trying to get all the Projects that have at least one Position with Status = B.
| Project | Position | Status |
|---------|----------|--------|
| 2 | 1 | A |
| 2 | 2 | B |
| 3 | 1 | B |
| 3 | 2 | B |
I've tried using a JOIN like this:
SELECT * FROM PROD A JOIN PROD B ON A.PROD-Project = B.PROD-Project WHERE B.PROD-Status = 'B'
This gives me an empty response.
With EXISTS:
SELECT p.* FROM PROD p
WHERE EXISTS (
SELECT 1 FROM PROD
WHERE Project = p.Project AND Status = 'B'
)
or with IN:
SELECT * FROM PROD
WHERE Project IN (SELECT Project FROM PROD WHERE Status = 'B')
If you want a solution with JOIN:
SELECT DISTINCT p.*
FROM PROD p JOIN PROD pp
ON pp.Project = p.Project
WHERE pp.Status = 'B'
See the demo.
Results:
> Project | Position | Status
> ------: | -------: | :-----
> 2 | 1 | A
> 2 | 2 | B
> 3 | 1 | B
> 3 | 2 | B
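To compare the three variants side by side, the sketch below loads the PROD rows into an in-memory SQLite database through Python's sqlite3 module (an assumption, chosen so the example runs anywhere) and checks that EXISTS, IN, and the self-join all return the same rows. The ORDER BY clauses are added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROD (Project INTEGER, Position INTEGER, Status TEXT);
    INSERT INTO PROD VALUES
        (1,1,'A'),(1,2,'A'),(2,1,'A'),(2,2,'B'),
        (3,1,'B'),(3,2,'B'),(4,1,'A'),(4,2,'A');
""")

# Variant 1: EXISTS with a correlated subquery.
exists_rows = conn.execute("""
    SELECT p.* FROM PROD p
    WHERE EXISTS (
        SELECT 1 FROM PROD q
        WHERE q.Project = p.Project AND q.Status = 'B'
    )
    ORDER BY p.Project, p.Position
""").fetchall()

# Variant 2: IN with an uncorrelated subquery.
in_rows = conn.execute("""
    SELECT * FROM PROD
    WHERE Project IN (SELECT Project FROM PROD WHERE Status = 'B')
    ORDER BY Project, Position
""").fetchall()

# Variant 3: self-join; DISTINCT removes the duplicates produced
# when a project has more than one position with Status = 'B'.
join_rows = conn.execute("""
    SELECT DISTINCT p.*
    FROM PROD p JOIN PROD pp ON pp.Project = p.Project
    WHERE pp.Status = 'B'
    ORDER BY p.Project, p.Position
""").fetchall()

print(exists_rows)
```

Project 3 shows why the join variant needs DISTINCT: both of its positions have Status B, so each of its rows would otherwise appear twice.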
You could try using a join with a subquery:
select * from PROD
INNER JOIN (
select distinct project
from PROD
where status ='B'
) t on t.project = PROD.project
I'm trying to get all the Projects that have at least one Position with Status = B.
No need for a JOIN, just do:
SELECT DISTINCT Project FROM PROD WHERE Status = 'B'
I have a table structure in my database like this:
/*city*/
+----------+------------+
| id | name |
|-----------------------|
| 1 | Gotham |
| 2 | Metropolis |
| 3 | Smallville |
| 4 | Fawcett |
+----------+------------+
/*district*/
+----------+------------+------------+
| id | name | city_id |
|------------------------------------|
| 1 | A | 1 |
| 2 | B | 1 |
| 3 | C | 2 |
| 4 | D | 2 |
| 5 | E | 2 |
| 6 | F | 3 |
| 7 | G | 3 |
| 8 | H | 4 |
+----------+------------+------------+
/*distance*/
+----------+-------------+------------------+-------------------------+---------+
| id | origin_city | city_destination | district_destination | length |
|---------------------------------------------------------------------|---------|
| 1 | 2 | 2 | 1 | 4 |
| 2 | 3 | 3 | 1 | 5 |
| 3 | 1 | 1 | 2 | 6 |
| 4 | 2 | 2 | 3 | 5 |
| 5 | 4 | 4 | 1 | 8 |
| 6 | 4 | 2 | 4 | 9 |
| 7 | 4 | 3 | 5 | 11 |
| 8 | 1 | 4 | 6 | 13 |
+----------+-------------+------------------+-------------------------+---------+
The district table is connected to the city table via the city_id foreign key, and the distance table is connected to both the city and district tables. The problem is that the distance table contains wrong city_destination values that don't match the district_destination. I need to fix this, but I don't know how to write the UPDATE query for this kind of problem. To show the wrong city_destination data I used this query:
SELECT a.* FROM distance a, district b WHERE a.district_destination = b.id AND a.city_destination != b.city_id
First, ditch the old-school comma syntax for the join operation. Use the JOIN keyword and move the join predicates to an ON clause. Write a SELECT query that returns the existing rows to be updated (along with the PK and the new value to be assigned), which looks to be as far as you got.
Assuming that we want to replace the values in the city_destination column of the distance table, and seeing that this column is functionally dependent on district_destination...
Start with a query that returns the rows to be updated.
SELECT ce.id AS id
, ce.district_destination AS district_destination
, ce.city_destination AS old_city_destination
, ct.city_id AS new_city_destination
FROM distance ce
JOIN district ct
ON ct.id = ce.district_destination
AND NOT ( ct.city_id <=> ce.city_destination )
ORDER BY ce.id
In MySQL, a multi-table update is pretty straightforward. The syntax is documented in the MySQL Reference Manual.
First, we'll write it as a SELECT, using the previous query as an inline view:
SELECT t.id
, s.new_city_destination
FROM ( SELECT ce.id AS id
, ce.district_destination AS district_destination
, ce.city_destination AS old_city_destination
, ct.city_id AS new_city_destination
FROM distance ce
JOIN district ct
ON ct.id = ce.district_destination
AND NOT ( ct.city_id <=> ce.city_destination )
ORDER BY ce.id
) s
JOIN distance t
ON t.id = s.id
Then we can convert that to an UPDATE statement. Replace SELECT ... FROM with UPDATE and add a SET clause at the end. (Before the WHERE clause if there was one.)
UPDATE ( SELECT ce.id AS id
, ce.district_destination AS district_destination
, ce.city_destination AS old_city_destination
, ct.city_id AS new_city_destination
FROM distance ce
JOIN district ct
ON ct.id = ce.district_destination
AND NOT ( ct.city_id <=> ce.city_destination )
ORDER BY ce.id
) s
JOIN distance t
ON t.id = s.id
SET t.city_destination = s.new_city_destination
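The same select-first, update-second workflow can be rehearsed on the sample data with the sketch below. It runs in an in-memory SQLite database via Python's sqlite3 module, which is an assumption made for portability: SQLite does not accept MySQL's UPDATE ... JOIN form or the <=> operator, so the sketch swaps in a correlated-subquery UPDATE and the null-safe IS NOT comparison. It is not the MySQL statement itself, only an equivalent way to check which rows change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE district (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
    CREATE TABLE distance (id INTEGER PRIMARY KEY, origin_city INTEGER,
                           city_destination INTEGER,
                           district_destination INTEGER, length INTEGER);
    INSERT INTO district VALUES
        (1,'A',1),(2,'B',1),(3,'C',2),(4,'D',2),
        (5,'E',2),(6,'F',3),(7,'G',3),(8,'H',4);
    INSERT INTO distance VALUES
        (1,2,2,1,4),(2,3,3,1,5),(3,1,1,2,6),(4,2,2,3,5),
        (5,4,4,1,8),(6,4,2,4,9),(7,4,3,5,11),(8,1,4,6,13);
""")

# Step 1: list the rows whose city does not match the district's city.
MISMATCH_SQL = """
    SELECT ce.id
    FROM distance ce
    JOIN district ct
      ON ct.id = ce.district_destination
     AND ct.city_id IS NOT ce.city_destination  -- SQLite's null-safe inequality
    ORDER BY ce.id
"""
wrong_before = [r[0] for r in conn.execute(MISMATCH_SQL)]

# Step 2: overwrite city_destination with the district's city_id,
# touching only the rows found in step 1.
conn.execute("""
    UPDATE distance
    SET city_destination = (SELECT ct.city_id
                            FROM district ct
                            WHERE ct.id = distance.district_destination)
    WHERE EXISTS (SELECT 1
                  FROM district ct
                  WHERE ct.id = distance.district_destination
                    AND ct.city_id IS NOT distance.city_destination)
""")

wrong_after = [r[0] for r in conn.execute(MISMATCH_SQL)]
fixed = dict(conn.execute("SELECT id, city_destination FROM distance"))
print(wrong_before, wrong_after)
```

Running the mismatch query again after the update is a cheap way to confirm that nothing was missed before doing the same on real data.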
This must be quite easy, but I cannot find a good solution myself.
I have two tables:
file
+----+--------+
| id | system |
+----+--------+
| 1 | AA |
| 2 | AA |
| 3 | BB |
| 4 | AA |
+----+--------+
feature
+----+---------+------+
| id | file_id | name |
+----+---------+------+
| 1 | 1 | A |
| 1 | 2 | A |
| 1 | 2 | B |
| 1 | 3 | B |
| 1 | 3 | C |
| 1 | 4 | A |
| 1 | 4 | B |
| 1 | 4 | C |
+----+---------+------+
and I want to count how many times a feature was added to files with a specific system. For that, I have the following query:
SELECT f.name, COUNT(*) AS nr
FROM file d
JOIN feature f
ON f.file_id = d.id
WHERE d.system = 'AA'
AND d.id NOT IN (3157,3168,3192)
GROUP BY f.name
which gives the desired output:
+------+----+
| name | nr |
+------+----+
| A | 3 |
| B | 2 |
| C | 1 |
+------+----+
Now I also want to know the total number of files with the same specific system. A simple separate query would be:
SELECT COUNT(*) FROM file WHERE system = 'AA' AND id NOT IN (3157,3168,3192)
I've added the extra AND id NOT IN (which is irrelevant for this example) just to show that the actual query is much more complex. If I use a separate query to get the total I would have to duplicate that complexity, so I want to avoid that by returning the total from the same query.
So how can I count the number of files in the first query?
Desired output:
+------+----+-------+
| name | nr | total |
+------+----+-------+
| A | 3 | 3 |
| B | 2 | 3 |
| C | 1 | 3 |
+------+----+-------+
Here is one way, using a subquery:
SELECT f.NAME,
Count(*) AS nr,
(SELECT Count(*)
FROM FILE
WHERE system = 'AA'
AND id NOT IN ( 3157, 3168, 3192 )) as Total
FROM file d
JOIN feature f
ON f.file_id = d.id
WHERE d.system = 'AA'
AND d.id NOT IN ( 3157, 3168, 3192 )
GROUP BY f.NAME
Or use CROSS JOIN:
SELECT *
FROM (SELECT f.NAME,
Count(*) AS nr
FROM file d
JOIN feature f
ON f.file_id = d.id
WHERE d.system = 'AA'
AND d.id NOT IN ( 3157, 3168, 3192 )
GROUP BY f.NAME) A
CROSS JOIN (SELECT Count(*) AS Total
FROM FILE
WHERE system = 'AA'
AND id NOT IN ( 3157, 3168, 3192 )) B
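Both queries above repeat the file filter, which is what the question hoped to avoid. One possible alternative, shown here only as a sketch, is to put the filter in a common table expression and reference it twice. This is a swapped-in technique rather than either answer's method, and it assumes an engine with CTE support (MySQL 8.0 or later; the sketch itself runs in SQLite through Python's sqlite3 module). The CTE name selected is hypothetical, and the feature table's id column is left out because the query never uses it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, system TEXT);
    CREATE TABLE feature (file_id INTEGER, name TEXT);
    INSERT INTO file VALUES (1,'AA'),(2,'AA'),(3,'BB'),(4,'AA');
    INSERT INTO feature VALUES
        (1,'A'),(2,'A'),(2,'B'),(3,'B'),(3,'C'),(4,'A'),(4,'B'),(4,'C');
""")

rows = conn.execute("""
    WITH selected AS (
        -- the (possibly complex) file filter is written once, here
        SELECT id
        FROM file
        WHERE system = 'AA'
          AND id NOT IN (3157, 3168, 3192)
    )
    SELECT f.name,
           COUNT(*) AS nr,
           (SELECT COUNT(*) FROM selected) AS total  -- reuses the filter
    FROM selected d
    JOIN feature f ON f.file_id = d.id
    GROUP BY f.name
    ORDER BY f.name
""").fetchall()

for row in rows:
    print(row)
```

The total is computed from the filtered file list, not from the joined rows, so a file with no features at all would still be counted.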
I have a source table (piece of it):
+--------------------+
| E M P L O Y E E |
+--------------------+
| ID | EQUIPMENT |
+--------------------+
| 1 | tv,car,phone |
| 2 | car,phone |
| 3 | tv,phone |
+----+---------------+
After the normalization process I ended up with two new tables:
+----------------+
| DICT_EQUIPMENT |
+----------------+
| ID | EQUIPMENT |
+----------------+
| 1 | tv |
| 2 | car |
| 3 | phone |
+----+-----------+
+---------------------+
| SET_EQUIPMENT |
+----+--------+-------+
| ID | SET_ID | EQ_ID |
+----+--------+-------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 3 |
| 6 | 3 | 1 |
| 7 | 3 | 3 |
+----+--------+-------+
(the piece/part)
+-----------------+
| E M P L O Y E E |
+-----------------+
| ID | EQ_SET_ID |
+-----------------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
+----+------------+
Now, when I want to find the correct SET_ID, I can write something like this:
SELECT SET_ID
FROM SET_EQUIPMENT S1,
SET_EQUIPMENT S2,
SET_EQUIPMENT S3
WHERE S1.SET_ID = S2.SET_ID
AND S2.SET_ID = S3.SET_ID
AND S1.EQ_ID = 1
AND S2.EQ_ID = 2
AND S3.EQ_ID = 3;
Any ideas for optimizing this query? How can I find the correct set?
First, you should use explicit join syntax for the method you are using:
SELECT S1.SET_ID
FROM SET_EQUIPMENT S1 JOIN
SET_EQUIPMENT S2
ON S1.SET_ID = S2.SET_ID JOIN
SET_EQUIPMENT S3
ON S2.SET_ID = S3.SET_ID
WHERE S1.EQ_ID = 1 AND
S2.EQ_ID = 2 AND
S3.EQ_ID = 3;
Commas in a from clause are quite outdated. (This also fixes an error in your query: the unqualified SET_ID in the select list is ambiguous.)
An alternative method is to use group by with a having clause:
SELECT S.SET_ID
FROM SET_EQUIPMENT S
GROUP BY S.SET_ID
HAVING SUM(CASE WHEN S.EQ_ID = 1 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN S.EQ_ID = 2 THEN 1 ELSE 0 END) > 0 AND
SUM(CASE WHEN S.EQ_ID = 3 THEN 1 ELSE 0 END) > 0;
Which method works better depends on a number of factors -- for instance, the database engine you are using, the size of the tables, the indexes on the tables. You have to test which method works better on your system.
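For a quick sanity check of the GROUP BY / HAVING variant, the sketch below loads the SET_EQUIPMENT rows into an in-memory SQLite database using Python's sqlite3 module (an assumption, since the question does not name its engine) and looks for the set that contains equipment 1, 2, and 3. It says nothing about relative performance, which still has to be measured on the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SET_EQUIPMENT (ID INTEGER PRIMARY KEY, SET_ID INTEGER, EQ_ID INTEGER);
    INSERT INTO SET_EQUIPMENT VALUES
        (1,1,1),(2,1,2),(3,1,3),(4,2,2),(5,2,3),(6,3,1),(7,3,3);
""")

# Each SUM counts how many rows of the group carry one wanted EQ_ID;
# a set qualifies only when all three counts are above zero.
set_ids = [row[0] for row in conn.execute("""
    SELECT S.SET_ID
    FROM SET_EQUIPMENT S
    GROUP BY S.SET_ID
    HAVING SUM(CASE WHEN S.EQ_ID = 1 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN S.EQ_ID = 2 THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN S.EQ_ID = 3 THEN 1 ELSE 0 END) > 0
    ORDER BY S.SET_ID
""")]

print(set_ids)
```

As written, the query also returns sets that contain additional equipment beyond the three listed; an exact match would need one more condition, for example on COUNT(*).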
You've normalised wrongly. Get rid of set_equipment.
Change to three tables: employee, equipment, and employee_equipment.
If you're looking for the equipment for a given employee you want to use:
select eq.id, eq.equipment
from equipment eq
inner join employee_equipment ee on eq.id = ee.eq_id
inner join employee emp on emp.id = ee.emp_id
where emp.id = 2
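A minimal sketch of that three-table layout, populated from the original comma-separated EMPLOYEE rows, might look like the following. It uses an in-memory SQLite database through Python's sqlite3 module; the column names emp_id and eq_id follow the query above, while the composite primary key and the single-column employee table are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY);
    CREATE TABLE equipment (id INTEGER PRIMARY KEY, equipment TEXT);
    -- one row per (employee, equipment) pair; the key is an assumption
    CREATE TABLE employee_equipment (
        emp_id INTEGER REFERENCES employee(id),
        eq_id  INTEGER REFERENCES equipment(id),
        PRIMARY KEY (emp_id, eq_id)
    );
    INSERT INTO employee VALUES (1),(2),(3);
    INSERT INTO equipment VALUES (1,'tv'),(2,'car'),(3,'phone');
    INSERT INTO employee_equipment VALUES
        (1,1),(1,2),(1,3),   -- employee 1: tv, car, phone
        (2,2),(2,3),         -- employee 2: car, phone
        (3,1),(3,3);         -- employee 3: tv, phone
""")

def equipment_for(emp_id):
    """Return (equipment id, equipment name) pairs for one employee."""
    return conn.execute("""
        SELECT eq.id, eq.equipment
        FROM equipment eq
        INNER JOIN employee_equipment ee ON eq.id = ee.eq_id
        INNER JOIN employee emp ON emp.id = ee.emp_id
        WHERE emp.id = ?
        ORDER BY eq.id
    """, (emp_id,)).fetchall()

print(equipment_for(2))
```

The select list qualifies id with the eq alias because both equipment and employee have an id column, so an unqualified id would be ambiguous.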