How can I get the intersection of n SELECTS? - mysql

I have these 3 tables: data, tags and data_tag_rel.
data
id data
------------------------------------
1 A string of long data A.
2 A string of long data B.
3 A string of long data C.
4 A string of long data D.
5 A string of long data E.
6 A string of long data F.
7 A string of long data G.
tags
id tag
------------
1 gold
2 silver
3 copper
data_tag_rel
data tag
------------------
1 1
1 2
2 1
3 2
4 3
5 1
5 2
5 3
6 1
7 1
As you can see, there is data and tags, and a relationship table to determine what tags are assigned to what data. Here the data is talking about metals. In this example:
The gold tag has been assigned to 5 data strings.
The silver tag has been assigned to 3 data strings.
The copper tag has been assigned to 2 data strings.
I want to query the database and obtain an INTERSECTION of tags gold, silver and copper. Meaning I want to obtain the table_data that is assigned to all 3 tags. The result would be just 1 row from the data table, row id 5: "A string of long data E."
What query would accomplish this INTERSECTION?
So far I can get the query working querying only 1 tag:
SELECT data.id, data.data
FROM data
INNER JOIN data_tag_rel ON data.id = data_tag_rel.data
INNER JOIN tags ON data_tag_rel.tag = tags.id
WHERE tags.tag = "gold"
Thanks so much!

Aggregation provides one option:
SELECT d.id, d.data
FROM data d
INNER JOIN data_tag_rel dtr ON d.id = dtr.data
INNER JOIN tags t ON dtr.tag = t.id
WHERE t.tag IN ('gold', 'silver', 'copper')
GROUP BY d.id, d.data
HAVING COUNT(DISTINCT t.tag) = 3;
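To check the aggregation query against the sample tables from the question, here is a minimal sketch that loads them into an in-memory SQLite database and generalizes the query to n tags. SQLite is only a stand-in for MySQL so the example is self-contained, and the helper names `build_metals_db` and `rows_with_all_tags` are made up for illustration.

```python
import sqlite3

def build_metals_db():
    """Load the question's sample tables into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data (id INTEGER PRIMARY KEY, data TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE data_tag_rel (data INTEGER, tag INTEGER);
    """)
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [(i, f"A string of long data {c}.") for i, c in enumerate("ABCDEFG", start=1)],
    )
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)", [(1, "gold"), (2, "silver"), (3, "copper")]
    )
    conn.executemany(
        "INSERT INTO data_tag_rel VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 3), (5, 1), (5, 2), (5, 3), (6, 1), (7, 1)],
    )
    return conn

def rows_with_all_tags(conn, tag_names):
    """Return (id, data) rows that carry every tag in tag_names."""
    names = sorted(set(tag_names))  # duplicates would break the count check
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT d.id, d.data
        FROM data d
        INNER JOIN data_tag_rel dtr ON d.id = dtr.data
        INNER JOIN tags t ON dtr.tag = t.id
        WHERE t.tag IN ({placeholders})
        GROUP BY d.id, d.data
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY d.id
    """
    return conn.execute(sql, names + [len(names)]).fetchall()

conn = build_metals_db()
print(rows_with_all_tags(conn, ["gold", "silver", "copper"]))
# [(5, 'A string of long data E.')]
```

The only parts that change with the number of tags are the IN list and the count in HAVING, which is why the helper derives both from the same list.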

Speed is going to vary based on your data. Tim's answer is likely to be fast enough for practical purposes, but if you find it is not, you may be able to slightly improve it by not joining data until needed (the other changes here are just stylistic and unlikely to have any effect):
select d.id, d.data
from (
select dtr.data as id
from data_tag_rel dtr
where dtr.tag in (select id from tags where tag in ('gold','silver','copper'))
group by dtr.data
having count(tag) = 3
) d_ids
join data d using (id)
If you have a great deal of data, doing separate joins for each tag is likely to be faster, especially if you know which tags are rare and can go from rarest to least rare:
select d.id, d.data
from data_tag_rel dtr1
join data_tag_rel dtr2 on dtr2.data=dtr1.data and dtr2.tag=(select id from tags where tag='silver')
join data_tag_rel dtr3 on dtr3.data=dtr2.data and dtr3.tag=(select id from tags where tag='copper')
join data d on d.id=dtr3.data
where dtr1.tag=(select id from tags where tag='gold')
(both queries untested)
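As a rough check of the one-join-per-tag idea, the sketch below generates that query for any number of tags and runs it on the question's sample data. It uses an in-memory SQLite database as a stand-in for MySQL, assumes each (data, tag) pair appears at most once in data_tag_rel, and the function names are hypothetical.

```python
import sqlite3

def build_sample_db():
    """Question's sample data (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE data (id INTEGER PRIMARY KEY, data TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE data_tag_rel (data INTEGER, tag INTEGER);
    """)
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [(i, f"A string of long data {c}.") for i, c in enumerate("ABCDEFG", start=1)],
    )
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)", [(1, "gold"), (2, "silver"), (3, "copper")]
    )
    conn.executemany(
        "INSERT INTO data_tag_rel VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 3), (5, 1), (5, 2), (5, 3), (6, 1), (7, 1)],
    )
    return conn

def build_per_tag_join_sql(n_tags):
    """One data_tag_rel alias per required tag; dtr1 drives the query."""
    tag_id = "(SELECT id FROM tags WHERE tag = ?)"
    joins = " ".join(
        f"JOIN data_tag_rel dtr{i} ON dtr{i}.data = dtr1.data AND dtr{i}.tag = {tag_id}"
        for i in range(2, n_tags + 1)
    )
    return (
        f"SELECT d.id, d.data FROM data_tag_rel dtr1 {joins} "
        f"JOIN data d ON d.id = dtr1.data "
        f"WHERE dtr1.tag = {tag_id} ORDER BY d.id"
    )

def rows_via_joins(conn, tag_names):
    """tag_names should be ordered rarest first; the first one drives dtr1."""
    names = list(dict.fromkeys(tag_names))  # drop duplicates, keep order
    if not names:
        raise ValueError("at least one tag is required")
    # Placeholders in the JOINs come before the one in WHERE, so the
    # driving tag goes last in the parameter list.
    params = names[1:] + [names[0]]
    return conn.execute(build_per_tag_join_sql(len(names)), params).fetchall()

conn = build_sample_db()
print(rows_via_joins(conn, ["copper", "silver", "gold"]))
# [(5, 'A string of long data E.')]
```

Whether this beats the aggregation form depends on the data and indexes, so it is worth comparing both with EXPLAIN on the real tables.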

Related

How can I query related tags from a relationship table?

I can't find an answer to my problem on the site.
I have 3 tables: data, tags and data_tag_rel.
data
id data
------------------------------------
1 A string of long data A.
2 A string of long data B.
3 A string of long data C.
4 A string of long data D.
5 A string of long data E.
6 A string of long data F.
7 A string of long data G.
8 A string of long data H.
n Etc...
tags
id tag
------------
1 gold
2 silver
3 copper
4 emerald
5 steel
6 ruby
7 carbon
8 zinc
9 mercury
n Etc...
data_tag_rel
data tag
------------------
1 1
1 2
2 1
3 2
4 3
5 1
5 2
5 3
6 1
7 1
8 1
8 2
8 4
8 6
n n
As you can see, there is data and tags, and a relationship table to determine what tags are assigned to what data. Here the data is talking about minerals and rocks.
The query I want is to SELECT the tags (id and name) that are related to a set of more tags in the relationship table, by looking at what data id they target in common.
So for example, imagine I assign data id 8 to be related to tags 1:"gold", 2:"silver", 4:"emerald" and 6:"ruby" in the relationship table. So now I would like to query common tags. If I query "gold", "silver", I would like to get back either:
A. "gold", "silver", "ruby" and "emerald" (include the search tags).
or
B. "ruby" and "emerald" (don't include the search tags).
The purpose is to click a tag and see what other tags are related to that clicked tag, by what data they are related to in common, using the relationship table as a guide.
So far I managed to make it work searching for only 1 tag, but I can't make it work for 2, 3 or n tags.
SELECT DISTINCT tags.tag FROM tags, data_tag_rel WHERE tags.id = data_tag_rel.tag AND data_tag_rel.data IN (SELECT data_tag_rel.data FROM data_tag_rel WHERE data_tag_rel.tag IN (SELECT tags.id FROM tags WHERE tags.tag IN ('gold')));
How can I query related tags to a list of 2 or more tags in this database structure?
Thanks so much!
SELECT
data_tag_rel.data
FROM
data_tag_rel
WHERE
data_tag_rel.tag IN ((SELECT id FROM tags WHERE tags.tag IN ('silver', 'gold')))
GROUP BY
data_tag_rel.data
HAVING
COUNT(*) = 2
If it's possible for a data item to have the same tag multiple times, that changes slightly...
HAVING
COUNT(DISTINCT data_tag_rel.tag) = 2
The nasty nested IN() can also be replaced if you feel like it...
FROM
data_tag_rel
INNER JOIN
tags
ON tags.id = data_tag_rel.tag
WHERE
tags.tag IN ('silver', 'gold')
The crux of it is: filter normally, then use the HAVING clause to make sure you have two matches, not just one.
EDIT:
To be explicit about how to get from there to the associated tags, just join back on to the data_tag_rel table...
SELECT DISTINCT
tags.tag
FROM
(
SELECT
data_tag_rel.data AS id
FROM
data_tag_rel
WHERE
data_tag_rel.tag IN ((SELECT id FROM tags WHERE tags.tag IN ('silver', 'gold')))
GROUP BY
data_tag_rel.data
HAVING
COUNT(*) = 2
)
AS data
INNER JOIN
data_tag_rel
ON data_tag_rel.data = data.id
INNER JOIN
tags
ON tags.id = data_tag_rel.tag
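The join-back approach above can be sketched end to end as follows, with an option to drop the search tags from the result (option B in the question). This is a hedged sketch on an in-memory SQLite database standing in for MySQL; `build_minerals_db` and `related_tags` are illustrative names, and the data is the question's sample without the "n Etc..." rows.

```python
import sqlite3

def build_minerals_db():
    """Sample tags and relationships from the question (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE data_tag_rel (data INTEGER, tag INTEGER);
    """)
    names = ["gold", "silver", "copper", "emerald", "steel",
             "ruby", "carbon", "zinc", "mercury"]
    conn.executemany("INSERT INTO tags VALUES (?, ?)", list(enumerate(names, start=1)))
    conn.executemany(
        "INSERT INTO data_tag_rel VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (3, 2), (4, 3), (5, 1), (5, 2), (5, 3),
         (6, 1), (7, 1), (8, 1), (8, 2), (8, 4), (8, 6)],
    )
    return conn

def related_tags(conn, search_tags, include_search=True):
    """Tags attached to any data item that carries every tag in search_tags."""
    names = sorted(set(search_tags))
    marks = ", ".join("?" for _ in names)
    sql = f"""
        SELECT DISTINCT t.tag
        FROM (
            SELECT dtr.data AS id
            FROM data_tag_rel dtr
            INNER JOIN tags s ON s.id = dtr.tag
            WHERE s.tag IN ({marks})
            GROUP BY dtr.data
            HAVING COUNT(DISTINCT s.tag) = ?
        ) AS matched
        INNER JOIN data_tag_rel r ON r.data = matched.id
        INNER JOIN tags t ON t.id = r.tag
    """
    params = names + [len(names)]
    if not include_search:
        sql += f" WHERE t.tag NOT IN ({marks})"
        params += names
    sql += " ORDER BY t.tag"
    return [row[0] for row in conn.execute(sql, params)]

conn = build_minerals_db()
print(related_tags(conn, ["gold", "silver"]))
# ['copper', 'emerald', 'gold', 'ruby', 'silver']
print(related_tags(conn, ["gold", "silver"], include_search=False))
# ['copper', 'emerald', 'ruby']
```

With the full sample data, items 1, 5 and 8 all carry both gold and silver, so copper (from item 5) shows up alongside emerald and ruby.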
I think the logic you want is:
select t.*
from tags t
where exists (
select 1
from data_tag_rel dtr
inner join data_tag_rel dtr1 on dtr1.data = dtr.data
inner join tags t1 on t1.id = dtr1.tag
where t1.tag in ('gold', 'silver') and dtr.tag = t.id
)

How to retrieve values from normalized table in one line per dataset?

I have stored data into several MySQL 5.x tables in order to normalize, now I am struggling on how to retrieve this data in one line per dataset.
E.g.
Table 1: articles, holding also 2 values in this example per article
article_id | make | model
1 Audi A3
Table 2: article_attributes, where one article can have several attributes
article_id | attr_id
1 1
1 2
2 1
Table 3: article_attribute_names
attr_id | name
1 Turbo
2 Airbag
Now I want to retrieve it, with one line per dataset
e.g.
SELECT a.*, attr_n.name AS function
FROM `articles` a
LEFT JOIN article_attributes AS attr ON a.article_id = attr.article_id
LEFT JOIN article_attribute_names AS attr_n ON attr_n.attr_id = attr.attr_id
-- group by attr.article_id
This gives me:
article_id | Make | Model | function
1 Audi A3 Turbo
1 Audi A3 Airbag
But I am looking for something like this:
article_id | Make | Model | function1 | function2
1 Audi A3 Turbo Airbag
Is this even possible, and if yes, how?
The simplest method is to put the values into a delimited field using group_concat():
SELECT a.*, GROUP_CONCAT(aan.name) AS functions
FROM articles a LEFT JOIN
article_attributes aa
ON a.article_id = aa.article_id LEFT JOIN
article_attribute_names aan
ON aan.attr_id = aa.attr_id
GROUP BY a.article_id;
Aggregating by article_id is okay, assuming that the id is unique (or equivalently declared as a primary key).
If you actually want the results in separate columns, that is more challenging. If you know there are at most two (as in your example), just use aggregation:
SELECT a.*, MIN(aan.name) AS function1,
(CASE WHEN MIN(aan.name) <> MAX(aan.name)
THEN MAX(aan.name)
END) as function2
FROM articles a LEFT JOIN
article_attributes aa
ON a.article_id = aa.article_id LEFT JOIN
article_attribute_names aan
ON aan.attr_id = aa.attr_id
GROUP BY a.article_id;
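Both variants can be tried on the question's tables with the sketch below. It runs on an in-memory SQLite database as a stand-in for MySQL; the second article (VW Golf) is a hypothetical row added only so that the attribute row (2, 1) has a parent, and the function names are illustrative. Note that MIN/MAX order the two columns alphabetically, not in attribute order.

```python
import sqlite3

def build_articles_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE articles (article_id INTEGER PRIMARY KEY, make TEXT, model TEXT);
        CREATE TABLE article_attributes (article_id INTEGER, attr_id INTEGER);
        CREATE TABLE article_attribute_names (attr_id INTEGER PRIMARY KEY, name TEXT);
    """)
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "Audi", "A3"), (2, "VW", "Golf")],  # row 2 is hypothetical
    )
    conn.executemany("INSERT INTO article_attributes VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])
    conn.executemany(
        "INSERT INTO article_attribute_names VALUES (?, ?)", [(1, "Turbo"), (2, "Airbag")]
    )
    return conn

JOINS = """
    FROM articles a
    LEFT JOIN article_attributes aa ON a.article_id = aa.article_id
    LEFT JOIN article_attribute_names aan ON aan.attr_id = aa.attr_id
"""

def functions_as_columns(conn):
    """At most two attributes per article, spread over two columns."""
    sql = f"""
        SELECT a.article_id, a.make, a.model,
               MIN(aan.name) AS function1,
               CASE WHEN MIN(aan.name) <> MAX(aan.name)
                    THEN MAX(aan.name) END AS function2
        {JOINS}
        GROUP BY a.article_id, a.make, a.model
        ORDER BY a.article_id
    """
    return conn.execute(sql).fetchall()

def functions_as_list(conn):
    """One delimited field per article; split and sorted in Python because
    the order inside GROUP_CONCAT is not guaranteed here."""
    sql = f"""
        SELECT a.article_id, GROUP_CONCAT(aan.name) AS functions
        {JOINS}
        GROUP BY a.article_id
    """
    return {aid: sorted(names.split(",")) for aid, names in conn.execute(sql)}

conn = build_articles_db()
print(functions_as_columns(conn))
# [(1, 'Audi', 'A3', 'Airbag', 'Turbo'), (2, 'VW', 'Golf', 'Turbo', None)]
print(functions_as_list(conn))
```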

How to query this table?

I have the following two tables.
nodes
attributes
nodes
id title
1 test
2 test2
attributes
id node_id title value
1 1 featured 1
2 1 age 13
3 2 featured 2
I would like to query nodes that have an attribute titled 'featured', along with all of their attributes.
I tried to join, but I don't know how to query other attributes at the same time.
Is it possible to make a single query to do this?
You could use a subquery to get the ID's of all nodes with the attribute 'featured'. The outer query would be the JOIN to get the rest of the attributes.
Like:
SELECT n.*, a.*
FROM nodes n JOIN attributes a ON a.node_id=n.id
WHERE n.id IN
(SELECT DISTINCT no.id
FROM nodes no JOIN attributes at ON at.node_id=no.id AND at.title='featured')
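A variant of the same idea, written with EXISTS instead of IN, is sketched below so it can be run on the question's data. It uses in-memory SQLite as a stand-in for MySQL; node 3 and its 'age' attribute are hypothetical rows added only to show a node being filtered out, and `featured_nodes_with_attributes` is an illustrative name.

```python
import sqlite3

FEATURED_SQL = """
    SELECT n.id, n.title, a.title, a.value
    FROM nodes n
    JOIN attributes a ON a.node_id = n.id
    WHERE EXISTS (
        SELECT 1
        FROM attributes f
        WHERE f.node_id = n.id AND f.title = 'featured'
    )
    ORDER BY n.id, a.id
"""

def build_nodes_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE attributes (
            id INTEGER PRIMARY KEY, node_id INTEGER, title TEXT, value INTEGER
        );
    """)
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(1, "test"), (2, "test2"), (3, "test3")],  # node 3 is hypothetical
    )
    conn.executemany(
        "INSERT INTO attributes VALUES (?, ?, ?, ?)",
        [(1, 1, "featured", 1), (2, 1, "age", 13), (3, 2, "featured", 2),
         (4, 3, "age", 7)],  # row 4 is hypothetical
    )
    return conn

def featured_nodes_with_attributes(conn):
    """Every attribute row of every node that has a 'featured' attribute."""
    return conn.execute(FEATURED_SQL).fetchall()

print(featured_nodes_with_attributes(build_nodes_db()))
# [(1, 'test', 'featured', 1), (1, 'test', 'age', 13), (2, 'test2', 'featured', 2)]
```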
I think this is a simple join
SELECT b.title as NodeTitle, a.title, a.value
FROM attributes a
INNER JOIN nodes b
ON a.node_id = b.id

MySQL selecting rows with a max id and matching other conditions

Using the tables below as an example and the listed query as a base query, I want to add a way to select only rows with a max id! Without having to do a second query!
TABLE VEHICLES
id vehicleName
----- --------
1 cool car
2 cool car
3 cool bus
4 cool bus
5 cool bus
6 car
7 truck
8 motorcycle
9 scooter
10 scooter
11 bus
TABLE VEHICLE NAMES
nameId vehicleName
------ -------
1 cool car
2 cool bus
3 car
4 truck
5 motorcycle
6 scooter
7 bus
TABLE VEHICLE ATTRIBUTES
nameId attribute
------ ---------
1 FAST
1 SMALL
1 SHINY
2 BIG
2 SLOW
3 EXPENSIVE
4 SHINY
5 FAST
5 SMALL
6 SHINY
6 SMALL
7 SMALL
And the base query:
select a.*
from vehicle a
join vehicle_names b using(vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
group
by a.id
having count(distinct c.attribute) = 2;
So what I want to achieve is to select rows with certain attributes, that match a name but only one entry for each name that matches where the id is the highest!
So a working solution in this example would return the below rows:
id vehicleName
----- --------
2 cool car
10 scooter
if it was using some sort of max on the id
at the moment I get all the entries for cool car and scooter.
My real-world database follows a similar structure and has tens of thousands of entries in it, so a query like the one above could easily return 3000+ results. I limit the results to 100 rows to keep execution time low, as the results are used in a search on my site. The reason I have repeats of "vehicles" with the same name but a different ID is that new models are constantly added, but I keep the older ones around for those who want to dig them up! On a search by car name, though, I don't want to return the older cars, just the newest one, which is the one with the highest ID!
The correct answer would adapt the query I provided above that I'm currently using and have it only return rows where the name matches but has the highest id!
If this isn't possible, suggestions on how I can achieve what I want without massively increasing the execution time of a search would be appreciated!
If you want to keep your logic, here is what I would do:
select a.*
from vehicle a
left join vehicle a2 on (a.vehicleName = a2.vehicleName and a.id < a2.id)
join vehicle_names b on (a.vehicleName = b.vehicleName)
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
and a.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct c.attribute) = 2;
Which yields:
+----+-------------+
| id | vehicleName |
+----+-------------+
| 2 | cool car |
| 10 | scooter |
+----+-------------+
2 rows in set (0.00 sec)
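The self-join trick (left join each row to any row with the same name and a higher id, then keep only rows where no such row exists) can be reproduced on the question's sample data with the sketch below. It runs on in-memory SQLite as a stand-in for MySQL, writes the joins with ON instead of USING, and uses illustrative function names.

```python
import sqlite3

LATEST_MATCHING_SQL = """
    SELECT a.id, a.vehicleName
    FROM vehicle a
    LEFT JOIN vehicle a2
           ON a.vehicleName = a2.vehicleName AND a.id < a2.id
    JOIN vehicle_names b ON a.vehicleName = b.vehicleName
    JOIN vehicle_attribs c ON c.nameId = b.nameId
    WHERE c.attribute IN ('SMALL', 'SHINY')
      AND a.vehicleName LIKE '%coo%'
      AND a2.id IS NULL          -- no newer row with the same name
    GROUP BY a.id, a.vehicleName
    HAVING COUNT(DISTINCT c.attribute) = 2
    ORDER BY a.id
"""

def build_vehicle_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicleName TEXT);
        CREATE TABLE vehicle_names (nameId INTEGER PRIMARY KEY, vehicleName TEXT);
        CREATE TABLE vehicle_attribs (nameId INTEGER, attribute TEXT);
    """)
    vehicles = ["cool car", "cool car", "cool bus", "cool bus", "cool bus",
                "car", "truck", "motorcycle", "scooter", "scooter", "bus"]
    conn.executemany("INSERT INTO vehicle VALUES (?, ?)", list(enumerate(vehicles, start=1)))
    names = ["cool car", "cool bus", "car", "truck", "motorcycle", "scooter", "bus"]
    conn.executemany("INSERT INTO vehicle_names VALUES (?, ?)", list(enumerate(names, start=1)))
    conn.executemany(
        "INSERT INTO vehicle_attribs VALUES (?, ?)",
        [(1, "FAST"), (1, "SMALL"), (1, "SHINY"), (2, "BIG"), (2, "SLOW"),
         (3, "EXPENSIVE"), (4, "SHINY"), (5, "FAST"), (5, "SMALL"),
         (6, "SHINY"), (6, "SMALL"), (7, "SMALL")],
    )
    return conn

def latest_matching_vehicles(conn):
    return conn.execute(LATEST_MATCHING_SQL).fetchall()

print(latest_matching_vehicles(build_vehicle_db()))
# [(2, 'cool car'), (10, 'scooter')]
```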
As others said, normalization could be done on a few levels:
Keeping your current vehicle_names table as the primary lookup table, I would change:
update vehicle a
inner join vehicle_names b using (vehicleName)
set a.vehicleName = b.nameId;
alter table vehicle change column vehicleName nameId int;
create table attribs (
attribId int auto_increment primary key,
attribute varchar(20),
unique key attribute (attribute)
);
insert into attribs (attribute)
select distinct attribute from vehicle_attribs;
update vehicle_attribs a
inner join attribs b using (attribute)
set a.attribute=b.attribId;
alter table vehicle_attribs change column attribute attribId int;
Which leads to the following query:
select a.id, b.vehicleName
from vehicle a
left join vehicle a2 on (a.nameId = a2.nameId and a.id < a2.id)
join vehicle_names b on (a.nameId = b.nameId)
join vehicle_attribs c on (a.nameId=c.nameId)
inner join attribs d using (attribId)
where d.attribute in ('SMALL', 'SHINY')
and b.vehicleName like '%coo%'
and a2.id is null
group by a.id
having count(distinct d.attribute) = 2;
The table does not seem normalized; however, that makes it easy to do this:
select max(id), vehicleName
from VEHICLES
group by vehicleName
having count(*)>=2;
I'm not sure I completely understand your model, but the following query satisfies your requirements as they stand. The first sub query finds the latest version of the vehicle. The second query satisfies your "and" condition. Then I just join the queries on vehiclename (which is the key?).
select a.id
,a.vehiclename
from (select a.vehicleName, max(id) as id
from vehicle a
where vehicleName like '%coo%'
group by vehicleName
) as a
join (select b.vehiclename
from vehicle_names b
join vehicle_attribs c using(nameId)
where c.attribute in('SMALL', 'SHINY')
group by b.vehiclename
having count(distinct c.attribute) = 2
) as b on (a.vehicleName = b.vehicleName);
If this "latest vehicle" logic is something you will need to do a lot, a small suggestion would be to create a view (see below) which returns the latest version of each vehicle. Then you could use the view instead of the find-max-query. Note that this is purely for ease-of-use, it offers no performance benefits.
create view latest_vehicle as  -- view name is illustrative
select *
from vehicle a
where id = (select max(b.id)
from vehicle b
where a.vehiclename = b.vehiclename);
Without going into a proper redesign of your model, you could:
1) Add a column IsLatest that your application could manage.
This is not perfect but will satisfy your question (until the next problem, see the note at the end).
All you need to do when you add a new entry is issue queries such as
UPDATE a
SET IsLatest = 0
WHERE IsLatest = 1
INSERT new a
UPDATE a
SET IsLatest = 1
WHERE nameId = #last_inserted_id
in a transaction or a trigger
2) Alternatively you can find out the max_id before you issue your query
SELECT MAX(nameId)
FROM a
WHERE vehicleName = #name
3) You can do it in a single SQL statement, and given an index on (vehicleName, id) it should actually have decent speed:
select a.*
from vehicle a
join vehicle_names b ON a.vehicleName = b.vehicleName
join vehicle_attribs c ON b.nameId = c.nameId AND c.attribute = 'SMALL'
join vehicle_attribs d ON b.nameId = d.nameId AND d.attribute = 'SHINY'
left join vehicle notmax ON a.vehicleName = notmax.vehicleName AND a.id < notmax.id
where a.vehicleName like '%coo%'
AND notmax.id IS NULL
I have removed your GROUP BY and HAVING and replaced them with another join (assuming that each attribute appears at most once per nameId).
I have also used one of the ways to find the max per group, which is to join the table to itself and keep only the rows for which there is no record with a bigger id for the same name.
There are other ways; search SO for 'max per group sql'. Also see here, though not complete.

Merging MySQL row entries into a single row

I've got two tables, one for listings and another representing a list of tags for the listings table.
In the listings table the tag ids are stored in a field called tags as 1-2-3-. This has worked out very well for me (regular expressions and joins to separate and display the data), but I now need to pull the titles of those tags into a single row. See below.
listings table
id tags
1 1-2-3-
2 4-5-6-
tags table
id title
1 pig
2 dog
3 cat
4 mouse
5 elephant
6 duck
And what I need to produce out of the listings table is:
id tags
2 mouse, elephant, duck
Here is a query that could help. But since it is doing some string operations, it may not be as good as a regular join:
select l.id, group_concat( t.title )
from listings l, tags t
where concat( '-', l.tags ) like concat( '%-', t.id, '-%' ) group by l.id ;
Unfortunately, with your tags stored in this denormalized format, there's no easy way to go from 1-2-3 to the corresponding tags. In other words, there's no simple way to split out the ids, join to another table and then recombine. Your best option would be to create a listing_tag table with two columns
listing_id tag_id
1 1
1 2
1 3
2 4
2 5
2 6
and then it's just a simple join:
SELECT listing_id, GROUP_CONCAT(title SEPARATOR ', ') FROM listing_tag JOIN tags ON tags.id = tag_id GROUP BY listing_id
GROUP_CONCAT() + INNER JOIN + GROUP BY
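One possible way to get from the delimited tags column to the suggested listing_tag table is to reuse the LIKE match from the first answer as a one-off migration, then run the simple join. The sketch below does this on in-memory SQLite as a stand-in for MySQL, so it uses `||` and `GROUP_CONCAT(x, ', ')` where MySQL would use `CONCAT()` and `SEPARATOR`; the function names are illustrative.

```python
import sqlite3

def build_listings_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listings (id INTEGER PRIMARY KEY, tags TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE listing_tag (
            listing_id INTEGER, tag_id INTEGER,
            PRIMARY KEY (listing_id, tag_id)
        );
    """)
    conn.executemany("INSERT INTO listings VALUES (?, ?)", [(1, "1-2-3-"), (2, "4-5-6-")])
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(1, "pig"), (2, "dog"), (3, "cat"), (4, "mouse"), (5, "elephant"), (6, "duck")],
    )
    return conn

def migrate_to_listing_tag(conn):
    """Fill listing_tag from the '1-2-3-' style column, one row per pair."""
    conn.execute("""
        INSERT INTO listing_tag (listing_id, tag_id)
        SELECT l.id, t.id
        FROM listings l, tags t
        WHERE '-' || l.tags LIKE '%-' || t.id || '-%'
    """)

def tag_titles_per_listing(conn):
    """listing_id -> comma-separated titles (order within a row may vary)."""
    sql = """
        SELECT lt.listing_id, GROUP_CONCAT(t.title, ', ')
        FROM listing_tag lt
        JOIN tags t ON t.id = lt.tag_id
        GROUP BY lt.listing_id
        ORDER BY lt.listing_id
    """
    return dict(conn.execute(sql).fetchall())

conn = build_listings_db()
migrate_to_listing_tag(conn)
print(tag_titles_per_listing(conn))
```

Once listing_tag is populated, the string matching is no longer needed for reads, and the join can use ordinary indexes.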