I would like to query a relational database to check whether a given set of items exists.
The data I am modeling are of the following form:
key1 = [ item1, item3, item5 ]
key2 = [ item2, item7 ]
key3 = [ item2, item3, item4, item5 ]
...
I am storing them in a table with the following schema:
CREATE TABLE sets (key INTEGER, item INTEGER);
So for example, the following insert statements would insert the above three sets.
INSERT INTO sets VALUES ( key1, item1 );
INSERT INTO sets VALUES ( key1, item3 );
INSERT INTO sets VALUES ( key1, item5 );
INSERT INTO sets VALUES ( key2, item2 );
INSERT INTO sets VALUES ( key2, item7 );
INSERT INTO sets VALUES ( key3, item2 );
INSERT INTO sets VALUES ( key3, item3 );
INSERT INTO sets VALUES ( key3, item4 );
INSERT INTO sets VALUES ( key3, item5 );
Given a set of items, I would like the key associated with the set if it is stored in the table and NULL if it is not. Is it possible to do this with an SQL query? If so, please provide details.
Details that may be relevant:
I am primarily interested in the database design / query strategy, though I will eventually implement this in MySQL and perform the query from within Python using the mysql-python package.
I have the freedom to restructure the database schema if a different layout would be more convenient for this type of query.
Each set, if it exists, is supposed to be unique.
I am not interested in partial matches.
The database scale is on the order of < 1000 sets, each of which contains < 10 items, so performance at this point is not a priority.
Thanks in advance.
I won't comment on whether there is a better suited schema for doing this (it's quite possible), but for a schema having columns name and item, the following query should work (MySQL syntax).
SELECT k.name
FROM (SELECT DISTINCT name FROM sets) AS k
INNER JOIN sets i1 ON (k.name = i1.name AND i1.item = 1)
INNER JOIN sets i2 ON (k.name = i2.name AND i2.item = 3)
INNER JOIN sets i3 ON (k.name = i3.name AND i3.item = 5)
LEFT JOIN sets ix ON (k.name = ix.name AND ix.item NOT IN (1, 3, 5))
WHERE ix.name IS NULL;
The idea is that we have all the set keys in k, which we then join with the set item data in sets once for each item in the set we are searching for, three in this case. Each of the three inner joins with table aliases i1, i2 and i3 filters out all set names that don't contain the item searched for with that join. Finally, we have a left join with sets with table alias ix, which brings in all the extra items in the set, that is, every item we were not searching for. ix.name is NULL when no extra items are found, which is exactly what we want, hence the WHERE clause. The query returns a row containing the set key if the set is found, and no rows otherwise.
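Since this query needs one inner join per searched item, it has to be generated for each search set. Here is a minimal sketch of how that could be done from Python. It uses the stdlib sqlite3 module with an in-memory database instead of MySQL so that it is self-contained; the helper names build_join_query and find_set are made up for illustration, and it assumes a non-empty search set.

```python
import sqlite3

def build_join_query(n_items):
    """Build the one-join-per-item query for a search set of n_items items."""
    joins = "\n".join(
        f"INNER JOIN sets i{i} ON (k.name = i{i}.name AND i{i}.item = ?)"
        for i in range(1, n_items + 1)
    )
    placeholders = ", ".join("?" * n_items)
    return (
        "SELECT k.name\n"
        "FROM (SELECT DISTINCT name FROM sets) AS k\n"
        f"{joins}\n"
        "LEFT JOIN sets ix "
        f"ON (k.name = ix.name AND ix.item NOT IN ({placeholders}))\n"
        "WHERE ix.name IS NULL"
    )

def find_set(conn, items):
    """Return the key of the stored set equal to items, or None."""
    items = sorted(set(items))
    # Parameters: one value per inner join, then the same values for NOT IN.
    row = conn.execute(build_join_query(len(items)), items + items).fetchone()
    return row[0] if row else None

# Sample data: the three sets from the question, with integer keys and items.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (name INTEGER, item INTEGER)")
conn.executemany(
    "INSERT INTO sets VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 5), (2, 2), (2, 7), (3, 2), (3, 3), (3, 4), (3, 5)],
)

print(find_set(conn, [1, 3, 5]))  # key of the set {1, 3, 5}
print(find_set(conn, [3, 5]))     # partial match only, so None
```

The values are passed as bound parameters; only the number of joins and placeholders is spliced into the SQL text.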
Edit: The idea behind collapsar's answer seems to be much better than mine, so here's a slightly shorter version of it, with an explanation.
SELECT sets.name
FROM sets
LEFT JOIN (
SELECT DISTINCT name
FROM sets
WHERE item NOT IN (1, 3, 5)
) s1
ON (sets.name = s1.name)
WHERE s1.name IS NULL
GROUP BY sets.name
HAVING COUNT(sets.item) = 3;
The idea here is that subquery s1 selects the keys of all sets that contain items other than the ones we are looking for. Thus, when we left join sets with s1, s1.name is NULL when the set only contains items we are searching for. We then group by set key and filter out any sets having the wrong number of items. We are then left with only sets which contain only items we are searching for and are of the correct length. Since a set can only contain an item once, there can only be one set satisfying those criteria, and that's the one we're looking for.
Edit: It just dawned on me how to do this without the exclusion.
SELECT totals.name
FROM (
SELECT name, COUNT(*) count
FROM sets
GROUP BY name
) totals
INNER JOIN (
SELECT name, COUNT(*) count
FROM sets
WHERE item IN (1, 3, 5)
GROUP BY name
) matches
ON (totals.name = matches.name)
WHERE totals.count = 3 AND matches.count = 3;
The first subquery finds the total count of items in each set and the second one finds out the count of matching items in each set. When matches.count is 3, the set has all the items we're looking for, and if totals.count is also 3, the set doesn't have any extra items.
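For what it's worth, the two counts can also be computed in a single GROUP BY, which makes it easy to parameterize from Python. This is a variant of the query above, not the same query: a sketch that uses the stdlib sqlite3 module in place of MySQL, assumes each (name, item) pair is stored only once, and uses a made-up helper name find_set_by_counts.

```python
import sqlite3

def find_set_by_counts(conn, items):
    """Return the key whose set equals items exactly, or None."""
    items = sorted(set(items))
    placeholders = ", ".join("?" * len(items))
    sql = (
        "SELECT name FROM sets GROUP BY name "
        # COUNT(*) is the total size of the set (totals.count above);
        # SUM(item IN (...)) is the number of matching items (matches.count).
        f"HAVING COUNT(*) = ? AND SUM(item IN ({placeholders})) = ?"
    )
    row = conn.execute(sql, [len(items), *items, len(items)]).fetchone()
    return row[0] if row else None

# Sample data: the three sets from the question, with integer keys and items.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (name INTEGER, item INTEGER)")
conn.executemany(
    "INSERT INTO sets VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 5), (2, 2), (2, 7), (3, 2), (3, 3), (3, 4), (3, 5)],
)

print(find_set_by_counts(conn, [1, 3, 5]))
print(find_set_by_counts(conn, [2, 3]))
```

The function returns None when no stored set matches, which corresponds to the NULL asked for in the question.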
Aleksi's solution requires a specific query for every possible item set. The following suggestion provides a generic solution in the sense that the item set to be queried can be factored in as the result set of another query: just replace the set containment operators with a suitable subquery.
SELECT CASE COUNT(ddd.key) WHEN 0 THEN NULL ELSE MIN(ddd.key) END
FROM (
SELECT s4.key
, COUNT(*) icount
FROM sets s4
JOIN (
SELECT DISTINCT d.key
FROM (
SELECT s1.key
FROM sets s1
WHERE s1.item IN ('item1', 'item3', 'item5')
MINUS
SELECT s2.key
FROM sets s2
WHERE s2.item NOT IN ('item1', 'item3', 'item5')
) d
) dd ON ( dd.key = s4.key )
GROUP BY s4.key
) ddd
WHERE ddd.icount = (
SELECT COUNT(*)
FROM (
SELECT DISTINCT s3.item
FROM sets s3
WHERE s3.item IN ('item1', 'item3', 'item5')
)
)
;
The result set dd delivers a candidate set of keys that are not associated with any items other than those in the set to be tested. The only ambiguity may arise from keys that reference a proper subset of the tested item set. Thus we count the number of items associated with the keys of dd and choose the key for which this number matches the cardinality of the tested item set. If such a key exists, it is unique (as we know that the item sets are unique).
The CASE expression in the outermost SELECT is just a fancy way to guarantee that there will be no empty result set, i.e. a NULL value will be returned if the item set is not represented by the relation.
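To illustrate the idea of supplying the item set as the result of another query, here is a rough sketch in which the searched items are first loaded into a helper table. It is not the query above: it uses the stdlib sqlite3 module rather than MySQL or Oracle, the table name wanted and the function name find_key are assumptions made for the example, and it relies on MIN() over an empty input returning NULL so that exactly one row always comes back.

```python
import sqlite3

SET_EQUALITY_SQL = """
SELECT MIN(c.key)
FROM (
    SELECT s.key
    FROM sets s
    LEFT JOIN wanted w ON w.item = s.item
    GROUP BY s.key
    HAVING COUNT(*) = (SELECT COUNT(*) FROM wanted)  -- same cardinality
       AND COUNT(w.item) = COUNT(*)                  -- no item outside wanted
) c
"""

def find_key(conn, items):
    """Return the key of the stored set equal to items, or None."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (item TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM wanted")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)", [(i,) for i in items]
    )
    # MIN() over zero rows yields NULL, so fetchone() always returns a row.
    return conn.execute(SET_EQUALITY_SQL).fetchone()[0]

# Sample data: the three sets from the question, stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sets (key TEXT, item TEXT)")
conn.executemany(
    "INSERT INTO sets VALUES (?, ?)",
    [
        ("key1", "item1"), ("key1", "item3"), ("key1", "item5"),
        ("key2", "item2"), ("key2", "item7"),
        ("key3", "item2"), ("key3", "item3"), ("key3", "item4"), ("key3", "item5"),
    ],
)

print(find_key(conn, ["item1", "item3", "item5"]))
print(find_key(conn, ["item3", "item5"]))
```

Because the cardinality is taken from wanted itself, a searched item that does not occur anywhere in sets still counts toward the required size.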
maybe this solution will be useful to you,
best regards
carsten
This query has a well known name. Google "relational division", "set containment join", "set equality join".
To simplify collapsar's solution, which was already simplified by Aleksi Torhamo:
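It isn't necessary to get all keys that DO NOT MATCH, which could be large. Just get the keys whose sets contain all of the searched items and call them partial matches (they may still contain extra items).
-- get all partial matches: keys whose sets contain every searched item
CREATE TEMPORARY VIEW partial_matches AS
SELECT key FROM sets WHERE item IN (1,3,5)
GROUP BY key HAVING COUNT(*) = 3;
-- filter for full matches: keep only keys with no extra items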
SELECT sets.key
FROM sets, partial_matches
WHERE sets.key = partial_matches.key
GROUP BY sets.key HAVING COUNT(sets.key) = 3;
Related
I have tables
CREATE TABLE one (
op INT,
value INT
);
and
CREATE TABLE two (
tp INT,
value INT
);
Now I want to get all op values for which the set of values for the op contains all values for a given tp.
I would write this as:
SELECT op FROM one AS o1 WHERE (
(SELECT value FROM one AS o2 WHERE o1.op = o2.op)
CONTAINS ALL
(SELECT value FROM two WHERE tp=<specific-value>)
)
Unfortunately, I couldn't find such a CONTAINS ALL operator, nor anything close to it.
Table one contains 50M entries, table two contains 1M entries. On average, there are 20 different values for a single op and tp.
Assume your tables are named ops and tps.
SELECT
ops.op
FROM ops
INNER JOIN tps ON tps.value = ops.value
WHERE tps.tp = 1
GROUP BY ops.op
HAVING COUNT(DISTINCT ops.value) = (SELECT COUNT(DISTINCT tps.value) FROM tps WHERE tps.tp = 1); -- You can replace 1 with any tp value.
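A small sketch to try this out with the tp value as a bound parameter. It uses the stdlib sqlite3 module and an in-memory database; the sample rows and the helper name ops_containing are made up for illustration.

```python
import sqlite3

DIVISION_SQL = """
SELECT ops.op
FROM ops
INNER JOIN tps ON tps.value = ops.value
WHERE tps.tp = :tp
GROUP BY ops.op
HAVING COUNT(DISTINCT ops.value) =
       (SELECT COUNT(DISTINCT value) FROM tps WHERE tp = :tp)
ORDER BY ops.op
"""

def ops_containing(conn, tp):
    """Return every op whose values include all values of the given tp."""
    return [row[0] for row in conn.execute(DIVISION_SQL, {"tp": tp})]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ops (op INTEGER, value INTEGER)")
conn.execute("CREATE TABLE tps (tp INTEGER, value INTEGER)")
# Made-up sample rows: op 1 = {10, 20, 30}, op 2 = {10}, op 3 = {20, 30}.
conn.executemany(
    "INSERT INTO ops VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 30), (2, 10), (3, 20), (3, 30)],
)
# tp 1 = {10, 20}, tp 2 = {30}.
conn.executemany("INSERT INTO tps VALUES (?, ?)", [(1, 10), (1, 20), (2, 30)])

print(ops_containing(conn, 1))
print(ops_containing(conn, 2))
```

Note that a tp with no rows at all returns nothing here, because the inner join produces no groups, even though every op trivially contains the empty set.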
In my query I use join table category_attributes. Let's assume we have such rows:
category_id|attribute_id
1|1
1|2
1|3
I want a query that suits the two following needs. I have a variable (PHP) holding an array of allowed attribute_id values. If the array is a subset of the category's attribute_id values, then the category_id should be selected; if not, there should be no results.
First case:
select * from category_attributes where (1,2,3,4) in category_attributes.attribute_id
should give no results.
Second case
select * from category_attributes where (1,2,3) in category_attributes.attribute_id
should give all three rows (see dummy rows at the beginning).
So I would like to have the reverse of what the standard SQL IN does.
Solution
Step 1: Group the data by the field you want to check.
Step 2: Left join the list of required values with the records obtained in the previous step.
Step 3: Now we have a list with the required values and the corresponding values from the table. The second column will be equal to the required value if it exists in the table and NULL otherwise.
Count the NULL values in the right column. If the count is 0, the table contains all the required values; in that case return all records from the table. Otherwise at least one required value is missing from the table, so return no records.
Sample
Table "Data":
Required values:
10, 20, 50
Query:
SELECT *
FROM Data
WHERE (SELECT Count(*)
FROM (SELECT D.value
FROM (SELECT 10 AS value
UNION
SELECT 20 AS value
UNION
SELECT 50 AS value) T
LEFT JOIN (SELECT value
FROM Data
GROUP BY value) D
ON ( T.value = D.value )) J
WHERE value IS NULL) = 0;
You can use group by and having:
select ca.category_id
from category_attributes ca
where ca.attribute_id in (1, 2, 3, 4)
group by ca.category_id
having count(*) = 4; -- "4" is the size of the list
This assumes that the table has no duplicates (which is typical for attribute mapping tables). If that is a possibility, use:
having count(distinct ca.attribute_id) = 4
You can aggregate the attribute_id values into a string and compare it with the array from PHP.
SELECT category_id FROM
(SELECT category_id, GROUP_CONCAT(attribute_id ORDER BY attribute_id) AS attributes
FROM category_attributes GROUP BY category_id) t
WHERE t.attributes = '1,2,3';
The ORDER BY inside GROUP_CONCAT keeps the aggregated list sorted, so the PHP array must also be sorted and joined with commas before it is compared. Note that this only finds exact matches.
Please take a look at the following table:
I am building a search engine which returns card_id values, based on search of category_id and value_id values.
To better explain the search mechanism, imagine that we are trying to find a car (card_id) by supplying information about which part (value_id) the car should have in every category (category_id).
For example, we may want to find a car (card_id) where the category "Fuel Type" (category_id) has the value "Diesel" (value_id), and the category "Gearbox" (category_id) has the value "Manual" (value_id).
My problem is that I don't know how to build a query that returns the card_ids which match more than one pair of category_id and value_id.
For example, if I want to search a car with diesel engine, I could build a query like this:
SELECT card_id FROM cars WHERE category_id=1 AND value_id=2
where category_id = 1 is a category "Fuel Type" and value_id = 2 is "Diesel".
My question is, how can I build a query, which will look for more category-value pairs? For example, I want to look for diesel cars with manual gearbox.
Any help will be very appreciated. Thank you in advance.
You can do this using aggregation and a having clause:
SELECT card_id
FROM cars
GROUP BY card_id
HAVING SUM(category_id = 1 AND value_id = 2) > 0 AND
SUM(category_id = 3 and value_id = 43) > 0;
Each condition in the having clause counts the number of rows that match a given condition. You can add as many conditions as you like. The first, for instance, says that there is at least one row where the category is 1 and the value is 2.
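If the list of pairs comes from application code, the HAVING clause can be generated. Here is a sketch of that, using the stdlib sqlite3 module (where a comparison also evaluates to 0 or 1, as in MySQL). The sample rows and the helper name find_cards are made up for illustration.

```python
import sqlite3

def find_cards(conn, pairs):
    """Return card_ids that have every (category_id, value_id) pair in pairs."""
    if not pairs:
        raise ValueError("at least one (category_id, value_id) pair is required")
    # One SUM(...) > 0 condition per requested pair.
    conditions = " AND ".join(
        "SUM(category_id = ? AND value_id = ?) > 0" for _ in pairs
    )
    sql = (
        "SELECT card_id FROM cars GROUP BY card_id "
        f"HAVING {conditions} ORDER BY card_id"
    )
    params = [x for pair in pairs for x in pair]
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (card_id INTEGER, category_id INTEGER, value_id INTEGER)")
# Made-up sample rows: three cards, each with one value in categories 1 and 3.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [(1, 1, 2), (1, 3, 43), (2, 1, 2), (2, 3, 44), (3, 1, 5), (3, 3, 43)],
)

print(find_cards(conn, [(1, 2), (3, 43)]))
print(find_cards(conn, [(1, 2)]))
```

Only the number of conditions is spliced into the SQL text; the category and value ids are passed as bound parameters.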
Another approach is to create a user defined function that takes a table of attribute/value pairs and returns a table of matching cars. This has the advantage of allowing an arbitrary number of attribute/value pairs without resorting to dynamic SQL.
--Declare a "sample" table for proof of concept, replace this with your real data table
DECLARE @T TABLE(PID int, Attr int, Val int)
--Populate the data table
INSERT INTO @T(PID, Attr, Val) VALUES (1,1,1), (1,3,5), (1,7,9), (2,1,2), (2,3,5), (2,7,9), (3,1,1), (3,3,5), (3,7,9)
--Declare this as a User Defined Table Type, the function would take this as an input
DECLARE @C TABLE(Attr int, Val int)
--This would be populated by the code that calls the function
INSERT INTO @C (Attr, Val) VALUES (1,1), (7,9)
--The function (or stored procedure) body begins here
--Get a list of IDs for which there is not a requested attribute that doesn't have a matching value for that ID
SELECT DISTINCT PID
FROM @T as T
WHERE NOT EXISTS (SELECT C.Attr FROM @C as C
WHERE NOT EXISTS (SELECT * FROM @T as I
WHERE I.Attr = C.Attr and I.Val = C.Val and I.PID = T.PID ))
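The double NOT EXISTS can be tried out without SQL Server. Below is a minimal sketch using the stdlib sqlite3 module with ordinary tables T and C in place of the table variables, loaded with the same sample rows; the helper name matching_ids and the alias O for the outer table are assumptions made for the example.

```python
import sqlite3

MATCH_SQL = """
SELECT DISTINCT O.PID
FROM T AS O
WHERE NOT EXISTS (
    SELECT 1 FROM C
    WHERE NOT EXISTS (
        SELECT 1 FROM T AS I
        WHERE I.Attr = C.Attr AND I.Val = C.Val AND I.PID = O.PID
    )
)
ORDER BY O.PID
"""

def matching_ids(conn, criteria):
    """Return the PIDs that have every (Attr, Val) pair in criteria."""
    conn.execute("DELETE FROM C")
    conn.executemany("INSERT INTO C VALUES (?, ?)", criteria)
    return [row[0] for row in conn.execute(MATCH_SQL)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (PID INTEGER, Attr INTEGER, Val INTEGER)")
conn.execute("CREATE TABLE C (Attr INTEGER, Val INTEGER)")
# Same sample rows as in the T-SQL proof of concept.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 3, 5), (1, 7, 9), (2, 1, 2), (2, 3, 5), (2, 7, 9),
     (3, 1, 1), (3, 3, 5), (3, 7, 9)],
)

print(matching_ids(conn, [(1, 1), (7, 9)]))
print(matching_ids(conn, [(1, 2), (3, 5)]))
```

With an empty criteria list the condition is vacuously true, so every PID is returned.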
SELECT quantity, materialTypeId ,
(SELECT typeName
FROM invTypes
WHERE TypeID IN (SELECT materialTypeId
FROM invTypeMaterials
WHERE typeId= 12743
)
) AS material
FROM invTypeMaterials
WHERE TypeID=12743
This query gives me nice results except for the material column, which only shows the first entry instead of the name for each row.
If I run the two SQL statements separately, they work and I see what I want. I just need them combined into two columns.
What I want to do is query one table for data where one of the columns has a value which I want to convert to a name; the name is in another table, and the two are linked by a unique TypeID.
Chilly
Maybe this will work:
SELECT tm.quantity, tm.materialTypeId , t.typeName
FROM invTypeMaterials tm
INNER JOIN invTypes t ON t.TypeID = tm.materialTypeId
WHERE tm.TypeID=12743
If you want to lookup the materialTypeID's name for the current record, you must not use a separate subquery but use the materialTypeID value from the outer query.
This is called a correlated subquery:
SELECT quantity, materialTypeId,
(SELECT typeName
FROM invTypes
WHERE TypeID = invTypeMaterials.materialTypeId
) AS material
FROM invTypeMaterials
WHERE TypeID=12743
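One difference between the join and the correlated subquery is worth knowing: if a materialTypeId has no row in invTypes, the inner join drops that row, while the correlated subquery keeps it with a NULL name. A small sketch showing this with the stdlib sqlite3 module; the table contents are made-up sample rows, not real data.

```python
import sqlite3

JOIN_SQL = """
SELECT tm.quantity, tm.materialTypeId, t.typeName
FROM invTypeMaterials tm
INNER JOIN invTypes t ON t.TypeID = tm.materialTypeId
WHERE tm.TypeID = ?
ORDER BY tm.materialTypeId
"""

SUBQUERY_SQL = """
SELECT quantity, materialTypeId,
       (SELECT typeName
        FROM invTypes
        WHERE TypeID = invTypeMaterials.materialTypeId) AS material
FROM invTypeMaterials
WHERE TypeID = ?
ORDER BY materialTypeId
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invTypes (TypeID INTEGER PRIMARY KEY, typeName TEXT)")
conn.execute(
    "CREATE TABLE invTypeMaterials (TypeID INTEGER, materialTypeId INTEGER, quantity INTEGER)"
)
# Made-up rows: material 3 deliberately has no entry in invTypes.
conn.executemany("INSERT INTO invTypes VALUES (?, ?)", [(1, "material A"), (2, "material B")])
conn.executemany(
    "INSERT INTO invTypeMaterials VALUES (?, ?, ?)",
    [(12743, 1, 100), (12743, 2, 50), (12743, 3, 7)],
)

join_rows = conn.execute(JOIN_SQL, (12743,)).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL, (12743,)).fetchall()
print(join_rows)      # the row for material 3 is missing
print(subquery_rows)  # the row for material 3 is kept, with None as the name
```

If every materialTypeId is guaranteed to exist in invTypes, both queries return the same rows.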
Let's say I have a table called references, which has two fields: an id and a reference field.
I want to create a query that will provide me with a reference number based upon an id. Like this:
SELECT reference
FROM references
WHERE id = x
(where x is some integer)
However if the id is not found in the table I would like the query to show -1 instead of NULL.
How can I do this?
SELECT COALESCE(reference, -1) FROM references WHERE id = x
doesn't work
Here are a few approaches:
SELECT COALESCE(MAX(reference), -1)
FROM references
WHERE id = ...
;
SELECT COALESCE(reference, -1)
FROM references
RIGHT OUTER JOIN (SELECT 1 c) t
ON id = ...
;
SELECT COALESCE
( ( SELECT reference
FROM references
WHERE id = ...
),
-1
)
;
(I'd go with the first one, personally, but all three work.)
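A quick sketch of why the plain COALESCE returns nothing while the MAX version works: an aggregate without GROUP BY always produces exactly one row, even when no rows match. It uses the stdlib sqlite3 module; the table is named refs here (an assumption, since references is a reserved word) and lookup_reference is a made-up helper name.

```python
import sqlite3

def lookup_reference(conn, ref_id):
    """Return the reference for ref_id, or -1 if the id is not in the table."""
    row = conn.execute(
        "SELECT COALESCE(MAX(reference), -1) FROM refs WHERE id = ?", (ref_id,)
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (id INTEGER PRIMARY KEY, reference INTEGER)")
# Made-up sample rows.
conn.executemany("INSERT INTO refs VALUES (?, ?)", [(1, 100), (2, 200)])

# Without the aggregate there is no row at all for a missing id,
# so COALESCE never gets a chance to run.
plain_rows = conn.execute(
    "SELECT COALESCE(reference, -1) FROM refs WHERE id = ?", (3,)
).fetchall()

print(plain_rows)
print(lookup_reference(conn, 1))
print(lookup_reference(conn, 3))
```

This relies on id being unique; with duplicate ids, MAX would silently pick the largest reference.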
If the subset has cardinality 0 (the elements with id = 2), there is nothing to compare; it is certain that such an (id = 2) element doesn't exist. On the other hand, if you want to find, let's say, the maximum element of that empty subset, you will get an unknown value (every member of the superset is both an upper bound and a lower bound of the empty set).
I'm not sure if it's correct, but IMHO it's quite logical.