Querying for multiple many-to-many associates in MySQL - mysql

Background
My program is storing a series of objects, a set of tags, and the many-to-many associations between tags and objects, in a MySQL database. To give you an idea of the structure:
CREATE TABLE objects (
object_id INT PRIMARY KEY,
...
);
CREATE TABLE tags (
tag_name VARCHAR(32) NOT NULL
);
CREATE TABLE object_tags (
object_id INT NOT NULL,
tag_name VARCHAR(32) NOT NULL,
PRIMARY KEY (object_id, tag_name)
);
Problem
I want to be able to query for all objects that are tagged with all of the tags in a given set. As an example, let's say I have a live tree, a dead flower, an orangutan, and a ship as my objects, and I want to query all of those tagged living and plant. I expect to receive a list containing only the tree, assuming the tags match the characteristics of the objects.
Current Solution
Presently, given a list of tags T1, T2, ..., Tn, I am solving the problem as follows:
1. Select all object_id columns from the object_tags table where tag_name is T1.
2. Join the result of (1) with the object_tags table, and select all object_id columns where tag_name is T2.
3. Join the result of (2) with the object_tags table again, and select all object_id columns where tag_name is T3.
4. Repeat as necessary for T4, ..., Tn.
5. Join the result of (4) with the objects table and select the additional columns of the objects that are needed.
In practice (using Java), I start with the query string for the first tag, then prepend/append the string parts for the second tag, and so on in a loop, before finally prepending/appending the string parts that make up the overall query. Only then does the string actually get passed into a PreparedStatement and get executed on the server.
Edit: Expanding on my example from above, using this solution I would issue the following query:
SELECT object_id FROM object_tags JOIN (
SELECT object_id FROM object_tags WHERE tag_name='living'
) AS _temp USING (object_id) WHERE tag_name='plant';
Question
Is there a better solution to this problem? Although the number of tags is not likely to be large, I am concerned about the performance of this solution, especially as the database grows in size. Furthermore, it is very difficult to read and maintain the code, especially when the additional concerns/constraints of the application are thrown in.
I am open to suggestions at any level, although the languages (MySQL and Java) are not variables at this point.

I don't know about the performance of this solution, but you can simplify by using pattern matching in MySQL to match a set of pipe-delimited tags (or any other delimiter). This is a solution I've used before for similar applications with tag tables (@match would be a variable passed in by your Java code; I've hard-coded a value for demonstration):
set @match = 'living|plant';
set @numtags =
length(@match) - length(replace(@match, '|', '')) + 1;
select * from objects o
where @numtags =
(
select count(*) from object_tags ot
where concat('|',@match,'|')
like concat('%|',ot.tag_name,'|%')
and ot.object_id = o.object_id
);
Here is a working demo: http://sqlize.com/0vP6DgQh0j
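For comparison, here is a minimal sketch of a different technique (not the pattern-matching approach above): filter object_tags with IN, group by object, and keep the groups whose row count equals the number of requested tags. It assumes the schema from the question, where the primary key on (object_id, tag_name) rules out duplicate tag rows; the alias matched is made up for this sketch.

```sql
-- Objects tagged with BOTH 'living' and 'plant'.
SELECT o.*
FROM objects o
JOIN (
    SELECT object_id
    FROM object_tags
    WHERE tag_name IN ('living', 'plant')
    GROUP BY object_id
    -- 2 = number of tags in the IN list; safe to use COUNT(*) because
    -- (object_id, tag_name) is the primary key, so no tag repeats.
    HAVING COUNT(*) = 2
) AS matched USING (object_id);
```

From Java, the only parts that vary with the tag list are the number of ? placeholders in the IN list and the count in the HAVING clause, so the query text stays the same shape for any number of tags.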

Related

Just what exactly is the performance loss in adding a table that gets joined on every request?

I'm working on an application that previously had unique handles for users only--but now we want to have handles for events, groups, places... etc. Unique string identifiers for many different first class objects. I understand the thing to do is adopt something like the Party Model, where every entity has its own unique partyId and handle. That said, that means on pretty much every data-fetching query, we're adding a join to get that handle! Certainly for every user.
So just what is the performance loss here? For a table with just three or four columns, is a join like this negligible? Or is there a better way of going about this?
Example Table Structure:
Party
int id
int party_type_id
varchar(256) handle
Events
int id
int party_id
varchar(256) name
varchar(256) time
int place_id
Users
int id
int party_id
varchar(256) first_name
varchar(256) last_name
Places
int id
int party_id
varchar(256) name
-- EDIT --
I'm getting a bad rating on this question, and I'm not sure I understand why. In PLAIN TERMS, I'm asking,
If I have three first class objects that must all share a UNIQUE HANDLE property, unique across all three objects, does adding an additional table that must be joined with on almost any request incur a significant performance hit? Is there a better way of accomplishing this in a relational database like MySQL?
-- EDIT: Proposed Queries --
Getting one user
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle='foo'
Searching users
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
Searching all parties... I guess I'm not sure how to do this in one query. Would you have to select all Parties matching the handle and then get the individual objects in separate queries? E.g.
db.makeQuery("SELECT * FROM Party p WHERE p.handle LIKE '%foo%'")
.then(function (results) {
// iterate through results and assemble lists of matching parties by type, then get those objects in separate queries
})
This last example is what I'm most concerned about I think. Is this a reasonable design?
The queries you show should be very fast on any modern implementation, and should scale to tens or hundreds of millions of records without too much trouble, as long as the join columns are indexed.
Relational Database Management Systems (of which MySQL is one) are designed explicitly for this scenario.
In fact, the slow part of your second query:
SELECT * FROM Users u LEFT JOIN Party p ON u.party_id = p.id WHERE p.handle LIKE '%foo%'
is going to be WHERE p.handle LIKE '%foo%' as this will not be able to use an index. Once you have a large table, this part of the query will be many times slower than the join.
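A rough way to see the difference is to compare the plans MySQL reports for the three kinds of predicate. The sketch below assumes an index exists on Party.handle (the index name in the comment is hypothetical) and uses an inner join, since filtering on p.handle discards the unmatched rows of a LEFT JOIN anyway.

```sql
-- Assumed to exist already, for example:
--   CREATE UNIQUE INDEX idx_party_handle ON Party (handle);

-- Equality: the index on handle can be used to find the row directly.
EXPLAIN SELECT * FROM Party p JOIN Users u ON u.party_id = p.id
WHERE p.handle = 'foo';

-- Prefix match: the index can still be used, as a range scan.
EXPLAIN SELECT * FROM Party p JOIN Users u ON u.party_id = p.id
WHERE p.handle LIKE 'foo%';

-- Leading wildcard: a B-tree index cannot be used to seek,
-- so every handle has to be examined.
EXPLAIN SELECT * FROM Party p JOIN Users u ON u.party_id = p.id
WHERE p.handle LIKE '%foo%';
```

The exact EXPLAIN output depends on the MySQL version and on table statistics, so treat it as a diagnostic rather than a guarantee.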

Sql: choose all baskets containing a set of particular items

Eddy has baskets with items. Each item can belong to an arbitrary number of baskets, or to none of them.
The SQL schema representing this is as follows:
tbl_basket
- basketId
tbl_item
- itemId
tbl_basket_item
- pkId
- basketId
- itemId
Question: how do I select all baskets containing a particular set of items?
UPDATE. Only baskets containing all of the items are needed. Otherwise it would have been an easy task to solve.
UPDATE B. I have implemented the following solution, including SQL generation in PHP:
SELECT basketId
FROM tbl_basket
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 1 ) AS t0 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)
where the number of JOINs equals the number of items.
That works well unless some of the items are included in almost every basket. Then performance drops dramatically.
UPDATE B+. To resolve the performance issues, a heuristic is applied. First you select the frequency of each item. If it exceeds some threshold, you don't include the item in the JOINs and either:
- apply post-filtering in PHP
- or just don't filter by that particular itemId, giving the user approximate results in a reasonable amount of time
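The frequency step of this heuristic could look roughly like the query below. It is only a sketch: the alias basket_count is made up, and the threshold that decides when an item is "too common" is left to the application.

```sql
-- For each requested item, count how many baskets contain it.
-- Items with a very high basket_count are candidates to leave out of the JOINs.
SELECT itemId, COUNT(DISTINCT basketId) AS basket_count
FROM tbl_basket_item
WHERE itemId IN (1, 15, 488)
GROUP BY itemId
ORDER BY basket_count DESC;
```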
UPDATE B++. It seems that this problem has no nice solution in MySQL. This raises one question and one solution:
(question) Does PostgreSQL have some advanced indexing techniques that allow solving this problem without a full scan?
(solution) It seems that it could be solved nicely in Redis using sets and the SINTER command to get an intersection.
I think the best way is to create a temporary table with the set of needed items (procedure that takes the item ids as parameters or something along those lines) and then left join it with all of the above tables joined together.
If for a given basketid you have NO nulls on the right side of the left join, the basket contains all the needed items.
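One possible reading of this idea, as a sketch only: the temporary table wanted_items and the aliases are assumptions, and the table names are taken from the question. Every basket is paired with every wanted item, the pairs are left-joined to the basket contents, and a basket qualifies when none of its pairs came back NULL.

```sql
-- Hypothetical temporary table holding the requested item ids.
CREATE TEMPORARY TABLE wanted_items (itemId INT PRIMARY KEY);
INSERT INTO wanted_items (itemId) VALUES (1), (15), (488);

SELECT b.basketId
FROM tbl_basket AS b
CROSS JOIN wanted_items AS w
LEFT JOIN tbl_basket_item AS bi
       ON bi.basketId = b.basketId
      AND bi.itemId   = w.itemId
GROUP BY b.basketId
-- COUNT(bi.itemId) skips NULLs, so the two counts are equal only when
-- every wanted item was found in the basket.
HAVING COUNT(*) = COUNT(bi.itemId);
```

The cross join touches every basket once per wanted item, so this is more of an illustration of the logic than a performance recommendation.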
-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
( basketid INTEGER NOT NULL REFERENCES basket (basketid)
, itemid INTEGER NOT NULL REFERENCES item (itemid)
, PRIMARY KEY (basketid, itemid)
);
-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
SELECT * FROM item i
WHERE i.itemid IN (1,15,488)
AND NOT EXISTS (
SELECT * FROM basket_item bi
WHERE bi.basketid = b.basketid
AND bi.itemid = i.itemid
)
);
If you are going to provide the list of items, then replace id1, id2, etc. in the query below:
select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)
This returns every basket that contains at least one of the listed items (not necessarily all of them). There is no need to join any other tables, since the query needs nothing from them.
The simplest solution is to use a HAVING clause.
SELECT basketId
FROM tbl_basket_item
WHERE itemId IN (1,15,488)
GROUP BY basketId
HAVING Count(DISTINCT itemId) = 3 -- DISTINCT in case we have duplicate items in a basket

MYSQL join tables based on column data and table name

I'm wondering if this is even possible.
I want to join 2 tables based on the data in table 1.
For example, table 1 has a column food whose data is "hotdog".
And I have a table called hotdog.
Is it possible to do a JOIN like this?
SELECT * FROM table1 t join t.food on id = foodid
I know it doesn't work, but is it even possible? Is there a workaround?
Thanks in advance.
No, you can't join to a different table per row in table1, not even with dynamic SQL as @Cade Roux suggests.
You could join to the hotdog table for rows where food is 'hotdog' and join to other tables for other specific values of food.
SELECT * FROM table1 JOIN hotdog ON id = foodid WHERE food = 'hotdog'
UNION
SELECT * FROM table1 JOIN apples ON id = foodid WHERE food = 'apples'
UNION
SELECT * FROM table1 JOIN soups ON id = foodid WHERE food = 'soup'
UNION
...
This requires that you know all the distinct values of food, and that all the respective food tables have compatible columns so you can UNION them together.
What you're doing is called polymorphic associations. That is, the foreign key in table1 references rows in multiple "parent" tables, depending on the value in another column of table1. This is a common design mistake of relational database programmers.
For alternative solutions, see my answers to:
Possible to do a MySQL foreign key to one of two possible tables?
Why can you not have a foreign key in a polymorphic association?
I also cover solutions for polymorphic associations in my presentation Practical Object Oriented Models In SQL, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Only with dynamic SQL. It is also possible to left join many different tables and use CASE based on type, but the tables would be all have to be known in advance.
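As a rough illustration of what dynamic SQL means in MySQL, the sketch below builds the statement text from a table name and runs it with PREPARE and EXECUTE. It assumes, as in the UNION example, that table1 has food and foodid columns and that each food table has an id column; the variable and statement names are made up. Note that it selects one table per statement, not one table per row.

```sql
-- The table name would normally be chosen by the application
-- and checked against a list of known tables.
SET @food_table = 'hotdog';

-- Identifiers cannot be bound as parameters, so the name is concatenated in.
-- Doubling any backtick keeps it a single quoted identifier.
SET @sql = CONCAT(
    'SELECT * FROM table1 AS t JOIN `',
    REPLACE(@food_table, '`', '``'),
    '` AS f ON f.id = t.foodid WHERE t.food = ?'
);

PREPARE stmt FROM @sql;
EXECUTE stmt USING @food_table;  -- the value, unlike the identifier, is bound
DEALLOCATE PREPARE stmt;
```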
It would be easier to recommend an appropriate design if we knew more about what you are trying to achieve, what your design currently looks like and why you've chosen that particular table design in the first place.
-- Say you have a table of foods:
id INT
foodtype VARCHAR(50) (right now it just contains 'hotdog' or 'hamburger')
name VARCHAR(50)
-- Then hotdogs:
id INT
length INT
width INT
-- Then hamburgers:
id INT
radius INT
thickness INT
Normally I would recommend some system for constraining only one auxiliary table to exist, but for simplicity, I'm leaving that out.
SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
FROM foods f
LEFT JOIN hotdogs hd
ON hd.id = f.id
AND f.foodtype = 'hotdog'
LEFT JOIN hamburgers hb
ON hb.id = f.id
AND f.foodtype = 'hamburger'
Now you will see that such a thing can be code generated (or even for a very slow prototype dynamic SQL on the fly) from SELECT DISTINCT foodtype FROM foods given certain assumptions about table names and access to the table metadata.
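A sketch of that generation step, done inside MySQL itself, might look like this. It rests on a naming assumption that is not guaranteed by anything above: each foodtype value x has an auxiliary table named xs with an id column, and the foodtype values are plain identifiers. The variable names are hypothetical.

```sql
-- Build one LEFT JOIN clause per distinct foodtype.
-- GROUP_CONCAT output is capped by group_concat_max_len (1024 bytes by default).
SET @joins = (
    SELECT GROUP_CONCAT(
               DISTINCT CONCAT(
                   'LEFT JOIN `', foodtype, 's` ON `', foodtype, 's`.id = f.id',
                   ' AND f.foodtype = ', QUOTE(foodtype)
               )
               SEPARATOR ' '
           )
    FROM foods
);

-- COALESCE covers the case where foods is empty and @joins is NULL.
SET @sql = CONCAT('SELECT * FROM foods AS f ', COALESCE(@joins, ''));

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

With the two types from the example, the generated text is equivalent to the hand-written LEFT JOIN query, except that it selects every column instead of naming them.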
The problem is that ultimately whoever consumes the result of this query will have to be aware of new columns showing up whenever a new table is added.
So the question moves back to your client/consumer of the data - how is it going to handle the different types? And what does it mean for different types to be in the same set? And if it needs to be aware of the different types, what's the drawback of just writing different queries for each type or changing a manual query when new types are added given the relative impact of such a change anyway?

Matching phrases within delimiters in MySQL

I am looking to perform an exact match on a phrase within specified delimiters in MySQL. I have the following data in a full text index field.
,garden furniture,patio heaters,best offers,best deals,
I am performing the following query which is returning the aforementioned record.
SELECT id, tags
FROM Store
WHERE MATCH(tags) AGAINST(',garden,' IN BOOLEAN MODE)
I only want to return records which contain the value: ,garden, not ,garden furniture, or ,country garden, etc.
It is currently performing a greedy match and ignoring the comma delimiters specified in the query. I have attempted to escape the commas to force them to be included in the query, but this does not work.
Is it possible to specify non-alphanumeric delimiters as part of the match? I want to be able to perform an exact match, like a regular expression, i.e. '/,garden,/'.
From the docs:
Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents of the <ctype><map> array in one of the character set XML files to specify that ',' is a “letter.” Then use the given character set for your FULLTEXT indexes. For information about the <ctype><map> array format, see Section 9.3.1, “Character Definition Arrays”.
Another option is to add a new collation.
Either way, you'll have to rebuild the index:
REPAIR TABLE Store QUICK;
Only MATCH ... AGAINST can use an index for your search.
However, if your table is not too big, you can use:
SELECT id, tags
FROM Store
WHERE tags LIKE 'garden' OR tags LIKE 'garden,%' OR tags LIKE '%,garden,%' OR tags LIKE '%,garden'
There are other options (find_in_set), but I really don't want to go into those, because they perform even worse than the above SQL.
The real problem: never use CSV in a database!
Using CSV in a database is a really, really bad idea, because:
• It is wasteful; your data is not normalized
• You cannot join on a CSV field
• You cannot use indexes on a CSV field
• Full-text indexes do not play nicely with separators (as you've seen)
The answer is to create 2 extra tables.
CREATE TABLE tag (
id INTEGER PRIMARY KEY AUTO_INCREMENT,
tag VARCHAR(50) -- one tag per row!
) ENGINE=InnoDB;
CREATE TABLE tag_link (
store_id INTEGER,
tag_id INTEGER,
PRIMARY KEY (store_id, tag_id), -- composite PK
FOREIGN KEY (store_id) REFERENCES store (id),
FOREIGN KEY (tag_id) REFERENCES tag (id)
) ENGINE=InnoDB;
Now you can easily do all sorts of queries on tags.
SELECT s.id, GROUP_CONCAT(t2.tag) FROM store s
INNER JOIN tag_link tl1 ON (s.id = tl1.store_id)
INNER JOIN tag t1 ON (t1.id = tl1.tag_id)
INNER JOIN tag_link tl2 ON (s.id = tl2.store_id)
INNER JOIN tag t2 ON (t2.id = tl2.tag_id)
WHERE t1.tag = 'garden'
GROUP BY s.id
This will select one tag named garden (using t1 and tl1), find all stores linked to that tag and then get all tags linked to those stores (using t2 and tl2).
Very fast and very flexible.
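With these two tables, the exact-match search from the question reduces to a plain equality test. The following is a minimal sketch under the schema above; the index name idx_tag_tag is hypothetical, and the index is optional but lets the lookup on tag.tag avoid a scan.

```sql
-- Hypothetical index so that the lookup by tag name can use an index.
CREATE INDEX idx_tag_tag ON tag (tag);

-- Stores tagged exactly 'garden'; 'garden furniture' and
-- 'country garden' are separate rows in tag and do not match.
SELECT s.id
FROM store s
INNER JOIN tag_link tl ON (tl.store_id = s.id)
INNER JOIN tag t ON (t.id = tl.tag_id)
WHERE t.tag = 'garden';
```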

MYSQL - Help with a more complicated Query

I have two tables:
tbl_lists and tbl_houses
Inside tbl_lists I have a field called HousesList. It contains the IDs of several houses in the following format:
1# 2# 4# 51# 3#
I need to be able to select the fields from tbl_houses WHERE ID = any of the IDs in the list.
More specifically, I need to SELECT SUM(tbl_houses.HouseValue) WHERE tbl_houses.ID IN tbl_lists.HousesList, and I want this select to return the SUM for several rows in tbl_lists.
Can anyone help?
I'm thinking of how I can do this in a SINGLE query, since I don't want to do any MySQL loops (within PHP).
If your schema is really fixed, I'd do two queries:
SELECT HousesList FROM tbl_lists WHERE ... (your conditions)
In PHP, split the lists and create one array $houseIDs of IDs. Then run a second query:
"SELECT SUM(HouseValue) FROM tbl_houses WHERE ID IN (" . join(", ", $houseIDs) . ")"
I still suggest changing the schema into something like this:
CREATE TABLE tbl_lists (listID int primary key, ...)
CREATE TABLE tbl_lists_houses (listID int, houseID int)
CREATE TABLE tbl_houses (houseID int primary key, ...)
Then the query becomes trivial:
SELECT SUM(h.HouseValue)
FROM tbl_lists AS l
JOIN tbl_lists_houses AS lh ON lh.listID = l.listID
JOIN tbl_houses AS h ON h.houseID = lh.houseID
WHERE l.listID = <your value>
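Since the question asks for the sum for several rows of tbl_lists at once, the same join can be grouped by list. This is a sketch under the proposed schema; the list IDs in the IN clause are placeholders and the alias total_value is made up.

```sql
-- One row per list with the total value of its houses.
-- Lists that have no houses at all produce no row.
SELECT lh.listID, SUM(h.HouseValue) AS total_value
FROM tbl_lists_houses AS lh
JOIN tbl_houses AS h ON h.houseID = lh.houseID
WHERE lh.listID IN (1, 2, 3)   -- placeholder list IDs
GROUP BY lh.listID;
```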
Storing lists in a single field really prevents you from doing anything useful with them in the database, and you'll be going back and forth between PHP and the database for everything. Also (no offense intended), "my project is highly dynamic" might be a bad excuse for "I have no requirements or design yet".
Normalise your schema: http://en.wikipedia.org/wiki/Database_normalization