I've been looking all over the net and asking people for guidance but nobody seems to know the right (relatively fast) solution to the problem:
I have three tables, classic many-to-many solution:
entries: id (int), title (varchar[255]), content (text)
tags: id (int), name (varchar[255]), slug (varchar[255])
entries_tags: id (int), entry_id (int), tag_id (int)
Nothing out of ordinary so far. Now let's say I have test data in tags (I'm keeping out slugs as they are not important):
ID | name
1. | one
2. | two
3. | three
4. | four
5. | five
I also have three entries:
ID | title
1. | Something
2. | Blah blah blah
3. | Yay!
And relations:
ID | entry_id | tag_id
1. | 1 | 1
2. | 1 | 2
3. | 2 | 1
4. | 2 | 3
5. | 3 | 1
6. | 3 | 2
7. | 3 | 3
8. | 4 | 1
9. | 4 | 4
OK, we have our test data. I want to know how to get all entries that have tag One but don't have tag Three (that'd be entries 1 and 4).
I know how to do it with a subquery; the problem is that it takes a lot of time (with 100k entries it took about 10-15 seconds). Is there any way to do it with JOINs? Or am I missing something?
edit I guess I should've mentioned I need a solution that works with sets of tags rather than single tags, so replace 'One' in my question with 'One', 'Two' and 'Three' with 'Three', 'Four'
edit2 The answer provided is right, but it's too slow to be used practically. I guess the only way of making it work is using a 3rd-party search engine like Lucene or ElasticSearch.
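The following script selects entries that have at least one of the tags One, Two and none of the tags Three, Four. The unwanted tags have to be checked against the entry's other entries_tags rows (et2), not against the row that already matched One or Two:
SELECT DISTINCT
et.entry_id
FROM entries_tags et
INNER JOIN tags t1 ON et.tag_id = t1.id AND t1.name IN ('One', 'Two')
LEFT JOIN (
entries_tags et2
INNER JOIN tags t2 ON et2.tag_id = t2.id AND t2.name IN ('Three', 'Four')
) ON et2.entry_id = et.entry_id
WHERE et2.entry_id IS NULL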
Alternative solution: the joins are replaced with EXISTS and NOT EXISTS against entries, which allows us to get rid of the (rather expensive) DISTINCT, because each entry is tested only once:
SELECT
e.id
FROM entries e
WHERE EXISTS (
SELECT *
FROM entries_tags et
INNER JOIN tags t1 ON t1.id = et.tag_id
WHERE et.entry_id = e.id
AND t1.name IN ('One', 'Two')
)
AND NOT EXISTS (
SELECT *
FROM entries_tags et
INNER JOIN tags t2 ON t2.id = et.tag_id
WHERE et.entry_id = e.id
AND t2.name IN ('Three', 'Four')
)
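For anyone who wants to check a query like this against the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption; only portable SQL is used), and the helper name entries_with_tags is made up for the example. "Wanted" means at least one of the listed tags; "unwanted" means none of them.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; only portable SQL is used.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entries_tags (id INTEGER PRIMARY KEY, entry_id INTEGER, tag_id INTEGER);
INSERT INTO tags VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four'),(5,'five');
INSERT INTO entries_tags VALUES
  (1,1,1),(2,1,2),(3,2,1),(4,2,3),(5,3,1),(6,3,2),(7,3,3),(8,4,1),(9,4,4);
""")


def entries_with_tags(conn, wanted, unwanted):
    """Entry ids with at least one `wanted` tag and none of the `unwanted` tags.

    Hypothetical helper for illustration; both lists must be non-empty.
    """
    w = ",".join("?" * len(wanted))      # one placeholder per wanted tag
    u = ",".join("?" * len(unwanted))    # one placeholder per unwanted tag
    sql = f"""
        SELECT DISTINCT et.entry_id
        FROM entries_tags et
        JOIN tags t1 ON t1.id = et.tag_id AND t1.name IN ({w})
        WHERE NOT EXISTS (
            SELECT 1
            FROM entries_tags et2
            JOIN tags t2 ON t2.id = et2.tag_id
            WHERE et2.entry_id = et.entry_id
              AND t2.name IN ({u})
        )
        ORDER BY et.entry_id
    """
    return [row[0] for row in conn.execute(sql, [*wanted, *unwanted])]


single = entries_with_tags(conn, ["one"], ["three"])
sets = entries_with_tags(conn, ["one", "two"], ["three", "four"])
print(single, sets)
```

With the question's data, the single-tag case gives entries 1 and 4, as the question expects. The set case leaves only entry 1, because entry 4 carries tag four.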
This should do what you want.
(It may or may not be faster than the subquery solution; I suggest you compare the query plans.)
SELECT DISTINCT e.*
FROM tags t1
INNER JOIN entries_tags et1 ON t1.id=et1.tag_id
INNER JOIN entries e ON e.id=et1.entry_id
INNER JOIN tags t2 on t2.name='three'
INNER JOIN tags t3 on t3.name='four'
LEFT JOIN entries_tags et2 ON (et1.entry_id=et2.entry_id AND t2.id = et2.tag_id )
OR (et1.entry_id=et2.entry_id AND t3.id = et2.tag_id )
WHERE t1.name IN ('one','two') AND et2.id is NULL
By LEFT JOINing the entries_tags table a second time as et2 (the data you do not want), you can then select only the records where et2.id IS NULL (where no et2 record exists).
You mentioned trying a subquery. Is this what you tried?
SELECT entries.id, entries.content
FROM entries
LEFT JOIN entries_tags ON entries.id=entries_tags.entry_id
LEFT JOIN tags ON entries_tags.tag_id=tags.id
WHERE tags.id=XX
and entries.id NOT IN (
SELECT entries.id
FROM entries
LEFT JOIN entries_tags ON entries.id=entries_tags.entry_id
LEFT JOIN tags ON entries_tags.tag_id=tags.id
WHERE tags.id=YY
)
(Where XX is the tag you do want and YY is the tag you do not want)
With indices on the ID fields, that shouldn't be as slow as you say it is. It will depend on the data set, but it should be fine with indices (and with string comparisons omitted).
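To make the indexing suggestion concrete, here is a rough sketch that creates composite indexes on entries_tags and runs the NOT IN variant by tag ID. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), the index names are made up, and the plan text it prints is SQLite's and will differ from MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries_tags (id INTEGER PRIMARY KEY, entry_id INTEGER, tag_id INTEGER);
-- Composite indexes covering both lookup directions (index names are made up).
CREATE INDEX idx_et_tag_entry ON entries_tags (tag_id, entry_id);
CREATE INDEX idx_et_entry_tag ON entries_tags (entry_id, tag_id);
INSERT INTO entries_tags VALUES
  (1,1,1),(2,1,2),(3,2,1),(4,2,3),(5,3,1),(6,3,2),(7,3,3),(8,4,1),(9,4,4);
""")

# Filter on integer tag ids only, so no string comparison is involved.
QUERY = """
SELECT DISTINCT et.entry_id
FROM entries_tags et
WHERE et.tag_id = :want
  AND et.entry_id NOT IN (
      SELECT entry_id FROM entries_tags WHERE tag_id = :avoid
  )
ORDER BY et.entry_id
"""

params = {"want": 1, "avoid": 3}
ids = [row[0] for row in conn.execute(QUERY, params)]

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]

print(ids)
print("\n".join(plan))
```

On MySQL the equivalent check would be to run EXPLAIN on the real query and confirm that the entries_tags lookups use an index rather than a full table scan.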
Related
Given these tables:
articles:
id_article | title
1 | super article
2 | another article
tags:
id_tag | title
1 | great
2 | awesome
relations:
id_relation | id_article | id_tag
1 | 1 | 1
2 | 1 | 2
3 | 2 | 1
I'd like to be able to select all articles that are "great" AND "awesome" (eventually, I'll probably have to implement OR too)
Basically, if I select from articles and join the relation table on id_article, I of course can't match two different values of id_tag on the same row. The only lead I had was concatenating the IDs and testing them as a string, but that seems so lame; there has to be a prettier solution.
Oh and if it matters, I use a MySQL server.
EDIT: for ByWaleed, the typical sql select that would surely fail that I cited in my original question:
SELECT
a.id_article,
a.title
FROM articles a, relations r
WHERE
r.id_article = a.id_article and r.id_tag = 1 and r.id_tag = 2
wouldn't work, because r.id_tag obviously can't be both 1 and 2 on the same row. I doubt w3schools has an article on that. My search on Google didn't yield any results, probably because I searched with the wrong keywords.
If you do all the joins as normal, then aggregate the rows to one group by article, then you can assert that they must have at least two different tags.
(Having already filtered to great and/or awesome, that means they have both.)
SELECT
a.id_article,
a.title
FROM
articles a
INNER JOIN
relations r
ON r.id_article = a.id_article
INNER JOIN
tags t
ON t.id_tag = r.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
COUNT(DISTINCT t.id_tag) = 2
(The DISTINCT is to avoid the possibility of one article having 'great' twice, for example.)
To do OR, you just remove the HAVING clause.
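Here is a small sketch of the AND and OR variants side by side on the question's data. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), and the helper name articles_tagged is made up. Instead of dropping the HAVING clause for OR, it lowers the required count to 1, which has the same effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id_article INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE relations (id_relation INTEGER PRIMARY KEY,
                        id_article INTEGER, id_tag INTEGER);
INSERT INTO articles VALUES (1,'super article'),(2,'another article');
INSERT INTO tags VALUES (1,'great'),(2,'awesome');
INSERT INTO relations VALUES (1,1,1),(2,1,2),(3,2,1);
""")


def articles_tagged(conn, tag_titles, require_all=True):
    """Article ids having all (AND) or any (OR) of the given tag titles.

    Hypothetical helper for illustration; tag_titles must be non-empty.
    """
    marks = ",".join("?" * len(tag_titles))
    sql = f"""
        SELECT a.id_article
        FROM articles a
        JOIN relations r ON r.id_article = a.id_article
        JOIN tags t ON t.id_tag = r.id_tag
        WHERE t.title IN ({marks})
        GROUP BY a.id_article
        HAVING COUNT(DISTINCT t.id_tag) >= ?
        ORDER BY a.id_article
    """
    # AND: every distinct requested tag must be present; OR: one is enough.
    needed = len(set(tag_titles)) if require_all else 1
    return [row[0] for row in conn.execute(sql, [*tag_titles, needed])]


both = articles_tagged(conn, ["great", "awesome"], require_all=True)
either = articles_tagged(conn, ["great", "awesome"], require_all=False)
print(both, either)
```

Because the count is compared with the number of requested tags, the same query works unchanged for three or more tags.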
One approach is to aggregate by article, and then assert that the article has both the "great" and "awesome" tags:
SELECT
a.id_article,
a.title
FROM articles a
INNER JOIN relations r
ON a.id_article = r.id_article
INNER JOIN tags t
ON r.id_tag = t.id_tag
WHERE
t.title IN ('great', 'awesome')
GROUP BY
a.id_article,
a.title
HAVING
MIN(t.title) <> MAX(t.title);
The logic here is that we first limit the records, for each article, to only those of the two target tags. Then we assert, in the HAVING clause, that both tags appear. I use a MIN/MAX trick here: if the min and max differ, there must be two distinct tags. Note that this only works for exactly two target tags.
Step 1: Use a temp table to collect every article together with the titles of its tags (table1, table2 and table3 are the three tables in the order given in the question).
Step 2: If an article occurs multiple times in your temp table, that means it has both great and awesome as tags.
Try:
CREATE TEMPORARY TABLE MyTempTable AS
select t1.id_article, t2.title
from table1 t1
inner join table3 t3 on t3.id_article = t1.id_article
inner join table2 t2 on t2.id_tag = t3.id_tag;
select m.id_article
from MyTempTable m
group by m.id_article
having count(*)>1;
Edit: This solution assumes there are only two possible tags, great and awesome. If there are more, add a "where" clause to the select query that creates the temp table, such as where t2.title in ('great','awesome')
I have stored data in several MySQL 5.x tables in order to normalize it. Now I am struggling with how to retrieve this data in one line per dataset.
E.g.
Table 1: articles, holding also 2 values in this example per article
article_id | make | model
1 | Audi | A3
Table 2: article_attributes, where one article can have several attributes
article_id | attr_id
1 | 1
1 | 2
2 | 1
Table 3: article_attribute_names
attr_id | name
1 | Turbo
2 | Airbag
Now I want to retrieve it, with one line per dataset
e.g.
SELECT a.*, attr_n.name AS function
FROM `articles` a
LEFT JOIN article_attributes AS attr ON a.article_id = attr.article_id
LEFT JOIN article_attribute_names AS attr_n ON attr_n.attr_id = attr.attr_id
-- group by attr.article_id
This gives me:
article_id | Make | Model | function
1 | Audi | A3 | Turbo
1 | Audi | A3 | Airbag
But I am looking for something like this:
article_id | Make | Model | function1 | function2
1 | Audi | A3 | Turbo | Airbag
Is this even possible, and if yes, how?
The simplest method is to put the values into a delimited field using group_concat():
SELECT a.*, GROUP_CONCAT(aan.name) AS functions
FROM articles a LEFT JOIN
article_attributes aa
ON a.article_id = aa.article_id LEFT JOIN
article_attribute_names aan
ON aan.attr_id = aa.attr_id
GROUP BY a.article_id;
Aggregating by article_id is okay, assuming that the id is unique (or equivalently declared as a primary key).
If you actually want the results in separate columns, that is more challenging. If you know there are at most two (as in your example), just use aggregation:
SELECT a.*, MIN(aan.name) AS function1,
(CASE WHEN MIN(aan.name) <> MAX(aan.name)
THEN MAX(aan.name)
END) as function2
FROM articles a LEFT JOIN
article_attributes aa
ON a.article_id = aa.article_id LEFT JOIN
article_attribute_names aan
ON aan.attr_id = aa.attr_id
GROUP BY a.article_id;
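Both variants can be tried on the question's sample rows with a short sketch like the one below. It uses Python's sqlite3 as a stand-in for MySQL (an assumption; SQLite also provides GROUP_CONCAT with a comma as the default separator). The order of the names inside the concatenated string is not guaranteed here, so the sketch sorts after splitting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, make TEXT, model TEXT);
CREATE TABLE article_attributes (article_id INTEGER, attr_id INTEGER);
CREATE TABLE article_attribute_names (attr_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO articles VALUES (1, 'Audi', 'A3');
INSERT INTO article_attributes VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO article_attribute_names VALUES (1, 'Turbo'), (2, 'Airbag');
""")

row = conn.execute("""
    SELECT a.article_id, a.make, a.model,
           GROUP_CONCAT(aan.name) AS functions,
           MIN(aan.name) AS function1,
           CASE WHEN MIN(aan.name) <> MAX(aan.name)
                THEN MAX(aan.name)
           END AS function2
    FROM articles a
    LEFT JOIN article_attributes aa ON a.article_id = aa.article_id
    LEFT JOIN article_attribute_names aan ON aan.attr_id = aa.attr_id
    GROUP BY a.article_id, a.make, a.model
""").fetchone()

# Split the delimited field back into a list; sort because order is unspecified.
functions = sorted(row[3].split(","))
print(row[:3], functions, row[4], row[5])
```

Note that MIN/MAX orders the attributes alphabetically, so function1 is Airbag and function2 is Turbo rather than the order shown in the question.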
For transaction listing I need to provide the following columns:
log_out.timestamp
items.description
log_out.qty
category.name
storage.name
log_out.dnr ( Representing the users id )
Table structure from log_out looks like this:
| id | timestamp | storageid | itemid | qty | categoryid | dnr |
| | | | | | | |
| 1 | ........ | 2 | 23 | 3 | 999 | 123 |
As one could guess, I only store the corresponding IDs from other tables in this table. Note: log_out.id is the primary key in this table.
To get the corresponding strings, ints or whatever back, I tried two queries.
Approach 1
SELECT i.description, c.name, s.name as sname, l.*
FROM items i, categories c, storages s, log_out l
WHERE l.itemid = i.id AND l.storageid = s.id AND l.categoryid = c.id
ORDER BY l.id DESC
Approach 2
SELECT log_out.id, items.description, storages.name, categories.name AS cat, timestamp, dnr, qty
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id
ORDER BY log_out.id DESC
They both work fine on my development machine, which has approx. 99 dummy transactions stored in log_out. The DB on the main server has something like 1100+ transactions stored in the table. And that's where the trouble begins. No matter which of these two approaches I run on the main machine, it always returns 0 rows w/o any error *sigh*.
First I thought, it's because the main machine uses MariaDB instead of MySQL. But after I imported the remote's log_out table to my dev-machine, it does the same as the main machine -> return 0 rows w/o error.
You guys got any idea what's going on ?
If the table has the data then it probably has something to do with JOIN and related records in corresponding tables. I would start with log_out table and incrementally add the other tables in the JOIN, e.g.:
SELECT *
FROM log_out;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id;
SELECT *
FROM log_out
INNER JOIN items ON log_out.itemid = items.id
INNER JOIN storages ON log_out.storageid = storages.id
INNER JOIN categories ON log_out.categoryid = categories.id;
I would execute the queries one by one and see which one is the first to return 0 records. The join added in that query is the one with the data discrepancy.
Your queries look fine to me, which makes me think that it is probably something unexpected in the data. Most likely the IDs in your joins are not maintained correctly (do all of them have a foreign key constraint?). I would dig around in the data, with queries like SELECT COUNT(*) FROM items WHERE id IN (SELECT itemid FROM log_out), and see if the results make sense. Sorry I can't offer more advice, but I would be interested in hearing if the problem is in the data itself.
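One way to dig around is to count, per lookup column, the log_out rows that have no partner in the referenced table. Below is a rough sketch using Python's sqlite3 as a stand-in for MySQL/MariaDB (an assumption). The rows are made up, with category 999 deliberately missing, and the timestamp column is left out for brevity. The helper name orphan_counts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE storages (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE log_out (id INTEGER PRIMARY KEY, storageid INTEGER, itemid INTEGER,
                      qty INTEGER, categoryid INTEGER, dnr INTEGER);
-- Made-up rows: category 999 is deliberately missing from categories.
INSERT INTO items VALUES (23, 'widget');
INSERT INTO storages VALUES (2, 'main');
INSERT INTO categories VALUES (1, 'tools');
INSERT INTO log_out VALUES (1, 2, 23, 3, 999, 123);
""")

# log_out column -> table it is supposed to reference
LOOKUPS = {"itemid": "items", "storageid": "storages", "categoryid": "categories"}


def orphan_counts(conn):
    """Count log_out rows whose id has no match in each lookup table."""
    counts = {}
    for column, table in LOOKUPS.items():
        sql = f"""
            SELECT COUNT(*)
            FROM log_out l
            LEFT JOIN {table} x ON x.id = l.{column}
            WHERE x.id IS NULL
        """
        counts[column] = conn.execute(sql).fetchone()[0]
    return counts


orphans = orphan_counts(conn)

# The full INNER JOIN drops every row that has at least one orphaned id.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM log_out
    INNER JOIN items ON log_out.itemid = items.id
    INNER JOIN storages ON log_out.storageid = storages.id
    INNER JOIN categories ON log_out.categoryid = categories.id
""").fetchone()[0]

print(orphans, joined_rows)
```

A non-zero count for a column points at the join that empties the result. If every row in log_out has such an orphan, the INNER JOIN version returns 0 rows without any error, which matches the symptom described.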
I am sorry this is not a high quality question and I know I am risking downvotes, but I am trying to learn as I go. I am currently working on a side project and stumbled into a situation I am not sure of.
I have two tables and need to call the data from both sharing the same id number (different names)
I will now attempt to give an example
Table 1
| psid | idd |
| 1 | 999 |
| 2 | 42 |
Table 2
| aid | other |
| 999 | hello world |
| 42 | welcome |
I am trying to link idd and aid whilst displaying all rows from table one
Example
id = 1 / Title : hello world
id = 2 / Title : welcome
I am not sure if this can be achieved with a single query to the database. I have tried adding a second query, but it goes into a nonstop loop.
I have not done much searching as not sure what to search for.
Thanks and sorry
Implicit join (comma-separated tables with the join condition in WHERE)
select
table1.*,
table2.*
from
table1,
table2
where
table1.idd = table2.aid and
table1.idd = :id
Or Left Join
select
t1.*,
t2.*
from
table1 t1
left join
table2 t2
on
t1.idd = t2.aid
where
t1.idd = :id
SELECT table1.psid, table2.other FROM table1
JOIN table2 ON table1.idd = table2.aid
WHERE table1.idd= 'X' AND table2.aid = 'X'
This should JOIN the two tables together; specifying the matching IDs for each table in the WHERE clause should then return the relevant information.
EDIT fixed SQL
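To display all rows from table one in a single query, as the question asks, the id filter can simply be left out. Here is a minimal sketch using Python's sqlite3 as a stand-in for the real database (an assumption); the output format copies the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (psid INTEGER PRIMARY KEY, idd INTEGER);
CREATE TABLE table2 (aid INTEGER PRIMARY KEY, other TEXT);
INSERT INTO table1 VALUES (1, 999), (2, 42);
INSERT INTO table2 VALUES (999, 'hello world'), (42, 'welcome');
""")

# LEFT JOIN keeps every table1 row even if table2 has no matching aid.
rows = conn.execute("""
    SELECT t1.psid, t2.other
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.aid = t1.idd
    ORDER BY t1.psid
""").fetchall()

lines = [f"id = {psid} / Title : {other}" for psid, other in rows]
print("\n".join(lines))
```

One query and one loop over the result is enough, so there is no need to run a second query per row.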
I have some minor problem with an SQL query. I have to select data from multiple tables like:
offers:
| id | offer | info
| 1 | City break | information
pictures:
| id | id_offer | picture_name | title
| 1 | 1 | bucharest.jpg | Bucharest
| 2 | 1 | london.jpg | London
sql query:
SELECT t1.*, t2.*
FROM offers AS t1
JOIN pictures AS t2 ON t1.id=t2.id_offer
WHERE t1.id = '1'
The code is much larger, but I don't understand how to wrap the results from t2 into an array. The length of the returned array is determined by t2, which is the pictures table, so this returns an array with 2 objects.
Is it possible to return one object in the array with both pictures in it?
MySQL does not support array datatypes.
You can return a comma separated list of values instead:
SELECT o.*, GROUP_CONCAT(picture_name ORDER BY p.id)
FROM offers o
JOIN pictures p
ON p.id_offer = o.id
GROUP BY
o.id
Arrays don't exist in MySQL.
But you can use GROUP_CONCAT to return all images in a comma-separated list:
SELECT t1.*, GROUP_CONCAT(t2.picture_name) AS pictures
FROM offers AS t1
JOIN pictures AS t2 ON t1.id=t2.id_offer
WHERE t1.id = '1'
GROUP BY t1.id
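On the application side, the comma-separated field can then be split back into an array, so each offer becomes one object holding all of its pictures. Below is a rough sketch in Python with sqlite3 standing in for MySQL (an assumption). The helper name load_offer is made up, and it assumes picture names never contain a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.executescript("""
CREATE TABLE offers (id INTEGER PRIMARY KEY, offer TEXT, info TEXT);
CREATE TABLE pictures (id INTEGER PRIMARY KEY, id_offer INTEGER,
                       picture_name TEXT, title TEXT);
INSERT INTO offers VALUES (1, 'City break', 'information');
INSERT INTO pictures VALUES (1, 1, 'bucharest.jpg', 'Bucharest'),
                            (2, 1, 'london.jpg', 'London');
""")


def load_offer(conn, offer_id):
    """Return one offer as a dict with a 'pictures' list, or None if absent.

    Hypothetical helper; assumes picture names contain no commas.
    """
    row = conn.execute("""
        SELECT o.id, o.offer, o.info,
               GROUP_CONCAT(p.picture_name) AS pictures
        FROM offers o
        LEFT JOIN pictures p ON p.id_offer = o.id
        WHERE o.id = ?
        GROUP BY o.id, o.offer, o.info
    """, (offer_id,)).fetchone()
    if row is None:
        return None
    result = dict(row)
    # GROUP_CONCAT yields NULL when the offer has no pictures at all.
    names = result["pictures"]
    result["pictures"] = sorted(names.split(",")) if names else []
    return result


offer = load_offer(conn, 1)
print(offer)
```

In MySQL the concatenated string is truncated at group_concat_max_len (1024 bytes by default), so for offers with many pictures it may be safer to raise that setting or to fetch the pictures in a second query.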