MySQL - Merging two queries that have different conditions and limits - mysql

I'm implementing a Tag System for my website, using PHP + MySQL.
In my database, I have three tables:
Posts
Id
Title
DateTime
Primary Key: Id
Tags
Id
Tag
Slug
1
First Tag
first-tag
Primary Key: Id | Key: Slug
TagsMap
Id
Tag
Primary Key: both
(Id = post's Id in Posts; Tag = Tag's Id in Tags)
Given, for instance, the url www. ... .net/tag/first-tag, I need to show:
the tag's name (in this case: "First Tag");
the last 30 published posts having that tag.
In order to achieve this, I'm using two different queries:
firstly
SELECT Tag FROM Tags WHERE Slug = ? LIMIT 1
then
SELECT p.Title FROM Posts p, Tags t, TagsMap tm
WHERE p.Id = tm.Id
AND p.DateTime <= NOW()
AND t.Id = tm.Tag
AND t.Slug = ?
ORDER BY p.Id DESC
LIMIT 30
But I don't think it's a good solution in terms of performance (please, correct me if I'm wrong).
So, my question is: how (if possible) to merge those two queries into just one?
Thanks in advance for Your suggestions.

The query that you have shown above is not a optimal solution as first it creates a cartesian product of all the tables and then filters out the data based on the conditions. If these tables become heavier in future, then your query will start slowing down (SLOW QUERIES).
Please use joins over this approach. ex. INNER JOIN, LEFT JOIN, RIGHT JOIN etc.
Try this SQL:
SELECT t.*, p.* FROM Tags t
INNER JOIN TagsMap tm ON (tm.Tag = t.Id )
INNER JOIN Posts p ON (p.Id = tm.Id AND p.DateTime <= NOW())
WHERE t.slug LIKE 'First Tag'
ORDER BY p.Id DESC
LIMIT 30

Given that you have structured your tables in a manner where you can utilize foreign keys and match them with their counterparts, then you can make use of JOIN's in your query.
SELECT
Tags.Tag,
Posts.title
FROM
Tags
LEFT JOIN
TagsMap ON Tags.id = TagsMap.tag
LEFT JOIN
Posts ON TagsMap.id = Posts.id AND
Posts.DateTime <= NOW()
WHERE
Posts.id = TagsMap.id AND
Tags.Slug = ?
ORDER BY
Posts.id DESC
LIMIT 30
The idea is that the query is optimized, but you will need to filter your result set programmatically in the view, in order to display the Tag only once.

If there is at most one "slug" per "post", include slug as a column in Posts.
If there can be any number of "tags" per "post", then have a table
CREATE Tags (
post_id ... NOT NULL,
tag VARCHAR(..)... NOT NULL,
post_dt DATETIME NOT NULL,
PRIMARY KEY(post_id),
INDEX(tag, dt)
) ENGINE=InnoDB
And you may want to use LEFT JOIN Tags and GROUP_CONCAT(tag).
I don't know what you mean by "first" in "first_tag". Maybe you should get rid of "first"?
The last 30 posts for a given tag:
SELECT p.*,
( SELECT GROUP_CONCAT(tag) FROM Tags ) AS tags
FROM ( SELECT post_id FROM tags WHERE tag = ?
ORDER BY post_dt DESC LIMIT 30 ) AS x
JOIN posts AS p ON p.id = x.post_id

Related

MySQL Tag System (3 tables) - How to select the last 10 published posts and all the tags associated to each post?

I'm implementing a Tag System for my website, using PHP + MySQL.
In my database, I have three tables:
posts
id
title
datetime
Primary Key: id
tags
id
tag
slug
Primary Key: id | Key: slug
tagsmap
id
tag
Primary Key: both
(id = post's id in posts; tag = tag's id in tags)
Every post may have one or more tags, or no tags, associated to it.
Now I need to select - and show on the same page - the last 10 (no less, no more) published posts and all the tags associated to those posts.
This is what I've tried:
SELECT
p.title,
t.slug,
t.tag
FROM
posts p
LEFT JOIN tagsmap tm ON p.id = tm.id
LEFT JOIN tags t ON tm.tag = t.id
WHERE
p.datetime <= NOW()
ORDER BY
p.id DESC
LIMIT
10
It works, but, doing so, posts having two tags are showed twice, while I need to show each post just once.
Then I've added
GROUP BY p.id
But, this way, I get only one tag for each post.
I don't know how to solve this problem (I'm not very experienced with MySQL).
Would you give me any suggestions?
SQL Fiddle
P.S.: What I need is a SQL-side (not PHP-side) solution (the best in terms of performance). Nevertheless, any further suggestions would be much appreciated.
First select the latest 10 posts and then join to the other tables.
Then group by post and use GROUP_CONCAT() to combine all slugs and tags in a comma separated string for each post:
SELECT p.title,
GROUP_CONCAT(t.slug ORDER BY t.id) slugs,
GROUP_CONCAT(t.tag ORDER BY t.id) tags
FROM (SELECT * FROM posts WHERE datetime <= NOW() ORDER BY datetime DESC LIMIT 10) p
LEFT JOIN tagsmap tm ON p.id = tm.id
LEFT JOIN tags t ON tm.tag = t.id
GROUP BY p.id
ORDER BY p.datetime DESC;
See the demo.
As others have said, GROUP_CONCAT is the 'answer'. But to make the schema better and faster, I suggest:
Get rid of slug (if possible). If "posts" need slugs, then put them into Posts. If a "slug" is a type of "tag", then treat it as such. For that matter, consider throwing category, date, author, etc, into Tags.
Combine tagmap and tag. The loss in normalization is more than made up for by avoiding the double-JOIN.
(for clarity) id should only refer to the current table; post_id should be used to refer to the posts table.
CREATE TABLE Tags (
post_id INT UNSIGNED NOT NULL,
tag VARCHAR(191) CHARACTER SET utf8mb4 NOT NULL,
PRIMARY KEY(post_id, tag),
INDEX(tag)
) ENGINE=InnoDB;
SELECT p.title,
( SELECT GROUP_CONCAT(tag SEPARATOR ', ') tags WHERE post_id = p.id ) AS tags
FROM posts AS p
ORDER BY p.datetime DESC
LIMIT 10;
When the tables get big, this formulation has the advantage of avoiding the "explode-implode" that is caused by JOIN + GROUP BY.
That subquery will return NULL when there are no tags; use COALESCE() if you want to turn NULL int '' or (no tags) or whatever.
When you get into "pagination", don't use OFFSET. Instead, "remember where you left off", See http://mysql.rjweb.org/doc.php/pagination
If you need the latest 10 posts with a certain tag, then see http://mysql.rjweb.org/doc.php/lists

Order by occurrences of tags in mysql

I've been stuck the last couple of days with this problem. I have the users' tags (user_tags) in one table and the articles in another table (crawler_results), and I'm trying to sort the articles so the ones which contain the most tags are placed at first.
How do you do that?
This is my code:
SELECT content
FROM crawler_results articles
INNER JOIN user_newspapers un
ON un.newspaper_id = articles.newspaper_id
WHERE un.user_id = '$user_id'
I somehow have to join the user_tags table which consists of:
tag and user_id
EDIT:
Lets say the user wants to find articles with these tags: #stack and #overflow. The articles that have the most of these tags are displayed first. So an article that says: "Stack overflow is great" is ranked higher than just: "overflow is great". And tags can be reused, for instance: "overflow, overflow, overflow" counts as 3.
I want to sort the articles so the ones that contain the most tags are first.
How is this achieved? I have four tables: crawler_results (where I store the articles), newspapers (where I store the newspapers), user_newspapers (where I have newspaper_id and user_id) and user_tags (where I have the tag and user_id).
Do I need to make a separate table that stores all the words from the articles or can this be done with a sql query?
I really appreciate your help!
If you just have to sort your query using the number of tags related, try this query:
SELECT title
, c.name AS category
, img_src
, n.name AS newspaper_name
, crawler_results.id AS id
, crawler_results.content AS content
FROM crawler_results
INNER JOIN newspapers n ON n.id = crawler_results.newspaper_id
INNER JOIN categories c ON c.id = crawler_results.category
INNER JOIN user_newspapers un ON un.newspaper_id = n.id
INNER JOIN (SELECT ut.user_id
,COUNT(ut.tag) AS nb_tags
FROM user_tags ut
GROUP BY ut.user_id) t ON t.user_id = un.user_id
INNER JOIN user_categories uc ON uc.category_id = c.id
WHERE img_src != ''
AND char_length(content) > 750
AND active = '1'
AND un.user_id = '$user_id'
GROUP BY title
ORDER BY category ASC, t.nb_tags DESC
If the result is not what you expect, try to clarify your question and eventually provide some example of data you're expecting.
Hope this will help.

MySQL - match all tags rather than any

I have an SQL setup akin to the following:
ARTICLES
id (PK)
name
TAGS
id (PK)
tag
...and a third table logging associations between the two, since there can be multiple tags to each article:
ARTICLE_TAG_ASSOCS
id (PK)
article_id (FK)
tag_id (FK)
Via this question I managed to construct a query that would find articles that were tagged with at least one of a number of tags, e.g.
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata ON articles.id = ata.article_id
JOIN tags ON tags.id = ata.tag_id
WHERE tags.tag = 'budgie' OR tags.tag = 'parrot';
Question: How can I alter the above to find articles that match ALL tags, i.e. both 'budgie' and 'parrot', not just one?
Clearly modifying the logic to
WHERE tags.tag = 'budgie' && tags.tag = 'parrot';
...is logically flawed, since MySQL is considering each tag in isolation, one at a time, but hopefully you get what I mean.
There are several workable approaches.
One approach is to perform separate JOIN operations for each tag. For example:
SELECT articles.*
FROM articles
JOIN article_tag_assocs ata
ON ata.article_id = articles.id
JOIN tags ta
ON ta.id = ata.tag_id
AND ta.tag = 'budgie'
JOIN article_tag_assocs atb
ON atb.article_id = articles.id
JOIN tags tb
ON tb.id = atb.tag_id
AND tb.tag = 'parrot'
Note that this can return "duplicate" rows if a given articles is associated to the same tag value more than once. (Adding the DISTINCT keyword or a GROUP BY clause are ways to eliminate the duplicates.)
Another approach, if we are guaranteed that a given article has no duplicate tag values, is to use an inline view to get the list of article_id that are associated with both tags, and then JOIN that set to the articles table. For example:
SELECT a.*
FROM ( SELECT ata.article_id
FROM article_tag_assocs ata
JOIN tags t
ON t.id = ata.tag_id
WHERE t.tag IN ('budgie','parrot')
GROUP BY ata.article_id
HAVING COUNT(1) = 2
) s
JOIN articles a
ON a.id = s.article_id
Note that the literal "2" in the HAVING clause matches the number of values in the predicate on the tag column. The inline view (aliased as s) returns a distinct list of article_id, and we can join that to the articles table.
This approach is useful if you wanted to match, for example, at least three out of four tags. We could use lines like this in the inline view query.
WHERE t.tag IN ('fee','fi','fo','fum')
HAVING COUNT(1) >= 3
Then, any article that matched at least three of those four tags would be returned.
These aren't the only ways to return the specified result, there are several other approaches.
As Roland's answer pointed out, you can also do something like this:
FROM articles a
WHERE a.id IN ( <select article id values related to tag 'parrot'> )
AND a.id IN ( <select article id values related to tag 'bungie'> )
You could also use an EXISTS clause with a correlated subquery, though this approach doesn't usually perform as well with large sets, due to the number of executions of the subquery
FROM articles a
WHERE EXISTS ( SELECT 1
FROM article_tag_assocs s1
JOIN tags t1 ON t1.tag = 'bungie'
WHERE s1.article_id = a.id
)
AND EXISTS ( SELECT 1
FROM article_tag_assocs s2
JOIN tags t2 ON t2.tag = 'parrot'
WHERE s2.article_id = a.id
)
NOTE: in this case, it is possible to reuse the same table aliases within each subquery, because it doesn't lead to ambiguity, though I still prefer distinct aliases because the table aliases show up in the EXPLAIN output, and the distinct aliases make it easier to match the rows in the EXPLAIN output to the references in the query.)
What about this?
Will this give bad performance like EXISTS for large data sets?
This query is to check which rows of 'a1' table has some specified tags and not has some other specified tags
SELECT * FROM a1 WHERE a1.id IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (1))
AND a1.id NOT IN
(SELECT taggables.taggable_id FROM taggables WHERE taggables.taggable_type = 'a1' AND taggables.tag_id IN (2))
ORDER BY a1.file_count DESC LIMIT 0, 5

MySQL Nested Selected with Multiple Columns

I am currently trying to retrieve the latest posts along with their related posts (x number for each post). I have the following query in hand:
SELECT id, title, content
(SELECT GROUP_CONCAT(title) FROM posts -- Select title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select reletad post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE p_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5
My database structure is simple: A posts table. A tags table. And a tagsmap table where I associate posts with tags.
This query works fine (though I don't know its performance since I don't have many rows in the tables -- Maybe an explain could help me but that's not the case right now).
What I really need is to retrieve the ids of the related posts along with their titles.
So I'd like to do SELECT GROUP_CONCAT(title), GROUP_CONCAT(id), but I know that will result in an error. So what is the best way to retrieve the id along with the title in this case? I do not want to rewrite the whole subquery to just retrieve the id. There should be another way.
EDIT
SELECT p1.id, p1.title, p1.content,
group_concat(DISTINCT p2.id) as 'P IDs',
group_concat(DISTINCT p2.title) as 'P titles'
FROM posts as p1
LEFT JOIN tagsmap as tm1 on tm1.p_id = p1.id
LEFT JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
LEFT JOIN posts as p2 on p2.id = tm2.p_id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
At the end this is the query that I've used. I removed the Where clause because it is unnecessary and used LEFT JOIN rather that JOIN because otherwise it would ignore the posts without tags. And finally added DISTINCT to group_concat because it was concatenating duplicate rows (If for example a post had multiple common tags with a related post it would result in a duplicate concatenation).
The query above works perfectly. Thanks for all.
Okay - this will work, and it has the added advantage of eliminating the sub queries (which can slow you down when you get lots of records):
SELECT p1.id, p1.title, p1.content,
group_concat( p2.id) as 'P IDs',
group_concat( p2.title) as 'P titles'
FROM posts as p1
JOIN tagsmap as tm1 on tm1.p_id = p1.id
JOIN tagsmap as tm2 on tm2.t_id = tm1.t_id and tm1.p_id <> tm2.p_id
JOIN posts as p2 on p2.id = tm2.p_id
WHERE p2.id <> p1.id
GROUP BY p1.id
ORDER BY p1.id desc limit 5;
What we're doing here is selecting what you want from the first version of posts, joining them to the tagsmap by their post.id, doing a self join to tagsmap by tag id to get all the related tags, and then joining back to another posts (p2) to get the posts that are pointed to by those related tags.
Use GROUP BY to discard the dups from all that joining, and you're there.
like this?
SELECT id, title, content
(SELECT GROUP_CONCAT(concat(cast(id as varchar(10)), ':', title)) FROM posts -- Select title of related posts
WHERE id <> p.id AND id IN (
SELECT p_id FROM tagsmap -- Select reletad post ids from tagsmap
WHERE t_id IN (
SELECT t_id FROM tagsmap -- Select the tags of the current post
WHERE post_id = p.id)
) ORDER BY id DESC LIMIT 0, 3) as related
FROM posts as p ORDER BY id DESC LIMIT 5

Join on multiple rows

I'm trying to load rows form a posts table based on whether they have multiple rows in another table. Take the below table structures:
posts
post_id post_title
-------------------
1 My Post
2 Another Post
post_tags
post_tag_id post_tag_name
--------------------------
1 My Tag
2 Another Tag
postTags
postTag_id postTag_tag_id postTag_post_id
------------------------------------------
1 1 1
2 2 1
Unsurprisingly, post and post_tags stores the posts and tags, and postTags joins which posts have which tags.
What I'd normally do to join the tables is this:
SELECT * FROM (`posts`)
JOIN `postTags` ON (`postTag_post_id` = `post_id`)
JOIN `post_tags` ON (`post_tag_id` = `postTag_tag_id`)
Then I'd have information on the tags, and can have additional stuff later in the query to search tag names for search terms etc, and then GROUP once I have posts that match the search terms.
What I'm trying to do is only select from posts where a post has both tag 1 AND tag 2, and I can't work out the SQL for it. I think it needs to be done in the actual JOIN rather than having a WHERE clause for it as when I run the join above I'd obviously get two rows back, so I can't have something like
WHERE post_tag_id = 1 AND post_tag_id = 2
as each row will only have one post_tag_id, and I can't check different values for the same column in one row.
What I've tried to do is something like this:
SELECT * FROM (`posts`)
JOIN `postTags` ON (postTag_tag_id = 1 AND postTag_tag_id = 2)
JOIN `post_tags` ON (`post_tag_id` = `postTag_tag_id`)
but this is returning 0 results when I run it; I've put conditions like this in JOINS before for similar things and I'm sure it's close but can't quite work out what to do if this doesn't work.
Am I at least on the right track? Hopefully I'm not missing something obvious.
Thanks.
You are trying to ask the postTags row to be at the same time one thing and another.
You either need to do two joins to post_tags and postTags so you get both. Or you can say that the post can have whatever tag between those two and the total amount of tags must equal two (assuming a post cannot related to the same tag more than once).
First approach:
SELECT *
FROM `posts` as p
WHERE p.`post_id` IN (SELECT pt.`postTag_post_id`
FROM `postTags` as pt
WHERE pt.`postTag_tag_id` = 1)
AND p.`post_id` IN (SELECT pt.`postTag_post_id`
FROM `postTags` as pt
WHERE pt.`postTag_tag_id` = 2);
Second approach:
SELECT *
FROM posts as p
WHERE p.post_id IN (SELECT pt.postTag_post_id
FROM (SELECT count(0) as c, pt.postTag_post_id
FROM postTags as pt
WHERE pt.postTag_tag_id IN (1, 2)
GROUP BY pt.postTag_post_id
HAVING c = 2) as pt);
I want also to add that if you use IN or EXISTS in the first approach then you won't have multiple lines for the same post row just because you have more than one tag. This way you save one DISTINCT later that would make your query slower.
I've used an IN in the second approach just as a rule of thumb I use: if you don't need to show the data you don't need to do a JOIN in the FROM section.
SELECT p.*, t1.*, t2.* FROM posts p
INNER JOIN postTags pt1 ON pt1.postTag_post_id = p.id AND pt1.postTag_tag_id = 1
INNER JOIN postTags pt2 ON pt2.postTag_post_id = p.id AND pt2.postTag_tag_id = 2
INNER JOIN post_tags t1 ON t1.post_tag_id = pt1.postTag_tag_id
INNER JOIN post_tags t2 ON t2.post_tag_id = pt2.postTag_tag_id
Without actually building a db the same as yours this is hard to verify but it should work.
Let me start by saying that this type of query is much easier and much more performant in a database that supports analytic queries (Oracle, MS SQL Server). So in MySQL you have to do it the old, crappy, aggregate way.
I also want to say that having a table that stores the names of the tags in post_tags and then the mapping of post tags to posts in postTags is confusing. If it were me, I would change the name of the mapping table to post_tags_map or post_tags_to_post_map. So you would have posts with post_id, post_tags with post_tags_id, and post_tags_map with post_tags_map_id. And those id columns would be named the same in every table. Having the same column that is named differently in other tables is also confusing.
Anyways, let's solve your problem.
First you want a result set that is 1 post id per row, and only the posts that have tags 1 & 2.
select postTag_post_id, count(1) cnt from (
select postTag_post_id from postTags where postTag_tag_id in (1, 2)
) group by postTag_post_id;`
That should give you back data like this:
postTag_post_id | cnt
1 | 2
Then you can join that result set back to your posts table.
select * from posts p,
(
select postTag_post_id, count(1) cnt from (
select postTag_post_id from postTags where postTag_tag_id in (1, 2)
) group by postTag_post_id;
) t
where p.post_id = t.postTag_post_id
and t.cnt >= 2;
If you need to do another join to the post_tags table in order to get the postTag_tag_id from the post_tag_name, your inner most query would change like so:
select postTag_post_id
from postTags a,
post_tags b
where a.postTag_tag_id = b.post_tag_id
and b.post_tag_name in ('tag 1', 'tag 2');
That should do the trick.
Assuming you already know tag IDs (1 and 2), you could do something like this:
SELECT post_id, post_title
FROM posts JOIN postTags ON (postTag_post_id = post_id)
WHERE postTag_tag_id IN (1, 2)
GROUP BY post_id, post_title
HAVING COUNT(DISTINCT postTag_tag_id) = 2
NOTE: DISTINCT is not necessary if there is an alternate key on postTags {postTag_tag_id, postTag_post_id}, as it should be.
NOTE: If you don't have tag IDs (and just have tag names), you'll need another JOIN (towards the post_tags table).
BTW, you should seriously consider ditching the surrogate PK in the junction table (postTags.postTag_id) and just having the natural PK {postTag_tag_id, postTag_post_id}. InnoDB tables are clustered, and secondary indexes in clustered tables are fatter and slower than in heap-based tables. Also, some queries can benefit from storing posts tagged by the same tag physically close together (or storing tags of the same post close together, if you reverse the PK).