Improving MySQL related articles query

For a related topic list I use a query using tags. It displays a list of 5 articles that have 1 or more tags in common and that are older than the viewed one.
Is it possible to write a query that produces more relevant results by giving more weight to articles that have 2, 3, 4... tags in common?
I saw this topic on more or less the same subject:
MySQL Find Related Articles
but it produces 0 results when there are fewer than 3 tags in common.
The query I use now:
SELECT DISTINCT
AAmessage.message_id, AAmessage.title, AAmessage.date
FROM
AAmessage
LEFT JOIN
AAmessagetagtable
AS child ON child.message_id = AAmessage.message_id
JOIN AAmessagetagtable
AS parent ON parent.tag_id = child.tag_id
AND
parent.message_id = '$message_id'
AND AAmessage.date < '$row[date]'
ORDER BY
AAmessage.date DESC LIMIT 0,5
using tables:
AAmessage (message_id, title, date...)
AAmessagetagtable (key, message_id, tag_id)
AAtag (tag_id, tag.... not used in this query but needed to store names of tags)

First of all, please excuse that I shortened the table names to message and message_tag for readability.
Second, I didn't test this. Treat it as a pointer rather than a definitive answer.
The query uses two subqueries, which might not be very efficient, so there is probably room for improvement. First, the innermost query looks up the tags of the current message. Then, the middle query looks for messages that are marked with at least one of those tags. The grouping yields one row per message_id together with its number of common tags, which is what the results are ranked by. Last, the JOIN loads the additional details and keeps only the messages older than the viewed one.
You may notice I used question marks instead of '$xyz'. These are prepared-statement placeholders, so you don't have to take care of escaping the variable contents yourself.
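SELECT message.message_id, message.title, message.date
FROM message
JOIN (SELECT message_id, COUNT(*) AS common_tags
FROM message_tag
WHERE tag_id IN
(SELECT MT.tag_id FROM message_tag MT WHERE MT.message_id = ?)
GROUP BY message_id) RELATED_MESSAGES
ON message.message_id = RELATED_MESSAGES.message_id
WHERE message.date < ?
-- rank in the outer query: an ORDER BY inside the subquery is not guaranteed to survive the join
ORDER BY RELATED_MESSAGES.common_tags DESC, message.date DESC
LIMIT 5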
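The ranking idea above can be tried out on a few rows. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows and the helper name related_messages are made up for illustration, and it uses a self-join instead of the nested subqueries.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (message_id INTEGER PRIMARY KEY, title TEXT, date TEXT);
CREATE TABLE message_tag (message_id INTEGER, tag_id INTEGER);
INSERT INTO message VALUES
  (1, 'viewed', '2020-05-01'),
  (2, 'one tag in common', '2020-04-01'),
  (3, 'two tags in common', '2020-03-01'),
  (4, 'no tags in common', '2020-02-01'),
  (5, 'newer than viewed', '2020-06-01');
INSERT INTO message_tag VALUES
  (1, 10), (1, 20), (1, 30),
  (2, 10),
  (3, 10), (3, 20),
  (4, 99),
  (5, 10), (5, 20), (5, 30);
""")

def related_messages(conn, message_id, limit=5):
    """Hypothetical helper: older messages ranked by number of shared tags."""
    sql = """
    SELECT m.message_id, m.title, COUNT(*) AS common_tags
    FROM message_tag AS cur
    JOIN message_tag AS other ON other.tag_id = cur.tag_id
                             AND other.message_id <> cur.message_id
    JOIN message AS m ON m.message_id = other.message_id
    WHERE cur.message_id = ?
      AND m.date < (SELECT date FROM message WHERE message_id = ?)
    GROUP BY m.message_id, m.title, m.date
    ORDER BY common_tags DESC, m.date DESC
    LIMIT ?
    """
    return conn.execute(sql, (message_id, message_id, limit)).fetchall()

rows = related_messages(conn, 1)
for row in rows:
    print(row)
```

With this data the message sharing two tags comes before the one sharing a single tag, and the newer message is left out even though it shares all three tags.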

I use HAVING for these situations.
SELECT m.message_id, m.title, m.date
FROM AAmessage AS `m`
JOIN AAmessagetagtable AS `mt` ON mt.message_id = m.message_id
JOIN AAmessagetagtable AS `cur` ON cur.tag_id = mt.tag_id AND cur.message_id = '$message_id'
WHERE m.date < '$row[date]'
GROUP BY m.message_id, m.title, m.date
HAVING COUNT(mt.key) >= 1
ORDER BY COUNT(mt.key) DESC, m.date DESC
LIMIT 5

Related

SQL: How to merge two complex queries into one, where the second one needs data from the first one

The goal is to load a list of the chats that the requesting user is a member of. Some of the chats are group chats (more than two members), and for those I want to show the profile pictures of the users who wrote the last three messages.
The first query to load meta data like the title and the timestamp of the chat is:
SELECT Chat_Users.ID_Chat, Chats.title, Chats.lastMessageAt
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) = 2
AND MAX(Chat_Users.ID_User = $userID) > 0
ORDER BY Chats.lastMessageAt DESC
LIMIT 20
The query to load the last three profile pictures from one of the chats loaded with the query above is:
SELECT GROUP_CONCAT(innerTable.profilePictures SEPARATOR ', ') AS 'ppUrls',
innerTable.ID_Chat
FROM
(
SELECT Chat_Users.ID_Chat, Users.profilePictureUrl AS profilePictures
FROM Users
JOIN Chat_Users ON Chat_Users.ID_User = Users.ID
JOIN Chat_Messages ON Chat_Messages.ID_Chat = Chat_Users.ID_Chat
WHERE Chat_Users.ID_Chat = $chatID
ORDER BY Chat_Messages.timestamp DESC
LIMIT 3
) innerTable
GROUP BY innerTable.ID_Chat
Both are working separately but I want to merge them together so I don't have to run the second query in a loop due to performance reasons. Unfortunately I have no idea how this can be achieved because the second query needs the $chatID, which it only gets from the first query.
So to clarify the desired result: The list with the profile picture urls (second query) should be just another column in the result of the first query.
I hope it is explained in a reasonably understandable way. Any help would be much appreciated.
Edit: Sample data was provided for the affected tables "Chats", "Chat_Users", "Chat_Messages" and "Users".
This fulfils the brief, however it requires a view because MySQL 5.x doesn't support the WITH clause.
It's long and clunky and I've tried to shorten it, but this is as good as I can get. Hopefully someone will pop up in the comments with a way to make it shorter!
The view:
CREATE VIEW last_interaction AS
SELECT
id_chat,
id_user,
MAX(timestamp) AS timestamp
FROM chat_messages
GROUP BY id_user, id_chat
The query:
SELECT
Chat_Users.ID_Chat,
Chats.title,
Chats.lastMessageAt,
urls.pps AS profilePictureUrls
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
JOIN (
SELECT
lo.id_chat,
GROUP_CONCAT(users.profilePictureUrl) AS pps
FROM last_interaction lo
JOIN users ON users.id = lo.id_user
WHERE (
SELECT COUNT(*) -- the number of more recent interactions in the same chat
FROM last_interaction li
WHERE li.id_chat = lo.id_chat
AND (li.timestamp > lo.timestamp
OR (li.timestamp = lo.timestamp AND li.id_user > lo.id_user))
) < 3
GROUP BY lo.id_chat
) urls ON urls.id_chat = Chats.id
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) > 2
AND MAX(Chat_Users.ID_User = $userID)
ORDER BY Chats.lastMessageAt DESC
LIMIT 20

Count, Group By, Subquery, Left Join not working as expected

This is puzzling me and no amount of Googling is helping, so I'm hoping someone can point me in the right direction.
Please note that I have omitted some fields from the tables that don't relate to the question just to simplify things.
contacts (contact_id, name, email)
contact_uuids (uuid, contact_id)
visitor_activity (uuid, event)
contact_communications (comm_id, contact_id, employee_id)
Query
SELECT
`c`.*,
(SELECT COUNT(`log_id`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `num_comms`,
(SELECT MAX(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `latest_date`,
(SELECT MIN(`date`) FROM `contact_communications` `cc` WHERE `cc`.`contact_id` = `c`.`contact_id`) as `first_date`,
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
FROM `contacts` `c`
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
GROUP BY `c`.`contact_id`
ORDER BY `c`.`name` ASC
Some contacts have multiple UUIDs (upwards of 20 or 30).
When I perform the query WITHOUT the GROUP BY statement, I get the results I expect - a row returned for each UUID that exists for that contact, with the correct "num_comms" and "num_act" numbers.
However, when I add the GROUP BY statement, the "num_comms" is a lot smaller than expected and the "num_act" returns only the value from the first row without the GROUP BY statement.
I tried doing a "WHERE NOT IN" in the subquery, however that simply crashed the server as it was far too intense.
So - how do I get this to add up all the COUNT values from the LEFT JOIN and not just return the first value?
Also if anyone can help me optimize this that would be great.
Two problems:
GROUP BY c.contact_id does not include all the non-aggregate columns. This is a MySQL extension. What you get is arbitrary values for the columns other than contact_id.
The JOIN adds confusion. Your only use for visitor_activity is the COUNT(*) on it. But that does not make sense since it is limited to one UUID, whereas the row is limited to one contact_id. Rethink the purpose of that.
Remove this line:
(SELECT COUNT(`vaid`) FROM `visitor_activity` `va` WHERE `va`.`uuid` = `cu`.`uuid`) as `num_act`
and the rest may work ok.
I will continue with the assumption that you want the COUNT of all rows in visitor_activity for all the uuids associated with the one contact_id.
See if this:
( SELECT COUNT(*)
FROM `contact_uuids` cu2
JOIN `visitor_activity` USING(uuid)
WHERE cu2.contact_id = c.contact_id ) AS num_act
will work for the last subquery. At the same time, remove the JOIN:
LEFT JOIN `contact_uuids` `cu` ON `c`.`contact_id` = `cu`.`contact_id`
Now, back to the other problem (the non-standard usage of GROUP BY). Assuming that contact_id is the PRIMARY KEY, then simply remove the
GROUP BY `c`.`contact_id`
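To see the effect of the suggestions above, here is a small sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a few invented rows. It drops both the LEFT JOIN and the GROUP BY and counts everything in correlated subqueries, so each contact appears once and num_act covers all of that contact's UUIDs.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE contact_uuids (uuid TEXT, contact_id INTEGER);
CREATE TABLE visitor_activity (uuid TEXT, event TEXT);
CREATE TABLE contact_communications (comm_id INTEGER PRIMARY KEY, contact_id INTEGER);
INSERT INTO contacts VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO contact_uuids VALUES ('a1', 1), ('a2', 1), ('b1', 2);
INSERT INTO visitor_activity VALUES
  ('a1', 'view'), ('a1', 'click'), ('a2', 'view'), ('b1', 'view');
INSERT INTO contact_communications VALUES (1, 1), (2, 1), (3, 1);
""")

# One row per contact: no join in the outer query, so no GROUP BY is needed.
rows = conn.execute("""
SELECT c.contact_id, c.name,
  (SELECT COUNT(*) FROM contact_communications cc
    WHERE cc.contact_id = c.contact_id) AS num_comms,
  (SELECT COUNT(*) FROM contact_uuids cu
     JOIN visitor_activity va ON va.uuid = cu.uuid
    WHERE cu.contact_id = c.contact_id) AS num_act
FROM contacts c
ORDER BY c.name ASC
""").fetchall()

for row in rows:
    print(row)
```

Alice has two UUIDs with three activity rows between them, and the subquery adds them up instead of reporting only the first UUID's count.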

GROUP BY in query

I have a little struggle with this query. The GROUP BY function is not working in this query. Without the webshop group by it works, but multiple webshops show. Maybe anyone got an idea? Thanks!
SELECT
default_deals.id,
default_deals.title,
default_webshops.name,
COUNT(default_deals_categories.categories_id) as category_count,
default_webshops.popular,
default_webshops.popular_order
FROM
default_deals
JOIN default_deals_categories ON default_deals.id = default_deals_categories.row_id
JOIN default_webshops ON default_deals.webshop = default_webshops.id
WHERE
categories_id IN (1, 2)
GROUP BY
default_deals.id,
default_deals.title,
default_webshops.name,
default_webshops.popular,
default_webshops.popular_order
ORDER BY
category_count DESC,
default_webshops.popular DESC,
default_webshops.popular_order DESC
Try this:
SELECT
default_deals.id,
title,
name,
COUNT(categories_id) as category_count,
popular,
popular_order
FROM
default_deals
JOIN default_deals_categories ON default_deals.id = default_deals_categories.row_id
JOIN default_webshops ON default_deals.webshop = default_webshops.id
WHERE
categories_id IN (1, 2)
GROUP BY
default_deals.id,
title,
name,
popular,
popular_order
ORDER BY
category_count DESC,
popular DESC,
popular_order DESC
Okay, I guess your results have a deal listed multiple times with each webshop it has been linked to and then the same category count but possibly different values for popular and popular_order? What you need to do is to decide how you want to aggregate these fields. For example, you could do this:
SELECT
dd.id,
dd.title,
MAX(dw.name) AS webshop_name,
COUNT(dc.categories_id) AS category_count,
MAX(dw.popular) AS popular,
SUM(dw.popular_order) AS popular_order
FROM
default_deals dd
JOIN default_deals_categories dc ON dd.id = dc.row_id
JOIN default_webshops dw ON dd.webshop = dw.id
WHERE
dc.categories_id IN (1, 2)
GROUP BY
dd.id,
dd.title
ORDER BY
COUNT(dc.categories_id) DESC,
MAX(dw.popular) DESC,
SUM(dw.popular_order) DESC;
But I am going to bet that is NOT what you actually want to see. Where you have multiple webshops what do you want to see? My example just takes the highest value alphabetically but you could make this into a list, e.g. "Webshop1, Webshop2", etc.
You also need to decide how to aggregate the other fields in the webshop table. My query takes the maximum for popular and the sum of popular_order but this is just an example.
Finally, this will double up, triple up, etc. the category count depending on how many webshops a deal is linked to... and I am going to bet that this is also not what you want?

Mysql FULLTEXT search with GROUP BY, keep row value that has the highest score

In two FULLTEXT searches, I look for the search terms in the title of the book and the tags and get the following result:
rScore tScore ID
...
1.235689725827653 0 406
0.928482055664062 2.37063312530518 406
0.928482055664062 0 406
0.453363467548853 0 520
...
What I would like to have is one row per ID, with the duplicates merged and the highest of each score kept:
rScore tScore ID
...
1.235689725827653 2.37063312530518 406
0.453363467548853 0 520
...
But after a GROUP BY, ID 406 was collapsed into this row:
...
MATCH_SCORE_TITLE MATCH_SCORE_TAGS ID
0.928482055664062 0 406
0.453363467548853 0 520
...
How can I group all these results and keeping the max value of each MATCH? I know this has been asked before and can be done with a JOIN, but I didn't find it with a combination of two rows, plus I already have JOINS in my query since TITLE and TAGS are in two different tables.
UPDATE:
I have 3 tables: "registrants" (the left table with the titles to search), "registrants_tags" (the relational table between the left and the right table) and "tags" (the right table with the tags to search). Here is a simplified version of the SQL query:
SELECT
tags.tag, -- also tried GROUP_CONCAT(`tags`.`tag`) AS tags
MAX(MATCH(registrants.story_title) AGAINST('bob')) as rScore,
MAX(MATCH(tags.tag) AGAINST('bob')) as tScore,
registrants.id, registrants.story_title
FROM registrants
LEFT JOIN registrants_tags ON registrants.id = registrants_tags.registrant
LEFT JOIN tags ON registrants_tags.tag = tags.id
WHERE MATCH(registrants.story_title) AGAINST('bob')
OR MATCH(tags.tag) AGAINST('bob')
GROUP BY registrants.id
ORDER BY (rScore + tScore) DESC
Which gives me error message: "#1247 - Reference 'tscore' not supported (reference to group function)"
You could sort by the score, and use a max on a subquery to get the final preferred row.
For example:
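SELECT id, story_title,
MAX(rScore) AS titleScore,
MAX(tScore) AS tagScore
FROM (
SELECT
MATCH(registrants.story_title) AGAINST('bob') AS rScore,
MATCH(tags.tag) AGAINST('bob') AS tScore,
registrants.id, registrants.story_title
FROM
registrants
LEFT JOIN registrants_tags ON registrants_tags.registrant = registrants.id
LEFT JOIN tags ON tags.id = registrants_tags.tag
-- column aliases cannot be used in WHERE, so the MATCH expressions are repeated
WHERE MATCH(registrants.story_title) AGAINST('bob')
OR MATCH(tags.tag) AGAINST('bob')
) AS score_matcher
GROUP BY id, story_title
-- repeat the aggregates here: ordering by their aliases inside an expression raises error #1247
ORDER BY (MAX(rScore) + MAX(tScore)) DESC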
That ought to work for you. It may not be the quickest query in the universe, since it's relying on subqueries, which in MySQL aren't terribly well optimized in my experience, but it should get you your results.
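The grouping step on its own can be checked without any full-text index. This is a rough sketch, assuming Python's built-in sqlite3 module; the table scored is a made-up stand-in for the unaggregated output of the inner query, and the scores are rounded stand-ins for the values in the question.

```python
import sqlite3

# "scored" is a hypothetical table standing in for the inner query's result.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scored (rScore REAL, tScore REAL, id INTEGER);
INSERT INTO scored VALUES
  (1.25, 0.0, 406),
  (0.75, 2.5, 406),
  (0.75, 0.0, 406),
  (0.5,  0.0, 520);
""")

# One row per id, keeping the highest title score and the highest tag score,
# even when the two maxima come from different source rows.
rows = conn.execute("""
SELECT id, MAX(rScore) AS titleScore, MAX(tScore) AS tagScore
FROM scored
GROUP BY id
ORDER BY MAX(rScore) + MAX(tScore) DESC
""").fetchall()

for row in rows:
    print(row)
```

For ID 406 the best title score and the best tag score sit on different rows, and both end up in the single grouped row.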
You could also rework it to a different subquery to take advantage of group_concat like follows:
SELECT
MATCH(registrants.story_title) AGAINST('bob') as rScore,
MATCH(grouped_tags.tags) AGAINST('bob') as tScore,
registrants.id, registrants.story_title
FROM
registrants
LEFT JOIN (
SELECT rtags.registrant, GROUP_CONCAT(DISTINCT tags.tag SEPARATOR ',') as tags
FROM registrants_tags AS rtags
INNER JOIN tags on tags.id=rtags.tag
GROUP BY rtags.registrant
) AS grouped_tags ON registrants.id = grouped_tags.registrant
HAVING rScore > 0 or tScore > 0
ORDER BY (rScore + tScore) DESC
It would help if you added a "grouped_tags" field to your registrants table, which could then have a fulltext index added to it. That would eliminate the need for the grouped_tags subquery. Then whenever someone updates the tags for a particular registrant, the grouped_tags field gets updated with the current list of correct tags.
And if you did as I recommended about adding a grouped_tags field (that gets populated in the interface), you could replace the whole query with this, which with fulltext indexes would be pretty fast (however, before MySQL 5.6 fulltext indexes require the use of MyISAM, which isn't exactly great).
If you did that, then this would most likely be the fastest query I've got listed here.
SELECT
MATCH(story_title) AGAINST('bob') as rScore,
MATCH(grouped_tags) AGAINST('bob') as tScore,
id, story_title
FROM
registrants
WHERE MATCH(story_title) AGAINST('bob') OR MATCH(grouped_tags) AGAINST('bob')
GROUP BY ID
ORDER BY (rScore + tScore) DESC
So there is a pile of recommendations for making this query happen, and which solution you go with will depend largely on the size of your data set and how fast the query needs to be. I recommend doing some benchmarking to find which one works best for you.
SELECT MAX(MATCH_SCORE_TITLE), MAX(MATCH_SCORE_TAGS), ID FROM <tablename>........GROUP BY ID
You can use SELECT DISTINCT ID with ORDER BY MATCH_SCORE_TITLE and MATCH_SCORE_TAGS
Have a try at this pattern:
SELECT
*
FROM registrants r1
LEFT JOIN registrants r2 ON r1.id = r2.id AND r1.MATCH_SCORE_TITLE < r2.MATCH_SCORE_TITLE
LEFT JOIN registrants_tags ON r1.id = registrants_tags.registrant
LEFT JOIN tags ON registrants_tags.tag = tags.id
WHERE
r2.id IS NULL AND
(MATCH(r1.full_name) AGAINST('bob')
OR MATCH(tags.tag) AGAINST('bob'))
ORDER BY (tscore + ascore) DESC
See a quick explanation of the LEFT JOIN trick in another answer.
Edit: removed the unnecessary GROUP BY clause.

How to order, group, order with mySQL

Here is a simplified version of my table
tbl_records
-title
-created
-views
I am wondering how I can make a query where they are grouped by title, but the record that is returned for each group is the most recently created. I then will order it by views.
One way I guess is to do a sub query and order it by created and then group it by title and then from those results order it by views. I guess there is a better way though.
Thanks
EDIT:
SAMPLE DATA:
-title: Gnu Design
-created: 2009-11-11 14:47:18
-views: 104
-title: Gnu Design
-created:2010-01-01 21:37:09
-views:9
-title: French Connection
-created:2010-05-01 09:27:19
-views:20
I would like the results to be:
-title: French Connection
-created:2010-05-01 09:27:19
-views:20
-title: Gnu Design
-created:2010-01-01 21:37:09
-views:9
Only the most recent Gnu Design is shown and then the results are ordered by views.
This is an example of the greatest-n-per-group problem that appears frequently on StackOverflow.
Here's my usual solution:
SELECT t1.*
FROM tbl_records t1
LEFT OUTER JOIN tbl_records t2 ON (t1.title = t2.title AND
(t1.created < t2.created OR t1.created = t2.created AND t1.primarykey < t2.primarykey))
WHERE t2.title IS NULL;
Explanation: find the row t1 for which no other row t2 exists with the same title and a greater created date. In case of ties, use some unique key to resolve the tie, unless it's okay to get multiple rows per title.
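The LEFT JOIN pattern above can be run against the sample data from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL and an auto-incrementing id column as the unique tie-breaker (the answer's primarykey); an ORDER BY on views is added to produce the ordering the question asks for.

```python
import sqlite3

# In-memory SQLite database; the id column is an assumed tie-breaker key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_records (
  id INTEGER PRIMARY KEY, title TEXT, created TEXT, views INTEGER);
INSERT INTO tbl_records (title, created, views) VALUES
  ('Gnu Design', '2009-11-11 14:47:18', 104),
  ('Gnu Design', '2010-01-01 21:37:09', 9),
  ('French Connection', '2010-05-01 09:27:19', 20);
""")

# Keep each row t1 for which no newer row t2 with the same title exists,
# then sort the surviving rows by views.
rows = conn.execute("""
SELECT t1.title, t1.created, t1.views
FROM tbl_records t1
LEFT OUTER JOIN tbl_records t2
  ON t1.title = t2.title
 AND (t1.created < t2.created
      OR (t1.created = t2.created AND t1.id < t2.id))
WHERE t2.title IS NULL
ORDER BY t1.views DESC
""").fetchall()

for row in rows:
    print(row)
```

Only the 2010 Gnu Design row survives the join, and French Connection is listed first because it has more views.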
select i.*, o.views from
(
select
title
, max(created) as last_created
from tbl_records
group by title
) i inner join tbl_records o
on i.title = o.title and i.last_created = o.created
order by o.views desc
I'm assuming that the aggregation to be applied to views is count(), but I could well be wrong (you'll need some way of defining which measure of views you wish to have for the latest created title). Hope that helps.
EDIT: have seen your sample data and edited accordingly.
SELECT title,
MAX(created),
views
FROM tbl_records
GROUP BY title
ORDER BY views DESC