mysql complicated join - mysql

I have run into some troubles while writing a query for MySQL. I don't know how to describe my problem well enough to search the web for it, so sorry if my question is stupid.
I have 4 tables:
CREATE TABLE posts( id INT, author INT );
CREATE TABLE users( id INT, nick varchar(64) );
CREATE TABLE groups( id INT, name varchar(64) );
CREATE TABLE membership ( user INT, `group` INT, date INT ); -- GROUP is a reserved word, so the column name must be quoted here
Membership contains info about users that have joined some groups. "Date" in the membership table is the time when a user joined that group.
I need a query which will return a post, its author's nick and the name of the group the author joined earliest (the membership with the smallest date).
All I have currently is:
SELECT p.id, u.nick, g.name
FROM posts AS p
LEFT JOIN users AS u ON u.id = p.author
LEFT JOIN membership AS m ON m.user = p.author
LEFT JOIN groups AS g ON g.id = m.group
WHERE 1;
but of course it returns a random group's name, not the one with earliest joining date.
I also tried the following variant:
SELECT p.id, u.nick, g.name
FROM posts AS p
LEFT JOIN users AS u ON u.id = p.author
LEFT JOIN
(SELECT * FROM membership WHERE 1 ORDER BY date ASC)
AS m ON m.user = p.author
LEFT JOIN groups AS g ON g.id = m.group
WHERE 1;
but it gave me same result.
I would appreciate even pointers to where I could start, because at the moment I have no idea what to do with it.

I'm not sure why you want this, but if you want the information for the earliest membership date (there is no date on the post itself), that's no problem. Taken across all groups, the earliest membership always points to the same single user, since you are not asking about a specific group. Or did you want the earliest member PER group? That is what the query below assumes. Once we have the earliest user, we can link to the posts table through the author column. But what if that user has 20 posts? Do you then also want only the FIRST post ID for that author?
Just copying from your supplied tables as a reference...
posts: id (int), author(int)
users: id (int), nick (varchar)
groups: id (int), name (varchar)
membership: user (int), group (int), date (int)
select
u1.nick,
m2.date,
g1.name,
p1.id as PostID
from
( select m.group,
min( m.date ) as EarliestMembershipSignup
from
membership m
group by
m.group ) EarliestPerGroup
join membership m2
on EarliestPerGroup.Group = m2.Group
AND EarliestPerGroup.EarliestMembershipSignup = m2.Date
join groups g1
on m2.group = g1.id
join users u1
on m2.user = u1.ID
join posts p1
on u1.id = p1.author
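If the goal is instead what the question literally asks for (each post, its author's nick, and the group that author joined first), one possible approach is to compute the minimum date per user in a derived table and join back to membership on it. The following is only a sketch under assumptions: SQLite stands in for MySQL so it can run anywhere, the sample rows are invented, and reserved names are quoted with double quotes (MySQL would use backticks).

```python
import sqlite3

# SQLite stands in for MySQL here; all rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INT, author INT);
CREATE TABLE users (id INT, nick VARCHAR(64));
CREATE TABLE "groups" (id INT, name VARCHAR(64));
CREATE TABLE membership ("user" INT, "group" INT, "date" INT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO "groups" VALUES (10, 'admins'), (20, 'editors');
INSERT INTO membership VALUES (1, 10, 300), (1, 20, 100), (2, 10, 200);
INSERT INTO posts VALUES (100, 1), (101, 2), (102, 3);
""")

query = """
SELECT p.id, u.nick, g.name
FROM posts AS p
LEFT JOIN users AS u ON u.id = p.author
-- earliest join date per user
LEFT JOIN (
    SELECT "user", MIN("date") AS first_date
    FROM membership
    GROUP BY "user"
) AS first_m ON first_m."user" = p.author
-- the membership row that carries that earliest date
LEFT JOIN membership AS m
    ON m."user" = first_m."user" AND m."date" = first_m.first_date
LEFT JOIN "groups" AS g ON g.id = m."group"
ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An author with no membership still appears, with NULL for the group name. If an author joined two groups at exactly the same date, this sketch returns one row per tied group, so a tie-breaker would be needed in that case.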

Something like this
SELECT p.id, u.nick, g.name
FROM posts p,
users u,
membership m,
groups g
WHERE p.author = u.id
AND m.user = u.id
AND m.group = g.id
ORDER BY m.date ASC
LIMIT 1;
Take care to have good indexes when joining these 4 tables.

I'd recommend moving your date column from the membership table into your groups table since that seems to be where you're tracking that information. The membership table is just an intersection table for the many-to-many users<->groups tables. It should only contain user ID and the group ID columns.
What about this?
SELECT p.id, u.nick, g.name
FROM users u
INNER JOIN posts p
ON p.author = u.id
INNER JOIN membership m
ON u.id = m.user
INNER JOIN groups g
ON m.group = g.id
ORDER BY g.timestamp DESC -- assumes the date column has been moved to groups as timestamp
LIMIT 1;

Related

Table conflict with SQL request

I have 5 tables [structure] :
"medias" to store pictures [id, creatorID (user who create the media), date]
"likes" to store likes on pictures [id, senderID (user who liked), mediaID (media liked)]
"comments" to store comments on pictures [id, mediaID (media commented)]
"follow" to store a follow [id, follow (user X), following (one following of the user X)]
"users" to store users [id]
All tables have an auto-incrementing ID.
Here is my query to display a feed of pictures for a user:
SELECT
m.id as mediaID,
COUNT(l.id) as likesCount,
COUNT(c.id) as commentsCount
FROM medias m
INNER JOIN follow f
ON f.follow = 'user_here' AND m.creatorID = f.following AND m.date < 'timestamp_here'
INNER JOIN users u
ON u.id = m.creatorID
LEFT JOIN likes x
ON m.id = x.mediaID AND x.senderID = 2
LEFT JOIN likes l
ON m.id = l.mediaID
LEFT JOIN comments c
ON m.id = c.mediaID
GROUP BY m.id
When there is more than one comment, likesCount takes the value of commentsCount. And when I unlike a picture, commentsCount drops by one. I really don't know how to solve this...
The easy way to solve your problem is to use distinct:
SELECT m.id as mediaID,
COUNT(DISTINCT l.id) as likesCount,
COUNT(DISTINCT c.id) as commentsCount
If you have lots of likes and comments, a better way may be to aggregate before joining or use a correlated subquery.
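To illustrate why the counts get mixed up and what "aggregate before joining" could look like, here is a small sketch. It assumes a cut-down version of the tables (only the columns needed), uses SQLite in place of MySQL so it is runnable, and the rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; tables are reduced to the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medias (id INTEGER PRIMARY KEY, creatorID INT);
CREATE TABLE likes (id INTEGER PRIMARY KEY, senderID INT, mediaID INT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, mediaID INT);
INSERT INTO medias VALUES (1, 7), (2, 7);
INSERT INTO likes VALUES (1, 2, 1), (2, 3, 1);          -- 2 likes on media 1
INSERT INTO comments VALUES (1, 1), (2, 1), (3, 1);     -- 3 comments on media 1
""")

# Joining both child tables first multiplies the rows: 2 likes x 3 comments = 6.
naive = """
SELECT m.id, COUNT(l.id), COUNT(c.id)
FROM medias m
LEFT JOIN likes l ON l.mediaID = m.id
LEFT JOIN comments c ON c.mediaID = m.id
GROUP BY m.id
ORDER BY m.id
"""

# Aggregating each child table on its own keeps one row per media before the join.
pre_aggregated = """
SELECT m.id,
       COALESCE(l.likesCount, 0),
       COALESCE(c.commentsCount, 0)
FROM medias m
LEFT JOIN (SELECT mediaID, COUNT(*) AS likesCount
           FROM likes GROUP BY mediaID) l ON l.mediaID = m.id
LEFT JOIN (SELECT mediaID, COUNT(*) AS commentsCount
           FROM comments GROUP BY mediaID) c ON c.mediaID = m.id
ORDER BY m.id
"""

naive_rows = conn.execute(naive).fetchall()
fixed_rows = conn.execute(pre_aggregated).fetchall()
print(naive_rows)
print(fixed_rows)
```

The pre-aggregated form avoids building the likes x comments product at all, which is why it may scale better than COUNT(DISTINCT ...) when a picture has many of both.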

join 2 mysql tables and get the first and last date

I have 2 mysql tables, one with the users details and the second with all the pages that the users saw (1:N)
TABLE "users"
id int(10) auto_increment primary
ip varchar(15)
lang char(2)
...
TABLE "pages"
id int(10) auto_increment primary
uid int(10) index
datetime datetime
url varchar(255)
I know it is possible to join the 2 tables, but I'm a little confused about how to get the first and last datetime, and the first url, from the "pages" table...
SELECT * FROM users, pages WHERE users.id = pages.uid
I think it involves GROUP BY with MIN(pages.datetime) and MAX(pages.datetime), but I have no idea where to use them, or how to get the first pages.url.
As you mentioned, you need to use GROUP BY with the MIN and MAX aggregate functions to find the first and last datetime per user.
Also, don't use the comma-separated join syntax, which is old and hard to read; use explicit INNER JOIN syntax instead:
SELECT U.ID,
MIN(P.datetime) as First_date,
MAX(P.datetime) as Last_date
FROM users U
INNER JOIN pages P
ON U.id = P.uid
Group by U.ID
If you want to see other information, such as the first visited url, you can join the above result back to the pages table to get the related rows.
select A.uid,A.url First_URL,C.url as Last_url,First_date,Last_date
from pages A
INNER JOIN
(
SELECT U.ID,
MIN(P.datetime) as First_date,
MAX(P.datetime) as Last_date
FROM users U
INNER JOIN pages P
ON U.id = P.uid
Group by U.ID
) B
ON A.uid =B.ID
and A.datetime = B.First_date
INNER JOIN pages C
on C.uid =B.ID
and C.datetime = B.Last_date
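A quick way to check this kind of join-back query is to run it against a few invented rows. The sketch below assumes SQLite as a stand-in for MySQL and made-up visit data; it also drops the join to users inside the derived table, on the assumption that pages.uid alone is enough to group by.

```python
import sqlite3

# SQLite stands in for MySQL; the visits below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, ip VARCHAR(15), lang CHAR(2));
CREATE TABLE pages (id INTEGER PRIMARY KEY, uid INT, datetime DATETIME, url VARCHAR(255));
INSERT INTO users VALUES (1, '10.0.0.1', 'en'), (2, '10.0.0.2', 'it');
INSERT INTO pages VALUES
  (1, 1, '2015-01-01 10:00:00', '/home'),
  (2, 1, '2015-01-01 10:05:00', '/about'),
  (3, 1, '2015-01-01 10:09:00', '/contact'),
  (4, 2, '2015-01-02 08:00:00', '/home');
""")

query = """
SELECT b.uid, f.url AS first_url, l.url AS last_url, b.first_date, b.last_date
FROM (SELECT uid, MIN(datetime) AS first_date, MAX(datetime) AS last_date
      FROM pages
      GROUP BY uid) AS b
-- row holding the first visit of each user
INNER JOIN pages AS f ON f.uid = b.uid AND f.datetime = b.first_date
-- row holding the last visit of each user
INNER JOIN pages AS l ON l.uid = b.uid AND l.datetime = b.last_date
ORDER BY b.uid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A user with a single page view gets the same url as both first and last. Two page views with an identical datetime for one user would produce duplicate rows, so this assumes timestamps are unique per user.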

Find unique values that do not exist in multiple columns and tables

A misconfigured manual import imported our entire AD into our help desk user database, creating a bunch of extraneous/duplicate accounts. Of course, no backup to restore from.
To facilitate the cleanup, I want to run a query that will find users not currently linked to any current or archived tickets. I have three tables, USER, HD_TICKET, and HD_ARCHIVE_TICKET. I want to compare the ID field in USER to the OWNER_ID and SUBMITTER_ID fields in the other two tables, returning only the values in USER.ID that do not exist in any of the other four columns.
How can this be accomplished?
Do a left join for each relationship where the right table id is null:
select user.*
from user
left join hd_ticket on user.id = hd_ticket.owner_id
left join hd_ticket as hd_ticket2 on user.id = hd_ticket2.submitter_id
left join hd_archive_ticket on user.id = hd_archive_ticket.owner_id
left join hd_archive_ticket as hd_archive_ticket2 on user.id = hd_archive_ticket2.submitter_id
where hd_ticket.owner_id is null
and hd_ticket2.submitter_id is null
and hd_archive_ticket.owner_id is null
and hd_archive_ticket2.submitter_id is null
How about something like:
SELECT id
FROM user
WHERE id NOT IN
(
SELECT owner_id
FROM hd_ticket
UNION ALL
SELECT submitter_id
FROM hd_ticket
UNION ALL
SELECT owner_id
FROM hd_archive_ticket
UNION ALL
SELECT submitter_id
FROM hd_archive_ticket
)
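One caveat with NOT IN is worth checking before relying on it: if any of the four columns can hold NULL (for example an unassigned ticket), the NOT IN test never evaluates to true and the query returns no rows. The sketch below shows that behaviour and a NOT EXISTS variant that is not affected by it. It assumes SQLite in place of MySQL, a reduced column set, and invented rows.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns needed for the comparison are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY);
CREATE TABLE hd_ticket (id INTEGER PRIMARY KEY, owner_id INT, submitter_id INT);
CREATE TABLE hd_archive_ticket (id INTEGER PRIMARY KEY, owner_id INT, submitter_id INT);
INSERT INTO user VALUES (1), (2), (3), (4);
INSERT INTO hd_ticket VALUES (1, 1, 2), (2, NULL, 2);   -- ticket 2 has no owner
INSERT INTO hd_archive_ticket VALUES (1, 3, 1);
""")

not_in = """
SELECT id FROM user
WHERE id NOT IN (
    SELECT owner_id FROM hd_ticket
    UNION ALL SELECT submitter_id FROM hd_ticket
    UNION ALL SELECT owner_id FROM hd_archive_ticket
    UNION ALL SELECT submitter_id FROM hd_archive_ticket
)
ORDER BY id
"""

not_exists = """
SELECT u.id FROM user u
WHERE NOT EXISTS (SELECT 1 FROM hd_ticket t
                  WHERE t.owner_id = u.id OR t.submitter_id = u.id)
  AND NOT EXISTS (SELECT 1 FROM hd_archive_ticket a
                  WHERE a.owner_id = u.id OR a.submitter_id = u.id)
ORDER BY u.id
"""

not_in_rows = conn.execute(not_in).fetchall()          # empty because of the NULL owner
not_exists_rows = conn.execute(not_exists).fetchall()  # user 4 is linked to nothing
print(not_in_rows, not_exists_rows)
```

If NOT IN is preferred, adding a WHERE ... IS NOT NULL filter to each branch of the UNION should have the same effect.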
If I understood your situation, I would do this:
SELECT a.id FROM user a, hd_ticket b, hd_archive_ticket c WHERE a.id != b.id AND a.id != c.id
You could try something like the query below. The inner query, which inner joins the other 2 tables, returns only those user IDs that exist in all 3 tables. The outer query then filters out the IDs returned by the inner query, since your goal is to get only the USER IDs that are not present in the other tables.
select ID
FROM USER
WHERE ID NOT IN
(
select u.ID
from user u
inner join HD_TICKET h on u.ID = h.OWNER_ID
inner join HD_ARCHIVE_TICKET ha on u.ID = ha.SUBMITTER_ID
)

MySQL JOIN queries - Messaging system

I have the following tables for a messaging system and I was wondering how I would go about querying the DB for how many conversations have new messages.
My tables are as follows
Conversation
------------
id
subject
Messages
--------
id
conversation_id
user_id (sender)
message
timestamp (time sent)
Participants
------------
conversation_id
user_id
last_read (timestamp of when the user last viewed the conversation)
I'm trying to do the following query but it returns no results:
SELECT COUNT(m.conversation_id) AS count
FROM (messages_message m)
INNER JOIN messages_participants p ON p.conversation_id = m.conversation_id
WHERE `m`.`timestamp` > 'p.last_read'
AND `p`.`user_id` = '5'
GROUP BY m.conversation_id
LIMIT 1
Also, I probably will have to run this on every page load - any tips of making it as fast as possible?
Cheers
EDIT
I've got another somewhat related question if anybody would be so kind as to help out.
I'm trying to retrieve the subject, last message in conversation, timestamp of last convo and number of new messages. I believe I have a working query but it looks a bit badly put together. What sort of improvements can I do to this?
SELECT SQL_CALC_FOUND_ROWS c.*, last_msg.*, new_msgs.count as new_msgs_count
FROM ( messages_conversation c )
INNER JOIN messages_participants p ON p.user_id = '5'
INNER JOIN ( SELECT m.*
FROM (messages_message m)
ORDER BY m.timestamp DESC
LIMIT 1) last_msg
ON c.id = last_msg.conversation_id
LEFT JOIN ( SELECT COUNT(m.id) AS count, m.conversation_id, m.timestamp
FROM (messages_message m) ) new_msgs
ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Should I determine whether a conversation is unread with an IF statement in MySQL, or should I convert and compare timestamps in PHP?
Thanks again,
RS7
'p.last_read' as quoted above is a string constant. Remove the quotes and see whether that changes anything, RS7. If user_id is an integer, then remove the quotes from '5' as well.
As far as performance goes, ensure you have indexes on all the relevant columns. messages_participants.user_id and messages_message.timestamp being two important columns to index.
Yes, there are problems in your query.
Firstly, you count the same column you group by, so each row holds the number of matching messages in one conversation rather than the number of conversations, and LIMIT 1 keeps only the first of those rows.
Secondly, you are comparing the timestamp to a string: m.timestamp > 'p.last_read'.
Finally, avoid using LIMIT when you know your query will return one row (be self-confident :p).
Try:
SELECT
COUNT(DISTINCT m.conversation_id) AS count -- DISTINCT counts conversations rather than messages
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id
WHERE
m.timestamp > p.last_read
AND p.user_id = 5
If you want to reduce the query's running time, you can create a new index on messages_participants (conversation_id, user_id) to index the conversations per user, and then change your query to:
SELECT
COUNT(DISTINCT m.conversation_id) AS count -- DISTINCT counts conversations rather than messages
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id AND p.user_id = 5
WHERE
m.timestamp > p.last_read
That way your DB engine can filter the JOIN by simply looking at the index. You could take this further by indexing the timestamp too: (timestamp, conversation_id, user_id), and moving the WHERE condition into the join condition.
Whatever you choose, always put the most selective field first, to increase selectivity.
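The difference between counting new messages and counting conversations with new messages can be seen on a few rows. This is a rough sketch only: SQLite replaces MySQL so it can run standalone, the index name idx_participants is made up, and the data is invented.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_message (
    id INTEGER PRIMARY KEY, conversation_id INT, user_id INT,
    message TEXT, timestamp INT);
CREATE TABLE messages_participants (
    conversation_id INT, user_id INT, last_read INT);
-- Hypothetical name for the composite index suggested above.
CREATE INDEX idx_participants
    ON messages_participants (conversation_id, user_id);
INSERT INTO messages_message VALUES
  (1, 1, 9, 'hi', 100), (2, 1, 9, 'still there?', 110),
  (3, 2, 8, 'old news', 50),
  (4, 3, 8, 'ping', 200);
INSERT INTO messages_participants VALUES
  (1, 5, 90), (2, 5, 60), (3, 5, 150), (3, 6, 0);
""")

query = """
SELECT COUNT(DISTINCT m.conversation_id) AS conversations_with_new,
       COUNT(*) AS new_messages
FROM messages_message m
INNER JOIN messages_participants p
    ON p.conversation_id = m.conversation_id AND p.user_id = ?
WHERE m.timestamp > p.last_read
"""
# User 5 has two unread messages in conversation 1 and one in conversation 3.
result = conn.execute(query, (5,)).fetchone()
print(result)
```

Passing the user id as a bound parameter also sidesteps the quoted '5' versus integer 5 question raised earlier.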
EDIT
First, let's comment your query:
SELECT
SQL_CALC_FOUND_ROWS c.*,
last_msg.*,
new_msgs.count as new_msgs_count
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.user_id = 5 -- Join with every conversations of user 5; if id is an integer, avoid writing '5' (string converted to an integer).
INNER JOIN
( -- Select every message : you could already select here messages from user 5
SELECT
*
FROM
messages_message m
ORDER BY -- this is not what ORDER BY is for. Use MAX to obtain the latest timestamp.
m.timestamp DESC
LIMIT 1
) last_msg ON c.id = last_msg.conversation_id -- this subquery returns a single row, but you want the latest timestamp for each conversation.
LEFT JOIN
(
SELECT
COUNT(m.id) AS count,
m.conversation_id,
m.timestamp
FROM
messages_message m
) new_msgs ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Let's rephrase your query:
select the number of new messages of a conversation subject, its last message and timestamp for user #id.
Do it step by step:
Selecting last message, timestamp in conversation for each user:
SELECT -- select the latest timestamp with its message
max(timestamp),
message
FROM
messages_message
GROUP BY
user_id
Aggregate functions (MAX, MIN, SUM, ...) work on the current group. Read this as "for each group, calculate the aggregate functions, then select what I need where my conditions are true". So it will result in one row per group.
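So this last query selects the latest timestamp of every user in the messages_message table, together with a message. Note that message is neither aggregated nor grouped, so MySQL may take it from any row of the group (or reject the query when ONLY_FULL_GROUP_BY is enabled); to be sure of getting the text of the latest message, join the result back to messages_message on the maximum timestamp. As you can see, it is easy to select this value for a specific user by adding a WHERE clause: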
SELECT
MAX(timestamp),
message
FROM
messages_message
WHERE
user_id = #id
GROUP BY
user_id
Number of messages per conversation: for each conversation, count the number of messages
SELECT
COUNT(m.id) -- assuming the id column is unique; otherwise count distinct values.
FROM
messages_conversation c
INNER JOIN -- The current user participated in the conversation
messages_participants p ON p.conversation_id = c.id AND p.user_id = #id
LEFT OUTER JOIN -- Messages of the conversation newer than the time the current user last read it
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id -- for each conversation
INNER JOIN won't return rows for conversations in which the current user did not participate.
Then the outer join fills the message columns with NULL when no message matches the condition, so COUNT returns 0: there are no new messages.
Putting it all together.
Select the last message and timestamp in conversation where the current user participated and the number of new messages in each conversation.
Which is a JOIN between the two last queries.
SELECT
last_msg.conversation_id,
last_msg.message,
last_msg.max_timestamp,
new_msgs.nb
FROM
(
SELECT
MAX(timestamp) AS max_timestamp,
message,
conversation_id
FROM
messages_message
WHERE
user_id = #id
GROUP BY
user_id
) last_msg
JOIN
(
SELECT
c.id AS conversation_id,
COUNT(m.id) AS nb
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.conversation_id = c.id AND p.user_id = #id
LEFT OUTER JOIN
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id
) new_msgs ON new_msgs.conversation_id = last_msg.conversation_id
-- put here and only here a order by if necessary :)
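For comparison, here is one way the whole thing might be written so that the last message is taken per conversation rather than per sender. It is a sketch under assumptions, not a drop-in replacement: SQLite stands in for MySQL, the table layout follows the question, the rows are invented, and message timestamps are assumed unique within a conversation.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_conversation (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE messages_message (
    id INTEGER PRIMARY KEY, conversation_id INT, user_id INT,
    message TEXT, timestamp INT);
CREATE TABLE messages_participants (
    conversation_id INT, user_id INT, last_read INT);
INSERT INTO messages_conversation VALUES (1, 'Lunch'), (2, 'Budget'), (3, 'Release');
INSERT INTO messages_message VALUES
  (1, 1, 9, 'hi', 100), (2, 1, 9, 'still there?', 110),
  (3, 2, 8, 'old news', 50),
  (4, 3, 8, 'ping', 200);
INSERT INTO messages_participants VALUES (1, 5, 90), (2, 5, 60), (3, 5, 150);
""")

query = """
SELECT c.id, c.subject, last_msg.message, last_msg.timestamp,
       COUNT(new_m.id) AS new_msgs_count
FROM messages_participants p
INNER JOIN messages_conversation c ON c.id = p.conversation_id
-- latest timestamp per conversation ...
INNER JOIN (
    SELECT conversation_id, MAX(timestamp) AS max_timestamp
    FROM messages_message
    GROUP BY conversation_id
) latest ON latest.conversation_id = c.id
-- ... joined back to fetch the text of that message
INNER JOIN messages_message last_msg
    ON last_msg.conversation_id = latest.conversation_id
   AND last_msg.timestamp = latest.max_timestamp
-- messages newer than the participant's last_read; NULL when there are none
LEFT JOIN messages_message new_m
    ON new_m.conversation_id = c.id AND new_m.timestamp > p.last_read
WHERE p.user_id = ?
GROUP BY c.id, c.subject, last_msg.message, last_msg.timestamp
ORDER BY last_msg.timestamp DESC
"""
rows = conn.execute(query, (5,)).fetchall()
for row in rows:
    print(row)
```

Conversations without unread messages still appear, with a count of 0, which answers the "is it unread" question in SQL without comparing timestamps in PHP.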

Somewhat Complex MySQL Statement

I am creating a forum, and have gotten stuck creating the page that will display all the topics for a given forum. The three relevant tables & fields are structured as follows:
Table: forums_topics Table: forums_posts Table: users
-------------------- ------------------- ------------
int id int id int id
int forum_id int topic_id varchar name
int creator int poster
tinyint sticky varchar subject
timestamp posted_on
I've started with the following SQL:
SELECT t.id,
t.sticky,
u.name AS creator,
p.subject,
COUNT(p.id) AS posts,
MAX(p.posted_on) AS last_post
FROM forums_topics AS t
JOIN users AS u
LEFT JOIN forums_posts AS p ON p.topic_id = t.id
WHERE t.forum_id = 1
AND u.id = t.creator
GROUP BY t.id
ORDER BY t.sticky
This appears to be getting me what I want (topic's id number, whether it's a sticky, who made the topic, the subject of the topic, number of posts for each topic, and timestamp of latest post). If there is a mistake though please let me know.
What I am having trouble with now is how I can add to this to get the name of the latest poster. Can someone explain how I would edit my SQL to do that? I can provide more details if needed, or restructure my tables if that will make it simpler.
Here is a simple way to do this:
SELECT t.id,
t.sticky,
u.name AS creator,
p.subject,
COUNT(p.id) AS posts,
MAX(p.posted_on) AS last_post,
(SELECT name FROM users
JOIN forums_posts ON forums_posts.poster = users.id
WHERE forums_posts.id = MAX(p.id)) AS LastPoster
FROM forums_topics AS t
JOIN users AS u
LEFT JOIN forums_posts AS p ON p.topic_id = t.id
WHERE t.forum_id = 1
AND u.id = t.creator
GROUP BY t.id
ORDER BY t.sticky
Basically, you do a sub-query to find the user based upon the max id. If your IDs are GUIDs or are not in order for some other reason, you could do the lookup based upon the posted_on timestamp instead.
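An alternative to the correlated sub-query is to aggregate the posts per topic in a derived table and join the latest post back in. The sketch below is illustrative only: it uses SQLite instead of MySQL, invented rows, and, like the sub-query above, it assumes that the highest post id in a topic is the latest post.

```python
import sqlite3

# SQLite stands in for MySQL; rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(64));
CREATE TABLE forums_topics (id INTEGER PRIMARY KEY, forum_id INT, creator INT, sticky TINYINT);
CREATE TABLE forums_posts (
    id INTEGER PRIMARY KEY, topic_id INT, poster INT,
    subject VARCHAR(255), posted_on TIMESTAMP);
INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
INSERT INTO forums_topics VALUES (1, 1, 1, 0), (2, 1, 2, 1), (3, 2, 3, 0);
INSERT INTO forums_posts VALUES
  (1, 1, 1, 'Hello',     '2012-01-01 10:00:00'),
  (2, 1, 2, 'Re: Hello', '2012-01-01 11:00:00'),
  (3, 2, 2, 'Rules',     '2012-01-02 09:00:00'),
  (4, 1, 3, 'Re: Hello', '2012-01-03 08:00:00');
""")

query = """
SELECT t.id, t.sticky, creator.name AS creator,
       stats.posts, stats.last_post, last_user.name AS last_poster
FROM forums_topics AS t
JOIN users AS creator ON creator.id = t.creator
-- one row per topic: post count, latest timestamp, id of the latest post
LEFT JOIN (
    SELECT topic_id, COUNT(*) AS posts,
           MAX(posted_on) AS last_post, MAX(id) AS last_post_id
    FROM forums_posts
    GROUP BY topic_id
) AS stats ON stats.topic_id = t.id
LEFT JOIN forums_posts AS lp ON lp.id = stats.last_post_id
LEFT JOIN users AS last_user ON last_user.id = lp.poster
WHERE t.forum_id = 1
ORDER BY t.sticky, t.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because every selected column comes from a joined row rather than from a grouped one, this form has no non-aggregated columns in a GROUP BY, which may matter on servers that enforce ONLY_FULL_GROUP_BY.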