LEFT JOIN not working as expected with sub-query - mysql

I've got the SQL query below:
SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
It returns all the rows (about 4000) in the messages table with additional columns coming from the numbers table. So far, this is what I would expect.
Now I use that query as a subquery and LEFT JOIN it to another table:
SELECT message, sent_date
FROM (
SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
) AS table1
LEFT JOIN users ON table1.user_id = users.id
However, it returns only about 200 rows, so many are missing. Since this is a LEFT JOIN, I would expect every row from table1 to appear in the result. Can anybody see what the issue is?
Edit:
For reference, here are the three relevant tables (with irrelevant columns removed):
CREATE TABLE IF NOT EXISTS `messages` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message` text CHARACTER SET utf8 NOT NULL,
`from_id` int(11) DEFAULT NULL,
`sent_date` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `from_id` (`from_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=101553 ;
CREATE TABLE IF NOT EXISTS `numbers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`number` varchar(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=6408 ;
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(256) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=2395 ;
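A LEFT JOIN can add rows (when one left row matches several right rows) but never removes rows from its left input. The sketch below checks that with Python's built-in sqlite3 module standing in for MySQL; the tables mirror the question's schema, and the sample rows are made up, including a number with no user and messages with no matching number.

```python
import sqlite3

# SQLite stands in for MySQL; the schema mirrors the question's three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT NOT NULL,
                       from_id INTEGER, sent_date TEXT NOT NULL);
CREATE TABLE numbers (id INTEGER PRIMARY KEY, user_id INTEGER, number TEXT NOT NULL);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'alice');
-- number 10 belongs to a user, number 11 has no user (user_id IS NULL)
INSERT INTO numbers VALUES (10, 1, '555-0100'), (11, NULL, '555-0101');
-- message 3 has no sender number, message 4 points at a number that does not exist
INSERT INTO messages VALUES
  (1, 'hi', 10, '2024-01-01'), (2, 'yo', 11, '2024-01-02'),
  (3, 'hey', NULL, '2024-01-03'), (4, 'ok', 99, '2024-01-04');
""")

base_query = """
SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
"""
nested_query = f"""
SELECT message, sent_date
FROM ({base_query}) AS table1
LEFT JOIN users ON table1.user_id = users.id
"""

base_count = len(conn.execute(base_query).fetchall())
nested_count = len(conn.execute(nested_query).fetchall())
# Every left row survives both LEFT JOINs; users.id is unique, so no fan-out either.
print(base_count, nested_count)  # 4 4
```

If the two counts differ on the real data, the difference has to come from something other than the LEFT JOIN semantics themselves.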

You can try an alternative method to debug the issue:
CREATE TEMPORARY TABLE tmp1 AS SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers
ON messages.from_id = numbers.id;
Then check whether this query returns all the rows:
SELECT message, sent_date
FROM tmp1 table1
LEFT JOIN users
ON table1.user_id = users.id;
Also make sure that no other inserts or updates happen in between; otherwise, use a transaction.

table1 sometimes won't have a user_id, so it will be NULL. Could that be why those rows are missing?
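Whether a NULL join key loses rows depends on the join type. A minimal sketch, using Python's sqlite3 as a stand-in for MySQL and a plain table named table1 in place of the derived table: the NULL-keyed row matches nothing, so a LEFT JOIN keeps it with NULL user columns while an INNER JOIN drops it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (message TEXT, user_id INTEGER);  -- stands in for the derived table
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO table1 VALUES ('hi', 1), ('orphan', NULL);
""")

left_rows = conn.execute("""
SELECT table1.message, users.name
FROM table1 LEFT JOIN users ON table1.user_id = users.id
ORDER BY table1.message
""").fetchall()

inner_rows = conn.execute("""
SELECT table1.message, users.name
FROM table1 INNER JOIN users ON table1.user_id = users.id
ORDER BY table1.message
""").fetchall()

# NULL = 1 is never true, so the 'orphan' row finds no match in users.
print(left_rows)   # [('hi', 'alice'), ('orphan', None)]
print(inner_rows)  # [('hi', 'alice')]
```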

I don't have an exact answer to your question, but as a starting point I would first find out which ~3,800 rows are missing and look for a pattern (for example, whether their user_id values are NULL or duplicated):
SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
-- MySQL has no MINUS operator; EXCEPT requires MySQL 8.0.31 or later
EXCEPT
(SELECT table1.message, table1.sent_date, table1.user_id
FROM (
SELECT message, sent_date, user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
) AS table1
LEFT JOIN users ON table1.user_id = users.id)
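The set-difference idea can be tried on toy data. In this sketch Python's sqlite3 stands in for MySQL (SQLite supports EXCEPT but not parenthesized compound members, so the parentheses are dropped), the rows are made up, and the second query deliberately uses an INNER JOIN so that EXCEPT has something to report. Note that EXCEPT has set semantics, so duplicate rows collapse into one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, from_id INTEGER, sent_date TEXT);
CREATE TABLE numbers (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY);
INSERT INTO users VALUES (1);
INSERT INTO numbers VALUES (10, 1), (11, NULL);
INSERT INTO messages VALUES (1, 'hi', 10, '2024-01-01'), (2, 'yo', 11, '2024-01-02'),
                            (3, 'hey', NULL, '2024-01-03');
""")

# Rows produced by the first query but not by the second.
# The second query uses INNER JOIN on purpose, to simulate "missing" rows.
missing = conn.execute("""
SELECT message, sent_date, user_id
FROM messages LEFT JOIN numbers ON messages.from_id = numbers.id
EXCEPT
SELECT table1.message, table1.sent_date, table1.user_id
FROM (SELECT message, sent_date, user_id
      FROM messages LEFT JOIN numbers ON messages.from_id = numbers.id) AS table1
INNER JOIN users ON table1.user_id = users.id
ORDER BY 1
""").fetchall()
print(missing)  # [('hey', '2024-01-03', None), ('yo', '2024-01-02', None)]
```

Both missing rows have a NULL user_id, which is the kind of pattern this check is meant to surface.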

Try this; I think it's a scoping issue with user_id, so every column below is qualified with its table:
SELECT table1.message, table1.sent_date
FROM (
SELECT messages.message, messages.sent_date, numbers.user_id
FROM messages
LEFT JOIN numbers ON messages.from_id = numbers.id
) AS table1
LEFT JOIN users ON table1.user_id = users.id
I'm not sure if user_id is in messages or numbers.

This should not be possible: a LEFT JOIN never removes rows from its left-hand table.
Try this variation:
SELECT
m.message, m.sent_date, n.user_id
FROM
messages m
LEFT JOIN
numbers AS n ON m.from_id = n.id
LEFT JOIN
users AS u ON n.user_id = u.id ;

Related

Optimizing a query for loading message history in a chat app

I have two tables, users and messages:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(35) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=MyISAM AUTO_INCREMENT=859312 DEFAULT CHARSET=utf8
CREATE TABLE `messages` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sender_id` int(11) NOT NULL,
`receiver_id` int(11) NOT NULL,
`message` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `by_sender_id_and_receiver_id` (`sender_id`,`receiver_id`),
KEY `by_sender_id` (`sender_id`),
KEY `by_receiver_id` (`receiver_id`)
) ENGINE=MyISAM AUTO_INCREMENT=56762871 DEFAULT CHARSET=latin1
When a user (whose user id is 108) loads their chat history, I am currently using the following query to list all the people that user has messaged, ordered by most recent.
SELECT u.username, m.sender_id, m.receiver_id, m.date
FROM messages m
JOIN users u ON ( u.id = m.sender_id
AND m.receiver_id = 108
OR u.id = m.receiver_id
AND m.sender_id = 108 )
GROUP BY u.id
ORDER BY m.date DESC
When I use EXPLAIN, I get the following results
I am wondering if there are any obvious ways to optimize this query, whether it is by altering indexes or rewriting the query itself. My messages table has over 50 million rows.
(From a comment:) The GROUP BY is meant to select only the last message from each user.
The OR criterion in the join is a real performance killer. One way to work around this is to rewrite the query as a UNION:
SELECT u.username, m.sender_id, m.receiver_id, m.date
FROM messages m
INNER JOIN users u ON u.id = m.receiver_id
WHERE m.sender_id = 108
UNION ALL
SELECT u.username, m.receiver_id, m.sender_id, m.date
FROM messages m
INNER JOIN users u ON u.id = m.sender_id
WHERE m.receiver_id = 108;
The above query can be optimized by adding the following indices to the messages table:
CREATE INDEX msg_idx_1 ON messages (sender_id, receiver_id, date);
CREATE INDEX msg_idx_2 ON messages (receiver_id, sender_id, date);
These indices should speed up the joins in the two halves of the union query above.
Note that I dropped the GROUP BY clause, which did not seem to be needed.
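A small sketch, with Python's sqlite3 standing in for MySQL and made-up rows, checking that the OR join and the UNION ALL rewrite return the same (username, date) pairs. It then shows one hedged way to get back to "one row per conversation partner" (the goal mentioned in the comment) by aggregating over the UNION ALL; the partner_id alias is added here for illustration and is not part of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
                       message TEXT, date TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO messages VALUES
  (1, 108, 1, 'a', '2024-01-01 10:00'),
  (2, 1, 108, 'b', '2024-01-02 10:00'),
  (3, 108, 2, 'c', '2024-01-03 10:00'),
  (4, 1, 2, 'd', '2024-01-04 10:00');  -- does not involve user 108
""")

or_join = """
SELECT u.username, m.date
FROM messages m
JOIN users u ON (u.id = m.sender_id AND m.receiver_id = 108
              OR u.id = m.receiver_id AND m.sender_id = 108)
"""
# Each half filters on a single column; partner_id is always "the other user".
union_all = """
SELECT u.username AS username, m.receiver_id AS partner_id, m.date AS date
FROM messages m INNER JOIN users u ON u.id = m.receiver_id
WHERE m.sender_id = 108
UNION ALL
SELECT u.username, m.sender_id, m.date
FROM messages m INNER JOIN users u ON u.id = m.sender_id
WHERE m.receiver_id = 108
"""

or_rows = sorted(conn.execute(or_join).fetchall())
union_rows = sorted((name, date) for name, _, date in conn.execute(union_all))
print(or_rows == union_rows)  # True

# One row per conversation partner, most recent conversation first.
latest = conn.execute(f"""
SELECT partner_id, username, MAX(date) AS last_date
FROM ({union_all}) AS t
GROUP BY partner_id, username
ORDER BY last_date DESC
""").fetchall()
print(latest)  # [(2, 'bob', '2024-01-03 10:00'), (1, 'alice', '2024-01-02 10:00')]
```

This returns only the latest date per partner; fetching the message text itself would need a join back to messages on that date (or id).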

How to get the field corresponding to MAX(DATE) in MySQL?

My two tables are as follows:
Table 1: Transaction
TRANS_ID (primary key), TRANS_DATE, TRANS_STATUS, USER_ID (Foreign_Key)
The same USER_ID is repeated each time the status changes.
Table 2: Users
USER_ID (Primary_Key), USER_NAME, USER_DOB, OTHER_INFO
I want to get the user information along with the latest transaction status.
I am familiar with the following query.
SELECT MAX(Transaction.TRANS_DATE),Transaction.TRANS_STATUS, Users.USER_NAME, Users.USER_DOB
FROM Users
INNER JOIN Transaction ON Transaction.USER_ID = Users.USER_ID
WHERE Transaction.USER_ID = #UserID
I pass the UserID with Parameters.AddWithValue. Unfortunately, this query does not return the TRANS_STATUS of the MAX(TRANS_DATE) row. It does return MAX(TRANS_DATE), but TRANS_STATUS comes from the first matching row rather than from the row with MAX(TRANS_DATE).
Please let me know how I can get the TRANS_STATUS for the MAX(TRANS_DATE). I would prefer to use an INNER JOIN, but any recommendations are appreciated.
I still could not get it working.
Here are my table scripts.
CREATE TABLE `Transactions` (
`TRANS_ID` int(11) NOT NULL,
`TRANS_DATE` datetime NOT NULL,
`TRANS_STATUS` varchar(45) NOT NULL,
`USER_ID` int(11) NOT NULL,
PRIMARY KEY (`TRANS_ID`),
UNIQUE KEY `TRANS_ID_UNIQUE` (`TRANS_ID`),
KEY `USER_ID_idx` (`USER_ID`),
CONSTRAINT `USER_ID` FOREIGN KEY (`USER_ID`) REFERENCES `Users` (`USER_ID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `Users` (
`USER_ID` int(11) NOT NULL AUTO_INCREMENT,
`USER_NAME` varchar(45) NOT NULL,
`USER_DOB` datetime NOT NULL,
`OTHER_INFO` varchar(45) NOT NULL,
PRIMARY KEY (`USER_ID`),
UNIQUE KEY `USER_ID_UNIQUE` (`USER_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
If I understood correctly, this should work for you:
SELECT A.USER_ID AS USER_ID, A.TRANS_DATE AS TRANS_DATE, TRANS_STATUS, USER_NAME, USER_DOB
FROM
(SELECT USER_ID, MAX(TRANS_DATE) AS TRANS_DATE FROM TRANSACTION
GROUP BY USER_ID) A
INNER JOIN
(SELECT USER_ID, TRANS_DATE, TRANS_STATUS FROM TRANSACTION) B
ON A.USER_ID = B.USER_ID
AND A.TRANS_DATE=B.TRANS_DATE
INNER JOIN USERS U
ON A.USER_ID=U.USER_ID;
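The "latest date per group, then join back" pattern above can be checked on toy data. This sketch uses Python's sqlite3 as a stand-in for MySQL, the table names from the posted scripts (Transactions, Users), and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (USER_ID INTEGER PRIMARY KEY, USER_NAME TEXT, USER_DOB TEXT);
CREATE TABLE Transactions (TRANS_ID INTEGER PRIMARY KEY, TRANS_DATE TEXT,
                           TRANS_STATUS TEXT, USER_ID INTEGER);
INSERT INTO Users VALUES (1, 'ann', '1990-05-01'), (2, 'ben', '1985-03-12');
INSERT INTO Transactions VALUES
  (1, '2024-01-01', 'OPEN',    1),
  (2, '2024-02-01', 'PENDING', 1),
  (3, '2024-03-01', 'CLOSED',  1),
  (4, '2024-01-15', 'OPEN',    2);
""")

rows = conn.execute("""
SELECT A.USER_ID, A.TRANS_DATE, B.TRANS_STATUS, U.USER_NAME, U.USER_DOB
FROM (SELECT USER_ID, MAX(TRANS_DATE) AS TRANS_DATE
      FROM Transactions GROUP BY USER_ID) A         -- latest date per user
INNER JOIN Transactions B                           -- join back to get that row's status
        ON A.USER_ID = B.USER_ID AND A.TRANS_DATE = B.TRANS_DATE
INNER JOIN Users U ON A.USER_ID = U.USER_ID
ORDER BY A.USER_ID
""").fetchall()
print(rows)
# [(1, '2024-03-01', 'CLOSED', 'ann', '1990-05-01'), (2, '2024-01-15', 'OPEN', 'ben', '1985-03-12')]
```

If a user has two transactions with the same latest TRANS_DATE, this pattern returns both rows.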
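SELECT usr.USER_NAME, usr.USER_DOB, trs.TRANS_STATUS, trs.MAX_TRANS_DATE
FROM Users usr INNER JOIN(
SELECT Transaction.USER_ID, Transaction.TRANS_STATUS, MAX(Transaction.TRANS_DATE) AS MAX_TRANS_DATE
FROM Transaction GROUP BY Transaction.USER_ID, Transaction.TRANS_STATUS) trs ON trs.USER_ID=usr.USER_ID
-- returns the latest date for each (user, status) pair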
You can use the LAST_VALUE window function (MySQL 8.0 or later):
SELECT
u.User_ID
,u.user_name
,u.user_dob
,u.other_info
,LAST_VALUE(t.Trans_Date) OVER (PARTITION BY t.user_id ORDER BY
t.Trans_Date RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) AS Max_Tran_Date
,LAST_VALUE(t.trans_status) OVER (PARTITION BY t.user_id ORDER BY
t.Trans_Date RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as Last_Status
FROM Users as u
INNER JOIN Transaction as t ON t.user_id = u.user_id
WHERE u.User_ID = #UserID
LIMIT 1
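The LAST_VALUE approach can be tried locally as well. This sketch assumes an SQLite build with window functions (3.25 or later, as bundled with current Python releases) standing in for MySQL 8.0, uses made-up rows, and replaces the #UserID placeholder with a bound ? parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (USER_ID INTEGER PRIMARY KEY, USER_NAME TEXT, USER_DOB TEXT);
CREATE TABLE Transactions (TRANS_ID INTEGER PRIMARY KEY, TRANS_DATE TEXT,
                           TRANS_STATUS TEXT, USER_ID INTEGER);
INSERT INTO Users VALUES (1, 'ann', '1990-05-01');
INSERT INTO Transactions VALUES
  (1, '2024-01-01', 'OPEN', 1), (2, '2024-03-01', 'CLOSED', 1), (3, '2024-02-01', 'PENDING', 1);
""")

user_id = 1  # stands in for the #UserID parameter
row = conn.execute("""
SELECT u.USER_ID, u.USER_NAME,
       LAST_VALUE(t.TRANS_DATE) OVER (PARTITION BY t.USER_ID ORDER BY t.TRANS_DATE
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Max_Tran_Date,
       LAST_VALUE(t.TRANS_STATUS) OVER (PARTITION BY t.USER_ID ORDER BY t.TRANS_DATE
           RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Last_Status
FROM Users AS u
INNER JOIN Transactions AS t ON t.USER_ID = u.USER_ID
WHERE u.USER_ID = ?
LIMIT 1
""", (user_id,)).fetchone()
# The frame covers the whole partition, so every row carries the same last values.
print(row)  # (1, 'ann', '2024-03-01', 'CLOSED')
```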
You could join on a subquery that finds the maximum TRANS_DATE:
select t.max_trans, Transaction.TRANS_STATUS, t.USER_NAME, t.USER_DOB
from Transaction
INNER JOIN
(
SELECT MAX(Transaction.TRANS_DATE) max_trans,
Transaction.USER_ID, Users.USER_NAME, Users.USER_DOB
FROM Users
INNER JOIN Transaction ON Transaction.USER_ID = Users.USER_ID
WHERE Transaction.USER_ID = #UserID
GROUP BY Transaction.USER_ID, Users.USER_NAME, Users.USER_DOB
) t on Transaction.USER_ID = t.USER_ID and t.max_trans = Transaction.TRANS_DATE

MySQL big table query optimization

I have a chat application with an API that returns the list of users a given user has talked to. However, MySQL takes a long time to return the list of messages once the table reaches about 100,000 rows.
This is my messages table:
CREATE TABLE IF NOT EXISTS `messages` (
`_id` int(11) NOT NULL AUTO_INCREMENT,
`fromid` int(11) NOT NULL,
`toid` int(11) NOT NULL,
`message` text NOT NULL,
`attachments` text NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '0',
`date` datetime NOT NULL,
`delete` varchar(50) NOT NULL,
`uuid_read` varchar(250) NOT NULL,
PRIMARY KEY (`_id`),
KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;
And this is my users table (simplified):
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`login` varchar(50) DEFAULT '',
`sex` tinyint(1) DEFAULT '0',
`status` varchar(255) DEFAULT '',
`avatar` varchar(30) DEFAULT '0',
`last_active` datetime DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;
And here is my query (for the user with id 1930):
select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10
Please help me optimize this query.
What I need:
Who the user talked to (name, sex, etc.)
What the last message was (from me or to me)
Count of messages (all)
Count of unread messages (only those sent to me)
The query works well, but takes too long.
The output must be like this
There are some design problems in your query and database:
Avoid using keywords as column names, such as the delete column or the count alias;
Avoid selecting columns that are not in the GROUP BY without an aggregate function; although MySQL allows this, it is not standard SQL and you have no control over which values are returned;
Your NOT LIKE condition may misbehave, because '%1930;%' also matches 11930;, and 11930 is not 1930;
Avoid LIKE patterns that start and end with the % wildcard, which force MySQL to scan every value instead of using an index;
Design a better way to represent message deletion, for example a proper flag and/or a separate table that stores any important data related to the action;
Try to limit your result set before the joins (with a derived table) so that less data has to be processed.
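Two of the points above can be seen on toy data: the '%1930;%' pattern also matching 11930;, and a separate deletions table as one possible replacement for the text column. In this sketch Python's sqlite3 stands in for MySQL (identifiers are quoted with double quotes rather than backticks), the rows are made up, and the message_deletions table is a hypothetical design, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (_id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER,
                       message TEXT, "delete" TEXT NOT NULL DEFAULT '');
-- Hypothetical replacement for the "delete" text column: one row per (message, user) deletion.
CREATE TABLE message_deletions (message_id INTEGER, user_id INTEGER,
                                PRIMARY KEY (message_id, user_id));
INSERT INTO messages VALUES (1, 1930, 5, 'hi', ''),
                            (2, 5, 1930, 'yo', '11930;');  -- deleted by user 11930, not 1930
INSERT INTO message_deletions VALUES (2, 11930);
""")

# The pattern also matches "11930;", so message 2 is wrongly hidden from user 1930.
like_ids = [r[0] for r in conn.execute(
    """SELECT _id FROM messages WHERE "delete" NOT LIKE '%1930;%' ORDER BY _id""")]

# Exact-match lookup in the side table, which can use its primary-key index.
table_ids = [r[0] for r in conn.execute("""
SELECT m._id FROM messages m
WHERE NOT EXISTS (SELECT 1 FROM message_deletions d
                  WHERE d.message_id = m._id AND d.user_id = 1930)
ORDER BY m._id""")]

print(like_ids)   # [1]
print(table_ids)  # [1, 2]
```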
I rewrote your query as best I understood it. I ran it against a messages table with ~200,000 rows and no indexes, and it completed in 0.15 seconds. You should still create the right indexes so that it keeps performing well as the amount of data grows.
SELECT SQL_CALC_FOUND_ROWS
u.id,
u.login,
u.sex,
u.birthdate,
u.avatar,
u.last_active AS online_status,
g._count,
CASE WHEN m.toid = 1930
THEN g.nr_count
ELSE 0
END AS nr_count,
m.`date`,
m.message AS last_mesg
FROM
(
SELECT
MAX(_id) AS _id,
COUNT(*) AS _count,
COUNT(*) - SUM(m.status) AS nr_count
FROM messages m
WHERE 1=1
AND m.`delete` NOT LIKE '%1930;%'
AND
(0=1
OR m.fromid = 1930
OR m.toid = 1930
)
GROUP BY
CASE WHEN m.fromid = 1930
THEN m.toid
ELSE m.fromid
END
ORDER BY MAX(`date`) DESC
LIMIT 0, 10
) g
INNER JOIN messages AS m ON 1=1
AND m._id = g._id
LEFT JOIN users AS u ON 0=1
OR (m.fromid <> 1930 AND u.id = m.fromid)
OR (m.toid <> 1930 AND u.id = m.toid)
ORDER BY m.`date` DESC
;
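The core of the rewrite is the derived table that groups each message by "the other user" and then joins back to fetch the last message. A reduced sketch of just that part, with Python's sqlite3 standing in for MySQL, made-up rows, and the users join omitted; the partner_id and last_date aliases are added for illustration. Like the query above, it assumes a higher _id means a later message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (_id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER,
                       message TEXT, status INTEGER, date TEXT, "delete" TEXT DEFAULT '');
INSERT INTO messages (_id, fromid, toid, message, status, date) VALUES
  (1, 1930, 5, 'a', 1, '2024-01-01'),
  (2, 5, 1930, 'b', 0, '2024-01-02'),
  (3, 7, 1930, 'c', 0, '2024-01-03'),
  (4, 1930, 5, 'd', 0, '2024-01-04'),
  (5, 5, 7, 'x', 0, '2024-01-05');  -- not part of user 1930's conversations
""")

rows = conn.execute("""
SELECT g.partner_id, g._count, m.message AS last_mesg
FROM (
    -- one group per conversation partner ("the other user")
    SELECT CASE WHEN fromid = 1930 THEN toid ELSE fromid END AS partner_id,
           MAX(_id) AS _id,          -- assumes a higher _id means a later message
           COUNT(*) AS _count,
           MAX(date) AS last_date
    FROM messages
    WHERE "delete" NOT LIKE '%1930;%' AND (fromid = 1930 OR toid = 1930)
    GROUP BY partner_id
    ORDER BY last_date DESC
    LIMIT 10
) g
INNER JOIN messages m ON m._id = g._id   -- fetch the last message itself
ORDER BY m.date DESC
""").fetchall()
print(rows)  # [(5, 3, 'd'), (7, 1, 'c')]
```

Because LIMIT is applied inside the derived table, only ten rows are joined back to messages, which is the "limit before joining" advice in practice.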

MySQL joining most recent record from another query is slow

I have the following two tables:
CREATE TABLE `modlogs` (
`mod` int(11) NOT NULL,
`ip` varchar(39) CHARACTER SET ascii NOT NULL,
`board` varchar(58) CHARACTER SET utf8 DEFAULT NULL,
`time` int(11) NOT NULL,
`text` text NOT NULL,
KEY `time` (`time`),
KEY `mod` (`mod`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
CREATE TABLE `mods` (
`id` smallint(6) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
`password` char(64) CHARACTER SET ascii NOT NULL COMMENT 'SHA256',
`salt` char(32) CHARACTER SET ascii NOT NULL,
`type` smallint(2) NOT NULL,
`boards` text CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=933 DEFAULT CHARSET=utf8mb4
I want to join the most recent log entry with the mod's name, but my query is very slow (it takes 5.23 seconds):
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = mods.id
AND modlogs.time = (SELECT MAX(time)
FROM modlogs
WHERE mods.id = modlogs.mod
);
All other answers on SO also seem to use dependent subqueries. Is there a way I can do this in a way that will return results more quickly?
Here's another solution: putting the subquery into a derived table avoids the dependent subquery, so the subquery runs just once.
SELECT *
FROM mods AS m
LEFT JOIN (
SELECT ml1.*
FROM modlogs AS ml1
JOIN (
SELECT `mod`, MAX(time) AS time
FROM modlogs
GROUP BY `mod`
) AS ml2 USING (`mod`, time)
) AS ml ON m.id = ml.`mod`;
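A quick check of the derived-table query on toy data, with Python's sqlite3 standing in for MySQL (the mod column is double-quoted here, where MySQL would use backticks) and made-up rows, including a mod with no log entries so the LEFT JOIN behavior is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mods (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE modlogs ("mod" INTEGER, ip TEXT, time INTEGER, text TEXT);
CREATE INDEX modlogs_mod_time ON modlogs ("mod", time);
INSERT INTO mods VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');  -- cat has no log entries
INSERT INTO modlogs VALUES (1, '10.0.0.1', 100, 'ban'), (1, '10.0.0.1', 300, 'unban'),
                           (2, '10.0.0.2', 200, 'delete post');
""")

rows = conn.execute("""
SELECT m.username, ml.time, ml.text
FROM mods AS m
LEFT JOIN (
    SELECT ml1.*
    FROM modlogs AS ml1
    JOIN (SELECT "mod", MAX(time) AS time FROM modlogs GROUP BY "mod") AS ml2  -- computed once
      USING ("mod", time)
) AS ml ON m.id = ml."mod"
ORDER BY m.id
""").fetchall()
print(rows)  # [('ann', 300, 'unban'), ('ben', 200, 'delete post'), ('cat', None, None)]
```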
This is your query:
SELECT *
FROM mods LEFT JOIN
modlogs
ON modlogs.mod = (SELECT MAX(time)
FROM modlogs
WHERE mods.id = modlogs.mod
);
This query does not make sense: it compares something called mod to a maximum time. That sounds like it won't work, but then there are some very "clever" data models out there. I suspect you really want:
SELECT *
FROM mods LEFT JOIN
modlogs
ON mods.id = modlogs.mod and
modlogs.time = (SELECT MAX(time)
FROM modlogs
WHERE mods.id = modlogs.mod
);
I wouldn't write the query this way, because I find the subquery in the ON clause confusing, but that is how you wrote it. You can get better performance with an index. I would suggest:
create index modlogs_mod_time on modlogs(`mod`, time);
I would write the query as:
SELECT *
FROM mods LEFT JOIN
modlogs
ON mods.id = modlogs.mod
WHERE NOT EXISTS (SELECT 1
FROM modlogs ml2
WHERE modlogs.mod = ml2.mod and
ml2.time > modlogs.time
);
I think you can also solve this with an anti-join, though I'm skeptical of its performance:
SELECT mods.*, modlogs.*
FROM mods
LEFT JOIN modlogs
ON modlogs.mod = mods.id
LEFT JOIN modlogs m2
ON m2.mod = modlogs.mod
AND m2.time > modlogs.time
WHERE m2.mod IS NULL
Ensure you have an index on modlogs(mod), and consider an index on modlogs(mod, time) for better performance.
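A sketch comparing the NOT EXISTS form with a self-join anti-join on modlogs (keep a log row only when no newer row exists for the same mod). Python's sqlite3 stands in for MySQL and the rows are made up; on this data both forms return the latest entry per mod and keep the mod that has no entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mods (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE modlogs ("mod" INTEGER, time INTEGER, text TEXT);
CREATE INDEX modlogs_mod_time ON modlogs ("mod", time);
INSERT INTO mods VALUES (1, 'ann'), (2, 'ben'), (3, 'cat');  -- cat has no log entries
INSERT INTO modlogs VALUES (1, 100, 'ban'), (1, 300, 'unban'), (2, 200, 'delete post');
""")

# Keep a log row only if no newer row exists for the same mod.
not_exists = """
SELECT mods.username, modlogs.time
FROM mods LEFT JOIN modlogs ON mods.id = modlogs."mod"
WHERE NOT EXISTS (SELECT 1 FROM modlogs ml2
                  WHERE modlogs."mod" = ml2."mod" AND ml2.time > modlogs.time)
ORDER BY mods.id
"""
# Same idea as an anti-join: look for a newer row, keep rows where none was found.
anti_join = """
SELECT mods.username, modlogs.time
FROM mods
LEFT JOIN modlogs ON modlogs."mod" = mods.id
LEFT JOIN modlogs m2 ON m2."mod" = modlogs."mod" AND m2.time > modlogs.time
WHERE m2."mod" IS NULL
ORDER BY mods.id
"""
a = conn.execute(not_exists).fetchall()
b = conn.execute(anti_join).fetchall()
print(a)       # [('ann', 300), ('ben', 200), ('cat', None)]
print(a == b)  # True
```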

Select 3 tables with count and join

I have three tables: tb1, users, and users_credits.
My goal is to combine two SELECTs (sel1, sel2) into a single view and
display 0 for sel2 where there are no rows (LEFT JOIN?).
sel1
SELECT
users.userid,
users.datareg,
users_credits.credits
FROM
users,
users_credits
WHERE
users.userid = users_credits.userid
sel2
SELECT COUNT(*) FROM tb1 where tb1.id_user = users.userid
Table structure:
tb1
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_user` decimal(11,0) NOT NULL,
`datains` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
users
`userid` int(4) unsigned NOT NULL AUTO_INCREMENT,
`datareg` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`userid`)
users_credits
`id` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(11) NOT NULL,
`credits` decimal(5,0) NOT NULL,
`data` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
What is the best way to do this?
Thanks.
SELECT users.userid,
users.datareg,
users_credits.credits,
COALESCE(c.totalCount,0) totalCount
FROM users
LEFT JOIN users_credits
ON users.userid = users_credits.userid
LEFT JOIN
(
SELECT id_user, COUNT(*) totalCount
FROM tb1
GROUP BY id_user
) c ON c.id_user = users.userid
To learn more about joins, see:
Visual Representation of SQL Joins
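A minimal check of the answer's query on toy data, using Python's sqlite3 as a stand-in for MySQL and made-up rows: user 2 has no rows in tb1, so the LEFT JOIN yields NULL and COALESCE turns it into 0. Counting in a derived table first also keeps the count from being multiplied by the other join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, datareg TEXT);
CREATE TABLE users_credits (id INTEGER PRIMARY KEY, userid INTEGER, credits INTEGER);
CREATE TABLE tb1 (id INTEGER PRIMARY KEY, id_user INTEGER, datains TEXT);
INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO users_credits VALUES (1, 1, 50), (2, 2, 20);
INSERT INTO tb1 VALUES (1, 1, '2024-02-01'), (2, 1, '2024-02-03');  -- user 2 has no tb1 rows
""")

rows = conn.execute("""
SELECT users.userid, users.datareg, users_credits.credits,
       COALESCE(c.totalCount, 0) AS totalCount
FROM users
LEFT JOIN users_credits ON users.userid = users_credits.userid
LEFT JOIN (SELECT id_user, COUNT(*) AS totalCount FROM tb1 GROUP BY id_user) c
       ON c.id_user = users.userid
ORDER BY users.userid
""").fetchall()
print(rows)  # [(1, '2024-01-01', 50, 2), (2, '2024-01-02', 20, 0)]
```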
UPDATE 1
SELECT users.userid,
users.datareg,
users_credits.credits,
COALESCE(c.totalCount,0) totalCount,
c.max_datains
FROM users
LEFT JOIN users_credits
ON users.userid = users_credits.userid
LEFT JOIN
(
SELECT id_user, MAX(datains) max_datains, COUNT(*) totalCount
FROM tb1
GROUP BY id_user
) c ON c.id_user = users.userid
UPDATE 2
You need to create two views for this, because MySQL versions before 5.7.7 do not allow a subquery in the FROM clause of a view:
1st View:
CREATE VIEW tbl1View
AS
SELECT id_user, MAX(datains) max_datains, COUNT(*) totalCount
FROM tb1
GROUP BY id_user
2nd View:
CREATE VIEW FullView
AS
SELECT users.userid,
users.datareg,
users_credits.credits,
COALESCE(c.totalCount,0) totalCount,
c.max_datains
FROM users
LEFT JOIN users_credits
ON users.userid = users_credits.userid
LEFT JOIN tbl1View c ON c.id_user = users.userid
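The two-view setup can be tried end to end with Python's sqlite3 as a stand-in for MySQL and made-up rows. SQLite itself accepts subqueries in views, so here the split into tbl1View and FullView only mirrors what older MySQL versions require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, datareg TEXT);
CREATE TABLE users_credits (id INTEGER PRIMARY KEY, userid INTEGER, credits INTEGER);
CREATE TABLE tb1 (id INTEGER PRIMARY KEY, id_user INTEGER, datains TEXT);
INSERT INTO users VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO users_credits VALUES (1, 1, 50), (2, 2, 20);
INSERT INTO tb1 VALUES (1, 1, '2024-02-01'), (2, 1, '2024-02-03');

-- Per-user aggregate, kept in its own view.
CREATE VIEW tbl1View AS
SELECT id_user, MAX(datains) AS max_datains, COUNT(*) AS totalCount
FROM tb1 GROUP BY id_user;

-- The outer view joins to tbl1View instead of to a derived table.
CREATE VIEW FullView AS
SELECT users.userid, users.datareg, users_credits.credits,
       COALESCE(c.totalCount, 0) AS totalCount, c.max_datains
FROM users
LEFT JOIN users_credits ON users.userid = users_credits.userid
LEFT JOIN tbl1View c ON c.id_user = users.userid;
""")

rows = conn.execute("SELECT * FROM FullView ORDER BY userid").fetchall()
print(rows)
# [(1, '2024-01-01', 50, 2, '2024-02-03'), (2, '2024-01-02', 20, 0, None)]
```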