Getting unread message count from table messages and messages-seen - mysql

I have two tables, student-messages and student-messages-seen.
My problem is that I cannot write a query that tells me, globally, that '(user) has 4 unread messages'.
I've created a mysqlfiddle here
I think I've gotten the second query right, but I would like someone to confirm that it's correct. It produces the right number, but I am not sure. Could someone please help me get the two different counts right?

Your second query is correct. To make it global, just remove the thread id condition from the WHERE clause. You will want an index on the recipient column of student-messages.

I would phrase your first query as follows (changes are commented):
SELECT COUNT(*) FROM `student-messages` t1
LEFT JOIN `student-messages-seen` t2
ON t2.messageID = t1.id
AND t2.userID = t1.recipient -- moved from the WHERE clause
WHERE
t1.recipient = 1001
-- AND t2.userID = 1001 -- moved to the ON part of the JOIN
AND t2.id IS NULL; -- filter on unread messages (no ORDER BY needed: COUNT(*) returns a single row)
The second query is essentially the same, with just an additional filter on the thread:
SELECT COUNT(*) FROM `student-messages` t1
LEFT JOIN `student-messages-seen` t2
ON t2.messageID = t1.id
AND t2.userID = t1.recipient
WHERE
t1.recipient = 1001
AND t1.thread = 69
AND t2.id IS NULL; -- no ORDER BY needed: COUNT(*) returns a single row
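For anyone who wants to check the LEFT JOIN ... IS NULL pattern on a small data set, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the snippet runs anywhere). The table and column names come from the queries above, while the sample rows and the helper names unread_total and unread_in_thread are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "student-messages" (
    id INTEGER PRIMARY KEY, recipient INTEGER, thread INTEGER, ts INTEGER);
CREATE TABLE "student-messages-seen" (
    id INTEGER PRIMARY KEY, messageID INTEGER, userID INTEGER);
INSERT INTO "student-messages" (id, recipient, thread, ts) VALUES
    (1, 1001, 69, 10), (2, 1001, 69, 20), (3, 1001, 70, 30), (4, 1002, 69, 40);
-- user 1001 has seen message 1; the row for user 1002 must not count for 1001
INSERT INTO "student-messages-seen" (id, messageID, userID) VALUES
    (1, 1, 1001), (2, 2, 1002);
""")

# Anti-join: keep only messages with no matching "seen" row for the recipient.
UNREAD = """
SELECT COUNT(*)
FROM "student-messages" t1
LEFT JOIN "student-messages-seen" t2
       ON t2.messageID = t1.id AND t2.userID = t1.recipient
WHERE t1.recipient = ? AND t2.id IS NULL
"""

def unread_total(user_id):
    """Unread messages for a user across all threads."""
    return conn.execute(UNREAD, (user_id,)).fetchone()[0]

def unread_in_thread(user_id, thread_id):
    """Unread messages for a user within one thread."""
    sql = UNREAD + " AND t1.thread = ?"
    return conn.execute(sql, (user_id, thread_id)).fetchone()[0]

print(unread_total(1001), unread_in_thread(1001, 69))
```

Keeping the userID comparison in the ON clause is what makes the count per user: a "seen" row written by a different user does not hide the message from the recipient.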

Related

MySQL: LIMIT parameter in JOIN producing unexpected results

I have read here that MySQL processes ordering before applying limits. However, I receive different results when applying a LIMIT parameter in conjunction with a JOIN subquery. Here is my query:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / matches)
LIMIT 0, 50 -- PROBLEM IS HERE (I think)
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;
Here is a rough description of what I think is happening:
The sub-query selects a bunch of ids from table_one that meet the criteria
These are ordered by (t2.counts / matches)
The top 50 (in ascending order) are fashioned into a table
This resulting table is then joined on the id column
Results are returned from the top level JOIN - without a GROUP BY clause this time. table_one is a reference table so this will return many rows with the same ID.
I appreciate that some of these joins don't make a lot of sense, but I have stripped down my query for readability; it's normally quite chunky.
The problem is that when I include the LIMIT parameter, I get a different set of results and not just the top 50. What I want to do is get the top results from the subquery and use these to join onto a bunch of other tables based on the reference table.
Here is what I have tried so far:
LIMIT on the outer query (this is undesirable as this cuts off important information).
Trying different LIMIT tables and values.
Any idea what is going wrong, or what else I could try?
I have found a solution to my problem. It seems that my matches column alias can't be used in my ORDER BY clause, which is weird since I don't get an error. Either way, this solves the problem:
SELECT
t1.id,
(t2.counts / c.matches)
FROM
table_one t1
JOIN
table_two t2 ON t1.id = t2.id
JOIN
(
SELECT
t1.id, COUNT(DISTINCT t1.id) AS matches
FROM
table_one t1
JOIN table_two t2 ON t1.id = t2.id
WHERE
t1.id IN (3390 , 3236, 148, 2811, 829, 137)
AND t2.value_one <= 30
AND t2.value_two < 2
GROUP BY t1.id
ORDER BY (t2.counts / COUNT(DISTINCT t1.id)) -- This line is changed
LIMIT 0, 50
) c ON c.id = t1.id
ORDER BY (t2.counts / c.matches), t1.id;

How to optimize mysql on left join

I'll try to explain this at a very high level.
I have two complex SELECT queries (for the sake of example, I have reduced them to the following):
SELECT id, t3_id FROM t1;
SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id;
Query 1 returns 16k rows and query 2 returns 15k.
Each query individually takes less than 1 second to run.
However, I need to sort the results by the added column of query 2. When I try to use a LEFT JOIN:
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
(SELECT t3_id, MAX(added) as last FROM t2 GROUP BY t3_id) AS t_t2
ON t_t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY t_t2.last
the execution time goes up to over 1 minute.
I would like to understand the reason.
What causes such a huge increase?
NOTE:
ALL the used columns on every table have been indexed
e.g. :
table t1 has index on id,t3_Id
table t2 has index on t3_id and added
EDIT1
Following @Tim Biegeleisen's suggestion, I changed the query to the following, and it now executes in about 16 seconds. If I remove the ORDER BY, the query executes in less than 1 second, so the ORDER BY is the sole cause of the slowdown.
SELECT
t1.id, t1.t3_Id
FROM
t1
LEFT JOIN
t2 ON t2.t3_id = t1.t3_id
GROUP BY t1.t3_id
ORDER BY MAX(t2.added)
Even though table t2 has an index on column t3_id, when you join t1 you are actually joining to a derived table, which either can't use the index, or can't use it completely effectively. Since t1 has 16K rows and you are doing a LEFT JOIN, this means the database engine will need to scan the entire derived table for each record in t1.
You should use MySQL's EXPLAIN to see what the exact execution strategy is, but my suspicion is that the derived table is what is slowing you down.
The correct query should be:
SELECT
t1.id,
t1.t3_Id,
MAX(t2.added) as last
FROM t1
LEFT JOIN t2 on t1.t3_Id = t2.t3_Id
GROUP BY t1.t3_Id -- group on the left table's column so unmatched rows are not merged into one NULL group
ORDER BY last;
This happens because a temporary table is generated for each record.
I think you could try to order everything after the records are available. Maybe:
select * from (
select t1.t3_id, t1.max_id, t2.last from
(select t3_id, max(id) as max_id from t1 group by t3_id) as t1
left join (select t3_id, max(added) as last from t2 group by t3_id) as t2
on t1.t3_id = t2.t3_id ) as xx
order by last
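To see what the "aggregate both sides first, then join and sort" shape returns, here is a small sketch. It runs on Python's sqlite3 rather than MySQL (an assumption for portability), so it says nothing about the timings discussed above. The sample rows are invented, and the aliases max_id and last_added are illustrative names, not taken from the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (id INTEGER PRIMARY KEY, t3_id INTEGER);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, t3_id INTEGER, added INTEGER);
INSERT INTO t1 (id, t3_id) VALUES (1, 10), (2, 10), (3, 20), (4, 30);
-- t3_id 30 has no rows in t2, so its last_added will be NULL
INSERT INTO t2 (t3_id, added) VALUES (10, 5), (10, 9), (20, 2);
""")

# Each derived table is reduced to one row per t3_id before the join,
# so the join and the sort only touch the already-aggregated rows.
rows = conn.execute("""
SELECT a.t3_id, a.max_id, b.last_added
FROM (SELECT t3_id, MAX(id) AS max_id FROM t1 GROUP BY t3_id) AS a
LEFT JOIN (SELECT t3_id, MAX(added) AS last_added FROM t2 GROUP BY t3_id) AS b
       ON b.t3_id = a.t3_id
ORDER BY b.last_added
""").fetchall()

for row in rows:
    print(row)
```

With an ascending sort, both SQLite and MySQL place the NULL row (the t3_id without any t2 rows) first.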

Select results from 2 tables if particular match in the third one

I have three tables
t1
--------------
userID
userEmail
userName
userType
t2
--------------
businessID
businessUserID
t3
--------------
recordID
recordBusinessID
action (ENUM: pending, open, closed)
I need to retrieve records only if the rows found in t3 have nothing but action = 'pending'.
SELECT
t2.businessID,
t1.userEmail,
t1.userName
FROM t2
LEFT JOIN t1 ON (t1.userID = t2.businessUserID)
LEFT JOIN t3 ON (t3.recordBusinessID = t2.businessID)
WHERE userType = 'active'
AND t3.action = 'pending'
AND t3.action != 'open'
AND t3.action != 'closed'
It seems like I should be getting results, because my current t3 is empty, but I don't. What am I missing?
t3 can have results but I only need to match if t3.action is nothing but 'pending'.
Comparing NULL with any value yields NULL, even when you test for inequality. So to test that there is no corresponding value, or that the value is not equal to something, write:
where ... and (t3.action is null or t3.action != 'pending')
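The NULL behaviour is easy to verify directly. A minimal check, assuming Python's sqlite3 as a stand-in for MySQL (both follow the same three-valued logic for these comparisons):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL compared with anything is NULL (shown as None in Python), for both
# = and !=. Only an explicit IS NULL test produces a true value.
row = conn.execute("""
SELECT NULL != 'pending',
       NULL =  'pending',
       (NULL IS NULL OR NULL != 'pending')
""").fetchone()

print(row)
```

Since a WHERE clause only keeps rows where the condition is true, a row whose t3.action is NULL is dropped by both t3.action = 'pending' and t3.action != 'pending'.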
Why are you using LEFT JOIN? If you only want records that match t3.action = 'pending', a normal join will work. I would rewrite your query as:
SELECT
t2.businessID,
t1.userEmail,
t1.userName
FROM t1, t2, t3
WHERE t1.userID = t2.businessUserID
AND t2.businessID = t3.recordBusinessID
AND t1.userType = 'active'
AND t3.action = 'pending'
For this to work, you will need data in all three tables. If t3 is empty, you will get no results.
A LEFT JOIN is typically used to find ALL the rows in the left tables that match the WHERE clause, plus any data in the right tables that matches the ON clause. From the mysql manual:
If there is no matching row for the right table in the ON or USING
part in a LEFT JOIN, a row with all columns set to NULL is used for
the right table. You can use this fact to find rows in a table that
have no counterpart in another table.
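Here is a small sketch of how the position of the t3 condition changes the result of the LEFT JOIN. It uses Python's sqlite3 instead of MySQL (an assumption so that it runs anywhere). The table layout follows the question, and the sample rows, the BASE template and the business_ids helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (userID INTEGER PRIMARY KEY, userEmail TEXT, userName TEXT, userType TEXT);
CREATE TABLE t2 (businessID INTEGER PRIMARY KEY, businessUserID INTEGER);
CREATE TABLE t3 (recordID INTEGER PRIMARY KEY, recordBusinessID INTEGER, action TEXT);
INSERT INTO t1 VALUES (1, 'ann@example.com', 'Ann', 'active'),
                      (2, 'bob@example.com', 'Bob', 'active');
INSERT INTO t2 VALUES (100, 1), (200, 2);
-- business 100 has one pending record; business 200 has no records at all
INSERT INTO t3 VALUES (1, 100, 'pending');
""")

BASE = """
SELECT t2.businessID
FROM t2
LEFT JOIN t1 ON t1.userID = t2.businessUserID
LEFT JOIN t3 ON t3.recordBusinessID = t2.businessID {on_extra}
WHERE t1.userType = 'active' {where_extra}
ORDER BY t2.businessID
"""

def business_ids(on_extra="", where_extra=""):
    sql = BASE.format(on_extra=on_extra, where_extra=where_extra)
    return [r[0] for r in conn.execute(sql)]

# Condition in WHERE: the NULL-extended row for business 200 is filtered out.
filtered = business_ids(where_extra="AND t3.action = 'pending'")
# IS NULL in WHERE: only rows without a counterpart in t3 remain.
no_counterpart = business_ids(where_extra="AND t3.recordID IS NULL")
# Condition in ON: every business is kept, matched or not.
in_on = business_ids(on_extra="AND t3.action = 'pending'")

print(filtered, no_counterpart, in_on)
```

This is why the original query returns nothing when t3 is empty: the WHERE condition on t3.action discards every NULL-extended row, so the LEFT JOIN effectively behaves like an inner join.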
If this doesn't work for you, please clarify your question and post some example data from your t1, t2, and t3 tables and I will try to help.

5 tables, scan first to find single match in 1st relational pair or 2nd relational pair without full table scans

I am still learning mysql and am not even sure how to phrase this to find the answer in a search.
I have 5 tables (actually more, but for the example, 5 suffices), one is the main table, T, and then we have T1 and T2 and their respective relational tables T1_x_T and T2_x_T. I need to go through every row in T to find if there is a match in either T1 or T2 with a given attribute, it only needs to match once, but can have multiple matches. Table structure is something like:
T.id
T1.id T1.attrib
T2.id T2.attrib
T1_x_T.T1_id, T1_x_T.T_id
T2_x_T.T2_id, T2_x_T.T_id
If the entry in T has a match in either T1 or T2 on that attrib something like:
(T.id = T1_x_T.T_id and T1.id = T1_x_T.T1_id and T1.attrib = SOMEVAL) or (T.id = T2_x_T.T_id and T2.id = T2_x_T.T2_id and T2.attrib = SOMEVAL)
Ie, as soon as it finds a match for T, move on to the next row in T and don't scan the rest of the table nor move to the next table. Basically to answer the question: "For each id in T, is there any match in T1_x_T or T2_x_T where the corresponding T1 or T2 value matches a given value for attrib?"
So the result would be a subset of table T.
My initial intuition is to use LEFT INNER JOIN, LIMIT and GROUP BY to achieve this, but I don't know enough about either (or mysql) to know how to accomplish this or if it is accomplishable. I do know how to do this in what I assume is the inefficient way (full table scans for both?) or in two queries and then parse the results outside of mysql, but I want to learn how to build nice efficient queries.
Sample data, as requested, for query where attrib = 1:
T.id:
i1
i2
i3
T1.id - T1.attrib:
a - 1
b - 0
T1_x_T.T1_id - T1_x_T.T_id:
a - i1
b - i1
b - i2
T2.id - T2.attrib:
y - 0
z - 1
T2_x_T.T2_id - T2_x_T.T_id:
z - i3
y - i2
Results in:
i1
i3
Since T1.id = a has T1.attrib = 1 and T1_x_T.T1_id = a has an entry with T1_x_T.T_id = i1; and T2.id = z has T2.attrib = 1 and T2_x_T.T2_id = z has an entry with T2_x_T.T_id = i3.
Hope that helps explain a bit.
Try this:
SELECT
T.id as T_id
FROM T
LEFT JOIN T1_x_T ON T.id= T1_x_T.T_id
LEFT JOIN T1 ON T1.id = T1_x_T.T1_id
LEFT JOIN T2_x_T ON T.id= T2_x_T.T_id
LEFT JOIN T2 ON T2.id = T2_x_T.T2_id
WHERE T1.attrib = 1 OR T2.attrib = 1;
Well this maps your question:
"For each id in T, is there any match in T1_x_T or T2_x_T where the
corresponding T1 or T2 value matches a given value for attrib?"
and produces the expected result for your example.
Just to clarify how things work.
A LEFT JOIN combines all the rows that satisfy the ON clause, such as T.id = T1_x_T.T_id. If the join finds n rows in T and m rows in T1_x_T that satisfy the ON clause for the same id, it produces an m x n result with all the possible combinations; rows of T without a match are kept, with NULL in the joined columns.
So here is the result of the joins in your case:
A NULL is what you mean by short circuit: there is no match, so no result.
When you add a WHERE or a GROUP BY, your conditions act on this extended table produced by the JOIN.
By the way, when you are trying a complex join, it can be a good idea to look at the complete results to better understand whether you are doing it right, and then select the appropriate conditions to obtain the desired result.
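The expansion described above can be inspected with the sample data from the question. This is a sketch that assumes Python's sqlite3 in place of MySQL. The constant JOINED and the variables all_rows and matched are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (id TEXT PRIMARY KEY);
CREATE TABLE T1 (id TEXT PRIMARY KEY, attrib INTEGER);
CREATE TABLE T2 (id TEXT PRIMARY KEY, attrib INTEGER);
CREATE TABLE T1_x_T (T1_id TEXT, T_id TEXT);
CREATE TABLE T2_x_T (T2_id TEXT, T_id TEXT);
INSERT INTO T VALUES ('i1'), ('i2'), ('i3');
INSERT INTO T1 VALUES ('a', 1), ('b', 0);
INSERT INTO T1_x_T VALUES ('a', 'i1'), ('b', 'i1'), ('b', 'i2');
INSERT INTO T2 VALUES ('y', 0), ('z', 1);
INSERT INTO T2_x_T VALUES ('z', 'i3'), ('y', 'i2');
""")

JOINED = """
SELECT T.id, T1.id, T1.attrib, T2.id, T2.attrib
FROM T
LEFT JOIN T1_x_T ON T.id = T1_x_T.T_id
LEFT JOIN T1 ON T1.id = T1_x_T.T1_id
LEFT JOIN T2_x_T ON T.id = T2_x_T.T_id
LEFT JOIN T2 ON T2.id = T2_x_T.T2_id
"""

# The complete joined table, before any filtering.
all_rows = conn.execute(JOINED + " ORDER BY T.id, T1.id").fetchall()
# The same table with the attrib condition applied.
matched = conn.execute(
    JOINED + " WHERE T1.attrib = 1 OR T2.attrib = 1 ORDER BY T.id"
).fetchall()

for row in all_rows:
    print(row)
```

Note that i1 appears twice in the unfiltered result because it is linked to both a and b; the WHERE clause then keeps only the row where one of the attrib columns equals 1.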
Regards
I would suggest indeed the use of INNER JOIN, but combined with UNION:
SELECT T.id
FROM T
INNER JOIN T1_x_T
ON T1_x_T.T_id = T.id
INNER JOIN T1
ON T1.id = T1_x_T.T1_id
WHERE T1.attrib = 1
UNION
SELECT T.id
FROM T
INNER JOIN T2_x_T
ON T2_x_T.T_id = T.id
INNER JOIN T2
ON T2.id = T2_x_T.T2_id
WHERE T2.attrib = 1
Here is a fiddle.
As your condition concerns the columns of the joined tables, you should not use outer joins like LEFT JOIN in this case. Although the output would be the same, LEFT JOIN is generally more expensive in terms of performance.
The UNION clause will also make sure you don't get duplicates.
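To check the UNION approach against the sample data, here is a sketch that assumes Python's sqlite3 instead of MySQL. One extra link row (z, i1) is added on top of the question's data, purely as a hypothetical, so that i1 matches through both T1 and T2 and the duplicate removal becomes visible. The template QUERY and the helper ids are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (id TEXT PRIMARY KEY);
CREATE TABLE T1 (id TEXT PRIMARY KEY, attrib INTEGER);
CREATE TABLE T2 (id TEXT PRIMARY KEY, attrib INTEGER);
CREATE TABLE T1_x_T (T1_id TEXT, T_id TEXT);
CREATE TABLE T2_x_T (T2_id TEXT, T_id TEXT);
INSERT INTO T VALUES ('i1'), ('i2'), ('i3');
INSERT INTO T1 VALUES ('a', 1), ('b', 0);
INSERT INTO T1_x_T VALUES ('a', 'i1'), ('b', 'i1'), ('b', 'i2');
INSERT INTO T2 VALUES ('y', 0), ('z', 1);
INSERT INTO T2_x_T VALUES ('z', 'i3'), ('y', 'i2');
-- hypothetical extra row, not in the question: i1 now also matches via T2
INSERT INTO T2_x_T VALUES ('z', 'i1');
""")

QUERY = """
SELECT T.id FROM T
INNER JOIN T1_x_T ON T1_x_T.T_id = T.id
INNER JOIN T1 ON T1.id = T1_x_T.T1_id
WHERE T1.attrib = 1
{set_op}
SELECT T.id FROM T
INNER JOIN T2_x_T ON T2_x_T.T_id = T.id
INNER JOIN T2 ON T2.id = T2_x_T.T2_id
WHERE T2.attrib = 1
"""

def ids(set_op):
    """Run the two-branch query with the given set operator, sorted for display."""
    return sorted(r[0] for r in conn.execute(QUERY.format(set_op=set_op)))

print(ids("UNION"))      # duplicates removed
print(ids("UNION ALL"))  # duplicates kept
```

UNION ALL skips the de-duplication step, so it is only appropriate when the two branches cannot return the same id or when duplicates are acceptable.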
Also, if you are only interested in the id value of table T, then you don't need to include that table at all in the query, and this would be better:
SELECT T1_x_T.T_id
FROM T1_x_T
INNER JOIN T1
ON T1.id = T1_x_T.T1_id
WHERE T1.attrib = 1
UNION
SELECT T2_x_T.T_id
FROM T2_x_T
INNER JOIN T2
ON T2.id = T2_x_T.T2_id
WHERE T2.attrib = 1
Fiddle
You might also compare the performance with this alternative, which uses subqueries. One might expect it to skip the second one if the first one gives a match, but as this may differ per id value, there really is no gain: both subqueries will be executed first, before the matching with the id values is done. A short-circuit will take place there, but only for the comparison with the already generated result sets:
SELECT id
FROM T
WHERE id IN (
SELECT T1_x_T.T_id
FROM T1_x_T
INNER JOIN T1
ON T1.id = T1_x_T.T1_id
WHERE T1.attrib = 1)
OR id IN (
SELECT T2_x_T.T_id
FROM T2_x_T
INNER JOIN T2
ON T2.id = T2_x_T.T2_id
WHERE T2.attrib = 1)
Fiddle
One could potentially force the short-circuit with correlated subqueries, but then such a subquery has to be executed again and again for each id. And even though in some cases it would not have to repeat that for the second subquery, the loss in performance, due to the repeated executions for different id values, will be much greater than the gain from the short-circuit evaluation. Also, the execution plan might apply an optimisation and thus not follow the procedure I just described:
SELECT id
FROM T
WHERE EXISTS (
SELECT 1
FROM T1_x_T
INNER JOIN T1
ON T1.id = T1_x_T.T1_id
WHERE T1.attrib = 1
AND T1_x_T.T_id = T.id)
OR EXISTS (
SELECT 1
FROM T2_x_T
INNER JOIN T2
ON T2.id = T2_x_T.T2_id
WHERE T2.attrib = 1
AND T2_x_T.T_id = T.id)

COUNT(*) returning number of rows before GROUP BY clause?

I have the following query:
SELECT COUNT(*)
FROM mydb.table1 t1
JOIN mydb.table2 t2 ON t1.id = t2.t1_id
WHERE t1.user_id = 44 AND t1.date_deleted IS NULL
GROUP BY t2.system_id, CASE WHEN t2.system_id IS NULL THEN t2.id ELSE 0 END
It returns COUNT(*) = 6, when it should be returning 1 since all six rows for this user have the same t2.system_id (so they should be grouped).
If I change the query to select * instead of COUNT(*), it only returns a single row. If I then remove the GROUP BY clause, six rows are returned.
This makes me think COUNT(*) is returning the row count before the GROUP BY clause is executed, but from what I've read that's not how it's supposed to work.
Is this behavior normal?
Try this:
select count(*) from (
SELECT *
FROM mydb.table1 t1
JOIN mydb.table2 t2 ON t1.id = t2.t1_id
WHERE t1.user_id = 44
AND t1.date_deleted IS NULL
GROUP BY t2.system_id,
CASE WHEN t2.system_id IS NULL THEN t2.id ELSE 0 END
) q1
COUNT(*) gives you the number of rows in each group, so yes, it is definitely working the way it is intended. This means that if you just want the total number of groups, the easiest way is to wrap the query in another query.
It returns the count of items in each group. You have one group with six items in it, so it returns one row containing a column valued 6.
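The difference between "rows per group" and "number of groups" can be reproduced with a few rows. This sketch assumes Python's sqlite3 as a stand-in for MySQL and drops the mydb. schema prefix. The sample data (one table1 row for user 44 joined to six table2 rows sharing system_id 7) is invented to mirror the situation in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, user_id INTEGER, date_deleted TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, t1_id INTEGER, system_id INTEGER);
INSERT INTO table1 (id, user_id, date_deleted) VALUES (1, 44, NULL);
-- six joined rows that all share the same system_id
INSERT INTO table2 (t1_id, system_id) VALUES
    (1, 7), (1, 7), (1, 7), (1, 7), (1, 7), (1, 7);
""")

GROUPED = """
SELECT COUNT(*)
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.t1_id
WHERE t1.user_id = 44 AND t1.date_deleted IS NULL
GROUP BY t2.system_id, CASE WHEN t2.system_id IS NULL THEN t2.id ELSE 0 END
"""

# One output row per group; each value is the number of rows in that group.
per_group = conn.execute(GROUPED).fetchall()

# Wrapping the grouped query counts the groups themselves.
group_count = conn.execute(
    "SELECT COUNT(*) FROM (" + GROUPED + ") q1"
).fetchone()[0]

print(per_group, group_count)
```

So the grouped query returns a single row holding 6, while the wrapped query returns 1, which is the number the question was after.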