Is this possible in one fast MySQL query?

I have three tables and I need different data from all of them. Unfortunately, I also need to be able to extract the latest row.
Here are my tables:
messages: I store only the content of the messages in this table, because one text could be sent to multiple users.
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| message_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| content | varchar(255) | NO | | 0 | |
+------------+------------------+------+-----+---------+----------------+
conversations: This table just reflects a single conversation between two users.
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| partner_id | int(10) unsigned | NO | MUL | NULL | |
| conversation_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| expedition_id | int(11) | NO | | NULL | |
| active | tinyint(4) | NO | | 1 | |
+-----------------+------------------+------+-----+---------+----------------+
conversation_messages: This table stores the information about the actual messages exchanged.
+-----------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| message_id | int(11) unsigned | NO | PRI | NULL | |
| receiver_id | int(11) unsigned | NO | PRI | NULL | |
| conversation_id | int(11) unsigned | NO | MUL | NULL | |
| status | tinyint(4) | NO | | NULL | |
| timestamp | datetime | YES | | NULL | |
+-----------------+------------------+------+-----+---------+-------+
What I want to do now is select the latest message in each conversation and get the content of that message as well. (It sounds simple, but I did not find a simple solution.) What I tried is the following:
SELECT max(c_m.message_id), m.content, c_m.`status`
FROM expedition_conversations e_c, conversation_messages c_m
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND (c_m.conversation_id = e_c.conversation_id)
GROUP BY c_m.conversation_id;
Sadly, since GROUP BY internally seems to select the first inserted row most of the time, the content I select from the messages table is wrong, while the message_id selected from conversation_messages is correct.
Any idea how to perform this in one query? If you have any suggestions for altering the table structure, I would also appreciate those.
Thanks in advance.

You may want to try this version:
SELECT c_m.message_id, m.content, c_m.`status`
FROM expedition_conversations e_c join
conversation_messages c_m
ON c_m.conversation_id = e_c.conversation_id INNER JOIN
messages m
ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND
NOT EXISTS (SELECT 1
FROM conversation_messages cm2
WHERE cm2.conversation_id = c_m.conversation_id AND
cm2.timestamp > c_m.timestamp
)
For performance, you want an index on conversation_messages(conversation_id, timestamp).
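To try the NOT EXISTS approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The sample rows, the index name idx_cm_conv_ts and the helper latest_messages are made up for illustration, and the sketch joins the conversations table from the question's schema rather than expedition_conversations.

```python
import sqlite3

def latest_messages(conn, expedition_id):
    """Return (conversation_id, message_id, content, status) of the newest
    message in every conversation of the given expedition."""
    sql = """
        SELECT c_m.conversation_id, c_m.message_id, m.content, c_m.status
        FROM conversations c
        JOIN conversation_messages c_m ON c_m.conversation_id = c.conversation_id
        JOIN messages m ON m.message_id = c_m.message_id
        WHERE c.expedition_id = ?
          AND NOT EXISTS (              -- no newer message in the same conversation
                SELECT 1
                FROM conversation_messages cm2
                WHERE cm2.conversation_id = c_m.conversation_id
                  AND cm2.timestamp > c_m.timestamp)
        ORDER BY c_m.conversation_id
    """
    return conn.execute(sql, (expedition_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER PRIMARY KEY, content TEXT NOT NULL);
    CREATE TABLE conversations (
        conversation_id INTEGER PRIMARY KEY,
        partner_id      INTEGER NOT NULL,
        expedition_id   INTEGER NOT NULL,
        active          INTEGER NOT NULL DEFAULT 1);
    CREATE TABLE conversation_messages (
        message_id      INTEGER NOT NULL,
        receiver_id     INTEGER NOT NULL,
        conversation_id INTEGER NOT NULL,
        status          INTEGER NOT NULL,
        timestamp       TEXT,
        PRIMARY KEY (message_id, receiver_id));
    -- hypothetical name for the index the answer recommends
    CREATE INDEX idx_cm_conv_ts ON conversation_messages (conversation_id, timestamp);

    INSERT INTO messages VALUES
        (1, 'hello'), (2, 'how are you?'), (3, 'first in other chat'), (4, 'unrelated');
    INSERT INTO conversations VALUES (1, 10, 1, 1), (2, 11, 1, 1), (3, 12, 2, 1);
    INSERT INTO conversation_messages VALUES
        (1, 10, 1, 0, '2014-07-01 10:00:00'),
        (2, 10, 1, 1, '2014-07-01 11:00:00'),
        (3, 11, 2, 0, '2014-07-01 09:00:00'),
        (4, 12, 3, 0, '2014-07-02 09:00:00');
""")

rows = latest_messages(conn, 1)
```

Note that if two messages in one conversation share the same timestamp, both rows pass the NOT EXISTS test, so a tie-breaker on message_id may be needed.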

This is possible because your use of AUTO_INCREMENT means that the highest ID belongs to the latest message:
SELECT
messages.*
FROM
conversations
INNER JOIN (
SELECT conversation_id, MAX(message_id) AS maxmsgid
FROM conversation_messages
GROUP BY conversation_id
) AS latest ON latest.conversation_id=conversations.conversation_id
INNER JOIN messages
ON messages.message_id=latest.maxmsgid
WHERE
1=1 -- whatever you want or need!
Since this query is bound to be quite slow, you might want to consider a few options:
Throw hardware at it: use enough RAM and configure MySQL so that the internal temporary table goes to disk as late as possible.
Use denormalization: have an AFTER INSERT trigger on conversation_messages update a column on conversations that holds the latest message ID.
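One possible reading of the denormalization option is to keep the latest message ID on the conversation row itself and let a trigger maintain it. The sketch below uses Python's sqlite3 as a stand-in; the column last_message_id and the trigger name trg_latest_message are hypothetical, and the tables are trimmed to the columns that matter here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations (
        conversation_id INTEGER PRIMARY KEY,
        expedition_id   INTEGER NOT NULL,
        last_message_id INTEGER            -- hypothetical denormalized column
    );
    CREATE TABLE conversation_messages (
        message_id      INTEGER NOT NULL,
        receiver_id     INTEGER NOT NULL,
        conversation_id INTEGER NOT NULL,
        PRIMARY KEY (message_id, receiver_id)
    );
    -- Keep conversations.last_message_id in sync on every insert.
    -- Two-argument MAX() is SQLite's scalar maximum (GREATEST() in MySQL),
    -- so an out-of-order insert cannot move the pointer backwards.
    CREATE TRIGGER trg_latest_message
    AFTER INSERT ON conversation_messages
    BEGIN
        UPDATE conversations
        SET last_message_id = MAX(COALESCE(last_message_id, 0), NEW.message_id)
        WHERE conversation_id = NEW.conversation_id;
    END;

    INSERT INTO conversations (conversation_id, expedition_id) VALUES (1, 1), (2, 1);
    INSERT INTO conversation_messages VALUES (1, 10, 1), (2, 10, 1), (3, 11, 2);
""")

latest = conn.execute(
    "SELECT conversation_id, last_message_id FROM conversations ORDER BY conversation_id"
).fetchall()
```

With this in place, the latest message of a conversation is a plain join from conversations to messages, with no GROUP BY. In MySQL the trigger body would additionally need FOR EACH ROW.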

Try this little trick:
SELECT c_m.message_id, m.content, c_m.status
FROM expedition_conversations e_c
JOIN (select * from (
select * from conversation_messages
order by message_id desc) x
group by conversation_id) c_m ON c_m.conversation_id = e_c.conversation_id
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1
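This will work on your version of MySQL (5.6.19) and should outperform the other approaches. Be aware that it relies on undocumented behavior: MySQL does not guarantee which row a non-aggregated column is taken from under GROUP BY, and newer versions may ignore the ORDER BY inside the derived table.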

Related

Mysql Query performance very slow

The query below takes more than 8 minutes and processes about 900,000 rows. It is very slow and affects my product. I can't identify why the query is slow; all indexes are set up fine.
explain SELECT
COUNT(DISTINCT (cinfo.CONTACT_ID))
FROM
cinfo
INNER JOIN
LTocMapping ON cinfo.CONTACT_ID = LTocMapping.CONTACT_ID
WHERE
(((((((((cinfo.COUNTRY LIKE '%Panama%')
OR (cinfo.COUNTRY LIKE '%PANAMA%'))
AND (((cinfo.CONTACT_EMAIL NOT LIKE '%test%')
AND (cinfo.CONTACT_EMAIL NOT LIKE '%engine%'))
OR (cinfo.CONTACT_EMAIL IS NULL)))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000514445053,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000520574130,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND (LTocMapping.LIST_ID IN (221715000520574130 , 221715000201569885)))
AND (LTocMapping.STATUS = BINARY 'subscribed'))
AND (((cinfo.CONTACT_STATUS = BINARY 'active')
OR (cinfo.CONTACT_STATUS = BINARY 'softbounce'))
AND (LTocMapping.STATUS = BINARY 'subscribed')))
AND (((cinfo.CONTACT_ID >= 221715000000000000)
AND (cinfo.CONTACT_ID <= 221715999999999999))
OR ((cinfo.CONTACT_ID >= 0)
AND (cinfo.CONTACT_ID <= 999999999999))))
And the answer will be
The tables are below, for your reference:
Table 1 :
mysql> desc cinfo;
+------------------------+--------------+------+-----+-----------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+-----------+-------+
| CONTACT_ID | bigint(19) | NO | PRI | NULL | |
| CONTACT_EMAIL | varchar(100) | NO | MUL | NULL | |
| TITLE | varchar(20) | YES | | NULL | |
| FIRSTNAME | varchar(100) | YES | | NULL | |
| LASTNAME | varchar(50) | YES | | NULL | |
| ADDED_BY | varchar(20) | YES | | NULL | |
| ADDED_TIME | bigint(19) | NO | | NULL | |
| LAST_UPDATED_TIME | bigint(19) | NO | | NULL | |
+------------------------+--------------+------+-----+-----------+-------+
Table 2 :
mysql> desc LTocMapping;
+---------------------+--------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+------------+-------+
| MAPPING_ID | bigint(19) | NO | PRI | NULL | |
| CONTACT_ID | bigint(19) | NO | MUL | NULL | |
| LIST_ID | bigint(19) | NO | MUL | NULL | |
| STATUS | varchar(100) | YES | | subscribed | |
| MAPPING_STATUS | varchar(20) | YES | | connected | |
| MAPPING_TIME | bigint(19) | YES | | NULL | |
+---------------------+--------------+------+-----+------------+-------+
As far as I can tell, your subqueries are the bottleneck:
For the first subquery, you are using LTocMapping.CONTACT_ID.
For the second subquery, you are using LTocMapping.CONTACT_ID as well.
These references to values of the outer query turn the inner queries into correlated subqueries (also called dependent subqueries). That means that for every row fetched from the outer tables (~970000), you fire 2 additional queries on another table.
So that is close to 2 million queries, and they do not look trivial either.
Most of the time, a correlated subquery can be replaced by a proper join, but this depends on the use case. You can also join the same table twice by using a different alias.
But to outline some join options, you need to explain why the subqueries resulting in the condition group_concat(....) = '0' are important, or better, what you want to achieve.
(PS: You can also see that EXPLAIN reports them as DEPENDENT SUBQUERY.)
OR is inefficient, see if you can avoid it.
Leading wildcards in LIKE are inefficient. See if a FULLTEXT index would work for you.
With a proper COLLATION, you don't need to test both upper and lower case. Also you can avoid use of BINARY. In both cases, you might be able to use an index. (What indexes do you have?)
Try to change from
WHERE ( ( SELECT ... ) = '0' )
to
WHERE ( NOT EXISTS ( SELECT ... ) )
(The SELECT will need some modification.)
(Please get rid of some of the redundant parens; it is hard to read.)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
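Here is a rough sketch of the NOT EXISTS rewrite, assuming the intent of each GROUP_CONCAT ... REGEXP ... = '0' subquery is "this contact has no mapping row for list X" (the asker has not confirmed that). It runs on Python's sqlite3 with tiny made-up data; the small list IDs, the index name and the helper count_on_list_but_not_on are all hypothetical.

```python
import sqlite3

def count_on_list_but_not_on(conn, include_list, exclude_list):
    """Count contacts subscribed to include_list that have no row for exclude_list."""
    sql = """
        SELECT COUNT(DISTINCT c.CONTACT_ID)
        FROM cinfo c
        JOIN LTocMapping m ON m.CONTACT_ID = c.CONTACT_ID
        WHERE m.LIST_ID = ?
          AND m.STATUS = 'subscribed'
          AND NOT EXISTS (              -- replaces GROUP_CONCAT(...) REGEXP ... = '0'
                SELECT 1
                FROM LTocMapping t
                WHERE t.CONTACT_ID = c.CONTACT_ID
                  AND t.LIST_ID = ?)
    """
    return conn.execute(sql, (include_list, exclude_list)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cinfo (CONTACT_ID INTEGER PRIMARY KEY, CONTACT_EMAIL TEXT NOT NULL);
    CREATE TABLE LTocMapping (
        MAPPING_ID INTEGER PRIMARY KEY,
        CONTACT_ID INTEGER NOT NULL,
        LIST_ID    INTEGER NOT NULL,
        STATUS     TEXT DEFAULT 'subscribed');
    -- hypothetical composite index so the NOT EXISTS probe is a single lookup
    CREATE INDEX idx_map_contact_list ON LTocMapping (CONTACT_ID, LIST_ID);

    INSERT INTO cinfo VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO LTocMapping (MAPPING_ID, CONTACT_ID, LIST_ID) VALUES
        (1, 1, 10), (2, 1, 20), (3, 2, 10), (4, 3, 30);
""")

only_list_10 = count_on_list_but_not_on(conn, 10, 20)
```

NOT EXISTS can stop at the first matching row, whereas the GROUP_CONCAT version has to build and pattern-match a string for every outer row.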

Always display the left of the join and display the right only if any and if it matches a clause [duplicate]

This question already has answers here:
Filter Table Before Applying Left Join
I have 3 tables, a client table, a user table and a user_has_client table.
The user_has_client table is there to join the 2 others, but it also has a roles column.
MariaDB [extrapack]> desc user;
+----------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+----------------------+------+-----+---------+----------------+
| user_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(255) | NO | UNI | NULL | |
MariaDB [extrapack]> desc client;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| client_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
MariaDB [extrapack]> desc user_has_client;
+-----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| user_id | int(10) unsigned | NO | PRI | NULL | |
| client_id | int(10) unsigned | NO | PRI | NULL | |
| roles | tinytext | YES | | NULL | |
+-----------+------------------+------+-----+---------+-------+
There may be multiple different roles for a client and a user, and the roles column is an array.
MariaDB [extrapack]> select * from user_has_client where roles != "" limit 3;
+---------+-----------+---------+
| user_id | client_id | roles |
+---------+-----------+---------+
| 181 | 395 | cpa, ce |
| 181 | 473 | cpa |
| 181 | 498 | cpa |
+---------+-----------+---------+
But a client can grant a given role to only one user. For example, two different users cannot both have the cpa role on the same client.
I would like to list one client, and for the client, list only the one user that has the role cpa if there is such a user.
Here is my statement:
SELECT c.client_id AS client_id0, ou.user_id AS user_id3, ou.email AS email5
FROM client c
LEFT JOIN user_has_client ouhc ON c.client_id = ouhc.client_id
LEFT JOIN user ou ON ouhc.user_id = ou.user_id AND ouhc.roles LIKE '%cpa%'
WHERE c.client_id = 265
ORDER BY ou.email DESC;
There may or may not be a joining record for a client and a user, but even so, I still want to display a list line for the client, so I cannot do an inner join and have to do a left join.
But doing a left join, I still want only one list line per client.
As of now, the above statement gives me n lines for the client, because n users have a join on this client. But only one of these n users has the cpa role in its join row. So I want to display only one list line, with that user.
So, to sum it up, I always want one line per client, and only one line, and with its user of the given role, say the cpa role, if any for that client.
I would probably change your plan a little. Make user_has_client.roles carry only one role per row. Then, to allow each role only once per client, set a UNIQUE constraint on client_id and role.
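A minimal sketch of that layout, run on Python's sqlite3 as a stand-in for MariaDB. The column name role, the sample e-mail addresses and the helper client_with_role_user are assumptions for illustration. The role filter sits in the ON clause of the first LEFT JOIN, so a client without a cpa user still produces exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE client (client_id INTEGER PRIMARY KEY);
    CREATE TABLE user_has_client (
        user_id   INTEGER NOT NULL,
        client_id INTEGER NOT NULL,
        role      TEXT    NOT NULL,        -- one role per row (assumed column name)
        PRIMARY KEY (user_id, client_id, role),
        UNIQUE (client_id, role)           -- a role is held by one user per client
    );
    INSERT INTO user VALUES (1, 'ce@example.com'), (2, 'cpa@example.com');
    INSERT INTO client VALUES (265), (300);
    INSERT INTO user_has_client VALUES (1, 265, 'ce'), (2, 265, 'cpa');
""")

def client_with_role_user(conn, client_id, role):
    """One row per client: the user holding `role`, or NULLs if there is none."""
    sql = """
        SELECT c.client_id, u.user_id, u.email
        FROM client c
        LEFT JOIN user_has_client uhc
               ON uhc.client_id = c.client_id AND uhc.role = ?
        LEFT JOIN user u ON u.user_id = uhc.user_id
        WHERE c.client_id = ?
    """
    return conn.execute(sql, (role, client_id)).fetchall()

with_cpa = client_with_role_user(conn, 265, "cpa")
without_cpa = client_with_role_user(conn, 300, "cpa")

# The UNIQUE constraint rejects a second cpa user on the same client.
try:
    conn.execute("INSERT INTO user_has_client VALUES (1, 265, 'cpa')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```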

Subquery for faster result

I have this query which takes me more than 117 seconds on a mysql database.
select users.*, users_oauth.* FROM users LEFT JOIN users_oauth ON users.user_id = users_oauth.oauth_user_id WHERE (
(MATCH (user_email) AGAINST ('sometext')) OR
(MATCH (user_firstname) AGAINST ('sometext')) OR
(MATCH (user_lastname) AGAINST ('sometext')) )
ORDER BY user_date_accountcreated DESC LIMIT 1400, 50
How can I use a subquery in order to optimize it ?
The 3 fields are fulltext :
ALTER TABLE `users` ADD FULLTEXT KEY `email_fulltext` (`user_email`);
ALTER TABLE `users` ADD FULLTEXT KEY `firstname_fulltext` (`user_firstname`);
ALTER TABLE `users` ADD FULLTEXT KEY `lastname_fulltext` (`user_lastname`);
There is only one search input in a website to search in different table users fields.
If the limit is for example LIMIT 0,50, the query will run in less than 3 seconds but when the LIMIT increase the query becomes very slow.
Thanks.
Use a single FULLTEXT index:
FULLTEXT(user_email, user_firstname, user_lastname)
And change the 3 matches to just one:
MATCH (user_email, user_firstname, user_lastname) AGAINST ('sometext')
Here's another issue: ORDER BY ... DESC LIMIT 1400, 50. Read about the evils of pagination via OFFSET. That has a workaround, but I doubt it would apply to your statement.
Do you really have thousands of users matching the text? Does someone (other than a search engine robot) really page through 29 pages? Think about whether it makes sense to really have such a long-winded UI.
And a third issue: consider "lazy eval". That is, find the user IDs first, then join back to users and users_oauth to get the rest of the columns. It would be a single SELECT with the MATCH in a derived table, then a JOIN to the two tables. If the ORDER BY and LIMIT can be in the derived table, it could be a big win.
Please indicate which table each column belongs to -- my last paragraph is imprecise because of not knowing about the date column.
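To make the "lazy eval" shape concrete, here is a sketch on Python's sqlite3. Two loud assumptions: user_date_accountcreated is taken to live in users, and a plain LIKE stands in for MATCH ... AGAINST, because SQLite builds do not always ship a full-text module. Only the query structure is the point: filter, sort and limit on IDs in a derived table, then join back for the wide rows.

```python
import sqlite3

LAZY_SQL = """
    SELECT u.user_id, u.user_email, o.oauth_google_id
    FROM (
        -- Narrow derived table: only IDs are filtered, sorted and limited.
        SELECT user_id
        FROM users
        WHERE user_email     LIKE '%' || :q || '%'   -- stand-in for MATCH ... AGAINST
           OR user_firstname LIKE '%' || :q || '%'
           OR user_lastname  LIKE '%' || :q || '%'
        ORDER BY user_date_accountcreated DESC
        LIMIT :limit OFFSET :offset
    ) AS ids
    JOIN users u ON u.user_id = ids.user_id
    LEFT JOIN users_oauth o ON o.oauth_user_id = u.user_id
    ORDER BY u.user_date_accountcreated DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        user_email TEXT NOT NULL,
        user_firstname TEXT,
        user_lastname TEXT,
        user_date_accountcreated TEXT);     -- assumed to be a users column
    CREATE TABLE users_oauth (
        oauth_id INTEGER PRIMARY KEY,
        oauth_user_id INTEGER NOT NULL,
        oauth_google_id TEXT);
    CREATE INDEX idx_oauth_user ON users_oauth (oauth_user_id);

    INSERT INTO users VALUES
        (1, 'ann@sometext.org', 'Ann', 'A',        '2020-01-01'),
        (2, 'bob@other.org',    'sometext', 'B',   '2020-01-02'),
        (3, 'cy@x.org',         'Cy', 'C',         '2020-01-03'),
        (4, 'dee@x.org',        'Dee', 'sometext', '2020-01-04'),
        (5, 'sometext@y.org',   'Eve', 'E',        '2020-01-05');
    INSERT INTO users_oauth VALUES (1, 4, 'g-4');
""")

page = conn.execute(LAZY_SQL, {"q": "sometext", "limit": 2, "offset": 1}).fetchall()
```

The outer joins only ever touch the 50 (here 2) rows of the requested page instead of every matching user.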
Update
In your second attempt, you added OR, which greatly slows things down. Let's turn that into a UNION to try to avoid the new slowdown. First let's debug the UNION:
( SELECT * -- no mention of oauth columns
FROM users -- No JOIN
WHERE users.user_id LIKE ...
ORDER BY user_id DESC
LIMIT 0, 50
)
UNION ALL
( SELECT * -- no mention of oauth columns
FROM users
WHERE MATCH ...
ORDER BY user_id DESC
LIMIT 0, 50
)
Test it by timing each SELECT separately. If one of them is still slow, then let's focus on it. Then test the UNION. (This is a case where using the mysql commandline tool may be more convenient than PHP.)
By splitting, each SELECT can use an optimal index. The UNION has some overhead, but possibly less than the inefficiency of OR.
Now let's fold in users_oauth.
First, you seem to be missing a very important INDEX(oauth_user_id). Add that!
Now let's put them together.
SELECT u.*, users_oauth.*
FROM ( .... the entire union query ... ) AS u
LEFT JOIN users_oauth ON u.user_id = users_oauth.oauth_user_id
ORDER BY u.user_id DESC -- yes, repeat
LIMIT 0, 50 -- yes, repeat
Yes @Rick
I changed the index fulltext to:
ALTER TABLE `users`
ADD FULLTEXT KEY `fulltext_adminsearch` (`user_email`,`user_firstname`,`user_lastname`);
And now there are some PHP conditions; $_POST['search'] can be empty:
if(!isset($_POST['search'])) {
$searchId = '%' ;
} else {
$searchId = $_POST['search'] ;
}
$searchMatch = '+'.str_replace(' ', ' +', $_POST['search']);
$sqlSearch = $dataBase->prepare(
'SELECT users.*, users_oauth.*
FROM users
LEFT JOIN users_oauth ON users.user_id = users_oauth.oauth_user_id
WHERE ( users.user_id LIKE :id OR
(MATCH (user_email, user_firstname, user_lastname)
AGAINST (:match IN BOOLEAN MODE)) )
ORDER BY user_id DESC LIMIT 0,50') ;
$sqlSearch->execute(array('id' => $searchId,
'match' => $searchMatch )) ;
The users_oauth table has a column with user_id:
Table users:
+--------------------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------+-----------------+------+-----+---------+----------------+
| user_id | int(8) unsigned | NO | PRI | NULL | auto_increment |
| user_activation_key | varchar(40) | YES | | NULL | |
| user_email | varchar(40) | NO | UNI | | |
| user_login | varchar(30) | YES | | NULL | |
| user_password | varchar(40) | YES | | NULL | |
| user_firstname | varchar(30) | YES | | NULL | |
| user_lastname | varchar(50) | YES | | NULL | |
| user_lang | varchar(2) | NO | | en | |
+--------------------------+-----------------+------+-----+---------+----------------+
Table users_oauth:
+----------------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+-----------------+------+-----+---------+----------------+
| oauth_id | int(8) unsigned | NO | PRI | NULL | auto_increment |
| oauth_user_id | int(8) unsigned | NO | | NULL | |
| oauth_google_id | varchar(30) | YES | UNI | NULL | |
| oauth_facebook_id | varchar(30) | YES | UNI | NULL | |
| oauth_windowslive_id | varchar(30) | YES | UNI | NULL | |
+----------------------+-----------------+------+-----+---------+----------------+
The LEFT JOIN is slow: the request takes 3 seconds with it and 0.0158 seconds without it.
It would be faster to make a separate SQL request for each of the 50 rows.
Would it be faster with a subquery? How would I write it with a subquery?
Thanks

MySQL: Exclude subsets if not all row values are true

I have three MySQL tables: ingredients, ingredient_in_recipe and recipes which can be INNER JOINed to get ingredients in recipes. Also, the ingredients table has a column vegetarian. I want to get all recipes that are considered vegetarian, meaning that all ingredients for a given recipe must have set vegetarian to 1 (it is a BOOL/tinyint(1)).
I have looked at queries using ALL, HAVING NOT MAX and other various stuff, but I cannot find a working solution. What is the best way to do this? Are there some solutions that are more efficient than others?
Extra (only relevant) table information:
mysql> DESCRIBE ingredients;
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(100) | NO | | NULL | |
| vegetarian | tinyint(1) | NO | | 0 | |
+-----------------+---------------+------+-----+---------+----------------+
mysql> DESCRIBE ingredient_in_recipe;
+---------------+------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------------+------+-----+---------+-------+
| recipe_id | int(11) | NO | | NULL | |
| ingredient_id | int(11) | NO | | NULL | |
+---------------+------------------------+------+-----+---------+-------+
mysql> DESCRIBE recipes;
+------------------------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+----------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | text | NO | | NULL | |
+------------------------+----------------------+------+-----+---------+----------------+
The start of my query is currently:
SELECT recipe.name, ingredient.name
FROM ingredients AS ingredient
INNER JOIN ingredient_in_recipe AS ir
ON ir.ingredient_id = ingredient.id
INNER JOIN recipes AS recipe
ON ir.recipe_id = recipe.id;
So I am missing a WHERE, ALL, IN or something statement at the end.
You can try the following:
SELECT r.name FROM recipes r WHERE r.id NOT IN (
SELECT ir.recipe_id FROM ingredient_in_recipe ir
INNER JOIN ingredients i ON ir.ingredient_id = i.id
WHERE i.vegetarian = 0
)
Think of it this way.
Select the set of recipes that have any non-vegetarian ingredients.
Subtract this set from the set of all recipes.
So here's the set of all recipes with a non-veg ingredient.
select
recipes.id
from
recipes,
ingredient_in_recipe,
ingredients
where
ingredient_in_recipe.recipe_id = recipes.id
and
ingredient_in_recipe.ingredient_id = ingredients.id
and
ingredients.vegetarian <> 1
Note: why are you using a tinyint to mark a boolean? You can declare the column as BOOLEAN, although in MySQL that is just a synonym for TINYINT(1).
Also your DB model is pretty good. Your naming is consistent and appropriate.
Now that we have the "non-vegetarian" recipes, we just subtract them, from a "set" perspective [as in set theory].
select
*
from
recipes
where
id NOT IN (
-- this subquery returns the set of IDs of non-vegetarian recipes.
select
recipes.id
from
recipes,
ingredient_in_recipe,
ingredients
where
ingredient_in_recipe.recipe_id = recipes.id
and
ingredient_in_recipe.ingredient_id = ingredients.id
and
ingredients.vegetarian <> 1
);
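For anyone who wants to check the set-subtraction idea on real rows, here is a small sketch on Python's sqlite3 with made-up recipes. It uses NOT EXISTS, which expresses the same subtraction as NOT IN, and compares it with a GROUP BY ... HAVING MIN(vegetarian) = 1 variant. The sample data and the two constant names are illustrative only.

```python
import sqlite3

# Recipes for which no non-vegetarian ingredient exists.
NOT_EXISTS_SQL = """
    SELECT r.name
    FROM recipes r
    WHERE NOT EXISTS (
        SELECT 1
        FROM ingredient_in_recipe ir
        JOIN ingredients i ON i.id = ir.ingredient_id
        WHERE ir.recipe_id = r.id AND i.vegetarian = 0)
    ORDER BY r.name
"""

# Alternative: the smallest vegetarian flag among a recipe's ingredients must be 1.
HAVING_SQL = """
    SELECT r.name
    FROM recipes r
    JOIN ingredient_in_recipe ir ON ir.recipe_id = r.id
    JOIN ingredients i ON i.id = ir.ingredient_id
    GROUP BY r.id, r.name
    HAVING MIN(i.vegetarian) = 1
    ORDER BY r.name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredients (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, vegetarian INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ingredient_in_recipe (recipe_id INTEGER NOT NULL, ingredient_id INTEGER NOT NULL);

    INSERT INTO ingredients VALUES
        (1, 'lettuce', 1), (2, 'tomato', 1), (3, 'beef', 0), (4, 'bun', 1), (5, 'apple', 1);
    INSERT INTO recipes VALUES (1, 'salad'), (2, 'burger'), (3, 'fruit bowl');
    INSERT INTO ingredient_in_recipe VALUES
        (1, 1), (1, 2),            -- salad: lettuce, tomato
        (2, 3), (2, 4), (2, 1),    -- burger: beef, bun, lettuce
        (3, 5);                    -- fruit bowl: apple
""")

veg_not_exists = [name for (name,) in conn.execute(NOT_EXISTS_SQL)]
veg_having = [name for (name,) in conn.execute(HAVING_SQL)]
```

The two variants differ for a recipe with no ingredient rows at all: NOT EXISTS (like NOT IN) keeps it, while the inner-join HAVING form drops it.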

MySQL: return field for which no related entries exist in another table

First, sorry for the title; as I'm not a native English speaker, this is pretty hard to phrase. In other words, what I'm trying to achieve is this:
I'm trying to fetch all domain names from the table virtual_domains where there is no corresponding entry in the virtual_aliases table starting like "postmaster@%".
So if I have two domains:
foo.org
example.org
And they have aliases like:
info@foo.org => admin@foo.org
postmaster@foo.org => user1@foo.org
info@example.org => admin@example.org
I want the query to return only the domain "foo.org" as "example.org" is missing the postmaster alias.
This is the table layout:
mysql> show columns from virtual_aliases;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| domain_id | int(11) | NO | MUL | NULL | |
| source | varchar(100) | NO | | NULL | |
| destination | varchar(100) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> show columns from virtual_domains;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
I tried for many hours with IF, CASE, LIKE queries with no success. I don't need a final solution, maybe just a hint with some explanation. Thanks!
SELECT * FROM virtual_domains AS domains
LEFT JOIN virtual_aliases AS aliases
ON domains.id = aliases.domain_id
WHERE aliases.domain_id IS NULL
LEFT JOIN returns all records from the "left" table, even if they have no corresponding records in the "right" table. Those records will have the right table's fields set to NULL. Use WHERE to strip all the others.
I guess I didn't understand you correctly the first time. You have several entries in aliases for single domain, and you want to display only those domains that don't have an entry in aliases table that starts with "postmaster"?
In this case you should use NOT IN like this:
SELECT * FROM virtual_domains AS domains
WHERE domains.id NOT IN (
SELECT domain_id
FROM virtual_aliases
WHERE source LIKE 'postmaster@%'
)
select id, name from virtual_domains
where id not in (select domain_id from virtual_aliases)
SELECT vd.* FROM virtual_domains vd
LEFT JOIN virtual_aliases va ON vd.id = va.domain_id
AND va.source LIKE 'postmaster@%'
WHERE va.id IS NULL; -- keep only domains without a postmaster alias
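Here is a quick way to try the anti-join on the question's sample data, using Python's sqlite3 as a stand-in. It assumes the goal is the domains that lack a postmaster alias, that the alias address is stored in the source column, and that addresses are written with '@'. Drop the NOT to get the opposite set.

```python
import sqlite3

MISSING_POSTMASTER_SQL = """
    SELECT d.name
    FROM virtual_domains d
    WHERE NOT EXISTS (
        SELECT 1
        FROM virtual_aliases a
        WHERE a.domain_id = d.id
          AND a.source LIKE 'postmaster@%')
    ORDER BY d.name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE virtual_domains (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE virtual_aliases (
        id INTEGER PRIMARY KEY,
        domain_id INTEGER NOT NULL,
        source TEXT NOT NULL,
        destination TEXT NOT NULL);

    INSERT INTO virtual_domains VALUES (1, 'foo.org'), (2, 'example.org');
    INSERT INTO virtual_aliases (domain_id, source, destination) VALUES
        (1, 'info@foo.org',       'admin@foo.org'),
        (1, 'postmaster@foo.org', 'user1@foo.org'),
        (2, 'info@example.org',   'admin@example.org');
""")

missing = [name for (name,) in conn.execute(MISSING_POSTMASTER_SQL)]
```

Unlike NOT IN, NOT EXISTS is not affected by NULL values in the subquery, although domain_id is NOT NULL here anyway.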