Joining tables on two fields - mysql

I always have confusion when it comes into JOINING tables.
So, I have a table that stores the user details called tblUsers having the following fields(for the sake of simplicity, I am including only the required fields here while posting):
user_id
first_name
And I have another table which stores the messages called tblMessages:
msg_id
sender_id
recipient_id
msg_body
Now what am trying to do is to fetch all messages, with the user names too. What I have tried is this:
SELECT
`msg_id`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`sender_id`) AS `sender_name`,
(SELECT `first_name` FROM `tblUsers` WHERE `tblUsers`.`user_id` = `tblMessages`.`recipient_id`) AS `recipient_name`,
`msg_body`
FROM `tblMessages`
It seems to be working at the moment. But is this the correct way for attaining my goal? Or will JOINing the tables will be better? The tblMessages can grow to a large number of rows probably. If we are going to do the JOIN, then we will do 2 LEFT JOINs? First, on the sender_id of tblMessages with user_id of tblUsers and again recipient_id of tblMessages with user_id of tblUsers. Is that correct?
Let me know your suggestions or corrections on my approach.

This is going to be your best query (It will run queries once, and then join tables on their indices):
SELECT m.`msg_id`, su.`first_name` AS `sender_name`, ru.`first_name` AS `recipient_name`, m.`msg_body`
FROM `tblMessages` m
LEFT JOIN `tblUsers` su ON m.`sender_id` = su.`user_id`
LEFT JOIN `tblUsers` ru ON m.`recipient_id` = ru.`user_id`;
When in doubt, use EXPLAIN right before your query to determine what indexes it's going to use, and how efficient it's going to be. Check out these sqlfiddles containing the EXPLAIN's for each query.
You can read a bit about the reasoning for choosing this query over yours here and straight from the docs here. EXPLAIN is also a helpful tool that can help you understand where your bottlenecks are and what is causing performance issues on your database (This likely isn't going to impact it very much, but you can always do some performance tests when your database reaches a healthy size.

You should JOIN the same table twice, using two different aliases for example s and r:
SELECT
m.msg_id,
m.sender_id,
s.first_name,
m.recipient_id,
r.first_name,
m.msg_body
FROM
tblMessages AS m
LEFT JOIN tblUsers AS s ON m.sender_id=s.user_id
LEFT JOIN tblUsers AS r ON m.recipient_id=r.user_id
but your approach is not wrong, it works and with proper indexes shouldn't be much slower.

Related

MySQL basic query taking exceptionally long - explain plan looks pretty good

I have a query with the explain plan below. It's pretty basic, every join is using an index (though not unique), and it's taking 5+ hours. The largest table has about 100k records. RAM and CPU are not pegged or anything, no other queries running, no table locks. The most "complicated" part I have is the coalesce in an outer join. Is that killing me?
For clarification, I'm joining to the same table twice because some of the records have a user ID, some just have a first/last name. I prefer to join by a unique user name obviously, and one of the selected items is coalesce(u1.job_title, u2.job_title)
from utilization_incident ui
left join users_utilization_v u1
on u1.cc_user_id = ui.assigned_to_user_id
and u1.source_system = ui.source
and u1.data_date = ui.data_date
left join users_utilization_v u2
on u2.first_name = ui.assigned_to_first_name
and u2.last_name = ui.assigned_to_last_name
and u2.source_system = ui.source
and u2.data_date = ui.data_date
left join lkp_job_title_service_area jtsa
on jtsa.job_title = coalesce(u1.job_title, u2.job_title)
The CPU may not be pegged, but what about I/O?
How much RAM? What is the value of innodb_buffer_pool_size?
Please provide SHOW CREATE TABLE.
Please provide the entire SELECT.
Please provide the text version of EXPLAIN SELECT ....
Pending further details, these indexes should help:
users_utilization_v: INDEX(cc_user_id, source_system, data_date)
users_utilization_v: INDEX(first_name, last_name, source_system, data_date)
lkp_job_title_service_area: INDEX(job_title)
Remove LEFT unless you need it.

fast way to get number of records in mysql

I'm writing a query in mysql to join two tables. And both tables have more than 50,000 records.
Table EMP Columns
empid,
project,
code,
Status
Table EMPINFO
empid,
project,
code,
projecttype,
timespent,
skills
In each table there is candidate key [empid, project, code]
So when I join the table using INNER join
like this INNER JOIN
ON a.empid = b.empid
and a.project = b.project
and a.code = b.code
I'm getting the result, but if I add count(*) in outer query to count number of records, it takes lot of time something connection gets failed.
Is there any way to speed up to get number of records ?
And I would like to hear more suggestions to speed up inner join query as well having same candidate key in both tables.
INDEX(empid, project, code) -- in any order.
Are these tables 1:1? If so, why do the JOIN in order to do the COUNT?
Please provide SHOW CREATE TABLE. (If there are datatype differences, this could be a big problem.)
Please provide the actual SELECT.
How much RAM do you have? Please provide SHOW VARIABLES LIKE '%buffer%';.

Mysql query where not exists

I have three different tables - subscribers, unsubscribers, mass subscribers.
I'd like to print out each email from the mass subscribers table. However that email can only be printed if it doesn't exist in both subscribers and unsubscribers tables.
I know how to do this with arrays, however I want a plain mysql query.
What would mysql query be?
Thanks!
You can do that with a subquery (this is slow! Please read below the line):
SELECT email
FROM subscribers
WHERE email NOT IN(SELECT email FROM unsubscribers)
However, this is very bad for performance. I suggest you change the way you have your database, with just 1 table subscribers, and add a column active(tinyint). When someone unsubscribes, you set that value from 1 to 0. After that you can stay in 1 table:
SELECT email FROM subscribers WHERE active=1
This is faster because of some reasons:
No subquery
The where is bad, because you are going to select a heap of data, and compare strings
Selecting on integer in VERY fast (especially when you index it)
Apart from the fact that this is faster, it would be better for your database structure. You dont want two tables doing almost the same, with emailadresses. This will create duplicate data and a chance for misalignments
You sound like someone who doesn't have much experience with SQL. Your title does point in the right direction. Here is how you put the components together:
select m.*
from mass_subscribers m
where not exists (select 1 from subscribers s where s.email = m.email) and
not exists (select 1 from unsubscribers u where u.email = m.email);
NOT EXISTS happens to be a very good choice for this type of query; it is typically pretty efficient in both MySQL and other databases.
Without subqueries, using join
SELECT mass_subscribers.*
FROM mass_subscribers ms
LEFT JOIN subscribers s ON ms.email=s.email
LEFT JOIN unsubscribers us ON us.email=s.email
WHERE
ms.email IS NULL
AND
us.email IS NULL

MySQL - how to speed up or change this query

I did not write this query. I am working on someone else's old code. I am looking into changing what is needed for this query but if I could simply speed up this query that would solve my problem temporarily. I am looking at adding indexes. when I did a show indexes there are so many indexes on the table orders can that also slow down a query?
I am no database expert. I guess I will learn more from this effort. :)
SELECT
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.AMAZON_PurchaseDate,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orders.AMAZON_ORD_ID,
orders.ORD_TrackingNumber,
orders.ORD_SHIPPINGCNTRY_ID,
orders.AMAZON_IsExpedited,
orders.ORD_ShippingStreet1,
orders.ORD_ShippingStreet2,
orders.ORD_ShippingCity,
orders.ORD_ShippingStateProv,
orders.ORD_ShippingZipPostalCode,
orders.CUST_ID,
orders.ORD_ShippingName,
orders.AMAZON_ShipOption,
orders.ORD_ShipLabelGenOn,
orders.ORD_SHIPLABELGEN,
orders.ORD_AddressVerified,
orders.ORD_IsResidential,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name,
SUM(orderitems.ORDITEM_Qty) AS ORD_ItemCnt,
SUM(orderitems.ORDITEM_Weight * orderitems.ORDITEM_Qty) AS ORD_ItemTotalWeight
FROM
orders
LEFT JOIN orderstatuses ON
orders.ORDSTATUS_ID = orderstatuses.ORDSTATUS_ID
LEFT JOIN orderitems ON
orders.ORD_ID = orderitems.ORD_ID
LEFT JOIN paymentmethods ON
orders.PAYMETH_ID = paymentmethods.PAYMETH_ID
LEFT JOIN shippingoptions ON
orders.SHIPOPT_ID = shippingoptions.SHIPOPT_ID
WHERE
(orders.AMAZON_ORD_ID IS NOT NULL AND (orders.ORD_SHIPLABELGEN IS NULL OR orders.ORD_SHIPLABELGEN = '') AND orderstatuses.ORDSTATUS_ID <> 101 AND orderstatuses.ORDSTATUS_ID <> 40)
GROUP BY
orders.ORD_ID,
orders.ORD_TotalAmt,
orders.PAYMETH_ID,
orders.SCHOOL_ID,
orders.ORD_AddedOn,
orders.ORDSTATUS_ID,
orders.ORD_InvoiceNumber,
orders.ORD_CustFirstName,
orders.ORD_CustLastName,
orderstatuses.ORDSTATUS_Name,
paymentmethods.PAYMETH_Name,
shippingoptions.SHIPOPT_Name
ORDER BY
orders.ORD_ID
One simple thing you should consider is whether you really need to use left joins or you would be satisfied using inner joins for some of the joins. the new query would not be the same as the original query, so you would need to think carefully about what you really want back. If your foreign key relationships are indexed correctly, this could help substantially, especially between ORDERS and ORDERITEMS, because I would imagine these are your largest tables. The following post has a good explanation: INNER JOIN vs LEFT JOIN performance in SQL Server. There are lots of other things that can be done, but you will need to post the query plan so people can dive deeper.
It looks like just adding the index was all that was needed.
create index orderitems_ORD_ID_index on orderitems(ORD_ID);

Scalable way of doing self join with many to many table

I have a table structure like the following:
user
id
name
profile_stat
id
name
profile_stat_value
id
name
user_profile
user_id
profile_stat_id
profile_stat_value_id
My question is:
How do I evaluate a query where I want to find all users with profile_stat_id and profile_stat_value_id for many stats?
I've tried doing an inner self join, but that quickly gets crazy when searching for many stats. I've also tried doing a count on the actual user_profile table, and that's much better, but still slow.
Is there some magic I'm missing? I have about 10 million rows in the user_profile table and want the query to take no longer than a few seconds. Is that possible?
Typically databases are able to handle 10 million records in a decent manner. I have mostly used oracle in our professional environment with large amounts of data (about 30-40 million rows also) and even doing join queries on the tables has never taken more than a second or two to run.
On IMPORTANT lessson I realized whenever query performance was bad was to see if the indexes are defined properly on the join fields. E.g. Here having index on profile_stat_id and profile_stat_value_id (user_id I am assuming is the primary key) should have indexes defined. This will definitely give you a good performance increaser if you have not done that.
After defining the indexes do run the query once or twice to give DB a chance to calculate the index tree and query plan before verifying the gain
Superficially, you seem to be asking for this, which includes no self-joins:
SELECT u.name, u.id, s.name, s.id, v.name, v.id
FROM User_Profile AS p
JOIN User AS u ON u.id = p.user_id
JOIN Profile_Stat AS s ON s.id = p.profile_stat_id
JOIN Profile_Stat_Value AS v ON v.id = p.profile_stat_value_id
Any of the joins listed can be changed to a LEFT OUTER JOIN if the corresponding table need not have a matching entry. All this does is join the central User_Profile table with each of the other three tables on the appropriate joining column.
Where do you think you need a self-join?
[I have not included anything to filter on 'the many stats'; it is not at all clear to me what that part of the question means.]