Improve a Join performance - mysql

This query runs slowly:
SELECT UserAccountNumber, balance, username FROM users JOIN balances ON
users.UserAccountNumber=balances.UserAccountNumber WHERE date < “2011-02-02”
What can i do to improve its performance?
I thought about using the user ID for the join instead of the userAccountNumber.
Appart form it, as far as i know, the JOIN and WHERE users.id = balances.idUser perform at the same speed...
So.. what else could i change to improve it?
Thanks.

The query itself looks OK to me, but make sure you have indexes on the UserAccountNumber columns (since they're involved in the join) and date (the column you're searching on). If the database has to do a sequential scan of a lot of records, that'll be slow. Using EXPLAIN SELECT may help you to understand how the database is actually performing the query.

If the tables are huge you might get some improvement using a temporary table rather then letting MySQL's optimiser sort it out for you. It'll be a trade off for sure though.
CREATE TEMPORARY TABLE `tmp_balances`
(
`UserAccountNumber` INT,
`balance` INT,
INDEX (`UserAccountNumber`)
) ENGINE=MEMORY
SELECT `UserAccountNumber`, `balance`
FROM balances
WHERE date < "2011-02-02";
SELECT `tmp_balances`.`UserAccountNumber`,
`tmp_balances`.`balance`,
`users`.`username`
FROM `tmp_balances` INNER JOIN users USING (`UserAccountNumber`);
Other then that (and it's a bit of a long-shot), I'd have to echo what the earlier commentors said about indexing.

Related

Fast to query slow to create table

I have an issue on creating tables by using select keyword (it runs so slow). The query is to take only the details of the animal with the latest entry date. that query will be used to inner join another query.
SELECT *
FROM amusementPart a
INNER JOIN (
SELECT DISTINCT name, type, cageID, dateOfEntry
FROM bigRegistrations
GROUP BY cageID
) r ON a.type = r.cageID
But because of slow performance, someone suggested me steps to improve the performance. 1) use temporary table, 2)store the result and use it and join it the the other statement.
use myzoo
CREATE TABLE animalRegistrations AS
SELECT DISTINCT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
unfortunately, It is still slow. If I only use the select statement, the result will be shown in 1-2 seconds. But if I add the create table, the query will take ages (approx 25 minutes)
Any good approach to improve the query time?
edit: the size of big registration table is around 3.5 million rows
Can you please try the query in the way below to achieve The query is to take only the details of the animal with the latest entry date. that query will be used to inner join another query, the query you are using is not fetching records as per your requirement and it will faster:
SELECT a.*, b.name, b.type, b.cageID, b.dateOfEntry
FROM amusementPart a
INNER JOIN bigRegistrations b ON a.type = b.cageID
INNER JOIN (SELECT c.cageID, max(c.dateOfEntry) dateofEntry
FROM bigRegistrations c
GROUP BY c.cageID) t ON t.cageID = b.cageID AND t.dateofEntry = b.dateofEntry
Suggested indexing on cageID and dateofEntry
This is a multipart question.
Use Temporary Table
Don't use Distinct - group all columns to make distinct (dont forget to check for index)
Check the SQL Execution plans
Here you are not creating a temporary table. Try the following...
CREATE TEMPORARY TABLE IF NOT EXISTS animalRegistrations AS
SELECT name, type, cageID, MAX(dateOfEntry) as entryDate
FROM bigRegistrations
GROUP BY cageID
Have you tried doing an explain to see how the plan is different from one execution to the next?
Also, I have found that there can be locking issues in some DB when doing insert(select) and table creation using select. I ran this in MySQL, and it solved some deadlock issues I was having.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
The reason the query runs so slow is probably because it is creating the temp table based on all 3.5 million rows, when really you only need a subset of those, i.e. the bigRegistrations that match your join to amusementPart. The first single select statement is faster b/c SQL is smart enough to know it only needs to calculate the bigRegistrations where a.type = r.cageID.
I'd suggest that you don't need a temp table, your first query is quite simple. Rather, you may just need an index. You can determine this manually by studying the estimated execution plan, or running your query in the database tuning advisor. My guess is you need to create an index similar to below. Notice I index by cageId first since that is what you join to amusementParks, so that would help SQL narrow the results down the quickest. But I'm guessing a bit - view the query plan or tuning advisor to be sure.
CREATE NONCLUSTERED INDEX IX_bigRegistrations ON bigRegistrations
(cageId, name, type, dateOfEntry)
Also, if you want the animal with the latest entry date, I think you want this query instead of the one you're using. I'm assuming the PK is all 4 columns.
SELECT name, type, cageID, dateOfEntry
FROM bigRegistrations BR
WHERE BR.dateOfEntry =
(SELECT MAX(BR1.dateOfEntry)
FROM bigRegistrations BR1
WHERE BR1.name = BR.name
AND BR1.type = BR.type
AND BR1.cageID = BR.cageID)

Query takes too long to run

I am running the below query to retrive the unique latest result based on a date field within a same table. But this query takes too much time when the table is growing. Any suggestion to improve this is welcome.
select
t2.*
from
(
select
(
select
id
from
ctc_pre_assets ti
where
ti.ctcassettag = t1.ctcassettag
order by
ti.createddate desc limit 1
) lid
from
(
select
distinct ctcassettag
from
ctc_pre_assets
) t1
) ro,
ctc_pre_assets t2
where
t2.id = ro.lid
order by
id
Our able may contain same row multiple times, but each row with different time stamp. My object is based on a single column for example assettag I want to retrieve single row for each assettag with latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows with the same createddate, in which case you can get back more than one row per ctcassettag.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
INNER JOIN
(
SELECT
ctcassettag,
MAX(createddate) AS createddate
FROM
ctc_pre_assets
GROUP BY
ctcassettag
)
newest
ON newest.ctcassettag = ctc_pre_assets.ctcassettag
AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
WHERE
ctc_pre_assets.id
=
(
SELECT
lookup.id
FROM
ctc_pre_assets lookup
WHERE
lookup.ctcassettag = ctc_pre_assets.ctcassettag
ORDER BY
lookup.createddate DESC,
lookup.id ASC
LIMIT
1
)
This does still use a correlated sub-query, which is slower than a simple nested-sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
It's also very similar to your own query, but with one less join.
Nested queries are always known to take longer time than a conventional query since. Can you append 'explain' at the start of the query and put your results here? That will help us analyse the exact query/table which is taking longer to response.
Check if the table has indexes. Unindented tables are not advisable(until unless obviously required to be unindented) and are alarmingly slow in executing queries.
On the contrary, I think the best case is to avoid writing nested queries altogether. Bette, run each of the queries separately and then use the results(in array or list format) in the second query.
First some questions that you should at least ask yourself, but maybe also give us an answer to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
SELECT id
FROM ctc_pre_assets
GROUP BY ctcassettag
HAVING createddate = max(createddate)
ORDER BY ctcassettag DESC
) ro
INNER JOIN ctc_pre_assets t2 ON t2.id = ro.lid
ORDER BY id
Using normalization is great, but there are a few caveats where normalization causes more harm than good. This seems like a situation like this, but without your tables infront of me, I can't tell for sure.
Using distinct the way you are doing, I can't help but get the feeling you might not get all relevant results - maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scaleability issues if written incorrectly. Make sure you use them the right way (google it?)
Indexes can potentially save you for a bunch of time - if you actually use them. It's not enough to set them up, you have to create queries that actually uses your indexes. Google this as well.

How to optimize a JOIN and AVG statement for a ratings table

I basically have two tables, a 'server' table and a 'server_ratings' table. I need to optimize the current query that I have (It works but it takes around 4 seconds). Is there any way I can do this better?
SELECT ROUND(AVG(server_ratings.rating), 0), server.id, server.name
FROM server LEFT JOIN server_ratings ON server.id = server_ratings.server_id
GROUP BY server.id;
Query looks ok, but make sure you have proper indexes:
on id column in server table - probably primary key,
on server_id column in server_ratings table,
If it does not help, then add rating column into server table and calculate it on a constant basis (see this answer about Cron jobs). This way you will save the time you spend on calculations. They can be made separately eg. every minute, but probably some less frequent calculations are enough (depending on how dynamic is your data).
Also make sure you query proper table - in the question you have mentioned servers table, but in the code there is reference to server table. Probably a typo :)
This should be slightly faster, because the aggregate function is executed first, resulting in fewer JOIN operations.
SELECT s.id, s.name, r.avg_rating
FROM server s
LEFT JOIN (
SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
FROM server_ratings
GROUP BY server_id
) r ON r.server_id = s.id
But the major point are matching indexes. Primary keys are indexed automatically. Make sure you have one on server_ratings.server_id, too.

How can i improve mysql join query speed?

I have two tables lets say employees and order, both table have millions of records.
Select orders.*
from orders
INNER JOIN employees
on Employees.id = orders.employeeid
WHERE orders.type='daily'
and orders.date > Employees.registerDate
and orders.date < Employees.regiserDate + Interval 60 Days
The above query is rough query, there may be syntax error, but is just considerations.
The query consuming almost 60 seconds to load, any body know how can i optimize this query
The indexing is one of the best options mentioned, you could also use stored procedure to optimize its performance on retrieving data.
you may would also like to use the ANALYZE statement to optimize the retrieval of data from those tables you may read more about this statement from this site:
http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
Set your indexes right (which you probably did?) or create views. You could also consider a temp table, but most of the time the sync makes it not worth it.

MySQL JOIN Query taking far too long to complete

I have a table called users with roughly 250,000 records in it. I have another table called staging with around 75,000 records in it. Staging only has one column, msisdn. I want to check to see how many rows in staging are not present in users.
I have the following query, which I have tested on a small data subset, and it seems to work fine:
SELECT
s.*
FROM staging s
LEFT OUTER JOIN users u ON u.msisdn=s.msisdn
WHERE u.msisdn IS NULL
The problem however, is when I try to run this query on the full list of 250k users. It ran for an hour before I stopped it. Is there any way I can optimise this query?
I have started running the query on subsets of the data in staging, but this is horribly manual:
SELECT
s.*
FROM staging s
LEFT OUTER JOIN users u ON u.msisdn=s.msisdn
WHERE u.msisdn IS NULL
LIMIT 0,10000
msisdn is the primary key of the staging table, but it's not the primary key of the table users. I don't know if that is significant however.
First, you can see what indices MySQL is using with the EXPLAIN command. Just write EXPLAIN in front of your query, and the results will show what index (if any) it's using. Presumably if it's that slow on so (relatively) small a data set as 250,000 records, it's not exploiting a very effective index, and you'll be able to see where.
It may also help to rewrite the query as a NOT EXISTS like so:
SELECT s.* FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM users WHERE users.misdn = s.misdn)
Put indexes on the msisdn columns of each table. Since it's not a PK on users, you'll need to put a non-clustered index on it. That should speed up your query tremendously.
Things you can do to speed up this query:
make sure msisdn is indexed in both tables
optimize your tables
replace * with msisdn
I'm not sure how much quicker this will be, but you can try something like.
select msisdn
from staging
where msisdn not in (select msisdn from users)
Also, make sure that an index exists for the msisdn column in both tables. That should speed things up tremendously.