I have been struggling for several days with poor performance when joining two large tables. Maybe someone has a hint for me.
The first table is "broker_stock_data", which holds information about a client's stock purchases and sales (this table is currently small but will grow in the future). To show the client the current price of their stocks, there is the table "stock_data", which holds historical prices for a large number of stocks (currently around 2 million rows and growing). It's a MariaDB/MySQL database and the table uses InnoDB.
Here is some information about my tables:
broker_stock_data Table
stock_data Table
EXPLAIN Call on the SELECT
Schema of stock_data table
With that in place, I need to get the latest price for each stock owned by a client. To do that, I have the following query:
SELECT
`brokerStockData`.`id` AS `id`,
`brokerStockData`.`name` AS `name`,
`brokerStockData`.`symbol` AS `symbol`,
`brokerStockData`.`wkn` AS `wkn`,
`brokerStockData`.`modifyDate` AS `modifyDate`,
`brokerStockData`.`addDate` AS `addDate`,
`webApiConfig`.`id` AS `webApiConfigId`,
`webApiConfig`.`name` AS `webApiConfigName`,
`importError`.`msg` AS `importErrorMessage`,
SUM(`brokerStockData`.`purchaseAmount`) AS `purchaseAmount`,
stockData.stock_data_close AS `stockDataClose`,
stockData.stock_data_date AS `purchaseDate`,
stockData.stock_data_close * SUM(purchaseAmount) - SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * SUM(purchaseAmount) AS `difference`,
(
(stockData.stock_data_close * purchaseAmount - SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * purchaseAmount) / (SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) * purchaseAmount)
)
* 100 AS `yield`,
SUM(purchasePrice * purchaseAmount) / SUM(purchaseAmount) AS `avgPurchasePrice`
FROM
`broker_stock_data` `brokerStockData`
INNER JOIN
`broker` `broker`
ON `broker`.`id` = `brokerStockData`.`brokerId`
INNER JOIN
`user` `user`
ON `user`.`id` = `broker`.`userId`
INNER JOIN
`webapi_configuration` `webApiConfig`
ON `webApiConfig`.`id` = `brokerStockData`.`webApiConfigId`
LEFT JOIN
(
SELECT
`stockData`.`date` AS `stock_data_date`,
`stockData`.`symbol` AS `stock_data_symbol`,
`stockData`.`close` AS `stock_data_close`,
`stockData`.`webApiConfigId` AS `stock_data_webApiConfigId`
FROM
`stock_data` `stockData`
WHERE
`stockData`.`date` IN
(
SELECT
MAX(`stockDataSQ`.`date`)
FROM
`stock_data` `stockDataSQ`
WHERE
`stockDataSQ`.`symbol` = `stockData`.`symbol`
GROUP BY
`stockDataSQ`.`symbol`
)
GROUP BY
`stockData`.`symbol`
)
`stockData`
ON `brokerStockData`.`symbol` = stock_data_symbol
AND `webApiConfig`.`id` = stock_data_webApiConfigId
LEFT JOIN
`import_log` `importError`
ON `importError`.`symbol` = `brokerStockData`.`symbol`
WHERE
`user`.`id` = 2
AND `broker`.`id` = 2
AND `brokerStockData`.`symbol` != ""
GROUP BY
`brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC LIMIT 12
The problematic part is the LEFT JOIN on the stock_data table. Any ideas on how to speed this up?
UPDATE
I changed the query because I had copied a modified version of mine :/
UPDATE2
Updated the EXPLAIN screenshot with the new query, sorry ;)
LEFT JOIN ( SELECT ... ) is likely to be very inefficient. Can you get rid of the LEFT?
Because of the LIMIT 12, a common trick to improve performance is to do the minimum work to find the 12 ids first. Then do whatever JOINs are needed to find all the other data.
After you have done that, I'll look at the indexes.
GROUP BY
`brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC
Because the GROUP BY and ORDER BY lists are different, it requires sorting twice. Change to this:
GROUP BY
`brokerStockData`.`name`, `brokerStockData`.`symbol`
ORDER BY
`brokerStockData`.`name` ASC, `brokerStockData`.`symbol` ASC
Assuming symbol and name are 1:1, you will get the same results, but faster.
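To check the "same results" part of that claim, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The three-column table and its rows are made up for illustration; the sketch only shows that both forms return identical rows when symbol and name are 1:1, and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of broker_stock_data (symbol and name are 1:1).
conn.execute(
    "CREATE TABLE broker_stock_data (symbol TEXT, name TEXT, purchaseAmount INTEGER)"
)
conn.executemany(
    "INSERT INTO broker_stock_data VALUES (?, ?, ?)",
    [
        ("MSFT", "Microsoft", 5),
        ("AAPL", "Apple", 2),
        ("MSFT", "Microsoft", 3),
        ("SAP", "SAP SE", 1),
        ("AAPL", "Apple", 4),
    ],
)

# Original shape: group by one column, order by another.
original = conn.execute(
    """
    SELECT symbol, name, SUM(purchaseAmount)
    FROM broker_stock_data
    GROUP BY symbol
    ORDER BY name ASC
    """
).fetchall()

# Suggested shape: GROUP BY and ORDER BY use the same column list.
aligned = conn.execute(
    """
    SELECT symbol, name, SUM(purchaseAmount)
    FROM broker_stock_data
    GROUP BY name, symbol
    ORDER BY name ASC, symbol ASC
    """
).fetchall()

print(original == aligned)
```

If two symbols ever shared a name, the two forms could order those rows differently, which is why the 1:1 assumption matters.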
I have a query that is slow. I want to display the 12 newest members near me (near the logged-in user), and my dev database has 150k rows.
It takes over 1 second, and EXPLAIN tells me that 30k rows are filtered.
So 30k rows are filtered out of 150k in my development DB, and my online server is much bigger than this.
Here is my query:
SELECT profils.*,
Users.username,
( SELECT count(*)
from profilsphotos pp
where pp.iduser=Profils.iduser
) as nbpics,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)),
2)), (SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)))
) * 6372.795 AS distance
from Users
inner join Profils ON Users.id=Profils.iduser
where Profils.Actif=1
and profils.idsexe=2
and profils.idlookingfor=1
and Profils.iduser<>1
HAVING distance<400
order by Users.id desc, distance asc
limit 12
Note that I added an index on these four fields: actif, idsexe, idlookingfor and iduser.
What is wrong with my query?
Thanks a lot!
Pascal
I would extract the subquery from the SELECT clause to a temporary table, index it and join to it, instead of executing it for every record in the select clause (30K times).
So the steps are: create a temp table, index it, run the optimized query.
First, create the relevant indexes for the query:
ALTER TABLE
`Profils`
ADD
INDEX `profils_idx_actif_iduser` (`Actif`, `iduser`);
ALTER TABLE
`Users`
ADD
INDEX `users_idx_id_username` (`id`, `username`);
ALTER TABLE
`profils`
ADD
INDEX `profils_idx_idsexe_idlookingfor` (`idsexe`, `idlookingfor`);
ALTER TABLE
`profilsphotos`
ADD
INDEX `profilsphotos_idx_iduser` (`iduser`);
Now, create the temp table and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
count(*) AS nbpics,
iduser
FROM
profilsphotos pp
WHERE
1 = 1
GROUP BY
iduser
ORDER BY
NULL;
ALTER TABLE
`temp1`
ADD
INDEX `temp1_idx_iduser_nbpics` (`iduser`, `nbpics`);
Now try to run this query instead of the original one and see if it runs faster:
SELECT
optimizedSub1.*,
temp1.nbpics
FROM
(SELECT
Profils.*, -- includes iduser, which the outer join to temp1 needs
Users.username,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2)),
(SIN(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)))) * 6372.795 AS distance
FROM
Users
INNER JOIN
Profils
ON Users.id = Profils.iduser
WHERE
Profils.Actif = 1
AND profils.idsexe = 2
AND profils.idlookingfor = 1
AND Profils.iduser <> 1
HAVING
distance < 400
ORDER BY
Users.id DESC,
distance ASC LIMIT 12) AS optimizedSub1
LEFT JOIN
temp1
ON temp1.iduser = optimizedSub1.iduser
Profils needs
INDEX(Actif, idsexe, idlookingfor) -- in any order
Perhaps distance should come first in this ORDER BY?
order by Users.id desc, distance asc
What is Y(gm_coor)? If it is a stored function, we need to know more. Which table has gm_coor? After that, maybe we can discuss a "bounding box" as a partial speedup.
Make another nesting of SELECTs and move the computation of nbpics to it. Currently, the COUNT(*) is being performed 30K times. After the change, it will be only 12 times.
Reformulation
SELECT p2.*,
u.username,
( SELECT COUNT(*)
FROM profilsphotos pp
where pp.iduser = p2.iduser
) as nbpics,
x.distance
FROM
( SELECT p1.id, -- assuming this is the PK of Profils
(...) AS distance
FROM Profils AS p1
WHERE p1.Actif=1
and p1.idsexe=2
and p1.idlookingfor=1
and p1.iduser<>1
HAVING distance < 400
ORDER BY distance
LIMIT 12
) AS x
JOIN profils AS p2 USING(id)
JOIN Users AS u ON u.id = p2.iduser;
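The effect of this reformulation can be tried on toy data. The following is only a sketch under loud assumptions: it uses Python's sqlite3 module instead of MySQL, a made-up `dist_km` column stands in for the elided distance expression, a WHERE filter replaces `HAVING distance < 400`, and the limit is 3 instead of 12. It checks that pushing the LIMIT into a derived table, and counting photos only for the surviving rows, returns the same rows as the naive form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    -- dist_km is a hypothetical precomputed stand-in for the distance formula
    CREATE TABLE profils (id INTEGER PRIMARY KEY, iduser INTEGER, actif INTEGER, dist_km REAL);
    CREATE TABLE profilsphotos (iduser INTEGER);
    """
)
for i in range(1, 21):
    conn.execute("INSERT INTO users VALUES (?, ?)", (i, f"user{i}"))
    conn.execute(
        "INSERT INTO profils VALUES (?, ?, ?, ?)",
        (i, i, 1 if i % 5 else 0, float((i * 37) % 500)),
    )
    for _ in range(i % 4):  # each user gets 0 to 3 photos
        conn.execute("INSERT INTO profilsphotos VALUES (?)", (i,))

# Naive form: the correlated COUNT(*) sits next to the filter and the sort.
naive = conn.execute(
    """
    SELECT p.id, u.username,
           (SELECT COUNT(*) FROM profilsphotos pp WHERE pp.iduser = p.iduser) AS nbpics,
           p.dist_km
    FROM profils p
    JOIN users u ON u.id = p.iduser
    WHERE p.actif = 1 AND p.dist_km < 400
    ORDER BY p.dist_km, p.id
    LIMIT 3
    """
).fetchall()

# Nested form: pick the ids first, then join and count only for those rows.
nested = conn.execute(
    """
    SELECT p2.id, u.username,
           (SELECT COUNT(*) FROM profilsphotos pp WHERE pp.iduser = p2.iduser) AS nbpics,
           x.dist_km
    FROM (SELECT p1.id, p1.dist_km
          FROM profils p1
          WHERE p1.actif = 1 AND p1.dist_km < 400
          ORDER BY p1.dist_km, p1.id
          LIMIT 3) AS x
    JOIN profils p2 ON p2.id = x.id
    JOIN users u ON u.id = p2.iduser
    ORDER BY x.dist_km, p2.id
    """
).fetchall()

print(naive == nested)
```

The sketch repeats the ORDER BY on the outer query because a join does not promise to keep the derived table's row order.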
I have a performance issue with the query below on MySQL. The query involves 5 tables. When I apply the ORDER BY and LIMIT, the results are retrieved in 0.3 secs. But without the ORDER BY and LIMIT, I was able to get the results in 0.01 secs. I have tried changing the query, but that did not work. Could someone please help me with this query so I can get the results in the desired time (<0.3 secs)?
Below are the details.
m_todos = 286579 (records)
m_pat = 214858 (records)
users = 119 (records)
m_programs = 26 (records)
role = 4 (records)
SELECT *
FROM (
SELECT t.*,
mp.name as A_name,
u.first_name, u.last_name,
p.first, p.last, p.zone, p.language,p.handling,
r.name,
u2.first_name AS created_first_name,
u2.last_name AS created_last_name
FROM m_todos t
INNER JOIN role r ON t.role_id=r.id
INNER JOIN m_pat p ON t.patient_id = p.id
LEFT JOIN users u2 ON t.created_id=u2.id
LEFT JOIN m_programs mp ON t.prog_id=mp.id
LEFT JOIN users u ON t.user_id=u.id
WHERE t.role_id !='9'
AND t.completed = '0000-00-00 00:00:00'
) C
ORDER BY priority DESC, due ASC
LIMIT 0,10
Get rid of the outer SELECT; move the ORDER BY and LIMIT in.
Indexes:
t: (completed)
t: (priority, due)
I assume priority and due are in t?? Please be explicit in the query. It could make a huge difference.
If the following works, it should speed things up a lot: Start by finding the t.id without all the JOINs:
SELECT id
FROM m_todos
WHERE role_id !='9'
AND completed = '0000-00-00 00:00:00'
ORDER BY priority DESC, due ASC
LIMIT 10
That will benefit from this covering composite index:
INDEX(completed, role_id, priority, due, id)
Debug that. Then use it in the rest:
SELECT t.*, the-other-stuff
FROM ( that-query ) AS t1
JOIN m_todos AS t USING(id)
then-the-rest-of-the-JOINs
ORDER BY priority DESC, due ASC -- yes, again
If you don't need all of t.*, it may be beneficial to spell out the actual columns needed.
The reason for this to run much faster is that the 10 rows are found efficiently by looking only at the one table. The original code was shoveling around a lot more rows than 10 and they included all the columns of t, plus columns from the other tables.
My version does only 10 lookups for all the extra stuff.
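The two-step approach described above can be sketched end to end on toy data. This is an illustration only: it runs on Python's sqlite3 module instead of MySQL, the tables are cut down to a few columns, `last_name` is a made-up column, and the rows are generated so that (priority, due) is unique, which makes the top 10 unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE m_pat (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE m_todos (
        id INTEGER PRIMARY KEY, patient_id INTEGER, role_id INTEGER,
        completed TEXT, priority INTEGER, due TEXT
    );
    -- composite index in the spirit of INDEX(completed, role_id, priority, due, id)
    CREATE INDEX todos_idx ON m_todos (completed, role_id, priority, due);
    """
)
conn.executemany(
    "INSERT INTO m_pat VALUES (?, ?)", [(n, f"Patient{n}") for n in range(1, 51)]
)
OPEN = "0000-00-00 00:00:00"
for i in range(1, 201):
    conn.execute(
        "INSERT INTO m_todos VALUES (?, ?, ?, ?, ?, ?)",
        (
            i,
            i % 50 + 1,
            9 if i % 7 == 0 else 1,
            OPEN if i % 3 else "2020-01-01 00:00:00",
            i % 5,
            f"2024-01-01 00:{i // 60:02d}:{i % 60:02d}",  # unique per row
        ),
    )

# Everything at once: join first, then sort and limit.
full = conn.execute(
    """
    SELECT t.id, t.priority, t.due, p.last_name
    FROM m_todos t
    JOIN m_pat p ON p.id = t.patient_id
    WHERE t.role_id != 9 AND t.completed = ?
    ORDER BY t.priority DESC, t.due ASC
    LIMIT 10
    """,
    (OPEN,),
).fetchall()

# Two steps: find the 10 ids from m_todos alone, then join only those rows.
deferred = conn.execute(
    """
    SELECT t.id, t.priority, t.due, p.last_name
    FROM (SELECT id
          FROM m_todos
          WHERE role_id != 9 AND completed = ?
          ORDER BY priority DESC, due ASC
          LIMIT 10) AS t1
    JOIN m_todos t ON t.id = t1.id
    JOIN m_pat p ON p.id = t.patient_id
    ORDER BY t.priority DESC, t.due ASC
    """,
    (OPEN,),
).fetchall()

print(full == deferred)
```

Both queries return the same 10 rows here; whether the second form is actually faster on the real tables has to be confirmed with EXPLAIN and timing on MySQL itself.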
I'm struggling to make a query efficient enough. I'm using Doctrine2 ORM (the query is built with QueryBuilder) and part of my query is running very slowly: it takes about 4s with a table of 5000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of columns, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is what my query looks like (it's the basic version; there are a lot of filters to be applied, but these work fine for now):
SELECT
u.id AS uid,
COUNT(DISTINCT s_order.id) AS sclr_1,
COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
user u
LEFT JOIN slot s_order ON (s_order.user_id = u.id)
LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
(
(
(
s_order.date BETWEEN ?
AND ?
)
AND (
s_filter.date BETWEEN ?
AND ?
)
)
AND (u.deleted_at IS NULL)
)
AND u.userType IN ('2')
GROUP BY
u.id
HAVING
sclr_2 > 0
ORDER BY
sclr_1 DESC
LIMIT
12
Let me explain what I'm trying to achieve here:
I need to filter users who have any slots between 1 week ago and 1 week ahead, then order them by the count of slots available between now and 1 week ahead. The part of the query causing issues is the LEFT JOIN of s_filter, and I'm wondering whether there's a way to improve the performance of that query.
Any help appreciated really, even if it's only plain SQL I'll try to convert it to DQL myself!
#UPDATE
One additional piece of information that I forgot: the LIMIT in the query is for pagination purposes!
#UPDATE 2
After a while of tweaking the query, I figured out that I can use JOIN for filtering instead of LEFT JOIN + COUNT, so my query now looks like this:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
And it went down from 4.1-4.3s to about 3.6s.
I have two tables:
history
business
I want to run this query:
SELECT name, talias.*
FROM
(SELECT business.bussName as name, history.*
FROM history
INNER JOIN business on history.bussID = business.bussID
WHERE history.activity = 'Insert' OR history.activity = 'Update'
UNION
SELECT NULL as name, history.*
FROM history
WHERE history.activity = 'Delete'
) as talias
WHERE 1
order by talias.date DESC
LIMIT $fetch,20
This query takes 13 seconds. I think the problem is that MySQL joins all the rows of the history and business tables, while it should join just 20 rows!
How could I fix that?
If I understand you correctly, you want all rows from history where the activity is 'Delete', plus all those rows where the activity is 'Insert' or 'Update' and there is a corresponding row in the business table.
I don't know if that is going to be faster than your query - you will need to check the execution plan to verify this.
SELECT *
FROM history
where activity = 'Delete'
or ( activity in ('Insert','Update')
AND exists (select 1
from business
where history.bussID = business.bussID))
order by `date` DESC
LIMIT $fetch,20
Edit (after the question has changed)
If you do need columns from the business table, replacing the union with an outer join might improve performance.
But to be honest, I don't expect it. The MySQL optimizer isn't very smart and I wouldn't be surprised if the outer join was actually implemented using some kind of union. Again only you can test that by looking at the execution plan.
SELECT h.*,
b.bussName as name
FROM history h
LEFT JOIN business b
ON h.bussID = b.bussID
AND h.activity in ('Insert','Update')
WHERE h.activity in ('Delete', 'Insert','Update')
ORDER BY h.`date` DESC
LIMIT $fetch,20
Btw: date is a horrible column name. First because it's a reserved word, second (and more important) because it doesn't document anything. Is that the "creation date"? The "deletion date"? A "due date"? Some other date?
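Whether the outer-join rewrite returns the same rows as the UNION form can be checked on toy data before looking at execution plans. The sketch below makes several assumptions: it uses Python's sqlite3 module instead of MySQL, the two tables are reduced to a handful of made-up columns and rows, `$fetch` is replaced by a fixed offset of 0, and every 'Insert' or 'Update' row has a matching business row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE business (bussID INTEGER PRIMARY KEY, bussName TEXT);
    CREATE TABLE history (id INTEGER PRIMARY KEY, bussID INTEGER, activity TEXT, date TEXT);
    """
)
conn.executemany("INSERT INTO business VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Insert", "2024-01-01"),
        (2, 1, "Update", "2024-01-03"),
        (3, 2, "Delete", "2024-01-02"),
        (4, 2, "Insert", "2024-01-04"),
        (5, 1, "Delete", "2024-01-05"),
    ],
)

# UNION form, as in the question (with the two branches spelled out).
union_rows = conn.execute(
    """
    SELECT *
    FROM (SELECT b.bussName AS name, h.id, h.bussID, h.activity, h.date
          FROM history h
          JOIN business b ON h.bussID = b.bussID
          WHERE h.activity IN ('Insert', 'Update')
          UNION
          SELECT NULL AS name, h.id, h.bussID, h.activity, h.date
          FROM history h
          WHERE h.activity = 'Delete') AS talias
    ORDER BY talias.date DESC
    LIMIT 20 OFFSET 0
    """
).fetchall()

# Outer-join form: the activity test inside ON keeps the name NULL for 'Delete' rows.
left_rows = conn.execute(
    """
    SELECT b.bussName AS name, h.id, h.bussID, h.activity, h.date
    FROM history h
    LEFT JOIN business b
           ON h.bussID = b.bussID
          AND h.activity IN ('Insert', 'Update')
    WHERE h.activity IN ('Delete', 'Insert', 'Update')
    ORDER BY h.date DESC
    LIMIT 20 OFFSET 0
    """
).fetchall()

print(union_rows == left_rows)
```

The two forms differ when an 'Insert' or 'Update' row has no matching business row: the inner join in the UNION form drops it, while the LEFT JOIN form keeps it with a NULL name.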
Try this:
SELECT h.*
FROM history AS h
WHERE (h.activity IN ('Insert', 'Update')
AND EXISTS (SELECT * FROM business AS b WHERE b.bussID = h.bussID))
OR h.activity = 'Delete'
ORDER BY h.date DESC
LIMIT $fetch, 20
For the ORDER BY and LIMIT to be efficient, make sure you have an index on history.date.
I have this MySQL query to get the total amount of only the first invoice for each client on a given month:
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
WHERE InvoiceID IN (
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
WHERE tblinvoice.ClientID IN (
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
)
GROUP BY tblinvoice.ClientID
);
When I run it, it seems to loop forever. If I remove the first part, it gives me the list of invoices instantly. I am sure it is a small syntax detail, but I haven't been able to figure out what the problem is after nearly one hour of trying to fix it.
Your assistance is appreciated.
This query can probably be done in a better way without all the sub queries as well, just I'm not so experienced with sub queries. :)
Solution was given but I should have included the full query rather than just the part I was having trouble with. The full query is:
SELECT AdvertisingID, AdvertisingTitle, AdvertisingYear,
AdvertisingMonth, AdvertisingTotal, AdvertisingVisitors,
IFNULL(
(SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN
(SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN
(SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE YEAR(tblenquiry.EnquiryDate)=tbladvertising.AdvertisingYear
AND MONTH(tblenquiry.EnquiryDate)=tbladvertising.AdvertisingMonth)
AS inq
ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID) AS inq2
ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID)
, 0)
FROM tbladvertising
ORDER BY AdvertisingYear DESC, AdvertisingMonth DESC, AdvertisingTitle;
Now the problem is that the column with the sub query has no access to "tbladvertising.AdvertisingYear" or "tbladvertising.AdvertisingMonth"
A commenter mentioned that it's hard to understand what you're trying to do here. I agree. But I will take the risk of trying to puzzle it out.
As usual with this sort of query, it's helpful to take advantage of the structured part of structured query language, and try to build this up piece by piece. That's the secret to creating complex queries that actually do what you want them to do.
Your innermost query is this:
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
It is saying, "give me the list of ClientID values which have enquiries in September 2014." There's a more efficient way to do this:
SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE tblenquiry.EnquiryDate >= '2014-09-01'
AND tblenquiry.EnquiryDate < '2014-09-01' + INTERVAL 1 MONTH
Two changes here: First, the NOT ... IS NULL search is unnecessary because if the item you're searching on is null, there's no way for your EnquiryDate to be valid. So we just change the LEFT JOIN to an ordinary inner JOIN and get rid of the otherwise expensive NULL scan.
Second, we recast the date matching as a range scan, so it can use an index on tblenquiry.EnquiryDate.
Cool.
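The difference between wrapping the column in a function and comparing it against a range can be seen in a small experiment. This is a sketch under assumptions, not MySQL itself: it uses Python's sqlite3 module, `strftime` stands in for MySQL's YEAR() and MONTH(), the sample dates and the index name are made up, and the upper bound is written out as '2014-10-01' instead of using INTERVAL arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblenquiry (EnquiryID INTEGER PRIMARY KEY, EnquiryDate TEXT);
    CREATE INDEX enquiry_date_idx ON tblenquiry (EnquiryDate);  -- hypothetical name
    """
)
dates = [f"2014-{m:02d}-{d:02d}" for m in range(1, 13) for d in (1, 15, 28)]
dates += ["2013-09-10", "2015-09-10"]
conn.executemany("INSERT INTO tblenquiry (EnquiryDate) VALUES (?)", [(d,) for d in dates])

# Function applied to the column: every row has to be inspected.
FUNC_SQL = """
    SELECT EnquiryID FROM tblenquiry
    WHERE strftime('%Y', EnquiryDate) = '2014'
      AND strftime('%m', EnquiryDate) = '09'
"""
# Half-open range on the bare column: the index can be searched.
RANGE_SQL = """
    SELECT EnquiryID FROM tblenquiry
    WHERE EnquiryDate >= ? AND EnquiryDate < ?
"""
BOUNDS = ("2014-09-01", "2014-10-01")


def plan(sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)


by_function = sorted(conn.execute(FUNC_SQL).fetchall())
by_range = sorted(conn.execute(RANGE_SQL, BOUNDS).fetchall())
plan_function = plan(FUNC_SQL)
plan_range = plan(RANGE_SQL, BOUNDS)

print(by_function == by_range)
print(plan_function)
print(plan_range)
```

In SQLite the first plan is a SCAN and the second a SEARCH on the index. On MySQL the equivalent check is to run EXPLAIN on both forms and compare the access type.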
Next, we have this query level.
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
WHERE tblinvoice.ClientID IN (
/* that list of Client IDs from the innermost query */
)
GROUP BY tblinvoice.ClientID
That is pretty straightforward. But MySQL isn't too swift with IN () clauses, so let's recast it in the form of a JOIN as follows:
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN (
/* that list of Client IDs from the innermost query */
) AS inq ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID
This gets us the list of invoice IDs which were the subject of the first enquiry of the month on behalf of each distinct ClientID. (It's hard for me to figure out the business meaning of this, but I don't understand your business.)
Finally, we come to your outermost query. We can also recast that as a JOIN, like so.
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN (
/* that list of first-in-month invoices */
) AS inq2 ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID
So, this all expands to:
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN (
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN (
SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE tblenquiry.EnquiryDate >= '2014-09-01'
AND tblenquiry.EnquiryDate < '2014-09-01' + INTERVAL 1 MONTH
) AS inq ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID
) AS inq2 ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID
That should do the trick for you. In summary, the big optimizing changes are
using a date range scan.
eliminating the NOT ... IS NULL criterion.
recasting your IN clauses as JOIN clauses.
The next step will be to create useful indexes. A compound index (EnquiryDate, EnquiryID) on your tblenquiry is very likely to help a lot. But to be sure you'll need to do some EXPLAIN analysis.
What if you modify the query you posted above to replace the subqueries with JOINs (INNER JOIN), as below? Give it a try.
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN
(
SELECT MIN(ti.InvoiceID) as MinInvoice
FROM tblinvoice ti
JOIN
(
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry
ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
) tab
on ti.ClientID = tab.ClientID
GROUP BY ti.ClientID
) tab1
on tblinvoiceproduct.InvoiceID = tab1.MinInvoice