I have a commenting system I'm developing that I have asked another question about here. The schema is pretty much the same, except I have added a rating column to the comments table and set up a trigger to update it when there are changes in the comments_ratings table, in order to avoid calculating the rating every time I need to fetch comments.
So when I execute my query to fetch the latest comments:
SELECT
    c.*, COALESCE(COUNT(r.id), 0) AS replies
FROM (
    SELECT
        ...
    FROM comments c
    LEFT JOIN users u ON u.id = c.author
    LEFT JOIN comments_ratings crv ON crv.comment = c.id AND crv.user = ?
    WHERE c.item = ? AND c.type = ?
    ORDER BY c.id DESC
    LIMIT 0, 10
) AS c
LEFT JOIN comments r ON c.id = r.reply
GROUP BY c.id
ORDER BY c.id DESC
I get a result back in 0.001–0.003 seconds, and I can confirm the cache is not helping me, because I have tried limiting by random values and the time always stays in this range.
However, if I order by rating instead of c.id, the query takes 30+ seconds (I have a lot of test data). When I open the profiler, I see that more than 90% of the time is spent on "Copying to tmp table". I suppose it is copying the entire result into an in-memory table and sorting it there, but I don't understand why that's happening (if it is), since I have created an index on the rating column, which should help.
The reason I broke database normalization and created the rating column, instead of calculating the rating each time, was to be able to index it and make sorting cheap.
I am pretty confused at this point, do you see what I'm doing wrong?
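For reference, the trigger-maintained rating column can be sketched end-to-end in SQLite (via Python's sqlite3). This is a minimal sketch with simplified, partly hypothetical table and column names (a vote column holding +1/−1), not my exact MySQL schema, and MySQL's trigger syntax differs slightly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT,
                       rating INTEGER NOT NULL DEFAULT 0);
CREATE TABLE comments_ratings (id INTEGER PRIMARY KEY, comment INTEGER,
                               user INTEGER, vote INTEGER);
CREATE INDEX idx_comments_rating ON comments (rating);

-- Keep the denormalized rating in sync on every new vote.
-- A real setup also needs UPDATE/DELETE triggers on comments_ratings.
CREATE TRIGGER trg_rating_ins AFTER INSERT ON comments_ratings
BEGIN
  UPDATE comments SET rating = rating + NEW.vote WHERE id = NEW.comment;
END;
""")

con.execute("INSERT INTO comments (id, body) VALUES (1, 'first')")
con.execute("INSERT INTO comments_ratings (comment, user, vote) VALUES (1, 10, 1)")
con.execute("INSERT INTO comments_ratings (comment, user, vote) VALUES (1, 11, 1)")
print(con.execute("SELECT rating FROM comments WHERE id = 1").fetchone()[0])  # 2
```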
We are using MySQL Aurora. We have 1 million customers in the customer table, and the customer table has around 30 columns.
On first load, we fetch 50 records with a limit, ordered by date, together with the total customer count. This first load takes approximately 60 seconds. We also offer different segmentation options by which users can filter their data. We have indexed all the important columns that can help speed up the filter queries. We also tried increasing the server resources but did not see significant changes.
We are looking for a solution that loads the records in under 3 seconds on first load, and with the different segmentations as well.
We are open to any solution, and looking for the answer to the following questions:
Is AWS's MySQL-compatible Aurora capable of this? Can it run queries this fast with complex filters?
Should we plan to sync our data to another type of database server and load from there?
Any suggestions for a database server that supports frequent inserts/updates and can run such queries extremely fast?
Any help on the above points will be really appreciated.
UPDATES:
SELECT c.id as customer_id, `c`.first_name, `c`.last_name, `c`.email, `ssh`.`shop_name`, `ssh`.`domain`, `cx`.`is_push_access`, `cx`.`last_contacted_by`, `cx`.`last_email_open_type`, `cx`.`last_email_open_date`, `cx`.`last_contacted_date`, `cx`.`last_email_link_click_date`, `cx`.`avg_order_amount`, `cx`.`email_marketing`, `cx`.`loyalty_email`, `cx`.`newsletter_email`, `cx`.`review_email`, `cx`.`lyt_points_earned`, `cx`.`is_product_reviewed`, `cx`.`is_bounce`, `cx`.`lyt_points_redeemed`, (lyt_points_earned - cx.lyt_points_redeemed - cx.lyt_points_expire) AS `points_remaining`, `cx`.`lyt_is_refer`, `cx`.`lyt_cus_birthday`, `cx`.`lyt_customer_status`, `cx`.`lyt_points_expire`, `cx`.`contact_status`, `cx`.`source_detail`, `cx`.`total_sent`, `cx`.`last_order_date`, `cx`.`utm_source`, `cx`.`utm_medium`, `cx`.`utm_campaign`, `cx`.`utm_content`, `cx`.`utm_term`, `cx`.`total_campaign_sent`, `cx`.`total_campaign_opened`, `cx`.`total_campaign_clicked`, `cx`.`last_campaign_sent_date`, `cx`.`last_campaign_opened_date`, `cx`.`last_campaign_clicked_date`, `cx`.`total_campaign_delivered`, `ca`.`company`, `ca`.`address1`, `ca`.`address2`, `ca`.`city`, `ca`.`province`, `ca`.`country`, `ca`.`zip`, `ca`.`province_code`, `ca`.`country_code`, `ct`.`tier_name`, `aft`.`code` AS `affiliate_code`
FROM `tbl_customers` AS `c`
LEFT JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN `aio_customer_extra` AS `cx` ON (c.id = cx.customer_id)
LEFT JOIN `tbl_customers_address` AS `ca` ON ca.id = cx.customer_default_address_id
LEFT JOIN `aio_lyt_customer_tier` AS `ct` ON cx.lyt_customer_tier_id = ct.id
LEFT JOIN `aio_customer_custom` AS `acc` ON acc.customer_id = c.id
LEFT JOIN `aio_aft_affiliates` AS `aft` ON aft.customer_id = c.id
WHERE (c.shop_id = 'xxxx')
GROUP BY `c`.`id`
ORDER BY `c`.`last_seen_date` DESC, `c`.`id` DESC
LIMIT 20
Note:
All the foreign keys and the GROUP BY and ORDER BY columns are properly indexed.
If we remove the GROUP BY and ORDER BY clauses, the query executes extremely fast (under 1 sec), but we can't remove them permanently; with GROUP BY and ORDER BY it takes 45 sec.
Indexing
If you don't already have these indexes, I recommend adding them:
c: INDEX(shop_id, id, last_seen_date) -- in this order
cx: INDEX(customer_id)
aft: INDEX(customer_id, code)
acc: INDEX(customer_id)
(If any is already the PRIMARY KEY, don't redundantly add an INDEX.)
If that does not help enough, please provide EXPLAIN SELECT ... and move on to...
Reformulate to avoid "explode-implode":
What you have gathers lots of data, then GROUPs them and ORDERs them. Instead, let's turn the query inside out.
find the 20 customers.id that are needed -- do this with as little effort as possible.
gather all the other stuff needed
Step 1 is probably simply
SELECT id AS customer_id
FROM `tbl_customers` AS `c1`
WHERE (c1.shop_id = 'xxxx')
GROUP BY `c1`.`id`
ORDER BY `c1`.`last_seen_date` DESC, `c1`.`id` DESC
LIMIT 20
Note that the index I recommended above will be "covering" for this subquery.
Step 2:
SELECT [ lots of stuff ]
FROM ( [ the above query ] ) AS c2
JOIN tbl_customers AS c USING(customer_id)
LEFT JOIN [ all the LEFT JOINs, as you have ]
ORDER BY c.last_seen_date DESC, c.id DESC -- yes, this is repeated
(Note that the GROUP BY and LIMIT exist only in the subquery.) (I assume that the PK of tbl_customers is customer_id.)
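Here is a minimal, runnable sketch of this inside-out ("deferred join") pattern, using SQLite via Python's sqlite3. Column names mirror the question, the data is made up, and the index column order differs from the one recommended above because this simplified query has no GROUP BY:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_customers (id INTEGER PRIMARY KEY, shop_id TEXT,
                            last_seen_date TEXT, email TEXT);
-- Index chosen so the ids-only subquery is resolved from the index alone.
CREATE INDEX idx_shop_seen ON tbl_customers (shop_id, last_seen_date, id);
""")
rows = [(i, "shop1", f"2023-01-{i % 28 + 1:02d}", f"u{i}@example.com")
        for i in range(1, 101)]
con.executemany("INSERT INTO tbl_customers VALUES (?, ?, ?, ?)", rows)

# Step 1 finds just the 20 ids as cheaply as possible;
# step 2 joins back to the base table for the wide columns.
result = con.execute("""
SELECT c.id, c.email, c.last_seen_date
FROM (SELECT id FROM tbl_customers
      WHERE shop_id = 'shop1'
      ORDER BY last_seen_date DESC, id DESC
      LIMIT 20) AS c2
JOIN tbl_customers AS c ON c.id = c2.id
ORDER BY c.last_seen_date DESC, c.id DESC
""").fetchall()
print(len(result))  # 20 rows, newest first
```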
Recently I wrote a PHP web app to gather a list of data and output it. Originally I thought the PHP code was running slow, but then I measured how long this query takes to run and noticed the bottleneck is MySQL, not PHP.
My conclusion is that I need to add indexes on these tables, but I wanted to get feedback from others before moving forward and doing that.
Here's my query:
SELECT *
FROM claims c
LEFT JOIN claims_data d ON c.claim_number = d.claim_number
LEFT JOIN merchant_category_code m ON c.procedure_code = m.code
LEFT JOIN claim_log l ON c.claim_number = l.claim_number
WHERE c.social_security_num = :num
ORDER BY c.start_date DESC
For your query, indexes are needed on each column that appears in:
ON -- both the left and right side of the =, in each table
WHERE -- each filtered column, in each table
ORDER BY -- each sorted column
Spend some time with MySQL index tutorials.
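As a minimal illustration of that rule, here is a SQLite sketch via Python's sqlite3 (index names are made up, and the schema is cut down to two of the tables) showing the query plan actually picking up the indexes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE claims (claim_number INTEGER, procedure_code TEXT,
                     social_security_num TEXT, start_date TEXT);
CREATE TABLE claims_data (claim_number INTEGER, detail TEXT);

-- WHERE column first, then the ORDER BY column, so the sort can use the index too.
CREATE INDEX idx_claims_ssn ON claims (social_security_num, start_date);
-- The join column of the right-hand table in the ON clause.
CREATE INDEX idx_cd_claim ON claims_data (claim_number);
""")

plan = con.execute("""
EXPLAIN QUERY PLAN
SELECT * FROM claims c
LEFT JOIN claims_data d ON c.claim_number = d.claim_number
WHERE c.social_security_num = '123-45-6789'
ORDER BY c.start_date DESC
""").fetchall()
details = [row[3] for row in plan]  # human-readable plan steps
for d in details:
    print(d)
```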
So, two (or really three) questions: is my query just badly coded or badly thought out? (Be kind, I only just discovered CROSS APPLY and am relatively new.) Is CROSS APPLY even the best sort of join to be using here, and if so, why is it slow?
So I have a database table (test_tble) of around 66 million records. I also have a ##Temp_tble created, which has one column called Ordr_nbr (nchar(13)). These are the orders I wish to find.
The test_tble has 4 columns (Ordr_nbr, destination, shelf_no, dte_bought).
This is my current query which works the exact way I want it to but it seems to be quite slow performance.
select ##MyTempTable.Ordr_nbr, test_table1.destination, test_table1.shelf_no, test_table1.dte_bought
from ##MyTempTable
cross apply (
    select top 1 Test_Table.destination, Test_Table.shelf_no, Test_Table.dte_bought
    from Test_Table
    where ##MyTempTable.Ordr_nbr = Test_Table.Ordr_nbr
    order by dte_bought desc
) test_table1
If the ##Temp_tble only has 17 orders to search for, it takes around 2 minutes. As you can see, I'm trying to get just the most recent dte_bought (i.e. max(dte_bought)) for each order.
In terms of indexing, I ran the Database Engine Tuning Advisor and it says the table is optimized for this query, and I have all the relevant indexes created, such as a clustered index on test_tble for dte_bought desc including order_nbr, etc.
The execution plan is using an index scan (on the non-clustered index) and a key lookup (on the clustered one).
My end result is to return all the order_nbrs in ##MyTempTable along with the destination, shelf_no and dte_bought columns for each order_nbr, but only the most recently bought ones.
Sorry if I explained this awfully; any info needed that I can provide, just ask. I'm not asking for just downright "give me code", more for guidance, advice and learning. Thank you in advance.
UPDATE
I have now tried a sort of left join. It works noticeably faster, but is still not instant or very fast (about 30 seconds), and it also doesn't return just the most recent dte_bought. Any ideas? See below for the left join code.
select a.Order_Nbr,b.Destination,b.LnePos,b.Dte_bought
from ##MyTempTble a
left join Test_Table b
on a.Order_Nbr = b.Order_Nbr
where b.Destination is not null
UPDATE 2
Attempted another left join with a max(dte_bought). It runs fast, but only returns the order_nbr; the other columns are NULL. Any suggestions?
select a.Order_nbr,b.Destination,b.Shelf_no,b.Dte_Bought
from ##MyTempTable a
left join
(select * from Test_Table where Dte_bought = (
select max(dte_bought) from Test_Table)
)b on b.Order_nbr = a.Order_nbr
order by Dte_bought asc
K.M
Instead of CROSS APPLY() you can use an INNER JOIN with a subquery. Check the following query:
SELECT
    TempT.Ordr_nbr
    ,TestT.destination
    ,TestT.shelf_no
    ,TestT.dte_bought
FROM ##MyTempTable TempT
INNER JOIN (
    SELECT T.order_nbr
        ,T.destination
        ,T.shelf_no
        ,T.dte_bought
        ,ROW_NUMBER() OVER(PARTITION BY T.order_nbr ORDER BY T.dte_bought DESC) ID
    FROM Test_Table T
) TestT
    ON TestT.ID = 1 AND TempT.Ordr_nbr = TestT.order_nbr
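Here is a runnable sketch of this ROW_NUMBER() latest-row-per-group pattern in SQLite via Python's sqlite3 (SQLite supports window functions since 3.25; table names simplified from the question, data made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test_table (ordr_nbr TEXT, destination TEXT,
                         shelf_no INTEGER, dte_bought TEXT);
CREATE TABLE wanted (ordr_nbr TEXT);   -- stands in for the ## temp table
INSERT INTO test_table VALUES
  ('A1', 'London', 1, '2020-01-01'),
  ('A1', 'Paris',  2, '2021-06-01'),   -- most recent A1 row
  ('B2', 'Berlin', 3, '2019-03-01');
INSERT INTO wanted VALUES ('A1');
""")

# rn = 1 marks the most recent purchase within each order number.
rows = con.execute("""
SELECT w.ordr_nbr, t.destination, t.shelf_no, t.dte_bought
FROM wanted w
JOIN (SELECT ordr_nbr, destination, shelf_no, dte_bought,
             ROW_NUMBER() OVER (PARTITION BY ordr_nbr
                                ORDER BY dte_bought DESC) AS rn
      FROM test_table) t
  ON t.rn = 1 AND t.ordr_nbr = w.ordr_nbr
""").fetchall()
print(rows)  # [('A1', 'Paris', 2, '2021-06-01')]
```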
I'm still having problems understanding how to read, understand and optimize MySQL's EXPLAIN output. I know to create indexes on ORDER BY columns, but that's about it. Therefore I am hoping you can help me tune this query:
EXPLAIN
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN (
SELECT *
FROM image
ORDER BY karma DESC
) AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC
LIMIT 0 , 24
What this query does is find the photo with the most karma for each specie. You can see the result of this live:
http://www.jungledragon.com/species
I have a table of species, a table of images, a mapping table in between and an imagefile table, since there are multiple image files (formats) per image.
Explain output:
For the specie table, I have indices on its primary id and the field commonname. For the image table, I have indices on its id and karma field, and a few others not relevant to this question.
This query currently takes 0.8 to 1.1s which is too slow in my opinion. I have a suspicion that the right index will speed this up many times, but I don't know which one.
I think you'd go a great way by getting rid of the subquery. Look at the first and last rows of the EXPLAIN result: it's copying the entire image table to a temporary table. You could obtain the same result by replacing the subquery with INNER JOIN image and moving ORDER BY karma DESC into the final ORDER BY clause:
SELECT specie.id, specie.commonname, specie.block_description, maximage.title,
maximage.karma, imagefile.file_name, imagefile.width, imagefile.height,
imagefile.transferred
FROM specie
INNER JOIN specie_map ON specie_map.specie_id = specie.id
INNER JOIN image AS maximage ON specie_map.image_id = maximage.id
INNER JOIN imagefile ON imagefile.image_id = maximage.id
AND imagefile.type = 'small'
GROUP BY specie.commonname
ORDER BY commonname ASC, karma DESC
LIMIT 0 , 24
The real problem is that there is no need to optimize "MySQL explain". There is usually a query (or several queries) that you want to be efficient, and EXPLAIN is a way to see whether the execution of the query is going to happen the way you expect.
That is, you need to understand what the execution plan should look like, and why, and compare that with the results of the EXPLAIN command. To understand what the plan should look like, you need to understand how indexes in MySQL work.
In the meantime, your query is a tricky one, since efficient index use runs into two limitations here: a) ordering and limiting by a field from one table, and b) finding the top element in each group from another (the latter is a tricky task in itself). Since your database is rather small, you are lucky that your current query is rather fast (though you consider it slow).
I would rewrite the query in a somewhat hacky manner (I assume that there is at least one photo for each specie):
SELECT
specie.id, specie.commonname, specie.block_description,
maximage.title, maximage.karma,
imagefile.file_name, imagefile.width, imagefile.height, imagefile.transferred
FROM (
SELECT s.id,
(SELECT i.id
FROM specie_map sm
JOIN image i ON sm.image_id = i.id
WHERE sm.specie_id = s.id
ORDER BY i.karma DESC
LIMIT 1) as image_id
FROM specie s
ORDER BY s.commonname
LIMIT 0, 24
) as ids
JOIN specie
ON ids.id = specie.id
JOIN image as maximage
ON maximage.id = ids.image_id
JOIN imagefile
ON imagefile.image_id = ids.image_id AND imagefile.type = 'small';
You will need the following indexes:
(commonname) on specie
a composite (specie_id, image_id) on specie_map
a composite (id, karma) on image
a composite (image_id, type) on imagefile
Paging now should happen within the subquery.
The idea is to make complex computations within a subquery that operates with ids only and join for the rest of the data at the top. The data would be ordered in the order of the results of the subquery.
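Here is a minimal runnable sketch of that ids-only subquery idea, using SQLite via Python's sqlite3 with a cut-down schema and made-up data (the correlated ORDER BY ... LIMIT 1 subquery picks the top-karma image per specie):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE specie (id INTEGER PRIMARY KEY, commonname TEXT);
CREATE TABLE specie_map (specie_id INTEGER, image_id INTEGER);
CREATE TABLE image (id INTEGER PRIMARY KEY, karma INTEGER);
INSERT INTO specie VALUES (1, 'Ant'), (2, 'Bee');
INSERT INTO image VALUES (10, 5), (11, 9), (20, 3);
INSERT INTO specie_map VALUES (1, 10), (1, 11), (2, 20);
""")

# The subquery works with ids only; the wide columns would be
# joined back at the top level, as in the answer above.
rows = con.execute("""
SELECT ids.id, ids.image_id
FROM (SELECT s.id,
             (SELECT i.id
              FROM specie_map sm JOIN image i ON sm.image_id = i.id
              WHERE sm.specie_id = s.id
              ORDER BY i.karma DESC
              LIMIT 1) AS image_id
      FROM specie s
      ORDER BY s.commonname
      LIMIT 24) AS ids
""").fetchall()
print(rows)  # [(1, 11), (2, 20)]
```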
It would be better if you could provide the table structures and indexes. I came up with this alternative; it would be nice if you could try it and tell me what happens (I am curious!):
SELECT t.*, imf.* FROM (
SELECT s.*, (SELECT id FROM image WHERE karma = MAX(i.karma) LIMIT 1) AS max_image_id
FROM image i
INNER JOIN specie_map smap ON smap.image_id = i.id
INNER JOIN specie s ON s.id = smap.specie_id
GROUP BY s.commonname
ORDER BY s.commonname ASC
LIMIT 24
) t INNER JOIN imagefile imf
ON t.max_image_id = imf.image_id AND imf.type = 'small'
I've got a users table and a votes table. The votes table stores votes toward other users, and, for better or worse, a single row in the votes table stores the votes in both directions between the two users.
Now, the problem is when I want to list, for example, all the people someone has voted on.
I'm no MySQL expert, but from what I've figured out, thanks to the OR condition in the join statement, it needs to look through the whole users table (currently 44,000+ rows), and it creates a temporary table to do so.
Currently, the below query takes about two minutes, yes, two minutes, to complete. If I remove the OR condition and everything after it in the join statement, it runs in less than half a second, as it only needs to look through about 17 of the 44,000 user rows (EXPLAIN ftw!).
In the example below, the user ID is 9834, and I'm trying to fetch his/her own "no" votes and join the info of the users who were voted on to the result.
Is there a better and faster way to do this query? Or should I restructure the tables? I seriously hope it can be fixed by modifying the query, because there are already a lot of users (44,000+) and votes (130,000+) in the tables, which I'd have to migrate.
thanks :)
SELECT *, votes.id as vote_id
FROM `votes`
LEFT JOIN users ON (
(
votes.user_id_1 = 9834
AND
users.uid = votes.user_id_2
)
OR
(
votes.user_id_2 = 9834
AND
users.uid = votes.user_id_1
)
)
WHERE (
(
votes.user_id_1 = 9834
AND
votes.vote_1 = 0
)
OR
(
votes.user_id_2 = 9834
AND
votes.vote_2 = 0
)
)
ORDER BY votes.updated_at DESC
LIMIT 0, 10
Instead of the OR, you could do a UNION of 2 queries. I have known instances where this is an order of magnitude faster in at least one other DBMS, and I'm guessing MySQL's query optimizer may share the same "feature".
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_1 = u.uid
WHERE v.user_id_2 = 9834
AND v.vote_2 = 0
UNION
SELECT whatever
FROM votes v
INNER JOIN
users u
ON v.user_id_2 = u.uid
WHERE v.user_id_1 = 9834
AND v.vote_1 = 0
ORDER BY updated_at DESC
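A quick runnable check (SQLite via Python's sqlite3, made-up data; vote_1/vote_2 hold each user's vote on the other) that the UNION form returns the same rows as the OR form:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, user_id_1 INTEGER, user_id_2 INTEGER,
                    vote_1 INTEGER, vote_2 INTEGER, updated_at TEXT);
INSERT INTO users VALUES (9834, 'me'), (1, 'alice'), (2, 'bob');
INSERT INTO votes VALUES
  (100, 9834, 1, 0, 1, '2020-01-02'),  -- 9834 voted "no" on alice
  (101, 2, 9834, 1, 0, '2020-01-03');  -- 9834 voted "no" on bob
""")

or_q = """
SELECT u.uid FROM votes v JOIN users u
  ON (v.user_id_1 = 9834 AND u.uid = v.user_id_2)
  OR (v.user_id_2 = 9834 AND u.uid = v.user_id_1)
WHERE (v.user_id_1 = 9834 AND v.vote_1 = 0)
   OR (v.user_id_2 = 9834 AND v.vote_2 = 0)
"""
union_q = """
SELECT u.uid FROM votes v JOIN users u ON v.user_id_2 = u.uid
WHERE v.user_id_1 = 9834 AND v.vote_1 = 0
UNION
SELECT u.uid FROM votes v JOIN users u ON v.user_id_1 = u.uid
WHERE v.user_id_2 = 9834 AND v.vote_2 = 0
"""
print(sorted(con.execute(or_q).fetchall())
      == sorted(con.execute(union_q).fetchall()))  # True
```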
You've answered your own question: yes, you should redesign the table, as it's not working for you. It's too slow and requires overly complicated queries. Fortunately, migrating the data is just a matter of running essentially the query you're asking about here, but for all users instead of just one. (That is, a sum or count over the UNION the first answer suggested.)