Hi, I know there are a lot of topics dedicated to query optimizing strategies, but this one is so specific I couldn't find the answer anywhere on the internet.
I have a large table of products in an e-shop (approx. 180k rows) and the table has 65 columns. Yeah, I know it's quite a lot, but I store information about books, DVDs, Blu-rays and games in there.
Still, I don't pull many columns into the query, but the select is still quite tricky. There are many conditions that need to be considered and compared. Query below:
SELECT *
FROM products
WHERE production = 1
AND publish_on < '2012-10-23 11:10:06'
AND publish_off > '2012-10-23 11:10:06'
AND price_vat > '0.5'
AND ean <> ''
AND publisher LIKE '%Johnny Cash%'
ORDER BY bought DESC, datec DESC, quantity_storage1 DESC, quantity_storege2 DESC, quantity_storage3 DESC
LIMIT 0, 20
I have already tried adding indexes one by one on the columns in the WHERE clause and even in the ORDER BY clause, then I tried to create a compound index on (production, publish_on, publish_off, price_vat, ean).
The query is still slow (a couple of seconds) and it needs to be fast, since this is an e-shop and people leave when they don't get their results quickly. And I am still not counting the time I need to count all the matching rows so I can do paging.
I mean, the best way to make it quick would be to simplify the query, but all the conditions and the sorting are a must in this case.
Can anyone help with this kind of issue? Is it even possible to speed this kind of query up, or is there any other way, for example simplifying the query and leaving the rest to PHP to sort the results?
Oh, I'm really clueless on this one. Share your wisdom, people, please...
Many thanks in advance
First of all, be sure what you want to select and get rid of the '*'. Replace
Select * from
with something more specific:
Select id, name, ....
There is no JOIN or anything else in your query, so the options for speeding it up are quite limited, I think.
Check that your MySQL server can use enough memory.
Have a look at these configs in your my.cnf:
key_buffer_size = 384M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 64M
Have a look at the max allowed concurrency. MySQL recommends CPUs * 2:
thread_concurrency = 4
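To check what the server is actually running with, you can query the variables at runtime, for example:
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'query_cache_size';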
You should really think about splitting the table depending on the information you use and on standard normalization, if possible.
If it's a production system with no way to split the tables, then think about a caching server. But this will only help if you have a lot of recurring queries that are the same.
This is what I would do when knowing nothing about the underlying implementation or the system at all.
Edit:
Indexing as many columns as you can won't necessarily speed up your system. More indexes ≠ more speed.
Thanks to all of you for the good remarks.
I probably found the solution, since I was able to reduce the query time from 2.8 s down to 0.3 s.
SOLUTION:
Using SELECT * is really naive on large tables (65 columns), so I realized I only need 25 of them on the listing page - the others can easily be fetched on the product page itself.
I also reindexed my table a little. I created a compound index on
production, publish_on, publish_off, price_vat, ean
then I created another one specifically for search, including the columns
title, publisher, author
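For reference, index definitions like the ones described might look roughly like this (the index names are just illustrative):
CREATE INDEX idx_products_listing ON products (production, publish_on, publish_off, price_vat, ean);
CREATE INDEX idx_products_search ON products (title, publisher, author);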
The last thing I did was to use a query like
SELECT SQL_CALC_FOUND_ROWS ID, title, alias, url, type, preorder, subdescription,....
which allowed me to calculate the number of matched rows more quickly using
mysql_result(mysql_query("SELECT FOUND_ROWS()"), 0)
after mysql_query(). However, I cannot understand how it could be quicker; even though EXPLAIN EXTENDED says the query is not using any index, it's still about 0.5 s quicker than counting the rows with a separate query.
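Sketched out, the whole pattern looks roughly like this (the column list and WHERE conditions are shortened and illustrative):
SELECT SQL_CALC_FOUND_ROWS ID, title, alias, url, type
FROM products
WHERE production = 1
  AND publish_on < NOW()
  AND publish_off > NOW()
  AND price_vat > 0.5
ORDER BY bought DESC, datec DESC
LIMIT 0, 20;
-- total number of rows the WHERE clause matched, ignoring the LIMIT
SELECT FOUND_ROWS();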
It seems to be working rather well. If the ORDER BY clause wasn't there, it would be wickedly quick, but that's something I have no influence over.
Still need to check my server settings...
Thank y'all for all your help..
Related
I'm not very knowledgeable about databases. I would want to retrieve, say the "newest" 10 rows with owner ID matching something, and then perhaps paginate to retrieve the next "newest" 10 rows with that owner, and so on. But say I'm adding more and more rows into a database table -- at some point, would such a query become unbearably slow, or are databases generally good enough that this won't be a worry?
I imagine it would be an issue because to get the "newest" 10 rows you'd have to order by date, which is O(n log n). With this assumption, I sought a possible solution from SQL Server SELECT LAST N Rows.
It pointed me to http://www.sqlservercurry.com/2009/02/retrieve-last-n-rows-based-on-condition.html where I found that there is a PARTITION BY option for a query. I imagine this means first selecting all the rows that match the owner ID, and THEN ordering them, which would be significantly faster, and fast enough to not worry about for most applications. Is this the correct understanding?
Otherwise, is there some better way to get the "newest" N rows ( seems to suggest it is)?
I'm developing the app in Django if anyone knows a convenient way, but otherwise Django also allows raw database queries.
Okay, if you are using Django, then you don't have to worry about the DB's complexity. The ORM is here to resolve your worries.
Simple fact: Django uses lazy queries, so it will reduce your DB hits and improve system performance.
So, for the first part of your question, you can simply run this query:
queryset = YourModel.objects.filter(**lookup_condition).order_by('-id')  # newest first
It will get a queryset with the objects of that model class which match the condition. For details, check this: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.filter
And to paginate over it, run like this:
first_ten_values = queryset[0:10]
second_ten_values = queryset[10:20]
...
I am building a forum and I am looking for the proper way to build a search feature that finds users by their name or by the title of their posts. What I have come up with is this:
SELECT users.id, users.user_name, users.user_picture
FROM users, subject1, subject2
WHERE users.id = subject1.user_id
AND users.id = subject2.user_id
AND (users.user_name LIKE '%{$keywords}%'
OR subject1.title1 LIKE '%{$keywords}%'
OR subject2.title2 LIKE '%{$keywords}%')
ORDER BY users.user_name ASC
LIMIT 10
OFFSET {$offset}
The LIMIT and the OFFSET is for pagination. My question is, would doing a LIKE search through multiple tables greatly slow down performance when the number of rows reach a significant amount?
I have a few alternatives:
One, perhaps I can rewrite that query to have the LIKE searches done inside a subquery that only returns indexed user_ids. Then, I would find the remaining user information based on that. Would that increase performance by much?
Second, I suppose I can have the $keyword string appear before the first wildcard as in LIKE {$keyword}%. This way, I can index the user_name, title1, and title2 columns. However, since I will be trading accuracy for speed here, how much of a difference in performance would this make? Will it be worth sacrificing this much accuracy to index these columns?
Third, perhaps I can give users 3 search fields to choose from, and have each search through only one table. Would this increase performance by much?
Lastly, should I consider using a FULLTEXT search instead of LIKE? What are the performance differences between the two? Also, my tables are using the InnoDB storage engine, and I am not able to use the FULLTEXT index unless I switch to MyISAM. Will there be any major differences in switching to MyISAM?
Pagination is another performance issue I am worried about, because in order to do pagination, I would need to find the total number of results the query returns. At the moment, I am basically doing the query I just mentioned TWICE because the first time it is used only to COUNT the results.
There are two things in your query that will prevent MySQL from using indexes. Firstly, your patterns start with a wildcard (%), and MySQL can't use indexes to search for patterns that start with a wildcard. Secondly, you have OR in your WHERE clause; you need to rewrite your query using UNION to avoid the OR, which also prevents MySQL from using indexes. Without an index, MySQL needs to do a full table scan every time, and the time needed for that will increase linearly as the number of rows in your table grows. So yes, as you put it, "it would greatly slow down performance when the number of rows reach a significant amount", and I'd say your only real scalable option is to use FULLTEXT search.
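As a rough sketch, the UNION rewrite could look something like this (it only pays off once the leading wildcard is dropped so the patterns become prefix searches; 'foo%' stands in for the keyword):
SELECT u.id, u.user_name, u.user_picture
FROM users u
WHERE u.user_name LIKE 'foo%'
UNION
SELECT u.id, u.user_name, u.user_picture
FROM users u
JOIN subject1 s1 ON s1.user_id = u.id
WHERE s1.title1 LIKE 'foo%'
UNION
SELECT u.id, u.user_name, u.user_picture
FROM users u
JOIN subject2 s2 ON s2.user_id = u.id
WHERE s2.title2 LIKE 'foo%'
ORDER BY user_name ASC
LIMIT 10;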
Most of your questions are explained here: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
InnoDB/fulltext indexing is announced for MySQL 5.6, but that will probably not help you right now.
How about starting with EXPLAIN <select-statement>? http://dev.mysql.com/doc/refman/5.6/en/explain.html
Switching to MyISAM should work seamlessly. The only downside is that MyISAM locks the whole table on inserts/updates, which can slow down tables with many more inserts than selects. Basically, a rule of thumb in my opinion is to use MyISAM when you don't need foreign keys and the table has far more selects than inserts, and to use InnoDB when the table has far more inserts/updates than selects (e.g. a statistics table).
In your case I guess switching to MyISAM is the better choice as a fulltext index is way more powerful and faster.
It also gives you the possibility to use certain query modifiers, like excluding words ("cat -dog") or similar. But keep in mind that it's no longer possible to look for words ending with a phrase, as you can with a LIKE search ("*bar"). "foo*" will work though.
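For illustration, a boolean-mode fulltext search could look roughly like this (the index name is made up, and the column comes from the question):
-- MyISAM fulltext index on the post titles
ALTER TABLE subject1 ADD FULLTEXT INDEX ft_title1 (title1);
-- rows containing "cat" but not "dog"
SELECT user_id, title1
FROM subject1
WHERE MATCH(title1) AGAINST('+cat -dog' IN BOOLEAN MODE);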
1. So if I search for the word "ball" inside the toys table where I have 5,000,000 entries, does it search through all 5 million?
I think the answer is yes, because how else would it know, but please let me know.
2. If yes: if I need more information from that table, isn't it more logical to query just once and work with the results?
An example
I have this table structure for example:
id | toy_name | state
Now I should query like this
mysql_query("SELECT * FROM toys WHERE STATE = 1");
But isn't it more logical to query the whole table
mysql_query("SELECT * FROM toys"); and then do this if($query['state'] == 1)?
3. And something else, if I put an ORDER BY id LIMIT 5 in the mysql_query will it search for the 5 million entries or just the last 5?
Thanks for the answers.
Yes, unless you have a LIMIT clause it will look through all the rows. It will do a table scan unless it can use an index.
You should use a query with a WHERE clause here, not filter the results in PHP. Your RDBMS is designed to be able to do this kind of thing efficiently. Only when you need to do complex processing of the data is it more appropriate to load a resultset into PHP and do it there.
With the LIMIT 5, the RDBMS will look through the table until it has found you your five rows, and then it will stop looking. So, all I can say for sure is, it will look at between 5 and 5 million rows!
Read this about indexes :-)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
It makes it uber-fast :-)
A full table scan happens only if there are no matching indexes, and it is indeed a very slow operation.
Sorting is also accelerated by indexes.
And for #2 - this is slow because the transfer rate from MySQL -> PHP is slow, and MySQL is MUCH faster at doing the filtering.
For your #1 question: Depends on how you're searching for 'ball'. If there's no index on the column(s) where you're searching, then the entire table has to be read. If there is an index, then...
WHERE field LIKE 'ball%' will use an index
WHERE field LIKE '%ball%' will NOT use an index
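One quick way to see the difference is to compare the EXPLAIN output for both patterns (assuming an index exists on the searched column; the index name here is made up):
CREATE INDEX idx_toy_name ON toys (toy_name);
EXPLAIN SELECT * FROM toys WHERE toy_name LIKE 'ball%';  -- range scan over idx_toy_name
EXPLAIN SELECT * FROM toys WHERE toy_name LIKE '%ball%'; -- full table scan (type: ALL)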
For your #2, think of it this way: Doing SELECT * FROM table and then perusing the results in your application is exactly the same as going to the local super walmart, loading the store's complete inventory into your car, driving it home, picking through every box/package, and throwing out everything except the pack of gum from the impulse buy rack by the front till that you'd wanted in the first place. The whole point of a database is to make it easy to search for data and filter by any kind of clause you could think of. By slurping everything across to your application and doing the filtering there, you've reduced that shiny database to a very expensive disk interface, and would probably be better off storing things in flat files. That's why there's WHERE clauses. "SELECT something FROM store WHERE type=pack_of_gum" gets you just the gum, and doesn't force you to truck home a few thousand bottles of shampoo and bags of kitty litter.
For your #3, yes. If you have an ORDER BY clause in a LIMIT query, the result set has to be sorted before the database can figure out what those 5 records should be. While it's not quite as bad as actually transferring the entire record set to your app and only picking out the first five records, it still involves a bit more work than just retrieving the first 5 records that match your WHERE clause.
Right to business.
I have an activity feed which gets all the different kinds of activity from different parts of my site, sorts them all by means of a UNION and an ORDER BY, and then uses a LIMIT to get the top 25, and then displays the activity.
My fellow programmer says that we will run into problems when we have more rows (currently we have 800 and it's fine).
So the question.
Will the UNION cause slow down later down the line?
If so should we
a) Try and put the activity into a new table and then query that.
b) Try some sort of view? (If so, could anyone explain how? I'm not too sure how!)
c) Other...
Thanks for your help.
Richard
Why not just limit each of the individual queries to 25 as well? That way you could restrict the number of rows that come back before they are unioned and limited for the final list.
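A rough sketch of that idea (the table and column names are made up for illustration):
-- each branch is limited before the UNION, so only 25 rows per source ever reach the final sort
(SELECT id, created_at, 'comment' AS activity_type FROM comments ORDER BY created_at DESC LIMIT 25)
UNION ALL
(SELECT id, created_at, 'photo' AS activity_type FROM photos ORDER BY created_at DESC LIMIT 25)
UNION ALL
(SELECT id, created_at, 'post' AS activity_type FROM posts ORDER BY created_at DESC LIMIT 25)
ORDER BY created_at DESC
LIMIT 25;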
This is a tricky one as it depends on a lot of variables, such as how busy your tables are etc.
I would imagine the union would slow you down later, as the method you are using is basically doing a union on entire tables so these need to be read into memory before the ordering and the limiting of the number of rows is applied. Your query will get slower as the amount of data increases.
If all the data in the tables is important, then the best you can do is try to ensure you index the tables as well as you can, so at least your ordering runs fast. If some of the data in the tables gets old or stale and you aren't too interested in it, then you might have scope to read just the rows you need into a temp table. This can then be ordered etc.
I think the best way might be to do a combination of the two, so:
a) yes, index
b) then do the LIMIT 25 on each of the sub-queries
c) do a WHERE added_date >= date on each of the queries so that we have the correct date order
Hmm, but then that presents problems as to what dates to go for. If we go to page x, which dates do I get?
This is turning into a problem and quite a big one. The size of data we have is going to be quite big.
Thanks for your help.
Richard
Decided just to make a log and do it that way.
Thanks as always for your help.
Richard
We recently had an issue I'd never seen before, where, for about 3 hours, one of our Mysql tables got extremely slow. This table holds forum posts, and currently has about one million rows in it. The query that became slow was a very common one in our application:
SELECT * FROM `posts` WHERE (`posts`.forum_id = 1) ORDER BY posts.created_at DESC LIMIT 1;
We have an index on the posts table on (forum_id, created_at), which normally allows this query and sort to happen in memory. But during these three hours, not so much. What is normally an instantaneous query took anywhere from 2 seconds to 45 seconds during this time period. Then it went back to normal.
I've pored through our slow query log and nothing else looks out of the ordinary. I've looked at New Relic (this is a Rails app) and all other actions ran at essentially the same speed as normal. We didn't have an unusual number of message posts today. I can't find anything else weird in our logs. And the database wasn't swapping; it still had gigs of memory available to use.
I'm wondering if MySQL could change its mind back and forth about which indexes to use for a given query, and for whatever reason it started deciding to do a full table scan on this query for a few hours today? But if that were true, why would it have stopped doing the full table scans?
Has anyone else encountered an intermittently slow query that defied reason? Or do you have any creative ideas about how one might go about debugging a problem like this?
I'd try the MySQL EXPLAIN statement...
EXPLAIN SELECT * FROM `posts` WHERE (`posts`.forum_id = 1) ORDER BY posts.created_at DESC LIMIT 1;
It may be worth checking the MySQL response time in your Rails code, and if it exceeds a threshold then run the EXPLAIN and log the details somewhere.
Table locking also springs to mind - is the posts table updated by a cronjob or hefty query while SELECTs are going on?
Hope that helps a bit!
On a site I work on, we recently switched to InnoDB from MyISAM, and we found that some simple select queries which had both WHERE and ORDER BY clauses were using the index for the ORDER BY clause, resulting in a table scan to find the few desired rows (but, heck, they didn't need to be sorted when it finally found them all!)
As noted in the linked article, if you have a small LIMIT value, your ORDER BY clause is the first member of the primary key (so the data on file is ordered by it), and there are many results that match your WHERE clause, using that ORDER BY index isn't a bad idea for MySQL. However, I presume created_at is not the first member of your primary key, so it's not a particularly smart idea in this case.
I don't know why MySQL would switch indexes if you haven't changed anything, but I'd suggest you try running ANALYZE TABLE on the relevant table. You might also change the query to remove the LIMIT and ORDER BY clauses and sort at the application level, provided the result set is small enough; or you could add a USE INDEX hint so it never guesses wrong.
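For illustration, those two suggestions might look like this (the index name is an assumption; check SHOW INDEX FROM posts for the real one):
-- refresh the index statistics the optimizer uses to pick a plan
ANALYZE TABLE posts;
-- pin the query to the composite index so the optimizer can't pick a worse plan
SELECT * FROM posts USE INDEX (index_posts_on_forum_id_and_created_at)
WHERE forum_id = 1
ORDER BY created_at DESC
LIMIT 1;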
You could also change the wait_timeout value to something smaller so that these queries that use a bad index simply never complete (but don't lag all of the legitimate queries too). You will still be able to run long queries interactively, even with a small wait_timeout, since there is a separate configuration parameter for that.