I have a MySQL question:
Can anybody tell me how to measure whether an IN() clause is becoming nonperformant?
So far I have a table which holds about 5,000 rows, and the IN() will check up to 100 IDs. It may grow to 50,000 in the next two years.
Thanks
NOTE
By nonperformant I mean ineffective, slow, performing badly, ...
UPDATE
It's a decision-making problem, so the EXPLAIN command in MySQL does not answer my question. When the performance is bad, I can see that myself. But I want to know beforehand, before I start designing in a way which might be wrong...
UPDATE
I am looking for a general-purpose measuring technique.
You would use the EXPLAIN statement to check how the query is being executed. It displays information from the optimizer about the query execution plan, how it would process the statement, and how tables are joined and in which order.
There are many times that a JOIN can be used in place of an IN, which should yield better performance. Additionally, indices make a significant difference in how fast the query runs.
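As a hedged illustration of swapping an IN for a JOIN: the tables orders and customers and their columns are hypothetical, not taken from the question. The two statements return the same rows only because customers.id is assumed to be the primary key, so the join cannot duplicate order rows. Newer MySQL versions may perform a similar rewrite on their own, so compare the EXPLAIN output of both forms before assuming a gain.

```sql
-- Hypothetical schema: orders(id, customer_id, ...), customers(id PRIMARY KEY, country, ...)

-- IN with a subquery
SELECT o.*
FROM orders AS o
WHERE o.customer_id IN (
    SELECT c.id
    FROM customers AS c
    WHERE c.country = 'DE'
);

-- JOIN form (same rows, because customers.id is assumed unique)
SELECT o.*
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE c.country = 'DE';
```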
We would need to see your query and an EXPLAIN at the very least.
You can use the MySQL EXPLAIN statement to get the query plan. Just put EXPLAIN in front of your SELECT and see what it says. You will need to learn how to read it, but it is very helpful in identifying whether a query is as fast as you would expect.
MySQL also does not have the best query optimizer. In my experience it is sometimes faster to run 100 simple and fast queries than to run one complicated join. This is a rare case, but I have gotten performance increases from it.
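For the IN() case from the question, a minimal sketch might look like this. The table items and its columns are made up for illustration.

```sql
-- Hypothetical table: items(id PRIMARY KEY, name)
EXPLAIN
SELECT id, name
FROM items
WHERE id IN (3, 17, 42, 99);
```

If id is indexed, one would typically expect the type column to show range and the key column to name that index, with rows close to the number of listed IDs. A type of ALL would indicate a full table scan.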
Related
My queries are running slow, with some indexes chosen over others. I am trying to find a tool or a guide with which I can figure out why MySQL preferred one index, or one table in the case of joins, over the others, so that I can fine-tune the index or the query.
Till now, I haven't come across an article which explains it in detail or a tool which can provide me with the details of it.
Any inputs will be appreciated. Thanks a ton in advance!
As the Optimizer gets more sophisticated, it gets harder to understand what it is doing. The latest improvements involve "cost-based" analysis of possible execution methods. For many queries, it is obvious that one index would be better than another.
There are 4 views into what is going on:
EXPLAIN is quite limited. It does not handle LIMIT very well, nor does it necessarily say which step uses the filesort, or even if there are multiple filesorts. In general, the "Rows" column is useful, but in certain situations, it is useless. Simple rule: A big number is a bad sign.
EXPLAIN EXTENDED + SHOW WARNINGS; provides the rewritten version of the query. This does not do a lot. It does give clues of the distinction between ON and WHERE in JOINs.
EXPLAIN FORMAT=JSON provides more detail on the cost-based analysis and spells out various steps, including filesorts.
"Optimizer trace" goes a bit further. (It is rather tedious to read.)
As for a "visualization", no. Anyway, EXPLAIN and its friends only work with what they have. That is, they do not give clues of "what if you added INDEX(a,b)". That is what is really needed. Nor do they effectively point out that you should not "hide an indexed column in a function call". Example: WHERE DATE(dt) = '2019-01-23'. Note that some 'operators' are effectively function calls.
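A sketch of how the DATE(dt) example can be rewritten so the column is no longer hidden. The table events and its INDEX(dt) are assumptions for illustration.

```sql
-- Hypothetical table: events(id, dt DATETIME, ...) with INDEX(dt)

-- The function call hides dt, so INDEX(dt) cannot be used to narrow the search
SELECT * FROM events WHERE DATE(dt) = '2019-01-23';

-- Same rows, but dt is compared directly, so a range scan on INDEX(dt) is possible
SELECT * FROM events
WHERE dt >= '2019-01-23'
  AND dt <  '2019-01-23' + INTERVAL 1 DAY;
```

Running EXPLAIN on both forms should make the difference visible in the type and key columns.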
I've seen a few "graphical Explains", but they seem to be nothing more than boxing up the rows of EXPLAIN and drawing lines between them.
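The four views listed earlier can be tried roughly as follows. The orders table and its columns are hypothetical. Note that EXPLAIN EXTENDED was deprecated in MySQL 5.7 and removed in 8.0, where a plain EXPLAIN followed by SHOW WARNINGS gives the rewritten query.

```sql
-- Hypothetical table: orders(id, customer_id, created_at, ...)

-- 1. Plain EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;

-- 2. Rewritten query text (run right after the EXPLAIN above;
--    on 5.6 and older, use EXPLAIN EXTENDED in step 1)
SHOW WARNINGS;

-- 3. Cost details and explicit filesort steps
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;

-- 4. Optimizer trace for one statement
SET optimizer_trace = 'enabled=on';
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```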
I have been struggling with these problems for many years, and have written a partial answer -- namely a "cookbook". It approaches indexing from the other direction -- explaining what index to add for a given SELECT. Unfortunately, it only works for simpler queries. http://mysql.rjweb.org/doc.php/index_cookbook_mysql
I work on a lot of performance questions on this forum in hopes of getting more insight into what to add to the cookbook next. For now, you can help me by posting your tough query, together with EXPLAIN SELECT and SHOW CREATE TABLE(s).
Some random comments:
"Index merge intersection" is perhaps never as good as a composite index.
"Index merge union" is almost never used. You might be able to turn OR into UNION effectively.
The newer Optimizer creates an index on the fly for "derived tables" (JOIN ( SELECT ... )). But this is rarely as efficient as rewriting the query to avoid such a subquery when it returns lots of rows. (Again, none of the EXPLAINs will point you this way.)
Something often forgotten about (but does show up as an unexplained large "Rows"): COLLATIONs must match.
A trick to make use of the PK clustering: PRIMARY KEY(foo, id), INDEX(id)
Nothing (except experience) says how nearly useless "prefix" indexing is (INDEX(bar(10))).
FORCE INDEX is handy for experimenting, but almost always a bad idea for production.
In a SELECT with a JOIN, and a WHERE that mentions only one of the tables, the Optimizer will usually pick the table mentioned in WHERE as the first table. Then it will do a "Nested Loop Join" into the other table(s).
(I should add some of this to my Cookbook. 'Stay tuned'. Update: Done.)
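Two of the comments above, sketched under assumptions: the OR to UNION rewrite and the PK clustering trick. The tables t and things and all column names are hypothetical.

```sql
-- Hypothetical table: t(id PRIMARY KEY, a, b, ...) with INDEX(a) and INDEX(b)

-- OR across two different columns
SELECT id FROM t WHERE a = 1 OR b = 2;

-- UNION form: each branch can use its own index;
-- UNION (DISTINCT) removes rows that match both branches
SELECT id FROM t WHERE a = 1
UNION
SELECT id FROM t WHERE b = 2;

-- PK clustering trick (InnoDB): rows with the same foo are stored together
CREATE TABLE things (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    foo INT UNSIGNED NOT NULL,
    bar VARCHAR(100),
    PRIMARY KEY (foo, id),
    INDEX (id)   -- keeps AUTO_INCREMENT valid and allows lookups by id alone
) ENGINE=InnoDB;
```

Whether the UNION form is actually faster depends on the data, so it is worth timing both.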
My thinking is that if I put the ANDs that filter out a greater number of rows before those that filter out just a few, my query should run quicker, since the selection set passed from one AND condition to the next is much smaller.
But does the order of the ANDs in the WHERE clause of an SQL statement really affect the performance that much, or are the engines already optimized for this?
It really depends on the optimiser.
It shouldn't matter because it's the optimiser's job to figure out the optimal way to run your query regardless of how you describe it.
In practice, no optimiser is perfect so you might find that re-ordering the clauses does make a difference to particular queries. The only way to know for sure is to test it yourself with your own schema, data etc.
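One way to run such a test in MySQL, sketched with a hypothetical orders table, is to compare the session Handler counters for both orderings. FLUSH STATUS resets the session counters and requires the RELOAD privilege.

```sql
-- Hypothetical table: orders(id, customer_id, status, ...)

FLUSH STATUS;
SELECT COUNT(*) FROM orders WHERE status = 'shipped' AND customer_id = 42;
SHOW SESSION STATUS LIKE 'Handler_read%';

FLUSH STATUS;
SELECT COUNT(*) FROM orders WHERE customer_id = 42 AND status = 'shipped';
SHOW SESSION STATUS LIKE 'Handler_read%';
```

If the Handler_read% counters match for both orderings, the engine touched the same number of rows either way, and the order of the conditions made no difference to the amount of work done.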
Most SQL engines are optimized to do this work for you. However, I have found situations in which trying to carve down the largest table first can make a big difference, and it doesn't hurt!
A lot depends how the indices are set up. If an index exists which combines the two keys, the optimizer should be able to answer the query with a single index search. Otherwise if independent indices exist for both keys, the optimizer may get a list of the records satisfying each key and merge the lists. If an index exists for one condition but not the other, the optimizer should filter using the indexed list first. In any of those scenarios, it shouldn't matter what order the conditions are listed.
If none of those scenarios applies (that is, no usable index exists), the order in which the conditions are specified may affect the order of evaluation, but since the database is going to have to fetch every single record to satisfy the query, the time spent fetching will likely dwarf the time spent evaluating the conditions.
I was wondering if there is a performance gain between a SELECT query with a not very specific WHERE clause and another SELECT query with a more specific WHERE clause.
For instance is the query:
SELECT * FROM table1 WHERE first_name='Georges';
slower than this one:
SELECT * FROM table1 WHERE first_name='Georges' AND nickname='Gigi';
In other words, is there a time factor that is linked to the precision of the WHERE clause?
I'm not sure I am being very clear, or even whether my question takes into account all the components that are involved in a database query (MySQL in my case).
My question is related to the Django framework because I would like to cache an evaluated queryset, and on a next request, take back this cached-evaluated queryset, filter it more, and evaluate it again.
There is no hard and fast rule about this.
There can be either an increase or decrease in performance by adding more conditions to the WHERE clause, as it depends on, among other things, the:
indexing
schema
data quantity
data cardinality
statistics
intelligence of the query engine
You need to test with your data set and determine what will perform the best.
The MySQL server must compare all columns in your WHERE clause (if they are all joined by AND).
So if you don't have any index on the column nickname, the second query will be slightly slower.
Here you can read how column indexes work (with examples similar to your question): http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
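A sketch of how this could be checked on the question's table1. The index names are made up, and the statements assume no index exists on these columns yet.

```sql
-- table1, first_name and nickname come from the question; index names are hypothetical

-- With an index on first_name only, the second query finds the 'Georges' rows
-- through the index and then checks nickname row by row
ALTER TABLE table1 ADD INDEX idx_first_name (first_name);
EXPLAIN SELECT * FROM table1 WHERE first_name = 'Georges';
EXPLAIN SELECT * FROM table1 WHERE first_name = 'Georges' AND nickname = 'Gigi';

-- With a composite index, both conditions can be resolved inside the index
ALTER TABLE table1 ADD INDEX idx_first_nick (first_name, nickname);
EXPLAIN SELECT * FROM table1 WHERE first_name = 'Georges' AND nickname = 'Gigi';
```

Comparing the key and rows columns across the three EXPLAIN outputs should show which index is picked and how many rows MySQL expects to examine in each case.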
I think it is difficult to answer this question; too many aspects (e.g. indexes) are involved. I would say that the first query is faster than the second one, but I can't say for sure.
If this is crucial for you, why don't you run a simulation (e.g. run 1,000,000 queries) and check the time?
Yes, it can be slower. It will all depend on indexes you have and data distribution.
Check the link Understanding the Query Execution Plan for information on how to know what MySQL is going to do when executing your query.
I ran the same query on a number of tables (containing different numbers of records):
SELECT * FROM `tblTest`
ORDER BY `tblTest`.`DateAccess` DESC;
Why do the first queries behave erratically (taking longer than the second, third, ...)?
I calculated the average of the second, third and fourth queries, excluding the first query.
So for example, in a table with 1,000,000 records, the first run takes 4.8410 s to process and the second only 0.8940 s. Why is this happening?
P.S. I use the phpMyAdmin tool.
DBMSs are really smart applications and maintain multiple catalogues to optimize their execution. When a query is run, it generates many entries in these catalogues, and depending on the DBMS used, the catalogues become better tuned over time; some systems can even automatically generate an index to optimize very frequently used queries. They all also have what is called a query optimizer, which analyzes the query in order to choose an optimized execution plan.
In your specific case, you should look at query and result caching; the following articles should help you understand how MySQL natively tries to optimize query processing.
http://dev.mysql.com/doc/refman/5.5/en/query-cache.html
http://www.cyberciti.biz/tips/enable-the-query-cache-in-mysql-to-improve-performance.html
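A hedged sketch for checking whether caching explains the difference between the first and later runs. The query cache was removed in MySQL 8.0, so the first statement should return no rows there. A further possibility, not stated in the answer above, is that the table's data pages are simply held in memory after the first read; the last statement assumes tblTest is an InnoDB table.

```sql
-- Is the query cache present and enabled? (expected to be empty on MySQL 8.0)
SHOW VARIABLES LIKE 'query_cache%';

-- On versions that have the query cache, bypass it for this one statement
SELECT SQL_NO_CACHE * FROM `tblTest`
ORDER BY `tblTest`.`DateAccess` DESC;

-- Reads served from disk vs. from memory (assumes InnoDB):
-- compare Innodb_buffer_pool_reads before and after each run
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```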
Here is a comparison between Oracle, MySQL and Postgres (not a new article, but it will give you a basic idea of how different DBMSs handle complex queries on large databases):
http://dcdbappl1.cern.ch:8080/dcdb/archive/ttraczyk/db_compare/db_compare.html#Query+optimization
Cheers,
I am sending queries to a very large database (meaning many entities/tables).
So I have some queries which include some 7 to 8 joins.
The problem is that I do not know how many entries the tables will have in the near future. It could be between 1,000 and 100,000 rows per table (or even more).
I think about splitting my queries to perform two or three queries consecutively instead of one mega-query.
Is there a common/recommended limit on the number of JOINs in a MySQL query?
How can I measure/calculate which type of splitting would be a good variant (depending on the row counts of the tables, and so on)?
I have many JOINs on the same field (foreign key) of the same table. Is there a way to optimize that as well? (One row in that table has many relations/connections.)
thanks ;)
UPDATE:
I saw it too late. Somebody was so nice and changed the title of the question.
Because of my bad English I wrote performant, meaning having good performance. I didn't mean to perform!
Please consider this in your answers. Thanks again!
You probably want to learn about EXPLAIN, which will show you MySQL's plan for executing your query, e.g.:
EXPLAIN SELECT foo FROM bar NATURAL JOIN baz
will tell you how MySQL would execute the query SELECT foo FROM bar NATURAL JOIN baz
From the EXPLAIN results you may see opportunities to add indexes to the database that will help your queries if they're slow, and in some cases you may be able to add hints to the query, e.g. telling MySQL to prefer one index over another, if you have the expertise to know that.
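A minimal sketch of such a hint, reusing the bar and baz tables from the example. The index idx_a and the columns a and baz_id are assumptions for illustration.

```sql
-- Hypothetical: bar has an index idx_a on column a, and bar.baz_id refers to baz.id
EXPLAIN
SELECT bar.foo
FROM bar USE INDEX (idx_a)
JOIN baz ON baz.id = bar.baz_id
WHERE bar.a = 10;
```

Comparing this EXPLAIN output with the one for the same query without the hint shows whether the hint changes the plan at all.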
In general you will gain nothing from trying to "split up" a query unless your "splitting up" actually completely changes the semantics of what will need to be executed. e.g. if your query is fetching six unrelated things from the database, and you re-write this as six separate queries each fetching one thing, the aggregate time taken to execute will probably be no better (and may be much worse) for your separate queries.
Use 'DESC <query>;' (in MySQL, DESC is a synonym for EXPLAIN) to get a sense of how MySQL will treat your query. You are generally better off having MySQL do the joining and optimizing than doing it yourself. That's what it's good at.
This will also tell you where indexing is working or needs to be augmented.