Does long query string affect the speed? - mysql

Suppose i have a long query string for eg.
SELECT id from users where collegeid='1' or collegeid='2' . . . collegeid='1000'
will it affect the speed or output in any way?
SELECT m.id,m.message,m.postby,m.tstamp,m.type,m.category,u.name,u.img
from messages m
join users u on m.postby=u.uid
where m.cid = '1' or m.cid = '1' . . . . . .
or m.cid = '1000'. . . .

I would prefer to use IN in this case as it would be better. However to check the performance you may try to look at the Execution Plan of the query which you are executing. You will get the idea about what performance difference you will get by using the both.
Something like this:
SELECT id from users where collegeid IN ('1','2','3'....,'1000')
According to the MYSQL
If all values are constants, they are evaluated according to the type
of expr and sorted. The search for the item then is done using a
binary search. This means IN is very quick if the IN value list
consists entirely of constants.
The number of values in the IN list is only limited by the
max_allowed_packet value.
You may also check IN vs OR in the SQL WHERE Clause and MYSQL OR vs IN performance
The answer given by Ergec is very useful:
SELECT * FROM item WHERE id = 1 OR id = 2 ... id = 10000
This query took 0.1239 seconds
SELECT * FROM item WHERE id IN (1,2,3,...10000)
This query took 0.0433 seconds
IN is 3 times faster than OR
will it affect the speed or output in any way?
So the answer is Yes the performance will be affected.

Obviously, there is no direct correlation between the length of a query string and its processing time (as some very short query can be tremendeously complex and vice versa). For your specific example: It depends on how the query is processed. This is something you can check by looking at the query execution plan (syntax depends on your DBMS, something like EXPLAIN PLAN). If the DBMS has to perform a full table scan, performance will only be affected slightly, since the DBMS has to visit all pages that make up the table anyhow. If there is an index on collegeid, performance will likely suffer more the more entries you put into your disjunction, since there will be several (though very fast) index lookups. At some point, there will we an full index scan instead of individual lookups, at which point performance will not degrade significantly anymore.
However - details depend ony our DBMS and its execution planner.

I' not sure you are facing what I suffered.
Actually, string length is not problem. How many values are in IN() is more important.
I've tested how many elements can be listed IN().
Result is 10,000 elements can be processed without performance loss.
Values in IN() should be stored in somewhere and searched while query evaluation. But 10k values is getting slower.
So if you have many 100k values, split 10 groups and try 10 times query. Or save in temp table and JOIN.
and long query uses more CPU, So IN() better than column = 1 OR ...

Related

Is it efficient to use IN clause for 1000 different values?

In my Java program, I need to select 1000 rows from a MySQL db with different WHERE condition for each row. Those rows must be retrieved with one single query. To do so, I have used IN clause as follows:
SELECT * FROM t WHERE (columnA,columnB) IN ((valA1,valB1),(valA2,valB2),(valA3,valB3),...)
which searches for rows where columnA=valA1 AND columnB=valB1 OR columnA=valA2 AND columnB=valB2 and so on.
Is this approach efficient to be applied for 1000 rows (means 1000 pairs of (valA,valB))? I have been looking for an efficient way for bulk select and this approach is what I have been able to find so far.
Is this efficient? That depends on factors that you haven't explained in the question.
But . . . if you have an index on (columnA, columnB) and all the values in the IN clause are constants, then MySQL should take advantage of the index. This is newish functionality available in MySQL 8+.
MySQL has another optimization for constant values in an IN list. It sorts them and uses a binary search. A table scan is still needed, but the binary search makes the comparisons much faster. I do not know if MySQL has expanded this to handle tuples as well as constants. Given the increased focus on tuple optimization, I would not be surprised if MySQL does this optimization as well.

How is it possible to have a good EXPLAIN and a slow query?

How is it possible to have a good plan in EXPLAIN like below and have a slow query. With few rows, using index, no filesort.
The query is running in 9s. The main table has around 500k rows.
When I had 250k rows in that table, the query was running in < 1s.
Suggestions plz?
Query (1. fields commented can be enabled according user choice. 2. Without FORCE INDEX I got 14s. 3. SQL_NO_CACHE I use to prevent false results):
SELECT SQL_NO_CACHE
p.property_id
, lct.loc_city_name_pt
, lc.loc_community_name_pt
, lc.loc_community_image_num_default
, lc.loc_community_gmap_longitude
, lc.loc_community_gmap_latitude
FROM property as p FORCE INDEX (ix_order_by_perf)
INNER JOIN loc_community lc
ON lc.loc_community_id = p.property_loc_community_id
INNER JOIN loc_city lct FORCE INDEX (loc_city_id)
ON lct.loc_city_id = lc.loc_community_loc_city_id
INNER JOIN property_attribute pa
ON pa.property_attribute_property_id = p.property_id
WHERE p.property_published = 1
AND (p.property_property_type_id = '1' AND p.property_property_type_sale_id = '1')
AND p.property_property_housing_id = '1'
-- AND p.property_loc_community_id = '36'
-- AND p.property_bedroom_id = '2'
-- AND p.property_price >= '50000' AND p.property_price <= '150000'
-- AND lct.loc_city_id = '1'
-- AND p.property_loc_subcommunity_id IN(7,8,12)
ORDER BY
p.property_featured DESC
, p.property_ranking_date DESC
, p.property_ranking_total DESC
LIMIT 0, 15
Query Profile
The resultset always outputs 15 rows. But the table property and property_attribute has around 500k rows.
Thanks all,
Armando Miani
This really seems to be an odditity in EXPLAIN in this case. This doesn't occur on MySQL 4.x, but it does on MySQL 5.x.
What MySQL is really showing you is that MySQL is trying to use the forced index ix_order_by_perf for sorting the rows, and it's showing you 15 rows because you have LIMIT 15.
However, the WHERE clause is still scanning all 500K rows since it can't utilize an index for the criteria in your WHERE clause. If it were able to use the index for finding the required rows, you would see the forced index listed in the 'possible_keys' field.
You can prove this by keeping the FORCE INDEX clause and removing the ORDER BY clause. You'll see that MySQL now won't use any indexes, even the one you're forcing (because the index doesn't work for this purpose).
Try adding property_property_type_id, property_property_type_sale_id, property_property_housing_id, and any other columns that you refer to in your WHERE clause to the beginning of the index.
There's a moment when your query will be optimized around a model which might not be anymore valid for a given need.
A plan could be great but, even if the filters you are using in the where clause respect indexes definitions, it doesn't mean the parser doesn't parse may rows.
What you have to analize is how determinating are your indexes. For instance, if there's an index on "name, family name" in a "person" table, the performances are going to be poor if everybody has the same name and family name. The index is a real trap pulling down performances when it doesn't manage to be enough describing a certain segment of your datasets.
Based on the output of your explain the query, here are my initial thoughts:
This portion of your query (rewritten to excluded the unneeded parentheses):
p.property_published = 1
AND p.property_property_type_id = '1'
AND p.property_property_type_sale_id = '1'
AND p.property_property_housing_id = '1'
Put conditions so many conditions on the property table that it's unlikely any index you have can be used. Unless you have a single index that has all four of those attributes in it, you're forcing a full table scan on the query just to find the rows that meet those conditions (though, it's possible if you have an index on one of the attributes it could use that).
First, I'd add the following index (have not checked this for syntax errors):
CREATE INDEX property_published_type_sale_housing_idx
ON property (property_published,
property_property_type_id,
property_property_type_sale_id,
property_property_housing_id );
Then I'd re-run your EXPLAIN to see if you hit the index now. (Take off the FORCE INDEX on that part of the query).
Also, even given this issue, it's possible the slow down may be memory related. That is, you may have enough memory to process the table with a smaller number of rows, but it may be that when the table gets larger MySQL can't process the entire query in memory and is forced to start using disk to get the entire query handled. This would explain why there's a sudden drop off in performance.
If that's the case, then two things might help:
Adding more memory (and tune the mysql config file to take advantage of it) so that the number of rows that can br processed at once is larger. This is at best a temporary solution.
Tune the indexes (like I'm saying above) so that the number of rows that mysql needs to process is lower. If it can be more precise in picking the rows it selects for processing.
except a good plan you need to have enough resources to run query.
check buffers size and another critical parameters in your config.
And your query is?

MySQL matching query performance on large dataset

Does anyone have experience with query of the form
select * from TableX where columnY = 'Z' limit some_limits
For example:
select * from Topics where category_id = 100
Here columnY is indexed, but it's not the primary key. ColumnY = Z could return an unpredictable number of rows (from zero to a few thousands).
I only wonder the case of quite large dataset, for example, more than 10 millions items in TableX. What is the performance of such query?
A little detail about the performance should be nice (I mean specific big-O analysis, for example).
It depends upon the records found. If your query return a large number of records it may take time to load the browswer. And even larger return could make your browser unresponsive. But this is how you execute the query. The better solution for such problems could be limiting the query as you did with relevant limits. Further more instead of limiting manually you may use limit with loop till certain index and again start from following index in the case of programming. I am answering with the context of programming. Hope this answers your question

Benchmark function in Mysql ( Incredible results )

I have 2 tables:
author with 3 millions of rows.
book with 20 miles rows.
.
So I have benchmarked this query with a join:
SELECT BENCHMARK(100000000, 'SELECT book.title, author.name
FROM `book` , `author` WHERE book.id = author.book_id ')
And this is the result:
Query took 0.7438 sec
ONLY 0.7438 seconds for 100 millions of query with a join ???
Do I make some mistakes or this is the right result ?
Your result smells wrong, I've just run checked the documentation and run some benchmarks of my own. You're not actually benchmarking anything.
BENCHMARK() is for testing scalar expressions, it's not for testing query runtimes. The query isn't actually being executed. In my own testing of queries, the duration took was not at all related to the complexity of the query, only to the amount of trials to be run.
Take a look at http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_benchmark
A few quotes from the doc:
"BENCHMARK() is intended for measuring the runtime performance of scalar expressions,"
"Only scalar expressions can be used. Although the expression can be a subquery, it must return a single column and at most a single row. For example, BENCHMARK(10, (SELECT * FROM t)) will fail if the table t has more than one column or more than one row."
You're not actually measuring anything, outside of at absolute most the query planners time.
If you want to run benchmarks, it's probably worth doing it from application code (and possible with a no cache directive depending on how write heavy your prod environment will be.). Doing it from application code will also figure in the time to hydrate the data, plus the cost of sending the data across the wire etc.

How do I optimize MySQL's queries with constants?

NOTE: the original question is moot but scan to the bottom for something relevant.
I have a query I want to optimize that looks something like this:
select cols from tbl where col = "some run time value" limit 1;
I want to know what keys are being used but whatever I pass to explain, it is able to optimize the where clause to nothing ("Impossible WHERE noticed...") because I fed it a constant.
Is there a way to tell mysql to not do constant optimizations in explain?
Am I missing something?
Is there a better way to get the info I need?
Edit: EXPLAIN seems to be giving me the query plan that will result from constant values. As the query is part of a stored procedure (and IIRC query plans in spocs are generated before they are called) this does me no good because the value are not constant. What I want is to find out what query plan the optimizer will generate when it doesn't known what the actual value will be.
Am I missing soemthing?
Edit2: Asking around elsewhere, it seems that MySQL always regenerates query plans unless you go out of your way to make it re-use them. Even in stored procedures. From this it would seem that my question is moot.
However that doesn't make what I really wanted to know moot: How do you optimize a query that contains values that are constant within any specific query but where I, the programmer, don't known in advance what value will be used? -- For example say my client side code is generating a query with a number in it's where clause. Some times the number will result in an impossible where clause other times it won't. How can I use explain to examine how well optimized the query is?
The best approach I'm seeing right off the bat would be to run EXPLAIN on it for the full matrix of exist/non-exist cases. Really that isn't a very good solution as it would be both hard and error prone to do by hand.
You are getting "Impossible WHERE noticed" because the value you specified is not in the column, not just because it is a constant. You could either 1) use a value that exists in the column or 2) just say col = col:
explain select cols from tbl where col = col;
For example say my client side code is generating a query with a number in it's where clause.
Some times the number will result in an impossible where clause other times it won't.
How can I use explain to examine how well optimized the query is?
MySQL builds different query plans for different values of bound parameters.
In this article you can read the list of when does the MySQL optimizer does what:
Action When
Query parse PREPARE
Negation elimination PREPARE
Subquery re-writes PREPARE
Nested JOIN simplification First EXECUTE
OUTER->INNER JOIN conversions First EXECUTE
Partition pruning Every EXECUTE
COUNT/MIN/MAX elimination Every EXECUTE
Constant subexpression removal Every EXECUTE
Equality propagation Every EXECUTE
Constant table detection Every EXECUTE
ref access analysis Every EXECUTE
range/index_merge analysis and optimization Every EXECUTE
Join optimization Every EXECUTE
There is one more thing missing in this list.
MySQL can rebuild a query plan on every JOIN iteration: a such called range checking for each record.
If you have a composite index on a table:
CREATE INDEX ix_table2_col1_col2 ON table2 (col1, col2)
and a query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 = t1.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t2.value2_upperbound
, MySQL will NOT use an index RANGE access from (t1.value1, t1.value2_lowerbound) to (t1.value1, t1.value2_upperbound). Instead, it will use an index REF access on (t1.value) and just filter out the wrong values.
But if you rewrite the query like this:
SELECT *
FROM table1 t1
JOIN table2 t2
ON t2.col1 <= t1.value1
AND t2.col1 >= t2.value1
AND t2.col2 BETWEEN t1.value2_lowerbound AND t2.value2_upperbound
, then MySQL will recheck index RANGE access for each record from table1, and decide whether to use RANGE access on the fly.
You can read about it in these articles in my blog:
Selecting timestamps for a time zone - how to use coarse filtering to filter out timestamps without a timezone
Emulating SKIP SCAN - how to emulate SKIP SCAN access method in MySQL
Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE - how to emulate Oracle's analytic functions in MySQL
Advanced row sampling - how to select N records from each group in MySQL
All these things employ RANGE CHECKING FOR EACH RECORD
Returning to your question: there is no way to tell which plan will MySQL use for every given constant, since there is no plan before the constant is given.
Unfortunately, there is no way to force MySQL to use one query plan for every value of a bound parameter.
You can control the JOIN order and INDEX'es being chosen by using STRAIGHT_JOIN and FORCE INDEX clauses, but they will not force a certain access path on an index or forbid the IMPOSSIBLE WHERE.
On the other hand, for all JOIN's, MySQL employs only NESTED LOOPS. That means that if you build right JOIN order or choose right indexes, MySQL will probably benefit from all IMPOSSIBLE WHERE's.
How do you optimize a query with values that are constant only to the query but where I, the programmer, don't known in advance what value will be used?
By using indexes on the specific columns (or even on combination of columns if you always query the given columns together). If you have indexes, the query planner will potentially use them.
Regarding "impossible" values: the query planner can conclude that a given value is not in the table from several sources:
if there is an index on the particular column, it can observe that the particular value is large or smaller than any value in the index (min/max values take constant time to extract from indexes)
if you are passing in the wrong type (if you are asking for a numeric column to be equal with a text)
PS. In general, creation of the query plan is not expensive and it is better to re-create than to re-use them, since the conditions might have changed since the query plan was generated and a better query plan might exists.