Is there a way to get rows_examined in MySQL without the slow log? - mysql

I'm building some profile information for a home grown app. I'd like the debug page to show the query sent along with how many rows were examined without assuming that slow_log is turned on, let alone parsing it.
Back in 2006, what I wanted was not possible. Is that still true today?
I see Peter Zaitsev has a technique where you:
Run FLUSH STATUS;
Run the query.
Run SHOW STATUS LIKE "Handler%";
and then in the output:
Handler_read_next=42250 means 42250 rows were analyzed during this scan
which sounds like if MySQL is only examining indexes, it should give you the number. But is there a set of status vars you can poll and add up to find out how many rows were examined? Any other ideas?

It's slightly better than it was in 2006. You can issue SHOW SESSION STATUS before and after the query and then compare the Handler_read_* counts to tell how many rows were examined.
There's really no other way. While the server protocol has a flag to say if a table scan occurred, it doesn't expose rows_examined. Even tools like MySQL's Query Analyzer have to work by running SHOW SESSION STATUS before/after (although I think it only runs SHOW SESSION STATUS after, since it remembers the previous values).
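For example, a minimal sketch of the before/after technique (the table and query here are just placeholders):
FLUSH STATUS; -- reset the session's Handler_* counters to zero
SELECT * FROM mytable WHERE data = 'something'; -- the query you want to profile (placeholder)
SHOW SESSION STATUS LIKE 'Handler_read%'; -- add up these counters to approximate rows examined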
I know it's not related to your original question, but there are other expensive components to queries besides rows_examined. If you choose to do this via the slow log, you should check out this patch:
http://www.percona.com/docs/wiki/patches:microslow_innodb#changes_to_the_log_format
I can recommend looking for "Disk_tmp_table: Yes" and "Disk_filesort: Yes".

Starting in 5.6.3, the MySQL performance_schema database also exposes statement statistics, in tables such as performance_schema.events_statements_current.
The statistics collected for statements include the ROWS_EXAMINED column.
See
http://dev.mysql.com/doc/refman/5.6/en/events-statements-current-table.html
From there, statistics are aggregated to provide summaries.
See
http://dev.mysql.com/doc/refman/5.6/en/statement-summary-tables.html
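For example, something along these lines works on 5.6.3+ (you may need to enable the relevant statement instruments and consumers in performance_schema first):
-- Recent statement events, including rows examined
SELECT SQL_TEXT, ROWS_EXAMINED, ROWS_SENT
FROM performance_schema.events_statements_history
ORDER BY EVENT_ID DESC
LIMIT 10;
-- Aggregated per-digest totals
SELECT DIGEST_TEXT, COUNT_STAR, SUM_ROWS_EXAMINED
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;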

From documentation:
Handler_read_rnd
The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that don't use keys properly.
Handler_read_rnd_next
The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
read_rnd* means reading actual table rows with a full scan.
Note that it will show nothing if there is an index scan combined with a row lookup; that still counts as a key read.
For a schema like this:
CREATE TABLE mytable (id INT NOT NULL PRIMARY KEY, data VARCHAR(50) NOT NULL)
INSERT
INTO mytable
VALUES …
SELECT id
FROM mytable
WHERE id BETWEEN 100 AND 200
SELECT *
FROM mytable
WHERE id BETWEEN 100 AND 200
The latter two queries will both return 1 in read_key, 101 in read_next, and 0 in both read_rnd and read_rnd_next, despite the fact that actual row lookups occur in the second query.

Prepend the query with EXPLAIN. In MySQL that will show the query's execution path: which tables were examined, as well as an estimate of the number of rows to be examined for each table.
Here's the documentation.
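For example, using the mytable schema from the earlier answer:
EXPLAIN SELECT * FROM mytable WHERE id BETWEEN 100 AND 200;
The rows column in the EXPLAIN output is the optimizer's estimate of how many rows will be examined per table, not an exact count like the slow log's Rows_examined.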

Related

Memory engine table giving inconsistent results

Using MariaDB 10.6.7-MariaDB-2ubuntu1.1-log - Ubuntu 22.04
I have a relatively complex application that is making use of temporary MEMORY (engine) tables. The tables are generated from INSERT ... SELECT statements and then the data is manipulated. After that, a series of SELECTs are performed with one or more re-joins to the same table.
The tables have 8 indexes created with the table CREATE statement; some are hash and some are btree.
Doing exactly the same process with exactly the same data, I am getting slightly different results (in terms of the data returned and the number of rows).
It's taken me a while to get to the root of this. I have figured out that the temp MEMORY tables are identical in terms of data and the database calls are the same.
I have let the tables be created as permanent MEMORY tables so I can see them in phpMyAdmin, renamed them, and let them be created again. Then I run the same query against each table and get different results. The table checksums are identical and the row count is identical, but the same query gives different results when the tables use the MEMORY engine. Convert them both to InnoDB and both give the same result... is the MEMORY engine broken?
Has anyone ever seen this before and got any ideas about what might be going on?
Thanks in advance.
I've found the answer, in case anyone is interested or has this same issue.
In my query, I have an ORDER BY and LIMIT 1.
Example:
SELECT * FROM `tmp_lookup` T2
WHERE
T2.`from_location` = 'location1'
AND T2.`to_location` = 'location2'
AND T2.`depart_time` > '1970-01-01 22:14:00'
ORDER BY T2.`depart_time` ASC
LIMIT 1;
In the above example, when ordering by depart_time and limiting to 1, if there is more than one row with the same smallest depart_time, it will return an arbitrary first row from among the ones with that value, depending on the table.
If you build a memory table and run the query, it will consistently pick the same first row; delete the table, remake it, rerun the query and it will consistently pick a different first row.
I have no firm idea why, but I guess that since the table is in memory and not on disk, the order of the rows is 'random'. That's an utter guess.
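A common workaround (not from the original answer) is to add a unique column as a tiebreaker to the ORDER BY, so ties are resolved deterministically regardless of storage engine:
SELECT * FROM `tmp_lookup` T2
WHERE
T2.`from_location` = 'location1'
AND T2.`to_location` = 'location2'
AND T2.`depart_time` > '1970-01-01 22:14:00'
ORDER BY T2.`depart_time` ASC, T2.`id` ASC -- `id` is a hypothetical unique column used as a tiebreaker
LIMIT 1;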

Optimised way to store large key value kind of data

I am working on a database that has a table user having columns user_id and user_service_id. My application needs to fetch all the users whose user_service_id is a particular value. Normally I would add an index to the user_service_id column and run a query like this:
select user_id from user where user_service_id = 2;
Since the cardinality of the column user_service_id is very low (around 3-4 distinct values) and the table has around 10M entries, the query will end up scanning almost the entire table.
I was wondering what the recommendation is for such use cases. Also, would it make more sense to move the data to another NoSQL datastore, as this doesn't seem to be an efficient use case for MySQL or any SQL datastore? I tried to search for this but couldn't find any recommendations. Can someone please help or provide the necessary references?
Thanks in advance.
That query needs this index, which is both "composite" and "covering":
INDEX(user_service_id, user_id) -- in this order
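In MySQL syntax that would be roughly the following (idx_service_user is just an illustrative name):
ALTER TABLE user ADD INDEX idx_service_user (user_service_id, user_id);
-- The query can then be answered entirely from the index ("covering"), without touching the table rows:
SELECT user_id FROM user WHERE user_service_id = 2;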
But what will you do with the millions of rows that you get? Sounds like it will choke the client, whether it comes fast or slow.
See my Index Cookbook
"very dynamic" -- Not a problem.
"cache" -- the dynamic nature defeats caching.
"cardinality" -- not important, except to point out that there will be millions of rows.
"millions of rows" -- that takes time to deliver to the client. The number of rows delivered is the biggest factor in cost.
"select entire table, then filter in client" -- That will be even slower! (See "millions of rows".)

Improve performance on MySQL fulltext search query

I have the following MySQL query:
SELECT p.*, MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') AS score
FROM posts p
WHERE p.post_id <> 23
AND MATCH (p.description) AGAINST ('random text that you can use in sample web pages or typography samples') > 0
ORDER BY score DESC LIMIT 1
With 108,000 rows, it takes ~200ms. With 265,000 rows, it takes ~500ms.
Under performance testing (~80 concurrent users) it shows ~18 sec average latency.
Is there any way to improve performance for this query?
EXPLAIN OUTPUT:
UPDATED
We have added a new mirror MyISAM table with post_id and description and synchronized it with the posts table via triggers. Now, fulltext search on this new MyISAM table takes ~400ms (under the same performance load where InnoDB shows ~18 sec - a huge performance boost). It looks like MyISAM is much quicker for fulltext search in MySQL than InnoDB. Could you please explain why?
MySQL profiler results:
Tested on AWS RDS db.t2.small instance
Original InnoDB posts table:
MyISAM mirror table with post_id, description only:
Here are a few tips on what to look for in order to maximise the speed of such queries with InnoDB:
Avoid redundant sorting. Since InnoDB already sorts the result according to ranking, the MySQL query processing layer does not need to sort again to get the top matching results.
Avoid row-by-row fetching to get the matching count. InnoDB provides all the matching records; all those not in the result list have a ranking of 0 and need not be retrieved. InnoDB also has a count of the total matching records on hand, so there is no need to recount.
Covered index scan. InnoDB results always contain the matching records' document ID and their ranking, so if only the document ID and ranking are needed, there is no need to go to the user table to fetch the record itself.
Narrow the search result early to reduce user-table access. If the user wants the top N matching records, we do not need to fetch all matching records from the user table. We should be able to first select the top N matching doc IDs, and then fetch only the corresponding records for those doc IDs (see the sketch below).
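As a rough sketch of the last point, using the column names from the question (this only mimics the idea at the SQL level; it is not what InnoDB does internally):
SELECT p.*
FROM posts p
JOIN (
    SELECT post_id,
           MATCH (description) AGAINST ('random text that you can use in sample web pages or typography samples') AS score
    FROM posts
    WHERE MATCH (description) AGAINST ('random text that you can use in sample web pages or typography samples') > 0
      AND post_id <> 23
    ORDER BY score DESC
    LIMIT 1
) best ON best.post_id = p.post_id;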
I don't think you can get much faster looking only at the query itself; maybe try removing the ORDER BY part to avoid unnecessary sorting. To dig deeper into this, maybe profile the query using MySQL's built-in profiler.
Other than that, you might look into the configuration of your MySQL server. Have a look at this chapter of the MySQL manual; it contains some good information on how to tune the fulltext index to your needs.
If you've already maximized the capabilities of your MySQL server configuration, then consider looking at the hardware itself - sometimes even a low-cost solution like moving the tables to another, faster hard drive can work wonders.
My best guess for the performance hit is the number of rows being returned by the query. To test this, simply remove the order by score and see if that improves the performance.
If it does not, then the issue is the full text index. If it does, then the issue is the order by. If so, the problem becomes a bit more difficult. Some ideas:
Determine a hardware solution to speed up the sorts (getting the intermediate files to be in memory).
Modifying the query so it returns fewer values. This might involve changing the stop-word list, changing the query to boolean mode (see the sketch after this list), or other ideas.
Finding another way of pre-filtering the results.
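For instance, a boolean-mode variant of the same search would look roughly like this (requiring both words with + narrows the match set; the exact operators depend on what you want to match):
SELECT p.*
FROM posts p
WHERE MATCH (p.description) AGAINST ('+random +text' IN BOOLEAN MODE)
AND p.post_id <> 23
LIMIT 1;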
The issue here is WHERE p.post_id <> 23
Design your system in such a way that non-indexed columns (like post_id) need not be added to the WHERE clause.
Basically, MySQL will run the full-text search first and then filter on post_id. Hence, if there are a lot of matches returned by the full-text search, the response time will not be what you expect.

Whether an SQL query (SELECT) continues or stops reading data from a table when it finds the value

Greetings,
My question: does an SQL query (SELECT) continue or stop reading data (records) from the table once it finds the value it was looking for?
Reference: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
Link for reference: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, then it will do a "full table scan" - which means, it will have to examine the entire table to see which rows are eligible. When you have indices on e.g. column a then the database engine can optimize the search because it can pre-select rows where a has value 3. So, in general, make sure that you have indices for the columns that are most searched. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
As to whether or not the scanning stops: In general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a LIMIT 1 clause to make sure that your result set has only one outcome. But then again, if you have an ORDER BY clause, the database engine cannot stop after the first hit; there might be later rows that should get priority given the sorting.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available etc.. If your select queries take too long then consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
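For example, a quick sketch combining both suggestions, reusing the columns from the query above (idx_a is just an illustrative index name):
CREATE INDEX idx_a ON mytable (a); -- lets the engine find rows with a = 3 without a full table scan
SELECT a, b, c FROM mytable WHERE a=3 AND b=5 LIMIT 1; -- LIMIT 1 lets it stop at the first match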
How the RDBMS reads data from disk is something you cannot know, should not care about, and must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, and a block can contain records that are not needed by the query at hand. If all the columns needed by the query are available in an index, the RDBMS won't even read the data file; it will only use the index. The data it needs could already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors could lead to very different storage access patterns while running the same query several times a couple of minutes apart.
Yes it scans the entire file. Unless you put something like
select * from user where id=100 limit 1
This of course will still scan all the rows if id 100 is the last record.
If id is a primary key it will automatically be indexed and the search will be optimized.
I'm sorry... I thought the table.
I will change the question and explain it in the following image:
I understand that in CASE 1 all columns must be read with each iteration.
My question is: is it the same in CASE 2, or are columns that are not selected in the query excluded from reading in each iteration?
Also, are both queries the same from a performance perspective?
To clarify:
CASE 1: the first SELECT prints all data.
CASE 2: the second SELECT prints only the columns first_name and last_name.
In CASE 2, does the MySQL server read only the columns first_name and last_name, or does it read the entire table to get those rows (first_name, last_name)?
What interests me is how the server reads table rows in CASE 1 and CASE 2.
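To make the two cases concrete (the table name people is my guess, since the image isn't reproduced here; the question only mentions the columns first_name and last_name):
SELECT * FROM people; -- CASE 1: all columns
SELECT first_name, last_name FROM people; -- CASE 2: only two columns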

MySQL Query Caching (2)

This is not a problem as such; it's about site optimization. I have 110K records of hotels. When I use a SELECT query it pulls data from all 110k records.
If I search for a hotel list with more than a 3-star rating, a price between $100 and $300, and within Mexico City, suppose I get 45 matching results.
Is there any way so that when I add more refinement, it pulls data from only those 45 matching rows and does not go through the 110K records again?
The key is indexes, my friend... make sure you have indexes on all the columns used in the WHERE clause and this will cut down the rows examined when selecting...
On a side note... 110k rows is still an extremely small data set for MySQL, so it shouldn't pose much of a performance issue even if you haven't got correct indexing on the table.
It depends more on how often your data updates.
See:
The MySQL Query Cache
Query Caching in MySQL
Caching question MySQL or Filesystem
I am asking: is there any other way so that when I add more refinement, it pulls data from only the 45 matching rows and not from the 110K records?
Then make a view of those 45 rows and run your queries against it.
Create a view using a query:
CREATE VIEW refined AS SELECT * FROM ....
And after that run more SELECT queries against that view, like:
SELECT * FROM refined WHERE ...
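As a concrete sketch of that suggestion (the hotel table and column names here are assumptions based on the question):
CREATE VIEW refined AS
SELECT * FROM hotels
WHERE star_rating > 3
AND price BETWEEN 100 AND 300
AND city = 'Mexico City';
SELECT * FROM refined WHERE hotel_name LIKE '%beach%';
Note that a MySQL view is not materialized: every query against the view re-runs the underlying SELECT, so this mainly keeps the refinements readable; it's the indexes that actually limit how many rows get scanned.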
First of all, I tend to agree with Brian: indexes matter.
Check what kind(s) of queries are most frequent, and construct multi-column indexes on the table accordingly. Note that the order of columns in the indexes does matter (the index is a tree, and the first column appears at the tree root, so if your query does not use that column, the whole tree is useless).
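For the hotel search in the question, that could look something like this (the column names are assumptions):
ALTER TABLE hotels ADD INDEX idx_city_star_price (city, star_rating, price);
-- The equality-tested column (city) comes first; the range condition on star_rating then narrows
-- the scan within that slice of the index, and price can still be filtered from the index entries.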
Enable the slow query log to see which queries actually take long (if any) or don't use indexes, so you can improve your indexes over time.
Having said this, the query cache is a real performance boost if your table data is mostly read. Here is a useful article on the MySQL query cache.