Memory engine table giving inconsistent results - mysql

Using MariaDB 10.6.7-MariaDB-2ubuntu1.1-log - Ubuntu 22.04
I have a relatively complex application that makes use of temporary MEMORY (engine) tables. The tables are populated with INSERT ... SELECT statements and the data is then manipulated; after that, a series of SELECTs is performed, with one or more re-joins to the same table.
The tables have 8 indexes created in the CREATE TABLE statement; some are HASH and some are BTREE.
Running exactly the same process on exactly the same data, I get slightly different results (in terms of the data returned and the number of rows).
It took me a while to get to the root of this, but I have verified that the temporary MEMORY tables are identical in terms of data and that the database calls are the same.
I let the tables be created as permanent MEMORY tables so I could see them in phpMyAdmin, renamed them, and let them be created again. Running the same query against each table, I get different results. The table checksums are identical and the row counts are identical, yet the same query gives different results while the tables use the MEMORY engine. Convert them both to InnoDB and both give the same result... is the MEMORY engine broken?
Has anyone ever seen this before and got any ideas about what might be going on?
Thanks in advance.

I've found the answer, in case anyone is interested or has this same issue.
In my query I have an ORDER BY and a LIMIT 1.
Example:
SELECT * FROM `tmp_lookup` T2
WHERE
T2.`from_location` = 'location1'
AND T2.`to_location` = 'location2'
AND T2.`depart_time` > '1970-01-01 22:14:00'
ORDER BY T2.`depart_time` ASC
LIMIT 1;
In the above example, when ordering by depart_time and limiting to 1, if there is more than one row with the same smallest depart_time the query returns an arbitrary row from among the ties, and which one depends on the table.
If you build a MEMORY table and run the query it will consistently pick the same first row; delete the table, remake it, rerun the query and it will consistently pick a different first row.
I have no firm idea why, but my guess (and it is only a guess) is that because the table is in memory and not on disk, the physical order of the rows in the table is effectively random.
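For what it's worth, the usual fix is to make the ORDER BY deterministic by adding a tie-breaking column; any unique column will do (the `id` column below is an assumption about the table's schema):
SELECT * FROM `tmp_lookup` T2
WHERE
T2.`from_location` = 'location1'
AND T2.`to_location` = 'location2'
AND T2.`depart_time` > '1970-01-01 22:14:00'
ORDER BY T2.`depart_time` ASC, T2.`id` ASC  -- tie-break on a unique column (hypothetical `id`)
LIMIT 1;
With the tie-breaker in place, the same row is returned regardless of the physical row order or storage engine.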

Related

Does an SQL query (SELECT) continue or stop reading data from the table once it finds the value?

Greetings,
My question: does an SQL query (SELECT) continue or stop reading data (records) from the table once it finds the value I was looking for?
Reference: "In order to return data for this query, mysql must start at the beginning of the disk data file, read in enough of the record to know where the category field data starts (because long_text is variable length), read this value, see if it satisfies the where condition (and so decide whether to add to the return record set), then figure out where the next record set is, then repeat."
Link for reference: http://www.verynoisy.com/sql-indexing-dummies/#how_the_database_finds_records_normally
In general you don't know and you don't care, but you have to adapt when queries take too long to execute. When you do something like
select a,b,c from mytable where a=3 and b=5
then the database engine has a couple of options to optimize. When all these options fail, it will do a "full table scan", which means it will have to examine the entire table to see which rows are eligible. When you have an index on, e.g., column a, the database engine can optimize the search because it can pre-select the rows where a has the value 3. So, in general, make sure that you have indices for the columns that are searched most often. (Perversely, some database engines get confused when you have too many indices and will fall back to a full table scan because they've lost their way...)
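As a minimal sketch of the point above (table and column names are just the placeholders from the example query):
-- without an index on (a, b) the query is a full table scan;
-- with one, the engine can jump straight to the matching rows
CREATE INDEX idx_mytable_a_b ON mytable (a, b);
EXPLAIN SELECT a, b, c FROM mytable WHERE a = 3 AND b = 5;
-- the EXPLAIN output should now show idx_mytable_a_b in the key column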
As to whether or not the scanning stops: in general, the database engine has to examine all data in the table (hopefully aided by indices) and won't stop after having found just one hit. If you want just the first hit, use a LIMIT 1 clause to make sure that your result set has only one row. But then again, if you have an ORDER BY clause, the database engine cannot stop after the first hit: there might be later rows that should get priority given the sorting.
Summarizing, how the db engine does its scan depends on how smart it is, what indices are available, etc. If your select queries take too long, consider re-organizing your indices, writing your select statements differently, or rebuilding the table.
How the RDBMS reads data from disk is something you cannot know, should not care about, and must not rely on.
The issue is too broad to get a precise answer. The engine reads data from storage in blocks, and a block can contain records that are not needed by the query at hand. If all the columns needed by the query are available in an index, the RDBMS won't even read the data file; it will only use the index. The data it needs could already be cached in memory (because it was read during the execution of a previous query). The underlying OS and the storage media also keep their own caches.
On a busy system, all these factors can lead to very different storage access patterns when running the same query several times a couple of minutes apart.
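A quick sketch of the covering-index case mentioned above, using assumed table and column names:
-- the index contains every column the query touches, so MySQL can answer
-- it from the index alone (EXPLAIN shows "Using index")
CREATE INDEX idx_orders_cust_status ON orders (customer_id, status);
EXPLAIN SELECT status FROM orders WHERE customer_id = 42;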
Yes, it scans the entire file, unless you write something like
select * from user where id=100 limit 1
Even this will still scan all the rows if the row with id 100 happens to be the last record and there is no index on id.
If id is a primary key it is automatically indexed and the search is optimized.
I'm sorry... I thought the table.
I will change the question and explain it in the following image;
I understand that in CASE 1 all columns must be read on each iteration.
My question is: is it the same in CASE 2, or are columns that are not selected in the query excluded from reading on each iteration?
Also, are both queries the same from a performance perspective?
To clarify:
CASE 1: the first SELECT prints all data.
CASE 2: the second SELECT prints only the columns first_name and last_name.
Does the MySQL server in CASE 2 read only the columns first_name and last_name, or does it read the entire table to get those rows of (first_name, last_name)?
I am interested in how the server reads the table rows in CASE 1 and in CASE 2.
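The image isn't reproduced here, but from the description the two cases presumably look something like this (the users table name is an assumption):
-- CASE 1: return every column
SELECT * FROM users;
-- CASE 2: return only two columns; with row-oriented engines such as InnoDB
-- the server still reads whole rows (pages) from storage unless an index
-- covering (first_name, last_name) exists
SELECT first_name, last_name FROM users;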

Does a mysql query write temp tables to disk and what exactly is it writing?

I'm working on optimizing a mysql query that joins 2 tables together and has a few where clauses and an order by.
I noticed (using explain) that a temporary table is created during the evaluation of the query. (since I'm grouping on a field in a table that isn't the first table in the join queue)
I'd really like to know if this temp table is being written to disk or not, which the explain results don't tell me.
It would also be nice to be able to tell what exactly is going into said temporary table. Some of the restrictions in my where clause are on indexed columns and some aren't, so I think that mysql might not be optimally picking rows into the temporary table.
Specifically, my query is basically of the form: select ... from a join b where ... with restrictions on both a and b on both indexed and non-indexed columns. The problem is that the number of rows going into the temp table selected from a is more than I suspect it should be. I want to investigate this.
All databases use a memory area or work area to execute a query and will use temp tables in those memory areas depending on how you built your query. If you're joining multiple tables it may use more than one temp table to build the final result set. Those temp tables usually exist in memory as long as the user is logged on.
EXPLAIN illustrates the process the server is trying to optimize as it interprets your SQL. If you have a poorly indexed WHERE clause, or if you are using a WHERE condition inside a join, it could be pulling an excessive amount of data into memory as it executes and builds your final result set. This is what poor performance at the DB level looks like.
Reading the pseudocode in your last paragraph, I would say you need some indexing and to rewrite your WHERE clause to join on indexed fields. Post your SQL if you really want an opinion.
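For the original question of whether the temporary table is written to disk, one way to check is to compare the session counters before and after the query (a sketch):
-- check the counters, run the query under test, then check again
SHOW SESSION STATUS LIKE 'Created_tmp%tables';
-- ... run the query ...
SHOW SESSION STATUS LIKE 'Created_tmp%tables';
-- if Created_tmp_disk_tables increased, the temporary table went to disk;
-- if only Created_tmp_tables increased, it stayed in memory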

MySQL Performance

We have a data warehouse with denormalized tables ranging from 500K to 6+ million rows. I am developing a reporting solution, so we are utilizing database paging for performance reasons. Our reports have search criteria and we have created the necessary indexes, however, performance is poor when dealing with the million(s) row tables. The client is set on always knowing the total records, so I have to fetch the data as well as the record count.
Are there any other things I can do to help with performance? I'm not the MySQL dba and he has not really offered anything up, so I'm not sure what he can do configuration wise.
Thanks!
You should use "Partitioning"
Its main goal is to reduce the amount of data read for particular SQL operations so that overall response time is reduced.
Refer:
http://dev.mysql.com/tech-resources/articles/performance-partitioning.html
If you partition the large tables so that a query only has to read the relevant partitions, that query will run faster.
see: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
Also note that with NDB tables you can use HASH keys, which are looked up in O(1) time.
For the row count you can keep a running total in a separate table and update that, for example in an AFTER INSERT and an AFTER DELETE trigger.
Although the triggers will slow down inserts/deletes, that cost is spread over time. Note that you don't have to keep all totals in one row; you can store totals per condition (a trigger sketch follows after the example). Something like:
table field condition row_count
----------------------------------------
table1 field1 cond_x 10
table1 field1 cond_y 20
select sum(row_count) as count_cond_xy
from totals
where field = 'field1' and `table` = 'table1'
and condition like 'cond_%';
-- just a silly example; you can come up with more efficient code, but I hope
-- you get the gist of it.
If you find yourself always counting along the same conditions, this can take your redesigned select count(x) from bigtable where ... from minutes to effectively instant.
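A minimal sketch of the trigger idea, assuming a hypothetical bigtable with a status column and a totals table keyed on that status:
-- keep per-status row counts up to date as rows come and go
CREATE TABLE totals (status VARCHAR(20) PRIMARY KEY, row_count INT NOT NULL DEFAULT 0);
DELIMITER //
CREATE TRIGGER bigtable_ai AFTER INSERT ON bigtable FOR EACH ROW
BEGIN
  INSERT INTO totals (status, row_count) VALUES (NEW.status, 1)
  ON DUPLICATE KEY UPDATE row_count = row_count + 1;
END//
CREATE TRIGGER bigtable_ad AFTER DELETE ON bigtable FOR EACH ROW
BEGIN
  UPDATE totals SET row_count = row_count - 1 WHERE status = OLD.status;
END//
DELIMITER ;
-- the expensive SELECT COUNT(*) FROM bigtable WHERE status = 'x' becomes:
SELECT row_count FROM totals WHERE status = 'x';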

Can we limit the number of rows in a table in MySQL?

Is it possible to set the number of rows that a table can accommodate in MySQL ?
I don't want to use any java code. I want to do this using pure mysql scripts.
I wouldn't recommend trying to limit the number of rows in a SQL table, unless you had a very good reason to do so. It seems you would be better off using a query like:
select entityID, entityName from TableName limit 1000
rather than physically limiting the rows of the table.
However, if you really want to limit it to 1000 rows:
-- MySQL has no TOP; the inner LIMIT also needs to be wrapped in a derived table:
delete from TableName
where entityID not in (
    select entityID from (
        select entityID from TableName order by entityID limit 1000
    ) as keep_rows
);
Mysql supports a MAX_ROWS parameter when creating (and maybe altering?) a table. http://dev.mysql.com/doc/refman/5.0/en/create-table.html
Edit: Sadly it turns out this is only a hint for optimization
"The maximum number of rows you plan to store in the table. This is not a hard limit, but rather a hint to the storage engine that the table must be able to store at least this many rows."
Your question implied that scripts are OK; is it ridiculous to make one as simple as a cron job that regularly drops table rows above a given ID? It's not nearly as elegant as having MySQL throw errors when something tries to add one row too many, but it would do the job, and you could also have your application check whether its ID is too high and throw a warning to the user/relevant party.
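If you'd rather keep that housekeeping inside MySQL instead of cron, the Event Scheduler can run the same cleanup (a sketch; the table name and the ID cutoff are placeholders, and the scheduler must be enabled):
SET GLOBAL event_scheduler = ON;  -- requires a privileged account
CREATE EVENT prune_TableName
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE FROM TableName WHERE entityID > 1000;  -- "rows above a given ID"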

Is there a way to get rows_examined in MySQL without the slow log?

I'm building some profile information for a home grown app. I'd like the debug page to show the query sent along with how many rows were examined without assuming that slow_log is turned on, let alone parsing it.
Back in 2006, what I wanted was not possible. Is that still true today?
I see Peter Zaitsev has a technique where you:
Run FLUSH STATUS;
Run the query.
Run SHOW STATUS LIKE "Handler%";
and then in the output:
Handler_read_next=42250 means 42250 rows were analyzed during this scan
which sounds as if, when MySQL is only examining indexes, it should give you the number. But is there a set of status vars you can poll and add up to find out how many rows were examined? Any other ideas?
It's slightly better than it was in 2006. You can issue SHOW SESSION STATUS before and after the query and then look at each of the Handler_read_* counts to work out the number of rows examined.
There's really no other way. While the server protocol has a flag to say whether a table scan occurred, it doesn't expose rows_examined. Even tools like MySQL's Query Analyzer have to work by running SHOW SESSION STATUS before/after (although I think it only runs SHOW SESSION STATUS after, since it remembers the previous values).
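Put together, the Handler-counter approach from the question looks like this (a sketch; the SELECT is just a stand-in for the query being profiled):
FLUSH STATUS;                                  -- zero the session Handler counters
SELECT * FROM orders WHERE customer_id = 42;   -- the query under test (placeholder)
SHOW SESSION STATUS LIKE 'Handler_read%';      -- sum these counters to approximate rows examined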
I know it's not related to your original question, but there are other expensive components to queries besides rows_examined. If you choose to do this via the slow log, you should check out this patch:
http://www.percona.com/docs/wiki/patches:microslow_innodb#changes_to_the_log_format
I can recommend looking for "Disk_tmp_table: Yes" and "Disk_filesort: Yes".
Starting in 5.6.3, the MySQL performance_schema database also exposes statement statistics, in tables such as performance_schema.events_statements_current.
The statistics collected for statements include the ROWS_EXAMINED column.
See
http://dev.mysql.com/doc/refman/5.6/en/events-statements-current-table.html
From there, statistics are aggregated to provide summaries.
See
http://dev.mysql.com/doc/refman/5.6/en/statement-summary-tables.html
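For example, something along these lines pulls the counters for the most recent statement on each connection (assuming the statement instruments and consumers are enabled):
SELECT SQL_TEXT, ROWS_EXAMINED, ROWS_SENT
FROM performance_schema.events_statements_current;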
From documentation:
Handler_read_rnd
The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that don't use keys properly.
Handler_read_rnd_next
The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
read_rnd* means reading actual table rows with a full scan.
Note that these counters will show nothing if an index scan is combined with a row lookup; that still counts as a key read.
For a schema like this:
CREATE TABLE mytable (id INT NOT NULL PRIMARY KEY, data VARCHAR(50) NOT NULL)
INSERT
INTO mytable
VALUES …
SELECT id
FROM mytable
WHERE id BETWEEN 100 AND 200
SELECT *
FROM mytable
WHERE id BETWEEN 100 AND 200
the latter two queries will both return 1 in read_key, 101 in read_next, and 0 in both read_rnd and read_rnd_next, despite the fact that actual row lookups occur in the second query.
Prepend the query with EXPLAIN. In MySQL that will show the query's execution plan: which tables are examined as well as the estimated number of rows examined for each table.
Here's the documentation.