Database table size and mysql query execution time - mysql

I have a database table which has about 500,000 rows. When I run a MySQL SELECT query against it, the execution time is quite long, about 0.4 seconds. The same query on a smaller table takes about 0.0004 seconds.
Are there any solutions to make this query faster?

Most important thing: use an index suitable for your WHERE clause.
Use a covering index: one that covers not only the WHERE clause but also all selected columns (see the sketch after this list). This way the query can look up everything from the index alone and does not have to load the data from the actual rows identified by the index.
Reduce the number of returned columns to the columns you really need. Don't select all columns if you are not using every one of them.
Use data types appropriate to the stored data, and choose the smallest data types possible. E.g. when you have to store a number that will never exceed 100, you can use a TINYINT, which consumes only 1 byte, instead of a BIGINT, which uses 8 bytes in every row (see the MySQL integer types).
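A minimal sketch of a covering index, using a hypothetical orders table (all names here are assumptions, not from the question):

-- Suppose the frequent query is:
--   SELECT customer_id, order_date FROM orders WHERE customer_id = 42;
-- An index on both the filtered and the selected columns lets MySQL
-- answer the query from the index alone, never touching the row data:
CREATE INDEX idx_orders_covering ON orders (customer_id, order_date);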

Related

How to improve the performance of the MySQL query

I have a MySQL query:
select stu_photo, type, sign, type1, fname,regno1
from stadm
where regno1 = XXXXXX
LIMIT 1000;
The stadm table has 67,063 rows. The execution time of the above query is 5-6 minutes.
I am unable to add an index for the stu_photo and sign columns (their datatypes are BLOB and LONGBLOB). The table engine is InnoDB. How can I improve the performance (i.e., reduce the execution time)?
One improvement I can see for your query would be to add an index on the regno1 column. This would potentially speed up the WHERE clause.
I am unable to add an index for the stu_photo and sign columns
These columns should not impact the performance of the query you showed us.
Another factor influencing the performance of the query is the time it takes to send the result set back to your console or application. If each record is very large, then things may appear slow, even for only 1000 records.
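A minimal sketch of the index suggested above (the index name is an assumption):

-- Lets the WHERE regno1 = ... lookup use a B-tree search instead of a full scan:
CREATE INDEX idx_stadm_regno1 ON stadm (regno1);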
Create a new column md5_regno1 of type VARCHAR and store the MD5 hash of regno1 in it. Then you can create an index on the new column and search like this:
select stu_photo, type, sign, type1, fname,regno1
from stadm
where md5_regno1 = MD5('XXXXXX')
LIMIT 1000;
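A minimal sketch of that approach (column and index names are assumptions; CHAR(32) fits an MD5 hex digest):

ALTER TABLE stadm ADD COLUMN md5_regno1 CHAR(32);
UPDATE stadm SET md5_regno1 = MD5(regno1);        -- backfill existing rows
CREATE INDEX idx_stadm_md5 ON stadm (md5_regno1);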

How to optimize this MySQL query? (moving window)

I have a huge table (400k+ rows), where each row describes an event in the FX market. The table's primary key is an integer named 'pTime' - it is the time at which the event occurred in POSIX time.
My database is queried repeatedly by my computer during a simulation that I constantly run. During this simulation, I pass an input pTime (I call it qTime) to a MySQL procedure. qTime is a query point from that same huge table. Using qTime, my procedure filters the table according to the following rule:
Select only those rows whose pTime is a maximum 2 hours away from the input qTime on any day.
ex.
query point: `2001-01-01 07:00`
lower limit: `ANY-ANY-ANY 05:00`
upper limit: `ANY-ANY-ANY 09:00`
After this query the query point will shift by 1 row (5 minutes), and a new query will be initiated:
query point: `2001-01-01 07:05`
lower limit: `ANY-ANY-ANY 05:05`
upper limit: `ANY-ANY-ANY 09:05`
This is the way I accomplish that:
SELECT * FROM mergetbl WHERE
TIME_TO_SEC(TIMEDIFF(FROM_UNIXTIME(pTime,"%H:%i"),FROM_UNIXTIME(qTime,"%H:%i")))/3600
BETWEEN -2 AND 2
Although I have an index on pTime, this piece of code significantly slows down my software.
I would like to pre-process this statement for each value of pTime (which will later serve as an input qTime), but I cannot figure out a way to do this.
Your query still needs to scan every row because the way you test the time against a range cannot use the index.
You would need to separate the time of day into a different field and index that field to gain the benefit of an index here.
(note: answer was edited to fix my original misunderstanding of the question)
If you rely only on the time of day, I'd suggest adding another column of type TIME holding the time fraction of pTime, and running your queries over it.
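A minimal sketch of that suggestion (column and index names are assumptions):

-- Add an indexed TIME column derived from pTime:
ALTER TABLE mergetbl ADD COLUMN pTimeOfDay TIME;
UPDATE mergetbl SET pTimeOfDay = TIME(FROM_UNIXTIME(pTime));
CREATE INDEX idx_ptimeofday ON mergetbl (pTimeOfDay);

-- The moving-window query becomes an index range scan
-- (a window crossing midnight would need two ranges ORed together):
SELECT * FROM mergetbl WHERE pTimeOfDay BETWEEN '05:00:00' AND '09:00:00';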
DATETIME is the wrong type in this case because no system of DATETIME storage I know of will be able to use an index if you're examining only the TIME part of the value. The easy optimization is, as others have said, to store the time separately in a field of datatype TIME (or perhaps some kind of integer offset) and index that.
If you really want the two pieces of information in the same column you'll have to roll your own data format, giving primacy to the time type. You could use a string type in the format HH:MM:SS YYYY-MM-DD or you could use a NUMERIC field in which the whole number part is a seconds-from-midnight offset and the decimal part a days-from-reference-date offset.
Also, consider how much value the index will be. If your range is four hours, assuming equal distribution during the day, this index will return 17% of your database. While that will produce some benefit, if you're doing any other filtering I would try to work that into your index as well.

How is MySQL's ORDER BY implemented internally?

How is MySQL's ORDER BY implemented internally? Would ordering by multiple columns involve scanning the data set multiple times, once for each column specified in the ORDER BY clause?
Here's the description:
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Unless you have out-of-row columns (BLOB or TEXT) or your SELECT list is too large, this algorithm is used:
Read the rows that match the WHERE clause.
For each row, record a tuple of values consisting of the sort key value and row position, and also the columns required for the query.
Sort the tuples by sort key value
Retrieve the rows in sorted order, but read the required columns directly from the sorted tuples rather than by accessing the table a second time.
Ordering by multiple columns does not require scanning the dataset twice, since all data required for sorting can be fetched in a single read.
Note that MySQL can avoid the sort completely and just read the values in order if you have an index whose leftmost part matches your ORDER BY clause.
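A small illustration of that case, with a hypothetical table t:

CREATE INDEX idx_ab ON t (a, b);
SELECT a, b FROM t ORDER BY a, b;            -- read in index order, no sort
SELECT a, b FROM t WHERE a = 5 ORDER BY b;   -- also no sort: a is a constant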
MySQL is canny. Its sorting strategy depends on a few factors:
Available Indexes
Expected size of result
MySQL version
MySQL has two methods to produce sorted/ordered streams of data.
1. Smart use of Indexes
First, the MySQL optimiser analyses the query and figures out whether it can just take advantage of an available sorted index. If yes, it naturally returns records in index order. (The exception is the NDB engine, which needs to perform a merge sort once it gets data from all storage nodes.)
Credit to the MySQL optimiser, which smartly figures out whether the index access method is cheaper than the other access methods.
A really interesting thing to note here:
The index may also be used even if ORDER BY doesn’t match the index exactly, as long as other columns in ORDER BY are constants
Sometimes the optimizer may choose not to use an index if it finds index access more expensive than scanning through the table.
2. Filesort Algorithm
If indexes cannot be used to satisfy an ORDER BY clause, MySQL falls back to the filesort algorithm. This is a really interesting algorithm. In a nutshell, it works like this:
It scans through the table and finds the rows which match the WHERE condition.
It maintains a buffer and, for each row, stores a tuple in it (sort key value, row pointer, and the columns required by the query). The size of this buffer is the system variable sort_buffer_size.
When the buffer is full, it runs a quicksort on it based on the sort key, stores the sorted chunk in a temporary file on disk, and remembers a pointer to it.
It repeats this on the remaining data until no more rows are left.
Now it has a number of sorted chunks.
Finally, it applies a merge sort to all the sorted chunks and puts them in one result file.
In the end, it fetches the rows from the sorted result file.
If the expected result fits in one chunk, the data never hits disk, but remains in RAM.
For more detail: https://www.pankajtanwar.in/blog/what-is-the-sorting-algorithm-behind-order-by-query-in-mysql
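A quick way to see which strategy was chosen, using a hypothetical table t: if EXPLAIN shows "Using filesort" in the Extra column, the sort could not be satisfied by an index.

EXPLAIN SELECT * FROM t ORDER BY some_unindexed_col;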

Simple MySQL output questions

I have a two-column table consisting of a number and that number's cube. Right now, I have about 13 million numbers inserted, and that's growing very, very quickly.
Is there a faster way to output simple tables than using a command like SELECT * FROM table?
My second question pertains to the selection of a range of numbers. As stated above, I have a large database growing extremely fast to hold numbers and their cubes. If you're wondering, I'm trying to find the 3 numbers that will sum up to 33 when cubed. So, I'm doing this by using a server/client program to send a range of numbers to a client so they can do the equations on said range of numbers.
So, for example, let's say that the first client chimes in. I give him a range of 0-100. He then goes off to compute the numbers and reports back to tell the server whether he found the triplet. If he didn't, the loop just continues.
When the client is doing the calculations for the numbers by itself, it goes extremely slow. So, I have decided to use a database to store the cubed numbers so the client does not have to do the calculations. The problem is, I don't know how to access only a range of numbers. For example, if the client had the range 0-100, it would need to access the cubes of all numbers from 0-100.
What is the select command that will return a range of numbers?
The engine I am using for the table is MyISAM.
If your table "mytable" has two columns:
number  cube
0       0
1       1
2       8
3       27
the query command will be (Assuming the start of the range is 100 and the end is 200):
select number, cube from mytable where number between 100 and 200 order by number;
If you want this query to be as fast as possible, make sure of the following:
number is an index. Thus you don't need to do a table scan to find the start of your range.
the index you create is clustered. Clustered indexes are way faster for scans like this, as the leaf in the index is the record itself (in comparison, the leaf in a non-clustered index is a pointer to the record, which may be in a completely different part of the disk). As well, the clustered index forces a sorted structure on the data, so you may be able to read all 100 records from a single block.
Of course, adding an index will make writing to the table slightly slower. As well, I am assuming you are writing to the table in order (i.e. 0,1,2,3,4 etc. not 10,5,100,3 etc.). Writes to tables with clustered indexes are very slow if you write to the table in a random order (as the DB has to keep moving records to fit the new ones in).
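A minimal sketch of such a table (the DECIMAL type and the InnoDB engine are my assumptions: cubes of numbers above roughly 2.1 million overflow BIGINT, and MyISAM, unlike InnoDB, does not cluster data on the primary key):

CREATE TABLE mytable (
  number BIGINT NOT NULL PRIMARY KEY,  -- InnoDB clusters rows on the primary key
  `cube` DECIMAL(30,0) NOT NULL        -- backticked: CUBE is reserved in MySQL 8.0
) ENGINE=InnoDB;

SELECT number, `cube` FROM mytable WHERE number BETWEEN 100 AND 200 ORDER BY number;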

MySQL: low cardinality/selectivity columns = how to index?

I need to add indexes to my table (columns) and stumbled across this post:
How many database indexes is too many?
Quote:
“Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.”
Is an index really pointless if there are only two distinct values? Given a table as follows (MySQL database, InnoDB):
Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)
Further conditions:
The database contains 300 million records
status can only be "enabled" or "disabled"
150 million records have status = enabled and 150 million records have status = disabled
My understanding is that, without an index on status, a select with where status='enabled' would result in a full table scan, with 300 million records to process?
How efficient is the lookup when I use a BTREE index on status?
Should I index this column or not?
What alternatives (maybe other kinds of indexes) does MySQL InnoDB provide to efficiently look records up with the where status='enabled' clause in the given example, with such low cardinality/selectivity of the values?
The index that you describe is pretty much pointless. An index is best used when you need to select a small number of rows in comparison to the total rows.
The reason for this is related to how a database accesses a table. Tables can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads the exact row it requires.
In the case where you use a where clause based on the primary key or another unique index, e.g. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.
Now back to your example: with a where clause of where status = 'enabled', the index will return 150m rows, and the database will have to read each row in turn using separate small reads, whereas accessing the table with a full table scan allows the database to make use of more efficient, larger reads.
There is a point at which it is better to just do a full table scan rather than use the index. With mysql you can use FORCE INDEX (idx_name) as part of your query to allow comparisons between each table access method.
Reference:
http://dev.mysql.com/doc/refman/5.5/en/how-to-avoid-table-scan.html
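A small sketch of that comparison (table and index names are assumptions):

-- Run both and compare timings to see which access method wins:
SELECT COUNT(*) FROM mytable FORCE INDEX (idx_status)  WHERE status = 'enabled';
SELECT COUNT(*) FROM mytable IGNORE INDEX (idx_status) WHERE status = 'enabled';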
I'm sorry to say that I do not agree with Mike. Adding an index is meant to limit the number of full record searches MySQL has to do, thereby limiting IO, which is usually the bottleneck.
This indexing is not free; you pay for it on inserts/updates when the index has to be updated, and in the search itself, as it now needs to load the index file (a full index for 300M records is probably not in memory). So it might well be that you get extra IO instead of limiting it.
I do agree with the statement that a binary variable is best stored as one: a BOOL or TINYINT decreases the length of a row and can thereby limit disk IO, and comparisons on numbers are faster.
If you need speed and you seldom use the disabled records, you may wish to have 2 tables, one for enabled and one for disabled records and move the records when the status changes. As it increases complexity and risk this would be my very last choice of course. Definitely do the move in 1 transaction if you happen to go for it.
It just popped into my head that you can check whether an index is actually used with the EXPLAIN statement. That should show you how MySQL is optimizing the query. I don't really know how MySQL optimizes queries, but from PostgreSQL I do know that you should explain a query on a database approximately the same (in size and data) as the real database. So if you have a copy of the database, create an index on the table and see whether it's actually used. As I said, I doubt it, but I most definitely don't know everything :)
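A minimal sketch of that check (the table name is an assumption):

EXPLAIN SELECT Id, fullname FROM mytable WHERE status = 'enabled';
-- If the "key" column of the output is NULL, the index is not used
-- and MySQL falls back to a full table scan.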
If the data is distributed 50:50, then a query like where status='enabled' only spares you a scan of half the table.
Whether an index helps on such tables depends entirely on the distribution of the data. For example, if 90% of the entries have status enabled and the other 10% disabled, then for the query where status='disabled' it scans only 10% of the table.
So having an index on such columns depends on the distribution of the data.
@a'r's answer is correct; however, it needs to be pointed out that the usefulness of an index is determined not only by its cardinality but also by the distribution of the data and by the queries run on the database.
In OP's case, with 150M records having status='enabled' and 150M having status='disabled', the index is unnecessary and a waste of resource.
In case of 299M records having status='enabled' and 1M having status='disabled', the index is useful (and will be used) in queries of type SELECT ... where status='disabled'.
Queries of type SELECT ... where status='enabled' will still run with a full table scan.
You will hardly need all 150 million records at once, so I guess status will always be used in conjunction with other columns. Perhaps it would make more sense to use a compound index like (status, fullname).
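A minimal sketch of that compound index (the table name and the lookup value are assumptions):

CREATE INDEX idx_status_fullname ON mytable (status, fullname);
-- Filters on status plus fullname can now use the index for both columns:
SELECT Id FROM mytable WHERE status = 'enabled' AND fullname = 'Jan Smith';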
Jan, you should definitely index that column. I'm not sure of the context of the quote, but everything you said above is correct. Without an index on that column, you are most certainly doing a table scan on 300M rows, which is about the worst you can do for that data.
Jan, as asked, where your query involves simply where status='enabled' without some other limiting factor, an index on that column apparently won't help (glad the SO community showed me what's up). If, however, there is a limiting factor such as LIMIT 10, an index may help. Also, remember that indexes are used in GROUP BY and ORDER BY optimizations too. If you are doing select count(*), status from table group by status, an index would be helpful.
You should also consider converting status to a TINYINT, where 0 would represent disabled and 1 enabled. You're wasting tons of space storing that string vs. a TINYINT, which requires only 1 byte per row!
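A minimal sketch of that conversion (table and column names are assumptions):

ALTER TABLE mytable ADD COLUMN status_flag TINYINT NOT NULL DEFAULT 0;
UPDATE mytable SET status_flag = IF(status = 'enabled', 1, 0);  -- backfill
ALTER TABLE mytable DROP COLUMN status;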
I have a similar column in my MySQL database. Approximately 4 million rows, with the distribution of 90% 1 and 10% 0.
I've just discovered today that my queries (where column = 1) actually run significantly faster WITHOUT the index.
Foolishly I deleted the index. I say foolishly, because I now suspect the queries (where column = 0) may have still benefited from it. So, instead I should explicitly tell MySQL to ignore the index when I'm searching for 1, and to use it when I'm searching for 0. Maybe.