I have a MySQL query:
select stu_photo, type, sign, type1, fname,regno1
from stadm
where regno1 = XXXXXX
LIMIT 1000;
The stadm table has 67063 rows. The execution time of the above query is 5-6 minutes.
I am unable to add an index on the stu_photo and sign columns (their datatypes are BLOB and LONGBLOB). The table engine is InnoDB. How can I improve the performance (i.e., reduce the execution time)?
One improvement I can see for your query would be to add an index on the regno1 column. This would potentially speed up the WHERE clause.
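For example, a minimal sketch of adding that index (the index name is just a placeholder; this assumes regno1 is a regular, non-BLOB column):

-- idx_stadm_regno1 is an illustrative name
CREATE INDEX idx_stadm_regno1 ON stadm (regno1);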
I am unable to add an index for the stu_photo and sign columns
These columns should not impact the performance of the query you showed us.
Another factor influencing the performance of the query is the time it takes to send the result set back to your console or application. If each record is very large, things may appear slow, even for only 1000 records.
Create a new column md5_regno1 of type VARCHAR and store the MD5 of regno1 in it. Then you can create an index on the new column and search like this:
select stu_photo, type, sign, type1, fname,regno1
from stadm
where md5_regno1 = MD5('XXXXXX')
LIMIT 1000;
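A minimal sketch of the schema change this suggests (the column size and index name are assumptions, not taken from the original table):

-- MD5() always returns a 32-character hex string; index name is illustrative
ALTER TABLE stadm ADD COLUMN md5_regno1 VARCHAR(32);
UPDATE stadm SET md5_regno1 = MD5(regno1);
CREATE INDEX idx_stadm_md5_regno1 ON stadm (md5_regno1);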
I have a database table with about 500,000 rows. When I run a MySQL SELECT query against it, the execution time is quite long, about 0.4 seconds. The same query on a smaller table takes about 0.0004 seconds.
Are there any solutions to make this query faster?
Most important thing: use an index suitable for your WHERE clause.
Use an index that covers not only the WHERE clause but also all selected columns. This way the result can be returned using only the index, without loading the data from the actual rows identified by the index.
If that is not enough, you can even use an index that contains all columns that need to be returned by your query, so the query can look up everything from the index and never has to load the actual rows (see the sketch below).
Reduce the number of returned columns to the columns you really need. Don't select all columns if you are not using every one of them.
Use data types appropriate to the stored data, and choose the smallest data types possible. E.g. when you have to store a number that will never exceed 100, you can use a TINYINT that consumes only 1 byte instead of a BIGINT that uses 8 bytes in every row (integer types).
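As a rough sketch of the covering-index idea above, assume a hypothetical table orders that is filtered by customer_id and only needs status and created_at in the result; a covering index lets MySQL answer the query from the index alone:

-- Hypothetical table, column, and index names, for illustration only
CREATE INDEX idx_orders_customer_covering ON orders (customer_id, status, created_at);
-- This query can be satisfied entirely from the index (no row lookups):
SELECT status, created_at FROM orders WHERE customer_id = 42;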
I need to optimize a query fetching all the deals in a certain country that were last accessed by users before a certain datetime.
My plan is to implement the following index
add_index(:deals, [:country_id, :last_user_access_datetime])
I am doubting the relevance and efficiency of this index, as the column last_user_access_datetime can hold any date value, e.g. 13/09/2015 3:06pm, and it will change very often (it is updated each time a user accesses the deal).
Does that mean a practically unlimited number of distinct values will end up in the index?
Should I do it, or should I avoid putting a column with so many possible values, such as a completely free datetime column, inside an index?
If you have a query like this:
select t.*
from table t
where t.country_id = :country_id and t.last_user_access_datetime >= :some_datetime;
Then the best index is the one you propose.
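In plain MySQL terms, the Rails migration from the question corresponds to something like this (the index name is illustrative):

-- Equivalent of add_index(:deals, [:country_id, :last_user_access_datetime])
CREATE INDEX index_deals_on_country_id_and_last_access
    ON deals (country_id, last_user_access_datetime);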
If you have a heavy load on the machine in terms of accesses (think many accesses per second), then maintaining the index can become a burden on the machine. Of course, you are updating the last access datetime value anyway, so you are already incurring that overhead.
The number of possible values does not have an effect on the value of the index. A database cannot store an "infinite" number of values (at least on any hardware currently available), so I'm not sure what the concern is.
The index will be used. UPDATE and INSERT statements will just take that much longer, because the index has to be updated each time as well. For tables with many more UPDATEs/INSERTs than SELECTs, it may not be fruitful to index the column. Or you may want to make an index that looks more like the types of queries that are hitting the table: include the IDs and timestamps that are in the SELECT clause, include the IDs and timestamps that are in the WHERE clause, and so on.
Also, if a table sees a lot of DELETEs, having many indexes can slow those operations down considerably.
Assuming I have a MySQL table with 500 million records in it,
and it has a column ts, which is indexed and holds timestamps as values,
is it efficient to do a
select * from table where ts >= NOW() - INTERVAL 24 HOUR
and similarly for the last week/month/year?
Or shall I go for NoSQL?
I've found that MySQL will often throw away indexes when dealing with large date ranges. Your ">=", which is technically a range (ending at "now"), might be an exception for the optimizer, since it should be able to jump to the starting point...
If you can stand the time to build the table, use EXPLAIN to see if it's using the index.
Hopefully you don't need to SELECT *. If you only need a couple of smaller columns in your results, put them in the index, too, which will keep MySQL from having to read the underlying wider rows from disk.
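A quick way to check is EXPLAIN; the table name events and the extra id column below are assumptions for illustration:

-- If the "key" column of the EXPLAIN output names the ts index, the index is being used
EXPLAIN SELECT id, ts FROM events WHERE ts >= NOW() - INTERVAL 24 HOUR;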
I have a huge table (400k+ rows), where each row describes an event in the FX market. The table's primary key is an integer named 'pTime' - it is the time at which the event occurred in POSIX time.
My database is queried repeatedly by my computer during a simulation that I constantly run. During this simulation, I pass an input pTime (I call it qTime) to a MySQL procedure. qTime is a query point from that same huge table. Using qTime, my procedure filters the table according to the following rule:
Select only those rows whose pTime is a maximum 2 hours away from the input qTime on any day.
ex.
query point: `2001-01-01 07:00`
lower limit: `ANY-ANY-ANY 05:00`
upper limit: `ANY-ANY-ANY 09:00`
After this query the query point will shift by 1 row (5 minutes), and a new query will be initiated:
query point: `2001-01-01 07:05`
lower limit: `ANY-ANY-ANY 05:05`
upper limit: `ANY-ANY-ANY 09:05`
This is the way I accomplish that:
SELECT * FROM mergetbl WHERE
TIME_TO_SEC(TIMEDIFF(FROM_UNIXTIME(pTime,"%H:%i"),FROM_UNIXTIME(qTime,"%H:%i")))/3600
BETWEEN -2 AND 2
Although I have an index on pTime, this piece of code significantly slows down my software.
I would like to pre-process this statement for each value of pTime (which will later serve as an input qTime), but I cannot figure out a way to do this.
Your query still needs to scan every row because of how you are testing the time of day against ranges that the index does not cover.
You would need to separate the time of day into a different field and index it to gain the benefit of an index here.
(note: answer was edited to fix my original misunderstanding of the question)
If you rely only on the time of day, I'd suggest adding another column of TIME type holding the time fraction of pTime, and performing your queries over it.
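A minimal sketch of that approach, with an assumed column name pTime_tod and an illustrative index name (a window that wraps around midnight would still need a second OR'd range):

-- Assumed names: pTime_tod column, idx_mergetbl_ptime_tod index
ALTER TABLE mergetbl ADD COLUMN pTime_tod TIME;
UPDATE mergetbl SET pTime_tod = TIME(FROM_UNIXTIME(pTime));
CREATE INDEX idx_mergetbl_ptime_tod ON mergetbl (pTime_tod);

-- The +/- 2 hour window then becomes a plain index range scan:
SELECT * FROM mergetbl
WHERE pTime_tod BETWEEN SUBTIME(TIME(FROM_UNIXTIME(qTime)), '02:00:00')
                    AND ADDTIME(TIME(FROM_UNIXTIME(qTime)), '02:00:00');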
DATETIME is the wrong type in this case because no system of DATETIME storage I know of will be able to use an index if you're examining only the TIME part of the value. The easy optimization is, as others have said, to store the time separately in a field of datatype TIME (or perhaps some kind of integer offset) and index that.
If you really want the two pieces of information in the same column you'll have to roll your own data format, giving primacy to the time type. You could use a string type in the format HH:MM:SS YYYY-MM-DD or you could use a NUMERIC field in which the whole number part is a seconds-from-midnight offset and the decimal part a days-from-reference-date offset.
Also, consider how much value the index will be. If your range is four hours, assuming equal distribution during the day, this index will return 17% of your database. While that will produce some benefit, if you're doing any other filtering I would try to work that into your index as well.
I need to add indexes to my table (columns) and stumbled across this post:
How many database indexes is too many?
Quote:
“Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index.”
Is an Index really pointless if there are only two distinct values? Given a table as follows (MySQL Database, InnoDB)
Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)
Further conditions:
The Database contains 300 Million records
Status can only be “enabled” and “disabled”
150 Million records have status = enabled and 150 Million records have status = disabled
My understanding is that, without an index on status, a SELECT with where status = 'enabled' would result in a full table scan with 300 Million records to process?
How efficient is the lookup when I use a BTREE index on status?
Should I index this column or not?
What alternatives (maybe any other indexes) does MySQL InnoDB provide to efficiently look records up by the "where status="enabled" clause in the given example with a very low cardinality/selectivity of the values?
The index that you describe is pretty much pointless. An index is best used when you need to select a small number of rows in comparison to the total rows.
The reason for this is related to how a database accesses a table. A table can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads the exact row it requires.
In the case where you use a where clause based on the primary key or another unique index, eg. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.
Now back to your example: you have a where clause of where status = 'enabled', so the index will return 150M rows and the database will have to read each row in turn using separate small reads, whereas accessing the table with a full table scan allows the database to make use of more efficient, larger reads.
There is a point at which it is better to just do a full table scan rather than use the index. With mysql you can use FORCE INDEX (idx_name) as part of your query to allow comparisons between each table access method.
Reference:
http://dev.mysql.com/doc/refman/5.5/en/how-to-avoid-table-scan.html
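For example, assuming the status index is called idx_status and the table is called mytable (both names are assumptions here), you can compare the two access paths like this:

-- Run the same query with the index forced and with it ignored, then compare timings/EXPLAIN
SELECT Id, fullname FROM mytable FORCE INDEX (idx_status) WHERE status = 'enabled';
SELECT Id, fullname FROM mytable IGNORE INDEX (idx_status) WHERE status = 'enabled';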
I'm sorry to say that I do not agree with Mike. Adding an index is meant to limit the number of full record scans MySQL has to perform, thereby limiting IO, which usually is the bottleneck.
This indexing is not free; you pay for it on inserts/updates when the index has to be updated, and in the search itself, as it now needs to load the index file (the full index for 300M records is probably not in memory). So it might well be that you get extra IO instead of limiting it.
I do agree with the statement that a binary variable is best stored as one, a BOOL or TINYINT, as that decreases the length of a row and can thereby limit disk IO; comparisons on numbers are also faster.
If you need speed and you seldom use the disabled records, you may wish to have 2 tables, one for enabled and one for disabled records and move the records when the status changes. As it increases complexity and risk this would be my very last choice of course. Definitely do the move in 1 transaction if you happen to go for it.
It just popped into my head that you can check whether an index is actually used by using the EXPLAIN statement. That should show you how MySQL is optimizing the query. I don't really know how MySQL optimizes queries, but from PostgreSQL I do know that you should explain a query on a database approximately the same (in size and data) as the real database. So if you have a copy of the database, create an index on the table and see whether it's actually used. As I said, I doubt it, but I most definitely don't know everything :)
If the data is distributed 50:50, then a query like where status = 'enabled' will only avoid scanning half of the table.
Whether an index on such tables helps depends entirely on the distribution of the data; i.e., if 90% of the entries have status enabled and the other 10% are disabled, then a query with where status = 'disabled' scans only 10% of the table.
So having an index on such columns depends on the distribution of the data.
@a'r's answer is correct; however, it needs to be pointed out that the usefulness of an index is determined not only by its cardinality but also by the distribution of the data and the queries run on the database.
In OP's case, with 150M records having status='enabled' and 150M having status='disabled', the index is unnecessary and a waste of resource.
In case of 299M records having status='enabled' and 1M having status='disabled', the index is useful (and will be used) in queries of type SELECT ... where status='disabled'.
Queries of type SELECT ... where status='enabled' will still run with a full table scan.
You will hardly need all 150 million records at once, so I guess "status" will always be used in conjunction with other columns. Perhaps it'd make more sense to use a compound index like (status, fullname); see the sketch below.
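A sketch of that compound index and the kind of query it would serve (the table name mytable is assumed, since the question does not give one):

-- Columns taken from the question; table and index names are illustrative
CREATE INDEX idx_status_fullname ON mytable (status, fullname);
-- Can serve queries that filter on status plus a fullname prefix, e.g.:
SELECT Id, fullname FROM mytable WHERE status = 'enabled' AND fullname LIKE 'Smith%';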
Jan, you should definitely index that column. I'm not sure of the context of the quote, but everything you said above is correct. Without an index on that column, you are most certainly doing a table scan on 300M rows, which is about the worst you can do for that data.
Jan, as asked, where your query involves simply "where status=enabled" without some other limiting factor, an index on that column apparently won't help (glad the SO community showed me what's up). If however there is a limiting factor, such as "limit 10", an index may help. Also, remember that indexes are also used in GROUP BY and ORDER BY optimizations. If you are doing "select count(*), status from table group by status", an index would be helpful.
You should also consider converting status to a tinyint where 0 would represent disabled and 1 would be enabled. You're wasting tons of space storing that string vs. a tinyint which only requires 1 byte per row!
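A hedged sketch of that conversion (the table name mytable is assumed; it also assumes the only values really are 'enabled' and 'disabled', and on 300M rows you would batch the UPDATE in practice):

-- Assumed table name; 1 = enabled, 0 = disabled
ALTER TABLE mytable ADD COLUMN status_flag TINYINT NOT NULL DEFAULT 0;
UPDATE mytable SET status_flag = IF(status = 'enabled', 1, 0);
-- Once the new column is verified, the old VARCHAR column can be dropped:
ALTER TABLE mytable DROP COLUMN status;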
I have a similar column in my MySQL database. Approximately 4 million rows, with the distribution of 90% 1 and 10% 0.
I've just discovered today that my queries (where column = 1) actually run significantly faster WITHOUT the index.
Foolishly I deleted the index. I say foolishly, because I now suspect the queries (where column = 0) may have still benefited from it. So, instead I should explicitly tell MySQL to ignore the index when I'm searching for 1, and to use it when I'm searching for 0. Maybe.