I have a MySQL table with just under 6 million rows. One day I suspect I'll have to learn Hadoop and transfer all this data to a non-structured database, but I'm surprised to see performance degrade so significantly in such a short period of time.
I am doing a very simple query on this table, and believe I have all the correct indexes in place.
The query is:
SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date
The EXPLAIN returns:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE updateshows range date_idx date_idx 7 NULL 648997 Using where
So I am using the correct index as far as I can tell, but the query is taking 11 seconds to run.
The database is MyISAM, and phpMyAdmin says the table is 1.0GiB.
Any ideas here?
Edited:
The date_idx index covers both the date and venid columns. Should those be two separate indexes?
What you want to make sure of is that the query uses ONLY the index, so make sure the index covers all the fields you are selecting. Also, since a range query is involved, you need to have venid first in the index, since it is queried as a constant. I would therefore create an index like so:
ALTER TABLE events ADD INDEX indexNameHere (venid, date, time);
With this index, all the information that is needed to complete the query is in the index. This means that, hopefully, the storage engine is able to fetch the information without actually seeking inside the table itself. However, MyISAM might not be able to do this, since it doesn't store the data in the leaves of the indexes, so you might not get the speed increase you desire. If that's the case, try to create a copy of the table, and use the InnoDB engine on the copy. Repeat the same steps there and see if you get a significant speed increase. InnoDB does store the field values in the index leaves, and allows covering indexes.
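If you try that, here is a minimal sketch of the experiment (the copy's name, events_innodb, is hypothetical; note that CREATE TABLE ... LIKE copies existing index definitions):

CREATE TABLE events_innodb LIKE events;
ALTER TABLE events_innodb ENGINE = InnoDB;
-- Add the covering index if the original table didn't already have it:
ALTER TABLE events_innodb ADD INDEX venid_date_time_idx (venid, date, time);
INSERT INTO events_innodb SELECT * FROM events;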
Now, hopefully you'll see the following when you explain the query:
mysql> EXPLAIN SELECT date, time FROM events WHERE venid='47975' AND date>='2009-07-11' ORDER BY date;
id select_type table type possible_keys key [..] Extra
1 SIMPLE events range date_idx, indexNameHere indexNameHere Using index, Using where
Try adding a key that spans venid and date (or the other way around, or both...)
I would imagine that a 6M-row table can be optimised with quite normal techniques.
I assume that you have a dedicated database server, and it has a sensible amount of ram (say 8G minimum).
You will want to ensure you've tuned MySQL to use your RAM efficiently. If you're running a 32-bit OS, don't. If you are using MyISAM, tune your key buffer to use a significant proportion, but not too much, of your RAM.
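For example, on an 8G box you might start with something like the following (the numbers are illustrative, not a recommendation):

SET GLOBAL key_buffer_size = 2147483648; -- 2G MyISAM key buffer, roughly 25% of 8G
SHOW GLOBAL STATUS LIKE 'Key_read%';     -- Key_reads should stay small relative to Key_read_requests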
In any case you want to run repeated performance testing on production-grade hardware.
Try putting an index on the venid column.
I'm writing a MySQL query to check for existing records in the final table; if they exist, I update them first, and then I insert the records which are not present in the final table. The issue is that with the join it's taking a long time to execute, and since this runs in AWS Lambda it times out (takes more than 15 minutes). I'm not using any index here because I couldn't: we have customers who use unique constraints on different columns.
select count(Staging.EmployeeId)
from Staging
inner join Final on Staging.EmployeeId = Final.EmployeeId
where Staging.status='V'
and Staging.StagingId >= 66518110
and Staging.StagingId <= 66761690
and Staging.EmployeeId is not null
and Staging.EmployeeId <> '' ;
I'm looking at a range of 250k records at once, and have had no luck with the above query. Could anyone suggest how to speed it up? I cannot use an index, so I'm looking for other options to optimize the query. Thanks in advance.
Creating indexes to support the search conditions and the join conditions would be the most common and the most effective way to optimize this query.
But you said you can't use indexes. This seems like an inadvisable limitation, but so be it.
Your options are therefore:
Allocate more RAM to the InnoDB buffer pool and pre-cache your table data pages, so your table-scans at least occur in RAM and do not have to wait for disk I/O (see the sketch after this list).
Upgrade your server to one with faster CPUs.
Delete data until your table-scans take less time.
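For the first option, a rough sketch (innodb_buffer_pool_size is a real setting, but the size here is illustrative; it is only settable at runtime in MySQL 5.7.5+, otherwise set it in my.cnf and restart):

SET GLOBAL innodb_buffer_pool_size = 4294967296; -- 4G, sized to hold both tables
-- Warm the cache by scanning the tables once before the real query runs:
SELECT COUNT(*) FROM Staging;
SELECT COUNT(*) FROM Final;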
I mean no disrespect, but frankly, your question is like asking how to start a fire with wet newspaper.
"unique constraint on different columns" -- this does not necessarily prohibit adding indexes. You must have some indexes, whether they are UNIQUE or not.
Staging: INDEX(status, StagingId, EmployeeId)
Final: INDEX(EmployeeId)
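In DDL form, that suggestion would look something like this (the index names are arbitrary):

ALTER TABLE Staging ADD INDEX status_staging_emp (status, StagingId, EmployeeId);
ALTER TABLE Final ADD INDEX emp_idx (EmployeeId);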
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
If any of those columns is the PRIMARY KEY, then my advice may not be correct.
Are the tables 1:1? If not, are they 1:many, and which table is the "one"?
I have 1 MyISAM table with 620,000 rows. I'm running XAMPP on a dual-core server with 2GB RAM. Apache is installed as a Windows service, and MySQL is controlled from the XAMPP control panel.
The query below is taking 30+ seconds to run.
select `id`,`product_name`,`search_price`,`field1`,`field2`,
`field3`,`field4`
from `all`
where MATCH (`product_name`) AGAINST ('searchterm')
AND `search_price` BETWEEN 0 AND 1000
ORDER BY `search_price` DESC
LIMIT 0, 30
I have a FULLTEXT index on product_name, a BTREE index on search_price, and auto-increment on id.
If I EXPLAIN the above query, the results are:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE all fulltext search_price,FULLTEXT_product_name FULLTEXT_product_name 0 NULL 1 Using where; Using filesort
How can I speed up this query? Should it be taking this long on a table of 620,000 rows?
I've just noticed that this only happens when the database has not been queried for a while, so I'm guessing this is to do with the cache: the first query takes 30+ seconds, then if I try a second time the query takes 1 second.
MySQL will do the fulltext search first, then look up the rest of the info, filter on price, sort on price, and finally deliver 30. There is essentially no way to shorten that process.
Yes, caching is likely to be the explanation for 30 seconds becoming 1 second.
Switching to InnoDB (which now has FULLTEXT) may provide some benefits.
If running entirely MyISAM, do you have key_buffer_size set to about 20% of available RAM? Being much lower (or higher) than this could cause performance problems.
If running entirely InnoDB, set innodb_buffer_pool_size to about 70% of available RAM.
MySQL's capability of dealing with FULLTEXT is somewhat limited once the size of the table goes above 300,000 rows, and it will perform even worse if you use really common words as search keywords (in, the, of, etc., commonly marked as stop words). I recommend using Sphinx full-text search or Apache Lucene.
The orders table has 2m records. There are ~900K unique ship-to-ids.
There is an index on ship_to_id (the field is int(8)).
The query below takes nearly 10 minutes to complete. I've run SHOW PROCESSLIST, which shows Command = Query and State = Sending Data.
When I run EXPLAIN, the existing index is used, and possible_keys is NULL.
Is there anything I should do to speed this query up? Thanks.
SELECT
ship_to_id as customer_id
FROM orders
GROUP BY ship_to_id
HAVING SUM( price_after_discount ) > 0
It does not look like you have a useful index. Try adding an index on price_after_discount, and add a WHERE condition like this:
WHERE price_after_discount > 0
to minimize the number of rows you need to sum, since you can discard any that are 0 (assuming the column is never negative).
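Putting that together, something like the following; note that the WHERE clause only preserves the original result if price_after_discount can never be negative:

ALTER TABLE orders ADD INDEX price_idx (price_after_discount);

SELECT ship_to_id AS customer_id
FROM orders
WHERE price_after_discount > 0
GROUP BY ship_to_id
HAVING SUM(price_after_discount) > 0;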
Also try running the "top" command and look at the I/O "wait" column while the query is running. If it's high, it means your query causes a lot of disk I/O. If you have the RAM, you can increase various memory buffers to speed this up (the buffer pool if you're using InnoDB; MyISAM data caching is done through the filesystem cache). Restarting the server will flush these caches.
If you do not have enough RAM (and you shouldn't need too much for 2M records), then consider a partitioning scheme on the ship_to_id column (if your version of MySQL supports it).
If all the orders in that table aren't current (i.e. not going to change again) then you could archive them off into another table to reduce how much data has to be scanned.
Another option is to throw a last_modified timestamp on the table with an index. You could then keep track of when the query is run and store the results in another table (query_results). When it's time to run the query again, you would only need to select the orders that were modified since the last time the query was run, then use that to update the query_results. The logic is a little more complicated, but it should be much faster assuming a low percentage of the orders are updated between query executions.
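A sketch of that approach, with hypothetical names (query_results needs a unique key on customer_id for REPLACE to work, and @last_run holds the previous run's timestamp):

ALTER TABLE orders
  ADD COLUMN last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  ADD INDEX last_modified_idx (last_modified);

-- Recompute totals only for customers whose orders changed since the last run.
REPLACE INTO query_results (customer_id, total)
SELECT ship_to_id, SUM(price_after_discount)
FROM orders
WHERE ship_to_id IN
      (SELECT DISTINCT ship_to_id FROM orders WHERE last_modified >= @last_run)
GROUP BY ship_to_id;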
MySQL will use an index for a group by, at least according to the documentation, as explained here.
To be most useful, all the columns used in the query should be in the index. This prevents the engine from having to reference the original data as well as the index. So, try an index on orders(ship_to_id, price_after_discount).
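For example (EXPLAIN should then show "Using index" in the Extra column):

ALTER TABLE orders ADD INDEX ship_price_idx (ship_to_id, price_after_discount);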
I have a table on which I mainly do updates, and I'm wondering whether update queries would benefit from having an index on the WHERE column and the updated column, or an index on just the WHERE column?
Just on the WHERE column. An index on the updated column will actually slow down your query because the index has to be updated along with the data. An index on the WHERE column will speed up updates and selects, but slow down some insertions.
Indices also cause overhead when you delete rows. In general, though, they are a good thing on columns you use in WHERE clauses a lot, and they are basically necessary on columns you join on or ORDER BY.
Not a straightforward answer for this one, so here goes.
UPDATE table SET ColumnA = 'something'
If an index exists on ColumnA then you will have a slight performance hit, as there will be two write operations for each row: first the data in the table, and then the write for the index update. You can even have several indexes that each include ColumnA, which means you will have several writes in addition to the table row. You can see how having more than a few indexes can start to really slow your updates down.
But if ColumnA is not indexed at all then it will be a single write for each row only.
UPDATE table SET ColumnA = 'something' WHERE ColumnB = 'something else'
For this query, if an index exists on ColumnB and not on ColumnA, it will be very fast to locate the record (called a seek), with a single write to update it; and as the index doesn't care about ColumnA, it won't need updating. But if you index ColumnA and not ColumnB, you will read every row in the table first (called a scan, and normally a bad thing), which, while a read is faster than a write, is still very slow; then there is a write to the table and another write for the index. Basically the slowest way of doing things.
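Concretely, the index that enables the seek in that example would be something like:

CREATE INDEX columnb_idx ON `table` (ColumnB);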
DELETE FROM table WHERE ColumnB = 'somethingelse'
Now if you have an index on any column in this table, there are two writes: the delete from the table and an update/delete of the record in the index. Again, if ColumnB is not indexed, you will scan the table first, then delete the row(s) from the table and update any indexes.
INSERT INTO table (ColumnA, ColumnB) VALUES ('something','something else')
If no indexes exist, a single write to the table and it's done.
Again, if indexes do exist, then an extra write for each one.
I haven't mentioned the primary key unique constraint, because you really can't get around it when you need a primary key, but every record must be checked to see if something already exists with that key before insert. That will be a fast primary key index seek, but nevertheless, it's another step in the process. The fewer steps, the faster it will be.
Now back to your question. Basically, if you need to update a specific record, an index will help you locate that record faster than scanning the entire table, and the time saved locating the record will be much more than the time lost updating the indexes. If you are only inserting and never reading, then indexes will slow you down. It becomes a balancing act: if you need to read specific records, then an index will help immensely, but the more indexes you have, the slower the writes get.
Most people here don't know how indexes work in MySQL.
It depends on which storage engine you are using. InnoDB uses indexes completely differently from MyISAM, because MySQL implements indexes at the storage engine level, not the MySQL server level.
I'm afraid most people here are giving you answers based on other databases in which indexes work differently from MySQL.
InnoDB
In the case of InnoDB, the cost comes from the fact that whenever a row is updated, the index has to be updated as well. InnoDB's indexes have to be kept in sequence, so it has to find out which page node of the index the entry is supposed to be in and insert it there. At times that particular page may be full, so it has to split the page, both wasting space and increasing the time taken. This happens no matter which column you index, because InnoDB uses clustered indexes, where the index stores the data of the entire row.
MyISAM
In the case of MyISAM, it does not have this problem. MyISAM actually uses only single-column indexes, even though you can set unique constraints spanning more than one column. Also, MyISAM's index is not stored sequentially, so updates are very quick. Likewise, inserts are quick as well, as MyISAM just appends new rows at the end of the data file.
Conclusion
So in regard to your question, you should consider your schema design instead of worrying about whether the query will use the indexes. If you are mostly updating a table, I suggest you not use InnoDB unless you need row-level locking, high concurrency, or transactions. Otherwise MyISAM would be much better for update tasks. And no, if you are using InnoDB, indexes do not really help with updating, especially if the table is very large.
Is there any performance issues if you create an index with multiple columns, or should you do 1 index per column?
There's nothing inherently wrong with a multi-column index -- it depends completely on how you're going to query the data. If you have an index on colA+colB, it will help for queries like colA='value' and colA='value' AND colB='value', but it's not going to help for queries like colB='value'.
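A quick illustration against a hypothetical table t:

ALTER TABLE t ADD INDEX a_b_idx (colA, colB);
-- These can use the index (colA is the leading column):
SELECT * FROM t WHERE colA = 'value';
SELECT * FROM t WHERE colA = 'value' AND colB = 'value';
-- This cannot, because colB is not the leftmost column in the index:
SELECT * FROM t WHERE colB = 'value';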
Advantages of MySQL Indexes
Generally speaking, MySQL indexing gives you three advantages:
Query optimization: Indexes make search queries much faster.
Uniqueness: Indexes like primary key index and unique index help to avoid duplicate row data.
Text searching: since MySQL version 3.23.23, full-text indexes give users the opportunity to optimize searching against even large amounts of text located in any field indexed as such.
Disadvantages of MySQL indexes
When an index is created on a column (or columns), MySQL also creates a separate file that is sorted and contains only the field(s) you're interested in sorting on.
Firstly, indexes take up disk space. Usually the space usage isn't significant, but if you create an index on every column in every possible combination, the index file will grow much more quickly than the data file. In the case when a table is very large, the index file could reach the operating system's maximum file size.
Secondly, indexes slow down writing queries such as INSERT, UPDATE and DELETE, because MySQL has to internally maintain the "pointers" to the inserted rows in the actual data file; there is a performance price to pay because every time a record is changed, the indexes must be updated. However, you may be able to write your queries in such a way that they do not cause very noticeable performance degradation.