table optimization - mysql

I am using MySQL. I have a table named cars in my dev_db database.
I inserted about 6,000,000 rows into the table using bulk inserts like the following:
INSERT INTO cars (cid, name, msg, date)
VALUES (1, 'blabla', 'blabla', '2001-01-08'),
(11, 'blabla', 'blabla', '2001-11-28'),
... ,
(3, 'blabla', 'blabla', '2010-06-03');
After this large insert into my cars table,
I decided to also optimize the table, like this:
OPTIMIZE TABLE cars;
I waited 53 minutes for the optimization. When it finally finished, the MySQL console showed me the following message:
The Msg_text says the table does not support optimize..., which leaves me with two questions:
1. Does the MySQL message above mean the 53 minutes I waited did nothing useful?
2. Is it necessary to optimize a table after inserting a large amount of data, and why?

Optimize is useful if you have removed or overwritten rows, or if you have changed indexes. If you have only inserted data, there is no need to optimize.
The MySQL Optimize Table command will effectively de-fragment a mysql
table and is very useful for tables which are frequently updated
and/or deleted.
Also look here: http://www.dbtuna.com/article.php?id=15

It looks like you have an InnoDB table, which doesn't support OPTIMIZE TABLE.

As you can read in the output, InnoDB does not support optimize as such.
Instead, it does a recreate + analyze, which rebuilds the table and its indexes.
The result is much the same and should not really bother you: you end up with optimized indexes.
However you only ever have to optimize your indexes if you delete rows or update indexed fields.
If you only ever insert then your B-trees will not get unbalanced and do not need optimization.
So:
Does the MySQL message above mean the 53 minutes I waited did nothing useful?
The time spent waiting was useless, but not for the reason you think.
If there is anything to optimize, MySQL will do it.
Is it necessary to optimize my table after a large data insertion, and why?
No, never.
The reason is that MySQL (InnoDB) uses B-trees, which are fast only if they are balanced.
If the nodes are all on one side of the tree, the index degrades into an ordered list, which gives you O(n) worst-case lookup time. A fully balanced tree has O(log n) lookup time.
However the index can only become unbalanced if you delete rows, or alter the values of indexed fields.

mysql query optimisation on huge record

I'm writing a MySQL query that checks for existing records in the final table. If a record exists, I update it first, and then I insert the records that are not yet present in the final table. The problem is that the join takes too long to execute, and because it runs in AWS Lambda, it times out, meaning it takes more than 15 minutes. I'm not using any indexes, because we have customers who use unique constraints on different columns.
select count(Staging.EmployeeId)
from Staging
inner join Final on Staging.EmployeeId = Final.EmployeeId
where Staging.status='V'
and Staging.StagingId >= 66518110
and Staging.StagingId <= 66761690
and Staging.EmployeeId is not null
and Staging.EmployeeId <> '' ;
I'm looking at a range of about 250k records at a time and have had no luck with the above query. Could anyone suggest how to speed it up? I cannot use an index, so I'm looking for other options to optimize the query. Thanks in advance.
Creating indexes to support the search conditions and the join conditions would be the most common and the most effective way to optimize this query.
But you said you can't use indexes. This seems like an inadvisable limitation, but so be it.
Your options are therefore:
Allocate more RAM to the InnoDB buffer pool and pre-cache your table data pages, so your table-scans at least occur in RAM and do not have to wait for disk I/O.
Upgrade your server to one with faster CPUs.
Delete data until your table-scans take less time.
I mean no disrespect, but frankly, your question is like asking how to start a fire with wet newspaper.
"unique constraint on different columns" -- this does not necessarily prohibit adding indexes. You must have some indexes, whether they are UNIQUE or not.
Staging: INDEX(status, StagingId, EmployeeId)
Final: INDEX(EmployeeId)
When adding a composite index, DROP index(es) with the same leading columns.
That is, when you have both INDEX(a) and INDEX(a,b), toss the former.
If any of those columns is the PRIMARY KEY, then my advice may not be correct.
Are the tables 1:1? If not, are they 1:many, and which table is the "one"?
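As a rough illustration of the two suggested indexes, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption made so the example runs anywhere; SQLite's planner and its EXPLAIN QUERY PLAN output differ from MySQL's EXPLAIN. The table layouts, the sample data, and the index names idx_staging and idx_final are all hypothetical.

```python
import sqlite3

# The question's query, with the StagingId range passed as parameters.
QUERY = """
SELECT COUNT(Staging.EmployeeId)
FROM Staging
INNER JOIN Final ON Staging.EmployeeId = Final.EmployeeId
WHERE Staging.status = 'V'
  AND Staging.StagingId >= ?
  AND Staging.StagingId <= ?
  AND Staging.EmployeeId IS NOT NULL
  AND Staging.EmployeeId <> ''
"""


def build_demo_db():
    """Create small, hypothetical Staging and Final tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Staging ("
        "StagingId INTEGER PRIMARY KEY, EmployeeId TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE Final (FinalId INTEGER PRIMARY KEY, EmployeeId TEXT)"
    )
    staging = [(i, f"E{i % 50}", "V" if i % 2 == 0 else "X") for i in range(1, 201)]
    final = [(i + 1, f"E{i}") for i in range(25)]
    conn.executemany("INSERT INTO Staging VALUES (?, ?, ?)", staging)
    conn.executemany("INSERT INTO Final VALUES (?, ?)", final)
    conn.commit()
    return conn


def count_matches(conn, lo, hi):
    """Run the join and return the number of matching staging rows."""
    return conn.execute(QUERY, (lo, hi)).fetchone()[0]


def query_plan(conn, lo, hi):
    """Return the human-readable plan steps SQLite reports for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (lo, hi)).fetchall()
    return [row[-1] for row in rows]


def add_suggested_indexes(conn):
    """Add the two indexes proposed in the answer above."""
    conn.execute(
        "CREATE INDEX idx_staging ON Staging (status, StagingId, EmployeeId)"
    )
    conn.execute("CREATE INDEX idx_final ON Final (EmployeeId)")


demo = build_demo_db()
count_before = count_matches(demo, 1, 200)
plan_before = query_plan(demo, 1, 200)
add_suggested_indexes(demo)
count_after = count_matches(demo, 1, 200)
plan_after = query_plan(demo, 1, 200)

print("matches:", count_before, count_after)
print("plan without indexes:", plan_before)
print("plan with indexes:", plan_after)
```

The count is the same before and after; only the access path changes. On the real tables, running MySQL's EXPLAIN on the query before and after adding the indexes is the way to confirm they are actually used.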

Fastest MySQL Storage Engine for `Select` queries

I have a question about which storage engine to choose for my database tables. I have a table with 28 million records. I will insert the data after creating the table; after that, no insert, update, or delete operations will ever take place. Only select operations.
I have a query like below
SELECT `indexVal`, COUNT(`indexVal`) FROM `key_word` WHERE `hashed_word` IN ('001','01v','0ji','0k9','0vc','0#v','0%d','13#' ,'148' ,'1e1','1sx','1v$','1#c','1?b','1?k','226','2kl','2ue','2*l','2?4','36h','3au','3us','4d~') GROUP BY `indexVal`
This counts how many times a particular result appeared in the search. In InnoDB, this operation took 5 seconds. That is too much, because my original dataset will be in the billions.
Which MySQL storage engine do you recommend for this kind of work?
More than the storage engine, having the proper index in place seems important.
In your case, CREATE INDEX idx_1 ON key_word (hashed_word, indexVal) should help: leading with hashed_word lets the index serve the IN filter, and including indexVal means the query can be answered from the index alone.
And if the data truly never changes, you could even pre-compute and cache some of those results.
For example:
CREATE TABLE counts AS SELECT indexVal, hashed_word, COUNT(indexVal) AS cnt
FROM key_word
GROUP BY indexVal, hashed_word;
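To make the pre-computation idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen so the example is runnable as-is). It uses the question's column names indexVal and hashed_word; the sample rows, the cnt column, and the index name idx_counts are hypothetical. The point it shows is that the original COUNT query becomes a SUM over the much smaller summary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE key_word (indexVal INTEGER, hashed_word TEXT)")

# Hypothetical sample data: (indexVal, hashed_word)
sample_rows = [
    (1, "001"), (1, "001"), (1, "01v"),
    (2, "001"), (2, "0ji"),
    (3, "zzz"),
]
conn.executemany("INSERT INTO key_word VALUES (?, ?)", sample_rows)

# Pre-compute one row per (indexVal, hashed_word) pair. This only stays
# correct because the source table never changes after loading.
conn.execute(
    """
    CREATE TABLE counts AS
    SELECT indexVal, hashed_word, COUNT(indexVal) AS cnt
    FROM key_word
    GROUP BY indexVal, hashed_word
    """
)
conn.execute("CREATE INDEX idx_counts ON counts (hashed_word, indexVal)")
conn.commit()

words = ("001", "01v", "0ji")
placeholders = ", ".join("?" for _ in words)

# The question's query against the raw table.
direct = conn.execute(
    f"SELECT indexVal, COUNT(indexVal) FROM key_word "
    f"WHERE hashed_word IN ({placeholders}) "
    f"GROUP BY indexVal ORDER BY indexVal",
    words,
).fetchall()

# The equivalent query against the summary table: sum the stored counts.
cached = conn.execute(
    f"SELECT indexVal, SUM(cnt) FROM counts "
    f"WHERE hashed_word IN ({placeholders}) "
    f"GROUP BY indexVal ORDER BY indexVal",
    words,
).fetchall()

print(direct)
print(cached)
```

Both queries return the same rows; how much smaller the summary table is than the raw table depends on how often each (indexVal, hashed_word) pair repeats.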
For SELECT-only queries, ARCHIVE is the fastest storage engine.
As it is MyISAM-based, and the following advice is for MyISAM as well, don't use varchar but fixed-size char columns, and you will get better performance.
Sure, it is even faster if the data is loaded in memory instead of read from disk.

MySQL multiple insert performance

I have data containing about 30,000 records, and I need to insert it into a MySQL table. I group the data into packages of 1000 and create multi-row inserts like this:
INSERT INTO `table_name` VALUES (data1), (data2), ..., (data1000);
How can I optimize the performance of this insert? Can I insert more than 1000 records at a time? Each row is about 1KB in size. Thanks.
Try wrapping your bulk insert inside a transaction.
START TRANSACTION;
INSERT INTO `table_name` VALUES (data1), (data2), ..., (data1000);
COMMIT;
That might improve performance. I'm not sure whether MySQL can partially commit a bulk insert, though (if it can't, then this likely won't help much).
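A minimal sketch of the same idea from application code, using Python's built-in sqlite3 module as a stand-in for a MySQL connection (an assumption so the example is self-contained). The two-column layout of table_name and the helper insert_in_batches are hypothetical, and executemany stands in for the multi-row VALUES list. All batches share one transaction, so there is a single commit at the end rather than one per batch.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert rows in batches of batch_size inside a single transaction."""
    # The connection context manager commits if the block succeeds and
    # rolls back if any batch raises, so the load is all-or-nothing.
    with conn:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            conn.executemany(
                "INSERT INTO table_name (id, payload) VALUES (?, ?)", batch
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, payload TEXT)")

payload = "x" * 1024  # roughly 1KB per row, as in the question
rows = [(i, payload) for i in range(30_000)]
insert_in_batches(conn, rows)

inserted = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
print(inserted)
```

Whether this helps on a given MySQL server depends on the storage engine and the autocommit setting, so it is worth timing both variants.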
Remember that even at 1.5 seconds, for 30,000 records at ~1KB each, you're committing at about 20MB/s, so you could actually be drive-limited, depending on your hardware setup.
The advice then would be to investigate an SSD, change your RAID setup, or get faster mechanical drives (there are plenty of online articles on the pros and cons of running a SQL database on an SSD).
You need to check the MySQL server configuration, specifically the buffer sizes, etc.
You can remove indexes from the table, if any, to make it faster. Create the indexes once the data is in.
Look here, you should get all you need.
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
One insert statement with multiple values, it says, is much faster than multiple insert statements.
Is this a one-off operation?
If so, just generate a single SQL statement per data element and execute them all on the server. 30,000 rows really shouldn't take very long, and you will have the simplest means of inputting your data.

How to manage Huge operations on MySql

I have a MySQL database with a lot of records (about 4,000,000,000 rows), and I want to process them in order to reduce them to about 1,000,000,000 rows.
Assume I have the following tables:
table RawData: more than 5000 rows per second arrive that I want to insert into RawData
table ProcessedData: this table is the processed (aggregated) storage for rows that were inserted into RawData.
minimum row count > 20,000,000
table ProcessedDataDetail: holds the details of table ProcessedData (the data that was aggregated)
Users want to view and search the ProcessedData table, which requires joining more than 8 other tables.
Inserting into RawData and searching in ProcessedData (ProcessedData INNER JOIN ProcessedDataDetail INNER JOIN ...) are very slow. I use a lot of indexes. Assume my data length is 1G, but my index length is 4G :). (I want to get rid of these indexes; they slow down my processing.)
How can I increase the speed of this process?
I think I need a shadow table of ProcessedData, named ProcessedDataShadow. I would then process RawData, aggregate it with ProcessedDataShadow, and insert the result into both ProcessedDataShadow and ProcessedData. What do you think?
(I am developing the project in C++.)
Thank you in advance.
Without knowing more about what your actual application is, I have these suggestions:
Use InnoDB if you aren't already. InnoDB uses row-level locks and is much better at handling concurrent updates/inserts. It will be slower if you don't work concurrently, but the row locking is probably a must-have for you, depending on how many sources you will have for RawData.
Indexes usually speed things up, but badly chosen indexes can make things slower. I don't think you want to get rid of them, but a lot of indexes can make inserts very slow. It is possible to disable indexes when inserting batches of data, in order to avoid updating the indexes on each insert.
If you will be selecting huge amounts of data that might disturb the data collection, consider using a replicated slave database server that you use only for reading. Even if that locks rows or tables, the primary (master) database won't be affected, and the slave will get back up to speed as soon as it is free to do so.
Do you need to process data in the database? If possible, maybe collect all data in the application and only insert ProcessedData.
You've not said what the structure of the data is, how it's consolidated, how promptly data needs to be available to users, nor how lumpy the consolidation process can be.
However the most immediate problem will be sinking 5000 rows per second. You're going to need a very big, very fast machine (probably a sharded cluster).
If possible I'd recommend writing a consolidating buffer (using an in-memory hash table - not in the DBMS) to put the consolidated data into - even if it's only partially consolidated - then update from this into the processedData table rather than trying to populate it directly from the rawData.
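A consolidating buffer of this kind might look roughly like the sketch below. It is written in Python for brevity even though the asker's project is in C++; the same structure maps onto a hash map there. The class name ConsolidatingBuffer, the (key, value) shape of a raw row, the count/sum aggregation, and the sink callback are all assumptions for illustration. In a real system the sink would apply an incrementing update to ProcessedData instead of merging into a dictionary.

```python
from collections import defaultdict


class ConsolidatingBuffer:
    """Aggregate raw rows in memory and hand partial totals to a sink."""

    def __init__(self, flush_threshold, sink):
        self._flush_threshold = flush_threshold  # max distinct keys held
        self._sink = sink                        # callable taking a batch
        self._totals = defaultdict(lambda: [0, 0])  # key -> [count, sum]

    def add(self, key, value):
        """Fold one raw row into the running totals for its key."""
        entry = self._totals[key]
        entry[0] += 1
        entry[1] += value
        if len(self._totals) >= self._flush_threshold:
            self.flush()

    def flush(self):
        """Send the partially consolidated totals to the sink and reset."""
        if not self._totals:
            return
        batch = [(key, count, total) for key, (count, total) in self._totals.items()]
        self._totals.clear()
        self._sink(batch)


# Stand-in for the ProcessedData table: the sink must add to existing
# totals, because the same key can arrive in more than one flush.
processed = {}


def merge_into_processed(batch):
    for key, count, total in batch:
        old_count, old_total = processed.get(key, (0, 0))
        processed[key] = (old_count + count, old_total + total)


buffer = ConsolidatingBuffer(flush_threshold=2, sink=merge_into_processed)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]:
    buffer.add(key, value)
buffer.flush()  # push whatever is still buffered

print(processed)
```

The threshold trades memory for write volume: a larger buffer consolidates more rows per key before touching the database, at the cost of losing more unflushed data if the process dies.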
Indeed, I'd probably consider separating the raw and consolidated data onto separate servers/clusters (the MySQL federated engine is handy for providing a unified view of the data).
Have you analysed your queries to see which indexes you really need? (hint - this script is very useful for this).

Slow select when inserting large amounts of data (MYSQL)

I have a process that imports a lot of data (950k rows) using inserts that insert 500 rows at a time. The process generally takes about 12 hours, which isn't too bad. Normally doing a query on the table is pretty quick (under 1 second) as I've put (what I think to be) the proper indexes in place. The problem I'm having is trying to run a query when the import process is running. It is making the query take almost 2 minutes! What can I do to make these two things not compete for resources (or whatever)? I've looked into "insert delayed" but not sure I want to change the table to MyISAM.
Thanks for any help!
Have you tried using priority hints?
SELECT HIGH_PRIORITY ... and INSERT LOW_PRIORITY ...
12 hours to insert 950k rows is pretty heavy duty. How big are these rows? What kind of indexes are on them? Even if the actual data insertion goes quickly, the continual updating of the indexes will definitely cause performance degradation for anything using those table(s) at the time.
Are you doing these imports with the bulk INSERT syntax (insert into tab (x) values (a), (b), (c), etc...) or one INSERT per row? Doing the bulk insert will require a longer index updating period (as it has to generate index data for 500 rows) than doing it for a single row. There will no doubt be some sort of internal lock placed on the indexes while the data is updated, in which case you're contending with 950k/500 = 1,900 locking sessions at minimum.
I found that on some of my bulk-insert scripts (an http log analyzer for some custom data mining), it was quicker to DISABLE indexes on the relevant tables, then reenabling/rebuilding them after the data dump was completed. If I remember right, it was about 37 minutes to insert 200,000 rows of hit data with keys enabled, and about 3 minutes with no indexing.
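The pattern described above can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption so the example runs without a server). The hits table, its columns, and the index name idx_hits_url are hypothetical. SQLite has no way to disable an index, so the sketch drops and recreates it; in MySQL, ALTER TABLE ... DISABLE KEYS only affects non-unique indexes on MyISAM tables, so dropping and recreating secondary indexes is the closer equivalent for InnoDB.

```python
import sqlite3


def bulk_load_without_index(conn, rows):
    """Drop the secondary index, load the rows, then rebuild the index once."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP INDEX IF EXISTS idx_hits_url")
        conn.executemany("INSERT INTO hits (url, status) VALUES (?, ?)", rows)
        # Building the index once over the loaded data replaces many
        # small incremental index updates during the inserts.
        conn.execute("CREATE INDEX idx_hits_url ON hits (url)")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (id INTEGER PRIMARY KEY, url TEXT, status INTEGER)"
)
conn.execute("CREATE INDEX idx_hits_url ON hits (url)")

rows = [(f"/page/{i % 100}", 200) for i in range(10_000)]
bulk_load_without_index(conn, rows)

row_count = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
index_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'hits'"
    )
]
print(row_count, index_names)
```

Queries that rely on the index will be slow while it is missing, so this suits offline loads better than imports that run alongside live reads.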
So I finally found the slowdown while searching during the import of my data. I had one query like this:
SELECT * FROM `properties` WHERE (state like 'Florida%') and (county like 'Hillsborough%') ORDER BY created_at desc LIMIT 0, 50
and when I ran an EXPLAIN on it, I found out it was scanning around 215,000 rows (even with proper indexes on state and county in place). I then ran an EXPLAIN on the following query:
SELECT * FROM `properties` WHERE (state = 'Florida') and (county = 'Hillsborough') ORDER BY created_at desc LIMIT 0, 50
and saw that it only had to scan 500 rows. Considering that the actual result set was something like 350, I think I identified the slowdown.
I've made the switch to not using "like" in my queries and am very happy with the snappier results.
Thanks to everyone for your help and suggestions. They are much appreciated!
You can try importing your data into an auxiliary table and then merging it into the main table. You don't lose performance on your main table, and I think your database can manage the merge much faster than the multiple insertions.
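A minimal sketch of the auxiliary-table approach, again using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for runnability). The properties table name comes from the thread; the auxiliary table properties_import, the column layout, and the sample rows are hypothetical. The slow, batched import only touches the auxiliary table, and the main table sees a single INSERT ... SELECT at the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE properties (id INTEGER PRIMARY KEY, state TEXT, county TEXT);
    CREATE TABLE properties_import (id INTEGER PRIMARY KEY, state TEXT, county TEXT);
    INSERT INTO properties VALUES (1, 'Florida', 'Hillsborough');
    """
)

# Hypothetical incoming data, loaded 500 rows at a time as in the question.
incoming = [(i, "Florida", "Pinellas") for i in range(2, 1002)]
for start in range(0, len(incoming), 500):
    with conn:  # one short transaction per batch, on the auxiliary table only
        conn.executemany(
            "INSERT INTO properties_import VALUES (?, ?, ?)",
            incoming[start:start + 500],
        )

# Merge step: one set-based statement against the main table, then
# empty the auxiliary table, both in the same transaction.
with conn:
    conn.execute(
        "INSERT INTO properties (id, state, county) "
        "SELECT id, state, county FROM properties_import"
    )
    conn.execute("DELETE FROM properties_import")

main_count = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
aux_count = conn.execute("SELECT COUNT(*) FROM properties_import").fetchone()[0]
print(main_count, aux_count)
```

If imported rows can collide with existing keys, the merge statement would need an upsert form instead of a plain INSERT ... SELECT.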