Update in MySQL InnoDB million records - mysql

MySQL InnoDB update issue:
Once I receive a response (status) for a record, I need to write that response back to a very large table (approximately 1 million records and growing), and this may happen around 100 times per second. Will there be any performance issue? Or is there any setting I can change to avoid table locking or slow queries?
Thanks.

It sounds like a design issue.
Instead of storing the flag (which the status-record update changes) on millions of data records, you should store a reference in each data record pointing to the status record. Then, when you update the status record, no further DB operation is required. Also, when you scan through the data records, JOIN to the status records if you need to display the status. If the status changes often, this is much cheaper than updating millions of data records.
Maybe I'm wrong; you should describe the DB (structure, table record counts) to get more accurate answers.
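A minimal sketch of that design, using hypothetical table and column names (status, data_record, status_id):

-- One shared status record, referenced by many data records.
CREATE TABLE status (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  label VARCHAR(32)  NOT NULL
) ENGINE = InnoDB;

CREATE TABLE data_record (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status_id INT UNSIGNED    NOT NULL,
  payload   VARCHAR(255),
  KEY idx_status (status_id)
) ENGINE = InnoDB;

-- A status change is now a single-row UPDATE, no matter how many data
-- records reference it.
UPDATE status SET label = 'processed' WHERE id = 42;

-- Reads JOIN to pick up the current status.
SELECT d.id, d.payload, s.label
FROM data_record d
JOIN status s ON s.id = d.status_id;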

If your table uses the MyISAM storage engine, every update locks the whole table.
However, the InnoDB storage engine is capable of locking individual rows.
If you need to UPDATE multiple records simultaneously, InnoDB may be better.
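As a quick check, you can see which engine the table currently uses and, if row-level locking is what you need, convert it. This is a sketch with placeholder schema/table names ('mydb', 'responses'); the conversion rebuilds the table, so expect it to take a while and block writes on a million rows.

SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'mydb' AND table_name = 'responses';

-- Convert to InnoDB during a quiet period.
ALTER TABLE responses ENGINE = InnoDB;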

Any indexes you have on the database (especially clustered indexes) will slow your writes down.
Indexes speed up reading, but they slow down writing. Most databases get read more than written to, but it sounds like yours gets written to much more.
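If you suspect a particular index costs more on writes than it saves on reads, you can list the table's indexes and drop one as an experiment (placeholder names; check your read queries with EXPLAIN before dropping anything):

SHOW INDEX FROM responses;
ALTER TABLE responses DROP INDEX idx_rarely_used;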

Related

Move existing tables to InnoDB from MyISAM and which one is faster?

A database already has 25-30 tables, all MyISAM. Most of these tables are related to each other, meaning a lot of queries join on IDs to retrieve data.
One of the tables contains 7-10 million records, and it becomes slow whenever I perform a search, an update, or even a retrieval of all data. I proposed a solution to my boss, saying that converting the tables to InnoDB might give better performance.
I also explained the benefits of InnoDB:
Since we join multiple related tables on keys anyway, it would be better to use foreign keys and have a relational design, which avoids orphan rows. I found around 10-15k orphan rows in one of the big tables and had to remove them manually.
Support for transactions: we perform big updates from time to time, and if one of them fails partway through, we have to replace the entire table with the backed-up one and run the update again to make sure all queries were executed. With InnoDB we can roll back the changes from query 1 if query 2 fails.
The response I got from my boss is that I need to prove InnoDB will run faster than MyISAM. My question is: won't the two points above improve the speed of the application by themselves, by eliminating orphan rows?
In general is MyISAM faster than InnoDB?
Note: using MySQL 5.5
You should also mention to your boss probably the biggest benefit you get from InnoDB for large tables with a mixed read/write load: row-level locking rather than table-level locking. This can be a great performance benefit for the application in cases where you see a lot of waits for table locks to be released.
Of course the best way to convince your boss is to prove it. Make copies of your large table and place them in a testing database, one version in MyISAM and one in InnoDB. Then run load testing against them with a load mix that approximates your current DB read/write activity, and find out for yourself which is better.
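A rough sketch of building the two test copies (database and table names are placeholders); run your representative query mix against each, for example with the mysqlslap load-emulation client that ships with MySQL, and compare timings:

CREATE TABLE test_db.big_table_myisam LIKE prod_db.big_table;
INSERT INTO test_db.big_table_myisam SELECT * FROM prod_db.big_table;

CREATE TABLE test_db.big_table_innodb LIKE prod_db.big_table;
ALTER TABLE test_db.big_table_innodb ENGINE = InnoDB;
INSERT INTO test_db.big_table_innodb SELECT * FROM prod_db.big_table;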
Just updating for your comment that you are on 5.5: with 5.5 it is a no-brainer to use InnoDB. The MyISAM engine has basically seen no improvement over the last several years, and development effort has centered on InnoDB. InnoDB is THE MySQL engine of choice going forward.

MySQL partitioning for table with huge inserts and deletes

I have a table into which some 20 million entries are inserted per day (blind insertion, without any constraints). It has two foreign keys, one of which is a reference ID to a table with some 10 million entries.
I plan to delete all data in this table older than a month, because it is no longer needed. But the problem is that with the huge number of insertions happening, if I start deleting, the table will be locked and insertions will be blocked.
I wanted to know if we can partition the table by month. That way, I was hoping that when I try to delete all data older than two months, that data will be in a different partition from the one receiving insertions, and the lock taken by the delete will not block them.
Please tell me if this is possible. I am fairly new to databases, so please let me know if there is something wrong with my thinking.
From the MySQL documentation:
For InnoDB and BDB tables, MySQL uses table locking only if you explicitly lock the table with LOCK TABLES. For these storage engines, avoid using LOCK TABLES at all, because InnoDB uses automatic row-level locking and BDB uses page-level locking to ensure transaction isolation.
I'm not sure you even have an issue. Have you tested this and seen locking issues, or are you just theorizing about them right now?
MySQL has partitioning as of version 5.1.
You can run this query to verify if your version of MySQL supports partitioning:
SHOW VARIABLES LIKE 'have_partitioning';
Then you can read the manual to learn how to use it:
http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
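As a hedged sketch (hypothetical table and column names), monthly RANGE partitioning lets you drop an old month as a near-instant metadata operation instead of a long row-by-row DELETE. Two caveats worth checking against your schema: the partitioning column must be part of every unique key (including the primary key), and MySQL partitioned tables do not support foreign keys.

CREATE TABLE events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ref_id     BIGINT UNSIGNED NOT NULL,
  created_at DATE            NOT NULL,
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2012_01 VALUES LESS THAN (TO_DAYS('2012-02-01')),
  PARTITION p2012_02 VALUES LESS THAN (TO_DAYS('2012-03-01')),
  PARTITION p2012_03 VALUES LESS THAN (TO_DAYS('2012-04-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- Removing an old month drops the whole partition rather than scanning and
-- deleting rows, so it does not touch the partitions receiving new inserts.
ALTER TABLE events DROP PARTITION p2012_01;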

MySQL locking processing large LOAD DATA INFILE when live SELECT queries still needed

Looking for some help and advice please from Super Guru MySQL/PHP pros who can spare a moment of their time.
I have a web application in PHP/MySQL which has grown over the years and gets a lot of searches. It is now hitting bottlenecks when the various daily data dumps of new rows get processed using MySQL LOAD DATA INFILE.
It is a large MyISAM table with about 1.5 million rows, and all the SELECT queries run against it. When these occur during the LOAD DATA INFILE of about 600k rows (and the deletion of outdated data), they just get backed up and take 30+ minutes to be freed up, making any of those searches fruitless.
I need to come up with a way to get that table updated while retaining the ability to provide SELECT results in a reasonable timeframe.
I'm completely out of ideas and have not been able to come up with a solution myself, as it's the first time I've encountered this sort of issue.
Any helpful advice, solutions or pointers from similar past experiences would be greatly appreciated as I would love to learn to resolve this sort of problem.
Many thanks everyone for your time! J
You can use the CONCURRENT keyword for LOAD DATA INFILE. This way, while you load the data, the table is still able to serve SELECTs.
Concerning the delete, this is more complicated. I would personally add a column called 'status' INT(1), which defines whether the row is active or not (= deleted), and then partition the table with a rule based on this status column.
This way, it will be easier to delete all rows where status=0 :P I haven't tested this last solution; I may do that in the near future.
The CONCURRENT keyword will only help if your table is optimized: if there are free blocks in the middle of the data file (free space left behind by deletes), the LOAD DATA INFILE will lock the table.
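For reference, a hedged example of the CONCURRENT modifier (the file path and table name are placeholders); it lets other sessions keep reading a MyISAM table while the load runs, provided the data file has no free blocks in the middle:

LOAD DATA CONCURRENT INFILE '/tmp/daily_dump.csv'
INTO TABLE listings
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- Running OPTIMIZE TABLE after large deletions removes the holes that would
-- otherwise prevent concurrent reads during the load.
OPTIMIZE TABLE listings;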
MyISAM doesn't support row-level locking, so operations like mysqldump are forced to lock the entire table to guarantee a consistent dump. Your only practical options are to switch to a storage engine that supports row-level locking (like InnoDB), and/or to split your dump up into smaller pieces. The small dumps will still lock the table while they're dumping/reloading, but the lock periods will be shorter.
A hairier option would be to have "live" and "backup" tables. Do the dump/load operations on the backup table. When they're complete, swap it for the live table (rename the tables, or have your code dynamically change which table it uses). If you can live with a short window of potentially stale data, this could be the better option.
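A sketch of the swap itself, with hypothetical table names. RENAME TABLE performs both renames in a single atomic statement, so readers only ever see a complete table:

-- Load and clean the new data into listings_staging while listings stays live,
-- then swap:
RENAME TABLE listings TO listings_old,
             listings_staging TO listings;
-- listings_old can be refreshed for the next cycle, or dropped.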
You should switch your table storage engine from MyISAM to InnoDB. InnoDB provides row-locking (as opposed to MyISAM's table-locking) meaning while one query is busy updating or inserting a row, another query can update a different row at the same time.

How to improve MySQL INSERT and UPDATE performance?

Performance of INSERT and UPDATE statements in our database seems to be degrading and causing poor performance in our web app.
Tables are InnoDB and the application uses transactions. Are there any easy tweaks that I can make to speed things up?
I think we might be seeing some locking issues, how can I find out?
You could change the settings to speed InnoDB inserts up.
And even more ways to speed up InnoDB
...and one more optimization article
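For example, a couple of settings and habits that commonly speed up InnoDB writes. This is a hedged sketch; check the defaults and the durability trade-offs for your MySQL version before applying any of it.

-- Flush the InnoDB log roughly once per second instead of at every commit
-- (trades up to a second of crash durability for much faster small transactions).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Batching many statements into one transaction also cuts log flushes.
START TRANSACTION;
-- ... many INSERT/UPDATE statements ...
COMMIT;

-- innodb_buffer_pool_size (ideally most of the RAM on a dedicated DB server)
-- and innodb_log_file_size are set in my.cnf and, on older versions, require
-- a restart to change.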
INSERT and UPDATE get progressively slower as the number of rows increases on a table with an index. InnoDB tables are even slower than MyISAM tables for inserts, and the delayed key write option is not available.
The most effective way to speed things up would be to save the data to a flat file first and then use LOAD DATA; this is about 20x faster.
The second option would be to create a temporary in-memory table, load the data into it, and then do an INSERT INTO ... SELECT in batches: once you have about 100 rows in your temp table, load them into the permanent one.
Additionally, you can get a small improvement in speed by moving the index file onto a separate physical hard drive from the one where the data file is stored. Also try to move any binary logs onto a different device; the same applies to the temporary file location.
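A minimal sketch of the in-memory staging approach, with placeholder names; note that the MEMORY engine cannot store TEXT/BLOB columns, so this only suits compact rows:

CREATE TABLE staging LIKE target;
ALTER TABLE staging ENGINE = MEMORY;

-- The application inserts into staging; roughly every 100 rows, flush them
-- into the permanent table in one statement and clear the buffer.
INSERT INTO target SELECT * FROM staging;
TRUNCATE TABLE staging;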
I would try setting your tables to delay index updates (note that DELAY_KEY_WRITE applies to MyISAM tables only).
ALTER TABLE {name} DELAY_KEY_WRITE = 1;
If your UPDATE queries are not using indexes, adding appropriate ones can improve their performance.
I would not look at locking/blocking unless the number of concurrent users has been increasing over time.
If the performance has gradually degraded over time, I would look at the query plans with the EXPLAIN statement.
It would be helpful to have the results of these from the development or initial production environment, for comparison purposes.
Dropping or adding an index may be needed, or some other maintenance action mentioned in other posts.
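For example (placeholder table and columns): on MySQL versions before 5.6, EXPLAIN only accepts SELECT, so rewrite the UPDATE's WHERE clause as an equivalent SELECT to inspect its plan.

EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'open';
-- A "type" of ALL with a large "rows" estimate means a full table scan; an
-- index on (customer_id, status) would be a candidate worth testing.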

Generating a massive 150M-row MySQL table

I have a C program that mines a huge data source (20GB of raw text) and generates loads of INSERTs to execute on a simple blank table (4 integer columns with 1 primary key). Set up as a MEMORY table, the entire task completes in 8 hours. After finishing, about 150 million rows exist in the table. Eight hours is a completely decent number for me. This is a one-time deal.
The problem comes when trying to convert the MEMORY table back into MyISAM so that (A) I'll have the memory freed up for other processes and (B) the data won't be killed when I restart the computer.
ALTER TABLE memtable ENGINE = MyISAM
I've let this ALTER TABLE query run for over two days now, and it's not done. I've now killed it.
If I create the table initially as MyISAM, the write speed seems terribly poor (especially because the query requires the ON DUPLICATE KEY UPDATE technique). I can't temporarily turn off the keys: if I did, the table would become over 1000 times larger, and then I'd have to reprocess the keys and essentially run a GROUP BY on 150,000,000,000 rows. Umm, no.
One of the key constraints to realize: The INSERT query UPDATEs records if the primary key (a hash) exists in the table already.
At the very beginning of an attempt at strictly using MyISAM, I'm getting a rough speed of 1,250 rows per second. Once the index grows, I imagine this rate will tank even more.
I have 16GB of memory installed in the machine. What's the best way to generate a massive table that ultimately ends up as an on-disk, indexed MyISAM table?
Clarification: There are many, many UPDATEs going on from the query (INSERT ... ON DUPLICATE KEY UPDATE val=val+whatever). This isn't, by any means, a raw dump problem. My reasoning for trying a MEMORY table in the first place was to speed up all the index lookups and table changes that occur for every INSERT.
If you intend to make it a MyISAM table, why are you creating it in memory in the first place? If it's only for speed, I think the conversion to a MyISAM table is going to negate any speed improvement you get by creating it in memory to start with.
You say inserting directly into an "on disk" table is too slow (though I'm not sure how you're deciding that when your current method is taking days), but you may be able to turn off or remove the uniqueness constraints, use a DELETE query later to re-establish uniqueness, and then re-enable/add the constraints. I have used this technique when importing into an InnoDB table in the past, and found that even with the later delete it was overall much faster.
Another option might be to create a CSV file instead of the INSERT statements, and either load it into the table using LOAD DATA INFILE (I believe that is faster than the inserts, but I can't find a reference at present) or use it directly via the CSV storage engine, depending on your needs.
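If the C program wrote a CSV instead of INSERT statements, the bulk load might look like the sketch below (hypothetical path, table, and column names). One caveat: LOAD DATA only offers REPLACE or IGNORE for duplicate keys, not the additive ON DUPLICATE KEY UPDATE this workload needs, so duplicate hashes would still have to be combined in a separate pass.

LOAD DATA INFILE '/tmp/mined_rows.csv'
INTO TABLE target_myisam
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(hash, a, b, val);

-- REPLACE (overwrite the existing row) or IGNORE (keep the first row) are the
-- built-in duplicate-key behaviours, e.g.:
-- LOAD DATA INFILE '/tmp/mined_rows.csv' REPLACE INTO TABLE target_myisam ...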
Sorry to keep throwing comments at you (last one, probably).
I just found this article, which provides an example of converting a large table from MyISAM to InnoDB. While this isn't what you are doing, the author uses an intermediate MEMORY table and describes going from memory to InnoDB in an efficient way, by ordering the table in memory the way InnoDB expects it to be ordered in the end. If you aren't tied to MyISAM, it might be worth a look since you already have a "correct" memory table built.
I don't use MySQL, but I use SQL Server, and this is the process I use to handle a file of similar size. First I dump the file into a staging table that has no constraints. Then I identify and delete the dups from the staging table. Then I search for existing records that might match and put the ID field into a column in the staging table. Then I update where the ID column is not null and insert where it is null.
One of the reasons I do all the work of getting rid of the dups in the staging table is that it means less impact on the prod table when I run it, so it is faster in the end. My whole process runs in less than an hour (and actually does much more than I describe, as I also have to denormalize and clean the data) and affects production tables for less than 15 minutes of that time. I don't have to worry about adjusting any constraints or dropping indexes or any of that, since I do most of my processing before I hit the prod table.
Consider whether a similar process might work better for you. Also, could you use some sort of bulk import to get the raw data into the staging table (I pull the 22 gig file I have into staging in around 16 minutes) instead of working row by row?
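As a hedged MySQL adaptation of that staging-table process (all names hypothetical), the same update-the-matches / insert-the-rest pattern could look like this, touching the production table only in the final two statements:

-- Staging table with no indexes or constraints while loading.
CREATE TABLE staging (
  hash BIGINT NOT NULL,
  val  INT    NOT NULL
) ENGINE = MyISAM;

LOAD DATA INFILE '/tmp/raw_rows.csv' INTO TABLE staging
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Collapse duplicates inside staging first, so production sees each key once.
CREATE TABLE staging_dedup ENGINE = MyISAM
  SELECT hash, SUM(val) AS val FROM staging GROUP BY hash;
ALTER TABLE staging_dedup ADD PRIMARY KEY (hash);  -- speeds up the joins below

-- Update the rows that already exist...
UPDATE prod p
JOIN staging_dedup s ON p.hash = s.hash
SET p.val = p.val + s.val;

-- ...then insert the ones that don't.
INSERT INTO prod (hash, val)
SELECT s.hash, s.val
FROM staging_dedup s
LEFT JOIN prod p ON p.hash = s.hash
WHERE p.hash IS NULL;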