MySQL LOAD DATA INFILE Taking 13 Hours

Is there anything I can change in the my.ini file to speed up "LOAD DATA INFILE"?
I have two MySQL 5.5 instances each of which has one identical table structured as follows:
CREATE TABLE `log_access` (
`_id` bigint(20) NOT NULL AUTO_INCREMENT,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type_id` int(11) NOT NULL,
`building_id` int(11) NOT NULL,
`card_id` varchar(15) NOT NULL,
`user_key` varchar(35) DEFAULT NULL,
`user_name` varchar(25) DEFAULT NULL,
`user_validation` varchar(10) DEFAULT NULL,
PRIMARY KEY (`_id`),
KEY `log_access__user_key_timestamp` (`user_key`,`timestamp`),
KEY `log_access__timestamp` (`timestamp`)
) ENGINE=MyISAM
On a daily basis I need to move the previous day's data, roughly 25 million records, from instance A to instance B. At the moment I am doing the following:
On instance A, generate an OUTFILE with "WHERE timestamp BETWEEN '2014-09-23 00:00:00' AND '2014-09-23 23:59:59'". This usually takes less than 2 minutes.
On instance B, execute "LOAD DATA INFILE". This is the problem area, as it takes about 13 hours.
On instance A, delete records from the previous day. This will probably be another
On instance B, run stats.
On instance B, truncate the table.
I have also considered partitioning the tables and just exchanging the partitions. EXCHANGE PARTITION is supported as of 5.6 and I am willing to upgrade MySQL; however, all the documentation discusses exchanging between tables, and I haven't been able to confirm that I could do that between DB instances.
I have also considered replication between the instances, but as I have not worked with replication before and this is a time-sensitive assignment, I am somewhat reluctant to tread into new waters.
Any words of wisdom would be much appreciated.

Create the table without the PRIMARY KEY and the _id column, and add these after LOAD DATA INFILE completes. MySQL checks PRIMARY KEY integrity on every INSERT, so I think you can gain a lot of performance here. With MariaDB you can disable keys, but I think this won't work on some storage engines (see here).
A not-very-nice alternative:
I found it very easy to move a MyISAM database by just copying or moving the files on disk. If you copy the files and run REPAIR TABLE on the target machine, you can do this without restarting the server. Just make sure you copy all three files for the table (.frm, .MYD, .MYI).

LOAD DATA INFILE in perfect PK-order, INTO a table that only has the PK-definition, so no secondary indexes yet. After import, add all secondary indexes at once, with 'ALTER TABLE mytable ALGORITHM=INPLACE, LOCK=NONE, ADD KEY ...'.
Consider adding back the secondary indexes on each involved box separately, so not via replication (sql_log_bin=0), to prevent replication lag.
Consider using a partitioned table, as then you can run a 'LOAD DATA INFILE' per partition, in parallel. (applies to RANGE and HASH partitioning, as the separate tsv-files (one or more per partition) are easy to prepare for those)
MariaDB doesn't have the variant 'INTO mytable PARTITION (p000)' yet.
You can load into a separate table first, and then exchange partitions, but MariaDB also doesn't have 'WITHOUT VALIDATION' yet.
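The per-partition files mentioned above could be prepared with something like the following sketch. It assumes a table partitioned BY HASH on a non-negative integer column, where the partition number is MOD(expr, num), and it sorts each file by that column so every load arrives in key order. The function names are hypothetical, not part of any MySQL tooling.

```python
import csv
import io
from collections import defaultdict


def split_by_hash_partition(tsv_text, num_partitions, key_column=0):
    """Group TSV rows by key % num_partitions and sort each group by key.

    Assumes the key column holds non-negative integers (Python's % and
    MySQL's MOD differ for negative values).
    """
    buckets = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        buckets[int(row[key_column]) % num_partitions].append(row)
    # Sort every bucket by the integer key so each file loads in key order.
    return {
        part: sorted(rows, key=lambda r: int(r[key_column]))
        for part, rows in buckets.items()
    }


def to_tsv(rows):
    """Serialize rows back to tab-separated text, one row per line."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Each resulting chunk can be written to its own file and loaded from a separate connection, one LOAD DATA INFILE per partition.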

Related

How to improve performance of Bulk Inserts in MYSQL

env: windows 10
version mysql 5.7
Ram 32GB
ide : toad mysql
I have sufficient hardware, but the issue is the performance of inserts into a simple table that has no relationships. I need to have an index on the table.
table structure
CREATE TABLE `2017` (
`MOB_NO` bigint(20) DEFAULT NULL,
`CAF_SLNO` varchar(50) DEFAULT NULL,
`CNAME` varchar(58) DEFAULT NULL,
`ACT_DATE` varchar(200) DEFAULT NULL,
KEY `2017_index` (`MOB_NO`,`ACT_DATE`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I am using the statements below to insert the records into the table. Without the index it took around 30 minutes, whereas with the index it has been running for 22 hours and is still going.
SET autocommit=0;
SET unique_checks=0;
SET foreign_key_checks=0;
LOAD DATA LOCAL INFILE 'D:/base/test/2017/2017.txt'
INTO TABLE `2017` COLUMNS TERMINATED BY '|';
commit;
I have seen suggestions to change the cnf file, but I could not find one on my machine.
By adding the following lines to my.ini, I was able to achieve it:
innodb_autoinc_lock_mode =2
sync_binlog=1
bulk_insert_buffer_size=512M
key_buffer_size=512M
read_buffer = 50M
I also set innodb_flush_log_at_trx_commit=2; another link I saw claimed it can increase speed by up to 160x.
Resulting performance: the load time dropped from more than 24 hours to 2 hours.
If you begin with an empty table, create it without any indexes. Then, after fully populating the table, adding an index is reported to be faster than inserting with the index already in place.
See:
MySQL optimizing INSERT speed being slowed down because of indices
Is it better to create an index before filling a table with data, or after the data is in place?
Possibly helpful: Create an index on a huge MySQL production table without table locking
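As a rough sketch of the "load first, index afterwards" order described above, the helper below only builds the SQL text for the two steps; it does not talk to a server. It assumes the table was created without its secondary index, and the function names are made up for illustration.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_str(value):
    """Single-quote a string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def bulk_load_plan(table, infile, secondary_indexes, delimiter="|"):
    """Return SQL statements: load into the index-free table, then add keys.

    secondary_indexes maps an index name to its list of column names.
    """
    statements = [
        "LOAD DATA LOCAL INFILE {} INTO TABLE {} COLUMNS TERMINATED BY {}".format(
            quote_str(infile), quote_ident(table), quote_str(delimiter)
        )
    ]
    if secondary_indexes:
        # One ALTER TABLE adds all secondary indexes in a single pass.
        clauses = ", ".join(
            "ADD KEY {} ({})".format(
                quote_ident(name), ", ".join(quote_ident(c) for c in cols)
            )
            for name, cols in secondary_indexes.items()
        )
        statements.append("ALTER TABLE {} {}".format(quote_ident(table), clauses))
    return statements
```

For the table in the question, the plan would be one LOAD DATA statement followed by a single ALTER TABLE that adds `2017_index`.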

Store huge json data of indefinite size with key

I need to store geo path data, consisting of geo-points, indexed by a unique key. For example: the path traveled by a vehicle, indexed by its trip id. This path can be of indefinite length.
As of now, I am thinking to store the path in the form of JSON object. The options that I have in my mind are Riak and MongoDB. I want to go with open-source technology. It will be nice if it supports clustering. In case one node goes down, we won't have any downtime in our application.
MySQL is currently our source of raw data (which we will be anyhow moving to the NoSQL DB but not as of now). But with the huge amount of data (2 million geo-point entries per day), it takes MYSQL a lot of time to filter the data based on timestamp. MySQL will still be our primary data source. The solution I am looking for will act as a cache for faster path retrieval based on id.
In current MySQL schema, the fields I have are:
system_timestamp,
gps_timestamp,
speed,
lat,
lot
This table stores all the geo-points of the vehicle, whether or not the vehicle is on a trip. Here a trip is defined by whether the driver wants to track the movement. If he wants to track the movement, we generate a unique trip id and associate it with the driver, along with the trip's start time and end time. Later, to display the path for a trip id, we use the start and end time of the trip to filter the data from the raw table.
I want to store the trip path in a secondary database as a cache so that its retrieval will be fast.
Which database should be my ideal choice? What other options do I have?
I'm going to go out on a limb here and say that I believe there is a less complicated way of fixing your performance issue.
I assume you are using MySQL with InnoDB and you are indexing the timestamp field(s).
If I were you, I would simply turn the relevant timestamp (system or gps) into the primary key. With InnoDB, the table data is physically organized to do ultra-fast lookups based on the primary key column(s). Also, make sure that the relevant timestamp column is of the unsigned non-null type.
Now, instead of doing a lookup for the paths in between start and end time (as you're currently doing), I would create a separate table within the same MySQL database containing pairs of trip ID/path timestamp, where "path timestamp" is the primary key from the paths table, as mentioned earlier. Primary index the trip ID. Populate this table using the same logic/mechanism you initially imagined for Riak or MongoDB. This will basically be your "caching" system, using nothing but MySQL.
A typical lookup would take the trip ID to find all of the path timestamps associated and thus all of the path data.
CREATE TABLE IF NOT EXISTS `paths` (
`system_timestamp` int(10) unsigned NOT NULL,
`gps_timestamp` int(10) NOT NULL,
`speed` smallint(8) unsigned NOT NULL,
`lat` decimal(10,6) NOT NULL,
`lng` decimal(10,6) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `paths` ADD PRIMARY KEY (`system_timestamp`);
CREATE TABLE IF NOT EXISTS `trips` (
`trip_id` int(10) unsigned NOT NULL,
`system_timestamp` int(10) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- A trip has many points, so the key must include the timestamp as well.
ALTER TABLE `trips` ADD PRIMARY KEY (`trip_id`, `system_timestamp`);
SELECT * FROM `trips`
INNER JOIN `paths` ON
`trips`.`system_timestamp` = `paths`.`system_timestamp`
WHERE `trip_id` = 1;
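To try the two-table lookup without a MySQL server, here is a small sketch that uses Python's built-in sqlite3 module in place of MySQL, purely so it runs standalone. It assumes a composite key (trip_id, system_timestamp) on trips, since one trip maps to many points, and it trims the paths table to a few columns. The function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with simplified paths and trips tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE paths (
            system_timestamp INTEGER PRIMARY KEY,
            speed INTEGER NOT NULL,
            lat REAL NOT NULL,
            lng REAL NOT NULL
        );
        CREATE TABLE trips (
            trip_id INTEGER NOT NULL,
            system_timestamp INTEGER NOT NULL,
            PRIMARY KEY (trip_id, system_timestamp)
        );
        """
    )
    return conn


def path_for_trip(conn, trip_id):
    """Return (timestamp, lat, lng) for every point of one trip, in time order."""
    return conn.execute(
        """
        SELECT p.system_timestamp, p.lat, p.lng
        FROM trips AS t
        JOIN paths AS p ON p.system_timestamp = t.system_timestamp
        WHERE t.trip_id = ?
        ORDER BY p.system_timestamp
        """,
        (trip_id,),
    ).fetchall()
```

The lookup goes through the trips key first and then fetches each point by the paths primary key, which is the access pattern the answer describes.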

How to improve INSERT performance on a very large MySQL table

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. It contains about 200 million rows and its structure is as follows:
(a little premise: I am not a database expert, so the code I've written could be based on wrong foundations. Please help me to understand my mistakes :) )
CREATE TABLE IF NOT EXISTS items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id, name),
UNIQUE KEY name_key_unique_key (name, `key`),
INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;
Every day I receive many csv files in which each line consists of the pair "name;key", so I have to parse these files (adding the values created_at and updated_at for each row) and insert the values into my table. In this table, the combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:
CREATE TEMPORARY TABLE temp_items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id)
)
ENGINE=MyISAM;
LOAD DATA LOCAL INFILE 'file_to_process.csv'
INTO TABLE temp_items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
(name, `key`, created_at, updated_at);
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();
DROP TEMPORARY TABLE temp_items;
The code shown above achieves my goal, but it takes about 48 hours to complete, and this is a problem.
I think this poor performance is caused by the fact that, for each insertion, the script must check against a very large table (200 million rows) that the pair "name;key" is unique.
How can I improve the performance of my script?
Thanks to all in advance.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
Reference: MySQL.com: 8.2.4.1 Optimizing INSERT Statements
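The multi-row VALUES advice can be sketched from the client side as follows. This only builds the statement text and the flat parameter list; it assumes a Python DB-API driver that uses %s placeholders, and both helper names are hypothetical.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows, each with at most size items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def multi_row_insert(table, columns, rows):
    """Return (sql, params) for one INSERT carrying a VALUES list per row.

    Identifiers are used as given, so reserved words must already be quoted.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params
```

Each chunk then becomes a single round trip, for example cursor.execute(sql, params), instead of one statement per row.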
Your LINEAR KEY partitioning on name and the large indexes slow things down.
The LINEAR KEY has to be calculated on every insert.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html
Can you show us some example data from file_to_process.csv? Maybe a better schema could be built.
Edit: I looked more closely.
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
This will probably create an on-disk temporary table, which is very slow, so you should avoid it if you want better performance. Alternatively, check MySQL config settings such as tmp_table_size and max_heap_table_size; they may be misconfigured.
There is a piece of documentation I would like to point out, Speed of INSERT Statements.
Thinking in Java terms:
Divide the object list into partitions and generate a batch insert statement for each partition.
Use CPU cores and available DB connections efficiently. Newer Java features make parallelism easy (e.g. parallel streams, fork/join), or you can create a custom thread pool sized to the number of CPU cores and feed the threads from a central blocking queue so that they invoke batch insert prepared statements.
Decrease the number of indexes on the target table if possible. If a foreign key is not really needed, just drop it. Fewer indexes mean faster inserts.
Avoid using Hibernate for anything other than CRUD operations; always write SQL for complex selects.
Decrease the number of joins in your query. Instead of forcing the DB to do the work, use Java streams for filtering, aggregating and transformation.
Unless you really have to, do not combine a SELECT and an INSERT in one SQL statement.
Add rewriteBatchedStatements=true to your JDBC string; it will help to decrease TCP-level communication between the app and the DB.
Use @Transactional for the methods that carry out the batch inserts, and write the rollback methods yourself.
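The partition-and-thread-pool idea is not specific to Java. A minimal sketch in Python, assuming a caller-supplied insert_batch callback (hypothetical) that opens its own connection, writes one batch, and returns the number of rows it inserted:

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def insert_in_parallel(items, insert_batch, batch_size=1000, workers=4):
    """Run insert_batch on every batch in a thread pool.

    insert_batch is supplied by the caller and must return the number of
    rows it inserted; the total across all batches is returned.
    """
    batches = partition(items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves batch order and re-raises worker exceptions.
        return sum(pool.map(insert_batch, batches))
```

The worker count would normally be tuned to the CPU cores and the number of DB connections available, as the list above suggests.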
You could use
load data local infile ''
REPLACE
into table
etc...
The REPLACE keyword ensures that any row with a duplicate key is overwritten with the new values.
Add a SET updated_at=now() at the end and you're done.
There is no need for the temporary table.

Simple MySQL UPDATE query - very low performance

A simple mysql update query is very slow sometimes. Here is the query:
update produse
set vizite = '135'
where id = '71238'
My simplified table structure is:
CREATE TABLE IF NOT EXISTS `produse`
(
`id` int(9) NOT NULL auto_increment,
`nume` varchar(255) NOT NULL,
`vizite` int(9) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `vizite` (`vizite`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=945179 ;
I use MySQL 5.0.77 and the table is MyISAM.
The table is about 752.6 MB and has 642,442 rows at the moment.
The database runs on a dedicated VPS that has 3 GB of RAM and 4 processors of 2G each. There are no more than 6-7 queries of that type per second when we have high traffic, but the query is slow at other times too.
First, try rebuilding your indexes; it may be that the query is not using them (you can check this by running EXPLAIN on the equivalent query).
Another possibility is that you have many selects on that table or long running selects, which causes long locks. You can try using replication and have your select queries executed on slave database, only, and updates on master, only. That way, you will avoid table locks caused by updates while you are doing selects and vice versa.

Slow MySQL InnoDB Inserts and Updates

I am using magento and having a lot of slowness on the site. There is very, very light load on the server. I have verified cpu, disk i/o, and memory is light- less than 30% of available at all times. APC caching is enabled- I am using new relic to monitor the server and the issue is very clearly insert/updates.
I have isolated the slowness to all insert and update statements. SELECT is fast. Very simple insert / updates into tables take 2-3 seconds whether run from my application or the command line mysql.
Example:
UPDATE `index_process` SET `status` = 'working', `started_at` = '2012-02-10 19:08:31' WHERE (process_id='8');
This table has 9 rows, a primary key, and 1 index on it.
The slowness occurs with all insert / updates. I have run mysqltuner and everything looks good. Also, changed innodb_flush_log_at_trx_commit to 2.
The activity on this server is very light- it's a dv box with 1 GB RAM. I have magento installs that run 100x better with 5x the load on a similar setup.
I started logging all queries over 2 seconds and it seems to be all inserts and full text searches.
Anyone have suggestions?
Here is table structure:
CREATE TABLE IF NOT EXISTS `index_process` (
`process_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`indexer_code` varchar(32) NOT NULL,
`status` enum('pending','working','require_reindex') NOT NULL DEFAULT 'pending',
`started_at` datetime DEFAULT NULL,
`ended_at` datetime DEFAULT NULL,
`mode` enum('real_time','manual') NOT NULL DEFAULT 'real_time',
PRIMARY KEY (`process_id`),
UNIQUE KEY `IDX_CODE` (`indexer_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=10 ;
First: in (process_id='8'), '8' is a string literal, not an int, so MySQL has to convert the value first.
On my system, updating users.last_active_time took a long time (more than one second).
The reason was that I had a few long-running queries that joined against the users table. This kept the table locked for reading and blocked the updates: in effect, a deadlock caused by SELECT.
I rewrote the queries from JOINs to sub-queries and the problem went away.