I am trying to process a huge CSV file (650 million rows) row by row using NodeJS and insert the rows into a MySQL database. I keep batches of 5000 rows in memory and then perform a multi-row INSERT on the database (INSERT INTO tableName (field1,field2) VALUES ('a','a'),('b','b')...).
When I launch the script it works pretty well (5000 rows are inserted roughly every 1.5 seconds). However, once about 6 million rows have been processed and inserted, each INSERT starts to take about 8-9 seconds, slowing the whole process down.
What could be happening? Is the MySQL server acting as a bottleneck? Any ideas?
Thanks in advance.
UPDATE: This is the CREATE TABLE:
CREATE TABLE `sips2consumo_20200826` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`field1` varchar(22) NOT NULL,
`field2` varchar(10) NOT NULL,
`field3` varchar(10) NOT NULL,
`field4` varchar(3) NOT NULL,
`field5` int(11) DEFAULT NULL,
`field6` int(11) DEFAULT NULL,
`field7` int(11) DEFAULT NULL,
`field8` int(11) DEFAULT NULL,
`field9` int(11) DEFAULT NULL,
`field10` int(11) DEFAULT NULL,
`field11` int(11) DEFAULT NULL,
`field12` int(11) DEFAULT NULL,
`field13` int(11) DEFAULT NULL,
`field14` int(11) DEFAULT NULL,
`field15` int(11) DEFAULT NULL,
`field16` int(11) DEFAULT NULL,
`field17` int(11) DEFAULT NULL,
`field18` int(11) DEFAULT NULL,
`field19` int(11) DEFAULT NULL,
`field20` int(11) DEFAULT NULL,
`field21` int(11) DEFAULT NULL,
`field22` int(11) DEFAULT NULL,
`field23` varchar(1) DEFAULT NULL,
`field24` varchar(2) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_consumption` (`field1`,`field2`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4
It depends on how you are using mysql. Most probably you are creating a new connection every time. You should create the connection once, insert the data in batches (as you are doing, 5000 rows at a time), and close the connection when you are finished.
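A minimal sketch of that pattern in Node.js, assuming a promise-based driver such as mysql2/promise whose connection object exposes query(sql, values). The helper names buildBatchInsert and insertInBatches are hypothetical. It uses query() (client-side escaping) rather than execute(), because a server-side prepared statement is limited to 65,535 placeholders, and 5000 rows of this table's 24 non-id columns would need 120,000.

```javascript
// Build one multi-row INSERT with "?" placeholders for a batch of rows.
// Each row is an array of values in the same order as `columns`.
function buildBatchInsert(table, columns, rows) {
  const rowPlaceholder = "(" + columns.map(() => "?").join(",") + ")";
  const sql =
    `INSERT INTO ${table} (${columns.join(",")}) VALUES ` +
    rows.map(() => rowPlaceholder).join(",");
  // Flatten the rows so the values line up with the placeholders.
  const params = rows.flat();
  return { sql, params };
}

// Insert rows from any iterable or async iterable (e.g. a CSV parser stream)
// over ONE already-open connection, batchSize rows per INSERT statement.
// `conn` is assumed to expose query(sql, params) returning a promise.
async function insertInBatches(conn, table, columns, rows, batchSize = 5000) {
  let batch = [];
  let inserted = 0;
  for await (const row of rows) {
    batch.push(row);
    if (batch.length === batchSize) {
      const { sql, params } = buildBatchInsert(table, columns, batch);
      await conn.query(sql, params);
      inserted += batch.length;
      batch = [];
    }
  }
  // Flush the last, partially filled batch.
  if (batch.length > 0) {
    const { sql, params } = buildBatchInsert(table, columns, batch);
    await conn.query(sql, params);
    inserted += batch.length;
  }
  return inserted;
}
```

With mysql2 you would create the connection once (const conn = await mysql.createConnection({...})), pass it to insertInBatches together with a row source that yields each CSV row as an array, and call await conn.end() only after the last batch has been written.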
It is also good practice to persist your counter (the last row inserted) somewhere, so that after a failure you don't start again from row 0 but continue from the last row inserted before the failure.
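A small sketch of persisting that counter with Node's built-in fs module. The checkpoint file path is up to you (any name used with it is hypothetical), and the counter is assumed to be the number of CSV rows already inserted.

```javascript
const fs = require("fs");

// Read the number of CSV rows already inserted; 0 if no checkpoint exists yet.
function readCheckpoint(path) {
  try {
    return Number.parseInt(fs.readFileSync(path, "utf8"), 10) || 0;
  } catch (err) {
    if (err.code === "ENOENT") return 0; // first run: nothing inserted yet
    throw err;
  }
}

// Record progress only AFTER a batch's INSERT has completed. Writing to a
// temporary file and renaming it means a crash mid-write does not leave a
// truncated checkpoint behind on typical local filesystems.
function writeCheckpoint(path, rowsDone) {
  const tmp = path + ".tmp";
  fs.writeFileSync(tmp, String(rowsDone));
  fs.renameSync(tmp, path);
}
```

Call writeCheckpoint(path, rowsDone) right after each batch's INSERT resolves, and on start-up skip the first readCheckpoint(path) rows of the CSV. A crash between a completed INSERT and the checkpoint write would resend that one batch; with the UNIQUE KEY on (field1, field2) that INSERT would fail with a duplicate-key error instead of storing the rows twice.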
Related
I'm getting a very slow response when running a very simple query on a small table (115k records).
It takes about 8 seconds to respond, and I can't figure out why it's taking that long. Any advice would be awesome.
Table:
CREATE TABLE `financeiro_fluxo` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`branch` int(10) unsigned NOT NULL,
`abertura` int(10) DEFAULT NULL,
`origem` int(10) unsigned DEFAULT NULL,
`status_pagamento` tinyint(3) unsigned DEFAULT NULL,
`conta` int(10) unsigned NOT NULL,
`tipo_lancamento` tinyint(3) unsigned NOT NULL,
`categoria` int(10) unsigned NOT NULL,
`tipo_entidade` varchar(32) COLLATE utf8_unicode_ci NOT NULL,
`entidade` int(10) unsigned DEFAULT NULL,
`entidade_input` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`tipo_pagamento` tinyint(3) unsigned NOT NULL,
`parcela` smallint(5) unsigned NOT NULL,
`parcelas` smallint(5) unsigned NOT NULL,
`valor` decimal(12,2) NOT NULL,
`valor_taxa` decimal(12,2) DEFAULT NULL,
`valor_troco` decimal(12,2) DEFAULT NULL,
`confirmado` tinyint(3) unsigned DEFAULT NULL,
`data_confirmacao` datetime DEFAULT NULL,
`vencimento` date NOT NULL,
`info` varchar(510) COLLATE utf8_unicode_ci DEFAULT NULL,
`bandeira` int(10) unsigned DEFAULT NULL,
`user_add` int(10) unsigned NOT NULL,
`user_last` int(10) unsigned NOT NULL,
`param_ref` varchar(32) COLLATE utf8_unicode_ci DEFAULT NULL,
`param` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`file` int(10) unsigned DEFAULT NULL,
`date_created` datetime NOT NULL,
`date_modified` datetime NOT NULL,
`status` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=116749 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Query:
SELECT * from financeiro_fluxo
Explain:
id | select_type | table            | type | key | key_len | rows
1  | SIMPLE      | financeiro_fluxo | ALL  |     |         | 116244
The same query against the same table on localhost returns in less than a second.
It seems you are doing a full table scan because your query has no limiting conditions (for example a WHERE clause or a LIMIT). To make the query perform better, filter on indexed columns. What happens if you add WHERE id IS NOT NULL?
I assume you need all the records; if not, limit the result set by adding conditions in a more specific WHERE clause (on an indexed column) or a LIMIT clause.
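If the rows can be processed in pieces, one way to apply that advice is a technique usually called keyset pagination: each request filters on the indexed primary key (WHERE id > the last id seen) and stops with LIMIT, so MySQL reads a short index range instead of scanning the table or skipping rows with OFFSET. A sketch in Node.js, assuming a mysql2/promise-style conn.query(sql, values) that resolves to [rows, fields]; the function name, page size and chosen columns are hypothetical.

```javascript
// Keyset ("seek") pagination: yield pages of pageSize rows, always continuing
// after the last primary key seen, so every query is an index range scan.
async function* readInPages(conn, pageSize = 1000) {
  let lastId = 0; // AUTO_INCREMENT ids normally start at 1, so 0 is below the first id
  for (;;) {
    const [rows] = await conn.query(
      "SELECT id, valor, vencimento FROM financeiro_fluxo " +
        "WHERE id > ? ORDER BY id LIMIT ?",
      [lastId, pageSize]
    );
    if (rows.length === 0) return; // no more rows
    yield rows;
    lastId = rows[rows.length - 1].id; // resume after the last row of this page
  }
}
```

Selecting only the needed columns (rather than *) also reduces how much data crosses the wire per page.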
Will the "reports" aggregate data? If so, you could speed up the 8-second (remote) query by doing more of the work in the server, thereby shipping less data across the wire.
That is, think about whether AVG(..), COUNT(*), SUM(..), MAX(..), etc., can be done in the SELECT.
Taking that a step further: build and maintain a "summary table" that holds subtotals (etc.). Reading (or scanning) the summary table and summing up the subtotals will then be even faster, both in the server and across the wire.
(I also agree that you should avoid SELECT *, and that the 8 seconds is probably due to network latency and bandwidth. Where is the server geographically? How long does SELECT 1; take?)
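To answer the SELECT 1 question from a Node.js client, a tiny timing helper is enough. It uses Node's monotonic clock (process.hrtime.bigint()); the helper name timeIt and the mysql2/promise-style conn.query() are assumptions. If SELECT 1 alone takes a noticeable part of the 8 seconds, the cost is round-trip latency; if SELECT 1 is fast but the full query is slow, the time is most likely spent moving the ~116k rows over the wire.

```javascript
// Time any async call and return its result plus elapsed milliseconds.
async function timeIt(fn) {
  const start = process.hrtime.bigint(); // monotonic, nanosecond resolution
  const result = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, ms };
}
```

For example, compare (await timeIt(() => conn.query("SELECT 1"))).ms with the same measurement for the full SELECT.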
Is there a way I can speed this up? Right now the query takes an unbelievably long time to run.
SELECT trades.*, trader1.user_name as trader1_name,
trader2.user_name as trader2_name FROM trades
LEFT JOIN logs_players trader1 ON trader1.user_id = trader1_account_id
LEFT JOIN logs_players trader2 ON trader2.user_id = trader2_account_id
ORDER BY time_added
LIMIT 20 OFFSET 0;
I've searched online for a solution as much as I could, and also tried to get more information on why it takes so long to execute.
The query takes about 45 seconds to complete.
Create statements:
CREATE TABLE `trades` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`trader1_account_id` int(11) DEFAULT NULL,
`trader2_account_id` int(11) DEFAULT NULL,
`trader1_value` bigint(20) DEFAULT NULL,
`trader2_value` bigint(20) DEFAULT NULL,
`trader1_ip` varchar(16) DEFAULT NULL,
`trader2_ip` varchar(16) DEFAULT NULL,
`world` int(11) DEFAULT NULL,
`x` int(11) DEFAULT NULL,
`z` int(11) DEFAULT NULL,
`level` int(11) DEFAULT NULL,
`trader1_user` varchar(12) DEFAULT NULL,
`trader2_user` varchar(12) DEFAULT NULL,
`time_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8
CREATE TABLE `logs_players` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`user_name` varchar(20) DEFAULT NULL,
`world_stage` varchar(20) DEFAULT NULL,
`world_type` varchar(20) DEFAULT NULL,
`bank` longtext,
`inventory` longtext,
`equipment` longtext,
`total_wealth` mediumtext,
`total_play_time` mediumtext,
`rights` int(11) DEFAULT NULL,
`icon` int(11) DEFAULT NULL,
`ironmode` int(11) DEFAULT NULL,
`x` int(11) DEFAULT NULL,
`z` int(11) DEFAULT NULL,
`level` int(11) DEFAULT NULL,
`last_ip` varchar(16) DEFAULT NULL,
`last_online` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`muted_until` timestamp NULL DEFAULT NULL,
`banned_until` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8
I filled a sample database with 10k rows in each table and found that a couple of indexes were what you needed:
ALTER TABLE `logs_players` ADD INDEX(`user_id`);
ALTER TABLE `trades` ADD INDEX(`time_added`);
The main index we need is the one on user_id. Adding it took the query time from 20.1390 seconds to 0.0130 seconds.
We can get that down even further by adding an index on time_added, which makes the sorting a lot faster; with both indexes we ended up with an impressive query time.
Do some research on indexes! A simple EXPLAIN would show you that the query uses a filesort (which is rather bad!).
After adding the indexes, the EXPLAIN output looks a lot better.
I'm trying to run the following command in the MySQL command-line client on my backup server:
Table schema:
CREATE TABLE `cadveiculoequip` (
`idveiculoequip` VARCHAR(36) NOT NULL,
`idveiculo` VARCHAR(36) DEFAULT NULL,
`idequipamento` VARCHAR(36) DEFAULT NULL,
`datainstalacao` DATE DEFAULT NULL,
`datadesinstalacao` DATE DEFAULT NULL,
`idchkheader` VARCHAR(36) DEFAULT NULL,
`ID` INT(11) DEFAULT NULL,
`lastupdate` TIMESTAMP NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
`primario` INT(1) NOT NULL DEFAULT '1',
`mapeia_sombra` INT(1) DEFAULT '0',
`tec_responsavel` VARCHAR(100) DEFAULT NULL,
`km_rodado` VARCHAR(100) DEFAULT NULL,
`situacao` INT(2) DEFAULT NULL,
`teclado` VARCHAR(36) DEFAULT NULL,
`rfid` VARCHAR(36) DEFAULT NULL,
`idsatelite` VARCHAR(36) DEFAULT NULL,
`telemetria` VARCHAR(10) DEFAULT NULL,
`idosdes` VARCHAR(36) DEFAULT NULL,
PRIMARY KEY (`idveiculoequip`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 AVG_ROW_LENGTH=129;
Query:
INSERT INTO `cadveiculoequip` (`idveiculoequip`,`idveiculo`,`idequipamento`,`datainstalacao`,`datadesinstalacao`,`idchkheader`,`ID`,`lastupdate`,`primario`,`mapeia_sombra`,`tec_responsavel`,`km_rodado`,`situacao`,`teclado`,`rfid`,`idsatelite`,`telemetria`,`idosdes`)
VALUES ('IGORM','6788103A-8109-430C-A131-9FBBAF6D01F3','8abef011-c107-11df-b983-68b5558ab3e2','2018-02-21',NULL,'2CA923B6-60E9-42CF-BCFB-4BA238263079',4257329,'2018-02-21 08:44:51',1,0,'IGORM','0',1,NULL,NULL,NULL,'',NULL);
After executing the command, I just get the message "0 rows affected".
So what's the problem? I executed exactly the same command on my development server (identical to the backup server), and there it executed successfully and returned "1 row affected".
I've ruled out MySQL error 1064 (a syntax error). Does anyone know what it could be?
(Sorry about the bad English, I'm kind of rusty... hehehe)
Edit:
Table 'cadveiculoequip'
This surely looks like a poor design for the college_majors table.
CREATE TABLE `college_majors` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date_time` datetime DEFAULT NULL,
`UNITID` varchar(255) DEFAULT NULL,
`CIPCODE` varchar(255) DEFAULT NULL,
`AWLEVEL` varchar(255) DEFAULT NULL,
`CTOTALT` varchar(255) DEFAULT NULL,
`CTOTALM` varchar(255) DEFAULT NULL,
`CTOTALW` varchar(255) DEFAULT NULL,
`CAIANT` varchar(255) DEFAULT NULL,
`CASIAT` varchar(255) DEFAULT NULL,
`CBKAAT` varchar(255) DEFAULT NULL,
`CHISPT` varchar(255) DEFAULT NULL,
`CNHPIT` varchar(255) DEFAULT NULL,
`CWHITT` varchar(255) DEFAULT NULL,
`C2MORT` varchar(255) DEFAULT NULL,
`CUNKNT` varchar(255) DEFAULT NULL,
`CNRALT` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=270167 DEFAULT CHARSET=utf8;
I can reduce this table to three columns: id, CIPCODE and UNITID. The problem is that even simple queries like SELECT * FROM college_majors take too long to execute, and sometimes don't run at all.
I increased the query execution timeout to 6000 seconds, but the query still won't run.
Any suggestions on how to improve the design, i.e. create a new table and insert the data from this table (college_majors) into it?
Thanks,
A
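One way to do what the question asks, sketched in Node.js under several assumptions: a mysql2/promise-style conn.query(sql, values), a hypothetical new table named college_majors_slim that keeps the original VARCHAR(255) types (narrower types would depend on the actual data), and maxId taken from SELECT MAX(id) FROM college_majors. Copying in primary-key ranges keeps each INSERT ... SELECT short, so no single statement runs into the execution timeout.

```javascript
// Hypothetical slimmed-down table holding only the three columns needed.
const CREATE_SLIM =
  "CREATE TABLE IF NOT EXISTS college_majors_slim (" +
  "id INT NOT NULL PRIMARY KEY, UNITID VARCHAR(255), CIPCODE VARCHAR(255))";

// Copy rows in id ranges of chunkSize so each statement stays small.
// BETWEEN is inclusive, so consecutive ranges neither overlap nor leave gaps.
async function copyInChunks(conn, maxId, chunkSize = 10000) {
  await conn.query(CREATE_SLIM);
  for (let lo = 1; lo <= maxId; lo += chunkSize) {
    const hi = lo + chunkSize - 1;
    await conn.query(
      "INSERT INTO college_majors_slim (id, UNITID, CIPCODE) " +
        "SELECT id, UNITID, CIPCODE FROM college_majors WHERE id BETWEEN ? AND ?",
      [lo, hi]
    );
  }
}
```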
If the field 'codevalue' in college_majors_mapping is unique, you can index it and improve join performance.
I have a table with around 600,000-700,000 (6-7 lakh) records, and it's going to grow over time. It has around 16-20 columns. None of these columns has a one-to-many relationship.
User data entries are stored in this table.
So would it be better to split my table into multiple smaller tables, or just split it into two halves: one with all the entries, and the other with the recent records that are presented to the data entry operators for them to fill in?
In short, my question is whether MySQL queries would run faster if I split the table into multiple tables, or if I split it into two halves.
I guess the latter would be more practical, since it would not need any join queries.
Updated:
CREATE TABLE `images` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`primary_category_id` int(10) unsigned DEFAULT NULL,
`secondary_category_id` int(10) unsigned DEFAULT NULL,
`front_url` varchar(255) DEFAULT NULL,
`back_url` varchar(255) DEFAULT NULL,
`title` varchar(100) DEFAULT NULL,
`part` varchar(10) DEFAULT NULL,
`photo_id` int(10) unsigned DEFAULT NULL,
`photo_dt_month` varchar(2) DEFAULT NULL,
`photo_dt_day` varchar(2) DEFAULT NULL,
`photo_dt_yr` varchar(4) DEFAULT NULL,
`type` varchar(25) DEFAULT NULL,
`size_width` int(10) unsigned DEFAULT NULL,
`size_height` int(10) unsigned DEFAULT NULL,
`dpi` int(10) unsigned NOT NULL DEFAULT '0',
`dpix` int(10) unsigned DEFAULT NULL,
`dpiy` int(10) unsigned DEFAULT NULL,
`in_stock` varchar(50) DEFAULT NULL,
`outlet` varchar(50) DEFAULT NULL,
`source` varchar(50) DEFAULT NULL,
`keywords` varchar(255) DEFAULT NULL,
`emotional_keywords` varchar(255) DEFAULT NULL,
`mechanical_keywords` varchar(255) DEFAULT NULL,
`description` text,
`notes` text,
`comments` text,
`exported_to_ebay_dt` datetime DEFAULT NULL,
`exported_to_ebay` set('Y','N') NOT NULL DEFAULT 'N',
`updated_worker_id` int(10) unsigned DEFAULT NULL,
`updated_worker_dt` datetime DEFAULT NULL,
`locked_worker_id` int(10) unsigned DEFAULT NULL,
`locked_worker_dt` datetime DEFAULT NULL,
`updated_admin_id` int(10) unsigned DEFAULT NULL,
`updated_admin_dt` datetime DEFAULT NULL,
`added_dt` datetime DEFAULT NULL,
`updated_manager_id` int(10) unsigned DEFAULT NULL,
`updated_manager_dt` datetime DEFAULT NULL,
`manager_review` set('Y','N') NOT NULL DEFAULT 'N',
`paid_status` set('Y','N') NOT NULL DEFAULT 'N',
`exported_to_web_dt` datetime DEFAULT NULL,
`exported_to_web` set('Y','N') DEFAULT 'N',
`prefix` varchar(50) DEFAULT NULL,
`is_premium` set('Y','N') DEFAULT 'N',
`template` varchar(50) DEFAULT 'HIPE_default',
`photographer` varchar(100) DEFAULT NULL,
`copyright` varchar(100) DEFAULT NULL,
`priority` int(4) DEFAULT '1',
`step` set('1','2') DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `part` (`part`),
KEY `primary_category_id` (`primary_category_id`),
KEY `updated_worker_id` (`updated_worker_id`),
KEY `updated_worker_dt` (`updated_worker_dt`)
) ENGINE=MyISAM AUTO_INCREMENT=1013687 DEFAULT CHARSET=latin1
The above is my table structure. Once around 100,000 (1 lakh) entries have been made, I would move them into another table with the same structure, say images_history. Is this feasible, or should I split the data into multiple tables to reduce query execution time?
Why do you want to split the table? It would lead to a ton of extra code, and it would slow execution down by adding extra queries if you still need to access both of the new tables. (If one of the tables is going to store rarely used previous versions of records from the images table, i.e. version control, it may still be a good idea.)
Before even thinking about splitting the table, see whether you can improve performance by optimizing the existing code. Work through the following checks to rule out the common performance disasters:
Do all SELECTs filter by PRIMARY KEY?
Is the index cache large enough to hold all indices in the computer's RAM?
Do string-matching SELECTs that use LIKE use the indices? That is, only exact matches or wildcards on the right, never on the left (e.g. "searchword%", never "%searchword").
Are there any slow-performing queries that use SELECT * instead of selecting only the columns you need?
Have you avoided using OR in SELECTs?
Queries on a table with 700,000 records shouldn't be slow if the tables are properly indexed and the queries actually use those indices.