Individual MySQL INSERT statements vs writing to local CSV first and then LOAD DATA - mysql

I'm trying to extract information from 50 million HTML files into a MySQL database. My question is: at what point in the process should I store the information in the MySQL database? For example, I'm considering these options:
Option 1: Open each file and extract the information I need. Perform an INSERT after each file gets parsed.
Option 2: Open each file and extract the information I need. Store the information in a CSV file as an intermediary. After all the files have been parsed into the CSV, perform a bulk upload using LOAD DATA INFILE.
I know that LOAD DATA INFILE is much faster than individual INSERT statements if I already have the information in a CSV. However, if I don't have the information already in a CSV, I don't know if it's faster to create the CSV first.
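Roughly, option 2 would look something like the sketch below (paths, table and column names are placeholders, parse_html_file() stands in for my real extraction code, and it assumes LOCAL INFILE is enabled on both the client and the server):
<?php
// Option 2: write every extracted row to a CSV, then bulk load it once.
$csvPath = '/tmp/extracted.csv';
$csv = fopen($csvPath, 'w');
foreach (glob('/data/html/*.html') as $htmlFile) {
    $row = parse_html_file($htmlFile);      // hypothetical parser returning [title, price]
    fputcsv($csv, $row);
}
fclose($csv);

$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8mb4',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]  // needed for LOAD DATA LOCAL INFILE
);
$pdo->exec(
    "LOAD DATA LOCAL INFILE " . $pdo->quote($csvPath) . "
     INTO TABLE pages
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'
     (title, price)"
);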
At the crux of the question: is writing to a local CSV faster than, or about the same as, a single INSERT statement?
I'm using PHP in case it matters. Thanks in advance!

The key is not to do one insert per entry, but to batch the entries in memory and then perform a batch insert.
See: https://dev.mysql.com/doc/refman/5.7/en/insert.html
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
ORMs like SQLAlchemy or Hibernate are smart enough (depending on configuration) to automatically batch your inserts.
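Since the question mentions PHP, a rough sketch of that batching with PDO could look like the following; the table name matches the manual's example, while the batch size, file list, and extract_values() helper are made up for illustration:
<?php
// Collect rows in memory and flush them as one multi-row INSERT per batch.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
$batchSize = 1000;
$batch = [];

function flush_batch(PDO $pdo, array $batch): void
{
    if ($batch === []) {
        return;
    }
    // One "(?, ?, ?)" group per buffered row, all bound in a single prepared statement.
    $placeholders = implode(',', array_fill(0, count($batch), '(?, ?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO tbl_name (a, b, c) VALUES $placeholders");
    $stmt->execute(array_merge(...$batch));
}

foreach (glob('/data/html/*.html') as $file) {
    $batch[] = extract_values($file);   // hypothetical parser returning [a, b, c]
    if (count($batch) >= $batchSize) {
        flush_batch($pdo, $batch);
        $batch = [];
    }
}
flush_batch($pdo, $batch);              // don't forget the final partial batch
Keeping batches to a fixed size keeps each statement well under max_allowed_packet while still amortizing the per-statement overhead.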

Related

How to run multiple update and insert queries after a LOAD DATA INFILE

We update several tables of our database from CSV files (exported from our ERP).
The problem is that these tables must in turn update several other tables.
Let's take a fictitious example: when I run a LOAD DATA INFILE into the all_order table (containing all the orders), it must immediately:
Update the KPI table order_kpi
Insert into the order_summary table the number of orders recorded per sales representative
I wanted to put a trigger on inserts into the all_order table, but I think it would fire for each line of the CSV (more than 15,000 lines...) and crash my local server.
Note: a PHP script manages the LOAD DATA INFILE (because I need it to be dynamic and to match the user's needs), and I do not want to put my insert and update queries in this PHP script.
So my question is:
How can I create a trigger (or other MySQL mechanism) that runs once the LOAD DATA INFILE is complete?

Google BigQuery bulk load data into table

I think that what I want to do is not currently feasible, but I want to clarify.
I have a bucket, say bucketA, with files served to the public, and a bucket, say bucketB, where the access logs of bucketA are stored in a specific CSV format.
What I want to do is run SQL queries against these access logs. The problem is that the logs are stored in different CSVs (one per hour, I think). I tried to import them through the BigQuery UI, but it seems that there is a one-to-one CSV-to-table mapping. When you define the input location, the placeholder and documentation ask you to put a gs://<bucket_name>/<path_to_input_file>.
Based on the above, my question is: Is it possible to upload all the files in a bucket to a single BigQuery table, with something like an "*" asterisk operator?
Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the load? Is there a scheduler?
Based on the above, my question is: Is it possible to upload all the files in a bucket to a single BigQuery table, with something like an "*" asterisk operator?
You can query them directly in GCS (as a federated source) or load them all into a native table; in both cases you can use * in the source URI.
Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the load? Is there a scheduler?
If you leave it as an external table, then each time you query it BigQuery will scan all the files, so you will pick up new files/data. If you load it into a native table, then you will need to schedule a job yourself to append each new file to your table.
Using the BigQuery web UI, I have created a new table plus some initial data with the standard CSV upload method.
For quick testing, how can I use the BigQuery web UI to insert more new data into the existing table?
I realized I CANNOT copy and paste multiple INSERT statements into the Query editor textbox.
INSERT INTO dataset.myschema VALUES ('new value1', 'more value1');
INSERT INTO dataset.myschema VALUES ('new value2', 'more value2');
That would make it tedious to insert new rows of data one by one.
Luckily, BigQuery supports INSERT statements that use the VALUES syntax to insert multiple rows:
INSERT INTO dataset.myschema VALUES ('new value1', 'more value1'),
('new value2', 'more value2');

How to replace a Column simultaneously with LOAD INFILE in MySQL

Suppose we have a table with a DECIMAL column containing values such as 128.98, 283.98, 21.20.
I want to import some CSV files into this table. However, in the columns of these files I have values like 235,69 and 23,23, with a comma instead of a decimal point.
I know I can REPLACE that column afterwards, but is there some way of doing that as part of the LOAD DATA INFILE?
I do not believe you can replace that column and load the data simultaneously. It looks like you will have to do multiple steps to get the results you want.
Load the data first into a raw staging table using the LOAD DATA INFILE command. This table can be identical to the main table; you can use the CREATE TABLE ... LIKE command to create it.
Process the data (i.e. change the comma to a . where applicable) in the raw table.
Select the data from the raw table and insert it into the main table, either with row-by-row processing or a bulk insert.
This can all be done in a stored procedure (SP) or by a third-party script written in Python, PHP, etc.
If you want to know more about stored procedures in MySQL, here is a useful link.
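From a PHP script, those steps could look roughly like the sketch below; the table and column names are invented, the field separator is assumed to be a semicolon (common when the decimal separator is a comma), and the staging column is widened to VARCHAR so the comma values survive the load:
<?php
// Staged load: raw table -> fix the decimal separator -> copy into the main table.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');

// 1. Staging table with the same structure, but the numeric column as VARCHAR
//    so values like '235,69' are not truncated on load.
$pdo->exec("CREATE TABLE IF NOT EXISTS prices_raw LIKE prices");
$pdo->exec("ALTER TABLE prices_raw MODIFY amount VARCHAR(20)");

// 2. Load the CSV into the staging table (the file must be readable by the
//    MySQL server for a non-LOCAL LOAD DATA INFILE).
$pdo->exec("LOAD DATA INFILE '/var/lib/mysql-files/prices.csv'
            INTO TABLE prices_raw
            FIELDS TERMINATED BY ';'
            LINES TERMINATED BY '\\n'
            (id, amount)");

// 3. Replace the comma with a dot, then bulk-insert into the main table.
$pdo->exec("UPDATE prices_raw SET amount = REPLACE(amount, ',', '.')");
$pdo->exec("INSERT INTO prices (id, amount) SELECT id, amount FROM prices_raw");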

import csv file with LOAD DATA LOCAL INFILE in symfony 1.4

I need to fill several tables from CSV files. I tried using a loop that does an INSERT for each row, but a file with 65,000 records takes me more than 20 minutes.
I want to use the MySQL command LOAD DATA LOCAL INFILE, but I received this message:
LOAD DATA LOCAL INFILE forbidden in C:\xampp\htdocs\myProject\apps\backend\modules\member\actions\actions.class.php on line 112
After a little research, I understand I need to change one of the PDO security parameters (PDO::MYSQL_ATTR_LOCAL_INFILE) to true.
In Symfony2 you change it in your app's config.yml, but I can't find the equivalent in symfony 1.4.
Let me try to understand the question (or questions?!).
If you need to optimize the INSERT queries, you should probably batch them into a single INSERT query or a few of them, but definitely not one per row. Besides, an INSERT query in MySQL will always be slow for large amounts of inserted data; it also depends on the indexes, storage engine, and schema structure of the DB.
About the second question, take a look here, maybe it will help.
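As for the LOAD DATA LOCAL INFILE error itself, with plain PDO the flag is passed as a driver option when the connection is created; how to feed that option through symfony 1.4's databases.yml is left aside here, so this sketch (placeholder credentials, path, and table) only shows the PDO side:
<?php
// Enable LOCAL INFILE as a PDO driver option at connection time.
$pdo = new PDO(
    'mysql:host=localhost;dbname=mydb;charset=utf8',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]   // the security parameter mentioned above
);

// The MySQL server must also allow it (local_infile = 1 on the server side).
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/members.csv'
            INTO TABLE member
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            (first_name, last_name, email)");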

mysqldump and --tab speed

According to the MySQL Certification Guide, when the --tab option is used,
a SELECT ... INTO OUTFILE is used to generate a tab-delimited file in the specified directory; a file containing SQL (CREATE TABLE) will also be generated
and that
using --tab to produce tab-delimited dump files is much faster
but how can it be faster if both generate an SQL file and --tab just produces an extra tab-delimited file?
I would say that:
Without --tab, a single file is generated, containing both the CREATE statements and the data, as lots of INSERT statements.
With --tab, two files are generated: one with the CREATE statements, and another with the data in a tab-delimited format instead of INSERT statements.
The difference is the second part: INSERT statements vs tab-delimited data.
I'm guessing that generating lots of INSERT statements takes more time than just dumping the data in a tab-delimited format; maybe it's the same when importing the data back, too?