mysqldump and --tab speed

According to the MySQL Certification Guide, when --tab option is used,
a SELECT ... INTO OUTFILE is used to generate a tab delimited file in the specified directory, a file containing SQL (CREATE TABLE) will also be generated
and that
using --tab to produce tab-delimited dump files is much faster
but how can it be faster if both generate an SQL file and the --tab option just produces an extra tab-delimited file?

I would say that:
without using --tab, a single file is generated that contains both:
the CREATE statements
and the data, as lots of INSERT statements
with --tab, two files are generated:
one with the CREATE statements
another with the data, in tab-delimited format instead of INSERT statements
The difference is in the second part:
INSERTs
vs tab-delimited data
I'm guessing that generating lots of INSERT statements takes more time than just dumping the data in a tab-delimited format -- and it's probably the same when importing the data back, too.
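For illustration, here is a rough sketch of the two dump styles and the reload path; the database, table, and directory names are placeholders, and --tab needs a directory the server itself can write to (subject to secure_file_priv):
# one file containing CREATE TABLE statements plus the row data as INSERTs
mysqldump -u root -p mydb > mydb.sql
# per table: mytable.sql (schema) and mytable.txt (tab-delimited rows) written into /tmp/dump
mysqldump -u root -p --tab=/tmp/dump mydb
# reloading the .txt files; mysqlimport issues LOAD DATA INFILE under the hood
mysqlimport -u root -p mydb /tmp/dump/mytable.txt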

Related

Individual MySQL INSERT statements vs writing to local CSV first and then LOAD DATA

I'm trying to extract information from 50 million HTML files into a MySQL database. My question is at what point during the process should I store the information into the MySQL database. For example, I'm considering these options:
Open each file and extract the information I need. Perform an INSERT after each file gets parsed.
Open each file and extract the information I need. Store the information into a CSV file as an intermediary. After all the files have been parsed into the CSV, perform a bulk upload using LOAD DATA INFILE
I know that LOAD DATA INFILE is much faster than individual INSERT statements if I already have the information in a CSV. However, if I don't have the information already in a CSV, I don't know if it's faster to create the CSV first.
At the crux of the question: Is writing to a local CSV faster or about the same as a single INSERT statement?
I'm using PHP in case it matters. Thanks in advance!
The key is not to do one insert per entry, but to batch the entries in memory and then perform a batch insert.
See: https://dev.mysql.com/doc/refman/5.7/en/insert.html
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
ORMs like SQLAlchemy or Hibernate are smart enough (depending on configuration) to automatically batch your inserts.
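If you do go the CSV route (option 2), the bulk load would look roughly like this; the file path, table name, and column list below are made-up placeholders:
LOAD DATA LOCAL INFILE '/tmp/parsed_pages.csv'
INTO TABLE pages
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(url, title, body);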

How do I import/handle large text files for MS SQL?

I have a 1.7GB txt file (about 1.5million rows) that is apparently formatted in some way for columns and rows, though I don't know the delimiter. I will need to be able to import this data into MySQL and MS SQL databases to run queries on.
I can't even open it in notepad to see a sample of the data.
For future reference, how does one handle and manipulate very large data files? What file format is best? To my knowledge Excel and CSV do not support unlimited numbers of rows.
You can use bcp in as below:
bcp yourtable in C:\Data\yourfile.txt -c -t, -S localhost -T
Since you know the column names from MySQL, you can create a table with that structure beforehand in SQL Server.
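A rough sketch of the pre-created target table referred to above; the column names and types are placeholders, since the real structure of the file isn't known:
CREATE TABLE yourtable (
    col1 VARCHAR(255),
    col2 VARCHAR(255),
    col3 INT
);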

Load and replace file path string with the content from that file in a MySQL database

I have a database of entries consisting of a 'name', 'id' and a 'description', but currently the 'description' field is set to the file path of a .txt file that actually contains the description content. Each .txt file's name is each row's 'id', plus the .txt extension and they all reside in the same directory.
Can I load and replace each 'description' field with the content from the relevant text file (using MySQL)?
You can't write a MySQL query directly that will read the description values from your file system. That would require the MySQL server to be able to read raw text from files in your file system. You Can't Do That™.
You certainly can write a program in your favorite host language (php, java, PERL, you name it) to read the rows from your database, and update your description rows.
You could maybe contrive to issue a LOAD DATA INFILE command to read each text file. But the text files would have to be very carefully formatted to resemble CSV or TSV files.
Purely using MySQL, this would be a difficult, if not impossible, exercise because MySQL does not really offer any means to open files.
The only way to open an external text file from MySQL is the LOAD DATA INFILE command, which imports the text file into a MySQL table. What you can do is write a stored procedure in MySQL that:
1. Creates a temporary table with a description field large enough to accommodate all descriptions.
2. Reads all id and description field contents from your base table into a cursor.
3. Loops through the cursor and uses LOAD DATA INFILE to load the given text file's data into your temporary table. This is where things can go wrong. The account under which the MySQL daemon / service runs needs to have access to the directories and files where the description files are stored. You also need to be able to parametrise the LOAD DATA INFILE command to read the full contents of the text file into a single field, so the FIELDS TERMINATED BY and LINES TERMINATED BY parameters must be set to values that cannot occur in any of the description files. But even for this you need a native UDF (user-defined function) that can execute command-line programs, because running LOAD DATA INFILE directly from stored procedures is not allowed. See Using LOAD DATA INFILE with Stored Procedure Workaround-MySQL for a full description of how to do this.
4. Issues an UPDATE statement using the id from the cursor to update the description field in your base table from the temporary table.
5. Deletes the record from the temp table.
6. Goes back to step 3.
It may be a lot easier to achieve this from an external programming language that has better file-manipulation functions and can update each record in your base table accordingly.
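As a rough illustration of the per-file LOAD DATA INFILE call described in step 3 (shown here as a standalone statement, since it cannot be issued from inside the procedure itself); the path, table name, and terminator strings are placeholders, and the terminators only work if they never occur in any description file:
LOAD DATA INFILE '/path/to/descriptions/42.txt'
INTO TABLE tmp_descriptions
FIELDS TERMINATED BY '@@NO_FIELD@@'
LINES TERMINATED BY '@@NO_LINE@@'
(description);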

When to use CSV storage engine for MySQL?

From the docs, it states:
The CSV storage engine stores data in text files using comma-separated
values format.
What are the advantages of this? Here are some I can think of:
You can edit the CSV files using simple text editor (however, you can export data easily using SELECT INTO OUTFILE)
Can be easily imported into Spreadsheet programs
Lightweight and maybe better performance (wild guess)
What are some disadvantages?
No indexing
Cannot be partitioned
No transactions
Cannot have NULL values
Given this (non-exhaustive) list of advantages and disadvantages, in what practical scenarios should I consider using the CSV storage engine over others?
I seldom use the CSV storage engine. One scenario where I have found it useful, however, is bulk data imports.
Create a table with columns matching my input CSV file.
Outside of MySQL, just using a shell prompt, mv the CSV file into the MySQL data directory, overwriting the .CSV file that belongs to the table I just created.
ALTER TABLE mytable ENGINE=InnoDB;
Voilà! One-step import of a huge CSV data file using DDL instead of INSERT or LOAD DATA.
Granted, it's less flexible than INSERT or LOAD DATA, because you can't do NULLs or custom overrides of individual columns, or any "replace" or "ignore" features for handling duplicate values. But if you have an input file that is exactly what you want to import, it could make the import very easy.
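A hedged sketch of that flow; the database name, table layout, file names, and data-directory path below are assumptions, and a FLUSH TABLE is added so the server picks up the replaced file (note that CSV-engine columns must be declared NOT NULL):
mysql --execute="CREATE TABLE mydb.mytable (a INT NOT NULL, b VARCHAR(50) NOT NULL) ENGINE=CSV;"
mv /tmp/input.csv /var/lib/mysql/mydb/mytable.CSV
mysql --execute="FLUSH TABLE mydb.mytable; ALTER TABLE mydb.mytable ENGINE=InnoDB;"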
This is a tad bit hacky, but as of MySQL 8, assuming you know the data structure beforehand and have permissions in the CSV-based schema directory, you can create the table definition in MySQL and then overwrite the generated CSV table file in the data directory with a symlink to the data file:
mysql --execute="CREATE TABLE TEST.CSV_TEST ( test_col VARCHAR(255) ) ENGINE=CSV;"
ln -sf /path/to/data.file /var/lib/mysql/TEST/CSV_TEST.CSV
An advantage here is that this completely obviates the need to run import operations (via LOAD DATA INFILE, etc.), as it allows MySQL to read directly from the symlinked file as if it were the table file.
Drawbacks beyond those inherent to the CSV engine:
the table will contain the header row if there is one (you'd need to filter it out from read operations)
table metadata in INFORMATION_SCHEMA will not update using this method; it will just show the CREATE_TIME from when the initial DDL was run
Note this method is obviously more geared toward READ operations, though update/insert operations could be conducted on the command line using SELECT ... INTO OUTFILE and then copying onto/appending the source file.
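A hedged sketch of that append path; the source table, column, and output file name are placeholders, and the OUTFILE has to land somewhere the server is allowed to write (secure_file_priv):
mysql --execute="SELECT test_col FROM TEST.SOME_OTHER_TABLE INTO OUTFILE '/tmp/new_rows.csv' FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';"
cat /tmp/new_rows.csv >> /path/to/data.file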

Mysql select from table left join with csv export

I have tables that are on different mysql instances. I want to export some data as csv from a mysql instance, and perform a left join on a table with the exported csv data. How can I achieve this?
Quite surprisingly, that is possible with MySQL; there are several steps that you need to go through.
1. First create a template table using the CSV engine and the desired table layout. This is the table into which you will import your CSV file. Use CREATE TABLE yourcsvtable (field1 INT NOT NULL, field2 INT NOT NULL) ENGINE=CSV for example. Please note that NULL values are not supported by the CSV engine.
2. Perform your SELECT to extract the CSV file. E.g. SELECT * FROM anothertable INTO OUTFILE 'temp.csv' FIELDS TERMINATED BY ',';
3. Copy temp.csv into your target MySQL data directory as yourcsvtable.CSV. The location and exact name of this file depend on your MySQL setup. You cannot perform the SELECT in step 2 directly into this file as it is already open - you need to handle this in your script.
4. Use FLUSH TABLE yourcsvtable; to reload/import the CSV table.
5. Now you can execute your query against the CSV file as expected.
Depending on your data, you need to ensure that the data is correctly enclosed by quotation marks or escaped - this needs to be taken into account in step 2.
The CSV file can be created by MySQL on another server or by some other application, as long as it is well-formed.
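Putting the steps together, a rough end-to-end sketch; the table names, columns, join condition, and data-directory path are all placeholders:
CREATE TABLE yourcsvtable (field1 INT NOT NULL, field2 INT NOT NULL) ENGINE=CSV;
-- on the other instance, export the rows you need:
SELECT field1, field2 FROM anothertable INTO OUTFILE '/tmp/temp.csv' FIELDS TERMINATED BY ',';
-- copy the file into place (outside MySQL), e.g.:
-- cp /tmp/temp.csv /var/lib/mysql/yourdb/yourcsvtable.CSV
FLUSH TABLE yourcsvtable;
-- now the imported data can be used in joins:
SELECT t.*, c.field2
FROM localtable t
LEFT JOIN yourcsvtable c ON c.field1 = t.id;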
If you export it as CSV, it's no longer SQL, it's just plain row data. Suggest you export as SQL, and import into the second database.