When to use CSV storage engine for MySQL?

From the docs, it states:
The CSV storage engine stores data in text files using comma-separated
values format.
What are the advantages of this? Here are some I can think of:
You can edit the CSV files using a simple text editor (however, you can export data easily using SELECT INTO OUTFILE)
Can be easily imported into spreadsheet programs
Lightweight and maybe better performance (wild guess)
What are some disadvantages?
No indexing
Cannot be partitioned
No transactions
Cannot have NULL values
Granted this (non-exhaustive) list of advantages and disadvantages, in what practical scenarios should I consider using the CSV storage engine over others?

I seldom use the CSV storage engine. One scenario where I have found it useful, however, is bulk data imports.
Create a table with columns matching my input CSV file.
Outside of mysql, just using a shell prompt, mv the CSV file into the MySQL data directory, overwriting the .CSV file that belongs to the table I just created.
ALTER TABLE mytable ENGINE=InnoDB
Voilà! One-step import of a huge CSV data file using DDL instead of INSERT or LOAD DATA.
Granted, it's less flexible than INSERT or LOAD DATA, because you can't do NULLs or custom overrides of individual columns, or any "replace" or "ignore" features for handling duplicate values. But if you have an input file that is exactly what you want to import, it could make the import very easy.
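Since the swap only works when the file is exactly what the table expects, it may be worth checking the file before moving it into place. Below is a minimal sketch, assuming the file is plain comma-separated data with no header row; the helper name check_csv_for_swap and its arguments are hypothetical, and it only checks the column count, not the column types.

```python
import csv


def check_csv_for_swap(path, expected_columns):
    """Return a list of problems found in the file (empty list means none).

    Hypothetical helper: it only compares each row's field count with
    expected_columns. It knows nothing about the real table definition.
    """
    problems = []
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle), start=1):
            if len(row) != expected_columns:
                problems.append(
                    f"line {line_no}: {len(row)} columns, expected {expected_columns}"
                )
    return problems
```

If the returned list is empty, the column count at least matches on every row; anything else is worth fixing before running ALTER TABLE.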

This is a tad bit hacky, but as of MySQL 8, assuming you know the data structure beforehand and have permissions in the CSV-based schema directory, you can create the table definition in MySQL and then overwrite the generated CSV table file in the data directory with a symlink to the data file:
mysql --execute="CREATE TABLE TEST.CSV_TEST ( test_col VARCHAR(255) ) ENGINE=CSV;"
ln -sf /path/to/data.file /var/lib/mysql/TEST/CSV_TEST.CSV
An advantage here is that this completely obviates the need to run import operations (via LOAD DATA INFILE, etc.), as it allows MySQL to read directly from the symlinked file as if it were the table file.
Drawbacks beyond those inherent to the CSV engine:
table will contain header row if there is one (you'd need to filter it out from read operations)
table metadata in INFORMATION_SCHEMA will not update using this method, will just show the CREATE_TIME for which the initial DDL is run
Note this method is obviously more geared toward READ operations, though update/insert operations could be conducted on the command line using SELECT ... INTO OUTFILE and then copying onto/appending the source file.
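If the symlink step needs to be scripted rather than typed at a shell, something like the following could stand in for ln -sf. This is only a sketch: link_table_file is a hypothetical helper, the paths are whatever your data directory actually uses, and it assumes a POSIX system where the caller has write permission in the schema directory.

```python
import os


def link_table_file(data_file, table_file):
    """Replace table_file with a symlink to data_file, similar to `ln -sf`."""
    tmp_link = table_file + ".tmp-link"
    # Clear out a leftover temporary link from an earlier failed run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.abspath(data_file), tmp_link)
    # Renaming the link over the old table file swaps it in one step.
    os.replace(tmp_link, table_file)
```

Creating the link under a temporary name and renaming it avoids a moment where the table file does not exist at all.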

Related

Importing and exporting TSVs with MySQL

I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field      Type       Null  Key
id         int(11)    NO    PRI
text_data  text       NO
optional   int(11)    YES
time       timestamp  NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field        Type     Null  Key
id           int(11)  NO    PRI
foreign_key  int(11)  NO    MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key references Sample.id.
Note about encoding: a caveat for anyone trying to do the same thing. If you want to export/import data, make sure that character sets and collations are set up correctly for connections. This caused me some headaches, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused data loss in some instances.
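To get a feel for the kind of loss involved, here is a rough illustration in Python. It is not MySQL's actual conversion code: Python's latin-1 codec is ISO-8859-1, which is close to but not identical to MySQL's latin1, and simulate_latin1_connection is a made-up name. The point is only that characters with no single-byte equivalent cannot survive the trip.

```python
def simulate_latin1_connection(text):
    """Round-trip text through a single-byte charset, replacing what does not fit."""
    return text.encode("latin-1", errors="replace").decode("latin-1")
```

Accented Western European letters come back intact, while an emoji or CJK text comes back as question marks.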
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
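The NULL replacement can be scripted instead of done while editing. A minimal sketch, assuming tab-separated input and that no column legitimately contains the four-character string NULL (the exported file cannot distinguish that string from a real SQL NULL); null_to_backslash_n is a hypothetical helper name.

```python
def null_to_backslash_n(line, null_token="NULL"):
    """Rewrite fields equal to null_token as \\N, the marker LOAD DATA expects."""
    fields = line.rstrip("\n").split("\t")
    converted = ["\\N" if field == null_token else field for field in fields]
    return "\t".join(converted) + "\n"
```

Only whole fields are replaced, so a value such as NULLABLE is left alone.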
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
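The broken lines in variant B follow a pattern: with the default escaping, a newline inside a value is written as a backslash followed by a real line break, so a physical line that ends in an odd number of backslashes continues on the next line. Under that assumption, the manual fix could be sketched like this (join_escaped_lines is a hypothetical helper; it rewrites the escaped line break as the two characters \n, which LOAD DATA also reads as a newline by default):

```python
def join_escaped_lines(lines):
    """Merge physical lines that were split by an escaped newline."""
    records = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        trailing = len(line) - len(line.rstrip("\\"))
        if trailing % 2 == 1:
            # The last backslash escapes the line break: drop it, write \n instead.
            pending += line[:-1] + "\\n"
        else:
            records.append(pending + line)
            pending = ""
    if pending:
        # Dangling continuation at end of input: keep what we have.
        records.append(pending)
    return records
```

Applied to the two physical lines of the example output, this yields a single record whose text field matches what variant A prints.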
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is the same as B only if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC and will be off after importing the data.
Import
Something I didn't realize at first is that it's impossible to merely update existing rows directly with LOAD DATA or mysqlimport, although that's something many GUI tools appear to do and other people have attempted. For me as a beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, importing the data there and then updating the actual table of interest via a join. I also thought one could update individual columns this way, which again is not possible. If there are other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD DATA:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD DATA behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
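The staging-table workaround mentioned earlier (load into an empty copy, then update via a join) can be written out as a fixed sequence of statements. The sketch below only builds the SQL text; staging_update_sql and the _staging suffix are hypothetical, identifiers and the path are interpolated without escaping (so they must be trusted), and all statements have to run in the same session because the staging table is temporary.

```python
def staging_update_sql(table, key, columns, tsv_path, skip_header=True):
    """Build statements that load a TSV into a staging copy and update via a join."""
    staging = f"{table}_staging"
    ignore = " IGNORE 1 LINES" if skip_header else ""
    assignments = ", ".join(f"t.`{col}` = s.`{col}`" for col in columns)
    return [
        f"CREATE TEMPORARY TABLE `{staging}` LIKE `{table}`;",
        f"LOAD DATA INFILE '{tsv_path}' INTO TABLE `{staging}`{ignore};",
        f"UPDATE `{table}` AS t JOIN `{staging}` AS s "
        f"ON t.`{key}` = s.`{key}` SET {assignments};",
        f"DROP TEMPORARY TABLE `{staging}`;",
    ]
```

For example, staging_update_sql("Sample", "id", ["text_data", "optional"], "/tmp/Sample.tsv") returns four statements that could be joined and passed to mysql -e. Because only the listed columns appear in the SET clause, this also covers the case of updating individual columns.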
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using SELECT ... INTO OUTFILE for exporting, adding a header manually and fixing the malformed lines by hand. After manipulating the data, I used LOAD DATA INFILE to update the data. In another case, I exported with SELECT and stdout redirection, manipulated the data and then wrote a script that created a file with a bunch of UPDATE ... WHERE statements containing the corresponding data. Then I ran the resulting .sql file against my database. Is the latter maybe the best option in this case?
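The second approach (turning an edited export into UPDATE ... WHERE statements) might look roughly like the sketch below. It assumes the file came from export variant A, so it has a header row, uses NULL for null values and has backslash-escaped newlines, tabs and backslashes. Instead of writing values into the SQL text, it yields parameterized statements with %s placeholders, as accepted by common Python MySQL drivers; that choice and the function names are assumptions, not what the original script did.

```python
import re

# Escapes assumed to be produced by the mysql client in batch mode.
_ESCAPES = {"n": "\n", "t": "\t", "0": "\0", "\\": "\\"}


def unescape(field):
    """Turn \\n, \\t, \\0 and \\\\ sequences back into the characters they stand for."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), field)


def updates_from_tsv(lines, table, key, null_token="NULL"):
    """Yield (sql, params) pairs, one UPDATE per data row; rows must be complete."""
    rows = iter(lines)
    header = next(rows).rstrip("\n").split("\t")
    columns = [name for name in header if name != key]
    assignments = ", ".join(f"`{name}` = %s" for name in columns)
    sql = f"UPDATE `{table}` SET {assignments} WHERE `{key}` = %s"
    for line in rows:
        values = line.rstrip("\n").split("\t")
        record = {
            name: None if value == null_token else unescape(value)
            for name, value in zip(header, values)
        }
        yield sql, tuple(record[name] for name in columns) + (record[key],)
```

Each pair can be handed to a driver's cursor.execute(sql, params), which leaves quoting of the values to the driver.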
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also, what if the database is large enough that the exported data itself becomes a problem? Where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
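The looping over modest-size subsets can be driven by primary key ranges. A minimal sketch, assuming an integer primary key; id_batches is a hypothetical helper, and the statement in the usage note is only an illustration against the Sample table from the question.

```python
def id_batches(min_id, max_id, batch_size):
    """Yield inclusive (low, high) id ranges that together cover min_id..max_id."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1
```

Each range would then be bound to a statement such as UPDATE Sample SET optional = 0 WHERE optional IS NULL AND id BETWEEN %s AND %s, committing after every batch so that no single transaction grows too large.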

How to export database? MySql

I am new to databases. I have a table and I need to export it and save its structure. I'm using MySQL Workbench. It is my first task, and I have no idea how to do it; I know just a few things about databases.
Your question is a bit imprecise. What exactly do you want to export? The table structure + data for later restore (if so, use a dump) or just the table data in a common format like CSV for further processing (if so, use the table data export wizard)?
A dump is what is usually used to store SQL data + structure in text files (conventionally tagged with an .sql extension). These contain Data Definition Language (DDL) constructs which define the metadata (e.g. CREATE TABLE) as well as Data Manipulation Language (DML) commands to manage the content (e.g. INSERT or DELETE). This structure serves well for copying content between servers and such, as it is what a database server speaks natively. In MySQL Workbench you can import and export such dumps via the Management tab -> Data Import/Restore.
For importing and exporting data via a common file format like CSV or JSON, use the table data import/export feature, reachable via the context menu for a given table. This does not, however, preserve the structure of the table in a way that would allow it to be recreated automatically (as SQL statements do).

Import multiple (100+) Excel files into MySQL Table

I have 100+ XLSX files that I need to get into a MySQL Database. Each file is a bit different, so I've created a massive table that contains a field for each possible column header across all files. This way they can auto-map on import.
Using Navicat I can import the files one at a time, but I'm wondering if there is a way to import all of the files at one time?
I'm thinking 'no', but if you're going to do this a lot, I think there are ways to automate it.
Export your XLS files to CSV files instead. Write a script to transform them from Excel-style CSV to MySQL-style CSV.
Create tables for importing with the CSV engine. Put your files in place of the ones created by MySQL and flush tables. Now your data is ready to be read inside MySQL and copied over to tables using more powerful storage engines.
Another way is to write a VBA script that exports the data in a format recognized by LOAD DATA INFILE and then load it using mysql.
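Once the spreadsheets are saved as CSV, the transformation script could also take care of the differing headers by writing every file into one shared column order. The sketch below assumes each CSV has a header row and that the full list of columns (all_columns) is known up front; merge_csv_files and mysql_escape are hypothetical names. It writes tab-separated output using the default LOAD DATA conventions, with \N for columns a given file does not have.

```python
import csv


def mysql_escape(value):
    """Escape backslash, tab and newline the way LOAD DATA expects by default."""
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def merge_csv_files(sources, all_columns, out_handle, missing="\\N"):
    """Write rows from every source handle to out_handle in one column order.

    sources: open text handles containing Excel-style CSV with a header row.
    """
    for handle in sources:
        for row in csv.DictReader(handle):
            fields = []
            for column in all_columns:
                value = row.get(column)
                # A column this file lacks becomes the NULL marker.
                fields.append(missing if value is None else mysql_escape(value))
            out_handle.write("\t".join(fields) + "\n")
```

The resulting file has no header row, so it can be loaded with a single LOAD DATA INFILE ... INTO TABLE statement whose column list matches all_columns.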

Can I import tab-separated files into MySQL without creating database tables first?

As the title says: I've got a bunch of tab-separated text files containing data.
I know that if I use 'CREATE TABLE' statements to set up all the tables manually, I can then import them into the waiting tables, using 'load data' or 'mysqlimport'.
But is there any way in MySQL to create tables automatically based on the tab files? Seems like there ought to be. (I know that MySQL might have to guess the data type of each column, but you could specify that in the first row of the tab files.)
No, there isn't. You need to CREATE a TABLE first in any case.
Automatically creating tables and guessing field types is not part of the DBMS's job. That is a task best left to an external tool or application (which then generates the necessary CREATE statements).
If you're willing to type the data types in the first row, why not type a proper CREATE TABLE statement?
Then you can export the Excel data as a txt file and use
LOAD DATA INFILE 'path/file.txt' INTO TABLE your_table;
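As an example of what such an external tool might do, here is a crude sketch that reads a tab-separated file with a header row and guesses a column type from the values. Everything in it is an assumption for illustration: the function names, the choice of only INT, DOUBLE and TEXT, and the use of Python's own number parsing (which accepts a few spellings MySQL would not). The generated statement should be reviewed by hand before use.

```python
def guess_type(values):
    """Pick INT, DOUBLE or TEXT for a column based on its sample values."""
    if not values:
        return "TEXT"
    for sql_type, caster in (("INT", int), ("DOUBLE", float)):
        try:
            for value in values:
                caster(value)
            return sql_type
        except ValueError:
            continue
    return "TEXT"


def create_table_from_tsv(lines, table):
    """Build a CREATE TABLE statement from a TSV whose first line is the header."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        sample = [row[index] for row in data if index < len(row)]
        columns.append(f"  `{name}` {guess_type(sample)}")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(columns) + "\n);"
```

After running the generated statement, the same file can be loaded with LOAD DATA INFILE plus IGNORE 1 LINES to skip the header.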

mysqldump and --tab speed

According to the MySQL Certification Guide, when --tab option is used,
a SELECT ... INTO OUTFILE is used to generate a tab delimited file in the specified directory, a file containing SQL (CREATE TABLE) will also be generated
and that
using --tab to produce tab-delimited dump files is much faster
but how can it be faster if both generate a SQL file and the --tab one just has an extra tab?
I would say that:
Without --tab, a single file is generated that contains both:
the CREATE statements
the data, as lots of INSERT statements
With --tab, two files are generated:
one with the CREATE statements
another with the data in tab-delimited format instead of INSERT statements
The difference is in the second part:
INSERT statements
vs. tab-delimited data
I'm guessing that creating lots of INSERT statements takes more time than just dumping the data in a tab-delimited format. Maybe it's the same when importing the data back, too?