I hava text file full of values like this:
The first line is a list of column names like this:
col_name_1, col_name_2, col_name_3 ......(600 columns)
and all the following columns have values like this:
1101,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1101,1,3.86,65,0.46418,65,0.57151...
What is the best way to import this into mysql?
Specifically how to come up with the proper CREATE TABLE command so that the data will load itself properly? What is the best generic data type which would take in all the above values like 1101 or 3.86 or 0.57151. I am not worried about the table being inefficient in terms of storage as I need this for a one time usage.
I have tried some of the suggestions in other related questions like using Phpmyadmin (it crashes I am guessing due to the large amount of data)
Please help!
Data in CSV files is not normalized; those 600 columns may be spread across a couple of related tables. This is the recommended way of treating those data. You can then use fgetcsv() to read CSV files line-by-line in PHP.
To make MySQL process the CSV, you can create a 600 column table (I think) and issue a LOAD DATA LOCAL INFILE statement (or perhaps use mysqlimport, not sure about that).
The most generic data type would have to be VARCHAR or TEXT for bigger values, but of course you would lose semantics when used on numbers, dates, etc.
I noticed that you included the phpmyadmin tag.
PHPMyAdmin can handle this out of box. It will decide "magically" which types to make each column, and will CREATE the table for you, as well as INSERT all the data. There is no need to worry about LOAD DATA FROM INFILE, though that method can be more safe if you want to know exactly what's going on without relying on PHPMyAdmin's magic tooling.
Try convertcsvtomysql, just upload your csv file and then you can download and/or copy the mysql statement to create the table and insert rows.
Related
I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field
Type
Null
Key
id
int(11)
NO
PRI
text_data
text
NO
optional
int(11)
YES
time
timestamp
NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field
Type
Null
Key
id
int(11)
NO
PRI
foreign_key
int(11)
NO
MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
Note about encoding: As a caveat for people trying to do the same thing: If you want to export/import data, make sure that characters sets and collations are set up correctly for connections. This caused me some headache, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B, if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC, and will be off after importing the data.
Import
Something I didn't realize it first, is that it's impossible to merely update data directly with LOAD INTO or mysqlimport, although that's something many GUI tools appear to be doing and other people attempted. For me as an beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, import the data there and then updating the actual table of interest via a join. I also thought one could update individual columns with this, which again is not possible. If there are some other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD INTO:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD INTO behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using LOAD INTO OUTFILE for exporting, added a header manually and also fixed the malformed lines by hand. After manipulating the data, I used LOAD DATA INTO to update the data. In another case, I exported with SELECT to stdout redirection, manipulated the data and then added a script, which just created a file with a bunch of UPDATE ... WHERE statements with the corresponding data. Then I ran the resulting .sql in my database. Is the latter maybe the best option in this case?
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also what if the database is large enough that the exported data causes a problem because where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
I convert excel to csv first, then import to phpmyadmin only import 100 rows, I changed config.inc buffer size but still did not changed the result. Could you please help me ???
My main idea to do this, compare two tables on mysql workbench, I have one table already sql, i need excel to convert sql then i can use "compare schemas" creating EER Model of existing database.
Good you described the purpose of this approach. This way I can tell you in advance that it will not help to convert that Excel data to a MySQL table.
The model features (sync, compare etc.) all work on meta data only. They do not consider any table content. So instead you should do a textual comparison, by converting the table you have in the server to CSV.
Comparing such large documents is however a challenge. If you only have a few changes then using a diff tool (visual like Araxis Merge or diff on the command line) may help. For larger changesets a small utility app (may self written) might be necessary.
Intro
I've searched all around about this problem, but I didn't really found a source of knowledge about this, so I'm sorry if this problem seems basic to you, but for me is rather quite intriguing due the fact that I'm having hard time to guess what keywords to use on google in order to retrieve proper info.
Problem Description :
As a matter of fact, i have to issues that i don't know how to deal in a MySQL instance installed in a laptop in a windows environment:
I have a DB in MySQL with 50 tables, of with 15 or 20 tables are tables with original data. The other tables were tables that i generated from the original data tables, in order to properly create tables that would allow me to analyze data in PowerBI. The original data tables are fed by dumps from a ERP Database.
My issue is the following:
How would one automate the process of receiving cumulative txt/csv files (via pen-drive or any other transfer mechanism), store those files into a folder and then update the existing tables with the new information? Is there any reference of best practices to deal with such a scenario?
How can i maintain the good shape of my database with the successive data integration, I mean, how can I make my database scalable and responsive?
Can you point me some sources that would help me with this?
At the moment I imported data into tables, in 2 steps:
1st - I created the table structure with the Workbench import wizard help ( I had to do it this way because the tables have a lot of fields - dozens of them, literally, and those fields need to be in the database). I also inserted primary keys and indexes in those tables;
2nd - I Managed to load the data from the files into those tables, using LOAD DATA IN FILE command.
Some of the fields of the tables created with the import wizard, were created as data type text, with is not necessary in this scenario. I would like to revert those fields to data type NVARCHAR(255) or something, However there are a lot of field to alter the data type and in multiple tables at this point, and i was wondering if i can write a query to do the job of creating all the ALTER TABLES statements i need.
So my issue here is: is it safe to alter the data type in multiple fields in multiple columns (in this case i would like to change fields with datatype text to NAVARCHAR(255))? What is the best way to do this? Can you point me to some sources or best practices for this, please?
Thank you, in advance, for your help.
Cheers
You need a scripting language, not a UI. See mysql commandline tool, the shell of your OS, etc, etc.
DROP DATABASE and reCREATE it
LOAD DATA
Massage the data to get the columns cleaner than what the load data provided
Sic the BI tool on the data.
If you want to discuss Step 3, we need details about what transformations are needed between step 2 and step 4. That includes providing the format or schema for steps 2 and 4.
I am using mysql. Some of the tables contain sensitive data like user names, email addresses, etc. I want to dump the data but with these columns in the table removed or modified to some fake data. Is there any way to do it easily?
I'm using this approach:
Copy contents of sensitive tables to a temporary table.
Clear/encrypt the sensitive columns.
Provide --ignore-table arguments to mysqldump.exe to leave the original tables out.
It preserves foreign key contraints, and you can keep columns that are not sensitive.
The first two actions are contained in a stored procedure that I call before doing the dump. It looks something like this:
BEGIN
truncate table person_anonymous;
insert into person_anonymous select * from person;
update person_anonymous set Title=null, Initials=mid(md5(Initials),1,10), Midname=md5(Midname), Lastname=md5(Lastname), Comment=md5(Comment);
END
As you can see, I'm not clearing the contents of the fields. Instead, I keep a hash. That way, you can still see which rows have the same value, and between exports you can see if something changed or not, without anyone being able to read the actual values.
There is a tool called Jailer that is typically used to export a subset of a database. We use this at work to create a smaller test database from a production backup, with all sensitive data obfuscated.
The GUI is a bit crude, but Jailer is the best alternative I have found so far.
You can simply unselect the sensitive tables or columns and get a full copy of the rest. Jailer also supports obfuscating data during export - you could for instance md5 hash all user names or change all email addresses to user#example.org.
There is a tutorial to get you started.
ProxySQL is another approach.
Here is an article explaining how to obfuscate data with proxysql.
https://proxysql.com/blog/obfuscate-data-from-mysqldump
As the title says: I've got a bunch of tab-separated text files containing data.
I know that if I use 'CREATE TABLE' statements to set up all the tables manually, I can then import them into the waiting tables, using 'load data' or 'mysqlimport'.
But is there any way in MySQL to create tables automatically based on the tab files? Seems like there ought to be. (I know that MySQL might have to guess the data type of each column, but you could specify that in the first row of the tab files.)
No, there isn't. You need to CREATE a TABLE first in any case.
Automatically creating tables and guessing field types is not part of the DBMS's job. That is a task best left to an external tool or application (That then creates the necessary CREATE statements).
If your willing to type the data types in the first row, why not type a proper CREATE TABLE statement.
Then you can export the excel data as a txt file and use
LOAD DATA INFILE 'path/file.txt' INTO TABLE your_table;