How to replace some columns' values while doing an SQL dump - mysql

I need to do regulars SQL dump to save a database, but in the table "users" there is the columns "name" and "firstname" that must be anonymous (by changing the values with random string or else).
I would like to know how could I dump my data and change the values of these columns at the same time?
This is for a MySQL database

You can't do this with the mysqldump tool. It doesn't have any option for transforming the output, it only dumps the data that is in your database. So if you use mysqldump, your options are:
Store anonymized data and dump that data (as suggested by #DroidX86 in a comment above)
Get really good at sed and use it to filter the output of mysqldump. I am really good with sed, and I wouldn't try that.
So you may try to dump data without using mysqldump, so you have more options for custom transformations on the data.
You can write a script to SELECT ... FROM MyTable; which is basically what mysqldump does. But if you write your own tool, you can use expressions to anonymize the name and firstname instead of dumping the contents of those columns as-is.
However, coding your own replacement for mysqldump is harder than it sounds. Some developers have attempted it and gotten it spectacularly wrong. I've answered questions about it before:
Why is my database backup script not working in php?
PHP Database Dump Script - are there any issues?

Related

Importing and exporting TSVs with MySQL

I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field
Type
Null
Key
id
int(11)
NO
PRI
text_data
text
NO
optional
int(11)
YES
time
timestamp
NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field
Type
Null
Key
id
int(11)
NO
PRI
foreign_key
int(11)
NO
MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
Note about encoding: As a caveat for people trying to do the same thing: If you want to export/import data, make sure that characters sets and collations are set up correctly for connections. This caused me some headache, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B, if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC, and will be off after importing the data.
Import
Something I didn't realize it first, is that it's impossible to merely update data directly with LOAD INTO or mysqlimport, although that's something many GUI tools appear to be doing and other people attempted. For me as an beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, import the data there and then updating the actual table of interest via a join. I also thought one could update individual columns with this, which again is not possible. If there are some other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD INTO:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD INTO behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using LOAD INTO OUTFILE for exporting, added a header manually and also fixed the malformed lines by hand. After manipulating the data, I used LOAD DATA INTO to update the data. In another case, I exported with SELECT to stdout redirection, manipulated the data and then added a script, which just created a file with a bunch of UPDATE ... WHERE statements with the corresponding data. Then I ran the resulting .sql in my database. Is the latter maybe the best option in this case?
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also what if the database is large enough that the exported data causes a problem because where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.

How to create a dump file when using mysqldump

I'm new to sql and trying to get to grips with some of the basics.
Say I've created a database as follows:
create database mydatabase;
and I want to back this up to a dump file. My confusion comes in here - what is this dump file? Does this automatically generate when I run the mysqldump command? Do I have to create it beforehand? If so, how? Sorry if this comes off stupid but I'm just lost here.
I know the final command would look as follows:
mysqldump -u -p mydatabase > SOMETHING;
but I don't know what to actually insert as the something
The dump file is just a file usually with a bunch of scripts you can run to recreate something in the database. Different database system have their own way of generating these, but you could effectively take a dump from an mysql database and into another rdbms as long as the scripts are compatible.
There are different options;
You could create a dump of just the schema. So that you can recreate that database later on without the data.
Create the dump with schema and data. So that you can recreate the scheme as well as the data within it.
Create the dump with just the data. So you can run those insert commands against another database of the same schema.
Basically dumps are just a way to get back to a previous state. They come in handy for different situation. Such as backing up data, replicating a data from one database to another.
As for the commands I have not used mysql in a while but looks like you are on the right track.

How to import Oracle sql file into MySQL

I have a .sql file from Oracle which contains create table/index statements and a lot of insert statements(around 1M insert).
I can manually modify the create table/index part(not too much), but for the insert part there are some Oracle functions like to_date.
I know MySql has a similar function STR_TO_DATE but the usage of the parameter is different.
I can connect to MySQL, but the .sql file is the only thing I got from Oracle.
Is there any way I can import this Oracle .sql file into MySQL?
Thanks.
Although the above job can be done by manually editing the script appropriately however there are products available which can be of use. Refer to the link for more information on one such product.
P.S. I am not affiliated in any way to the product
Since you mention about insert script basically i think you will be inserting data for this you can use any ETL tool, like open source tool like Pentaho data integrator, pretty simple to do, just search table to table transformation from different database connection on youtube to learn you should be able to connect to both mysql and oracle database else this wont help, but all the table structures you should create manually in the source database for data - you can just load it using ETL, no need to edit for every single line of insert if its more than 100 may be its very painful thing to do.

Replace BLOBs with other BLOB in MySQL

I have the database in MySQL. I need the dump of this database in another environment, but this dump needs to be a little bit different from the original one. What I would like to achieve is to replace 3 columns in one of my tables with "fake default" values.
The problem is that one of the columns is the type of longblob and I have no idea, what I could do with it. With other situation which is quite similar to this, I saw an example where bash script is running mysqldump command and then cat the dump and perform in it grep and sed in order to modify the dump. I was trying to find any information if it is possible with command mysqldump, but I didn't find anything useful.
What I was thinking about is maybe I could create a bash script/cronjob, which will create a new table almost same as the existing one, but the difference would be desirable values instead of the original one. Then it would create a dump of the whole database and finally with grep and sed rename the newly created table with the name of the original one, which would be removed.
I don't know if there is any other way to do this. The perfect solution would be just performing an UPDATE operation on the fly of creation mysqldump.
So my question is if there is an easy way to achieve this? Or maybe is something that I missed?

question about MySQL database migration

If I have a MySQL database with several tables on a live server, now I would like to migrate this database to another server. Of course, the migration I mean here involves some database tables, for example: add some new columns to several tables, add some new tables etc..
Now, the only method I can think of is to use some php/python(two scripts I know) script, connect two databases, dump the data from the old database, and then write into the new database. However, this method is not efficient at all. For example: in old database, table A has 28 columns; in new database, table A has 29 columns, but the extra column will have default value 0 for all the old rows. My script still needs to dump the data row by row and insert each row into the new database.
Using MySQLDump etc.. won't work. Here is the detail. For example: I have FOUR old databases, I can name them as 'DB_a', 'DB_b', 'DB_c', 'DB_d'. Now the old table A has 28 columns, I want to add each row in table A into the new database with a new column ID 'DB_x' (x to indicate which database it comes from). If I can't differentiate the database ID by the row's content, the only way I can identify them is going through some user input parameters.
Is there any tools or a better method than writing a script yourself? Here, I dont need to worry about multithread writing problems etc.., I mean the old database will be down (not open to public usage etc.., only for upgrade ) for a while.
Thanks!!
I don't entirely understand your situation with the columns (wouldn't it be more sensible to add any new columns after migration?), but one of the arguably fastest methods to copy a database across servers is mysqlhotcopy. It can copy myISAM only and has a number of other requirements, but it's awfully fast because it skips the create dump / import dump step completely.
Generally when you migrate a database to new servers, you don't apply a bunch of schema changes at the same time, for the reasons that you're running into right now.
MySQL has a dump tool called mysqldump that can be used to easily take a snapshot/backup of a database. The snapshot can then be copied to a new server and installed.
You should figure out all the changes that have been done to your "new" database, and write out a script of all the SQL commands needed to "upgrade" the old database to the new version that you're using (e.g. ALTER TABLE a ADD COLUMN x, etc). After you're sure it's working, take a dump of the old one, copy it over, install it, and then apply your change script.
Use mysqldump to dump the data, then echo output.txt > msyql. Now the old data is on the new server. Manipulate as necessary.
Sure there are tools that can help you achieving what you're trying to do. Mysqldump is a premier example of such tools. Just take a glance here:
http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
What you could do is:
1) You make a dump of the current db, using mysqldump (with the --no-data option) to fetch the schema only
2) You alter the schema you have dumped, adding new columns
3) You create your new schema (mysql < dump.sql - just google for mysql backup restore for more help on the syntax)
4) Dump your data using the mysqldump complete-insert option (see link above)
5) Import your data, using mysql < data.sql
This should do the job for you, good luck!
Adding extra rows can be done on a live database:
ALTER TABLE [table-name] ADD [row-name] MEDIUMINT(8) default 0;
MySql will default all existing rows to the default value.
So here is what I would do:
make a copy of you're old database with MySql dump command.
run the resulting SQL file against you're new database, now you have an exact copy.
write a migration.sql file that will modify you're database with modify table commands and for complex conversions some temporary MySql procedures.
test you're script (when fail, go to (2)).
If all OK, then goto (1) and go live with you're new database.
These are all valid approaches, but I believe you want to write a sql statement that writes other insert statements that support the new columns you have.