My use case is to take data from one version of a MySQL schema and put it in another. So even before moving the data, I want to check whether the schema of the source is compatible with the destination. For example, if a new nullable column is added in the destination, the schemas are still compatible, whereas dropping a column is not a compatible change, since the source now has an extra column the destination doesn't, and that will break the import.
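To make "compatible" concrete, here is roughly the kind of check I have in mind: a sketch against information_schema, assuming both schemas are reachable from one server and using placeholder names src_db and dest_db:

-- Columns that exist in the source but not in the destination: these break the import.
SELECT s.table_name, s.column_name
FROM information_schema.columns AS s
LEFT JOIN information_schema.columns AS d
       ON d.table_schema = 'dest_db'
      AND d.table_name   = s.table_name
      AND d.column_name  = s.column_name
WHERE s.table_schema = 'src_db'
  AND d.column_name IS NULL;

-- Columns added in the destination that are NOT NULL with no default:
-- inserts that don't mention them will also fail.
SELECT d.table_name, d.column_name
FROM information_schema.columns AS d
LEFT JOIN information_schema.columns AS s
       ON s.table_schema = 'src_db'
      AND s.table_name   = d.table_name
      AND s.column_name  = d.column_name
WHERE d.table_schema = 'dest_db'
  AND s.column_name IS NULL
  AND d.is_nullable = 'NO'
  AND d.column_default IS NULL;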
To compare the schema of two MySQL databases I suggest using:
TiCodeX SQL Schema Compare (https://www.ticodex.com).
It also gives you the migration script to update the destination database in case there are differences.
It's a very cheap but professional tool, and with the same license you can also use it for Microsoft SQL Server and PostgreSQL databases.
It's worth mentioning that it's the only tool I've found that also works nicely on Linux and macOS.
I'm migrating a large (approx. 10 GB) MySQL database (InnoDB engine).
I've figured out the migration part: export with mysqldump, import with mysql.
However, I'm trying to figure out the optimum way to validate if the migrated data is correct. I thought of the following approaches but they don't completely work for me.
One approach could have been using CHECKSUM TABLE. However, I can't use it, since the target database will have data continuously written to it (from other sources) even during the migration.
Another approach could have been using the combination of MD5(), GROUP_CONCAT, and CONCAT. However, that also won't work for me as some of the columns contain large JSON data.
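For reference, a rough sketch of that MD5/GROUP_CONCAT/CONCAT combination, with made-up table and column names:

-- Hash a chunk of rows by concatenating each row's columns and aggregating,
-- then compare the value between source and target for the same id range.
-- Caveats: GROUP_CONCAT output is truncated at group_concat_max_len, and
-- CONCAT returns NULL if any column is NULL (COALESCE would be needed in practice),
-- which is why large JSON columns make this unreliable for me.
SELECT MD5(GROUP_CONCAT(
           CONCAT(id, '|', customer_id, '|', status, '|', payload_json)
           ORDER BY id SEPARATOR '\n')) AS chunk_hash
FROM orders
WHERE id BETWEEN 1 AND 100000;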
So, what would be the best way to validate that the migrated data is correct?
Thanks.
How about this?
Do SELECT ... INTO OUTFILE from each old and new table, writing them into .csv files. Then run diff(1) between the files, eyeball the results, and convince yourself that the new tables' rows are an appropriate superset of the old tables'.
These flat files are modest in size compared to a whole database and diff is fast enough to be practical.
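A sketch of the export side, assuming a hypothetical customers table and a server path allowed by secure_file_priv; run the same statement against the old and new databases, then diff the two files:

-- Export one table to a CSV file on the server.
-- The path must be permitted by secure_file_priv and the file must not already exist.
SELECT *
INTO OUTFILE '/var/lib/mysql-files/customers_old.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
FROM customers
ORDER BY id;   -- a stable ORDER BY keeps the diff meaningful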
Is it possible to search for something that is in two databases? For example, I want to do a "starts with" search on a column in Postgres as well as a column in MySQL, where one column is "name" and the other is "email".
Copying over data is not reliable as new data will be created in both databases constantly.
Yes, it is possible. For the "starts with" part, you should be able to use the standard Postgres string functions, of which starts_with is one, and indexing on the desired columns.
Getting the data from MySQL is the more complicated part.
You would most likely want to use a foreign data wrapper (FDW) from Postgres to access the MySQL data, and then handle the unioning of it (or other desired processing) with the Postgres data before returning the combined data set.
You could write your own FDW if you have particularly specific requirements, or you could try an open source one, such as this one from EnterpriseDB. EnterpriseDB is a Postgres consultancy and offers their own Postgres version, but the doc on the GitHub page for this says it is compatible with base Postgres as well as their own version.
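With the open source mysql_fdw extension it might look roughly like this; connection details, table and column names are placeholders, and starts_with needs Postgres 11 or later:

-- Expose a MySQL table inside Postgres via a foreign data wrapper.
CREATE EXTENSION IF NOT EXISTS mysql_fdw;

CREATE SERVER mysql_srv
  FOREIGN DATA WRAPPER mysql_fdw
  OPTIONS (host '127.0.0.1', port '3306');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER mysql_srv
  OPTIONS (username 'app', password 'secret');

CREATE FOREIGN TABLE mysql_users (email text)
  SERVER mysql_srv
  OPTIONS (dbname 'appdb', table_name 'users');

-- Combined "starts with" search across both databases.
SELECT email AS hit FROM mysql_users WHERE email LIKE 'foo%'
UNION ALL
SELECT name  AS hit FROM people      WHERE starts_with(name, 'foo');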
We want to introduce database schema versioning on an existing schema, because the current way of handling schema versioning (manually) has led to differences in the schema across the various databases in production.
For the sake of definition: when the word 'database' is used in this question, I mean the set of tables, views, procedures, etc. that are available to one customer. Each customer has their own database, there are several databases per MySQL server instance, and we have several MySQL servers running.
Context:
multiple MySQL databases in production
version of database is defined by the product that uses the database, i.e. multiple versions exists in production
Problem:
databases of the same version differ slightly in schema definition
unclear if schema migrations have been correctly applied
I've tried Flyway (commandline) and defined a baseline of a certain version and created migration scripts to migrate the database version to schema of the latest version. Obviously this works well when you start with a known baseline.
So we need to migrate all databases to a known baseline.
When I read the documentation and multiple articles about setting the baseline it all comes down to: 'figure out what the baseline should be and migrate the database manually to this baseline'.
However, since there are multiple databases in production (> 500) which differ slightly even when 'in the same version', it is not possible to manually change the databases to the desired baseline.
Is there a way with either Flyway or Liquibase to say something like 'migrate this schema to this baseline' which comes down to a set of 'if this column doesn't match the desired schema alter table modify column....' statements?
I'd settle for changes in tables only and recreate all views, triggers, procedures etc. anyway.
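One fallback I can think of is to guard each ALTER with an information_schema check, so the same migration script is safe to run against databases that have drifted apart. A minimal sketch for one column, with hypothetical table and column names:

-- Add a column only if it is missing (run via the mysql client; DELIMITER is a client command).
DROP PROCEDURE IF EXISTS fix_customer_email;
DELIMITER //
CREATE PROCEDURE fix_customer_email()
BEGIN
  IF NOT EXISTS (SELECT 1
                 FROM information_schema.columns
                 WHERE table_schema = DATABASE()
                   AND table_name   = 'customer'
                   AND column_name  = 'email') THEN
    ALTER TABLE customer ADD COLUMN email VARCHAR(255) NULL;
  END IF;
END //
DELIMITER ;
CALL fix_customer_email();
DROP PROCEDURE fix_customer_email;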
I read that Liquibase can do a 'rollback', but since MySQL doesn't support transactions on DDL, I wonder if this is implemented in Liquibase or only available for database servers that DO support transactions on DDL.
If anyone knows, please let me know. This might be an argument to move from Flyway to Liquibase.
Does any database support revision control on its command line interface?
So for instance, I'm at the mysql> command prompt. I'm going to add a column to a table. I type
ALTER TABLE X ADD COLUMN Y bigint;
and then the database prompts me:
"OK. Check these changes in using git?"
I respond yes, and it prints out the revision number for the schema.
(No need to run mysqldump to get the schema, strip out the data, and check it in at the bash command line.)
At another time, I decide to check in all of my data as well. So I type something like
CHECKIN SNAPSHOT TABLE X
Seems like a common sense approach to me, but has anyone modified a database to support such behavior?
Thanks
Well, for one thing, checking in one table makes no sense; it's a relational database for a reason; you can't just roll back one table to a previous state and expect that to be OK.
Instead, you can roll back the entire database to a previous state. The database world calls this "point in time recovery"; and you can do it in MySQL with a combination of a full backup and replication logs.
I have no idea why you'd want to store your backups in git.
What you're looking for, I believe, are "migrations". These are usually done at the code level rather than at the database level. Rails supports migrations, Django does with South, and I'm sure there are others out there as well.
These migrations are just code files (either in the programming language of choice, or sql scripts) that can be checked in.
The usual case is schema migrations that don't migrate data, but some (most?) allow you to write custom code to do so if you wish.
Generally speaking you wouldn't want to version control data. There are exceptions when some of the data is part of the logic, but you want to separate the data you consider part of the functionality from the data you don't.
If you want to store everything, that's just called backing up. Backups generally don't go into version control simply because of their size: either they're in a binary format that doesn't diff well, or they're many times larger than they need to be in a text format. If you need to diff the data, there are tools for that as well.
I need to convert data that already exists in a MySQL database to a SQL Server database.
The caveat here is that the old database was poorly designed, but the new one is in proper 3NF. Does anyone have any tips on how to go about doing this? I have SSMS 2005.
Can I use this to connect to the MySQL DB and create a DTS package? Or do I need to use SSIS?
Do I need to script out the MySQL DB and alter every statement to "insert" into the SQL Server DB?
Has anyone gone through this before? Please HELP!!!
See this link. The idea is to add your MySQL database as a linked server in SQL Server via the MySQL ODBC driver. Then you can perform any operations you like on the MySQL database via SSMS, including copying data into SQL Server.
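A sketch of what that looks like on the SQL Server side, assuming a system ODBC DSN for the MySQL ODBC driver has already been created (here called MySQL_DSN; all other names are placeholders too):

-- Register the MySQL database as a linked server via the ODBC provider (MSDASQL).
EXEC master.dbo.sp_addlinkedserver
     @server     = N'MYSQL_LINK',
     @srvproduct = N'MySQL',
     @provider   = N'MSDASQL',
     @datasrc    = N'MySQL_DSN';

EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname  = N'MYSQL_LINK',
     @useself     = 'FALSE',
     @rmtuser     = N'app',
     @rmtpassword = N'secret';

-- Pull rows across with OPENQUERY and insert them into a SQL Server table.
INSERT INTO dbo.Customers (Id, Name)
SELECT Id, Name
FROM OPENQUERY(MYSQL_LINK, 'SELECT id, name FROM appdb.customers');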
Congrats on moving up in the RDBMS world!
SSIS is designed to do this kind of thing. The first step is to map out manually where each piece of data will go in the new structure. Say your old table had four fields: in your new structure, fields 1 and 2 go to table A and fields 3 and 4 go to table B, but you also need the autogenerated id from table A. Make notes on where data types have changed and you may need to make adjustments, or where you now have required fields where the data was not required before, etc.
What I usually do is create staging tables. Put the data in denormalized form into one staging table, then move it to normalized staging tables, do the cleanup there, and add the new ids to the staging tables as soon as you have them. One thing you will need to do when moving from a denormalized database to a normalized one is eliminate the duplicates from the parent tables before inserting them into the actual production tables. You may also need to do data cleanup, as there may be required fields in the new structure that were not required in the old one, or data conversion issues caused by moving to better data types (for instance, if you stored dates in the old database in varchar fields but properly move to datetime in the new db, you may have some records that don't have valid dates).
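To make that concrete, a rough sketch of the dedup-and-capture-ids step, with made-up table and column names (the staging table is assumed to have a NewCustomerId column to hold the generated key):

-- 1. Land the raw, denormalized rows in a staging table as-is (not shown).
-- 2. Insert only the distinct parent values into the real parent table.
INSERT INTO dbo.Customer (CustomerName)
SELECT DISTINCT s.CustomerName
FROM dbo.STG_OldOrders AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer AS c
                  WHERE c.CustomerName = s.CustomerName);

-- 3. Capture the new identity values back into staging for the child inserts.
UPDATE s
SET    s.NewCustomerId = c.CustomerId
FROM   dbo.STG_OldOrders AS s
JOIN   dbo.Customer      AS c ON c.CustomerName = s.CustomerName;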
Another issue you need to think about is how you will convert from the old record ids to the new ones.
This is not an easy task, but it is doable if you take your time and work methodically. Now is not the time to try shortcuts.
What you need is an ETL (extract, transform, load) tool.
http://en.wikipedia.org/wiki/Extract,_transform,_load#Tools
I don't really know how far an ETL tool will get you; it depends on the original and new database designs. In my career I've had to do more than a few data migrations, and we almost always had to design a special utility which would update a fresh database with records from the old database, and yes, we coded it complete with all the update/insert statements that would transform the data.
I don't know how many tables your database has, but if there are not too many then you could consider going the grunt route. That's one technique that's guaranteed to work, after all.
If you go to your database in SSMS and right-click, under tasks should be an option for "Import Data". You can try to use that. It's basically just a wizard that creates an SSIS package for you, which it can then either run for you automatically or which you can save and then alter as needed.
The big issue is how you need to transform the data. This goes into a lot of specifics which you don't include (and which are probably too numerous for you to include here anyway).
I'm certain that SSIS can handle whatever transformations you need to do to change it from the old format to the new. An alternative, though, would be to import the tables into MS SQL as-is into staging tables, then use SQL code to transform the data into the 3NF tables. It's all a matter of what you're most comfortable with. If you go the second route, then the import process I mentioned above in SSMS could be used. It will even create the destination tables for you. Just be sure that you give them unique names, maybe prefixing them with "STG_" or something.
Davud mentioned linked servers. That's definitely another way that you can go (and got my upvote). Personally, I prefer to copy the tables over into MS SQL first since linked servers can sometimes have weirdness, especially when it comes to data types not mapping between different providers. Having the tables all in MS SQL will also probably be a bit faster and saves time if you have to rerun or correct portions of the data. As I said though, the linked server method would probably be fine too.
I have done this going the other direction and SSIS works fine, although I might have needed to use a script task to deal with slight data type weirdness. SSIS does ETL.