I am using Talend Open Studio for data migration while upgrading my existing application architecture to a new one. Both applications use MySQL, but with different schemas. I have migrated data successfully between single tables, but when I transfer data from a single table into a parent-child pair of tables with a foreign key constraint, the transfer is tremendously slow.
For example, I am migrating my Cities table to Cities and Citiesi18n; below are the schemas for them.
My old schema:
CITIES (
id
city_name
status
created_at
)
The newly created schema, into which I need to migrate the data:
CITIES (
id
status
created_at
)
CITIESI18N (
id
lang_code
name
fk_city_id -- foreign key referencing CITIES(id)
)
Below are the snapshots from my Talend jobs:
And here is the tMap configuration:
Now, when I transfer the data without the foreign key, the results are super fast. See below:
But when I transfer the same data with the foreign key, the transfer is super slow:
(Note: I have used the province table as an example, as it is similar to the cities table.)
I think that with the foreign key constraint it must be indexing the columns while transferring the data, making it slower, but I am not sure. Is there any way I can fix this? I have a lot of similar tables that need to be migrated, and I am also just curious to know the reason.
I don't know why you get this behaviour: you could try redirecting 'provience_i18n' to a tHashOutput (cache component), then link a subjob with tHashInput (referring to your tHashOutput) --> tMySQLOutput. You'll have two subjobs, one for each insertion.
You are loading data into the parent and the child at the same time, using one tMap. While you are inserting foreign keys into the child table, insertions are also still being made into the referenced parent table. What you could do instead is load the data into the main CITIES table first, then, OnSubjobOk, load into the child CITIESI18N table. It should be faster. Let me know if it works.
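Roughly, that two-subjob layout amounts to the following at the SQL level (table and column names are taken from the question's schema; the row values are illustrative):

-- Phase 1 (first subjob): load every parent row
INSERT INTO CITIES (id, status, created_at)
VALUES (1, 'active', NOW());

-- Phase 2 (second subjob, OnSubjobOk): load the child rows;
-- each referenced parent row already exists, so the FK check is a simple lookup
INSERT INTO CITIESI18N (id, lang_code, name, fk_city_id)
VALUES (1, 'en', 'Mumbai', 1);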
I'm developing an Android application in which the data is stored in a SQLite database.
I have set up a sync with a MySQL database on the web, to which I am sending the data stored in SQLite on the device.
The problem is that I don't know how to maintain the relations between tables, because the primary keys are regenerated by AUTO_INCREMENT while the foreign keys keep their old values, breaking the relations between tables.
If this is a full migration, don't use auto-increment during the migration - create the tables with plain columns, and use ALTER TABLE to change the model after the import.
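For example (a sketch, assuming an id column that is already the primary key), you can import the original integer ids unchanged and only switch the column to auto-increment afterwards:

-- enable auto-increment only after the import, so the imported ids survive
ALTER TABLE cities MODIFY id INT NOT NULL AUTO_INCREMENT;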
For an incremental sync, the easiest way I can see is an additional column in each MySQL table, called sqlite_id, filled with the original id. Then you can update the references using UPDATE (with joins).
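A minimal sketch of that remapping step, assuming a parent table carrying the sqlite_id column and a child table that still holds the old SQLite parent id in sqlite_parent_id (all names here are illustrative):

-- repoint child.parent_id from the old SQLite ids to the new MySQL ids
UPDATE child c
JOIN parent p ON p.sqlite_id = c.sqlite_parent_id
SET c.parent_id = p.id;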
Alternatives involve temporary tables for storing the data and an auxiliary table used for pairing; this gets tedious for a bigger data model.
The approach I tend to use, if possible, is to avoid auto-increment in such situations. I usually have an auxiliary table with four columns, like this: t_import(tablename, operationid, sqlite_id, mysqlid).
The process is the following:
Import the primary keys into t_import. Use operationid to separate parallel imports if needed.
Generate new keys for data tables and store them into t_import table. This can be combined with step one.
Import the actual data, using t_import to set the new primary keys and restore the relations.
That should work for most scenarios I know about.
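A hedged sketch of what that could look like; t_import's columns come from the description above, while the staging and destination tables are assumptions:

CREATE TABLE t_import (
  tablename   VARCHAR(64) NOT NULL,
  operationid INT NOT NULL,
  sqlite_id   INT NOT NULL,
  mysqlid     INT NOT NULL,
  PRIMARY KEY (tablename, operationid, sqlite_id)
);

-- step 3: import a child table, translating both its own key and its parent key
INSERT INTO child_table (id, parent_id, payload)
SELECT own.mysqlid, parent.mysqlid, s.payload
FROM staging_child s
JOIN t_import own
  ON own.tablename = 'child_table' AND own.operationid = 1 AND own.sqlite_id = s.id
JOIN t_import parent
  ON parent.tablename = 'parent_table' AND parent.operationid = 1 AND parent.sqlite_id = s.parent_id;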
Thanks for the help, you have given me some ideas.
I will try to add an id2 field to the tables that will store the same value as the primary key (_id).
When I send the information from SQLite to MySQL and the primary key is incremented, I will still have the original value of the primary key in the id2 field, so I can compare it with the foreign keys of the other tables and update them.
Let’s see if it works.
Thanks
I am building a database on SQL Server 2014. I have a users table and a profiles table, and I need a relationship between the two. I am relating userid (the primary key of the users table) to the profiles table (where userid is a foreign key). This is just an example to consider.
What I need to know is: what if the profiles table is on another server instance? Is there a way to link the two? The reason is that I don't want to overload the SQL server with too many tables and too much data...
Thanks,
Sarin Gopalan
It is not possible at all to create foreign key relations between databases - let alone between server instances.
You might be able to create a trigger on the Profiles table that checks whether a userid exists in the Users table in the other database, but I fear the performance of this approach would be very bad.
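A minimal sketch of such a trigger, assuming a database named UsersDb on the same instance (a different instance would instead need a linked server and four-part names):

CREATE TRIGGER trg_Profiles_CheckUser
ON dbo.Profiles
AFTER INSERT, UPDATE
AS
BEGIN
    -- reject the statement if any inserted row references a missing user
    IF EXISTS (
        SELECT 1 FROM inserted i
        WHERE NOT EXISTS (
            SELECT 1 FROM UsersDb.dbo.Users u WHERE u.userid = i.userid
        )
    )
    BEGIN
        RAISERROR('userid does not exist in UsersDb.dbo.Users', 16, 1);
        ROLLBACK TRANSACTION;
    END
END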
A much better solution would be to replicate one table to the other database, and then create the foreign key relations in a normal way. How you replicate the table (SSIS, CDC, triggers, etc.) is up to you.
I am designing a MySQL database for a new project. I will be importing 50-60 MB of data on a daily basis.
There will be a main table with a primary key. Then there will be child tables with their own primary key and a foreign key pointing back to the main table.
New data has to be parsed from a giant text file and then some minor manipulations made prior to importing into the master database. The parsing and import operation may involve a significant amount of troubleshooting so I want to import new data into a temporary database and ensure its integrity before adding to the master.
For this reason, I thought initially to parse and import new data into a separate, temporary database each day. In this way, I would be able to inspect the data prior to adding to the master and at the same time I would have each day's data stored as a separate database should I ever need to rebuild the master later on from the individual temporary databases.
I am considering the use of primary keys / foreign keys with the InnoDB engine in order to maintain relational integrity across tables. This means I have to worry about the auto-increment ids (primary keys) not colliding when I go to import the new data each day.
So, given this situation, what would be best?
Make a copy of the master and import directly into the copy each day, then replace the existing master with the new copy.
Import new data into a temporary database each day, but change the auto-increment start value of the primary keys to be greater than the maximum in the master (a sketch of this follows the list). Would I then also change the auto-increment values for the primary keys of all the tables (the main table and its children)?
Import new data into a temporary database each day without worrying about the primary key values, and find some other way to merge the temporary database into the master without primary key collisions? If I use this strategy, how can I update the primary keys in the main table for the new data while making sure all the relationships with the child tables remain correct?
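For reference, the offset in option 2 is a one-liner per table; the table names and value below are assumptions, and in practice you would look up the master's current maximum first:

-- give the temporary database headroom above the master's highest ids
ALTER TABLE temp_db.main_table AUTO_INCREMENT = 1000001;
ALTER TABLE temp_db.child_table AUTO_INCREMENT = 1000001;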
I'm not sure this is as complicated as you are making it.
Why not just do this:
Import raw data into temporary table (why does it have to be a separate database?)
Run your transformations/integrity checks on the temporary table.
When the data is good, insert it directly into the master table.
Use auto-incrementing ids on the master table that are not dependent on the data being imported. That way you have a unique id, plus any original ids that existed in your import.
Add a field to your master table(s) that records which import each record came from.
In addition to copying the data to your master table, keep a log that ties back to the data you merged. It helps you back out the data if it turns out to be wrong or bad, and it gives you an audit trail.
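A minimal sketch of those last three points (all table and column names are assumptions):

CREATE TABLE master_records (
  id           INT AUTO_INCREMENT PRIMARY KEY,  -- unique id, independent of the import
  original_id  INT,                             -- id as it appeared in the import file
  import_batch INT,                             -- which daily import the row came from
  payload      VARCHAR(255)
) ENGINE=InnoDB;

-- merge one verified day's data and stamp it with its batch number
INSERT INTO master_records (original_id, import_batch, payload)
SELECT t.id, 42, t.payload
FROM temp_records t;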
In the end just set up a sandbox database, write a bunch of stored procedures and test the crap out of it. =)
My plan is as follows:
I have the Asp.Net membership database that keeps info about users.
I want to create my own database in which some of the tables will have fields like userId as a foreign key to the primary key field in the membership Users table. How would I achieve this, and what is the best practice for it? Or should I just get the value of the user at run time and copy that value into my table, without any foreign key relations?
Just add tables to the existing database, and reference the pk from the Users table as needed. There's no need to use a separate database as long as it's all logically the same application.
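A minimal sketch, assuming the default membership table aspnet_Users, whose UserId column is a uniqueidentifier (the Orders table is purely an example):

CREATE TABLE dbo.Orders (
    OrderId INT IDENTITY(1,1) PRIMARY KEY,
    UserId  UNIQUEIDENTIFIER NOT NULL,
    CONSTRAINT FK_Orders_aspnet_Users
        FOREIGN KEY (UserId) REFERENCES dbo.aspnet_Users (UserId)
);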
I have finally reached the data migration part of my project and am now trying to move data from MySQL to SQL Server.
SQL Server has new schema (mapping is not always one to one).
I am trying to use SSIS for the conversion, which I started learning this morning.
We have customer and customer location tables in MySQL and equivalent tables in SQL Server. In SQL Server, all my tables now have a surrogate key column (a GUID), and I am creating it in a Script Component.
Also note that I do have primary keys in the current MySQL tables.
What I am looking for is how I can add child records to the customer location table with the newly created GUID as the parent key.
I see that SSIS has a Foreach Loop Container - is this of any use here?
If not, another possibility I can think of is to create two Data Flow Tasks and, [somehow] just before the master data is sent to the destination component [table] in the primary data flow task, add a variable with the newly created GUID and another with the old primary id, which would then be used as the source for the data flow task for the child records.
Maybe, to simplify, this could also be done once the data flow task for the master is complete: the data flow task for the child would then read this master data and insert the child records from MySQL into the SQL Server table. This would mean, though, that I have to load all my parent table records back into memory.
I know this is all too confusing, and it is mainly because I am very confused :-(. Please bear with me, and if you want more information, let me know.
I have been through many links that I found through a Google search, but none of them really explains (or I was not able to understand) how the process is carried out.
Please advise
regards,
Mar
Edit 1:
After further searching and refining my keywords, I found this link on SO and am going through it to see if it can be used in my scenario:
How to load parent child data found in EDI 823 lockbox file using SSIS?
OK, here is what I would do. Put the MySQL data into staging tables in SQL Server that have identity columns set up, plus an extra column for the eventual GUID, which starts out as null. Now your records have a primary key.
Next comes the sneaky trick. Pick a required field (we use last_name) and, instead of the real data, insert the value from the id field of the staging table. Now you have a record that contains both the GUID and the id. Update the guid field in the staging table by joining to it on the id and the required field you picked. Then update the last_name field with the real data.
To avoid the sneaky trick, and if this is only a one-time upload, add a column to your tables that contains the staging table id. Again, you can use this to look up the GUID when inserting into related tables. Then, when you are done, drop the extra column.
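A hedged sketch of that one-time approach; every table and column name here is an assumption:

CREATE TABLE dbo.stg_Customer (
    StageId      INT IDENTITY(1,1) PRIMARY KEY,
    OldMySqlId   INT NOT NULL,          -- primary key carried over from MySQL
    CustomerGuid UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    Name         NVARCHAR(100)
);

-- child rows pick up the parent's new GUID through the old MySQL id
INSERT INTO dbo.CustomerLocation (CustomerGuid, Address)
SELECT s.CustomerGuid, x.address
FROM dbo.stg_CustomerLocation x
JOIN dbo.stg_Customer s ON s.OldMySqlId = x.customer_id;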
You are aware that there are performance issues involved in using GUIDs? Make sure not to make them the clustered index (which, as the PK, they will be by default unless you specify differently) and use newsequentialid() to populate them. Why are you using GUIDs? If an identity column would work, it is usually better to use it.