SSIS lookup transformation for a very large table

I have two tables, Person and Ownership, with a one-to-many relationship (each person has many ownerships). The primary key is person_id, of type GUID. I am developing an SSIS package to load data from both tables into a destination. In the destination tables I want to add a surrogate key so that future SQL joins use the surrogate key rather than the GUID (because joining on the GUID is slow). I did the following:
In the destination, created the Person table with one additional column, person_id_sk, of type bigint (identity column).
Loaded data into Person.
Similarly, created the Ownership table with one additional column, person_id_sk, of type bigint.
Loaded data into Ownership with a Lookup transformation.
The process is very slow, as there are millions of records in both tables and the package needs to run twice a week.
Is this the only way to insert surrogate key values in a parent-child relationship, or is there a more efficient way?
regards,

An SQL operation would almost certainly be faster than the SSIS Lookup. Load the Person data in SSIS, letting the database engine generate the IDENTITY surrogate key. Keep the GUID column in the data flow so that it is written to the table. Load the Ownership data in SSIS as well, but don't populate the surrogate key at this stage.
Then update the child rows' FK in an SQL operation like this:
UPDATE o
SET person_id_sk = p.person_id_sk
FROM Ownership AS o
INNER JOIN Person AS p
    ON o.GUID = p.GUID


Identifying FKs in a MySQL database that were not defined upon database creation?

A database was created with 5 tables. These tables were populated with data upon creation; perhaps the data was imported from a previous database.
When the DB was created, primary keys were created for each table, but foreign keys were not.
How do I run a query to identify which table columns contain data that relates to the PK in other tables? Effectively, how do I identify the FK column(s) on each table? Some tables may contain 2 FKs.
The end goal is to identify the FK(s) in each table and properly set up the tables with the appropriate FK structure and table relations.
Don't try to use queries to automate this database design / reverse-engineering process. (If you had 500 tables, maybe. But you only have five.)
Eyeball your table definitions. If you have, for example, an id primary key column in your user table, your contact table might have a user_id column. That is the FK to user.id. It will help you greatly if you really understand how your tables tie together with FKs.
And, keep in mind that your system will still work tolerably well if you don't bother to actually declare these foreign keys. What you'll lose:
constraints, with which the database engine prevents, for example, a contact.user_id column value that doesn't point to any user.id row.
possibly some helpful indexing.
MySQL Workbench has a reverse engineering feature. It inspects the definition of a database and does its best to sort out the various entities (tables) and the relationships (foreign key dependencies) between them. It presents graphical E-R diagrams and can generate DDL. That can help you understand a database and set up appropriate FKs. But still, check the relationships it suggests: this data is yours, not Workbench's.
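Once you have eyeballed a candidate relationship, it is worth checking the data before declaring the constraint. The following is a minimal sketch, assuming the hypothetical user and contact tables from the example above and using Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are invented for illustration. It lists contact rows whose user_id points at no user.id row, which are the rows that would make the FK declaration fail.

```python
import sqlite3

# Hypothetical tables modelled on the answer's example: user.id and contact.user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contact (id INTEGER PRIMARY KEY, user_id INTEGER, phone TEXT);
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO contact VALUES
        (10, 1, '555-0100'),
        (11, 2, '555-0101'),
        (12, 9, '555-0102');  -- user 9 does not exist
""")

# Rows in contact whose user_id matches no user.id (orphans).
orphans = conn.execute("""
    SELECT c.id, c.user_id
    FROM contact AS c
    LEFT JOIN user AS u ON u.id = c.user_id
    WHERE c.user_id IS NOT NULL
      AND u.id IS NULL
""").fetchall()

print(orphans)
```

If the list is empty, the column is a plausible FK and the constraint can be declared (in MySQL, with ALTER TABLE ... ADD FOREIGN KEY ... REFERENCES ...). If it is not empty, either the guess is wrong or the orphan rows need cleaning up first.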

Is it okay to use the same column as a primary key for different tables?

I am a total novice to this whole database world and I have a question. I am building a database for the final project of my master's class. The database includes cities, counties, and demographic data for the state of Colorado, and will ultimately be used as a spatial database. At this point I have all my tables built in Access, and I have an ODBC connection to PostgreSQL to import the tables after they are created. Access does not allow shapefiles to be added to the database; PostgreSQL does.
My question is about primary keys. Each of my tables in Access shares a FIPS code (this code allows me to join the demographic data to a shapefile and display the data in ArcMap with the proper coordinates). I have many demographic data tables with this FIPS code. Is it acceptable to set the FIPS code as the primary key for each table? Or does each table need its own individual primary key that is different from the others?
Thanks for the help!
The default PK in Access is "ID", so there is really no problem with using this default for all tables.
In fact, it means that for any table or code you write, you can always rest easy as to what the primary key is going to be.
And if you copy or rename a table, then again you know the name of the key column.
Some people do prefer having the table name as part of the PK name, but that does violate normalizing of data, since you're now attaching an external attribute to that PK column.
However for a FK (foreign key), since the VERY definition of the column is an external dependency, then I tend to include the table name like this:
Customers_ID
And once again due to this naming convention, then you can always “guess” or “know” the name of a FK column (table name + ID).
At the end of the day, there is not really a convention on this issue. However, I recommend that for all tables you create, you allow Access to create that default PK of "ID". This of course assumes your database design is not using natural keys. The debate of natural keys vs. surrogate keys (an autonumber PK "ID") has many pros and cons. You can google "natural keys vs surrogate keys" for endless discussions on this issue.

Replace primary key that is foreign key in other tables

I am currently rebuilding a database which is used to store patient records. In the current database, the primary key for a patient is their name and date of birth (a single column, i.e. "John Smith 1970-01-01"; it is not composite). This is also a foreign key in many other tables that reference the patients table. I am planning to replace this key with an auto-generated integer key (since there will obviously be duplicate keys one day under the current system). How can I add a new primary key to this table and add appropriate foreign keys on all the other tables? Keep in mind that there is already a very large amount of data (~500,000 records) and these data references cannot be broken.
Thanks!
If it were up to me:
Add a new future-PK column as a non-null unique index (it must be a KEY, but not necessarily the PK) with auto_increment.
Add the appropriate new-FK columns to all the related tables, these should be initially nullable.
Set the new-FK value to the appropriate future-PK value based on the current-PK/FK relationships. Use an "UPDATE .. JOIN" for this step.
Enable the declarative referential integrity (DRI) constraints on the relevant tables. A constraint only needs to be KEY/FK, not PK/FK, which is why the future-PK can be used. Every existing DRI constraint using the current-PK should likely be updated during this step.
Remove the new-FK column nullability based on modeling requirements.
Remove any residual old-FK columns, as they are now redundant data.
Switch the old-PK and the new/future-PK (this can be done in one command and may take a while to physically reorganize all the rows). Remove the old PK column as applicable, or perhaps simply remove its KEY status.
I would also offline the database during the process, review and test the process (use a testing database for dry-runs), and maintain backups.
The Data-Access Layer and any Views/etc will also need to be updated. These should be done at the same time, again through a review and testing process.
Also, even when adding an auto-increment PK, the table should generally still have an appropriate covering natural key enforced with unique constraints.
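The add-column and back-fill steps above can be sketched as follows. This is a minimal illustration, not the exact MySQL procedure: it uses Python's built-in sqlite3 module, and because SQLite cannot add an auto-increment key to an existing table in place, it builds the new patients table by copying. The table and column names (patients_old, visits, name_dob, patient_id) and the sample rows are assumptions made for the example. The back-fill uses a correlated subquery where MySQL would use "UPDATE .. JOIN".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old design: name + date of birth in one column is the key.
    CREATE TABLE patients_old (name_dob TEXT PRIMARY KEY);
    CREATE TABLE visits (
        visit_id INTEGER PRIMARY KEY,
        name_dob TEXT NOT NULL,   -- old FK
        note     TEXT
    );
    INSERT INTO patients_old VALUES
        ('John Smith 1970-01-01'), ('Jane Doe 1985-06-30');
    INSERT INTO visits VALUES
        (1, 'John Smith 1970-01-01', 'checkup'),
        (2, 'Jane Doe 1985-06-30',   'x-ray'),
        (3, 'John Smith 1970-01-01', 'follow-up');

    -- New integer key; the old key is kept as a unique natural key.
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name_dob   TEXT NOT NULL UNIQUE
    );
    INSERT INTO patients (name_dob)
        SELECT name_dob FROM patients_old ORDER BY name_dob;

    -- New FK column, nullable until it has been back-filled.
    ALTER TABLE visits
        ADD COLUMN patient_id INTEGER REFERENCES patients(patient_id);

    -- Back-fill the new FK from the old key relationship.
    UPDATE visits
    SET patient_id = (SELECT p.patient_id
                      FROM patients AS p
                      WHERE p.name_dob = visits.name_dob);
""")

# Any NULL left here means an old FK value matched no patient.
unmatched = conn.execute(
    "SELECT COUNT(*) FROM visits WHERE patient_id IS NULL"
).fetchone()[0]
rows = conn.execute(
    "SELECT visit_id, patient_id FROM visits ORDER BY visit_id"
).fetchall()
print(unmatched, rows)
```

Checking for unmatched rows before tightening nullability or dropping the old columns is what keeps the existing references from being broken silently.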
I solved the problem using the following method:
1- Added a new primary key to the patients table and assigned unique values to all existing records
2- Created materialized views (without triggers) for each of the referencing tables that included all fields in the referencing table as well as the newly created id field in the patients table (via a join).
3- Deleted the source referencing tables
4- Renamed the materialized views to the names of the original source tables
The materialized views are now the dependent tables.
A reference for materialized views: http://www.fromdual.com/mysql-materialized-views

MySQL foreign key dependency resolution

I am modelling a relational database where the following schema is used to describe 2 tables: ERD Model.
The rules specified are that:
An office has a manager
Each staff member is assigned to an office
In order to model this I created an ERD using MySQL workbench, which provided the following DDL.
The issue I have is that in order to enforce that an office must have a manager, the foreign key in the office table is not nullable. Likewise, the foreign key in the staff table representing the office they work for is required for every staff member and is therefore not nullable. This makes sense to me in the model; however, in the implementation it makes it impossible to insert data, as each table relies on the existence of tuples in the other.
The only answer I can think of is to make the keys nullable such that one can temporarily exist without the other.
Is this the correct way to resolve the issue? The database will eventually be normalised to 3NF, perhaps BCNF.
The problem is that you're attempting to record the relationship between offices and staff twice: once in the office record and again in the staff record. You should only record the relationship in one place. Often this is done in a cross-reference table with two columns: Office_ID and Staff_ID. But it's also common to skip the third table and just record the relationship in one of the tables.
In this case, you can eliminate your problem by removing the Office field and foreign key from the Staff table. You'll be able to create as many Staff records as you need. Then when you create an Office record, you will be able to assign one of the Staff to the Office.
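A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names and the sample rows are assumptions, since the original DDL is not shown. Staff carries no office FK, so staff rows can be inserted first, and office.manager_id can stay NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    -- Hypothetical schema: the relationship is recorded only on office.
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE office (
        office_id  INTEGER PRIMARY KEY,
        city       TEXT NOT NULL,
        manager_id INTEGER NOT NULL REFERENCES staff(staff_id)
    );
""")

# Staff first, then an office that names an existing staff member as manager.
conn.execute("INSERT INTO staff (staff_id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO office (office_id, city, manager_id) VALUES (1, 'Leeds', 1)")

# An office pointing at a non-existent manager is rejected by the FK.
try:
    conn.execute("INSERT INTO office (office_id, city, manager_id) VALUES (2, 'York', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

office_count = conn.execute("SELECT COUNT(*) FROM office").fetchone()[0]
print(rejected, office_count)
```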

Master-detail migrate in SSIS 2008

I have two MSSQL 2008 databases, dbA and dbB.
dbA contains a master-detail pair of tables: AMaster and ADetail. Correspondingly, dbB contains BMaster and BDetail. The only difference between A and B is the type of the primary key. In the source database (dbA) it is an integer, but in the destination (dbB) it is a uniqueidentifier.
Dear colleagues: how do I describe a data flow in SSIS to handle this case? I need to transfer all fields, but replace the key with one of the new type.
If you do not store the natural primary key in the destination, how do you manage updates? If a record is changed in the source and you want to replicate the corresponding change in the destination table, how will you do that? It is not logically possible. Either you will have to keep the integer-based natural key in the destination table, or you will have to keep a mapping table which stores each old natural key and its corresponding new key.
Also, a UID is a far larger value (16 bytes) than an integer, and I don't think that converting an integer to a UID is a good option. If you really want to do it, do it this way: INT >> HEX STRING >> UID. More on UID here: [http://msdn.microsoft.com/en-us/library/ms187942.aspx]
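The INT >> HEX STRING >> UID chain can be sketched outside SSIS as follows. This is a minimal illustration in Python using the standard uuid module; the function name int_key_to_uuid and the sample keys are assumptions, and the same idea would need to be expressed in an SSIS derived column or in T-SQL for the actual package. The conversion is deterministic, so a master row and its detail rows get the same UID from the same integer key, and the dictionary doubles as the old-key to new-key mapping mentioned above.

```python
import uuid


def int_key_to_uuid(key: int) -> uuid.UUID:
    """Zero-pad the integer to 32 hex digits and parse that string as a UUID."""
    if key < 0:
        raise ValueError("key must be non-negative")
    return uuid.UUID(hex=format(key, "032x"))


# Mapping from old integer keys (hypothetical sample values) to new UIDs.
mapping = {key: int_key_to_uuid(key) for key in (1, 255, 4096)}

print(mapping[255])  # 00000000-0000-0000-0000-0000000000ff
```

Note that values produced this way are not random and carry no UUID version bits; they are only the integer re-encoded at a wider size.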