SSIS import using foreign key data?

I have an old database (OldDB) with a table (let's call it Call) and a new database (NewDB) that I'm migrating to using SSIS (2008), with the following setup:
OldDB.Call has a column called Status which is currently varchar(1) and holds values such as "C", "D", etc.
NewDB now maps all the possible statuses in its own table with a foreign key constraint, so that OldDB.Call.Status corresponds to NewDB.CallStatus.id. An example of the data in NewDB.Call.StatusID would be 1, 2, 3 and so forth.
NewDB.CallStatus has a column called Status which holds the actual nvarchar(1) value of A, B, C, etc.
I'm using SSIS to migrate the data. So far, I understand I need to use a Sort transformation for each source and then a Merge Join transformation to map the new NewDB.Call.StatusID to the OldDB.Call.Status value. For whatever reason, it seems to start just fine but ends up grabbing other columns (like a description column, for example) and shoving the wrong kind of data in there. In short, it's not mapping the foreign key like it should.
I've found numerous examples on the web on how to do this (like this) but it seems like I'm missing some key, critical piece of information in order to understand what I'm doing because I keep borking it.
In a perfect world, a step-by-step would be great, but a good, concise tutorial or explanation would be useful as well. In short, I need to know how to hook those two tables up, map the value in OldDB.Call.Status to the corresponding foreign key in NewDB.CallStatus, and store that key in NewDB.Call.StatusID.

I would use the Lookup Transformation for this requirement.
Within the Lookup definition, the Connection would point to your NewDB.CallStatus (writing a SELECT that returns only the columns you need is better practice than just choosing the table, since the Lookup caches the full result set). On the Columns pane, map the incoming Status to CallStatus.Status, and tick StatusID as the lookup column to add to the data flow.
Your data flow will then carry that added column downstream, and you can deliver it to the destination (typically using an OLE DB Destination).
The Lookup's default mode is Full Cache, which will be much faster and use far less memory than a Sort and Merge Join solution.
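As a rough sketch (all names are taken from the question; the dbo schema and same-instance databases are assumptions), the Lookup connection could use a narrow query like the first statement below, and the second shows the same mapping as a set-based query, handy for spot-checking what SSIS produces:

SELECT id AS StatusID, Status
FROM dbo.CallStatus;

-- Equivalent set-based mapping, for validating the SSIS output
SELECT o.*, cs.id AS StatusID
FROM OldDB.dbo.Call AS o
JOIN NewDB.dbo.CallStatus AS cs
    ON cs.Status = o.Status;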

Related

PowerApps: Access-like update query

I’m pretty new to PowerApps and need to migrate an Access database over to PowerApps, first of all its tables to Dataverse. It’s a typical use case for a model-driven app, with many relationships between the tables. All Access tables had an autogenerated ID field as their primary key.
I transferred all tables via Excel export/import to Dataverse. Before importing, I renamed all ID fields (columns) to ID_old and let Dataverse create its own autogenerated ID field for each table.
What I want to achieve is to re-establish all relationships between the tables, where the foreign key points to the new primary key provided by Dataverse, as I want to avoid double keys. As a first step I created relationships between the ID_old field and the corresponding (old) foreign key field in the related table.
In good old Access, I’d now simply run an update query, filling the new (yet empty) foreign key field with the new ID of the related table. Finally, I would change the relationship to the new primary and foreign keys and then delete the old ID fields.
Where I got stuck is the update query. I searched the net and found a couple of options like the UpdateIf / Patch functions, Power Query, Excel export/import and some more. They all seem pretty complicated and time-intensive, and I think I must have overlooked a very simple solution to such a common problem.
Is there someone out there who might point me in the right (and simple) direction? Thanks!
A more efficient approach would be to start by creating extra ID columns in Access. Generate your GUIDs and fix your foreign keys there; this can be done efficiently using a few SQL update statements.
When it comes to transferring your Access tables to Dataverse, you just provide your Access shadow primary keys in the Create message.
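A minimal sketch of those update statements in Access SQL, assuming a parent table Parent whose new GUID column NewID has already been generated, and a child table Child that still holds the old integer key in ParentID_old (all of these names are hypothetical):

ALTER TABLE Child ADD COLUMN ParentNewID GUID;

UPDATE Child INNER JOIN Parent
    ON Child.ParentID_old = Parent.ID_old
SET Child.ParentNewID = Parent.NewID;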
I solved the issue as follows, which is pretty efficient in my view. I’m assuming you have an auto-numbered ID field in every Access table, which you used for your relationships.
Export your tables from Access to Excel.
Rename your ID fields to ID_old in all tables using Excel, as well as your foreign key fields to e.g. ForeignKey_old. This will make it easy to identify the fields later in Dataverse.
Import into Dataverse, using the Power Query tool. Important: make sure that you choose ID_old as an additional primary key field in the last import step.
Re-create all relationships in Dataverse, using the Lookup datatype. This will create a new, yet empty column in your table.
Now use the “Edit in Excel” feature to open your table in Excel. You should get your prefix_foreignkey_old column with the old foreign keys displayed, as well as the reference to your related table, e.g. prefix_referencetable.prefix_id_old, which is still empty.
Now just copy the complete prefix_foreignkey_old column values into the prefix_referencetable.prefix_id_old column.
Import the changes and you’re done.
Hope this is helpful for some of you out there.

MySQL, how to restructure optional multiple foreign keys

For this example, I'm trying to build a system that will allow output from multiple sources, but these sources are not yet built. The output "module" will be one component, and each source will be its own component to be built and expanded upon later.
Here's an example I designed in MySQLWorkbench:
The goal is to make my output module display data from the output table while being easily expanded upon later as more sources are built. I also want to minimize schema updates when adding new sources. Currently, I will have to add a new table per source, then add a foreign key to the output table.
Is there a better way to do this? I don't know how I feel about these NULL-able foreign keys because the JOIN query will contain IFNULLs and will get unruly quickly.
Thoughts?
EDIT 1: Clarification
I will be displaying a grid using data in the output table. The output table will contain general data for all items in the grid and will basically act as an aggregator for the output_source_X tables:
output(id, when_added, is_approved, when_approved, sort_order, ...)
The output_source_X tables will contain additional data specific to a source. For example, let's say one of the output source tables is for Facebook posts, so this table will contain columns specific to the Facebook API:
output_source_facebook(id, from, message, place, updated_time, ...)
Another may be Twitter, so the columns are specific for Twitter:
output_source_twitter(id, coordinates, favorited, truncated, text, ...)
A third output source table could be Instagram, so the output_source_instagram table will contain columns specific to Instagram.
There will be a one-to-one foreign key relationship between the output table and ONLY ONE of the output_source_X tables, depending on whether the output item is a Facebook, Twitter, Instagram, etc. post, hence the NULL-able foreign keys.
output table
------------
foreign key (source_id_facebook) references output_source_facebook(id)
foreign key (source_id_twitter) references output_source_twitter(id)
foreign key (source_id_instagram) references output_source_instagram(id)
I guess my concern is that this is not as modular as I'd like it to be, because I'd like to add other sources as well without having to update the schema much. Currently, this requires me to join output_source_X to the output table using whichever foreign key is not NULL.
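For illustration, the kind of query this design leads to looks roughly like the following (the Instagram column name is made up); every additional source means another LEFT JOIN and another argument inside the COALESCE/IFNULL:

SELECT o.id,
       o.when_added,
       COALESCE(f.message, t.text, i.caption) AS body
FROM output AS o
LEFT JOIN output_source_facebook  AS f ON f.id = o.source_id_facebook
LEFT JOIN output_source_twitter   AS t ON t.id = o.source_id_twitter
LEFT JOIN output_source_instagram AS i ON i.id = o.source_id_instagram;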
This design is almost certainly bad in a few ways.
It's not that clear what your design is representing but a straightforward one would be something like:
// source [id] has ...
source(id,message,...)
// output [id] is approved when [approved]=1 and ...
output(id,approved,...)
// output [output_id] has [source_id] as a source
output_source(output_id,source_id)
foreign key (output_id) references output(id)
foreign key (source_id) references source(id)
Maybe you have different subtypes of outputs and/or sources? Based on sources and/or outputs? Maybe each source is restricted to feeding particular outputs? Are "outputs" and "sources" actually kinds of outputs and sources, and is this info not about how outputs are sourced but about what kinds of output-source pairings are permitted?
Please give us statements parameterized by column names for the basic statements you want to make about your application, i.e. for the application relationships you are interested in (e.g. like the code comments above). (You could do it for the diagrammed design, but that would probably be overly complicated and not really reflect what you are trying to model.)
Re your EDIT:
There will be a one-to-one foreign key relationship between the output table and ONLY ONE of the output_source_X tables, depending on whether the output item is a Facebook, Twitter, Instagram, etc. post, hence the NULL-able foreign keys.
You have a case of multiple disjoint subtypes of a supertype.
Your situation is a lot like that of this question, except that where they have a subtype discriminator/tag column indicating which subtype table applies, you have a set of columns where the non-NULL one indicates which subtype table. See Erwin Smout's & my answers. Also this answer.
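For concreteness, here is a minimal MySQL sketch of a subtype design with a shared primary key and a discriminator/tag column; the names and column lists are assumptions, not your actual schema:

CREATE TABLE output (
  id          INT PRIMARY KEY,
  source_type ENUM('facebook','twitter','instagram') NOT NULL,  -- says which subtype row exists
  when_added  DATETIME,
  is_approved TINYINT(1)
);

CREATE TABLE output_source_facebook (
  id      INT PRIMARY KEY,  -- same value as output.id
  message TEXT,
  FOREIGN KEY (id) REFERENCES output (id)
);

CREATE TABLE output_source_twitter (
  id   INT PRIMARY KEY,     -- same value as output.id
  text TEXT,
  FOREIGN KEY (id) REFERENCES output (id)
);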
Please give us statements parameterized by column names for the basic statements you want to make about your application
and you will find straightforward statements (as above). And if you give the statements for your current design you will find them complex. See also this.
I guess my concern is that this is not as modular as I'd like it to be because I'd like to add other sources as well without having to update the schema much.
Your structure is not reducing schema changes compared to proper subtype designs.
Anyway, DDL is there for that. You can genericize subtypes to avoid DDL only by giving up the DBMS's management of integrity. That would only be relevant or reasonable based on evaluating DDL vs DML performance tradeoffs. Search re EAV (usually an anti-pattern).
(Only after you have shown that creating & deleting new tables is infeasible, and that the corresponding horrible integrity-&-concurrency-challenged mega-joining table-and-metadata-encoded-in-table EAV information-equivalent design is feasible, should you consider using EAV.)

Custom primary key for MS Access

I am new to Microsoft Access. I was just wondering how I can create a custom primary key, for example in abc-123 format?
It depends on how you want the abc-123 values to be created.
If you want to create them by yourself in your code, just create a Text column and use that as your primary key.
If you want Access to create these values... that's not really possible. The only thing that Access is able to auto-generate is an increasing numerical value (data type AutoNumber).
So the best thing you can do is to use an AutoNumber internally as the actual primary key, and create the abc-123 value out of that, just for displaying.
Here are some examples of how to do this, from similar questions that I answered in the past:
access 2003 text display leading zero
Automatically generate numbers
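As a rough illustration of that display-only idea (the table name MyTable and the AutoNumber column ID are hypothetical), a calculated column in an Access query can build the formatted value:

SELECT ID, "abc-" & Format(ID, "000") AS DisplayID
FROM MyTable;

With ID = 123 this yields abc-123, while the stored primary key stays a plain AutoNumber.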
Disclaimer: I don't know if a similar approach would work in your case.
If not, you need to give more information about how exactly you want your numbers to be created:
do you want the number to increase?
do you want the letters to change/"increase"/always stay the same?
Actually, you could create a table trigger (a data macro) if you are using Access 2010 or later. The table trigger could take some field (where you get the "abc" from) and some other field (a sequence number) and add 1 to that value.
The "air" code would look like this:
The beauty of the table trigger is that it runs at the table (data engine) level, so if you open the database with ODBC, VB.NET, FoxPro, Access, etc., the PK will always be auto-generated for you.

Master-detail migrate in SSIS 2008

I have two MSSQL 2008 databases dbA and dbB,
dbA contains a master-detail table pair: AMaster and ADetail. Correspondingly, dbB contains BMaster and BDetail. The only difference between A and B is the type of the primary key: in the source database (dbA) it is an integer, but in the destination (dbB) it is a uniqueidentifier.
Dear colleagues: how do I describe a data flow in SSIS to handle this case? I need to convert all fields, but replace the key with the new key type.
If you do not want to store the natural primary key in the destination, how do you manage updates? If a record is changed in the source and you want to replicate the corresponding change in the destination table, how will you do that? It is not logically possible. Either you will have to keep the integer-based natural key in the destination table, or you will have to keep a mapping table which stores the old natural key and the corresponding new key.
Also, a UID is a far bigger number, and I don't think that converting an integer to a UID is a good option. If you really want to do it, do it this way: INT >> HEX STRING >> UID. Read more on UIDs here: http://msdn.microsoft.com/en-us/library/ms187942.aspx
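A rough T-SQL sketch of the mapping-table idea, assuming both databases are on the same instance and the key columns are called ID and MasterID (those names, and the KeyMap table, are assumptions):

CREATE TABLE dbB.dbo.KeyMap (
    OldMasterID int PRIMARY KEY,
    NewMasterID uniqueidentifier NOT NULL DEFAULT NEWID()
);

-- 1. Record a new uniqueidentifier for every source master key
INSERT INTO dbB.dbo.KeyMap (OldMasterID)
SELECT ID FROM dbA.dbo.AMaster;

-- 2. Load the master rows under their new keys
INSERT INTO dbB.dbo.BMaster (ID /*, other columns */)
SELECT km.NewMasterID /*, am.SomeColumn */
FROM dbA.dbo.AMaster AS am
JOIN dbB.dbo.KeyMap AS km ON km.OldMasterID = am.ID;

-- 3. Load the detail rows, translating the foreign key through the map
INSERT INTO dbB.dbo.BDetail (ID, MasterID /*, other columns */)
SELECT NEWID(), km.NewMasterID /*, ad.SomeColumn */
FROM dbA.dbo.ADetail AS ad
JOIN dbB.dbo.KeyMap AS km ON km.OldMasterID = ad.MasterID;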

Implementing custom fields with ALTER TABLE

We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table, using ALTER TABLE, etc.) and we are currently favoring using ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks of this solution. In contrast, having the option to query the data in a fast way (e.g. directly in SQL's WHERE clause) is a big advantage for us.
Are there any drawbacks you could think of by implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests..) and can use both MySQL and MS SQL Server databases.
Just as an update, we decided to add new columns to the existing database tables via ALTER TABLE to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides what is needed to manage, query and work with them.
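For example (MySQL flavour shown; the customers table and the custom_birthday field are placeholders), the pattern looks roughly like this:

ALTER TABLE customers ADD COLUMN custom_birthday DATE NULL;

CREATE TABLE custom_field_meta (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  entity      VARCHAR(100) NOT NULL,   -- table the custom column was added to
  column_name VARCHAR(100) NOT NULL,
  label       VARCHAR(255) NOT NULL,
  data_type   VARCHAR(50)  NOT NULL
);

INSERT INTO custom_field_meta (entity, column_name, label, data_type)
VALUES ('customers', 'custom_birthday', 'Birthday', 'DATE');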
The first drawback I see is that you need to grant your application's service account ALTER rights.
This implies that your security model needs careful attention, as the application will be able not only to add fields but also to drop and rename them, and to create tables (at least in MySQL).
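For reference, the kind of grant involved would be something like the following (account names are placeholders); note that in SQL Server, ALTER on a schema also lets the grantee drop objects in it:

-- SQL Server
GRANT ALTER ON SCHEMA::dbo TO app_service_user;

-- MySQL
GRANT ALTER, CREATE, DROP ON webapp_db.* TO 'app_service_user'@'%';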
Secondly, how would you distinguish fields that are required per user? Or can the fields created by user A be accessed by user B?
Note that the number of columns may also grow significantly. If every user adds 2 fields, we are already talking about 200 fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
  user_1: {key1: val1, key2: val2, ...},
  user_2: {key1: val1, key2: val2, ...},
  ...
}
The drawback is that the values are not easily searchable.
Using a multi-type name/value table
fields table:
user_id: int
field_name: varchar(100)
type: enum('INT', 'REAL', 'STRING')
values table:
field_id: int
row_id: int # the main table row id
int_value: int
float_value: float
text_value: text
Of course, it requires a join and is a bit more complicated to implement, but it is far more generic and, if indexed properly, quite efficient.
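To make the join concrete, fetching one custom field for the rows of a (hypothetical) main_table could look like this; it also assumes the fields table has an id column that values.field_id refers to:

SELECT m.id,
       COALESCE(v.int_value, v.float_value, v.text_value) AS custom_value
FROM main_table AS m
JOIN fields AS f
    ON f.user_id = 42                 -- example user
   AND f.field_name = 'birthday'      -- example field
LEFT JOIN `values` AS v               -- backticks: VALUES is a reserved word in MySQL
    ON v.field_id = f.id
   AND v.row_id = m.id;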
I see nothing wrong with adding new custom fields to the database table.
With this approach, the specific/most appropriate type can be used, i.e. need an int field? Define it as int. Whereas with a name/value table, you'd be storing multiple data types as one type (probably nvarchar), unless you extend that name/value table with multiple columns of different types and populate the appropriate one, but that is a bit horrible.
Also, adding new columns makes it easier to query/no need to involve a join to a new name/value table.
It may not feel as generic, but I feel that's better than having a "one-size fits all" name/value table.
From an SQL Server point of view (2005 onwards)....
An alternative would be to create one "custom data" field of type XML. This would be truly generic and require no field creation or a separate name/value table. It also has the benefit that not all records have to have the same custom data (i.e. the one field is common, but what it contains doesn't have to be). I'm not 100% sure about the performance impact, but XML data can be indexed.
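A minimal T-SQL sketch of that idea (the customers table and the element names are placeholders; a primary XML index also requires the table to have a clustered primary key):

ALTER TABLE customers ADD CustomData xml NULL;

UPDATE customers
SET CustomData = '<fields><birthday>1980-05-01</birthday></fields>'
WHERE id = 1;

-- Pull a value back out of the XML
SELECT id,
       CustomData.value('(/fields/birthday)[1]', 'varchar(10)') AS birthday
FROM customers
WHERE CustomData.exist('/fields/birthday') = 1;

-- XML columns can be indexed
CREATE PRIMARY XML INDEX IX_customers_CustomData ON customers (CustomData);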