SSIS migration: split one record to many tables

I'm refactoring my db's User object from a schema that combines BillingAddress with Shipping Address:
[BillingFirstName] [nvarchar](50) NOT NULL,
[BillinglastName] [nvarchar](50) NOT NULL,
[BillingAddress] [nvarchar](100) NOT NULL,
[BillingCity] [nvarchar](100) NOT NULL,
[BillingZip] [varchar](16) NOT NULL,
[BillingState] [nvarchar](2) NOT NULL,
[shippingFirstName] [nvarchar](50) NULL,
[shippingLastName] [nvarchar](50) NULL,
[shippingAddress] [nvarchar](100) NULL,
[shippingCity] [nvarchar](100) NULL,
[shippingState] [nvarchar](2) NULL,
[shippingZip] [nvarchar](20) NULL,
[shippingPhone] [nvarchar](30) NULL,
Refactored to one table for User and a separate table for addresses bound by a foreign key Users.ID => Addresses.idUser
CREATE TABLE [dbo].[Addresses](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Type] [nchar](10) NOT NULL, -- designates Billing or Shipping
[Formatted] [nchar](600) NOT NULL,
[Street] [nchar](100) NOT NULL,
[City] [nchar](100) NOT NULL,
[POBox] [nchar](50) NULL,
[Region] [nchar](50) NULL,
[PostalCode] [nchar](50) NULL,
[Country] [nchar](50) NULL,
[ExtendedAddress] [nchar](100) NULL,
[idUser] [int] NULL,
How do I tell SSIS to import a record into the simplified User object and then create two address records: one with the shipping info and the other with the billing info?
I'd like to preserve the existing ID key.
thx

Multicast your source data.
Add a Derived Column component to each output stream. Treat one as the "Billing Address" stream and the other as the "Shipping Address" stream.
In each Derived Column, add a new column named "Type", hardcoded to "Billing" and "Shipping", respectively.
Add a destination component to each stream, both pointing to your Addresses table.
Map the appropriate columns in each stream (i.e. BillingCity to City in the "Billing Address" stream, shippingCity to City in the "Shipping Address" stream, etc.)
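If the old and new tables live on the same SQL Server instance, the two streams above correspond roughly to two INSERT ... SELECT statements. This is only a sketch under assumptions: dbo.LegacyUsers and its Id column are hypothetical names for the old wide table, and the way Formatted is composed is a guess at what that column should hold.

```sql
-- "Billing Address" stream: billing columns are NOT NULL in the old schema.
INSERT INTO dbo.Addresses ([Type], Formatted, Street, City, Region, PostalCode, idUser)
SELECT N'Billing',
       BillingAddress + N', ' + BillingCity + N', ' + BillingState + N' ' + BillingZip,
       BillingAddress, BillingCity, BillingState, BillingZip,
       Id                               -- hypothetical key column of the old table
FROM dbo.LegacyUsers;                   -- hypothetical name for the old table

-- "Shipping Address" stream: shipping columns are nullable, while Street, City
-- and Formatted are NOT NULL in Addresses, so incomplete rows are skipped.
INSERT INTO dbo.Addresses ([Type], Formatted, Street, City, Region, PostalCode, idUser)
SELECT N'Shipping',
       shippingAddress + N', ' + shippingCity
           + ISNULL(N', ' + shippingState, N'')
           + ISNULL(N' ' + shippingZip, N''),
       shippingAddress, shippingCity, shippingState, shippingZip,
       Id
FROM dbo.LegacyUsers
WHERE shippingAddress IS NOT NULL
  AND shippingCity IS NOT NULL;
```

Because idUser carries the original user ID, the addresses stay linked to the right user as long as the Users load keeps the same IDs.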

By using query, how do I create a table in a database instead of a schema in SSMS

I know that CREATE TABLE [example_schema].[example table] creates the table in a schema, but I want to create the table in a specific database instead and I don't know the syntax.
CREATE TABLE Royal_Poly_DB.staff_relation (
"staff_no" CHAR(4) NOT NULL,
"staff_name" VARCHAR(100) NOT NULL,
"supervisor" CHAR(4) NULL,
"dob" DATE NOT NULL,
"grade" CHAR(5) NOT NULL,
"marital_status" CHAR(1) NOT NULL,
"pay" DECIMAL(7,2) NULL,
"allowance" DECIMAL(7,2) NULL,
"hourly_rate" DECIMAL(7,2) NULL,
"gender" CHAR(1) NOT NULL,
"citizenship" VARCHAR(10) NOT NULL,
"join_yr" INT NOT NULL,
"dept_cd" VARCHAR(5) NOT NULL,
"type_of_employment" CHAR(2) NOT NULL,
"highest_qln" VARCHAR(10) NOT NULL,
"designation" VARCHAR(20) NOT NULL,
PRIMARY KEY (staff_no))
Right-click the database in Object Explorer and select "New Query", then add your code there.
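The same thing can be done in the script itself. In SQL Server a two-part name is read as schema.table, so Royal_Poly_DB.staff_relation looks for a schema named Royal_Poly_DB. A minimal sketch, assuming the table should go into the default dbo schema and showing only the first two columns:

```sql
-- Switch the query window to the target database first.
USE Royal_Poly_DB;
GO

CREATE TABLE dbo.staff_relation (
    staff_no   CHAR(4)      NOT NULL PRIMARY KEY,
    staff_name VARCHAR(100) NOT NULL
    -- remaining columns as in the question
);
GO

-- Alternatively, from any database context, use a three-part name
-- (database.schema.table):
-- CREATE TABLE Royal_Poly_DB.dbo.staff_relation ( ... );
```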

Internal database summation

So basically I have this table CLASS_SUMMATIONS where I need to come up with some way to calculate the sum of the NumberofClassesTaken and the AmountPaid columns from the other tables into the respective columns in the last table.
The NumberofClassesTaken column would just sum all the classes the Client took and give a number. The AmountPaid column would sum all the dollar amounts spent on classes by a given client and give a total.
So for example Client 1 could have taken 2 yoga classes and paid $200 total, so the NumberofClassesTaken column would show 2 and the AmountPaid column would show 200.
I tried doing SELECT and sum but I always get the
Error code 1064, SQL state 42000: You have an error in your SQL syntax
thrown back at me.
Can anyone point me in the right direction here? (Keep in mind I'm relatively new to this so if you have some documentation to what you're suggesting that would be awesome too)
PS: I omitted the TRAINER table because it is a separate total from the club classes so you can ignore that foreign key in CLUB_CLASSES.
CREATE TABLE CLIENT(
ClientNumber INT(50) NOT NULL,
ClientLastName VARCHAR(50) NOT NULL,
ClientFirstName VARCHAR(50) NOT NULL,
ClientPhone VARCHAR(50) NOT NULL,
ClientEmail VARCHAR(50) NOT NULL,
ClientState VARCHAR(50) NOT NULL,
ClientCity VARCHAR(50) NOT NULL,
ClientStreet VARCHAR(50) NOT NULL,
ClientAddress VARCHAR(50) NOT NULL,
ClientZipCode INT NOT NULL,
PRIMARY KEY (ClientNumber));
INSERT INTO CLIENT
VALUES
('1','Marget','Michael','7703399207','MM#gmail.com',
'Kentucky','Merlin','Wending Way','312 Wending Way','30144');
INSERT INTO CLIENT
VALUES
('2','Squarepants','Spongebob','7701274532','SS#gmail.com',
'Kentucky','Merlin','Pineapple Under the Sea Way',
'856 Pineapple Under the Seas Way','30122');
CREATE TABLE CLUB_CLASSES(
ClassID INT(50) NOT NULL,
InstructorTrainerID INT(50) NOT NULL,
ClassName VARCHAR(50) NOT NULL,
ClassStartDate date NOT NULL,
ClassEndDate date NOT NULL,
ClassCost VARCHAR(50) NOT NULL,
PRIMARY KEY(ClassID),
FOREIGN KEY (InstructorTrainerID) REFERENCES TRAINER(TrainerID));
INSERT INTO CLUB_CLASSES
VALUES ('3501','1154','Yoga','1/3/13','1/5/13','100');
INSERT INTO CLUB_CLASSES
VALUES ('3502','2856','Pillate','1/3/13','2/5/13','50');
CREATE TABLE CLASS_SUMMATIONS(
ClassID INT(50) NOT NULL,
ClientID INT(50) NOT NULL,
NumberofClassesTaken VARCHAR(50) NOT NULL,
AmountPaid VARCHAR(50) NOT NULL,
PRIMARY KEY(ClassID, ClientNumber),
FOREIGN KEY(ClientNumber) REFERENCES CLIENT(ClientNumber));
INSERT INTO CLIENT_SUMMATIONS (
'3501','1',
'SELECT SUM(CLUB_CLASSES.ClassID','SELECT SUM(CLUB_CLASSES.ClassCost)');
INSERT INTO CLIENT_SUMMATIONS('3502','2','xxx','xxxx');
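One way the totals could be derived is sketched below in MySQL syntax. It rests on assumptions that are not in the question: a link table, here called CLIENT_CLASSES (a hypothetical name), records which client took which class, and CLIENT and CLUB_CLASSES already exist and are populated. Subqueries wrapped in quotes are stored as plain strings, so the totals are computed with a join and GROUP BY instead.

```sql
-- Hypothetical link table: one row per class a client has taken.
CREATE TABLE CLIENT_CLASSES (
    ClientNumber INT NOT NULL,
    ClassID      INT NOT NULL,
    PRIMARY KEY (ClientNumber, ClassID),
    FOREIGN KEY (ClientNumber) REFERENCES CLIENT(ClientNumber),
    FOREIGN KEY (ClassID)      REFERENCES CLUB_CLASSES(ClassID)
);

-- Made-up enrolments for illustration only.
INSERT INTO CLIENT_CLASSES (ClientNumber, ClassID)
VALUES (1, 3501), (1, 3502), (2, 3502);

-- Per-client totals. ClassCost is a VARCHAR in the question,
-- so it is cast to a number before summing.
SELECT cc.ClientNumber,
       COUNT(*)                                 AS NumberofClassesTaken,
       SUM(CAST(k.ClassCost AS DECIMAL(10, 2))) AS AmountPaid
FROM CLIENT_CLASSES AS cc
JOIN CLUB_CLASSES   AS k ON k.ClassID = cc.ClassID
GROUP BY cc.ClientNumber;
```

Since the totals can always be recomputed from the underlying rows, it may be simpler to keep this as a query or a view rather than storing the sums in a separate table.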

Database design for city details from multiple web services

I am working on a travel application, so we have to deal with different web services such as GTA, Gallileo, Kuoni, etc. to get hotel details.
Each web service has its own list of city code and city name.
I want to design a table to store the city details from different web service, after some research I came to these two approaches
1st approach
CREATE TABLE [dbo].[City](
[CityID] [int] NOT NULL,
[CountryCode] [varchar](5) NOT NULL,
[AppCityCode] [varchar](10) NOT NULL,
[AppCityName] [varchar](200) NOT NULL,
[GTACityCode] [varchar](10) NULL,
[GTACityName] [varchar](200) NULL,
[GWSCityCode] [varchar](10) NULL,
[GWSCityName] [varchar](200) NULL,
[KuoniCityCode] [varchar](10) NULL,
....
....
....
....
....
....
)
In this approach, whenever a new web service is added, two columns (city code and city name) for that web service must be added to the table, which also means changes to the stored procedures and the frontend application code.
There will be no duplication when loading the cities into the textbox.
2nd Approach
WSSupplier table is used to store Webservice details like GTA, Gallileo..
CREATE TABLE [dbo].[WSSupplier](
[SupplierID] [smallint] NOT NULL,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[City](
[CityID] [int] IDENTITY(1,1) NOT NULL,
[AppCityCode] [varchar](20) NULL,
[AppCityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL,
[WSSupplierID] [smallint] NULL,
[WSCityCode] [varchar](20) NULL,
[WSCityName] [varchar](150) NULL
)
In the 2nd approach the cities are added row by row with the corresponding web service supplier ID.
If a new web service comes along, I don't have to modify the table structure or the frontend application.
While loading cities I have to use DISTINCT to load unique cities into the textbox or dropdown in the frontend.
In both approaches I am using AppCityCode and AppCityName to populate the city textbox or dropdown in the application. When an AppCityName is selected, the application gets the corresponding web service city code and sends it in the request to the web service to search for hotels in that city.
I want to know which is the better approach, or whether there is another good approach.
A third approach would be to create an intersection table between your city table and your supplier table that lists the supplier's version of the city code.
Your city table would just have your own system's city identifier. The city would appear only once. Each time you add a supplier you insert new records into the intersection table with the city codes for the cities that supplier cares about. The translation of a supplier city code to your internal city code is a simple lookup in the intersection table.
Consider something like this:
CREATE TABLE [dbo].[WSSupplier](
[SupplierID] [smallint] NOT NULL PRIMARY KEY,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[City](
[CityID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL
)
CREATE TABLE [dbo].[SupplierCityCode](
[CityID] [int] NOT NULL,
[WSSupplierID] [smallint] NULL,
[WSCityCode] [varchar](20) NULL,
[WSCityName] [varchar](150) NULL,
-- a foreign key needs a primary key (or unique constraint) on the referenced column
CONSTRAINT [fk_city] FOREIGN KEY ([CityID]) REFERENCES [dbo].[City] ([CityID]),
CONSTRAINT [fk_supplier] FOREIGN KEY ([WSSupplierID]) REFERENCES [dbo].[WSSupplier] ([SupplierID])
)
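The "simple lookup" mentioned above might look like the query below. It is a sketch only: the city code 'LON' and the supplier name 'GTA' are illustrative values, and it assumes the three tables are populated.

```sql
-- Translate an internal city code into one supplier's city code.
SELECT scc.WSCityCode,
       scc.WSCityName
FROM dbo.City             AS c
JOIN dbo.SupplierCityCode AS scc ON scc.CityID    = c.CityID
JOIN dbo.WSSupplier       AS s   ON s.SupplierID  = scc.WSSupplierID
WHERE c.CityCode     = 'LON'   -- illustrative internal city code
  AND s.SupplierName = 'GTA';  -- illustrative supplier name
```

With this layout the city dropdown can be filled from dbo.City directly, without DISTINCT, because each city appears there only once.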
Your question is about application and database design. From the application design point of view, try to abstract away from the database design and think of it as storage for your business objects. From the database design point of view, your question is about database normalization; the Wikipedia article on it is a good gateway into the wider world of database design. As for me, I would use:
CREATE TABLE [dbo].[Supplier](
[SupplierID] [smallint] NOT NULL,
[SupplierName] [varchar](100) NOT NULL
)
CREATE TABLE [dbo].[AppCity](
[CityID] [int] IDENTITY(1,1) NOT NULL,
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL,
[CountryCode] [varchar](10) NULL,
)
CREATE TABLE [dbo].[SupplierCity](
[CityID] [int] IDENTITY(1,1) NOT NULL,
[SupplierID] [smallint] NOT NULL,
[CityCode] [varchar](20) NULL,
[CityName] [varchar](150) NULL
)

SQL Server 2008 - Too much denormalization and over Indexing: What use is there for the Matrix?

I have a budding developer who is very enthusiastic about something he is calling "the matrix".
I am looking for peer insight.
In a nutshell this is what we have:
- 1 highly denormalized table with about 120 columns
- Data points range from account, customer, household, relationship, product, employee, etc…
- One index per column: about 120 non-clustered indexes
- About 90% of all space in the database used by indexes today are indexes on this table
- Today about 1.5 million rows with a lot of nulls
- Table loaded with a stored procedure whose core is dynamic SQL
- All Field names are generic and do not describe the data
- A data dictionary type table is used with the dynamic SQL to load any data point to any field
- Field mapping is not static: today column dim_0001 is customer name, but tomorrow maybe something else
- No primary key
- No foreign keys
- No real constraints (For example all fields are nullable)
The argument for the table:
- Makes writing queries simpler because it eliminates the need to write some joins
The intended use:
- An end-user layer that would be a core component of a Universe built in Business Objects
- Post ETL process development
My recommendation will either kill the process where it is today (early development in a test environment) or move it to the next step in test.
Based on the research I have done, my education, and experience I do not support it and want the tables dropped as soon as the one or two processes that depend on these tables have been migrated to another solution.
Script below for your reference (I limited to one index example).
Any insight you can offer (even just a one word opinion) is valuable
-- The Matrix
CREATE TABLE [z005497].[tblMatrix](
[as_of_dt] [datetime] NOT NULL,
[dim_0001] [varchar](100) NULL,
[dim_0002] [varchar](103) NULL,
[dim_0003] [varchar](100) NULL,
[dim_0004] [varchar](100) NULL,
[dim_0005] [varchar](100) NULL,
[dim_0006] [varchar](100) NULL,
[dim_0007] [varchar](100) NULL,
[dim_0008] [varchar](100) NULL,
[dim_0009] [varchar](100) NULL,
[dim_0010] [varchar](100) NULL,
[dim_0011] [varchar](100) NULL,
[dim_0012] [varchar](100) NULL,
[dim_0013] [varchar](100) NULL,
[dim_0014] [varchar](100) NULL,
[dim_0015] [varchar](100) NULL,
[dim_0016] [varchar](100) NULL,
[dim_0017] [varchar](103) NULL,
[dim_0018] [varchar](103) NULL,
[dim_0019] [varchar](103) NULL,
[dim_0020] [varchar](103) NULL,
[dim_0021] [varchar](103) NULL,
[dim_0022] [varchar](103) NULL,
[dim_0023] [varchar](103) NULL,
[dim_0024] [varchar](103) NULL,
[dim_0025] [varchar](103) NULL,
[dim_0026] [varchar](11) NULL,
[dim_0027] [varchar](11) NULL,
[dim_0028] [varchar](11) NULL,
[dim_0029] [varchar](11) NULL,
[dim_0030] [varchar](11) NULL,
[dim_0031] [varchar](11) NULL,
[dim_0032] [varchar](11) NULL,
[dim_0033] [varchar](11) NULL,
[dim_0034] [varchar](11) NULL,
[dim_0035] [varchar](11) NULL,
[dim_0036] [varchar](11) NULL,
[dim_0037] [varchar](11) NULL,
[dim_0038] [varchar](11) NULL,
[dim_0039] [varchar](11) NULL,
[dim_0040] [varchar](11) NULL,
[dim_0041] [varchar](11) NULL,
[dim_0042] [varchar](11) NULL,
[dim_0043] [varchar](11) NULL,
[dim_0044] [varchar](11) NULL,
[dim_0045] [varchar](11) NULL,
[dim_0046] [varchar](11) NULL,
[dim_0047] [varchar](11) NULL,
[dim_0048] [varchar](11) NULL,
[dim_0049] [varchar](11) NULL,
[dim_0050] [varchar](11) NULL,
[dim_0051] [varchar](11) NULL,
[dim_0052] [varchar](11) NULL,
[dim_0053] [varchar](11) NULL,
[dim_0054] [varchar](5) NULL,
[dim_0055] [varchar](5) NULL,
[dim_0056] [varchar](5) NULL,
[dim_0057] [varchar](5) NULL,
[dim_0058] [varchar](5) NULL,
[dim_0059] [varchar](5) NULL,
[dim_0060] [varchar](5) NULL,
[dim_0061] [varchar](5) NULL,
[dim_0062] [varchar](5) NULL,
[dim_0063] [varchar](5) NULL,
[dim_0064] [varchar](5) NULL,
[dim_0065] [varchar](5) NULL,
[dim_0066] [varchar](5) NULL,
[dim_0067] [varchar](5) NULL,
[dim_0068] [varchar](5) NULL,
[dim_0069] [varchar](5) NULL,
[dim_0070] [varchar](5) NULL,
[dim_0071] [varchar](5) NULL,
[dim_0072] [varchar](5) NULL,
[dim_0073] [varchar](5) NULL,
[dim_0074] [varchar](5) NULL,
[dim_0075] [varchar](5) NULL,
[dim_0076] [varchar](5) NULL,
[dim_0077] [varchar](5) NULL,
[dim_0078] [varchar](5) NULL,
[dim_0079] [varchar](5) NULL,
[dim_0080] [varchar](5) NULL,
[dim_0081] [varchar](5) NULL,
[dim_0082] [varchar](5) NULL,
[dim_0083] [varchar](5) NULL,
[dim_0084] [int] NULL,
[dim_0085] [int] NULL,
[dim_0086] [int] NULL,
[dim_0087] [int] NULL,
[dim_0088] [int] NULL,
[dim_0089] [int] NULL,
[dim_0090] [int] NULL,
[dim_0091] [int] NULL,
[dim_0092] [int] NULL,
[dim_0093] [int] NULL,
[dim_0094] [varchar](12) NULL,
[dim_0095] [varchar](12) NULL,
[dim_0096] [varchar](12) NULL,
[dim_0097] [varchar](120) NULL,
[dim_0098] [varchar](120) NULL,
[dim_0099] [varchar](120) NULL,
[dim_0100] [numeric](20, 0) NULL,
[dim_0101] [varchar](20) NULL,
[dim_0102] [varchar](20) NULL,
[dim_0103] [varchar](20) NULL,
[dim_0104] [varchar](20) NULL,
[dim_0105] [varchar](20) NULL,
[dim_0106] [varchar](20) NULL,
[dim_0107] [varchar](20) NULL,
[dim_0108] [varchar](20) NULL,
[dim_0109] [varchar](20) NULL,
[dim_0110] [varchar](20) NULL,
[dim_0111] [varchar](20) NULL,
[dim_0112] [varchar](20) NULL,
[dim_0113] [varchar](20) NULL,
[dim_0114] [varchar](20) NULL,
[dim_0115] [varchar](20) NULL,
[dim_0116] [varchar](20) NULL,
[dim_0117] [varchar](20) NULL,
[dim_0118] [varchar](20) NULL,
[dim_0119] [varchar](20) NULL,
[dim_0120] [varchar](20) NULL,
[lastLoad] [datetime] NULL
) ON [PRIMARY]
-- Index example
CREATE NONCLUSTERED INDEX [idx_dim_0001 (not unique)] ON [z005497].[tblMatrix]
(
[dim_0001] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
-- The configuration table from which developers would find out what is in the Matrix
CREATE TABLE [z005497].[tblMatrixCfg](
[dimId] [int] IDENTITY(100000,1) NOT NULL,
[colName] [varchar](25) NOT NULL,
[dataType] [varchar](25) NOT NULL,
[dimName] [varchar](25) NOT NULL,
[dimDesc] [varchar](500) NOT NULL,
[dimpath] [varchar](5000) NOT NULL,
[loadDate] [datetime] NOT NULL,
[modUser] [varchar](100) NOT NULL,
[modDate] [datetime] NOT NULL,
CONSTRAINT [PK_tblMatrixCfg_1] PRIMARY KEY CLUSTERED
(
[dimId] ASC,
[colName] ASC,
[dimName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Kill it if you can.
Also, that developer needs a lot more experience. And he/she should get it at another company.
It's basically violating so many things I don't know where to start.
Even if you end up fighting a highly normalized model which is following someone's best practices slavishly, it won't compare to the disaster which this design is going to create.
Just to give one example of what Cade meant with "I don't know where to start" :
"today column dim_0001 is customer name, but tomorrow maybe something else"
This typically also means that in the User acceptance system, dim_0001 can be customer name (and the system might seem to work and get accepted), and then you move to production, and dim_0001 gets to be name of the president's wife or so, and then hours of meetings need to be spent trying to figure out (a) where the problem is, and (b) how to get it fixed in as little time as possible.
( (b) usually amounts to patching the code with stuff like "if col_name = dim_0001 then don't treat it as what the matrix says it is, but treat it as what is hardcoded here instead".)
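To make the remapping problem concrete, here is a rough sketch of what a query has to do if it cannot trust that dim_0001 stays the customer name. It assumes tblMatrixCfg.colName holds the physical column name; the dimName value 'customer name' and the date are made up for illustration.

```sql
DECLARE @col sysname, @sql nvarchar(max);

-- Look up which physical column currently holds the data point.
SELECT TOP (1) @col = colName
FROM z005497.tblMatrixCfg
WHERE dimName = 'customer name'   -- hypothetical dictionary entry
ORDER BY modDate DESC;

-- The column name is only known at run time, so the query itself
-- has to be built as dynamic SQL.
SET @sql = N'SELECT ' + QUOTENAME(@col) + N' AS customer_name
FROM z005497.tblMatrix
WHERE as_of_dt = @asOf;';

EXEC sys.sp_executesql @sql, N'@asOf datetime', @asOf = '20130101';
```

Any report that skips the lookup and hardcodes dim_0001 silently returns the wrong data once the mapping changes.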
"What use is there for the Matrix?"
Well, I certainly don't get it.
I have never seen anything like this before, and I don't understand how it is meant to be used, how the indexes are meant to speed up anything, or how it is possible to query this table without using at least self joins.
Call me inexperienced if you like but this is a first for me. I would think that if this is the way to do things, the db vendors should not put so much effort into allowing us developers to define tables, with columns that have different data types, with relationships.
This is the result of trying to stuff an object oriented paradigm into a relational system. Document databases allow for this sort of programming:
Documents inside a document-oriented database are similar, in some
ways, to records or rows, in relational databases, but they are less
rigid. They are not required to adhere to a standard schema nor will
they have all the same sections, slots, parts, keys, or the like. For
example here's a document:
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
Another document could be:
FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8},
{Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].
Both documents have some similar information and some different.
Unlike a relational database where each record would have the same set
of fields and unused fields might be kept empty, there are no empty
'fields' in either document (record) in this case. This system allows
new information to be added and it doesn't require explicitly stating
if other pieces of information are left out.
Trying to use this paradigm in a relational database is a "square peg, round hole" problem. A document database might be excellent for a highly transactional system, but analysis would be better served by loading the transactional data into various fact tables in a data warehouse.

Use hierarchyid to store address of a customer

I have a table named 'AddressDemo' to store address of a customer with the following fields,
CREATE TABLE [dbo].[AddressDemo](
[AddressID] [int] IDENTITY(1,1) NOT NULL,
[State] [nvarchar](50) NULL,
[District] [nvarchar](50) NULL,
[Taluk] [nvarchar](50) NULL,
[Village] [nvarchar](50) NULL,
[Street1] [nvarchar](50) NULL,
[Street2] [nvarchar](50) NULL,
[Phone] [nvarchar](50) NULL,
[Mobile] [nvarchar](50) NULL,
[Email] [nvarchar](50) NULL,
CONSTRAINT [PK_AddressDemo] PRIMARY KEY CLUSTERED
(
[AddressID] ASC
))
A hierarchy exists here, along the lines of
State --> District --> Taluk --> Village --> Street1 --> Street2
Wouldn't it be a good idea to keep a separate table to store the hierarchy, so that we can avoid duplicating data? How about the following:
CREATE TABLE [dbo].[LocationDemo](
[LocationID] [int] IDENTITY(1,1) NOT NULL,
[LocationNodeID] [hierarchyid] NULL,
[Location] [nvarchar](50) NULL,
CONSTRAINT [PK_LocationDemo] PRIMARY KEY CLUSTERED
(
[LocationID] ASC
))
So the 'AddressDemo' will look like the following
CREATE TABLE [dbo].[AddressDemo](
[AddressID] [int] IDENTITY(1,1) NOT NULL,
[LocationID] [int] NULL,
[Phone] [nvarchar](50) NULL,
[Mobile] [nvarchar](50) NULL,
[Email] [nvarchar](50) NULL,
CONSTRAINT [PK_AddressDemo] PRIMARY KEY CLUSTERED
(
[AddressID] ASC
))
and LocationID of AddressDemo references LocationID of LocationDemo.
While your proposed solution is more dynamic than the flattened solution you described I would not go with a completely dynamic schema for locations in this case. Adding hierarchical processing is not something to be done without good reason because it complicates your database queries later on and limits your performance optimisation alternatives (views containing CTEs cannot be indexed, and you would need views to reasonably consume this data by your application).
If you're talking about a low volume system or one in which the number of addresses being stored is small you can play with the dynamic address element route, but considering the fact that no one address would logically exist without the majority of the location elements I would again say it's overkill.
Go for a more normalized route without going overboard. Consider making a State table and a FK to that table from Address, a District table and a FK and so on...
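A minimal sketch of that normalized route, covering only the first two levels. All table, column and constraint choices here are assumptions for illustration (including District pointing back at its State); Taluk and Village would follow the same pattern if they are worth normalizing.

```sql
CREATE TABLE [dbo].[State](
    [StateID]   [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [StateName] [nvarchar](50) NOT NULL UNIQUE
);

CREATE TABLE [dbo].[District](
    [DistrictID]   [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [StateID]      [int] NOT NULL REFERENCES [dbo].[State] ([StateID]),
    [DistrictName] [nvarchar](50) NOT NULL
);

-- Hypothetical replacement for AddressDemo: lookups for the upper levels,
-- plain columns for the parts that rarely repeat.
CREATE TABLE [dbo].[AddressNormalized](
    [AddressID]  [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [StateID]    [int] NOT NULL REFERENCES [dbo].[State] ([StateID]),
    [DistrictID] [int] NOT NULL REFERENCES [dbo].[District] ([DistrictID]),
    [Taluk]      [nvarchar](50) NULL,
    [Village]    [nvarchar](50) NULL,
    [Street1]    [nvarchar](50) NULL,
    [Street2]    [nvarchar](50) NULL,
    [Phone]      [nvarchar](50) NULL,
    [Mobile]     [nvarchar](50) NULL,
    [Email]      [nvarchar](50) NULL
);
```

Queries against this layout are ordinary joins, so no recursive CTE or hierarchyid handling is needed to read an address back.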