Different kind of incremental load - SSIS

Please look at the following staging table. It has multiple rows for the same policy.
Data in this table is loaded from a flat file I receive from external sources.
Column values can change from one row to the next; see ColA. Only a few columns may be populated in the first row, with more populated in later rows; see columns ColB and ColC, which are NULL initially and are populated in the second and third rows.
CREATE TABLE #Stg
(
    PolicyNum VARCHAR(10),
    ColA VARCHAR(10),
    ColB VARCHAR(10),
    ColC VARCHAR(10),
    TimeStampKey VARCHAR(100)
)

INSERT #Stg
    ( PolicyNum, ColA, ColB, ColC, TimeStampKey )
VALUES
    ( 'MDT1000', 'SomeVal_A1', NULL, NULL, '2013041113033140MDT1000ZA' ),
    ( 'MDT1000', 'SomeVal_A2', 'SomeVal_B', NULL, '2013041113051756MDT1000ZA' ),
    ( 'MDT1000', 'SomeVal_A3', 'SomeVal_B', 'SomeVal_C', '2013041113115418MDT1000ZA' )
From this staging table I need to load data into a destination table while maintaining history. The destination table is basically a type 2 slowly changing dimension. In other words, I need to load the first row from staging because it doesn't exist, then update it with the second row, and update it again with the third row.
Following is an example of the destination schema:
CREATE TABLE #Dst
(
PolicyKey INT IDENTITY(1,1) PRIMARY KEY
, PolicyNum VARCHAR(10)
, ColA VARCHAR(10)
, ColB VARCHAR(10)
, ColC VARCHAR(10)
, IsActive BIT
, RowStartDate DATETIME
, RowEndDate DATETIME
)
Normally I'd write a MERGE statement or an SSIS package to handle incremental loads and SCD dimensions, but since the original record and its change records arrive in the same file, the standard approach doesn't work.
I'd appreciate it if you could shed some light on how to approach this. I'm trying to avoid row-by-row operations.
Thanks,
Sam.

Try this:
SELECT
Stg.*
FROM Stg
INNER JOIN
(
SELECT PolicyNum, MAX (TimeStampKey) AS MAX_TimeStampKey
FROM Stg
GROUP BY PolicyNum
) T
ON T.PolicyNum = Stg.PolicyNum
AND T.MAX_TimeStampKey = Stg.TimeStampKey
The result:
PolicyNum   ColA        ColB        ColC        TimeStampKey
----------  ----------  ----------  ----------  -------------------------
MDT1000     SomeVal_A3  SomeVal_B   SomeVal_C   2013041113115418MDT1000ZA
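If you also need the intermediate versions rather than only the latest row, the same set-based idea extends to building the full SCD2 history in one pass. A minimal sketch, assuming SQL Server 2012+ for LEAD and simplifying the date logic to the yyyymmdd prefix of TimeStampKey:
-- Sketch only: expands all staged rows into type 2 history rows.
INSERT INTO #Dst ( PolicyNum, ColA, ColB, ColC, IsActive, RowStartDate, RowEndDate )
SELECT
    PolicyNum, ColA, ColB, ColC,
    -- the newest row per policy is the active one
    CASE WHEN LEAD(TimeStampKey) OVER (PARTITION BY PolicyNum ORDER BY TimeStampKey) IS NULL
         THEN 1 ELSE 0 END,
    -- TimeStampKey starts with yyyymmdd; parsing is simplified to the date part
    CONVERT(DATETIME, LEFT(TimeStampKey, 8)),
    -- each row is closed out by the next row's timestamp (NULL for the active row)
    CONVERT(DATETIME, LEFT(LEAD(TimeStampKey) OVER (PARTITION BY PolicyNum ORDER BY TimeStampKey), 8))
FROM #Stg;
Note this only covers the intra-file history; rows already present in #Dst would still need the usual expire-and-insert handling.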
Please let us know if this helped you.

Related

Rename columns in SQL

I want to (directly) generate a table with the content from 2 columns of another table. How can I change the names of the columns (rename them in the new table)?
Here's an example:
CREATE TABLE X AS
SELECT
Table1.name,
Table1.act
FROM Table1
-> I don't want to name the columns "name" and "act" as in the original table; I want "name" replaced by "customer" and "act" replaced by "status".
You could just specify name aliases in the query:
CREATE TABLE X AS
SELECT
Table1.name AS customer,
Table1.act AS status
FROM Table1
Alternatively, you could specify the column definitions in brackets after the table name; the SELECT aliases must match the defined columns so the values map into them:
CREATE TABLE X (customer VARCHAR(10), status VARCHAR(10)) AS
SELECT
Table1.name AS customer,
Table1.act AS status
FROM Table1
You will need to use a while loop to do that:
$result = mysql_query("SELECT customer, status FROM tj_ethnicity");
while ($row = mysql_fetch_assoc($result))
{
    // Build one CREATE TABLE per row, named after the customer value
    $new_query = "CREATE TABLE `" . $row['customer'] . "` (
        `customerid` int(11) NOT NULL AUTO_INCREMENT,
        `status` varchar(50) DEFAULT '" . $row['status'] . "',
        `created` datetime NOT NULL,
        PRIMARY KEY (`customerid`)
    )";
    mysql_query($new_query);
}

How to MERGE records in MYSQL

I'm able to use the MERGE statement in both Oracle and MSSQL. Right now I have to use MySQL. Does MySQL have a similar statement to merge data?
Let's say I have two tables:
create table source
(
col1 bigint not null primary key auto_increment,
col2 varchar(100),
col3 varchar(50),
created datetime
)
create table destination
(
col1 bigint not null primary key auto_increment,
col2 varchar(100),
col3 varchar(50),
created datetime
)
Now I want to move all data from "source" to "destination". If a record already exists in "destination" by key, I need to update it; otherwise I need to insert it.
In MSSQL I use the following MERGE statement; something similar can be used in Oracle:
-- #tableAction (with a MergeAction column) is assumed to be created beforehand.
MERGE destination AS TARGET
USING (SELECT * FROM source WHERE col2 LIKE '%GDA%') AS SOURCE
ON (TARGET.col1 = SOURCE.col1)
WHEN MATCHED THEN
UPDATE SET TARGET.col2 = SOURCE.col2,
TARGET.col3 = SOURCE.col3
WHEN NOT MATCHED THEN
INSERT
(col2, col3, created)
VALUES
(
SOURCE.col2,
SOURCE.col3,
GETDATE()
)
OUTPUT $action INTO #tableAction;

-- @Inserted, @Updated, @Deleted are assumed to be declared beforehand.
WITH mergeCounts AS
(
SELECT COUNT(*) cnt,
MergeAction
FROM #tableAction
GROUP BY MergeAction
)
SELECT @Inserted = (SELECT ISNULL(cnt,0) FROM mergeCounts WHERE MergeAction = 'INSERT'),
@Updated = (SELECT ISNULL(cnt,0) FROM mergeCounts WHERE MergeAction = 'UPDATE'),
@Deleted = (SELECT ISNULL(cnt,0) FROM mergeCounts WHERE MergeAction = 'DELETE')
So here I update records if they exist and insert them if they're new. After the MERGE statement I'm also able to count how many records were inserted, updated, and so on.
Is it possible to have such an implementation in MySQL?
MySQL has INSERT ... ON DUPLICATE KEY UPDATE syntax. Use it like this:
insert into destination(col1, col2, col3, created)
select *
from source
on duplicate key update
col2 = values(col2),
col3 = values(col3),
created = values(created);
demo here
To get the number of affected rows, run SELECT ROW_COUNT() afterwards.
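One caveat worth knowing: with ON DUPLICATE KEY UPDATE, MySQL's affected-rows count is 1 for each newly inserted row and 2 for each updated row, so the total from ROW_COUNT() mixes the two. For example:
-- Run immediately after the INSERT ... ON DUPLICATE KEY UPDATE:
-- each fresh insert contributes 1, each update contributes 2,
-- and a row left completely unchanged contributes 0.
SELECT ROW_COUNT() AS affected_rows;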

Does my Full-Text Index already contain a particular value?

I've got a SQL 2008 R2 table defined like this:
CREATE TABLE [dbo].[Search_Name](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](300) NULL,
CONSTRAINT [PK_Search_Name] PRIMARY KEY CLUSTERED ([Id] ASC))
Querying the Name field using CONTAINS and FREETEXT performs well.
However, I'm trying to keep the values of my Name column unique. Searching for an existing entry in the Name column is unbelievably slow for a large number of names (usually batches of 1,000), even with an index on the Name field. Query plans indicate I'm using the index as expected.
To search for an existing value, my query looks like this:
SELECT TOP 1 Id, Name from Search_Name where Name = 'My Name Value'
I've tried duplicating the Name column to another column and searching on the new column, but the net effect was the same.
At this point, I'm thinking I must be mis-using this feature.
Should I just stop trying to prevent duplication? I'm using a linking table to join these search name values to the underlying data. It seems somehow 'dirty' to just store a whole bunch of duplicate values...
...or is there a faster way to take a list of 1,000 names and see which ones are already stored in the database?
The first change to make is to get the entire list to SQL Server at one time. Regardless of how you add the names to the existing table, doing it as a set operation will make a big difference in performance.
Passing the list as a table-valued parameter (TVP) is a clean way to handle it. Have a look here for an example. You can still use an OUTPUT clause to track which rows did or didn't make the cut, for example:
-- Some sample existing names.
declare @Search_Name as Table ( Id Int Identity, Name VarChar(32) );
insert into @Search_Name ( Name ) values ( 'Bob' ), ( 'Carol' ), ( 'Ted' ), ( 'Alice' );
select * from @Search_Name;
-- Some (prospective) new names.
declare @New_Names as Table ( Name VarChar(32) );
insert into @New_Names ( Name ) values ( 'Ralph' ), ( 'Alice' ), ( 'Ed' ), ( 'Trixie' );
select * from @New_Names;
-- Add the unique new names.
declare @Inserted as Table ( Id Int, Name VarChar(32) );
insert into @Search_Name
output inserted.Id, inserted.Name into @Inserted
select New.Name
from @New_Names as New left outer join
@Search_Name as Old on Old.Name = New.Name
where Old.Id is NULL;
-- Results.
select * from @Search_Name;
-- The names that were added and their id's.
select * from @Inserted;
-- The names that were not added.
select New.Name
from @New_Names as New left outer join
@Inserted as I on I.Name = New.Name
where I.Id is NULL;
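Since the question describes batches of 1,000 names coming from the application, the TVP version of the same set-based insert might look roughly like this (the type and procedure names here are hypothetical):
-- Hypothetical table type used to pass the whole batch in one round trip.
CREATE TYPE dbo.NameList AS TABLE ( Name NVarChar(300) );
GO
CREATE PROCEDURE dbo.Add_Search_Names
    @Names dbo.NameList READONLY  -- TVPs must be declared READONLY
AS
BEGIN
    -- Insert only the names not already present.
    INSERT INTO dbo.Search_Name ( Name )
    SELECT New.Name
    FROM @Names AS New
    LEFT OUTER JOIN dbo.Search_Name AS Old ON Old.Name = New.Name
    WHERE Old.Id IS NULL;
END;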
Alternatively, you could use a MERGE statement and OUTPUT the names that were added, those that weren't, or both.
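For instance, a minimal MERGE sketch along those lines, reusing the table variables from the example above (only the inserted names are captured here):
-- Capture what MERGE did via $action.
declare @Actions as Table ( MergeAction NVarChar(10), Id Int, Name VarChar(32) );

merge @Search_Name as Old
using @New_Names as New
    on Old.Name = New.Name
when not matched by target then
    insert ( Name ) values ( New.Name )
output $action, inserted.Id, inserted.Name into @Actions;

-- The names that were added and their id's.
select Id, Name from @Actions where MergeAction = 'INSERT';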

Merge and concat data at the same time

What I am trying to do is the following: I have a database with 3 columns that together form a unique combination. I extracted this combination to a new table (table1). Now I would like to match the data that is present in separate columns to the unique 3-column combination. For example:
cat dog day twenty two
cat dog day twenty eleven
cat dog morning eleven ten
should become
cat dog day "twenty=two&twenty=eleven"
cat dog morning "eleven=ten"
As an extra comment, I should add that I am unable to predict how many items will need to be concatenated.
I tried the following: the string field should be updated with a concat of all results for the unique combination of val3, val4, and val5.
UPDATE `db`.`table1`, `db`.`table2`
SET
string =
(
SELECT group_concat(value1,'=',value2,'&') from table2
group by val3, val4, val5
)
WHERE
(
`table1`.`val3` = `table2`.`val3` AND
`table1`.`val4` = `table2`.`val4` AND
`table1`.`val5` = `table2`.`val5`
);
A hint or tip would be much appreciated. Thanks in advance.
For reference: the linked solution does not work for me since I'm working with MySQL and I need to match 2 columns together.
Given this data structure:
CREATE TABLE `dummy` (
`col1` varchar(20),
`col2` varchar(20),
`key` varchar(20) DEFAULT NULL,
`val` varchar(20) DEFAULT NULL
);
The following query will give you what you are asking for:
select col1, col2, group_concat(concat(`key`, '=', `val`) SEPARATOR '&')
from dummy
group by col1, col2;
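For example, adapting the question's sample rows to the two grouping columns of the dummy table:
-- Sample data (illustration only; values borrowed from the question):
INSERT INTO dummy (col1, col2, `key`, `val`) VALUES
    ('cat', 'day',     'twenty', 'two'),
    ('cat', 'day',     'twenty', 'eleven'),
    ('cat', 'morning', 'eleven', 'ten');

select col1, col2, group_concat(concat(`key`, '=', `val`) SEPARATOR '&') as string
from dummy
group by col1, col2;

-- col1 | col2    | string
-- cat  | day     | twenty=two&twenty=eleven
-- cat  | morning | eleven=ten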

Create a table based on few columns of another table, but also add some additional columns

I know I can do something like this:
CREATE TABLE new_table AS (
SELECT field1, field2, field3
FROM my_table
)
I'm wondering how I would add more columns to this CREATE TABLE statement, ones that are not from my_table but that I would define myself and that would be unique to this new_table only.
I know I could just create the table with the above SQL and then add the necessary columns afterwards, but I'm wondering if this could all be done in one command. Maybe something like this (tried it like that, but it didn't work):
CREATE TABLE new_table AS (
(SELECT field1, field2, field3
FROM my_table),
additional_field1 INTEGER NOT NULL DEFAULT 1,
additional_field2 VARCHAR(20) NOT NULL DEFAULT 1
)
You can explicitly specify the data types for the additional columns by listing their definitions before the AS clause.
See the CREATE TABLE ... SELECT manual.
CREATE TABLE new_table
(
additional_field1 INTEGER NOT NULL DEFAULT 1,
additional_field2 VARCHAR(20) NOT NULL DEFAULT 1
)
AS
(
SELECT id, val,
1 AS additional_field1,
1 AS additional_field2
FROM my_table
);
Example: SQLFiddle
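A note on how MySQL merges the two parts (per the documented CREATE TABLE ... SELECT behavior): columns that appear only in the definition are filled with their DEFAULT values, while matching SELECT aliases let you supply per-row values instead. So if the defaults are all you need, a shorter form should behave the same; a sketch (types and defaults illustrative):
-- Definition-only columns are filled from their DEFAULT clauses,
-- so re-selecting them is only needed to supply non-default values.
CREATE TABLE new_table
(
    additional_field1 INTEGER NOT NULL DEFAULT 1,
    additional_field2 VARCHAR(20) NOT NULL DEFAULT '1'
)
AS
SELECT id, val
FROM my_table;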