I'm using SSIS to create a star schema for a data warehouse with surrogate keys (sg).
My process goes like this:
find max sg (using SQL)
in the data flow: data source -> C# script that adds +1 to the max sg -> write to destination.
Now, with fixed dimensions this works without problems: every added row gets a sequential sg.
However, when I use the Slowly Changing Dimension and historically update a row, I get the following:
sg_key | name | city | current_row
1      | a    | X    | true
2      | b    | Y    | true
3      | c    | Z    | false
4      | d    | H    | true
7      | c    | T    | true
Now, correct me if I'm wrong, but I always thought SSIS pushes one row at a time through all the data flow tasks. However, it looks like it first generates ALL the sg_keys for all the rows and only then sends the updated row through the flow.
Do I misunderstand how SSIS works? How can I fix it?
Cheers,
Mark.
If you use SQL Server as a destination, why not use an IDENTITY column instead of a C# script?
https://msdn.microsoft.com/en-us/library/ms186775.aspx
Identity will automatically increment your column when you insert a new row. If you don't update this column, the value will not change.
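For illustration, here is a minimal sketch of what that could look like (the table and column names are just placeholders based on the example data above, not your actual schema):

-- Hypothetical dimension table: the IDENTITY property assigns the surrogate key
-- automatically on every INSERT, so no MAX(sg_key)+1 lookup or script is needed.
CREATE TABLE dbo.DimCustomer (
    sg_key      INT IDENTITY(1,1) PRIMARY KEY,  -- auto-incremented surrogate key
    name        NVARCHAR(50),
    city        NVARCHAR(50),
    current_row BIT
);

-- Inserts omit sg_key entirely; SQL Server assigns 1, 2, 3, ... in insert order.
INSERT INTO dbo.DimCustomer (name, city, current_row)
VALUES ('a', 'X', 1), ('b', 'Y', 1);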
Arnaud
2-column MySQL Table:
| id| class |
|---|---------|
| 1 | A,B |
| 2 | B,C,D |
| 3 | C,D,A,G |
| 4 | E,F,G |
| 5 | A,F,G |
| 6 | E,F,G,B |
The requirement is to generate a report/output that tells, for each individual CSV value in the class column, how many rows it appears in.
For example, A is present in 3 rows (with id 1,3,5), and C is present in 2 rows (with id 2,3), and G is in 4 rows (3,4,5,6) so the output report should be
A - 3
B - 3
C - 2
...
...
G - 4
Essentially, column id can be ignored.
The approach I can think of: first, all the values of the class column need to be picked and split on commas, then a distinct list of each unique value (A, B, C, ...) needs to be built, and finally I need to count how many rows contain each value from that distinct list.
While I know basic SQL queries, this is way too complex for me. I am unable to find a matching CSV split function in MySQL. (I am new to SQL, so I don't know much.)
An alternative approach I got to work: download the class column values to a file and feed it to a Perl script, which builds a distinct array of A, B, C, then reads the downloaded CSV file again for each element in the distinct array and increments the count, and finally prints the report. But this is Perl and would be a separate execution, while the client needs it as a SQL report.
Help will be appreciated.
Thanks
You may try a split-string-into-rows function to get the distinct values and then use the COUNT function to find the number of occurrences. Specifically check here.
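As a rough sketch of that idea in plain MySQL (assuming the table is called t with columns id and class, values are separated by commas with no spaces, and no row holds more than 5 values):

-- Split each class value into rows using a small numbers table, build the
-- distinct list of values, then count the rows that contain each value.
SELECT v.val, COUNT(*) AS row_count
FROM (
    SELECT DISTINCT SUBSTRING_INDEX(SUBSTRING_INDEX(t.class, ',', n.n), ',', -1) AS val
    FROM t
    JOIN (SELECT 1 AS n UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5) n
      ON n.n <= 1 + LENGTH(t.class) - LENGTH(REPLACE(t.class, ',', ''))
) v
JOIN t ON FIND_IN_SET(v.val, t.class) > 0
GROUP BY v.val
ORDER BY v.val;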
I am a new user and here is my first question,
I have recently started working with MS Access and I am having problems filtering the maximum of one column grouped by the data in another column.
Let me explain the situation with a test data:
The table consists of column A, which is a short text, and column B, which is an integer.
Test Data
With a query, I want to filter out only AA-02, BB-04 and CC-06.
I can compare values in a column very easily in Excel; however, I am having problems doing this in Access.
Thanks for your time in advance.
Best Regards,
M.ER
Assuming you want the last instance of column B, this is a simple SQL Totals query. Using the Query Designer:
In the SQL Tab (not shown but bottom right of the query designer)
SELECT Test.ColumnA, Last(Test.ColumnB) AS ColumnB
FROM Test
GROUP BY Test.ColumnA;
Result:
| ColumnA | ColumnB |
| AA | 2 |
| BB | 4 |
| CC | 6 |
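If what you actually need is the maximum of ColumnB per group rather than the last-entered value, the same Totals query works with Max instead of Last (this is a guess at your intent based on the AA-02/BB-04/CC-06 example):

SELECT Test.ColumnA, Max(Test.ColumnB) AS MaxOfColumnB
FROM Test
GROUP BY Test.ColumnA;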
I have a small SSIS question. I'm extracting data from a MySQL table with a varying column list to a SQL Server table with a fixed column list.
Source table: Test (MySQL server)
id | name | sal | deptno | loc  | referby
1  | abc  | 100 | 10     | hyd  | xyz
2  | mnc  | 200 | 20     | chen | pqr
First I set up the MySQL source table configuration, then I drag and drop an OLE DB Destination for the SQL Server table. I configure the target table, and after that the package works fine and the data looks like below.
Target table: Test (SQL Server)
id | name | sal | deptno | loc  | referby
1  | abc  | 100 | 10     | hyd  | xyz
2  | mnc  | 200 | 20     | chen | pqr
The second time I run the package, a column has been removed from the source table's schema, so the package fails. I open the MySQL source configuration and edit the query to return a placeholder for the missing column:
select id,'null' as name,sal,deptno,loc,referby from test
I rerun the package and the data looks like this.
Target table: Test (SQL Server)
id | name | sal | deptno | loc  | referby
1  | null | 100 | 10     | hyd  | xyz
2  | null | 200 | 20     | chen | pqr
I always truncate the target table and load data.
The target table has an unchanging list of columns, while the source table's column list can vary. I do not want to keep editing the query to account for possible missing columns. How can I handle this at the package level?
A couple of ideas:
Use dynamic SQL. Replace your straightforward SELECT ... with a query that iterates through the target table's column list (perhaps fetched via SHOW COLUMNS), builds a SELECT statement that substitutes NULL for the missing columns, and then executes it via PREPARE and EXECUTE.
The query-generating query would need to produce a SELECT statement containing the fixed set of columns your target table expects to see. If an expected column doesn't exist in the source, the query-generating query should insert the placeholder NULL AS ColumnName into the statement. A rough sketch of this follows after the second idea below.
(I'm not a MySQL expert, so I'm unsure of MySQL's exact capabilities in this regard, but in theory this approach sounds workable.)
Use a Script Component as the data source. Configure this component with the output columns you expect. Have the component query the source database (maybe using a simple SELECT * FROM ...) and then copy only the columns that actually exist from the source to the output row buffer. With this approach, columns that don't exist in the source will automatically be output into the data flow as null/their default value, because the Script Component won't have set them to a value.
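For the first (dynamic SQL) idea, here is a rough sketch in MySQL. It is only a sketch: it assumes the fixed target column list id, name, sal, deptno, loc, referby, a source table called test in a schema called mydb, and that your source connection can run a multi-statement script like this; adjust all of that to your environment.

-- Build a column list from the fixed target columns, substituting NULL for any
-- column that no longer exists in the source table.
SET @cols = (
    SELECT GROUP_CONCAT(
               CASE WHEN c.COLUMN_NAME IS NULL
                    THEN CONCAT('NULL AS ', t.col)
                    ELSE t.col
               END ORDER BY t.ord)
    FROM (SELECT 'id' AS col, 1 AS ord UNION ALL
          SELECT 'name', 2 UNION ALL
          SELECT 'sal', 3 UNION ALL
          SELECT 'deptno', 4 UNION ALL
          SELECT 'loc', 5 UNION ALL
          SELECT 'referby', 6) t
    LEFT JOIN INFORMATION_SCHEMA.COLUMNS c
           ON c.TABLE_SCHEMA = 'mydb'
          AND c.TABLE_NAME = 'test'
          AND c.COLUMN_NAME = t.col
);

-- Execute the generated SELECT via PREPARE/EXECUTE.
SET @sql = CONCAT('SELECT ', @cols, ' FROM mydb.test');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;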
SSIS is very rigid when it comes to dynamic sources like this. I think your best bet would be to explore BIML which could generate a new package for you each time you need to "refresh" the schema.
http://www.sqlservercentral.com/stairway/100550/
I have put a lot of effort into my database design, but I think I am now realizing I made a major mistake.
Background: (Skip to 'Problem' if you don't need the background.)
The DB supports a custom CMS layer for a website template. Users of the template are limited to turning pages on and off, but not creating their own 'new' pages. Further, many elements are not editable. Therefore, if a page has a piece of text I want them to be able to edit, I would have 'manually' assigned a static ID to it:
<h2><%= CMS.getDataItemByID(123456) %></h2>
Note: The scripting language is not relevant to this question, but the design forces each table to have unique column names, hence the convention of 'TableNameSingular_id' for the primary key, etc.
The scripting language would do a lookup on these tables to find the string.
mysql> SELECT * FROM CMSData WHERE CMSData_data_id = 123456;
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
|          1 |          123456 |                           1 |
+------------+-----------------+-----------------------------+

mysql> SELECT * FROM CMSDataTypes WHERE CMSDataType_type_id = 1;
+----------------+---------------------+-----------------------+------------------------+
| CMSDataType_id | CMSDataType_type_id | CMSDataType_type_name | CMSDataType_table_name |
+----------------+---------------------+-----------------------+------------------------+
|              1 |                   1 | String                | CMSStrings             |
+----------------+---------------------+-----------------------+------------------------+

mysql> SELECT * FROM CMSStrings WHERE CMSString_CMSData_data_id=123456;
+--------------+---------------------------+-----------------------------------+
| CMSString_id | CMSString_CMSData_data_id | CMSString_string                  |
+--------------+---------------------------+-----------------------------------+
|            1 |                    123456 | The answer to the universe is 42. |
+--------------+---------------------------+-----------------------------------+
The rendered text would then be:
<h2>The answer to the universe is 42.</h2>
This works great for 'static' elements, such as the example above. I used the exact same method for other data types such as file specifications, email addresses, dates, etc.
However, it fails when I want to allow the user to dynamically generate content. For example, there is an 'Events' page, and events will be dynamically created by the user by clicking 'Add Event' or 'Delete Event'.
An Event table will use keys to reference other tables with the following data items:
Data Item:    Table:
--------------------------------------------------
Date          CMSDates
Title         CMSStrings  (as shown above)
Description   CMSTexts    (MySQL TEXT data type)
--------------------------------------------------
Problem:
That means each time an Event is created, I need to create the following rows in the CMSData table:
+------------+-----------------+-----------------------------+
| CMSData_id | CMSData_data_id | CMSData_CMSDataType_type_id |
+------------+-----------------+-----------------------------+
|          x |               y |                           6 | (Event)
|        x+1 |             y+1 |                           5 | (Date)
|        x+2 |             y+2 |                           1 | (Title)
|        x+3 |             y+3 |                           3 | (Description)
+------------+-----------------+-----------------------------+
But there is the problem: in MySQL, you can have only one AUTO_INCREMENT field per table. If I query for the highest value of CMSData_data_id and just add 1 to it, there is a chance of a race condition where someone else grabs that value first.
How is this issue typically resolved - or avoided in the first place?
Thanks,
Eric
The id should be meaningless, except that it must be unique. Your design should work whether or not the block of 4 ids is contiguous.
Redesign your implementation to add the parts separately, not as a block of 4. Doing so should simplify things overall, and improve your scalability.
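One possible reading of that advice, sketched in MySQL: if CMSData_id is AUTO_INCREMENT and CMSData_data_id can be derived from it (or dropped), each part can be inserted on its own and picked up via LAST_INSERT_ID(), which is connection-local and therefore race-free. This is only a sketch of the idea, not a drop-in fit for the existing schema.

-- Insert each part separately; no contiguous block of ids is required.
INSERT INTO CMSData (CMSData_CMSDataType_type_id) VALUES (6);  -- Event
SET @event_id = LAST_INSERT_ID();

INSERT INTO CMSData (CMSData_CMSDataType_type_id) VALUES (5);  -- Date
SET @date_id = LAST_INSERT_ID();

-- @event_id and @date_id can then be stored on the Event row to link the parts.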
What about locking the table before writing into it? This way, when you are inserting a row into the CMSData table, you can safely get the last id.
Another suggestion would be to not use an incremented id but a uniquely generated one, like a GUID.
Lock Tables
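For illustration, a rough sketch of the locking approach in MySQL, using the table and column names from the question (the exact INSERT column list is an assumption; adapt it to the real table definition):

-- Serialize access to CMSData while reserving a block of four data ids.
LOCK TABLES CMSData WRITE;

SET @next_data_id = (SELECT COALESCE(MAX(CMSData_data_id), 0) + 1 FROM CMSData);

INSERT INTO CMSData (CMSData_data_id, CMSData_CMSDataType_type_id)
VALUES (@next_data_id,     6),  -- Event
       (@next_data_id + 1, 5),  -- Date
       (@next_data_id + 2, 1),  -- Title
       (@next_data_id + 3, 3);  -- Description

UNLOCK TABLES;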
Update: Question refined, I still need help!
I have the following table structure:
table reports:
ID | time | title | (extra columns)
1 | 1364762762 | xxx | ...
Multiple object tables that have the following structure
ID | objectID | time | title | (extra columns)
1 | 1 | 1222222222 | ... | ...
2 | 2 | 1333333333 | ... | ...
3 | 3 | 1444444444 | ... | ...
4 | 1 | 1555555555 | ... | ...
In the object tables, when an object is updated, a new version with the same objectID is inserted, so that the old versions are still available. For example, see the entries with objectID = 1.
In the reports table, a report is inserted but never updated/edited.
What I want to be able to do is the following:
For each entry in my reports table, I want to be able to query the state of all objects as they were when the report was created.
For example, let's look at the sample report above with ID 1. At the time it was created (see the time column), the current version of objectID 1 was the entry with ID 1 (entry ID 4 did not exist at that point).
ObjectID 2 also existed, with its current version being the entry with ID 2.
I am not sure how to achieve this.
I could use a query that selects the object versions by the time column:
SELECT *
FROM (
    SELECT *
    FROM objects
    WHERE time < [reportTime]
    ORDER BY time DESC
) AS latest
GROUP BY objectID
Let's not talk about the performance of this query; it is just to make clear what I want to do. My problem is the comparison of the time columns. I don't think this is a good way to make sure I get the right object versions, because the system time may change "for any reason", the time column would then contain wrong data, and that would lead to wrong results.
What would be another way to do so?
I thought about not using a time column for this, but instead a GLOBAL incremental value, so that I know the insertion order across the database tables.
If you are inserting new versions of the object and your problem is the time column (I assume you are using this column to determine which version is newer), I suggest you use an auto-incrementing ID column for the versions. Even if the time value is not reliable for you, the ID will be, since it is always increasing. So a higher ID means a newer version.
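As a rough sketch of how that could then be queried (assuming such an auto-increment version ID on the object table, plus a hypothetical column on reports, say reports.last_object_version, that stores the highest object version ID existing when the report was created):

-- For report 1, pick each object's newest version whose version ID is not
-- newer than the one recorded when the report was created.
SELECT o.*
FROM objects o
JOIN (
    SELECT objectID, MAX(ID) AS max_version_id
    FROM objects
    WHERE ID <= (SELECT last_object_version FROM reports WHERE ID = 1)
    GROUP BY objectID
) latest ON latest.max_version_id = o.ID;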