Slowly Changing Dimension Transform in SSIS won't update
I used the following CSV data to test the SCD. I thought it would recognize the LocationIDs and update the records where necessary, but it did not. It only inserts new records.
I'm using Visual Studio 2010 and SQL Server 2012 with Windows Authentication (I assume it's not a permissions issue, because it doesn't seem to be acknowledging the changes to the historical data at all if you look at the pic of the executed package). I'm also on Windows 7 Home Premium.
There were a lot of NULLs in the original, and this set also has changes, but the changes are not committed. Also notice that when I add a new location, both rows are added even though the LocationIDs are the same.
Input into the SSIS package. Look, no NULLs! But the data above was not updated.
LocationID,Locations,Address,City,State,Zip,Phone,Country,Region
9,Pluto Disney,5000 Out this World,PlanetRock,PL,85338,(902) 504-1747,US,SolarSystem
1,Disney Lend,159 Mickey Mouse Road,Orlando,FL,58741,(201) 345-1234,US,North
2,Disney Werld,98532 Donald Duck Boulevard,Los Angelos,SA,75523,(601) 375-1345,US,South
3,Disney Pleyground,449 Smoke Mountain Lane,Atlanta,GA,24747,(804) 375-1126,US,East
4,Cajun Desney,Jazz Land Avenue,New Orleans,LA,88888,(904) 325-1237,US,West
5,Wild West Desney,Magic Kingdom Street,Somewhere West,CO,21543,(804) 346-1274,US,Northwest
3,Disney Super Playground,449 Smoke Mountain Lane,Atlanta,GA,24747,(864) 375-1526,US,East
4,Cajen Disney,Jazz Land Avenue,New Orleans,LA,88888,(904) 525-1237,US,West
6,Winter Disney,0 Ice Land Avenue,New Orleans,LA,85588,(900) 507-1297,US,North
2,Disney World,98532 Donald Duck Boulevard,Los Angelos,CA,75523,(671) 375-1345,US,South
7,Desert Disney,100 Melting Pot Way,Phoenix,AZ,85338,(902) 504-1747,US,Southwest
9,Plutian Disney,5000 Out this World,PlanetRock,PL,85338,(902) 504-1747,US,SolarSystem
10,Martian Disney,3000 Rover Drive,RedRock,M,85338,(902) 504-1747,US,SolarSystem
Here are the pictures from my SCD Package
This is where I map all my incoming attributes to the Database attributes.
Almost all the data is historical, but NO UPDATES.
For the next one I've tried different values. It doesn't make a difference which one I pick or if I deselect them all.
I've kept this the same (never changed)
I've enabled and disabled this one. No results.
The finished screen
OK, I figured it out. It took some thinking through.
If "Fail the transformation if it detects changes in fixed attributes" is selected, as it is below, then the whole package will fail. If you deselect it, the package will run, but when the SCD transform detects changes to a fixed attribute, it will let all the changes go through except for the rows where it detects a fixed-attribute change. SO WHAT THIS MEANS is that it does not ERROR OUT or completely cancel the package the way it does when checked. But it STILL DOESN'T ignore the fixed attribute or allow the other changes to take effect if that row has a changed fixed attribute.
The problem is that the book I have suggested using a Derived Column transform to create a DateCreated column with a GETDATE() function in its Expression column, to record when the row was originally created. The author then suggested that this column be set as a Fixed attribute (even though it isn't actually fixed, since it always enters the SCD with the current date). The SCD detects that the DateCreated value is different from the one in the database, so all of those rows fail to update because of that one change.
So it was the book's fault.
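For anyone who wants to keep a creation date without tripping the SCD, here is one possible workaround (just a sketch, not the book's approach; the dimension table name DimLocation is made up): drop the Derived Column / fixed-attribute mapping for DateCreated and instead let the dimension table stamp it with a default on insert, so the column never has to flow through the SCD at all.

    -- Hypothetical change: give the existing DateCreated column a default
    -- so the database stamps it on INSERT and the SSIS data flow (and the
    -- SCD transform's fixed-attribute check) never has to touch it.
    ALTER TABLE dbo.DimLocation
        ADD CONSTRAINT DF_DimLocation_DateCreated
        DEFAULT (GETDATE()) FOR DateCreated;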
Related
SSIS - Loop Through Active Directory
Disclaimer: new to SSIS and Active Directory.
I have a need to extract all users within a particular Active Directory (AD) domain and import them into Excel. I followed this: https://www.itnota.com/query-ldap-in-visual-studio-ssis/ in order to create my SSIS package.
My SQL is:

    LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=a*));Name,sAMAccountName

As you know, there is a 1,000-row limit when pulling from AD. In my SQL I currently have (name=a*) to test the process, and it works. I need to know how to set up a loop with variables to pull all records and import them into Excel (or whatever you experts recommend). Also, how do I know what the other field names are that are available to pull?
Thanks in advance.
How do I see what's in Active Directory?
Tool recommendations are off topic for the site, but a tool that you can download, no install required, is AD Explorer. It's an MS tool that allows you to view your domain. I highly recommend that people who need to see what's in AD use something like this, as it shows you your basic structure.
What's my domain controller? Start -> Command Prompt, type set | find /i "userdnsdomain", look for USERDNSDOMAIN and put that value in the connect dialog. I save it because I don't want to enter this every time.
Search/Find and then look yourself up. Here I'm going to find my account by using my sAMAccountName. The search results show only one user, but there could have been multiples, since I did a "contains" relationship. Double-clicking the value in the bottom results section causes the pane underneath to update with the details of the search result. This is nice because, while the right side shows all the properties associated with my account, it also updates the left pane to navigate to the CN. In my case it's CN=Users, but again, it could be something else in your specific environment.
You might discover an interesting categorization for your particular domain. At a very large client, I discovered that my target users were all under a CN (Canonical Name, I think), so I could use that in my AD query.
There are things you'll see here that you sure would like to bring into a data flow but won't be able to, like memberOf: that's a complex type and there's no equivalent in the data flow data types for it. I think Integer8 is also something that didn't work.
Loop the loop
The "trick" here is that we'll need to take advantage of the ability to set an expression on the Data Flow task. The name of the AD provider has changed since I last looked at this; in VS 2017, I see the OLE DB provider name as "OLE DB Provider for Microsoft Directory Service". Put in your query and you should get results back. Let that happen so the metadata is set.
An ADO.NET source does not support parameterization the way the OLE DB source does. However, you can apply an Expression on the Data Flow, which surfaces the component, and that's what we'll do. Click out of the Data Flow and back into the Control Flow, right-click on the Data Flow, and select Properties. In that properties window, find Expressions and click the ellipses (...). Up pops the Property Expressions Editor. Find the ADO.NET source under Property and, in the Expressions section, click the ellipses. Here, we'll use your same source query just to prove we're doing the right things:

    "LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=" + "a" + "*));Name,sAMAccountName"

We're doing string building here, so the problem we're left to solve is how to substitute something for the "a" in the above query. The laziest route would be to:
Create an SSIS variable of type String called CurrentLetter and initialize it to "a".
Update the expression we just created to be

    "LDAP://DC=JOHN,DC=JANE,DC=DOE;(&(objectCategory=person)(objectClass=user)(name=" + @[User::CurrentLetter] + "*));Name,sAMAccountName"

Add a Foreach Loop Container (FELC) to your Control Flow.
Configure the FELC with an enumerator of "Foreach Item Enumerator".
Click the Columns... button, click Add (this results in Column 0 with data type String), then click OK.
Fill the collection with each letter of the alphabet.
In the Variable Mappings tab, assign Variable User::CurrentLetter to Index 0.
Click OK.
Old blog posts on the matter, because I like clicks:
https://billfellows.blogspot.com/2011/04/active-directory-ssis-data-source.html
http://billfellows.blogspot.com/2013/11/biml-active-directory-ssis-data-source.html
SSIS Slowly changing dimension column
I'm using a Slowly Changing Dimension in SSIS, and I'm using a single column called active, of type BIT, to determine the latest records instead of start date and end date columns. My problem is the following: I want to set the active value to 0 for records that are no longer present in the source file.
For example, imagine my DWH is empty and in the source file I have the following data (salary is the historical attribute):

    employee_ID|NAME|salary
    117|a|100
    125|b|150
    378|c|200

Once I load these into my DWH, I get the following data:

    employee_code|employee_ID|NAME|salary|active
    1|117|a|100|1
    2|125|b|150|1
    3|378|c|200|1

Everything is good so far, but now imagine I get a new source file where the data looks like this:

    employee_ID|NAME|salary
    117|a|120
    125|b|150

When I load this data into the data warehouse, I get the following:

    employee_code|employee_ID|NAME|salary|active
    1|117|a|100|0
    2|125|b|150|1
    3|378|c|200|1
    4|117|a|120|1

Everything makes sense: employee a's salary has changed, so a new record is added in the DWH and the old record's active value is set to 0. Employee b's salary stayed the same, so there is no need to add a new record. However, employee c does not exist in the source file anymore (he quit or got fired). I want to know if there is a way to set the active value to 0 in such a situation.
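One common way to handle this, sketched here under the assumption that the incoming file is first landed in a staging table (dim_employee and stg_employee are made-up names), is to run an Execute SQL Task after the data flow that deactivates any active dimension row whose business key no longer appears in the staged source data:

    -- Hypothetical post-load step: flag dimension rows whose business key
    -- is missing from the current source extract as no longer active.
    UPDATE d
    SET    d.active = 0
    FROM   dbo.dim_employee AS d
    WHERE  d.active = 1
      AND  NOT EXISTS (SELECT 1
                       FROM   dbo.stg_employee AS s
                       WHERE  s.employee_ID = d.employee_ID);

The SCD transform itself only sees rows that arrive from the source, so a deletion on the source side has to be detected by a set-based comparison like this, outside the SCD.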
Is there a way to store database modifications with a versioning feature (for eventual version comparison)?
I'm working on a project where users can upload Excel files into a MySQL database. Those files are the main source of our data, as they come directly from the contractors working with the company. They contain a large number of rows (23,000 on average for each file) and 100 columns per row!
The problem I am currently facing is that the same file can be changed by someone (either the contractor or the company), and when it is re-uploaded my system should detect the changes, update the actual data, and save the action (the fact that a cell went from one value to another :: oldValue -> newValue) so we can go back and run a version comparison (e.g. 3 re-uploads === 3 versions; oldValue in version 1 vs. newValue in version 5).
I developed a tiny mechanism for saving the changes: I have a table to save imports data (each time a user imports a file, a new row is inserted in this table) and another table for saving the actual changes (versioning data). I save the id of the row that has some changes, as well as the id and the table where the actual data was modified (uploading a file results in insertions into multiple tables, so whenever a change occurs I need to know in which table it happened). I also save the new value and the old value, which is going to help me with restoring the "archived data".
To restore a version:

    SELECT * FROM Archive WHERE idImport = ${versionNumber}

To restore a version for one row:

    SELECT * FROM Archive WHERE idImport = ${versionNumber} AND rowId = ${rowId}

To restore all versions for one row:

    SELECT * FROM Archive WHERE rowId = ${rowId}

To restore versions for one table:

    SELECT * FROM Archive WHERE tableName = ${table}

Etc.
Now with this structure, I'm struggling to restore a version or to run a comparison between two versions, which makes me think that I've come up with a wrong approach, since it makes it hard to do the job! I am trying to find out if anyone has done this before, or what a good approach would look like.
Cases when things get really messy:
The rows that have changed in one version might not have changed in the other version (I am working on a "time machine" to search in other versions when this happens).
The rows have changed in both versions, but not the same fields. (Say we have a user table, and the data of the user with id 15 has changed in the 2nd and 5th uploads. Great! Now for the second version only the name was changed, but for the fifth version the address was changed. When comparing these two versions, we run into a problem constructing our data array: name went from "some" -> NULL (the name was never NULL; there were no name changes in the 5th version) and address went from NULL -> "some", which is obviously wrong.)
My actual approach (PHP):

    <?php
    // Join the two record sets and compare them
    foreach ($firstRecord as $frecord) {
        // Retrieve the first record's fields that have changed
        $fFields = $frecord->fieldName;
        // Check whether the same record has changed in the second version as well
        $sId = array_search($frecord->idRecord, $secondRecord);
        if ($sId) {
            $srecord = $secondRecord[$sId];
            // Retrieve the second record's fields that have changed
            $sFields = $srecord->fieldName;
            // Compare the two records' fields
            foreach ($fFields as $fField) {
                $sfId = array_search($fField, $sFields);
                // The same field for the same record was changed in both versions (perfect case)
                if ($sfId) {
                    $sField = $sFields[$sfId];
                    $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
                    $deltaRow[$fField]["newValue"] = $srecord->deltaValue;
                    // Remove the checked field from the second version to avoid re-checking it
                    unset($sFields[$sfId]);
                }
                // The changed field in V1 was not found in V2 -> look up a value
                else {
                    $deltaRow[$fField]["oldValue"] = $frecord->deltaValue;
                    $deltaRow[$fField]["newValue"] = $this->valueLookUp();
                }
            }
            $dataArray[] = $deltaRow;
            // Remove the checked record from the second version set to avoid re-checking it
            unset($secondRecord[$sId]);
        }
    }

I don't know how to deal with that. As I said, I'm working on a value-lookup algorithm, so when no data is found in a version I will try to find it in the versions between these two, so I can construct my data array. I would be very happy if anyone could give some hints, ideas, or improvements so I can go further with that. Thank you!
Is there a way to store database modifications with a versioning feature (for eventual version comparison)?
What constitutes versioning depends on the database itself and how you make use of it.
As far as a relational database is concerned (e.g. MariaDB), this boils down to the so-called normal forms, which are numbered. In On Database Normalization: 5th Normal Form and Beyond you can find the following guidance:
Beyond 5th normal form you enter the heady realms of domain key normal form, a kind of theoretical ideal. Its practical use to a database designer os [sic] similar to that of infinity to a bookkeeper - i.e. it exists in theory but is not going to be used in practice. Even the most demanding owner is not going to expect that of the bookkeeper!
One strategy to step into these realms is to reach 5th normal form first (do this just in theory, by going through all the normal forms, and study database normalization). Additionally, you can implement versioning outside of, and in addition to, the database itself, e.g. by creating your own versioning system. Reading about what you can do with normalization will help you find better ways to decide how to structure and handle the database data for your versioning needs.
However, as written, it depends on what you want and need. So no straightforward "code" answer can be given to such a general question.
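As a complement (just a sketch, not part of the original answer; the column names idImport, rowId, tableName, fieldName, oldValue and newValue are assumptions based on the question's description, and @version, @rowId, @tableName stand for parameters you would supply), the restore and comparison problems tend to get easier if you ask the archive table for the value of each field as it stood at a given import, rather than only for the changes recorded in that import:

    -- Hypothetical "as-of" query: for one row, take the value of each field
    -- as it stood at version @version, i.e. the newValue of the latest
    -- change recorded at or before that import.
    SELECT a.fieldName, a.newValue AS valueAtVersion
    FROM   Archive AS a
    WHERE  a.rowId = @rowId
      AND  a.tableName = @tableName
      AND  a.idImport = (SELECT MAX(b.idImport)
                         FROM   Archive AS b
                         WHERE  b.rowId     = a.rowId
                           AND  b.tableName = a.tableName
                           AND  b.fieldName = a.fieldName
                           AND  b.idImport <= @version);

Comparing two versions then means running this for both version numbers (or joining two such derived tables on fieldName), so a field that did not change between them resolves to its last known value instead of incorrectly appearing as NULL.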
LabVIEW - writing data from multiple DAQ Assistants to the same .csv file
I have the following problem with my VI, which I could not solve by myself or by research: when running the VI, the data should be stored in a .csv file. In the pictures, you can see the block diagram. When running, it produces the following file:

    Test Steady State
    T_saug_1/T_saug_2/Unbelegt/Unbelegt/T_ND/T_HD/T_Wasser_ein/T_Wasser_aus/T_front/T_back/T-right/T-left
    18,320
    18,491
    20,873
    20,838
    20,463
    20,969
    20,353
    20,543
    20,480
    20,618
    20,618
    20,238

As you can see, the data gets stored only in the first column (in the preview of the post it looks like a row, but it is really a column; "Test Steady State" is the header). But these temperatures are not the temperatures of the first sensor; it somehow stored the value for every sensor in the respective row. When the first row was filled, it stopped storing data entirely. I did not figure out how I could insert a file here, otherwise I would have done so...
I want to store the data for each sensor in the associated column.
Another problem I have: the waveform chart, which shows all the temperatures, only updates every 4-6 seconds. Not only is the interval between updates not always the same, but from my understanding it should update every second, since the while loop has a wait timer set to 1000 ms. I don't know what my mistake here is...
Please let me know if you have any ideas on how to solve these problems, or suggestions where I could find answers to my questions. I am very new to LabVIEW; I am sorry if this question is silly.
With best regards, and thank you for the patient help, lempy.
Attached images: csv file, block diagram, DAQ Assistant for PT100, DAQ Assistant for TC
The Write Delimited Spreadsheet VI has two boolean inputs: append to file? and transpose?
Append to file? is not set for the first write, so it defaults to FALSE. That means that on each write, the file is overwritten. For the second and third calls it is set to TRUE, so that data is appended. The simplest solution is to put the first two write functions outside the main loop. This overwrites the file at the start of the VI with the headers, and values will be appended as desired.
transpose? will swap rows and columns. Wire TRUE to it and check whether it works.
About your second question: a loop runs only as fast as the slowest process inside it. If the graph is updated only every 6 s, something takes 6 s to complete. My guess is that the temperature readings take that long...
SSIS Errors for simple CSV Data Flow
Sorry to darken your day with my troubles, but SSIS has broken me! I am new to SSIS and I just seem to be misunderstanding it.
For background: I have a few versions of a basic package that includes a Foreach Loop container and a Data Flow with a few Derived Columns that imports CSV files into a SQL Server staging table. It is very straightforward, and it does include an Execute SQL task and a file move, but those work fine. The issues are with the Foreach Loop and the Data Flow.
I have one version of this package (let's call it "A") that seemed to be working fine. It would process multiple files in a folder, insert records into the staging table, properly execute the SQL statements, and move the files to Archive. Everything seemed fine until I carefully QA'd the process. Turns out it was duplicating the data from one file, and never importing the data from a second source file! Yet the second/dupe round of data included the source filename (via a derived column) of the second file (but the data from the first). So it looked like I had successfully processed BOTH files until I looked at the actual data and saw that none of the values from the second source file were ever written to the staging table.
Once I discovered this, I figured that the problem was in the Foreach Loop and how I set up the different file path and name variables. So, I decided to try to make a new version of the package. I started by copying package A and created package B. In B, I deleted the source connection manager and created a new connection manager along with all new file and path variables. I then tried to clean up/fix/replace various elements in my Data Flow and Foreach Loop. In the process, I discovered that the advanced mappings from A - which DID work - were virtually all set up as String (even the currency and date columns). That did not seem right, so I modified each source money column to data type Currency, and changed each date-related column to data type Date. What followed has been dozens and dozens of errors, and I cannot get package B to run. I have even changed all of the B data types back to String (mirroring the setup in package A, which DID work). But still no joy. This leads me to ask a few questions of those of you smarter than I:
1) Why can't SSIS interpret source CSV data using the proper data type? I.e. why do I need to set every input column as a STRING when some columns are clearly and completely numeric, currency, or dates? (Yes, the source CSV files are VERY clean - most don't even have NULLs.)
a. When I do change the advanced mapping for a date-related source column to Date, I get the ever-present error message: [Flat File Source [30]] Error: Data conversion failed. The data conversion for column "Settle Date" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
2) When I reset the data types back to String in package B, I still get errors - usually truncation errors (and yes, I have adjusted the length to 250 in one of these columns).
a. Error message: "The value could not be converted because of a potential loss of data.".
b. When I reset the mappings to ignore the column (as a test), it throws a similar error at the next column.
3) Any ideas why package A would dupe a file's data and not process the second file, yet throw no errors and move both to Archive?
4) Why does the Data Viewer appear to have parsing errors (it shows data in the wrong columns), but when you use the Copy Data feature in the Data Viewer and paste it into Excel, all of the data lines up perfectly?
5) Are there any tips and tricks that a rookie SSIS user needs to understand, and which might not be apparent through the documentation and searching web articles as well as this site?
I can provide further details if they will help, but these packages are really very simple and should not be causing me this much frustration.
THANKS for any insights.
DGP
Wow, seems like you have a lot of SSIS issues... I think the reason the same file is being extracted is the way your variable mappings are defined. Have you had a look at and followed this guide: https://www.simple-talk.com/sql/ssis/ssis-basics-introducing-the-foreach-loop-container/
Hope this helps.
Shaheen
Thanks Tab & Shaheen,
To all SSIS rookies - please learn from my mistakes! It appears that my issue was actually in how I specified the TEXT QUALIFIER in the Connection Manager. I had entered "" and that was causing problems with how my columns were being parsed. The parsing issues caused unexpected values to appear in some of the columns, and that was causing the errors in the package. When I changed the text qualifier to only ONE double quote - " - the whole thing worked!
As I mentioned - and as Shaheen suspected - my initial issue with the duplicate processing was probably due to how I set up the Foreach Loop. I had already fixed that, but was still getting errors until I fixed the text qualifier.
I have only tested it a few times, but it looks like that was the issue.
Thanks for the contributions.
DGP