Select specific cells from multiple spreadsheets into SQL using SSIS - sql-server-2008

I need to loop through a series of spreadsheets (all in the same folder), pulling data from the same cells within the same named range in each, into an existing SQL database, using SSIS (SQL Server 2008 R2).
I started by using the information in How to loop through Excel files and load them into a database using SSIS package? as a point of reference.
However, because my files don't run in a strict columnar format (i.e. the whole of column C plus the whole of column E, etc.), I am struggling with it.
My sheet is as follows:
Basically, the area outlined in red in my screenshot (A6:E11) will be the named range (done this way to allow for additional rows as we move forward), and the yellow cells are those that I need to import.
Let's assume that the range will be named "My_Range"
I need to import a row into the database for each of the rows in the range (currently rows 6 through 11).
e.g.
Database columns: Col1, Col2, Col3, Col4
Row 1 = B3, B4, C6, E6
Row 2 = B3, B4, C7, E7
Row 3 = B3, B4, C8, E8
etc.
Any help would be greatly appreciated as I need to find the most efficient way to do this for up to 100 files per night.
If you can help me to get the correct data in the correct format from just 1 file, I can work on the multiple-file problem next.
Thanks guys.

One of the nifty things you can do with the Excel source in SSIS is define the actual range you want. So instead of saying you want "Sheet1", point the source at the range itself, e.g. Sheet1$A5:E11, and just ignore the columns you don't want. Something like this:
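Switching the Excel source's data access mode to "SQL command" lets you query the range (or the named range) directly. A sketch, assuming "first row has column names" is unchecked so the Jet/ACE provider names the columns F1 through F5, meaning columns C and E arrive as F3 and F5:

SELECT F3, F5 FROM [Sheet1$A6:E11]

or, using the named range from the question so any extra rows are picked up automatically:

SELECT F3, F5 FROM [My_Range]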
EDIT:
You might want to use a script source against the Excel file to grab the first two cells (B3 and B4) if they are always in the same spot.
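If you'd rather avoid scripting, a second plain Excel source can also work: pointed at just those two cells (same F1-style naming assumption as above), it returns them as two rows in a single column, which you can then pivot or merge join onto the detail rows:

SELECT F1 FROM [Sheet1$B3:B4]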

Related

SSIS Getting the most up to date record using datetime columns

I am building an SSIS package for SQL Server 2014 and am currently trying to get the most recent record from 2 different sources, using the datetime columns shared between them. So far I am using a Lookup Task on thirdpartyid to match the records that I need to compare, and a Merge Join to bring them together, with the goal of a staging table that holds the most recent record. I have a previous data flow task (not shown) that already inserts records that are not in AD1 into a staging table, so at this point the records are a one-to-one match. Both sources look like this, with exactly the same datetime columns, just different dates, and some values being null as there is no history for them.
Sample output
This is my data flow task so far. I am really new to SSIS so any ideas or suggestions would be greatly appreciated.
Given that there is a 1:1 match between your two sources, I would structure this as a Source (V1) -> Lookup (AD1).
Define your lookup based on thirdpartyid and retrieve all the AD1 columns. You'll likely end up with data flow columns like name and name_ad1, etc.
I would then add a Derived Column that identifies whether the dates are different (assuming that in that situation you need to take action):
IsNull(LastUpdated_AD1) || LastUpdated > LastUpdated_AD1
That expression evaluates to true if the column in AD1 is null or if the V1 last-updated column is greater than the AD1 version.
You'd likely then add a Conditional Split into your Data Flow, base it on the value of your new column, and route the changed data into your mechanism for handling updates (an OLE DB Command or, preferably, an OLE DB Destination plus an Execute SQL Task after the Data Flow to perform a batch update).
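For example, the post-Data Flow Execute SQL Task could then run one set-based statement. A sketch only; the staging and target table names are hypothetical:

-- dbo.Staging_Changes is where the data flow landed the changed rows;
-- dbo.AD1_Target is the table being kept current (both names hypothetical)
UPDATE t
SET t.name = s.name,
    t.LastUpdated = s.LastUpdated
FROM dbo.AD1_Target AS t
INNER JOIN dbo.Staging_Changes AS s
    ON s.thirdpartyid = t.thirdpartyid;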
The comment asks
should it all be one expression? like IsNull(AssignmentLastUpdated_AD1) || AssignmentLastUpdated > AssignmentLastUpdated_AD1 || IsNull(RoomLastUpdated_AD1) || RoomLastUpdated > RoomLastUpdated_AD1
You can do it like that, but when you get a weird result and someone asks how you got that value, long expressions make it hard to debug. I'd likely have two Derived Column components in the data flow. The first would create a "has changed" column for each set of conditions:
HasChangedAssignment
(IsNull(AssignmentLastUpdated_AD1) || AssignmentLastUpdated > AssignmentLastUpdated_AD1)
HasChangedRoom
IsNull(RoomLastUpdated_AD1) || RoomLastUpdated > RoomLastUpdated_AD1
etc.
And then in the final derived column, you create the HasChanged column
HasChangedAssignment || HasChangedRoom || HasChangedAdNauseum
Using a pattern-based approach like this makes it much easier to build, troubleshoot, and make small changes that can have a big impact on the correctness, maintainability and performance of your packages.

Import excel records into access based on column value

I am a newbie with Access and I am trying to import records into several tables from an Excel file. Each row in Excel has a different number of columns, but the good thing is that column A is able to help me identify which records need to go to which of my different tables.
Sample table
As you can see in the picture, row 1 column A has the value "H", which indicates that this record needs to go to the "H" table. The next few rows have a value of "R" in column A, which indicates that those records should go to the "R" table, and so on and so forth. However, the number of records to be imported into each table will vary every time: in the sample above, rows 2 through 10 belong to table R, but the next import may have only 5 or 20 such records.
Currently I am using a temporary table and an append query for each table, but I am wondering if there is an easier way, via VBA or another method, that could be faster and more efficient.
Thanks!
The way you are doing it now may be the best way. An alternative would be to do this in two steps:
1) split your column A, and parse out to different sheets (or different workbooks).
http://www.rondebruin.nl/win/s3/win006.htm
2) load those different sheets (or workbooks) into different tables.
http://www.accessmvp.com/KDSnell/EXCEL_Import.htm#ImpAllWktsSepTbl
http://www.accessmvp.com/KDSnell/EXCEL_Import.htm#ImpFldWrkFiles
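For reference, the append-query approach you're already using boils down to one statement per target table. A sketch in Access SQL; the table and field names here are all hypothetical:

INSERT INTO R_Table (Field1, Field2, Field3)
SELECT Field1, Field2, Field3
FROM TempImport
WHERE RecordType = 'R';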

How to populate 10 different query results with different columns and number of columns to a text file in MSSQL

I am doing a project to generate data extracts on a daily basis. I have ten different queries with different columns, and the number of columns also differs between them. The database is MS SQL Server 2008 R2, and I tried an SSIS package to accomplish this. I used a data source component, then a Sort, fed the result of the Sort into a Merge, and then wrote to a text file. But I am getting an error when combining the results, saying the columns are different or something. Can anyone suggest a solution, or is there any other way to accomplish this?
thanks,
Sivajith
Can you please provide the error message? The Merge component can merge data flows with varying numbers of columns by letting you select which input columns to map.
First create a template .csv file which contains all the columns from the queries (i.e. if you have the columns A, B, C in the first query; B, E, F in the second; B, X, Y in the third; and so on, make sure your template file has A B C E F X Y).
Make 10 tasks (one for each query). As a source, use SQL command and write your query. As a destination, use the template file created above. Make sure you uncheck "Overwrite data".
Use the same destination for all the queries.
This should do the trick. I am not sure that I completely understood your question, since it's a little bit vague.
Here is a reference that may help you a bit more:
SQL Server : export query as a .txt file
You will have to make sure you have a proper connection to the SQL Server and then run this as a PowerShell script or a .bat file. This can be scheduled to run daily as well.
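For instance, a scheduled .bat file along these lines dumps one query per text file (a sketch; the server, database, query and output path are all placeholders):

sqlcmd -S MyServer -d MyDatabase -Q "SELECT col1, col2 FROM dbo.DailyExtract" -o C:\exports\extract1.txt -s "," -W

Here -s sets the column separator and -W strips trailing whitespace from each column.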

SSIS - Writing to Excel After Skipping Rows

Is there a way to write data to an Excel spreadsheet after skipping x number of rows? Excel is my destination and a SQL query would be my source.
My scenario is one where I have a lot of header rows that I need to skip before data insertion. I would like to do this in an SSIS package. I am using SQL 2008 and Excel 2010.
Thanks
If you right-click the Excel connection manager at the bottom of the page and then click Options, there is a setting called FirstRowHasColumnName; set it to FALSE. Let me know if that helps. I didn't really understand whether you just want to skip the first row (the column names from the SQL query) or more; there are other ways.
Easiest way would be to modify your SQL query to exclude the header rows. If you can't do that, then you need some logic to determine whether a row is a header row (like checking if a certain field is a number).
If you can build that logic, then you can do this:
Read all columns in as text
Add a Derived Column that generates a new IsHeader column using your logic (see the expression sketch after this list)
Use a Conditional Split to filter out the rows where IsHeader is true
Use Data Conversion or a Derived Column to convert the columns to the correct data types
Output to Excel as usual
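A minimal sketch of that IsHeader logic as a derived-column expression, assuming the first column came in as a string named Column0 (an extra guard for empty values may be needed): it flags the row as a header whenever the first non-blank character is not a digit.

IsHeader: FINDSTRING("0123456789", SUBSTRING(LTRIM(Column0), 1, 1), 1) == 0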

How to skip irregular header information of a Flat File in SSIS?

I have a file like the one seen below. Just an example:
kwqif h;wehf uhfeqi f ef
fekjfnkenfekfh ijferihfq eiuh qfe iwhuq fbweq
fjqlbflkjqfh iufhquwhfe hued liuwfe
jewbkfb flkeb l jdqj jvfqjwv yjwfvjyvdfe
enjkfne khef kurehf2 kuh fkuwh lwefglu
gjghjgyuhhh jhkvv vytvgyvyv vygvyvv
gldw nbb ouyyu buyuy bjbuy
ID Name Address
1 Andrew UK
2 John US
3 Kate AUS
I want to dynamically skip the header information and load the flat file into the DB, like below:
ID Name Address
1 Andrew UK
2 John US
3 Kate AUS
The header information may vary (it is not a fixed number of rows) from file to file.
Any help is appreciated. Thanks in advance.
The generic SSIS components cannot meet this requirement. You need to code for this e.g. in an SSIS Script task.
I would code that script to read through the file looking for that header row ID Name Address, and then write that line and the rest of the file out to a new file.
Then I would load that new file using the SSIS Flat File Source component.
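A minimal sketch of that Script Task body in C#; the paths are hypothetical (in practice they'd come from package variables), and the header match assumes the row reads exactly "ID Name Address":

// Requires: using System; using System.IO; using System.Linq;
string inputPath = @"C:\import\raw.txt";    // hypothetical path
string outputPath = @"C:\import\clean.txt"; // hypothetical path

string[] lines = File.ReadAllLines(inputPath);

// Find the first line that starts with the known header text.
int headerIndex = Array.FindIndex(lines, l => l.TrimStart().StartsWith("ID Name Address"));

if (headerIndex >= 0)
{
    // Write the header row and everything after it to the clean file.
    File.WriteAllLines(outputPath, lines.Skip(headerIndex).ToArray());
}
// else: raise an error / fail the task, since no header row was found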
You might be able to avoid a script task if you'd prefer not to use one. I'll offer a few ideas here as it's not entirely clear which will be best from your example data. To some extent it's down to personal preference anyway, and also the different ideas might help other people in future:
Convert ID and ignore failures: Set the file source so that it expects however many columns you're forced into having by the header rows, and simply pull everything in as string data. In the data flow - immediately after the source component - add a data conversion component or conditional split component. Try to convert the first column (with the ID) into a number. Add a row count component and set the error output of the data conversion or conditional split to be redirected to that row count rather than causing a failure. Send the rest of the data on its way through the rest of your data flow.
This should mean you only get the rows which have a numeric value in the ID column - but if there's any chance you might get real failures (i.e. the file comes in with invalid ID values on rows you otherwise would want to load), then this might be a bad idea. You could drop your failed rows into a table where you can check for anything unexpected going on.
Check for known header values/header value attributes: If your header rows have other identifying features then you could avoid relying on the error output by simply setting up the conditional split to check for various different things: exact string matches if the header rows always start with certain values, strings over a certain length if you know they're always much longer than the ID column can ever be, etc.
Check for configurable header values: You could also put a list of unacceptable ID values into a table, and then do a lookup onto this table, throwing out the rows which match the lookup - then if you need to update the list of header values, you just have to update the table and not your whole SSIS package.
Check for acceptable ID values: You could set up a table like the above, but populate it with numbers. This is not great if you have no idea how many rows might be coming in, or whether the IDs are actually unique each time; but if you're only loading in a few rows each time and they always start at 1, you could chuck the numbers 1 - 100 into a table and throw away any rows you load which don't match when doing a lookup onto this table.
Staging table: This is probably the way I'd deal with it if I didn't want to use a script component, but in part that's because I tend to implement initial staging tables like this anyway, and I'm comfortable working in SQL - so your mileage may vary.
Pick up the file in a data flow and drop it into a staging table as-is. Set your staging table data types to all be large strings which you know will hold the file data - you can always add a derived column which truncates things or set the destination to ignore truncation if you think there's a risk of sometimes getting abnormally large values. In a separate data flow which runs after that, use SQL to pick up the rows where ID is numeric, and carry on with the rest of your processing.
This has the added bonus that you can just pick up the columns which you know will have data you care about in (i.e. columns 1 through 3), you can do any conversions you need to do in SQL rather than in SSIS, and you can make sure your columns have sensible names to be used in SSIS.
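A sketch of that second step in SQL Server 2008-compatible T-SQL (the staging and target table and column names are hypothetical):

-- Keep only the staging rows whose first column is purely numeric
-- (the real data rows), converting types as we go.
INSERT INTO dbo.Target (ID, Name, Address)
SELECT CAST(Col1 AS INT), Col2, Col3
FROM dbo.Staging
WHERE Col1 <> ''
  AND Col1 NOT LIKE '%[^0-9]%';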