SSDT - columns out of synchronization - blocks update to table - mysql

Process - first off, I would like to just use a program to enter data into the SQL table, but due to certain ... issues ... this is not an option.
The Excel sheet has set column headers - they do not change - but the Excel data in the columns does change. New records are added and others are removed daily. I have a project in SSDT 2015:
Clear temp table - delete from tbldatemp
Data flow task:
a. Source: Excel sheet - access mode: table or view
b. Data Conversion 0-0 - some conversions necessary for SQL to accept the data
c. Destination - tbldatemp
Update main tblda - insert into ... where not exists a.[blah] = b.[blah], etc. (a rough sketch of this pattern is shown after this list)
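For reference, the "insert where not exists" step boils down to the following pattern; the key and data columns are placeholders, since the real column list isn't shown in the post.
-- Insert only the rows from the temp table that are not already in the main table.
-- KeyCol/Col1/Col2 are illustrative; substitute the real column names.
INSERT INTO tblda (KeyCol, Col1, Col2)
SELECT t.KeyCol, t.Col1, t.Col2
FROM tbldatemp AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM tblda AS a
    WHERE a.KeyCol = t.KeyCol
);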
So every day the data in Excel changes. I have a VB.NET program that executes the update package with the click of a button - and every day it doesn't work. So I open up the project in SSDT and see a nice yellow ! triangle on my data flow task stating that the columns are out of synchronization. When I open up the data flow task, the ! is now on the Excel source, stating that the columns are out of synchronization.
I have looked for DAYS trying to figure out how to fix this or get around it and cannot figure it out. Please help! Thank you.

Related

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different from the others I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database and then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data even though we don't have the fresh data set. This leaves us with an empty database, whereas we would prefer to have yesterday's data over no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files, add it to an ADO recordset, and only proceed if this part was successful. This doesn't seem to work for me, as I have all the data-capture activities in one data flow and I don't see a way to reuse that data. It would also seem wasteful of resources to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If the files are not present, update flags such as IsFile1Found to false and pass these flags to a stored procedure that truncates on a conditional basis, along the lines of the sketch below.
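A minimal sketch of such a conditional-truncate procedure, assuming a hypothetical flag parameter and destination table:
-- Truncate the destination only when the corresponding file flag was set by the package.
-- Procedure, parameter, and table names are illustrative.
CREATE PROCEDURE dbo.usp_TruncateIfFileFound
    @IsFile1Found BIT
AS
BEGIN
    IF @IsFile1Found = 1
        TRUNCATE TABLE dbo.File1Destination;
END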
If the file might be empty: using PowerShell through an Execute Process Task, you can extract the first two rows; if there are two rows (a header plus a data row), it means the data file is not empty. Then you can truncate the table and import the data.
Another approach could be:
You can load the data into staging tables and, from those staging tables, insert the data into the destination tables using a SQL stored procedure, truncating the staging tables after the data has been moved to all the destination tables. This way, before truncating a destination table, you can check whether its staging table is empty or not (see the sketch below).
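As a rough illustration of that check (the staging and destination table names here are hypothetical):
-- Reload the destination only if the staging table actually received rows.
-- dbo.StagingCustomers and dbo.Customers are illustrative names.
IF EXISTS (SELECT 1 FROM dbo.StagingCustomers)
BEGIN
    TRUNCATE TABLE dbo.Customers;

    INSERT INTO dbo.Customers (CustomerId, CustomerName)
    SELECT CustomerId, CustomerName
    FROM dbo.StagingCustomers;

    TRUNCATE TABLE dbo.StagingCustomers;
END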
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count the records and save the count to a variable. If a file isn't there, the package fails and you can stop execution at that point. There are some of these files whose actual count is interesting to me, though for the most part I don't care. If you don't care what the counts are, you can keep recycling the same variable; this will reduce the number of variables you have to create (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.

SSIS - Load Point in Time tables - package flow?

First, I am fairly new to SSIS. I've been working with it for a while, but not very often and only for very basic, simple things.
I have an SSIS package set up to load about 30 tables from a "live" data source. This has been working fine, and we add new tables and update existing tables all the time. We use these tables to query for reports. This was not a planned-out project; it was/is being done on the fly.
Then I was told I need to create "point in time" tables. Basically, duplicates of the tables above need to be loaded three times every year: once at the beginning, middle, and end of the year. The data cannot change, but just like the "live" tables, it will be a work in progress with new fields/tables being added. I was under the gun to get it done now (along with a bunch of other stuff, of course), and I did it in a way that "works for now" but obviously cannot stay this way. Here is basically what I did:
I created synonyms and revised all the stored procedures to use the synonyms
I set up an SSIS package with the following steps:
Execute SQL Task - Truncate tables
Execute SQL Task - Point synonyms to beginning of the year data source
Data Flow Task - Load tables
Execute SQL Task - Point synonyms to middle of the year data source
Data Flow Task - Load tables
Execute SQL Task - Point synonyms to end of the year data source
Data Flow Task - Load tables
Obviously it cannot stay this way, as I now have 4 copies of the same Data Flow Tasks. What a nightmare if I need to add fields or a new table! At least I am able to use the same stored procedures, though (the synonym re-pointing itself is sketched below). Can someone please help me with an idea for a better setup?
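For context, the "point synonyms" steps in the list above are just drop/re-create statements along these lines (the synonym and source database names are illustrative placeholders):
-- Re-point a synonym at a different point-in-time source before the next load.
-- dbo.syn_Orders and BeginningOfYearDB are placeholder names.
IF OBJECT_ID('dbo.syn_Orders', 'SN') IS NOT NULL
    DROP SYNONYM dbo.syn_Orders;

CREATE SYNONYM dbo.syn_Orders FOR BeginningOfYearDB.dbo.Orders;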

SSIS - variable driven package

I have a package that is essentially trying to copy 26 tables from Oracle to SQL Server.
It's not a complete table copy; we are looking for records that belong to certain 'Regions' of our company.
I pull the data from Oracle.
I started just doing this with elbow grease, but each of the 26 tables required several variables to do the deletes, the fetches, etc.
Long story short, I decided to use variables to represent the table names (source, temp and target).
This allowed me to copy/paste one sequence and effectively bypass a lot of clicking in BIDS.
The problem I am running into is that the metadata seems to be very fragile. Sequences all seem to run fine on their own, but when I run the whole package, it breaks - and never in the same place.
Is this approach just a bad idea with SSIS?
So just to take this off the board....
Each sequence container had the following ops
Script task - set variables
Execute SQL task - delete from temp
Data flow SourceToTemp -
OLE DB source - used a generic select * from tbl to temp_tbl
Derived Column 1 - add a timestamp column
OLE DB destination - map all the columns into a temp table (**THIS IS THE BIG PROBLEM CHILD**)
Execute SQL task - delete from target
Execute SQL task - insert into target, select from temp (a T-SQL sketch of these last two steps follows this list)
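Expressed purely in T-SQL (rather than SSIS variables and expressions), the variable-driven delete/reload steps amount to dynamic SQL roughly like this; the table names below are placeholders for whatever the variables hold:
-- Delete the target and reload it from the temp table, with names supplied at run time.
-- @TargetTable and @TempTable stand in for the SSIS variables; validate them before use.
DECLARE @TargetTable SYSNAME = N'dbo.Customers';
DECLARE @TempTable   SYSNAME = N'dbo.temp_Customers';
DECLARE @sql NVARCHAR(MAX);

SET @sql = N'DELETE FROM ' + @TargetTable + N';'
         + N' INSERT INTO ' + @TargetTable
         + N' SELECT * FROM ' + @TempTable + N';';

EXEC sys.sp_executesql @sql;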
The OLE DB destination is the piece that kept breaking.
Since it references variables, I had to be very careful at design time to set the variables correctly before opening one of the data flows.
I am pretty sure this is the problem. Since I cannot say with certainty when SSIS refreshes metadata in the design environment, I can't be sure if/when sequence X refreshed while the variables were set to support sequence Y.
So while it conceptually should work at run time, dev time is a change-control nightmare.
I have changed all the OLE DB destinations to point to a hard-coded table name. This is really a small concession, since there are 4 SQL statements that are still driven by variables (saving me a lot of clicking and typing).
This small change has eliminated the 'shifting sands' problem.
Take-away lesson: don't have an OLE DB destination be based on a variable.
Thanks for the comments.

Import Excel to SQL Server 2008

I need to create a process to import a multi-tabbed Excel spreadsheet into SQL Server 2008 R2. Each tab will be a different table in the database. This will need to be done weekly, and the imports should be automated. Ideally I want to pop the spreadsheet into a folder [or have some intern do it] and have SQL run a procedure that looks in this folder and adds the data to the tables in this db. I would also like to have another table that tracks the imports and date-stamps them. I really have no idea where to even start here, as I'm a pretty huge noob when it comes to T-SQL.
There is a nice article by Microsoft - http://support.microsoft.com/kb/321686 - that outlines the processes involved.
The process is simply:
SELECT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\test\xltest.xls;Extended Properties=Excel 8.0')...[Customers$]
Where XLImport3 is the table you want to import into and the data source is the Excel sheet you want to import from.
If you're limited solely to TSQL, the above two answers will show you some ideas. If you have access to either Data Tools or Business Intelligence, with SSIS, you can automate it with the assumption that each sheet in the Excel workbook matches each time. With SSIS, you'll use a Data Flow task and each sheet will be imported into the table that you want. When you're ready for the file the next week, you'll drop it into the folder and run the SSIS package.
However, if the sheet names change (for instance, one week the sheets are called Cats, Dogs, Rain and the next week it's Sulfur, Fire, Hell), then this would cause the package to break. Otherwise, if only the data within the worksheets changes, then this can be completely automated with SSIS.
Example article: https://www.simple-talk.com/sql/ssis/moving-data-from-excel-to-sql-server---10-steps-to-follow/
Below is the code to insert data from a CSV file into a given table. I don't know what the full requirements are for the project, but if I were you I would just separate each table into a different file and then run a proc that inserts the data into each of the tables.
BULK INSERT TABLE_NAME
FROM 'c:\filename.csv'
WITH
(
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
)
insert into import_history (filename, import_date) values ('your_file_name', getdate())
Also, for the table that tracks imports and timestamps them, you could just insert a row into that table after each bulk insert, as shown above; a sketch of such a table follows.
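A minimal sketch of that tracking table, matching the import_history insert shown above (the column types are assumptions):
-- One row per imported file, stamped with the load date.
CREATE TABLE dbo.import_history
(
    import_id   INT IDENTITY(1, 1) PRIMARY KEY,
    filename    VARCHAR(260) NOT NULL,
    import_date DATETIME     NOT NULL DEFAULT (GETDATE())
);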
Also, here's a link to tutorial on bulk inserting from a csv file that may also help: http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
It's very simple. Right-click the database in SQL Server (2008), select Tasks and then Import Data.
Now change the Data Source to Microsoft Excel. Choose the path of the Excel file by clicking the Browse button and click Next.
Choose the SQL Server instance and the database into which the Excel data is to be imported.
Select Copy data from one or more tables or views and click Next.
Now select the sheets to be imported to SQL Server.
Click Next.
Now click Finish.
The wizard now imports the data from Excel to SQL Server; click Close when it is done.
Here is the table

How to track status of rows successfully processed or failed in SSIS data flow task?

I have a very simple data flow task reading data from a flat file and inserting the data into a table. At the same time, I would like to write to an audit table how many rows have been inserted, the created date, and so on.
How can I do that easily?
If you are interested only in the number of rows successfully processed or the number of rows that encountered errors, then you can make use of the built-in SSIS logging feature. Please check the steps mentioned below. I hope the example gives you an idea. I have displayed only two columns from the log table, but there are other useful fields like starttime, endtime, etc. The example was created in SSIS 2008 R2.
Click on the SSIS package.
On the menus, select SSIS --> Logging...
On the Configure SSIS Logs: dialog, select the provider type and click Add. I have chosen SQL Server for this example. Check the Name checkbox and provide the data source under the Configuration column. Here SQLServer is the name of the connection manager. SSIS 2008 or SSIS 2008 R2 will create a table named dbo.sysssislog and a stored procedure dbo.sp_ssis_addlogentry in the database that you selected. Refer to screenshot #1 below. The table name in SSIS 2005 is dbo.sysdtslog90 and the stored procedure is named dbo.sp_dts_addlogentry.
If you need the rows processed, select the checkbox OnInformation. Here in the example, the package executed successfully, so the log records were found under OnInformation. You may need to fine-tune this event selection according to your requirements. Refer to screenshot #2 below.
Here is a sample package execution within the data flow task. Refer to screenshot #3 below.
Here is a sample output of the log table dbo.sysssislog. I have only displayed the columns id and message; there are many other columns in the table. In the query, I am filtering the output only for the package named 'Package1' and the event 'OnInformation'. You can notice that the records with ids 7, 14 and 15 contain the rows processed. Refer to screenshot #4 below. A sketch of that query is shown after these steps.
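In case it helps, the query behind that output looks roughly like this (the package name 'Package1' comes from the example above; depending on how events bubble up, you may need to match the data flow task's name in the source column instead):
-- Pull the OnInformation messages (which include the "rows written" counts)
-- for one package from the built-in SSIS log table.
SELECT id, message
FROM dbo.sysssislog
WHERE [source] = 'Package1'
  AND [event] = 'OnInformation'
ORDER BY id;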
Hope that helps.
Screenshot #1:
Screenshot #2:
Screenshot #3:
Screenshot #4:
You can multicast the flat file, or use a trigger on the table you're inserting data into. If it's a table that's being audited, you'll probably want to know whenever any data is inserted. A sketch of the trigger approach follows.
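A minimal sketch of such an audit trigger, assuming hypothetical dbo.TargetTable and dbo.AuditLog names (adapt the columns to your own audit table):
-- After each insert into the target table, record the row count and the load time.
-- dbo.TargetTable and dbo.AuditLog are illustrative names, not from the original post.
CREATE TRIGGER dbo.trg_TargetTable_AuditInsert
ON dbo.TargetTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.AuditLog (rows_inserted, created_date)
    SELECT COUNT(*), GETDATE()
    FROM inserted;
END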