I have an SSIS package that I use to import Excel data. I need to add a column to the table when I import the data; however, the column is contrived data joined from another table that already exists in the SQL Server database.
Does anybody know how I would even begin to do this?
I've tried "derived column"; however, the data that populates the column is not derived from the source Excel data but comes from joining that data to the other table.
Thanks
You can use a Lookup transformation in addition to the methods from @HLGEM.
I know of two methods. First, you can use a Merge Join in the data flow. This tends to be slow because you have to sort both sources for the merge. If your data set is not large, this might not be too bad though.
If your data source is large, I prefer to import the data to a work table in one data flow first. Then the data source in the second data flow (the one that imports to the production table) would be a query that joins the work table to the existing table you want to grab other information from. This is more time-consuming to set up, but here we never import anything without a work table because it makes going back to research data import issues so much easier. It also makes it easier to clean up the data before import, in my opinion, as I am not a fan of doing the cleanup in the data flow.
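The work-table approach above can be sketched outside SSIS as follows. This is only an illustration: an in-memory SQLite database stands in for SQL Server, and the table and column names (`customer`, `import_work`, `sales`, `customer_code`, `region`) are made up for the example. In a real package, the SELECT would be the source query of the second data flow.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_code TEXT PRIMARY KEY, region TEXT);  -- existing table
    CREATE TABLE import_work (customer_code TEXT, amount REAL);           -- work table
    CREATE TABLE sales (customer_code TEXT, amount REAL, region TEXT);    -- production table
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("C1", "North"), ("C2", "South")])

# Step 1 (first data flow): load the raw spreadsheet rows into the work table unchanged.
excel_rows = [("C1", 10.0), ("C2", 25.5), ("C9", 7.0)]  # C9 has no match in customer
conn.executemany("INSERT INTO import_work VALUES (?, ?)", excel_rows)

# Step 2 (source query of the second data flow): join the work table to the existing table.
conn.execute("""
    INSERT INTO sales (customer_code, amount, region)
    SELECT w.customer_code, w.amount, c.region
    FROM import_work AS w
    LEFT JOIN customer AS c ON c.customer_code = w.customer_code
""")
conn.commit()

loaded = conn.execute(
    "SELECT customer_code, amount, region FROM sales ORDER BY customer_code"
).fetchall()
print(loaded)
```

The LEFT JOIN keeps imported rows that have no match, so they end up with an empty `region` and are easy to find afterwards. Use an inner join instead if unmatched rows should be dropped.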
I was given an Excel (CSV) sheet containing database metadata.
I'm asking if there's a simple way to import the csv and create the tables from there?
Data is not part of this question. The CSV looks like this:
logical_table_name, physical_table_name, logical_column_name, physcial_column_name, data_type, data_length
There's about 2000 rows of metadata. I'm hoping I don't have to manually create the tables. Thanks.
I don't know of any direct import or creation. However, if I had to do this and couldn't find one, I would import the Excel file into a staging table (just a direct data import). I'd add a unique auto-ID column to the staging table to keep the rows in order.
Then I would use some queries to build table and column creation commands from the raw data. Unless this was something I was setting up to do a lot, I would keep it dead simple and not try to get fancy. Build an individual add-column command for each column. Build a CREATE TABLE command from the first row for each table. Sort them all by the order ID, tables before columns. Then you should be able to just copy the script column, check the commands, and go.
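As a variation on the idea above, the commands can also be generated by a small script instead of queries against a staging table. The sketch below is not the exact method described: it emits one CREATE TABLE statement per physical table rather than separate add-column commands. The sample rows, the bracket quoting, and the rule that `data_length` is appended in parentheses when present are all assumptions. The header is spelled as in the question.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical sample in the column layout from the question (header spelled as given there).
metadata_csv = """logical_table_name,physical_table_name,logical_column_name,physcial_column_name,data_type,data_length
Customer,CUST,Customer Id,CUST_ID,int,
Customer,CUST,Customer Name,CUST_NM,varchar,100
Order,ORD,Order Id,ORD_ID,int,
"""

def build_create_statements(text):
    """Group metadata rows by physical table and emit one CREATE TABLE per table."""
    tables = OrderedDict()
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        column_type = row["data_type"].strip()
        length = (row["data_length"] or "").strip()
        if length:
            # Assumption: a non-empty length is simply appended, e.g. varchar(100).
            column_type += f"({length})"
        tables.setdefault(row["physical_table_name"].strip(), []).append(
            f"    [{row['physcial_column_name'].strip()}] {column_type}"
        )
    return [
        f"CREATE TABLE [{name}] (\n" + ",\n".join(cols) + "\n);"
        for name, cols in tables.items()
    ]

statements = build_create_statements(metadata_csv)
print("\n\n".join(statements))
```

The output is still meant to be read and checked by hand before it is run, as suggested above.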
I have created many SSIS packages in the past, though the need for this one is a bit different than the others which I have written.
Here's the quick description of the business need:
We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.
Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.
Problem: There are times when the files fail to come, and what happens is that we truncate the old data, though we don't have the fresh data set. This leaves us without a database where we would prefer to have yesterday's data over no data at all.
Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.
What I have tried: I tried to capture the data from the files, add it to an ADO recordset, and proceed only if this part was successful. This doesn't seem to work for me, as I have all the data capture activities in one data flow and I don't see a way to reuse that data. It would also seem wasteful of resources to do that and let the in-memory tables just sit there.
What have you done in a similar situation?
If a file is not present, set a flag such as IsFile1Found to false and pass these flags to a stored procedure that truncates conditionally.
If a file might be present but empty, you can use PowerShell through an Execute Process Task to extract the first two rows. If there are two rows (header + data row), the data file is not empty, and you can then truncate the table and import the data.
Another approach: load the data into staging tables, insert from these staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables once the data has been moved to all the destination tables. This way, before truncating a destination table, you can check whether its staging table is empty.
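The staging-table check could look roughly like the sketch below. It uses an in-memory SQLite database in place of SQL Server, and `refresh_from_staging`, `price`, and `price_staging` are hypothetical names. In SSIS the same logic would live in the stored procedure called after the data flow. Table names are interpolated into the SQL here, so they must come from trusted code, not user input.

```python
import sqlite3

def refresh_from_staging(conn, staging, destination):
    """Replace destination rows with staging rows, but only if staging has data."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {staging}").fetchone()
    if count == 0:
        return False  # nothing arrived: keep yesterday's data
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"DELETE FROM {destination}")
        conn.execute(f"INSERT INTO {destination} SELECT * FROM {staging}")
        conn.execute(f"DELETE FROM {staging}")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_staging (sku TEXT, price REAL)")
conn.execute("CREATE TABLE price (sku TEXT, price REAL)")
conn.execute("INSERT INTO price VALUES ('A', 1.0)")  # yesterday's data
conn.commit()

# Night 1: the file never arrived, so staging is empty and nothing is touched.
first_run = refresh_from_staging(conn, "price_staging", "price")
after_first = conn.execute("SELECT sku, price FROM price").fetchall()

# Night 2: staging was loaded, so the destination is replaced.
conn.execute("INSERT INTO price_staging VALUES ('A', 1.5)")
conn.commit()
second_run = refresh_from_staging(conn, "price_staging", "price")
after_second = conn.execute("SELECT sku, price FROM price").fetchall()
```

Doing the delete and the insert in one transaction means a failure halfway through leaves the old rows in place.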
I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.
What I ended up doing was to create a flat file connection to each file of interest and have a task count records and save to a variable. If a file isn't there, the package fails and you can stop execution at that point. There are some of these files whose actual count is interesting to me, though for the most part, I don't care. If you don't care what the counts are, you can keep recycling the same variable; this will reduce the creation of variables on your end (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.
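For comparison, the same kind of pre-check can be done in a script before the package truncates anything. This is a sketch and not part of the SSIS solution above: the file names are invented, and the rule "header plus at least one data row" is an assumption about what counts as a usable file.

```python
import csv
import os
import tempfile

def has_data_rows(path):
    """Return True if the file exists and has a header plus at least one data row."""
    if not os.path.isfile(path):
        return False
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        first_data_row = next(reader, None)
    return header is not None and first_data_row is not None

def safe_to_truncate(paths):
    """Only allow the truncate step when every expected file passes the check."""
    return all(has_data_rows(path) for path in paths)

# Demo with hypothetical file names in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "customers.csv")
    empty = os.path.join(folder, "orders.csv")
    missing = os.path.join(folder, "invoices.csv")  # never created
    with open(good, "w", newline="") as f:
        f.write("id,name\n1,Ann\n")
    with open(empty, "w", newline="") as f:
        f.write("id,total\n")  # header only
    results = {
        "good_only": safe_to_truncate([good]),
        "with_empty": safe_to_truncate([good, empty]),
        "with_missing": safe_to_truncate([good, missing]),
    }
print(results)
```

Only the first two lines of each file are read, which keeps the check cheap even for large files.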
I need to import some data from a very large CSV file, about 1 GB.
Instead of importing everything, I want to import only the matching data; I think that will be easier and faster than importing all of it.
I need to search the "Post Code District" column of the CSV file and, if it contains LS1, LS2 or LS10, import the matching rows into a table in SQL. How can I do this?
Misconception. You think that filtering a text file against a database table is going to be faster than just loading the entire file into the database.
I suppose there are extreme cases where this might be true. But, in general, the safest way to handle these types of situations is:
Import the file into a staging table.
Add indexes, as necessary to the staging table for performance.
Run a query to copy the data you want from the staging table.
I could phrase this a different way. In the time it would take you to figure out how to efficiently combine information from the file and a database table, you could probably go through the above process 10-50 times.
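The three steps above might look like this in miniature. It is a sketch under assumptions: SQLite replaces the real database, the sample rows and the `staging` and `target` table names are invented, and the district column is assumed to hold the bare district code.

```python
import csv
import io
import sqlite3

# Hypothetical sample; the real file would be opened from disk and read the same way.
sample = io.StringIO(
    "Address,Post Code District\n"
    "1 High St,LS1\n"
    "2 Low Rd,LS10\n"
    "3 Mill Ln,LS11\n"
    "4 Park Ave,BD1\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (address TEXT, district TEXT)")
conn.execute("CREATE TABLE target (address TEXT, district TEXT)")

# 1. Import the whole file into the staging table.
reader = csv.reader(sample)
next(reader)  # skip the header row
conn.executemany("INSERT INTO staging VALUES (?, ?)", reader)

# 2. Index the column used for filtering.
conn.execute("CREATE INDEX ix_staging_district ON staging (district)")

# 3. Copy only the wanted rows. IN compares whole values, so LS11 is not picked up.
conn.execute("""
    INSERT INTO target
    SELECT address, district FROM staging
    WHERE district IN ('LS1', 'LS2', 'LS10')
""")
conn.commit()

matched = conn.execute("SELECT district FROM target ORDER BY address").fetchall()
print(matched)
```

Note that a "contains LS1" test such as `LIKE '%LS1%'` would also match LS10 through LS19, so an exact comparison is usually what is wanted here.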
I need to migrate existing database values to a new database. I have two databases, test and test new, which I created with the same data. I made all my changes in test; now I need to migrate those changes to test new without affecting the existing values.
If table schema is different, how will I then go about doing this? In my prev job, what I did was import data (in my case, from Access) into my destination (MySQL) leaving table structures, then use SQL to select data and manipulate as required into final destination tables.
In my case, I don't have documentation for the old database, and the columns were not named meaningfully (e.g. 'field1', 'field2', etc.), so I needed to trace through the application code to find out what the columns mean. Is there any better way? Also, sometimes columns contain multiple values as delimited data; is reading the code the only way?
It sounds like you know what to do, but are just not keen to do it.
If there is no documentation then it makes sense that you will have to go to the code to figure out what it does. Regarding porting it across you will most likely have to write custom scripts that pull the data, manipulate it and insert it into the new table based on the new structure.
There are some tools to generate migration scripts, i.e. scripts that generate inserts for all your data. I think MySQL Workbench does it, but it most likely won't be sufficient since your tables have different structures.
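A custom script of the kind mentioned above could be as small as the following sketch. Everything specific in it is assumed for illustration: SQLite stands in for MySQL, the tables `legacy_contact`, `contact`, and `contact_phone` are invented, and the meaning of `field1` and `field2` is the sort of mapping you would have to recover from the application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy_contact (field1 TEXT, field2 TEXT);  -- imported as-is
    CREATE TABLE contact (name TEXT);                        -- new structure
    CREATE TABLE contact_phone (name TEXT, phone TEXT);      -- one row per phone number
""")
conn.executemany("INSERT INTO legacy_contact VALUES (?, ?)",
                 [("Ann", "555-0100;555-0101"), ("Bob", "")])

# Mapping worked out from the application code (assumed here):
# field1 = contact name, field2 = semicolon-delimited phone numbers.
legacy_rows = conn.execute("SELECT field1, field2 FROM legacy_contact").fetchall()
for name, delimited in legacy_rows:
    conn.execute("INSERT INTO contact VALUES (?)", (name,))
    # Split the delimited column and skip empty entries.
    for phone in filter(None, (p.strip() for p in (delimited or "").split(";"))):
        conn.execute("INSERT INTO contact_phone VALUES (?, ?)", (name, phone))
conn.commit()

phone_rows = conn.execute(
    "SELECT name, phone FROM contact_phone ORDER BY phone"
).fetchall()
print(phone_rows)
```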
I would like to bring in an XML source and do data conversion and update it in a table. Data from this table will be used to update another table. How to accomplish this in SSIS?
I understand the first two steps. But lost after that.
XML Source (under dataflow task)
Data Conversion
OLE DB Destination? (If I use OLE DB Destination, then I cannot use that as a source again to update another table). What component should I be using to accomplish this?
TIA
Within a data flow you can split the records to go to multiple tables using either a Conditional Split (if you want some records to go one way and some to go another way) or a Multicast transformation if you want all records to go to both destinations. We use a Multicast to create two staging tables, one where the raw data from the file will stay and one where the data will be cleaned and transformed before going into our prod tables. This lets us easily research whether some problem data that came in was due to our transformation process (a bug) or to bad data being sent (a problem at the client end, but one that might require more steps to handle if they can't fix it).
You can also have multiple data flows that all have the same source. Or you can insert to one staging table and then have a second data flow or exec SQL task to move that data to where you want it.
Use the OLE DB Destination to inject your XML source data into your staging table. Then, in your control flow use an Execute SQL task after your data flow task to execute a stored procedure or T-SQL script to move your data from the staging table into the production table(s) and truncate the staging table if required.
I've found that SSIS is great for ETL work, but moving data around inside a DB or aggregation work is best carried out using T-SQL in stored procs. Easier to write, control and you know you're not going to have any RBAR shenanigans you can happen upon in a DFT.
YMMV