SSIS Combining CSV files

I'm fairly new to SSIS and I'm stuck. I want to combine multiple CSV files and then load them into a database. The files all have the same layout. Examples:
File 1
Week        Text1
22-10-2018  58
29-10-2018  12

File 2
Week        Text2
22-10-2018  55
29-10-2018  48

File 3
Week        Text3
22-10-2018  14
29-10-2018  99

Expected result in the database:
Week        Text1  Text2  Text3
22-10-2018  58     55     14
29-10-2018  12     48     99
I got this far by selecting the documents, using a Sort and then a Merge Join. For 3 documents this took me 3 Sorts and 2 Merge Joins. I have to do this for about 86 documents, so there has to be an easier way.
Thanks in advance.

I agree with KeithL: I recommend that your final table look like this:
Week Outcome Value DateModified
=======================================================
22-10-2018 AI 58 2018-10-23 20:49
29-10-2018 AI 32 2018-10-23 20:49
22-10-2018 Agile 51 2018-10-23 20:49
29-10-2018 Agile 22 2018-10-23 20:49
If you want to pivot Weeks or outcomes, do it in your reporting tool.
Don't create tables with dynamically named columns; that's a bad idea.
Anyway, here is an approach that uses a staging table.
Create a staging table that your file will be inserted into:
Script 1:
CREATE TABLE Staging (
    [Week] VARCHAR(50),
    Value VARCHAR(50),
    DateModified DATETIME2(0) DEFAULT(GETDATE())
)
Import the entire file in, including headings. In other words, when defining the file format, don't tick 'columns in first row'
We do this for two reasons:
SSIS can't import files with different heading names using the same data flow
We need to capture the heading name in our staging table
After you import a file your staging table looks like this:
Week Value DateModified
=======================================
Week Agile 2018-10-23 20:49
22-10-2018 58 2018-10-23 20:49
29-10-2018 32 2018-10-23 20:49
Now select out the data in the shape we want to load it in. Run this in your database after importing the data to check:
Script 2:
SELECT [Week], Value,
       (SELECT TOP 1 Value FROM Staging WHERE [Week] = 'Week') AS Outcome
FROM Staging
WHERE [Week] <> 'Week'
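With the example rows staged above, this check should return:
Week        Value  Outcome
==========================
22-10-2018  58     Agile
29-10-2018  32     Agile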
Now add an INSERT and some logic to stop duplicates. Put this into an Execute SQL Task after the data import:
Script 3:
WITH SRC AS (
    SELECT [Week], Value,
           (SELECT TOP 1 Value FROM Staging WHERE [Week] = 'Week') AS Outcome
    FROM Staging
    WHERE [Week] <> 'Week'
)
INSERT INTO FinalTable ([Week], Value, Outcome)
SELECT [Week], Value, Outcome
FROM SRC
WHERE NOT EXISTS (
    SELECT * FROM FinalTable TGT
    WHERE TGT.[Week] = SRC.[Week]
    AND TGT.Outcome = SRC.Outcome
)
Now wrap this up in a Foreach Loop that repeats this for each file in the folder. Don't forget that you need to TRUNCATE TABLE Staging before importing each file.
In Summary:
Set up a for each file iterator
Inside this goes:
A SQL Task with TRUNCATE TABLE Staging;
A data flow to import the text file from the iterator into the staging table
A SQL Task with Script 3 in it
I've put the DateModified columns in the tables to help you troubleshoot.
Good things: you can run this over and over and reimport the same file and you won't get duplicates
Bad thing: Possibility of cast failures when inserting VARCHAR into DATE or INT
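If those casts do bite, a hedged variant of the SELECT in Scripts 2 and 3, assuming SQL Server 2012 or later for TRY_CONVERT, returns NULL instead of failing on bad values (style 105 is dd-mm-yyyy):
SELECT TRY_CONVERT(DATE, [Week], 105) AS [Week],  -- NULL if the date won't parse
       TRY_CONVERT(INT, Value) AS Value,          -- NULL if the value isn't numeric
       (SELECT TOP 1 Value FROM Staging WHERE [Week] = 'Week') AS Outcome
FROM Staging
WHERE [Week] <> 'Week'
You can then route or reject the NULL rows rather than have the whole insert fail.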

You can read your file(s) using a simple C# script component (Source).
You need to add your 3 columns to Output0:
Week as DT_DATE
Type as DT_STR
Value as DT_I4
// Read the whole file; [filename] is supplied by the package (e.g. a variable).
string[] lines = System.IO.File.ReadAllLines([filename]);
int ctr = 0;
string type = string.Empty; // initialize so the compiler knows it is assigned

foreach (string line in lines)
{
    string[] col = line.Split(',');
    if (ctr == 0) // first line is the header: capture the second column's name
    {
        type = col[1];
    }
    else          // data row: emit it together with the captured header name
    {
        Output0Buffer.AddRow();
        Output0Buffer.Week = DateTime.Parse(col[0]);
        Output0Buffer.Type = type;
        Output0Buffer.Value = int.Parse(col[1]);
    }
    ctr++;
}
After you load to a table you can always create a view with a dynamic pivot.
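For example, a sketch of such a dynamic pivot, assuming the FinalTable(Week, Outcome, Value) shape from the previous answer:
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build the column list from the distinct Outcome values, e.g. [Text1],[Text2],[Text3]
SELECT @cols = STUFF((
    SELECT DISTINCT ',' + QUOTENAME(Outcome)
    FROM FinalTable
    FOR XML PATH('')), 1, 1, '');

-- One column per Outcome, one row per Week
SET @sql = N'SELECT [Week], ' + @cols + N'
FROM (SELECT [Week], Outcome, Value FROM FinalTable) AS s
PIVOT (MAX(Value) FOR Outcome IN (' + @cols + N')) AS p;';

EXEC sp_executesql @sql;
This produces the Week / Text1 / Text2 / Text3 shape the question asked for, without hard-coding the column names.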

Nested sort in SELECT followed by Conditional INSERT based upon results of SELECT inquiry

I have been struggling with the following for some time.
The server I am using has MySQL 5.7 installed.
The issue:
I want to take recorded tank level readings from one table, find the difference between the last two records for a particular tank, and multiply this by a factor to get a quantity used.
The extracted quantity, if positive (otherwise 0), is then to be inserted into another table for further use.
The Quant value may be positive or negative as tanks fill and empty; I only need the used quantity, i.e. a falling level.
The two following tables are used:
Table 'tf_rdgs' sample (value1 is content height):
id  location  value1  reading_time
1   18        1500
2   18        1340
3   9         1600
4   18        1200
5   9         1400
6   18        1765    yyyy
7   18        1642    xxxx
Table 'flow' example:
id  location  Quant  reading_time
1   18        5634   dd-mm: HH-mm
2   18        0      dd-mm: HH-mm
3   18        123    current time
I do not need to go back over history; I am only interested in the latest level readings as each new reading is inserted.
I can get the following to work with a table of only one location.
INSERT INTO flow (location, Quant)
SELECT t1.location, (t2.value1 - t1.value1) AS Quant
FROM tf_rdgs t1
CROSS JOIN tf_rdgs t2 ON t1.reading_time > t2.reading_time
ORDER BY t2.reading_time DESC
LIMIT 1
It is not particularly efficient but works and gives the following return from the above table.
location  Quant
18        123
For a table with mixed locations, adding a WHERE t1.location = ... clause does not work.
The problems I am struggling with are:
How to nest the initial sorting by location for the subsequent query of the difference between the last two tank level readings.
A search on a single location is fine, rather than all tanks.
A conditional INSERT that inserts the 'Quant' value only if it is positive, or else inserts a 0 if it is negative (i.e. filling).
I have tried many permutations on these without success.
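For reference, a minimal, untested sketch of the shape such a statement might take, assuming MySQL's GREATEST() is acceptable for clamping negative differences to zero:
-- Latest two readings for one tank; the difference is clamped to 0
-- when the level rose (tank filling).
INSERT INTO flow (location, Quant)
SELECT t1.location, GREATEST(t2.value1 - t1.value1, 0) AS Quant
FROM tf_rdgs t1
JOIN tf_rdgs t2
  ON t2.location = t1.location
 AND t2.reading_time < t1.reading_time
WHERE t1.location = 18        -- the tank whose reading was just inserted
ORDER BY t1.reading_time DESC, t2.reading_time DESC
LIMIT 1;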
Once the above has been achieved, it needs to run from a trigger on the tf_rdgs table, conditional on the location of the inserted data, activated by each new reading inserted from the sensors on a particular tank.
I could achieve the above, with the exception of the conditional insert, if each tank had a dedicated table, but unfortunately I can't go there due to the existing data structure and usage.
Any direction or assistance on parts or the whole of this is much appreciated.

T-SQL query procedure-insert

I am wondering if any of you would be able to help me. I am trying to loop through Table 1 (which has duplicate plant codes) and, for each unique plant code, create a new record in the two other tables. Regarding the non-unique PTypeID, I can link any one of the PTypeIDs; it doesn't matter which I choose. The rest of the fields, like name etc., I will set myself. I am just stuck on the logic of how to insert into one table based on looping through another. Here is the data:
Table 1
PlantCode PlantID PTypeID
MEX 1 10
USA 2 11
USA 2 12
AUS 3 13
CHL 4 14
Table 2
PTypeID PtypeName PRID
123 Supplier 1
23 General 2
45 Customer 3
90 Broker 4
90 Broker 5
Table 3
PCreatedDate PRID PRName
2005-03-21 14:44:27.157 1 Classification
2005-03-29 00:00:00.000 2 Follow Up
2005-04-13 09:27:17.720 3 Step 1
2005-04-13 10:31:37.680 4 Step 2
2005-04-13 10:32:17.663 5 General Process
Any help at all would be greatly appreciated
I'm unclear on what relationship there is between Table 1 and either of the other two, so this is going to be a bit general.
First, there are two options and both require a select statement to get the unique values of PlantCode out of table1, along with one of the PTypeId's associated with it, so let's do that:
select PlantCode, min(PTypeId)
from table1
group by PlantCode;
This gets the lowest valued PTypeId associated with the PlantCode. You could use max(PTypeId) instead which gets the highest value if you wanted: for 'USA' min will give you 11 and max will give you 12.
Having selected that data you can either write some code (C#, C++, Java, whatever) to read through the results row by row and insert new data into table2 and table3. I'm not going to show that, but I'll show how to do it using pure SQL.
insert into table2 (PTypeId, PTypeName, PRID)
select PTypeId, 'YourChoiceOfName', 24 -- set PRID to 24 for all
from
(
    select PlantCode, min(PTypeId) as PTypeId
    from table1
    group by PlantCode
) x;
and follow that with a similar INSERT ... SELECT for table3, as sketched below.
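A sketch of that second insert might look like this; GETDATE() for PCreatedDate and the name literal are placeholders, like the 24 above:
insert into table3 (PCreatedDate, PRID, PRName)
select GETDATE(), 24, 'YourChoiceOfName' -- one row per unique PlantCode
from
(
    select PlantCode
    from table1
    group by PlantCode
) x;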
Hope that helps.

How to use SSIS to export data from multiple tables into one flat file?

I have two tables with different numbers of columns, and I need to export the data to a text file using SSIS. For example, I have a customer table, tblCustomers, and an order table, tblOrders:
tblCustomers (id, name, address, state, zip)
id name address state zip
100 custA address1 NY 12345
99 custB address2 FL 54321
and
tblOrders(id, cust_id, name, quantity, total, date)
id cust_id name quantity total date
1 100 candy 10 100.00 04/01/2014
2 99 veg 1 2.00 04/01/2014
3 99 fruit 2 0.99 04/01/2014
4 100 veg 1 3.99 04/05/2014
The result file would be as follows:
"custA", "100", "recordtypeA", "address1", "NY", "12345"
"custA", "100", "recordtypeB", "candy", "10", "100.00", "04/01/2014"
"custA", "100", "recordtypeB", "veg", "1", "3.99", "04/05/2014"
"custB", "99", "recordtypeA", "address2", "FL", "54321"
"custB", "99", "recordtypeB", "veg", "1", "2.00", "04/01/2014"
"custB", "99", "recordtypeB", "fruit", "2", "0.99", "04/01/2014"
Can anyone please guide me on how to do this?
I would create a Data Flow Task in an SSIS package. In it I would first add an OLE DB Source and point it at tblOrders. Then I would add a Lookup to pull in the data from tblCustomers, matching tblOrders.cust_id to tblCustomers.id.
I would use a SQL query that joins the tables and shapes the data, use that as the source, and export the result.
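A sketch of such a query, using only the columns shown in the question (the CASTs are needed because each output column of a UNION must have a single type):
-- Header (recordtypeA) and detail (recordtypeB) rows interleaved,
-- ordered so each customer's orders follow its address row.
SELECT c.name, c.id, 'recordtypeA' AS rectype,
       c.address AS col4, c.state AS col5, c.zip AS col6, NULL AS col7
FROM tblCustomers c
UNION ALL
SELECT c.name, c.id, 'recordtypeB',
       o.name, CAST(o.quantity AS VARCHAR(10)),
       CAST(o.total AS VARCHAR(20)), CONVERT(VARCHAR(10), o.[date], 101)
FROM tblOrders o
JOIN tblCustomers c ON c.id = o.cust_id
ORDER BY name, rectype;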
Note that the first row has 6 columns and the second has 7. It's generally difficult (well, not as easy as a standard file) to import these kinds of header/detail files. How is this file being used once created? If it needs to be imported somewhere, you'd be better off just joining the data up into 10 columns, or exporting the tables separately.

LOAD DATA INFILE selectively with external file

I have a file called /tmp/files.txt in the following structure:
652083 8 -rw-r--r-- 1 david staff 1055 Mar 15 2012 ./Highstock-1.1.5/examples/scrollbar-disabled/index.htm
652088 0 drwxr-xr-x 3 david staff 102 May 31 2012 ./Highstock-1.1.5/examples/spline
652089 8 -rw-r--r-- 1 david staff 1087 Mar 15 2012 ./Highstock-1.1.5/examples/spline/index.htm
652074 0 drwxr-xr-x 3 david staff 102 May 31 2012 ./Highstock-1.1.5/examples/step-line
652075 8 -rw-r--r-- 1 david staff 1103 Mar 15 2012 ./Highstock-1.1.5/examples/step-line/index.htm
I want to insert the filename (col 9), filesize (col 7), and last_modified (col 8) into a MySQL table, path.
To insert the entire line, I can do something like:
LOAD DATA INFILE '/tmp/files.txt' INTO TABLE path
How would I selectively insert the required information into the necessary columns here?
Specify dummy MySQL user variables (e.g. @dummy1) as the target for the unwanted values.
LOAD DATA INFILE '/tmp/files.txt'
INTO TABLE path
(@d1, @d2, @d3, @d4, @d5, @d6, filesize, @mon, @day, @ccyy_or_hhmi, filename)
SET last_modified = CONCAT(@mon,' ',@day,' ',@ccyy_or_hhmi)
With that, the first six values from the input line are ignored (the values are assigned to the specified user variables, which we disregard), the seventh value is assigned to the filesize column, the eighth through tenth values (the month, day, and year/time) are assigned to user variables, and the eleventh value is assigned to the filename column.
Finally, we use an expression to concatenate the month, day, and year/time values together and assign it to the last_modified column. (NOTE: the resulting string is not guaranteed to be suitable for assigning to a DATE or DATETIME column, since that last value can be either a year or a time.)
(I've made the assumption that table path has columns named filesize, last_modified, and filename, and that there aren't other columns in the table that need to be set.)
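For concreteness, a hypothetical DDL matching that assumption; last_modified is kept as a plain string because of the year-or-time caveat above:
CREATE TABLE path (
    filesize      INT UNSIGNED,
    last_modified VARCHAR(20),   -- raw 'Mar 15 2012' / 'May 31 09:27' text
    filename      VARCHAR(1024)
);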
Followup
If the data to be loaded is the output of a find command, I would be tempted to use the -printf action of find, rather than -ls, so I would have control over the output produced. For example:
find . -type f -printf "%b\t%TY-%Tm-%Td %TH:%TM\t%p\n" >/tmp/myfiles.txt
That would give you three fields, separated by tabs:
size_in_blocks modified_yyyy_mm_dd_hh_mi filename
That would be very easy to load into a MySQL table:
LOAD DATA INFILE '/tmp/myfiles.txt'
INTO TABLE path
(filesize, last_modified, filename)

Compare two sources and update a SQL Server table in SSIS?

I have an Excel source and a SQL Server table.
The Excel source columns are:
Mno Price1 Price2
111 10 20
222 30 25
333 40 30
444 34 09
555 23 abc
SQL Server table, named Product:
PId Mno Sprice BPrice
1 111 3 50
2 222 14 23
3 444 32 34
4 555 43 45
5 666 21 67
I want to compare the Excel source's Mno (model number) with the SQL Server Product table's Mno, and where they match I want to update the Product table's Sprice and BPrice.
Please tell me what steps I need to take.
I also want to validate the Excel sheet, because the Price2 column contains string values;
if there is a string value, I want to send a mail reporting which rows are wrong.
I am new to SSIS, so please give me details.
Read your new data in a source and use a Lookup component for the existing data. Direct row matches to an OLE DB Command for the update, and non-matches to a destination for inserts (if you want to add new products).
Personally I think the simplest way to do this is to use a data flow to bring the Excel file into a staging table, doing any clean-up if need be. Then, as the next step in the control flow, have an Execute SQL Task that does the update. Or, if you need either an update or an insert when the record is new, use a MERGE statement in the Execute SQL Task, as sketched below.
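A hypothetical MERGE for that Execute SQL Task, assuming the data flow landed the Excel rows in a staging table named ExcelStaging(Mno, Price1, Price2):
-- PId is assumed to be an identity column, so it is not inserted.
MERGE Product AS tgt
USING ExcelStaging AS src
    ON tgt.Mno = src.Mno
WHEN MATCHED THEN
    UPDATE SET tgt.Sprice = src.Price1,
               tgt.BPrice = src.Price2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Mno, Sprice, BPrice)
    VALUES (src.Mno, src.Price1, src.Price2);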
You can use a Merge Join Transformation with a full outer join (remember to sort your datasets before they enter the Merge Join Transformation), then send the output to a Conditional Split Transformation. The Conditional Split can determine whether a row needs to be updated, inserted, or deleted, and direct the flow to the appropriate transform.
This was off the top of my head, and there may be a simpler transform for this. I haven't had the opportunity to work with SSIS in almost a year, so I might be getting a bit rusty.