Load incomplete text file into a table using SSIS - ssis

I have a large text file with 7 columns delimited by the pipe symbol (|), and I need to load it into a table using SSIS. A few rows in the text file have no data for the last column. Because that value is missing, the last column ends up holding the entire row.
For example
Id|name|lastname|firstname|sal|Email
1 |AABB|AA|BB|20|abc#gmail.com
2|XYZ|X|YZ|30
In the second row there is no data for the Email column, and the row does not end with a | symbol.
After loading, the data in the table looks like this:
1 AABB AA BB 20 abc#gmail.com
2 XYZ X YZ 30 XYZ|X|YZ|30
Ideally, the Email column should load as blank or NULL for the second row, but that is not happening.
Can anyone suggest how to resolve this issue?
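If the file can be pre-processed before SSIS reads it, one workaround is to pad short rows so that every line has as many pipe delimiters as the header; the missing trailing fields then arrive as empty strings, which can be mapped to blank or NULL. A minimal Python sketch, assuming the header row is complete and that no field contains a quoted or embedded |; the function name `pad_rows` is made up for illustration:

```python
from typing import Iterable, Iterator


def pad_rows(lines: Iterable[str], delimiter: str = "|") -> Iterator[str]:
    """Yield each line, padding short data rows with empty trailing fields.

    The expected field count is taken from the first (header) line.
    Assumes no field contains the delimiter inside quotes.
    """
    rows = iter(lines)
    header = next(rows, None)
    if header is None:  # empty input: nothing to yield
        return
    header = header.rstrip("\r\n")
    expected = header.count(delimiter) + 1
    yield header
    for line in rows:
        line = line.rstrip("\r\n")
        if not line.strip():  # skip blank lines instead of padding them
            continue
        missing = expected - (line.count(delimiter) + 1)
        # Append one delimiter per missing field; longer rows are left unchanged.
        yield line + delimiter * max(missing, 0)
```

For example, `for line in pad_rows(open("input.txt")): print(line)` writes the padded rows to standard output (the file name is a placeholder); the Flat File connection manager can then point at the padded copy.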

Related

Replace entire row based on duplicate columns in a CSV file

I have two CSV files, each with two columns. File A is the master file and defines the order of the items, which is important. File B has updated information for some (but not all) of the items, and that information needs to replace the old information in file A.
How do I replace the old values in column 2 of file A with the new values from column 2 of file B, but only for rows whose column 1 value also appears in file B?
For example:
File A
Name,Number
Bob Smith,12
Mary West,67
Joe Soap,77
Edith Little,41
File B
Name,Number
Mary West,83
Edith Little,16
Desired result
Name,Number
Bob Smith,12
Mary West,83
Joe Soap,77
Edith Little,16
I feel like there should be a simple solution to this that I'm just missing, but I haven't had any luck with searching for a method.
Edit:
I tried to solve the problem using replace duplicates in Google Sheets, which gave the correct values, but the order was lost. I ran into the same problem in Sublime Text: I can keep the new values easily, but I can't find a way to keep them in the positions of the old values.
Try the following formula:
=INDEX(IFNA({Q2:Q7,IFERROR(VLOOKUP(Q2:Q7,T2:U5,2,0),R2:R7)}))
(Adjust the ranges to match your data, and the argument separators to your locale.)
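The same idea (walk file A in order and look each name up in file B) can also be done outside a spreadsheet. A minimal Python sketch using the standard csv module, assuming both files have a header row and that each name appears at most once in file B; `merge_updates` is a hypothetical helper name:

```python
import csv
import io


def merge_updates(master_csv: str, updates_csv: str) -> str:
    """Return master_csv with column 2 replaced by the value from updates_csv
    wherever column 1 matches. The master file's row order is preserved."""
    updates_reader = csv.reader(io.StringIO(updates_csv))
    next(updates_reader, None)  # skip file B's header
    new_values = {row[0]: row[1] for row in updates_reader if len(row) >= 2}

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    master_reader = csv.reader(io.StringIO(master_csv))
    header = next(master_reader, None)
    if header is not None:
        writer.writerow(header)  # copy file A's header unchanged
    for row in master_reader:
        # Only rows whose name also appears in file B get a new value.
        if len(row) >= 2 and row[0] in new_values:
            row[1] = new_values[row[0]]
        writer.writerow(row)
    return out.getvalue()
```

To use it with real files, read each file into a string (e.g. `open("A.csv", newline="").read()`) and write the returned text to a new file.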

Create txt file with 3 columns from csv file

I am trying to create a txt file with 3 columns to use as a timing file for fMRI analysis.
I need to extract the values from one column of a CSV file and add two numbers that are the same for every row.
The data are organised with a title row in which each column represents one variable; the remaining rows contain numbers.
I'm having some trouble with this:
In the CSV file, many rows are blank. I'm not a great coder, and nothing I have tried in Unix gives me a txt file with 3 columns: in the rows that are blank in the CSV file, I only get 2 columns. Naively, I've tried things like:
awk -F, '{OFS=",";print $40, "2", "1"}' infile.csv > outfile.txt
I would prefer to choose the column to extract by its title (e.g. "reactiontime") rather than by its number ($40). Is that possible?
In my attempts, the output in the txt file is still comma-separated, and I would prefer it to be tab-separated.
Does anyone have a general tip or piece of code for solving this kind of problem?
I would learn a lot from that, and it would also help with my immediate issue.
Best,
Andreas
The output I want is something like this:
2 2 1
4 2 1
5.5 2 1
7 2 1
8.8 2 1
9 2 1
11 2 1
12.2 2 1
15.5 2 1
20 2 1
21 2 1
25.5 2 1
27 2 1
The input looks like this:
trials_demo.thisRepN,trials_demo.thisTrialN,trials_demo.thisN,trials_demo.thisIndex,trials.thisRepN,trials.thisTrialN,trials.thisN,trials.thisIndex,img.started,img.stopped,key_resp.keys,key_resp.rt,key_resp.started,key_resp.stopped,startClick.started,startClick.stopped,textGender.started,textGender.stopped,choiceGender.response,choiceGender.rt,choiceGender.started,choiceGender.stopped,click.started,click.stopped,key_resp_3.keys,key_resp_3.rt,key_resp_3.started,key_resp_3.stopped,nextClick.started,nextClick.stopped,key_resp_2.keys,key_resp_2.rt,key_resp_2.started,key_resp_2.stopped,infoText.started,infoText.stopped,demo.response,demo.rt,demo.started,demo.stopped,foler.started,foler.stopped,instruksjoner.started,instruksjoner.stopped,happy.started,happy.stopped,sad.started,sad.stopped,fix_test1.started,fix_test1.stopped,fix_test2.started,fix_test2.stopped,text.started,text.stopped,key_scannerwait.keys,key_scannerwait.rt,key_scannerwait.started,key_scannerwait.stopped,profileImg_4.started,profileImg_4.stopped,you_2.started,you_2.stopped,yblack_2.started,yblack_2.stopped,nblack_2.started,nblack_2.stopped,fix1_2.started,fix1_2.stopped,fix2_2.started,fix2_2.stopped,profileImg_2.started,profileImg_2.stopped,word_2.started,word_2.stopped,tilbakemelding_2.started,tilbakemelding_2.stopped,attraktiv_2.response,attraktiv_2.rt,attraktiv_2.started,attraktiv_2.stopped,attraktivTekst_2.started,attraktivTekst_2.stopped,profileImg_3.started,profileImg_3.stopped,moodRating.response,moodRating.rt,moodRating.history,moodRating.started,moodRating.stopped,foler_2.started,foler_2.stopped,happy_2.started,happy_2.stopped,sad_2.started,sad_2.stopped,instruksjoner_2.started,instruksjoner_2.stopped,fix1r.started,fix1r.stopped,fix2r.started,fix2r.stopped,participant,session,date,expName,psychopyVersion,frameRate,
,,,,,,,,9.808291642460063,None,s,1.6973256938272243,10.258865118647009,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
,,,,,,,,,,,,,,9.808291642460063,None,12.015022674224383,None,Menn,0.9867124325364784,12.015022674224383,None,12.910541409969483,None,s,0.6776523915204962,12.910541409969483,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,13.612686069261144,None,r,23.58497321998675,16.580722495828013,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,13.612686069261144,None,None,None,3.002709451733608,None,43.17218368482372,49.17307505178451,43.17218368482372,49.17307505178451,43.17218368482372,49.17307505178451,43.17218368482372,49.17307505178451,40.19522619822328,42.705546306123324,42.67223233843197,43.17218368482372,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
1,0,1,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,None,None,3.0083721569972113,None,52.18802669584511,58.18914552356,52.18802669584511,58.18914552356,52.18802669584511,58.18914552356,52.18802669584511,58.18914552356,49.20497252200221,51.70471324137725,51.688084382191846,52.18802669584511,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
2,0,2,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,16,5.703,3.003153287607347,None,61.187218265838055,None,61.187218265838055,None,61.187218265838055,None,61.187218265838055,None,58.204176409363754,60.703914254686424,60.68726815118316,61.187218265838055,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,66.92143940778078,None,s,52.06480291182652,66.92143940778078,None,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,143,1,2020_Feb_07_1705,findr,3.2.4,59.768517041775716,
etc.
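One way to select the column by its title, skip the blank rows, and write tab-separated output is to let a CSV reader map each row to the header names. A minimal Python sketch, assuming the column is identified by its exact header text (for example "key_resp.rt" in the sample header; "reactiontime" would work the same way if that were the header) and treating both empty cells and PsychoPy's None as blank. The constants 2 and 1 come from the desired output above, and `timing_file` is a hypothetical name:

```python
import csv
import io


def timing_file(csv_text: str, column: str, constants: tuple = ("2", "1")) -> str:
    """Build tab-separated lines 'value<TAB>2<TAB>1' from one named CSV column.

    Rows where the column is empty or the literal string 'None' are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    lines = []
    for row in reader:
        # row.get() can return None when a row has fewer fields than the header.
        value = (row.get(column) or "").strip()
        if value and value != "None":
            lines.append("\t".join((value, *constants)))
    return "".join(line + "\n" for line in lines)
```

Usage could look like `open("outfile.txt", "w").write(timing_file(open("infile.csv", newline="").read(), "key_resp.rt"))`. Looking the column up by name means the script keeps working if columns are added or reordered.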

SSIS reading flatfile header column

Can you point me in the right direction on how to achieve the following in SSIS?
I have a flat file that looks like this:
ColumnA ColumnB ColumnC ColumnD ColumnN
1 x APPLE Random1 MoreRandomData1
2 y ORANGE Random2 MoreRandomData2
3 z OTHER Random3 MoreRandomData3
... and I need to store the data in a table in the following format:
ColumnA, ColumnB, BigBlurColumn
1 x ColumnC:APPLE, ColumnD:Random1, ColumnN:MoreRandomData1
2 y ColumnC:ORANGE, ColumnD:Random2, ColumnN:MoreRandomData2
3 z ColumnC:OTHER, ColumnD:Random3, ColumnN:MoreRandomData3
Here are my questions:
1. How can I read the header (column names) of a flat file?
2. Is it possible to pivot the result of #1?
If I can manage both #1 and #2, the rest will be fairly easy to do in SSIS. Obviously I could script this, but my client insists on using SSIS because it is their standard ETL tool.
Any ideas on how I can achieve above scenario?
Thanks
In the Flat File Connection Manager, uncheck the "Column names in the first data row" option. Then go to the Advanced tab, delete all columns except one, and change its length to 4000.
In the Data Flow Task, add a Script Component that splits each row and:
Read the columns headers from the first row
Generate the desired output columns in all remaining rows
The following answers (for different situations, but still helpful) should give you some insight:
SSIS ragged file not recognized CRLF
how to check column structure in ssis?
SSIS reading LF as terminator when its set as CRLF
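Script Components are written in C# or VB.NET, so the following is only a language-neutral Python sketch of the logic described above: take the column names from the first row, keep the first two columns as they are, and fold the rest into `Header:value` pairs. It assumes whitespace-separated fields with no embedded spaces, as in the sample; `blur_rows` is a made-up name:

```python
from typing import Iterable, Iterator, Optional, Tuple


def blur_rows(lines: Iterable[str], keep: int = 2,
              sep: Optional[str] = None) -> Iterator[Tuple[str, ...]]:
    """Yield (kept columns..., BigBlurColumn) tuples.

    The first line is treated as the header. Columns after the first `keep`
    are combined into 'Header:value' pairs joined by ', '.
    """
    it = iter(lines)
    first = next(it, None)
    if first is None:  # empty input
        return
    header = first.split(sep)
    for line in it:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(sep)
        pairs = [f"{name}:{value}" for name, value in zip(header[keep:], fields[keep:])]
        yield (*fields[:keep], ", ".join(pairs))
```

In a real Script Component the same steps would run per input row, with the header captured from the first row and each tuple sent to the output buffer.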
Try loading the data into a staging table, then use the STRING_AGG() function to concatenate the data into the format you want and move it to the destination table.

Comparing rows in two text files in SSIS

I have two files, described below. TextFile1.txt is the output of an SSIS package execution, but some rows are missing from it. The file I should get is TextFile2.txt. I want to compare the two files to find the missing rows.
TextFile1.txt contains 20 columns and 31449 rows.
TextFile2.txt also contains 20 columns and 32447 rows.
Your Conditional Split should test for NULL keys coming from the Merge Join (a left outer join from TextFile2.txt to TextFile1.txt on the key columns) and direct those rows to an output; they are the rows missing from TextFile1.txt.
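Outside SSIS, the same "which rows are missing" check can be sketched as an anti-join. A minimal Python version, assuming rows are compared as complete lines (rather than on key columns, as the Merge Join would) and that duplicate rows should be counted individually; `missing_rows` is a hypothetical name:

```python
from collections import Counter
from typing import Iterable, List


def missing_rows(actual_lines: Iterable[str], expected_lines: Iterable[str]) -> List[str]:
    """Return rows of expected_lines that are absent from actual_lines.

    Works as a multiset difference, so a row expected twice but present once
    is reported once. Order follows expected_lines.
    """
    remaining = Counter(line.rstrip("\r\n") for line in actual_lines)
    missing = []
    for line in expected_lines:
        row = line.rstrip("\r\n")
        if remaining[row] > 0:
            remaining[row] -= 1  # consume one matching occurrence
        else:
            missing.append(row)
    return missing
```

Called as `missing_rows(open("TextFile1.txt"), open("TextFile2.txt"))`, it would list the rows present in TextFile2.txt but not in TextFile1.txt.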

How to parse data, split into multiple rows, and save to a flat file using SSIS?

I have source data as follows:
ID Data
1 text text text
2 text text text
In SSIS, I need to make a transformation which will result in
ID Data
1 text
1 text
1 text
2 text
2 text
2 text
The destination needs to be a flat file. Is this transformation possible, and if so, how? I tried a Derived Column with ID + REPLACE((DT_WSTR,4000)Data," ","\n"), but that seems to be the wrong approach.
I solved this with a Derived Column and its REPLACE string function, replacing the spaces in the text with |, and then followed the guide at Split multi value column into multiple records.
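The split itself (one output row per space-separated token, each carrying its ID) can be sketched as follows. This is a minimal Python illustration of the logic, not SSIS code; it assumes tokens are separated by spaces and writes space-delimited output with an `ID Data` header. `split_to_rows` and `to_flat_file` are hypothetical names:

```python
from typing import Iterable, Iterator, Tuple


def split_to_rows(records: Iterable[Tuple[int, str]],
                  sep: str = " ") -> Iterator[Tuple[int, str]]:
    """Expand each (id, data) pair into one (id, token) pair per token."""
    for record_id, data in records:
        for token in data.split(sep):
            if token:  # skip empty tokens produced by repeated separators
                yield record_id, token


def to_flat_file(rows: Iterable[Tuple[int, str]]) -> str:
    """Render rows as space-delimited text with an 'ID Data' header line."""
    lines = ["ID Data"] + [f"{record_id} {token}" for record_id, token in rows]
    return "\n".join(lines) + "\n"
```

Feeding it the sample, `to_flat_file(split_to_rows([(1, "text text text"), (2, "text text text")]))` produces the six `ID Data` rows shown in the question.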
In SSIS this is achieved using the Unpivot transformation:
http://technet.microsoft.com/en-us/library/ms141723.aspx