In my CSV file, the data looks like this:
************* file format***************************
filename, abc
date,20141112
count,456765
id,1234
,,
,,
,,
name,address,occupation,id,customertype
sam,hjhjhjh,dr,1,s
michael,dr,2,m
tina,dr,4,s
*********************more than 30000 records in each load *************************************
I have got the file in the above format, and I want to take the date and count from the 2nd and 3rd rows; the actual data starts from the 9th row. Is it possible without a Script Task? I am not very good with scripting.
Can anyone please help me work out how to do this?
It is possible to do this without using a Script Task. The flow goes like this...
Pull 2 DFTs into your package: one to reformat your text file and split it into 2 separate text files, one holding your 2nd & 3rd rows and the other holding everything from the 9th row onward. The other DFT then does the rest of your processing, which is quite simple.
1st DFT --> Flat File Source --> Row Number Transformation (you can get this transformation for your SQL Server version from http://microsoft-ssis.blogspot.in/p/ssis-addons.html) --> Conditional Split (output 1: RowNumber == 2 || RowNumber == 3, output 2: RowNumber > 8) --> write the results to 2 different flat files, named _1 and _2 or whatever suits you.
Now you have the 2 flat files you need, ready to use as sources for your 2nd DFT...
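If you want to sanity-check which rows the Conditional Split should route where before building the package, here is a minimal sketch of the same row-number split outside SSIS (plain R, purely illustrative; the file names are assumptions):
# Illustrative only: mimic the RowNumber-based Conditional Split outside SSIS.
lines <- readLines("load_file.csv")        # assumed input file name
header_rows <- lines[c(2, 3)]              # RowNumber == 2 || RowNumber == 3 (date and count)
data_rows   <- lines[-(1:8)]               # RowNumber > 8 (the detail records)
writeLines(header_rows, "load_file_1.csv") # source for the date/count path of the 2nd DFT
writeLines(data_rows,   "load_file_2.csv") # source for the detail path of the 2nd DFT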
*If it solves your problem, mark it as answer.
How can we create a block size in JMeter with the CSV config?
I have 5 users and one Bulkuser.csv file with 4 columns.
The file has around 2000 values.
I wish to create a block of 400 values for each of my 5 threads (users).
1st user will use the 1st 400 values (rows 1-400)
2nd user will use the next 400 values (rows 401-800)
and so on..
How can we implement this? Is there a Beanshell PreProcessor script that could run for each data read and decide which file to read based on the thread number?
As of JMeter 5.3 this functionality is not supported; the only stable option I can think of is splitting your Bulkuser.csv into 5 separate files like user1.csv, user2.csv, etc., and using a combination of the __threadNum() and __CSVRead() functions for accessing the data, like:
${__CSVRead(user${__threadNum}.csv,0)} - reads the value from column 1 of user1.csv for the 1st thread (for the 2nd thread it will be user2.csv, etc.)
${__CSVRead(user${__threadNum}.csv,1)} - reads the value from column 2
.....
${__CSVRead(user${__threadNum}.csv,next)} - proceeds to the next row
More information: How to Pick Different CSV Files at JMeter Runtime
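If you need a quick way to do the one-time split of Bulkuser.csv into the 5 per-thread files, here is a minimal sketch in R (purely illustrative; it assumes the file has no header row and contains exactly 5 blocks of 400 rows):
# Split Bulkuser.csv into user1.csv ... user5.csv, 400 rows each
rows <- read.csv("Bulkuser.csv", header = FALSE, stringsAsFactors = FALSE)
block_size <- 400
for (i in 1:5) {
  block <- rows[((i - 1) * block_size + 1):(i * block_size), ]
  write.table(block, file = paste0("user", i, ".csv"),
              sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)
}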
I'm working on a flow where I get CSV files. I want to put the records into different directories based on the first field in the CSV record.
For example, the CSV file would look like this:
country,firstname,lastname,ssn,mob_num
US,xxxx,xxxxx,xxxxx,xxxx
UK,xxxx,xxxxx,xxxxx,xxxx
US,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
JP,xxxx,xxxxx,xxxxx,xxxx
I want to get the value of the first field, i.e. country, and put the records into the corresponding directory: US records go to the US directory, UK records go to the UK directory, and so on.
The flow that I have right now is:
GetFile ----> SplitText (line split count = 1 & header line count = 1) ----> ExtractText (line = (.+)) ----> PutFile (Directory = \tmp\data\${line:getDelimitedField(1)}). I need the header line to be replicated across all the split files for a different purpose, so I have to keep it.
The thing is, the incoming CSV file gets split into multiple flow files with the header successfully. However, the regex I have given in the ExtractText processor evaluates against the split flow files' CSV header instead of the record. So instead of getting US or UK in the "line" attribute, I always get "country", and all the files end up in \tmp\data\country. Please help me resolve this.
I believe getDelimitedField only works on a single line and is likely not moving past the newline in your split file.
I would advocate for a slightly different approach in which you could alter your ExtractText to find the country code through a regular expression and avoid the need to include the contents of the file as an attribute.
Using a regex of ^.*\n+(\w+) will match the first (header) line and then capture the first run of word characters on the next line, i.e. everything up to the first comma, placing it in capture group 1 of the attribute name you specify (e.g. country.1).
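To see what that pattern captures against the content of one split flow file, here is a quick check in R (purely illustrative, not part of the NiFi flow itself):
# One split flow file: the replicated header plus a single record
flowfile <- "country,firstname,lastname,ssn,mob_num\nUS,xxxx,xxxxx,xxxxx,xxxx"
# Same pattern as in ExtractText; capture group 1 holds the country code
m <- regmatches(flowfile, regexec("^.*\\n+(\\w+)", flowfile, perl = TRUE))
m[[1]][2]   # "US"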
I have created a template that should get the value you are looking for available at https://github.com/apiri/nifi-review-collateral/blob/master/stackoverflow/42022249/Extract_Country_From_Splits.xml
I have a column with multiple items, separated by semicolons, in each row, but I would like only the first item in each row. My data looks like this:
1 mmSM7.3.54;IGHV14-3*01;musIGHV236
2 mm7183.20.37;IGHV5-17*01;musIGHV219
3 mmIGHV5-9-1*02;musIGHV207;7183.14.25
4 mm7183.20.37;IGHV5-17*01;musIGHV219
5 mmIGHV7-1*03;S107.1.42
6 mmIGHV9-2*01;VH9.13;musIGHV242;VGAM3.8-2-59
7 mmmusIGHV231;SM7.2.49;IGHV14-2*01
I would like a column that has just the first item of each row that looks like this:
1 mmSM7.3.54
2 mm7183.20.37
3 mmIGHV5-9-1*02
4 mm7183.20.37
5 mmIGHV7-1*03
6 mmIGHV9-2*01
7 mmmusIGHV231
Does anyone know a way to do this? Any help would be great. Thank you.
Do you mean the first item in each record, up to the first semicolon (;)? You did not mention the file type, but if it is text, CSV, RTF or XLSX, you can do this quickly and easily in Excel or most other spreadsheet applications.
1) Launch Excel, use File > Open (change the file type to All Files) and open your file
2) Select the column or all the cells that contain your data
3) Click the Data tab and choose Text to Columns > Delimited > Next > check the Semicolon box > Finish
4) The first item in each record will now be in its own column. You can copy this column and save it in a new file or just delete all the stuff you don't want and save the original file in the same file format as before.
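If the data is already in R (the sample looks like printed data frame rows), the same extraction is a one-liner; a small sketch, using example values taken from the question:
# Keep only the text before the first semicolon in each value
items <- c("mmSM7.3.54;IGHV14-3*01;musIGHV236",
           "mm7183.20.37;IGHV5-17*01;musIGHV219",
           "mmIGHV5-9-1*02;musIGHV207;7183.14.25")
sub(";.*$", "", items)
# [1] "mmSM7.3.54"     "mm7183.20.37"   "mmIGHV5-9-1*02"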
I have an SSIS package that is supposed to load data into an Excel destination (a template file).
In the destination, the first row is a title and the 2nd row has the headers, so I do the following:
Select * from [TemplateName$A2:$AD10000]
But what happens is that it inserts the first set of data (from the SQL source) into the second row of the template, which contains the header names, and overwrites them. If I select A3 instead, it gives an error since the mapping needs column names.
Please suggest, thanks.
I just did some testing with this and I get the same exact behavior:
MyCol
<----- Data should start here
1 <----- Data actually starts here
2
3
My suggestion would be to remove the column headings in the sheet, uncheck "First row has column names" in your Excel connection, and then add the actual headings into your SQL data (converting everything to text). It would probably also be good to file a Connect item for this.
I have 2 almost identical CSV files, with the following differences:
The first has a column, "date".
The second doesn't have "date" and also has 50 fewer rows ("email" entries) than the 1st.
They are lists of subscribers with the date created. The second, however, is the updated list with the subscribers who wanted to be removed taken out, but it no longer has the date created.
Is there any way to import the "date" column from the 1st CSV into the 2nd CSV by matching on the "email" column, so I can get the correct date for each subscriber?
Sorry, there doesn't seem to be a ready-made command-line tool available for this (writing one is probably an evening's worth of effort).
You could look at different approaches; one more involved way is to load both files into database tables, do the merge there (using a SELECT with a JOIN on the two tables), and export the result back to CSV.
The simplest I could think of was to use R (given that you have header names in your CSVs):
csv1_data <- read.csv('/path/to/csv1.csv')   # full list, includes the "date" column
csv2_data <- read.csv('/path/to/csv2.csv')   # updated list, without "date"
merged_csv <- merge(csv2_data, csv1_data[, c("email", "date")], by = "email")
write.table(merged_csv, file = "/path/to/merged_csv.csv", sep = ",", row.names = FALSE)
The first 2 lines load the data into R, the 3rd line joins the "date" column from the first file onto the second file by matching on "email", and the final line exports the result as a CSV file with the headers. Rows that exist only in the first file (the removed subscribers) are dropped automatically by the join.
Hope this helps!