How to parse a text file in SSIS

I am new to SSIS and am having trouble parsing a text file that contains the sample data below.
The requirements are:
-> Capture the number after IH1 (454756567) and insert it into one column as InvoiceNumber.
-> Insert the data between ABCD1234 and ABCD2345 into another column as TotalRecord.
Many thanks for the help.
ABCD1234
IH1 454756567 686575634
IP2 HJKY TXRT
IBG 23455GHK
ABCD2345
IH1 689343256 686575634
IP2 HJKY TXRT
IBG 23455GHK
ABCD5678

This script component processes the entire file. You need to create the output columns yourself; they are all currently handled as strings.
This assumes your file format is consistent. If IH1 and IP2 do not have 2 columns ALL the time, I would recommend a for loop from 1 to len - 1 (where len is the number of columns on the line) to process them, and sending the records to their own output.
public string recordID = String.Empty;
public override void CreateNewOutputRows()
{
string filePath = ""; //put your filepath here
using (System.IO.StreamReader sr = new System.IO.StreamReader(filePath))
{
while (!sr.EndOfStream)
{
string line = sr.ReadLine();
if (line.StartsWith("ABCD")) //Anything that identifies the start of a new record; StartsWith avoids an exception on lines shorter than 4 characters
// line.Split(' ').Length == 1 also meets your criteria.
{
recordID = line;
Output0Buffer.AddRow();
Output0Buffer.RecordID = line;
}
string[] cols = line.Split(' ');
switch (cols[0])
{
case "IH1":
Output0Buffer.InvoiceNumber = cols[1];
Output0Buffer.WhatEverTheSecondColumnIs = cols[2];
break;
case "IP2":
Output0Buffer.ThisRow = cols[1];
Output0Buffer.ThisRow2 = cols[2];
break;
case "IBG":
Output0Buffer.Whatever = cols[1];
break;
}
}
}
}
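The grouping logic can also be sketched outside SSIS. The following is a minimal, hypothetical Java illustration, not the script component itself (which has to be written in C# or VB). The class and field names are invented for the example. It assumes that every record starts with a line beginning with ABCD, and that TotalRecord means the lines between two markers, excluding the markers themselves.

```java
import java.util.ArrayList;
import java.util.List;

class InvoiceFileParser {
    // Hypothetical holder for one record; the fields mirror the column
    // names from the question (InvoiceNumber, TotalRecord).
    static class InvoiceRecord {
        final String recordId;
        String invoiceNumber = "";
        final StringBuilder totalRecord = new StringBuilder();

        InvoiceRecord(String recordId) {
            this.recordId = recordId;
        }
    }

    // Groups the lines that follow each "ABCD" marker into one record.
    static List<InvoiceRecord> parse(List<String> lines) {
        List<InvoiceRecord> records = new ArrayList<>();
        InvoiceRecord current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.startsWith("ABCD")) {
                // Assumption: a line starting with "ABCD" opens a new record.
                current = new InvoiceRecord(line);
                records.add(current);
                continue;
            }
            if (current == null) {
                continue; // data before the first marker is ignored
            }
            String[] cols = line.split("\\s+");
            if (cols[0].equals("IH1") && cols.length > 1) {
                current.invoiceNumber = cols[1];
            }
            // Every detail line becomes part of the record's full text.
            if (current.totalRecord.length() > 0) {
                current.totalRecord.append('\n');
            }
            current.totalRecord.append(line);
        }
        // A trailing marker with no detail lines only closes the last record.
        records.removeIf(r -> r.totalRecord.length() == 0);
        return records;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "ABCD1234", "IH1 454756567 686575634", "IP2 HJKY TXRT", "IBG 23455GHK",
            "ABCD2345", "IH1 689343256 686575634", "IP2 HJKY TXRT", "IBG 23455GHK",
            "ABCD5678");
        for (InvoiceRecord r : parse(sample)) {
            System.out.println(r.recordId + " -> " + r.invoiceNumber);
        }
    }
}
```

Ported to the script component, each InvoiceRecord would correspond to one Output0Buffer row.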

You'll need to do this in a script component.

Related

Read CSV first line using CsvBeanReader

I have a CSV file with 3 columns and no header row, but it follows a fixed pattern (the first column holds a URL; the second and third columns hold checksums). To process the individual column values, I am using CsvBeanReader.
With the code below, CsvBeanReader reads values starting from the 2nd line:
ICsvBeanReader beanReader = new CsvBeanReader(new FileReader(path),
CsvPreference.EXCEL_NORTH_EUROPE_PREFERENCE);
String[] header = beanReader.getHeader(true);
header = new String[] { "docURL", "shaCheckSum", null };
CellProcessor[] processors = new CellProcessor[3];
processors = getChecksumProcessors();
ValueObj docRecord;
while ((docRecord = beanReader.read(ValueObj.class, header, processors)) != null) {
docRecordList.add(docRecord);
}
private static CellProcessor[] getChecksumProcessors() {
return new CellProcessor[] { new NotNull(), new NotNull(), null };
}
How can I make CsvBeanReader read the first line of the CSV file, which contains data?
The CSV file contains data from the first line onwards, like below:
ftp://folder_struc/filename.pdf;checksum1;checksum2
Please let me know.
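I guess you should omit the String[] header = beanReader.getHeader(true); line, since that call reads the first line of the file as a header and so consumes your first data row. Try just String[] header = new String[] {...}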
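To illustrate why the first row goes missing, here is a small sketch that uses only the standard library rather than Super CSV. The class name, method name, and the second sample row are hypothetical. Discarding one line before the read loop stands in for a header-reading call on a file that has no header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HeaderlessCsvDemo {
    // Reads semicolon-separated rows. When consumeHeader is true the first
    // line is read and discarded, as a header-reading call would do.
    static List<String[]> readRows(String csv, boolean consumeHeader) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            if (consumeHeader) {
                reader.readLine(); // on a headerless file this drops a data row
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    rows.add(line.split(";"));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // The second row is made up for the example.
        String csv = "ftp://folder_struc/filename.pdf;checksum1;checksum2\n"
                   + "ftp://folder_struc/other.pdf;checksum3;checksum4\n";
        System.out.println("rows with header read: " + readRows(csv, true).size());
        System.out.println("rows without header read: " + readRows(csv, false).size());
    }
}
```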

Kettle PDI - Modified JavaScript - Json function not available

I'm using Kettle PDI 6.0 running on Windows Server 2012. I need to use the Modified Java Script Value step to handle a JSON object. I tried something like this:
var jsondata = JSON.parse(result);
And I get this:
"TypeError: Cannot find function parse in object test value test value test value test value test value test value test value test value test value test value. (script#3)"
I have already searched Google for a solution but found nothing that matches. I think something may be wrong with my installation.
Note: I have already tried to use the command:
import java.util.*;
But that command is not recognized (it is not marked in bold).
I get:
missing ; before statement (script#2)
Maybe the Java functions are not available.
I wrote my own function to work around the problem, and I am posting it here to help anyone who has the same problem. If anyone can help solve the initial problem, I am still interested.
You can paste the code below into your "Modified Java Script Value" step after receiving the JSON response from the service or reading it from a file. Note that you need to change the names of the variables that you want to find in the JSON.
The result field is a JSON value.
//Script here
function findInArray(myValue, myArray){
var myResult='';
if(myArray.indexOf(myValue) > -1){
myResult = true;
} else {
myResult = false;
}
return myResult;
}
function getAttributeValue(Atribute, Object)
{
start = indexOf(Object,Atribute);
for (i= start; i < Object.length; i++)
{
if (substr(Object,i,1) == ":")
{
start_value = i+1;
break;
}
}
for (i= start_value; i < Object.length; i++)
{
end_value = i;
if (substr(Object,i,1) == ",")
{
break;
}
}
AttributeValue = replace(substr(Object, start_value, end_value-start_value),'"','');
if (indexOf(AttributeValue, "null") >= 0)
{
AttributeValue = null;
}
return AttributeValue ;
}
// Retrieve status
if (findInArray("status",result))
{
var status = getAttributeValue("status", result);
}
else
{
var status = "";
}
// Retrieve _id
if (findInArray("_id",result))
{
var mandrill_id = getAttributeValue("_id", result);
}
else
{
var mandrill_id = "";
}
// Retrieve reject_reason
if (findInArray("reject_reason",result))
{
var reject_reason = replace(getAttributeValue("reject_reason", result),"}","");
}
else
{
var reject_reason = "";
}
Yes, JSON.parse is not available in the older ECMAScript version implemented by the Rhino JS engine built into Kettle, but you can handle JSON in Kettle using eval.
var resultObj = eval('(' + result + ')');
// now you can iterate over the foo elements of the original JSON in result
for (var i = 0; i < resultObj.length; i++) {
    Alert('foo number ' + i + ' value = ' + resultObj[i].foo);
}
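This is not JavaScript running in a browser, so the usual browser-side objections to eval do not apply. It still executes whatever the string contains, so only use it on JSON from a source you trust.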

Exiting While Loop

I am trying to read values from a MySQL database and check whether the id that needs to be updated already exists in the database. I have been able to make everything else in the program work except the part that checks the database. Here is some of my code:
public void updateStatement() throws SQLException{
try
{
connnectDatabse();
}
catch (ClassNotFoundException e)
{
System.out.println("Could not connect to database..");
}
System.out.println("How many entries would you like to update?");
kb=new Scanner(System.in);
int numEntries = kb.nextInt();
int counter =0;
String newName=null, newDepartment =null;
int newSalary=0, newId =0;
int counterValues =0;
while(counterValues != numEntries){
System.out.println("Please enter 5 to view current entries in database\n");
selectStatement();
int idToUpdate =0;
boolean idVerify =false;
//Check if the user id exists in database or not
while(!(idVerify)){
System.out.println("\nPlease enter the ID of the record to update");
idToUpdate = kb.nextInt();
idFoundInDatabase(idArrayList, idToUpdate);
}
System.out.println("Please choose the number of column to update from below options.\n1.ID\n2.Name\n3.Salary\n4.Department");
int columnToUpdate = kb.nextInt();
switch(columnToUpdate){
case 1:
System.out.println("What will be the new id value for the selected ID?");
newId = kb.nextInt();
query = "update employee set ID = ? where ID = ?";
break;
case 2:
System.out.println("What will be the new name for the selected ID?");
newName = kb.next();
query = "update employee set Name = ? where ID = ?";
break;
case 3:
System.out.println("What will be the new salary for the selected ID?");
newSalary = kb.nextInt();
query = "update employee set Salary = ? where ID = ?";
break;
case 4:
System.out.println("What will be the new department for the selected ID?");
newDepartment = kb.next();
query = "update employee set Department = ? where ID = ?";
break;
default:
System.out.println("Correct option not chosen");
}
PreparedStatement st = conn.prepareStatement(query);
if(columnToUpdate ==1){
st.setInt(1, newId);
st.setInt(2, idToUpdate);
}
else if(columnToUpdate ==2){
st.setString(1, newName);
st.setInt(2, idToUpdate);
}
else if(columnToUpdate ==3){
st.setInt(1, newSalary);
st.setInt(2, idToUpdate);
}
else{
st.setString(1, newDepartment);
st.setInt(2, idToUpdate);
}
//execute the prepared statement
st.executeUpdate();
System.out.println("Record successfully updated..");
counterValues++;
}
}
Below is the code that I am unable to exit. It is a separate method outside of the updateStatement() method.
The ArrayList contains the list of ids that are already in the database, and it has been populated successfully.
public boolean idFoundInDatabase(ArrayList<String> arrayList, int id){
boolean validId = false;
while(validId == true){
String idRead = String.valueOf(id);
for (int i=0; i<arrayList.size(); i++){
String elementRead =arrayList.get(i);
if(elementRead.equals(idRead)){
validId = true;
break;
}
else{
validId= false;
}
}
}
return validId;
}
}
If required I am also posting the lines of code where I get the result set to make the array list of ids.
while(result.next()){
String id =result.getString("ID");
idArrayList.add(id);
if(choice == 1 || choice == 2 || choice == 3 || choice == 4){
resultIs = result.getString(columnName);
System.out.println(resultIs);
}
else{
System.out.println(result.getString("ID")+"\t"+result.getString("Name")+
"\t"+result.getString("Salary")+"\t"+result.getString("Department") );
}
}
The problem is exiting the idFoundInDatabase method above. It should identify whether the id the user wants to update is in the database. Even on finding a correct id that exists in the database, it just keeps reading the values in the array list instead of going to the return statement. Am I also doing something wrong where I call this method? I have been stuck on this for almost a day and have debugged it many times. I am doing this just to get acquainted with JDBC and parameterized queries, so that I can do something similar in a bigger project. Any help is appreciated. Thank you.
A few things:
Your while loop inside of idFoundInDatabase() is not necessary, and because validId starts out false, its condition (validId == true) is never met, so the for loop never runs and the method always returns false. One simple iteration through the for loop over arrayList is enough.
You are returning a boolean value from that function, but your calling method does not capture and use it, so idVerify is never changed from false to true and your input loop repeats forever.
To sort that out, remove the while loop and change idFoundInDatabase(idArrayList, idToUpdate); to idVerify = idFoundInDatabase(idArrayList, idToUpdate); and then your input loop should terminate once a valid id is found.
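Put together, the two changes might look like the sketch below. This is a minimal stand-alone version, not the asker's full program. The class name, the sample ids, and the attempts array (which stands in for the values read with kb.nextInt()) are made up for illustration.

```java
import java.util.List;

class IdCheck {
    // Returns true when the id appears in the list of ids read from the database.
    // A single pass over the list is enough; no outer while loop is needed.
    static boolean idFoundInDatabase(List<String> ids, int id) {
        String idRead = String.valueOf(id);
        for (String element : ids) {
            if (element.equals(idRead)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> idArrayList = List.of("1", "2", "7"); // hypothetical ids
        int[] attempts = {5, 7};                            // stands in for user input
        boolean idVerify = false;
        int next = 0;
        while (!idVerify && next < attempts.length) {
            int idToUpdate = attempts[next++];
            // Capture the return value so the loop condition can change.
            idVerify = idFoundInDatabase(idArrayList, idToUpdate);
            System.out.println(idToUpdate + " found: " + idVerify);
        }
    }
}
```

Returning from inside the for loop also removes the need for the validId flag and the break.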

Sampling a datetimestamp and voltage from 1st line only of multiple .csv files

I wish to take selected data from a collection of CSV files. I have written code but am confused by its behaviour: it reads all the lines rather than just the first. What am I doing wrong, please?
string[] array1 = Directory.GetFiles(WorkingDirectory, "00 DEV1 2????????????????????.csv"); //excludes "repaired" files from array, and "Averaged" logs, if found, note: does not exclude duplicate files if they exist (yet)
Console.WriteLine(" Number of Files found with the filter applied = {0,6}", (array1.Length));
int i = 1;
foreach (string name in array1)
{
// sampling engine loop here, take first line only, first column DateTimeStamp and second is Voltage
Console.Write("\r Number of File currently being processed = {0,6}", i);
i++;
var reader = new StreamReader(File.OpenRead(name)); // Static for testing only, to be replaced by file filter code
reader.ReadLine();
reader.ReadLine(); // skip headers, read and do nothing
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var values = line.Split(',');
using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt",true))
{
string content = "";
{
content = content + values[0] + ",";
content = content + values[9] + ",";
}
outfile.WriteLine(content);
Console.WriteLine(content);
}
}
}
Console.WriteLine("SAMPLING COMPLETED");
Console.ReadLine();
Console.WriteLine("Test ended on {0}", (DateTime.Now));
Console.ReadLine();
}
}
You are using a while loop to read through all lines of the file. If you only want a single line, you can remove this loop.
Just delete the line:
while (!reader.EndOfStream)
{
And the accompanying close bracket
}
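The same idea, reading a single data line per file instead of looping to the end, can be sketched as follows. This is a hypothetical Java version rather than the asker's C# program. The class and method names and the sample content are invented. It assumes two header lines and the voltage in column 9, as in the question's code, and it returns an empty result when the file has no data line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Optional;

class FirstLineSampler {
    // Skips headerLines lines, then samples column 0 and voltageColumn from
    // the first data line only. Returns empty if that line is missing or short.
    static Optional<String> sampleFirstDataLine(Reader source, int headerLines, int voltageColumn) {
        try (BufferedReader reader = new BufferedReader(source)) {
            for (int i = 0; i < headerLines; i++) {
                reader.readLine(); // skip a header line, ignoring its content
            }
            String line = reader.readLine(); // one read, no while loop
            if (line == null) {
                return Optional.empty(); // empty or header-only file
            }
            String[] values = line.split(",");
            if (values.length <= voltageColumn) {
                return Optional.empty(); // not enough columns on this line
            }
            return Optional.of(values[0] + "," + values[voltageColumn]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up file content: two header lines, then data rows.
        String file = "header1\nheader2\n"
                    + "2015-01-01 00:00:00,a,b,c,d,e,f,g,h,12.6\n"
                    + "2015-01-01 00:00:01,a,b,c,d,e,f,g,h,12.7\n";
        System.out.println(sampleFirstDataLine(new StringReader(file), 2, 9).orElse("no data"));
    }
}
```

The try-with-resources block also closes the reader for each file, which the original loop never does.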

Checking for blank csv file, loop query

I'm trying to stop empty CSV files causing errors in my simple sampling program, which just grabs 2 values from each .csv file in a folder.
I have a null check, which now catches the empty file, but I'm unsure how to restructure my code so that it skips that file and moves on to the next one in the array. Any assistance is greatly welcomed.
foreach (string name in array1)
{
// sampling engine loop here, take first line only, first column DateTimeStamp and second is Voltage
Console.Write("\r Number of File currently being processed = {0,6}", i);
i++;
var reader = new StreamReader(File.OpenRead(name)); // Static for testing only, to be replaced by file filter code
var line = reader.ReadLine();
if (line == null)
{
Console.WriteLine("Null value detected");
Console.ReadKey();
break;
}
var values = line.Split(',');
reader.ReadLine();
if (values.Length == 89)
{
using (StreamWriter outfile = new StreamWriter(@"C:\SampledFileResults.txt", true))
{
string content = "";
{
content = content + values[0] + ",";
content = content + values[9] + ",";
}
outfile.WriteLine(content);
Console.WriteLine(content);
}
}
}
Console.WriteLine("SAMPLING COMPLETED");