Import Flat File containing multiline fields in SSIS - csv

I would like to import a flat *.csv file in SSIS, but one field contains multiline text. I do not have a special record delimiter (and there is no way to get one), so the record delimiter is the carriage return/line feed \r\n (CRLF).
The problem is that when SSIS encounters a CRLF inside a multiline field, it moves on to the next record instead of continuing the field.
Here are the header and the first few lines:
"name", "firstname", "description", "age"
"John", "Smith", "blablablablablabla", 25
"Fred", "Gordon", "blablabla
blablablabla", 33
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
The example above contains 1 header and 3 records. SSIS interprets it as 1 header and 6 records and then, of course, raises errors.
I don't know how to handle this problem.
I hope you can help me.

According to your example, the Description field values can contain multiple carriage returns, which cause new lines to be created.
The following record, which spans multiple lines,
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
should appear as below for SSIS to see the expected number of columns:
"Bill", "Buffalo", "bllllllllllllaaaaaaa blaaaaaaa blaalalalaaaaaaaaaa", 44
There are a couple of approaches to resolving the formatting issue.
If possible, the easiest approach is to follow up with the person who created the file and have them produce it correctly. For example, if they are using SQL Server, they can apply the following to the Description field in their T-SQL statement to replace the carriage returns with a space. (Oracle also has a similar function.)
REPLACE(Description, CHAR(13),' ')
If you need to replace a line feed, use CHAR(10); to replace both, nest the calls: REPLACE(REPLACE(Description, CHAR(13), ' '), CHAR(10), ' ')
Otherwise, I understand that contacting the source of the file is not always possible. In that case, you can modify the text file programmatically before feeding it into SSIS. The following link discusses how to use Excel to do this; you can then save the result to a new CSV file and import that through SSIS.
http://www.mrexcel.com/forum/excel-questions/304939-importing-text-data-carriage-returns-into-excel.html
If you plan to run the SSIS package in a job, you can instead write a script task early in your control flow that does the same thing and bypasses Excel. The VB code provided in the link can easily be adapted to a script task.
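As a rough illustration of what such a pre-processing step could do, here is a minimal Java sketch (the answer itself points to VB code and a script task; Java is used here only for illustration). It swaps in a different technique from comma counting: a physical line is joined with the next one while the record holds an odd number of double quotes, meaning a quoted field is still open. It assumes every multiline field is enclosed in double quotes, as in the example, and that an embedded quote is written as a doubled quote (""), which leaves the parity unchanged. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MultilineCsvJoiner {
    // Joins physical lines into logical records. A record is complete once it holds
    // an even number of double quotes, i.e. no quoted field is left open.
    // Line breaks inside a quoted field are replaced with a single space.
    static List<String> joinRecords(List<String> physicalLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        int quoteCount = 0;
        for (String line : physicalLines) {
            if (current == null) {
                current = new StringBuilder(line);
                quoteCount = 0;
            } else {
                current.append(' ').append(line);
            }
            quoteCount += countQuotes(line);
            if (quoteCount % 2 == 0) {
                records.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            // Unterminated quoted field at end of input: keep what was read
            records.add(current.toString());
        }
        return records;
    }

    static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "\"name\", \"firstname\", \"description\", \"age\"",
                "\"Fred\", \"Gordon\", \"blablabla",
                "blablablabla\", 33",
                "\"Bill\", \"Buffalo\", \"bllllllllllllaaaaaaa",
                "blaaaaaaa",
                "blaalalalaaaaaaaaaa\", 44");
        // Prints 3 records: the header, Fred's and Bill's
        joinRecords(lines).forEach(System.out::println);
    }
}
```

In a real run you would read the lines with Files.readAllLines and write the joined records back with Files.write before SSIS imports the new file. Because it only looks at quotes, this variant also copes with commas inside the description, which a comma-counting approach does not.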
Hope this helps.

Given that the source of the text files cannot be contacted and that the number of columns in each CSV will vary, the best option for performing the import is a variation of option 2 of Answer #1 (modifying the file programmatically before the import). This requires some customization and a script task in the control flow.
On the server where the SSIS package will run, create a staging folder for a temporary text file. Each time a CSV file is processed, the script task creates a temporary file called "destFile.csv" from it in this location, and that temporary file is what you import.
Create two variables in the SSIS package: one for the source file and one for the destination file.
Create a Script Task and add the two variables to its ReadOnlyVariables so that they are passed to it.
Add the following C# to the script task, and remember to replace the sourceFile and destinationFile assignments at the top. They should be set from the new user variables just created (for example, Dts.Variables["User::SourceFile"].Value.ToString(), where User::SourceFile stands for whatever you named the variable).
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // In the script task, set these two paths from the package variables instead
            string sourceFile = @"C:\test\tempfile.csv";
            string destinationFile = @"C:\test\destFile.csv";
            List<string> lines = new List<string>();
            int headerCommaCount = 0;
            bool isHeader = true;

            if (File.Exists(sourceFile))
            {
                using (StreamReader file = new StreamReader(sourceFile))
                {
                    string line;
                    while ((line = file.ReadLine()) != null)
                    {
                        // Header line: its comma count is the base by which all following rows are compared
                        if (isHeader)
                        {
                            headerCommaCount = line.Split(',').Length - 1;
                            lines.Add(line); // save to the list
                            isHeader = false;
                            continue;
                        }
                        int commaCount = line.Split(',').Length - 1;
                        // Fewer commas than the header: the row was split by an embedded line break,
                        // so keep appending the following lines until the count is reached (or EOF)
                        while (commaCount < headerCommaCount)
                        {
                            string next = file.ReadLine();
                            if (next == null) break; // stop at end of file instead of looping forever
                            line = line + " " + next;
                            commaCount = line.Split(',').Length - 1;
                        }
                        lines.Add(line); // save to the list
                    }
                }
            }
            // Recreate the destination file and write the merged rows to it
            File.WriteAllLines(destinationFile, lines);
            //Console.ReadLine();
        }
    }
}
I wrote this quickly as a console application so that it would be easier to convert to a C# script task. It tested successfully against your example file. It iterates through the source text file, concatenates the lines that have been split apart, and saves the result to the destination file, which is recreated and repopulated on each run. You can test it first as a console application in Visual Studio, adding a Console.WriteLine(line) just above or below the lines.Add(line) calls. Note that it counts commas, so it assumes the multiline field itself contains no commas.
After this, all you need to do is import from the temporary destination file to your database.
Hope this helps.

Related

Parse a CSV after a PreProcessor script on JMeter

I'm trying to create a performance test on JMeter where I need to have a variable number of parameters.
This is the CSV file I'm using; in this case I need 2 variables:
inputParameter,var
7,v5
-2,v8
I found that it can be done by using JSR223 PreProcessor so I tried using this script
{
    BufferedReader reader = new BufferedReader(new FileReader("path"));
    String row = reader.readLine();
    String[] header = row.split(",");
    row = reader.readLine();
    String[] values = row.split(",");
    reader.close();
    for (int i = 0; i < header.length; i++) {
        String name = header[i];
        String value = values[i];
        sampler.addArgument(name, value);
    }
}
This script creates the variables as it should and assigns them the values of the first data row. But the problem is that I can't find a way to parse the CSV file after the script runs to change the variables' values.
I tried this
String value = "${"+name+"}";
But it does not get the value of ${inputParameter} that I get from the CSV Data Set Config; it just sends the literal, URL-encoded text %24%7BinputParameter%7D.
Is there any way to parse the CSV file after the script runs to modify the value of the variables created by it?
Thanks in advance!
Use vars
String value = vars.get(name);
vars is the JMeterVariables object available to scripts, e.g. vars.get("VAR1");
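To make the difference between vars.get(name) and building "${" + name + "}" concrete, here is a small plain-Java simulation. A Map stands in for JMeter's vars object, so this is not JMeter code and the names are hypothetical. The idea it sketches: read only the header names, then on every iteration look each column's current value up by name, assuming the CSV Data Set Config stores the row under those same variable names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderArguments {
    // Pairs each header name with its current value, looked up by name.
    // The Map parameter is a stand-in for JMeter's vars (JMeterVariables).
    static Map<String, String> buildArguments(String headerLine, Map<String, String> vars) {
        Map<String, String> arguments = new LinkedHashMap<>();
        for (String rawName : headerLine.split(",")) {
            String name = rawName.trim();
            // Look the value up; "${" + name + "}" would only produce literal text here
            arguments.put(name, vars.get(name));
        }
        return arguments;
    }

    public static void main(String[] args) {
        String header = "inputParameter,var";
        // Values as they would be after the CSV Data Set Config read the row "-2,v8"
        Map<String, String> vars = Map.of("inputParameter", "-2", "var", "v8");
        System.out.println(buildArguments(header, vars)); // {inputParameter=-2, var=v8}
    }
}
```

In the PreProcessor itself, the loop body would become sampler.addArgument(name, vars.get(name)), which is what the answer above suggests.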
Unfortunately your explanation doesn't make a lot of sense (at least to me). Going forward, consider providing:
The first 3 rows of your CSV file
The configuration of your CSV Data Set Config
The actual output of the HTTP Request sampler (Request -> Request Body tab of the View Results Tree listener)
The expected output of the HTTP Request sampler
The output of the Debug Sampler (Response Data -> Response Body tab of the View Results Tree listener)

Read a csv file that has a JSON column in SSIS?

I have the following CSV file with 4 columns. The last column, addresses, holds a history of 2 addresses in JSON format. I have tried to read it in SSIS, but it splits the JSON on the commas (,) instead of keeping all the addresses in one column.
I am using a flat-file connector for this. Is there any other source component for this type of content? How can I parse this in SSIS so that there are just 4 columns and the addresses appear all under one column?
id,title,name,addresses
J44011,Mr,James,"{""address_line_1"": 45, ""post_code"": ""XY7 10PG""},{""address_line_1"": 15, ""post_code"": ""AB7 1HG""}"
You can use a script component to process the JSON into its own detail table.
I created the following dataflow:
Here are the steps to the script component:
On Input Columns, add the id and addresses columns.
On Inputs and Outputs, add a new output and create its columns (remember to set the data types).
The script:
public class Addresses
{
public int address_line_1 { get; set; }
public string post_code { get; set; }
}
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
//Test if addresses exist, if not leave the Row processing
if (string.IsNullOrEmpty(Row.addresses)) return;
//Fix Json to make it an array of objects
string json = string.Format("[{0}]", Row.addresses);
//Load into an array of Addresses
Addresses[] adds = new System.Web.Script.Serialization.JavaScriptSerializer().Deserialize<Addresses[]>(json);
//Process the array
foreach (var a in adds)
{
rowsAddressesBuffer.AddRow();
rowsAddressesBuffer.ID = Row.id;
rowsAddressesBuffer.Address1 = a.address_line_1;
rowsAddressesBuffer.PostalCode = a.post_code;
}
}
Notes:
A class is added to store the results.
The JSON had to be fixed to create an array of objects.
You need to add a reference to System.Web.Extensions.
The output then goes to the load (destination). Make sure the text qualifier in the flat file connection manager is set to a double quote (").
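The two points above (a double-quote text qualifier, with doubled quotes "" inside the field, and wrapping the objects in [ ] to form a JSON array) can be shown outside SSIS with a small Java sketch. It is a hypothetical, simplified splitter for a single physical line, not the SSIS flat file parser: commas inside a qualified field are kept, and "" is unescaped to a single quote. It does not handle line breaks inside fields.

```java
import java.util.ArrayList;
import java.util.List;

class QualifiedCsvSplitter {
    // Splits one CSV line on commas, treating '"' as the text qualifier.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote inside a qualified field
                        i++;
                    } else {
                        inQuotes = false; // closing qualifier
                    }
                } else {
                    field.append(c); // commas are kept while inside quotes
                }
            } else if (c == '"') {
                inQuotes = true; // opening qualifier
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "J44011,Mr,James,\"{\"\"address_line_1\"\": 45, \"\"post_code\"\": \"\"XY7 10PG\"\"},"
                + "{\"\"address_line_1\"\": 15, \"\"post_code\"\": \"\"AB7 1HG\"\"}\"";
        List<String> fields = split(line);
        System.out.println(fields.size()); // 4
        // Wrap the comma-separated objects in brackets to get a JSON array, as the script component does
        System.out.println("[" + fields.get(3) + "]");
    }
}
```

With the example row this yields the four fields id, title, name and addresses, and the bracketed last field is the kind of JSON array the deserializer call in the script component expects.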
I have tried to read it in SSIS but it splits the JSON along the comma(,) instead of grouping all the addresses under one column.
To force SSIS to read each flat file row as 4 columns, open the flat file connection manager, go to the Advanced tab, and define only 4 columns. Make sure the last column's length is 4000. This forces SSIS to read the 4th column up to the end of the row without splitting it.
After importing the data into SQL Server, you can parse the JSON content using the OPENJSON() function (wrap the value in [ and ] first, since two comma-separated objects are not a single valid JSON document).
Parse and Transform JSON Data with OPENJSON (SQL Server)

SSIS Script Component or Task to check File Line Terminators and Fail if NOT CRLF

I'm a little new to using scripts for my ETL work, and I couldn't find anything related to this other than using a script to replace LF or CRLF with a value. Is it possible to use a script (or something else) to validate that my file uses only CRLF line terminators, and to fail the job if it uses anything else?
I want the job to fail so that I can report to the agency sending the files that they need to follow the specified format, and so that only CRLF files are loaded.
Thanks,
I found a way to handle what I was asking. I ended up creating a script task before my Data Flow that checks whether the file contains "\r\n". For this I used three package variables and passed them to my script: "FileName" (this could be a file path, but I used the same name as in the package), "ErrorMessage" and "IsCrLf". "IsCrLf" is a Boolean variable that records whether "\r\n" exists in the file. If it does not, ErrorMessage is populated and passed on to an e-mail alert.
Here is my code for my task:
public void Main()
{
    // Requires "using System.IO;" at the top of ScriptMain
    using (StreamReader r = new StreamReader(Dts.Variables["User::FileName"].Value.ToString()))
    {
        string s1 = r.ReadToEnd();
        string s2 = "\r\n";
        bool b = s1.Contains(s2);
        if (b)
        {
            Dts.Variables["User::IsCrLf"].Value = true;
            Dts.TaskResult = (int)ScriptResults.Success;
        }
        else
        {
            Dts.Variables["User::ErrorMessage"].Value = Dts.Variables["User::FileName"].Value.ToString() + Environment.NewLine + "File does not contain the expected CRLF format.";
            Dts.Events.FireError(0, "Error", "File does not contain the expected CRLF format.", string.Empty, 0);
            Dts.TaskResult = (int)ScriptResults.Failure;
        }
    }
}
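Note that the script above accepts any file containing at least one CRLF, so a file that mixes CRLF with bare LF line endings would still pass. If those should be rejected too, the check has to inspect every line break. Here is a minimal Java sketch of that stricter rule (Java only for illustration; in SSIS the same loop would go into the C# script task). It assumes an ASCII-compatible encoding such as UTF-8 or ANSI: decoding the bytes as ISO-8859-1 maps each byte to one character, so CR and LF bytes are seen exactly. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class CrlfValidator {
    // True only if the text has at least one line break and every line break is CRLF.
    // A bare LF (not preceded by CR) or a bare CR (not followed by LF) fails the check.
    static boolean usesCrlfOnly(String text) {
        boolean sawBreak = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 >= text.length() || text.charAt(i + 1) != '\n') {
                    return false; // bare CR
                }
                sawBreak = true;
                i++; // skip the LF of this CRLF pair
            } else if (c == '\n') {
                return false; // bare LF
            }
        }
        return sawBreak;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CrlfValidator <file>");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to one char, so CR/LF bytes are seen as-is
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        boolean ok = usesCrlfOnly(text);
        System.out.println(ok ? "CRLF only" : "File does not contain the expected CRLF format.");
        System.exit(ok ? 0 : 1);
    }
}
```

Like the original script, this also rejects a file with no line break at all; the exit code (0 or 1) could drive the failure path.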

Using CSV file to read test data from

I need to test various links of a site (no login needed) with hundreds of users and loop the test a number of times using JMeter. I want to put those links in a CSV file so that all the links to be tested are read from the file.
How do I accomplish this task?
Prepare a CSV file with the list of your test parameters and use it to parameterize your test samplers, using at least one of the following:
CSV Data Set Config
Look into the following links for details:
How to get Jmeter to use CSV data for GET parameters?
Use jmeter to test multiple Websites
use csv parameters in jmeter httprequest path
Force a thread to use same input line when using CSV Data Set Config
Jmeter functions:
__CSVRead,
__StringFromFile.
Variables From CSV sampler from jmeter-plugins.
1. Prepare your test URLs in a CSV file, e.g. in the following format:
url1
url2
...
urlN
Make sure the test URLs don't contain the http:// prefix (as required by the HTTP Request's Server Name or IP field).
2. Structure your test plan as below:
CSV Data Set Config:
Filename: [path to your csv-file with test-urls]
Variable Names: testURL
Recycle on EOF?: True
Stop thread on EOF?: False
Sharing mode: Current thread
Thread Group:
Number of Threads: N
Loop Count: M
HTTP Request // your http call
Server Name or IP: ${testURL} // use variable with extracted URL
This will start N users, and each user will read M entries from the list of test URLs. If M is greater than the number of entries in the list, the user will recycle the list on EOF.
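To make the recycling behavior concrete, here is a plain-Java model of what one thread sees, assuming Sharing mode "Current thread" (each thread reads the file from its first line on its own) and one CSV line consumed per loop iteration. It is a model for illustration, not JMeter code, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CsvRecycleModel {
    // With "Recycle on EOF? = True", the reader wraps back to the first line after
    // the last one, so loop iteration i gets line i % size.
    static List<String> urlsForOneThread(List<String> urls, int loopCount) {
        List<String> sequence = new ArrayList<>();
        for (int i = 0; i < loopCount; i++) {
            sequence.add(urls.get(i % urls.size()));
        }
        return sequence;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("url1", "url2", "url3");
        // Loop Count M = 5 with only 3 entries: the thread recycles the list
        System.out.println(urlsForOneThread(urls, 5)); // [url1, url2, url3, url1, url2]
    }
}
```

With N threads in this sharing mode, each thread would produce this same sequence independently.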
One of the comments mentions that you can't read the CSV more than once per loop. You can have multiple threads, each reading the CSV file once, but then the file is closed and won't be read on the next loop. If instead you set the CSV to recycle, the file is read over and over again indefinitely. So the question becomes: how do you loop over a CSV file a fixed number of times rather than indefinitely?
I posted my answer to that in another post (https://stackoverflow.com/a/64086009/4832515), but I'll copy and paste it here in case that link stops working in the future.
I couldn't find a simple solution to this. I ended up using BeanShell scripts, which let you write code very similar to Java to do custom things. I made an example JMeter project to demonstrate how to do this (yes, it's ridiculously complicated, considering all I want to do is repeat the CSV read):
Files:
my file structure:
JMeterExample
|
⊢--JMeterTests.jmx // the JMeter file
⊢--example.csv // the CSV file
contents of my CSV:
guest-id-1,"123 fake street",
guest-id-2,"456 fake street",
guest-id-3,"789 fake street",
So in this thread group, I'm going to have just 1 user and loop 2 times. I intend to send 1 request per CSV line, so 6 requests should be sent in total.
Thread Group
User Defined Variables
This is kind of optional, but the filepath is subject to change, and I don't like changing my scripts just for a change in configuration. So I store the CSV filename in a "User Defined Variables" node.
If you are storing the CSV file in the same directory as your JMeter test, you can just specify the filename only.
If you are saving the CSV in a folder other than the directory containing your JMeter file, you will need to supply an absolute path, and then slightly modify the beanshell script below: you'll need to comment out the line that loads the file relatively, and comment in the line that loads from an absolute path.
BeanShell Sampler to parse and store CSV lines
Add a Beanshell Sampler which will basically take in a path, and parse & store each line as a variable. The first line will be stored as a variable called csv_line_0, the 2nd line will be csv_line_1 and so on. I know it's not a clean solution but... I can't find any clean simple way of doing this clean simple task. I copied and pasted my code below.
import org.apache.jmeter.services.FileServer;
import java.io.*;
import java.util.*;
String temp = null;
ArrayList lines = new ArrayList();
BufferedReader bufRdr;
// CSV file name, taken from the "csvFilePath" User Defined Variable
String csvFilePath = vars.get("csvFilePath");
// get the file
try {
    // you can use this line below if your csvFilePath is an absolute path
    // File file = new File(csvFilePath);
    // you can use this line below if your csvFilePath is a relative path, relative to where you saved this JMeter file
    File file = new File(FileServer.getFileServer().getBaseDir() + "/" + csvFilePath);
    if (!file.exists()) {
        throw new Exception("ERROR: file " + file.getPath() + " not found");
    }
    bufRdr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF8"));
} catch(Exception e){
    log.error("failed to load file");
    log.error(e.getMessage());
    return;
}
// For each CSV line, save it to a variable
int counter = 0;
while(true){
    try{
        temp = bufRdr.readLine();
        if(temp == null || temp.equals("<EOF>")){
            break;
        }
        lines.add(temp);
        vars.put("csv_line_" + String.valueOf(counter), temp);
        counter++;
    } catch(Exception e){
        log.error("failed to get next line");
        log.error(e.getMessage());
        break;
    }
}
bufRdr.close();
// store the number of CSV lines there are for the loop counter
vars.put("linesCount", String.valueOf(lines.size()));
Loop Controller
Add a Loop Controller that loops once for each CSV line: set its Loop Count to ${linesCount}, the number of CSV lines calculated by the BeanShell script above.
Beanshell script to extract data from current CSV Line
This script will run once per CSV line. It will go and grab the current line, and parse out whatever data is on it. You'll have to modify this script to get the data you want. In my example, I only had 2 columns, where column 1 is a "guestId", and column 2 is an "address".
__jm__loopController__idx is a variable JMeter defines for you, and is the index of the loop controller. The variable name is __jm__{loop controller name}__idx.
String index = vars.get("__jm__loopController__idx");
String line = vars.get("csv_line_" + index);
String [] tokens = line.split(",");
vars.put("guestId", tokens[0]);
// tokens[1] still has its surrounding double quotes, e.g. "123 fake street"
vars.put("address", tokens[1]);
Http request sampler
Here's the HTTP request that's using the data extracted.
result
When running this, as desired, I end up sending 6 http requests over to the endpoint I defined.
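Outside JMeter, this setup boils down to reading the CSV once and running an inner loop over all of its lines inside the outer thread-group loop. Here is a minimal plain-Java sketch of that control flow with the example CSV and 2 loops; the class and method names are hypothetical, and the requests are only collected as strings rather than sent.

```java
import java.util.ArrayList;
import java.util.List;

class RepeatCsvLoop {
    // Outer loop = Thread Group loop count; inner loop = Loop Controller over all CSV lines.
    // Each inner iteration mirrors the extraction script: split the line and pick two columns.
    static List<String> plannedRequests(List<String> csvLines, int loopCount) {
        List<String> requests = new ArrayList<>();
        for (int loop = 0; loop < loopCount; loop++) {
            for (String line : csvLines) {
                String[] tokens = line.split(",");
                String guestId = tokens[0];
                // split(",") keeps the surrounding quotes, so remove them here
                String address = tokens[1].replace("\"", "");
                requests.add(guestId + " -> " + address);
            }
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
                "guest-id-1,\"123 fake street\",",
                "guest-id-2,\"456 fake street\",",
                "guest-id-3,\"789 fake street\",");
        List<String> requests = plannedRequests(csv, 2);
        System.out.println(requests.size()); // 6
        requests.forEach(System.out::println);
    }
}
```

Because the lines are split on plain commas, an address that itself contained a comma would need a quote-aware parser.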

How to insert Excel data into a database with Java

I want to insert data from an Excel file into a local database on a UNIX server with Java, without any manipulation of the data.
1- Someone told me that I have to convert the Excel file to .csv to be compatible with UNIX. I created a CSV file for each sheet (I have 12) with a macro. The problem is that it changed the date format from DD-MM-YYYY to MM-DD-YYYY. How can I avoid this?
2- I used the LOAD DATA command to insert data from the CSV files into my database. There is a date column that is optional in the Excel file, so in the CSV it becomes ,, and LOAD DATA doesn't work (a value is needed). How can I fix this?
Thanks for your help
It should be quite easy to read the values from Excel with Apache POI. That saves you the extra step of converting to another format, and avoids possible problems when your data contains commas and you convert to CSV.
Save the Excel file in CSV (comma-separated values) format. That makes it easy to read and parse with fairly simple use of StringTokenizer.
Use MySQL (or SQLite depending on your needs) and JDBC to load data into the database.
Here is a CSVEnumeration class I developed:
package com.aepryus.util;
import java.util.*;
public class CSVEnumeration implements Enumeration<String> {
    private List<String> tokens = new ArrayList<String>();
    private int index=0;
    public CSVEnumeration (String line) {
        for (int i=0;i<line.length();i++) {
            StringBuilder sb = new StringBuilder();
            if (line.charAt(i) != '"') {
                // unquoted field: read up to the next comma
                while (i < line.length() && line.charAt(i) != ',') {
                    sb.append(line.charAt(i));
                    i++;
                }
                tokens.add(sb.toString());
            } else {
                // quoted field: skip the opening quote and read up to the closing quote
                i++;
                while (i < line.length() && line.charAt(i) != '"') {
                    sb.append(line.charAt(i));
                    i++;
                }
                i++; // skip the closing quote; the for loop then skips the comma
                tokens.add(sb.toString());
            }
        }
    }
    // Enumeration =================================================================
    public boolean hasMoreElements () {
        return index < tokens.size();
    }
    public String nextElement () {
        return tokens.get(index++);
    }
}
If you break the lines of the CSV file up using split and then feed them one by one into the CSVEnumeration class, you can then step through the fields. Or here is some code I have lying around that uses StringTokenizer to parse the lines. csv is a string that contains the entire contents of the file.
StringTokenizer lines = new StringTokenizer(csv,"\n\r");
lines.nextToken(); // skip the header line
while (lines.hasMoreElements()) {
    String line = lines.nextToken();
    Enumeration e = new CSVEnumeration(line);
    // i is the column index of the current field
    for (int i=0;e.hasMoreElements();i++) {
        String token = (String)e.nextElement();
        switch (i) {
            case 0:/* do stuff */;break;
        }
    }
}
I suggest MySQL for its performance and because it is open source.
There are two situations:
If you just want to store the Excel cell values in the database, you can convert the Excel file to CSV format and then simply use the LOAD DATA command in MySQL.
If you have to do some manipulation before the values go into the tables, I suggest Apache POI. I have used it and it works very well; whatever your Excel format, you just have to use the matching implementation (HSSF for .xls, XSSF for .xlsx).
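For the two problems in the question (DD-MM-YYYY dates and an empty date column), one option is to normalize the date field before running LOAD DATA. Here is a minimal Java sketch, assuming the CSV really contains DD-MM-YYYY (change the pattern if the macro produced MM-DD-YYYY). It rewrites a date as YYYY-MM-DD, the format MySQL uses for DATE values, and turns an empty field into \N, which LOAD DATA reads as NULL with the default escape character. The class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class MysqlDateField {
    // Assumed input format of the date column in the CSV: day-month-year
    private static final DateTimeFormatter DAY_MONTH_YEAR = DateTimeFormatter.ofPattern("dd-MM-uuuu");

    // "31-12-2020" -> "2020-12-31"; an empty field -> \N, which LOAD DATA reads as NULL
    static String toLoadDataValue(String field) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
            return "\\N";
        }
        // LocalDate.toString() prints ISO-8601 (yyyy-MM-dd)
        return LocalDate.parse(trimmed, DAY_MONTH_YEAR).toString();
    }

    public static void main(String[] args) {
        System.out.println(toLoadDataValue("31-12-2020")); // 2020-12-31
        System.out.println(toLoadDataValue(""));           // \N
    }
}
```

You would apply this to the date column of each row while writing the CSV that LOAD DATA reads; the other columns pass through unchanged.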
We are using SQLite in our Java application. It's serverless, really simple to use, and very efficient.