Apache Camel + CSV + header

I have a CSV file as follows:
A;B;C
1;test;22
2;test2;33
The first line is a kind of header, and the others are data. I need to import all data rows with respect to the header and report how many rows are correct and how many are not.
My first idea is to split the source file into multiple files of the form:
file1:
A;B;C
1;test;22
file2:
A;B;C
2;test2;33
How can I do this in Camel, and how can I collect the data necessary to print a summary report?

Take a look at BeanIO and the Camel BeanIO component; they look like a good fit for your scenario.
You could probably build upon the example code on the first page of the BeanIO site:
BeanIO
http://beanio.org/
Camel BeanIO component
http://camel.apache.org/beanio.html
You should not need to split your incoming file if the only thing you need to do is collect and count successful and unsuccessful records.
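For illustration, here is a minimal sketch of what the Camel side could look like. The mapping file name ("mapping.xml"), the stream name ("rows"), and the record class it would map to are assumptions for the example, not something given in the question:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.beanio.BeanIODataFormat;

public class BeanIoRoute extends RouteBuilder {
    @Override
    public void configure() {
        // "mapping.xml" would describe the CSV layout to BeanIO and map each
        // record to a bean class; "rows" is the stream name defined in it.
        BeanIODataFormat beanio = new BeanIODataFormat("mapping.xml", "rows");

        from("file:mydir?noop=true")
            .unmarshal(beanio) // yields the list of mapped beans
            .to("log:parsed");
    }
}

The point of BeanIO here is that the mapping file can declare per-field types and required flags, so malformed records surface as errors you can count rather than rows you have to split out.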

If the CSV is not too big and fits in memory, I would read and convert the CSV file to a list of Java objects. The latest Camel CSV component can convert a CSV file into a List<Map> (before Camel 2.13 it produced a List<List>). After converting the CSV file into a List, you can write your own processor to iterate over it and check its content.
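As a minimal sketch of that approach (the semicolon delimiter matches the question's data; the validation rule, a simple non-empty check, is my assumption, so adapt it to whatever makes a row "correct" for you):

import java.util.List;
import java.util.Map;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.csv.CsvDataFormat;

public class CsvReportRoute extends RouteBuilder {
    @Override
    public void configure() {
        CsvDataFormat csv = new CsvDataFormat();
        csv.setDelimiter(";");
        csv.setUseMaps(true); // the header line becomes the keys of each row map

        from("file:mydir?noop=true")
            .unmarshal(csv)
            .process(new Processor() {
                @Override
                public void process(Exchange exchange) throws Exception {
                    List<Map<String, String>> rows =
                            (List<Map<String, String>>) exchange.getIn().getBody();
                    int ok = 0, bad = 0;
                    for (Map<String, String> row : rows) {
                        // assumed rule: a row is correct when no column is empty
                        boolean valid = true;
                        for (String v : row.values()) {
                            if (v == null || v.isEmpty()) { valid = false; break; }
                        }
                        if (valid) ok++; else bad++;
                    }
                    System.out.println("correct rows: " + ok + ", incorrect rows: " + bad);
                }
            })
            .to("log:report");
    }
}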

You can unmarshal the file as a CSV file, remove the first line (the header), and then do your validations as desired. Here is an example Camel route implementation:
from("file:mydir/filename?noop=true")
.unmarshal()
.csv()
.process(validateFile())
.to("log:my.package?multiline=true")
Then you need to define the validateFile() method, which uses the Camel Processor class, like this:
public Processor validateFile() {
    return new Processor() {
        @Override
        public void process(Exchange exchange) throws Exception {
            List<List<String>> data = (List<List<String>>) exchange.getIn().getBody();
            List<String> headerLine = data.remove(0);
            System.out.println("header: " + headerLine);
            System.out.println("total lines: " + data.size());
            // iterate over each line
            for (List<String> line : data) {
                System.out.println("Total columns: " + line.size());
                System.out.println(line.get(0)); // first column
            }
        }
    };
}
In this method you can validate each line/column as you wish and then print the results, or even write the report to another output file.
Use the File and CSV component pages from the Apache Camel docs as a reference:
http://camel.apache.org/file.html
http://camel.apache.org/csv.html

Related

Parsing CSV data format in Apache Camel

I followed an example from the book Camel in Action on how to marshal and unmarshal the CSV data format. However, I want to unmarshal a CSV file (comma-separated delimiter) and split the body. Then I will use content-based routing (.choice) to distribute messages according to the required tasks.
In fact, the first and simplest example didn't work for me. I used Camel 2.15.6 (camel-core, camel-context, camel-csv, commons-csv) and Java 7.
public void configure()
{
    CsvDataFormat csv = new CsvDataFormat();
    csv.setDelimiter(",");
    from("file:test?noop=true")
        .unmarshal().csv()
        .split(body())
        .to("file:out");
}
Please find below the stack trace.
Can you try removing noop=true? If noop is true, the file is not moved or deleted in any way. This option is good for read-only data or for ETL-type requirements.
Pass csv as a parameter like this:
public void configure() throws Exception
{
    CsvDataFormat csv = new CsvDataFormat();
    csv.setDelimiter(",");
    from("file:test?noop=true")
        .unmarshal(csv)
        .split(body())
        .to("file:out");
}
Alternatively, you can set up content-based routing. Here I filter according to the header of the CSV:
// Route 1: filter CSV based on header
from("file:/home/r2/Desktop/csvFile?noop=true")
    .choice()
        .when(body().contains("partyName"))
            .to("direct:partyNameCSV")
        .when(body().contains("\"stuffName\""))
            .to("direct:stuffNameCSV")
    .otherwise().endChoice();

// Route 2: partyNameCSV
from("direct:partyNameCSV")
    .unmarshal(csv)
    .process(new PartyNameCSVProcessor())
    .end();

// Route 3: stuffNameCSV
from("direct:stuffNameCSV")
    .unmarshal(csv)
    .process(new StuffCSVProcessor())
    .end();

YAML or JSON library that supports inheritance

We are building a service. It has to read config from a file. We are currently using YAML and Jackson for deserializing the YAML. We have a situation where our YAML file needs to inherit/extend one or more other YAML files. E.g., something like:
extends: base.yaml
appName: my-awesome-app
...
Thus part of the config is stored in base.yaml. Is there any library that supports this? Bonus points if it allows inheriting from more than one file. We could switch to JSON instead of YAML.
Neither JSON nor YAML has the ability to include files. Whatever you do will be a pre-processing step that puts base.yaml and your actual file together.
A crude way of doing this would be:
#include base.yaml
appName: my-awesome-app
Let this be your file. Upon loading, you first read the first line, and if it starts with #include, you replace it with the content of the included file. You need to do this recursively; it is basically what the C preprocessor does with C files and includes. A sketch follows the list of drawbacks below.
Drawbacks are:
even if both files are valid YAML, the result may not be.
if either file includes a directive-end or document-end marker (--- or ...), you will end up with two separate documents in one file.
you cannot replace any values from base.yaml inside your file.
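A minimal sketch of this crude pre-processing step (the file name and the one-include-per-file convention are assumptions for the example):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IncludePreprocessor {

    // Recursively expands a leading "#include <file>" line by prepending
    // the (also expanded) content of the referenced file.
    static String expand(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (!text.startsWith("#include ")) {
            return text;
        }
        int eol = text.indexOf('\n');
        String includeName = (eol < 0 ? text : text.substring(0, eol))
                .substring("#include ".length()).trim();
        String rest = eol < 0 ? "" : text.substring(eol + 1);
        return expand(file.resolveSibling(includeName)) + "\n" + rest;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(expand(Paths.get("my-config.yaml")));
    }
}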
So an alternative would be to actually operate on the YAML structure. For this, you need the API of the YAML parser (SnakeYAML in your case) and parse your file with that. You should use the compose API:
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.util.List;
import org.yaml.snakeyaml.DumperOptions;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.emitter.Emitter;
import org.yaml.snakeyaml.nodes.MappingNode;
import org.yaml.snakeyaml.nodes.Node;
import org.yaml.snakeyaml.nodes.NodeTuple;
import org.yaml.snakeyaml.nodes.ScalarNode;
import org.yaml.snakeyaml.resolver.Resolver;
import org.yaml.snakeyaml.serializer.Serializer;

private Node preprocess(final Reader myInput) throws IOException {
    final Yaml yaml = new Yaml();
    final Node node = yaml.compose(myInput);
    processIncludes(node);
    return node;
}

private void processIncludes(final Node node) throws IOException {
    if (node instanceof MappingNode) {
        final List<NodeTuple> values = ((MappingNode) node).getValue();
        for (final NodeTuple tuple : values) {
            if ("!include".equals(tuple.getKeyNode().getTag().getValue())) {
                final String includedFilePath =
                        ((ScalarNode) tuple.getValueNode()).getValue();
                final Node content = preprocess(new FileReader(includedFilePath));
                // now merge the content in your preferred way into the values list.
                // that will change the content of the node.
            }
        }
    }
}

public String executePreprocessor(final Reader source) throws IOException {
    final Node node = preprocess(source);
    final StringWriter writer = new StringWriter();
    final DumperOptions dOptions = new DumperOptions();
    final Serializer ser = new Serializer(new Emitter(writer, dOptions),
            new Resolver(), dOptions, null);
    ser.open();
    ser.serialize(node);
    ser.close();
    return writer.toString();
}
This code would parse includes like this:
!include : base.yaml
appName: my-awesome-app
I used the private tag !include so that there will not be name clashes with any normal mapping key. Mind the space after !include. I didn't give code to merge the included file because I did not know how you want to handle duplicate mapping keys; one possible merge is sketched below. Be aware of bugs: I have not tested this code.
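One possible merge, sketched under the assumption that keys already present in the outer file win over included ones (a real implementation would also drop the !include tuple itself):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.yaml.snakeyaml.nodes.MappingNode;
import org.yaml.snakeyaml.nodes.NodeTuple;
import org.yaml.snakeyaml.nodes.ScalarNode;

// Call as: merge((MappingNode) node, (MappingNode) content);
private void merge(final MappingNode outer, final MappingNode included) {
    final Set<String> existing = new HashSet<String>();
    for (final NodeTuple t : outer.getValue()) {
        if (t.getKeyNode() instanceof ScalarNode) {
            existing.add(((ScalarNode) t.getKeyNode()).getValue());
        }
    }
    final List<NodeTuple> merged = new ArrayList<NodeTuple>(outer.getValue());
    for (final NodeTuple t : included.getValue()) {
        // keep an included tuple only when the outer file has no such key
        if (!(t.getKeyNode() instanceof ScalarNode)
                || !existing.contains(((ScalarNode) t.getKeyNode()).getValue())) {
            merged.add(t);
        }
    }
    outer.setValue(merged);
}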
The resulting String can be the input to Jackson.
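For example (AppConfig is a hypothetical config class of yours; the YAML support comes from the jackson-dataformat-yaml module):

import java.io.FileReader;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

// run the preprocessor, then hand the expanded YAML to Jackson
String expandedYaml = executePreprocessor(new FileReader("my-config.yaml"));
ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
AppConfig config = mapper.readValue(expandedYaml, AppConfig.class);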
For probably the same need, I created this tool: jq-front.
You can use it with the following syntax, combined with the yq command.
extends: [ base.yaml ]
appName: my-awesome-app
...
$ yq -j . your.yaml | jq-front | yq -y .
Note that you need to place the file names to be extended in an array, since the tool supports multiple inheritance.
Points you potentially won't like:
It's quite slow. (But for configuration information this might be OK, since you can convert it to an expanded file once and never need the original one after that.)
Objects inside an array cannot behave as expected, since the tool relies on the * operator of jq.

Import Flat File containing multiline fields in SSIS

I would like to import a flat file (*.csv) in SSIS, but one field is multiline text. I do not have a special record delimiter (and there is no way to get one), so the record delimiter is the carriage return \r\n (CRLF).
The problem is: when SSIS meets a CRLF in a multiline field, it moves on to the next record instead of continuing to read the multiline field.
Here are the header and the first few lines:
"name", "firstname", "description", "age"
"John", "Smith", "blablablablablabla", 25
"Fred", "Gordon", "blablabla
blablablabla", 33
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
The example above contains 1 header and 3 records. SSIS reads it as 1 header and 6 records and then, of course, produces errors.
I don't know how to handle that problem.
I hope you can help me.
According to your example, the Description field values can contain multiple carriage returns, which causes the creation of new lines.
The following record, which appears on multiple lines...
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
should appear as below for SSIS to see the expected number of columns.
"Bill", "Buffalo", "bllllllllllllaaaaaaa blaaaaaaa blaalalalaaaaaaaaaa", 44
There are a couple of approaches to resolving the formatting issue.
If possible, the easiest approach is to follow up with the person who created the file and have them fix it. For example, assuming they're using SQL Server, they can apply the following in their T-SQL statement to the description field to replace the carriage returns with a blank (Oracle has a similar function):
REPLACE(Description, CHAR(13),' ')
If you need to replace a line feed, then use CHAR(10).
Otherwise, I understand that contacting the source of the file is not always possible. In this case, you can modify the text file programmatically before feeding it into SSIS. The following link discusses how to use Excel to do this; you can then save to a new CSV file and import that through SSIS.
http://www.mrexcel.com/forum/excel-questions/304939-importing-text-data-carriage-returns-into-excel.html
If you are setting up the SSIS package in a job, you can write a script task early in your control flow that does the same thing and bypasses Excel. The VB code provided in the link can easily be adapted to a script task.
Hope this helps.
Given that the source of the text files cannot be contacted and that the number of columns in each CSV will vary, the best option is to proceed with a variation of option 2 of the first answer. This will require some customization and a script task in the control flow.
On the server where the SSIS package will run, create a bucket folder where a temporary text file will be saved. Each time a CSV file is processed, a temporary file called "destFile.csv" will be created from it, and this is what you will import.
Create two variables in the SSIS package: one for the source file and one for the destination file.
Create a script task and define the two variables being sent to it.
Add the following C# to the script task, and remember to replace the assignments for the source file and destination file at the top; they should be set to the new user variables just created.
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string sourceFile = @"C:\test\tempfile.csv";
            string destinationFile = @"C:\test\destFile.csv";
            string line;
            string templine;
            int count = 0;
            int commaCount = 0;
            int HeaderCommaCount = 0;
            List<string> lines = new List<string>();

            // Delete the temporary destination file if it already exists
            if (File.Exists(destinationFile))
            {
                File.Delete(destinationFile);
            }
            // Create the temporary destination file
            File.Create(destinationFile).Dispose();

            if (File.Exists(sourceFile))
            {
                StreamReader file = null;
                try
                {
                    file = new StreamReader(sourceFile);
                    while ((line = file.ReadLine()) != null)
                    {
                        // If this is the header line, get the number of commas.
                        // This is the base against which all following rows are compared.
                        if (count == 0)
                        {
                            HeaderCommaCount = line.Split(',').Length - 1;
                            lines.Add(line); // save to the list of output lines
                            count++;
                        }
                        else // any row following the header row
                        {
                            commaCount = line.Split(',').Length - 1;
                            if (commaCount == HeaderCommaCount) // row contains the correct number of columns
                            {
                                lines.Add(line);
                                count++;
                            }
                            else
                            {
                                templine = line;
                                // The comma count is less than that of the header row:
                                // keep appending rows until it matches, then write.
                                while (commaCount != HeaderCommaCount)
                                {
                                    line = file.ReadLine();
                                    if (line == null) break; // end of file reached mid-record
                                    templine = templine + " " + line;
                                    commaCount = templine.Split(',').Length - 1;
                                    line = templine;
                                    if (commaCount == HeaderCommaCount)
                                    {
                                        lines.Add(line);
                                    }
                                }
                            }
                        }
                    }
                }
                finally
                {
                    if (file != null)
                        file.Close();
                }
            }
            File.WriteAllLines(destinationFile, lines); // write the collected lines to the destination file
            //Console.ReadLine();
        }
    }
}
I wrote this quickly as a console application so that it would be easier to convert to a C# script task. It tested successfully against your initial file example. It iterates through the source text file, concatenates the lines that have been split apart, and then saves the result to a destination file. The destination file is recreated and repopulated each time the task runs. You can test this first as a console application in Visual Studio, adding a Console.WriteLine(line) just above or below where you see lines.Add(line) in the code.
After this, all you need to do is import from the temporary destination file into your database.
Hope this helps.

groovy parse json and save to database

I want to parse a JSON file into objects and save them to a database. I created a Groovy script that runs in the Grails console (by typing grails console on the command line). I did not create a Grails app or domain class. Inside this small script, when I call save, I get:
groovy.lang.MissingMethodException: No signature of method: Blog.save()
is applicable for argument types: () values: []
Possible solutions: wait(), any(), wait(long), isCase(java.lang.Object),
sleep(long), any(groovy.lang.Closure)
Am I missing something?
I'm also confused: if I do save, is it going to save the data to a table called Blog? Should I set up a database connection here? (With a Grails domain class we don't need to, but is it different using pure Groovy?)
Many thanks!
import grails.converters.*
import org.codehaus.groovy.grails.web.json.*

class Blog {
    String title
    String body

    static mapping = {
        body type: "text"
        attachment type: "text"
    }

    Blog(title, body, slug) {
        this.title = title
        this.body = body
    }
}
Here I parse the JSON:
// parse json
List parsedList = JSON.parse(new FileInputStream("c:/ning-blogs.json"), "UTF-8")
def blogs = parsedList.collect { JSONObject jsonObject ->
    new Blog(jsonObject.get("title"), jsonObject.get("description"), "N/A")
}
Then I loop over the blogs and save each object:
for (i in blogs) {
    // println i.title // prints the information needed
    i.save()
}
I don't have much experience with Grails, but from a quick googling it seems that for a class to be treated as a domain (model) class, it needs to either live in the conventional package/directory or come from a legacy JAR with Hibernate mapping/JPA annotations. Thus your example can't work. Why not define that model in your domain package?
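If you do want to keep it a plain script without a Grails domain class, you have to open the database connection yourself; GORM's save() only exists on real domain classes. A minimal plain-JDBC sketch (the H2 URL, credentials, and blog table layout are assumptions for the example; Groovy will run this Java-style code nearly unchanged):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SaveBlogs {
    public static void main(String[] args) throws Exception {
        // assumed connection details; replace with your own database/driver
        Connection con = DriverManager.getConnection("jdbc:h2:./blogs", "sa", "");
        try {
            con.createStatement().execute(
                    "CREATE TABLE IF NOT EXISTS blog (title VARCHAR(255), body CLOB)");
            PreparedStatement ps =
                    con.prepareStatement("INSERT INTO blog (title, body) VALUES (?, ?)");
            ps.setString(1, "some title"); // e.g. jsonObject.get("title")
            ps.setString(2, "some body");  // e.g. jsonObject.get("description")
            ps.executeUpdate();
            ps.close();
        } finally {
            con.close();
        }
    }
}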

Using CSV file to read test data from

I need to test various links of a site (no need to log in) with hundreds of users and loop for some number of times using JMeter. I want to put those links in a CSV file, so that all the links to be tested are read from the file.
How do I accomplish this task?
Prepare a CSV file with the list of your test parameters and use it to parametrize your test samplers, using at least the following:
CSV Data Set Config
Look into the following links for details:
How to get Jmeter to use CSV data for GET parameters?
Use jmeter to test multiple Websites
use csv parameters in jmeter httprequest path
Force a thread to use same input line when using CSV Data Set Config
Jmeter functions:
__CSVRead,
__StringFromFile.
Variables From CSV sampler from jmeter-plugins.
1. Prepare your test URLs in a CSV file, e.g. in the following format:
url1
url2
...
urlN
Ensure that the test URLs don't contain the http:// prefix (they go into the HTTP Request's Server Name or IP field).
2. Use a schema for your script like the one below:
CSV Data Set Config:
Filename: [path to your csv-file with test-urls]
Variable Names: testURL
Recycle on EOF?: True
Stop thread on EOF?: False
Sharing mode: Current thread
Thread Group:
Number of Threads: N
Loop Count: M
HTTP Request // your http call
Server Name or IP: ${testURL} // use variable with extracted URL
This will start N users; each user will read M entries from the list of test URLs. If M > the number of entries in the list, the user will recycle the list on EOF.
In one of the comments, it's mentioned that you can't read the CSV more than once per loop. You can have multiple threads, each reading the CSV file once, but then the file is closed and won't be read on the next loop. Also, if you set the CSV to recycle, the CSV file is read over and over again indefinitely. So the question becomes: how do you loop over a CSV file a certain number of times, as opposed to indefinitely?
I posted my answer to that in another post (https://stackoverflow.com/a/64086009/4832515), but I'll copy and paste it here in case that link doesn't work in the future.
I couldn't find a simple solution to this. I ended up using Beanshell scripts, which let you use code very similar to Java to do some custom work. I made an example JMeter project to demonstrate how to do this (yes, it's ridiculously complicated, considering all I want to do is repeat the CSV read):
Files:
my file structure:
JMeterExample
|
⊢--JMeterTests.jmx // the JMeter file
⊢--example.csv // the CSV file
contents of my CSV:
guest-id-1,"123 fake street",
guest-id-2,"456 fake street",
guest-id-3,"789 fake street",
So in this thread group, I'm going to have just 1 user, and I'll loop 2 times. I intend to send 1 request per CSV line, so there should be 6 requests sent in total.
Thread Group
User Defined Variables
This is optional, but the file path is subject to change, and I don't like changing my scripts just for a configuration change, so I store the CSV filename in a "User Defined Variables" node.
If you are storing the CSV file in the same directory as your JMeter test, you can just specify the filename only.
If you are saving the CSV in a folder other than the directory containing your JMeter file, you will need to supply an absolute path and then slightly modify the Beanshell script below: comment out the line that loads the file relatively, and comment in the line that loads from an absolute path.
BeanShell Sampler to parse and store CSV lines
Add a Beanshell Sampler which will take in a path, then parse and store each line as a variable. The first line will be stored as a variable called csv_line_0, the second as csv_line_1, and so on. I know it's not a clean solution, but I can't find any clean, simple way of doing this clean, simple task. I copied and pasted my code below.
import org.apache.jmeter.services.FileServer;
import java.io.*;
import java.util.*;

String temp = null;
ArrayList lines = new ArrayList();
BufferedReader bufRdr;

// get the file
try {
    String csvFilePath = vars.get("csvFilePath"); // the User Defined Variable from above
    // use this line below if your csvFilePath is an absolute path
    // File file = new File(csvFilePath);
    // use this line below if your csvFilePath is a relative path,
    // relative to where you saved this JMeter file
    File file = new File(FileServer.getFileServer().getBaseDir() + "/" + csvFilePath);
    if (!file.exists()) {
        throw new Exception("ERROR: file " + csvFilePath + " not found");
    }
    bufRdr = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF8"));
} catch (Exception e) {
    log.error("failed to load file");
    log.error(e.getMessage());
    return;
}

// For each CSV line, save it to a variable
int counter = 0;
while (true) {
    try {
        temp = bufRdr.readLine();
        if (temp == null || temp.equals("<EOF>")) {
            break;
        }
        lines.add(temp);
        vars.put("csv_line_" + String.valueOf(counter), temp);
        counter++;
    } catch (Exception e) {
        log.error("failed to get next line");
        log.error(e.getMessage());
        break;
    }
}

// store the number of CSV lines for the loop counter
vars.put("linesCount", String.valueOf(lines.size()));
Loop Controller
Add a Loop Controller that loops once for each CSV line. ${linesCount} is the count of CSV lines, calculated by the Beanshell script above.
Beanshell script to extract data from current CSV Line
This script will run once per CSV line. It grabs the current line and parses out whatever data is on it. You'll have to modify this script to extract the data you want. In my example I had only 2 columns, where column 1 is a "guestId" and column 2 is an "address".
__jm__loopController__idx is a variable JMeter defines for you, and is the index of the loop controller. The variable name is __jm__{loop controller name}__idx.
String index = vars.get("__jm__loopController__idx");
String line = vars.get("csv_line_" + index);
String [] tokens = line.split(",");
vars.put("guestId", tokens[0]);
vars.put("address", tokens[1]);
Http request sampler
Here's the HTTP request that uses the extracted data.
Result
When running this, as desired, I end up sending 6 HTTP requests to the endpoint I defined.