Parsing csv data format in apache Camel - csv

I followed an example from a book Camel in action. how to marshal and unmarshal csv data format. However, I want to unmarshal a csv file with (comma seperated delimiter) and split body. Then, I will use content based .choice to distribute messages according to required tasks.
In fact, The first and simple example didn't work for me. I used camel 2.15.6 (camel-core, camel-context, camel-csv, commons-csv) and java 7.
public void configure()
{
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter(",");
from("file:test?noop=true")
.unmarshal().csv()
.split(body())
.to("file:out");
}
Please find below the stack trace.

Can you try by removing noop=true? Actually, if noop is true, the file is not moved or deleted in any way. This option is good for readonly data, or for ETL type requirements.

Pass csv as a parameter like this:
public void configure()throws Exception
{
CsvDataFormat csv = new CsvDataFormat();
csv.setDelimiter(",");
from("file:test?noop=true")
.unmarshal(csv)
.split(body())
.to("file:out");
}
Or it will help you to set contain based routing:I filter according to header of CSV:
//Route 1 for filter CSV based on header
from("file:/home/r2/Desktop/csvFile?noop=true")
.choice().when(body().contains("partyName"))
.to("direct:partyNameCSV")
.when(body().contains("\"stuffName\""))
.to("direct:stuffNameCSV")
.otherwise().endChoice();
//Route 2 partyNameCSV
from("direct:partyNameCSV")
.unmarshal(csv)
.process(new PartyNameCSVProcessor())
.end();
//Route 3 stuffNameCSV
from("direct:stuffNameCSV")
.unmarshal(csv)
.process(new StuffCSVProcessor())
.end();

Related

How to split the data of NodeObject in Apache Flink

I'm using Flink to process the data coming from some data source (such as Kafka, Pravega etc).
In my case, the data source is Pravega, which provided me a flink connector.
My data source is sending me some JSON data as below:
{"key": "value"}
{"key": "value2"}
{"key": "value3"}
...
...
Here is my piece of code:
PravegaDeserializationSchema<ObjectNode> adapter = new PravegaDeserializationSchema<>(ObjectNode.class, new JavaSerializer<>());
FlinkPravegaReader<ObjectNode> source = FlinkPravegaReader.<ObjectNode>builder()
.withPravegaConfig(pravegaConfig)
.forStream(stream)
.withDeserializationSchema(adapter)
.build();
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<ObjectNode> dataStream = env.addSource(source).name("Pravega Stream");
dataStream.map(new MapFunction<ObjectNode, String>() {
#Override
public String map(ObjectNode node) throws Exception {
return node.toString();
}
})
.keyBy("word") // ERROR
.timeWindow(Time.seconds(10))
.sum("count");
As you see, I used the FlinkPravegaReader and a proper deserializer to get the JSON stream coming from Pravega.
Then I try to transform the JSON data into a String, KeyBy them and count them.
However, I get an error:
The program finished with the following exception:
Field expression must be equal to '*' or '_' for non-composite types.
org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:342)
org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:340)
myflink.StreamingJob.main(StreamingJob.java:114)
It seems that KeyBy threw this exception.
Well, I'm not a Flink expert so I don't know why. I've read the source code of the official example WordCount. In that example, there is a custtom splitter, which is used to split the String data into words.
So I'm thinking if I need to use some kind of splitter in this case too? If so, what kind of splitter should I use? Can you show me an example? If not, why did I get such an error and how to solve it?
I guess you have read the document about how to specify keys
Specify keys
The example codes use keyby("word") because word is a field of POJO type WC.
// some ordinary POJO (Plain old Java Object)
public class WC {
public String word;
public int count;
}
DataStream<WC> words = // [...]
DataStream<WC> wordCounts = words.keyBy("word").window(/*window specification*/);
In your case, you put a map operator before keyBy, and the output of this map operator is a string. So there is obviously no word field in your case. If you actually want to group this string stream, you need to write it like this .keyBy(String::toString)
Or you can even implement a customized keySelector to generate your own key.
Customized Key Selector

How can I continuously read a CSV file in Flink and remove the header

I am working with Flink streaming API and I want to continuously read CSV files from a folder, ignore the header and convert each row in the CSV file into a Java class (POJO). After all this processing, I should obtain a stream of Java objects(POJOs).
So far, I do the following to partially achieve the behavior(code below):
read the CSV files as regular text files, continuously
get a stream of strings from the CSV files
convert the stream of strings to a stream of Java objects
String path = "/home/cosmin/Projects/flink_projects/flink-java-project/data/";
TextInputFormat format = new TextInputFormat(
new org.apache.flink.core.fs.Path(path));
DataStream<String> inputStream = streamEnv.readFile(format, path, FileProcessingMode.PROCESS_CONTINUOUSLY, 100);
DataStream<MyEvent> parsedStream = inputStream
.map((line) -> {
String[] cells = line.split(",");
MyEvent event = new MyEvent(cells[1], cells[2], cells[3]);
return event;
});
However, with this I don't manage to remove the header line in each CSV file.
I have read that I can build a custom connector for reading CSV files by using createInput() or addSource () methods on the StreamExecutionEnvironment class.
Can you help with some guidance on how to achieve this, as I haven't found any examples beyond the Javadoc?
You could chain a filter function before your map function to filter out header lines
inputStream.filter(new FilterFunction<String>() {
public boolean filter(String line) {
if (line.contains("some header identifier")) return false;
else return true;
}
}).map(...) <Your map function as before>

Apache camel + csv + header

I have csv file as follows:
A;B;C
1;test;22
2;test2;33
where first line is a kind of header, and others are data. I have an issue to import all data rows with respect to header and report how many rows are correct and how many are not.
My first idea is to split source file to multiple files in the form of:
file1:
A;B;C
1;test;22
file2:
A;B;C
2;test2;33
How can I do this in camel, and how can I collect data necessary to print a summary report?
Take a look at Bean IO, and the Camel BeanIO component.
Looks like a good fit for your scenario.
You could probably build upon the example code on the first page of bean IO
BeanIO
http://beanio.org/
Camel BeanIO component
http://camel.apache.org/beanio.html
You should not need to split your incoming file if the only thing you need to do is collect and count successful and unsuccessful records.
If the CSV is not too big and fits in memory, I would read and convert the CSV file to a list of Java objects. The latest Camel CSV component can convert a CSV file into a List<Map>, before Camel 2.13 it produced List<List>. After having read converted CSV file into List of something you can write your own processor to iterate over the List and check its content.
You can unmarshall the file as a CSV file, remove the first line (header) and then do your validations as desired. Follow an example of camel route implementation
from("file:mydir/filename?noop=true")
.unmarshal()
.csv()
.process(validateFile())
.to("log:my.package?multiline=true")
Then you need to define the validateFile() method using the camel Processor
class like this:
public Processor validateFile() {
return new Processor() {
#override
public void process(Exchange exchange) throws Exception {
List<List<String>> data = (List<List<String>>) exchange.getIn().getBody();
String headerLine = data.remove(0);
System.out.println("header: "+headerLine);
System.out.println("total lines: "+data.size());
// iterate over each line
for( List<String> line : data) {
System.out.println("Total columns: "+line.size());
System.out.println(line.get(0)); // first column
}
}
};
}
In this method you can validate each file line/columns as you wish and then print it out or even write this report in other output file
Use as reference the File and CSV component page from Apache camel docs;
http://camel.apache.org/file.html
http://camel.apache.org/csv.html

groovy parse json and save to database

I want to parse a json file into objects, and save it to database. I just create a groovy script that runs in grails console(typing grails console in cmd line). I did not create grails app or domain class. Inside this small script, When I call save, I have
groovy.lang.MissingMethodException: No signature of method: Blog.save()
is applicable for argument types: () values: []
Possible solutions: wait(), any(), wait(long), isCase(java.lang.Object),
sleep(long), any(groovy.lang.Closure)
Am I missing something?
I'm also confused that if I do save, is it going to save data to a table called Blog? Should I build any database connection here? (Because I grails domain class, we don't need to. But is it different using pure groovy?)
Many Thanks!
import grails.converters.*
import org.codehaus.groovy.grails.web.json.*;
class Blog {
String title
String body
static mapping = {
body type:"text"
attachment type:"text"
}
Blog(title,body,slug){
this.title = title
this.body=body
}
}
here parse the json
// parse json
List parsedList =JSON.parse(new FileInputStream("c:/ning-blogs.json"), "UTF-8")
def blogs = parsedList.collect {JSONObject jsonObject ->
new Blog(jsonObject.get("title"),jsonObject.get("description"),"N/A");
}
loop blogs and save each object
for (i in blogs){
// println i.title; I'll get the information needed.
i.save();
}
I don't have large experience with grails, but from a quick googling seems like that for a class be treated like a model class, it will need to be either on the correct convention-package/dir or a legacy jar with hibernate mapping/JPA annotation. Thus your example can't work. Why not define that model in your model package?

how to insert excel data in a database with java

i want to insert data from an excel file into a local database in a UNIX server with java without any manipulation of data.
1- someone told me that i've to convert the excel file extension into .csv to conform with unix. i created a CSV file for each sheet (i've 12) with a macro. the problem is it changed the date format from DD-MM-YYYY to MM-DD-YYYY. how to avoid this?
2- i used LOAD DATA command to insert data from the CSV files to my database. there's a date colonne that is optionnaly specified in the excel file. so in CSV it become ,, so the load data doesn't work (an argument is needed). how can i fix this?
thanks for your help
It should be quite easy to read out the values from Excel with Apache POI. Then you save yourself the extra step of converting to another format and possible problems when your data contains comma and you convert to CSV.
Save the EXCEL file as CSV (comma separated values) format. It will make it easy to read and parse with fairly simple use of StringTokenizer.
Use MySQL (or SQLite depending on your needs) and JDBC to load data into the database.
Here is a CSVEnumeration class I developed:
package com.aepryus.util;
import java.util.*;
public class CSVEnumeration implements Enumeration {
private List<String> tokens = new Vector<String>();
private int index=0;
public CSVEnumeration (String line) {
for (int i=0;i<line.length();i++) {
StringBuffer sb = new StringBuffer();
if (line.charAt(i) != '"') {
while (i < line.length() && line.charAt(i) != ',') {
sb.append(line.charAt(i));
i++;
}
tokens.add(sb.toString());
} else {
i++;
while(line.charAt(i) != '"') {
sb.append(line.charAt(i));
i++;
}
i++;
tokens.add(sb.toString());
}
}
}
// Enumeration =================================================================
public boolean hasMoreElements () {
return index < tokens.size();
}
public Object nextElement () {
return tokens.get(index++);
}
}
If you break the lines of the CSV file up using split and then feed them one by one into the CSVEnumeration class, you can then step through the fields. Or here is some code I have lying around that uses StringTokenizer to parse the lines. csv is a string that contains the entire contents of the file.
StringTokenizer lines = new StringTokenizer(csv,"\n\r");
lines.nextToken();
while (lines.hasMoreElements()) {
String line = lines.nextToken();
Enumeration e = new CSVEnumeration(line);
for (int i=0;e.hasMoreElements();i++) {
String token = (String)e.nextElement();
switch (i) {
case 0:/* do stuff */;break;
}
}
}
I suggest MySQL for its performance and obviously open source.
Here comes two situations:
If you want just to store the excel cell values into the database. You can convert the excel to CSV format, so that you can simply LOAD DATA command in MySQL command.
If you have to do some manipulation before the values to get into the tables, I suggest Apache POI. I've used, that works so fine, whatever you're format of Excel you just have to use the correct implementation.
We are using SQLite in our java application. It's serveless, really simple to use and very efficient.