Camel Bindy Streaming Payload and Writing to File - csv

I have a route which is supposed to read a huge XML file and then write a CSV file with a header. Each XML record needs to be transformed first, so I unmarshal it to a Java POJO and then marshal it again to write it into a CSV file.
I can't load all of the records into memory, as the file contains more than 200k records.
Issue: I am only seeing the last record added to the CSV file. Not sure why it's not appending the data to the existing file.
Any idea how to make it work? The header is required in the CSV. I am not seeing any other option to transform the stream directly and write the headers along with it to the CSV without unmarshalling to a POJO first. I tried using BeanIO as well, which requires me to add a header record, and I am not sure how that can be injected into a stream.
from("{{xml.files.route}}")
.split(body().tokenizeXML("EMPLOYEE", null))
.streaming()
.unmarshal().jacksonXml(Employee.class)
.marshal(bindyDataFormat)
.to("file://C:/Files/Test/emp/csv/?fileName=test.csv")
.end();
If I try to append to the existing file, then headers are appended for each iteration of records.
.to("file://C:/Files/Test/emp/csv/?fileName=test.csv&fileExist=append")

Your problem here is related to camel-bindy, not the file component. It kind of expects you to marshal collections instead of individual objects, so if you marshal each object individually and have @CsvRecord(generateHeaderColumns = true) on your Employee class, you'll get headers every time you marshal an individual Employee object.
You could set generateHeaderColumns to false and start the file with the header string manually. One way to obtain the headers for a Bindy-annotated class is to get the fields annotated with @DataField using org.apache.commons.lang3.reflect.FieldUtils from Apache Commons and construct the header string from pos, columnName and the field name.
I usually prefer camel-stream over the file component when I need to stream something to a file, but using the file component with append probably works just as well.
Example:
package com.example;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.apache.camel.RoutesBuilder;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.dataformat.bindy.annotation.DataField;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;
import org.apache.camel.test.junit4.CamelTestSupport;
import org.apache.commons.lang3.reflect.FieldUtils;
import org.junit.Test;
public class ExampleTest extends CamelTestSupport {
@Test
public void testStreamEmployeesToCsvFile(){
List<Employee> body = new ArrayList<>();
body.add(new Employee("John", "Doe", 1965));
body.add(new Employee("Mary", "Sue", 1987));
body.add(new Employee("Gary", "Sue", 1991));
template.sendBody("direct:streamEmployeesToCSV", body);
}
@Override
protected RoutesBuilder createRouteBuilder() throws Exception {
return new RouteBuilder(){
@Override
public void configure() throws Exception {
BindyCsvDataFormat csvDataFormat = new BindyCsvDataFormat(Employee.class);
System.out.println(getCSVHeadersForClass(Employee.class, ","));
from("direct:streamEmployeesToCSV")
.setProperty("Employees", body())
// a bit hacky because Camel writes the first entry and the headers
// on the same line for some reason (with Camel 2.25.2)
.setBody().constant("")
.to("file:target/testoutput?fileName=test.csv&fileExist=Override")
.setBody().constant(getCSVHeadersForClass(Employee.class, ","))
.to("stream:file?fileName=./target/testoutput/test.csv")
.split(exchangeProperty("Employees"))
.marshal(csvDataFormat)
.to("stream:file?fileName=./target/testoutput/test.csv")
.end()
.log("Done");
}
private String getCSVHeadersForClass(Class<?> clazz, String separator) {
Field[] fieldsArray = FieldUtils.getFieldsWithAnnotation(clazz, DataField.class);
List<Field> fields = new ArrayList<>(Arrays.asList(fieldsArray));
fields.sort(new Comparator<Field>(){
@Override
public int compare(Field lhsField, Field rhsField) {
DataField lhs = lhsField.getAnnotation(DataField.class);
DataField rhs = rhsField.getAnnotation(DataField.class);
return lhs.pos() < rhs.pos() ? -1 : (lhs.pos() > rhs.pos()) ? 1 : 0;
}
});
String[] fieldHeaders = new String[fields.size()];
for (int i = 0; i < fields.size(); i++) {
DataField dataField = fields.get(i).getAnnotation(DataField.class);
if(dataField.columnName().equals(""))
fieldHeaders[i] = fields.get(i).getName();
else
fieldHeaders[i] = dataField.columnName();
}
String csvHeaders = "";
for (int i = 0; i < fieldHeaders.length; i++) {
csvHeaders += fieldHeaders[i];
csvHeaders += i < fieldHeaders.length - 1 ? separator : "";
}
return csvHeaders;
}
};
}
}
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>${apache-commons.version}</version>
</dependency>
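For the original question's streaming route, here is a rough, untested sketch of the same approach: write the header line once per incoming file, then append each marshalled record. It assumes @CsvRecord(generateHeaderColumns = false) on Employee, reuses the getCSVHeadersForClass helper shown above, and the endpoint options are illustrative only.
from("{{xml.files.route}}")
    // keep a reference to the incoming file so the body can be restored after writing the header
    .setProperty("xmlFile", body())
    // write the header line first, overwriting any previous output file
    .setBody(constant(getCSVHeadersForClass(Employee.class, ",") + "\n"))
    .to("file://C:/Files/Test/emp/csv/?fileName=test.csv&fileExist=Override")
    .setBody(exchangeProperty("xmlFile"))
    .split(body().tokenizeXML("EMPLOYEE", null)).streaming()
        .unmarshal().jacksonXml(Employee.class)
        .marshal(bindyDataFormat)
        .to("file://C:/Files/Test/emp/csv/?fileName=test.csv&fileExist=Append")
    .end();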

Related

Issue in Date handling for JSON Object

I am facing an issue in Drools: I want to pass a date as a date type, but currently we don't have any method in JSONObject to handle dates. My JSONObject looks like this:
{"id":600,"city":"Gotham","age":25,"startDate":"29-DEC-2017","endDate":"2014-08-31"}
My Drools rule looks like this:
package com.rules
import org.drools.core.spi.KnowledgeHelper;
import org.json.JSONObject;
rule "ComplexRule1"
salience 100
dialect "mvel"
date-effective "16-Jan-2018 00:00"
no-loop
when
$cdr : JSONObject( $cdr.optString("startDate") >= '28-Dec-2017')
then
$cdr.put("Action_1" , new JSONObject().put("actionName","Complex_Rule1_Action1").put("actionTypeName","SEND OFFER").put("channelName","SMS").put("messageTemplateName","SMSTemplate").put("#timestamp",(new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm:ss")).format(new java.util.Date())).put("ruleFileName","ComplexRule1.drl").put("ruleName","ComplexRule1"));
end
I am currently using .optString because we don't have methods like optString/optInt/optBoolean for dates. So how can I handle dates in Drools?
Any help will be appreciated.
Regards, Puneet
My new DRL looks like this:
package com.rules
import com.aravind.drools.SuperJSONObject;
import org.drools.core.spi.KnowledgeHelper;
import org.json.JSONObject;
rule "Convert to SuperJSONObject"
when
$cdr: JSONObject()
then
insert(new SuperJSONObject($cdr));
end
rule "ComplexRule1"
salience 100
dialect "mvel"
date-effective "16-Jan-2018 00:00"
no-loop
when
$cdr : SuperJSONObject( $cdr.getAsDate("startDate") == '28-Dec-2017')
then
$cdr.getObject().put("Action_1" , new JSONObject().put("actionName","Complex_Rule1_Action1").put("actionTypeName","SEND OFFER").put("channelName","SMS").put("messageTemplateName","SMSTemplate").put("#timestamp",(new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm:ss")).format(new java.util.Date())).put("ruleFileName","ComplexRule1.drl").put("ruleName","ComplexRule1"));
end
The class looks like this:
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.json.*;
public class SuperJSONObject {
public final JSONObject obj;
SimpleDateFormat sdfmt2= new SimpleDateFormat("yyyy/MM/dd");
public SuperJSONObject(JSONObject obj){
this.obj = obj;
}
public Date getAsDate(String field) throws ParseException{
return sdfmt2.parse(this.obj.optString(field));
}
public JSONObject getObject(){
return this.obj;
}
}
The other class looks like this:
import java.io.File
import java.io.FileReader
import org.drools.KnowledgeBase
import org.drools.KnowledgeBaseFactory
import org.drools.builder.KnowledgeBuilder
import org.drools.builder.KnowledgeBuilderFactory
import org.drools.builder.ResourceType
import org.drools.io.ResourceFactory
import org.drools.runtime.StatefulKnowledgeSession
import org.json.JSONObject
object RunStandAloneDrools {
def main(args: Array[String]): Unit = {
var jsonObjectArray: Array[JSONObject] = new Array(1)
jsonObjectArray(0) = new JSONObject("{\"id\":600,\"city\":\"Gotham\",\"age\":25,\"startDate\":\"28-Dec-2017\",\"endDate\":\"2014-08-01\"}")
var file: String = "/home/puneet/Downloads/ComplexRule1.drl"
var kbuilder: KnowledgeBuilder = KnowledgeBuilderFactory.newKnowledgeBuilder()
kbuilder.add(ResourceFactory.newReaderResource(new FileReader(new File(file))), ResourceType.DRL)
println("Errors? " + kbuilder.getErrors.size())
var iter = kbuilder.getErrors.iterator()
while(iter.hasNext()){
println(iter.next().getMessage)
}
var kbase: KnowledgeBase = KnowledgeBaseFactory.newKnowledgeBase()
kbase.addKnowledgePackages(kbuilder.getKnowledgePackages)
var session: StatefulKnowledgeSession = kbase.newStatefulKnowledgeSession()
callRulesEngine(jsonObjectArray,session)
println("Done")
}
def callRulesEngine(data: Array[JSONObject], knowledgeSession: StatefulKnowledgeSession): Unit = {
data.map ( x => callRulesEngine(x,knowledgeSession) )
}
def callRulesEngine(data: JSONObject, knowledgeSession: StatefulKnowledgeSession): Unit = {
try {
println("Input data " + data.toString())
knowledgeSession.insert(data)
knowledgeSession.fireAllRules()
println("Facts details " + knowledgeSession.getFactCount)
println("Enriched data " + data.toString())
} catch {
case (e: Exception) => println("Exception", e);
}
}
The output is not coming out as expected.
There are multiple ways to deal with this, but the fundamental thing is for you to understand that this is NOT a Drools issue at all. Your question is really about how to get a Date from a JSONObject.
One way this could be achieved is by using a function in Drools to make the conversion.
But I don't like functions, so I'll give you another, more elaborate way to deal with this situation (and many others where a type conversion is required).
The idea is to create a wrapper class for your JSONObject, a SuperJSONObject, that will expose all the functionality you need. For the implementation of this class I will use composition, but you can use inheritance (or a proxy) if you want.
public class SuperJSONObject {
public final JSONObject obj;
public SuperJSONObject(JSONObject obj){
this.obj = obj;
}
//expose all the methods from JSONObject you want/need
public Date getAsDate(String field){
// someDateParser is a placeholder; in practice this could be a SimpleDateFormat, as in the asker's class above
return someDateParser.parse(this.obj.optString(field));
}
public JSONObject getObject(){
return this.obj;
}
}
So now we have a getAsDate() method that we can use in our rules. But we first need to convert a JSONObject into a SuperJSONObject before we can even use that method. You can do this in multiple ways and places. I'll be showing how to do it in DRL.
rule "Convert to SuperJSONObject"
when
$jo: JSONObject() //you may want to filter which objects are converted by adding constraints to this pattern
then
insert(new SuperJSONObject($jo));
end
And now we are good to go. We can now write a rule using this new class as follows:
rule "ComplexRule1"
salience 100
dialect "mvel"
date-effective "16-Jan-2018 00:00"
no-loop
when
$cdr : SuperJSONObject( getAsDate("startDate") >= '28-Dec-2017')
then
$cdr.getObject().put("Action_1" , ...);
end
After I have written all this code, I might reconsider the option of a simple function in DRL... :P
Hope it helps,

Camel bindy marshal to file creates multiple header row

I have the following camel route:
from(inputDirectory)
.unmarshal(jaxb)
.process(jaxb2CSVDataProcessor)
.split(body()) //because there is a list of CSVRecords
.marshal(bindyCsvDataFormat)
.to(outputDirectory); //appending to existing file using "?autoCreate=true&fileExist=Append"
For my CSV model class I am using these annotations:
@CsvRecord(separator = ",", generateHeaderColumns = true)
...
and for the properties:
@DataField(pos = 0)
...
My problem is that the headers are appended every time a new csv record is appended.
Is there a non-dirty way to control this? Am I missing anything here?
I made a workaround which works quite nicely: I create the header by querying the column names of the @DataField annotation. This happens once, the first time the file is written. I wrote down the whole solution here:
How to generate a Flat file with header and footer using Camel Bindy
I ended up adding a processor that checks whether the CSV file exists, just before the "to" clause. In there I manipulate the byte array and remove the headers.
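The code is not included in that answer, but a minimal sketch of such a processor might look like the following; the output path and the assumption that the header is everything up to the first line break are hypothetical:
import java.io.File;
import java.util.Arrays;
import org.apache.camel.Exchange;
import org.apache.camel.Processor;

public class StripHeaderIfFileExists implements Processor {
    // hypothetical target file; must match the directory and fileName of the file endpoint
    private final File target = new File("output/test.csv");

    @Override
    public void process(Exchange exchange) throws Exception {
        if (!target.exists()) {
            return; // first record: keep the generated header row
        }
        byte[] csv = exchange.getIn().getBody(byte[].class);
        // the file already exists, so drop everything up to and including the first line break
        for (int i = 0; i < csv.length; i++) {
            if (csv[i] == '\n') {
                exchange.getIn().setBody(Arrays.copyOfRange(csv, i + 1, csv.length));
                return;
            }
        }
    }
}
It would be registered with .process(new StripHeaderIfFileExists()) between .marshal(bindyCsvDataFormat) and .to(outputDirectory).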
Hope this helps anyone else. I needed to do something similar where, after my first split message, I wanted to suppress the header output. Here is a complete class (FieldUtils is part of the Apache Commons lib):
package com.routes;
import java.io.OutputStream;
import org.apache.camel.Exchange;
import org.apache.camel.dataformat.bindy.BindyAbstractFactory;
import org.apache.camel.dataformat.bindy.BindyCsvFactory;
import org.apache.camel.dataformat.bindy.BindyFactory;
import org.apache.camel.dataformat.bindy.FormatFactory;
import org.apache.camel.dataformat.bindy.csv.BindyCsvDataFormat;
import org.apache.commons.lang3.reflect.FieldUtils;
public class StreamingBindyCsvDataFormat extends BindyCsvDataFormat {
public StreamingBindyCsvDataFormat(Class<?> type) {
super(type);
}
@Override
public void marshal(Exchange exchange, Object body, OutputStream outputStream) throws Exception {
final StreamingBindyModelFactory factory = (StreamingBindyModelFactory) super.getFactory();
final int splitIndex = exchange.getProperty(Exchange.SPLIT_INDEX, -1, int.class);
final boolean splitComplete = exchange.getProperty(Exchange.SPLIT_COMPLETE, false, boolean.class);
super.marshal(exchange, body, outputStream);
if (splitIndex == 0) {
factory.setGenerateHeaderColumnNames(false); // turn off header generate after first exchange
} else if(splitComplete) {
factory.setGenerateHeaderColumnNames(true); // turn on header generate when split complete
}
}
@Override
protected BindyAbstractFactory createModelFactory(FormatFactory formatFactory) throws Exception {
BindyCsvFactory bindyCsvFactory = new StreamingBindyModelFactory(getClassType());
bindyCsvFactory.setFormatFactory(formatFactory);
return bindyCsvFactory;
}
public class StreamingBindyModelFactory extends BindyCsvFactory implements BindyFactory {
public StreamingBindyModelFactory(Class<?> type) throws Exception {
super(type);
}
public void setGenerateHeaderColumnNames(boolean generateHeaderColumnNames) throws IllegalAccessException {
FieldUtils.writeField(this, "generateHeaderColumnNames", generateHeaderColumnNames, true);
}
}
}
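The answer does not show the route wiring. Under the assumption that the question's route is reused with a streaming split and an appending file endpoint, it might look roughly like this (CSVRecord stands in for the unnamed model class):
StreamingBindyCsvDataFormat streamingBindy = new StreamingBindyCsvDataFormat(CSVRecord.class);

from(inputDirectory)
    .unmarshal(jaxb)
    .process(jaxb2CSVDataProcessor)
    .split(body()).streaming()   // streaming() is an addition; the question's route split without it
        .marshal(streamingBindy) // header columns are generated only for the first split message
        .to(outputDirectory)     // e.g. "...?autoCreate=true&fileExist=Append"
    .end();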

Processing JSON using java Mapreduce

I am new to Hadoop MapReduce.
I have an input text file where the data is stored as follows. Here are just a few of the tuples (data.txt):
{"author":"Sharīf Qāsim","book":"al- Rabīʻ al-manshūd"}
{"author":"Nāṣir Nimrī","book":"Adīb ʻAbbāsī"}
{"author":"Muẓaffar ʻAbd al-Majīd Kammūnah","book":"Asmāʼ Allāh al-ḥusná al-wāridah fī muḥkam kitābih"}
{"author":"Ḥasan Muṣṭafá Aḥmad","book":"al- Jabhah al-sharqīyah wa-maʻārikuhā fī ḥarb Ramaḍān"}
{"author":"Rafīqah Salīm Ḥammūd","book":"Taʻlīm fī al-Baḥrayn"}
This is the Java file that I am supposed to write my code in (CombineBooks.java):
package org.hwone;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.GenericOptionsParser;
//TODO import necessary components
/*
* Modify this file to combine books from the same author into
* single JSON object.
* i.e. {"author": "Tobias Wells", "books": [{"book":"A die in the country"},{"book": "Dinky died"}]}
* Be aware that this may work on any number of nodes!
*
*/
public class CombineBooks {
//TODO define variables and implement necessary components
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args)
.getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: CombineBooks <in> <out>");
System.exit(2);
}
//TODO implement CombineBooks
Job job = new Job(conf, "CombineBooks");
//TODO implement CombineBooks
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
My task is to create a Hadoop program in “CombineBooks.java”, returned in the “question-2” directory. The program should do the following: given the input author-book tuples, the MapReduce program should produce a JSON object which contains all the books from the same author in a JSON array, i.e.
{"author": "Tobias Wells", "books":[{"book":"A die in the country"},{"book": "Dinky died"}]}
Any idea how it can be done?
First, the JSON classes you are trying to work with are not available to you. To solve this:
Go here and download it as a zip: https://github.com/douglascrockford/JSON-java
Extract it to your sources folder in the subdirectory org/json/*
Next, the first line of your code makes a package "org.json", which is incorrect; you should create a separate package, for instance "my.books".
Third, using a combiner here is useless.
Here's the code I ended up with; it works and solves your problem:
package my.books;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.json.*;
import javax.security.auth.callback.TextInputCallback;
public class CombineBooks {
public static class Map extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException{
String author;
String book;
String line = value.toString();
String[] tuple = line.split("\\n");
try{
for(int i=0;i<tuple.length; i++){
JSONObject obj = new JSONObject(tuple[i]);
author = obj.getString("author");
book = obj.getString("book");
context.write(new Text(author), new Text(book));
}
}catch(JSONException e){
e.printStackTrace();
}
}
}
public static class Reduce extends Reducer<Text,Text,NullWritable,Text>{
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException{
try{
JSONObject obj = new JSONObject();
JSONArray ja = new JSONArray();
for(Text val : values){
JSONObject jo = new JSONObject().put("book", val.toString());
ja.put(jo);
}
obj.put("books", ja);
obj.put("author", key.toString());
context.write(NullWritable.get(), new Text(obj.toString()));
}catch(JSONException e){
e.printStackTrace();
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
if (args.length != 2) {
System.err.println("Usage: CombineBooks <in> <out>");
System.exit(2);
}
Job job = new Job(conf, "CombineBooks");
job.setJarByClass(CombineBooks.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Here's the folder structure of my project:
src
src/my
src/my/books
src/my/books/CombineBooks.java
src/org
src/org/json
src/org/json/zip
src/org/json/zip/BitReader.java
...
src/org/json/zip/None.java
src/org/json/JSONStringer.java
src/org/json/JSONML.java
...
src/org/json/JSONException.java
Here's the input
[localhost:CombineBooks]$ hdfs dfs -cat /example.txt
{"author":"author1", "book":"book1"}
{"author":"author1", "book":"book2"}
{"author":"author1", "book":"book3"}
{"author":"author2", "book":"book4"}
{"author":"author2", "book":"book5"}
{"author":"author3", "book":"book6"}
The command to run:
hadoop jar ./bookparse.jar my.books.CombineBooks /example.txt /test_output
Here's the output:
[pivhdsne:CombineBooks]$ hdfs dfs -cat /test_output/part-r-00000
{"books":[{"book":"book3"},{"book":"book2"},{"book":"book1"}],"author":"author1"}
{"books":[{"book":"book5"},{"book":"book4"}],"author":"author2"}
{"books":[{"book":"book6"}],"author":"author3"}
You can use one of the following three options to put the org.json.* classes onto your cluster:
Pack the org.json.* classes into your jar file (this can easily be done using a GUI IDE). This is the option I used in my answer.
Put the jar file containing the org.json.* classes into one of the CLASSPATH directories on each of the cluster nodes (see yarn.application.classpath).
Put the jar file containing org.json.* into HDFS (hdfs dfs -put <org.json jar> <hdfs path>) and use the job.addFileToClassPath call so that the jar is available to all of the tasks executing your job on the cluster. With my answer, you would add job.addFileToClassPath(new Path("<jar_file_on_hdfs_location>")); to the main method, as sketched below.
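For the third option, a short sketch of where that call would sit in the driver shown above; the placeholder path is kept as-is and must be replaced with the real HDFS location of the jar:
Job job = new Job(conf, "CombineBooks");
job.setJarByClass(CombineBooks.class);
// make the org.json jar (previously uploaded with `hdfs dfs -put`) available
// on the classpath of every map and reduce task
job.addFileToClassPath(new Path("<jar_file_on_hdfs_location>"));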
For splittable multi-line JSON, refer to:
https://github.com/alexholmes/json-mapreduce

Extract first line of CSV file in Pig

I have several CSV files and the header is always the first line in the file. What's the best way to get that line out of the CSV file as a string in Pig? Preprocessing with sed, awk, etc. is not an option.
I've tried loading the file with regular PigStorage and the Piggybank CsvLoader, but it's not clear to me how I can get that first line, if at all.
I'm open to writing an UDF, if that's what it takes.
Disclaimer: I'm not great with Java.
You are going to need a UDF. I'm not sure exactly what you are asking for, but this UDF will take a series of CSV files and turn them into maps, where the keys are the values in the first line of each file. This should hopefully be enough of a skeleton that you can change it into what you want.
The couple of tests I've done, remotely and locally, indicate that this will work.
package myudfs;
import java.io.IOException;
import org.apache.pig.LoadFunc;
import java.util.Map;
import java.util.HashMap;
import java.util.ArrayList;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.PigException;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
public class ExampleCSVLoader extends LoadFunc {
protected RecordReader in = null;
private String fieldDel = "" + '\t';
private Map<String, String> outputMap = null;
private TupleFactory mTupleFactory = TupleFactory.getInstance();
// This stores the fields that are defined in the first line of the file
private ArrayList<Object> topfields = null;
public ExampleCSVLoader() {}
public ExampleCSVLoader(String delimiter) {
this();
this.fieldDel = delimiter;
}
@Override
public Tuple getNext() throws IOException {
try {
boolean notDone = in.nextKeyValue();
if (!notDone) {
outputMap = null;
topfields = null;
return null;
}
String value = in.getCurrentValue().toString();
String[] values = value.split(fieldDel);
Tuple t = mTupleFactory.newTuple(1);
ArrayList<Object> tf = new ArrayList<Object>();
int pos = 0;
for (int i = 0; i < values.length; i++) {
if (topfields == null) {
tf.add(values[i]);
} else {
readField(values[i], pos);
pos = pos + 1;
}
}
if (topfields == null) {
topfields = tf;
t = mTupleFactory.newTuple();
} else {
t.set(0, outputMap);
}
outputMap = null;
return t;
} catch (InterruptedException e) {
int errCode = 6018;
String errMsg = "Error while reading input";
throw new ExecException(errMsg, errCode,
PigException.REMOTE_ENVIRONMENT, e);
}
}
// Applies foo to the appropriate value in topfields
private void readField(String foo, int pos) {
if (outputMap == null) {
outputMap = new HashMap<String, String>();
}
outputMap.put((String) topfields.get(pos), foo);
}
@Override
public InputFormat getInputFormat() {
return new TextInputFormat();
}
@Override
public void prepareToRead(RecordReader reader, PigSplit split) {
in = reader;
}
@Override
public void setLocation(String location, Job job)
throws IOException {
FileInputFormat.setInputPaths(job, location);
}
}
Sample output when loading a directory containing:
csv1.in          csv2.in
-------          ---------
A|B|C            D|E|F
Hello|This|is    PLEASE|WORK|FOO
FOO|BAR|BING     OR|EVERYTHING|WILL
BANG|BOSH        BE|FOR|NAUGHT
Produces this output:
A: {M: map[]}
()
([D#PLEASE,E#WORK,F#FOO])
([D#OR,E#EVERYTHING,F#WILL])
([D#BE,E#FOR,F#NAUGHT])
()
([A#Hello,B#This,C#is])
([A#FOO,B#BAR,C#BING])
([A#BANG,B#BOSH])
The ()s are the first lines of the files. getNext() requires that we return something, otherwise the file will stop being processed; that is why an empty tuple is returned for those header lines.
If your CSV complies with the CSV conventions of Excel 2007, you can use the already available loader from Piggybank: http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java?view=markup
It has an option to skip the CSV header, SKIP_INPUT_HEADER.

Using json with Play 2

I'm trying to create a simple application that allows me to create, read, update and delete various users. I have a basic UI-based view, controller and model that work, but I wanted to go beyond this and provide a RESTful JSON interface.
However, despite reading everything I can find in the Play 2 documentation, the Play 2 Google group and on Stack Overflow, I still can't get this to work.
I've updated my controller based on previous feedback, and I now believe it follows the documentation.
Here is my updated controller:
package controllers;
import models.Member;
import play.*;
import play.mvc.*;
import play.libs.Json;
import play.data.Form;
public class Api extends Controller {
/* Return member info - version to serve Json response */
public static Result member(Long id){
ObjectNode result = Json.newObject();
Member member = Member.byid(id);
result.put("id", member.id);
result.put("email", member.email);
result.put("name", member.name);
return ok(result);
}
// Create a new body parser of class Json based on the values sent in the POST
@BodyParser.Of(Json.class)
public static Result createMember() {
JsonNode json = request().body().asJson();
// Check that we have a valid email address (that's all we need!)
String email = json.findPath("email").getTextValue();
if(name == null) {
return badRequest("Missing parameter [email]");
} else {
// Use the model's createMember class now
Member.createMember(json);
return ok("Hello " + name);
}
}
....
But when I run this, I get the following error:
incompatible types [found: java.lang.Class<play.libs.Json>] [required: java.lang.Class<?extends play.mvc.BodyParser>]
In /Users/Mark/Development/EclipseWorkspace/ms-loyally/loyally/app/controllers/Api.java at line 42.
41 // Create a new body parser of class Json based on the values sent in the POST
42 @BodyParser.Of(Json.class)
43 public static Result createMember() {
44 JsonNode json = request().body().asJson();
45 // Check that we have a valid email address (that's all we need!)
46 String email = json.findPath("email").getTextValue();
As far as I can tell, I've copied from the documentation so I would appreciate any help in getting this working.
There appear to be conflicts in the use of the Json class in the Play 2 documentation. To get the example above working correctly, the following imports are used:
import play.mvc.Controller;
import play.mvc.Result;
import play.mvc.BodyParser;
import play.libs.Json;
import play.libs.Json.*;
import static play.libs.Json.toJson;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.node.ObjectNode;
@BodyParser.Of(play.mvc.BodyParser.Json.class)
public static Result sayHello() {
JsonNode json = request().body().asJson();
ObjectNode result = Json.newObject();
String name = json.findPath("name").getTextValue();
if(name == null) {
result.put("status", "KO");
result.put("message", "Missing parameter [name]");
return badRequest(result);
} else {
result.put("status", "OK");
result.put("message", "Hello " + name);
return ok(result);
}
}
Note the explicit use of the right Json class in the @BodyParser.Of annotation.
I'm not sure if this is a bug or not, but this is the only way I could get the example to work.
Import these two:
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.node.ObjectNode;
According to this documentation: http://fasterxml.github.io/jackson-databind/javadoc/2.0.0/com/fasterxml/jackson/databind/node/ObjectNode.html
Try this:
import play.*;
import play.mvc.*;
import org.codehaus.jackson.JsonNode; //Fixing "error: cannot find symbol" for JsonNode
// Testing JSON
@BodyParser.Of(BodyParser.Json.class) // or you can import play.mvc.BodyParser.Json
public static Result sayHello() {
JsonNode json = request().body().asJson();
String name = json.findPath("name").getTextValue();
if(name==null) {
return badRequest("Missing parameter [name]");
} else {
return ok("Hello " + name);
}
}
AFAIK, the code you are using has not reached any official Play version (neither 2.0 nor 2.0.1), according to this: https://github.com/playframework/Play20/pull/212
Instead, you can do this (not tested):
if(request().getHeader(play.mvc.Http.HeaderNames.ACCEPT).equalsIgnoreCase("application/json")) {
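For illustration only, here is a sketch of that check inside the asker's member action; the null-safe comparison and the non-JSON fallback are additions, not part of the original suggestion:
public static Result member(Long id) {
    Member member = Member.byid(id);
    // branch on the Accept header instead of relying on a body parser annotation;
    // comparing against the constant first avoids a NullPointerException when no Accept header is sent
    if ("application/json".equalsIgnoreCase(request().getHeader(play.mvc.Http.HeaderNames.ACCEPT))) {
        ObjectNode result = Json.newObject();
        result.put("id", member.id);
        result.put("email", member.email);
        result.put("name", member.name);
        return ok(result);
    }
    // hypothetical fallback; replace with whatever HTML view the application already renders
    return badRequest("Only application/json is supported");
}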
Did you try checking out the documentation for it?
Serving a JSON response looks like:
@BodyParser.Of(Json.class)
public static Result sayHello() {
JsonNode json = request().body().asJson();
ObjectNode result = Json.newObject();
String name = json.findPath("name").getTextValue();
if(name == null) {
result.put("status", "KO");
result.put("message", "Missing parameter [name]");
return badRequest(result);
} else {
result.put("status", "OK");
result.put("message", "Hello " + name);
return ok(result);
}
}
You have imported play.libs.Json and then used the BodyParser.Of annotation with this Json.class.
The annotation expects a class which extends play.mvc.BodyParser, so simply replace @BodyParser.Of(Json.class) with @BodyParser.Of(BodyParser.Json.class).
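In other words, a minimal corrected version of the asker's action, changing only the annotation (and replacing the stray name variable with email), would be:
package controllers;

import models.Member;
import org.codehaus.jackson.JsonNode;
import play.mvc.BodyParser;
import play.mvc.Controller;
import play.mvc.Result;

public class Api extends Controller {

    @BodyParser.Of(BodyParser.Json.class) // play.mvc.BodyParser.Json, not play.libs.Json
    public static Result createMember() {
        JsonNode json = request().body().asJson();
        String email = json.findPath("email").getTextValue();
        if (email == null) {
            return badRequest("Missing parameter [email]");
        }
        // use the model's createMember method, as in the question
        Member.createMember(json);
        return ok("Hello " + email);
    }
}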