Apache NiFi: Changing Date and Time format in CSV

I have a CSV that contains a column with a date and time. I want to change the format of that date-time column. The first three rows of my CSV look like the following.
Dater,test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11
20011018182036,,,,,166366183,,,,,,
20191018182037,,27,94783564564,,162635463,817038655446,,,0,,
I want to change the CSV to look like this.
Dater,test1,test2,test3,test4,test5,test6,test7,test8,test9,test10,test11
2001-10-18-18-20-36,,,,,166366183,,,,,,
2019-10-18-18-20-37,,27,94783564564,,162635463,817038655446,,,0,,
How is this possible?
I tried using the UpdateRecord Processor.
My properties look like this:
But this approach doesn't work: the data gets routed to the failure relationship of the UpdateRecord processor. Can you suggest a method to complete this task?

I was able to accomplish this using the UpdateRecord processor. The expression language I used is ${field.value:toDate('yyyyMMddHHmmss'):format('yyyy-MM-dd HH:mm:ss')}.
This alone didn't work, though: every time, the data was routed to the failure relationship of the UpdateRecord processor.
To fix this, I changed the configuration of the CSVRecordSetWriter: the Schema Access Strategy must be changed from its default, Use Schema Name Property, to Use String Fields from Header.
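A property configuration along these lines is a minimal sketch of that setup (it assumes the date column is named Dater, as in the sample CSV, and that the Replacement Value Strategy is set to Literal Value):
/Dater    ${field.value:toDate('yyyyMMddHHmmss'):format('yyyy-MM-dd HH:mm:ss')}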

Strategy: use UpdateRecord to manipulate the timestamp value using expression language:
${field.value:toDate():format('ddMMyyyy')}
Flow:
GenerateFlowFile:
UpdateRecord:
Set up the reader and writer to inherit the schema. Include the header line. Leave the other properties untouched.
Result:
However, this solution might not satisfy you because of a strange problem. When you format the date like this:
${field.value:toDate():format('dd-MM-yyyy')}
ConvertRecord routes to the failure relationship:
Type coercion does not work properly. Maybe it is a bug. I could not find a solution for this problem.

Related

Python: Dump JSON Data Following Custom Format

I'm working on some Python code for my local billiard hall and I'm running into problems with JSON encoding. When I dump my data into a file, I obviously get all the data on a single line. However, I want my data to be dumped into the file following the format that I want. For example (I had to use a picture to get the point across):
My custom JSON format
I've looked up questions on custom JSONEncoders, but they all seem to deal with data types that aren't JSON serializable. I never found a solution for my specific need, which is having everything laid out in the manner that I want. Basically, I want all of the list elements to be on separate rows, but all of the dict items to stay on the same row. Do I need to write my own custom encoder, or is there some other approach I should take? Thanks!
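One way this is often approached, as a minimal sketch rather than a full custom JSONEncoder (and assuming the data is simply a list of dicts, which may not match the picture exactly), is to serialize each dict with json.dumps individually and join the pieces yourself:
import json

rows = [{"player": "Alice", "score": 37}, {"player": "Bob", "score": 52}]  # made-up sample data

with open("out.json", "w") as f:  # made-up file name
    f.write("[\n")
    # each dict stays on one line; each list element gets its own row
    f.write(",\n".join("    " + json.dumps(row) for row in rows))
    f.write("\n]\n")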

Solr CSV responses writer: How to get no encapsulator

I need to get back CSV output from my Solr queries, so I am using Solr's CSV responses writer.
All works fine using wt=csv without changing default values for CSV output, but I have one requirement: I need tab-separated CSV with no text value quoting at all.
The tab separation is easy, as I can specify a tab as csv.separator in the Solr CSV response writer.
The problem is how to get rid of the encapsulation:
The default encapsulator for CSV fields is the double quote character (").
But setting encapsulator='' or encapsulator=None returns the error Invalid encapsulator.
There seems to be no documentation for this in the Solr Wiki.
How can I suppress encapsulation altogether?
You are not going to be able to; the Java source expects an encapsulator exactly one character long:
String encapsulator = params.get(CSV_ENCAPSULATOR);
String escape = params.get(CSV_ESCAPE);
if (encapsulator!=null) {
    if (encapsulator.length()!=1) throw new SolrException( SolrException.ErrorCode.BAD_REQUEST,"Invalid encapsulator:'"+encapsulator+"'");
    strat.setEncapsulator(encapsulator.charAt(0));
}
What you can do:
Write your own custom NoEncapsulatorCSVResponseWriter, probably by inheriting from CSVResponseWriter, and modify the code so that it does not use the encapsulator. Not difficult, but mostly a hassle.
Use some unique encapsulator (for example ø) and then add a post-processing step on your client side that just removes it. Easier, but you need that extra step...
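For the second option, the client-side post-processing can be as small as the following sketch (it assumes the response was requested with a tab as csv.separator and ø as csv.encapsulator, that ø never occurs in real field values, and that the file names here are made up):
# strip the placeholder encapsulator from the saved Solr CSV response
with open("solr_response.tsv", encoding="utf-8") as src, \
     open("solr_response_clean.tsv", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.replace("ø", ""))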

Passing a path as a parameter in Pentaho

In a job, I check whether the file that I want to read is available. If this CSV exists, I want to read the data and save it to a database table within a transformation.
This is what I have done so far:
1) I have created the job, 2) I have defined some parameters, one of them holding the path to the file, and 3) I have indicated that I am going to pass this value to the transformation.
Now, the thing is, I am sure this should be something very simple to implement, but even though I have followed some blogs, I have not succeeded with this part of the process. I've tried to follow this example:
http://diethardsteiner.blogspot.com.co/2011/03/pentaho-data-integration-scheduling-and.html
My question remains the same: how can I indicate to the transformation that it has to use the parameter that I am passing to it from the job?
You just mixed up the columns
Parameter should be the name of the parameter in the transformation you are running.
Value is the value you are passing.
Since you are passing a variable, and not a constant value, you use the ${} syntax to indicate this.
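For example, if the transformation declares a parameter named file_path (a made-up name for illustration), the parameter grid of the Transformation job entry would look roughly like this:
Parameter     Value
file_path     ${file_path}
Here file_path on the left is the name of the parameter inside the transformation, and ${file_path} on the right pulls in the value of the parameter you defined in the job.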

Non-technical terms on Elasticsearch, Logstash and Kibana

I have a doubt. I know that Logstash allows us to input CSV/log files and filter them using separators and columns, and that it outputs to Elasticsearch so the data can be used by Kibana. However, after writing the conf file, do I need to specify the index pattern by using a command like:
curl -XPUT 'http://localhost:5601/test' -d
I ask because I know that when you have a JSON file, you have to define the mapping, etc. Do I need to do this step for CSV files and other non-JSON files? Sorry for asking; I just need to clear up my doubt.
When you insert documents into a new elasticsearch index, a mapping is created for you. This may not be a good thing, as it's based on the initial value of each field. Imagine a field that normally contains a string, but the initial document contains an integer - now your mapping is wrong. This is a good case for creating a mapping.
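As a rough sketch of that step (the index name test comes from your command; the type name logs and the field names here are made up, and this targets Elasticsearch on its default port 9200, not Kibana's 5601):
curl -XPUT 'http://localhost:9200/test' -d '{
  "mappings": {
    "logs": {
      "properties": {
        "timestamp": { "type": "date" },
        "bytes":     { "type": "integer" },
        "message":   { "type": "string" }
      }
    }
  }
}'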
If you insert documents through Logstash into an index named logstash-YYYY.MM.DD (the default), Logstash will apply its own mapping. It will use any pattern hints you gave it in grok{}, e.g.:
%{NUMBER:bytes:int}
and it will also make a "raw" (not analyzed) version of each string, which you can access as myField.raw. This may also not be what you want, but you can make your own mapping and provide it as an argument in the elasticsearch{} output stanza.
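A sketch of that output stanza (the option names assume a Logstash 2.x-era elasticsearch output plugin, and the template path is made up):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
    # point at your own mapping template instead of the built-in one
    template => "/path/to/my-template.json"
    template_overwrite => true
  }
}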
You can also make templates, which elasticsearch will apply when an index pattern matches the template definition.
So, you only need to create a mapping if you don't like the default behaviors of elasticsearch or logstash.
Hope that helps.

Gemfire pdxInstance datatype

I am writing pdxInstances to GemFire using the sequence: rabbitmq => springxd => gemfire.
If I put this JSON into RabbitMQ: {'ID':11,'value':5}, value appears as a byte in GemFire. If I put {'ID':11,'value':500}, value appears as a word, and if I put {'ID':11,'value':50000}, it appears as an integer.
A problem arises when I query data from GemFire and order the results. For example, a query such as select * from /my_region order by value fails, saying it cannot compare a byte with a word (or a byte with an integer).
Is there any way to declare the data type in JSON? Or any other method to get rid of this problem?
To add a bit of insight into this problem... in reviewing GemFire/Geode source code, it would seem it is not possible to configure the desired value type and override GemFire/Geode's default behavior, which can be seen in JSONFormatter.setNumberField(..).
I will not explain how GemFire/Geode involves the JSONFormatter during a Region.put(key, value) operation as it is rather involved and beyond the scope of this discussion.
However, one could argue that the problem is not necessarily with the JSONFormatter class, since storing a numeric value in a byte is more efficient than storing the value in an integer, especially when the value would indeed fit into a byte. Therefore, the problem is really that the Comparator used in the Query processor should be able to compare numeric values in the same type family (byte, short, int, long), upcasting where appropriate.
If you feel so inclined, feel free to file a JIRA ticket in the Apache Geode JIRA repository at https://issues.apache.org/jira/browse/GEODE-72?jql=project%20%3D%20GEODE
Note, Apache Geode is the open source "core" of Pivotal GemFire now. See the Apache Geode website for more details.
Cheers!
Your best bet would be to take care of this with a custom module or a Groovy script. You can either write a custom module in Java to do the conversion, upload it into Spring XD, and then reference it like any other processor, or you can write a script in Groovy and pass the incoming data through a transform processor.
http://docs.spring.io/spring-xd/docs/current/reference/html/#processors
The actual conversion probably won't be too tricky, but it will vary depending on which method you use. The stream creation would look something like one of these when you're done:
stream create --name myRabbitStream --definition "rabbit | my-custom-module | gemfire-json-server etc....."
stream create --name myRabbitStream --definition "rabbit | transform --script=file:/transform.groovy | gemfire-json-server etc...."
It seems like you have your source and sink modules set up just fine, so all you need to do is get your processor module setup to do the conversion and you should be all set.
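To give a feel for the transform-script route, transform.groovy could be as small as the sketch below. It assumes the payload reaches the processor as a JSON string and that re-emitting value in a single, consistent numeric shape (here, always as a decimal) is enough for the gemfire-json-server sink to store one type; whether that holds depends on how JSONFormatter maps JSON numbers, so treat it as a starting point to verify rather than a definitive fix.
// transform.groovy -- Spring XD binds the incoming message payload to 'payload'
import groovy.json.JsonSlurper
import groovy.json.JsonOutput

def doc = new JsonSlurper().parseText(payload)
// force 'value' into one numeric form so every record looks the same to the sink
doc.value = doc.value as Double
return JsonOutput.toJson(doc)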