JSON log file into Solr via Apache NiFi

I'm trying to read a JSON log file and insert it into a Solr collection using Apache NiFi. The log file is in the following format (one JSON object per line):
{"#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
{"#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
{ "#timestamp": "2017-02-18T02:16:50.496+04:00","message": "hello"}
I was able to load the file and split it by lines using different processors. How can I proceed further?

You can use the PutSolrContentStream processor to write content to Solr from Apache NiFi. If each flowfile contains a single JSON record, each one will be written to Solr as a separate document (make sure you are splitting the JSON correctly even if a record spans multiple lines, so compare SplitJson vs. SplitText). You can also use MergeContent to write in batches and be more efficient.
Bryan Bende wrote a good article on the Apache site on how to use this processor.
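For intuition, here is a minimal sketch (plain Python, outside NiFi) of the kind of update request that ends up reaching Solr for one of those records; the collection name logs and the /update/json/docs handler are assumptions, and in the actual flow PutSolrContentStream takes care of this delivery for each flowfile:
import json
import requests

# Hypothetical Solr location and collection name; adjust to your setup.
SOLR_URL = "http://localhost:8983/solr/logs/update/json/docs"

# One JSON object per line, as in the log file above.
lines = [
    '{"#timestamp": "2017-02-18T02:16:50.496+04:00", "message": "hello"}',
]

for line in lines:
    doc = json.loads(line)  # each split flowfile would hold one such record
    resp = requests.post(
        SOLR_URL,
        params={"commit": "true"},
        headers={"Content-Type": "application/json"},
        data=json.dumps(doc),
    )
    resp.raise_for_status()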

Related

Importing Well-Structured JSON Data into Elasticsearch via CloudWatch

Is there a known approach for getting JSON data logged via CloudWatch imported into an Elasticsearch instance as well-structured JSON?
That is, I'm logging JSON data during the execution of an Amazon Lambda function.
This data is available via Amazon's CloudWatch service.
I've been able to import this data into an Elasticsearch instance using Functionbeat, but the data comes in as an unstructured message.
"_source" : {
"#timestamp" : "xxx",
"owner" : "xxx",
"message_type" : "DATA_MESSAGE",
"cloud" : {
"provider" : "aws"
},
"message" : ""xxx xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx INFO {
foo: true,
duration_us: 19418,
bar: 'BAZ',
duration_ms: 19
}
""",
What I'm trying to do is get a document indexed into Elasticsearch that has a foo field, a duration_us field, a bar field, etc., instead of one that has a plain-text message field.
It seems like there are a few different ways to do this, but I'm wondering if there's a well-trodden path for this sort of thing using Elastic's default tooling, or if I'm doomed to one more one-off hack.
Functionbeat is a good starting point and will allow you to keep it as "serverless" as possible.
To process the JSON, you can use the decode_json_fields processor.
The problem is that your message isn't really JSON though. Possible solutions I could think of:
A dissect processor in Functionbeat that extracts the JSON payload and passes it on to decode_json_fields. I'm wondering if trim_chars couldn't be abused for that: trim any possible characters except for curly braces.
If that is not enough, you could do all the processing in an Elasticsearch ingest pipeline, where you would probably stitch this together with a grok processor and then the JSON processor (see the sketch after this list).
If you can, only log a JSON message to make your life simpler; potentially move the log level into the JSON structure.
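As a rough illustration of the second option, here is a sketch that registers such an ingest pipeline through the elasticsearch-py client. The pipeline id, the grok pattern for the Lambda log prefix, and the field names are assumptions, and the json processor will only succeed once the extracted payload is actually valid JSON (quoted keys, double quotes):
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint

# Hypothetical pipeline: strip the Lambda prefix with grok, then parse the
# remaining payload into a structured sub-object with the json processor.
es.ingest.put_pipeline(
    id="lambda-log-json",
    body={
        "description": "Extract and decode the JSON payload from Lambda log lines",
        "processors": [
            {
                "grok": {
                    # Guess at the prefix shape: timestamp, request id, log level, payload.
                    "field": "message",
                    "patterns": ["%{DATA:log_timestamp} %{UUID:request_id} %{LOGLEVEL:level} %{GREEDYDATA:json_payload}"],
                }
            },
            {"json": {"field": "json_payload", "target_field": "lambda"}},
        ],
    },
)
The Beats Elasticsearch output can then be pointed at this pipeline so incoming events pass through it.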

Does NiFi support transferring a single flowfile to multiple relationships in a processor?

I am exploring options for transferring a single flowfile to two or more relationships in my custom processor. I didn't find any help in the documentation for this. Is this feature supported by NiFi?
Example Code:
session.transfer(flowFile, REL_SUCCESS_1);
session.transfer(flowFile, REL_SUCCESS_2);
You can use the session.clone() method:
FlowFile flowFile2 = session.clone(flowFile);
session.transfer(flowFile, REL_SUCCESS_1);
session.transfer(flowFile2, REL_SUCCESS_2);
This creates a full clone of the flowfile; the content and attributes of the clone will be the same as the original's.

How to send a .csv file from the local machine (GetFile) to Hive (PutHiveQL) in Apache NiFi using cURL?

I want to send a .csv file or a MySQL table from my local machine (GetFile) to Hive (PutHiveQL) in Apache NiFi using cURL. Please let me know if there is any command to do this using cURL.
The question doesn't make sense as formed. If you want to ingest the content of a CSV file into Apache NiFi, route and transform it, and eventually write it to a Hive table, your flow would be as follows:
GetFile -> ConvertRecord (CSVReader to AvroRecordSetWriter) -> [Optional processors] -> PutHiveStreaming
PutHiveStreaming expects the incoming flowfile content to be in Avro format, so the ConvertRecord processor will translate the ingested data into the correct syntax.
I am unsure of how cURL fits into this question at all. NiFi does provide the InvokeHTTP processor to allow arbitrary outgoing HTTP requests, as well as the ExecuteStreamCommand processor to invoke arbitrary command-line activity, including cURL. I don't know why you would need to invoke either in this flow. If you are asking how you could trigger the entire flow via an external cURL command, NiFi provides both ListenHTTP and HandleHTTPRequest processors which start local web servers and listen for incoming HTTP requests. You can connect these processors to a pair of Wait/Notify processors to control the flow of the ingested file data, as GetFile is a source processor, and does not allow incoming flowfiles to trigger it.
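To make the cURL angle concrete: if the flow is fronted by a ListenHTTP processor (either replacing GetFile or feeding the Wait/Notify pair mentioned above), an external client can push the CSV content, or a simple trigger request, into NiFi over HTTP. A minimal sketch, assuming ListenHTTP is configured with port 9090 and its default contentListener base path:
import requests

# Assumed ListenHTTP settings: Listening Port = 9090, Base Path = contentListener.
NIFI_LISTEN_URL = "http://nifi-host:9090/contentListener"

with open("data.csv", "rb") as f:
    resp = requests.post(
        NIFI_LISTEN_URL,
        data=f,  # the request body becomes the flowfile content in NiFi
        headers={"Content-Type": "text/csv"},
    )
resp.raise_for_status()
print(resp.status_code)  # ListenHTTP responds once the flowfile is queued
The equivalent cURL call is a plain POST of the file body against the same URL, e.g. curl --data-binary @data.csv http://nifi-host:9090/contentListener.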

Importing a JSON file into Cassandra

Hi, is it possible to import any random JSON file into Cassandra?
The JSON file is not exported from sstable2json. The JSON file is from a different website and needs to be imported into Cassandra. Could anyone please advise whether this is possible.
JSON support won't be introduced until Cassandra 3.0 (see CASSANDRA-7970), and in that case you still need to define a schema for your JSON data to map to. You do have some other options:
Use maps, which sort of map to JSON. Maps can be indexed as of Cassandra 2.1 (CASSANDRA-4511). There is also a good Stack Exchange post about this.
You mention 'any random json file'. You could just have a string column that contains the raw JSON, but then you lose any ability to query that data.
Come up with some kind of schema for your JSON data, map it to a CQL table, and write some code that parses the JSON and writes it to that table (see the sketch after this list). This doesn't sound like an option for you, since you want to be able to import any random JSON file.
If you are looking to do only JSON document storage, you might want to look at document-oriented solutions instead of a column-oriented solution like Cassandra.
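As a sketch of the third option, here is what the mapping code might look like with the DataStax Python driver, assuming the JSON file holds one object per line and a pre-created CQL table; the keyspace, table, and column names here are made up for illustration:
import json
import uuid
from cassandra.cluster import Cluster

# Assumed schema, created beforehand, e.g.:
# CREATE TABLE demo.events (id uuid PRIMARY KEY, name text, raw_json text);
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")

insert = session.prepare(
    "INSERT INTO events (id, name, raw_json) VALUES (?, ?, ?)"
)

with open("data.json") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        # Map the fields you care about to columns; keep the rest as raw JSON.
        session.execute(insert, (uuid.uuid4(), doc.get("name"), line))

cluster.shutdown()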

Bulk loading JSON objects as documents into Elasticsearch

Is there a way to bulk load the data below into Elasticsearch without modifying the original content? I want each object to be POSTed as a single document. At the moment I'm using Python to parse through the individual objects and POST them one at a time.
{
{"name": "A"},
{"name": "B"},
{"name": "C"},
{"name": "D"},
}
Doing this type of processing in production, from REST servers into Elasticsearch, is taking a lot of time.
Is there a single POST/cURL command that can upload the file above at once, so that Elasticsearch parses it and makes each object into its own document?
We're using Elasticsearch 1.3.2.
Yes, you can use the bulk API via cURL through the _bulk endpoint, but there is no custom parsing: whatever process creates the file would have to format it to the Elasticsearch bulk specification, if that is an option. See here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
There is also bulk support in Python via a helper; a sketch follows below. See here:
http://elasticsearch-py.readthedocs.org/en/master/helpers.html
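For the Python side, the helpers module can stream the whole file in batched _bulk requests instead of one POST per object. A minimal sketch, assuming the file is reshaped to one JSON object per line (the wrapping braces in the example above would have to go) and an index name of names, both of which are assumptions:
import json
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])

def actions(path):
    # Assumes one JSON object per line; each line becomes its own document.
    with open(path) as f:
        for line in f:
            line = line.strip().rstrip(",")
            if not line or line in ("{", "}"):
                continue
            yield {
                "_index": "names",
                "_type": "doc",  # a type is still required on Elasticsearch 1.x
                "_source": json.loads(line),
            }

success, errors = bulk(es, actions("data.json"))
print(success, errors)
Under the hood this is the same newline-delimited format the _bulk endpoint expects (an action line followed by a source line), so if the producing process can emit that format directly, a single cURL POST to _bulk would also work.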