Importing CSV File in Elasticsearch - csv

I am new to Elasticsearch. I tried to import a CSV file by following the guide, and the import succeeded: it created an index with the documents.
But I found that in every document the _id contains a random unique id as its value. I want _id to take its value from a field of the CSV file (the file I'm importing contains a unique id for every row), via a query or any other way, and I do not know how to do that.
The docs do not explain it either. A sample document from the Elasticsearch index is shown below:
{
"_index" : "sample_index",
"_type" : "_doc",
"_id" : "nGHXgngBpB_Kjkqcxfj",
"_score" : 1.0,
"_source" : {
"categoryid" : "34128b58-9148-11eb-a8b3-0242ac130003",
"categoryname" : "Blogs",
"isdeleted" : "False"
}
}
However, when I add an ingest pipeline with the following processor
{
"set": {
"field": "_id",
"value": "{{categoryid}}"
}
}
it throws an error with this message:

You can achieve this by modifying the ingest pipeline used to ingest your CSV file.
In the Ingest pipeline area (Advanced section), simply add the following processor at the end of the pipeline and the document ID will be set accordingly:
...
{
"set": {
"field": "_id",
"value": "{{categoryid}}"
}
}

I added the following processor in the ingest pipeline section and it works:
{
"processors": [
{
"set": {
"field": "_id",
"value": "{{categoryid}}"
}
}
]
}
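If you want to sanity-check the processor before re-importing, the ingest simulate API is one way to do it; here is a rough sketch (the sample document is taken from the question, and the pipeline body is the one above):

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "set": { "field": "_id", "value": "{{categoryid}}" } }
    ]
  },
  "docs": [
    {
      "_source": {
        "categoryid": "34128b58-9148-11eb-a8b3-0242ac130003",
        "categoryname": "Blogs",
        "isdeleted": "False"
      }
    }
  ]
}

The simulated document in the response should come back with its _id set to the categoryid value.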

Related

How to configure the path in LookupRecord NiFi

I have a JSON file in my database; the structure is like this:
{
"employeeId": "ref-123",
"name": "Danie",
"manager": {
"employeeId": "ref-456",
"name": "John"
}
}
I want to do a lookup in the LookupRecord processor so that it finds the manager's name based on the manager id. Here is the configuration of the LookupRecord.
I added a custom property named employeeId, and the path is /manager/employeeId. But I cannot get the name (which is John).
So how can I configure the path, given that the value I want to find is embedded in "manager"?
Assuming you are using MongoDB for the lookup service, I reproduced your flow:
GenerateFlowFile: { "manager": { "employeeId": "ref-123" } }
LookupRecord:
Result RecordPath: where you want to put the lookup result in your flow file.
employeeId: what to look for in your flow file and send to the lookup service.
On mongodb side, I have the following document :
{ "_id" : { "$oid" : "5fb6932c01e6ef0027e0af5b" },
"employeeId" : "ref-123",
"name" : "Danie",
"manager" :
{ "employeeId" : "ref-456", "name" : "John" } }
The lookup service is configured accordingly (if you use MongoDB).
Finally, I have this output:
[{"manager":{"employeeId":"ref-123","name":"Danie"}}]
Tell me if it helps you.
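For the nested case in the question, the LookupRecord settings would look roughly like this (a sketch only: the reader, writer, and service names are assumptions, and the result path is just illustrative):

Record Reader          JsonTreeReader
Record Writer          JsonRecordSetWriter
Lookup Service         MongoDBLookupService
Result RecordPath      /manager/lookupResult    (where the looked-up value is written)
employeeId (dynamic)   /manager/employeeId      (the key sent to the lookup service)

The important part is that the dynamic key property points at /manager/employeeId, so the value is read from the nested record rather than from the top level.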

Reading JSON data from BLOB

Edit - Oracle version 19c
I am uploading a JSON file using a file browse item in APEX and then storing it in a table as a BLOB.
The Table looks like this -
File_ID  Filename       Mime_type         created_on  blob_content
1        file_new.json  application/json  9/1/2020    (BLOB)
Now I want to parse this and read the contents of the BLOB as a table in Oracle. How can I do it?
The JSON file looks like this, but has hundreds of rows.
[{"Id":"50021","eName":"random123", "Type":"static","Startdate":"07/03/2020","Enddate":"08/02/2020,"nominations":[{"nominationId":"152","nominationMaxCount":7500,"offer":[{"Id":"131","Type":"MONEY","clientId":41,
"stateExclusions":[],"divisionInclusions":["111","116","126","129"]]}]
Step One - add an IS JSON check constraint to your BLOB_CONTENT column.
ALTER TABLE CLOBS
ADD CONSTRAINT CLOB_JSON CHECK
(CLOBS IS JSON)
ENABLE; -- yes my table name and my column are both named CLOBS
Step Two - Add some data.
The database provides native SQL calls to parse/query JSON content in your BLOB.
My data, a single row. This JSON document has a couple of simple arrays.
{
"results" : [
{
"columns" : [
{
"name" : "REGION_ID",
"type" : "NUMBER"
},
{
"name" : "REGION_NAME",
"type" : "VARCHAR2"
}
],
"items" : [
{
"region_id" : 1,
"region_name" : "Europe"
},
{
"region_id" : 2,
"region_name" : "Americas"
},
{
"region_id" : 3,
"region_name" : "Asia"
},
{
"region_id" : 4,
"region_name" : "Middle East and Africa"
}
]
}
]
}
I can use the json_value() function if I want to pull a single attribute out, and I can reference attributes using $. notation. I reference arrays as you'd expect.
select json_value(CLOBS,'$.results.columns[0].name') FIRST_COLUMN,
json_value(CLOBS,'$.results.columns[1].name') SECOND_COLUMN
from CLOBS
where ID = 1;
The results are the two column names, REGION_ID and REGION_NAME.
Our product architect (Beda) has a great blog series with much better examples than this.
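If the goal is to read the whole document back as rows (which is closer to what the question asks), json_table can unnest the arrays. Here is a rough sketch against the same CLOBS table; the path and column list are assumptions based on the sample document above:

select jt.*
from CLOBS c,
     json_table(c.CLOBS, '$.results[*].items[*]'
       columns (
         region_id   number       path '$.region_id',
         region_name varchar2(50) path '$.region_name'
       )) jt
where c.ID = 1;

This returns one row per element of the items array, with the two attributes exposed as ordinary relational columns.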

Jolt Transformation to add new index json record before each json record

Input JSON
[
{
"timestamp":"2020-01-28 12:13:43,561",
"threadno":"5",
"loglevel":"DEBUG",
"class":"someclassname",
"nanoseconds":"587800052",
"message":null,
"stackTrace":null
},
{
"timestamp":"2020-01-28 12:33:57,328",
"threadno":"12",
"loglevel":"DEBUG",
"class":"someclassname",
"nanoseconds":"6419049968",
"message":null,
"stackTrace":null
}
]
Output JSON
[
{
"index":{
"_index":"test",
"_type":"doc",
"_id":"20200128121343561"
}
},
{
"timestamp":"2020-01-28 12:13:43,561",
"threadno":"5",
"loglevel":"DEBUG",
"class":"someclassname",
"nanoseconds":"587800052",
"message":null,
"stackTrace":null
},
{
"index":{
"_index":"test",
"_type":"doc",
"_id":"20200128123357328"
}
},
{
"timestamp":"2020-01-28 12:33:57,328",
"threadno":"12",
"loglevel":"DEBUG",
"class":"someclassname",
"nanoseconds":"6419049968",
"message":null,
"stackTrace":null
}
]
I need to add this index record before each JSON record: { "index" : { "_index" : "test", "_type" : "doc", "_id" : "20200128121343561" } }, where the _id value is derived from the timestamp. Can we also add a new line after each JSON record using a Jolt transformation?
The processing of your timestamp field to create an _id is a data transformation. According to the jolt docs, you would have to write this as custom Java code.
Currently, all the Stock transforms just effect the "structure" of the
data. To do data manipulation, you will need to write Java code. If
you write your Java "data manipulation" code to implement the
Transform interface, then you can insert your code in the transform
chain.
see https://github.com/bazaarvoice/jolt for details.
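As an illustration only (the class name, the id derivation, and the index values are assumptions taken from the question, not something the jolt docs prescribe), such a custom Transform might look roughly like this:

import com.bazaarvoice.jolt.Transform;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Prepends a bulk-style "index" header before each record,
// deriving _id from the record's timestamp field.
public class AddIndexHeader implements Transform {
    @Override
    @SuppressWarnings("unchecked")
    public Object transform(Object input) {
        List<Object> records = (List<Object>) input;   // the input JSON is an array
        List<Object> output = new ArrayList<>();
        for (Object item : records) {
            Map<String, Object> record = (Map<String, Object>) item;
            // "2020-01-28 12:13:43,561" -> "20200128121343561"
            String id = String.valueOf(record.get("timestamp")).replaceAll("[^0-9]", "");

            Map<String, Object> indexBody = new HashMap<>();
            indexBody.put("_index", "test");
            indexBody.put("_type", "doc");
            indexBody.put("_id", id);

            Map<String, Object> header = new HashMap<>();
            header.put("index", indexBody);

            output.add(header);   // the new header record
            output.add(record);   // the original record, unchanged
        }
        return output;
    }
}

The class can then be inserted into the transform chain as the jolt README describes. Adding literal newlines between records is an output-formatting concern and would normally be handled when the result is serialized, not by the transform itself.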

Is it possible to combine two JSON documents with Ruby in MongoDB?

I have to insert documents into my Mongo database using Ruby (not Rails; plain scripts written in Notepad++), and many documents have duplicates with some modifications.
I want to write a script which takes a JSON file, reads it, and imports it into MongoDB, checking that each document does not already have a duplicate in the database; if there is a duplicate, I want to combine the two whenever the new one contains any additional information.
For example:
Document 1
{ "Name" : "Lila",
"Files":
[
{ "Name": "File1", "Date" : "05-11-2017"},
{ "Name": "File2", "Date" : "26-03-2018"}
]
}
Document 2
{ "Name" : "Lila",
"Files":
[
{ "Name": "File3", "Date" : "26-03-2018"}
]
}
Combine them to have:
{ "Name" : "Lila",
"Files":
[
{ "Name": "File1", "Date" : "05-11-2017"},
{ "Name": "File2", "Date" : "26-03-2018"},
{ "Name": "File3", "Date" : "26-03-2018"}
]
}
I found that it is possible in the mongo shell thanks to the $mergeObjects aggregation operator, but in Ruby it does not seem to exist.
You can use all the operators in Ruby, too. You need to get the underlying collection object first.
require 'mongo'
db = Mongo::Connection.new.db("mydb")
coll = db.collection('posts')
coll.aggregate([
{"$project" => {"last_name" => 1, "first_name" => 1 }},
{"$match" => {"last_name" => "Jones"}}
])
This is an example pipeline. You can pass the same aggregation pipeline that worked for you in the mongo shell to aggregate.
For more information, refer to the MongoDB Ruby driver documentation:
http://www.rubydoc.info/gems/mongo/1.8.2/Mongo%2FCollection%3Aaggregate
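Note that the snippet above uses the legacy 1.x driver API that the link documents. For the merge described in the question, one approach with the current mongo gem is an upsert with $addToSet, so duplicate entries are combined instead of re-inserted. A minimal sketch, assuming a local server and a collection named people (names are illustrative):

require 'mongo'

client = Mongo::Client.new('mongodb://localhost:27017/mydb')
coll = client[:people]

# Document read from the JSON file (hard-coded here for brevity)
doc = { 'Name' => 'Lila',
        'Files' => [ { 'Name' => 'File3', 'Date' => '26-03-2018' } ] }

# If a document with the same Name exists, append only the Files entries
# that are not already present; otherwise create the document (upsert).
coll.update_one(
  { 'Name' => doc['Name'] },
  { '$addToSet' => { 'Files' => { '$each' => doc['Files'] } } },
  upsert: true
)

$addToSet with $each only adds array elements that are not already present, which yields the combined document shown in the question.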

Create the structure of an empty collection in mongoDB

Is there any possibility to create the structure of an empty collection in MongoDB using mongoimport from a JSON file like the one below?
"Users" : {
"name" : "string",
"telephone" : {
"personal": { "type": "number" },
"job": { "type" : "number" }
},
"loc" : "array",
"friends" : "object"
}
My goal is to create a mongoDB schema from JSON files.
Yes, you can mongoimport a JSON file, and if you clear out the values of those fields (set them to ""), importing your JSON file should do just that.
However, MongoDB is a NoSQL database, and creating a schema in the MongoDB database doesn't really make sense. What will happen is that you'll have one record with fields whose values are empty.
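For reference, the import itself would be along these lines (a sketch; the database, collection, and file names are placeholders):

mongoimport --db mydb --collection Users --file users_template.json

As noted above, this only creates a document whose fields have empty values; it does not enforce any schema on documents inserted later.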