How to use JSON.parse with bucket_script?

I have a field saved as a JSON string and I need to compute the average value of price in "{price: 10}". How do I use JSON.parse with bucket_script to compute this in Elasticsearch?

There is no JSON parsing class in Painless, so you cannot do this at query time. You should parse the JSON while indexing instead; this will also make your search queries faster.
1. Ingest
You can use the JSON processor:
{
"json" : {
"field" : "string_source",
"target_field" : "json_target"
}
}
Pipeline
PUT _ingest/pipeline/my-pipeline
{
"description": "describe pipeline",
"processors": [
{
"json": {
"field": "string_source",
"target_field": "json_target"
}
}
]
}
Index document using ingest pipeline
POST json_index/_doc?pipeline=my-pipeline
{
"string_source":"{\"price\":10}"
}
Document
"hits" : [
{
"_index" : "json_index",
"_type" : "_doc",
"_id" : "m7t3gXEB1B5aJp__0oos",
"_score" : 1.0,
"_source" : {
"json_target" : {
"price" : 10
},
"string_source" : """{"price":10}"""
}
}
]
If you don't want to keep the original string in the index, add a remove processor as a separate entry in the processors list:
PUT _ingest/pipeline/my-pipeline
{
  "description": "describe pipeline",
  "processors": [
    {
      "json": {
        "field": "string_source",
        "target_field": "json_target"
      }
    },
    {
      "remove": {
        "field": "string_source"
      }
    }
  ]
}
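Either way, once the JSON is parsed at index time you no longer need JSON.parse or bucket_script at query time; a plain avg aggregation over the parsed field is enough. A minimal sketch against the json_index example above, assuming dynamic mapping has typed json_target.price as a number:
POST json_index/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "json_target.price"
      }
    }
  }
}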
2. Logstash
This is a JSON parsing filter. It takes an existing field which contains JSON and expands it into an actual data structure within the Logstash event.
filter {
json {
source => "message"
}
}
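Applied to this question, the filter would point at whichever field holds the JSON string; the field names below (string_source, json_target) are just carried over from the ingest example above, not required by Logstash:
filter {
  json {
    source => "string_source"
    target => "json_target"
  }
}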

Related

Method to assign object IDs to imported JSON in Firebase

Firebase organizes an imported JSON file by assigning its own numeric keys to each array element (screenshot omitted). But the imported file (and the file exported from Firebase) is organized this way:
{
"features" : [ {
"geometry" : {
"coordinates" : [ -77.347191, 36.269321 ],
"type" : "Point"
},
"properties" : {
"name" : "Branch Chapel",
"osm_id" : "262661",
"religion" : "christian"
},
"type" : "Feature"
},
...
It appears that Firebase assigns an internal number to each object in the array of "features". This is nice, but it makes it hard to reference each object without knowing how Firebase is naming it, and I have 400k+ objects.
Is there a way to assign an id to each object to prevent Firebase from generating its own? Or is there a way to programmatically rename/reorganize the data after it's been imported? The optimal outcome would have the object named by its osm_id, rather than some arbitrary number Firebase assigns.
Any help is appreciated.
Get rid of the square brackets and replace them with curly brackets.
this
{
  "flags": {
    "1": {
      "information": "blah"
    },
    "2": {
      "information": "It is great!"
    },
    "3": {
      "information": "Amazing!"
    }
  }
}
not this
[
  {
    "1": {
      "information": "blah"
    }
  },
  {
    "2": {
      "information": "It is great!"
    }
  },
  {
    "3": {
      "information": "Amazing!"
    }
  }
]
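Applied to the GeoJSON sample above, the import file would then look roughly like this (a hand-edited sketch keyed by osm_id, which is the outcome the question asks for; values copied from the sample):
{
  "features": {
    "262661": {
      "geometry": {
        "coordinates": [ -77.347191, 36.269321 ],
        "type": "Point"
      },
      "properties": {
        "name": "Branch Chapel",
        "osm_id": "262661",
        "religion": "christian"
      },
      "type": "Feature"
    }
  }
}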

How to create an index with integer fields in Elasticsearch for a JSON file of the following format?

I am trying to create an index in Elasticsearch for a JSON file of the following format:
{ "index" : { "_index" : "entity", "_type" : "type1", "_id" : "0" } }
{ "eid":"guid of Event autogenerated", "entityInfo": { "entityType":"qualityevent", "defaultLocale":"en-US" }, "systemInfo": { "tenantId":"67" }, "attributesInfo" : { "jobId":"21", "matchStatus": "new" } }
{ "index" : { "_index" : "entity", "_type" : "type1", "_id" : "1" } }
{ "eid":"guid of Event autogenerated", "entityInfo": { "entityType":"qualityevent", "defaultLocale":"en-US" }, "systemInfo": { "tenantId":"67" }, "attributesInfo" : { "jobId":"20", "matchStatus": "existing" } }
I want the fields jobId and tenantId to be integers.
I am giving the following mapping in curl command:
curl -XPUT http://localhost:9200/entity -d '
{
"mappings": {
"entityInfo":
{
"properties" : {
"entityType" : { "type":"string","index" : "not_analyzed"},
"defaultLocale":{ "type":"string","index" : "not_analyzed"}
}
},
"systemInfo":
{
"properties" : {
"tenantId": { "type" : "integer" }
}
},
"attributesInfo" :
{
"properties" : {
"jobId": { "type" : "integer" },
"matchStatus": { "type":"string","index" : "not_analyzed"}
}
}
}
}
';
This does not give me an error. However, it creates new, empty jobId and tenantId fields as integers, while the existing data in attributesInfo.jobId stays a string. The same is the case with systemInfo.tenantId. I want to use these two fields in Kibana for visualization, but I currently cannot because they are empty.
I am new to Kibana and Elasticsearch so I am not sure if the mapping is correct.
I have tried a couple of other mappings as well, but they give errors; the mapping above does not.
This is how the Discover tab in Kibana looks (screenshot omitted).
Please let me know where I am going wrong.
I tried as you mentioned but it didn't help. What I realised after a lot of trial and error was that my mapping was incorrect. I finally wrote the correct mapping and now it works correctly: jobId and tenantId are recognised as numbers by Kibana. I am new to JSON, Kibana, Bulk, and Elastic, so it took time to understand how mapping works.
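The corrected mapping itself is not shown above; a minimal sketch of what it likely looks like follows. The three objects need to be declared as properties of the document type (type1) rather than as separate top-level mapping types, and because the mapping of an existing field cannot be changed, the index has to be deleted (or a new one created) and the data re-indexed:
curl -XPUT http://localhost:9200/entity -d '
{
  "mappings": {
    "type1": {
      "properties": {
        "entityInfo": {
          "properties": {
            "entityType": { "type": "string", "index": "not_analyzed" },
            "defaultLocale": { "type": "string", "index": "not_analyzed" }
          }
        },
        "systemInfo": {
          "properties": {
            "tenantId": { "type": "integer" }
          }
        },
        "attributesInfo": {
          "properties": {
            "jobId": { "type": "integer" },
            "matchStatus": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'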

How to fetch data in MongoDB

How do I fetch data from the JSON file using the mongo shell?
I want to fetch the data by PolicyID.
Say in the JSON file I sent, the PolicyID is 3148.
I tried a couple of ways to write the command, but they return 0 rows.
db.GeneralLiability.find({"properties.id":"21281"})
db.GeneralLiability.find({properties:{_id:"21281"}})
Do I need to set anything else? Indexes, cursors, etc.?
Sample JSON:
{
"session": {
"data": {
"account": {
"properties": {
"userName": "abc.com",
"_dateModified": "2014-10-01",
"_manuscript": "Carrier_New_Rules_2_1_0",
"_engineVersion": "2.0.0",
"_cultureCode": "en-US",
"_cultureName": "United States [english]",
"_context": "Underwriter",
"_caption": "Carrier New Rules (2.1.0)",
"_id": "p1CEB08012E51477C9CD0E89FE77F5E51"
},
"properties": {
"_xmlns:xsd": "http://www.w3.org/2001/XMLSchema",
"_xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
"_id": "3148",
"_HistoryID": "5922",
"_Type": "onset",
"_Datestamp": "2014-10-01T04:46:33",
"_TransactionType": "New",
"_EffectiveDate": "2014-01-01",
"_Charge": "1599",
"_TransactionGroup": "t4CE4FA751F9C400D9007E692A883DA66",
"_PolicyID": "3148",
"_Index": "1",
"_Count": "1",
"_Sequence": "1"
}
}
}
}
}
This will return the document with _PolicyID = "3148" (note that you need the full dotted path to the nested field):
db.GeneralLiability.find({
  "session.data.account.properties._PolicyID": "3148"
}).pretty();
You have some issues in your document formatting. First off, I am pretty sure that field names starting with underscores are reserved in Mongo (I could be wrong). Either way, it is bad form. I have restructured your data for you. I am not sure why you wanted to nest your data so much, but I am guessing you had a good reason for it.
You will notice that I am using the ObjectId from Mongo for my _id:
{
"_id" : ObjectId("56e1c1f53bac31a328e3682b"),
"session" : {
"data" : {
"account" : {
"properties" : {
"xmlns:xsd" : "http://www.w3.org/2001/XMLSchema",
"xmlns:xsi" : "http://www.w3.org/2001/XMLSchema-instance",
"HistoryID" : "5922",
"Type" : "onset",
"Datestamp" : "2014-10-01T04:46:33",
"TransactionType" : "New",
"EffectiveDate" : "2014-01-01",
"Charge" : "1599",
"TransactionGroup" : "t4CE4FA751F9C400D9007E692A883DA66",
"PolicyID" : "3148",
"Index" : "1",
"Count" : "1",
"Sequence" : "1"
}
}
}
}
}
Now if you run this query it will return your document:
db.GeneralLiability.find(
  { "session.data.account.properties.PolicyID": "3148" }
).pretty();

Nested filter numerical range

I have the following json object:
{
  "Title": "Terminator",
  "Purchases": [
    { "Country": "US", "Site": "iTunes", "Price": 4.99 },
    { "Country": "FR", "Site": "Google", "Price": 5.99 }
  ]
}
I want to be able to find an object by specifying a Country + Site + price range. For example, the above should return True for Country=US&Price<5.00, but should return False for Country=FR&Price<5.00. How would the index and query look to do this? This is a follow-up question to another answer: Search within array object.
Simply add a Range query to your Bool query logic tree. This will return documents that match US for country and have the Price field with a numeric value less than 5.
{
  "query": {
    "nested": {
      "path": "Purchases",
      "score_mode": "avg",
      "query": {
        "bool": {
          "must": [
            {
              "match": { "Purchases.Country": "US" }
            },
            {
              "range": {
                "Purchases.Price": {
                  "lt": 5
                }
              }
            }
          ]
        }
      }
    }
  }
}
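As for the index part of the question: Purchases has to be mapped as a nested field, otherwise the Country and Price conditions could match across different elements of the array. A minimal sketch, using a hypothetical index name and current (typeless) mapping syntax; older versions would use string with not_analyzed instead of keyword:
PUT purchases_index
{
  "mappings": {
    "properties": {
      "Title": { "type": "text" },
      "Purchases": {
        "type": "nested",
        "properties": {
          "Country": { "type": "keyword" },
          "Site": { "type": "keyword" },
          "Price": { "type": "double" }
        }
      }
    }
  }
}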

Sub-records in Avro with Morphlines

I'm trying to convert JSON into Avro using the kite-sdk morphline module. After playing around I'm able to convert the JSON into Avro using a simple schema (no complex data types).
Then I took it one step further and modified the Avro schema as displayed below (subrec.avsc). As you can see, the schema consists of a subrecord.
As soon as I tried to convert the JSON to Avro using the morphlines.conf and the subrec.avsc it failed.
Somehow the JSON paths "/record_type[]/alert/action" are not translated by the toAvro function.
The morphlines.conf
morphlines : [
{
id : morphline1
importCommands : ["org.kitesdk.**"]
commands : [
# Read the JSON blob
{ readJson: {} }
{ logError { format : "record: {}", args : ["#{}"] } }
# Extract JSON
{ extractJsonPaths { flatten: false, paths: {
"/record_type[]/alert/action" : /alert/action,
"/record_type[]/alert/signature_id" : /alert/signature_id,
"/record_type[]/alert/signature" : /alert/signature,
"/record_type[]/alert/category" : /alert/category,
"/record_type[]/alert/severity" : /alert/severity
} } }
{ logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } }
{ extractJsonPaths { flatten: false, paths: {
timestamp : /timestamp,
event_type : /event_type,
source_ip : /src_ip,
source_port : /src_port,
destination_ip : /dest_ip,
destination_port : /dest_port,
protocol : /proto,
} } }
# Create Avro according to schema
{ logError { format : "WE GO TO AVRO"} }
{ toAvro { schemaFile : /etc/flume/conf/conf.empty/subrec.avsc } }
# Create Avro container
{ logError { format : "WE GO TO BINARY"} }
{ writeAvroToByteArray { format: containerlessBinary } }
{ logError { format : "DONE!!!"} }
]
}
]
And the subrec.avsc
{
"type" : "record",
"name" : "Event",
"fields" : [ {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "event_type",
"type" : "string"
}, {
"name" : "source_ip",
"type" : "string"
}, {
"name" : "source_port",
"type" : "int"
}, {
"name" : "destination_ip",
"type" : "string"
}, {
"name" : "destination_port",
"type" : "int"
}, {
"name" : "protocol",
"type" : "string"
}, {
"name": "record_type",
"type" : ["null", {
"name" : "alert",
"type" : "record",
"fields" : [ {
"name" : "action",
"type" : "string"
}, {
"name" : "signature_id",
"type" : "int"
}, {
"name" : "signature",
"type" : "string"
}, {
"name" : "category",
"type" : "string"
}, {
"name" : "severity",
"type" : "int"
}
] } ]
} ]
}
The { logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } } command outputs the following:
[{
/record_type[]/alert / action = [allowed],
/record_type[]/alert / category = [],
/record_type[]/alert / severity = [3],
/record_type[]/alert / signature = [GeoIP from NL,
Netherlands],
/record_type[]/alert / signature_id = [88006],
_attachment_body = [{
"timestamp": "2015-03-23T07:42:01.303046",
"event_type": "alert",
"src_ip": "1.1.1.1",
"src_port": 18192,
"dest_ip": "46.231.41.166",
"dest_port": 62004,
"proto": "TCP",
"alert": {
"action": "allowed",
"gid": "1",
"signature_id": "88006",
"rev": "1",
"signature" : "GeoIP from NL, Netherlands ",
"category" : ""
"severity" : "3"
}
}],
_attachment_mimetype=[json/java + memory],
basename = [simple_eve.json]
}]
UPDATE 2017-06-22
You MUST populate the data in the structure in order for this to work, by using addValues or setValues:
{
addValues {
micDefaultHeader : [
{
eventTimestampString : "2017-06-22 18:18:36"
}
]
}
}
After debugging the sources of the morphline toAvro command, it appears that the record is the first object to be evaluated, no matter what you put in your mappings structure.
The solution is quite simple, but unfortunately it took a little extra time: Eclipse, running the Flume agent in debug mode, cloning the source code, and lots of coffee.
Here it goes.
My schema:
{
"type" : "record",
"name" : "co_lowbalance_event",
"namespace" : "co.tigo.billing.cboss.lowBalance",
"fields" : [ {
"name" : "dummyValue",
"type" : "string",
"default" : "dummy"
}, {
"name" : "micDefaultHeader",
"type" : {
"type" : "record",
"name" : "mic_default_header_v_1_0",
"namespace" : "com.millicom.schemas.root.struct",
"doc" : "standard millicom header definition",
"fields" : [ {
"name" : "eventTimestampString",
"type" : "string",
"default" : "12345678910"
} ]
}
} ]
}
Morphlines file:
morphlines : [
{
id : convertJsonToAvro
importCommands : ["org.kitesdk.**"]
commands : [
{
readJson {
outputClass : java.util.Map
}
}
{
addValues {
micDefaultHeader : [{}]
}
}
{
logDebug { format : "my record: {}", args : ["#{}"] }
}
{
toAvro {
schemaFile : /home/asarubbi/Development/test/co_lowbalance_event.avsc
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
}
}
{
writeAvroToByteArray {
format : containerlessJSON
codec : null
}
}
]
}
]
The magic lies here:
{
addValues {
micDefaultHeader : [{}]
}
}
And in the mappings:
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
Explanation:
Inside the code, the first field name that is evaluated is micDefaultHeader, of type RECORD. As there is no way to specify a default value for a RECORD (logically correct), the toAvro code evaluates it, does not get any value configured in mappings, and therefore fails: it detects (wrongly) that the record is empty when it shouldn't be.
However, taking a look at the code, you can see that it requires a Map object containing no values to please the parser and continue to the next element.
So we add a map object using addValues and fill it with an empty map [{}]. Notice that this must match the name of the record that is giving you an empty value; in my case, "micDefaultHeader".
Feel free to comment if you have a better solution, as this looks like a "dirty fix".