Dynamic logstash json mapping in elasticsearch

I need to implement a REST service that will take arbitrarily typed data in the request and put it into Elasticsearch through Logstash.
The Spring controller receives the request body:
public class CustomData {
    private String component;
    private Object data;
}
data here is arbitrary custom JSON from the PUT request.
I am trying to use net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder.
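For context, the controller logs the incoming payload roughly like this (a simplified sketch; the endpoint path, the MDC key handling, the getters on CustomData, and the ObjectMapper usage are illustrative rather than the exact code):
import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CustomDataController {

    private static final Logger log = LoggerFactory.getLogger(CustomDataController.class);
    private final ObjectMapper mapper = new ObjectMapper();

    @PutMapping("/data")
    public void putData(@RequestBody CustomData body) throws Exception {
        // indexName is read from the MDC by the encoder and referenced as
        // %{indexName} in the Logstash output below
        MDC.put("indexName", body.getComponent());
        try {
            // the nested data object is serialized into the log message as a JSON string
            log.info(mapper.writeValueAsString(body.getData()));
        } finally {
            MDC.remove("indexName");
        }
    }
}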
The Logstash config is as follows:
input {
  tcp {
    port => 5000
    codec => json
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "%{indexName}-%{+YYYY.MM.dd}"
  }
}
As you can see, there is an indexName parameter, which I set via MDC. But after a message goes through Logstash, Elasticsearch says there is no mapping for my object:
{
"appId1-2016.03.15" : {
"aliases" : { },
"mappings" : {
"logs" : {
"properties" : {
"#timestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"#version" : {
"type" : "long"
},
"host" : {
"type" : "string"
},
"port" : {
"type" : "long"
}
}
},
"e1" : {
"properties" : {
"#timestamp" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"#version" : {
"type" : "long"
},
"applicationId" : {
"type" : "string"
},
"host" : {
"type" : "string"
},
"level" : {
"type" : "string"
},
"level_value" : {
"type" : "long"
},
"logger_name" : {
"type" : "string"
},
"message" : {
"type" : "string"
},
"port" : {
"type" : "long"
},
"thread_name" : {
"type" : "string"
},
"type" : {
"type" : "string"
},
"userId" : {
"type" : "string"
}
}
}
},
"settings" : {
"index" : {
"creation_date" : "1458053563829",
"number_of_shards" : "5",
"number_of_replicas" : "1",
"uuid" : "w5y7GPd-Sk65gdpBQ_IKow",
"version" : {
"created" : "2020099"
}
}
},
"warmers" : { }
}
}
There is only a message field with type string.
The stdout from Logstash is:
elasticsearch_1 | [2016-03-16 04:38:49,018][INFO ][cluster.metadata ] [Ripfire] [json_encoder-events-2016.03.16] create_mapping [ef]
elasticsearch_1 | [2016-03-16 04:38:49,108][INFO ][cluster.metadata ] [Ripfire] [json_encoder-events-2016.03.16] update_mapping [logs]
elasticsearch_1 | [2016-03-16 04:38:49,174][INFO ][cluster.metadata ] [Ripfire] [json_encoder-events-2016.03.16] update_mapping [ef]
logstash_1 | 2016-03-16T04:38:48.197Z 10.27.13.228 Initializing Spring FrameworkServlet 'dispatcherServlet'
logstash_1 | 2016-03-16T04:38:48.197Z 10.27.13.228 FrameworkServlet 'dispatcherServlet': initialization started
logstash_1 | 2016-03-16T04:38:48.219Z 10.27.13.228 FrameworkServlet 'dispatcherServlet': initialization completed in 22 ms
logstash_1 | 2016-03-16T04:38:48.412Z 10.27.13.228 {"field1":"value","field2":40000}
logstash_1 | 2016-03-16T04:38:48.423Z 10.27.13.228 {"field1":"value","field2":40000}
logstash_1 | 2016-03-16T04:38:48.457Z 10.27.13.228
logstash_1 | request payload={
logstash_1 | "type": "ef",
logstash_1 | "userId": "ASD",
logstash_1 | "data": {
logstash_1 | "field1": "value",
logstash_1 | "field2": 40000
logstash_1 | }
logstash_1 | }
logstash_1 |
logstash_1 | response payload=null
Is there any way to get the data field mapped according to its structure? Thank you so much for any suggestions.

Related

Nifi: Get values from a JSON by a JsonPath defined in another JSON

I have two JSON documents.
Json1:
{
  "level_1": [
    {
      "level_2_1": [
        {
          "key_2_1": "value_1",
          "key_2_2": "value_2"
        },
        {
          "key_2_1": "value_1",
          "key_2_3": "value_3"
        },
        {
          "key_2_1": "value_1",
          "key_2_4": "value_4"
        }
      ],
      "level_2_2": {
        "key_2_2_1": "2022-08-30T06:57:31.331Z",
        "key_2_2_2": "2022"
      }
    }
  ]
}
Json2:
{
  "level_1" : {
    "level_2_1" : {
      "value" : "default value",
      "type" : "String"
    },
    "level_2_2" : {
      "value" : "level_1[0].level_2_1[0].key_2_2", // this is a JsonPath into Json1
      "type" : "String"
    },
    "level_2_3" : {
      "value" : "level_1[0].level_2_2.key_2_2_1", // this is a JsonPath into Json1
      "type" : "String"
    }
  }
}
I want to get a result like this (these values come from Json1):
{
  "level_1" : {
    "level_2_1" : {
      "value" : "default value",
      "type" : "String"
    },
    "level_2_2" : {
      "value" : "value_1", // this value comes from Json1
      "type" : "String"
    },
    "level_2_3" : {
      "value" : "2022-08-30T06:57:31.331Z", // this value comes from Json1
      "type" : "String"
    }
  }
}
Please, can you give me some advice? Thanks for your help!

Parse and Map 2 Arrays with jq

I am working with a JSON file similar to the one below:
{ "Response" : {
"TimeUnit" : [ 1576126800000 ],
"metaData" : {
"errors" : [ ],
"notices" : [ "query served by:1"]
},
"stats" : {
"data" : [ {
"identifier" : {
"names" : [ "apiproxy", "response_status_code", "target_response_code", "target_ip" ],
"values" : [ "IO", "502", "502", "7.1.143.6" ]
},
"metric" : [ {
"env" : "dev",
"name" : "sum(message_count)",
"values" : [ 0.0]
} ]
} ]
} } }
My objective is to display a mapping of the identifiers and values like:
apiproxy=IO
response_status_code=502
target_response_code=502
target_ip=7.1.143.6
I have been able to parse both names and values with
.[].stats.data[] | (.identifier.names[]) and .[].stats.data[] | (.identifier.values[])
but I need help with the jq way to map the values.
The whole thing can be done in jq using the -r command-line option:
.[].stats.data[]
| [.identifier.names, .identifier.values]
| transpose[]
| "\(.[0])=\(.[1])"

jq: group and key by property

I have a list of objects that look like this:
[
{
"ip": "1.1.1.1",
"component": "name1"
},
{
"ip": "1.1.1.2",
"component": "name1"
},
{
"ip": "1.1.1.3",
"component": "name2"
},
{
"ip": "1.1.1.4",
"component": "name2"
}
]
Now I'd like to group and key that by the component and assign a list of ips to each of the components:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
},{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
I figured it out myself. I first group by .component and then just create new lists of ips that are indexed by the component of the first object of each group:
jq ' group_by(.component)[] | {(.[0].component): [.[] | .ip]}'
The accepted answer doesn't produce valid JSON; instead it produces:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
]
}
{
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
name1 and name2 are each valid JSON objects, but the output as a whole isn't.
The following jq statement results in the desired output as specified in the question:
group_by(.component) | map({ key: (.[0].component), value: [.[] | .ip] }) | from_entries
Output:
{
"name1": [
"1.1.1.1",
"1.1.1.2"
],
"name2": [
"1.1.1.3",
"1.1.1.4"
]
}
Suggestions for simpler approaches are welcome.
If human readability is preferred over valid JSON, I'd suggest something like ...
jq -r 'group_by(.component)[] | "IPs for " + .[0].component + ": " + (map(.ip) | tostring)'
... which results in ...
IPs for name1: ["1.1.1.1","1.1.1.2"]
IPs for name2: ["1.1.1.3","1.1.1.4"]
As a further example of @replay's technique, after many failures using other methods, I finally built a filter that condenses this Wazuh report (excerpted for brevity):
{
"took" : 228,
"timed_out" : false,
"hits" : {
"total" : {
"value" : 2806,
"relation" : "eq"
},
"hits" : [
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
{
"_source" : {
"agent" : {
"name" : "100360xx"
},
"data" : {
"vulnerability" : {
"severity" : "High",
"package" : {
"condition" : "less than 78.0",
"name" : "Mozilla Firefox 68.11.0 ESR (x64 en-US)"
}
}
}
}
},
...
Here is the jq filter I use to provide an array of objects, each consisting of an agent name followed by an array of names of the agent's vulnerable packages:
jq ' .hits.hits |= unique_by(._source.agent.name, ._source.data.vulnerability.package.name) | .hits.hits | group_by(._source.agent.name)[] | { (.[0]._source.agent.name): [.[]._source.data.vulnerability.package | .name ]}'
Here is an excerpt of the output produced by the filter:
{
"100360xx": [
"Mozilla Firefox 68.11.0 ESR (x64 en-US)",
"VLC media player",
"Windows 10"
]
}
{
"WIN-KD5C4xxx": [
"Windows Server 2019"
]
}
{
"fridxxx": [
"java-1.8.0-openjdk",
"kernel",
"kernel-headers",
"kernel-tools",
"kernel-tools-libs",
"python-perf"
]
}
{
"mcd-xxx-xxx": [
"dbus",
"fribidi",
"gnupg2",
"graphite2",
...

Manipulating JSON messages from Kafka topic using Logstash filter

I am using Logstash 2.4 to read JSON messages from a Kafka topic and send them to an Elasticsearch Index.
The JSON format is as below --
{
"schema":
{
"type": "struct",
"fields": [
{
"type":"string",
"optional":false,
"field":"reloadID"
},
{
"type":"string",
"optional":false,
"field":"externalAccountID"
},
{
"type":"int64",
"optional":false,
"name":"org.apache.kafka.connect.data.Timestamp",
"version":1,
"field":"reloadDate"
},
{
"type":"int32",
"optional":false,
"field":"reloadAmount"
},
{
"type":"string",
"optional":true,
"field":"reloadChannel"
}
],
"optional":false,
"name":"reload"
},
"payload":
{
"reloadID":"328424295",
"externalAccountID":"9831200013",
"reloadDate":1446242463000,
"reloadAmount":240,
"reloadChannel":"C1"
}
}
Without any filter in my config file, the target documents from the ES index look like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfcyTU4SyCFNFP2z5-l",
"_score" : 1.0,
"_source" : {
"schema" : {
"type" : "struct",
"fields" : [ {
"type" : "string",
"optional" : false,
"field" : "reloadID"
}, {
"type" : "string",
"optional" : false,
"field" : "externalAccountID"
}, {
"type" : "int64",
"optional" : false,
"name" : "org.apache.kafka.connect.data.Timestamp",
"version" : 1,
"field" : "reloadDate"
}, {
"type" : "int32",
"optional" : false,
"field" : "reloadAmount"
}, {
"type" : "string",
"optional" : true,
"field" : "reloadChannel"
} ],
"optional" : false,
"name" : "reload"
},
"payload" : {
"reloadID" : "155559213",
"externalAccountID" : "9831200014",
"reloadDate" : 1449529746000,
"reloadAmount" : 140,
"reloadChannel" : "C1"
},
"#version" : "1",
"#timestamp" : "2016-10-19T11:56:09.973Z",
}
}
But I want only the value part of the "payload" field to go into my ES index as the target JSON body. So I tried to use the 'mutate' filter in the config file as below --
input {
kafka {
zk_connect => "zksrv-1:2181,zksrv-2:2181,zksrv-4:2181"
group_id => "logstash"
topic_id => "reload"
consumer_threads => 3
}
}
filter {
  mutate {
    remove_field => [ "schema", "@version", "@timestamp" ]
  }
}
output {
elasticsearch {
hosts => ["datanode-6:9200","datanode-2:9200"]
index => "kafka_reloads"
}
}
With this filter, the ES documents now look like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfch0yhSyCFNFP2z59f",
"_score" : 1.0,
"_source" : {
"payload" : {
"reloadID" : "850846698",
"externalAccountID" : "9831200013",
"reloadDate" : 1449356706000,
"reloadAmount" : 30,
"reloadChannel" : "C1"
}
}
}
But actually it should look like below --
{
"_index" : "kafka_reloads",
"_type" : "logs",
"_id" : "AVfch0yhSyCFNFP2z59f",
"_score" : 1.0,
"_source" : {
"reloadID" : "850846698",
"externalAccountID" : "9831200013",
"reloadDate" : 1449356706000,
"reloadAmount" : 30,
"reloadChannel" : "C1"
}
}
Is there a way to do this? Can anyone help me on this?
I also tried the below filter --
filter {
  json {
    source => "payload"
  }
}
But that is giving me errors like --
Error parsing json {:source=>"payload", :raw=>{"reloadID"=>"572584696", "externalAccountID"=>"9831200011", "reloadDate"=>1449093851000, "reloadAmount"=>180, "reloadChannel"=>"C1"}, :exception=>java.lang.ClassCastException: org.jruby.RubyHash cannot be cast to org.jruby.RubyIO, :level=>:warn}
Any help will be much appreciated.
Thanks
Gautam Ghosh
You can achieve what you want using the following ruby filter:
ruby {
  code => "
    event.to_hash.delete_if {|k, v| k != 'payload'}
    event.to_hash.update(event['payload'].to_hash)
    event.to_hash.delete_if {|k, v| k == 'payload'}
  "
}
What it does is:
- remove all fields but the payload one
- copy all the payload's inner fields to the root level
- delete the payload field itself
You'll end up with what you need.
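Note that this answer targets Logstash 2.4, as in the question; on Logstash 5.x and later the direct event['field'] hash access was removed, so a roughly equivalent sketch using the newer event API might look like this (untested, field names taken from the question):
ruby {
  code => "
    payload = event.get('payload')
    if payload.is_a?(Hash)
      payload.each { |k, v| event.set(k, v) }  # copy the inner payload fields to the root
      event.remove('payload')                  # then drop the payload field itself
    end
  "
}
Unwanted top-level fields such as schema can still be dropped with the mutate/remove_field filter from the question.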
It's been a while, but here is a valid workaround; hope it is useful. The idea is to re-encode the already-parsed field into a JSON string and then parse that string again, so that its inner fields end up at the root of the event:
json_encode {
  source => "json"
  target => "json_string"
}
json {
  source => "json_string"
}
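Applied to this question, a sketch of the same workaround could look like the following (the payload_json field name is illustrative, and json_encode is a separate filter plugin that may need to be installed):
filter {
  json_encode {
    source => "payload"        # re-serialize the already-parsed payload hash into a JSON string
    target => "payload_json"
  }
  json {
    source => "payload_json"   # parse it again; with no target set, the fields land at the root of the event
  }
  mutate {
    remove_field => [ "schema", "payload", "payload_json" ]
  }
}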

Sub-records in Avro with Morphlines

I'm trying to convert JSON into Avro using the kite-sdk morphline module. After playing around I'm able to convert the JSON into Avro using a simple schema (no complex data types).
Then I took it one step further and modified the Avro schema as displayed below (subrec.avsc). As you can see, the schema consists of a subrecord.
As soon as I tried to convert the JSON to Avro using the morphlines.conf and the subrec.avsc it failed.
Somehow the JSON paths "/record_type[]/alert/action" are not translated by the toAvro function.
The morphlines.conf:
morphlines : [
{
id : morphline1
importCommands : ["org.kitesdk.**"]
commands : [
# Read the JSON blob
{ readJson: {} }
{ logError { format : "record: {}", args : ["#{}"] } }
# Extract JSON
{ extractJsonPaths { flatten: false, paths: {
"/record_type[]/alert/action" : /alert/action,
"/record_type[]/alert/signature_id" : /alert/signature_id,
"/record_type[]/alert/signature" : /alert/signature,
"/record_type[]/alert/category" : /alert/category,
"/record_type[]/alert/severity" : /alert/severity
} } }
{ logError { format : "EXTRACTED THIS : {}", args : ["#{}"] } }
{ extractJsonPaths { flatten: false, paths: {
timestamp : /timestamp,
event_type : /event_type,
source_ip : /src_ip,
source_port : /src_port,
destination_ip : /dest_ip,
destination_port : /dest_port,
protocol : /proto,
} } }
# Create Avro according to schema
{ logError { format : "WE GO TO AVRO"} }
{ toAvro { schemaFile : /etc/flume/conf/conf.empty/subrec.avsc } }
# Create Avro container
{ logError { format : "WE GO TO BINARY"} }
{ writeAvroToByteArray { format: containerlessBinary } }
{ logError { format : "DONE!!!"} }
]
}
]
And the subrec.avsc:
{
"type" : "record",
"name" : "Event",
"fields" : [ {
"name" : "timestamp",
"type" : "string"
}, {
"name" : "event_type",
"type" : "string"
}, {
"name" : "source_ip",
"type" : "string"
}, {
"name" : "source_port",
"type" : "int"
}, {
"name" : "destination_ip",
"type" : "string"
}, {
"name" : "destination_port",
"type" : "int"
}, {
"name" : "protocol",
"type" : "string"
}, {
"name": "record_type",
"type" : ["null", {
"name" : "alert",
"type" : "record",
"fields" : [ {
"name" : "action",
"type" : "string"
}, {
"name" : "signature_id",
"type" : "int"
}, {
"name" : "signature",
"type" : "string"
}, {
"name" : "category",
"type" : "string"
}, {
"name" : "severity",
"type" : "int"
}
] } ]
} ]
}
The { logError { format : "EXTRACTED THIS : {}", args : ["@{}"] } } command outputs the following:
[{
/record_type[]/alert/action = [allowed],
/record_type[]/alert/category = [],
/record_type[]/alert/severity = [3],
/record_type[]/alert/signature = [GeoIP from NL, Netherlands],
/record_type[]/alert/signature_id = [88006],
_attachment_body = [{
"timestamp": "2015-03-23T07:42:01.303046",
"event_type": "alert",
"src_ip": "1.1.1.1",
"src_port": 18192,
"dest_ip": "46.231.41.166",
"dest_port": 62004,
"proto": "TCP",
"alert": {
"action": "allowed",
"gid": "1",
"signature_id": "88006",
"rev": "1",
"signature" : "GeoIP from NL, Netherlands ",
"category" : ""
"severity" : "3"
}
}],
_attachment_mimetype=[json/java + memory],
basename = [simple_eve.json]
}]
UPDATE 2017-06-22
You MUST populate the data in the structure in order for this to work, by using addValues or setValues:
{
  addValues {
    micDefaultHeader : [
      {
        eventTimestampString : "2017-06-22 18:18:36"
      }
    ]
  }
}
After debugging the sources of the morphlines toAvro command, it appears that the record is the first object to be evaluated, no matter what you put in your mappings structure.
The solution is quite simple, but unfortunately it took a little extra time: Eclipse, running the Flume agent in debug mode, cloning the source code, and lots of coffee.
Here it goes.
My schema:
{
"type" : "record",
"name" : "co_lowbalance_event",
"namespace" : "co.tigo.billing.cboss.lowBalance",
"fields" : [ {
"name" : "dummyValue",
"type" : "string",
"default" : "dummy"
}, {
"name" : "micDefaultHeader",
"type" : {
"type" : "record",
"name" : "mic_default_header_v_1_0",
"namespace" : "com.millicom.schemas.root.struct",
"doc" : "standard millicom header definition",
"fields" : [ {
"name" : "eventTimestampString",
"type" : "string",
"default" : "12345678910"
} ]
}
} ]
}
The morphlines file:
morphlines : [
{
id : convertJsonToAvro
importCommands : ["org.kitesdk.**"]
commands : [
{
readJson {
outputClass : java.util.Map
}
}
{
addValues {
micDefaultHeader : [{}]
}
}
{
logDebug { format : "my record: {}", args : ["@{}"] }
}
{
toAvro {
schemaFile : /home/asarubbi/Development/test/co_lowbalance_event.avsc
mappings : {
"micDefaultHeader" : micDefaultHeader
"micDefaultHeader/eventTimestampString" : eventTimestampString
}
}
}
{
writeAvroToByteArray {
format : containerlessJSON
codec : null
}
}
]
}
]
The magic lies here:
{
  addValues {
    micDefaultHeader : [{}]
  }
}
And in the mappings:
mappings : {
  "micDefaultHeader" : micDefaultHeader
  "micDefaultHeader/eventTimestampString" : eventTimestampString
}
Explanation:
Inside the code, the first field name that is evaluated is micDefaultHeader, of type RECORD. As there is no way to specify a default value for a RECORD (logically correct), the toAvro code evaluates it, does not get any value configured in the mappings, and therefore fails: it detects (wrongly) that the record is empty when it shouldn't be.
However, taking a look at the code, you may see that it requires a Map object, containing no values, to please the parser and continue to the next element.
So we add a map object using addValues and fill it with an empty map [{}]. Notice that the name must match the record that is giving you an empty value; in my case, "micDefaultHeader".
Feel free to comment if you have a better solution, as this looks like a "dirty fix".