I'm sending JSON to logstash with a config like so:
filter {
  json {
    source => "event"
    remove_field => [ "event" ]
  }
}
Here is an example JSON object I'm sending:
{
  "@timestamp": "2015-04-07T22:26:37.786Z",
  "type": "event",
  "event": {
    "activityRecord": {
      "id": 68479,
      "completeTime": 1428445597542,
      "data": {
        "2015-03-16": true,
        "2015-03-17": true,
        "2015-03-18": true,
        "2015-03-19": true
      }
    }
  }
}
Because of the arbitrary nature of the activityRecord.data object, I don't want logstash and elasticsearch to index all these date fields. As is, I see activityRecord.data.2015-03-16 as a field to filter on in Kibana.
Is there a way to ignore this sub-tree of data? Or at least delete it after it has already been parsed? I tried remove_field with wildcards and whatnot, but no luck.
Though not entirely intuitive, it is documented that subfield references are made with square brackets, e.g. [field][subfield], so that's what you'll have to use with remove_field:
mutate {
  remove_field => "[event][activityRecord][data]"
}
To delete fields using wildcard matching you'd have to use a ruby filter.
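The key-matching logic for such a ruby filter can be sketched in plain Ruby; the date-key pattern below is an assumption based on the sample data, and inside an actual ruby filter you would fetch and store the hash with event.get('[event][activityRecord][data]') and event.set(...):

```ruby
# Hypothetical hash mirroring [event][activityRecord][data] from the question.
data = {
  "2015-03-16" => true,
  "2015-03-17" => true,
  "completeTime" => 1428445597542  # a non-date key that should survive
}

# Drop every key that looks like a YYYY-MM-DD date.
data.delete_if { |k, _| k =~ /\A\d{4}-\d{2}-\d{2}\z/ }
```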
Related
I have a question about filtering entries in Logstash. I have two different logs coming into Logstash. One log is just a std format with a timestamp and message, but the other comes in as JSON.
I use an if statement to test for a certain host, and if that host is present, I apply the JSON filter to the message. The problem is that when it encounters a non-JSON stdout message, it can't parse it and throws exceptions.
Does anyone know how to test whether an entry is JSON, apply the filter if it is, and otherwise just ignore it?
thanks
if [agent][hostname] == "some host" {
  # if an entry is not in json format how to ignore?
  json {
    source => "message"
    target => "gpfs"
  }
}
You can try with a grok filter as a first step.
grok {
  match => {
    "message" => [
      "{%{GREEDYDATA:json_message}}",
      "%{GREEDYDATA:std_out}"
    ]
  }
}
if [json_message] {
  mutate {
    replace => { "json_message" => "{%{json_message}}" }
  }
  json {
    source => "json_message"
    target => "gpfs"
  }
}
There is probably a cleaner solution than this, but it will do the job.
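An alternative sketch (not tested against this exact data): the json filter tags events it cannot parse with "_jsonparsefailure" by default, so you can apply it unconditionally and branch on the tag afterwards:

```
filter {
  json {
    source => "message"
    target => "gpfs"
  }
  # Non-JSON lines get tagged "_jsonparsefailure"; drop the tag
  # and let them pass through untouched.
  if "_jsonparsefailure" in [tags] {
    mutate { remove_tag => ["_jsonparsefailure"] }
  }
}
```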
Consider a subset of a sample output from http://demo.nginx.com/status:
{
  "timestamp": 1516053885198,
  "server_zones": {
    "hg.nginx.org": {
      ... // Data for "hg.nginx.org"
    },
    "trac.nginx.org": {
      ... // Data for "trac.nginx.org"
    }
  }
}
The keys "hg.nginx.org" and "trac.nginx.org" are quite arbitrary, and I would like to parse them into something meaningful for Elasticsearch. In other words, each key under "server_zones" should be transformed into a separate event. Logstash should thus emit the following events:
[
  {
    "timestamp": 1516053885198,
    "server_zone": "hg.nginx.org",
    ... // Data for "hg.nginx.org"
  },
  {
    "timestamp": 1516053885198,
    "server_zone": "trac.nginx.org",
    ... // Data for "trac.nginx.org"
  }
]
What is the best way to go about doing this?
You can try using the ruby filter. Get the server zones and create a new object using the key/value pairs you want to include. Off the top of my head, something like the below should work. Obviously you then need to map the object to your field in the index. Change the snippet based on your custom format, i.e. build the array or object as you want.
filter {
  ruby {
    code => "
      time = event.get('timestamp')
      myArr = []
      event.to_hash.select { |k, v| ['server_zones'].include?(k) }.each do |key, value|
        myCustomObject = {}
        # map the key/value pairs into myCustomObject
        myCustomObject['timestamp'] = time
        myCustomObject[key] = value
        myArr.push(myCustomObject) # you'd probably move this out based on nesting level
      end
      event.set('my_indexed_field', myArr)
    "
  }
}
In the output section, use the rubydebug codec for debugging:
output {
  stdout { codec => rubydebug }
}
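If the goal is one event (and thus one Elasticsearch document) per zone, the array can then be fanned out with the split filter — a sketch, assuming the my_indexed_field name used above:

```
filter {
  split {
    # Emits one copy of the event per array element, with
    # my_indexed_field replaced by that element in each copy.
    field => "my_indexed_field"
  }
}
```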
I have the following format of JSON data that I send to a logstash instance
listening on a http endpoint
{
  client: "c",
  pageInfo: ["a","b","c"],
  restInfo: ["r","s","t"]
}
My goal is to send this input to an elasticsearch endpoint as two different types in the same index; for example
PUT elasticsearchhost:port/myindex/pageInfo
{ client: "c", pageInfo: ["a","b","c"] }
PUT elasticsearchhost:port/myindex/restInfo
{ client: "c", restInfo: ["r","s","t"] }
I have tried some filters in logstash (split, mutate, grok), but I cannot work out how to perform this very specific split, or whether I also have to modify the output section of my configuration.
You will need to use clone to clone the events and then modify the clones.
For example:
filter {
  clone { clones => ["pageInfo", "restInfo"] }
  if [type] == "pageInfo" {
    mutate {
      remove_field => "restInfo"
    }
  }
  if [type] == "restInfo" {
    mutate {
      remove_field => "pageInfo"
    }
  }
}
And then on your elasticsearch output, be sure to include document_type => "%{type}"
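For example, the output section might look like this (host and index names are placeholders, and document_type assumes an Elasticsearch version that still supports mapping types):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    # Route each clone to its own type based on the name set by clone.
    document_type => "%{type}"
  }
}
```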
I am trying to use logstash for analyzing a file containing JSON objects as follows:
{"Query":{"project_id":"a7565b911f324a9199a91854ea18de7e","timestamp":1392076800,"tx_id":"2e20a255448742cebdd2ccf5c207cd4e","token":"3F23A788D06DD5FE9745D140C264C2A4D7A8C0E6acf4a4e01ba39c66c7c9cbd6a123588b22dc3a24"}}
{"Response":{"result_code":"Success","project_id":"a7565b911f324a9199a91854ea18de7e","timestamp":1392076801,"http_status_code":200,"tx_id":"2e20a255448742cebdd2ccf5c207cd4e","token":"3F23A788D06DD5FE9745D140C264C2A4D7A8C0E6acf4a4e01ba39c66c7c9cbd6a123588b22dc3a24","targets":[]}}
{"Query":{"project_id":"a7565b911f324a9199a91854ea18de7e","timestamp":1392076801,"tx_id":"f7f68c7fb14f4959a1db1a206c88a5b7","token":"3F23A788D06DD5FE9745D140C264C2A4D7A8C0E6acf4a4e01ba39c66c7c9cbd6a123588b22dc3a24"}}
Ideally I'd expect Logstash to understand the JSON.
I used the following config:
input {
  file {
    type => "recolog"
    format => json_event
    # Wildcards work, here :)
    path => [ "/root/isaac/DailyLog/reco.log" ]
  }
}
output {
  stdout { debug => true }
  elasticsearch { embedded => true }
}
I built this file based on this Apache recipe
When running logstash with debug = true, it reads the objects like this:
How could I see stats in the Kibana GUI based on my JSON file, for example the number of Query objects, or even queries based on timestamp?
For now it looks like it understands only a very basic version of the data, not its structure.
Thx in advance
I found out that logstash will automatically detect JSON by using the codec setting within the file input, as follows:
input {
  stdin {
    type => "stdin-type"
  }
  file {
    type => "prodlog"
    # Wildcards work, here :)
    path => [ "/root/isaac/Mylogs/testlog.log" ]
    codec => json
  }
}
output {
  stdout { debug => true }
  elasticsearch { embedded => true }
}
Then Kibana showed the fields of the JSON perfectly.
I would like to extract the interface name from a log line using Logstash.
Sample log -
2013 Aug 28 13:14:49 logFile: Interface Etherface1/9 is down (Transceiver Absent)
I want to extract "Etherface1/9" out of this and add it as a field called interface.
I have the following conf file for this:
input {
  file {
    type => "syslog"
    path => [ "/home/vineeth/logstash/mylog.log" ]
    #path => ["d:/New Folder/sjdc.show.tech/n5k-3a-show-tech.txt"]
    start_position => "beginning"
  }
}
filter {
  grok {
    type => "syslog"
    add_field => [ "port", "Interface %{WORD}" ]
  }
}
output {
  stdout {
    debug => true
    debug_format => "json"
  }
  elasticsearch {
    embedded => true
  }
}
But then I always get "_grokparsefailure" under tags, and none of these new fields appear.
Kindly let me know how I can get the required output.
The grok filter expects that you're trying to match some text. Since you're not passing any possible matches, it triggers the _grokparsefailure tag (per the manual, the tag is added "when there has been no successful match").
You might use a match like this:
grok {
  match => ["message", "Interface %{DATA:port} is down"]
}
This will still fail if the match text isn't present. Logstash is pretty good at parsing fields with a simple structure, but data embedded in a user-friendly string is sometimes tricky. Usually you'll need to branch based on the message format.
Here's a very simple example, using a conditional with a regex:
if [message] =~ /Interface .+ is down/ {
  grok {
    match => ["message", "Interface %{DATA:port} is down"]
  }
}
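For reference, %{DATA} is just a lazy regex capture, so what the grok match does to the sample line can be sketched in plain Ruby (using .*? as an approximation of grok's DATA pattern):

```ruby
line = "2013 Aug 28 13:14:49 logFile: Interface Etherface1/9 is down (Transceiver Absent)"

# Roughly what grok does with "Interface %{DATA:port} is down".
m = line.match(/Interface (?<port>.*?) is down/)
port = m && m[:port]
```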