Parsing nested JSON log file into ELK - Shodan.io Logs - json

I'm trying to parse a nested JSON log file (from Shodan.io).
I have parsed a few values, but I am not able to parse the values listed below:
hostnames
smb
smb_version
shares
temporary
type
name
comments
anonymous
transport
It would also be good if I could get rid of the value of 'raw': [0000, 0000].
You can check my sample log file here.
Below is my existing Logstash configuration:
input {
  file {
    path => [ "/path/to/shodan-logs.json" ]
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
    target => "json_parse"
    add_tag => ["json_filter"]
    tag_on_failure => ["json"]
  }
  grok {
    break_on_match => false
    add_tag => ["filtered"]
    tag_on_failure => ["no_match_found"]
    match => {
      "message" => [
        "%{IP:client_ip}",
        "%{TIMESTAMP_ISO8601:timestamp}"
      ]
    }
  }
  geoip {
    source => "client_ip"
    add_tag => ["geo_ip_found"]
    tag_on_failure => ["geo_ip_not_found"]
  }
  useragent {
    source => "message"
    add_tag => ["user_details_found"]
  }
  # ruby {
  #   add_tag => ["ruby_filter"]
  #   code => '
  #     props = event.get("message")
  #     if props
  #       props.each { |x|
  #         key = x["key"]
  #         event.set("message.#{key}", x["value"])
  #       }
  #     end
  #   '
  # }
  mutate {
    remove_field => [ "@timestamp", "path", "host", "@version" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    user => "elastic"
    password => "password"
    index => "shodan-demo-%{+dd-MM-YYYY}"
  }
  stdout {
    codec => rubydebug
  }
}
Here are snapshots of my output in ELK.
Note: I have already tried the below-mentioned methods/filters:
The commented-out ruby filter shown above, which did not work
Multiline input
A json codec on the input
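Since the json filter above parses into the json_parse target, here is a rough, untested sketch of one way to lift the nested SMB values up to top-level fields and drop the raw entry. The exact field paths depend on the Shodan document, so treat the paths and target names below as placeholders:
filter {
  mutate {
    # copy nested values (paths assumed from the field list above) to top-level fields
    copy => {
      "[json_parse][hostnames]"        => "hostnames"
      "[json_parse][transport]"        => "transport"
      "[json_parse][smb][smb_version]" => "smb_version"
      "[json_parse][smb][anonymous]"   => "smb_anonymous"
      "[json_parse][smb][shares]"      => "smb_shares"
    }
    # drop the unwanted raw byte arrays (path assumed)
    remove_field => [ "[json_parse][smb][raw]" ]
  }
}
If the shares value is an array of objects (temporary, type, name, comments), it will stay as an array under smb_shares; turning each share into its own event would additionally need a split filter on that field.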

Related

How to create grok/json filter to parse the below json format

I want to parse this JSON to Kibana using Logstash
{
"Format": "IDEA0",
"ID": "2b03eb1f-fc4c-4f67-94e5-31c9fb32dccc",
"DetectTime": "2022-01-31T08:16:12.600470+07:00",
"EventTime": "2022-01-31T01:23:01.637438+00:00",
"Category": ['Intrusion.Botnet'],
"Confidence": 0.03,
"Note": "C&C channel, destination IP: 192.168.1.24 port: 8007/tcp score: 0.9324",
"Source": [{'IP4': ['192.168.1.25'], 'Type': ['CC']}]
}
I want ID, DetectTime, EventTime, Category, Confidence, Note, and Source to each be its own field so that I can later build visualizations in Kibana.
Here's what I'm already trying to do:
input {
  file {
    path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/*.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-test"
    user => "***"
    password => "***"
  }
  stdout {}
}
But the fields are not separated correctly.
Any help would be appreciated.
Thanks.
:::UPDATE:::
I already found the solution (with help from others on the Elastic forum; it's not 100% optimized yet and needs a little more tweaking).
Here's the Logstash conf I'm using, in case someone needs it in the future:
input {
  file {
    path => "/home/ubuntu/Downloads/StratosphereLinuxIPS/output/alerts.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline { pattern => "^{$" negate => "true" what => "previous" }
  }
}
filter {
  mutate {
    gsub => ["message", "'", '"']
  }
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-keempat"
    user => "xxx"
    password => "xxx"
  }
  stdout { codec => rubydebug }
}
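One further tweak that might be worth adding (a sketch, not tested against this pipeline): a date filter that maps DetectTime onto @timestamp, so Kibana's time picker reflects the detection time instead of the ingest time.
filter {
  date {
    # DetectTime is ISO8601 with a timezone offset, e.g. 2022-01-31T08:16:12.600470+07:00
    match  => [ "DetectTime", "ISO8601" ]
    target => "@timestamp"
  }
}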
Thanks !

Codec Json doesn't send data to elasticsearch

INPUT: json
{"userid": 125,"type": "SELL"}
{"userid": 127,"type": "SELL"}
LOGSTASH CONF FILE:
input {
  kafka {
    bootstrap_servers => ""
    topics => ["topic1"]
    codec => "json"
  }
}
output {
  amazon_es {
    hosts => [""]
    region => ""
    aws_access_key_id => ''
    aws_secret_access_key => ''
    index => "indexname"
  }
  stdout { codec => rubydebug }
}
stdout output:
{
    "userid" => 127,
    "@version" => "1",
    "@timestamp" => 2018-10-18T13:54:37.641Z,
    "type" => "SELL"
}
The output looks exactly like what I want, but it just will not go into the Elasticsearch index.
If I do not use the json codec, the entire JSON goes into ES as 'message'.
Any help will be appreciated.
Did you try to use the json filter?
input {
  kafka {
    bootstrap_servers => ""
    topics => ["topic1"]
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  amazon_es {
    hosts => [""]
    region => ""
    aws_access_key_id => ''
    aws_secret_access_key => ''
    index => "indexname"
    custom_headers => { "Content-Type" => "application/json" }
  }
  stdout { codec => rubydebug }
}
I figured out the problem. I was using a field called type in my JSON data, which was causing the issue: Logstash stored it as _type in ES, which is no longer allowed in newer versions of Elasticsearch. That is why, if my type was 'BUY', it would take the entry, and if it was 'SELL', it would reject it. Thank you for the help!
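For anyone hitting the same collision, a minimal sketch of one way around it, renaming the incoming key before it reaches the output (the new name order_type is just a placeholder):
filter {
  json {
    source => "message"
  }
  mutate {
    # rename the JSON "type" key so it no longer collides with the event/document type
    rename => { "type" => "order_type" }
  }
}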

Can't import JSON data from file with Logstash

I'm trying to import JSON data from my log file mylogs.log. The following is my Logstash config file:
input {
  stdin { }
  file {
    codec => "json"
    path => "/logs/mylogs.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jsonlog"
  }
  stdout { codec => rubydebug }
  file {
    path => "/logs/out.log"
  }
}
After executing this config file, any JSON data I pass on stdin gets parsed and sent to the Elasticsearch instance, and I can see it there. But the data that already exists in the log file does not get imported by Logstash.
Also, the JSON data that I add manually, which does get parsed by Logstash and sent to the Elasticsearch instance, is not getting logged in my output file.
I don't know what the issue is.
This is the sample JSON data I'm using:
{ "logger":"com.myApp.ClassName", "timestamp":"1456976539634", "level":"ERROR", "thread":"pool-3-thread-19", "message":"Danger. There was an error", "throwable":"java.Exception" }
{ "logger":"com.myApp.ClassName", "timestamp":"1456976539649", "level":"ERROR", "thread":"pool-3-thread-16", "message":"I cannot go on", "throwable":"java.Exception" }
OK, after making the following modification to the file input path in the Logstash config file, it's working now:
input {
  stdin { }
  file {
    codec => "json"
    path => "/home/suresh/Desktop/tools/logstash-5.1.1/logs/mylogs.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jsonlog2"
  }
  stdout { codec => rubydebug }
  file {
    path => "/home/suresh/Desktop/tools/logstash-5.1.1/logs/out.log"
  }
}
But I'm getting an error:
"tags" => [
    [0] "_jsonparsefailure"
]
Response from the console:
{
    "path" => "/home/suresh/Desktop/tools/logstash-5.1.1/logs/mylogs.log",
    "@timestamp" => 2016-12-27T09:56:08.854Z,
    "level" => "ERROR",
    "logger" => "com.myApp.ClassName",
    "throwable" => "java.Exception",
    "@version" => "1",
    "host" => "BLR-SOFT-245",
    "thread" => "pool-3-thread-19",
    "message" => "Danger. There was an error",
    "timestamp" => "1456976539634",
    "tags" => [
        [0] "_jsonparsefailure"
    ]
}
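The _jsonparsefailure here is most likely because the file input already uses codec => "json": the codec parses each line, so the json filter then tries to re-parse the inner message field ("Danger. There was an error"), which is not JSON. A sketch of the same pipeline without the double parsing, keeping only the codec (and adding it to stdin too so pasted JSON is still parsed):
input {
  stdin { codec => "json" }
  file {
    # the codec already parses each JSON line, so no json filter is needed
    codec => "json"
    path => "/home/suresh/Desktop/tools/logstash-5.1.1/logs/mylogs.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jsonlog2"
  }
  stdout { codec => rubydebug }
  file {
    path => "/home/suresh/Desktop/tools/logstash-5.1.1/logs/out.log"
  }
}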

logstash mysql general_logs CSV format

I want to add the MySQL general_log to Logstash. I have managed to get MySQL to log in CSV format, and with the csv filter this should be an easy thing to do. Here is my general_log entry:
"2015-08-15 11:52:57","mrr[mrr] # localhost []",4703,0,"Query","SET NAMES utf8"
"2015-08-15 11:52:57","mrr[mrr] # localhost []",4703,0,"Query","SELECT ##SESSION.sql_mode"
"2015-08-15 11:52:57","mrr[mrr] # localhost []",4703,0,"Query","SET SESSION sql_mode='NO_ENGINE_SUBSTITUTION'"
"2015-08-15 11:52:57","mrr[mrr] # localhost []",4703,0,"Init DB","mrr"
and here is my logstash.conf:
input {
  lumberjack {
    port => 5000
    type => "logs"
    ssl_certificate => "/etc/pki/tls/certs/logstash_forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash_forwarder.key"
  }
}
filter {
  if [type] == "nginx-access" {
    grok {
      match => { 'message' => '%{IPORHOST:clientip} %{NGUSER:indent} %{NGUSER:agent} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)\" %{NUMBER:answer} (?:%{NUMBER:byte}|-) (?:\"(?:%{URI:referrer}|-))\" (?:%{QS:referree}) %{QS:agent}' }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
  if [type] == "mysql-general" {
    csv {
      columns => [ "@timestamp(6)", "user_host", "thready_id", "server_id", "ctype", "query" ]
      separator => ","
    }
    grok {
      match => { "user_host", "%{WORD:remoteuser}\[%{WORD:localuser}\] \@ %{IPORHOST:dbhost} \[(?:%{IPORHOST:qhost}|-)\]" }
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    host => "172.17.0.5"
    cluster => "z0z0.tk-1.5"
    flush_size => 2000
  }
}
However, the user_host column has this format:
"mrr[mrr] @ localhost []", and I would like to split it into at least two different values, one for the user and the other one for the host.
I have run this configuration in Logstash and it ends up with a _grokparsefailure due to the grok parse.
When I run the config test option on the config file I get the following output:
Error: Expected one of #, => at line 36, column 26 (byte 1058) after filter {
  if [type] == "nginx-access" {
    grok {
      match => { 'message' => '%{IPORHOST:clientip} %{NGUSER:indent} %{NGUSER:agent} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)\" %{NUMBER:answer} (?:%{NUMBER:byte}|-) (?:\"(?:%{URI:referrer}|-))\" (?:%{QS:referree}) %{QS:agent}' }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
  if [type] == "mysql-general" {
    csv {
      columns => [ "@timestamp(6)", "user_host", "thready_id", "server_id", "ctype", "query" ]
      separator => ","
    }
    grok {
      match => { "user_host"
Can you give me an idea what is wrong?
The csv{} filter only parses, um, comma-separated values. If you'd like to parse fields of other formats, use grok{} on the user_host column after the csv{} filter has created it.
EDIT: to be more explicit.
Run the csv filter:
csv {
  columns => [ "@timestamp(6)", "user_host", "thready_id", "server_id", "ctype", "query" ]
  separator => ","
}
which should create a field called "user_host" for you.
You can then run this field through a grok filter, like this (untested) one:
grok {
  match => [ "user_host", "%{WORD:myUser}\[%{WORD}\] @ %{WORD:myHost} \[\]" ]
}
This will create two more fields for you: myUser and myHost.
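For the sample rows above, that grok would (roughly) produce values like these (hypothetical, not actually run):
    "myUser" => "mrr",
    "myHost" => "localhost"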
Got it working. The error was in fact in the grok pattern: since the first user and the last host were empty at some points, grok failed to parse, so I had to add some brackets to also accept empty strings. The current logstash.conf looks like this:
input {
  lumberjack {
    port => 5000
    type => "logs"
    ssl_certificate => "/etc/pki/tls/certs/logstash_forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash_forwarder.key"
  }
}
filter {
  if [type] == "nginx-access" {
    grok {
      match => { 'message' => '%{IPORHOST:clientip} %{NGUSER:indent} %{NGUSER:agent} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|)\" %{NUMBER:answer} (?:%{NUMBER:byte}|-) (?:\"(?:%{URI:referrer}|-))\" (?:%{QS:referree}) %{QS:agent}' }
    }
    geoip {
      source => "clientip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }
  if [type] == "mysql-general" {
    csv {
      columns => [ "@timestamp(6)", "user_host", "thready_id", "server_id", "ctype", "query" ]
      separator => ","
    }
    grok {
      match => { "user_host" => "(?:%{WORD:remoteuser}|)\[%{WORD:localuser}\] \@ %{IPORHOST:dbhost} \[(?:%{IPORHOST:qhost}|)\]" }
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    host => "172.17.0.5"
    cluster => "clustername"
    flush_size => 2000
  }
}
Thanks for your help and suggestions

Parse multiline JSON with grok in logstash

I've got a JSON of the format:
{
  "SOURCE": "Source A",
  "Model": "ModelABC",
  "Qty": "3"
}
I'm trying to parse this JSON using Logstash. Basically, I want the Logstash output to be a list of key:value pairs that I can analyze using Kibana. I thought this could be done out of the box. From a lot of reading, I understand I must use the grok plugin (I am still not sure what the json plugin is for). But I am unable to get an event with all the fields; I get multiple events (one event for each attribute of my JSON). Like so:
{
    "message" => " \"SOURCE\": \"Source A\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.432Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
    "message" => " \"Model\": \"ModelABC\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
{
    "message" => " \"Qty\": \"3\",",
    "@version" => "1",
    "@timestamp" => "2014-08-31T01:26:23.438Z",
    "type" => "my-json",
    "tags" => [
        [0] "tag-json"
    ],
    "host" => "myserver.example.com",
    "path" => "/opt/mount/ELK/json/mytestjson.json"
}
Should I use the multiline codec or the json_lines codec? If so, how can I do that? Do I need to write my own grok pattern, or is there something generic for JSON that will give me ONE event with all the key:value pairs, instead of one event per attribute as above? I couldn't find any documentation that sheds light on this. Any help would be appreciated. My conf file is shown below:
input {
  file {
    type => "my-json"
    path => ["/opt/mount/ELK/json/mytestjson.json"]
    codec => json
    tags => "tag-json"
  }
}
filter {
  if [type] == "my-json" {
    date { locale => "en" match => [ "RECEIVE-TIMESTAMP", "yyyy-mm-dd HH:mm:ss" ] }
  }
}
output {
  elasticsearch {
    host => localhost
  }
  stdout { codec => rubydebug }
}
I think I found a working answer to my problem. I am not sure if it's a clean solution, but it helps parse multiline JSONs of the type above.
input {
  file {
    codec => multiline {
      pattern => '^\{'
      negate => true
      what => previous
    }
    path => ["/opt/mount/ELK/json/*.json"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exclude => "*.gz"
  }
}
filter {
  mutate {
    replace => [ "message", "%{message}}" ]
    gsub => [ 'message','\n','']
  }
  if [message] =~ /^{.*}$/ {
    json { source => message }
  }
}
output {
  stdout { codec => rubydebug }
}
My multiline codec doesn't handle the last brace, and therefore the message doesn't look like valid JSON to json { source => message }. Hence the mutate filter:
replace => [ "message", "%{message}}" ]
That adds the missing brace, and the
gsub => [ 'message','\n','']
removes the \n characters that are introduced. At the end of it, I have a one-line JSON that can be read by json { source => message }.
If there's a cleaner/easier way to convert the original multi-line JSON to a one-line JSON, please do post it, as I feel the above isn't too clean.
You will need to use a multiline codec.
input {
  file {
    codec => multiline {
      pattern => '^{'
      negate => true
      what => previous
    }
    path => ['/opt/mount/ELK/json/mytestjson.json']
  }
}
filter {
  json {
    source => message
    remove_field => message
  }
}
The problem you will run into has to do with the last event in the file. It won't show up till there is another event in the file (so basically you'll lose the last event in a file) -- you could append a single { to the file before it gets rotated to deal with that situation.
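A possible workaround for that last-event problem (a sketch, assuming a multiline codec recent enough to support auto_flush_interval): have the codec flush a pending event after a few seconds of inactivity instead of waiting for the next { line.
input {
  file {
    codec => multiline {
      pattern => '^{'
      negate => true
      what => previous
      # flush a pending multiline event after 5 seconds with no new lines,
      # so the last JSON object in the file is not held back indefinitely
      auto_flush_interval => 5
    }
    path => ['/opt/mount/ELK/json/mytestjson.json']
  }
}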