Filebeat @timestamp not overwritten - JSON

I use Filebeat to ship logs to an Elasticsearch server. My logs are in JSON format; every line is a JSON string that looks like this:
{"@timestamp": "2017-04-11T07:52:48,230", "user_id": "1", "delay": 12}
I want the @timestamp field from my logs to replace the @timestamp field that Filebeat creates when it reads the logs. On my Kibana dashboard I always get
json_error: @timestamp not overwritten (parse error on 2017-04-11T07:52:48,230)
and end up seeing the @timestamp field created by Filebeat.
My Filebeat config includes these lines for overwriting fields:
json.keys_under_root: true
json.overwrite_keys: true
json.add_error_key: true
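For context, these options sit under a prospector in my filebeat.yml; a minimal sketch (the paths and hosts here are placeholders, not my real values):
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/myapp/*.log  # placeholder path to the JSON logs
  json.keys_under_root: true
  json.overwrite_keys: true
  json.add_error_key: true
output.elasticsearch:
  hosts: ["localhost:9200"]  # placeholder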
Also, per my log4j config, the @timestamp field written to my logs is in ISO 8601 format. Any idea what the problem is and why the @timestamp field is not overwritten?

The problem was the format of the timestamp that log4j was producing.
Filebeat expects something of the form "2017-04-11T09:38:33.365Z": it has to have the T in the middle, the Z at the end, and a dot instead of a comma before the milliseconds.
The quickest (and somewhat dirty) way I found to do that was to use the following pattern:
pattern='{"@timestamp": "%d{yyyy-MM-dd}T%d{HH:mm:ss.SSS}Z"}'
(Note yyyy rather than YYYY: in log4j date patterns, YYYY is the week-year, which can differ from the calendar year around New Year.)
A similar issue can be found here. The suggested solution does not solve the Filebeat issue, though, because it uses a comma!
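For completeness, a hypothetical log4j.properties sketch built around that pattern (the appender name, file path, and extra message field are placeholders, not from the original answer):
# Hypothetical sketch: write logs to a file as one JSON object per line.
log4j.rootLogger=INFO, json
log4j.appender.json=org.apache.log4j.FileAppender
log4j.appender.json.File=/var/log/myapp/app.json
log4j.appender.json.layout=org.apache.log4j.PatternLayout
# Quick and dirty, as in the answer above: %m is not JSON-escaped.
log4j.appender.json.layout.ConversionPattern={"@timestamp": "%d{yyyy-MM-dd}T%d{HH:mm:ss.SSS}Z", "message": "%m"}%n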


NXLog: JSON input to GELF UDP output

We have a setup where a program logs to a .json file, in a format that follows the GELF specification.
Currently this is sent to a Graylog2 server over HTTP. This works, but due to the nature of HTTP there is significant latency, which becomes an issue with a large volume of log messages.
I want to change the delivery method from HTTP to UDP, in order to just 'fire and forget'.
The logs are written to files like this:
{ "short_message": "<message>", "host": "<host>", "full_message": "<message>", "_extraField1": "<value>", "_extraField2": "<value>", "_extraField3": "<value>" }
The current configuration is this:
<Extension json>
    Module xm_json
</Extension>
<Input jsonLogs>
    Module        im_file
    File          '<File Location>'
    PollInterval  5
    SavePos       True
    ReadFromLast  True
    Recursive     False
    RenameCheck   False
    CloseWhenIdle True
</Input>
<Output udp>
    Module     om_udp
    Host       <IP>
    Port       <Port>
    OutputType GELF_UDP
</Output>
With this setup, part of the JSON log message is added to the "message" field of a GELF message and sent to the server.
I've tried adding the line Exec parse_json(), but this simply results in all fields other than short_message and full_message being excluded.
I'm unsure how to configure this correctly. Even just having the complete log message added to a single field would be acceptable, since I can add an extractor on the server side.
You'd need Exec parse_json() for GELF_UDP to generate proper output, but it's unclear what the exact issue is with message versus full_message/short_message.
Another option you could try is to simply ship the logs via om_tcp. In that case you won't need OutputType GELF_TCP, since the file contents are already formatted that way.
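As a sketch, the parse call would go into the input block along these lines (untested; the file location is a placeholder, as in the question):
<Input jsonLogs>
    Module im_file
    File   '<File Location>'
    # Parse each line as JSON so GELF_UDP can map the individual fields
    Exec   parse_json();
</Input>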

How to load OSM (GeoJSON) data into ArangoDB?

How can I load OSM data into ArangoDB?
I downloaded the dataset luxembourg-latest.osm.pbf from OSM, then converted it to JSON with osmtogeojson. After that I tried to load the resulting GeoJSON into ArangoDB with the following command: arangoimp --file out.json --collection lux1 --server.database geodb and got a huge list of errors:
...
2017-03-17T12:44:28Z [7712] WARNING at position 719386: invalid JSON type (expecting object, probably parse error), offending context: ],
2017-03-17T12:44:28Z [7712] WARNING at position 719387: invalid JSON type (expecting object, probably parse error), offending context: [
2017-03-17T12:44:28Z [7712] WARNING at position 719388: invalid JSON type (expecting object, probably parse error), offending context: 5.867441,
...
What am I doing wrong?
Update: it seems the converter should be run as osmtogeojson --ndjson, which produces the items not as a single JSON document but in line-by-line mode.
As @dmitry-bubnenkov already found out, --ndjson is required to produce the right input for arangoimp.
One has to know here that arangoimp expects a JSON subset, dubbed JSONL, since it doesn't fully parse the JSON on its own.
Thus, each line of the JSON file is expected to become one JSON document in the collection after the import. To maximize performance and simplify the implementation, the JSON is not completely parsed before being sent to the server.
arangoimp tries to chop the JSON into chunks of the maximum request size that the server permits, leaning on the JSONL line endings to isolate possible chunks.
The server, however, expects valid JSON. Sending chunks that contain incomplete JSON documents leads to parse errors on the server, which is the error message you saw in your output.
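If re-running the converter is not an option, a small script can reshape an existing FeatureCollection into JSONL. A minimal Python sketch, assuming out.json holds a top-level "features" array (the file names are placeholders):
import json

# Rewrite a GeoJSON FeatureCollection as JSONL: one feature per line,
# which is the shape arangoimp expects.
with open("out.json") as src, open("out.jsonl", "w") as dst:
    for feature in json.load(src)["features"]:
        dst.write(json.dumps(feature) + "\n")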

Kubernetes save JSON logs to file with escaped quotes. Why?

I'm using Fluentd with Elasticsearch for logs from Kubernetes, but I noticed that some JSON logs cannot be correctly indexed because the JSON is stored as a string.
Logs from kubectl logs look like:
{"timestamp":"2016-11-03T15:48:12.007Z","level":"INFO","thread":"cromwell-system-akka.actor.default-dispatcher-4","logger":"akka.event.slf4j.Slf4jLogger","message":"Slf4jLogger started","context":"default"}
But the logs saved in files under /var/log/containers/... have escaped quotes, which makes them strings instead of JSON and spoils the indexing:
{"log":"{\"timestamp\":\"2016-11-03T15:45:07.976Z\",\"level\":\"INFO\",\"thread\":\"cromwell-system-akka.actor.default-dispatcher-4\",\"logger\":\"akka.event.slf4j.Slf4jLogger\",\"message\":\"Slf4jLogger started\",\"context\":\"default\"}\n","stream":"stdout","time":"2016-11-03T15:45:07.995443479Z"}
I'm trying to get logs looking like:
{
  "log": {
    "timestamp": "2016-11-03T15:45:07.976Z",
    "level": "INFO",
    "thread": "cromwell-system-akka.actor.default-dispatcher-4",
    "logger": "akka.event.slf4j.Slf4jLogger",
    "message": "Slf4jLogger started",
    "context": "default"
  },
  "stream": "stdout",
  "time": "2016-11-03T15:45:07.995443479Z"
}
Can you suggest how to do this?
I ran into the same issue; however, I'm using Fluent Bit, the C version of Fluentd (which is written in Ruby). Since this is an older question, I'm answering for the benefit of others who find it.
Fluent Bit v0.13 addressed this issue: you can now specify the parser to use through annotations, and the parser can be configured to decode the log field as JSON.
fluent-bit issue detailing problem
blog post about annotations for specifying the parser
json parser documentation - the Docker container's logs come out as JSON; however, your logs are also JSON, so an additional decoder is needed.
The final parser with decoder looks like this:
[PARSER]
    Name        embedded-json
    Format      json
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log     do_next
    Decode_Field_As   json      log
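With that parser registered, pointing a pod at it via the annotation mechanism described in the blog post above should look roughly like this (pod name and image are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # placeholder
  annotations:
    # Tell Fluent Bit's Kubernetes filter to apply the parser defined above
    fluentbit.io/parser: embedded-json
spec:
  containers:
  - name: my-app
    image: my-app:latest  # placeholder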

How to generate a JSON log from nginx?

I'm trying to generate a JSON log from nginx.
I'm aware of solutions like this one, but some of the fields I want to log include user-generated input (like HTTP headers), which needs to be escaped properly.
I'm aware of the nginx changelog entries from Oct 2011 and May 2008 that say:
*) Change: now the 0x7F-0x1F characters are escaped as \xXX in an
access_log.
*) Change: now the 0x00-0x1F, '"' and '\' characters are escaped as \xXX
in an access_log.
but this still doesn't help since \xXX is invalid in a JSON string.
I've also looked at the HttpSetMiscModule module, which has a set_quote_json_str directive, but this just seems to add \x22 around the strings, which doesn't help.
Any idea for other solutions to log in JSON format from nginx?
Finally it looks like we have a good way to do this with vanilla nginx, without any modules. Just define:
log_format json_combined escape=json
  '{'
    '"time_local":"$time_local",'
    '"remote_addr":"$remote_addr",'
    '"remote_user":"$remote_user",'
    '"request":"$request",'
    '"status": "$status",'
    '"body_bytes_sent":"$body_bytes_sent",'
    '"request_time":"$request_time",'
    '"http_referrer":"$http_referer",'
    '"http_user_agent":"$http_user_agent"'
  '}';
Note that escape=json was added in nginx 1.11.8.
http://nginx.org/en/docs/http/ngx_http_log_module.html#log_format
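To put the format to use, reference it from an access_log directive; a minimal sketch (the log path is a placeholder):
# Write JSON-escaped access logs using the format defined above
access_log /var/log/nginx/access.json json_combined;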
You can try this additional module for Nginx: https://github.com/jiaz/nginx-http-json-log
You can try to use:
The additional Nginx module nginx-http-json-log.
Any language, as done in nginx-json-logformat, with the example /etc/nginx/conf.d/json_log.conf.
A version of the Nginx HTTP stub status module that outputs in JSON format.
PS:
The if parameter (1.7.0) enables conditional logging. A request will not be logged if the condition evaluates to “0” or an empty string:
map $status $loggable {
    ~^[23]  0;
    default 1;
}
access_log /path/to/access.log combined if=$loggable;
It’s a good idea to use a tool such as https://github.com/zaach/jsonlint to check your JSON data. You can test the output of your new logging format and make sure it’s real-and-proper JSON.

Format for writing a JSON log file?

Are there any format standards for writing and parsing JSON log files?
The problem I see is that you can't have a "pure" JSON log file: you need matching brackets, and trailing commas are forbidden. So while the following may be written by an application, it can't be parsed by standard JSON parsers:
[{"date": "2012-01-01 02:00:01", "severity": "ERROR", "msg": "Foo failed"},
{"date": "2012-01-01 02:04:02", "severity": "INFO", "msg": "Bar was successful"},
{"date": "2012-01-01 02:10:12", "severity": "DEBUG", "msg": "Baz was notified"},
So you must have some conventions on how to structure your log files in a way that a parser can process them. The easiest thing would be "one log message object per line, newlines in string values are escaped". Are there any existing standards and tools?
You're not going to write a single JSON object per FILE; you're going to write a JSON object per LINE. Each line can then be parsed individually. You don't need to worry about trailing commas or having the whole set of objects enclosed by brackets, etc. See http://blog.nodejs.org/2012/03/28/service-logging-in-json-with-bunyan/ for a more detailed explanation of what this can look like.
Also check out Fluentd http://fluentd.org/ for a neat toolset to work with.
Edit: this format is now called JSON Lines, or jsonl, as pointed out by @Mnebuerquo below - see http://jsonlines.org/
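As a quick sketch of why this works, consuming such a file is just a loop over lines; a minimal Python example (file name and keys follow the log sample above):
import json

# Each line is a complete JSON document, so the file can be parsed
# incrementally -- no matching brackets across the whole file needed.
with open("app.log") as f:
    for line in f:
        entry = json.loads(line)
        print(entry["severity"], entry["msg"])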
The log_formatter gem is the Ruby choice; as a collection of formatters, it now supports JSON output for both Ruby's standard Logger and Log4r.
It's simple to get started with in Ruby:
gem 'log_formatter'
require 'log_formatter'
require 'log_formatter/ruby_json_formatter'
# assumes a standard Logger instance is already available as `logger`
logger.debug({data: "test data", author: 'chad'})
Result:
{
  "source": "examples",
  "data": "test data",
  "author": "chad",
  "log_level": "DEBUG",
  "log_type": null,
  "log_app": "app",
  "log_timestamp": "2016-08-25T15:34:25+08:00"
}
For Log4r:
require 'log4r'
require 'log_formatter'
require 'log_formatter/log4r_json_formatter'

logger = Log4r::Logger.new('Log4RTest')
outputter = Log4r::StdoutOutputter.new(
  "console",
  :formatter => Log4r::JSONFormatter::Base.new
)
logger.add(outputter)
logger.debug( {data: "test data", author: 'chad'} )
Advanced usage: README
Full example code: examples