Format for writing a JSON log file? - json

Are there any format standards for writing and parsing JSON log files?
The problem I see is that you can't have a "pure" JSON log file, since the file would need matching brackets and trailing commas are forbidden. So while an application may write the following, it can't be parsed by standard JSON parsers:
[{"date": "2012-01-01 02:00:01", "severity": "ERROR", "msg": "Foo failed"},
{"date": "2012-01-01 02:04:02", "severity": "INFO", "msg": "Bar was successful"},
{"date": "2012-01-01 02:10:12", "severity": "DEBUG", "msg": "Baz was notified"},
So you must have some conventions on how to structure your log files in a way that a parser can process them. The easiest thing would be "one log message object per line, newlines in string values are escaped". Are there any existing standards and tools?

You're not going to write a single JSON object per FILE, you're going to write a JSON object per LINE. Each line can then be parsed individually. You don't need to worry about trailing commas and have the whole set of objects enclosed by brackets, etc. See http://blog.nodejs.org/2012/03/28/service-logging-in-json-with-bunyan/ for a more detailed explanation of what this can look like.
Also check out Fluentd http://fluentd.org/ for a neat toolset to work with.
Edit: this format is now called JSON Lines (jsonl), as pointed out by @Mnebuerquo below; see http://jsonlines.org/
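In practice, each line is a complete JSON document, so any JSON parser can consume the file line by line; jq, for example, handles such a stream natively. A minimal sketch (app.log is a hypothetical file name):
$ printf '%s\n' \
    '{"date":"2012-01-01 02:00:01","severity":"ERROR","msg":"Foo failed"}' \
    '{"date":"2012-01-01 02:04:02","severity":"INFO","msg":"Bar was successful"}' > app.log
$ jq -r 'select(.severity == "ERROR") | .msg' app.log
Foo failed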

The log_formatter gem is the Ruby choice: as part of the formatter group, it now supports JSON formatters for plain Ruby logging and for Log4r.
It is simple to get started with in Ruby:
gem 'log_formatter'
require 'log_formatter'
require 'log_formatter/ruby_json_formatter'
logger.debug({data: "test data", author: 'chad'})
Result:
{
  "source": "examples",
  "data": "test data",
  "author": "chad",
  "log_level": "DEBUG",
  "log_type": null,
  "log_app": "app",
  "log_timestamp": "2016-08-25T15:34:25+08:00"
}
For Log4r:
require 'log4r'
require 'log_formatter'
require 'log_formatter/log4r_json_formatter'
logger = Log4r::Logger.new('Log4RTest')
outputter = Log4r::StdoutOutputter.new(
  "console",
  :formatter => Log4r::JSONFormatter::Base.new
)
logger.add(outputter)
logger.debug({data: "test data", author: 'chad'})
Advanced usage: README
Full example code: examples

Using jq to edit key:value in a .conf file from shell

I'm trying to write a shell script that passes an env variable into a .conf file so that I can manipulate the log_file and log_level keys programmatically.
The actual file, station.conf:
{
  "SX1301_conf": {
    "lorawan_public": true,
    "clksrc": 1,
    "radio_0": {
      "type": "SX1257",
      "rssi_offset": -166.0,
      "tx_enable": true,
      "antenna_gain": 0
    },
    "radio_1": {
      "type": "SX1257",
      "rssi_offset": -166.0,
      "tx_enable": false
    }
  },
  "station_conf": {
    "log_file": "stderr",
    "log_level": "DEBUG",
    /* XDEBUG,DEBUG,VERBOSE,INFO,NOTICE,WARNING,ERROR,CRITICAL */
    "log_size": 10000000,
    "log_rotate": 3,
    "CUPS_RESYNC_INTV": "1s"
  }
}
I wanted to test manually before passing shell variables, so I tried jq '".station_conf.log_level="ERROR"' station.conf, but I keep getting errors, including shell quoting errors and invalid numeric literal errors (which, by the way, seems to be an open bug: https://github.com/stedolan/jq/issues/501).
Any tips on how to do this? Ideally I'd be able to replace the log_level value with a $LOG_LEVEL from my env. Thanks!
Assuming the input is valid JSON, you could start with:
jq '.station_conf.log_level="ERROR"' station.conf
To pass in a shell variable, consider:
jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v' station.conf
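Note that jq does not edit files in place, so to persist the change, a common pattern is to write to a temporary file and move it over the original:
jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v' station.conf > station.conf.tmp \
  && mv station.conf.tmp station.conf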
You are getting the invalid numeric literal error because your example input is not valid JSON: it contains a /* comment */, which jq does not support. You have several options here:
1. Keep using jq and make your input files valid JSON.
2. Use another tool instead of jq, one which supports comments and/or other non-standard features.
If you choose the second way, i.e. a different tool, you can find alternatives either on the jq wiki (https://github.com/stedolan/jq/wiki/FAQ#processing-not-quite-valid-json) or with scout (https://github.com/ABridoux/scout).
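If the only offending constructs are /* ... */ comments, a third option is to strip them before piping to jq. A rough sketch (it assumes each comment sits on a single line and contains no * characters, as in the file above):
sed 's|/\*[^*]*\*/||g' station.conf | jq --arg v "$LOG_LEVEL" '.station_conf.log_level = $v'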

Retrieving the first entity out of several ones

I am a rank beginner with jq, and I've been going through the tutorial, but I think there is a conceptual difference I don't understand. A common problem I encounter is that a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
In the tutorial, they do this:
# We can use jq to extract just the first commit.
$ curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.[0]'
Here is an example with one object; I'd like to get the whole object back (just as my_array = ['foo']; my_array[0] returns 'foo' in Python).
wget https://hacker-news.firebaseio.com/v0/item/8863.json
I can access and pretty-print the whole thing with .
$ cat 8863.json | jq '.'
{
  "by": "dhouston",
  "descendants": 71,
  "id": 8863,
  "kids": [
    9224,
    ...
    8876
  ],
  "score": 104,
  "time": 1175714200,
  "title": "My YC app: Dropbox - Throw away your USB drive",
  "type": "story",
  "url": "http://www.getdropbox.com/u/2/screencast.html"
}
But trying to get the first element fails:
$ cat 8863.json | jq '.[0]'
jq: error (at <stdin>:0): Cannot index object with number
I get the same error with jq '.[0]' 8863.json, but strangely, echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0. What is the difference? Also, is this not the correct way to get the zeroth member of the JSON?
I've looked at other SO posts with this error message and at the manual, but I'm still confused. I think of the file as an array of JSON objects, and I'd like to get the first. But it looks like jq works with something called a "stream", and does operations on all of it (say, return one given field from every object).
Clarification:
Let's say I have 2 objects in my JSON:
{
  "by": "pg",
  "id": 160705,
  "poll": 160704,
  "score": 335,
  "text": "Yes, ban them; I'm tired of seeing Valleywag stories on News.YC.",
  "time": 1207886576,
  "type": "pollopt"
}
{
  "by": "dpapathanasiou",
  "id": 16070,
  "kids": [
    16078
  ],
  "parent": 16069,
  "text": "Dividends don't mean that much: Microsoft in its dominant years (when they had 40%-plus margins and were raking in the cash) never paid a dividend (they did so only recently).",
  "time": 1177355133,
  "type": "comment"
}
How would I get the entire first object (lines 1-9) with jq?
Cannot index object with number
This error message says it all: you can't index an object with a number. If you want the value of the by field, you need to do
jq '.by' file
Regarding:
echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0.
This is expected: echo passes the literal text 8863.json (not the file's contents) to jq, which tries to parse it as JSON; 8863 reads as a number, and the trailing .json makes it an invalid numeric literal. Even with the -R/--raw-input flag, which would make jq read the input as a JSON string, you could not apply array indexing to it, since one cannot index JSON strings. (To get the first character as a string, you'd write .[0:1].)
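A quick illustration of the difference (here the text is fed to jq directly rather than read from a file):
$ printf '8863.json' | jq '.'      # parse error: Invalid numeric literal
$ printf '"8863.json"' | jq '.[0:1]'
"8"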
If your input file consists of several separate entities, to get the first one:
jq -n 'input' file
or,
jq -n 'first(inputs)' file
To get the nth entity (note that nth uses zero-based indexing, so the following yields the sixth):
jq -n 'nth(5; inputs)' file
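For example, with the two-object sample from the clarification above saved as sample.json (a hypothetical file name):
$ jq -n 'input | .by' sample.json
"pg"
$ jq -n 'nth(1; inputs) | .by' sample.json
"dpapathanasiou"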
a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
As implied in @OguzIsmail's response, there are important differences between:
- a JSON file (i.e., a file containing exactly one JSON entity);
- a file containing a sequence (i.e., stream) of JSON entities;
- a file containing an array of JSON entities.
In the first two cases, you can write jq -n input to select the first entity; in the case of an array of entities, jq '.[0]' will suffice.
(In JSON-speak, a "JSON object" is a kind of dictionary, and is not to be confused with JSON entities in general.)
If you have a bunch of JSON objects (whether as a stream or array or whatever), just looking at the first often doesn't give an accurate picture of all of them. For a bird's-eye view of the whole collection, a "schema inference engine" is often the way to go. For this purpose, you might like to consider my schema.jq. It's usually very simple to use, though how you use it will depend on whether you have a stream or an array of JSON entities. For basic details, see https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed; for related topics (e.g. verification), see the entry for JESS at https://github.com/stedolan/jq/wiki/Modules
Please note that schema.jq infers a structural schema that mirrors the entities under consideration. Such structural schemas have little in common with JSON Schema schemas, which you might also like to consider.

Kubernetes save JSON logs to file with escaped quotes. Why?

I'm using Fluentd with Elasticsearch for logs from Kubernetes, but I noticed that some JSON logs cannot be correctly indexed because the JSON is stored as a string.
Logs from kubectl logs look like:
{"timestamp":"2016-11-03T15:48:12.007Z","level":"INFO","thread":"cromwell-system-akka.actor.default-dispatcher-4","logger":"akka.event.slf4j.Slf4jLogger","message":"Slf4jLogger started","context":"default"}
But the logs saved to files under /var/log/containers/... have escaped quotes, which makes them strings instead of JSON and spoils indexing:
{"log":"{\"timestamp\":\"2016-11-03T15:45:07.976Z\",\"level\":\"INFO\",\"thread\":\"cromwell-system-akka.actor.default-dispatcher-4\",\"logger\":\"akka.event.slf4j.Slf4jLogger\",\"message\":\"Slf4jLogger started\",\"context\":\"default\"}\n","stream":"stdout","time":"2016-11-03T15:45:07.995443479Z"}
I'm trying to get logs looking like:
{
  "log": {
    "timestamp": "2016-11-03T15:45:07.976Z",
    "level": "INFO",
    "thread": "cromwell-system-akka.actor.default-dispatcher-4",
    "logger": "akka.event.slf4j.Slf4jLogger",
    "message": "Slf4jLogger started",
    "context": "default"
  },
  "stream": "stdout",
  "time": "2016-11-03T15:45:07.995443479Z"
}
Can you suggest how to do this?
I ran into the same issue; however, I'm using Fluent Bit, the "C" version of Fluentd (which is written in Ruby). Since this is an older question, I'm answering for the benefit of others who find it.
In Fluent Bit v0.13 this issue was addressed: you can now specify which parser to use through annotations, and the parser can be configured to decode the log as JSON.
fluent-bit issue detailing problem
blog post about annotations for specifying the parser
json parser documentation - the Docker container's logs come out as JSON, but your application logs are themselves JSON as well, so an additional decoder is needed.
The final parser with decoder looks like this:
[PARSER]
    Name         embedded-json
    Format       json
    Time_Key     time
    Time_Format  %Y-%m-%dT%H:%M:%S.%L
    Time_Keep    On
    # Command       | Decoder | Field | Optional Action
    # ==============|=========|=======|================
    Decode_Field_As   escaped   log     do_next
    Decode_Field_As   json      log
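To verify the transformation outside the logging pipeline, jq's fromjson builtin performs the same decoding: it parses the escaped string in the log field into a real object, yielding the desired structure shown above (the file name here is hypothetical):
jq '.log |= fromjson' escaped-container.log
Once the parser is defined, you point Fluent Bit at it per workload; for Kubernetes this is done with the fluentbit.io/parser pod annotation (e.g. fluentbit.io/parser: embedded-json), provided the Kubernetes filter has K8S-Logging.Parser enabled.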

How do I get strings from a JSON file?

I'm writing an internationalized desktop program in Vala, and I use an external JSON file to store a list of languages.
I'm using gettext for l10n, so if I get a string from the JSON file and do something like _(string_var), I get the translated string. The problem is that I don't know how to add the strings to the pot file using xgettext or some similar tool.
Any ideas?
If the tool jq (http://stedolan.github.io/jq/) is an option for you, the following might work:
$ curl -s https://raw.githubusercontent.com/chavaone/gnomecat/master/data/languages.json | jq .languages[43].name
"English"
The solution I finally used was to modify the JSON file so that double-quoted strings mark exactly the strings I want translated. For example:
{
  'code' : 'es',
  'name' : "Spanish; Castilian",
  'pluralform' : 'nplurals=2; plural=(n != 1);',
  'default-team-email': 'gnome-es-list@gnome.org'
}
In the previous piece of the JSON file, the only string I wanted to translate was "Spanish; Castilian". Then, in POTFILES.in, I just use the gettext/quoted type.
# List of source files containing translatable strings.
# Please keep this file sorted alphabetically.
[encoding: UTF-8]
[type: gettext/quoted]data/languages.json
[type: gettext/quoted]data/plurals.json
[type: gettext/glade]data/ui/appmenu.ui
[...]

Can a JSON value contain a multiline string

I am writing a JSON file which would be read by a Java program. The fragment is as follows...
{
"testCases" :
{
"case.1" :
{
"scenario" : "this the case 1.",
"result" : "this is a very long line which is not easily readble.
so i would like to write it in multiple lines.
but, i do NOT require any new lines in the output.
I need to split the string value in this input file only.
such that I don't require to slide the horizontal scroll again and again while verifying the correctness of the statements.
the prev line, I have shown, without splitting just to give a feel of my problem"
}
}
}
Per the specification, the JSON grammar's char production can take the following values:
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
Newlines are "control characters", so no, you may not have a literal newline within your string. However, you may encode it using whatever combination of \n and \r you require.
The JSONLint tool confirms that your JSON is invalid.
And, if you want to write newlines inside your JSON syntax without actually including newlines in the data, then you're doubly out of luck. While JSON is intended to be human-friendly to a degree, it is still data and you're trying to apply arbitrary formatting to that data. That is absolutely not what JSON is about.
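For illustration, the two-character escape \n inside a JSON string is decoded back to a real newline by any conforming parser (shown here with jq's raw output):
$ printf '%s' '{"msg":"line one\nline two"}' | jq -r '.msg'
line one
line two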
I'm not sure of your exact requirement, but one possible solution to improve readability is to store the value as an array:
{
  "testCases" :
  {
    "case.1" :
    {
      "scenario" : "this the case 1.",
      "result" : ["this is a very long line which is not easily readble.",
                  "so i would like to write it in multiple lines.",
                  "but, i do NOT require any new lines in the output."]
    }
  }
}
Then join it back when needed with
result.join(" ")
Not a pretty solution, but you can try the hjson tool. It allows you to write multi-line text in your editor and then converts it to proper, valid JSON format.
Note: it adds '\n' characters for the new lines, but you can simply delete them in any text editor with a "Replace all..." function.
I believe it depends on what JSON interpreter you're using... in plain JavaScript you could use line continuations:
{
"testCases" :
{
"case.1" :
{
"scenario" : "this the case 1.",
"result" : "this is a very long line which is not easily readble. \
so i would like to write it in multiple lines. \
but, i do NOT require any new lines in the output."
}
}
}
As I understand it, the question is not about how to pass a string with control characters via JSON, but how to store and restore JSON in a file where a string can be split across lines for the editor's sake.
If you want to store a multiline string in a file, the file will no longer contain a valid JSON object. But if the JSON files are used only by your own program, you can store the data as you like, remove all newlines manually each time you load the file, and then hand it to the JSON parser.
Or, better, keep JSON source files where you edit strings however you want, then remove all newlines with some utility to produce the valid JSON file your program will actually use.
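A crude sketch of such a utility (source.json and valid.json are hypothetical names; it assumes strings are split only at line breaks, with continuation lines starting flush left, since any leading whitespace would end up inside the joined string):
tr -d '\n' < source.json > valid.json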