Legal change to JSON input invalidates simple jq query

Another department continually updates a JSON file that I then query. Its format is three lists of similar-looking dictionaries:
{
"levels":
[
{"a":1, "b":false, "c":"2012", "d":"2017"}
,{"a":2, "b":true, "c":"2013", "d":"9999"}
,...
]
,"costs":
[
{"e":12, "f":"foo", "g":"blarg", "h":"2015", "i":"2018"}
,{"e":-3, "f":"foo", "g":"glorb", "h":"2013", "i":"9999"}
,...
]
,"recipes":
[
{"j":"BAZ", "k":["blarg","glorb","bleeg"], "l":"dill", "m":"2016", "n":"2017"}
,{"j":"BAZ", "k":["blarg","bleeg"], "l":"dill", "m":"2017", "n":"9999"}
,...
]
} # line 3943 (see below)
Recently, my simple jq queries like
jq '.["recipes"][] | select(.l | test("ill"))' < jsonfile
stopped returning all of the results they should (e.g. returning only one of the two "dill" lines above) and started printing this error message:
jq: error (at <stdin>:3943): null (null) cannot be matched, as it is not a string
Line 3943 mentioned in the error is the final line of the file. Queries against the "levels" and "costs" sections of the file continue to work like normal; it's only the "recipes" section of the file that is breaking, as though jq thinks the closing brace of the file is still part of the "recipes" section.
To me this suggests there's been a formatting change or error in the last section of the file. However, software other than jq (e.g. Python) doesn't report any problems parsing it. Before I start going through the input line by line: does this error message indicate anything obvious to a jq expert?
Alas, I do not keep old versions of the file around for comparison. (I think I will start today.)

(self-answering after a bit of investigating)
I think there was no formatting error or change in formatting in the input.
I don't know why my query syntax did not encounter errors previously (maybe I just did not notice), but it seems that the entries in the "recipes" section often do not contain an "l" attribute, and jq will cease processing as soon as it encounters one that does not.
I also don't know why jq does not generate the same error message for every record that lacks that attribute, nor why it waits until the final line of the input to generate the single message. (Maybe that behavior is documented somewhere.)
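A likely explanation for both puzzles (a reading of jq's behavior, not something confirmed in the post): the whole file is a single JSON value, so jq has already parsed it through the final line before the filter runs, and that end position is what the error reports. An error that nothing catches ends evaluation for that input, so jq prints only the first failure and none of the later matches. To find the affected entries without reading the file line by line, a small Python check can list which entries of a section lack a given key. This is a sketch; the function names are hypothetical.

```python
import json
from typing import Dict, List


def missing_key_indices(entries: List[Dict], key: str) -> List[int]:
    """Positions of entries that have no `key` field at all."""
    return [i for i, entry in enumerate(entries) if key not in entry]


def report_missing(path: str, section: str, key: str) -> List[int]:
    """Load the whole file (Python parses it without errors) and check one section."""
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return missing_key_indices(doc[section], key)
```

For this file, report_missing("jsonfile", "recipes", "l") would return the positions of the recipes that have no "l" field.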
In any case, I fixed the error (not just the message, but also the failure to display all relevant records) by testing for the presence of the attribute first:
jq '.["recipes"][] | select(has("l") and (.l | test("ill")))' < jsonfile
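To double-check which records the fixed filter should return, its logic can be mirrored in Python. A minimal sketch: select_recipes is a hypothetical name, and re.search stands in for jq's test, which behaves the same for a simple pattern like "ill" but is not identical in general (jq uses the Oniguruma regex library).

```python
import re
from typing import Dict, Iterator


def select_recipes(doc: Dict, pattern: str) -> Iterator[Dict]:
    """Mirror of: .["recipes"][] | select(has("l") and (.l | test(pattern)))"""
    regex = re.compile(pattern)
    for recipe in doc["recipes"]:
        # Check for the key first; entries without "l" are skipped, not an error.
        # `and` short-circuits, so the regex only runs when "l" exists.
        if "l" in recipe and regex.search(recipe["l"]):
            yield recipe
```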

Related

using jq to split huge json UNDER windows

I want to use jq to split a very large JSON file (>80 GB) into smaller parts (<1 GB, or a fixed number of lines).
I thought I had the necessary statements together.
What have I done so far?
jq ". | length" z:\DOWNLOAD\rows.json
works!
Under Windows this should output the first two lines.
jq ".[0:1]" z:\DOWNLOAD\rows.json
but I'm getting an error:
jq: error (at z:\DOWNLOAD\rows.json:589): Cannot index object with object
What I also haven't understood is the --stream switch.
Yes, there are a bunch of answers, but they do not work under Windows (double quotes instead of apostrophes; but see the error above).
[{"node":"http://www.wikidata.org/entity/Q952111","Unterklasse_von":"http://www.wikidata.org/entity/Q2095"},{"node":"http://.....
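Setting the jq error aside, the splitting step itself can be sketched in Python: cut a sequence of records into fixed-size batches and write each batch as its own JSON array file. This is a sketch under assumptions: the function names and the part_0000.json naming are hypothetical, and the input is any iterable of records. An 80 GB file cannot simply be loaded with json.load, so for the real file the records would have to come from a streaming parser, which this sketch does not include.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def chunks(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def split_array(records: Iterable, size: int, out_dir: Path, stem: str = "part") -> List[Path]:
    """Write each batch as a separate JSON array: part_0000.json, part_0001.json, ..."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunks(records, size)):
        path = out_dir / f"{stem}_{i:04d}.json"
        with path.open("w", encoding="utf-8") as fh:
            json.dump(batch, fh, ensure_ascii=False)
        paths.append(path)
    return paths
```

Because chunks only pulls `size` records at a time, memory use depends on the batch size, not on the total input, as long as the records themselves arrive lazily.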

Retrieving the first entity out of several ones

I am a rank beginner with jq, and I've been going through the tutorial, but I think there is a conceptual difference I don't understand. A common problem I encounter is that a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
In the tutorial, they do this:
# We can use jq to extract just the first commit.
$ curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.[0]'
Here is an example with one object - here, I'd like to return the whole array (just like my_array=['foo']; my_array[0] would return foo in Python).
wget https://hacker-news.firebaseio.com/v0/item/8863.json
I can access and pretty-print the whole thing with .
$ cat 8863.json | jq '.'
$
{
"by": "dhouston",
"descendants": 71,
"id": 8863,
"kids": [
9224,
...
8876
],
"score": 104,
"time": 1175714200,
"title": "My YC app: Dropbox - Throw away your USB drive",
"type": "story",
"url": "http://www.getdropbox.com/u/2/screencast.html"
}
But trying to get the first element fails:
$ cat 8863.json| jq '.[0]'
$ jq: error (at <stdin>:0): Cannot index object with number
I get the same error with jq '.[0]' 8863.json, but strangely echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0. What is the difference? Also, is this not the correct way to get the zeroth member of the JSON?
I've looked at other SO posts with this error message and at the manual, but I'm still confused. I think of the file as an array of JSON objects, and I'd like to get the first. But it looks like jq works with something called a "stream", and does operations on all of it (say, return one given field from every object).
Clarification:
Let's say I have 2 objects in my JSON:
{
"by": "pg",
"id": 160705,
"poll": 160704,
"score": 335,
"text": "Yes, ban them; I'm tired of seeing Valleywag stories on News.YC.",
"time": 1207886576,
"type": "pollopt"
}
{
"by": "dpapathanasiou",
"id": 16070,
"kids": [
16078
],
"parent": 16069,
"text": "Dividends don't mean that much: Microsoft in its dominant years (when they had 40%-plus margins and were raking in the cash) never paid a dividend (they did so only recently).",
"time": 1177355133,
"type": "comment"
}
How would I get the entire first object (lines 1-9) with jq?
Cannot index object with number
This error message says it all: you can't index objects with numbers. If you want to get the value of the by field, you need to do
jq '.by' file
Wrt
echo 8863.json | jq '.[0]' gives me parse error: Invalid numeric literal at line 2, column 0.
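It's normal: echo passes the literal text 8863.json (the file name, not the file's contents) to jq, and since you didn't specify the -R/--raw-input flag, jq tries to parse that text as JSON and fails. With -R, jq would see it as a JSON string, and one cannot apply array indexing to JSON strings. (To get the first character as a string, you'd write .[0:1].)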
If your input file consists of several separate entities, to get the first one:
jq -n 'input' file
or,
jq -n 'first(inputs)' file
To get the nth one (let's say the 5th; note that nth counts from zero, so the 5th entity is nth(4; ...)):
jq -n 'nth(4; inputs)' file
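The difference between a stream of entities and an array can also be seen in a short Python sketch (an illustration, not how jq works internally; the function names are hypothetical). A stream has to be decoded one value at a time, which is what -n with input/inputs does, whereas an array is a single value that can be indexed directly. nth_value counts from zero, like jq's nth.

```python
import json
from itertools import islice
from typing import Any, Iterator


def iter_json_values(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value in `text`, roughly like jq -n 'inputs'."""
    decoder = json.JSONDecoder()
    idx, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while idx < end and text[idx].isspace():
            idx += 1
        if idx == end:
            return
        value, idx = decoder.raw_decode(text, idx)
        yield value


def nth_value(text: str, n: int) -> Any:
    """Zero-based, like jq's nth(n; inputs); raises IndexError if there is no such value."""
    for value in islice(iter_json_values(text), n, None):
        return value
    raise IndexError(n)
```

For a file holding one array instead, json.loads(text)[0] is the counterpart of jq '.[0]'.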
a large JSON file will contain many objects, each of which is quite big, and I'd like to view the first complete object, to see which fields exist, what types, how much nesting, etc.
As implied in #OguzIsmail's response, there are important differences between:
- a JSON file (i.e., a file containing exactly one JSON entity);
- a file containing a sequence (i.e., stream) of JSON entities;
- a file containing an array of JSON entities.
In the first two cases, you can write jq -n input to select the first entity, and in the case of an array of entities, jq .[0] will suffice.
(In JSON-speak, a "JSON object" is a kind of dictionary, and is not to be confused with JSON entities in general.)
If you have a bunch of JSON objects (whether as a stream or array or whatever), just looking at the first often doesn't give an accurate picture of all of them. For getting a bird's-eye view of a bunch of objects, using a "schema inference engine" is often the way to go. For this purpose, you might like to consider my schema.jq schema inference engine. It's usually very simple to use, but of course how you use it will depend on whether you have a stream or an array of JSON entities. For basic details, see https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed; for related topics (e.g. verification), see the entry for JESS at https://github.com/stedolan/jq/wiki/Modules
Please note that schema.jq infers a structural schema that mirrors the entities under consideration. Such structural schemas have little in common with JSON Schema schemas, which you might also like to consider.
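As a rough illustration of such a bird's-eye view (this is not schema.jq or JSON Schema, just a simplified stand-in with hypothetical function names), the following Python sketch counts, for each key seen across a collection of objects, how often its values take each JSON type.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def json_type(value: Any) -> str:
    """Name the JSON type of a decoded Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def key_types(objects: Iterable[Dict[str, Any]]) -> Dict[str, Counter]:
    """Map each key to a Counter of the JSON types its values had."""
    summary: Dict[str, Counter] = {}
    for obj in objects:
        for key, value in obj.items():
            summary.setdefault(key, Counter())[json_type(value)] += 1
    return summary
```

A key whose total count is lower than the number of objects (such as "poll" or "kids" in the two-object example) is optional, which looking only at the first object would not reveal.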

I am using identical syntax in jq to change JSON values, yet one case works while other turns bash interactive, how can I fix this?

I am trying to update a simple JSON file (consists of one object with several key/value pairs) and I am using the same command yet getting different results (sometimes even having the whole json wiped with the 2nd command). The command I am trying is:
cat ~/Desktop/config.json | jq '.Option = "klay 10"' | tee ~/Desktop/config.json
This command perfectly replaces the value of the minerOptions key with "klay 10", my intended output.
Then, I try to run the same process on the newly updated file (just value is changed for that one key) and only get interactive terminal with no result. ps unfortunately isn't helpful in showing what's going on. This is what I do after getting that first command to perfectly change the value of the key:
cat ~/Desktop/config.json | jq ‘.othOptions = "-epool etc-eu1.nanopool.org:14324 -ewal 0xc63c1e59c54ca935bd491ac68fe9a7f1139bdbc0 -mode 1"' | tee ~/Desktop/config.json
which I would have expected to replace the othOptions key's value with the assigned result, just as the last one did. I tried sending stdout directly to the file, but got no result there either. I even tried piping one more time, and writing to a temp file and then moving it over the original; all of these, unlike the identical first command, just return > and absolutely zero output. When I quit the process, the file still has the old value, not the new one.
What am I missing that causes the same command, with just different inputs, to behave differently? (The key in the second command comes right after the first one and has an identical structure; it's not creating an object or anything, just a key-value pair like the first.) I thought it could be tee, but any other approach, such as redirecting stdout to a file, produces the same constant > prompt waiting for input.
I genuinely looked everywhere I could online for why this could be happening before resorting to SE, it's giving me such a headache for what I thought should be simple.
As #GordonDavisson pointed out, using tee to overwrite the input file is a (well-known - see e.g. the jq FAQ) recipe for disaster. If you absolutely positively want to overwrite the file unconditionally, then you might want to consider using sponge, as in
jq ... config.json | sponge config.json
or more safely:
cp -p config.json config.json.bak && jq ... config.json | sponge config.json
For further details about this and other options, search for ‘sponge’ in the FAQ.
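For comparison, the same "read everything first, then replace the file" idea that sponge provides can be sketched in Python (the function name is hypothetical; the keys Option and othOptions come from the question). Because the input is fully parsed before anything is written, the original cannot be truncated mid-read, which is the risk when tee truncates the same file that cat is still reading.

```python
import json
import os
import shutil
import tempfile
from typing import Any


def set_key_in_place(path: str, key: str, value: Any) -> None:
    """Set one top-level key of the JSON object in `path`, rewriting the file safely."""
    # Read and parse the whole file first; nothing has touched it yet.
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    data[key] = value

    # Write to a temporary file in the same directory, so the final
    # rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            json.dump(data, out, indent=2)
            out.write("\n")
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        os.replace(tmp_path, path)       # swap the new file in over the old one
    except BaseException:
        os.unlink(tmp_path)
        raise
```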

Read a log file in R

I'm trying to read a log file in R.
It looks like an extract from a JSON file to me, but when trying to read it using jsonlite I get the following error message: "Error: parse error: trailing garbage".
Here is what my log file looks like:
{"date":"2017-05-11T04:37:15.587Z","userId":"admin","module":"Quote","action":"CreateQuote","identifier":"-.admin1002"},
{"date":"2017-05-11T05:12:24.939Z","userId":"a145fhyy","module":"Quote","action":"Call","identifier":"RunUY"},
{"date":"2017-05-11T05:12:28.174Z","userId":"a145fhyy","license":"named","usage":"External","module":"Catalog","action":"OpenCatalog","identifier":"wks.klu"},
As you can see, the column name is given directly in front of the content on each line (e.g. "date": or "action":).
And some lines skip some columns and add others.
What I want to get as output would be to have 7 columns with the corresponding data filled in each:
date
userId
license
usage
module
action
identifier
Does anyone have a suggestion about how to get there?
Thanks a lot in advance
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Thanks everyone for your answers. Here are some clarifications about my issue:
The data I gave as an example is an extract of one of my log files. I have a lot of them, which I need to read as one single table.
I haven't added any commas or anything to it.
#r2evans
I've tried the following:
Log3 <- read.table("/Projects/data/analytics.log.agregated.2017-05-11.log")
jsonlite::stream_in(textConnection(gsub(",$", "", Log3)))
It returns the following error:
Error: lexical error: invalid char in json text.
c(17, 18, 19, 20, 21, 22, 23, 2
(right here) ------^
I'm not sure how to use sed -e 's/,$//g' infile > outfile and Sys.which("sed"); that's something I'm not familiar with. I'm looking into it, but if you have any more details about how to use it, that would be great.
I have saved your example as a file "test.json" and was able to read and parse it like this:
library(jsonlite)
library(readr)  # read_file() comes from readr, not jsonlite
rf <- read_file("test.json")
rfr <- gsub("\\},", "\\}", rf)
data <- stream_in(textConnection(rfr))
It parses and simplifies into a neat data frame exactly like you want. What I do is look for "}," rather than ",$", because the very last comma is not (necessarily) followed by a newline character(s).
However, this might not be the best solution for very large files. For those, you may need to first find a way to strip the commas from the text file itself. Or, if possible, ask the people who exported this file to export it in a normal ndjson format :-)
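Along the lines of the large-file concern above, the same comma-stripping idea can be applied one line at a time, so no file has to be held in memory as a whole. A sketch in Python rather than R, assuming (as in the sample) that each record sits on its own line with at most a trailing comma; the function names are hypothetical, and the seven columns are the ones listed in the question.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

COLUMNS = ["date", "userId", "license", "usage", "module", "action", "identifier"]


def parse_log_lines(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Turn 'one JSON object per line, trailing comma' text into fixed-column rows."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line.rstrip(","))  # drop the trailing comma, then parse
        # Fields a line lacks (e.g. "license") become None, similar to NA in R.
        yield {col: record.get(col) for col in COLUMNS}


def parse_log_files(paths: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Read several log files as one table by chaining their rows."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield from parse_log_lines(fh)
```

The rows could then be written with csv.DictWriter(fh, fieldnames=COLUMNS), which writes None as an empty field, and loaded into R as one table.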

How to add the SwingLibrary plugin to RobotFramework?

I'm trying to execute the SwingLibrary demo available in https://github.com/robotframework/SwingLibrary/wiki/SwingLibrary-Demo
After setting everything up (Jython, RobotFramework, demo app), I can run the following command:
run_demo.py startapp
and it works (the demo app starts up).
Now if I try to run the sample tests, it fails:
run_demo.py example.txt
[ ERROR ] Error in file '/home/user1/python-scripts/gui_automation/sample-text.txt': Non-existing setting 'Library SwingLibrary'.
[ ERROR ] Error in file '/home/user1/python-scripts/gui_automation/sample-text.txt': Non-existing setting 'Suite Setup Start Test Application'.
==============================================================================
Sample-Text
==============================================================================
Test Add Todo Item | FAIL |
No keyword with name 'Insert Into Text Field description ${arg}' found.
------------------------------------------------------------------------------
Test Delete Todo Item | FAIL |
No keyword with name 'Insert Into Text Field description ${arg}' found.
------------------------------------------------------------------------------
Sample-Text | FAIL |
2 critical tests, 0 passed, 2 failed
2 tests total, 0 passed, 2 failed
==============================================================================
Output: /home/user1/python-scripts/gui_automation/results/output.xml
Log: /home/user1/python-scripts/gui_automation/results/log.html
Report: /home/user1/python-scripts/gui_automation/results/report.html
I suspect that it cannot find swinglibrary.jar, and therefore my plugin installation is probably messed up.
Any ideas?
Take a look at these error messages in the report:
[ ERROR ] Error in file '...': Non-existing setting 'Library SwingLibrary'.
[ ERROR ] Error in file '...': Non-existing setting 'Suite Setup Start Test Application'.
The first rule of debugging is to always assume error messages are telling you the literal truth.
They are telling you that you have an unknown setting. It thinks you are using a setting literally named "Library SwingLibrary" and one named "Suite Setup Start Test Application". Those are obviously incorrect setting names. The question is, why is it saying that?
My guess is that you are using the space-separated text format, and you only have a single space between "Library" and "SwingLibrary". Because there is one space, robot thinks that whole line is in the first column of the settings table, and whatever is in the first column is treated as the setting name.
The fix should be as simple as inserting two or more spaces after "Library", and two or more spaces after "Suite Setup".
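To make the "one space vs. two spaces" point concrete, here is a deliberately simplified model of how the space-separated format divides a line into cells, sketched in Python. It is an assumption for illustration, not Robot Framework's actual parser, which handles more cases.

```python
import re
from typing import List

# Simplified model: cells are separated by runs of two or more spaces.
_SEPARATOR = re.compile(r" {2,}")


def split_cells(line: str) -> List[str]:
    """Split one line into cells; an indented line yields an empty first cell."""
    return _SEPARATOR.split(line.rstrip())
```

Under this model, "Library SwingLibrary" stays a single cell (so the whole text is read as the setting name), while "Library    SwingLibrary" splits into the setting "Library" and its value "SwingLibrary".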
This type of error is why I always recommend using the pipe-separated format. It makes the boundaries between cells much easier to see.