Efficiently get the first record of a JSONL file

Is it possible to efficiently get the first record of a JSONL file without consuming the entire stream / file? One way I have been able to inefficiently do so is the following:
curl -s http://example.org/file.jsonl | jq -s '.[0]'
I realize that head could be used here to extract the first line, but assume that the file may not use a newline as the record separator and may simply be concatenated objects or arrays.

If I'm understanding correctly, the JSONL format is just a stream of JSON objects, which jq handles quite nicely. Since you only want the first item, you can use the input filter to grab it.
I think you could just do this:
$ curl -s http://example.org/file.jsonl | jq -n 'input'
You need the null-input option -n so that jq doesn't consume the input immediately; input then reads exactly one value from the stream. There's no need to go through the rest of the input stream.
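As a quick sanity check (with made-up data), this also copes with concatenated records that lack newline separators, which you suspected the file might contain:
$ printf '{"a":1}{"b":2}{"c":3}' | jq -nc input
{"a":1}
Only the first value is parsed and printed; the -c flag just keeps the output compact.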

Finding the location (line, column) of a field value in a JSON file

Consider the following JSON file example.json:
{
    "key1": ["arr value 1", "arr value 2", "arr value 3"],
    "key2": {
        "key2_1": ["a1", "a2"],
        "key2_2": {
            "key2_2_1": 1.43123123,
            "key2_2_2": 456.3123,
            "key2_2_3": "string1"
        }
    }
}
The following jq command extracts a value from the above file:
jq ".key2.key2_2.key2_2_1" example.json
Output:
1.43123123
Is there an option in jq that, instead of printing the value itself, prints the location (line and column, start and end position) of the value within a (valid) JSON file, given an Object Identifier-Index (.key2.key2_2.key2_2_1 in the example)?
The output could be something like:
some_utility ".key2.key2_2.key2_2_1" example.json
Output:
(6,25) (6,35)
Given JSON data and a query, there is no option in jq that, instead of printing the value itself, prints the location of possible matches.
This is because JSON parsers providing an interface to developers usually focus on processing the logical structure of a JSON input, not the textual stream conveying it. You would have to instruct it to explicitly treat its input as raw text, while properly parsing it at the same time in order to extract the queried value. In the case of jq, the former can be achieved using the --raw-input (or -R) option, the latter then by parsing the read-in JSON-encoded string using fromjson.
The -R option alone would read the input linewise into an array of strings, which would have to be concatenated (e.g. using add) in order to provide the whole input at once to fromjson. The other way round, you could also provide the --slurp (or -s) option which (in combination with -R) already concatenates the input to a single string which then, after having parsed it with fromjson, would have to be split again into lines (e.g. using /"\n") in order to provide row numbers. I found the latter to be more convenient.
That said, this could give you a starting point (the --raw-output (or -r) option outputs raw text instead of JSON):
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query # save the query value as string
| ($query | length) as $length # save its length by counting its characters
| ./"\n" | to_entries[] # split into lines and provide 0-based line numbers
| {row: .key, col: .value | indices($query)[]} # find occurrences of the query
| "(\(.row),\(.col)) (\(.row),\(.col + $length))" # format the output
'
(5,24) (5,34)
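Note that the rows and columns in this output are 0-based. If you want 1-based coordinates matching the expected output from the question, a small tweak of the same script (shifting both indices by one) does it:
jq -Rrs '
"\(fromjson.key2.key2_2.key2_2_1)" as $query
| ($query | length) as $length
| ./"\n" | to_entries[]
| {row: (.key + 1), col: (.value | indices($query)[] + 1)}
| "(\(.row),\(.col)) (\(.row),\(.col + $length))"
'
(6,25) (6,35)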
Now, this works for the sample query, but how about the general case? Your example queried a number (1.43123123), which is an easy target as it has the same textual representation when encoded as JSON. Therefore, a simple string search and length count did a fairly good job (not a perfect one, because it would still find any occurrence of that character stream, not just "values"). Thus, for more precision, and especially with more complex JSON datatypes being queried, you would need a more sophisticated searching approach, probably involving more JSON conversions, whitespace stripping, and other normalizing shenanigans. So, unless your goal is to rebuild a full JSON parser within another one, you should narrow it down to the kind of queries you expect, and compose an appropriately tailored searching approach. This solution provides you with concepts for simultaneously processing the input textually and structurally, together with a simple search and output integration.

docker and format json

I'm trying to get usable JSON from the docker cli, but it seems it will only produce JSON for individual items, not for the complete result as a whole.
For example, running docker container ls -a --format="{{ json .Names }}" produces:
"hopeful_payne"
"trusting_turing"
"stupefied_morse"
"unruffled_noyce"
"pensive_fermi"
"objective_neumann"
"confident_bhaskara"
"unruffled_cray"
"epic_newton"
"boring_bartik"
"priceless_sinoussi"
"naughty_grothendieck"
"hardcore_bose"
"sad_jones"
"optimistic_napier"
"trusting_stallman"
"xenodochial_dijkstra"
"pedantic_cocks"
The above is not json.
How can I produce a result that is, ideally, a json array?
I think you cannot do this using docker only.
The command line's --format option effectively takes each result (one line per container) and applies the Go template to it. So you need another tool to aggregate the lines into a JSON array.
One way that you can achieve your goal is using the excellent jq tool:
docker container ls --format="{\"name\":\"{{.Names}}\"}" --all | jq --slurp '.'
This generates each container line as a JSON string: {"name": "[VALUE]"} and then uses jq to slurp them into a JSON array.
A challenge in doing this directly in bash is JSON's rule that the last element in an array can't be followed by a trailing comma. So the following simple bash one-liner generates invalid JSON, and you'd need extra logic to remove the final comma (or better yet, not emit it in the first place):
echo "[$(for CONTAINER in $(docker container ls --format="{{.Names}}" --all); do echo "{\"name\":\"${CONTAINER}\"},"; done;)]"
What are you trying to do with these JSON responses? It might be easier just to talk directly to the Docker API, which will give you JSON responses directly. E.g., to get a list of containers:
curl --unix-socket /var/run/docker.sock http://localhost/v1.24/containers/json
You can, as DazWilkin suggested, use jq for filtering JSON on the command line. E.g., if we want a list of container names:
curl --unix-socket /var/run/docker.sock http://localhost/v1.24/containers/json |
jq '[.[]|.Names]'
You can find the Docker Engine API documentation on the Docker docs site.
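One wrinkle to be aware of: the API reports each container's Names as an array of strings with a leading slash, so the filter above yields a list of lists. A small follow-up sketch that flattens and trims them:
curl --unix-socket /var/run/docker.sock http://localhost/v1.24/containers/json |
  jq '[.[].Names[0] | ltrimstr("/")]'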
One way to think of the output is that it's JSONL: http://jsonlines.org/
This Docker output is JSON, one document per line. Since you asked for a single attribute (just the name), you're simply getting a string back. But notice that it's quoted: it's technically JSON. It may make more sense if you update your format to {{ json . }}, which will then output lines that look more like the JSON you're expecting.
However, it's still one JSON document per line, so you'd have to process each line as its own document.
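Combining that observation with the slurp approach from above gives a compact way to get one well-formed JSON array of full container objects (a sketch along the same lines):
docker container ls --all --format '{{ json . }}' | jq -s '.'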

Take only first json object from stream with jq, do not touch rest

The thread Split multiple input JSONs with jq helped me solve one problem, but not this one.
mkfifo xxs
exec 3<>xxs ## keep open file descriptor
echo '{"a":0,"b":{"c":"C"}}{"x":33}{"asd":889}' >&3
jq -nc input <&3 ## prints 1st object '{"a":0,"b":{"c":"C"}}' and reads out the rest
cat <&3 ## prints nothing
My problem is how to make jq stop reading after the first object and not touch the other data in the stream (fifo), so that cat shows the rest of the data: '{"x":33}{"asd":889}'.
How can I achieve that with jq?
jq doesn't have to read the whole input to get the first value. This can be verified by feeding an infinite sequence of values to jq, which takes the first value and exits:
yes '{}' | jq -n input
Though, the question assumes a bit more. Namely that jq can read a single JSON value from a named pipe and stop reading "right at that point" so the rest can be then read by cat.
mkfifo xxs
exec 3<>xxs ## keep open file descriptor
echo '1 2 3' >&3
jq -nc input <&3 >first ## Get first value
cat <&3 >rest ## Nothing to show; jq read all the data
This gets more complicated as we don't know where that first value ends and most Unix programs (jq included) read input in larger chunks to limit the number of read syscalls.
jq would need an option to read its input one byte at a time. And, while this could be implemented, it may be of limited utility.
The closest thing I can think of is to output the first value to stderr and the rest to stdout.
jq -n 'input | stderr | inputs' <&3 2>first 1>rest
Input is processed in a streaming fashion (one input value at a time), and you can pipe stdout and/or stderr to something else. Note, though, that the whole input has to be valid JSON, and it will be pretty-printed while passing through jq (unlike with cat above).
If reading from a named pipe is not a requirement and you can afford to read the input from a file, you can access the first value and the rest in two separate invocations.
echo '1 2 3' > in
jq -n 'input' in >first
jq -n 'input | inputs' in >rest
If stream processing is the goal, it may also be possible to do everything in a single jq script that processes its input incrementally.
This all assumes top-level values. Though, jq can also process nested structures incrementally using the --stream option.
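For illustration, here is what --stream emits for a small made-up value: a sequence of [path, leaf] event pairs plus closing events, each of which jq hands you incrementally:
$ echo '{"a":[1,2]}' | jq -cn --stream inputs
[["a",0],1]
[["a",1],2]
[["a",1]]
[["a"]]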
If you want to partially read a stream, you will probably need to do it yourself.
You could write a trivial C program to do this.
I doubt there are any off-the-shelf parsers that let you stop reading a stream after n objects.
As mentioned before, most stream readers use stdio and read all they can into a buffer.
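For completeness, here is a rough sketch of that do-it-yourself approach (in bash rather than C, to stay in line with the rest of the examples), reusing the fifo setup from the question. It tracks brace depth one character at a time; bash's read -n1 won't over-read from a pipe. It deliberately ignores edge cases such as braces inside strings:
depth=0
first=""
while IFS= read -r -n1 ch <&3; do
  first="$first$ch"
  case "$ch" in
    "{") depth=$((depth + 1)) ;;
    "}") depth=$((depth - 1))
         if [ "$depth" -eq 0 ]; then break; fi ;;
  esac
done
printf '%s\n' "$first"   # first object: {"a":0,"b":{"c":"C"}}
cat <&3                  # the rest is still in the fifo: {"x":33}{"asd":889}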

Oracle SQLcl: Spool to json, only include content in items array?

I'm making a query via Oracle SQLcl. I am spooling into a .json file.
The correct data is presented from the query, but the format is strange.
Starting off as:
SET ENCODING UTF-8
SET SQLFORMAT JSON
SPOOL content.json
Followed by a query, this produces a JSON file as requested.
However, how do I remove the outer structure, meaning this part:
{"results":[{"columns":[{"name":"ID","type":"NUMBER"},
{"name":"LANGUAGE","type":"VARCHAR2"},{"name":"LOCATION","type":"VARCHAR2"},{"name":"NAME","type":"VARCHAR2"}],"items": [
// Here is the actual data I want to see in the file exclusively
]
I only want to spool everything in the items array, not including that key itself.
Is this possible to set as a parameter before querying? Reading the Oracle docs has not yielded any answers, hence asking here.
That's how I handle it.
After spooling the output to a file, I use the jq command to recreate the file with only the items:
cat file.json | jq --compact-output --raw-output '.results[0].items' > items.json
Using this tool: https://stedolan.github.io/jq/

Get pair of value from json file by sed

I want to get values from a JSON file.
Example:
{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"},{"name":"ghprbPullId","value":"226"},{"name":"ghprbTargetBranch","value":"master"},
What I expect is:
I want to get test#gmail.com, 226 and master.
sed is the wrong tool for processing JSON.
Assuming you have a file tmp.json with valid JSON like
[{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"},
{"name":"ghprbPullId","value":"226"},
{"name":"ghprbTargetBranch","value":"master"}]
you can use jq '.[].value' tmp.json.
If the file instead contains
{"name":"ghprbActualCommitAuthorEmail","value":"test#gmail.com"}
{"name":"ghprbPullId","value":"226"}
{"name":"ghprbTargetBranch","value":"master"}
(i.e., just a stream of 3 separate JSON objects), you could use jq '.value' tmp.json, as jq will apply the filter to each object in succession. You can also use jq -s '.[].value' tmp.json, where the -s flag tells jq to read the entire input into an array first. This lets you use the same filter in both cases.
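In either case the values come back JSON-quoted. Adding the -r (--raw-output) flag prints them as bare strings, which is usually what you want in a shell pipeline:
$ jq -r '.[].value' tmp.json
test#gmail.com
226
master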