I want to use jq to split a very large JSON file (>80 GB) into smaller parts (<1 GB, or a fixed number of lines).
I had the necessary statements together... or so I thought.
What have I done so far?
jq ". | length" z:\DOWNLOAD\rows.json
works!
Under Windows, this should output the first element:
jq ".[0:1]" z:\DOWNLOAD\rows.json
But I'm getting an error:
jq: error (at z:\DOWNLOAD\rows.json:589): Cannot index object with object
What I also haven't understood is the --stream switch.
Yes, there are a bunch of answers, but they do not work under Windows (double quotes instead of apostrophes, but see the error above).
[{"node":"http://www.wikidata.org/entity/Q952111","Unterklasse_von":"http://www.wikidata.org/entity/Q2095"},{"node":"http://.....
Related
I'm doing the following to capture some ADO JSON data:
iteration="$(az boards iteration team list --team Test --project Test --timeframe current)"
Normally, the output of that command contains a JSON key/value pair like the following:
"path": "Test\\Sprint1"
But after capturing the STDOUT into that iteration variable, if I do
echo "$iteration"
That key/value pair becomes
"path": "Test\Sprint1"
And if I attempt to use jq on that output, it breaks because it's not recognized as valid JSON any longer. I'm very unfamiliar with Bash. How can I get that JSON to remain valid all the way through?
As already commented by markp-fuso:
It looks like your echo command is interpreting the backslashes. You can confirm this by running echo 'a\\b' and looking at the output.
The portable way to deal with such problems is to use printf instead of echo:
printf %s\\n "$iteration"
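A quick demonstration of the difference (whether echo interprets the backslashes depends on the shell and its settings, which is exactly why printf is the portable choice; the iteration value below is a stand-in for the real az output):
iteration='{"path": "Test\\Sprint1"}'
echo "$iteration"                             # may print "Test\Sprint1", depending on the shell
printf '%s\n' "$iteration"                    # always prints the string verbatim
printf '%s\n' "$iteration" | jq -r '.path'    # prints Test\Sprint1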
I am trying to create a transfer job on the Windows command line with
bq mk --transfer_config --data_source=amazon_s3
--target_dataset=Usage --display_name='s3_transfer_installs_global_in_v0_test'
--params='{"data_path_template":"mybucket", "destination_table_name_template":"in_table", "file_format":"CSV", "max_bad_records":"0", "skip_leading_rows":"1", "allow_jagged_rows":"false", "allow_quoted_newlines":"true", "access_key_id":"dfadfadf", "secret_access_key":"sdfsfsdfsdf"}'
but I keep getting variations of the error
Too many positional args, still have ['"allow_quoted_newlines":"true","access_key_id":',...
Output from --apilog was also not enlightening.
My JSON validates, but maybe some escape characters are still needed?
Any help is very much appreciated; I have been shuffling quotation marks and backslashes around for two hours now...
I got the same error as you when running your query.
I tried replacing the double quotes with single quotes in the --params option, and it seems to work. Try the following:
bq mk --transfer_config --data_source=amazon_s3 --target_dataset=Usage --display_name='s3_transfer_installs_global_in_v0_test' --params="{'data_path_template':'mybucket', 'destination_table_name_template':'in_table', 'file_format':'CSV', 'max_bad_records':'0', 'skip_leading_rows':'1', 'allow_jagged_rows':'false', 'allow_quoted_newlines':'true', 'access_key_id':'dfadfadf', 'secret_access_key':'sdfsfsdfsdf'}"
I also tried to run the original command in Windows PowerShell and it worked without any changes.
I think the problem is in Windows cmd...
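If it has to run in cmd, another option (a sketch I have not verified in cmd) is to keep the JSON double quotes and escape each inner quote with a backslash inside the quoted --params value:
bq mk --transfer_config --data_source=amazon_s3 --target_dataset=Usage --display_name=s3_transfer_installs_global_in_v0_test --params="{\"data_path_template\":\"mybucket\", \"destination_table_name_template\":\"in_table\", \"file_format\":\"CSV\", \"max_bad_records\":\"0\", \"skip_leading_rows\":\"1\", \"allow_jagged_rows\":\"false\", \"allow_quoted_newlines\":\"true\", \"access_key_id\":\"dfadfadf\", \"secret_access_key\":\"sdfsfsdfsdf\"}"
Note that the single quotes around the display name are dropped here, since cmd passes them through literally.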
This question already has an answer here:
jq special characters in nested keys
I am trying to use the jq command-line JSON processor https://shapeshed.com/jq-json/ (which works great) to process a JSON file that seems to have been made using some poor choices.
Normally, the keys in your JSON file would not contain any periods, such as:
{"id":"d9s7g9df7sd9","name":"Tacos"}
To get Tacos from the file you would do the following in bash:
echo "$json" | jq -r '.name'
This will give you Tacos. (There may be some extra code missing from that example, but you get the point.)
I have a JSON file that looks like this:
{"stat.blah":123,"stat.taco":495,"stat.yum... etc.
Notice how they decided to use a period in the identifying field associated with the value? This makes using jq very difficult, because it treats the period as a separator for digging down into child values in the JSON. Sure, I could first load my file, replace all "." with "_", and that would fix the problem, but this seems like a really dumb and hackish solution. I have no way to change how the initial JSON file is generated; I just have to deal with it. Is there a way in bash I can do some special escape to make it ignore the period?
Thanks
Use the generic object index syntax, e.g.:
.["stat.taco"]
If you use the generic object syntax, e.g. .["stat.taco"], then chaining is done either using pipes as usual, or without the dot, e.g.
.["stat.taco"]["inner.key"]
If your jq is sufficiently recent, then you can use the chained-dot notation by quoting the keys with special characters, e.g.
."stat.taco"."inner.key"
You can also mix and match, except that expressions such as .["stat.taco"].["inner.key"] are not supported (as of jq 1.6).
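A quick demonstration of both forms (the second needs a sufficiently recent jq, per the note above):
echo '{"stat.blah":123,"stat.taco":495}' | jq '.["stat.taco"]'
495
echo '{"stat.taco":{"inner.key":1}}' | jq '."stat.taco"."inner.key"'
1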
Another department continually updates a JSON file that I then query. Its format is three lists of similar-looking dictionaries:
{
"levels":
[
{"a":1, "b":False, "c":"2012", "d":"2017"}
,{"a":2, "b":True, "c":"2013", "d":"9999"}
,...
]
,"costs":
[
{"e":12, "f":"foo", "g":"blarg", "h":"2015", "i":"2018"}
,{"e":-3, "f":"foo", "g":"glorb", "h":"2013", "i":"9999"}
,...
]
,"recipes":
[
{"j":"BAZ", "k":["blarg","glorb","bleeg"], "l":"dill", "m":"2016", "n":"2017"}
,{"j":"BAZ", "k":["blarg","bleeg"], "l":"dill", "m":"2017", "n":"9999"}
,...
]
} # line 3943 (see below)
Recently, my simple jq queries like
jq '.["recipes"][] | select(.l | test("ill"))' < jsonfile
stopped returning all of the results they should (e.g. returning only one of the two "dill" lines above) and started printing this error message:
jq: error (at <stdin>:3943): null (null) cannot be matched, as it is not a string
Line 3943 mentioned in the error is the final line of the file. Queries against the "levels" and "costs" sections of the file continue to work like normal; it's only the "recipes" section of the file that is breaking, as though jq thinks the closing brace of the file is still part of the "recipes" section.
To me this suggests there's been a formatting change or error in the last section of the file. However, software other than jq (e.g. Python) doesn't report any problems parsing it. Before I start going through the input line by line ... does this error message indicate anything obvious to a jq expert?
Alas, I do not keep old versions of the file around for comparison. (I think I will start today.)
(self-answering after a bit of investigating)
I think there was no formatting error or change in formatting in the input.
I don't know why my query syntax did not encounter errors previously (maybe I just did not notice), but it seems that the entries in the "recipes" section often do not contain an "l" attribute, and jq will cease processing as soon as it encounters one that does not.
I also don't know why jq does not generate the same error message for every record that lacks that attribute, nor why it waits until the final line of the input to generate the single message. (Presumably jq parses the entire top-level value before running the filter, so the reported position is simply where that value ends, i.e. the last line of the file, and the uncaught runtime error aborts the rest of the pipeline, which would explain why only one message appears.)
In any case, I fixed the error (not just the message, but also the failure to display all relevant records) by testing for the presence of the attribute first:
jq '.["recipes"][] | select(has("l") and (.l | test("ill")))' < jsonfile
I am using a simple program to read a CSV file. Somehow I noticed that when I create a CSV using Excel or a Windows-based computer, the Go library fails to read it. Even when I use the cat command, it only shows me the last line in the terminal. It always results in this error: extraneous " in field.
I researched somewhat, and found it is related to carriage-return differences between operating systems.
But I really want to ask how to make a generic CSV reader. I tried reading the same CSV using pandas and it read successfully, but I have not been able to achieve this using my Go code.
Also, a screenshot of the correct CSV is here.
Your file clearly shows that you've got an extra quote at the end of the content. While programs like pandas may be fine with that, I assume it's not valid CSV, so Go does return an error.
Quick example of what's wrong with your data: https://play.golang.org/p/KBikSc1nzD
Update: After your update and a little bit of searching, I have to apologize: the carriage return does matter and seems to be the main culprit here. Go seems to be OK handling the \r\n Windows variant, but not the bare \r one. In that case, what you can do is wrap the bytes.Reader in a custom reader that replaces the \r byte with the \n byte.
Here's an example: https://play.golang.org/p/vNjzwAHmtg
Please note that the example is just that, an example; it does not handle all the possible cases where \r might be a legitimate byte.
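As an alternative outside Go: if the file really uses bare \r line endings, you can normalize it before it reaches the parser, e.g. with tr (input.csv and fixed.csv are placeholder names, and this blindly rewrites every \r, so don't use it if fields can legitimately contain carriage returns):
tr '\r' '\n' < input.csv > fixed.csv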