I'm trying to parse a database file retrieved from a website via curl, but I'm having trouble figuring out how to get the values.
This is an example of the file:
{"Databasename":[{"Var1":"Var1Value","Var2":"Var2Value","Var3":"Var3Value"},{"Var1b":"Var1bValue","Var2b":"Var2bValue","Var3b":"Var3bValue"}],"foldername":{"dbTblcountvar":"dbTblcountvalue","filecountsize":"filecountsizvalue"}}
and with line breaks for better readability:
{
"Databasename":
[
{
"Var1":"Var1aValue",
"Var2":"Var2aValue",
"Var3":"Var3aValue"
},
{
"Var1":"Var1bValue",
"Var2":"Var2bValue",
"Var3":"Var3bValue"
},
{
"Var1":"Var1cValue",
"Var2":"Var2cValue",
"Var3":"Var3cValue"
}
],
"foldername":
{
"dbTblcountvar":"dbTblcountvalue",
"filecountsize":"filecountsizvalue"
}
}
Assuming Var2 is always constant, how can I get its values? (Var2aValue, Var2bValue, Var2cValue, ...)
In the example above, the value I'm trying to get is an ID for a file I need to send back to the server to download the file and perform other operations on it.
Thanks
cat DownloadedFile.Ext | perl -pe 's/"Var2[abc]?":"(.+?)(?<![\\])"/\n\1\n/g' | grep -vPe '(?<!\\)"'
Those commands first put each Var2 value (with an optional a, b or c suffix) on its own line, then filter out every line that still contains a ".
I suppose that is a JSON file, so I avoid matching escaped " characters with this part of the regexp:
(?<!\\)
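Regex-based extraction like this is fragile against reordered keys or extra whitespace; if a real JSON parser is available, the same extraction is a one-liner. A minimal sketch in Python, using the sample document from the question (the key and file names are the question's own):

```python
import json

# Sample document from the question, as returned by curl:
raw = (
    '{"Databasename":[{"Var1":"Var1aValue","Var2":"Var2aValue","Var3":"Var3aValue"},'
    '{"Var1":"Var1bValue","Var2":"Var2bValue","Var3":"Var3bValue"},'
    '{"Var1":"Var1cValue","Var2":"Var2cValue","Var3":"Var3cValue"}],'
    '"foldername":{"dbTblcountvar":"dbTblcountvalue","filecountsize":"filecountsizvalue"}}'
)

data = json.loads(raw)
# "Var2" is a constant key, so collect its value from every object in the array:
var2_values = [entry["Var2"] for entry in data["Databasename"]]
print(var2_values)  # ['Var2aValue', 'Var2bValue', 'Var2cValue']
```

The same idea works on the command line with jq, e.g. jq -r '.Databasename[].Var2'.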
I'm trying to write a simple JSON path query
oc get pa -o jsonpath='{range .data[*]}{@.data.vpa.pa\.poc\.hpa\\/value}{"\n"}{end}'
"data" [
{
"vpa"
{
"pa.poc.hpa/value" : 20
}
}
]
From the JSON above, I'm trying to get 20:
"pa.poc.hpa/value" : 20
I'm trying to pull 20 using the expression below, but I'm not getting the value; it gives empty results:
pa\.poc\.hpa\\/value
I don't recognize the {} syntax you're using.
A proper JSON Path would be
$.data[*].vpa['pa.poc.hpa/value']
The [*] would search all of the items in the data array.
Because you have a non-standard property name, pa.poc.hpa/value, you need to use the bracket syntax with the property name in quotes.
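For comparison, the same lookup needs no escaping at all once the document is parsed, because the dotted-and-slashed name is just an ordinary dictionary key. A small Python sketch (document abbreviated from the question):

```python
import json

# Abbreviated document from the question, with the syntax errors fixed:
doc = json.loads('{"data": [{"vpa": {"pa.poc.hpa/value": 20}}]}')

# Dictionary indexing needs no escaping for the dots or the slash in the key:
values = [item["vpa"]["pa.poc.hpa/value"] for item in doc["data"]]
print(values[0])  # 20
```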
I'm trying to get Python to create JSON formatted like this:
[
{
"machine_working": true
},
{
"MachineName": "TBL165-169",
"MachineType": "Rig Test"
}
]
However, I can't seem to do it. This is the code I currently have, but it's giving me an error:
this_is_a_dict_too=[]
this_is_a_dict_too = dict(State="on",dict(MachineType="machinetype1",MachineName="MachineType2"))
File "c:\printjson.py", line 40
this_is_a_dict_too = dict(Statedsf="test",dict(MachineType="Rig Test",MachineName="TBL165-169")) SyntaxError: non-keyword arg after
keyword arg
this_is_a_dict_too = [dict(machine_working=True),dict(MachineType="machinetype1",MachineName="MachineType2")]
print(this_is_a_dict_too)
You are trying to make a dictionary inside a dictionary; the error message says that you are trying to add an element without a name (a corresponding key):
dict(a='b', b=dict(state='on'))
will work, but
dict(a='b', dict(state='on'))
won't.
What you presented is a list, so you can use
list((dict(a='b'), dict(b='a')))
Note that the example above uses two dictionaries packed into a tuple.
or
[ dict(a='b'), dict(b='a') ]
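To produce the exact JSON text shown in the question, serialize the list with the json module rather than printing the Python repr (which uses single quotes and True instead of true). A short sketch:

```python
import json

# A list of two dicts; json.dumps turns it into the JSON shown in the question.
this_is_a_list = [
    {"machine_working": True},
    {"MachineName": "TBL165-169", "MachineType": "Rig Test"},
]
print(json.dumps(this_is_a_list, indent=4))
```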
If I have some sample data, how do I put it into SQLite (preferably fully automated)?
{"uri":"/","user_agent":"example1"}
{"uri":"/foobar","user_agent":"example1"}
{"uri":"/","user_agent":"example2"}
{"uri":"/foobar","user_agent":"example3"}
I found the easiest way to do this is by using jq and CSV as an intermediary format.
Getting the CSV
First write your data to a file.
I will assume data.json here.
Then construct the header using jq:
% head -1 data.json | jq -r 'keys | @csv'
"uri","user_agent"
The head -1 is because we only want one line.
jq's -r makes the output a plain string instead of a JSON string wrapping the CSV.
We then call the internal function keys to get the keys of the input as an array.
This we send to the @csv formatter, which outputs a single string with the headers in quoted CSV format.
We then need to construct the data.
% jq -r 'map(tostring) | @csv' < data.json
"/","example1"
"/foobar","example1"
"/","example2"
"/foobar","example3"
We then take each input object and convert it to an array of its stringified values: applied to an object, map(tostring) is equivalent to [.[] | tostring], i.e. it deconstructs the dictionary into a plain array of values.
Sent to the @csv formatter, we again get some CSV.
Putting it all together we get a single one-liner in the form of:
% (head -1 data.json | jq -r 'keys | @csv' && jq -r 'map(tostring) | @csv' < data.json) > data.csv
If you need to convert the data on the fly, i.e. without a file, try this:
% cat data.json | (read -r first && jq -r '(keys | @csv),(map(tostring) | @csv)' <<<"${first}" && jq -r 'map(tostring) | @csv')
Loading it into SQLite
Open an SQLite database:
sqlite3 somedb.sqlite
Now in the interactive shell do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):
.mode csv
.import data.csv my_table
Now close the shell and open it again for a clean environment.
You can now easily SELECT from the database and do whatever you want to.
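If Python happens to be available, the CSV intermediary can be skipped entirely: the stdlib json and sqlite3 modules can load newline-delimited JSON directly. A sketch using the question's sample data (inlined here; in practice read the lines from data.json):

```python
import json
import sqlite3

# Newline-delimited JSON, as in the question (inline here for brevity;
# in practice read the lines from data.json).
lines = [
    '{"uri":"/","user_agent":"example1"}',
    '{"uri":"/foobar","user_agent":"example1"}',
    '{"uri":"/","user_agent":"example2"}',
    '{"uri":"/foobar","user_agent":"example3"}',
]
rows = [json.loads(line) for line in lines]

con = sqlite3.connect(":memory:")  # use a filename for a persistent database
con.execute("CREATE TABLE my_table (uri TEXT, user_agent TEXT)")
# Named placeholders accept the parsed dicts directly; JSON null becomes SQL NULL.
con.executemany(
    "INSERT INTO my_table (uri, user_agent) VALUES (:uri, :user_agent)",
    rows,
)
con.commit()
print(con.execute("SELECT COUNT(*) FROM my_table").fetchone()[0])  # 4
```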
Edits
Edit:
As pointed out (thanks @Leo), the original question did show newline-delimited JSON objects, each of which on its own conforms to RFC 4627, but not all of them together in that format.
jq can handle a single JSON array of objects much the same way though by preprocessing the file using jq '.[]' <input.json >preprocessed.json.
If you happen to be dealing with JSON text sequences (RFC 7464), jq luckily has your back too, with the --seq parameter.
Edit 2:
Both newline-separated JSON and JSON text sequences have one important advantage: they reduce the memory requirement to O(1), i.e. your total memory requirement depends only on your longest line of input. Putting the entire input into a single array instead requires one of the following. Either your parser can handle late errors (e.g. a syntax error after the first 100k elements), which to my knowledge is generally not the case. Or it has to parse the entire file twice (first validating the syntax, then parsing while discarding previous elements, as jq --stream does), which to my knowledge also happens rarely. Or it tries to parse the whole input at once and return the result in one step (think of receiving a Python dict containing the entirety of your, say, 50G input data plus overhead), which is usually memory-backed and hence raises your memory footprint by about your total data size.
Edit 3:
If you hit any obstacles, try using keys_unsorted instead of keys.
I haven't tested that myself (I kind of assumed my columns were already sorted); however, @Kyle Barron reports that this was needed.
Edit 4:
As pointed out by youngminz in a comment below, the original command fails when working with non-{number,string} values like nested lists.
The command has been updated (with a slightly adapted version from the comment: map() – unlike map_values() – converts an object to an array of its values, the same as [.[]], making the mapping more readable).
Keys remain unaffected; if you really have complex types as keys (which may not even conform to JSON, but I'm too lazy to look it up right now), you can do the same for the key-related mappings.
A way to do this without CSV or a 3rd-party tool is to use the JSON1 extension of SQLite combined with the readfile extension that is provided in the sqlite3 CLI tool. As well as being a "more direct" solution overall, this has the advantage of handling JSON NULL values more consistently than CSV, which would otherwise import them as empty strings.
If the input file is a well-formed JSON file, e.g. the example given as an array:
[
{"uri":"/","user_agent":"example1"},
{"uri":"/foobar","user_agent":"example1"},
{"uri":"/","user_agent":"example2"},
{"uri":"/foobar","user_agent":"example3"}
]
Then this can be read into the corresponding my_table table as follows. Open the SQLite database file my_db.db using the sqlite3 CLI:
sqlite3 my_db.db
then create my_table using:
CREATE TABLE my_table(uri TEXT, user_agent TEXT);
Finally, the JSON data in my_data.json can be inserted into the table with the CLI command:
INSERT INTO my_table SELECT
json_extract(value, '$.uri'),
json_extract(value, '$.user_agent')
FROM json_each(readfile('my_data.json'));
If the initial JSON file consists of newline-separated JSON elements, it can be converted to an array first using jq:
jq -s <my_data_raw.json >my_data.json
It's likely there is a way to do this directly in SQLite using JSON1, but I didn't pursue that given that I was already using jq to massage the data prior to import to SQLite.
sqlitebiter appears to provide a python solution:
A CLI tool to convert CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/TSV/Google-Sheets to a SQLite database file. http://sqlitebiter.rtfd.io/
docs:
http://sqlitebiter.readthedocs.io/en/latest/
project:
https://github.com/thombashi/sqlitebiter
last update approximately 3 months ago
last issue closed approximately 1 month ago, none open
noted today, 2018-03-14
You can use spyql.
spyql reads the json files (with 1 json object per line) and generates INSERT statements that you can pipe into sqlite:
$ spyql -Otable=my_table "SELECT json->uri, json->user_agent FROM json TO sql" < sample3.json | sqlite3 my.db
This assumes that you already created an empty table in the sqlite database my.db.
Disclaimer: I am the author of spyql.
To work with a file of newline-delimited JSON objects, including \n in the data:
Add a header column name and ensure the JSON is compact (one line per record).
cat <(echo '"line"') source.json | jq -c '.' > source.fauxcsv
Import the JSON and header as a "csv" into a temporary table, with a column separator \t that won't occur in the JSON. Then fill the real table via SQLite's JSON functions.
sqlite3 file.db \
-cmd '.separator \t \n' \
-cmd '.import --schema temp source.fauxcsv temp_json_lines' <<-'EOSQL'
INSERT into records SELECT
json_extract(line, '$.rid'),
coalesce(json_extract(line, '$.created_at'), strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
json_extract(line, '$.name')
FROM temp_json_lines;
EOSQL
If (as in the original question) the JSON data comes in the form of JSONLines (that is, one JSON entity per line), and if it is desired to create a table with one of these entities per row, then sqlite3 can be used to import the data by setting .mode line, e.g. as follows:
create table input (
raw JSON
);
.mode line
.import input.json input
This approach is worth knowing not least because it can easily be adapted to handle cases where the data is not already in JSONLines format. For example, if input.json contains a single very long JSON array, we could use a tool such as jq or gojq to "splat" it:
.mode line
.import "|jq -c .[] input.json" input
Similarly, if input.json contains a single object with many keys, and if it is desired to create a table of corresponding single-key objects:
.mode line
.import "|jq -c 'to_entries[] | {(.key): .value}'" input
If the original data is a single very large JSON array or JSON object, jq's streaming parser could be used to save memory. In this context, it may be worth mentioning two CLI tools with minimal memory requirements: my own jm (based on JSON Machine), and jm.py (based on ijson). E.g., to "splat" each array in a file containing one or more JSON arrays:
.mode line
.import "|jm input.json" input
With the JSON data safely in an SQLite table, it is (thanks to SQLite's support for JSON) now quite straightforward to create indices, populate other tables, etc., etc.
Here is the first answer compiled into a deno script:
// just for convenience (pathExists)
import {} from "https://deno.land/x/simple_shell@0.9.0/src/stringUtils.ts";
/**
* @description
* convert a json db to csv and then to sqlite
*
* @note
* `sqliteTableConstructor` is a string that is used to create the table, if it is specified the csv file *should not* contain a header row.
* if it's not specified then the csv file *must* contain a header row so it can be used to infer the column names.
*/
const jsonToSqlite = async (
{
jsonDbPath,
jsonToCsvFn,
sqliteDbPath,
sqliteTableConstructor,
tableName,
}: {
jsonDbPath: string;
sqliteDbPath: string;
tableName: string;
sqliteTableConstructor?: string;
// deno-lint-ignore no-explicit-any
jsonToCsvFn: (jsonDb: any) => string;
},
) => {
// convert it into csv
const csvDbPath = `${jsonDbPath.replace(".json", "")}.csv`;
if (csvDbPath.pathExists()) {
console.log(`${csvDbPath} already exists`);
} else {
const db = JSON.parse(await Deno.readTextFile(jsonDbPath));
const csv = jsonToCsvFn(db);
await Deno.writeTextFile(csvDbPath, csv);
}
// convert it to sqlite
if (sqliteDbPath.pathExists()) {
console.log(`${sqliteDbPath} already exists`);
} else {
const sqlite3 = Deno.spawnChild("sqlite3", {
args: [sqliteDbPath],
stdin: "piped",
stderr: "null", // required to make sqlite3 work
});
await sqlite3.stdin.getWriter().write(
new TextEncoder().encode(
".mode csv\n" +
(sqliteTableConstructor ? `${sqliteTableConstructor};\n` : "") +
`.import ${csvDbPath} ${tableName}\n` +
".exit\n",
),
);
await sqlite3.status;
}
};
Example of usage:
await jsonToSqlite(
{
jsonDbPath: "./static/db/db.json",
sqliteDbPath: "./static/db/db.sqlite",
tableName: "radio_table",
sqliteTableConstructor:
"CREATE TABLE radio_table(name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
jsonToCsvFn: (
db: StationDBType[],
) => {
const sanitize = (str: string) =>
str.trim().replaceAll("\n", " ").replaceAll(",", " ");
return db.filter((s) => s.name.trim() && s.url.trim())
.map(
(station) => {
return (
sanitize(station.name) + "," +
sanitize(station.country) + "," +
sanitize(station.language) + "," +
station.votes + "," +
sanitize(station.url) + "," +
sanitize(station.favicon)
);
},
).join("\n");
},
},
);
Edit1:
Importing CSV into SQLite by default sets all column types to string. In this edit I allow the user to create the table first (via an optional constructor) before importing the CSV into it; this way they can specify the exact column types.
Edit2:
Turns out that with deno and sqlite-deno you don't need to use CSV as an intermediary or shell out to sqlite3; here is an example of how to achieve this.
The following code creates a new SQLite db from the JSON one:
import { DB } from "https://deno.land/x/sqlite@v3.2.1/mod.ts";
export interface StationDBType {
name: string;
country: string;
language: string;
votes: number;
url: string;
favicon: string;
}
export const db = new DB("new.sql");
db.query(
"create TABLE radio_table (name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
);
const jsonDb: StationDBType[] = JSON.parse(
await Deno.readTextFile("static/db/compressed_db.json"),
);
const sanitize = (s: string) => s.replaceAll('"', "").replaceAll("'", "");
db.query(
`insert into radio_table values ${
jsonDb.map((station) =>
"('" +
sanitize(station.name) +
"','" +
sanitize(station.country) +
"','" +
sanitize(station.language) +
"'," +
station.votes +
",'" +
sanitize(station.url) +
"','" +
sanitize(station.favicon) +
"')"
).join(",")
}`,
);
db.close();
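Note that the sanitize() helper above exists only because values are spliced into the SQL string, which also opens the door to injection. With parameter binding the driver handles quoting for you; for comparison, here is the same insert using Python's sqlite3 (the station rows are made up for illustration):

```python
import sqlite3

# Hypothetical station rows for illustration; note the quote in "O'Reilly FM",
# which would break naive string concatenation but is fine with binding.
stations = [
    {"name": "Radio One", "country": "UK", "language": "english",
     "votes": 10, "url": "http://example.com/one", "favicon": ""},
    {"name": "O'Reilly FM", "country": "IE", "language": "english",
     "votes": 3, "url": "http://example.com/two", "favicon": ""},
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE radio_table "
    "(name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)"
)
# Parameter binding: the driver quotes values, so no sanitize() step is needed.
con.executemany(
    "INSERT INTO radio_table VALUES (:name, :country, :language, :votes, :url, :favicon)",
    stations,
)
con.commit()
print(con.execute("SELECT name FROM radio_table").fetchall())
```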
All my university notes are in JSON format and when I get a set of practical questions from a pdf it is formatted like this:
1. Download and compile the code. Run the example to get an understanding of how it works. (Note that both
threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this
is an interface issue, not of concern in this course.)
2. Explore the classes SumTask and StringTask as well as the abstract class Task.
3. Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is
called.
4. Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have
to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.)
Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger
than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer
for a discussion.
5. Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times,
but “pop()”s off only the first task in the queue and executes it.
6. Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as
the following to the SumTask class definition:
private static final String taskType = "SumTask";
Investigate what “static” and “final” mean.
7. More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they
implement this interface. Here’s an example interface:
What I would like to do is copy it into vim and execute a find and replace to convert it into this:
"1": {
"Task": "Download and compile the code. Run the example to get an understanding of how it works. (Note that both threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this is an interface issue, not of concern in this course.)",
"Solution": ""
},
"2": {
"Task": "Explore the classes SumTask and StringTask as well as the abstract class Task.",
"Solution": ""
},
"3": {
"Task": "Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is called.",
"Solution": ""
},
"4": {
"Task": "Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.) Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer for a discussion.",
"Solution": ""
},
"5": {
"Task": "Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times, but “pop()”s off only the first task in the queue and executes it.",
"Solution": ""
},
"6": {
"Task": "Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as the following to the SumTask class definition: private static final String taskType = 'SumTask'; Investigate what “static” and “final” mean.",
"Solution": ""
},
"7": {
"Task": "More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they implement this interface. Here’s an example interface:",
"Solution": ""
}
After trying to figure this out during the practical (instead of actually doing the practical), this is the closest I got:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\)--end--/"\1": {\r "Task": "\2",\r"Solution": "" \r},/g
The 3 problems with this are
I have to add --end-- to the end of each question. I would like it to detect where a question ends by looking ahead to a line that starts with [1-9][1-9]*; unfortunately, when I search for that, it also replaces that part.
This keeps all the new lines within the question (which is invalid in JSON). I would like it to remove the new lines.
The last entry should not contain a "," after the input because that would also be invalid JSON (Note I don't mind this very much as it is easy to remove the last "," manually)
Please keep in mind I am very bad at regular expressions and one of the reasons I am doing this is to learn more about regex so please explain any regex you post as a solution.
In two steps:
%s/\n/\ /g
to solve problem 2, and then:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\([1-9][1-9]*\. \|\%$\)\@=\)/"\1": {\r "Task": "\2",\r"Solution": "" \r},\r/g
to solve problem 1.
You can solve problem 3 with another replace round. Also, my solution inserts an unwanted extra space at the end of the task entries. Try to remove it yourself.
Short explanation of what I have added:
\|: or;
\%$: end of file;
\@=: find but don't include in match.
If each item sits on a single line, I would transform the text with a macro; it is shorter and more straightforward than :s:
I"<esc>f.s": {<enter>"Task": "<esc>A"<enter>"Solution": ""<enter>},<esc>+
Record this macro in a register, like q, then you can just replay it like 100@q to do the transformation.
Note that
the result will leave a comma , at the end; just remove it.
You can also add indentations during your macro recording, then your json will be "pretty printed". Or you can make it sexy later with other tool.
You could probably do this with one large regular expression, but that quickly becomes unmaintainable. I would break the task up into 3 steps instead:
Separate each numbered step into its own paragraph.
Put each paragraph on its own line.
Generate the JSON.
Taken together:
%s/^[0-9]\+\./\r&/
%s/\(\S\)\n\(\S\)/\1 \2/
%s/^\([0-9]\+\)\. *\(.*\)/"\1": {\r "Task": "\2",\r "Solution": ""\r},/
This solution also leaves a comma after the last element. This can be removed with:
$s/,//
Explanation
%s/^[0-9]\+\./\r&/ this matches a line starting with a number followed by a dot, e.g. 1., 8., 13., 131., etc., and replaces it with a newline (\r) followed by the match (&).
%s/\(\S\)\n\(\S\)/\1 \2/ this removes any newline that is flanked by non-white-space characters (\S).
%s/^\([0-9]\+\)\. *\(.*\) ... capture the number and text in \1 and \2.
... /"\1": {\r "Task": "\2",\r "Solution": ""\r},/ format text appropriately.
Alternative way using sed, awk and jq
You can perform steps one and two from above straightforwardly with sed and awk:
sed 's/^[0-9]\+\./\n&/' infile
awk '$1=$1; { print "\n" }' RS= ORS=' '
Using jq for the third step ensures that the output is valid JSON:
jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
Here as one command line:
sed 's/^[0-9]\+\./\n&/' infile |
awk '$1=$1; { print "\n" }' RS= ORS=' ' |
jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
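If stepping outside the editor is acceptable, the three steps above also fit in one short Python script: a regex splits the text at lines starting with a number and a dot, and split()/join() collapses the internal newlines. A sketch (sample text abbreviated; json.dumps also avoids the trailing-comma problem automatically):

```python
import json
import re

# Abbreviated sample of the numbered questions (the real text is longer):
text = """1. Download and compile the code. Run the example to get
an understanding of how it works.
2. Explore the classes SumTask and StringTask as well as the
abstract class Task.
3. Modify StringTask.java so that it also writes out
"Executing a StringTask task" when the execute() method is called."""

result = {}
# (?m) makes ^ match at line starts; (?s) lets .*? span lines; the lookahead
# stops each question body at the next "N." line or at the end of the text.
for match in re.finditer(r"(?ms)^(\d+)\.\s*(.*?)(?=^\d+\.|\Z)", text):
    number, body = match.group(1), match.group(2)
    # split()/join() collapses the hard-wrapped newlines into single spaces.
    result[number] = {"Task": " ".join(body.split()), "Solution": ""}

print(json.dumps(result, indent=4))
```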
I have an array like the one below. I want to parse the entire thing into a bash array,
so I can access the first addressLineOne as ${bashaddr[0]}, and so on.
[
{
"id":"f0c546d5-0ce4-55ee-e043-516e0f0afdc1",
"cardType":"WMUSGESTORECARD",
"lastFour":"1682",
"cardExpiryDate":"2012-01-16",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Apt venue",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"534534",
"isDefault":false
},
{
"id":"f0c546d5-0ce0-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"2731",
"cardExpiryDate":"2009-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"119 maple ave.",
"addressLineTwo":"",
"city":"uncasville",
"state":"CT",
"postalCode":"06382",
"phone":"7676456",
"isDefault":false
},
{
"id":"f0c546d5-0ce2-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"6025",
"cardExpiryDate":"2011-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Angeline Street",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"7867876",
"isDefault":false
}
]
I have tried like this:
#!/bin/bash
addressLineOne="$(echo $card | jsawk 'return this.addressLineOne')"
but it gives me all of the addresses at once:
["address 1","address 2","address 3"]
Thank you.
I wrote the answer below before reading the comments, but it is essentially the same answer as @4ae1e1 provided, except I don't use the -r flag, in case you want the values to remain quoted (e.g. for passing as an argument somewhere else).
I know this is not jsawk, but do consider jq:
jq '.[].addressLineOne' yourfile.txt
To access specific values, you can put the record number in square brackets (starting at 0 for the first address). For example, to get the address of the third record:
jq '.[2].addressLineOne' yourfile.txt
For learning more about jq and advanced uses, check: http://jqplay.org
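If a full scripting language is on hand, the same extraction reads naturally in Python as well (document trimmed here to the relevant fields; the full records parse the same way):

```python
import json

# Document trimmed to the id and addressLineOne fields from the question.
cards = json.loads("""[
  {"id": "f0c546d5-0ce4-55ee-e043-516e0f0afdc1", "addressLineOne": "Apt venue"},
  {"id": "f0c546d5-0ce0-55ee-e043-516e0f0afdc1", "addressLineOne": "119 maple ave."},
  {"id": "f0c546d5-0ce2-55ee-e043-516e0f0afdc1", "addressLineOne": "Angeline Street"}
]""")

# A list comprehension plays the role of jq's .[].addressLineOne:
addresses = [card["addressLineOne"] for card in cards]
print(addresses[0])  # Apt venue
print(addresses[2])  # Angeline Street
```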
What you need to do is make use of the -a switch to apply some post-processing and filter the output array, like this:
jsawk 'return this.addressLineOne' -a 'return this[0]'
From the documentation:
-b <script> | -a <script>
Run the specified snippet of JavaScript before (-b) or after (-a)
processing JSON input. The `this` object is set to the whole JSON
array or object. This is used to preprocess (-b) or postprocess
(-a) the JSON array before or after the main script is applied.
This option can be specified multiple times to define multiple
before/after scripts, which will be applied in the order they
appeared on the command line.