How to convert a JSON file to an SQLite database

If I have some sample data, how do I put it into SQLite (preferably fully automated)?
{"uri":"/","user_agent":"example1"}
{"uri":"/foobar","user_agent":"example1"}
{"uri":"/","user_agent":"example2"}
{"uri":"/foobar","user_agent":"example3"}

I found the easiest way to do this is by using jq and CSV as an intermediary format.
Getting the CSV
First write your data to a file.
I will assume data.json here.
Then construct the header using jq:
% head -1 data.json | jq -r 'keys | @csv'
"uri","user_agent"
The head -1 is because we only need one line to derive the header.
jq's -r makes the output a plain string instead of a JSON string wrapping the CSV.
We then call the built-in function keys to get the keys of the input as an array.
This is sent to the @csv formatter, which outputs a single string with the headers in quoted CSV format.
We then need to construct the data.
% jq -r 'map(tostring) | @csv' < data.json
"/","example1"
"/foobar","example1"
"/","example2"
"/foobar","example3"
We now take each input object and deconstruct the associative array (map) using .[], putting its values back into a simple array […] while converting each of them to a string; this is exactly what map(tostring) does.
This basically converts our dictionary into an array of its values.
Sent to the @csv formatter, we again get some CSV.
Putting it all together we get a single one-liner in the form of:
% (head -1 data.json | jq -r 'keys | @csv' && jq -r 'map(tostring) | @csv' < data.json) > data.csv
If you need to convert the data on the fly, i.e. without a file, try this:
% cat data.json | (read -r first && jq -r '(keys | @csv),(map(tostring) | @csv)' <<<"${first}" && jq -r 'map(tostring) | @csv')
Loading it into SQLite
Open an SQLite database:
sqlite3 somedb.sqlite
Now in the interactive shell do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):
.mode csv
.import data.csv my_table
Now close the shell and open it again for a clean environment.
You can now easily SELECT from the database and do whatever you want to.
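For example, to count requests per user agent (column names taken from the sample data above; just an illustration):
SELECT user_agent, COUNT(*) AS requests
FROM my_table
GROUP BY user_agent;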
Edits
Edit:
As pointed out (thanks @Leo), the original question did show newline-delimited JSON objects, each of which conforms to RFC 4627 on its own, but not all together in that format.
jq can handle a single JSON array of objects in much the same way by preprocessing the file using jq '.[]' <input.json >preprocessed.json.
If you happen to be dealing with JSON text sequences (RFC 7464), luckily jq has got your back too with the --seq parameter.
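For example, if input.json holds a single array of objects like the sample data, the whole pipeline could look roughly like this (a sketch based on the commands above, not part of the original answer):
% jq '.[]' <input.json >preprocessed.json
% (head -1 preprocessed.json | jq -r 'keys | @csv' && jq -r 'map(tostring) | @csv' < preprocessed.json) > data.csv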
Edit 2:
Both newline-delimited JSON and JSON text sequences have one important advantage: they reduce the memory requirement to O(1), meaning your total memory requirement depends only on your longest line of input. Putting the entire input into a single array instead requires one of the following: a parser that can handle late errors (i.e. a syntax error after the first 100k elements), which to my knowledge generally isn't the case; parsing the entire file twice (first validating the syntax, then parsing while discarding previous elements, as jq --stream does), which to my knowledge also happens rarely; or parsing the whole input at once and returning the result in one step (think of receiving a Python dict which contains the entirety of your, say, 50G of input data plus overhead), which is usually memory backed and hence raises your memory footprint by just about your total data size.
Edit 3:
If you hit any obstacles, try using keys_unsorted instead of keys.
I haven't tested that myself (I kind of assume my columns were already sorted), however @Kyle Barron reports that this was needed.
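In other words, the header command from above would become (everything else stays the same):
% head -1 data.json | jq -r 'keys_unsorted | @csv'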
Edit 4:
As pointed out by youngminz in a comment, the original command fails when working with non-{number,string} values like nested lists.
The command has been updated (with a slightly adapted version from the comment: map() – unlike map_values() – converts an object to an array of its values, the same as [.[]], making the mapping more readable).
Keys remain unaffected; if you really have complex types as keys (which may not even conform to JSON, but I'm too lazy to look it up right now) you can do the same for the key-related mappings.
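For illustration, a quick check of the updated command on a record containing a nested list (the tags field here is made up):
% echo '{"uri":"/","tags":["a","b"]}' | jq -r 'map(tostring) | @csv'
"/","[""a"",""b""]"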

A way to do this without CSV or a third-party tool is to use the JSON1 extension of SQLite combined with the readfile extension that is provided in the sqlite3 CLI tool. As well as being a "more direct" solution overall, this has the advantage of handling JSON NULL values more consistently than CSV, which would otherwise import them as empty strings.
If the input file is a well-formed JSON file, e.g. the example given as an array:
[
{"uri":"/","user_agent":"example1"},
{"uri":"/foobar","user_agent":"example1"},
{"uri":"/","user_agent":"example2"},
{"uri":"/foobar","user_agent":"example3"}
]
Then this can be read into the corresponding my_table table as follows. Open the SQLite database file my_db.db using the sqlite3 CLI:
sqlite3 my_db.db
then create my_table using:
CREATE TABLE my_table(uri TEXT, user_agent TEXT);
Finally, the JSON data in my_data.json can be inserted into the table with the CLI command:
INSERT INTO my_table SELECT
json_extract(value, '$.uri'),
json_extract(value, '$.user_agent')
FROM json_each(readfile('my_data.json'));
If the initial JSON file consists of newline-separated JSON elements, it can be converted first using jq:
jq -s . <my_data_raw.json >my_data.json
It's likely there is a way to do this directly in SQLite using JSON1, but I didn't pursue that given that I was already using jq to massage the data prior to import to SQLite.
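For reference, a minimal end-to-end sketch combining the steps above (same table and column names as in the example; jq -s . slurps the newline-delimited input into a single array):
jq -s . <my_data_raw.json >my_data.json
sqlite3 my_db.db <<'SQL'
CREATE TABLE IF NOT EXISTS my_table(uri TEXT, user_agent TEXT);
INSERT INTO my_table SELECT
json_extract(value, '$.uri'),
json_extract(value, '$.user_agent')
FROM json_each(readfile('my_data.json'));
SQL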

sqlitebiter appears to provide a python solution:
A CLI tool to convert CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/TSV/Google-Sheets to a SQLite database file. http://sqlitebiter.rtfd.io/
docs:
http://sqlitebiter.readthedocs.io/en/latest/
project:
https://github.com/thombashi/sqlitebiter
last update approximately 3 months ago
last issue closed approximately 1 month ago, none open
noted today, 2018-03-14

You can use spyql.
spyql reads the json files (with 1 json object per line) and generates INSERT statements that you can pipe into sqlite:
$ spyql -Otable=my_table "SELECT json->uri, json->user_agent FROM json TO sql" < sample3.json | sqlite3 my.db
This assumes that you already created an empty table in the sqlite database my.db.
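For example, the table could be created beforehand with something like this (column names taken from the sample data; a sketch, so adjust it to the columns spyql actually emits):
sqlite3 my.db "CREATE TABLE my_table(uri TEXT, user_agent TEXT);"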
Disclaimer: I am the author of spyql.

To work with a file of newline-delimited JSON objects, including ones with \n embedded in the data:
Add a header column name and ensure the JSON is compact (one line per record).
cat <(echo '"line"') source.json | jq -c '.' > source.fauxcsv
Import the JSON and header as a "csv" into a temporary table, using a column separator \t that won't occur in the JSON. Then create the real table via SQLite's JSON functions.
sqlite3 file.db \
-cmd '.separator \t \n' \
-cmd '.import --schema temp source.fauxcsv temp_json_lines' <<-'EOSQL'
INSERT into records SELECT
json_extract(line, '$.rid'),
coalesce(json_extract(line, '$.created_at'), strftime('%Y-%m-%dT%H:%M:%fZ', 'now')),
json_extract(line, '$.name')
FROM temp_json_lines;
EOSQL
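Note that the target table records is assumed to already exist; a hypothetical schema matching the three fields extracted above might look like:
CREATE TABLE records(
  rid TEXT,
  created_at TEXT,
  name TEXT
);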

If (as in the original question) the JSON data comes in the form of JSONLines (that is, one JSON entity per line), and if it is desired to create a table with one of these entities per row, then sqlite3 can be used to import the data by setting .mode=line, e.g. as follows:
create table input (
  raw JSON
);
.mode=line
.import input.json input
This approach is worth knowing not least because it can easily be adapted to handle cases where the data is not already in JSONLines format. For example, if input.json contains a single very long JSON array, we could use a tool such as jq or gojq to "splat" it:
.mode=line
.import "|jq -c .[] input.json" input
Similarly, if input.json contains a single object with many keys, and if it is desired to create a table of corresponding single-key objects:
.mode=line
.import "|jq -c 'to_entries[] | {(.key): .value}'" input
If the original data is a single very large JSON array or JSON object, jq's streaming parser could be used to save memory. In this context, it may be worth mentioning two CLI tools with minimal memory requirements: my own jm (based on JSON Machine), and jm.py (based on ijson). E.g., to "splat" each array in a file containing one or more JSON arrays:
.mode=line
.import "|jm input.json" input
With the JSON data safely in an SQLite table, it is (thanks to SQLite's support for JSON) now quite straightforward to create indices, populate other tables, etc., etc.
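For example (a sketch, assuming the input table defined above), one can pull out columns or build an expression index directly on the raw JSON:
SELECT json_extract(raw, '$.uri') AS uri,
       json_extract(raw, '$.user_agent') AS user_agent
FROM input;
CREATE INDEX input_uri ON input(json_extract(raw, '$.uri'));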

Here is the first answer compiled into a deno script:
// just for convenience (pathExists)
import {} from "https://deno.land/x/simple_shell@0.9.0/src/stringUtils.ts";
/**
 * @description
 * convert a json db to csv and then to sqlite
 *
 * @note
 * `sqliteTableConstructor` is a string that is used to create the table, if it is specified the csv file *should not* contain a header row.
 * if it's not specified then the csv file *must* contain a header row so it can be used to infer the column names.
 */
const jsonToSqlite = async (
  {
    jsonDbPath,
    jsonToCsvFn,
    sqliteDbPath,
    sqliteTableConstructor,
    tableName,
  }: {
    jsonDbPath: string;
    sqliteDbPath: string;
    tableName: string;
    sqliteTableConstructor?: string;
    // deno-lint-ignore no-explicit-any
    jsonToCsvFn: (jsonDb: any) => string;
  },
) => {
  // convert it into csv
  const csvDbPath = `${jsonDbPath.replace(".json", "")}.csv`;
  if (csvDbPath.pathExists()) {
    console.log(`${csvDbPath} already exists`);
  } else {
    const db = JSON.parse(await Deno.readTextFile(jsonDbPath));
    const csv = jsonToCsvFn(db);
    await Deno.writeTextFile(csvDbPath, csv);
  }
  // convert it to sqlite
  if (sqliteDbPath.pathExists()) {
    console.log(`${sqliteDbPath} already exists`);
  } else {
    const sqlite3 = Deno.spawnChild("sqlite3", {
      args: [sqliteDbPath],
      stdin: "piped",
      stderr: "null", // required to make sqlite3 work
    });
    await sqlite3.stdin.getWriter().write(
      new TextEncoder().encode(
        ".mode csv\n" +
          (sqliteTableConstructor ? `${sqliteTableConstructor};\n` : "") +
          `.import ${csvDbPath} ${tableName}\n` +
          ".exit\n",
      ),
    );
    await sqlite3.status;
  }
};
Example of usage:
await jsonToSqlite(
  {
    jsonDbPath: "./static/db/db.json",
    sqliteDbPath: "./static/db/db.sqlite",
    tableName: "radio_table",
    sqliteTableConstructor:
      "CREATE TABLE radio_table(name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
    jsonToCsvFn: (
      db: StationDBType[],
    ) => {
      const sanitize = (str: string) =>
        str.trim().replaceAll("\n", " ").replaceAll(",", " ");
      return db.filter((s) => s.name.trim() && s.url.trim())
        .map(
          (station) => {
            return (
              sanitize(station.name) + "," +
              sanitize(station.country) + "," +
              sanitize(station.language) + "," +
              station.votes + "," +
              sanitize(station.url) + "," +
              sanitize(station.favicon)
            );
          },
        ).join("\n");
    },
  },
);
Edit1:
Importing a CSV into SQLite sets all column types to string by default. In this edit I allow the user to create the table first (via an optional constructor) before importing the CSV into it; this way they can specify the exact column types.
Edit2:
It turns out that with Deno and sqlite-deno you don't need to use CSV as an intermediate format or shell out to sqlite3; here is an example of how to achieve this.
The following code creates a new SQLite db from the JSON one.
import { DB } from "https://deno.land/x/sqlite@v3.2.1/mod.ts";

export interface StationDBType {
  name: string;
  country: string;
  language: string;
  votes: number;
  url: string;
  favicon: string;
}

export const db = new DB("new.sql");

db.query(
  "create TABLE radio_table (name TEXT, country TEXT, language TEXT, votes INT, url TEXT, favicon TEXT)",
);

const jsonDb: StationDBType[] = JSON.parse(
  await Deno.readTextFile("static/db/compressed_db.json"),
);

const sanitize = (s: string) => s.replaceAll('"', "").replaceAll("'", "");

db.query(
  `insert into radio_table values ${
    jsonDb.map((station) =>
      "('" +
      sanitize(station.name) +
      "','" +
      sanitize(station.country) +
      "','" +
      sanitize(station.language) +
      "'," +
      station.votes +
      ",'" +
      sanitize(station.url) +
      "','" +
      sanitize(station.favicon) +
      "')"
    ).join(",")
  }`,
);

db.close();

Related

How to parse IP within fields in ELK

I am trying to automate/ease a procedure to review firewall rules within ELK (ElasticSearch, Logstash, Kibana).
I have some data obtained from a CSV, which is structured like this:
Source;Destination;Service;Action;Comment
10.0.0.0/8 172.16.0.0/16 192.168.0.0/24 23.2.20.6;10.0.0.1 10.0.0.2 10.0.0.3;udp:53
tcp:53;accept;No.10: ID: INC0000000001
My objective is to import this data within ELK by parsing each field (for subnet and/or IP address) and, if possible, add a sequential field (IP_Source1,IP_Destination2,etc) containing each one.
Is this possible, to your knowledge? How?
Thanks for any hint you may be able to provide
You can create a Logstash configuration with a file input. Then use a csv filter first; it should look like this.
filter {
  csv {
    columns => ["source", "destination", "service", "action", "comment"]
    separator => ";"
  }
}
The next filter will need to be a ruby filter.
filter {
  ruby {
    code => "
      arr = event.get('source').split(' ')
      arr.each.with_index(1) do |a, index|
        event.set('ip_source' + index.to_s, a)
      end
    "
  }
}
Finally, add an output to Elasticsearch.
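For example (the host and index name here are placeholders, not from the question):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "firewall-rules"
  }
}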
I have not tested this code, but I hope it gives you good hints.

Save content of Spark DataFrame as a single CSV file [duplicate]

This question already has answers here:
Write single CSV file using spark-csv
Say I have a Spark DataFrame which I want to save as a CSV file. Since Spark 2.0.0, the DataFrameWriter class directly supports saving it as a CSV file.
The default behavior is to save the output in multiple part-*.csv files inside the path provided.
How would I save a DF with :
Path mapping to the exact file name instead of folder
Header available in first line
Save as a single file instead of multiple files.
One way to deal with it, is to coalesce the DF and then save the file.
df.coalesce(1).write.option("header", "true").csv("sample_file.csv")
However, this has the disadvantage of collecting it on the master machine, which therefore needs to have enough memory.
Is it possible to write a single CSV file without using coalesce? If not, is there a more efficient way than the above code?
Just solved this myself using pyspark with dbutils to get the .csv and rename to the wanted filename.
save_location= "s3a://landing-bucket-test/export/"+year
csv_location = save_location+"temp.folder"
file_location = save_location+'export.csv'
df.repartition(1).write.csv(path=csv_location, mode="append", header="true")
file = dbutils.fs.ls(csv_location)[-1].path
dbutils.fs.cp(file, file_location)
dbutils.fs.rm(csv_location, recurse=True)
This answer can be improved by not using [-1], but the .csv seems to always be last in the folder. Simple and fast solution if you only work on smaller files and can use repartition(1) or coalesce(1).
Use:
df.toPandas().to_csv("sample_file.csv", header=True)
See documentation for details:
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dataframe#pyspark.sql.DataFrame.toPandas
df.coalesce(1).write.option("inferSchema", "true").csv("/newFolder", header='true', dateFormat="yyyy-MM-dd HH:mm:ss")
The following scala method works in local or client mode, and writes the df to a single csv of the chosen name. It requires that the df fit into memory, otherwise collect() will blow up.
import java.io.{BufferedWriter, OutputStreamWriter}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, Row}

val SPARK_WRITE_LOCATION = some_directory
val WRITE_DIRECTORY = some_local_directory // local destination for the final file
val SPARKSESSION = org.apache.spark.sql.SparkSession.builder.getOrCreate()

def saveResults(results: DataFrame, filename: String) {
  var fs = FileSystem.get(SPARKSESSION.sparkContext.hadoopConfiguration)
  if (SPARKSESSION.conf.get("spark.master").toString.contains("local")) {
    fs = FileSystem.getLocal(new org.apache.hadoop.conf.Configuration())
  }

  val tempWritePath = new Path(SPARK_WRITE_LOCATION)

  if (fs.exists(tempWritePath)) {
    val x = fs.delete(new Path(SPARK_WRITE_LOCATION), true)
    assert(x)
  }

  if (results.count > 0) {
    val hadoopFilepath = new Path(SPARK_WRITE_LOCATION, filename)
    val writeStream = fs.create(hadoopFilepath, true)
    val bw = new BufferedWriter(new OutputStreamWriter(writeStream, "UTF-8"))

    val x = results.collect()
    for (row: Row <- x) {
      val rowString = row.mkString(start = "", sep = ",", end = "\n")
      bw.write(rowString)
    }
    bw.close()
    writeStream.close()

    val resultsWritePath = new Path(WRITE_DIRECTORY, filename)

    if (fs.exists(resultsWritePath)) {
      fs.delete(resultsWritePath, true)
    }
    fs.copyToLocalFile(false, hadoopFilepath, resultsWritePath, true)
  } else {
    System.exit(-1)
  }
}
This solution is based on a Shell Script and is not parallelized, but is still very fast, especially on SSDs. It uses cat and output redirection on Unix systems. Suppose that the CSV directory containing partitions is located on /my/csv/dir and that the output file is /my/csv/output.csv:
#!/bin/bash
echo "col1,col2,col3" > /my/csv/output.csv
for i in /my/csv/dir/*.csv ; do
echo "Processing $i"
cat $i >> /my/csv/output.csv
rm $i
done
echo "Done"
It will remove each partition after appending it to the final CSV in order to free space.
"col1,col2,col3" is the CSV header (here we have three columns of name col1, col2 and col3). You must tell Spark to don't put the header in each partition (this is accomplished with .option("header", "false") because the Shell Script will do it.
For those still wanting to do this, here's how I got it done using Spark 2.1 in Scala, with some java.nio.file help.
Based on https://fullstackml.com/how-to-export-data-frame-from-apache-spark-3215274ee9d6
val df: org.apache.spark.sql.DataFrame = ??? // data frame to write
val file: java.nio.file.Path = ??? // target output file (i.e. 'out.csv')

import java.nio.file.Files
import scala.collection.JavaConversions._

// write csv into temp directory which contains the additional spark output files
// could use Files.createTempDirectory instead
val tempDir = file.getParent.resolve(file.getFileName + "_tmp")
df.coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save(tempDir.toAbsolutePath.toString)

// find the actual csv file
val tmpCsvFile = Files.walk(tempDir, 1).iterator().toSeq.find { p =>
  val fname = p.getFileName.toString
  fname.startsWith("part-00000") && fname.endsWith(".csv") && Files.isRegularFile(p)
}.get

// move to desired final path
Files.move(tmpCsvFile, file)

// delete temp directory
Files.walk(tempDir)
  .sorted(java.util.Comparator.reverseOrder())
  .iterator().toSeq
  .foreach(Files.delete(_))
The FileUtil.copyMerge() from the Hadoop API should solve your problem.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
def merge(srcPath: String, dstPath: String): Unit = {
  val hadoopConfig = new Configuration()
  val hdfs = FileSystem.get(hadoopConfig)
  FileUtil.copyMerge(hdfs, new Path(srcPath), hdfs, new Path(dstPath), true, hadoopConfig, null)
  // the "true" setting deletes the source files once they are merged into the new output
}
See Write single CSV file using spark-csv
This is how distributed computing works! Multiple files inside a directory is exactly how distributed computing works; this is not a problem at all, since all software can handle it.
Your question should rather be "how is it possible to download a CSV composed of multiple files?" -> there are already lots of solutions on SO.
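For example, if the output sits on HDFS, one common way to fetch it as a single local file is hadoop fs -getmerge (the paths here are placeholders):
hadoop fs -getmerge /my/csv/dir /local/path/output.csv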
Another approach could be to use Spark as a JDBC source (with the awesome Spark Thrift server), write a SQL query and transform the result to CSV.
In order to prevent OOM in the driver (since the driver will get ALL
the data), use incremental collect
(spark.sql.thriftServer.incrementalCollect=true), more info at
http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/.
Small recap about Spark "data partition" concept:
INPUT (X PARTITIONs) -> COMPUTING (Y PARTITIONs) -> OUTPUT (Z PARTITIONs)
Between "stages", data can be transferred between partitions, this is the "shuffle". You want "Z" = 1, but with Y > 1, without shuffle? this is impossible.

Parse complex Json string contained in Hadoop

I want to parse a string of complex JSON in Pig. Specifically, I want Pig to understand my JSON array as a bag instead of as a single chararray. I found that complex JSON can be parsed by using Twitter's Elephant Bird or Mozilla's Akela library. (I found some additional libraries, but I cannot use 'Loader' based approach since I use HCatalog Loader to load data from Hive.)
But, the problem is the structure of my data; each value of Map structure contains value part of complex JSON. For example,
1. My table looks like this (WARNING: the type of 'complex_data' is not STRING but a MAP of <STRING, STRING>!)
TABLE temp_table
(
user_id BIGINT COMMENT 'user ID.',
complex_data MAP <STRING, STRING> COMMENT 'complex json data'
)
COMMENT 'temp data.'
PARTITIONED BY(created_date STRING)
STORED AS RCFILE;
2. And 'complex_data' contains (the values I want to get are marked with two *s, so basically #'d'#'f' from each PARSED_STRING(complex_data#'c'))
{ "a": "[]",
"b": "\"sdf\"",
"**c**":"[{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},
{\"**d**\":{\"e\":\"sdfsdf\"
,\"**f**\":\"sdfs\"
,\"g\":\"qweqweqwe\"},
\"c\":[{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"},
{\"d\":21321,\"e\":\"ewrwer\"}]
},]"
}
3. So, I tried... (same approach for Elephant Bird)
REGISTER '/path/to/akela-0.6-SNAPSHOT.jar';
DEFINE JsonTupleMap com.mozilla.pig.eval.json.JsonTupleMap();
data = LOAD temp_table USING org.apache.hive.hcatalog.pig.HCatLoader();
values_of_map = FOREACH data GENERATE complex_data#'c' AS attr:chararray; -- IT WORKS
-- dump values_of_map shows correct chararray data per each row
-- eg) ([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }])
([{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... },
{"d":{"e":"sdfsdf","f":"sdfs","g":"sdf"},... }]) ...
attempt1 = FOREACH data GENERATE JsonTupleMap(complex_data#'c'); -- THIS LINE CAUSE AN ERROR
attempt2 = FOREACH data GENERATE JsonTupleMap(CONCAT(CONCAT('{\\"key\\":', complex_data#'c'), '}')); -- IT ALSO DOES NOT WORK
I guessed that "attempt1" failed because the value doesn't contain full JSON. However, when I CONCAT as in "attempt2", I generate an additional \ mark (so each line starts with {\"key\": ). I'm not sure whether this additional mark breaks the parsing rule or not. In any case, I want to parse the given JSON string so that Pig can understand it. If you have any method or solution, please feel free to let me know.
I finally solved my problem by using the jyson library with a jython UDF.
I know that I could solve it by using Java or other languages.
But I think that jython with jyson is the simplest answer to this issue.

Get list of all values in a key:value list

So my input looks like
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":""}
{"selling":"0","quantity":"100","price":"80000","date":"1401383271","rs_name":"sammvarnish","contact":"PM","notes":"Seers Bank W321 :)"}
{"selling":"0","quantity":"100","price":"70000","date":"1401383168","rs_name":"pwnoramaa","contact":"PM","notes":""}
and the output I want must look like
0,2,80000,1401384212,overhault,PM,""
0,100,80000,1401383271,sammvarnish,PM,"Seers Bank W321 :)"
0,100,70000,1401383168,pwnoramaa,PM,""
What's the best way to do this in bash?
EDIT: changed my needs.
The new output I want is, for
{"selling":"0","quantity":"2","price":"80000","date":"1401384212","rs_name":"overhault","contact":"PM","notes":"testnote"}
as input,
rs name: \t overhault
quantity: \t 2
price: \t 80000
date: \t 29-05 19:23
contact: \t PM
notes: \t testnote
Where \t is a tab character (like in echo "\t").
As you can see, this one is a tad bit more complicated.
For example, it changes the order, and requires the UNIX timestamp to be converted to an alternative format.
I'll use any tool you can offer me as long as you explain clearly how I can use it from a bash script. The input will consist of three of such lines, delimited by a newline character, and it must print the output with an empty line between each of the results.
Don't do this with regular expressions/bash, there are JSON parsers for this kind of task. Simple Python example:
import json
data = json.loads('{"selling":"0","quantity":"2"}')
data = ','.join(data.values())
print(data)
I strongly suggest you just use a simple script like this which you make executable and then call.
EDIT: here's a version which preserves the order:
import json
data = json.loads('{"selling":"0","quantity":"2", "price":"80000"}')
orderedkeys = ['selling', 'quantity', 'price']
values = [data[key] for key in orderedkeys]
values = ','.join(values)
print(values)
output:
0,2,80000
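For the tab-separated layout requested in the edit, here is a sketch along the same lines, reading the newline-delimited objects from stdin (the date format and local timezone are assumptions):
import json
import sys
from datetime import datetime

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    d = json.loads(line)
    # convert the UNIX timestamp into e.g. "29-05 19:23"
    date = datetime.fromtimestamp(int(d["date"])).strftime("%d-%m %H:%M")
    print("rs name:\t" + d["rs_name"])
    print("quantity:\t" + d["quantity"])
    print("price:\t" + d["price"])
    print("date:\t" + date)
    print("contact:\t" + d["contact"])
    print("notes:\t" + d["notes"])
    print()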

BASH - Parsing values from a single line database file (

I'm trying to parse a database file retrieved from a website via curl; however, I'm having trouble figuring out how to get the values.
This is an example of the file
{"Databasename":[{"Var1":"Var1Value","Var2":"Var2Value","Var3":"Var3Value"},{"Var1b":"Var1bValue","Var2b":"Var2bValue","Var3b":"Var3bValue"}],"foldername":{"dbTblcountvar":"dbTblcountvalue","filecountsize":"filecountsizvalue"}}
and with line breaks for better readability
{
    "Databasename":
    [
        {
            "Var1":"Var1aValue",
            "Var2":"Var2aValue",
            "Var3":"Var3aValue"
        },
        {
            "Var1":"Var1bValue",
            "Var2":"Var2bValue",
            "Var3":"Var3bValue"
        },
        {
            "Var1":"Var1cValue",
            "Var2":"Var2cValue",
            "Var3":"Var3cValue"
        }
    ],
    "foldername":
    {
        "dbTblcountvar":"dbTblcountvalue",
        "filecountsize":"filecountsizvalue"
    }
}
Assuming Var2 is always constant, how can I get its values? (Var2aValue, Var2bValue, Var2cValue, Var2dValue, ...)
In the example above, the value I'm trying to get is an ID for a file I need to send back to the server, to download the file and perform other operations on it.
Thanks
cat DownloadedFile.Ext | perl -pe 's/"Var2[abc]?":"(.+?)(?<![\\])"/\n\1\n/g' | grep -vPe '(?<!\\)"'
Those commands first put each Var2 value (with an optional a, b or c after the key name) on its own line, then filter out all lines that still contain a ".
I assume this is a JSON file, so I avoid matching escaped " characters with this part of the regexp:
(?<!\\)
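If a JSON-aware tool is an option, a jq one-liner would do the same more robustly (assuming the structure shown in the pretty-printed example):
jq -r '.Databasename[].Var2' DownloadedFile.Ext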