Please help me extract JSON data. I tried to fetch the data with some jq queries, but the results came out line by line.
cat test.json
..
{
"took" : 43,
"timed_out" : false,
"cardss" : {
"values" : 0,
"faileds" : 0
},
"counts" : {
"total" : 200,
"max_hint" : 1.0000004,
"counts" : [
{
"_index" : "test_90.008.",
"_type" : "fluentdd",
"_id" : "SLSLSLSLSLLSdfsdhjdshfdshfdshfkjdsfdsfsfdsf",
"_score" : 1.0000004,
"_source" : {
"payload" : """{"ID":"11390","Key":"SKSKDISKSK","paymentId":"LSDLSLS-LSLSLSLs-KGOGK","bunkoinfo":{"janaluID":"918282827","ipAddress":"0.0.0.0","chethiid":"fkfkfkfkfkfkfkfkfkkf"},"dabbulluInfo":{"checkType":"mundhucheck","currency":"INR","method":"paper","motthamAmount":"331","cards":{"cardsToken":"2021000","upicodes":"331","cardchettha":"6739837","digitcardss":"0000","kaliDate":"00000"}},"PackOrdetls":[{"items":[{"itemName":"00","quantity":"0","price":"331"}]}],"dtdcid":"kskdkskdsjsjsjdososlsksj"}"""
}
},
}
}
The required output is below; please help.
Id,paymentId,motthamAmount,currency
11390,LSDLSLS-LSLSLSLs-KGOGK,331,INR
I tried:
cat test.json | jq -r '.counts.counts[]._source.payload.ID, .counts.counts[]._source.payload.paymentId, .counts.counts[]._source.payload.dabbulluInfo.motthamAmount, .counts.counts[]._source.payload.dabbulluInfo.currency'
and got the output one value per line:
11390
LSDLSLS-LSLSLSLs-KGOGK
331
INR
If we rewrite your data until it's actually valid, an answer might look like:
jq -rn '
  (["Id", "paymentId", "motthamAmount", "currency"] | @csv),
  (inputs | .counts.counts[] | [
    ._source.payload.ID,
    ._source.payload.paymentId,
    ._source.payload.dabbulluInfo.motthamAmount,
    ._source.payload.dabbulluInfo.currency
  ] | @csv)
' <test.json
See this functioning at https://replit.com/@CharlesDuffy2/RequiredInfiniteComment#main.sh
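If the payload were instead left as a JSON-encoded string (with the nonstandard triple quotes reduced to ordinary JSON string quoting), a hedged sketch of the same extraction would parse it with fromjson first:
jq -r '
  ["Id", "paymentId", "motthamAmount", "currency"], (
    .counts.counts[]._source.payload
    | fromjson
    | [.ID, .paymentId, .dabbulluInfo.motthamAmount, .dabbulluInfo.currency]
  ) | @csv
' test.json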
Here's another approach that reuses the traversal of the payload object:
jq -r '
  ["Id", "paymentId", "motthamAmount", "currency"], (
    .counts.counts[]._source.payload
    | [.ID, .paymentId, (.dabbulluInfo | .motthamAmount, .currency)]
  ) | @csv
'
"Id","paymentId","motthamAmount","curreny"
"11390","LSDLSLS-LSLSLSLs-KGOGK","331","INR"
Demo
I have a JSON file, groups.json, which has data in the format below:
[
{ "key" : "value" },
{ "key" : "value" },
{ "key" : "value" }
]
I need these key-value pairs in a bash array of strings, like this:
bashArray = [ { "key" : "value" } { "key" : "value" } { "key" : "value" } ]
How can I achieve this on Bash 3.x?
With modern versions of bash, you'd simply use mapfile in conjunction with the -c option of jq (as illustrated in several other SO threads, e.g. Convert a JSON array to a bash array of strings)
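For example, on bash 4+ this minimal sketch (assuming the file is named groups.json, as in the question) fills the array in one step:
mapfile -t bashArray < <(jq -c '.[]' groups.json)
printf '%s\n' "${bashArray[@]}"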
With older versions of bash, you would still use the -c option but would build up the array one item at a time, along the lines of:
while read -r line ; do
  ary+=("$line")
done < <(jq -c .......)
Example
#!/bin/bash
function json {
  cat <<EOF
[
  { "key" : "value1" },
  { "key" : "value2" },
  { "key" : "value3" }
]
EOF
}

while read -r line ; do
  ary+=("$line")
done < <(json | jq -c '.[]')
printf "%s\n" "${ary[@]}"
Output:
{"key":"value1"}
{"key":"value2"}
{"key":"value3"}
I have a series of gigantic (40-80 MB) exported Google Location History JSON files, from which I've been tasked with analyzing select activity data. Unfortunately Google has no parameter or option at their download site to choose anything except "one giant JSON containing forever". (The KML option is twice as big.)
I've tried the obvious choices: JSON-Converter (the laexcel-test incarnation of VBA-JSON), parsing line by line with VBA, even Notepad++. They all crash and burn. I'm thinking RegEx might be the answer.
This Python script can extract the timestamp and location from a 40 MB file in two seconds (with RegEx?). How is Python doing it so fast? (Would it be as fast in VBA?)
I'd be able to extract everything I need, piece by piece, if only I had a magic chunk of RegEx, perhaps with this logic:
Delete everything except:
When timestampMs and WALKING appear between the same set of [square brackets]:
I need the 13-digit number that follows timestampMs, and
the one- to three-digit number that follows WALKING.
If it's simpler to include a little more data, like "all the timestamps", or "all activities", I could easily sift through it later. The goal is to make the file small enough that I can manipulate it without the need to rent a supercomputer, lol.
I tried adapting existing RegExes, but I have a serious issue with both RegEx and musical instruments: no matter how hard I try, I just can't wrap my head around it. So, this is indeed a "please write code for me" question, but it's just one expression, and I'll pay it forward by writing code for others today! Thanks... ☺
.
}, {
"timestampMs" : "1515564666086", ◁― (Don't need this but it won't hurt)
"latitudeE7" : -6857630899,
"longitudeE7" : -1779694452999,
"activity" : [ {
"timestampMs" : "1515564665992", ◁― EXAMPLE: I want only this, and...
"activity" : [ {
"type" : "STILL",
"confidence" : 65
}, { ↓
"type" : "TILTING",
"confidence" : 4
}, {
"type" : "IN_RAIL_VEHICLE",
"confidence" : 20 ↓
}, {
"type" : "IN_ROAD_VEHICLE",
"confidence" : 5
}, {
"type" : "ON_FOOT", ↓
"confidence" : 3
}, {
"type" : "UNKNOWN",
"confidence" : 3
}, {
"type" : "WALKING", ◁―┬━━ ...AND, I also want this.
"confidence" : 3 ◁―┘
} ]
} ]
}, {
"timestampMs" : "1515564662594", ◁― (Don't need this but it won't hurt)
"latitudeE7" : -6857630899,
"longitudeE7" : -1779694452999,
"altitude" : 42
}, {
Edit:
For testing purposes I made a sample file, representative of the original (except for the size). The raw JSON can be loaded directly from this Pastebin link, or downloaded as a local copy with this TinyUpload link, or copied as "one long line" below:
{"locations" : [ {"timestampMs" : "1515565441334","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 2299}, {"timestampMs" : "1515565288606","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 12,"velocity" : 0,"heading" : 350,"altitude" : 42,"activity" : [ {"timestampMs" : "1515565288515","activity" : [ {"type" : "STILL","confidence" : 98}, {"type" : "ON_FOOT","confidence" : 1}, {"type" : "UNKNOWN","confidence" : 1}, {"type" : "WALKING","confidence" : 1} ]} ]}, {"timestampMs" : "1515565285131","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 12,"velocity" : 0,"heading" : 350,"altitude" : 42}, {"timestampMs" : "1513511490011","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 25,"altitude" : -9,"verticalAccuracy" : 2}, {"timestampMs" : "1513511369962","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 25,"altitude" : -9,"verticalAccuracy" : 2}, {"timestampMs" : "1513511179720","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 16,"altitude" : -12,"verticalAccuracy" : 2}, {"timestampMs" : "1513511059677","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 16,"altitude" : -12,"verticalAccuracy" : 2}, {"timestampMs" : "1513510928842","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 16,"altitude" : -12,"verticalAccuracy" : 2,"activity" : [ {"timestampMs" : "1513510942911","activity" : [ {"type" : "STILL","confidence" : 100} ]} ]}, {"timestampMs" : "1513510913776","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 15,"altitude" : -11,"verticalAccuracy" : 2,"activity" : [ {"timestampMs" : "1513507320258","activity" : [ {"type" : "TILTING","confidence" : 100} ]} ]}, {"timestampMs" : "1513510898735","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 16,"altitude" : -12,"verticalAccuracy" : 2}, {"timestampMs" : "1513510874140","latitudeE7" : 123456789,"longitudeE7" : -123456789,"accuracy" : 19,"altitude" : -12,"verticalAccuracy" : 2,"activity" : [ {"timestampMs" : "1513510874245","activity" : [ {"type" : "STILL","confidence" : 100} ]} ]} ]}
The file tested as valid with JSONLint and FreeFormatter.
Obvious choices ...
The obvious choice here is a JSON-aware tool that can handle large files quickly. In the following, I'll use jq, which can easily handle gigabyte-size files quickly so long as there is sufficient RAM to hold the file in memory, and which can also handle very large files even if there is insufficient RAM to hold the JSON in memory.
First, let's assume that the file consists of an array of JSON objects of the form shown, and that the goal is to extract the two values for each admissible sub-object.
This is a jq program that would do the job:
.[].activity[]
| .timestampMs as $ts
| .activity[]
| select(.type == "WALKING")
| [$ts, .confidence]
For the given input, this would produce:
["1515564665992",3]
More specifically, assuming the above program is in a file named program.jq and that the input file is input.json, a suitable invocation of jq would be as follows:
jq -cf program.jq input.json
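If there were not enough RAM to hold the whole file, one sketch (untested against a real export) would use jq's streaming parser to break the top-level array into its elements first and then apply the same per-object logic:
jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' input.json |
  jq -c '
    .activity[]?
    | .timestampMs as $ts
    | .activity[]?
    | select(.type? == "WALKING")
    | [$ts, .confidence]
  '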
It should be easy to modify the jq program given above to handle other cases, e.g. if the JSON schema is more complex than has been assumed above. For example, if there is some irregularity in the schema, try sprinkling in some postfix ?s, e.g.:
.[].activity[]?
| .timestampMs as $ts
| .activity[]?
| select(.type? == "WALKING")
| [$ts, .confidence]
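The sample file posted in the question wraps the array in a top-level "locations" key, so for that layout the program would presumably start from .locations instead:
.locations[].activity[]?
| .timestampMs as $ts
| .activity[]?
| select(.type? == "WALKING")
| [$ts, .confidence]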
You may try this
(?s)^.*?\"longitude[^\[]*?\"activity[^\[]*\[[^\]]*?timestampMs\"[^\"\]]*\"(\d+)\"[^\]]*WALKING[^\]]*?confidence\"\s*:\s*(\b\d{1,3}\b)[^\]]*?\].*$
Regex Demo, in which I search for and capture the target values (the timestamp value and the WALKING value) by way of keywords like "longitude", "activity", "[", "timestampMs", "]", "WALKING", "confidence".
Python script
ss=""" copy & paste the file contents' strings (above sample text) in this area """
regx= re.compile(r"(?s)^.*?\"longitude[^\[]*?\"activity[^\[]*\[[^\]]*?timestampMs\"[^\"\]]*\"(\d+)\"[^\]]*WALKING[^\]]*?confidence\"\s*:\s*(\b\d{1,3}\b)[^\]]*?\].*$")
matching= regx.match(ss) # method 1 : using match() function's capturing group
timestamp= matching.group(1)
walkingval= matching.group(2)
print("\ntimestamp is %s\nwalking value is %s" %(timestamp,walkingval))
print("\n"+regx.sub(r'\1 \2',ss)) # another method by using sub() function
Output is
timestamp is 1515564665992
walking value is 3
1515564665992 3
Unfortunately it seems you've picked a language without a performant JSON parser.
With Python you could have:
#!/usr/bin/env python3
import time
import json

def get_history(filename):
    with open(filename) as history_file:
        return json.load(history_file)

def walking_confidence(history):
    # Yield (timestampMs, confidence) for every activity entry that reports a WALKING type
    for location in history["locations"]:
        if "activity" not in location:
            continue
        for outer_activity in location["activity"]:
            confidence = extract_walking_confidence(outer_activity)
            if confidence:
                timestampMs = int(outer_activity["timestampMs"])
                yield (timestampMs, confidence)

def extract_walking_confidence(outer_activity):
    for inner_activity in outer_activity["activity"]:
        if inner_activity["type"] == "WALKING":
            return inner_activity["confidence"]

if __name__ == "__main__":
    start = time.perf_counter()
    history = get_history("history.json")
    middle = time.perf_counter()
    wcs = list(walking_confidence(history))
    end = time.perf_counter()
    print("load json: " + str(middle - start) + "s")
    print("loop json: " + str(end - middle) + "s")
On my 98MB JSON history this prints:
load json: 3.10292s
loop json: 0.338841s
That isn't terribly performant, but certainly not bad.
I have a shell script running on Unix that goes through a list of JSON objects like the following, collecting values like <init>() # JSONInputData.java:82. There are also other objects with other values that I need to retrieve.
Is there a better option than grepping for "STACKTRACE_LINE",\n\s*.* and then splitting up that result?
inb4: "add X package to the OS". Need to run generically.
. . .
"probableStartLocationView" : {
"lines" : [ {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "<init>() # JSONInputData.java:82"
} ],
"text" : "<init>() # JSONInputData.java:82"
} ],
"nested" : false
},
. . . .
What if I were looking for "description" : "Dangerous Data Received" in a series of objects like the following, where I need to know that it is associated with event 12345 and not with another event listed in the same file?
. . .
"events" : [ {
"id" : "12345",
"important" : true,
"type" : "Creation",
"description" : "Dangerous Data Received",
. . .
Is there a better option than grepping for "STACKTRACE_LINE",\n\s*.* and then splitting up that result?
Yes. Use jq to filter and extract the interesting parts.
Example 1, given this JSON:
{
  "probableStartLocationView": {
    "lines": [
      {
        "fragments": [
          {
            "type": "STACKTRACE_LINE",
            "value": "<init>() # JSONInputData.java:82"
          }
        ],
        "text": "<init>() # JSONInputData.java:82"
      }
    ],
    "nested": false
  }
}
Extract value where type is "STACKTRACE_LINE":
jq -r '.probableStartLocationView.lines[] | .fragments[] | select(.type == "STACKTRACE_LINE") | .value' file.json
This is going to produce one line per value.
Example 2, given this JSON:
{
  "events": [
    {
      "id": "12345",
      "important": true,
      "type": "Creation",
      "description": "Dangerous Data Received"
    }
  ]
}
Extract the id where description starts with "Dangerous":
jq -r '.events[] | select(.description | startswith("Dangerous")) | .id'
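If you also want to see which description belongs to which event, a small variation (assuming the same file.json layout) emits the id and the description side by side:
jq -r '.events[] | select(.description | startswith("Dangerous")) | [.id, .description] | @tsv' file.json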
And so on.
See the jq manual for more examples and capabilities.
There are also many questions on Stack Overflow using jq that should help you find the right combination of filtering and extracting the relevant parts.
I have a curl command; when I run it, the output is as below:
{
"page" : 1,
"records" : 1,
"total" : 1,
"rows" : [ {
"automated" : true,
"collectionProtocol" : "MagBead Standard Seq v2",
"comments" : "",
"copy" : false,
"createdBy" : "stest",
"custom1" : "User Defined Field 1=",
"custom2" : "User Defined Field 2=",
"custom3" : "User Defined Field 3=",
"custom4" : "User Defined Field 4=",
"custom5" : "User Defined Field 5=",
"custom6" : "User Defined Field 6=",
"description" : null,
"editable" : false,
"expanded" : false,
"groupName" : "99111",
"groupNames" : [ "all" ],
"inputCount" : 1,
"instrumentId" : 1,
"instrumentName" : "42223",
"jobId" : 11111,
"jobStatus" : "In Progress",
"leaf" : true,
"modifiedBy" : null,
"name" : "Copy_of_Test_Running2"
} ]
}
I want to extract only jobId's value. The output would be:
11111
If there are multiple rows, then there are multiple jobIds:
11111
11112
11113
I want to extract only the jobId values and process them in a while loop, like below:
while read -r job; do
  echo "$job"
done < <(curl command)
And I want to use that job id in another command. The curl results could be multiple. Do you have an idea for an easy way to extract the curl output and process it in a while or for loop?
I think jq (thanks to @Mircea) is a nice solution.
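A minimal sketch of that approach (where curl ... stands for your actual command) would be:
while IFS= read -r job; do
  echo "$job"
  # use "$job" in your other command here
done < <(curl ... | jq -r '.rows[].jobId')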
Besides, I can provide a simple awk solution, but only if curl's output format is disciplined and does not contain any stray symbols.
So, just be careful when using this:
while IFS= read -r line
do
  echo "$line" | awk -F':' '/jobId/{split($2,a,",");for(i in a){if(a[i]){printf("%d\n",a[i])}}}'
done < "$file"