Convert MongoDB Document to Extended JSON in Shell

I am looking for a shell tool that can convert a MongoDB document into extended JSON.
So if the original JSON file looks like this:
{
"_id" : ObjectId("5a8c60b8c83eaf000fb39547"),
"name" : "myName",
"created" : ISODate("2018-02-20T17:54:00.091Z"),
"components" : [
...
The result would be something like this:
{
"$oid" : "5a8c60b8c83eaf000fb39547",
"name" : "myName",
"created" : { "$date" : "2018-02-20T17:54:00.091Z"},
"components" : [
...

The MongoDB shell speaks JavaScript, so the answer is simple: use JSON.stringify(). If your command is db.serverStatus(), then you can simply do this:
JSON.stringify(db.serverStatus())
This won't output the proper "strict mode" representation of each of the fields ({ "floatApprox": <number> } instead of { "$numberLong": "<number>" }), but if what you care about is getting standards-compliant JSON out, this'll do the trick.
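JSON.stringify also accepts an indent argument if you'd rather have the result pretty-printed than on a single line, e.g.:
JSON.stringify(db.serverStatus(), null, 2)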

Related

How can I match fields with wildcards using jq?

I have a JSON object of the following form:
{
"Task11c-0-20181209-12:59:30-65611" : {
"attributes" : {
"configname" : "Task11c",
"datetime" : "20181209-12:59:30",
"experiment" : "Task11c",
"inifile" : "lab1.ini",
"iterationvars" : "",
"iterationvarsf" : "",
"measurement" : "",
"network" : "Manhattan1_1C",
"processid" : "65611",
"repetition" : "0",
"replication" : "#0",
"resultdir" : "results",
"runnumber" : "0",
"seedset" : "0"
},
......
},
......
"Task11b-12-20181209-13:03:17-65612" : {
....
....
},
.......
}
I reported only the first part, but in general I have many other sub-objects whose keys match a string like Task11c-0-20181209-12:59:30-65611. They all begin with the word Task. I want to extract the processid from each sub-object. I tried using a wildcard as in bash, but that doesn't seem to be possible.
I also read about the match() function, but it works on strings, not JSON objects.
Thanks for the support.
Filter the keys that start with Task and get only the attribute of your choice using the select() expression:
jq 'to_entries[] | select(.key|startswith("Task")).value.attributes.processid' json
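If you want the bare values without JSON string quotes, jq's -r (raw output) flag prints them unquoted:
jq -r 'to_entries[] | select(.key | startswith("Task")).value.attributes.processid' json
For the sample above this prints 65611, one processid per line for every Task* sub-object.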

read.json only reading the first object in Spark

I have a multi-line JSON file, and I am using Spark's read.json to read it. The problem is that it only reads the first object from that file:
val dataFrame = spark.read.option("multiLine", true).option("mode", "PERMISSIVE").json(path)
dataFrame.rdd.saveAsTextFile("DataFrame")
Sample JSON:
{
"_id" : "589895e123c572923e69f5e7",
"thing" : "54eb45beb5f1e061454c5bf4",
"timeline" : [
{
"reason" : "TRIP_START",
"timestamp" : "2017-02-06T17:20:18.007+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.1174091,
69.1174091
],
"endLocation" : [],
"startLocation" : []
},
{
"reason" : "TRIP_END",
"timestamp" : "2017-02-06T17:25:26.026+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.5691428,
48.1122443
],
"endLocation" : [],
"startLocation" : []
}
],
"__v" : 0
}
{
"_id" : "589895e123c572923e69f5e8",
"thing" : "54eb45beb5f1e032241c5bf4",
"timeline" : [
{
"reason" : "TRIP_START",
"timestamp" : "2017-02-06T17:20:18.007+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.1174091,
50.1174091
],
"endLocation" : [],
"startLocation" : []
},
{
"reason" : "TRIP_END",
"timestamp" : "2017-02-06T17:25:26.026+02:00",
"type" : "TRIP_EVENT",
"location" : [
51.1174091,
69.1174091
],
"endLocation" : [],
"startLocation" : []
}
],
"__v" : 0
}
I get only the first entry with id = 589895e123c572923e69f5e7.
Is there something that I am doing wrong?
Are you sure multiple multi-line JSON objects are supported?
Each line must contain a separate, self-contained valid JSON object... For a regular multi-line JSON file, set the multiLine option to true
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Where a "regular JSON file" means the entire file is a singular JSON object / array, however, simply putting {} around your data won't work because you need a key for every object, and so you'd need a top level key, maybe say "objects". Similarly, you can try an array, but wrapping with []. Either way, these will only work if every object in that array or object is separated by commas.
tl;dr - the whole file needs to be one valid JSON object when multiLine=true
You're only getting one object because it parses the first set of brackets, and that's it.
If you have full control over the JSON file, the indented layout is purely for human consumption. Just flatten the objects and let Spark parse them as the API is intended to be used.
Keep one JSON value per line in the file and remove .option("multiLine", true),
like this:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}

Need to extract the timestamp from a logstash elasticsearch cluster

I'm trying to determine the freshness of the most recent record in my logstash cluster, but I'm having a bit of trouble digesting the Elasticsearch DSL.
Right now I am doing something like this to extract the timestamp:
curl -s -XGET 'http://localhost:9200/logstash-2015.06.02/_search' -d'{"query": {"match_all": {} } }' | json_pp | grep timestamp
which gets me;
"#timestamp" : "2015-06-02T00:00:28.371+00:00",
I'd like to use an Elasticsearch query directly with no grep hackiness.
The raw JSON (snipped for length) looks like this:
{
"took" : 115,
"timed_out" : false,
"hits" : {
"hits" : [
{
"_index" : "logstash-2015.06.02",
"_source" : {
"type" : "syslog",
"#timestamp" : "2015-06-02T00:00:28.371+00:00",
"tags" : [
"sys",
"inf"
],
"message" : " 2015/06/02 00:00:28 [INFO] serf: EventMemberJoin: generichost.example.com 10.1.1.10",
"file" : "/var/log/consul.log",
"#version" : 1,
"host" : "generichost.example.com"
},
"_id" : "AU4xcf51cXOri9NL1hro",
"_score" : 1,
"_type" : "syslog"
},
],
"total" : 8605141,
"max_score" : 1
},
"_shards" : {
"total" : 50,
"successful" : 50,
"failed" : 0
}
}
Any help would be appreciated. I know the query is simple, I just don't know what it is.
You don't need to use the DSL for this. You can simply cram everything into the URL query string, like this:
curl -s -XGET 'localhost:9200/logstash-2015.06.02/_search?_source=@timestamp&size=1&sort=@timestamp:desc&format=yaml'
So:
_source=@timestamp means we're only interested in getting the @timestamp value
size=1 means we only need one result
sort=@timestamp:desc means we want to sort on @timestamp descending (i.e. latest first)
format=yaml will get you the result in YAML format which is a bit more concise than JSON in your case
The output would look like this:
- _index: "logstash-2015.06.02"
  _type: "syslog"
  _id: "AU4xcf51cXOri9NL1hro"
  _score: 1.0
  _source:
    "@timestamp": "2015-06-02T00:00:28.371+00:00"
You don't need json_pp anymore; you can still simply grep @timestamp to get the data you need.
Note that in 1.6.0, there will be a way to filter out all the metadata (i.e. _index, _type, _id, _score) and only get the _source for a search result using the filter_path parameter in the URL.
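For example, a sketch of the same query with filter_path (assuming Elasticsearch 1.6.0 or later):
curl -s -XGET 'localhost:9200/logstash-2015.06.02/_search?_source=@timestamp&size=1&sort=@timestamp:desc&filter_path=hits.hits._source'
That strips the _index/_type/_id/_score metadata and leaves only the _source of the single hit.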

If the JSON value is deeply nested, how to change the JSON format into another JSON format dynamically in JS (Node.js)

INPUT: The input value looks like this:
{
"title" : "new resource",
"user" : {
"firstName" : "tester",
"lastname" : "test"
}
}
OUTPUT: The output should look like this:
{
"title" : "new resource",
"user.firstName" : "tester",
"user.lastname" : "test"
}
In some cases the JSON value may be nested more deeply, so we can't write a separate for loop for each and every level of iteration.
Using the path user.firstName you can reach the value "tester", but that by itself doesn't change the format.
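A recursive walk handles arbitrary nesting without hard-coding a loop per level. Here is a minimal sketch (the flatten name is just illustrative, not a library function); it treats arrays and other non-object values as leaves:
// Recursively flatten nested plain objects into dot-separated keys.
function flatten(obj, prefix = '', out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      flatten(value, path, out); // descend into nested objects
    } else {
      out[path] = value; // leaf: copy under the dotted key
    }
  }
  return out;
}

console.log(flatten({
  "title": "new resource",
  "user": { "firstName": "tester", "lastname": "test" }
}));
// { title: 'new resource', 'user.firstName': 'tester', 'user.lastname': 'test' }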

Is it possible to extract specific data in a JSON document, without reading all the values

I have this JSON data.
My question is: is it possible to extract specific data from the JSON, without reading all the values?
I mean, is it possible to query the data as we do in SQL?
{ "_id" : ObjectId("4e61501e6a73bc73f82f91f3"), "created_at" : "2011-09-02 17:52:30.285", "cust_id" : "sdtest", "moduleName" : "balances", "responses" : [
{
"questionNum" : "1",
"answer" : "Hard",
"comments" : "is that you john wayne?"
},
{
"questionNum" : "2",
"answer" : "Somewhat",
"comments" : "ARg!"
},
{
"questionNum" : "3",
"answer" : "",
"comments" : "Yes"
}
] }
Yes, but you will need to write extra code to do it, or use a third party library. There are a few available: http://www.google.co.uk/search?q=json+linq+sql
Well, unless you use an incremental JSON parser, you'll have to parse the whole JSON first. After that, it depends on your programming language's filtering abilities. For example, in Python:
import json

# ObjectId(...) in the sample is mongo-shell syntax, not valid JSON,
# so it would need stripping before parsing.
obj = json.loads(jsonData)
# The responses are dicts, so index them; keep only non-empty answers.
answeredQuestions = [r for r in obj["responses"] if r["answer"]]
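If you genuinely want to avoid reading the whole document into memory, an incremental (streaming) parser can yield just the parts you ask for. A sketch using the third-party ijson package (an assumption: it must be installed separately, and as noted above the ObjectId(...) wrapper would have to be stripped first):
import ijson

# Stream only the elements of the "responses" array, without
# materializing the rest of the document in memory.
with open('data.json', 'rb') as f:  # 'data.json' is a placeholder path
    for response in ijson.items(f, 'responses.item'):
        if response['answer']:
            print(response['questionNum'], response['answer'])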