How to delete subdocument but keep contents in mongodb - json

I ran a script that populated my collection with approximately 60k documents. Because of a typo, it created a subdocument in every document, and that subdocument contains duplicate information. I don't really need or want the subdocument, but I don't want to delete it entirely, because I want to keep one field from it.
This is my document structure:
{
"_id" : ObjectId(""),
"title" : "",
"url" : "",
"description" : "",
"author" : "",
"publishedAt" : "",
"content" : "",
"source" : {
"id" : "Source",
"name" : "Source"
},
"urlToImage" : ""
}
Ultimately what I want to do, if possible, is remove the source subdocument but keep the name field. Below is what I want:
{
"_id" : ObjectId(""),
"title" : "",
"url" : "",
"description" : "",
"author" : "",
"publishedAt" : "",
"content" : "",
"name" : "Source",
"urlToImage" : ""
}
I know this would be a multi-part query. I just don't want to make a mistake and delete the entire subdocument without pulling out the fields first.

Option 1 - $rename and $unset
Use the $rename operator to rename source.name to name,
perhaps with a check (filter) that makes sure you are not overwriting an existing name with null.
Then remove the source subdocument using the $unset operator.
Again, just to be sure, you can add a filter so that source is only unset in documents where the name field already exists.
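A minimal sketch of option 1 with pymongo, assuming `collection` is a pymongo Collection (the helper name `flatten_source` and the constant names are illustrative, not from the answer). The rename and the unset are kept as two separate `update_many` calls, because a single update with `$rename` on `source.name` and `$unset` on `source` touches overlapping paths, which MongoDB is expected to reject as a conflict. The filters implement the safety checks described above.

```python
# Sketch of option 1 as pymongo update documents (assumption: pymongo is used;
# nothing here talks to a database until flatten_source() is called).

# Step 1: move source.name to a top-level name, only where source.name holds a
# non-null value and no top-level name exists yet (so nothing gets overwritten).
RENAME_FILTER = {
    "source.name": {"$exists": True, "$ne": None},
    "name": {"$exists": False},
}
RENAME_UPDATE = {"$rename": {"source.name": "name"}}

# Step 2: drop the leftover source subdocument, only where the rename has
# already happened: name exists at the top level and source.name is gone.
UNSET_FILTER = {
    "source": {"$exists": True},
    "source.name": {"$exists": False},
    "name": {"$exists": True},
}
UNSET_UPDATE = {"$unset": {"source": ""}}


def flatten_source(collection):
    """Run both steps on a pymongo Collection; return the two modified counts."""
    renamed = collection.update_many(RENAME_FILTER, RENAME_UPDATE)
    unset = collection.update_many(UNSET_FILTER, UNSET_UPDATE)
    return renamed.modified_count, unset.modified_count
```

For example, `flatten_source(MongoClient()["mydb"]["articles"])` after `from pymongo import MongoClient`, where the database and collection names are placeholders. Running it a second time should modify nothing, since both filters exclude documents that were already migrated.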
Option 2 - find, then $set + $unset
Retrieve the document.
Update it using $set and $unset.
Example (in Python):
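# 'collection' is a pymongo Collection, e.g. MongoClient()['mydb']['articles']
# (names are placeholders); 'foobar' is an example _id
while True:
    doc = collection.find_one({
        '_id': 'foobar',
        'source.name': {'$exists': True},
    })
    if doc is None:
        # nothing to migrate: no such document, or source.name is already gone
        break
    res = collection.update_one(
        {
            '_id': 'foobar',
            # only match if source.name is unchanged since find_one()
            'source.name': doc['source']['name'],
        }, {
            '$set': {'name': doc['source']['name']},
            '$unset': {'source': ''},
        }
    )
    if res.modified_count == 1:
        break
    # if nothing was modified then somebody has updated
    # the source.name right after our find_one()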

Related

Are otherwise identical paths allowed if one takes the parameters as query parameters and the other as path parameters?

My problem is similar to the one in the "Paths that differ only in query parameter names" post. However, I am not interested in identical paths that use different query parameters; my problem concerns similar paths that use the same parameters but take them as input in different ways, one as query parameters and the other as path parameters.
I tried creating two different paths in my swagger.json file, one taking them as query parameters and the other as path parameters, but it did not work. Please see my code below:
swagger.json:
{
// swagger initialisation
"paths":{
"foo_b_ar/{foo}/{bar}":{
"get":{
"summary":"foo foo"
"operationId": "super secret id"
"produces":[
"someName/json"
],
"parameters":[
{
"name" = "foo",
"in" = "path",
"description" = "something",
"required" = true,
"type" = "string"
},
{
"name" = "bar",
"in" = "path",
"description" = "something2",
"required" = true,
"type" = "string"
}
],
"responses":{
// something not relevant
} } },
"foo_b_ar":{
"get":{
"summary":"foo foo"
"operationId": "super secret id"
"produces":[
"someName/json"
],
"parameters":[
{
"name" = "foo",
"in" = "query",
"description" = "something",
"required" = true,
"type" = "string"
},
{
"name" = "bar",
"in" = "query",
"description" = "something2",
"required" = true,
"type" = "string"
}
],
"responses":{
// something not relevant
} } }
}
My question is: is what I am trying to do in this swagger.json file allowed?
/foo_b_ar and /foo_b_ar/{foo}/{bar} (note the leading / - it's required in path names) are different paths.
/foo_b_ar/smth/other and /foo_b_ar/{foo}/{bar} are also different paths; the former (concrete) path is supposed to match first if both are defined.
Examples of identical paths:
/{foo} and /{bar}
/foo_b_ar/{foo} and /foo_b_ar/{bar}
/foo_b_ar/{foo}/{bar} and /foo_b_ar/{param1}/{param2}
/foo_b_ar/{foo}/something and /foo_b_ar/{bar}/something
/foo_b_ar/something/{foo} and /foo_b_ar/something/{bar}
That is, identical paths are paths that would be the same if you removed the {parameters} from them.
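The rule above can be expressed as a small normalization: replace each {parameter} with an empty {} placeholder and compare what remains. A Python sketch; `path_template_key` and `are_identical` are hypothetical names, and this only illustrates the rule as stated here, not how Swagger Editor actually implements its check.

```python
import re

# Matches one {parameter} placeholder such as {foo} (no "/" or braces inside).
_PARAM = re.compile(r"\{[^/{}]+\}")


def path_template_key(path):
    """Replace every {parameter} with a bare {} so only the literal parts remain."""
    return _PARAM.sub("{}", path)


def are_identical(a, b):
    """True if two paths differ only in the names of their parameters."""
    return path_template_key(a) == path_template_key(b)
```

For example, `/foo_b_ar/{foo}/something` and `/foo_b_ar/{bar}/something` both normalize to `/foo_b_ar/{}/something`, while `/foo_b_ar` and `/foo_b_ar/{foo}/{bar}` stay different.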
As for why "it did not work", without knowing the details of what exactly did not work and where, it could be because your swagger.json is not valid JSON and also not a valid OpenAPI definition. Please validate it using a JSON validator (e.g. https://jsonlint.com) and in Swagger Editor (https://editor.swagger.io). Some of the errors are:
Missing commas between object fields in JSON.
Invalid key/value separators (= instead of :).
Missing / at the beginning of paths. "foo_b_ar" is not a valid path, it must be "/foo_b_ar".
Duplicate operation IDs: "operationId": "super secret id".
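Some of these errors can also be caught with a short script before pasting the file into an editor. A rough Python sketch (`check_spec` is a hypothetical helper) that reports invalid JSON, paths without a leading /, and repeated operationId values; it covers only these checks and is no substitute for Swagger Editor.

```python
import json
from collections import Counter


def check_spec(text):
    """Return a list of problems found in a Swagger/OpenAPI JSON document."""
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as exc:
        # Catches missing commas, "=" instead of ":", // comments, etc.
        return [f"invalid JSON: {exc}"]
    if not isinstance(spec, dict):
        return ["top level must be a JSON object"]

    problems = []
    paths = spec.get("paths", {})
    for path in paths:
        if not path.startswith("/"):
            problems.append(f"path {path!r} must start with '/'")

    # Count operationId values across every operation under every path.
    op_ids = Counter(
        operation["operationId"]
        for path_item in paths.values()
        if isinstance(path_item, dict)
        for operation in path_item.values()
        if isinstance(operation, dict) and "operationId" in operation
    )
    for op_id, count in op_ids.items():
        if count > 1:
            problems.append(f"operationId {op_id!r} is used {count} times")
    return problems
```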

How can I match fields with wildcards using jq?

I have a JSON object of the following form:
{
"Task11c-0-20181209-12:59:30-65611" : {
"attributes" : {
"configname" : "Task11c",
"datetime" : "20181209-12:59:30",
"experiment" : "Task11c",
"inifile" : "lab1.ini",
"iterationvars" : "",
"iterationvarsf" : "",
"measurement" : "",
"network" : "Manhattan1_1C",
"processid" : "65611",
"repetition" : "0",
"replication" : "#0",
"resultdir" : "results",
"runnumber" : "0",
"seedset" : "0"
},
......
},
......
"Task11b-12-20181209-13:03:17-65612" : {
....
....
},
.......
}
I've shown only the first part, but in general I have many other sub-objects whose keys match a string like Task11c-0-20181209-12:59:30-65611. They all start with the word Task. I want to extract the processid from each sub-object. I tried using a wildcard as in bash, but that doesn't seem to be possible.
I also read about the match() function, but it works with strings and not json objects.
Thanks for the support.
Filter the keys that start with Task using a select() expression, then get only the attribute you need:
jq 'to_entries[] | select(.key|startswith("Task")).value.attributes.processid' json
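If you need the same filter outside jq, here is a Python sketch that mirrors it: keep the top-level keys starting with "Task" and pull out attributes.processid. The function name `task_process_ids` is hypothetical, and `.get()` is used so a missing attributes object yields None, roughly like jq yielding null.

```python
def task_process_ids(data):
    """Mimic jq: to_entries[] | select(.key|startswith("Task")).value.attributes.processid"""
    return [
        # .get() returns None for a missing attributes/processid, roughly like jq's null
        value.get("attributes", {}).get("processid")
        for key, value in data.items()
        if key.startswith("Task")
    ]
```

Load the file with `json.load` first and pass the resulting dict in; dict iteration keeps the file's key order.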

read.json only reading the first object in Spark

I have a multi-line JSON file, and I am using Spark's read.json to read it. The problem is that it only reads the first object from the file:
val dataFrame = spark.read.option("multiLine", true).option("mode", "PERMISSIVE").json(path)
dataFrame.rdd.saveAsTextFile("DataFrame")
Sample json:
{
"_id" : "589895e123c572923e69f5e7",
"thing" : "54eb45beb5f1e061454c5bf4",
"timeline" : [
{
"reason" : "TRIP_START",
"timestamp" : "2017-02-06T17:20:18.007+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.1174091,
69.1174091
],
"endLocation" : [],
"startLocation" : []
},
{
"reason" : "TRIP_END",
"timestamp" : "2017-02-06T17:25:26.026+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.5691428,
48.1122443
],
"endLocation" : [],
"startLocation" : []
}
],
"__v" : 0
}
{
"_id" : "589895e123c572923e69f5e8",
"thing" : "54eb45beb5f1e032241c5bf4",
"timeline" : [
{
"reason" : "TRIP_START",
"timestamp" : "2017-02-06T17:20:18.007+02:00",
"type" : "TRIP_EVENT",
"location" : [
11.1174091,
50.1174091
],
"endLocation" : [],
"startLocation" : []
},
{
"reason" : "TRIP_END",
"timestamp" : "2017-02-06T17:25:26.026+02:00",
"type" : "TRIP_EVENT",
"location" : [
51.1174091,
69.1174091
],
"endLocation" : [],
"startLocation" : []
}
],
"__v" : 0
}
I get only the first entry with id = 589895e123c572923e69f5e7.
Is there something that I am doing wrong?
Are you sure that multiple multi-line JSON objects in one file are supported? From the Spark docs:
Each line must contain a separate, self-contained valid JSON object... For a regular multi-line JSON file, set the multiLine option to true
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Here a "regular JSON file" means the entire file is a single JSON object or array. However, simply putting {} around your data won't work, because every value in an object needs a key, so you'd need a top-level key, say "objects". Alternatively, you can try an array by wrapping the data in []. Either way, this only works if the objects inside that array or object are separated by commas.
tl;dr - the whole file needs to be one valid JSON document (a single object or array) when multiLine=true.
You're only getting one object because it parses the first set of brackets, and that's it.
If you have full control over the JSON file, remember that the indented layout is purely for human consumption. Just flatten the objects and let Spark parse the file the way the API is intended to be used.
Keep one JSON value per line in the file and remove .option("multiLine", true), like this:
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
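If you can't change how the file is produced, one option is to rewrite it as JSON Lines before handing it to Spark. A minimal Python sketch, assuming the file is a sequence of complete, valid JSON objects separated only by whitespace (the function name `to_json_lines` is hypothetical): it uses `json.JSONDecoder.raw_decode` to read one object at a time and writes each one compactly on its own line.

```python
import json


def to_json_lines(text):
    """Convert concatenated (possibly pretty-printed) JSON objects into JSON Lines."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Parse one JSON value starting at pos; returns (value, end index).
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj, separators=(",", ":")))
    return "\n".join(lines) + "\n" if lines else ""
```

Write the result to a new file and read that with spark.read.json(path), without the multiLine option.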

MongoDB AND Comparison Fails

I have a collection named studentCollection with the two documents shown below:
> db.studentCollection.find().pretty()
{
"_id" : ObjectId("52d7c0c744b4dd77efe93df7"),
"regno" : 101,
"name" : "Ajeesh",
"gender" : "Male",
"docs" : [
"voterid",
"passport",
"drivinglic"
]
}
{
"_id" : ObjectId("52d7c6a144b4dd77efe93df8"),
"regno" : 102,
"name" : "Sathish",
"gender" : "Male",
"dob" : ISODate("2013-12-09T21:05:00Z")
}
Why does the query below return a document when it doesn't fulfil the criteria I gave in the find command? I know it's a bad query for an AND comparison. I tried the equivalent in MySQL and it returned nothing, as expected, so why does MongoDB behave differently? I suspect it's only considering the last field for comparison.
> db.studentCollection.find({regno:101,regno:102}).pretty()
{
"_id" : ObjectId("52d7c6a144b4dd77efe93df8"),
"regno" : 102,
"name" : "Sathish",
"gender" : "Male",
"dob" : ISODate("2013-12-09T21:05:00Z")
}
Can anyone briefly explain why MongoDB works this way?
MongoDB uses JSON/BSON, and names within an object should be unique (http://www.ietf.org/rfc/rfc4627.txt, section 2.2). I found this in another post, "How to generate a JSON object dynamically with duplicate keys?". I am guessing the value of regno gets overwritten with 102 in your case.
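The same last-one-wins behavior is easy to see with Python's json module, which also keeps the last value for a repeated key. A small sketch; `duplicate_keys` is a hypothetical helper that uses the `object_pairs_hook` parameter of `json.loads` to list repeated keys instead of silently dropping them. The shell query is a JavaScript object literal rather than JSON, so this only illustrates the same rule.

```python
import json


def duplicate_keys(text):
    """Return keys that are repeated inside any single JSON object in text."""
    repeated = []

    def hook(pairs):
        seen = set()
        for key, _value in pairs:
            if key in seen:
                repeated.append(key)
            seen.add(key)
        # dict() keeps the last value for a repeated key, same as the default
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return repeated


query = '{"regno": 101, "regno": 102}'
print(json.loads(query))      # {'regno': 102}
print(duplicate_keys(query))  # ['regno']
```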
If what you want is an OR query, try the following (regno is stored as a number, so compare against numbers rather than strings):
db.studentCollection.find ( { $or : [ { "regno" : 101 }, { "regno" : 102 } ] } );
Or even better, use $in:
db.studentCollection.find ( { "regno" : { $in: [101, 102] } } );
Hope this helps!
Your query is a JavaScript object, and since you have not used an $and condition, the repeated regno key is overwritten by its last value, regno: 102. Hence you get the last document as the result.
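If you want to use an $and, you can use either of the following:
db.studentCollection.find({$and:[{regno:102}, {regno:101}]}); // matches nothing here: no document has regno equal to both values
db.studentCollection.find({regno:{$gte:101, $lte:102}}); // range on one field: matches both documents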
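The same pitfall applies when building filters in Python, for example with pymongo: a dict literal with a repeated key also keeps only the last value. A sketch of the filters above as Python dicts, assuming regno is stored as a number as in the sample documents; the variable names are illustrative.

```python
# A repeated key in a dict literal silently keeps the last value,
# so this filter is really just {"regno": 102}.
bad_filter = {"regno": 101, "regno": 102}

# Explicit AND of two conditions on the same field.
and_filter = {"$and": [{"regno": 101}, {"regno": 102}]}

# Implicit AND: two operators on one field form a range.
range_filter = {"regno": {"$gte": 101, "$lte": 102}}

# Either value, as with $or / $in in the shell.
in_filter = {"regno": {"$in": [101, 102]}}
```

Pass any of them to `collection.find()`, where `collection` is a pymongo Collection.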

Is it possible to extract specific data from JSON without reading all the values?

I have the JSON data below.
My question is: is it possible to extract specific data from it without reading all the values?
In other words, is it possible to query the data as we do in SQL?
{ "_id" : ObjectId("4e61501e6a73bc73f82f91f3"), "created_at" : "2011-09-02 17:52:30.285", "cust_id" : "sdtest", "moduleName" : "balances", "responses" : [
{
"questionNum" : "1",
"answer" : "Hard",
"comments" : "is that you john wayne?"
},
{
"questionNum" : "2",
"answer" : "Somewhat",
"comments" : "ARg!"
},
{
"questionNum" : "3",
"answer" : "",
"comments" : "Yes"
}
] }
Yes, but you will need to write extra code to do it, or use a third-party library. There are a few available: http://www.google.co.uk/search?q=json+linq+sql
Well, unless you use an incremental JSON parser, you'll have to parse the whole JSON document first. After that, how you filter depends on what your programming language offers. For example, in Python:
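import json
# jsonData must be plain JSON: the ObjectId(...) in the sample above is
# mongo shell syntax, not JSON, and would make json.loads() fail
obj = json.loads(jsonData)
# keep only the responses whose answer is non-empty
answeredQuestions = [response for response in obj["responses"] if response["answer"]]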
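To make the "query it like SQL" idea concrete once the data is parsed, here is a tiny SELECT ... WHERE over a list of dicts. `select` is a hypothetical helper, and the rows are the responses array from the question typed in as a Python list.

```python
def select(rows, columns, where=lambda row: True):
    """A tiny SQL-like SELECT columns FROM rows WHERE predicate over a list of dicts."""
    return [{col: row.get(col) for col in columns} for row in rows if where(row)]


responses = [
    {"questionNum": "1", "answer": "Hard", "comments": "is that you john wayne?"},
    {"questionNum": "2", "answer": "Somewhat", "comments": "ARg!"},
    {"questionNum": "3", "answer": "", "comments": "Yes"},
]

# Roughly: SELECT questionNum, answer FROM responses WHERE answer <> ''
answered = select(responses, ["questionNum", "answer"], where=lambda r: r["answer"] != "")
```

This still works on fully parsed data; it only changes how you express the filter, not how much of the file is read.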