Mongo db and using maps in queries - json

Im new to mongo and am trygin to figure out queries, im using java driver
I have a document in the database which is as follows
{"name":"joe", "surname":"bloggs", "other": {"name":"fred", "surname" : "flint"}, "important" : "important info that does not chnage" }
Now i want to do a upsert on the document and only update it if it does not exist
I have the following which i wan tto insert if it doesnt exist and update if it does.
Bson filter = Filters.and(
Filters.eq("name", oldObj.getName()),
Filters.eq("surname", oldObj.getSurname()),
Filters.eq("translations", oldObj.getOtherMap())
);
Document documentToInsertUpdate = new Document();
documentToInsertUpdate.put("important", newObj.getInfo());
documentToInsertUpdate.put("name", newObj.getName());
documentToInsertUpdate.put("surname", newObj.getSurname());
documentToInsertUpdate.put("other", newObj.getOtherMap());
Bson update = new Document("$set", documentToInsertUpdate);
UpdateOptions options = new UpdateOptions().upsert(true);
collection.updateOne(filter, update, options, getSingleResultCallback(new ArrayList<Translations>()));
there seems to be a problem when i try to set the "other" section of the json in the filter and the document, it doesnt work. If i remove the "other" sections from the filter and documentToInsertUpdate it works fine.
How can i work with the second part i.e
"other": {"name":"fred", "surname" : "flint"}

Related

Insert JSON to Hadoop using Spark (Java)

I'm very new in Hadoop,
I'm using Spark with Java.
I have dynamic JSON, exmaple:
{
"sourceCode":"1234",
"uuid":"df123-....",
"title":"my title"
}{
"myMetaDataEvent": {
"date":"10/10/2010",
},
"myDataEvent": {
"field1": {
"field1Format":"fieldFormat",
"type":"Text",
"value":"field text"
}
}
}
Sometimes I can see only field1 and sometimes I can see field1...field50
And maybe the user can add fields/remove fields from this JSON.
I want to insert this dynamic JSON to hadoop (to hive table) from Spark Java code,
How can I do it?
I want that the user can after make HIVE query, i.e: select * from MyTable where type="Text
I have around 100B JSON records per day that I need to insert to Hadoop,
So what is the recommanded way to do that?
*I'm looked on the following: SO Question but this is known JSON scheme where it isnt my case.
Thanks
I had encountered kind of similar problem, I was able to resolve my problem using this. ( So this might help if you create the schema before you parse the json ).
For a field having a string data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.StringType, true);
For a field having a int data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.IntegerType, true);
After you have added all the fields in a List<StructField>,
Eg:-
List<StructField> innerField = new ArrayList<StructField>();
.... Field adding logic ....
Eg:-
innerField.add(field1);
innerField.add(field2);
// One instance can come, or multiple instance of value comes in an array, then it needs to be put in Array Type.
ArrayType getArrayInnerType = DataTypes.createArrayType(DataTypes.createStructType(innerField));
StructField getArrayField = DataTypes.createStructField(<name of field>, getArrayInnerType,true);
You can then create the schema :-
StructType structuredSchema = DataTypes.createStructType(getArrayField);
Then I read the json using the schema generated using the Dataset API.
Dataset<Row> dataRead = sqlContext.read().schema(structuredSchema).json(fileName);

Create a dynamically populated table from JSON API data

I'm trying to populate a table with the output of several API sources having extracted several identical variables from each source. So far I have most of the pieces I need worked out, such as how to retrieve the data, create the table and import the data in order to create the table using an existing framework. What I don't have is a means of formatting the API data to meet the framework's requirement such as..
Required format:
[
{id:1, name:"Billy Bob", age:"12", gender:"male", height:1, col:"red", dob:"", cheese:1},
{id:2, name:"Mary May", age:"1", gender:"female", height:2, col:"blue", dob:"14/05/1982", cheese:true},
{id:3, name:"Christine Lobowski", age:"42", height:0, col:"green", dob:"22/05/1982", cheese:"true"},
]
Create a table from the formatted data...
<div id="example-table"></div>
$("#example-table").tablear({
columns:[
{title:"Name", field:"name"},
{title:"Age", field:"age"},
{title:"Gender", field:"gender"},
{title:"Height", field:"height"},
{title:"Favourite Color", field:"col"},
{title:"Date Of Birth", field:"dob"},
{title:"Cheese Preference", field:"cheese"},
],
});
Whenever you get json as response you need to parse the json and then play with that.
Example:
if(response)
{
var myObj = parse.JSON(response) // this will give you the javascript
//object
}
There is also another option of using jquery datatable using server side processing. This automatically creates a table for you.
Jquery Datatable

Azure tables unable to store flattened JSON

I am using the npm flat package, and arrays/objects are flattened, but object/array keys are surrounded by '' , like in 'task_status.0.data' using the object below.
These specific fields do not get stored into AzureTables - other fields go through, but these are silently ignored. How would I fix this?
var obj1 = {
"studentId": "abc",
"task_status": [
{
"status":"Current",
"date":516760078
},
{
"status":"Late",
"date":1516414446
}
],
"student_plan": "n"
}
Here is how I am using it - simplified code example: Again, it successfully gets written to the table, but does not write the properties that were flattened (see further below):
var flatten = require('flat')
newObj1 = flatten(obj1);
var entGen = azure.TableUtilities.entityGenerator;
newObj1.PartitionKey = entGen.String(uniqueIDFromMyDB);
newObj1.RowKey = entGen.String(uniqueStudentId);
tableService.insertEntity(myTableName, newObj1, myCallbackFunc);
In the above example, the flattened object would look like:
var obj1 = {
studentId: "abc",
'task_status.0.status': 'Current',
'task_status.0.date': 516760078,
'task_status.1.status': 'Late',
'task_status.1.date': 516760078,
student_plan: "n"
}
Then I would add PartitionKey and RowKey.
all the task_status fields would silently fail to be inserted.
EDIT: This does not have anything to do with the actual flattening process - I just checked a perfectly good JSON object, with keys that had 'x.y.z' in it, i.e. AzureTables doesn't seem to accept these column names....which almost completely destroys the value proposition of storing schema-less data, without significant rework.
. in column name is not supported. You can use a custom delimiter to flatten your objects instead.
For example:
newObj1 = flatten(obj1, {delimiter: '__'});

Parsing Google Custom Search API for Elasticsearch Documents

After retrieving results from the Google Custom Search API and writing it to JSON, I want to parse that JSON to make valid Elasticsearch documents. You can configure a parent - child relationship for nested results. However, this relationship seems to not be inferred by the data structure itself. I've tried automatically loading, but not results.
Below is some example input that doesn't include things like id or index. I'm trying to focus on creating the correct data structure. I've tried modifying graph algorithms like depth-first-search but am running into problems with the different data structures.
Here's some example input:
# mock data structure
google = {"content": "foo",
"results": {"result_one": {"persona": "phone",
"personb": "phone",
"personc": "phone"
},
"result_two": ["thing1",
"thing2",
"thing3"
],
"result_three": "none"
},
"query": ["Taylor Swift", "Bob Dole", "Rocketman"]
}
# correctly formatted documents for _source of elasticsearch entry
correct_documents = [
{"content":"foo"},
{"results": ["result_one", "result_two", "result_three"]},
{"result_one": ["persona", "personb", "personc"]},
{"persona": "phone"},
{"personb": "phone"},
{"personc": "phone"},
{"result_two":["thing1","thing2","thing3"]},
{"result_three": "none"},
{"query": ["Taylor Swift", "Bob Dole", "Rocketman"]}
]
Here is my current approach this is still a work in progress:
def recursive_dfs(graph, start, path=[]):
'''recursive depth first search from start'''
path=path+[start]
for node in graph[start]:
if not node in path:
path=recursive_dfs(graph, node, path)
return path
def branching(google):
""" Get branches as a starting point for dfs"""
branch = 0
while branch < len(google):
if google[google.keys()[branch]] is dict:
#recursive_dfs(google, google[google.keys()[branch]])
pass
else:
print("branch {}: result {}\n".format(branch, google[google.keys()[branch]]))
branch += 1
branching(google)
You can see that recursive_dfs() still needs to be modified to handle string, and list data structures.
I'll keep going at this but if you have thoughts, suggestions, or solutions then I would very much appreciate it. Thanks for your time.
here is a possible answer to your problem.
def myfunk( inHole, outHole):
for keys in inHole.keys():
is_list = isinstance(inHole[keys],list);
is_dict = isinstance(inHole[keys],dict);
if is_list:
element = inHole[keys];
new_element = {keys:element};
outHole.append(new_element);
if is_dict:
element = inHole[keys].keys();
new_element = {keys:element};
outHole.append(new_element);
myfunk(inHole[keys], outHole);
if not(is_list or is_dict):
new_element = {keys:inHole[keys]};
outHole.append(new_element);
return outHole.sort();

MongoDB - Dynamically update an object in nested array

I have a document like this:
{
Name : val
AnArray : [
{
Time : SomeTime
},
{
Time : AnotherTime
}
...arbitrary more elements
}
I need to update "Time" to a Date type (right now it is string)
I would like to do something psudo like:
foreach record in document.AnArray { record.Time = new Date(record.Time) }
I've read the documentation on $ and "dot" notation as well as a several similar questions here, I tried this code:
db.collection.update({_id:doc._id},{$set : {AnArray.$.Time : new Date(AnArray.$.Time)}});
And hoping that $ would iterate the indexes of the "AnArray" property as I don't know for each record the length of it. But am getting the error:
SyntaxError: missing : after property id (shell):1
How can I perform an update on each member of the arrays nested values with a dynamic value?
There's no direct way to do that, because MongoDB doesn't support an update-expression that references the document. Moreover, the $ operator only applies to the first match, so you'd have to perform this as long as there are still fields where AnArray.Time is of $type string.
You can, however, perform that update client side, in your favorite language or in the mongo console using JavaScript:
db.collection.find({}).forEach(function (doc) {
for(var i in doc.AnArray)
{
doc.AnArray[i].Time = new Date(doc.AnArray[i].Time);
}
db.outcollection.save(doc);
})
Note that this will store the migrated data in a different collection. You can also update the collection in-place by replacing outcollection with collection.