Selector query comparing two fields in Cloudant/CouchDB/Mango

I have a CouchDB database which uses the Mango query language, which appears to be the same as Cloudant's query language.
I'm trying to compare two fields against each other and return results only when they're equal.
For example:
{
  "_id": "ACCEPT0",
  "_rev": "1-92ea4e727271aefd0a2befed0d4bb736",
  "OfferID": "OFFER0"
}
{
  "_id": "ACCEPT1",
  "_rev": "3-986ca6e717b225ac909d644de54d5f7d",
  "OfferID": "OFFER3"
}
{
  "_id": "OFFER0",
  "_rev": "1-2af5f5c7b1c59dd3f0997f748a367cb2",
  "From": "merchant1",
  "To": "customer1"
}
{
  "_id": "OFFER1",
  "_rev": "6-f0927c5d4f9fd8a2d2b602f1c265d6d5",
  "From": "merchant1",
  "To": "customer2"
}
I'm trying to come up with a query which will, in this example, return "OFFER0", since "OFFER0" appears in an "OfferID" field.
EDIT (clarification): The query needs to be able to select all the _id's which begin with OFFER and which exist in an OfferID field.
I know I can set this up with a view (as seen in: Cloudant query to return records where 2 fields are equal), but I need this in a Mango query, as it'll be running over Hyperledger.

You can easily return documents whose _id field starts with "OFFER" with the following query:
{
  "selector": {
    "_id": {
      "$regex": "^OFFER"
    }
  }
}
but this is likely to be inefficient, because Cloudant has to scan the whole database, testing each document's _id field against that regular expression.
A better way to design your data may be to have a type field which distinguishes between the document types in your database e.g.
{
  "_id": "OFFER0",
  "_rev": "1-2af5f5c7b1c59dd3f0997f748a367cb2",
  "type": "offer",
  "From": "merchant1",
  "To": "customer1"
}
and then a query to return all documents where type = 'offer' becomes:
{
  "selector": {
    "type": "offer"
  }
}
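If that query needs to be efficient, it can be backed by a JSON index on the type field. As a minimal sketch (in plain CouchDB/Cloudant you would POST this body to the database's /_index endpoint; the index name here is an arbitrary choice, and Hyperledger deployments have their own mechanism for shipping CouchDB indexes alongside chaincode):

{
  "index": {
    "fields": ["type"]
  },
  "name": "type-index",
  "type": "json"
}

With such an index in place, the selector above is served from the index instead of a full database scan.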
I don't fully understand the part of the question where you say "which exist in an OfferID field", but it's important to note that Cloudant Query & Mango can only query single documents: you can't say "get me all the documents which are offers, where another document has a certain property". Include all the data you need in each document and then you'll be able to query it cleanly.

Related

JSON query matching

For the given input JSON:
{
  "person": {
    "name": "John",
    "age": 25
  },
  "status": {
    "title": "assigned",
    "type": 3
  }
}
I need to build a string query that I could use to answer whether the given JSON matches it or not. For example: does the given person's name equal "John", is his age in the range 20..30, and is his status not 4?
I need the query to be presented as a string, with a commonly known library that can run it, and I need it on multiple platforms (iOS, Android, Xamarin). I've tried JSON Path and JSON Schema, but couldn't really figure out whether they can achieve this. JSON Path seems to be focused on finding a single value in the JSON by a certain condition, and JSON Schema mostly checks the data structure and types.
Ok, found the solution. If I format the whole input object as a single-element array, like this:
[
  {
    "person": {
      "name": "John",
      "age": 25
    },
    "status": {
      "title": "assigned",
      "type": 3
    }
  }
]
That will allow me to use JSON Path filter expressions:
$[?(@.person.name == 'John' && @.person.age >= 20 && @.status.type != 4)]
Basically, if the object doesn't match, the result is simply empty.
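To evaluate such an expression from C#/Xamarin, one option is Json.NET (Newtonsoft.Json), whose SelectTokens method accepts JSON Path filter expressions. A minimal sketch, assuming the wrapped single-element array from above is held in a string variable wrappedJson (a hypothetical name):

using System.Linq;
using Newtonsoft.Json.Linq;

// Parse the single-element array wrapper and apply the JSON Path filter;
// an empty result means the object does not match the query.
JArray root = JArray.Parse(wrappedJson);
bool matches = root
    .SelectTokens("$[?(@.person.name == 'John' && @.person.age >= 20 && @.status.type != 4)]")
    .Any();

Since the query itself is just a string, the same expression can be handed to a JSON Path library on each target platform.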

Pentaho Kettle: How to dynamically fetch JSON file columns

Background: I work for a company that basically sells passes. Every order that is placed by the customer will contain N passes.
Issue: I have these JSON event-transaction files coming into an S3 bucket on a daily basis from DocumentDB (MongoDB). Each JSON file is associated with the relevant type of event (insert, modify or delete) for every document key (which is an order in my case). The example below illustrates an "insert" type of event that came through to the S3 bucket:
{
  "_id": {
    "_data": "11111111111111"
  },
  "operationType": "insert",
  "clusterTime": {
    "$timestamp": {
      "t": 11111111,
      "i": 1
    }
  },
  "ns": {
    "db": "abc",
    "coll": "abc"
  },
  "documentKey": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    }
  },
  "fullDocument": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    },
    "orderNumber": "1234567",
    "externalOrderId": "12345678",
    "orderDateTime": "2020-09-11T08:06:26Z[UTC]",
    "attraction": "abc",
    "entryDate": {
      "$date": "2020-09-13"
    },
    "entryTime": {
      "$date": "04000000"
    },
    "requestId": "abc",
    "ticketUrl": "abc",
    "tickets": [
      {
        "passId": "1111111",
        "externalTicketId": "1234567"
      },
      {
        "passId": "222222222",
        "externalTicketId": "122442492"
      }
    ],
    "_class": "abc"
  }
}
As we see above, every JSON file might contain N passes, and every pass is, in turn, associated with an external ticket id, which is a different column (as seen above). I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and the Row Normalizer step, which could transpose "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into one unique column "Pass", and I would have to apply similar logic to the other column, "External ticket id". The problem with that approach is that it is quite static: I need to "tell" Pentaho in the Json input step, in advance, how many passes are coming. But what if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want a tabular output like
TicketUrl   Pass            ExternalTicketID
---------   -------------   ----------------
abc         PassID1Value1   ExTicketIDvalue1
abc         PassID1Value2   ExTicketIDvalue2
abc         PassID1Value3   ExTicketIDvalue3
and make the incoming values dynamic based on the JSON input file values, then you can download this transformation: Updated Link.
I found everything works dynamically in the Json input step.
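For reference, a hedged sketch of how the Json input step fields might be configured for the file above (the [*] wildcard makes the step emit one row per array element, so the number of passes no longer matters; if your Kettle version complains about mixing the scalar ticketUrl path with the array paths, read the scalar in a separate Json input step and join the streams):

Name               Path
----               ----
ticketUrl          $.fullDocument.ticketUrl
passId             $.fullDocument.tickets[*].passId
externalTicketId   $.fullDocument.tickets[*].externalTicketId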

Reading Inconsistent Nested JSON in Athena

In Athena, I am reading some nested JSON files into a table. The field that actually contains the nested JSON has an inconsistent number of fields within it across the different files in the raw data.
Sometimes the data looks something like this:
{
  "id": "9f1e07b4",
  "date": "05/20/2018 02:30:53.110 AM",
  "data": {
    "a": "asd",
    "b": "adf",
    "body": {
      "sid": {
        "uif": "yes",
        "sidd": "no",
        "state": "idle"
      }
    },
    "category": "scene"
  }
}
Other times the data looks something like this:
{
  "id": "9f1e07b4",
  "date": "05/20/2018 02:30:45.436 AM",
  "data": {
    "a": "event",
    "b": "state",
    "body": {
      "persona": {
        "one": {
          "movement": "idle"
        }
      }
    },
    "category": "scene"
  }
}
Other times the "body" field contains both the "sid" struct and the "persona" struct.
As you can see, the fields given within "body" are not always consistent. I tried to add all of the possible fields and their structures within my CREATE EXTERNAL TABLE query. However, the "data" column that contains the "body" field still does not get populated and remains blank when I "preview table" in Athena.
In the CREATE TABLE DDL, is there a way to indicate that I want to fill all of columns that aren't present in the nested JSON of each file with null values?
Furthermore, the 'names' given to the fields in the query do not have to correspond to the key values in the raw JSON. It seems Athena is simply reading the structure and nothing else. Is there a way to indicate which JSON key corresponds to which Athena field name directly? So that if some fields are missing from the "body" of one file, Athena can know which one is missing and fill it in as null?
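For what it's worth, a hedged sketch of a DDL that declares both possible structs side by side (assuming the OpenX JSON SerDe, which maps columns to JSON keys by name, case-insensitively, and returns NULL for struct fields absent from a given file; the S3 location is a placeholder):

CREATE EXTERNAL TABLE events (
  id string,
  `date` string,
  data struct<
    a: string,
    b: string,
    body: struct<
      sid: struct<uif: string, sidd: string, state: string>,
      persona: struct<one: struct<movement: string>>
    >,
    category: string
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://example-bucket/path/';

With a declaration like this, a record containing only "sid" simply yields data.body.persona = NULL, and vice versa.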

Update MongoDB document using JSON

Is there a way to update a complex MongoDB document from C# using JSON? For example, suppose I have the following document:
{
  "name": "John Smith",
  "age": 35,
  "readingList": [
    {
      "title": "Title1",
      "ISBN": 6246246426724,
      "author": {
        "name": "James Johnson",
        "age": 40
      }
    },
    {
      "title": "Title2",
      "ISBN": 3513531513551,
      "author": {
        "name": "Sam Hill",
        "age": 20
      }
    }
  ]
}
Now I want to update the age of the second book's author (Sam Hill) from 20 to 21. Suppose I have the following JSON representation:
{
  "readingList": [
    {
      "title": "Title2",
      "author": {
        "age": 21
      }
    }
  ]
}
Basically, the second JSON string is like the first one, minus all the fields and array elements that don't change, except that each array element being looked at keeps one field that uniquely identifies it. In this case the "age" field is included since it is being updated with the given value, and the "title" field is given to locate the right array element while searching for the field to update. There may also be even more subdocuments and arrays to go through, and the format is not static (it may change at a later time). This is just a simplified example.
Is it possible to pass in something like this to some function and update the correct field that way? Is there something at least similar to this, so I can just pass in some JSON to do the update?
The reason I am looking to do it this way, rather than through simpler means, is because I want to keep track of a history of changes to documents, and if I want to backtrack to an earlier version, I want an easy way to do so that can handle this level of complexity.
UPDATE:
I have some clarifications to make. In this particular scenario I have no way to predict what kinds of changes would need to be made. A change could be made to any field at any time, and that field could be anywhere in the document, possibly at the top level, or within multiple nested subdocuments/arrays. The data we're dealing with is for a separate party that may use it and modify it at will, so we have no control over what they choose to do with it. In addition, there is no fixed schema. The other party could add new fields, including new subdocuments or arrays, or delete them.
The reason I'm asking this question is because I would like to store a history of changes to documents in such a way that I could revert to an older snapshot of the document by applying the changes in reverse. In this case, changing the age from 20 to 21 would revert the document to an older state (assuming that someone messed with the age beforehand and made it 20, and I wanted to fix it back to 21). Since somebody could make any change they wanted to the system, including to the underlying structure of the data itself, I can't just come up with my own schema, or hardcode a solution that changes specific fields using this specific schema.
In this example, the change in age from 20 to 21 would be from a record in the history whose structure I couldn't predict beforehand. So I am looking for an efficient solution to apply an unpredictable update to a document given a simplified JSON representation of the change to be made.
I am also open to alternatives that don't involve JSON if they are fairly efficient. I brought up JSON because I figured that, given MongoDB's usage of JSON to structure documents, it would make the most sense, and perhaps be superior to something like string manipulation. Another alternative I considered would involve storing the change using some kind of custom dot notation, like this: readingList[ISBN:3513531513551].author.age=21
This would require me to create a custom function to interpret the string and turn it into something useful though, so it doesn't sound like the best solution.
Hi friend, I used the JSON document below:
{
  "_id" : ObjectId("56a99c121f25cc3a3c709151"),
  "name" : "John Smith",
  "age" : 35,
  "readingList" : [
    {
      "title" : "Title1",
      "ISBN" : NumberLong(6246246426724),
      "author" : {
        "name" : "James Johnson",
        "age" : 40
      }
    },
    {
      "title" : "Title2",
      "ISBN" : NumberLong(3513531513551),
      "author" : {
        "name" : "Sam Hill",
        "age" : "25"
      }
    }
  ]
}
I just used a condition that the author name is "Sam Hill", executed the query below in C#, and it works.
IMongoQuery query = Query.And(
    Query.EQ("name", "John Smith"),
    Query.EQ("readingList.author.name", "Sam Hill"));
var result = collection.Update(query,
    MongoDB.Driver.Builders.Update.Set("readingList.$.author.age", "21"));
You can query your main document. Let's assume your main collection is named "books", with this structure:
{
  "id": "123",
  "name": "John Smith",
  "age": 35,
  "readingList": [
    {
      "title": "Title1",
      "ISBN": 6246246426724,
      "author": {
        "name": "James Johnson",
        "age": 40
      }
    },
    {
      "title": "Title2",
      "ISBN": 3513531513551,
      "author": {
        "name": "Sam Hill",
        "age": 20
      }
    }
  ]
}
// You need a query that returns the main document (by id, for example).
// Once you have the main document, find the element you want to modify in
// the readingList, assign it to a variable (say, readItem), and make your
// changes. Then update only that element in the array, using Set with the
// positional operator "$":
readItem.title = "some new title";
readItem.author.age++;
var update = MongoDB.Driver.Builders.Update.Set("readingList.$", BsonDocumentWrapper.Create(readItem));
collection.Update(query, update);
Actually I would not advise you to choose this kind of data model because in my experience it will get pretty messy. Still, you might have some very specific requirements which might force you to have this and only this data model.
I would create two collections: persons and readinglists.
persons would look like:
{
  "id": "123",
  "name": "John Smith",
  "age": 35
}
and readinglists would look like (note that it has a compound natural id):
{
  "_id": { "personid": "123", "title": "Title1" },
  "ISBN": 6246246426724,
  "author": {
    "name": "James Johnson",
    "age": 40
  }
}
Then you can easily update the readinglist:
var query = Query.EQ("_id", new BsonDocument {
    { "personid", "123" },
    { "title", "Title1" }
});
readingListCollection.Update(query, Update.Set("author.age", 22));
In your data model you need to know the array index of the second document. It is better to model the readingList attribute as a map. In the following example I used the ISBN as the map key:
{
  "id": "123",
  "name": "John Smith",
  "age": 35,
  "readingList": {
    "6246246426724": {
      "title": "Title1",
      "ISBN": 6246246426724,
      "author": {
        "name": "James Johnson",
        "age": 40
      }
    },
    "3513531513551": {
      "title": "Title2",
      "ISBN": 3513531513551,
      "author": {
        "name": "Sam Hill",
        "age": 20
      }
    }
  }
}
In this data model you can access the second book directly, for instance with dot notation:
db.authors.update(
  { id: "123" },
  { $set: { "readingList.3513531513551.author.age": 22 } }
)
Unfortunately I don't know the C# notation for that, but it should be straightforward.
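For completeness, a minimal C# sketch of the same dot-notation update with the legacy driver used in the other answers (the collection variable and its setup are assumed here, not shown in the original answer):

// Match the parent document by its id field, then set the nested age
// directly through the map key (the ISBN); no positional operator needed.
var query = Query.EQ("id", "123");
var update = MongoDB.Driver.Builders.Update.Set("readingList.3513531513551.author.age", 22);
collection.Update(query, update);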

Identifying Duplicates in CouchDB

I'm new to CouchDB and document-oriented databases in general.
I've been playing around with CouchDB, and was able to get familiar with creating documents (with perl) and using the Map/Reduce functions in Futon to query the data and create views.
One of the things I'm still trying to figure out is how to identify duplicate values across documents using Futon's Map/Reduce.
For example, if I have the following documents:
{
  "_id": "123",
  "name": "carl",
  "timestamp": "2012-01-27T17:06:03Z"
}
{
  "_id": "124",
  "name": "carl",
  "timestamp": "2012-01-27T17:07:03Z"
}
And I wanted to get a list of document ids that have duplicate "name" values; is this something I could do with Futon's Map/Reduce?
The result I was hoping to achieve is as follows:
{
  "name": "carl",
  "dupes": [ "123", "124" ]
}
..or..
{
  "carl": [ "123", "124" ]
}
.. which would be the value and the associated document ids which contain those duplicate values.
I've tried a few different things with Map/Reduce, but as far as I understand, the Map function works with data on a per-document basis, and the Reduce functions only allow you to work with the keys/values from a given document.
I know I could just pull the data I need with perl, work magic there, and get the result I want, but I'm trying to work only with CouchDB for now in order to better understand its benefits / limitations.
Another way I'm thinking about doing this is to use a single document like an RDBMS table:
{
  "_id": "names",
  "rec1": {
    "_id": "123",
    "name": "carl",
    "timestamp": "2012-01-27T17:06:03Z"
  },
  "rec2": {
    "_id": "124",
    "name": "carl",
    "timestamp": "2012-01-27T17:07:03Z"
  }
}
.. which should allow me to use the Map/Reduce functions in the way I originally thought. However I'm not sure if this is ideal.
I understand that my mind is still stuck in RDBMS land, so much of what I'm trying to do above may not be necessary. Any insight on this would be much appreciated.
Thanks!
Edit: Fixed JSON syntax in some of the examples.
If you merely want a list of unique values, that's pretty easy. If you wish to identify the duplicates, then it gets less easy.
In both cases, a map function like this should suffice:
function (doc) {
  emit(doc.name);
}
For your reduce function, just enter _count.
Your view output, when queried with group=true, will look like this (based on your 2 documents):
{
  "rows": [
    { "key": "carl", "value": 2 }
  ]
}
From there, you will have a list of names as well as their frequency. You can take that list and filter it yourself, or you can take the "all couch" route and use a _list function to perform that final filtering.
function (head, req) {
  var row, duplicates = [];
  while (row = getRow()) {
    if (row.value > 1) {
      duplicates.push(row);
    }
  }
  send(JSON.stringify(duplicates));
}
Read up about _list functions, they're pretty handy and versatile.
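As a hypothetical usage example, if the view and list function above were saved in a design document named dupes, with the view named by_name and the list function named duplicates (all three names are placeholder choices), the filtered result would come from:

GET /db/_design/dupes/_list/duplicates/by_name?group=true

Note that group=true is what makes the _count reduce produce one row per distinct name rather than a single grand total.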