Format for storing JSON in Amazon DynamoDB

I've got a JSON file that looks like this:
{
"alliance":{
"name_part_1":[
"Ab",
"Aen",
"Zancl"
],
"name_part_2":[
"aca",
"acia",
"ythrae",
"ytos"
],
"name_part_3":[
"Alliance",
"Bond"
]
}
}
I want to store it in DynamoDB.
The thing is that I want a generator that takes random elements from fields like name_part_1, name_part_2 and others (the number of name_part_x fields is unlimited, and the overall number of items in each part might be several hundred) and joins them to create a complete word, like
name_part_1[1] + name_part_2[10] + name_part_3[3]
My question is: what format should I use to do that effectively? Or shouldn't NoSQL be used for that? Should I refactor the JSON into something like
{
"name": "alliance",
"parts": [ "name_part_1", "name_part_2", "name_part_3" ],
"values": [
{ "name_part_1" : [ "Ab ... ] }, { "name_part_2": [ "aca" ... ] }
]
}

This is a perfect use case for DynamoDB.
You can structure it like this:
NameParts (table)
namepart (partition key)
range (sort key)
Example item: namepart: name_part_1, range: 0, value: "Ab"
This way each name_part has its own range of indexes and scales independently. You can extend it to thousands or even millions of entries.
You can do a BatchGetItem call from the SDK of your choice and join the returned values.
REST API reference:
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html
Hope it helps.
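As a rough illustration of the batch-get-and-join step described above, here is a minimal Python sketch. It assumes the caller already knows how many entries each name part has (the `part_sizes` mapping and both helper names are hypothetical), and it only builds the keys and joins the results. The real request would go through an SDK; the boto3 call in the comment is an assumption about how that might look and is not executed here.

```python
import random

def build_keys(part_sizes, rng=random):
    """Pick one random index per name part and return it as a key dict."""
    return [
        {"namepart": part, "range": rng.randrange(size)}
        for part, size in part_sizes.items()
    ]

def join_name(items, part_order):
    """Join the fetched values in part order.

    BatchGetItem does not return items in any particular order, so the
    items are looked up by their partition key before joining.
    """
    by_part = {item["namepart"]: item["value"] for item in items}
    return "".join(by_part[part] for part in part_order)

# Hypothetical sizes, taken from the example JSON in the question.
part_sizes = {"name_part_1": 3, "name_part_2": 4, "name_part_3": 2}
keys = build_keys(part_sizes)

# With boto3 (assumed, not run here) the keys could be sent roughly as:
#   dynamodb = boto3.resource("dynamodb")
#   resp = dynamodb.batch_get_item(RequestItems={"NameParts": {"Keys": keys}})
#   items = resp["Responses"]["NameParts"]
# Stand-in for such a response, deliberately out of order:
items = [
    {"namepart": "name_part_3", "range": 1, "value": "Bond"},
    {"namepart": "name_part_1", "range": 0, "value": "Ab"},
    {"namepart": "name_part_2", "range": 0, "value": "aca"},
]
name = join_name(items, list(part_sizes))
print(name)  # AbacaBond
```

Keeping the per-part sizes somewhere cheap to read (for example in a small metadata item) is one way to avoid scanning the table just to learn the valid index range; that choice is left open here.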

You can just put the whole document into DynamoDB as it is and then use a document path to access the elements you want. From the DynamoDB documentation:
Document Paths
In an expression, you use a document path to tell DynamoDB where to find an attribute. For a top-level attribute, the document path is simply the attribute name. For a nested attribute, you construct the document path using dereference operators.
The following are some examples of document paths. (Refer to the item shown in Specifying Item Attributes.)
A top-level scalar attribute: ProductDescription
A top-level list attribute (this will return the entire list, not just some of the elements): RelatedItems
The third element from the RelatedItems list (remember that list elements are zero-based): RelatedItems[2]
The front-view picture of the product: Pictures.FrontView
All of the five-star reviews: ProductReviews.FiveStar
The first of the five-star reviews: ProductReviews.FiveStar[0]
Note: The maximum depth for a document path is 32. Therefore, the number of dereference operators in a path cannot exceed this limit.
Note that each document requires a unique Partition Key.
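To make the document-path idea concrete for the name generator, here is a small Python sketch that builds a projection expression selecting one random element from each list. It assumes the whole JSON is stored as a single item with a top-level map attribute `alliance`, and that the list lengths are known up front; `random_projection` and `part_sizes` are hypothetical names used only for illustration.

```python
import random

def random_projection(root, part_sizes, rng=random):
    """Build a projection expression with one random list element per part."""
    paths = [
        f"{root}.{part}[{rng.randrange(size)}]"
        for part, size in part_sizes.items()
    ]
    return ", ".join(paths)

# Hypothetical sizes, taken from the example JSON in the question.
part_sizes = {"name_part_1": 3, "name_part_2": 4, "name_part_3": 2}
expression = random_projection("alliance", part_sizes)
print(expression)  # e.g. alliance.name_part_1[1], alliance.name_part_2[3], ...
```

The resulting string would be passed as the ProjectionExpression of a GetItem request. If an attribute name happens to be a DynamoDB reserved word or contains special characters, it would need an expression attribute name placeholder instead of being written inline.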

Related

How can I query for multiple values after a wildcard?

I have a json object like so:
{
_id: "12345",
identifier: [
{
value: "1",
system: "system1",
text: "text!"
},
{
value: "2",
system: "system1"
}
]
}
How can I use the XDevAPI SearchConditionStr to look for the specific combination of value + system in the identifier array? Something like this, but this doesn't seem to work...
collection.find("'${identifier.value}' IN identifier[*].value && '${identifier.system}' IN identifier[*].system")
By using the IN operator, what happens underneath the covers is basically a call to JSON_CONTAINS().
So, if you call:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system1')
.execute()
What gets executed, in the end, is (simplified):
JSON_CONTAINS('["1", "2"]', '"1"') AND JSON_CONTAINS('["system1", "system1"]', '"system1"')
In this case, both those conditions are true, and the document will be returned.
The atomic unit is the document (not a slice of that document). So, in your case, regardless of the value of value and/or system, you are still looking for the same document (the one whose _id is '12345'). With such a statement, the document is returned if all search values are part of it, and it is not returned if any one of them is missing.
For instance, the following would not yield any results:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system2')
.execute()
EDIT: Potential workaround
I don't think the CRUD API allows this kind of "cherry-picking", but you can always use SQL. In that case, one strategy that comes to mind is to use JSON_SEARCH() to retrieve an array of paths (i.e. the array indexes) for each matching value in the scope of identifier[*].value and identifier[*].system, and then use JSON_OVERLAPS() to check that the two sets of paths share at least one index.
session.sql(`select * from collection WHERE json_overlaps(json_search(json_extract(doc, '$.identifier[*].value'), 'all', ?), json_search(json_extract(doc, '$.identifier[*].system'), 'all', ?))`)
.bind('2', 'system1')
.execute()
In this case, the result set will only include documents where the identifier array contains at least one JSON object element where value is equal to '2' and system is equal to 'system1'. The filter is effectively applied over individual array items and not in aggregate, as it is with a basic IN operation.
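The difference between the two kinds of matching can be illustrated outside the database. The following Python sketch is only a model of the semantics described above, not connector code: `matches_in_aggregate` mimics the pair of IN conditions, `matches_per_element` mimics the JSON_SEARCH() plus JSON_OVERLAPS() approach, and `mixed_doc` is a made-up document added to show a case where the two disagree.

```python
def matches_in_aggregate(doc, value, system):
    """Model of `:v IN identifier[*].value && :s IN identifier[*].system`."""
    identifiers = doc["identifier"]
    values = [item.get("value") for item in identifiers]
    systems = [item.get("system") for item in identifiers]
    return value in values and system in systems

def matches_per_element(doc, value, system):
    """Model of the workaround: both fields must match in the same element."""
    return any(
        item.get("value") == value and item.get("system") == system
        for item in doc["identifier"]
    )

# Hypothetical document with two different systems.
mixed_doc = {
    "_id": "67890",
    "identifier": [
        {"value": "1", "system": "system1"},
        {"value": "2", "system": "system2"},
    ],
}

# "1" and "system2" both occur somewhere in the array, but never together.
print(matches_in_aggregate(mixed_doc, "1", "system2"))  # True
print(matches_per_element(mixed_doc, "1", "system2"))   # False
print(matches_per_element(mixed_doc, "2", "system2"))   # True
```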
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js

Is it possible to get a sorted list of attribute values from a JSON array using JSONPath

Given JSON like:
[
{
"Serial no": 994
},
{
"Serial no": 456
}
]
I know this query will give me an array of all Serial no values, in the order they are in the JSON: $..['Serial no']
I'm not sure exactly what sorting capabilities JSONPath has, but I think you can use / and \ to sort. How would they be used to modify my query string in this case? I am only interested in doing this in pure JSONPath, not JS or post-query sorting. That part is easy; I just want to know if I can avoid it.
This is a source I found suggesting sorting is supported but it might be product-specific?
I'm using http://www.jsonquerytool.com/ to test this

How to create nodes with variable labels in cypher?

I am using JSON APOC plugin to create nodes from a JSON with lists in it, and I am trying to create nodes whose label is listed as an element in the list:
{
"pdf":[
{
"docID": "docid1",
"docLink": "/examplelink.pdf",
"docType": "PDF"
}
],
"jpeg":[
{
"docID": "docid20",
"docLink": "/examplelink20.pdf",
"docType": "JPEG"
}
],
...,}
And I want to both iterate through the doctypes (pdf, jpeg) and set the label as the docType property in the list. Right now I have to do separate blocks for each doctype list (jpeg: [], pdf:[]):
WITH "file:////input.json" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.pdf as doc
MERGE (d:PDF {docID: doc.docID})
I'd like to loop through the doctype lists, creating the node for each doctype with the label as either the list name (pdf) or the node's docType name (PDF). Something like:
WITH "file:////input.json" AS url
CALL apoc.load.json(url) YIELD value
for each doctypelist in value
for each doc in doctype list
MERGE(d:doc.docType {docID: doc.docID})
Or
WITH "file:////input.json" AS url
CALL apoc.load.json(url) YIELD value
for each doctypelist in value
for each doc in doctype list
MERGE(d {docID: doc.docID})
ON CREATE SET d :doc.docType
Cypher currently does not support this. To set a label, you must hardcode it into the Cypher. You could use filters or multiple matches to do this in a tedious way, but if you aren't allowed to install any plug-ins in your Neo4j database, I would recommend either putting an index on the type property or using a node plus relationship instead of the label. (There are a lot of valid doc types, so if you have to support them all, pure Cypher will make it very painful.)
With APOC, however, there is a procedure specifically for this: apoc.create.addLabels
CREATE (:Movie {title: 'A Few Good Men', genre: 'Drama'});
MATCH (n:Movie)
CALL apoc.create.addLabels( id(n), [ n.genre ] ) YIELD node
REMOVE node.genre
RETURN node;
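If APOC is not available and the import is driven from client code, a different workaround (not the one shown above) is to generate one hardcoded statement per label before sending anything to the database. The Python sketch below assumes the JSON has the shape from the question, with every top-level value being a list of documents that carry a docType field; `statements_by_label` is a hypothetical helper, and the label check exists because the label is spliced into the query text rather than passed as a parameter.

```python
import re
from collections import defaultdict

# Only accept simple identifiers, since the label becomes part of the query text.
LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def statements_by_label(value):
    """Group documents by docType and build one MERGE statement per label."""
    grouped = defaultdict(list)
    for docs in value.values():
        for doc in docs:
            label = doc["docType"]
            if not LABEL_RE.match(label):
                raise ValueError(f"unsafe label: {label!r}")
            grouped[label].append(doc)
    return [
        (f"UNWIND $docs AS doc MERGE (d:{label} {{docID: doc.docID}})", {"docs": docs})
        for label, docs in grouped.items()
    ]

value = {
    "pdf": [{"docID": "docid1", "docLink": "/examplelink.pdf", "docType": "PDF"}],
    "jpeg": [{"docID": "docid20", "docLink": "/examplelink20.pdf", "docType": "JPEG"}],
}
for query, params in statements_by_label(value):
    print(query, params)
```

Each (query, params) pair could then be run through whichever Neo4j driver is in use, with the documents passed as the $docs parameter.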

How to query documents which are arrays

I'm a novice Couchbase user and I have a bucket which I've created that contains documents which are actually arrays in the form of:
{
"key": [
{
"data1": "somedata1"
},
{
"data2": "somedata2"
}
]
}
I want to query these documents via N1QL statements and have yet to find a solution to how to do this properly. More specifically, I would like to select fields inside each sub-document that is in an array of a certain key. For example, I would like to access: key.[1].data2 or key.[0].data1
How should I do it?
Couchbase has some reserved keywords that need to be escaped. In this case, key needs to be escaped. For example, if you're querying against my_bucket, then
SELECT my_bucket.`key`[0].data1 FROM my_bucket;
should return somedata1

How to extract JSON values that do not have attribute names?

{
"A1":{
"name":"Ad hoc",
"projectId":0
},
"X2":{
"name":"BBB",
"projectId":101
},
"AB":{
"name":"CCC",
"projectId":102
},
"recordsCount":3
}
For this JSON, how can I extract the values? I need output like:
A1, Ad hoc
X2, BBB
AB, CCC
Expert input is appreciated.
Analysis
XPath can't read unnamed attributes. It will always result in an exception. If you want to get the values, you need to use JsonPath.
Solution
Even then, it makes sense to add surrounding brackets, otherwise the first level will be consumed as well:
[
{
"A1":{
"name":"Ad hoc",
"projectId":0
},
"X2":{
"name":"BBB",
"projectId":101
},
"AB":{
"name":"CCC",
"projectId":102
},
"recordsCount":3
}
]
You can try the correct query on jsonpath.com. For me (with the additional brackets) the path $.* worked.
To extract the values, you need to use a tExtractJSONFields component in Talend if using a file or REST request.
A valid JSON query for the field with the alphanumeric identifiers could simply be [0].
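For reference, the transformation the question asks for can be sketched in plain Python, outside Talend. This is only an illustration of the expected mapping from the dynamic top-level keys to their name values; it is not a tExtractJSONFields configuration. Entries whose value is not an object (here recordsCount) are skipped.

```python
import json

raw = """
{
  "A1": {"name": "Ad hoc", "projectId": 0},
  "X2": {"name": "BBB", "projectId": 101},
  "AB": {"name": "CCC", "projectId": 102},
  "recordsCount": 3
}
"""

data = json.loads(raw)

# Keep only the entries whose value is an object; the key is the identifier.
rows = [
    (key, entry["name"])
    for key, entry in data.items()
    if isinstance(entry, dict)
]

for key, name in rows:
    print(f"{key}, {name}")
```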