Is Apache Ignite suited for a NoSQL-style JSON schema?

Is JSON within a JSON supported within Apache Ignite?
Example:
{
  "stuff": {
    "onetype": [
      {"id": 1, "name": "John Doe"},
      {"id": 2, "name": "Don Joeh"}
    ],
    "othertype": {"id": 2, "company": "ACME"}
  },
  "otherstuff": {
    "thing": [[1, 42], [2, 2]]
  }
}
The goal is to be able to query on any field in the JSON. So far with Apache Ignite I have seen that by creating a class and storing objects of it, it is possible to add indexes and query the first level of key/value pairs, but I have not seen an example for nested JSON.
Is it maybe better to use MongoDB or Cassandra for this kind of need (indexing and querying any nested field within a JSON document)?

JSON is treated as a regular string when it's put into a cache.
When a JSON document has only a single level, it is possible to represent it as either a POJO or a BinaryObject, put it into a cache, and benefit from all of the querying capabilities; nested objects, however, cannot be indexed and queried properly so far.
As an option, you could use ScanQueries.
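For illustration, here is a minimal sketch of that option, assuming the JSON from the question is stored as a plain String value in a cache named "jsonCache" and Jackson is used for parsing (the cache name and the filtered field are my own choices, not anything Ignite prescribes):

import java.util.List;
import javax.cache.Cache;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ScanQuery;

public class NestedJsonScan {
    // Kept static so the scan predicate below stays serializable.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("jsonCache");

            // Full scan: parse every value and keep entries whose nested
            // stuff.othertype.company field equals "ACME".
            List<Cache.Entry<Integer, String>> hits = cache.query(
                new ScanQuery<Integer, String>((key, json) -> {
                    try {
                        return "ACME".equals(MAPPER.readTree(json)
                            .path("stuff").path("othertype").path("company").asText());
                    } catch (Exception e) {
                        return false; // skip values that are not valid JSON
                    }
                })
            ).getAll();

            hits.forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
        }
    }
}

The trade-off is that a ScanQuery walks every entry rather than using an index, so it buys flexibility over nested fields at the cost of query speed.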

Read JSON data values

For example, if the data in the Kafka topic looks like this:
{
  "header": {
    "name": "jake"
  },
  "body": {
    "Data": "!#$%&&"
  }
}
How do I read the value "!#$%&&" from my consumer application? I need to process the data once I receive it.
You'll need to consume the data using String Serde, JSON Serde, or define your own.
If you define your own, then you'd call value.getBody().getData(), like any other Java object, where value is the argument from mapValues, peek, filter, etc. in the Kafka Streams DSL.
For the others, the answer will depend on what JSON library you're using, but the answer isn't unique to Kafka, so read that library's documentation on parsing strings.
Here's one example of consuming using String Serde - https://github.com/confluentinc/kafka-streams-examples/blob/7.1.1-post/src/main/java/io/confluent/examples/streams/JsonToAvroExample.java#L118
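For the String Serde route, a rough sketch (the topic name "my-topic" and the use of Jackson are my own assumptions; the field names come from the example above):

import java.util.Properties;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

public class ReadBodyData {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Consume the raw JSON records as plain Strings.
        KStream<String, String> stream =
            builder.stream("my-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Parse each record and pull out body.Data.
        stream.mapValues(json -> {
            try {
                return MAPPER.readTree(json).path("body").path("Data").asText();
            } catch (Exception e) {
                return null; // or route unparsable records elsewhere
            }
        }).foreach((key, data) -> System.out.println("Data = " + data));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "read-body-data");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}

A JSON Serde (or a custom Serde that deserializes straight into a POJO exposing getBody().getData()) removes the manual parsing step, at the cost of writing and registering that Serde.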

How to store large JSON documents (>20MB) in MongoDB without using GridFS

I want to store a large document in MongoDB; however, these are the two ways I will interact with it:
I do frequent reads of that data and need to get a part of it using aggregations
When I need to write to the document, I will build it from scratch again, i.e. remove the existing document and insert a new one.
Here is what a sample document looks like:
{
  "objects_1": [
    {}
  ],
  "objects_2": [
    {}
  ],
  "objects_3": [
    {}
  ],
  "policy_1": [
    {}
  ],
  "policy_2": [
    {}
  ],
  "policy_3": [
    {}
  ]
}
Here is how I want to access that data:
{
  "objects_1": [
    {}
  ]
}
If I were storing it in a conventional way, I would write a query like this:
db.getCollection('configuration').aggregate([
  { $match: { _id: "FAAAAAAAAAAAA" } },
  { $project: {
      "_id": 0,
      "a_objects": {
        $filter: {
          input: "$settings.a_objects",
          as: "arrayItem",
          cond: { $eq: [ "$$arrayItem.name", "objectName" ] }
        }
      }
  }}
])
However, since the size of the document is >16 MB, we can't save it directly in MongoDB. The size can be up to 50 MB.
Solutions I thought of:
I thought of storing the JSON data in GridFS and reading it as per the docs here: https://docs.mongodb.com/manual/core/gridfs/ . However, I would then need to read the entire file every time I want to look up only one object inside the large JSON blob, and I need to do such reads frequently, on multiple large documents, which would lead to high memory usage.
I thought of splitting the JSON into parts and storing each object in its own separate collection; when I need to fetch the entire document, I can reassemble the JSON.
How should I approach this problem? Is there something obvious that I am missing here?
I think your problem is that you're not using the right tools for the job, or not using the tools you have in the way they were meant to be used.
If you want to persist large objects as JSON, then I'd argue that a database isn't a natural choice for that, especially if the objects are large. I'd be looking at storage systems designed to do that well (if your solution is on Azure/AWS/GCP, see what specialist service they offer), or even just the file system if you run on a local server.
There's no reason why you can't have the JSON in a file and related data in a database. Yes, there are issues with that, but the limitations of MongoDB won't be one of them.
I do frequent reads of that data and need to get a part of that data using aggregations
If you are doing frequent reads, and only of part of the data, then forcing your system to always read the whole record just penalizes you. One option is to store the heavily read parts in a way that doesn't incur the performance penalty of a full read.
Storing objects as JSON means you can change your program and data without having to worry about what the database looks like, which is convenient. But it also has its limitations. If you think you have hit those limitations, then now might be the time to consider a re-architecture.
I thought of splitting the JSON into parts and storing each object in its own separate collection; when I need to fetch the entire document, I can reassemble the JSON
That's definitely worth looking into. You just need to make sure that the different parts are not stored in the same table/rows, otherwise there'll be no improvement. Think carefully about how you split the objects up: think about the key scenarios the objects deal with, e.g. you mention reads. Designing the sub-objects to align with key scenarios is the way to go.
For example, if you commonly show an object's summary in a list of object summaries (e.g. search results), then the summary text, object name, and id are candidates for object data that you would split out.
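As a very rough sketch of that split with the MongoDB Java driver (the collection names, the configId field, and the connection string are all invented for illustration): each top-level section lives in its own collection keyed by the parent configuration's id, so the frequent partial reads touch only one section, and the full document is reassembled only when it is actually needed.

import java.util.ArrayList;
import java.util.List;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class SplitConfigStore {
    private static final List<String> SECTIONS = List.of(
        "objects_1", "objects_2", "objects_3", "policy_1", "policy_2", "policy_3");

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("configuration");

            // Frequent read path: only one section is loaded.
            List<Document> objects1 = db.getCollection("objects_1")
                .find(eq("configId", "FAAAAAAAAAAAA"))
                .into(new ArrayList<>());
            System.out.println("objects_1 entries: " + objects1.size());

            // Rare "whole document" path: reassemble all sections.
            Document full = new Document();
            for (String section : SECTIONS) {
                full.append(section, db.getCollection(section)
                    .find(eq("configId", "FAAAAAAAAAAAA"))
                    .into(new ArrayList<>()));
            }
            System.out.println(full.toJson());
        }
    }
}

Because every array element becomes its own small document, the 16 MB limit then applies per stored element rather than to the whole configuration, and the rebuild-from-scratch write becomes a delete-by-configId followed by fresh inserts.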

Manipulating (nested) JSON keys and their values using NiFi

I am currently facing an issue where I have to read a JSON file that mostly has the same structure, has about 10k+ lines, and is nested.
I thought about creating my own custom processor which reads the JSON and replaces several matching keys/values with the ones needed. As I am trying to use NiFi, I assume there should be a more comfortable way, since the JSON structure itself is mostly consistent.
I already tried using the ReplaceText processor as well as the JoltTransformJson processor, but I could not figure it out. How can I transform both keys and values, if needed? For example, if there is something like this:
{
  "id": "test"
},
{
  "id": "14"
}
It might be necessary to turn the key "id" into "Number" and map the value "test" to "3", as I am using different keys/values in my JSON files/database, so they need to match those. Is there a way of doing so without having to create my own processor?
Regards,
Steve

Mongodb: Store a tree as one nested document or store one document per node?

I am using MongoDB and I want to store various trees in it.
One way of storing a tree is to store each node as a document with references to its children/parent/ancestors (as mentioned here).
The other way is to store the whole tree as one document, with children as sub-documents, e.g.
tree : {
  "title" : "root",
  "children" : [
    {
      "title" : "node_1",
      "children" : [
        ...
      ]
    },
    {
      "title" : "node_2",
      "children" : [
        ...
      ]
    }
  ]
}
Question: Which way is recommended for storing trees?
Here are the operations that I want to perform on my data:
Add a node
Delete a node
Update a node
Get the json of whole tree
As I am planning to show this tree in the UI using JsTree (you can recommend a better alternative to JsTree), which expects JSON data in the nested format (way 2), I thought of storing the data the same way instead of using way 1.
If I store the JSON data in the db using way 1, then I will have to map a Java object to each document/node and manually create a tree object in Java by pointing each parent to its corresponding children, and then convert that Java tree object back to JSON to get the nested JSON.
The Java object for each node looks like:
class Node {
    private String title;
    private List<Node> children;
}
It looks like you are going to do lots of operations at different levels of nested nodes in the tree. Although MongoDB can store a structure like you describe, it is not very good at letting you update documents at many nested levels.
Therefore I would recommend that you store each node as its own document, and look carefully at where you store the parent-child relations. Remember to optimise the schema for your data operations.
I'd go with your "way 1" in this case. If you did not have to change the tree a lot, and you had, say, 1000x more read than write operations on the tree, then you could consider using "way 2" and just deal with the extra work it takes to update nodes a few levels deep.
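A rough sketch of the reassembly step for "way 1" with the MongoDB Java driver (the parentId field and the shape of the node documents are my own assumptions, not something MongoDB or JsTree prescribes): every node is stored as { _id, title, parentId }, and the nested structure for the UI is rebuilt in memory.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

import com.mongodb.client.MongoCollection;
import org.bson.Document;
import org.bson.types.ObjectId;

public class TreeAssembler {

    /** Loads every node document and rebuilds the nested tree in memory. */
    static Document buildTree(MongoCollection<Document> nodes, ObjectId rootId) {
        Map<ObjectId, Document> byId = new HashMap<>();
        Map<ObjectId, ObjectId> parentOf = new HashMap<>();

        for (Document d : nodes.find()) {
            ObjectId id = d.getObjectId("_id");
            byId.put(id, new Document("title", d.getString("title"))
                    .append("children", new ArrayList<Document>()));
            parentOf.put(id, d.getObjectId("parentId")); // null for the root node
        }

        // Wire every node into its parent's "children" array.
        parentOf.forEach((id, parentId) -> {
            if (parentId != null) {
                byId.get(parentId).getList("children", Document.class).add(byId.get(id));
            }
        });

        return byId.get(rootId); // buildTree(...).toJson() gives the nested JSON for the UI
    }
}

Adding, deleting, or updating a node then touches a single small document (e.g. insert one document with the right parentId), and only the "get the JSON of the whole tree" operation pays the assembly cost.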

How to ensure the constraint that keys are items of a user-defined array

I am using JSON to define some configuration files and I want to validate them with a JSON Schema. My problem is that I want to ensure that the keys of some object are a subset of the items of an array defined in the same JSON.
Example:
Valid:
{
  "files": ["file1", "file2"],
  "filelocations": {
    "file1": "/etc/globalconfigs/file1.conf",
    "file2": "/usr/bin/file2.sh"
  }
}
Invalid (otherkey is not in files):
{
  "files": ["file1", "file2"],
  "filelocations": {
    "file1": "/etc/globalconfigs/file1.conf",
    "otherkey": "/usr/bin/file2.sh"
  }
}
What I want is to ensure that the keys of filelocations are found in the files array.
Although in this example the structure of the JSON could be changed by combining the keys and values so that this kind of constraint is not needed, in my case I can't change the JSON like that, so it would be nice to have a validation mechanism for this.
How can I achieve this?
You cannot achieve this with JSON Schema; there is no combination of keywords which can guarantee it.
If you are adventurous (and I can even code that for you), you can, however, use my JSON Schema API and code a custom keyword to fit your needs. It is doable.
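If a custom keyword feels like too much, the cross-check can also be done in application code after (or alongside) ordinary schema validation. A minimal sketch using Jackson (the library choice and the method name are mine, not part of the answer above):

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class FileKeysValidator {

    /** Returns true if every key of "filelocations" appears in the "files" array. */
    static boolean fileLocationsAreDeclared(String json) throws Exception {
        JsonNode root = new ObjectMapper().readTree(json);

        Set<String> declared = new HashSet<>();
        root.path("files").forEach(f -> declared.add(f.asText()));

        Iterator<String> keys = root.path("filelocations").fieldNames();
        while (keys.hasNext()) {
            if (!declared.contains(keys.next())) {
                return false; // e.g. "otherkey" in the invalid example above
            }
        }
        return true;
    }
}

The second example above would fail this check because "otherkey" is not declared in files, while the first one would pass.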