MongoDB - Add a JSON schema to an existing collection

I have a collection in my MongoDB database with 13 million records. Unfortunately, when I created this collection, no schema was defined for it. Is there any way to add a JSON schema to it other than backing up the entire database, creating the schema, and re-uploading all the data?

You can apply a JSON schema to an existing collection using the collMod command (https://docs.mongodb.com/manual/core/schema-validation/); an example is below. However, the validator only applies to new write operations; it is not run against the documents already in the collection.
db.runCommand( {
   collMod: "contacts",
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "phone", "name" ],
      properties: {
         phone: {
            bsonType: "string",
            description: "must be a string and is required"
         },
         name: {
            bsonType: "string",
            description: "must be a string and is required"
         }
      }
   } },
   validationLevel: "moderate"
} )
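Since the validator above will not check the 13 million documents already in the collection, you may also want to find out which of them would fail it. One option (a minimal sketch, reusing the schema from the collMod example above) is to query with the same $jsonSchema expression wrapped in $nor:

// Sketch: list existing documents that do NOT match the schema used in the
// validator. This only reports them; it does not modify or reject anything.
db.contacts.find( {
   $nor: [ {
      $jsonSchema: {
         bsonType: "object",
         required: [ "phone", "name" ],
         properties: {
            phone: { bsonType: "string" },
            name: { bsonType: "string" }
         }
      }
   } ]
} )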

Related

Is there a way to transform a JSON Schema definition into a BigQuery schema definition?

From https://cloud.google.com/bigquery/docs/schemas, a schema definition file looks like this:
[
  {
    "name": string,
    "type": string,
    "mode": string,
    "fields": [
      {
        object (TableFieldSchema)
      }
    ],
    "description": string,
    "policyTags": {
      "names": [
        string
      ]
    },
    "maxLength": string,
    "precision": string,
    "scale": string,
    "collation": string,
    "defaultValueExpression": string
  },
  {
    "name": string,
    "type": string,
    ...
  }
]
Is there any tool/product that can take a https://json-schema.org file and convert it to the form that BigQuery prefers?
You can have BigQuery detect the schema of a file (stored in a bucket in the same GCP project, for example) by creating an external table that links to that file. The data from the file will then show up in BigQuery. (You can use the command line for this too; I have never used it that way, but it exists.)
Example with CSV (JSON is possible too):
CREATE OR REPLACE EXTERNAL TABLE projectGCP.DatasetsGCP.TableGCP OPTIONS ( format = 'CSV', uris = ['gs://nameofmybucket/*pattern_i_want_tobe_detect_inthe_namefile.csv'] )
After doing that, you can open the table you just created and read the BigQuery schema that was detected for it.
Here is more information on how to do it (you can also provide the schema inline on the command line, or as a JSON file containing the schema definition): https://cloud.google.com/bigquery/docs/external-table-definition
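If you would rather convert a json-schema.org definition directly into a BigQuery schema file instead of relying on autodetection, the basic type mapping is simple enough to script yourself. Below is a minimal Node.js sketch (toBigQuerySchema, TYPE_MAP and the file names are my own illustrative choices, not an official Google tool); it only handles flat properties, nested objects and arrays of scalars:

// Sketch: map a json-schema.org "object" schema to a BigQuery schema
// definition array in the format shown above.
const fs = require('fs');

const TYPE_MAP = {
  string: 'STRING',
  integer: 'INTEGER',
  number: 'FLOAT',
  boolean: 'BOOLEAN'
};

function toBigQuerySchema(schema) {
  const required = schema.required || [];
  return Object.entries(schema.properties || {}).map(([name, prop]) => {
    const field = { name, mode: required.includes(name) ? 'REQUIRED' : 'NULLABLE' };
    if (prop.type === 'object') {
      field.type = 'RECORD';
      field.fields = toBigQuerySchema(prop);        // recurse into nested objects
    } else if (prop.type === 'array') {
      field.mode = 'REPEATED';                      // arrays of scalars only
      field.type = TYPE_MAP[(prop.items || {}).type] || 'STRING';
    } else {
      field.type = TYPE_MAP[prop.type] || 'STRING';
    }
    return field;
  });
}

// Example usage: write a file that `bq load --schema=bq-schema.json ...` can consume.
const source = JSON.parse(fs.readFileSync('my-json-schema.json', 'utf8'));
fs.writeFileSync('bq-schema.json', JSON.stringify(toBigQuerySchema(source), null, 2));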

Extract attributes from a JSON node using JSON Path (non-array structure / no extra code)

I'm stuck trying to extract a list of attributes from a JSON object/node using only JSONPath, without writing extra code.
Here is an example of the JSON object:
{
  "MySchema": {
    "fields": {
      "FieldOne": {
        "required": false,
        "type": "long"
      },
      "FieldTwo": {
        "required": false,
        "type": "string"
      },
      "FieldThree": {
        "required": false,
        "type": "string"
      }
    }
  }
}
This is just an example; in my production use case, the 'fields' node contains hundreds of sub-nodes, so I cannot parse them manually.
Getting the values of each field node is easy using:
$.MySchema.fields..required
$.MySchema.fields..type
But what JSONPath expression can I use to get the following list (the names of the nodes only)?
FieldOne
FieldTwo
FieldThree
I didn't find the answer in previous posts; most of them deal with arrays/lists.
Kind regards,
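One possibility, if a specific implementation is acceptable rather than a purely portable expression: the jsonpath-plus library for JavaScript supports a ~ operator that returns matched property names instead of values. A minimal sketch using the example document above:

// npm install jsonpath-plus
const { JSONPath } = require('jsonpath-plus');

const doc = {
  MySchema: {
    fields: {
      FieldOne: { required: false, type: 'long' },
      FieldTwo: { required: false, type: 'string' },
      FieldThree: { required: false, type: 'string' }
    }
  }
};

// '~' after the wildcard returns the property names rather than the values
const names = JSONPath({ path: '$.MySchema.fields.*~', json: doc });
console.log(names); // [ 'FieldOne', 'FieldTwo', 'FieldThree' ]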

Can Filebeat parse JSON fields instead of the whole JSON object into Kibana?

I am able to get a single JSON object into Kibana by having this in the filebeat.yml file:
output.elasticsearch:
  hosts: ["localhost:9200"]
How can I get at the individual elements in the JSON string? Say I wanted to compare all the "pseudorange" fields of my JSON objects. How would I:
Select the "pseudorange" field from all my JSON messages so I can compare them.
Compare them visually in Kibana. At the moment I can't even find the message, let alone the individual fields, in the visualisation tab...
I have heard of people using Logstash to parse the string somehow, but is there no way of doing this simply with Filebeat? If there isn't, what do I do with Logstash to get the individual fields out of the JSON, instead of having my message be one big JSON string that I cannot interact with?
I get the following output from output.console (note that I am putting some information in <> to hide it):
"#timestamp": "2021-03-23T09:37:21.941Z",
"#metadata": {
"beat": "filebeat",
"type": "doc",
"version": "6.8.14",
"truncated": false
},
"message": "{\n\t\"Signal_data\" : \n\t{\n\t\t\"antenna type:\" : \"GPS\",\n\t\t\"frequency type:\" : \"GPS\",\n\t\t\"position x:\" : 0.0,\n\t\t\"position y:\" : 0.0,\n\t\t\"position z:\" : 0.0,\n\t\t\"pseudorange:\" : 20280317.359730639,\n\t\t\"pseudorange_error:\" : 0.0,\n\t\t\"pseudorange_rate:\" : -152.02620448094211,\n\t\t\"svid\" : 18\n\t}\n}\u0000",
"source": <ip address>,
"log": {
"source": {
"address": <ip address>
}
},
"input": {
"type": "udp"
},
"prospector": {
"type": "udp"
},
"beat": {
"name": <ip address>,
"hostname": "ip-<ip address>",
"version": "6.8.14"
},
"host": {
"name": "ip-<ip address>",
"os": {
<ubuntu info>
},
"id": <id>,
"containerized": false,
"architecture": "x86_64"
},
"meta": {
"cloud": {
<cloud info>
}
}
}
In Filebeat, you can leverage the decode_json_fields processor in order to decode a JSON string and add the decoded fields into the root object:
processors:
  - decode_json_fields:
      fields: ["message"]
      process_array: false
      max_depth: 2
      target: ""
      overwrite_keys: true
      add_error_key: false
Credit to Val for this. His answer worked, but as he suggested, my JSON string had a \000 (null terminator) at the end, which stops it from being valid JSON and prevented the decode_json_fields processor from working as it should.
Upgrading to version 7.12 of Filebeat (and making sure Elasticsearch and Kibana are also on 7.12, since mismatched versions between them can cause issues) allows us to use the script processor: https://www.elastic.co/guide/en/beats/filebeat/current/processor-script.html.
Credit to Val here again; this script removes the null terminator:
- script:
    lang: javascript
    id: trim
    source: >
      function process(event) {
        event.Put("message", event.Get("message").trim());
      }
After the null terminator was removed, the decode_json_fields processor did its job as Val suggested (note that the script processor has to appear before decode_json_fields in the processors list, since processors run in order), and I was able to extract the individual elements of the JSON field, which let Kibana visualisations work with exactly the elements I wanted!

How to REFERENCE one collection from another collection in MongoDB using json schema validation

I have read many articles, posts, and Stack Overflow solutions trying to understand how to reference one collection from another using JSON schema validation in MongoDB. I would appreciate it if you could give a solution to the problem, with an example of how referencing works in JSON schema validation for MongoDB.
Note: This is not an actual data model but something similar to the problem I am trying to solve
Q1) I have students' data and each student can have many courses.
The student collection JSON schema is:
db.createCollection("student",{
validator: { $jsonSchema: {
bsonType: "object",
required:["name", "dob", "course"],
properties: {
name :{
bsonType: "string",
maxLength:40,
},
age :{
bsonType: "int",
maxLength:3,
},
dob:{
bsonType: "string",
},
course:
{
### reference the courses collection to store data
},
}
}
}
)
and the courses collection JSON schema is this:
db.createCollection("course",{
validator: { $jsonSchema: {
bsonType: "object",
properties: {
name :{
bsonType: "string",
maxLength:40,
},
instructor:{
bsonType: "string",
maxLength:40,
},
credits{
bsonType : "int",
maxLength: 2
}
}
}
}
)
I do not want to embed the course collection in the student collection, but rather reference the course collection from the student collection. How can I do this in the JSON schema for MongoDB, given that $ref cannot be used to link two collections in MongoDB's JSON schema?
I would also appreciate links explaining the solution to this problem with a SCHEMA DESIGN example.
Note: I have gone through the whole MongoDB documentation, and JSON schema documentation to find a solution to this problem.
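Since $ref is not supported inside MongoDB's $jsonSchema, a common pattern is to store the referenced course _id values in the student document and validate only their type there, leaving referential integrity to the application (or resolving the references with $lookup at read time). A rough sketch of what the course field could look like under that approach, using the field and collection names from the example above:

// Inside the student validator above, declare course as an array of
// course _id values. MongoDB checks the type, but it does NOT verify
// that the referenced _ids actually exist in the course collection.
course: {
   bsonType: "array",
   items: {
      bsonType: "objectId",
      description: "_id of a document in the course collection"
   }
}

// At read time, the referenced courses can be joined in with $lookup:
db.student.aggregate( [
   { $lookup: {
      from: "course",
      localField: "course",
      foreignField: "_id",
      as: "courseDocs"
   } }
] )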

Creating Ext.data.Model using JSON config

In the app we're developing, we create all the JSON on the server side using dynamically generated configs (JSON objects). We use these for stores (and other things, like GUIs), with a dynamically generated list of data fields.
With a JSON like this:
{
    "proxy": {
        "type": "rest",
        "url": "/feature/163",
        "timeout": 600000
    },
    "baseParams": {
        "node": "163"
    },
    "fields": [
        { "name": "id", "type": "int" },
        { "name": "iconCls", "type": "auto" },
        { "name": "text", "type": "string" },
        { "name": "name", "type": "auto" }
    ],
    "xtype": "jsonstore",
    "autoLoad": true,
    "autoDestroy": true
}, ...
Ext will happily create an "implicit model" that I can work with: load it into forms, save it, delete it, etc.
What I want is to specify, through a JSON config, not just the fields but the model itself. Is this possible?
Something like:
{
    model: {
        name: 'MiClass',
        extends: 'Ext.data.Model',
        "proxy": {
            "type": "rest",
            "url": "/feature/163",
            "timeout": 600000
        },
        etc...
    },
    "autoLoad": true,
    "autoDestroy": true
}, ...
That way I would be able to create the whole JSON on the server without having to glue things together with JS statements on the client side.
Best regards,
I don't see why not. The syntax for creating a model class is similar to that of stores and components:
Ext.define('MyApp.model.MyClass', {
    extend: 'Ext.data.Model',
    fields: [..]
});
So if you take this apart, you could call Ext.define(className, config);
where className is a string and config is a JSON object, and both can be generated on the server.
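For example, a rough sketch of gluing that together on the client, assuming the server returns a payload shaped like the model block in the question (the /feature/163/model URL is only illustrative):

// Sketch: fetch a server-generated config and turn its "model" block into a
// real Ext.data.Model subclass, then build a store that uses it.
Ext.Ajax.request({
    url: '/feature/163/model',            // hypothetical endpoint
    success: function (response) {
        var cfg = Ext.decode(response.responseText);

        Ext.define(cfg.model.name, {
            extend: 'Ext.data.Model',
            proxy: cfg.model.proxy,
            fields: cfg.model.fields      // assuming the server also sends fields
        });

        var store = Ext.create('Ext.data.Store', {
            model: cfg.model.name,
            autoLoad: cfg.autoLoad,
            autoDestroy: cfg.autoDestroy
        });
    }
});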
In the end, there's no way to achieve what I want.
The only way to do it is to define the fields of the Ext.data.Store and have it generate the implicit model from that fields configuration.
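If you settle for the implicit model, the server-generated JSON from the top of the question can still be consumed almost verbatim; a minimal sketch, assuming it has already been decoded into a serverConfig variable:

// The store builds its implicit model from the "fields" array, so the
// server config only needs minor reshaping. Sketch only.
var store = Ext.create('Ext.data.Store', {
    fields: serverConfig.fields,          // implicit model is generated from these
    proxy: serverConfig.proxy,
    autoLoad: serverConfig.autoLoad,
    autoDestroy: serverConfig.autoDestroy
});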