Import JSON with objects as nested to Elastic Search - json

i've log with thousands records of aggregated data in JSON:
{
"count": 25,
"domain": "domain.tld",
"geoips": {
"AU": 5,
"NZ": 20
},
"ips": {
"1.2.3.4": 5,
"1.2.3.5": 1,
"1.2.3.6": 1,
"1.2.3.7": 1,
"1.2.3.8": 1,
"1.2.3.9": 9,
"1.2.3.10": 7
},
"subdomains": {
"a.domain.tld": 1,
"b.domain.tld": 1,
"c.domain.tld": 1,
"domain.tld": 22
},
"tld": "tld",
"types": {
"1": 3,
"43": 22
}
}
and i have mapping on ES:
"mappings": {
"properties": {
"count": {
"type": "long"
},
"domain": {
"type": "keyword"
},
"ips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"geoips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"subdomains": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"tld": {
"type": "keyword"
},
"types": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
}
}
}
Is there any simple way how import these lines to ES as nested objects ? If i use a bulk insert without modification, the ES will modify mapping by adding a new field for each IP/subdomain/GeoIP instead add it as simple key/val object.
Or only one way is regenerate JSON to key/val nested fields ?

Your mapping is already very good but the data doesn't fit it since the nested data type expects an array of objects, not a single object. So you'll need to transform your nested objects into array of key-value pairs like so:
...
"ips": [
{
"key": "1.2.3.4",
"val": 5
},
{
"key": "1.2.3.5",
"val": 1
},
...
],
"subdomains": [
{
"key": "a.domain.tld",
"val": 1
},
{
"key": "b.domain.tld",
"val": 1
},
...
]
...

Related

JSON Schema with Nested Objects with different properties

The entire JSON file is rather large so I've only taken out the subsection I've had an issue with.
{
"diagrams": {
"5f759d15cd046720c28531dd": {
"_id": "5f759d15cd046720c28531dd",
"offsetX": 320,
"offsetY": 42,
"zoom": 80,
"modified": 1604279356,
"nodes": {
"5f9f5c3ccd046720c28531e4": {
"nodeID": "5f9f5c3ccd046720c28531e4",
"type": "start",
"coords": [
360,
120
],
"data": {
"name": "Start",
"color": "standard",
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e6"
}
],
"steps": []
}
},
"5f9f5c3ccd046720c28531e5": {
"nodeID": "5f9f5c3ccd046720c28531e5",
"type": "block",
"coords": [
760,
120
],
"data": {
"name": "Help Message",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e6",
"5f9f5c3ccd046720c28531e7"
]
}
},
"5f9f5c3ccd046720c28531e6": {
"nodeID": "5f9f5c3ccd046720c28531e6",
"type": "speak",
"data": {
"randomize": false,
"dialogs": [
{
"voice": "Alexa",
"content": "You said help. Do you want to continue?"
}
],
"ports": [
{
"type": "",
"target": "5f9f5c3ccd046720c28531e7"
}
]
}
},
"5f9f5c3ccd046720c28531e7": {
"nodeID": "5f9f5c3ccd046720c28531e7",
"type": "interaction",
"data": {
"name": "Choice",
"else": {
"type": "path",
"randomize": false,
"reprompts": []
},
"choices": [
{
"intent": "",
"mappings": []
},
{
"intent": "",
"mappings": []
}
],
"reprompt": null,
"ports": [
{
"type": "else",
"target": null
},
{
"type": "",
"target": null
},
{
"type": "",
"target": "5f9f5c3ccd046720c28531e9"
}
]
}
},
"5f9f5c3ccd046720c28531e8": {
"nodeID": "5f9f5c3ccd046720c28531e8",
"type": "block",
"coords": [
1170,
260
],
"data": {
"name": "Exit",
"color": "standard",
"steps": [
"5f9f5c3ccd046720c28531e9"
]
}
},
"5f9f5c3ccd046720c28531e9": {
"nodeID": "5f9f5c3ccd046720c28531e9",
"type": "exit",
"data": {
"ports": []
}
}
},
"children": [],
"creatorID": 42661,
"variables": [],
"name": "Help Flow",
"versionID": "5f759d15cd046720c28531db"
}
}
}
The Current JSON Schema Definition I have is:
{
"$schema":"http://json-schema.org/schema#",
"type":"object",
"properties":{
"diagrams":{
"type":"object"
}
},
"required":[
"diagrams",
]
}
The problem I am having is that within diagrams contains multiple objects with a random string as the name e.g "5f759d15cd046720c28531dd".
Then within that object there are properties such as (_id, offsetX) which I want to express as well as a nodes object, which again contains multiple objects with arbitrary names e.g ("5f9f5c3ccd046720c28531e4", "5f9f5c3ccd046720c28531e5", ...) which have a unique node definition where some nodes have different properties to other nodes (nodeID, type, data vs nodeID, type, data, coords).
My question is with all these arbitrary things such as random names as well as different properties per each node. How do I turn it into 1 JSON schema definition which covers all the cases of how a diagram/node can be made.
You can do this with additionalProperties or patternProperties.
additionalProperties applies to any property that isn't declared in properties or patternProperties.
{
"type": "object",
"additionalProperties": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
}
Your property names appear to always be hex numbers. If you want to enforce that those property names are always hex numbers, you can use patternProperties. Any property that matches the regex must conform to that schema.
{
"type": "object",
"patternProperties": {
"^[0-9a-f]{24}$": {
"type": "object",
"properties": {
"_id": { ... },
"offsetX": { ... },
...
}
}
},
"additionalProperties": false
}

How to specify JSON Schema if array is not required when there are no items in it?

Say I have some json data like this:
{
"quantity": 3,
"modifiers": [{}]
}
I do not want modifiers to be present if there is no intent of adding objects into modifiers.
This should be the correct way if there are no modifiers:
{
"quantity": 3,
}
and this should be correct with modifiers:
{
"quantity": 3,
"modifiers": [{
"testing": "some data here"
}]
}
How would I structure the JSON Schema to validate the above-most example to be incorrect?
edit:
Example of JSON Schema for the above examples:
{
"type": "object",
"properties": {
"quantity": {
"type": "integer",
"minimum": 1
},
"modifiers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"testing": {
"type": "string"
}
}
},
"minItems": 1
}
},
"required": [
"quantity"
]
}
(Draft 7 only) To not allow an array to be empty if present, do this:
"if": {
"required": [
"modifiers"
]
},
"then": {
"properties": {
"modifiers": {
"minItems": 1
}
}
}
This will ensure that this is invalid:
{
"quantity" : 1,
"modifiers":[]
}
However note that the following json would be valid:
{
"quantity" : 1,
"modifiers":[{}]
}
If you don't want that, then make sure to include the appropriate required field. This would be how the final schema look like based on the given example/schema above:
{
"type": "object",
"properties": {
"quantity": {
"type": "integer",
"minimum": 1
},
"modifiers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"testing": {
"type": "string"
}
},
"required": [
"testing"
]
},
"minItems": 1
}
},
"required": [
"quantity"
],
"if": {
"required": [
"modifiers"
]
},
"then": {
"properties": {
"modifiers": {
"minItems": 1
}
}
}
}
Now, this would also be invalid:
{
"quantity" : 1,
"modifiers":[{}]
}
And this would be valid:
{
"quantity" : 1,
"modifiers":[{"testing":"343"}]
}
You're only missing one thing. Just add "minProperties": 1 at /properties/modifiers/items.
{
"type": "object",
"properties": {
"quantity": {
"type": "integer",
"minimum": 1
},
"modifiers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"testing": {
"type": "string"
}
},
"minProperties": 1
},
"minItems": 1
}
},
"required": [
"quantity"
]
}

How can I explictly constrain multiple items in a JSON Schema array?

I am creating a JSON schema and want to define an array containing only exact matches for certain items:
An example of the sort of JSON (snippet) would look like:
{
"results":
[
{ "id": 1, "test": true, "volts": 700, "duration": 100 },
{ "id": 2, "test": false }
]
}
This seems to be a combination of OneOf and "additionalProperties": false but I can't work out how that should be used. So far I have:
{
"results":
{
"type": "array",
"items":
{
"type": "object",
"OneOf":
[
{
"id": { "type": "integer" },
"test": { "type": "boolean" },
"volts": { "type": "integer" },
"duration": { "type": "integer" }
},
{
"id": { "type": "integer" },
"test": { "type": "boolean" }
}
],
"additionalProperties": false
}
}
}
I'm using http://www.jsonschemavalidator.net/ to check my JSON.
But when I validate the following JSON against my schema it says it's valid; is the website incorrect or have I done something wrong?
{
"results": [
{
"fred": 7,
"id": 7,
"test": true,
"volts": 7,
"duration": 7
},
{
"fish": 7
}
]
}

JSON Schema for tree structure

I have to build tree like structure of Json data.Each node has an id (an integer, required), a label (a string, optional), and an array of child nodes (optional). Can you help me how to write JSON schema for this Json data. I need to set Id as required in child node as well.
{
"Id": 1,
"Label": "A",
"Child": [
{
"Id": 2,
"Label": "B",
"Child": [
{
"Id": 5,
"Label": "E"
}, {
"Id": 6,
"Label": "E"
}, {
"Id": 7,
"Label": "E"
}
]
}, {
"Id": 3,
"Label": "C"
}, {
"Id": 4,
"Label": "D",
"Child": [
{
"Id": 8,
"Label": "H"
}, {
"Id": 9,
"Label": "I"
}
]
}
]
}
A schema for this structure only needs a definition of a node and a reference to that node. The property Children (renamed from Child) references the node as well.
Here's the schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref": "#/definitions/node",
"definitions": {
"node": {
"properties": {
"Id": {
"type": "integer"
},
"Label": {
"type": "string"
},
"Children": {
"type": "array",
"items": {
"$ref": "#/definitions/node"
}
}
},
"required": [
"Id"
]
}
}
}

How to query nested structure in elasticsearch

Below are two mocked records from my elasticsearch index. I have millions of records in my ES. I am trying to query ES to get all the records that have non-empty/ non-null "tags" field. If a record doesn't have a tag ( like second record below) then I don't want to pull it from ES.
If "books" were not nested then googling around seems like the below query would have worked -
curl -XGET 'host:port/book_indx/book/_search?' -d '{
"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source"}}}}
}'
However I am not finding a solution to query the nested structure. I tried the below with no luck -
{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source.tags"}}}}}
{"query" : {"filtered" : {"filter" : {"exists" :{"field" : "_source":{"tags"}}}}}}
Any suggestions are really appreciated here! Thanks in advance.
{
"_shards": {
"failed": 0,
"successful": 12,
"total": 12
},
"hits": {
"hits": [
{
"_id": "book1",
"_index": "book",
"_source": {
"book_name": "How to Get Organized",
"publication_date": "2014-02-24T16:50:39+0000",
"tags": [
{
"category": "self help",
"topics": [
{
"name": "time management",
"page": 6198
},
{
"name": "calendar",
"page": 10
}
],
"id": "WEONWOIR234LI",
}
],
"last_updated": "2015-11-11T16:28:32.308+0000"
},
"_type": "book"
},
{
"_id": "book2",
"_index": "book",
"_source": {
"book_name": "How to Cook",
"publication_date": "2014-02-24T16:50:39+0000",
"tags": [],
"last_updated": "2015-11-11T16:28:32.308+0000"
},
"_type": "book"
}
],
"total": 1
},
"timed_out": false,
"took": 80
}
Mapping -
"book": {
"_id": {
"path": "message_id"
},
"properties": {
"book_name": {
"index": "not_analyzed",
"type": "string"
},
"publication_date": {
"format": "date_time||date_time_no_millis",
"type": "date"
},
"tags": {
"properties": {
"category": {
"index": "not_analyzed",
"type": "string"
},
"topic": {
"properties": {
"name": {
"index": "not_analyzed",
"type": "string"
},
"page": {
"index": "no",
"type": "integer"
}
}
},
"id": {
"index": "not_analyzed",
"type": "string"
}
},
"type": "nested"
},
"last_updated": {
"format": "date_time||date_time_no_millis",
"type": "date"
}
}
}
Since your tags field has a nested type, you need to use a nested filter in order to query it.
The following filtered query will correctly return only the first document above (i.e. with id book1)
{
"query": {
"filtered": {
"filter": {
"nested": {
"path": "tags",
"filter": {
"exists": {
"field": "tags"
}
}
}
}
}
}
}