How to send a JSON file with Filebeat into Elasticsearch

I'm trying to send the content of a JSON file into Elasticsearch.
Each file contains a single flat JSON object (just attributes: no arrays, no nested objects). Filebeat sees the files, but nothing is sent to Elasticsearch (it works with CSV files, so the connection itself is fine)...
Here is the JSON file (it's all on one line in the file, but I ran it through a JSON formatter for readability here):
{
  "IPID": "3782",
  "Agent": "localhost",
  "User": "vtom",
  "Script": "/opt/vtom/scripts/scriptOK.ksh",
  "Arguments": "",
  "BatchQueue": "queue_ksh-json",
  "VisualTOMServer": "labc",
  "Job": "testJSONlogs",
  "Application": "test_CAD",
  "Environment": "TEST",
  "JobRetry": "0",
  "LabelPoint": "0",
  "ExecutionMode": "NORMAL",
  "DateName": "TEST_CAD",
  "DateValue": "05/11/2022",
  "DateStart": "2022-11-05",
  "TimeStart": "20:58:14",
  "StandardOutputName": "/opt/vtom/logs/TEST_test_CAD_testJSONlogs_221105-205814.o",
  "StandardOutputContent": "_______________________________________________________________________\nVisual TOM context of the job\n \nIPID : 3782\nAgent : localhost\nUser : vtom\nScript : ",
  "ErrorOutput": "/opt/vtom/logs/TEST_test_CAD_testJSONlogs_221105-205814.e",
  "ErrorOutputContent": "",
  "JsonOutput": "/opt/vtom/logs/TEST_test_CAD_testJSONlogs_221105-205814.json",
  "ReturnCode": "0",
  "Status": "Finished"
}
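To rule out a malformed file, I also check that each line parses as standalone JSON, since the ndjson parser processes one JSON document per line. A quick sketch (the path matches one of my log files; adjust as needed):

import json

# Sanity check: every non-empty line must be a complete JSON document.
with open("/opt/vtom/logs/TEST_test_CAD_testJSONlogs_221105-205814.json") as f:
    for lineno, line in enumerate(f, 1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as e:
            print("line {}: {}".format(lineno, e))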
The Filebeat input definition (assembled from different web sources) is:
- type: filestream
  id: vtomlogs
  enabled: true
  paths:
    - /opt/vtom/logs/*.json
  index: vtomlogs-%{+YYYY.MM.dd}
  parsers:
    - ndjson:
        keys_under_root: true
        overwrite_keys: true
        add_error_key: true
        expand_keys: true
The definition of the index template:
{
  "properties": {
    "IPID": {
      "coerce": true,
      "index": true,
      "ignore_malformed": false,
      "store": false,
      "type": "integer",
      "doc_values": true
    },
    "VisualTOMServer": {
      "type": "keyword"
    },
    "Status": {
      "type": "keyword"
    },
    "Agent": {
      "type": "keyword"
    },
    "Script": {
      "type": "text"
    },
    "User": {
      "type": "keyword"
    },
    "ErrorOutputContent": {
      "type": "text"
    },
    "ReturnCode": {
      "type": "integer"
    },
    "BatchQueue": {
      "type": "keyword"
    },
    "StandardOutputName": {
      "type": "text"
    },
    "DateStart": {
      "format": "yyyy-MM-dd",
      "index": true,
      "ignore_malformed": false,
      "store": false,
      "type": "date",
      "doc_values": true
    },
    "Arguments": {
      "type": "text"
    },
    "ExecutionMode": {
      "type": "keyword"
    },
    "DateName": {
      "type": "keyword"
    },
    "TimeStart": {
      "format": "HH:mm:ss",
      "index": true,
      "ignore_malformed": false,
      "store": false,
      "type": "date",
      "doc_values": true
    },
    "JobRetry": {
      "type": "integer"
    },
    "LabelPoint": {
      "type": "keyword"
    },
    "DateValue": {
      "format": "dd/MM/yyyy",
      "index": true,
      "ignore_malformed": false,
      "store": false,
      "type": "date",
      "doc_values": true
    },
    "JsonOutput": {
      "type": "text"
    },
    "StandardOutputContent": {
      "type": "text"
    },
    "Environment": {
      "type": "keyword"
    },
    "ErrorOutput": {
      "type": "text"
    },
    "Job": {
      "type": "keyword"
    },
    "Application": {
      "type": "keyword"
    }
  }
}
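For completeness, this is roughly how I install the mapping above as an index template; a sketch using Python's requests (the template name vtomlogs, the file name, the endpoint, and the credentials are placeholders for my setup):

import json
import requests

# Load the mappings shown above from a local file (placeholder name).
with open("vtomlogs-template.json") as f:
    mappings = json.load(f)

# ES 8.x composable index template: match the daily vtomlogs-* indices.
body = {"index_patterns": ["vtomlogs-*"], "template": {"mappings": mappings}}
resp = requests.put(
    "http://localhost:9200/_index_template/vtomlogs",
    json=body,
    auth=("elastic", "changeme"),  # adjust credentials/TLS for your cluster
)
print(resp.status_code, resp.text)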
Filebeat sees the file but does nothing with it...
0100","log.logger":"input.filestream","log.origin":{"file.name":"filestream/prospector.go","file.line":177},"message":"A new file /opt/vtom/logs/TEST_test_CAD_testJSONlogs_221106-124138.json has been found","service.name":"filebeat","id":"vtomlogs","prospector":"file_prospector","operation":"create","source_name":"native::109713280-64768","os_id":"109713280-64768","new_path":"/opt/vtom/logs/TEST_test_CAD_testJSONlogs_221106-124138.json","ecs.version":"1.6.0"}
My version of Elasticsearch is: 8.4.3
My version of Filebeat is: 8.5.0 (with allow_older_versions: true in my configuration file)
Thanks for your help

Related

How to create a nested and custom JSON format for the dataframe

I want to create sub-categories from an existing data frame.
The changes are needed at the column level, not in the data itself: sets of columns share similar names but carry different prefixes (halo_, delta_), plus a few remaining columns without a prefix.
For example:
|payer_id|payer_name|halo_payer_name|delta_payer_name|halo_desc|delta_desc|halo_operations|delta_notes|halo_processed_data|delta_processed_data|extra|insurance_company|
I want the halo group to contain: halo_payer_name | halo_desc | halo_operations | halo_processed_data.
I want the delta group to contain: delta_payer_name | delta_desc | delta_notes | delta_processed_data.
The remaining columns form one group.
So when converted to JSON, it should come out in this layout:
{
  "schema": {
    "fields": [{
        "payer_details": [{
            "name": "payer_id",
            "type": "string"
          },
          {
            "name": "payer_name",
            "type": "string"
          },
          {
            "name": "extra",
            "type": "string"
          },
          {
            "name": "insurance_company",
            "type": "string"
          }
        ]
      },
      {
        "halo": [{
            "name": "halo_payer_name",
            "type": "string"
          },
          {
            "name": "halo_desc",
            "type": "string"
          },
          {
            "name": "halo_operations",
            "type": "string"
          },
          {
            "name": "halo_processed_data",
            "type": "string"
          }
        ]
      },
      {
        "delta": [{
            "name": "delta_payer_name",
            "type": "string"
          },
          {
            "name": "delta_desc",
            "type": "string"
          },
          {
            "name": "delta_notes",
            "type": "string"
          },
          {
            "name": "delta_processed_data",
            "type": "string"
          }
        ]
      }
    ],
    "pandas_version": "1.4.0"
  },
  "masterdata": [{
    "payer_details": [{
      "payer_id": "",
      "payer_name": "",
      "extra": "",
      "insurance_company": ""
    }],
    "halo": [{
      "halo_payer_name": "",
      "halo_desc": "",
      "halo_operations": "",
      "halo_processed_data": ""
    }],
    "delta": [{
      "delta_payer_name": "",
      "delta_desc": "",
      "delta_notes": "",
      "delta_processed_data": ""
    }]
  }]
}
For this type of situation I couldn't find an existing solution, as it is column-based grouping rather than data-based grouping.
Then I came across a post today that helped with my situation (building dicts from the data frame in a loop and converting the whole thing into a JSON file).
The reference that helped me is link.
So the solution for this question goes like this:
schema = {
    "schema": {
        "fields": [{
                "payer_details": [{
                        "name": "payer_id",
                        "type": "string"
                    },
                    {
                        "name": "payer_name",
                        "type": "string"
                    },
                    {
                        "name": "extra",
                        "type": "string"
                    },
                    {
                        "name": "insurance_company",
                        "type": "string"
                    }
                ]
            },
            {
                "halo": [{
                        "name": "halo_payer_name",
                        "type": "string"
                    },
                    {
                        "name": "halo_desc",
                        "type": "string"
                    },
                    {
                        "name": "halo_operations",
                        "type": "string"
                    },
                    {
                        "name": "halo_processed_data",
                        "type": "string"
                    }
                ]
            },
            {
                "delta": [{
                        "name": "delta_payer_name",
                        "type": "string"
                    },
                    {
                        "name": "delta_desc",
                        "type": "string"
                    },
                    {
                        "name": "delta_notes",
                        "type": "string"
                    },
                    {
                        "name": "delta_processed_data",
                        "type": "string"
                    }
                ]
            }
        ],
        "pandas_version": "1.4.0"
    },
    "masterdata": []
}
I derived the schema above as desired.
payer_list = []
for i in df.index:
    case = {
        "payer_details": [{
            "payer_id": "{}".format(df['payer_id'][i]),
            "payer_name": "{}".format(df['payer_name'][i]),
            "extra": "{}".format(df['extra'][i]),
            "insurance_company": "{}".format(df['insurance_company'][i])
        }],
        "halo": [{
            "halo_payer_name": "{}".format(df['halo_payer_name'][i]),
            "halo_desc": "{}".format(df['halo_desc'][i]),
            "halo_operations": "{}".format(df['halo_operations'][i]),
            "halo_processed_data": "{}".format(df['halo_processed_data'][i])
        }],
        "delta": [{
            "delta_payer_name": "{}".format(df['delta_payer_name'][i]),
            "delta_desc": "{}".format(df['delta_desc'][i]),
            "delta_notes": "{}".format(df['delta_notes'][i]),
            "delta_processed_data": "{}".format(df['delta_processed_data'][i])
        }]
    }
    payer_list.append(case)
schema["masterdata"] = payer_list
I created an empty list, appended each case to it inside the loop, and then attached the list to the schema.
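To finish, the assembled dict can be written out with the standard json module; a small usage sketch (the output filename is arbitrary):

import json

# Serialize the assembled structure to a JSON file; default=str guards
# against non-serializable pandas/numpy scalars, indent is for readability.
with open("payers.json", "w") as f:
    json.dump(schema, f, indent=2, default=str)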

Can't understand this JSON schema from the Swish QR Code API

I'm trying to use an API but the documentation is really bad. I got this JSON schema but I don't understand it. What am I supposed to include in the request?
url: https://mpc.getswish.net/qrg-swish/api/v1/prefilled
I have tried this but it doesn't work:
{
  "payee": {
    "editable": {
      "editable": "false"
    },
    "swishString": {
      "value": "0721876507"
    }
  },
  "size": 600,
  "border": 20,
  "transparent": false,
  "format": "png"
}
Here's the JSON schema
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Swish pre-filled qr code generator",
  "description": "REST interface to get a QR code that the Swish app will interpret as a pre filled code",
  "definitions": {
    "editable": {
      "description ": "Controls if user can modify this value in Swish app or not",
      "type": "object",
      "properties": {
        "editable": {
          "type": "boolean",
          "default": false
        }
      }
    },
    "swishString": {
      "type": "object",
      "properties": {
        "value": {
          "type": "string",
          "maxLength": 70
        }
      },
      "required": ["value"]
    },
    "swishNumber": {
      "type": "object",
      "properties": {
        "value": {
          "type": "number"
        }
      },
      "required": ["value"]
    }
  },
  "type": "object",
  "properties": {
    "format": {
      "enum": ["jpg", "png", "svg"]
    },
    "payee": {
      "description": "Payment receiver",
      "allOf": [
        { "$ref": "#/definitions/editable" },
        { "$ref": "#/definitions/swishString" }
      ]
    },
    "amount": {
      "description": "Payment amount",
      "allOf": [
        { "$ref": "#/definitions/editable" },
        { "$ref": "#/definitions/swishNumber" }
      ]
    },
    "message": {
      "description": "Message for payment",
      "allOf": [
        { "$ref": "#/definitions/editable" },
        { "$ref": "#/definitions/swishString" }
      ]
    },
    "size": {
      "description": "Size of the QR code. The code is a square, so width and height are the same. Not required is the format is svg",
      "value": "number",
      "minimum": 300
    },
    "border": {
      "description": "Width of the border.",
      "type": "number"
    },
    "transparent": {
      "description": "Select background color to be transparent. Do not work with jpg format.",
      "type": "boolean"
    }
  },
  "required": ["format"],
  "anyOf": [
    { "required": ["payee"] },
    { "required": ["amount"] },
    { "required": ["message"] }
  ],
  "additionalProperties": false,
  "maxProperties": 5
}
The API should return a QR code.
To be honest, I have not taken the time to learn JSON schema, but your example should probably look something like this:
{
  "payee": {
    "value": "0721876507",
    "editable": false
  },
  "size": 600,
  "border": 20,
  "transparent": false,
  "format": "png"
}
There are other parameters you may choose to utilize:
{
  "payee": {
    "value": "1239006032",
    "editable": false
  },
  "message": {
    "value": "LIV",
    "editable": true
  },
  "amount": {
    "value": 100,
    "editable": true
  },
  "format": "png",
  "size": 300,
  "border": 0,
  "transparent": true
}
Honestly, I think the developers behind the Swish APIs are trying to look smart by complicating things. They should, of course, have provided example JSON data instead of forcing consumers to understand their JSON schema. Also, I believe their published schema is wrong. The second example I provided works even though it doesn't validate according to the JSON schema ("Object property count 7 exceeds maximum count of 5").
Here is a minimal (and pretty useless) request that returns a valid QR code:
{
  "format": "png",
  "size": 300
}
And here is a more usable example that works:
{
  "format": "png",
  "size": 300,
  "transparent": false,
  "amount": {
    "value": 999.99,
    "editable": true
  },
  "payee": {
    "value": "0701000000",
    "editable": false
  },
  "message": {
    "value": "Hello",
    "editable": false
  }
}
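For completeness, here is a sketch of posting such a payload and saving the returned image, using Python's requests (untested against the live API; the endpoint is the one from the question):

import requests

payload = {
    "format": "png",
    "size": 300,
    "payee": {"value": "0701000000", "editable": False},
    "amount": {"value": 999.99, "editable": True},
    "message": {"value": "Hello", "editable": False},
}

# POST the request and write the binary QR image to disk.
resp = requests.post(
    "https://mpc.getswish.net/qrg-swish/api/v1/prefilled",
    json=payload,
)
resp.raise_for_status()
with open("qr.png", "wb") as f:
    f.write(resp.content)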

JSON validation using patternProperties does not work

I use the GitHub library "json-schema" to validate JSON against a custom schema, but I am having trouble validating the following JSON against my defined schema.
JSON holding 2 news entries:
[{
  "bodytext": "Lorem ipsum",
  "datetime": "2018-01-29T13:18:56+0100",
  "falMedia": {
    "00000000519602e500007f310096b8e2": {
      "alternative": null,
      "description": "The description",
      "link": "",
      "pid": 6043,
      "processedFile": {
        "publicUrl": "image01.png"
      },
      "title": null,
      "uid": 52925
    },
    "0000000051960a1700007f310096b8e2": {
      "alternative": null,
      "description": "Another description",
      "link": "",
      "pid": 6043,
      "processedFile": {
        "publicUrl": "images/picture01.jpg"
      },
      "title": null,
      "uid": 52927
    }
  },
  "falRelatedFiles": [],
  "istopnews": true,
  "pid": 6043,
  "teaser": "A short lorem teaser",
  "title": "The message title",
  "uid": 51911
},
{
  "bodytext": "Hello World",
  "datetime": "2018-01-29T13:24:33+0100",
  "falMedia": [],
  "falRelatedFiles": [],
  "istopnews": false,
  "pid": 339,
  "teaser": null,
  "title": "Just a short message",
  "uid": 51915
}]
This is my schema against which I try to validate:
{
  "$schema": "http://json-schema.org/draft-04/schema",
  "title": "FPÖ News",
  "description": "Schema for FPÖ News articles accesed by the mobile apps",
  "type": "array",
  "items": {
    "additionalProperties": false,
    "properties": {
      "bodytext": {
        "type": "string"
      },
      "datetime": {
        "type": "string"
      },
      "falMedia": {
        "anyOf": [
          {
            "type": "object",
            "patternProperties": {
              "^[0-9abcde]*$": {
                "type": "object",
                "properties": {
                  "alternative": {
                    "type": ["null", "string"]
                  },
                  "description": {
                    "type": "string"
                  },
                  "link": {
                    "type": "string"
                  },
                  "pid": {
                    "type": "integer"
                  },
                  "processedFile": {
                    "type": "object",
                    "properties": {
                      "additionalProperties": false,
                      "publicUrl": {
                        "type": "string"
                      }
                    }
                  },
                  "title": {
                    "type": "string"
                  },
                  "uid": {
                    "type": "integer"
                  }
                },
                "additionalProperties": false
              }
            }
          },
          {
            "type": "array"
          }
        ]
      },
      "falRelatedFiles": {
        "type": "array"
      },
      "istopnews": {
        "type": "boolean"
      },
      "pid": {
        "type": "integer"
      },
      "teaser": {
        "type": ["string", "null"]
      },
      "title": {
        "type": "string"
      },
      "uid": {
        "type": "integer"
      }
    },
    "required": [
      "bodytext",
      "datetime",
      "istopnews",
      "pid",
      "teaser",
      "title",
      "uid"
    ]
  }
}
It seems to me that everything underneath "patternProperties" is not considered during validation.
My issue is that the keys of the objects underneath "falMedia" are generated dynamically, so I do not know their values in advance. That is why I thought of using "patternProperties". But when I change the keys of the "falMedia" entries in the news to something that is invalid according to the supplied regex under patternProperties, the library still says that everything is fine.
Am I missing something here?
Is my schema wrong?
I would greatly appreciate any help.
Thanks
Klaus
Your problem seems to be that your regex should be "^[0-9abcdef]*$" instead of "^[0-9abcde]*$". Because you're missing the "f", patternProperties doesn't match any of the keys and therefore doesn't constrain the properties in "falMedia".
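To illustrate, here is a minimal sketch with Python's jsonschema package (not the library from the question) showing how the narrower character class lets invalid values slip through:

import jsonschema

doc = {"00000000519602e500007f310096b8e2": "not an object"}

broken = {"type": "object",
          "patternProperties": {"^[0-9abcde]*$": {"type": "object"}}}
fixed = {"type": "object",
         "patternProperties": {"^[0-9abcdef]*$": {"type": "object"}}}

# The key contains an "f", so the broken pattern never matches it and
# the value is left unconstrained: this call passes silently.
jsonschema.validate(doc, broken)

# With the corrected pattern the key matches, the string value violates
# "type": "object", and a ValidationError is raised.
try:
    jsonschema.validate(doc, fixed)
except jsonschema.ValidationError as e:
    print("caught:", e.message)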

Logstash + Elasticsearch template mapping fails to add to Elasticsearch

I'm trying to add a custom template for all Logstash indexes in Elasticsearch; however, whenever I add one, Logstash raises a 400 error on all the logs and fails to add anything to Elasticsearch.
I'm adding the template using the Elasticsearch REST API:
POST _template/logstash
{
  "order": 0,
  "template": "logstash*",
  "settings": {
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "message_field": {
            "mapping": {
              "index": "analyzed",
              "omit_norms": true,
              "type": "string"
            },
            "match_mapping_type": "string",
            "match": "message"
          }
        },
        {
          "string_fields": {
            "mapping": {
              "index": "analyzed",
              "omit_norms": true,
              "type": "string",
              "fields": {
                "raw": {
                  "ignore_above": 256,
                  "index": "not_analyzed",
                  "type": "string"
                }
              }
            },
            "match_mapping_type": "string",
            "match": "*"
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "type": "object",
          "properties": {
            "location": {
              "type": "geo_point"
            }
          }
        },
        "@version": {
          "index": "not_analyzed",
          "type": "string"
        },
        "@fields": {
          "type": "object",
          "dynamic": true,
          "path": "full"
        },
        "@message": {
          "type": "string",
          "index": "analyzed"
        },
        "@source": {
          "type": "string",
          "index": "not_analyzed"
        },
        "method": {
          "type": "string",
          "index": "not_analyzed"
        },
        "requested": {
          "type": "date",
          "format": "dateOptionalTime",
          "index": "not_analyzed"
        },
        "response_time": {
          "type": "float",
          "index": "not_analyzed"
        },
        "hostname": {
          "type": "string",
          "index": "not_analyzed"
        },
        "ip": {
          "type": "string",
          "index": "not_analyzed"
        },
        "error": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
You should try adding the template through Logstash instead of using the REST API directly.
In your Logstash configuration:
output {
  elasticsearch {
    # add additional configuration options as appropriate
    template => "/path/to/template.json"  # path to the template file you want to use
    template_name => "logstash"
    template_overwrite => true
  }
}
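If you then want to confirm the template actually landed in Elasticsearch, a quick check sketch (endpoint assumed local; uses Python's requests):

import requests

# Fetch the installed template by name; an empty {} means it isn't there.
resp = requests.get("http://localhost:9200/_template/logstash")
print(resp.status_code)
print(resp.json())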

Configuring an Elasticsearch index with JSON is not taking effect

I'm using the following JSON to configure Elasticsearch. The goal is to set up the index and the type in one swoop (this is the requirement, since we're setting up Docker images). This is as far as I've gotten while still allowing Elasticsearch to start successfully. The problem is that the index isn't created, yet there is no error. Other forms I've tried prevent the service from starting.
{
  "cluster": {
    "name": "MyClusterName"
  },
  "node": {
    "name": "MyNodeName"
  },
  "indices": {
    "number_of_shards": 4,
    "index.number_of_replicas": 4
  },
  "index": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer",
          "filter": "lowercase"
        },
        "my_lowercase_whitespace_analyzer": {
          "tokenizer": "whitespace",
          "filter": "lowercase"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "20"
        }
      }
    },
    "index": {
      "settings": {
        "_id": "indexindexer"
      },
      "mappings": {
        "inventoryIndex": {
          "_id": {
            "path": "indexName"
          },
          "_routing": {
            "required": true,
            "path": "indexName"
          },
          "properties": {
            "indexName": {
              "type": "string",
              "index": "not_analyzed"
            },
            "startedOn": {
              "type": "date",
              "index": "not_analyzed"
            },
            "deleted": {
              "type": "boolean",
              "index": "not_analyzed"
            },
            "deletedOn": {
              "type": "date",
              "index": "not_analyzed"
            },
            "archived": {
              "type": "boolean",
              "index": "not_analyzed"
            },
            "archivedOn": {
              "type": "date",
              "index": "not_analyzed"
            },
            "failure": {
              "type": "boolean",
              "index": "not_analyzed"
            },
            "failureOn": {
              "type": "date",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
I may have a workaround using curl in a post-boot script but I would prefer to have the configuration handled in the config file.
Thanks!
It appears that Elasticsearch will not allow all of this configuration to be done in a single YAML file. The workaround I've found is to create an index template and place it in the <es-config>/templates/ directory; then, after spinning up the service, I use curl to create the index. The index-pattern matching catches it and provisions it according to the template.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html
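As a sketch of that post-boot step in Python rather than curl (the index name inventoryindex and the endpoint are assumptions for illustration):

import requests

# Create the index once the node is up; the template dropped into
# <es-config>/templates/ then matches by pattern and provisions it.
resp = requests.put("http://localhost:9200/inventoryindex")
print(resp.status_code, resp.text)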