I'm trying to extract JSON objects and store them in HDFS. I'm targeting the message attribute, whose value is a6,b6,c6,d6,e6.
JSON sample:
{
"#timestamp":"2020-07-06T07:35:29.047Z",
"#metadata":{
"beat":"filebeat",
"type":"_doc",
"version":"7.7.1"
},
"log":{
"offset":91,
"file":{
"path":"C:\\Program Files\\Filebeat\\test-kafka\\test_csv.csv"
}
},
"message":"a6,b6,c6,d6,e6",
"input":{
"type":"log"
},
"ecs":{
"version":"1.5.0"
},
"host":{
"name":"host"
},
"agent":{
"version":"7.7.1",
"type":"filebeat",
"ephemeral_id":"0b4a288f-f7ac-4db9-835e-60ca07a45fff",
"hostname":"host",
"id":"5e2fec03-bbdc-4f91-acc9-4ab36c7268db"
}
}
GenerateFlowFile properties
EvaluateJsonPath properties
But the problem is that EvaluateJsonPath is not working as I expected; I thought it would extract only the message attribute.
hadoop#ambari:~$ hdfs dfs -cat /user/test/5a422f02-9074-4384-a3c9-f3e3ce7c2e40
{
"#timestamp":"2020-07-06T07:35:29.047Z",
"#metadata":{
"beat":"filebeat",
"type":"_doc",
"version":"7.7.1"
},
"log":{
"offset":91,
"file":{
"path":"C:\\Program Files\\Filebeat\\test-kafka\\test_csv.csv"
}
},
"message":"a6,b6,c6,d6,e6",
"input":{
"type":"log"
},
"ecs":{
"version":"1.5.0"
},
"host":{
"name":"host"
},
"agent":{
"version":"7.7.1",
"type":"filebeat",
"ephemeral_id":"0b4a288f-f7ac-4db9-835e-60ca07a45fff",
"hostname":"host",
"id":"5e2fec03-bbdc-4f91-acc9-4ab36c7268db"
}
}
Am I missing something?
Since you used EvaluateJsonPath with the destination set to flow file attributes, it extracted message into a flow file attribute, and the content of the flow file is still the same as it was before. You would need to use another processor like AttributesToJSON before PutHDFS to rewrite the flow file content with the attributes you want. An alternative might be to set the EvaluateJsonPath destination to flow file content, but I'm not sure if that produces valid JSON.
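For illustration, a rough sketch of both options (property names as they appear in recent NiFi versions; verify against your own). Writing the extraction straight into the content:

EvaluateJsonPath
    Destination: flowfile-content
    Return Type: auto-detect
    message: $.message   (dynamic property holding the JsonPath)

would leave just a6,b6,c6,d6,e6 as the flow file content handed to PutHDFS. The attribute route keeps Destination = flowfile-attribute and adds:

AttributesToJSON
    Attributes List: message
    Destination: flowfile-content

which rewrites the content as {"message":"a6,b6,c6,d6,e6"} before PutHDFS.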
Related
I crawled a lot of JSON files into a data folder, all named by timestamp (./data/2021-04-05-12-00.json, ./data/2021-04-05-12-30.json, ./data/2021-04-05-13-00.json, ...).
Now I'm trying to use the ELK stack to load this growing set of JSON files.
Each JSON file is pretty-printed like:
{
"datetime": "2021-04-05 12:00:00",
"length": 3,
"data": [
{
"id": 97816,
"num_list": [1,2,3],
"meta_data": "{'abc', 'cde'}"
"short_text": "This is data 97816"
},
{
"id": 97817,
"num_list": [4,5,6],
"meta_data": "{'abc'}"
"short_text": "This is data 97817"
},
{
"id": 97818,
"num_list": [],
"meta_data": "{'abc', 'efg'}"
"short_text": "This is data 97818"
}
]
}
I tried using the Logstash multiline plugin to ingest the JSON files, but it seems to handle each file as one event. Is there any way to extract each record in the JSON data field as an event?
Also, what is the best practice for loading multiple growing pretty-printed JSON files into ELK?
Using multiline is correct if you want to handle each file as one input event.
Then you need to leverage the split filter in order to create one event for each element in the data array:
filter {
  split {
    field => "data"
  }
}
So Logstash reads one file as a whole and passes its content as a single event to the filter layer; the split filter shown above then spawns one new event for each element in the data array.
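Putting both pieces together, a minimal pipeline sketch might look like the following (the path, the multiline pattern, and the Elasticsearch settings are assumptions you will need to adapt):

input {
  file {
    path => "/path/to/data/*.json"
    start_position => "beginning"
    codec => multiline {
      # any line that does not open a new top-level object belongs to the
      # previous event, so each pretty-printed file becomes one event
      pattern => "^\{"
      negate => true
      what => "previous"
      auto_flush_interval => 2
    }
  }
}

filter {
  # parse the joined text into structured fields
  json {
    source => "message"
  }
  # one event per element of the data array
  split {
    field => "data"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "crawled-data"
  }
}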
I want to get some data from a JSON file. I can access everything with the code below.
import json
with open('C:\\Users\\me\\Documents\\stdin.json', 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.load(json_file)
    print("Type: ", type(data))
    print("VM: ", data["Datacenter"])
The .json file looks like this:
{
"Datacenter":[
{
"Folder":[
{
"Folder":[
{
"VirtualMachine":[
{
"moid":"vm-239566",
"name":"DEV CentOS 6",
},
{
"moid":"vm-239464",
"name":"DEV Sles 12",
},
],
"moid":"group-v239127",
"name":"DEV-VMs"
}
],
"moid":"group-v78",
"name":"Test and Dev"
},
{
"VirtualMachine":[
{
"moid":"vm-66130",
"name":"Hyv16-clone",
}
],
"moid":"group-v77",
"name":"Templates"
}
],
"moid":"datacenter-21",
"name":"Datencenter"
}
],
"vSphereHost":"srv01",
"vSphereProductLine":"vpx",
"vSphereServer":"VMware vCenter Server",
"vSphereVersion":"xxx",
"version":"1.0",
"viewType":"VMs and Templates"
}
Note that the original JSON file was much bigger; I deleted lines for readability. Also note that I run everything from the command line, as my IDE always gives me the error UnicodeEncodeError: 'charmap' codec can't encode characters in position 22910-22912: character maps to <undefined>.
I tried to use data["VirtualMachine"] instead of data["Datacenter"], but then I get an error: TypeError: 'VirtualMachine' is an invalid keyword argument for this function.
So how can I get/print the moid and name of a VM? I am really new to coding and don't know how to deal with nested dictionaries.
Your question is not entirely clear, but based on what you have described, this may help you derive a value from nested JSON if you already have a dataframe created. You can keep adding get() calls until you reach what you need. Below is a sample you can use:
import json

# Assuming `data` is a pandas Series of JSON strings: parse each entry and
# chain .get() calls until you reach the field you need.
data = data.apply(lambda x: json.loads(x).get("Folder", {}).get("moid") if x else None)
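For the structure shown in the question, though, a plain-Python approach may be simpler. A small recursive helper (a sketch, assuming the layout above; no pandas required) can walk the nested dictionaries and lists and print every VM's moid and name:

import json

def find_vms(node):
    # Recursively yield every dict found under a "VirtualMachine" key,
    # no matter how deeply the folders are nested.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "VirtualMachine":
                for vm in value:
                    yield vm
            else:
                yield from find_vms(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_vms(item)

with open('C:\\Users\\me\\Documents\\stdin.json', 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.load(json_file)

for vm in find_vms(data):
    print(vm["moid"], vm["name"])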
In a JMeter script, I need to process an HTTP response and manipulate its JSON to send in the next request, because this manipulation currently happens in an Angular client.
My HTTP response:
[
{
"function":"nameA",
"rast":"F1",
"tag":"EE",
"probs":"0,987"
},
{
"function":"nameB",
"rast":"F2",
"tag":"SE",
"probs":"0,852"
},
{
"function":"nameC",
"rast":"F3",
"tag":"CE",
"probs":"0,754"
}
]
I need to convert the result into the JSON below to post in the next request:
[
{
"function":"nameA",
"rast":"F1",
"type":{
"name":"EE"
},
"id":"alpha"
},
{
"function":"nameB",
"rast":"F2",
"type":{
"name":"SE"
},
"id":"alpha"
},
{
"function":"nameC",
"rast":"F3",
"type":{
"name":"CE"
},
"id":"alpha"
}
]
I filter the response with this JSON Extractor:
[*].["function", "rast", "tag"]
But now I need to solve other problems:
Add an id attribute (the same for all functions).
Add an object named type.
Move the tag attribute into the type object.
Rename the tag attribute to name.
Add a JSR223 PostProcessor as a child of the request which returns the original JSON.
Put the following code into "Script" area:
def json = new groovy.json.JsonSlurper().parse(prev.getResponseData()).each { entry ->
    // wrap the tag value in a nested "type" object
    entry << [type: [name: entry.get('tag')]]
    // drop the fields the next request does not need
    entry.remove('tag')
    entry.remove('probs')
    // same id for every function
    entry.put('id', 'alpha')
}
def newJson = new groovy.json.JsonBuilder(json).toPrettyString()
log.info(newJson)
That's it. You should see the generated JSON in the jmeter.log file.
If you need to have it in a JMeter Variable add the next line to the end of your script:
vars.put('newJson', newJson)
and you will be able to access the generated value as ${newJson} where required.
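For example (assuming the next sampler posts JSON), you can set the Body Data of the following HTTP Request sampler to ${newJson} and send a Content-Type: application/json header via an HTTP Header Manager.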
More information:
Groovy: Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
I want to update an attribute in a JSON file with the value I get from another processor. Below is my original JSON file.
{
"applicant": {
"applicant-id": null
"full-name": "Tyrion Lannister",
"mobile-number" : "8435739739",
"email-id" : "tyrionlannister_casterlyrock#gmail.com"
},
"product": {
"product-category" : "Credit Card",
"product-type" : "Super Value Card - Titanium"
}
}
Below is my EvaluateJsonPath config, where I extracted the applicant-id attribute.
Below is my GenerateFlowFile processor, which generates an id value.
Now I need to update the applicant-id attribute with the value (899872120) in the original JSON, as below.
{
"applicant": {
"applicant-id": 899872120
"full-name": "Tyrion Lannister",
"mobile-number" : "8435739739",
"email-id" : "tyrionlannister_casterlyrock#gmail.com"
},
"product": {
"product-category" : "Credit Card",
"product-type" : "Super Value Card - Titanium"
}
}
I tried using MergeContent to merge the two flows, and I can see the Applicant-Id attribute value in the flow file after the MergeContent processor. I tried using UpdateAttribute to update Applicant-Id, but I'm not able to get the updated JSON record.
Below is my MergeContent Configuration.
Is there anything I'm missing?
As of NiFi 1.2.0, the JoltTransformJSON processor supports NiFi Expression Language, so if you have your id value in the attribute "Applicant-id", you can use it in a Default JOLT spec:
{
"applicant": {
"applicant-id": "${Applicant-id}"
}
}
That should transform your input JSON to your desired output JSON.
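As a sanity check: assuming the flow file attribute Applicant-id holds 899872120, the transformed content should come out as below. Note that because the Expression Language reference sits inside quotes in the spec, the substituted value is written as a JSON string; if you need a true number, an additional transform (for example a modify-overwrite-beta spec using =toInteger) would be needed.

{
  "applicant": {
    "applicant-id": "899872120",
    "full-name": "Tyrion Lannister",
    "mobile-number": "8435739739",
    "email-id": "tyrionlannister_casterlyrock@gmail.com"
  },
  "product": {
    "product-category": "Credit Card",
    "product-type": "Super Value Card - Titanium"
  }
}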
I have a simple JSON file:
{
"nodes":[
{"name":"Moe","group":1},
{"name":"Madih1","group":1},
{"name":"Madih2","group":1},
{"name":"Nora","group":1},
{"name":"Myna","group":1}
],
"links":[
{"source":35,"target":44,"value":1},
{"source":44,"target":35,"value":1},
{"source":45,"target":35,"value":1},
{"source":45,"target":44,"value":1},
{"source":35,"target":49,"value":1},
{"source":49,"target":35,"value":1}
]
}
When I save it, use exactly the HTML code shown at http://bl.ocks.org/4062045#index.html, and point it at the JSON above, nothing appears on the canvas.
I would appreciate your help with this one, as I am not very familiar with it. Moreover, it would be great to know the minimum code required for drawing a graph like this from JSON.
The number of "source" and "target" refer to the index of the item in nodes array.
So you can change your json to following:
{
"nodes":[
{"name":"Moe","group":1},
{"name":"Madih1","group":1},
{"name":"Madih2","group":1},
{"name":"Nora","group":1},
{"name":"Myna","group":1}
],
"links":[
{"source":0,"target":1,"value":1},
{"source":1,"target":2,"value":1},
{"source":2,"target":3,"value":1},
{"source":3,"target":4,"value":1},
]
}
Then you can just copy the code from the http://bl.ocks.org/4062045#index.html example as the minimum code.
Remember to point the d3.json call at your own JSON file:
d3.json("path/to/your/json", function(error, graph) {
// drawing code goes here
});
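If you want a self-contained starting point, here is a minimal sketch of the force-directed graph (assuming D3 v3, which the bl.ocks example uses; the file name graph.json and the sizes are placeholders):

var width = 640, height = 480;

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

d3.json("graph.json", function(error, graph) {
  if (error) throw error;

  // wire the force layout to the nodes and links from the JSON file
  var force = d3.layout.force()
      .nodes(graph.nodes)
      .links(graph.links)
      .size([width, height])
      .linkDistance(60)
      .charge(-120)
      .start();

  var link = svg.selectAll(".link")
      .data(graph.links)
    .enter().append("line")
      .attr("class", "link")
      .style("stroke", "#999");

  var node = svg.selectAll(".node")
      .data(graph.nodes)
    .enter().append("circle")
      .attr("class", "node")
      .attr("r", 5)
      .style("fill", "#6baed6")
      .call(force.drag);

  // on every simulation tick, copy the computed positions to the SVG
  force.on("tick", function() {
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return d.target.x; })
        .attr("y2", function(d) { return d.target.y; });
    node.attr("cx", function(d) { return d.x; })
        .attr("cy", function(d) { return d.y; });
  });
});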