Parse JSON value into separate strings in telegraf.conf - json

I am using telegraf inputs.tail to parse my application log files. Input data is in json format like so:
{
"key1":"value1",
"key2":"value2",
"key3":"value3a^value3b"
}
Question - How to parse the value of key3 so that I can write value3a and value3b to two separate tags/columns in influxDB?
telegraf.conf snippet:
[[inputs.tail]]
files = ["/path/to/log/server.log"]
data_format = "json"
tag_keys = [
"key1",
"key2",
"key3"
]
Been breaking my head with this problem for 2 days now. Any help would be much appreciated!
Thanks in advance.

Related

Parse multi level JSON with Ruby

I am trying to parse the JSON file below. The problem is I cannot return "Mountpoint" as a key. It only gets parsed as a value. This is the command I am using to parse it json_data = JSON.parse(readjson). The reason I guess that it's a key is because if I run json_data.keys only EncryptionStatus and SwitchName are returned. Any help would be greatly appreciated.
{
"EncryptionStatus": [
{
"MountPoint": "C:",
"VolumeStatus": "FullyEncrypted"
},
{
"MountPoint": "F:",
"VolumeStatus": "FullyEncrypted"
},
{
"MountPoint": "G:",
"VolumeStatus": "FullyEncrypted"
},
{
"MountPoint": "H:",
"VolumeStatus": "FullyEncrypted"
}
],
"SwitchName": [
"LAN",
"WAN"
]
}
I tried using dig as a part of my JSON.parse but that didn't seem to help me.
JSON data can have multiple levels.
Your JSON document is a
Hash (Dictionary/Map/Object in other languages) that has two keys ("EncryptionStatus", "SwitchName"),
The value for the "EncryptionStatsu" key is an Array of Hashes (with keys "MountPoint" and "VolumeStatus").
# assuming your JSON is in a file called "input.json"
data = File.read("input.json")
json = JSON.parse(data)
json["EncryptionStatus"].each do |encryption_status|
puts "#{encryption_status["MountPoint"]} is #{encryption_status["VolumeStatus"]}"
end
This will print out
C: is FullyEncrypted
F: is FullyEncrypted
G: is FullyEncrypted
H: is FullyEncrypted
If you want to access a specific item you can look at the dig method. E.g.
json.dig("EncryptionStatus", 3)
Would return the information for mountpoint "H"

How to parse a kafka topic which contains data in the form of nested json?

I am trying to read a kafka topic and stream it to my sink. To read the data, I wrote the following code.
topic data in json:
{
"HiveData": {
"Tablename": "HiveTablename1",
"Rowcount": "3213423",
"lastupdateddate": "2021-02-24 13:04:14"
},
"HbaseData": [
{
"Tablename": "HbaseTablename1",
"Rowcount": "23543",
"lastupdateddate": "2021-02-23 12:03:11"
}
],
"PostgresData": [
{
"Tablename": "PostgresTablename1",
"Rowcount": "23454345",
"lastupdateddate": "2021-02-23 12:03:11"
}
]
}
Below is the code I wrote for parsing the topic:
def streamData(): DataFrame = {
val kafkaDF = spark.readStream.format("kafka")
.option("kafka.bootstrap.servers", "server:port")
.option("subscribe", "topic_name")
.load()
kafkaDF.select(from_json(col("HiveData"), topic_schema).as("HiveData")).selectExpr("HiveData.tablename as table", "HiveData.Rowcount as rowcount", "HiveData.lastupdateddate as lastupdateddate")
kafkaDF
}
But this code works if the json is in the format of:
{"Tablename": "HiveTablename1","Rowcount": "3213423","lastupdateddate": "2021-02-24 13:04:14"}
I want to parse the json and get HiveData into a seperate dataframe and a seperate dataframe for HBaseData and the same for PostgresData. The code I wrote is working if the json data is in a single line.
Could anyone let me know how to parse the data into multiple dataframes if it is in nested format as mentioned in the beginning of this question ?
Any help is much appreciated.
Try adding
option("multiline", "true")

Compare two Json files using Apache Spark

I am new to Apache Spark and I am trying to compare two json files.
My requirement is to find out that which key/value is added, removed or modified and what is its path.
To explain my problem, I am sharing the code which I have tried with a small json sample here.
Sample Json 1 is:
{
"employee": {
"name": "sonoo",
"salary": 57000,
"married": true
} }
Sample Json 2 is:
{
"employee": {
"name": "sonoo",
"salary": 58000,
"married": true
} }
My code is:
//Compare two multiline json files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Load first json file
val jsonData_1 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_1.json").values)
//Load second json file
val jsonData_2 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_2.json").values)
//Compare both json files
jsonData_2.except(jsonData_1).show(false)
The output which I get on executing this code is:
+--------------------+
|employee |
+--------------------+
|{true, sonoo, 58000}|
+--------------------+
But here only one field i.e. salary was modified so output should be only the updated field with its path.
Below is the expected output details:
[
{
"op" : "replace",
"path" : "/employee/salary",
"value" : 58000
}
]
Can anyone point me in the right direction?
Assuming each json has an identifier, and that you have two json groups (e.g. folders), you need to compare b/w the jsons in the two groups:
Load the jsons from each group into a dataframe, providing a schema matching the structure of the son. After this, you have two dataframes.
Compare the jsons (by now rows in a dataframe) by joining on the identifiers, looking for mismatched values.

Processing JSON arrays

I have a JSON file arranged in this pattern:
[
{
"Title ID": "4224031",
"Overtime Status": "Non-Exempt",
"Shift rates": "No Shift rates",
"On call rates": "No On call rates"
},
[
{
"Step: 1.0": [
"$38.87",
"(38.870000)"
]
}
]
][
{
"Title ID": "4225031",
"Overtime Status": "Non-Exempt",
"Shift rates": "No Shift rates",
"On call rates": "No On call rates"
},
[
{
"Step: 1.0": [
"$38.87",
"(38.870000)"
]
}
]
]
I am trying to get it into a Pandas DataFrame. I have tried opening a connection to the JSON file and running JSON.load(s). Unfortunately, I get JSON decode errors like: "JSONDecodeError: Extra data: line 16 column 2 (char 182)". When running the JSON through a linter, I see that there might be an issue with the way the JSON is presented in the file. The parts between the brackets are valid but when wrapped in brackets, become invalid. I have then tried to get at the dictionaries with the wrapping brackets but have not been able to make much progress. Does anyone have tips on how I can successfully access this JSON data and get it into a pandas DataFrame?
The json is invalid beacuase it has more than one root in this representation.
This has to be like this
jsonObject = [{"1":"3"}], [{"4":"5"}]
Hacks that I am able to think of are replace these brackets ][ to this ],[ by find and replace in editor. You'll be able to then create a dataframe as its a list now.
Second, if its not a one time job, then you need to write a regex that can do this for you in text cleaning pipeline(or code). I'm not good at writing of working regex(sorry mate).
I found a solution.
First, after examining the JSON data in a linter, I found that I had some extra brackets and braces at different points. So, I am running the data through a regex that cleans out the unnecessary brackets and braces.
Next, I run each line, which now looks like a string dictionary through json.loads
Finally, I call pd.DataFrame(pd.json_normalize(data)) to get my desired pandas dataframe.
Thanks for the help from commenters.

Decode resulting json and save into db - Django + Postgres

I have a model like this:
class MyClass(models.Model):
typea = models.CharField(max_length=128)
typeb = models.CharField(max_length=128)
If for example, the resulting json from an API is like this:
{
"count": 75,
"results": [
{
"typea": "This tipe",
"typeb": "A B type",
"jetsons": [],
"data": [
"https://myurl.com/api/data/2/",
],
"created": "2014-12-15T12:31:42.547000Z",
"edited": "2017-04-19T10:56:06.685592Z",
},
I need to parse this result and save typea and typeb into the database, I'm kind of confused on how to do this.
I mean, there is the JSONField on Django, but I don't think that will work for me, since I need to save some specific nested string of the json dict.
Any example or idea on how to achieve this?
I mean, my confusion is on how to parse this and "extract" the data I need for my specific fields.
Thank You
You can always do an import json and use json.load(json_file_handle) to create a dictionary and extract the values you need. All you need is to open the .json file (you can use with open("file.json", "r") as json_file_handle) and load the data.