ClickHouse + Amazon SNS notification - JSON

I am trying to insert an Amazon SNS notification (eventType = Open) into ClickHouse. The JSON schema is complex, so I don't know how to create my table (with a nested structure inside another nested structure...).
{
"eventType":"Open",
"mail":{
"commonHeaders":{
"from":[
"sender@example.com"
],
"messageId":"EXAMPLE7c191be45-e9aedb9a-02f9-4d12-a87d-dd0099a07f8a-000000",
"subject":"Message sent from Amazon SES",
"to":[
"recipient@example.com"
]
},
"destination":[
"recipient@example.com"
],
"headers":[
{
"name":"X-SES-CONFIGURATION-SET",
"value":"ConfigSet"
},
{
"name":"X-SES-MESSAGE-TAGS",
"value":"myCustomTag1=myCustomValue1, myCustomTag2=myCustomValue2"
},
{
"name":"From",
"value":"sender@example.com"
},
{
"name":"To",
"value":"recipient@example.com"
},
{
"name":"Subject",
"value":"Message sent from Amazon SES"
},
{
"name":"MIME-Version",
"value":"1.0"
},
{
"name":"Content-Type",
"value":"multipart/alternative; boundary=\"XBoundary\""
}
],
"headersTruncated":false,
"messageId":"EXAMPLE7c191be45-e9aedb9a-02f9-4d12-a87d-dd0099a07f8a-000000",
"sendingAccountId":"123456789012",
"source":"sender@example.com",
"tags":{
"myCustomTag1":[
"myCustomValue1"
],
"myCustomTag2":[
"myCustomValue2"
],
"ses:caller-identity":[
"ses-user"
],
"ses:configuration-set":[
"ConfigSet"
],
"ses:from-domain":[
"example.com"
],
"ses:source-ip":[
"192.0.2.0"
]
},
"timestamp":"2017-08-09T21:59:49.927Z"
},
"open":{
"ipAddress":"192.0.2.1",
"timestamp":"2017-08-09T22:00:19.652Z",
"userAgent":"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_3 like Mac OS X) AppleWebKit/603.3.8 (KHTML, like Gecko) Mobile/14G60"
}
}
I tried INSERT INTO Open FORMAT JSONEachRow and INSERT INTO Open FORMAT JSONCompact, but neither works.
Thank you.

You should either transform your JSON into a simpler, flat form without nesting and use JSONEachRow,
or insert the data into ClickHouse as a single string with JSONAsString and extract the fields with JSONExtract in a materialized view:
create table i(J String) Engine=Null;
create table f(a String, i Int64, f Float64) Engine=MergeTree order by a;
create materialized view vv to f
as select (JSONExtract(J, 'Tuple(String,Tuple(Int64,Float64))') as x),
x.1 as a,
x.2.1 as i,
x.2.2 as f
from i;
echo '{"s": "val1", "b2": {"i": 42, "f": 0.1}}' |clickhouse-client -q "insert into i format JSONAsString"
select * from f
┌─a────┬──i─┬───f─┐
│ val1 │ 42 │ 0.1 │
└──────┴────┴─────┘
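The first option (flattening before the insert) could look roughly like the sketch below. It is only an illustration: the flatten helper, the underscore-joined column names, and the decision to keep arrays as JSON text are my assumptions, and the event is a shortened copy of the one in the question.

```python
import json

def flatten(obj, prefix=""):
    """Collapse nested dicts into one level; keys are joined with '_'."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Assumption: arrays are stored as JSON text in a String column.
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row

# Shortened sample of the SES "Open" event (values abbreviated).
event = {
    "eventType": "Open",
    "mail": {"messageId": "EXAMPLE7c191be45", "destination": ["recipient@example.com"]},
    "open": {"ipAddress": "192.0.2.1", "timestamp": "2017-08-09T22:00:19.652Z"},
}

flat_row = flatten(event)
# JSONEachRow expects one JSON object per line.
line = json.dumps(flat_row)
print(line)
```

The target table would then need one plain column per flattened key (eventType, mail_messageId, open_ipAddress, and so on).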

Table from nested list, struct

I have this json data:
consumption_json = """
{
"count": 48,
"next": null,
"previous": null,
"results": [
{
"consumption": 0.063,
"interval_start": "2018-05-19T00:30:00+0100",
"interval_end": "2018-05-19T01:00:00+0100"
},
{
"consumption": 0.071,
"interval_start": "2018-05-19T00:00:00+0100",
"interval_end": "2018-05-19T00:30:00+0100"
},
{
"consumption": 0.073,
"interval_start": "2018-05-18T23:30:00+0100",
"interval_end": "2018-05-18T00:00:00+0100"
}
]
}
"""
and I would like to convert the results list to an Arrow table.
I have managed this by first converting it to a Python data structure, using Python's json library, and then converting that to an Arrow table.
import json
import pyarrow as pa
consumption_python = json.loads(consumption_json)
results = consumption_python['results']
table = pa.Table.from_pylist(results)
print(table)
pyarrow.Table
consumption: double
interval_start: string
interval_end: string
----
consumption: [[0.063,0.071,0.073]]
interval_start: [["2018-05-19T00:30:00+0100","2018-05-19T00:00:00+0100","2018-05-18T23:30:00+0100"]]
interval_end: [["2018-05-19T01:00:00+0100","2018-05-19T00:30:00+0100","2018-05-18T00:00:00+0100"]]
But, for reasons of performance, I'd rather just use pyarrow exclusively for this.
I can use pyarrow's json reader to make a table.
import pyarrow.json  # the json submodule has to be imported explicitly
reader = pa.BufferReader(bytes(consumption_json, encoding='ascii'))
table_from_reader = pa.json.read_json(reader)
And 'results' is a struct nested inside a list. (Actually, everything seems to be nested).
print(table_from_reader['results'].type)
list<item: struct<consumption: double, interval_start: timestamp[s], interval_end: timestamp[s]>>
How do I turn this into a table directly?
Following this answer, https://stackoverflow.com/a/72880717/3617057, I can get closer:
import pyarrow.compute as pc
flat = pc.list_flatten(table_from_reader["results"])
print(flat)
[
-- is_valid: all not null
-- child 0 type: double
[
0.063,
0.071,
0.073
]
-- child 1 type: timestamp[s]
[
2018-05-18 23:30:00,
2018-05-18 23:00:00,
2018-05-18 22:30:00
]
-- child 2 type: timestamp[s]
[
2018-05-19 00:00:00,
2018-05-18 23:30:00,
2018-05-17 23:00:00
]
]
flat is a ChunkedArray whose underlying arrays are StructArrays. To convert it to a table, you need to convert each chunk to a RecordBatch and combine the batches into a table:
pa.Table.from_batches(
[
pa.RecordBatch.from_struct_array(s)
for s in flat.iterchunks()
]
)
If flat is just a StructArray (not a ChunkedArray), you can call:
pa.Table.from_batches(
[
pa.RecordBatch.from_struct_array(flat)
]
)

TTN V3 (MQTT JSON) -> Telegraf -> Grafana / Sensor data from Dragino LSE01 does not appear

I have a problem with Telegraf. I have a Dragino LSE01-8 sensor which is registered on TTN v3. I can check the decoded payload by subscribing to the topic "v3/lse01-8@ttn/devices/+/up".
But when I want to grab the data from InfluxDB, I cannot get "temp_SOIL" and "water_SOIL", although the data appears in the JSON. "conduct_SOIL" is no problem, and I don't know why. Can somebody give me a hint?
Another sensor (Dragino LHT 65) works fine with all data I want to access.
These are the fields I can get from the InfluxDB database:
uplink_message_decoded_payload_BatV
uplink_message_decoded_payload_Mod
uplink_message_decoded_payload_conduct_SOIL
uplink_message_decoded_payload_i_flag
uplink_message_decoded_payload_s_flag
uplink_message_f_cnt
uplink_message_f_port
uplink_message_locations_user_latitude
uplink_message_locations_user_longitude
uplink_message_rx_metadata_0_channel_index
uplink_message_rx_metadata_0_channel_rssi
uplink_message_rx_metadata_0_location_altitude
uplink_message_rx_metadata_0_location_latitude
uplink_message_rx_metadata_0_location_longitude
uplink_message_rx_metadata_0_rssi
uplink_message_rx_metadata_0_snr
uplink_message_rx_metadata_0_timestamp
uplink_message_settings_data_rate_lora_bandwidth
uplink_message_settings_data_rate_lora_spreading_factor
uplink_message_settings_timestamp
## Soil moisture sensor Dragino LSE01-8
[[inputs.mqtt_consumer]]
name_override = "TTN-LSE01"
servers = ["tcp://eu1.cloud.thethings.network:1883"]
qos = 0
connection_timeout = "30s"
topics = [ "v3/lse01-8@ttn/devices/+/up" ]
client_id = "telegraf"
username = "lse01-8@ttn"
password = "NNSXS.LLSNSE67AP..................P67Q.Q...........HPG............KJA..........." //
data_format = "json"
This is the JSON data I can get (I changed some data in order not to send any passwords or tokens).
{
"end_device_ids":{
"device_id":"eui-a8.40.141.bbe4",
"application_ids":{
"application_id":"lse01-8"
},
"dev_eui":"A8...40.BE...4",
"join_eui":"A8.40.010.1",
"dev_addr":"2.9F.....8"
},
"correlation_ids":[
"as:up:01G4WDNS..P3C3R...RK56VQ...KT7N076",
"gs:conn:01G4H2F.ETRG.V2QER...RQ.0K1MGZ44",
"gs:up:host:01G4H2F.ETWRZX.4PFN.A2M.6RDKD4",
"gs:uplink:01G4WDN.N7B6P.J8E.JS.503F1",
"ns:uplink:01G4WDNSFM.MCYYEZZ1.KY.4M78",
"rpc:/ttn.lorawan.v3.GsNs/HandleUplink:01G4W.NSFM29Z3.PABYW...43",
"rpc:/ttn.lorawan.v3.NsAs/HandleUplink:01G4W....VTQ4DMKBF"
],
"received_at":"2022-06-06T11:51:18.979353604Z",
"uplink_message":{
"session_key_id":"AYE...j+DM....A==",
"f_port":2,
"f_cnt":292,
"frm_payload":"DSQAAAcVB4AADBA=",
"decoded_payload":{
"BatV":3.364,
"Mod":0,
"conduct_SOIL":12,
"i_flag":0,
"s_flag":1,
"temp_DS18B20":"0.00",
"temp_SOIL":"19.20",
"water_SOIL":"18.13"
},
"rx_metadata":[
{
"gateway_ids":{
"gateway_id":"lr8",
"eui":"3.6201F0.058.....00"
},
"time":"2022-06-06T11:51:00.289713Z",
"timestamp":4283143007,
"rssi":-47,
"channel_rssi":-47,
"snr":7,
"location":{
"latitude":51.______________,
"longitude":6.__________________,
"altitude":25,
"source":"SOURCE_REGISTRY"
},
"uplink_token":"ChsKG________________________________",
"channel_index":2
}
],
"settings":{
"data_rate":{
"lora":{
"bandwidth":125000,
"spreading_factor":7
}
},
"coding_rate":"4/5",
"frequency":"868500000",
"timestamp":4283143007,
"time":"2022-06-06T11:51:00.289713Z"
},
"received_at":"2022-06-06T11:51:18.772518399Z",
"consumed_airtime":"0.061696s",
"locations":{
"user":{
"latitude":51._________________,
"longitude":6.__________________4,
"source":"SOURCE_REGISTRY"
}
},
"version_ids":{
"brand_id":"dragino",
"model_id":"lse01",
"hardware_version":"_unknown_hw_version_",
"firmware_version":"1.1.4",
"band_id":"EU_863_870"
},
"network_ids":{
"net_id":"000013",
"tenant_id":"ttn",
"cluster_id":"eu1",
"cluster_address":"eu1.cloud.thethings.network"
}
}
}
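One detail stands out in the payload above: temp_DS18B20, temp_SOIL and water_SOIL are JSON strings ("19.20"), whereas conduct_SOIL and the other fields that do arrive are numbers. As far as I understand the Telegraf documentation, the json data format only turns numeric values into fields by default and drops strings unless their keys are listed under json_string_fields, which would be consistent with what is missing. The Python sketch below only separates string values from numeric ones in a trimmed copy of decoded_payload; the variable names are made up for illustration.

```python
import json

# Trimmed copy of uplink_message.decoded_payload from the message above.
decoded_payload = json.loads("""
{
  "BatV": 3.364, "Mod": 0, "conduct_SOIL": 12, "i_flag": 0, "s_flag": 1,
  "temp_DS18B20": "0.00", "temp_SOIL": "19.20", "water_SOIL": "18.13"
}
""")

# Keys whose values are JSON strings.
string_fields = sorted(k for k, v in decoded_payload.items() if isinstance(v, str))
# Keys whose values are JSON numbers (bool is excluded on purpose).
numeric_fields = sorted(
    k for k, v in decoded_payload.items()
    if isinstance(v, (int, float)) and not isinstance(v, bool)
)
print("strings:", string_fields)
print("numbers:", numeric_fields)
```

The numeric keys are exactly the decoded_payload fields that show up in the InfluxDB list, so it may be worth either sending these values as numbers from the TTN payload formatter or checking the string handling options of the Telegraf JSON parser.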

pandas json normalize key error with a particular json attribute

I have a json as:
mytestdata = {
"success": True,
"message": "",
"data": {
"totalCount": 95,
"goal": [
{
"user_id": 123455,
"user_email": "john.smith@test.com",
"user_first_name": "John",
"user_last_name": "Smith",
"people_goals": [
{
"goal_id": 545555,
"goal_name": "test goal name",
"goal_owner": "123455",
"goal_narrative": "",
"goal_type": {
"id": 1,
"name": "Team"
},
"goal_create_at": "1595874095",
"goal_modified_at": "1595874095",
"goal_created_by": "123455",
"goal_updated_by": "123455",
"goal_start_date": "1593561600",
"goal_target_date": "1601424000",
"goal_progress": "34",
"goal_progress_color": "#ff9933",
"goal_status": "1",
"goal_permission": "internal,team",
"goal_category": [],
"goal_owner_full_name": "John Smith",
"goal_team_id": "766754",
"goal_team_name": "",
"goal_workstreams": []
}
]
}
]
}
}
I am trying to display all details in "people_goals" along with "user_last_name", "user_first_name", "user_email" and "user_id" using json_normalize.
So far I am able to display "people_goals", "user_first_name" and "user_email" with this code:
df2 = pd.json_normalize(data=mytestdata['data'], record_path=['goal', 'people_goals'],
meta=[['goal','user_first_name'], ['goal','user_last_name'], ['goal','user_email']], errors='ignore')
However, I run into an issue when I try to include ['goal', 'user_id'] in the meta list.
The error is:
TypeError Traceback (most recent call last)
<ipython-input-192-b7a124a075a0> in <module>
7 df2 = pd.json_normalize(data=mytestdata['data'], record_path=['goal', 'people_goals'],
8 meta=[['goal','user_first_name'], ['goal','user_last_name'], ['goal','user_email'], ['goal','user_id']],
----> 9 errors='ignore')
10
11 # df2 = pd.json_normalize(data=mytestdata['data'], record_path=['goal', 'people_goals'])
The only difference I see for 'user_id' is that it is not a string.
Am I missing something here?
Your code works on my platform. I've migrated away from using the record_path and meta parameters for two reasons: (a) they are difficult to work out, and (b) there are compatibility issues between versions of pandas.
Therefore I now call json_normalize() multiple times to progressively expand the JSON, or use pd.Series. I have included both as examples.
df = pd.json_normalize(data=mytestdata['data']).explode("goal")
df = pd.concat([df, df["goal"].apply(pd.Series)], axis=1).drop(columns="goal").explode("people_goals")
df = pd.concat([df, df["people_goals"].apply(pd.Series)], axis=1).drop(columns="people_goals")
df = pd.concat([df, df["goal_type"].apply(pd.Series)], axis=1).drop(columns="goal_type")
df.T
df2 = pd.json_normalize(pd.json_normalize(
pd.json_normalize(data=mytestdata['data']).explode("goal").to_dict(orient="records")
).explode("goal.people_goals").to_dict(orient="records"))
df2.T
print(df.T.to_string())
output
0
totalCount 95
user_id 123455
user_email john.smith@test.com
user_first_name John
user_last_name Smith
goal_id 545555
goal_name test goal name
goal_owner 123455
goal_narrative
goal_create_at 1595874095
goal_modified_at 1595874095
goal_created_by 123455
goal_updated_by 123455
goal_start_date 1593561600
goal_target_date 1601424000
goal_progress 34
goal_progress_color #ff9933
goal_status 1
goal_permission internal,team
goal_category []
goal_owner_full_name John Smith
goal_team_id 766754
goal_team_name
goal_workstreams []
id 1
name Team
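For comparison, the same "repeat the parent fields on every child record" join can be written without pandas at all. This is a plain-Python sketch, not json_normalize itself; it uses a cut-down copy of the data, and the goal_type_ prefix for the nested dict is my own naming choice.

```python
# Cut-down copy of mytestdata['data'] with one user and one goal.
data = {
    "totalCount": 95,
    "goal": [
        {
            "user_id": 123455,
            "user_first_name": "John",
            "people_goals": [
                {"goal_id": 545555, "goal_type": {"id": 1, "name": "Team"}},
            ],
        }
    ],
}

rows = []
for user in data["goal"]:
    # Parent-level fields that should repeat on every goal row.
    meta = {k: v for k, v in user.items() if k != "people_goals"}
    for goal in user["people_goals"]:
        row = dict(meta)
        for key, value in goal.items():
            if isinstance(value, dict):
                # Expand one level of nesting, e.g. goal_type -> goal_type_id.
                row.update({f"{key}_{k}": v for k, v in value.items()})
            else:
                row[key] = value
        rows.append(row)

print(rows)
```

Passing rows to pd.DataFrame(rows) would then give one row per goal, with user_id kept as an integer.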

Python: JSON to Dictionary

Two examples for a JSON request. Both examples should have the correct JSON syntax, yet only the second version seems to be translatable to a dictionary.
#doesn't work
string_js3 = """{"employees": [
{
"FNAME":"FTestA",
"LNAME":"LTestA",
"SSN":6668844441
},
{
"FNAME":"FTestB",
"LNAME":"LTestB",
"SSN":6668844442
}
]}
"""
#works
string_js4 = """[
{
"FNAME":"FTestA",
"LNAME":"LTestA",
"SSN":6668844441
},
{
"FNAME":"FTestB",
"LNAME":"LTestB",
"SSN":6668844442
}]
"""
This gives an error, while the same code with string_js4 works:
L1 = json.loads(string_js3)
print(L1[0]['FNAME'])
So I have 2 questions:
1) Why doesn't the first version work?
2) Is there a simple way to make the first version also work?
Both of these strings are valid JSON. Where you are getting stuck is in how you are accessing the resulting data structures.
L1 (from string_js3) is a (nested) dict;
L2 (from string_js4) is a list of dicts.
Walkthrough:
import json
string_js3 = """{
"employees": [{
"FNAME": "FTestA",
"LNAME": "LTestA",
"SSN": 6668844441
},
{
"FNAME": "FTestB",
"LNAME": "LTestB",
"SSN": 6668844442
}
]
}"""
string_js4 = """[{
"FNAME": "FTestA",
"LNAME": "LTestA",
"SSN": 6668844441
},
{
"FNAME": "FTestB",
"LNAME": "LTestB",
"SSN": 6668844442
}
]"""
L1 = json.loads(string_js3)
L2 = json.loads(string_js4)
The resulting objects:
L1
{'employees': [{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]}
L2
[{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]
type(L1), type(L2)
(dict, list)
1) Why doesn't the first version work?
Because L1[0] tries to look up the key 0 in a dictionary, and that key doesn't exist, so Python raises a KeyError. From the docs, "It is an error to extract a value using a non-existent key." L1 is a dictionary with just one key:
L1.keys()
dict_keys(['employees'])
2) Is there a simple way to make the first version also work?
There are several ways, but it ultimately depends on what your larger problem looks like. I'm going to assume you want to modify the Python code rather than the JSON files/strings themselves. You could do:
L3 = L1['employees'].copy()
You now have a list of dictionaries that resembles L2:
L3
[{'FNAME': 'FTestA', 'LNAME': 'LTestA', 'SSN': 6668844441},
{'FNAME': 'FTestB', 'LNAME': 'LTestB', 'SSN': 6668844442}]
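If you don't need a separate copy, you can also just index through the 'employees' key. The sketch below uses a shortened JSON string (only FNAME is kept, purely for illustration) and also shows the KeyError that L1[0] raises:

```python
import json

# Shortened version of string_js3 with a single field per employee.
string_js3 = '{"employees": [{"FNAME": "FTestA"}, {"FNAME": "FTestB"}]}'
L1 = json.loads(string_js3)

# A dict has no key 0, so L1[0] raises KeyError.
try:
    L1[0]
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

# Go through the 'employees' key first, then index into the list.
first_name = L1["employees"][0]["FNAME"]
all_names = [employee["FNAME"] for employee in L1["employees"]]
print(error_name, first_name, all_names)
```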

Write/read recursive structure S4 objects

I have a recursive structure of S4 objects that can be represented (this is a simplified version) by these 2 classes:
cl2 <-
setClass("cl2",
representation(
id = "numeric",
date="Date"),
prototype = list(
date=Sys.Date(),
id=sample(1:100,1)
)
)
cl1 <-
setClass("cl1",
representation(
date="Date",
cl2 = "cl2"
),
prototype = list(
date=Sys.Date()
)
)
I would like to save/load objects of type cl1. I opted for the JSON format (suitable for unstructured objects). The problem is with dates: they are coerced to numeric. Is there an option or solution to get dates in the right format when I serialize the object? Note that the objects can contain other objects (recursive structure), so I would like all dates to be in the right format.
cat(RJSONIO::toJSON(cl1(),pretty=TRUE))
{
"date" : 16861,
"cl2" : {
"id" : 90,
"date" : 16861
}
}
One solution would be to store dates as character. But then I would lose the validation mechanism of S4 objects and would have to implement date validation for all objects. Thanks in advance for any help.
The expected output would look like:
{
"date" :"2016-03-01",
"cl2" : {
"id" : 76,
"date" : "2016-03-01"
}
}
Reading the documentation of toJSON I found an interesting parameter:
force unclass/skip objects of classes with no defined JSON mapping
So I tried it, and I think this would match your need, as you can simply ignore the class entry:
> s <- jsonlite::toJSON(cl1(),force=TRUE,auto_unbox=TRUE,pretty=TRUE)
> s
{
"date": "2016-03-01",
"cl2": {
"date": "2016-03-01",
"id": 67,
"class": "cl2"
},
"class": "cl1"
}
Drawback: this is still not loadable "as-is" into S4 objects with fromJSON, as it gives back a named list. Analyzing the list recursively to recreate S4 objects is doable, but you'll have to create the necessary as() implementations to turn a named list into your classes. For your example:
setAs('list', 'cl2',
function(from, to) {
new(to, id=from[['id']], date=as.Date(from[['date']]))
})
setAs('list','cl1',
function(from, to) {
new(to,date=as.Date(from[['date']]),cl2=as(from[['cl2']],'cl2'))
})
With a dummy input from previous output:
input <- '
{
"date": "2016-03-05",
"cl2": {
"date": "2016-02-01",
"id": 83,
"class": "cl2"
},
"class": "cl1"
}'
This gives:
> as(fromJSON(input),'cl1')
An object of class "cl1"
Slot "date":
[1] "2016-03-05"
Slot "cl2":
An object of class "cl2"
Slot "id":
[1] 83
Slot "date":
[1] "2016-02-01"
I'll let you adapt this to your real use case, probably using fromJSON(input, FALSE) to get a 'pure' list to coerce, with lapply for example, if you have multiple instances of your cl1 class in the JSON input.
One option is to use the jsonlite package to serialize. Indeed, jsonlite::toJSON respects dates and serializes them in a well-formatted form. The problem is that jsonlite::toJSON is not defined for S4 objects. My solution is to coerce the object to a list and then serialize it:
## S4 method to coerce any S4 object to a list
setMethod("as.list",signature(x="ANY"),
function(x) {
Map(
function(y) if (isS4(slot(x,y))) as.list(slot(x,y)) else slot(x,y)
,slotNames(class(x)))
})
## coercion
jsonlite::toJSON(as.list(cl1()),pretty=TRUE,auto_unbox=TRUE)
{
"date": "2016-03-01",
"cl2": {
"id": 24,
"date": "2016-03-01"
}
}
Update
In as.list I replaced lapply by Map to create a named list.
For the recursive reading of S4 classes from JSON you can use a similar approach:
library(RJSONIO)
createParser <- function(className) {
setAs("list", className, function(from, to) {
to <- new(to)
for (n in names(from)) {
if (isS4(slot(to, n))) {
c <- class(slot(to, n))[[1]]
o <- as(from[[n]], c)
slot(to, n) = o
} else {
slot(to, n) = from[[n]]
}
}
to
})
}
Name <- setClass("Name", slots=c("first"="character", "last"="character"))
createParser("Name")
Customer <- setClass("Customer", slots=c("name"="Name", "age"="numeric"))
createParser("Customer")
Case <- setClass("Case", slots=c("customer"="Customer"))
createParser("Case")
c1 <- Case(customer=Customer(name=Name(first="Mika", last="R"), age=100))
j <- RJSONIO::toJSON(c1)
l <- RJSONIO::fromJSON(j, simplify = FALSE)
as(l, "Case")