Enforce and check JSON format in Azure Table Storage

Currently, I use Azure Table Storage to store some configuration data (see below for an example). The data is fairly unstructured, so I store it in the table as JSON; the whole document ends up in a single field.
{
  "group1": [
    "value1",
    "value2",
    "value3",
    { "subgroup": [ "value1", "value2" ] }
  ],
  "othergroup": [
    "value1"
  ]
}
Is there a way to at least enforce that only a valid JSON string can be stored in the database? Because Table Storage stores the value as a plain string under the hood, there is no JSON validation at all, and the JSON is not automatically formatted either.
On a previous project I worked with MongoDB and Studio 3T and was really happy with how JSON could be managed there; it was simply not possible to store an invalid JSON string in MongoDB.
However, Studio 3T is not usable for Table Storage, as far as I know.

Just to summarize: Azure Table Storage / Azure Storage Explorer does not support JSON validation.
Its properties are just strings in a NoSQL store. As of now, you have to implement your own logic for JSON validation.
Hope this helps others who have the same issue.
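For what that custom logic could look like, here is a minimal sketch in Python using the azure-data-tables client library; the table name, property name, and connection string are placeholders, not anything Azure requires:

    import json
    from azure.data.tables import TableClient  # pip install azure-data-tables

    def save_config(client: TableClient, partition_key: str, row_key: str, raw_json: str) -> None:
        # json.loads raises json.JSONDecodeError for invalid input,
        # so nothing malformed is ever written to the table.
        parsed = json.loads(raw_json)
        entity = {
            "PartitionKey": partition_key,
            "RowKey": row_key,
            # Re-serialize so the stored string is consistently formatted.
            "Config": json.dumps(parsed, indent=2, sort_keys=True),
        }
        client.upsert_entity(entity)

    client = TableClient.from_connection_string("<connection-string>", table_name="configuration")
    save_config(client, "settings", "groups", '{"group1": ["value1", "value2"]}')

The same json.loads check can also be run when reading, to catch documents that were written by other tools.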

Related

Importing Well-Structured JSON Data into ElasticSearch via Cloud Watch

Is there an established way to get JSON data logged via CloudWatch imported into an Elasticsearch instance as well-structured JSON?
That is, I'm logging JSON data during the execution of an AWS Lambda function.
This data is available via Amazon's CloudWatch service.
I've been able to import this data into an Elasticsearch instance using Functionbeat, but the data comes in as an unstructured message.
"_source" : {
"#timestamp" : "xxx",
"owner" : "xxx",
"message_type" : "DATA_MESSAGE",
"cloud" : {
"provider" : "aws"
},
"message" : ""xxx xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx INFO {
foo: true,
duration_us: 19418,
bar: 'BAZ',
duration_ms: 19
}
""",
What I'm trying to do is get a document indexed into Elasticsearch that has a foo field, a duration_us field, a bar field, and so on, instead of one that only has a plain-text message field.
It seems like there are a few different ways to do this, but I'm wondering if there's a well-trodden path for this sort of thing using Elastic's default tooling, or if I'm doomed to one more one-off hack.
Functionbeat is a good starting point and will allow you to keep it as "serverless" as possible.
To process the JSON, you can use the decode_json_fields processor.
The problem is that your message isn't really JSON though. Possible solutions I could think of:
A dissect processor that extracts the JSON part of the message and passes it on to decode_json_fields, both in Functionbeat. I'm wondering if trim_chars couldn't be abused for that: trim any possible characters except for curly braces.
If that is not enough, you could do all the processing in Elasticsearch's ingest pipeline, where you would probably stitch this together with a grok processor followed by the JSON processor (a rough sketch of such a pipeline follows below).
If you can, only log a JSON message to make your life simpler; potentially move the log level into the JSON structure.
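As an illustration of that second option, here is a rough sketch (not a tested setup) that creates such an ingest pipeline via the Elasticsearch REST API using Python; the pipeline name, the grok pattern, and the target field names are assumptions, and the json processor will only succeed once the Lambda actually logs valid JSON with double-quoted keys, per the last suggestion above:

    import requests

    ES = "http://localhost:9200"  # assumed local cluster

    pipeline = {
        "description": "Extract embedded JSON from CloudWatch log lines",
        "processors": [
            {
                # Strip the "<timestamp> <request id> INFO " prefix, keep the JSON part.
                "grok": {
                    "field": "message",
                    "patterns": ["%{GREEDYDATA:log.prefix} INFO %{GREEDYDATA:log.json}"],
                }
            },
            {
                # Parse the extracted JSON string into structured fields.
                "json": {
                    "field": "log.json",
                    "target_field": "lambda",
                }
            },
            {"remove": {"field": ["log.prefix", "log.json"], "ignore_missing": True}},
        ],
    }

    resp = requests.put(f"{ES}/_ingest/pipeline/cloudwatch-json", json=pipeline)
    resp.raise_for_status()
    # Then point Functionbeat at the pipeline, e.g.
    # output.elasticsearch.pipeline: cloudwatch-json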

Why does the glTF schema define enums like this?

If I search for "enum" in the glTF 2.0 schema, I see a lot of definitions of enums like:
"type": {
"description": "Specifies if the camera uses a perspective or orthographic projection.",
"gltf_detailedDescription": "Specifies if the camera uses a perspective or orthographic projection. Based on this, either the camera's `perspective` or `orthographic` property will be defined.",
"anyOf": [
{
"enum": [ "perspective" ]
},
{
"enum": [ "orthographic" ]
},
{
"type": "string"
}
]
},
(from the camera schema)
I have several questions about this:
I don't understand why this is anyOf instead of oneOf. My understanding is that a camera type is EITHER perspective or orthographic, and my understanding of JSON Schema is that anyOf allows validation against multiple values in the array.
I also don't understand the "type": "string" entry. To me that reads as though any string value would be valid, which seems inconsistent with glTF's definition of a camera.
There are multiple instances of enums like this elsewhere in the schema.
Thanks in advance for any clarity someone can provide.
At the time (2017) we were using JSON Schema draft v4, and support for enums was not up to where we needed it to be. Previously there had been a simple list of enums, but I requested per-enum descriptions in the schema. This better documents the individual enum values and allows formatting software to display the description of an individual enum value. I filed an issue on that here:
https://github.com/KhronosGroup/glTF/issues/891
Further down that issue, a problem was uncovered with oneOf that made it incompatible with TypeScript, and a decision was made to switch to anyOf instead. You can still only choose one of the available enums, in spite of this change.
Later, in the Pull Request that implemented this change, one of the spec editors explained that the extra "type" : "string" on the end there is to allow future forwards-compatibility. Basically this means that glTF 2.0 extensions are allowed (and encouraged) to define new enum values that don't exist in core glTF 2.0 schema, and they may do so without violating the schema. They cannot arbitrarily add new fields, however, as the schema is strict about that. New fields must be placed into an extension or extras object of the appropriate name. But new enums can go right in the same field where the existing enums are now.
Ultimately, we ended up with a schema that may be a little cumbersome for humans to look at, but works well in a wide variety of validation software that deals with JSON schemas. And humans can just look at the Properties Reference README instead of the raw schema files; it's easier on the eyes.
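To see how the anyOf branches combine in practice, here is a small check using the Python jsonschema package (my own illustration, not part of the glTF tooling), run against a trimmed copy of the camera.type fragment shown above:

    import jsonschema

    camera_type_schema = {
        "anyOf": [
            {"enum": ["perspective"]},
            {"enum": ["orthographic"]},
            {"type": "string"},
        ]
    }

    # The two core enums validate, an extension-defined string also validates
    # (the forwards-compatibility branch), and a non-string is rejected because
    # no branch matches.
    for value in ["perspective", "orthographic", "EXT_vendor_projection", 42]:
        try:
            jsonschema.validate(instance=value, schema=camera_type_schema)
            print(value, "-> valid")
        except jsonschema.ValidationError:
            print(value, "-> invalid")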

Kafka Connect transforming JSON string to actual JSON

I'm trying to figure out whether it's possible to transform JSON values that are stored as strings into actual JSON structures using Kafka Connect.
I tried looking for such a transformation but couldn't find one. As an example, this could be the source:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": "{\"Tags\":[{\"TagID\":1,\"TagName\":\"Java\"},{\"TagID\":2,\"TagName\":\"Kafka\"}]}"
}
And this is my goal:
{
  "UserID": 2105058535,
  "DocumentID": 2105058535,
  "RandomJSON": {
    "Tags": [
      {
        "TagID": 1,
        "TagName": "Java"
      },
      {
        "TagID": 2,
        "TagName": "Kafka"
      }
    ]
  }
}
I'm trying to make these transformations for the Elasticsearch sink connector, if it makes a difference.
I know I can use Logstash together with the JSON filter to do this, but I'd like to know whether there's a way to do it using just Kafka Connect.
Sounds like this would be a Single Message Transform (thus applicable to any connector, not just ES), but there aren't any out of the box doing what you describe. The API is documented here.
I had a similar issue, but in reverse. I had the data as JSON and needed to convert some of it into a JSON string representation to store it in Cassandra using the Cassandra sink. I ended up creating a Kafka Streams app that reads from the topic and then outputs the JSON object to another topic that is read by the connector.
topic document <- read by your Kafka Streams app with a call to mapValues, or create a Jackson POJO that serializes as you want, and then write the value to -> topic document.elasticsearch
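The same idea, sketched with the kafka-python client instead of a Kafka Streams app; the topic and field names come from the question, and the broker address is an assumption:

    import json
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    consumer = KafkaConsumer(
        "document",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for record in consumer:
        doc = record.value
        # Turn the stringified field into a real JSON structure.
        if isinstance(doc.get("RandomJSON"), str):
            doc["RandomJSON"] = json.loads(doc["RandomJSON"])
        producer.send("document.elasticsearch", doc)

The Elasticsearch sink connector then reads document.elasticsearch and indexes the already-expanded documents.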
You can use the FromJson converter.
Please check this link for more details:
https://jcustenborder.github.io/kafka-connect-documentation/projects/kafka-connect-json-schema/transformations/examples/FromJson.inline.html

How to update json field in Firebase DB with JMeter HTTP Request

I'm working with JMeter to make some HTTP requests to my Firebase database. I am able to create JSON data with a regular request, as well as with a CSV file. I'm wondering if it's possible to update, or add to, a JSON object.
My JSON data looks something like what is below. Let's say I wanted to add a boolean node called "sold", which I could set to true or false. Could I create it within that JSON object? If so, could I also make it so that only records with a specific "name" get updated?
{
  "Price": "5.00",
  "name": "buyer@gmail.com",
  "seller_name": "seller@gmail.com",
  "time": 1496893589683
}
Looking into the "Updating Data with PATCH" chapter of the Saving Data article, you can update a single field using the HTTP PATCH method.
JMeter has supported the HTTP PATCH method since version 2.8, so you should be in a position to use it in your test as well.
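For reference, this is roughly the request the JMeter HTTP Request sampler would need to send (method PATCH, path of the record, JSON body), sketched here with Python's requests library; the database URL and record path are placeholders. Firebase applies PATCH as a partial update, so only the listed children change and the existing fields stay untouched:

    import requests

    FIREBASE_URL = "https://<your-db>.firebaseio.com"  # placeholder

    def mark_sold(item_path: str, sold: bool) -> None:
        # PATCH updates only the "sold" child of the record; "Price", "name",
        # "seller_name" and "time" are left as they are.
        resp = requests.patch(f"{FIREBASE_URL}/{item_path}.json", json={"sold": sold})
        resp.raise_for_status()

    mark_sold("listings/item1", True)

Updating only the records whose "name" matches a specific value would need an extra step: query for the matching keys first (the REST API supports orderBy and equalTo query parameters) and then issue one PATCH per matching record.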

How to replicate a foreign remote database into a local database? (CouchDB / MongoDB)

I would like to extend the data model of a remote database that is available via a web service interface. Data can be requested via HTTP GET and is delivered as JSON (example request). Other formats are supported as well.
// URL of the example request.
http://data.wien.gv.at/daten/wfs?service=WFS&request=GetFeature&version=1.1.0&typeName=ogdwien:BAUMOGD&srsName=EPSG:4326&outputFormat=json&maxfeatures=5
The first object of the JSON response:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "BAUMOGD.3390628",
      "geometry": {
        "type": "Point",
        "coordinates": [
          16.352910973544105,
          48.143425569989326
        ]
      },
      "geometry_name": "SHAPE",
      "properties": {
        "BAUMNUMMER": "1022 ",
        "GEBIET": "Strassen",
        "STRASSE": "Jochen-Rindt-Strasse",
        "ART": "Gleditsia triacanthos (Lederhülsenbaum)",
        "PFLANZJAHR": 1995,
        "STAMMUMFANG": 94,
        "KRONENDURCHMESSER": 9,
        "BAUMHOEHE": 11
      }
    },
    ...
My idea is to extend the data model (e.g. add a text field) on my own server and therefore mirror the database somehow. I stumbled upon CouchDB and its document-based architecture, which feels suitable for handling the aforementioned JSON objects. Now I'm asking for advice on how to replicate the foreign database, both initially and on a regular basis.
Do you think CouchDB is a good choice? I have also thought about MongoDB. If possible, I would like to avoid building a full Rails backend to set up the replication. What do you recommend?
If the remote database is static (the data doesn't change), then it could work. You just have to find a way to iterate over all records. Once you've figured that out, the rest is as simple as pie: 1) query the data; 2) store the response in a local database; 3) modify it as you see fit.
If the remote data changes, you'll have a lot of trouble going this way (you'll have to re-sync in the same fashion every once in a while). What I'd do instead is create a local database with only the new fields and a reference to the original piece of data. That is, when you request data from the remote service, you also check whether you have something for it in the local database and merge the two before processing the final result.
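A rough sketch of that merge-on-read approach in Python (my own illustration; the local store here is just a JSON file keyed by feature id to keep the example self-contained, but a CouchDB or MongoDB collection keyed the same way would serve the same role):

    import json
    import requests

    WFS_URL = (
        "http://data.wien.gv.at/daten/wfs?service=WFS&request=GetFeature"
        "&version=1.1.0&typeName=ogdwien:BAUMOGD&srsName=EPSG:4326"
        "&outputFormat=json&maxfeatures=5"
    )

    def load_local_extensions(path: str = "extensions.json") -> dict:
        # e.g. {"BAUMOGD.3390628": {"note": "replanted 2012"}}
        try:
            with open(path) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return {}

    def merged_features() -> list:
        remote = requests.get(WFS_URL).json()
        extensions = load_local_extensions()
        merged = []
        for feature in remote["features"]:
            extra = extensions.get(feature["id"], {})
            # Remote data stays authoritative; local fields are added alongside it.
            feature["properties"].update(extra)
            merged.append(feature)
        return merged

    for f in merged_features():
        print(f["id"], f["properties"].get("note"))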