Remove a property from SQL Server JSON data where serialized JSON is stored as a property

I have some JSON data in SQL Server in which the JsonEntity property is itself serialized JSON passed from code. I need to update this data in SQL Server, removing one of the properties inside that serialized JSON.
The following is a sample data item.
{
  "EventID": 901416,
  "SiteID": 11394,
  "JsonEntity": "{\"ContactDTO\":{\"Contact\":{\"ContactByEmail\":true,\"ContactBySMS\":false,\"ContactID\":\"400002\",\"CustomerID\":\"300001\"},\"RecipientType\":\"Responder\"},\"ContactName\":\"Tester\", \"Method\":\"Email\"}",
  "TemplateID": 20001
}
I need to remove all occurrences of ContactDTO from within the JSON, so that the data looks like this:
{
  "EventID": 901416,
  "SiteID": 11394,
  "JsonEntity": "{\"ContactName\":\"Tester\",\"Method\":\"Email\"}",
  "TemplateID": 20001
}
I have been fiddling around with JSON_MODIFY and JSON_VALUE without much success.
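A minimal sketch of one way to express this with nested JSON_MODIFY/JSON_VALUE calls, driven from Python via pyodbc. The table name (dbo.Events), column name (JsonData) and connection string are assumptions, and JSON_VALUE returns nvarchar(4000), so very long JsonEntity strings would need a different extraction.

import pyodbc

# The inner JSON_MODIFY removes ContactDTO from the serialized JsonEntity string;
# the outer JSON_MODIFY writes the modified string back as a string property,
# so it stays escaped exactly like the original value.
UPDATE_SQL = """
UPDATE dbo.Events
SET JsonData = JSON_MODIFY(
        JsonData,
        '$.JsonEntity',
        JSON_MODIFY(JSON_VALUE(JsonData, '$.JsonEntity'), '$.ContactDTO', NULL)
    )
WHERE JSON_VALUE(JsonData, '$.JsonEntity') IS NOT NULL;
"""

with pyodbc.connect("DSN=MySqlServer") as conn:   # placeholder connection string
    conn.execute(UPDATE_SQL)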

Related

Azure Data Factory - Azure Data Flow: JSON being converted/serialized into a different string format

For context, I have an Azure Data Flow reading from Cosmos DB, where the table has two "schemas". By this I mean one format has a field called "data" that is a JSON representation of the data I need. The other format also has a field called "data", but that field is just a compressed string, and I believe this is what causes the issue I'm having.
When the data flow reads the field from the source, the JSON gets turned into a non-JSON format through some form of serialization: when the projection is imported, the data field is read as a string and not as a complex object (example below). I suspect this is because the two "data" fields have the same name.
I unfortunately cannot change how the data is stored in Cosmos DB, so I cannot rename one of the fields.
Is there any way to prevent this? I would like to keep it in JSON format, with quotes and colons etc., instead of what I have below.
Example of data in CosmosDB:
{
  "Name": "sample",
  "ValueInfo": [
    {
      "Field1": "foo",
      "Field2": "bar"
    }
  ]
}
How it looks in ADF:
{
  Name=sample,
  ValueInfo=[
    {
      Field1=foo,
      Field2=bar
    }
  ]
}

How can I get Boomi to return valid JSON

I am querying records from Salesforce and trying to return the record set as a JSON array of records.
Unfortunately, it returns every record as a complete JSON document of its own rather than as an array element in one JSON object.
{
  "AppointmentID": "a046g00000Nyk6oAAB"
}{
  "AppointmentID": "a046g00000NyjhfAAB"
}{
  "AppointmentID": "a046g00000NygSfAAJ"
}
There are no commas between the records. So I built the array into the JSON response and get:
{
  "Appointments": [
    {
      "AppointmentID": "a046g00000Nyk6oAAB"
    }
  ]
}{
  "Appointments": [
    {
      "AppointmentID": "a046g00000NyjhfAAB"
    }
  ]
}{
  "Appointments": [
    {
      "AppointmentID": "a046g00000NygSfAAJ"
    }
  ]
}
and it sends each record as the entire JSON template rather than as an element of the array. Again, it also does not send commas back between the elements. I can work with a less-than-ideal structure, but I need valid JSON returned.
Lastly, I tried to modify the results with a Data Process Shape using a Search and Replace
searching for: \}\{
replacing with \}\,\{
trying to force a comma between the braces, but the search never finds any matches even though this is a valid JavaScript regex.
Any suggestions would be greatly appreciated.
Final/Fixed Map (screenshot not included)
It's likely that the destination profile is incorrect and that you manually created the JSON profile. I would write the JSON out that you're expecting with all of the fields and then import (when you open the JSON profile, it's a blue button in the top right).
Also, Salesforce usually returns each record as one document rather than combined. So it's likely that multiple documents are coming out of the map, and you'll need to do a combine (Data Process Shape).

Process events from Event Hub using PySpark - Databricks

I have a Mongo change stream (a pymongo application) that is continuously getting the changes in collections. These change documents, as received by the program, are sent to Azure Event Hubs. A Spark notebook has to read the documents as they arrive in Event Hub and do schema matching (match the fields in the document with the Spark table columns) against the Spark table for that collection. If there are fewer fields in the document than in the table, the missing columns have to be added with Null.
I am reading the events from Event Hub like below.
spark.readStream.format("eventhubs").options(**config).load()
As stated in the documentation, the original message is in the 'body' column of the dataframe, which I am casting to string. Now I have the Mongo document as a JSON string in a streaming dataframe. I am facing the issues below.
I need to extract the individual fields in the Mongo document. This is needed to compare which fields are present in the Spark table and which are not in the Mongo document. I saw a function called get_json_object(col, path). This essentially returns a string again, and I cannot individually select all the columns.
While from_json can be used to convert the JSON string to a struct type, I cannot specify the schema, because we have close to 70 collections (and a corresponding number of Spark tables), each sending Mongo docs with anywhere from 10 to 450 fields.
If I could convert the JSON string in the streaming dataframe to a JSON object whose schema can be inferred by the dataframe (something like what read.json can do), I could use the SQL '*' notation to extract the individual columns, do a few manipulations, and then save the final dataframe to the Spark table. Is that possible? What mistake am I making?
Note: a streaming DF doesn't support the collect() method for individually extracting the JSON string from the underlying RDD and doing the necessary column comparisons. I am using Spark 2.4 and Python in an Azure Databricks environment (runtime 4.3).
Below is the sample data I get in my notebook after reading the events from Event Hub and casting them to string.
{
  "documentKey": "5ab2cbd747f8b2e33e1f5527",
  "collection": "configurations",
  "operationType": "replace",
  "fullDocument": {
    "_id": "5ab2cbd747f8b2e33e1f5527",
    "app": "7NOW",
    "type": "global",
    "version": "1.0",
    "country": "US",
    "created_date": "2018-02-14T18:34:13.376Z",
    "created_by": "Vikram SSS",
    "last_modified_date": "2018-07-01T04:00:00.000Z",
    "last_modified_by": "Vikram Ganta",
    "last_modified_comments": "Added new property in show_banners feature",
    "is_active": true,
    "configurations": [
      {
        "feature": "tip",
        "properties": [
          {
            "id": "tip_mode",
            "name": "Delivery Tip Mode",
            "description": "Tip mode switches the display of tip options between percentage and amount in the customer app",
            "options": [
              "amount",
              "percentage"
            ],
            "default_value": "tip_percentage",
            "current_value": "tip_percentage",
            "mode": "multiple or single"
          },
          {
            "id": "tip_amount",
            "name": "Tip Amounts",
            "description": "List of possible tip amount values",
            "default_value": 0,
            "options": [
              {
                "display": "No Tip",
                "value": 0
              }
            ]
          }
        ]
      }
    ]
  }
}
I would like to separate out and extract fullDocument from the sample above. When I use get_json_object, I get fullDocument in another streaming dataframe as a JSON string, not as an object. As you can see, there are some array types in fullDocument that I can explode (the documentation says explode is supported on streaming DFs, though I haven't tried it), but there are also some objects (struct types) from which I would like to extract the individual fields. I cannot use the SQL '*' notation because what get_json_object returns is a string, not the object itself.
I am now convinced that JSON with this much schema variation is better handled with the schema specified explicitly. My takeaway is that in a streaming environment with such varied incoming schemas, it is always better to specify the schema. So I am proceeding with get_json_object and from_json, reading the schema from a file.
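A minimal sketch of that last approach, assuming (hypothetically) that each collection's schema is stored in a file as the JSON produced by StructType.jsonValue(); the file path is a placeholder, and spark and config come from the snippet in the question.

import json

from pyspark.sql import functions as F
from pyspark.sql.types import StructType

# Hypothetical path: one schema file per collection, stored as StructType JSON
with open("/dbfs/schemas/configurations.json") as f:
    schema = StructType.fromJson(json.load(f))

parsed = (
    spark.readStream.format("eventhubs").options(**config).load()
    # 'body' holds the Event Hub payload; cast to string, then parse with the explicit schema
    .select(F.from_json(F.col("body").cast("string"), schema).alias("doc"))
    # with a real struct column, the individual fields can now be selected with '*'
    .select("doc.*")
)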

How to read invalid JSON format from Amazon Firehose

I've got this most horrible scenario in which I want to read the files that Kinesis Firehose creates in our S3 bucket.
Kinesis Firehose creates files that don't have every JSON object on a new line; each file is simply a concatenation of JSON objects.
{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}
Now, this is a scenario not supported by a normal JSON.parse, and I have tried working with the following regex: .scan(/({((\".?\":.?)*?)})/)
But the scan only seems to work in scenarios without nested brackets.
Does anybody know a working/better/more elegant way to solve this problem?
The regex in the initial answer is for unquoted JSON, which happens sometimes. This one:
({((\\?\".*?\\?\")*?)})
Works for both quoted and unquoted JSON.
Besides this, I improved it a bit to keep it simpler, since you can have integer as well as normal values; anything within string literals will be ignored thanks to the double capturing group.
https://regex101.com/r/kPSc0i/1
Modify the input to be one large JSON array, then parse that:
input = File.read("input.json")
json = "[#{input.rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
You might want to combine the first two to save some memory:
json = "[#{File.read('input.json').rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
This assumes that } followed by some whitespace followed by { never occurs inside a key or value in your JSON encoded data.
As you concluded in your most recent comment, put_record_batch in Firehose requires you to manually add delimiters to your records so that they can be easily parsed by the consumers. You can add a new line, or some special character that is used solely for parsing, '%' for example, which should never appear in your payload.
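A minimal sketch of that batched approach with boto3, appending a newline delimiter to each record (the stream name is a placeholder):

import json

import boto3

def send_batch_to_firehose(records, stream_name):
    firehose_client = boto3.client('firehose')
    # Newline-delimit each record so consumers can split the delivered file easily
    entries = [{'Data': json.dumps(record) + '\n'} for record in records]
    # put_record_batch accepts up to 500 records per call
    firehose_client.put_record_batch(
        DeliveryStreamName=stream_name,
        Records=entries
    )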
The other option would be sending record by record. This is only viable if your use case does not require high throughput. For that, you may loop over every record and load it as a stringified data blob. If done in Python, we would have a collection "records" holding all our JSON objects.
import json

import boto3

def send_to_firehose(records):
    firehose_client = boto3.client('firehose')
    for record in records:
        data = json.dumps(record)
        firehose_client.put_record(
            DeliveryStreamName=<your stream>,
            Record={'Data': data}
        )
Firehose by default buffers the data before sending it to your bucket, and it should end up with something like this, which will be easy to parse and load into memory in your preferred data structure.
[
  {
    "metadata": {
      "schema_id": "4096"
    },
    "payload": {
      "zaza": 12,
      "price": 20,
      "message": "Testing sendnig the data in message attribute",
      "source": "coming routing to firehose"
    }
  },
  {
    "metadata": {
      "schema_id": "4096"
    },
    "payload": {
      "zaza": 12,
      "price": 20,
      "message": "Testing sendnig the data in message attribute",
      "source": "coming routing to firehose"
    }
  }
]
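Assuming the delivered S3 object really does contain a JSON array like the sample above (in practice the delimiter handling discussed earlier may still be needed), reading it back is a one-step parse; the bucket and key names here are placeholders.

import json

import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='my-firehose-bucket', Key='delivered-batch.json')
events = json.loads(obj['Body'].read())

for event in events:
    print(event['metadata']['schema_id'], event['payload']['price'])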

What is the best practice for testing JSON data?

I'm developing a server-side RESTful application that serves JSON data to its client applications, and I have to test many different JSON outputs.
Each JSON document has many properties, and their validation methods differ, as in the sample JSON below.
For a use case like this, do you know of good libraries or web services to test JSON data flexibly?
{
  "system": { // data structure validation
    "time": 1234566, // data type validation
    "version": "0.0.1" // string matching validation
  },
  "app": { // data structure validation
    "id": "1234", // string matching validation
    "command": "do something", // string matching validation
    "data": { // data structure validation
      "hoge": "xxx", // data type validation
      "fuga": 123 // data type validation
    }
  }
}
What do you mean by validating? Validating the JSON structure, or the data inside your JSON object?
To validate the structure, you can parse it into another data type, such as a dictionary, and see whether you get any errors while parsing.
But to validate the data inside the object, you need to validate each object in a way that is specific to that object.
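One way to express those per-field rules declaratively (not mentioned in the thread, so treat it as a suggestion) is a JSON Schema checked with the Python jsonschema package; the type and pattern constraints below are illustrative guesses at the intended rules.

import jsonschema

schema = {
    "type": "object",
    "required": ["system", "app"],  # data structure validation
    "properties": {
        "system": {
            "type": "object",
            "required": ["time", "version"],
            "properties": {
                "time": {"type": "number"},  # data type validation
                "version": {"type": "string", "pattern": r"^\d+\.\d+\.\d+$"},  # string matching
            },
        },
        "app": {
            "type": "object",
            "properties": {
                "id": {"type": "string", "pattern": "^[0-9]+$"},
                "command": {"type": "string"},
                "data": {
                    "type": "object",
                    "properties": {
                        "hoge": {"type": "string"},
                        "fuga": {"type": "number"},
                    },
                },
            },
        },
    },
}

response = {
    "system": {"time": 1234566, "version": "0.0.1"},
    "app": {"id": "1234", "command": "do something",
            "data": {"hoge": "xxx", "fuga": 123}},
}

jsonschema.validate(instance=response, schema=schema)  # raises ValidationError on failure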
Use this link: just place your JSON code and test it. Quite easy.