How to deserialize json string with an extra "" - json

I have input Json string that I am trying to deserialize
{
"ID":1,
"Details":{
"Product":""Boston,saline"",
"cost":150.0
}
}
or
{
"ID":1,
"Details":{
"Product":"Boston "Sample"",
"cost":150.0
}
}
When I try to use $JsonConvert.DeserializeObject<JObject>(input) it gives me error saying "After parsing a value an unexpected character was encountered" and this is expected. Is there a way we can deserialize this kind of strings?
Thanks!

Could you use .replace from native String method to replace all double double quotes with a single double quote.

You can’t.
Say a user is able to create a product named ", "x": "y. This will be represented as (incorrectly serialized to)
"Details": {
"Product": "", "x": "y",
"cost": 150.0
}
That is valid JSON, so there’s no syntactic way to detect that anything is wrong.
Even worse, what if an attacker creates a product named ", "cost": 1.0, "x": "y. This will be represented as (with some formatting applied)
"Details": {
"Product": "",
"cost": 1.0,
"x": "y",
"cost": 150.0
}
What will your deserializer do with that? Will it accept it? (Some will.) Will it use the first "cost" or the second "cost"—that is, will it allow an attacker to lower the cost? And would it be possible for an attacker to create a duplicate "cost" after the real one? And even if everything is currently okay when deserializing that string, will it continue to work after upgrading to a new version of the library?
As of 2017, Json.net accepts such JSON and uses the second duplicate value (see Json.net no longer throws in case of duplicate). But this behavior changed between versions 6 and 8. Will it change again?
The only real fix is to fix whatever is generating this JSON.

The stuff in your examples is not JSON;
that is why you are having problems parsing it with a JSON parser.
Consider fixing either how the data is stored or how it is read.
Option 1: Fix how it is stored.
It looks like the string value "blammy"
is actually stored in your DB as something like: \"blammy\".
This is zero percent correct.
The value that gets stored in your DB column should be blammy
(notice, no quotes).
Fix that problem.
Option 2: Fix how is is read
Something is reading the column value from your DB.
Change that thing to remove outer double quotes from string data.
Then,
the (incorrectly) stored value \"blammy\" would be
read (and fixed) to be blammy.
This will only work with example 1 in your question.
You will need to do something else if you are getting example 2.

Related

How to read invalid JSON format amazon firehose

I've got this most horrible scenario in where i want to read the files that kinesis firehose creates on our S3.
Kinesis firehose creates files that don't have every json object on a new line, but simply a json object concatenated file.
{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}{"param1":"value1","param2":numericvalue2,"param3":"nested {bracket}"}
Now is this a scenario not supported by normal JSON.parse and i have tried working with following regex: .scan(/({((\".?\":.?)*?)})/)
But the scan only works in scenario's without nested brackets it seems.
Does anybody know an working/better/more elegant way to solve this problem?
The one in the initial anwser is for unquoted jsons which happens some times. this one:
({((\\?\".*?\\?\")*?)})
Works for quoted jsons and unquoted jsons
Besides this improved it a bit, to keep it simpler.. as you can have integer and normal values.. anything within string literals will be ignored due too the double capturing group.
https://regex101.com/r/kPSc0i/1
Modify the input to be one large JSON array, then parse that:
input = File.read("input.json")
json = "[#{input.rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
You might want to combine the first two to save some memory:
json = "[#{File.read('input.json').rstrip.gsub(/\}\s*\{/, '},{')}]"
data = JSON.parse(json)
This assumes that } followed by some whitespace followed by { never occurs inside a key or value in your JSON encoded data.
As you concluded in your most recent comment, the put_records_batch in firehose requires you to manually put delimiters in your records to be easily parsed by the consumers. You can add a new line or some special character that is solely used for parsing, % for example, which should never be used in your payload.
Other option would be sending record by record. This would be only viable if your use case does not require high throughput. For that you may loop on every record and load as a stringified data blob. If done in Python, we would have a dictionary "records" having all our json objects.
import json
def send_to_firehose(records):
firehose_client = boto3.client('firehose')
for record in records:
data = json.dumps(record)
firehose_client.put_record(DeliveryStreamName=<your stream>,
Record={
'Data': data
}
)
Firehose by default buffers the data before sending it to your bucket and it should end up with something like this. This will be easy to parse and load in memory in your preferred data structure.
[
{
"metadata": {
"schema_id": "4096"
},
"payload": {
"zaza": 12,
"price": 20,
"message": "Testing sendnig the data in message attribute",
"source": "coming routing to firehose"
}
},
{
"metadata": {
"schema_id": "4096"
},
"payload": {
"zaza": 12,
"price": 20,
"message": "Testing sendnig the data in message attribute",
"source": "coming routing to firehose"
}
}
]

Translating JSON values using io.circe

I have a function in scala that translates a value and produces a string.
strOut = translate(strIn)
Suppose the following JSON object:
{
"id": "c730433b-082c-4984-3d56-855c243265f0",
"standard": "stda",
"timestamp": "tsx000",
"stdparms" : {
"stdparam1": "a",
"stdparam2": "b"
}
}
and the following mapping provided by the translation function:
"stda" -> "stdb"
"tsx000" -> "tsy000"
"a" -> "f"
"b" -> "g"
What is the best way to translate the whole JSON object using the translate function? My goal is to obtain the following result:
{
"id": "c730433b-082c-4984-3d56-855c243265f0",
"standard": "stdb",
"timestamp": "tsy000",
"stdparms" : {
"stdparam1": "f",
"stdparam2": "g"
}
}
I must use the io.circe library due to project related matters.
If you know beforehand which fields you want to translate, or what translations apply to that field, you can use Cursors to traverse the JSON tree. Or if the fields themselves are fixed (you always know what fields to expect) Optics may require less code.
When you get to the right leaf, you apply the translation.
However, when you don't know what could apply when/where it might be easier to find/replace using string methods.
Note that the JSON you provided as an example is not valid JSON by the way.

query with filter in JsonPath

I have a structure like this:
{
title: [
{lang: “en”, value: “snickers”},
{lang: “ru”, value: “сниккерс”}
]
}
I need to take a result such this:
“snickers”
I finished with query:
$.title[?(#.lang="en")].value
but it doesn't work
JsonPath always returns an array of results, so you will not be able to get a single value such as "snickers" on its own. You will be able to get ["snickers"] though. To get it working you need to use a double equal sign instead of single:
$.title[?(#.lang=="en")].value
Also note that another reason you may be having issues is that your double quotes in the json are not the standard double quote characters. Depending on how you consume the json you may also need to wrap the property names in double quotes. The json standard is quite strict, and if you are parsing your json from text it has to follow these rules. Here is version of the json corrected in this way:
{
"title": [
{"lang": "en", "value": "snickers"},
{"lang": "ru", "value": "сниккерс"}
]
}
I have tested the above query with the corrected json using http://www.jsonquerytool.com.

Invalid JSON but validates on JSONLint

I have the following JSON which validates on JSONLint.com but when I pass it to JSON.parse() I get the error
SyntaxError: JSON.parse: unexpected character
...0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...
This is apparently the last "correct": line
var theJSON = JSON.parse({
"data": [
{
"wrong": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"correct": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
"wrong": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"correct": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
"wrong": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"correct": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
"wrong": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
"correct": "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
}
]
});
You, like many, have confused JavaScript's literal syntax with JSON. This happens a lot as JSON uses a subset of JavaScript's literal syntax so it looks a lot alike. JSON, however, is always a string. It is a serialized data scheme for porting data structures between langs/platforms.
Also confusing is that a string of JSON which has been output by any platform can be copied and pasted right into JavaScript and used. Again, this is because of the shared syntax. Having pasted such output right into JavaScript, however, one is no longer using JSON--they are now writing JavaScript in literal syntax. That is, unless, you pasted it between quotes and properly escaped the resulting string. But there's no sense in doing so as then it needs to be parsed in order to end up with what you already had.
JSON.parse() is a method for unserializing data which had been serialized into JSON. It expects a string because, well, JSON is a string. You're passing an object (in literal syntax). It does not need parsing...it is already the thing you want.
Wrapping your object literal in single quotes would make the code work, but it would be pointless to do so as the parse would simply result in what you already have.
Your code would be better written if you replaced the variable named theJSON with one named theObject and made it look as such:
var theObject = {
data: [
{
wrong: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
correct: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
wrong: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
correct: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
wrong: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
correct: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
},
{
wrong: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0",
correct: "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
}
]
};
Whatever code wanted to use the parse result should be fine once you've done it.

How do you represent a JSON array of strings?

This is all you need for valid JSON, right?
["somestring1", "somestring2"]
I'll elaborate a bit more on ChrisR awesome answer and bring images from his awesome reference.
A valid JSON always starts with either curly braces { or square brackets [, nothing else.
{ will start an object:
{ "key": value, "another key": value }
Hint: although javascript accepts single quotes ', JSON only takes double ones ".
[ will start an array:
[value, value]
Hint: spaces among elements are always ignored by any JSON parser.
And value is an object, array, string, number, bool or null:
So yeah, ["a", "b"] is a perfectly valid JSON, like you could try on the link Manish pointed.
Here are a few extra valid JSON examples, one per block:
{}
[0]
{"__comment": "json doesn't accept comments and you should not be commenting even in this way", "avoid!": "also, never add more than one key per line, like this"}
[{ "why":null} ]
{
"not true": [0, false],
"true": true,
"not null": [0, 1, false, true, {
"obj": null
}, "a string"]
}
Your JSON object in this case is a list. JSON is almost always an object with attributes; a set of one or more key:value pairs, so you most likely see a dictionary:
{ "MyStringArray" : ["somestring1", "somestring2"] }
then you can ask for the value of "MyStringArray" and you would get back a list of two strings, "somestring1" and "somestring2".
Basically yes, JSON is just a javascript literal representation of your value so what you said is correct.
You can find a pretty clear and good explanation of JSON notation on http://json.org/
String strJson="{\"Employee\":
[{\"id\":\"101\",\"name\":\"Pushkar\",\"salary\":\"5000\"},
{\"id\":\"102\",\"name\":\"Rahul\",\"salary\":\"4000\"},
{\"id\":\"103\",\"name\":\"tanveer\",\"salary\":\"56678\"}]}";
This is an example of a JSON string with Employee as object, then multiple strings and values in an array as a reference to #cregox...
A bit complicated but can explain a lot in a single JSON string.