python read json file into a dictionary can't use 'is' keyword for comparing string values - json

I am using json.load() to read a JSON config file into a dictionary, but I can't use the is keyword for comparing string values.
config = json.load(open(config_file))
if config['some_key'] is 'abc':
# code block
Even when the value of config['some_key'] is abc, the if block is skipped somehow. But if I change the code to,
if config['some_key'] == 'abc':
# code block
the if code body will get executed. I am wondering what the issue is. Since a string is a Python object, I assumed there should be no problem using is for value comparison.
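For illustration, a minimal standalone sketch of the difference (the inline JSON string here is just a stand-in for the config file):
import json

config = json.loads('{"some_key": "abc"}')
print(config['some_key'] == 'abc')  # True: == compares the string values
print(config['some_key'] is 'abc')  # usually False: is compares object identity,
                                    # and the string built by the JSON parser is
                                    # generally a different object from the literal 'abc'
(On Python 3.8+ the second comparison also triggers "SyntaxWarning: 'is' with a literal".)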

Related

Processing JSON from a .txt file and converting to a DataFrame in Julia

Cross posting from Julia Discourse in case anyone here has any leads.
I’m just looking for some insight into why the below code is returning a dataframe containing just the first line of my json file. If you’d like to try working with the file I’m working with, you can download the aminer_papers_0.zip from the Microsoft Open Academic Graph site, I’m using the first file in that group of files.
using JSON3, DataFrames, CSV
file_name = "path/aminer_papers_0.txt"
json_string = read(file_name, String)
js = JSON3.read(json_string)
df = DataFrame([js])
The resulting DataFrame has just one line, but the column titles are correct, as is the first line. To me the mystery is why the rest isn’t getting processed. I think I can rule out that read() is only reading the first JSON object, because I can index into the resulting object and see many JSON objects:
(screenshot omitted: indexing into the parsed object shows many JSON objects)
My first guess was that maybe the newline \n was causing escape issues, and I tried to use chomp to get rid of them, but couldn’t get it to work.
Anyway - any help would be greatly appreciated!
I think the problem is that the file is in JSON Lines format, and the JSON3 library only returns the first valid JSON value that it finds at the start of a string unless told otherwise.
tl;dr
Call JSON3.read with the keyword argument jsonlines=true.
Why?
By default, JSON3 interprets a string passed to its read function as a single "JSON text", defined by RFC 8259 section 2:
A JSON text is a serialized value....
(My emphasis on the use of the indefinite singular article "a.") A "JSON value" is defined in section 3:
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
A string with multiple JSON values in it is technically multiple "JSON texts." It is up to the parser to determine what part of the string argument you give it is a JSON text, and the authors of JSON3 chose as the default behavior to parse from the start of the string to the end of the first valid JSON value.
In order to get JSON3 to read the string as multiple JSON values, you have to give it the keyword option jsonlines=true, which is documented as:
jsonlines: A Bool indicating that the json_str contains newline delimited JSON strings, which will be read into a JSON3.Array of the JSON values. See jsonlines for reference. [default false]
Example
Take for example this simple string:
two_values = "3.14\n2.72"
Each one of these lines is a valid JSON serialization of a number. However, when passed to JSON3.read, only the first is parsed:
using JSON3
@assert JSON3.read(two_values) == 3.14
Using jsonlines=true, both values are parsed and returned as a JSON3.Array struct:
@assert JSON3.read(two_values, jsonlines=true) == [3.14, 2.72]
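Applied back to the original question, a rough sketch (untested against that dataset; the path is the questioner's placeholder):
using JSON3
json_string = read("path/aminer_papers_0.txt", String)
js = JSON3.read(json_string, jsonlines=true)  # a JSON3.Array with one element per line
length(js)  # number of paper records in the file
js[1]       # the first record
From there, each element of js can be turned into a row of the DataFrame (whether DataFrame(js) accepts the JSON3.Array directly may depend on your DataFrames version).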
Other Packages
The JSON.jl library, which people might use by default given the name, does not implement parsing of JSON Lines strings at all, leaving it up to the caller to properly split the string as needed:
using JSON
JSON.parse(two_values)
# ERROR: Expected end of input
# Line: 1
# Around: ...3.14 2.72...
# ^
A simple way to implement reading multiple values is to use eachline:
@assert [JSON.parse(line) for line in eachline(IOBuffer(two_values))] == [3.14, 2.72]

Convert JSON string to dictionary in ROBOT using json.loads() - why are triple quotes needed?

I've read that json.loads() can be used by Robot Framework to convert a JSON string to a dictionary in this post: Json handling in ROBOT
So if you define a dictionary-like string like this:
${json_string} Set Variable {"key1": "value1", "key2": "value2", "key3": "value3"}
You can then use the following to convert it to a dictionary:
${dict} Evaluate json.loads('''${json_string}''') json
My question is simple - why are the triple quotes needed here to surround the argument?
If single quotes are used an exception is thrown stating a string must be used:
${dict} Evaluate json.loads('${json_string}') json
(Edit) The above is a bad example; it actually works. If double quotes are used, though, it fails with SyntaxError: invalid syntax.
If no quotes at all are used an error occurs that indicates that the variable is a dictionary - but in Robot it isn't a dictionary (TypeError: the JSON object must be str, bytes or bytearray, not dict):
${dict} Evaluate json.loads(${json_string}) json
If Robot's Convert To String is used on the ${json_string} variable and then that new variable is passed to the json.loads() method the same TypeError occurs stating a string must be used, not a dictionary - but it has been converted to a string:
${json_string2} Convert To String ${json_string}
${dict} Evaluate json.loads(${json_string2}) json
What are the triple quotes accomplishing that are not being accomplished by the other two? This seems to be an aspect of Robot framework...
I'll go ahead and follow up in an answer since it's simpler that way; I think this does answer your follow-up question in this comment.
I'm not very familiar with Robot Framework, but if I understand correctly, all it's doing in .robot files when you use ${variable} substitution is simple string templating: when you pass a ${variable} into an expression, no matter what type the underlying variable is, it always substitutes its string representation. So the expression you're trying to evaluate is literally:
json.loads({"key1": "value1", "key2": "value2", "key3": "value3"})
This is why you need the ''' (in principle you could use just ', as your edit shows, but ''' is much safer). Incidentally, if you converted the above dict to its standard string representation in Python it would not be valid JSON, because JSON requires double-quotes, not single-quotes.
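To illustrate that aside in plain Python (outside Robot Framework; the dict below is just an example):
import json

d = {'key1': 'value1'}
print(str(d))       # {'key1': 'value1'}  <- Python's repr uses single quotes, so it is not valid JSON
try:
    json.loads(str(d))
except json.JSONDecodeError as e:
    print(e)        # Expecting property name enclosed in double quotes: ...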
The official documentation on this is a little confusing and seems to contradict itself (I think it was not written by a native English speaker which is OK):
When a variable is used in the expressing using the normal ${variable} syntax, its value is replaces before the expression is evaluated. This means that the value used in the expression will be the string representation of the variable value, not the variable value itself. This is not a problem with numbers and other objects that have a string representation that can be evaluated directly, but with other objects the behavior depends on the string representation.
I say it contradicts itself because first it says "its value [is replaced] before the expression is evaluated". But then it says "the value used in the expression will be the string representation of the variable value" (which is not the same as the value itself). The latter explanation seems to be the correct one though.
However, it seems with Evaluate you can also use the syntax $variable to replace the literal value of the variable instead of its string representation:
Starting from Robot Framework 2.9, variables themselves are automatically available in the evaluation namespace. They can be accessed using special variable syntax without the curly braces like $variable. These variables should never be quoted, and in fact they are not even replaced inside strings.
But in your case you wrote:
${json_string} Set Variable {"key1": "value1", "key2": "value2", "key3": "value3"}
As I understand it "Set Variable" just stores the value as a string literal. So indeed it should suffice then to run (as you did):
${dict} Evaluate json.loads('''${json_string}''') json
In your final example, "Convert To String" is not doing anything, because, as noted above, ${json_string} is already substituted as a string. You just have to understand that json.loads(${json_string}) becomes a Python expression in which ${json_string} is replaced literally with the contents of that template variable--it is not the same as passing a str value to json.loads(). The latter I believe may be achievable as
${dict} Evaluate json.loads($json_string) json
at least, going by the docs. But I have not tested this.
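Putting both forms side by side, an untested sketch reusing the question's variable names (cells separated by multiple spaces, as Robot Framework requires):
${json_string}    Set Variable    {"key1": "value1", "key2": "value2", "key3": "value3"}
# ${...} substitution: the string representation is pasted into the expression, hence the quotes
${dict1}    Evaluate    json.loads('''${json_string}''')    json
# $variable syntax: the variable object itself is used, so no quotes are needed
${dict2}    Evaluate    json.loads($json_string)    json
Should Be Equal    ${dict1}    ${dict2}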

Understanding NewtonSoft in PowerShell

I making a foray into the world of JSON parsing and NewtonSoft and I'm confused, to say the least.
Take the below PowerShell script:
$json = @"
{
"Array1": [
"I am string 1 from array1",
"I am string 2 from array1"
],
"Array2": [
{
"Array2Object1Str1": "Object in list, string 1",
"Array2Object1Str2": "Object in list, string 2"
}
]
}
"#
#The newtonSoft way
$nsObj = [Newtonsoft.Json.JsonConvert]::DeserializeObject($json, [Newtonsoft.Json.Linq.JObject])
$nsObj.GetType().fullname #Type = Newtonsoft.Json.Linq.JObject
$nsObj[0] #Returns nothing. Why?
$nsObj.Array1 #Again nothing. Maybe because it contains no key:value pairs?
$nsObj.Array2 #This does return, maybe because has object with kv pairs
$nsObj.Array2[0].Array2Object1Str1 #Returns nothing. Why? but...
$nsObj.Array2[0].Array2Object1Str1.ToString() #Cool. I get the string this way.
$nsObj.Array2[0] #1st object has a Path property of "Array2[0].Array2Object1Str1" Great!
foreach( $o in $nsObj.Array2[0].GetEnumerator() ){
"Path is: $($o.Path)"
"Parent is: $($o.Parent)"
} #??? Why can't I see the Path property like when just output $nsObj.Array2[0] ???
#How can I find out what the root parent (Array2) is for a property? Is property even the right word?
I'd like to be able to find the name of the root parent for any given position. So above, I'd like to know that the item I'm looking at (Array2Object1Str1) belongs to the Array2 root parent.
I think I'm not understanding some fundamentals here. Is it possible to determine the root parent? Also, any help in understanding my comments in the script would be great. Namely why I can't return things like path or parent, but can see it when I debug in VSCode.
dbc's answer contains helpful background information, and makes it clear that calling the NewtonSoft Json.NET library from PowerShell is cumbersome.
Given PowerShell's built-in support for JSON parsing - via the ConvertFrom-Json and ConvertTo-Json cmdlets - there is usually no reason to resort to third-party libraries (directly[1]), except in the following cases:
When performance is paramount.
When the limitations of PowerShell's JSON parsing must be overcome (lack of support for empty key names and keys that differ in letter case only).
When you need to work with the Json.NET types and their methods rather than with the method-less "property-bag" [pscustomobject] instances ConvertFrom-Json constructs.
While working with NewtonSoft's Json.NET directly in PowerShell is awkward, it is manageable, if you observe a few rules:
Lack of visible output doesn't necessarily mean that there isn't any output at all:
Due to a bug in PowerShell (as of v7.0.0-preview.4), [JValue] instances and [JProperty] instances containing them produce no visible output by default; access their (strongly typed) .Value property instead (e.g., $nsObj.Array1[0].Value or $nsProp.Value.Value (sic))
To output the string representation of a [JObject] / [JArray] / [JProperty] / [JValue] instance, do not rely on output as-is (e.g., $nsObj); use explicit stringification with .ToString() (e.g., $nsObj.ToString()); while string interpolation (e.g., "$nsObj") does generally work, it doesn't with [JValue] instances, due to the above-mentioned bug.
[JObject] and [JArray] objects by default show a list of their elements' instance properties (implied Format-List applied to the enumeration of the objects); you can use the Format-* cmdlets to shape output; e.g., $nsObj | Format-Table Path, Type.
Due to another bug (which may have the same root cause), as of PowerShell Core 7.0.0-preview.4, default output for [JObject] instances is actually broken in cases where the input JSON contains an array (prints error format-default : Target type System.Collections.IEnumerator is not a value type or a non-abstract class. (Parameter 'targetType')).
To numerically index into a [JObject] instance, i.e. to access properties by index rather than by name, use the following idiom: @($nsObj)[<n>], where <n> is the numerical index of interest.
$nsObj[<n>] actually should work, because, unlike C#, PowerShell exposes members implemented via interfaces as directly callable type members, so the numeric indexer that JObject implements via the IList<JToken> interface should be accessible, but isn't, presumably due to this bug (as of PowerShell Core 7.0.0-preview.4).
The workaround based on @(...), PowerShell's array-subexpression operator, forces enumeration of a [JObject] instance to yield an array of its [JProperty] members, which can then be accessed by index; note that this approach is simple, but not efficient, because enumeration and construction of an aux. array occurs; however, given that a single JSON object (as opposed to an array) typically doesn't have large numbers of properties, this is unlikely to matter in practice.
A reflection-based solution that accesses the IList<JToken> interface's numeric indexer is possible, but may even be slower.
Note that additional .Value-based access may again be needed to print the result (or to extract the strongly typed property value).
Generally, do not use the .GetEnumerator() method; [JObject] and [JArray] instances are directly enumerable.
Keep in mind that PowerShell may automatically enumerate such instances in contexts where you don't expect it, notably in the pipeline: when you send a [JObject] to the pipeline, it is its constituent [JProperty]s that are sent instead, individually.
Use something like @($nsObj.Array1).Value to extract the values of an array of primitive JSON values (strings, numbers, ...) - i.e., [JValue] instances - as an array.
The following demonstrates these techniques in context:
$json = @"
{
"Array1": [
"I am string 1 from array1",
"I am string 2 from array1",
],
"Array2": [
{
"Array2Object1Str1": "Object in list, string 1",
"Array2Object1Str2": "Object in list, string 2"
}
]
}
"#
# Deserialize the JSON text into a hierarchy of nested objects.
# Note: You can omit the target type to let Newtonsoft.Json infer a suitable one.
$nsObj = [Newtonsoft.Json.JsonConvert]::DeserializeObject($json)
# Alternatively, you could more simply use:
# $nsObj = [Newtonsoft.Json.Linq.JObject]::Parse($json)
# Access the 1st property *as a whole* by *index* (index 0).
@($nsObj)[0].ToString()
# Ditto, with (the typically used) access by property *name*.
$nsObj.Array1.ToString()
# Access a property *value* by name.
$nsObj.Array1[0].Value
# Get an *array* of the *values* in .Array1.
# Note: This assumes that the array elements are JSON primitives ([JValue] instances).
@($nsObj.Array1).Value
# Access a property value of the object contained in .Array2's first element by name:
$nsObj.Array2[0].Array2Object1Str1.Value
# Enumerate the properties of the object contained in .Array2's first element
# Do NOT use .GetEnumerator() here - enumerate the array *itself*
foreach($o in $nsObj.Array2[0]){
"Path is: $($o.Path)"
"Parent is: $($o.Parent.ToString())"
}
[1] PowerShell Core - but not Windows PowerShell - currently (v7) actually uses NewtonSoft's Json.NET behind the scenes.
You have a few separate questions here:
$nsObj[0] #Returns nothing. Why?
This is because nsObj corresponds to a JSON object, and, as explained in this answer to How to get first key from JObject?, JObject does not directly support accessing properties by integer index (rather than property name).
JObject does, however, implement IList<JToken> explicitly so if you could upcast nsObj to such a list you could access properties by index -- but apparently it's not straightforward in PowerShell to call an explicitly implemented method. As explained in the answers to How can I call explicitly implemented interface method from PowerShell? it's necessary to do this via reflection.
First, define the following function:
Function ChildAt([Newtonsoft.Json.Linq.JContainer]$arg1, [int]$arg2)
{
$property = [System.Collections.Generic.IList[Newtonsoft.Json.Linq.JToken]].GetProperty("Item")
$item = $property.GetValue($arg1, @([System.Object]$arg2))
return $item
}
And then you can do:
$firstItem = ChildAt $nsObj 0
Try it online here.
#??? Why can't I see the Path property like when just output $nsObj.Array2[0] ???
The problem here is that JObject.GetEnumerator() does not return what you think it does. Your code assumes it returns the JToken children of the object, when in fact it is declared as
public IEnumerator<KeyValuePair<string, JToken>> GetEnumerator()
Since KeyValuePair<string, JToken> doesn't have the properties Path or Parent your output method fails.
JObject does implement interfaces like IList<JToken> and IEnumerable<JToken>, but it does so explicitly, and as mentioned above calling the relevant GetEnumerator() methods would require reflection.
Instead, use the base class method JContainer.Children(). This method works for both JArray and JObject and returns the immediate children in document order:
foreach( $o in $nsObj.Array2[0].Children() ){
"Path is: $($o.Path)"
"Parent is: $($o.Parent)"
}
Try it online here.
$nsObj.Array1 #Again nothing. Maybe because it contains no key:value pairs?
Actually this does return the value of Array1, if I do
$nsObj.Array1.ToString()
the JSON corresponding to the value of Array1 is displayed. The real issue seems to be that PowerShell doesn't know how to automatically print a JArray with JValue contents -- or even a simple, standalone JValue. If I do:
$jvalue = New-Object Newtonsoft.Json.Linq.JValue 'my jvalue value'
'$jvalue'            # prints the literal label
$jvalue              # nothing output
'$jvalue.ToString()' # prints the literal label
$jvalue.ToString()   # my jvalue value
Then the output is:
$jvalue
$jvalue.ToString()
my jvalue value
Try it online here and, relatedly, here.
Thus the lesson is: when printing a JToken hierarchy in PowerShell, always use ToString().
As to why printing a JObject produces some output while printing a JArray does not, I can only speculate. JToken implements the interface IDynamicMetaObjectProvider, which is also implemented by PSObject; possibly something about how this is implemented for JObject, but not for JValue or JArray, is compatible with PowerShell's output-formatting code.

passing variable to json file while matching response in karate

I'm validating my response from a GET call through a .json file
match response == read('match_response.json')
Now I want to reuse this file for various other features as only one field in the .json varies. Let's say this param in the json file is "varyingField"
I'm trying to pass this field every time I match the response, but I'm not able to:
def varyingField = 'VARIATION1'
match response == read('match_response.json') {'varyingField' : '#(varyingField)'}}
In the json file I have
"varyingField": "#(varyingField)"
You are trying to pass an argument to read() for a JSON file? Sorry, such a thing is not supported in Karate; please read the docs.
Use this pattern:
create a JSON file that has all your "happy path" values set
use the read() syntax to load the file (which means this is re-usable across multiple tests)
use the set keyword to update only the field for your scenario or negative test
For more details, refer this answer: https://stackoverflow.com/a/51896522/143475
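A minimal sketch of that pattern, assuming the file name and field name from the question:
* def expected = read('match_response.json')
* set expected.varyingField = 'VARIATION1'
* match response == expected
Here match_response.json would hold a plain "happy path" value for varyingField rather than an embedded expression, since set overwrites it per scenario.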

Use a period in a field name in a Matlab struct

I'm using webwrite to post to an api. One of the field names in the json object I'm trying to setup for posting is odata.metadata. I'm making a struct that looks like this for the json object:
json = struct('odata.metadata', metadata, 'odata.type', type, 'Name', name);
But I get an error
Error using struct
Invalid field name "odata.metadata"
Here's the json object I'm trying to use in Matlab. All strings for simplicity:
{
"odata.metadata": "https://website.com#Element",
"odata.type": "Blah.Blah.This.That",
"Name": "My Object"
}
Is there a way to submit this json object or is it a lost cause?
Field names are not allowed to have dots in them. The reason is that this would be confused with accessing another nested structure within the structure itself.
For example, doing json.odata.metadata would be interpreted as json being a struct with a member whose field name is odata where odata has another member whose field name is metadata. This would not be interpreted as a member with the combined field name as odata.metadata. You're going to have to rename the field to something else or change the convention of your field name slightly.
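For example, the following perfectly valid assignment creates nested structs rather than a single field containing a dot:
json.odata.metadata = 'https://website.com#Element';
fieldnames(json)        % returns {'odata'}
fieldnames(json.odata)  % returns {'metadata'}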
Usually, the convention is to replace dots with underscores. An automated way to take care of this if you're not willing to manually rename the field names yourself is to use a function called matlab.lang.makeValidName that takes in a string and converts it into a valid field name. This function was introduced in R2014a. For older versions, it's called genvarname.
For example:
>> matlab.lang.makeValidName('odata.metadata')
ans =
odata_metadata
As such, either replace all dots with _ to ensure no ambiguities or use matlab.lang.makeValidName or genvarname to take care of this for you.
I would suggest using a containers.Map instead of a struct to store your data, and then creating your JSON string by iterating over the Map keys and appending them along with the data to your JSON.
Here's a quick demonstration of what I mean:
%// Prepare the Map and the Data:
metadata = 'https://website.com#Element';
type = 'Blah.Blah.This.That';
name = 'My Object';
example_map = containers.Map({'odata.metadata','odata.type','Name'},...
{metadata,type,name});
%// Convert to JSON:
JSONstr = '{'; %// Initialization
map_keys = keys(example_map);
map_vals = values(example_map);
for ind1 = 1:example_map.Count
JSONstr = [JSONstr '"' map_keys{ind1} '":"' map_vals{ind1} '",'];
end
JSONstr =[JSONstr(1:end-1) '}']; %// Finalization (get rid of the last ',' and close)
Which results in a valid JSON string.
Obviously if your values aren't strings you'll need to convert them using num2str etc.
Another alternative you might want to consider is the JSONlab FEX submission. I saw that its savejson.m is able to accept cell arrays - which can hold any string you like.
Other alternatives may include any of the numerous Java or python JSON libraries which you can call from MATLAB.
I probably shouldn't add this as an answer - but you can have '.' in a struct fieldname...
Before I go further - I do not advocate this and it will almost certainly cause bugs and a lot of trouble down the road... @rayryeng's method is a better approach.
If your struct is created by a mex function which creates a field that contains a ".", then you will get what you're after.
To create your own test see the Mathworks example and modify accordingly.
(I won't put the full code here, to discourage the practice).
If you update the char example and compile to test_mex you get:
>> obj = test_mex
obj =
Doublestuff: [1x100 double]
odata.metadata: 'This is my char'
Note: You can only access your custom field in Matlab using dynamic fieldnames:
obj.('odata.metadata')
You need to use a mex capability to update it...