TCL Dict to JSON - json

I am trying to convert a dict into JSON format and not seeing any easy method using TclLib Json Package. Say, I have defined a dict as follows :
set countryDict [dict create USA {population 300 capital DC} Canada {population 30 capital Ottawa}]
I want to convert this to json format as shown below:
{
"USA": {
"population": 300,
"captial": "DC"
},
"Canada": {
"population": 30,
"captial": "Ottawa"
}
}
("population" is number and capital is string). I am using TclLib json package (https://wiki.tcl-lang.org/page/Tcllib+JSON) . Any help would be much appreciated.

There's two problems with the “go straight there” approach that you appear to be hoping for:
Tcl's type system is extremely different to JSON's; in Tcl, every value is a (subtype of) string, but JSON expects objects, arrays, numbers and strings to wholly different things.
The capital becomes captial. For bonus fun. (Hopefully that's just a typo on your part, but we'll cope.)
I'd advise using rl_json for this; it's a much more capable package that treats JSON as a fundamental type. (It's even better at it when it comes to querying into the JSON structure.)
package require rl_json
set result {{}}; # Literal empty JSON object
dict for {countryID data} $countryDict {
rl_json::json set result $countryID [rl_json::json template {{
"population": "~N:population",
"captial": "~S:capital"
}} $data]
# Yes, that was {{ … }}, the outer ones are for Tcl & the inner ones for a JSON object
}
puts [rl_json::json pretty $result]
That produces almost exactly the output you asked for, except with different indentation. $result is the “production” version of the output that you can work with for further processing, but which has no excess whitespace at all (which is a great choice when you're dealing with documents over 100MB long).
Notes:
The initial JSON object could have been done like this:
set result "{}"
that would have worked just as well (and been the same Tcl bytecode).
json set puts an item into an object or array; that's exactly what we want here (in a dict for to go over the input data).
json template takes an optional dictionary for mapping substitution names in the template to values; that's perfect for your use case. Otherwise we'd have had to do dict with data {} to map the contents of the dictionary into variables, and that's less than perfect when the input data isn't strictly controlled.
The contents of template argument to json template is itself JSON. The ~N: prefix in a leaf string value says “replace this with a number from the substitution called…”, and ~S: says “replace this with a string from the substitution called…”. There are others.

Related

Processing JSON from a .txt file and converting to a DataFrame in Julia

Cross posting from Julia Discourse in case anyone here has any leads.
I’m just looking for some insight into why the below code is returning a dataframe containing just the first line of my json file. If you’d like to try working with the file I’m working with, you can download the aminer_papers_0.zip from the Microsoft Open Academic Graph site, I’m using the first file in that group of files.
using JSON3, DataFrames, CSV
file_name = "path/aminer_papers_0.txt"
json_string = read(file_name, String)
js = JSON3.read(json_string)
df = DataFrame([js])
The resulting DataFrame has just one line, but the column titles are correct, as is the first line. To me the mystery is why the rest isn’t getting processed. I think I can rule out that read() is only reading the first JSON object, because I can index into the resulting object and see many JSON objects:
enter image description here
My first guess was maybe the newline \n was causing escape issues, and tried to use chomp to get rid of them, but couldn’t get it to work.
Anyway - any help would be greatly appreciated!
I think the problem is that the file is in JSON Lines format, and the JSON3 library only returns the first valid JSON value that it finds at the start of a string unless told otherwise.
tl;dr
Call JSON3.read with the keyword argument jsonlines=true.
Why?
By default, JSON3 interprets a string passed to its read function as a single "JSON text", defined by RFC 8259 section 1.3.2:
A JSON text is a serialized value....
(My emphasis on the use of the indefinite singular article "a.") A "JSON value" is defined in section 1.3.3:
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
A string with multiple JSON values in it is technically multiple "JSON texts." It is up to the parser to determine what part of the string argument you give it is a JSON text, and the authors of JSON3 chose as the default behavior to parse from the start of the string to the end of the first valid JSON value.
In order to get JSON3 to read the string as multiple JSON values, you have to give it the keyword option jsonlines=true, which is documented as:
jsonlines: A Bool indicating that the json_str contains newline delimited JSON strings, which will be read into a JSON3.Array of the JSON values. See jsonlines for reference. [default false]
Example
Take for example this simple string:
two_values = "3.14\n2.72"
Each one of these lines is a valid JSON serialization of a number. However, when passed to JSON3.read, only the first is parsed:
using JSON3
#assert JSON3.read(two_values) == 3.14
Using jsonlines=true, both values are parsed and returned as a JSON3.Array struct:
#assert JSON3.read(two_values, jsonlines=true) == [3.14, 2.72]
Other Packages
The JSON.jl library, which people might use by default given the name, does not implement parsing of JSON Lines strings at all, leaving it up to the caller to properly split the string as needed:
using JSON
JSON.parse(two_values)
# ERROR: Expected end of input
# Line: 1
# Around: ...3.14 2.72...
# ^
A simple way to implement reading multiple values is to use eachline:
#assert [JSON.parse(line) for line in eachline(IOBuffer(two_values))] == [3.14, 2.72]

Understanding NewtonSoft in PowerShell

I making a foray into the world of JSON parsing and NewtonSoft and I'm confused, to say the least.
Take the below PowerShell script:
$json = #"
{
"Array1": [
"I am string 1 from array1",
"I am string 2 from array1"
],
"Array2": [
{
"Array2Object1Str1": "Object in list, string 1",
"Array2Object1Str2": "Object in list, string 2"
}
]
}
"#
#The newtonSoft way
$nsObj = [Newtonsoft.Json.JsonConvert]::DeserializeObject($json, [Newtonsoft.Json.Linq.JObject])
$nsObj.GetType().fullname #Type = Newtonsoft.Json.Linq.JObject
$nsObj[0] #Returns nothing. Why?
$nsObj.Array1 #Again nothing. Maybe because it contains no key:value pairs?
$nsObj.Array2 #This does return, maybe because has object with kv pairs
$nsObj.Array2[0].Array2Object1Str1 #Returns nothing. Why? but...
$nsObj.Array2[0].Array2Object1Str1.ToString() #Cool. I get the string this way.
$nsObj.Array2[0] #1st object has a Path property of "Array2[0].Array2Object1Str1" Great!
foreach( $o in $nsObj.Array2[0].GetEnumerator() ){
"Path is: $($o.Path)"
"Parent is: $($o.Parent)"
} #??? Why can't I see the Path property like when just output $nsObj.Array2[0] ???
#How can I find out what the root parent (Array2) is for a property? Is property even the right word?
I'd like to be able to find the name of the root parent for any given position. So above, I'd like to know that the item I'm looking at (Array2Object1Str1) belongs to the Array2 root parent.
I think I'm not understanding some fundamentals here. Is it possible to determine the root parent? Also, any help in understanding my comments in the script would be great. Namely why I can't return things like path or parent, but can see it when I debug in VSCode.
dbc's answer contains helpful background information, and makes it clear that calling the NewtonSoft Json.NET library from PowerShell is cumbersome.
Given PowerShell's built-in support for JSON parsing - via the ConvertFrom-Json and ConvertTo-Json cmdlets - there is usually no reason to resort to third-party libraries (directly[1]), except in the following cases:
When performance is paramount.
When the limitations of PowerShell's JSON parsing must be overcome (lack of support for empty key names and keys that differ in letter case only).
When you need to work with the Json.NET types and their methods rather than with the method-less "property-bag" [pscustomobject] instances ConvertFrom-Json constructs.
While working with NewtonSoft's Json.NET directly in PowerShell is awkward, it is manageable, if you observe a few rules:
Lack of visible output doesn't necessarily mean that there isn't any output at all:
Due to a bug in PowerShell (as of v7.0.0-preview.4), [JValue] instances and [JProperty] instances containing them produce no visible output by default; access their (strongly typed) .Value property instead (e.g., $nsObj.Array1[0].Value or $nsProp.Value.Value (sic))
To output the string representation of a [JObject] / [JArray] / [JProperty] / [JValue] instance, do not rely on output as-is (e.g, $nsObj), use explicit stringification with .ToString() (e.g., $nsObj.ToString()); while string interpolation (e.g., "$nsObj") does generally work, it doesn't with [JValue] instances, due to the above-mentioned bug.
[JObject] and [JArray] objects by default show a list of their elements' instance properties (implied Format-List applied to the enumeration of the objects); you can use the Format-* cmdlets to shape output; e.g., $nsObj | Format-Table Path, Type.
Due to another bug (which may have the same root cause), as of PowerShell Core 7.0.0-preview.4, default output for [JObject] instances is actually broken in cases where the input JSON contains an array (prints error format-default : Target type System.Collections.IEnumerator is not a value type or a non-abstract class. (Parameter 'targetType')).
To numerically index into a [JObject] instance, i.e. to access properties by index rather than by name, use the following idiom: #($nsObj)[<n>], where <n> is the numerical index of interest.
$nsObj[<n>] actually should work, because, unlike C#, PowerShell exposes members implemented via interfaces as directly callable type members, so the numeric indexer that JObject implements via the IList<JToken> interface should be accessible, but isn't, presumably due to this bug (as of PowerShell Core 7.0.0-preview.4).
The workaround based on #(...), PowerShell's array-subexpression operator, forces enumeration of a [JObject] instance to yield an array of its [JProperty] members, which can then be accessed by index; note that this approach is simple, but not efficient, because enumeration and construction of an aux. array occurs; however, given that a single JSON object (as opposed to an array) typically doesn't have large numbers of properties, this is unlikely to matter in practice.
A reflection-based solution that accesses the IList<JToken> interface's numeric indexer is possible, but may even be slower.
Note that additional .Value-based access may again be needed to print the result (or to extract the strongly typed property value).
Generally, do not use the .GetEnumerator() method; [JObject] and [JArray] instances are directly enumerable.
Keep in mind that PowerShell may automatically enumerate such instances in contexts where you don't expect it, notably in the pipeline; notably, when you send a [JObject] to the pipeline, it is its constituent [JProperty]s that are sent instead, individually.
Use something like #($nsObj.Array1).Value to extract the values of an array of primitive JSON values (strings, numbers, ...) - i.e, [JValue] instances - as an array.
The following demonstrates these techniques in context:
$json = #"
{
"Array1": [
"I am string 1 from array1",
"I am string 2 from array1",
],
"Array2": [
{
"Array2Object1Str1": "Object in list, string 1",
"Array2Object1Str2": "Object in list, string 2"
}
]
}
"#
# Deserialize the JSON text into a hierarchy of nested objects.
# Note: You can omit the target type to let Newtonsoft.Json infer a suitable one.
$nsObj = [Newtonsoft.Json.JsonConvert]::DeserializeObject($json)
# Alternatively, you could more simply use:
# $nsObj = [Newtonsoft.Json.Linq.JObject]::Parse($json)
# Access the 1st property *as a whole* by *index* (index 0).
#($nsObj)[0].ToString()
# Ditto, with (the typically used) access by property *name*.
$nsObj.Array1.ToString()
# Access a property *value* by name.
$nsObj.Array1[0].Value
# Get an *array* of the *values* in .Array1.
# Note: This assumes that the array elements are JSON primitives ([JValue] instances.
#($nsObj.Array1).Value
# Access a property value of the object contained in .Array2's first element by name:
$nsObj.Array2[0].Array2Object1Str1.Value
# Enumerate the properties of the object contained in .Array2's first element
# Do NOT use .GetEnumerator() here - enumerate the array *itself*
foreach($o in $nsObj.Array2[0]){
"Path is: $($o.Path)"
"Parent is: $($o.Parent.ToString())"
}
[1] PowerShell Core - but not Windows PowerShell - currently (v7) actually uses NewtonSoft's Json.NET behind the scenes.
You have a few separate questions here:
$nsObj[0] #Returns nothing. Why?
This is because nsObj corresponds to a JSON object, and, as explained in this answer to How to get first key from JObject?, JObject does not directly support accessing properties by integer index (rather than property name).
JObject does, however, implement IList<JToken> explicitly so if you could upcast nsObj to such a list you could access properties by index -- but apparently it's not straightforward in PowerShell to call an explicitly implemented method. As explained in the answers to How can I call explicitly implemented interface method from PowerShell? it's necessary to do this via reflection.
First, define the following function:
Function ChildAt([Newtonsoft.Json.Linq.JContainer]$arg1, [int]$arg2)
{
$property = [System.Collections.Generic.IList[Newtonsoft.Json.Linq.JToken]].GetProperty("Item")
$item = $property.GetValue($nsObj, #([System.Object]$arg2))
return $item
}
And then you can do:
$firstItem = ChildAt $nsObj 0
Try it online here.
#??? Why can't I see the Path property like when just output $nsObj.Array2[0] ???
The problem here is that JObject.GetEnumerator() does not return what you think it does. Your code assumes it returns the JToken children of the object, when in fact it is declared as
public IEnumerator<KeyValuePair<string, JToken>> GetEnumerator()
Since KeyValuePair<string, JToken> doesn't have the properties Path or Parent your output method fails.
JObject does implement interfaces like IList<JToken> and IEnumerable<JToken>, but it does so explicitly, and as mentioned above calling the relevant GetEnumerator() methods would require reflection.
Instead, use the base class method JContainer.Children(). This method works for both JArray and JObject and returns the immediate children in document order:
foreach( $o in $nsObj.Array2[0].Children() ){
"Path is: $($o.Path)"
"Parent is: $($o.Parent)"
}
Try it online here.
$nsObj.Array1 #Again nothing. Maybe because it contains no key:value pairs?
Actually this does return the value of Array1, if I do
$nsObj.Array1.ToString()
the JSON corresponding to the value of Array1 is displayed. The real issue seems to be that PowerShell doesn't know how to automatically print a JArray with JValue contents -- or even a simple, standalone JValue. If I do:
$jvalue = New-Object Newtonsoft.Json.Linq.JValue 'my jvalue value'
'$jvalue' #Nothing output
$jvalue
'$jvalue.ToString()' #my jvalue value
$jvalue.ToString()
Then the output is:
$jvalue
$jvalue.ToString()
my jvalue value
Try it online here and, relatedly, here.
Thus the lesson is: when printing a JToken hierarchy in PowerShell, always use ToString().
As to why printing a JObject produces some output while printing a JArray does not, I can only speculate. JToken implements the interface IDynamicMetaObjectProvider which is also implemented by PSObject; possibly something about the details of how this is implemented for JObject but not JValue or JArray are compatible with PowerShell's information printing code.

Parsing large JSON file with Scala and JSON4S

I'm working with Scala in IntelliJ IDEA 15 and trying to parse a large twitter record json file and count the total number of hashtags. I am very new to Scala and the idea of functional programming. Each line in the json file is a json object (representing a tweet). Each line in the file starts like so:
{"in_reply_to_status_id":null,"text":"To my followers sorry..
{"in_reply_to_status_id":null,"text":"#victory","in_reply_to_screen_name"..
{"in_reply_to_status_id":null,"text":"I'm so full I can't move"..
I am most interested in a property called "entities" which contains a property called "hastags" with a list of hashtags. Here is an example:
"entities":{"hashtags":[{"text":"thewayiseeit","indices":[0,13]}],"user_mentions":[],"urls":[]},
I've browsed the various scala frameworks for parsing json and have decided to use json4s. I have the following code in my Scala script.
import org.json4s.native.JsonMethods._
var json: String = ""
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line
val data = parse(json)
My logic here is that I am trying to read each line from twitter38.json into a string and then parse the entire string with parse(). The parse function is throwing an error claiming:
"Type mismatch, expected: Nothing, found:String."
I have seen examples that use parse() on strings that hold json objects such as
val jsontest =
"""{
|"name" : "bob",
|"age" : "50",
|"gender" : "male"
|}
""".stripMargin
val data = parse(jsontest)
but I have received the same error. I am coming from an object oriented programming background, is there something fundamentally wrong with the way I am approaching this problem?
You have most likely incorrectly imported dependencies to your Intellij project or modules into your file. Make sure you have the following lines imported:
import org.json4s.native.JsonMethods._
Even if you correctly import this module, parse(String: json) will not work for you, because you have incorrectly formed a json. Your json String will look like this:
"""{"in_reply_...":"someValue1"}{"in_reply_...":"someValues2"}"""
but should look as follows to be a valid json that can be parsed:
"""{{"in_reply_...":"someValue1"},{"in_reply_...":"someValues2"}}"""
i.e. you need starting and ending brackets for the json, and a comma between each line of tweets. Please read the json4s documenation for more information.
Although being almost 6 years old, I think this question deserves another try.
JSON format has a few misunderstandings in people's minds, especially how they are stored and how they are read back.
JSON documents, are stored as either a single object having all the other fields, or an array of multiple object possibly in same format. this second part is important because arrays in almost every programming language are defined by angle brackets and values separated by commas (note here I used a person object as my single value):
[
{"name":"John","surname":"Doe"},
{"name":"Jane","surname":"Doe"}
]
also note that everything except brackets, numbers and booleans are enclosed in quotes when written into file.
however, there is another use that is not official but preferred to transfer datasets easily where every object, or document as in nosql/mongo language, are stored in a new line like this:
{"name":"John","surname":"Doe"}
{"name":"Jane","surname":"Doe"}
so for the question, OP has a document written in this second form, but tries an algorithm written to read the first form. following code has few simple changes to achieve this, and the user must read the file knowing that:
var json: String = "["
for (line <- io.Source.fromFile("twitter38.json").getLines) json += line + ","
json=json.splitAt(json.length()-1)._1
json+= "]"
val data = parse(json)
PS: although #sbrannon, has the correct idea, the example he/she gave has mistakenly curly braces instead of angle brackets to surround the data.
EDIT: I have added json=json.splitAt(json.length()-1)._1 because the code above ends with a trailing comma which will cause parse error per the JSON format definition.

JSON interface with UNIX

Am very new to JSON interaction, I have few doubts regarding it. Below are the basic one
1) How could we call/invoke/open JSON file through Unix, I mean let suppose I have a metedata file in JSON, then how should I fetch/update the value backforth from JSON file.
2) Need the example, on how to interact it.
3) How Unix Shell is compatible to JSON, whether is there any other tech/language/tool which is better than shell script.
Thanks,
Nikhil
JSON is just text following a specific format.
Use a text editor and follow the rules. Some editors with "JSON modes" will help with [invalid] syntax highlighting, indenting, brace matching..
A "Unix Shell" has nothing directly to do with JSON - how does a shell relate to text or XML files?
There are some utilities for dealing with JSON which might be of use such as jq - but it really depends on what needs to be done with the JSON (which is ultimately just text).
Json is a format to store strings, bools, numbers, lists, and dicts and combinations thereof (dicts of numbers pointing to lists containing strings etc.). Probably the data you want to store has some kind of structure which fits into these types. Consider that and think of a valid representation using the types given above.
For example, if your text configuration looks something like this:
Section
Name=Marsoon
Size=34
Contents
foo
bar
bloh
EndContents
EndSection
Section
Name=Billition
Size=103
Contents
one
two
three
EndContents
EndSection
… then this looks like a list of dicts which contain some strings and numbers and one list of strings. A valid representation of that in Json would be:
[
{
"Name": "Marsoon",
"Size": 34,
"Contents": [
"foo", "bar", "bloh"
]
},
{
"Name": "Billition",
"Size": 103,
"Contents": [
"one", "two", "three"
]
},
]
But in case you know that each such dictionary has a different Name and always the same fields, you don't have to store the field names and can use the Name as a key of a dictionary; so you can also represent it as a dict of strings pointing to lists containing numbers and lists of strings:
{
"Marsoon": [
34, [ "foo", "bar", "bloh" ]
],
"Billition": [
103, [ "one", "two", "three" ]
]
}
Both are valid representations of your original text configuration. How you'd choose depends mainly on the question whether you want to stay open for later changes of the data structure (the first solution is better then) or if you want to avoid bureaucratic overhead.
Such a Json can be stored as a simple text file. Use any text editor you like for this. Notice that all whitespace is optional. The last example could also be written in one line:
{"Marsoon":[34,["foo","bar","bloh"]],"Billition":[103,["one","two","three"]]}
So sometimes a computer-generated Json might be hard to read and would need an editor at least capable of handling very long lines.
Handling such a Json file in a shell script will not be easy just because the shell has no notion of the idea of such complex types. The most complicated it can handle properly is a dict of strings pointing to strings (bash arrays). So I propose to have a look for a more suitable language, e. g. Python. In Python you can handle all these structures quite efficiently and with very readable code:
import json
with open('myfile.json') as jsonFile:
data = json.load(jsonFile)
print data[0]['Contents'][2] # will print "bloh" in the first example
# or:
print data['Marsoon'][1][2] # will print "bloh" in the second example

mochijson2 or mochijson

I'm encoding some data using mochijson2.
But I found that it behaves strange on strings as lists.
Example:
mochijson2:encode("foo").
[91,"102",44,"111",44,"111",93]
Where "102", "111", "111" are $f, $o, $o encoded as strings
44 are commas and 91 and 93 are square brakets.
Of course if I output this somewhere I'll get string "[102,111,111]" which is obviously not that what I what.
If i try
mochijson2:encode(<<"foo">>).
[34,<<"foo">>,34]
So I again i get a list of two doublequotes and binary part within which can be translated to binary with list_to_binary/1
Here is the question - why is it so inconsistent. I understand that there is a problem distingushing erlang list that should be encoded as json array and erlang string which should be encoded as json string, but at least can it output binary when i pass it binary?
And the second question:
Looks like mochijson outputs everything nice (cause it uses special tuple to designate arrays {array, ...})
mochijson:encode(<<"foo">>).
"\"foo\""
What's the difference between mochijson2 and mochijson? Performance? Unicode handling? Anything else?
Thanks
My guess is that the decision in mochijson is that it treats a binary as a string, and it treats a list of integers as a list of integers. (Un?)fortunately strings in Erlang are in fact a list of integers.
As a result your "foo", or in other words, your [102,111,111] is translated into text representing "[102,111,111]". In the second case your <<"foo">> string becomes "foo"
Regarding the second question, mochijson seems to always return a string, whereas mochijson2 returns an iodata type. Iodata is basically a recursive list of strings, binaries and iodatas (in fact iolists). If you only intend to send the result "through the wire", it is more efficient to just nest them in a list than convert them to a flat string.