Determining if a Perl scalar is a string or an integer - JSON

From what I've read, Perl doesn't have real types for integer or string; to Perl, any $variable is a scalar.
Recently I had a lot of trouble with an old script that generates JSON objects required by another process, for which the values inside the JSON must be integers. After some debugging I found that, because of a simple print of the variable, JSON::encode_json was generating a string instead of an integer. Here's my example:
use strict;
use warnings;
use JSON;
my $score1 = 998;
my $score2 = 999;
print "score1: ".$score1."\n";
my %hash_object = ( score1 => $score1, score2 => $score2 );
my $json_str = encode_json(\%hash_object);
print "$json_str";
And it outputs:
score1: 998
{"score1":"998","score2":999}
Somehow Perl variables do have a type, or at least that's how JSON::encode_json treats them.
Is there a way to find this type programmatically, and why does the type change after an operation like the concatenation above?

First, the premise is somewhat incorrect. Yes, every value is a scalar, but scalars have separate numeric and string values. That's all you need to know for this particular question, so I'll leave the details out.
What you really need is to take a look at the MAPPING / PERL -> JSON / simple scalars section of the documentation for the JSON module:
JSON::XS and JSON::PP will encode undefined scalars as JSON null
values, scalars that have last been used in a string context before
encoding as JSON strings, and anything else as number value.
(The docs are actually slightly wrong here. It is not "last been used in a string context"; it should be "was ever used in a string context" - once a scalar obtains a string value, it won't go away until you explicitly write a new number into it.)
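To answer the "find this type programmatically" part of the question: the core module Devel::Peek dumps a scalar's internal slots and flags, so you can watch the string value appear. A minimal sketch:
use strict;
use warnings;
use Devel::Peek;

my $score = 998;
Dump($score);                 # FLAGS = (IOK,pIOK): only the integer (IV) slot is set

my $seen = "score: $score";   # use $score in string context
Dump($score);                 # FLAGS now include POK: a string (PV) slot was cached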
You can force the type to be a string by stringifying it:
my $x = 3.1; # some variable containing a number
"$x"; # stringified
$x .= ""; # another, more awkward way to stringify
print $x; # perl does it for you, too, quite often
Incidentally, the last comment above (about print) is exactly why your 998 turned into a string.
You can force the type to be a number by numifying it:
my $x = "3"; # some variable containing a string
$x += 0; # numify it, ensuring it will be dumped as a number
$x *= 1; # same thing, the choice is yours.
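Putting the two halves together, a minimal sketch of the effect on encoding (hash key order in the JSON output may vary):
use strict;
use warnings;
use JSON;

my $n = 998;
my $s = 998;
my $shown = "$s";                        # $s has now been used in string context

print encode_json({ n => $n, s => $s }), "\n";   # e.g. {"n":998,"s":"998"}

$s += 0;                                 # numify $s again
print encode_json({ s => $s }), "\n";    # {"s":998}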

Perl scalars may hold both a string and a numeric value simultaneously. Either of those is valid as a JSON value, so the string value is always used if there is one.
This is a rare problem, as most software should accept either a quoted or an unquoted representation of a number. Everything in a JSON document is originally a string, and significant data may be lost by requiring the module to apply an arbitrary string → float conversion before it hands you the data.
But if you really need to write a numeric value without quotation marks then you can coerce a scalar to behave as a numeric type: just add zero to it.
Likewise, if a scalar variable holds only a numeric value then you can force Perl to generate and store its equivalent string just by using it as a string. The most obvious method is to enclose it in quotation marks.
This code appears to do what you want:
use strict;
use warnings 'all';
use feature 'say';
use JSON;
my ($score1, $score2) = (998, 999);
print "score1: $score1\n";
my %data = ( score1 => 0+$score1, score2 => 0+$score2 );
say encode_json(\%data);
Output:
score1: 998
{"score1":998,"score2":999}

Related

How to marshal a predicate from JSON in Prolog?

In Python it is common to marshal objects from JSON. I am seeking similar functionality in Prolog, either SWI-Prolog or Scryer.
For instance, if we have JSON stating
{'predicate':
{'mortal(X)', ':-', 'human(X)'}
}
I'm hoping to find something like load_predicates(j) and have that data immediately consulted. A version of json.dumps() and loads() would also be extremely useful.
EDIT: For clarity, this will allow interoperability with client applications which will be collecting rules from users. That application is probably not in Prolog, but something like React.js.
I agree with the commenters that it would be easier to convert the JSON data to a .pl file in the proper format first and then load that.
However, you can load the predicates from JSON directly, convert them to a representation that Prolog understands, and use assertz to add them to the knowledge base.
If indeed the data contains all the syntax needed for a predicate (as is the case in the example data in the question) then converting the representation is fairly simple as you just need to concatenate the elements of the list into a string and then create a term out of the string. Note that this assumption skips step 2 in the first comment by Guy Coder.
Note that the Prolog JSON library is rather strict in which format it accepts: only double quotes are valid as string delimiters, and lists with singleton values (i.e., not key-value pairs) need to use the notation [a,b,c] instead of {a,b,c}. So first the example data needs to be rewritten:
{"predicate":
["mortal(X)", ":-", "human(X)"]
}
Then you can load it in SWI-Prolog. Minimal working example:
:- use_module(library(http/json)).
% example fact for testing
human(aristotle).
load_predicate(J) :-
    % open the file
    open(J, read, JSONstream, []),
    % parse the JSON data
    json_read(JSONstream, json(L)),
    close(JSONstream),
    % check for an occurrence of the predicate key with value L2
    member(predicate=L2, L),
    % concatenate the list into a string
    atomics_to_string(L2, S),
    % create a term from the string
    term_string(T, S),
    % add to knowledge base
    assertz(T).
Example run:
?- consult('mwe.pl').
true.
?- load_predicate('example_predicate.json').
true.
?- mortal(X).
X = aristotle.
Detailed explanation:
The predicate json_read stores the data in the following form:
json([predicate=['mortal(X)', :-, 'human(X)']])
This is a list inside a json term with one element for each key-value pair. The element has the syntax key=value. In the call to json_read you can already strip the json() term and store the list directly in the variable L.
Then member/2 is used to search for the compound term predicate=L2. If you have more than one predicate in the JSON file then you should turn this into a forall/2 call or a recursive predicate to process all predicates in the list (a sketch follows below).
Since the list L2 already contains a syntactically well-formed Prolog predicate it can just be concatenated, turned into a term using term_string/2 and asserted. Note that in case the predicate is not yet in the required format, you can construct a predicate out of the various pieces using built-in predicate manipulation functionality, see https://www.swi-prolog.org/pldoc/doc_for?object=copy_predicate_clauses/2 for some pointers.
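For completeness, here is an untested sketch of that multi-predicate variant, using forall/2 and assuming every pair in the JSON object has the same predicate=List shape as above:
:- use_module(library(http/json)).

load_predicates(J) :-
    setup_call_cleanup(
        open(J, read, JSONstream, []),
        json_read(JSONstream, json(L)),
        close(JSONstream)),
    % assert every predicate found in the JSON object
    forall(member(predicate=L2, L),
           ( atomics_to_string(L2, S),
             term_string(T, S),
             assertz(T) )).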

Processing JSON from a .txt file and converting to a DataFrame in Julia

Cross posting from Julia Discourse in case anyone here has any leads.
I'm just looking for some insight into why the code below returns a DataFrame containing just the first line of my JSON file. If you'd like to try working with the file I'm working with, you can download aminer_papers_0.zip from the Microsoft Open Academic Graph site; I'm using the first file in that group of files.
using JSON3, DataFrames, CSV
file_name = "path/aminer_papers_0.txt"
json_string = read(file_name, String)
js = JSON3.read(json_string)
df = DataFrame([js])
The resulting DataFrame has just one line, but the column titles are correct, as is the first line. To me the mystery is why the rest isn't getting processed. I think I can rule out that read() is only reading the first JSON object, because I can index into the resulting object and see many JSON objects.
My first guess was that maybe the newline \n was causing escape issues, and I tried to use chomp to get rid of them, but couldn't get it to work.
Anyway - any help would be greatly appreciated!
I think the problem is that the file is in JSON Lines format, and the JSON3 library only returns the first valid JSON value that it finds at the start of a string unless told otherwise.
tl;dr
Call JSON3.read with the keyword argument jsonlines=true.
Why?
By default, JSON3 interprets a string passed to its read function as a single "JSON text", defined by RFC 8259 section 2:
A JSON text is a serialized value....
(Note the indefinite singular article "a".) A "JSON value" is defined in section 3:
A JSON value MUST be an object, array, number, or string, or one of the following three literal names: false, null, true.
A string with multiple JSON values in it is technically multiple "JSON texts." It is up to the parser to determine what part of the string argument you give it is a JSON text, and the authors of JSON3 chose as the default behavior to parse from the start of the string to the end of the first valid JSON value.
In order to get JSON3 to read the string as multiple JSON values, you have to give it the keyword option jsonlines=true, which is documented as:
jsonlines: A Bool indicating that the json_str contains newline delimited JSON strings, which will be read into a JSON3.Array of the JSON values. See jsonlines for reference. [default false]
Example
Take for example this simple string:
two_values = "3.14\n2.72"
Each one of these lines is a valid JSON serialization of a number. However, when passed to JSON3.read, only the first is parsed:
using JSON3
@assert JSON3.read(two_values) == 3.14
Using jsonlines=true, both values are parsed and returned as a JSON3.Array struct:
@assert JSON3.read(two_values, jsonlines=true) == [3.14, 2.72]
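Applied to the file from the question (path assumed, untested against the real data), the fix is a one-liner:
using JSON3

json_string = read("path/aminer_papers_0.txt", String)

# parse one JSON object per line instead of stopping after the first value
js = JSON3.read(json_string, jsonlines=true)

length(js)    # now the number of lines/objects, not 1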
Other Packages
The JSON.jl library, which people might use by default given the name, does not implement parsing of JSON Lines strings at all, leaving it up to the caller to properly split the string as needed:
using JSON
JSON.parse(two_values)
# ERROR: Expected end of input
# Line: 1
# Around: ...3.14 2.72...
# ^
A simple way to implement reading multiple values is to use eachline:
@assert [JSON.parse(line) for line in eachline(IOBuffer(two_values))] == [3.14, 2.72]

How to parse a text file to csv file using Perl

I am learning Perl and would like to parse a text file into a CSV file using Perl. I have a loop that generates the following text file:
# This part is what outputs the text file
for my $row (@$data) {
    while ( my ($key, $value) = each %$row ) {
        print "${key}=${value}, ";
    }
    print "\n";
}
Text File Output:
name=Mary, id=231, age=38, weight=130, height=5.05, speed=26.233, time=30,
time=25, name=Jose, age=30, id=638, weight=150, height=6.05, speed=20.233,
age=40, weight=130, name=Mark, id=369, speed=40.555, height=5.07, time=30
CSV File Desired Output:
name,age,weight,height,speed,time
Mary,38,130,5.05,26.233,30,
Jose,30,150,6.05,20.233,25,
Mark,40,130,5.04,40.555,30
Any good feedback is welcome!
The key part here is how to manipulate your data so as to extract what needs to be printed for each line. Then you are best off using a module to produce valid CSV, and Text::CSV is very good.
A program using an array of small hashrefs, mimicking the data in the question:
use strict;
use warnings;
use feature 'say';
use Text::CSV;
my @data = (
    { name => 'A', age => 1, weight => 10 },
    { name => 'B', age => 2, weight => 20 },
);
my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });
my $outfile = 'test.csv';
open my $ofh, '>', $outfile or die "Can't open $outfile: $!";
# Header, also used below for order of values for fields
my @hdr = qw(name age weight);
$csv->say($ofh, \@hdr);
foreach my $href (@data) {
    $csv->say($ofh, [ @{$href}{@hdr} ]);
}
The values from hashrefs in a desired order are extracted using a hashref slice @{$href}{@hdr}, which in general is
@{ expression returning hash reference }{ list of keys }
This returns a list of values for the given list of keys, from the hashref that the expression in the block {} must return. That is then used to build an arrayref (an anonymous array here, using []), which is what the module's say method needs in order to make and print a string of comma-separated values† from that list of values.
Note the block that evaluates to a hash reference, used in place of a hash name for a slice of a hash. This is a general rule:
Anywhere you'd put an identifier (or chain of identifiers) as part of a variable or subroutine name, you can replace the identifier with a BLOCK returning a reference of the correct type.
Some further comments
Look over the constructor's supported attributes; there are many goodies
For very simple data you can simply join fields with a comma and print
say $ofh join ',', @{$href}{@hdr};
But it is far safer to use a module to construct a valid CSV record. With the right choice of attributes in the constructor it can handle whatever is legal to embed in fields (some of which can take quite a bit of work to do correctly by hand), and it catches things which aren't legal
I list column names explicitly. Instead, you can fetch the keys and sort them into a desired order, but that will again need a hard-coded list for the sorting
The program creates the file test.csv and prints to it the expected header and data lines.
† But separating those "values" with commas may involve a whole lot more than merely what the acronym for the "CSV format" stands for. A variety of things may come between those commas, including commas, newlines, and whatnot. This is why one is best advised to always use a library. Seeing the constructor's options is informative.
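To illustrate that footnote, a minimal sketch (with made-up field values) of what the module does with an embedded comma and an embedded quote, both of which a plain join ',' would silently corrupt:
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 2 });

# one field contains a comma and a double quote
$csv->combine('Mary', 'said "hi", then left', 38);
print $csv->string, "\n";    # Mary,"said ""hi"", then left",38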
The following commentary referred to the initial version of the question. The problems it addresses have since been corrected in the OP's code and the question updated. I'm still leaving this text in for some general comments that can be useful.
As for the code in the question and its output, there is almost certainly an issue with how the data is processed to produce @data, judging by the presence of keys like HASH(address) in the output.
That string HASH(0x...) is output when one prints a variable which is a hash reference (which cannot show any of the hash's content). Perl handles such a print by stringifying the reference that way (producing a printable string out of something more complex).
There is no good reason to have a hash reference for a hash key. So I'd suggest that you review your data and its processing and see how that comes about. (Or briefly show it, or post another question with it if it isn't feasible to add that to this one.)
One measure you can use to bypass that is to only use a list of keys that you know are valid, as I show above; however, then you may be leaving some outright error unhandled. So I'd rather suggest finding what is wrong.

Convert JSON string to dictionary in ROBOT using json.loads() - why are triple quotes needed?

I've read that json.loads() can be used by Robot Framework to convert a JSON string to a dictionary in this post: Json handling in ROBOT
So if you define a dictionary-like string like this:
${json_string} Set Variable {"key1": "value1", "key2": "value2", "key3": "value3"}
You can then use the following to convert it to a dictionary:
${dict} Evaluate json.loads('''${json_string}''') json
My question is simple - why are the triple quotes needed here to surround the argument?
If single quotes are used an exception is thrown stating a string must be used:
${dict} Evaluate json.loads('${json_string}') json
(Edit) The above is a bad example; it actually works. If double quotes are used, though, it fails with SyntaxError: invalid syntax.
If no quotes at all are used an error occurs that indicates that the variable is a dictionary - but in Robot it isn't a dictionary (TypeError: the JSON object must be str, bytes or bytearray, not dict):
${dict} Evaluate json.loads(${json_string}) json
If Robot's Convert To String is used on the ${json_string} variable and then that new variable is passed to the json.loads() method the same TypeError occurs stating a string must be used, not a dictionary - but it has been converted to a string:
${json_string2} Convert To String ${json_string}
${dict} Evaluate json.loads(${json_string2}) json
What are the triple quotes accomplishing that is not being accomplished by the other two? This seems to be an aspect of Robot Framework...
I'll go ahead and follow up in an answer since it's simpler that way; I think this does answer the follow-up question in your comment.
I'm not very familiar with Robot Framework, but if I understand correctly, all it's doing in .robot files when you use ${variable} substitution is simple string templating: when you pass a ${variable} into an expression, no matter what type the underlying variable is, it always substitutes its string representation, so the expression you're trying to evaluate is literally:
json.loads({"key1": "value1", "key2": "value2", "key3": "value3"})
which Python evaluates as a call with a dict literal, hence the TypeError. This is why you need the triple quotes (a single pair of quotes can work, as your edit notes for ', but ''' is much safer, since the substituted text here itself contains double quotes). Incidentally, if you converted the above dict to its standard string representation in Python it would not be valid JSON, because JSON requires double-quotes, not single-quotes.
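You can see that last point in plain Python, outside Robot:
import json

d = {"key1": "value1", "key2": "value2"}

# str(d) renders with single quotes: {'key1': 'value1', 'key2': 'value2'}
try:
    json.loads(str(d))
except json.JSONDecodeError as e:
    print("not valid JSON:", e)   # property names must be double-quoted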
The official documentation on this is a little confusing and seems to contradict itself (I think it was not written by a native English speaker which is OK):
When a variable is used in the expressing using the normal ${variable} syntax, its value is replaces before the expression is evaluated. This means that the value used in the expression will be the string representation of the variable value, not the variable value itself. This is not a problem with numbers and other objects that have a string representation that can be evaluated directly, but with other objects the behavior depends on the string representation.
I say it contradicts itself because first it says "its value [is replaced] before the expression is evaluated". But then it says "the value used in the expression will be the string representation of the variable value" (which is not the same as the value itself). The latter explanation seems to be the correct one, though.
However, it seems with Evaluate you can also use the syntax $variable to replace the literal value of the variable instead of its string representation:
Starting from Robot Framework 2.9, variables themselves are automatically available in the evaluation namespace. They can be accessed using special variable syntax without the curly braces like $variable. These variables should never be quoted, and in fact they are not even replaced inside strings.
But in your case you wrote:
${json_string} Set Variable {"key1": "value1", "key2": "value2", "key3": "value3"}
As I understand it "Set Variable" just stores the value as a string literal. So indeed it should suffice then to run (as you did):
${dict} Evaluate json.loads('''${json_string}''') json
In your final example "Convert To String" is not doing anything, because ${json_string} is already substituted as a string. You just have to understand that the result of json.loads(${json_string}) is a Python expression where ${json_string} is replaced literally with the contents of that template variable; it is not the same as passing a str value to json.loads(). The latter I believe may be achievable as
${dict} Evaluate json.loads($json_string) json
at least, going by the docs. But I have not tested this.
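For reference, the whole flow as an untested .robot sketch, using the $variable syntax from the docs quoted above:
*** Test Cases ***
Parse JSON String To Dictionary
    ${json_string} =    Set Variable    {"key1": "value1", "key2": "value2", "key3": "value3"}
    # $json_string (no curly braces) passes the value itself, which here is a str
    ${dict} =    Evaluate    json.loads($json_string)    json
    Log    ${dict}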

Use a period in a field name in a Matlab struct

I'm using webwrite to post to an API. One of the field names in the JSON object I'm trying to set up for posting is odata.metadata. I'm making a struct that looks like this for the JSON object:
json = struct('odata.metadata', metadata, 'odata.type', type, 'Name', name);
But I get an error
Error using struct
Invalid field name "odata.metadata"
Here's the JSON object I'm trying to use in MATLAB. All strings for simplicity:
{
"odata.metadata": "https://website.com#Element",
"odata.type": "Blah.Blah.This.That",
"Name": "My Object"
}
Is there a way to submit this JSON object, or is it a lost cause?
Field names are not allowed to have dots in them. The reason is that a dot would be confused with access to another nested structure within the structure itself.
For example, doing json.odata.metadata would be interpreted as json being a struct with a member whose field name is odata where odata has another member whose field name is metadata. This would not be interpreted as a member with the combined field name as odata.metadata. You're going to have to rename the field to something else or change the convention of your field name slightly.
Usually, the convention is to replace dots with underscores. An automated way to take care of this if you're not willing to manually rename the field names yourself is to use a function called matlab.lang.makeValidName that takes in a string and converts it into a valid field name. This function was introduced in R2014a. For older versions, it's called genvarname.
For example:
>> matlab.lang.makeValidName('odata.metadata')
ans =
odata_metadata
As such, either replace all dots with _ to ensure no ambiguities or use matlab.lang.makeValidName or genvarname to take care of this for you.
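A small sketch of that approach, using the values from the question (note that the sanitized names, not the original dotted keys, are what any later serialization will see):
% values from the question
metadata = 'https://website.com#Element';
type = 'Blah.Blah.This.That';
name = 'My Object';

% sanitize the intended JSON keys into valid struct field names
rawKeys = {'odata.metadata', 'odata.type', 'Name'};
vals    = {metadata, type, name};
safe    = matlab.lang.makeValidName(rawKeys);   % {'odata_metadata','odata_type','Name'}

json = struct();
for k = 1:numel(rawKeys)
    json.(safe{k}) = vals{k};   % dynamic field name assignment
end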
I would suggest using a containers.Map instead of a struct to store your data, and then creating your JSON string by iterating over the Map field names and appending them along with the data to your JSON.
Here's a quick demonstration of what I mean:
%// Prepare the Map and the Data:
metadata = 'https://website.com#Element';
type = 'Blah.Blah.This.That';
name = 'My Object';
example_map = containers.Map({'odata.metadata','odata.type','Name'},...
                             {metadata,type,name});
%// Convert to JSON:
JSONstr = '{'; %// Initialization
map_keys = keys(example_map);
map_vals = values(example_map);
for ind1 = 1:example_map.Count
    JSONstr = [JSONstr '"' map_keys{ind1} '":"' map_vals{ind1} '",'];
end
JSONstr = [JSONstr(1:end-1) '}']; %// Finalization (get rid of the last ',' and close)
Which results in a valid JSON string.
Obviously if your values aren't strings you'll need to convert them using num2str etc.
Another alternative you might want to consider is the JSONlab FEX submission. I saw that its savejson.m is able to accept cell arrays - which can hold any string you like.
Other alternatives may include any of the numerous Java or python JSON libraries which you can call from MATLAB.
I probably shouldn't add this as an answer - but you can have '.' in a struct fieldname...
Before I go further - I do not advocate this, and it will almost certainly cause bugs and a lot of trouble down the road... @rayryeng's method is a better approach.
If your struct is created by a MEX function which creates a field that contains a "." then you will get what you're after.
To create your own test, see the Mathworks example and modify accordingly.
(I won't put the full code here, to discourage the practice.)
If you update the char example and compile to test_mex you get:
>> obj = test_mex
obj =
Doublestuff: [1x100 double]
odata.metadata: 'This is my char'
Note: You can only access your custom field in MATLAB using dynamic field names:
obj.('odata.metadata')
You need to use MEX to update it...