How does Ruby's JSON.parse differ from Oj.load in terms of allocating memory/object IDs?

This is my first question, and I have tried my best to find an answer: I have looked through the Oj docs, the Ruby JSON docs, and this site, but haven't managed to find anything concrete that answers it.
Oj is a gem that serves to improve serialization/deserialization speeds and can be found at: https://github.com/ohler55/oj
I noticed this difference when I dumped a hash containing a NaN, parsed it back twice, and compared the two results, i.e.
# Create json Dump
dump = JSON.dump({x: Float::NAN})
# Create first JSON load
json_load = JSON.parse(dump, allow_nan: true)
# Create second JSON load
json_load_2 = JSON.parse(dump, allow_nan: true)
# Create first OJ load
oj_load = Oj.load(dump, :mode => :compat)
# Create second Oj load
oj_load_2 = Oj.load(dump, :mode => :compat)
json_load == json_load_2 # Returns true
oj_load == oj_load_2 # Returns false
I always thought NaN could not compare equal to NaN, so this confused me for a while, until I realised that the NaN values inside json_load and json_load_2 have the same object ID, while those inside oj_load and oj_load_2 do not.
Can anyone point me in the direction of where this memory allocation/object ID allocation occurs or how I can control that behaviour with OJ?
Thanks and sorry if this answer is floating somewhere on the internet where I could not find it.
Additional info:
I am running Ruby 1.9.3.
Here's the output from my tests re object IDs:
puts Float::NAN.object_id; puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id; puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id
70129392082680
70129387898880
70129387898880
puts Float::NAN.object_id; puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id; puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id
70255410134280
70255410063100
70255410062620
Perhaps I am doing something wrong?

I believe that is a deep implementation detail. Oj does this:
if (ni->nan) {
    rnum = rb_float_new(0.0/0.0);
}
I can't find a Ruby equivalent for that (Float.new doesn't appear to exist), but it does create a new Float object every time, from an actual C NaN it constructs on the spot, hence the different object_ids.
Whereas Ruby's JSON module uses (also in C) its own JSON::NaN Float object everywhere:
CNaN = rb_const_get(mJSON, rb_intern("NaN"));
That explains why you get different NaN object_ids with Oj and the same one with Ruby's JSON.
No matter what object_ids the resulting hashes themselves have, the problem is with the NaNs. If the NaN values are the same object (same object_id), the enclosing hashes are considered equal; if not, they are not.
According to the docs, Hash#== compares values using Object#==, which returns true if and only if both sides are the very same object (same object_id). That identity check trumps NaN's usual property of not being equal to itself.
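You can see this identity short-circuit without any JSON library involved; a minimal sketch (Float::NAN is a single cached object, while each 0.0/0 allocates a fresh Float):
nan = Float::NAN
nan == nan                    # => false: Float#== says NaN != NaN
{ x: nan } == { x: nan }      # => true: same object on both sides
{ x: 0.0/0 } == { x: 0.0/0 }  # => false: two distinct NaN objects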
Spectacular. Inheritance gone haywire.
One could probably modify Oj's C code (and even make a pull request with it) to use a constant like Ruby's JSON module does. It's a subtle change, but it's in the spirit of being compat, I guess.
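A hedged sketch of what such a patch might look like (the names here are hypothetical, not Oj's actual source):
/* Cache a single NaN Float, the way Ruby's JSON caches JSON::NaN,
 * instead of allocating a fresh one on every parse. */
static VALUE cached_nan = Qnil;

if (ni->nan) {
    if (Qnil == cached_nan) {
        cached_nan = rb_float_new(0.0/0.0);
        rb_gc_register_address(&cached_nan);  /* keep the GC away from it */
    }
    rnum = cached_nan;
}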


Is there an equivalent to Octave's "lookfor" command in Julia?

How do I search for a particular function in Julia? In MATLAB / GNU Octave I would use lookfor; how can that be done here?
The equivalent of lookfor is apropos:
julia> apropos("fourier transform")
Base.DFT.fft
julia> apropos("concatenate")
Base.:*
Base.hcat
Base.cat
Base.flatten
Base.mapslices
Base.vcat
Base.hvcat
Base.SparseArrays.blkdiag
Core.#doc
Other useful functions, depending on what you're looking for, include methodswith, which can give you a list of methods that are specified for a particular argument type:
julia> methodswith(Regex)
25-element Array{Method,1}:
==(a::Regex, b::Regex) at regex.jl:370
apropos(io::IO, needle::Regex) at docs/utils.jl:397
eachmatch(re::Regex, str::AbstractString) at regex.jl:365
eachmatch(re::Regex, str::AbstractString, ovr::Bool) at regex.jl:362
...
A comment on Fengyang's great answer: notice the difference between searching at the Julia prompt with apropos("first") and typing a question mark followed by the word, as in ?first.
As you type the question mark, the prompt changes to help?>. If you then type "first" (with quote marks), it is equivalent to the apropos call above, with less typing. If you type first without quote marks, you get a search over the names exported by the currently loaded modules.
Illustration:
help?>"first"
Base.ExponentialBackOff
Base.moduleroot
... # rest of the output removed
help?>first
search: first firstindex popfirst! pushfirst! uppercasefirst lowercasefirst
first(coll)
Get the first element of an iterable collection. Return the start point of
an AbstractRange even if it is empty.
... # rest of the output removed
If you search for a string, case is ignored. If you want to search within the DataFrames module, type using DataFrames before searching.
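For instance, a hypothetical session (the exact matches depend on which package versions you have installed):
julia> using DataFrames

help?> "stack"
DataFrames.stack
... # rest of the output removed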

Saving JSON data to DB in Zotonic

I'm trying to write a small app that retrieves a JSON file (it contains a list of items, which all have some properties), saves its contents to the DB, and then displays some of it later on. I have Zotonic up and running, and generating some HTML is no problem.
At the moment I'm stuck trying to figure out how to define a custom resource and how to get the data from the JSON into the DB. Once the data is there I should be fine; that part seems well covered by the documentation.
I wrote some standalone erlang scripts that fetch the data and I noticed that Zotonic has a library for decoding JSON so that part should be fine. Any tips on where to put which code or where to look further?
The z_db module allows for creating custom tables by using:
z_db:create_table(Table, Cols, Context).
The Table variable is your table name which can be either an atom or a list containing a single atom.
Cols is a list of column definitions, which are defined by records. Currently the record definition (you can find it in include/zotonic.hrl) is:
-record(column_def, {name, type, length, is_nullable=true, default, primary_key}).
See the Erlang docs on records for more info.
Example code which I put in users/sites/[sitename]/models/m_[sitename].erl:
%% ?table is a macro holding the table name atom,
%% e.g. -define(table, tablename).
init(Context) ->
    case z_db:table_exists(?table, Context) of
        false ->
            z_db:create_table(?table,
                [
                    #column_def{name=id, type="serial"},
                    #column_def{name=gid, type="integer", is_nullable=false},
                    #column_def{name=magnitude, type="real"},
                    #column_def{name=depth, type="real"},
                    #column_def{name=location, type="character varying"},
                    #column_def{name=time, type="integer"},
                    #column_def{name=date, type="integer"}
                ], Context);
        true -> ok
    end,
    ok.
Pay attention to which record options you specify. Most of the errors I got came from things like specifying a length on the integer fields.
models/m_sitename:init/1 does not get called on site start, but sitename:init/1 does, so I call the model's init function there to ensure the table exists. Example:
init(Context) ->
    m_sitename:init(Context).
It is called by Zotonic with the site's Context variable automatically. You can also get this variable manually with z:c(sitename). So if you call m_sitename:init/1 from somewhere else, you would do:
m_sitename:init(z:c(sitename)).
Next, insertion in the DB can be done with:
z_db:insert(Table, PropList, Context).
Where Table is again an atom or a list containing a single atom representing the table name. Context is the same as above.
PropList is a property list which is a list containing tuples consisting of two elements where the first is an atom and the second is its associated value/property. Example:
PropList = [
    {row, Value},
    {anotherrow, AnotherValue}
].
Table = tablename.
Context = z:c(sitename).
z_db:insert(Table, PropList, Context).
See the Erlang docs on property lists for more info.
=== The dependencies have been updated so if you build from source the step directly below is no longer needed ===
The JSON part is a bit more tricky. Included with Zotonic is mochijson2 and, as a secondary dependency, jiffy. The latest version of jiffy contains jiffy:decode/2, which allows you to specify maps as the return type. Much more readable than the standard {struct, {struct, <<"">>}} monster. To update to the latest version, edit the line in deps/twerl/rebar.config that says
{jiffy, ".*", {git, "https://github.com/davisp/jiffy.git", {tag, "0.8.3"}}},
to
{jiffy, ".*", {git, "https://github.com/davisp/jiffy.git", {tag, "0.14.3"}}},
Now run z:m(). in the Zotonic shell (you must do this after every change to your code).
Now check in the Zotonic shell that jiffy:decode/2 is available by typing jiffy: followed by <tab>; it will show a list of available functions and their arities.
To retrieve a JSON file from the internet run:
{ok, {{_, 200, _}, _, Body}} = httpc:request(get, {"url-to-JSON-here", []}, [], [])
This will yield the variable Body with the contents. See the Erlang docs on the http client for more info on this call.
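Note that httpc requires the inets application to be running. Zotonic normally has it started already, but in a bare Erlang shell a minimal sketch (with a placeholder URL) would be:
inets:start(),
{ok, {{_, 200, _}, _, Body}} =
    httpc:request(get, {"http://example.com/data.json", []}, [], []).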
Next convert the contents of Body to Erlang terms with:
JsonData = jiffy:decode(Body, [return_maps]).
What you have to do next depends a lot on the structure of your JSON resource. Keep in mind that everything is now in binary UTF-8 encoded strings! If you print JsonData to the screen (just enter JsonData. in your Zotonic/Erlang shell) you will see a lot of entries like #{<<"key">> => <<"Value">>}.
My data was nested, so I had to extract the needed data like this (ListData here being the decoded JsonData from above):
[{_,ItemList}|_] = ListData.
This gave me a list of maps, and in order to deal with them as individual items I used the following function:
get_maps([]) ->
    done;
get_maps([First|Rest]) ->
    Map = maps:get(<<"properties">>, First),
    case is_map(Map) of
        true ->
            map_to_proplist(Map),
            get_maps(Rest);
        false -> done
    end,
    done;
get_maps(_) ->
    done.
As you might remember, the z_db:insert/3 function needs a property list to populate rows, so that's what the call to map_to_proplist/1 is for. How this function looks depends completely on how your data looks, but as an example here is what worked for me:
map_to_proplist(Map) ->
    case is_map(Map) of
        true ->
            {Value1,_} = string:to_integer(binary_to_list(maps:get(<<"key1">>, Map))),
            {Value2,_} = string:to_float(binary_to_list(maps:get(<<"key2">>, Map))),
            {Value3,_} = string:to_float(binary_to_list(maps:get(<<"key3">>, Map))),
            Value4 = binary_to_list(maps:get(<<"key4">>, Map)),
            {Value5,_} = string:to_integer(binary_to_list(maps:get(<<"key5">>, Map))),
            {Value6,_} = string:to_integer(binary_to_list(maps:get(<<"key6">>, Map))),
            PropList = [{rowname1, Value1}, {rowname2, Value2}, {rowname3, Value3},
                        {rowname4, Value4}, {rowname5, Value5}, {rowname6, Value6}],
            m_sitename:insert_items(PropList, z:c(sitename)),
            ok;
        false ->
            ok
    end.
See the documentation on string:to_integer/1 and string:to_float/1 as to why the tuples are needed in the conversions. The call to m_sitename:insert_items(PropList, z:c(sitename)) calls z_db:insert/3 in models/m_sitename.erl, wrapped in a catch:
insert_items(PropList, Context) ->
    (catch z_db:insert(?table, PropList, Context)).
OK, quite a long post, but this should get you up and running if you were looking for this answer.
The above was done with Zotonic 0.13.2 on Erlang/OTP 18.
A repost (except the JSON part) of my post in the Zotonic Developers group.

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
let c = strpart(jsonString, i, 1)
let i += 1
" A lot of code to process file....
" Note: I've tried short cutting the process by searching for enclosing double-quotes when I come across the initial double quotes (also taking into account escaping '\' character. It doesn't help
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
" Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them
Genuinely wanted some ideas for the best way to do this without relying on someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit the processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile. It assumed a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
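For example, a minimal sketch (json_decode() is built in since Vim 8 and in Neovim; the package.json path is only an illustration):
" Read a JSON file into one string, decode it, and pull out a field.
let jsonString = join(readfile('package.json'), "\n")
let dict = json_decode(jsonString)
echo get(dict, 'name', 'no name field')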
Even though Vim's origins date back a long way, its internal string()/eval() representation happens to be so close to JSON that eval() is likely to work unless you need special characters.
You can look up the implementation here, which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better to use that library (vim-addon-manager allows you to install dependencies easily).
Now it depends on your data whether this is good enough.
Benjamin Klein has now posted your question to vim_use, which is why I'm replying.
The best and fastest replies happen if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape stackoverflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you, you can try the many :h if_* bindings, or write an external script which either extracts the information you're looking for or turns JSON into Vim's dictionary representation (which can be read by eval()), escaping any special characters you care about correctly.
If you seek a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by @MarcWeber is one of the fastest, but it has its disadvantages:
Using the solution for securing eval that I mentioned in a comment makes it no longer the fastest. In fact, after you apply it, eval() becomes slower by more than an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use vam#VerifyIsJSON I mentioned in p.1.
It fails to load floating-point values like 1e10 (Vim requires numbers to look like 1.0e10, but JSON allows numbers like 1e10: note the “and/or” in the first paragraph).
All of the above statements (except the first) also apply to vim-addon-json-encoding mentioned by @MarcWeber, because it uses eval. There are some other possibilities:
The fastest and most correct is using python: pyeval('json.loads(vim.eval("varname"))'). Not faster than eval(), but the fastest among the other possibilities (0.04s in my test: approximately two times slower than eval()).
Note that I use pyeval() here. If you want a solution for a Vim version that lacks this functionality, it will no longer be one of the fastest.
Use my json.vim plugin. It has the advantage of slightly better error reporting than a failed vam#VerifyIsJSON (slightly worse than eval()'s), and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with a trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (a 279702-character string) it takes 11.59s to load. json.vim tries to use python if possible, though.
For the best error reporting you can take yaml.vim and purge the YAML support out of it, leaving only JSON (I once did the same thing for pyyaml, though in python: see the markedjson library used in powerline; it is pyyaml minus the YAML stuff, plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702-character string.
Note that the only variant mentioned that satisfies both requirements, “no dependencies” and “no python”, is eval(). If you are not fine with its disadvantages, you have to throw away one or both of these requirements, or copy-paste code. If you take speed into account, only two candidates are left: eval() and python. If you want to parse JSON fast you really must use C, and only these solutions spend most of their time in functions written in C.
Most other interpreters (ruby/perl/TCL) do not have a pyeval() equivalent, so they will be slower even if their JSON implementation is written in C. Some others (lua/racket (mzscheme)) do have a pyeval() equivalent, but e.g. luaeval('{}') is zero, meaning you would have to add an additional step of explicitly and recursively converting objects into Vim dictionaries and lists (e.g. luaeval('vim.dict({})')), which would impact performance. I cannot say anything about mzeval(), but I have never heard of anybody actually using racket (mzscheme) with Vim.

Using JSON in Haskell to Serialize a Record

I've been working on a very small program to grab details about Half Life 2 servers (using the protocol-srcds library). The workflow is pretty straightforward; it takes a list of servers from a file, queries each of them, and writes the output out to another file (which is read by a PHP script for display, as I'm tied to vBulletin). Would be nice if it was done in SQL or something, but seeing as I'm still just learning, that's a step too far for now!
Anyway, my question relates to serialization, namely, serializing to JSON. For now, I've written a scrappy helper function jsonify, such that:
jsonify (Just (SRCDS.GameServerInfo serverVersion
                                    serverName
                                    serverMap
                                    serverMod
                                    serverModDesc
                                    serverAppId
                                    serverPlayers
                                    serverMaxPlayers
                                    serverBots
                                    serverType
                                    serverOS
                                    serverPassword
                                    serverSecure
                                    serverGameVersioning)) =
    toJSObject [ ("serverName", serverName)
               , ("serverMap", serverMap)
               , ("serverPlayers", show serverPlayers)
               , ("serverMaxPlayers", show serverMaxPlayers) ]
(I'm using the Text.JSON package). This is obviously not ideal. At this stage, however, I don't understand using instances to define serializers for records, and my attempts to do so met a wall of frustration in the type system.
Could someone please walk me through the "correct" way of doing this? How would I go about defining an instance that serializes the record? What functions should I use in the instance (showJSON?).
Thanks in advance for any help.
You might want to consider using Data.Aeson instead which might be regarded as the successor to Text.JSON.
With aeson you define separate instances for serializing and deserializing (with Text.JSON you have to define both directions even if you need only one, otherwise the compiler will complain, unless you silence the warning somehow), and it provides a few operators that make defining instances a bit more compact. For example, the code from @hammar's answer can be written a little less noisily with the aeson API:
{-# LANGUAGE RecordWildCards #-}  -- needed for the {..} pattern
import Data.Aeson (ToJSON(..), object, (.=))

instance ToJSON SRCDS.GameServerInfo where
    toJSON (SRCDS.GameServerInfo {..}) = object
        [ "serverName"       .= serverName
        , "serverMap"        .= serverMap
        , "serverPlayers"    .= serverPlayers
        , "serverMaxPlayers" .= serverMaxPlayers
        ]
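Writing the result out is then a single call. A minimal usage sketch (writeServerInfo is a hypothetical helper, assuming the ToJSON instance above):
import qualified Data.ByteString.Lazy as BL
import Data.Aeson (encode)

-- Hypothetical helper: serialize one record to a JSON file.
writeServerInfo :: FilePath -> SRCDS.GameServerInfo -> IO ()
writeServerInfo path info = BL.writeFile path (encode info)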
One simple thing you can do is use record wildcards to cut down on the pattern code.
As for your type system problems, it's hard to give help without seeing error messages and what you've tried so far. However, I suspect one thing that might be confusing is that the result of toJSObject has to be wrapped in a JSObject data constructor, as the return type of showJSON is supposed to be a JSValue. Similarly, the values of your object should also be of type JSValue. The easiest way to do this is to use their JSON instances and call showJSON to convert the values.
instance JSON SRCDS.GameServerInfo where
    showJSON (SRCDS.GameServerInfo {..}) =
        JSObject $ toJSObject [ ("serverName",       showJSON serverName)
                              , ("serverMap",        showJSON serverMap)
                              , ("serverPlayers",    showJSON serverPlayers)
                              , ("serverMaxPlayers", showJSON serverMaxPlayers) ]

Is it okay to rely on automatic pass-by-reference to mutate objects?

I'm working in Python here (which is actually call-by-object-sharing, I think), but the idea is language-agnostic as long as method parameters behave similarly:
If I have a function like this:
def changefoo(source, destination):
destination["foo"] = source
return destination
and call it like so,
some_dict = {"foo": "bar"}
some_var = "a"
new_dict = changefoo(some_var, some_dict)
new_dict will be a modified version of some_dict, but some_dict will also be modified.
Assuming that the mutable structure, like the dict in my example, will almost always be similarly small, and that performance is not an issue (in this application, I'm taking abstract objects and turning them into SOAP requests for different services, where the SOAP request will take an order of magnitude longer than reformatting the data for each service), is this okay?
The destination in these functions (there are several, it's not just a utility function like in my example) will always be mutable, but I like to be explicit: the return value of a function represents the outcome of a deterministic computation on the parameters you passed in. I don't like using out parameters but there's not really a way around this in Python when passing mutable structures to a function. A couple options I've mulled over:
Copying the parameters that will be mutated, to preserve the original
I'd have to copy the parameters in every function where I mutate them, which seems cumbersome and like I'm just duplicating a lot. Plus, I don't think I'll ever actually need the original; it just seems messy to return a reference to the mutated object I already had. (A minimal sketch of this option appears after this list.)
Just use it as an in/out parameter
I don't like this, it's not immediately obvious what the function is doing, and I think it's ugly.
Create a decorator which will automatically copy the parameters
Seems like overkill
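For reference, here is roughly what the first option (copying) could look like; a minimal sketch reusing the changefoo function from the question:
import copy

def changefoo(source, destination):
    # Work on a private copy so the caller's dict is preserved.
    destination = copy.deepcopy(destination)
    destination["foo"] = source
    return destination

some_dict = {"foo": "bar"}
new_dict = changefoo("a", some_dict)
assert some_dict == {"foo": "bar"}  # original untouched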
So is what I'm doing okay? I feel like I'm hiding something, and a future programmer might think the original object is preserved based on the way I'm calling the functions (grabbing the result rather than relying on the fact that they mutate the original). But I also feel like any of the alternatives would be messy. Is there a more preferred way? Note that it's not really an option to add a mutator-style method to the class representing the abstract data, due to the way the software works (I would have to add a method to translate that data structure into the corresponding SOAP structure for every service we send that data off to; currently the translation logic is in a separate package for each service).
If you have a lot of functions like this, I think your best bet is to write a little class that wraps the dict and modifies it in-place:
class DictMunger(object):
    def __init__(self, original_dict):
        self.original_dict = original_dict

    def changefoo(self, source):
        self.original_dict['foo'] = source

some_dict = {"foo": "bar"}
some_var = "a"

munger = DictMunger(some_dict)
munger.changefoo(some_var)
# ...
new_dict = munger.original_dict
Objects modifying themselves is generally expected and reads well.