As far as I can tell, there is no way to parameterize character strings in an AllenNLP config file --- only ints or floats

So the issue is: to use autotuning (e.g. Optuna) with AllenNLP, the suggested practice is to reference environment variables in the jsonnet config and then set up a study that modifies those variables.
That works fine when the values are integers or floating-point numbers: for integers you use std.parseInt(std.extVar(varname)), and for floats you use std.parseJson(std.extVar(varname)).
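For example, a sketch of that pattern (the variable names here are ones a tuning study might export, not anything AllenNLP defines):

local lr = std.parseJson(std.extVar('lr'));                 // "0.001" -> 0.001
local num_epochs = std.parseInt(std.extVar('num_epochs'));  // "20"    -> 20
{
  trainer: {
    optimizer: { type: 'adam', lr: lr },
    num_epochs: num_epochs,
  },
}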
But if I want to change, say, the optimizer between "adam", "sparseadam", "adamax", "adamw", etc., or change the type of RNN I am using, there does not appear to be an easy way to do that.
It would seem that you should be able to call std.extVar(varname) directly in that case, without wrapping it in parseJson() or parseInt(), but that returns an error. Has anybody else had this problem, and how did you get around it?
Just to add to this, I am trying this with three different string parameters. Here is the jsonnet for the first one, "bert_vocab":
local bert_vocab=std.extvar('bert_vocab');
Error message:
486 ext_vars = {**_environment_variables(), **ext_vars}
487
--> 488 file_dict = json.loads(evaluate_file(params_file, ext_vars=ext_vars))
489
490 if isinstance(params_overrides, dict):
RuntimeError: RUNTIME ERROR: field does not exist: extvar
/bigdisk/lax/cox/jupyter/bert_config.jsonnet:28:18-28 thunk <bert_vocab>
/bigdisk/lax/cox/jupyter/bert_config.jsonnet:61:22-32 object <anonymous>
/bigdisk/lax/cox/jupyter/bert_config.jsonnet:(59:16)-(63:12) object <anonymous>
/bigdisk/lax/cox/jupyter/bert_config.jsonnet:(58:21)-(64:10) object <anonymous>
/bigdisk/lax/cox/jupyter/bert_config.jsonnet:(56:19)-(65:8) object <anonymous>
During manifestation
I also tried various string escaping functions, like the one below, but none of them work either:
local bert_vocab=std.escapeStringBash(std.extvar("bert_vocab"));
I can do the following to verify that the os environment variable is set:
os.environ['bert_vocab'] returns 'bert-base-uncased'

(I don't know AllenNLP, only Jsonnet.)
External variables can be arbitrary Jsonnet values (numbers, strings, arrays, objects, functions, or null).
Judging from the examples you brought up, AllenNLP passes the parameters in as strings and you do the parsing yourself. In that case it should be possible to use a "naked" std.extVar to get the string. Note the spelling, though: the function is std.extVar, with a capital V. Your snippet calls std.extvar, which is exactly what the "RUNTIME ERROR: field does not exist: extvar" message is complaining about.
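A minimal sketch, reusing the name from the question (the 'optimizer' variable and the config keys are just illustrative):

local bert_vocab = std.extVar('bert_vocab');     // the string as-is, e.g. "bert-base-uncased"
local optimizer_type = std.extVar('optimizer');  // hypothetical: "adam", "adamw", ...
{
  model: { vocab: bert_vocab },
  trainer: { optimizer: { type: optimizer_type } },
}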


Is everything really a string in TCL?

And what is it, if it isn't?
Everything I've read about Tcl states that everything in it is just a string. There can be other types and structures inside an interpreter (for performance), but at the Tcl language level everything must behave just like a string. Or am I wrong?
I'm using an IDE for FPGA programming called Vivado, where Tcl automation is actively used. (The Tcl version is still 8.5, if that helps.)
Vivado's Tcl scripts rely on some kind of "object oriented" system. Web searches don't show any trace of this system elsewhere.
In this system, objects are usually obtained from an internal database with "get_*" commands. I can manipulate the properties of these objects with commands like get_property, set_property, report_property, etc.
But these objects seem to be something more than just a string.
I'll try to illustrate:
> set vcu [get_bd_cells /vcu_0]
/vcu_0
> puts "|$vcu|"
|/vcu_0|
> report_property $vcu
Property              Type    Read-only  Value
CLASS                 string  true       bd_cell
CONFIG.AXI_DEC_BASE0  string  false      0
<...>
> report_property "$vcu"
Property              Type    Read-only  Value
CLASS                 string  true       bd_cell
CONFIG.AXI_DEC_BASE0  string  false      0
<...>
But:
> report_property "/vcu_0"
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property {/vcu_0}
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> report_property /vcu_0
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
> puts |$vcu|
|/vcu_0|
> report_property [string range $vcu 0 end]
ERROR: [Common 17-58] '/vcu_0' is not a valid first class Tcl object.
So, my question is: what exactly is this "valid first class Tcl object"?
Clarification:
This question might seem like a request for help with Vivado scripting, but it is not. (I was even in doubt about adding the [vivado] tag.)
I can just live and script with these mystic objects.
But it would be quite useful (for me, and maybe for others) to better understand their inner workings.
Is this "object system" a dirty hack? Or is it a perfectly valid TCL usage?
If it's valid, where can I read about it?
If it is a hack, how is it (or can it be) implemented? Where exactly does the string end and the object start?
Related:
A part of this answer can be read as an opinion in favor of the "hack" version, but it is quite shallow with respect to my question.
A first class Tcl value is a sequence of characters, where those characters are drawn from the Basic Multilingual Plane of the Unicode specification. (We're going to relax that BMP restriction in a future version, but that's not yet in a version we'd recommend for use.) All other values are logically considered to be subtypes of that. For example, binary strings have the characters come from the range [U+000000, U+0000FF], and integers are ASCII digit sequences possibly preceded by a small number of prefixes (e.g., - for a negative number).
In terms of implementation, there's more going on. For example, integers are usually implemented using 64-bit binary values in the endianness that your system uses (but can be expanded to bignums when required) inside a value boxing mechanism, and the string version of the value is generated on demand and cached while the integer value doesn't change. Floating point numbers are IEEE double-precision floats. Lists are internally implemented as an array of values (with smartness for handling allocation). Dictionaries are hash tables with linked lists hanging off each of the hash buckets. And so on. THESE ARE ALL IMPLEMENTATION DETAILS! As a programmer, you can and should typically ignore them totally. What you need to know is that if two values are the same, they will have the same string, and if they have the same string, they are the same in the other interpretation. (Values with different strings can also be equal for other reasons: for example, 0xFF is numerically equal to 255 — hex vs decimal — but they are not string equal. Tcl's true natural equality is string equality.)
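A quick illustration of that equality point (a sketch; any stock tclsh will do):

# numeric equality vs Tcl's natural string equality
puts [expr {0xFF == 255}]      ;# 1 : numerically equal
puts [expr {"0xFF" eq "255"}]  ;# 0 : not string equal
puts [string equal 0xFF 255]   ;# 0 : string comparison agrees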
True mutable entities are typically represented as named objects: only the name is a Tcl value. This is how Tcl's procedures, classes, I/O system, etc. all work. You can invoke operations on them, but you can only see inside to a limited extent.
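For example, a sketch of the named-object pattern using a built-in Tcl facility (the file path is arbitrary):

# open returns only the NAME of a channel object, e.g. "file3"
set f [open /etc/hosts r]
puts $f         ;# prints something like "file3": the value is just a string
puts [gets $f]  ;# the name is used internally to look up the real object
close $f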
Vivado Tcl is not Tcl. Vivado does not really document the language they call Tcl, but refers you to the real Tcl language documentation. Where Vivado Tcl and Tcl differ, you are left on your own without help. Tcl was a poor choice of scripting language given the very large databases involved, so they had to bastardize it to get it half functional. You are better off getting help on the Xilinx forums than in general Tcl forums. Why they went with Tcl rather than Python is beyond anyone's comprehension.

Correct way to use Mockito for JdbcOperation

I am new to Mockito and am trying to cover the following source code:
jdbcOperations.update(insertRoleQuery,new Object[]{"menuName","subMenuName","subSubMenuName","aa","bb","cc","role"});
This query takes 7 string parameters. I have written a Mockito test case for the code, and it does cover the source code, but I am not sure whether it's the correct way or not.
when(jdbcOperations.update(Mockito.anyString(), new Object[]{Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString(),Mockito.anyString()})).thenThrow(runtimeException);
Please suggest whether I am doing it the right way or not.
Thanks
As per the docs, you can either use exact values, or argument matchers, but not both at the same time:
Warning on argument matchers:
If you are using argument matchers, all arguments have to be provided
by matchers.
If you do mix them, like in your sample, Mockito will complain with something similar to:
org.mockito.exceptions.misusing.InvalidUseOfMatchersException:
Invalid use of argument matchers!
2 matchers expected, 1 recorded:
-> at MyTest.shouldMatchArray(MyTest.java:38)
This exception may occur if matchers are combined with raw values:
//incorrect:
someMethod(anyObject(), "raw String");
When using matchers, all arguments have to be provided by matchers.
For example:
//correct:
someMethod(anyObject(), eq("String by matcher"));
For more info see javadoc for Matchers class.
In your case you don't seem to care about the array contents, so you can just use any():
when(jdbcOperation.update(anyString(), any())).thenThrow(runtimeException);
If you want to at least check the number of parameters, you can use either
org.mockito.Mockito's argThat(argumentMatcher):
when(jdbcOperation.update(anyString(), argThat(array -> array.length == 7))).thenThrow(runtimeException);
org.mockito.hamcrest.MockitoHamcrest's argThat(hamcrestMatcher):
when(jdbcOperation.update(anyString(), argThat(arrayWithSize(7)))).thenThrow(runtimeException);
If you're interested in matching certain values, you can use AdditionalMatchers.aryEq(expectedArray), or just Mockito.eq(expectedArray), which has a special implementation for arrays, but I feel that the first one expresses your intent more clearly.
when(jdbcOperation.update(anyString(), aryEq(new Object[]{"whatever"}))).thenThrow(runtimeException);
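Putting it together, a minimal sketch of a complete test. It assumes Spring's JdbcOperations, JUnit 4, and Mockito 2+; the class name and SQL string are hypothetical, and the final call stands in for whatever class under test actually invokes the update:

import static org.mockito.ArgumentMatchers.any;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;
import org.springframework.jdbc.core.JdbcOperations;

public class RoleInsertTest {

    @Test(expected = RuntimeException.class)
    public void updateFailurePropagates() {
        JdbcOperations jdbcOperations = mock(JdbcOperations.class);
        // stub: any SQL string, any parameter array -> throw
        when(jdbcOperations.update(anyString(), any(Object[].class)))
                .thenThrow(new RuntimeException("simulated DB failure"));

        // normally this call would happen inside the class under test
        jdbcOperations.update("insert into role values (?,?,?,?,?,?,?)",
                new Object[]{"menuName", "subMenuName", "subSubMenuName",
                             "aa", "bb", "cc", "role"});
    }
}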

How does Ruby JSON.parse differ to OJ.load in terms of allocating memory/Object IDs

This is my first question, and I have tried my best to find an answer: I have looked everywhere, but haven't managed to find anything concrete in the Oj docs, the Ruby JSON docs, or here.
Oj is a gem that serves to improve serialization/deserialization speeds and can be found at: https://github.com/ohler55/oj
I noticed this difference when I tried to dump and parse a hash with a NaN contained in it, twice, and compared the two, i.e.
# Create JSON dump
dump = JSON.dump({x: Float::NAN})
# Create first JSON load
json_load = JSON.parse(dump, allow_nan: true)
# Create second JSON load
json_load_2 = JSON.parse(dump, allow_nan: true)
# Create first Oj load
oj_load = Oj.load(dump, :mode => :compat)
# Create second Oj load
oj_load_2 = Oj.load(dump, :mode => :compat)
json_load == json_load_2 # Returns true
oj_load == oj_load_2 # Returns false
I always thought NaN could not be compared to NaN, so this confused me for a while, until I realised that the NaN values inside json_load and json_load_2 share one object ID, while those inside oj_load and oj_load_2 do not.
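One way to confirm that directly (a sketch; equal? tests object identity rather than value equality):

json_load["x"].equal?(json_load_2["x"])  # => true:  JSON.parse reuses one NaN object
oj_load["x"].equal?(oj_load_2["x"])      # => false: Oj.load builds a fresh Float each time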
Can anyone point me in the direction of where this memory allocation/object ID allocation occurs or how I can control that behaviour with OJ?
Thanks and sorry if this answer is floating somewhere on the internet where I could not find it.
Additional info:
I am running Ruby 1.9.3.
Here's the output from my tests re object IDs:
puts Float::NAN.object_id; puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id; puts JSON.parse(%q({"x":NaN}), allow_nan: true)["x"].object_id
70129392082680
70129387898880
70129387898880
puts Float::NAN.object_id; puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id; puts Oj.load(%q({"x":NaN}), allow_nan: true)["x"].object_id
70255410134280
70255410063100
70255410062620
Perhaps I am doing something wrong?
I believe that is a deep implementation detail. Oj does this:
if (ni->nan) {
    rnum = rb_float_new(0.0/0.0);
}
I can't find a Ruby equivalent for that (Float.new doesn't appear to exist), but it does create a new Float object every time, from an actual C NaN it constructs on-site, hence the different object_ids.
Whereas Ruby's JSON module uses (also in C) its own JSON::NaN Float object everywhere:
CNaN = rb_const_get(mJSON, rb_intern("NaN"));
That explains why you get different object_ids for the NaNs with Oj and the same one with Ruby's JSON.
No matter what object_ids the resulting hashes have, the problem is with NaNs. If they have the same object_ids, the enclosing hashes are considered equal. If not, they are not.
According to the docs, Hash#== uses Object#== for values, which outputs true if and only if the argument is the same object (same object_id). This contradicts NaN's property of not being equal to itself.
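A quick sketch of that interaction in plain IRB (no Oj needed):

nan = Float::NAN
nan == nan                      # => false: Float#== is a numeric comparison
{ x: nan } == { x: nan }        # => true:  same object, identity short-circuits the check
{ x: nan } == { x: 0.0 / 0.0 }  # => false: distinct NaN objects, and NaN != NaN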
Spectacular. Inheritance gone haywire.
One could, probably, modify Oj's C code (and even make a pull request with it) to use a constant like Ruby's JSON module does. It's a subtle change, but it's in the spirit of being compat, I guess.

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (read from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
  let c = strpart(jsonString, i, 1)
  let i += 1
  " A lot of code to process the file....
  " Note: I've tried shortcutting the process by searching for the closing
  " double quote when I come across an opening one (also taking the escape
  " character '\' into account). It doesn't help.
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
  " Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them
Genuinely wanted some ideas on the best way to do this without using someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit the processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file, since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile: it assumes a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
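For instance, to load a file (a sketch; json_decode() requires Vim 7.4.1304 or later, and the file name and key are just examples):

" read the whole file into one string, then decode it
let jsonString = join(readfile('package.json'), "\n")
let dict = json_decode(jsonString)
echo dict['name']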
Even though Vim's origins date back a long way, its internal string()/eval() representation happens to be so close to JSON that it's likely to work unless you need special characters.
You can look up the implementation here, which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better to use that library (vim-addon-manager allows you to install dependencies easily).
Now it depends on your data whether this is good enough.
Benjamin Klein posted your question to vim_use, which is why I'm replying.
The best and fastest replies happen if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape Stack Overflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you, you can try the many :h if_* bindings, or write an external script which extracts the information you're looking for, or which turns JSON into Vim's dictionary representation so that it can be read by eval() (escaping the special characters you care about correctly).
If you seek a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by @MarcWeber is one of the fastest, but it has its disadvantages:
Using the solution for securing eval() I mentioned in a comment makes it no longer the fastest. In fact, after you apply it, eval() becomes slower by more than an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages, and it fails badly if you use the vam#VerifyIsJSON I mentioned in point 1.
It fails to load floating-point values like 1e10 (Vim requires numbers to look like 1.0e10, but JSON allows numbers like 1e10: note the “and/or” in the first paragraph).
All of the above statements (except the first) also apply to vim-addon-json-encoding mentioned by @MarcWeber, because it uses eval(). There are some other possibilities:
The fastest and most correct is using Python: pyeval('json.loads(vim.eval("varname"))'). It is not faster than eval(), but it is the fastest among the other possibilities. (0.04s in my test: approximately two times slower than eval().)
Note that I use pyeval() here. If you want a solution for a Vim version that lacks this functionality, it will no longer be one of the fastest.
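Wrapped up as a function, that looks something like this (a sketch; it assumes a Vim built with +python and with pyeval(), i.e. 7.3.569 or later):

if has('python')
  python import json, vim
  " decode a JSON string via Python's json module
  function! ParseJSON(text)
    return pyeval('json.loads(vim.eval("a:text"))')
  endfunction
endif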
Use my json.vim plugin. It has the advantage of slightly better error reporting compared to the failed vam#VerifyIsJSON, slightly worse compared to eval(), and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with a trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (a 279702-character-long string) it takes 11.59s to load. json.vim tries to use Python if possible, though.
For the best error reporting you can take yaml.vim and purge the YAML support out of it, leaving only JSON (I once did the same thing for pyyaml, though in Python: see the markedjson library used in powerline: it is pyyaml minus the YAML stuff, plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702-character-long string.
Note that the only variant mentioned that satisfies both requirements, “no dependencies” and “no Python”, is eval(). If you are not fine with its disadvantages, you have to throw away one or both of these requirements, or copy-paste code. If you take speed into account, only two candidates are left: eval() and Python: if you want to parse JSON fast you really must use C, and only these solutions spend most of their time in functions written in C.
Most other interpreters (Ruby/Perl/Tcl) do not have a pyeval() equivalent, so they will be slower even if their JSON implementation is written in C. Some others (Lua/Racket (mzscheme)) have a pyeval() equivalent, but e.g. luaeval('{}') is zero, meaning that you will have to add an additional step explicitly and recursively converting objects into Vim dictionaries and lists (e.g. luaeval('vim.dict({})')), which will impact performance. I cannot say anything about mzeval(), but I have never heard of anybody actually using Racket (mzscheme) with Vim.

How do I find character positions in ANTLR 2?

I have a simple grammar, and have produced a pair of C# classes using ANTLR 2.7.7. When the parser finds an error with a token, it throws an exception; I want to find out how many characters into the parsed stream the token came. How do I do that?
It's been a long time since I played with ANTLR, but if I remember correctly, to do what you want I had to subclass the parser to keep a counter of characters that was incremented each time a new token was found (by the token's length, of course).
You ought to read chapter 10 ("Error Reporting and Recovery") of Terence Parr's book "The Definitive ANTLR Reference".
Not knowing what target language you're using, it'll be hard to tell you exactly what to do. But I'll assume you're using the Java target, and you can correct me if I'm wrong.
When an ANTLR recognizer fails to match an input string, it throws a very specific exception, based on the failure context. (There are nine different kinds of exceptions, RecognitionException is the root type, and it has eight subclasses of its own: MismatchedTokenException, MismatchedTreeNodeException, NoViableAltException, EarlyExitException, FailedPredicateException, MismatchedRangeException, MismatchedSetException, MismatchedNotSetException).
The root exception type (RecognitionException) has a few handy public fields that you might want to take a look at (specifically: "index", "line" and "charPositionInLine"). The "index" field tells you the exact character position where the error was found. The "line" and "charPositionInLine" fields are pretty self-explanatory. Here's the JavaDoc:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_recognition_exception.html
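For example, assuming the Java target and the ANTLR 3 runtime this answer describes (MyLexer, MyParser, and startRule are placeholders for whatever ANTLR generated from your grammar), a minimal sketch looks like:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

// MyLexer/MyParser are the hypothetical classes generated for your grammar
MyLexer lexer = new MyLexer(new ANTLRStringStream(input));
MyParser parser = new MyParser(new CommonTokenStream(lexer));
try {
    parser.startRule();   // placeholder for your grammar's entry rule
} catch (RecognitionException e) {
    // index: position in the input stream where matching failed;
    // line and charPositionInLine locate it within the source text
    System.err.printf("syntax error at index %d (line %d, column %d)%n",
                      e.index, e.line, e.charPositionInLine);
}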