Stanford NLP: how to disable warnings? - warnings

The Stanford NLP pipeline issues lots of warnings, which is particularly disruptive in a production setup:
WARN Untokenizable: � (U+FFFD, decimal: 65533)
Is there a way to disable them?

If you are working directly with a Tokenizer, the answer Denis Kulagin gives is good; if you are operating at the higher level of a StanfordCoreNLP pipeline, you can simply give the property (or equivalent command-line option):
tokenize.options = untokenizable=noneDelete
(to silently delete all unknown characters) or to silently keep them:
tokenize.options = untokenizable=noneKeep
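For example, when building the pipeline in Java, the property can be set like this (a minimal sketch; the annotator list is only illustrative):
// needs java.util.Properties and edu.stanford.nlp.pipeline.StanfordCoreNLP
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos");
// silently delete untokenizable characters instead of logging a warning for each one
props.setProperty("tokenize.options", "untokenizable=noneDelete");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);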

One can do it this way:
Reader reader = new StringReader(paragraphText);
DocumentPreprocessor documentPreprocessor = new DocumentPreprocessor(reader, DocumentPreprocessor.DocType.Plain);
TokenizerFactory<? extends HasWord> factory = PTBTokenizer.factory();
factory.setOptions("untokenizable=noneDelete");
documentPreprocessor.setTokenizerFactory(factory);
From here: https://github.com/stanfordnlp/CoreNLP/issues/103#issuecomment-157793500
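For completeness, a DocumentPreprocessor configured this way can then be iterated sentence by sentence; a small usage sketch (the printing loop is an addition, not from the linked issue):
// needs java.util.List and edu.stanford.nlp.ling.HasWord
// each sentence is returned as a list of tokens; untokenizable characters have been silently dropped
for (List<HasWord> sentence : documentPreprocessor) {
    System.out.println(sentence);
}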

SQLModel in Fastapi using __table_args__ is unable to create tables for unit testing [duplicate]

I have a Pylons project and a SQLAlchemy model that implements schema qualified tables:
class Hockey(Base):
    __tablename__ = "hockey"
    __table_args__ = {'schema': 'winter'}
    hockey_id = sa.Column(sa.types.Integer, sa.Sequence('score_id_seq', optional=True), primary_key=True)
    baseball_id = sa.Column(sa.types.Integer, sa.ForeignKey('summer.baseball.baseball_id'))
This code works great with PostgreSQL but fails when using SQLite on table and foreign key names (due to SQLite's lack of schema support):
sqlalchemy.exc.OperationalError: (OperationalError) unknown database "winter" 'PRAGMA "winter".table_info("hockey")' ()
I'd like to continue using SQLite for dev and testing.
Is there a way to have this fail gracefully on SQLite?
I'd like to continue using SQLite for dev and testing.
Is there a way to have this fail gracefully on SQLite?
It's hard to know where to start with that kind of question. So . . .
Stop it. Just stop it.
There are some developers who don't have the luxury of developing on their target platform. Their life is a hard one--moving code (and sometimes compilers) from one environment to the other, debugging twice (sometimes having to debug remotely on the target platform), gradually coming to an awareness that the gnawing in their gut is actually the start of an ulcer.
Install PostgreSQL.
When you can use the same database environment for development, testing, and deployment, you should.
Not to mention the QA team. Why on earth are they testing stuff they're not going to ship? If you're deploying on PostgreSQL, assure the quality of your work on PostgreSQL.
Seriously.
I'm not sure if this works with foreign keys, but someone could try SQLAlchemy's Multi-Tenancy Schema Translation for Table objects. It worked for me, though I used custom primaryjoin and secondaryjoin expressions in combination with composite primary keys.
The schema translation map can be passed directly to the engine creator:
...
if dialect == "sqlite":
url = lambda: "sqlite:///:memory:"
execution_options={"schema_translate_map": {"winter": None, "summer": None}}
else:
url = lambda: f"postgresql://{user}:{pass}#{host}:{port}/{name}"
execution_options=None
engine = create_engine(url(), execution_options=execution_options)
...
Here is the doc for create_engine. There is another question on SO which might be related in that regard.
But one might get colliding table names if all schema names are mapped to None.
I'm just a beginner myself, and I haven't used Pylons, but...
I notice that you are combining the table and the associated class together. How about if you separate them?
import sqlalchemy as sa
meta = sa.MetaData('sqlite:///tutorial.sqlite')
schema = None
hockey_table = sa.Table('hockey', meta,
    sa.Column('score_id', sa.types.Integer, sa.Sequence('score_id_seq', optional=True), primary_key=True),
    sa.Column('baseball_id', sa.types.Integer, sa.ForeignKey('summer.baseball.baseball_id')),
    schema=schema,
)
meta.create_all()
Then you could create a separate
class Hockey(object):
    ...
and
mapper(Hockey, hockey_table)
Then just set schema above to None if you are using SQLite, and to the value(s) you want otherwise.
You didn't post a working example, so the example above isn't a working one either. However, as other people have pointed out, trying to maintain portability across databases is in the end a losing game. I'd add a +1 to the people suggesting you just use PostgreSQL everywhere.
HTH, Regards.
I know this is a 10+ year old question, but I ran into the same problem recently: Postgres in production and sqlite in development.
The solution was to register an event listener for when the engine calls the "connect" method.
@sqlalchemy.event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
    # attach the file-based database under the schema name so "schema_name.table" resolves
    dbapi_connection.execute('ATTACH "your_data_base_name.db" AS "schema_name"')
Using the ATTACH statement only once will not work, because it affects only a single connection. This is why we need the event listener: to issue the ATTACH statement on every connection.

Dymola warning about derivatives of noEvent

In the Dymola translation log of one large model I see the following warning several times:
Warning: Can only compute non-scalar gradients of functions specifying derivatives and not for:
noEvent
with no indication of where it comes from. Could someone explain what the warning means, and how to find and fix it?
Any message of that kind should only occur if you have set the flag Advanced.PrintFailureToDifferentiate = true; if you set this flag to false it should not occur.
The likely cause is that you have a non-scalar expression involving noEvent that for some reason needs to be solved using a system of equations; and it just means that Dymola does not yet generate analytic Jacobians for such systems.
The exact cause is difficult to say without the model; you could send that through the normal support-channels.

Unit test being confused by three question marks

I'm writing some JUnit tests and have this check, comparing the keys and values of two HashMaps:
Iterator<Map.Entry<String, String>> it = expected.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<String, String> pairs = it.next();
    assertTrue("Checks key exists", actual.containsKey(pairs.getKey()));
    assertThat("Checks value", actual.get(pairs.getKey()), equalTo(pairs.getValue()));
}
It works great, but I have a value that trips it up:
java.lang.AssertionError: Checks value
Expected: "Member???s "
but: was "Member���s "
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
I checked the data, and the data is correct. It appears that the triple ? is tripping something up somehow. Does anyone know why this would happen? It seems pretty basic to me; it's not even Hamcrest getting messed up, it's the actual assert.
You have an encoding conflict. There are many different ways this could manifest itself, but it is generally caused by not enforcing a consistent encoding across the board.
Assuming you are using UTF-8 somewhere...
If using Maven, set the property project.build.sourceEncoding to UTF-8. See the docs for more details.
Other build systems will certainly have options to specify code and resource file encodings.
If using IO to read (or write), always specify the encoding. For example, when reading from a file:
Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
In short, find any build system setting for encoding, and any point where IO is involved when reading text, and ensure your desired encoding is set.
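To see how an encoding mismatch mangles such a string, here is a small self-contained illustration (an added sketch, not part of the original answer; the sample string is made up):
import java.nio.charset.StandardCharsets;

String original = "Member\u2019s";                        // right single quotation mark (U+2019)
byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // the quote character becomes three bytes in UTF-8

// Decoding those bytes with the wrong charset turns each of the three bytes into a
// U+FFFD replacement character, which is where strings like "Member���s" come from;
// the "Member???s" variant arises similarly when text is re-encoded with a charset
// that cannot represent the character.
String mangled = new String(utf8, StandardCharsets.US_ASCII);  // "Member\uFFFD\uFFFD\uFFFDs"
String intact  = new String(utf8, StandardCharsets.UTF_8);     // "Member’s"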

Iterating over a string in Vimscript or Parse a JSON file

So I'm creating a vim script that needs to load and parse a JSON file into a local object graph. I searched and I couldn't find any native way to process a JSON file, and I don't want to add any dependencies to the script. So I wrote my own function to parse the JSON string (gotten from the file), but it's really slow. At the moment, I iterate through each character in the file like so:
let len = strlen(jsonString) - 1
let i = 0
while i < len
  let c = strpart(jsonString, i, 1)
  let i += 1
  " A lot of code to process the file....
  " Note: I've tried short-cutting the process by searching for the enclosing double quote when I come across the initial double quote (also taking into account the escaping '\' character). It doesn't help.
endwhile
I've also tried this method:
for c in split(jsonString, '\zs')
  " Do a lot of parsing ....
endfor
For reference, a file with ~29,000 characters takes about 4 seconds to process, which is unacceptable.
Is there a better way to iterate over a string in vim script?
Or better yet, have I missed a native function to parse JSON?
Update:
I asked for no dependencies because I:
Didn't want to deal with them.
Genuinely wanted some ideas for the best way to do this without relying on someone else's work.
Sometimes I just like to do things manually even though the problem has already been solved.
I'm not against plugins or dependencies at all, it's just that I'm curious. Thus the question.
I ended up creating my own function to parse the JSON file. I was creating a script that could parse the package.json file associated with node.js modules. Because of this, I could rely on a fairly consistent format and quit processing whenever I'd retrieved the information I needed. This usually cut out large chunks of the file, since most developers put the largest chunk of the file, their "readme" section, at the end. Because the package.json file is strictly defined, I left the process somewhat fragile: it assumes a root dictionary { } and actively looks for certain entries. You can find the script here: https://github.com/ahayman/vim-nodejs-complete/blob/master/after/ftplugin/javascript.vim#L33.
Of course, this doesn't answer my own question. It's only the solution to my unique problem. I'll wait a few days for new answers and pick the best one before the bounty ends (already set an alarm on my phone).
The simplest solution with the least dependencies is just using the json_decode vim function.
let dict = json_decode(jsonString)
Even though Vim's origins date back a long way, it happens that its internal string()/eval() representation is so close to JSON that it is likely to work unless you need special characters.
You can look up the implementation here, which even supports true/false/null if you want:
https://github.com/MarcWeber/vim-addon-json-encoding
Better to use that library (vim-addon-manager allows you to install dependencies easily).
Now it depends on your data whether this is good enough.
Now Benjamin Klein has posted your question to vim_use, which is why I'm replying.
The best and fastest replies happen if you subscribe to the Vim mailing list.
Go to vim.sf.net and follow the community link.
You cannot expect the Vim community to scrape stackoverflow.
I've added the keywords "json" and "parsing" to that little code so that it can be found more easily.
If this solution does not work for you, you can try the many :h if_* bindings, or write an external script which extracts the information you're looking for or turns JSON into Vim's dictionary representation (which can be read by eval()), escaping the special characters you care about correctly.
If you seek a completely correct solution, omitting dependencies is one of the worst things you can do. The eval() variant mentioned by @MarcWeber is one of the fastest, but it has its disadvantages:
Using the solution for securing eval() that I mentioned in a comment makes it no longer the fastest. In fact, it makes eval() slower by more than an order of magnitude (0.02s vs 0.53s in my test).
It does not respect surrogate pairs.
It cannot be used to verify that you have correct JSON: it accepts some strings (e.g. "\<C-o>") that are not JSON strings and it allows trailing commas.
It fails to give normal error messages. It fails badly if you use the vam#VerifyIsJSON I mentioned in point 1.
It fails to load floating-point values like 1e10 (Vim requires numbers to look like 1.0e10, but numbers like 1e10 are allowed: note the “and/or” in the first paragraph).
All of the above statements (except for the first) also apply to vim-addon-json-encoding mentioned by @MarcWeber, because it uses eval(). There are some other possibilities:
The fastest and most correct is using Python: pyeval('json.loads(vim.eval("varname"))'). It is not faster than eval(), but it is the fastest among the other possibilities (0.04s in my test: approximately two times slower than eval()).
Note that I use pyeval() here. If you want a solution for a Vim version that lacks this functionality, it will no longer be one of the fastest.
Use my json.vim plugin. It has the advantage of slightly better error reporting compared to failed vam#VerifyIsJSON (slightly worse compared to eval()), and it correctly loads floating-point numbers. It can be used for verification of strings (it does not accept "\<C-a>"), but it loads lists with a trailing comma just fine. It does not support surrogate pairs. It is also very slow: in the test I used (it uses a 279702 character long string) it takes 11.59s to load. json.vim tries to use Python if possible, though.
For the best error reporting you can take yaml.vim and purge the YAML support out of it, leaving only JSON (I once did the same thing for pyyaml, though in Python: see the markedjson library used in powerline: it is pyyaml minus the YAML stuff plus classes with marks). But this variant is even slower than json.vim and should only be used if the main thing you need is error reporting: 207 seconds for loading the same 279702 character long string.
Note that the only variant mentioned that satisfies both the “no dependencies” and “no Python” requirements is eval(). If you are not fine with its disadvantages, you have to throw away one or both of these requirements, or copy-paste code. Though if you take speed into account, only two candidates are left: eval() and Python: if you want to parse JSON fast you really must use C, and only these solutions spend most of their time in functions written in C.
Most other interpreters (Ruby/Perl/Tcl) do not have a pyeval() equivalent, so they will be slower even if their JSON implementation is written in C. Some others (Lua/Racket (mzscheme)) do have a pyeval() equivalent, but e.g. luaeval('{}') is zero, meaning that you will have to add an additional step of explicitly and recursively converting objects into Vim dictionaries and lists (e.g. luaeval('vim.dict({})')), which will impact performance. I cannot say anything about mzeval(), but I have never heard of anybody actually using Racket (mzscheme) with Vim.

How to retrieve useful system information in java?

What system information is useful in a Java application, especially when tracking down an exception or other problem?
I am thinking about details of exceptions, Java/OS information, memory/object consumption, I/O information, environment/encodings, etc.
Besides the obvious (the exception stack trace), the more info you can get, the better. So you should get all the system properties as well as the environment variables. Also, if your application has some settings, get all their values. Of course you should put all this info into your log file; I used System.out here for simplicity:
System.out.println("----Java System Properties----");
System.getProperties().list(System.out);
System.out.println("----System Environment Variables----");
Map<String, String> env = System.getenv();
Set<String> keys = env.keySet();
for (String key : keys) {
System.out.println(key + "=" + env.get(key));
}
In most cases this will be "too much" information and the stack trace alone will be enough, but once you hit a tough issue you will be happy to have all that "extra" information.
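Since the question also mentions memory consumption, the standard Runtime API covers that too (an added sketch, not part of the original answer):
Runtime rt = Runtime.getRuntime();
System.out.println("Available processors: " + rt.availableProcessors());
System.out.println("Max heap (bytes):     " + rt.maxMemory());
System.out.println("Total heap (bytes):   " + rt.totalMemory());
System.out.println("Free heap (bytes):    " + rt.freeMemory());
System.out.println("Used heap (bytes):    " + (rt.totalMemory() - rt.freeMemory()));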
Check out the Javadoc for System.getProperties() which documents the properties that are guaranteed to exist in every JVM.
For pure Java applications:
System.getProperty("org.xml.sax.driver")
System.getProperty("java.version")
System.getProperty("java.vm.version")
System.getProperty("os.name")
System.getProperty("os.version")
System.getProperty("os.arch")
In addition, for Java servlet applications:
response.getCharacterEncoding()
request.getSession().getId()
request.getRemoteHost()
request.getHeader("User-Agent")
pageContext.getServletConfig().getServletContext().getServerInfo()
One thing that really helps me: seeing where my classes are getting loaded from.
obj.getClass().getProtectionDomain().getCodeSource().getLocation();
Note: the ProtectionDomain can be null, as can the CodeSource, so do the needed null checks, as in the sketch below.
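A version with those null checks spelled out (a minimal sketch; obj stands for whatever instance you are inspecting):
// needs java.security.ProtectionDomain, java.security.CodeSource and java.net.URL
ProtectionDomain domain = obj.getClass().getProtectionDomain();
CodeSource source = (domain != null) ? domain.getCodeSource() : null;
URL location = (source != null) ? source.getLocation() : null;
System.out.println(location != null ? location : "code source location unavailable");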