Stanford CoreNLP CoNLL output

I am using Stanford CoreNLP version 3.4. I want output in CoNLL format for the annotators tokenize, ssplit, pos, lemma, and ner. However, on executing the command
java -cp stanford-corenlp-3.4.jar:stanford-corenlp-3.4-models.jar:xom.jar:joda-time.jar:jollyday.jar:ejml-3.4.jar -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt -outputFormat conll
it shows the following error:
Exception in thread "main" java.lang.IllegalArgumentException: No enum constant edu.stanford.nlp.pipeline.StanfordCoreNLP.OutputFormat.CONLL
at java.lang.Enum.valueOf(Enum.java:236)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$OutputFormat.valueOf(StanfordCoreNLP.java:86)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1167)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1448)
P.S.: I don't want to annotate dependencies.
Any suggestions?

The CoNLL output formatter was added in version 3.5.0. Upgrade your version and this error should go away.
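After upgrading, the same pipeline should run unchanged apart from the jar names. A minimal sketch, assuming you unpacked the 3.5.0 release into a stanford-corenlp-full-2014-10-31 directory (the directory name is from memory and may differ):
java -cp "stanford-corenlp-full-2014-10-31/*" -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file input.txt -outputFormat conll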

Using json as a source for cxf-wadl2java

I received a specification of a RESTful service in JSON format and need to create a Java API library for the client.
Now Swagger can do it without a problem, but I would prefer to use the cxf-wadl2java Maven plugin. By default it doesn't expect the JSON format; see the exception's cause stack trace below.
Is there a way to configure the cxf-wadl2java plugin to read a JSON document?
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '{' (code 123) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:653)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2133)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1181)
at org.apache.cxf.staxutils.StaxUtils.readDocElements(StaxUtils.java:1367)
at org.apache.cxf.staxutils.StaxUtils.readDocElements(StaxUtils.java:1261)
at org.apache.cxf.staxutils.StaxUtils.read(StaxUtils.java:1189)
at org.apache.cxf.staxutils.StaxUtils.read(StaxUtils.java:1178)
at org.apache.cxf.staxutils.StaxUtils.read(StaxUtils.java:1168)
at org.apache.cxf.tools.wadlto.jaxrs.SourceGenerator.readXmlDocument(SourceGenerator.java:1757)
... 32 more
Maybe you can do a two-step conversion: swagger.json to a WADL file, and then the wadl2java plugin.
Install npm on your machine.
Use the Maven exec plugin to run a command from an npm package that converts Swagger to WADL (see the sketch after this list).
Use the cxf wadl2java plugin to generate Java files from the WADL file generated above.
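A minimal sketch of the second step with the exec-maven-plugin; swagger2wadl is a hypothetical stand-in name for whatever converter CLI you actually install from npm, and the paths are placeholders:
<!-- Sketch: run a hypothetical npm-installed "swagger2wadl" CLI before code generation. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.6.0</version>
  <executions>
    <execution>
      <id>swagger-to-wadl</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>exec</goal>
      </goals>
      <configuration>
        <!-- hypothetical CLI name; replace with the tool you install -->
        <executable>swagger2wadl</executable>
        <arguments>
          <argument>${project.basedir}/src/main/resources/swagger.json</argument>
          <argument>${project.build.directory}/generated/service.wadl</argument>
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>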
EDIT
There is a Maven plugin provided by swagger.io. Please refer to a usage example here.
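For reference, a minimal configuration sketch of that plugin (the swagger-codegen-maven-plugin coordinates, version, and language value here are from memory and may need adjusting for your setup):
<plugin>
  <groupId>io.swagger</groupId>
  <artifactId>swagger-codegen-maven-plugin</artifactId>
  <version>2.2.3</version>
  <executions>
    <execution>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <inputSpec>${project.basedir}/src/main/resources/swagger.json</inputSpec>
        <language>java</language>
        <output>${project.build.directory}/generated-sources/swagger</output>
      </configuration>
    </execution>
  </executions>
</plugin>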

Adding Spark CSV dependency to Zeppelin

I'm running an EMR cluster with Spark on AWS.
The Spark version is 1.6.
When running the following command:
proxy = sqlContext.read.load("/user/zeppelin/ProxyRaw.csv",
format="com.databricks.spark.csv",
header="true",
inferSchema="true")
I get the following error:
Py4JJavaError: An error occurred while calling o162.load.
: java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at
http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
How can I solve this? I assume I should add a package, but how do I install it, and where?
There are many ways to add packages in Zeppelin:
One of them is to change the conf/zeppelin-env.sh configuration file, adding the packages you need (com.databricks:spark-csv_2.10:1.4.0 in your case) to the submit options, since Zeppelin uses the spark-submit command under the hood:
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.4.0"
But let's say that you don't actually have access to that configuration. You can then use dynamic dependency loading via the %dep interpreter (deprecated):
%dep
z.load("com.databricks:spark-csv_2.10:1.4.0")
This requires that you load the dependencies before the Spark interpreter starts; if it is already running, restart it and run the %dep paragraph first.
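A slightly fuller sketch, assuming the %dep interpreter's z.reset() is available (as I recall, it clears previously loaded artifacts):
%dep
z.reset()
z.load("com.databricks:spark-csv_2.10:1.4.0")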
Another way is to add the dependency you need via the interpreter dependency manager, as described in the following link: Dependency Management for Interpreter.
Well,
First you need to download the CSV lib from the Maven repository:
https://mvnrepository.com/artifact/com.databricks/spark-csv_2.10/1.5.0
Check the Scala version that you are using: 2.10 or 2.11.
When you call spark-shell, spark-submit, or pyspark (or even Zeppelin), you need to add the --jars option with the path to your lib, like this:
pyspark --jars /path/to/jar/spark-csv_2.10-1.5.0.jar
Then you can call it as you did above.
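If you are launching through Zeppelin rather than a shell, the rough equivalent (assuming, as in the other answer, that Zeppelin forwards SPARK_SUBMIT_OPTIONS to spark-submit) would be:
export SPARK_SUBMIT_OPTIONS="--jars /path/to/jar/spark-csv_2.10-1.5.0.jar"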
You can see another related issue here: How to add third party java jars for use in pyspark

JRuby 1.7.1 and PsychParser error parsing UTF-8 YAML file (Rails 3.2.8)

In our JRuby/Rails project, we are using the i18n gem and support Japanese as well as English. Our config/locales/ja.yml file is UTF-8 encoded, without any BOM.
When running Rails 3.2.9 on JRuby 1.7.1, we now see the following error:
% jruby -S rake spec:models
Psych::SyntaxError: (C:/Projects/foobar/trunk/config/locales/ja.yml):
expected <block end>, but found Scalar while parsing a block
mapping at line 7 column 33
parse at org/jruby/ext/psych/PsychParser.java:213
...
This YAML parsing error for the ja.yml file now happens in both our Windows XP and Linux development environments, and only goes away when we explicitly set the following system property for the JVM:
-Dfile.encoding=utf-8
Could anyone tell me why this is happening on JRuby 1.7.1?
I didn't see this in 1.6.8 or 1.7.0.
Over a year old now, but here is the answer:
http://jruby.org/2012/12/03/jruby-1-7-1.html
In that release, this happened:
Psych YAML engine updated to latest
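In other words, the Psych update appears to have changed how the default encoding is picked up. If setting the property on every launch is awkward, one workaround sketch is to export it via JRUBY_OPTS, which forwards JVM properties with the -J prefix:
export JRUBY_OPTS="-J-Dfile.encoding=UTF-8"
jruby -S rake spec:models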

org.codehaus.groovy.control.MultipleCompilationErrorsException when I have 2 different jdbc jars on classpath [duplicate]

Possible Duplicate:
Manifest.MF issue with MSSQLSERVER 2008 and Groovy
I have a very simple Groovy script with two database connections:
One db connection to Oracle
Another db connection to SQLServer
Problem
When I run the program through the GGTS editor (the Groovy and Grails version of the SpringSource Tool Suite), the two queries run and return results fine. But when I run the program from the command line, from the project folder, as follows:
groovy -cp lib\jtds-1.3.0.jar lib\ojdbc6-11g.jar src\Starter.groovy
I get the following error:
C:\workspace-ggts\Test>groovy -cp lib\jtds-1.3.0.jar lib\ojdbc6-11g.jar src\Starter.groovy
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
C:\workspace-ggts\Test\lib\ojdbc6-11g.jar: 1: unexpected char: 0x3 # line 1, column 3.
PK♥ ßî∟9 ♦ META-INF/■╩ ♥ ☻ PK♥♦ ßî∟9 ¶ M
ETA-INF/MANIFEST.MF?æ┴N├0►D∩æ≥☼½₧α►7)R[rúΘÑá☻R½^æq6─òcç╡SΦ▀π4◄ → ─╒3;π}╗µ
Z▬h]┤C▓╥Φ¶↕▬ç┴¬¬§V¿↔w■╤ï:7ö┬♥qí►2C╡íôtf▌Jº0♣│╧ƒ┼öφ9
^
1 error
What I have Tried
I have tried using the jTDS driver to connect to SQL Server, since I thought the problem was the sqljdbc4.jar from the Microsoft site, based on this same problem reported differently here.
I have tried using semicolons to separate the classpath entries, and I still get the same error.
I have upgraded Java to 1.7. The Groovy version is 2.0.5.
From the IDE it runs fine, but from the command line I get the error.
If I comment out the code for one of the database connections (connection, query, println of the result set), leaving my Groovy script with only one connection, the program runs fine from the command line. For example, this:
groovy -cp lib\jtds-1.3.0.jar src\Starter.groovy
or this:
groovy -cp lib\ojdbc6-11g.jar src\Starter.groovy
does work. As soon as I add the code and the jar in the classpath for that second db access, I get the error reported above.
I am out of ideas.
Entries in your classpath need to be separated with a semicolon on Windows; on Unix-like platforms such as Linux or OS X, the separator is a colon. Without the separator, Groovy treats the second jar file as the script to run, and the script name as the first command-line argument.
Try this:
groovy -cp lib\jtds-1.3.0.jar;lib\ojdbc6-11g.jar src\Starter.groovy
Do you get a different error with that?
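For completeness, on Linux or OS X the same fix would use colons and forward slashes:
groovy -cp lib/jtds-1.3.0.jar:lib/ojdbc6-11g.jar src/Starter.groovy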

Manifest.MF issue with MSSQLSERVER 2008 and Groovy

I have created a simple Groovy project in the GGTS IDE that connects to Oracle and SQL Server. The program runs fine within the IDE, but when I run it from the command line I seem to get some sort of encoding error in MANIFEST.MF. See the stack trace below:
Command Line
groovy -cp lib\ojdbc14_g.jar lib\sqljdbc4.jar src\Starter.groovy
Result
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
C:\workspace-ggts-3.1.0.RELEASE\Test\lib\sqljdbc4.jar: 1: unexpected char: 0x3 # line 1, column 3.
PK♥ h?I# ¶ META-INF/MANIFEST.MF¡|GôΓ┌▓εⁿD∞ ░=x/êsä 8◄o ï∟B▲
ë╔
^
1 error
In the past, to connect to MS SQL Server, I have used the following jars:
msbase.jar
msutil.jar
mssqlserver.jar
This time, when I looked for JDBC jars for 2008, I got sqljdbc4.jar. Again, it works from within the IDE but not from the command line. I have singled out the problem to sqljdbc4.jar, because I commented out all the code related to it and the program ran fine with just the Oracle jar references.
Does anybody know why this is happening?
What jars are you using to connect to SQL Server 2008 from the command line with Groovy?
Thanks.
You need semicolons between classpath entries (assuming you are on Windows):
groovy -cp lib\ojdbc14_g.jar;lib\sqljdbc4.jar src\Starter.groovy
Or colons if you're on Linux/Mac:
groovy -cp lib/ojdbc14_g.jar:lib/sqljdbc4.jar src/Starter.groovy