Escaping characters in Neo4j batch loader CSV

I'm trying to import an edge list using the csv importer, but I get an error message due to what I think is a failure to escape a character. Is there a way around this, or do I have to run some script on the csv files to handle this?
Exception in thread "main" java.lang.NumberFormatException: For input string: "'Abelmoschus esculentus' bunchy to
phytoplasma"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Long.parseLong(Unknown Source)
at java.lang.Long.parseLong(Unknown Source)
at org.neo4j.batchimport.Importer.id(Importer.java:213)
at org.neo4j.batchimport.Importer.id(Importer.java:181)
at org.neo4j.batchimport.Importer.importRelationships(Importer.java:147)
at org.neo4j.batchimport.Importer.doImport(Importer.java:232)
at org.neo4j.batchimport.Importer.main(Importer.java:83)

Use the config option: batch_import.csv.quotes=true
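For example, in the importer's batch.properties (a minimal sketch; the exact properties file your batch-import setup reads is an assumption):
# enable quote handling so that double-quoted fields may contain
# delimiters and embedded line breaks (RFC 4180 style quoting)
batch_import.csv.quotes=true
With that set, a value such as the multi-line phytoplasma name above just needs to be wrapped in double quotes in the CSV file.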

Add something to the log only if there is an exception in logback

Due to reasons outside of my control, newlines aren't handled properly by some logging infrastructure I have to use.
A workaround is to replace the \n with another character, e.g. _newline_.
This can be done in logback by configuring the pattern:
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <!-- Standard pattern -->
    <!-- <pattern>%coloredLevel - %logger - %message%n%xException{15}</pattern> -->
    <!-- With newlines removed -->
    <pattern>%coloredLevel - %logger - %replace(%message){'\n', '_newline_'}_newline_%replace(%xException){'\n', '_newline_'}%nopex%n</pattern>
  </encoder>
</appender>
However, this adds a superfluous _newline_ to log lines when there isn't an exception (and an extra newline to stack traces, but that isn't a big problem).
Is there a way to output the _newline_ between the message and the exception only when there actually is an exception?
I found a somewhat easier solution for conditionally outputting a newline between the log message and the exception message. It seems that the exception message is always followed by a newline, no matter how many lines of stack trace you print. So what you can do is print the exception with 0 lines of stack trace (%ex{0}), and then truncate it from the left until its length is at most 1 (%.1ex{0}). This effectively leaves a single \n if there is an exception, and produces an empty string if there is none.
So this:
// logger.error("Log message", new RuntimeException("Exception message", new RuntimeException("Cause")));
// logger.error("Log message without exception");
<pattern>%-5p %m%.1ex{0}%ex%nopex%n</pattern>
Results in:
ERROR Log message
java.lang.RuntimeException: Exception message
[ redacted ]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Cause
... 10 common frames omitted
ERROR Log message without exception
Now the fun part. Remember that exceptions are always followed by a newline? This makes log events with exceptions be followed by an empty line (one newline from the exception and one from the pattern), unless you know a trick for printing something only when there is no exception...
P.S. You can wrap everything in %replace(){} to escape the newlines, too:
<pattern>%-5p %replace(%m%.1ex{0}%ex){'[\r\n]+', 'LF'}%nopex%n</pattern>
I solved it very hackily, by replacing an extra zero-depth exception stack trace with a single newline marker:
<pattern>%coloredLevel - %logger - %replace(%message){'\n', '_newline_'} %replace(%replace(%ex{0}){'\n',''}){'.+', '<nl>'}%replace(%xException){'\n', '_newline_'}%nopex%n</pattern>
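Reading that pattern inside-out (a sketch of the mechanics as I understand them, written as logback comments):
<!--
  %replace(%ex{0}){'\n',''}                 : zero-depth exception output with newlines removed;
                                              empty string when there is no exception
  %replace( ... ){'.+', '<nl>'}             : collapse any non-empty remainder to the literal '<nl>'
  %replace(%xException){'\n', '_newline_'} : the full stack trace with its newlines escaped
  %nopex                                    : suppress logback's default appended exception output
-->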

ERROR 2997: Encountered IOException. Directory part1 does not exist

I am having a problem executing a script in Apache Pig. I have 3 files, namely movies.csv, ratings.csv, and tags.csv. First I want to load movies.csv, then load ratings.csv, and join both tables. But I am encountering an error while loading the files. My code is as follows:
register 'piggybank-0.15.0.jar'
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
part1 = LOAD '/home/cloudera/ml-20m/movies' as (movieId: chararray, title: chararray, genre: chararray);
cat part1;
When I run the "cat" command, I get this error:
Pig Stack Trace
ERROR 2997: Encountered IOException. Directory part1 does not exist.
java.io.IOException: Directory part1 does not exist.
at org.apache.pig.tools.grunt.GruntParser.processCat(GruntParser.java:677)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:547)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But I do have the file at the specified location. I don't know why Pig is not able to recognize the input file. I have also tried placing the input file in HDFS and loading it from there, but the error is the same. Can anyone please help me? Thanks in advance.
part1 is not a file but a relation. When you use the LOAD command in Pig, you are instructing Pig to load the contents of the file into a relation. You cannot use cat on a relation, since cat reads the contents of files.
To display the content of part1, use:
DUMP part1;
Alternatively, if you insist on using cat, specify the full path to the file:
cat /home/cloudera/ml-20m/movies;
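As an aside, the LOAD statement above never actually invokes the CSVLoader that was DEFINEd, so the comma-separated files would not be parsed as CSV. A sketch of the intended load-and-join (the .csv file names and the ratings schema are assumptions based on the MovieLens ml-20m layout):
register 'piggybank-0.15.0.jar';
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- load both files through the CSV loader
movies  = LOAD '/home/cloudera/ml-20m/movies.csv' USING CSVLoader()
          AS (movieId: chararray, title: chararray, genre: chararray);
ratings = LOAD '/home/cloudera/ml-20m/ratings.csv' USING CSVLoader()
          AS (userId: chararray, movieId: chararray, rating: double, ts: chararray);

-- join on the shared key and inspect the result
joined = JOIN movies BY movieId, ratings BY movieId;
DUMP joined;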

Run JUnit tests written in Kotlin with @Category annotation

I have got a question: is it possible to annotate JUnit tests written in Kotlin with @Category, using an interface name that contains spaces?
I want to have something like this:
@Test
@Category(`Kotlin interface`::class)
fun `Test with spaces`() {
    // ...
}
Right now I receive an exception.
Even though class names with spaces are allowed in Kotlin (when escaped with backticks), they go against the naming convention, which is PascalCase, or UpperCamelCase if you will. I just tested this with the latest JUnit 4 and it doesn't work:
"C:\Program Files\Java\jdk1.8.0_131\bin\java" -ea -Didea.test.cyclic.buffer.size=1048576 "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\lib\idea_rt.jar=56217:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\lib\idea_rt.jar" com.intellij.rt.execution.CommandLineWrapper C:\Users\foo\AppData\Local\Temp\idea_classpath com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 com.foo.bar.http.BarHTTPApplicationTest,systemInfoGET
java.lang.reflect.GenericSignatureFormatError: Signature Parse error: expected '<' or ';' but got
Remaining input: Tests;
at sun.reflect.generics.parser.SignatureParser.error(SignatureParser.java:124)
at sun.reflect.generics.parser.SignatureParser.parsePackageNameAndSimpleClassTypeSignature(SignatureParser.java:348)
at sun.reflect.generics.parser.SignatureParser.parseClassTypeSignature(SignatureParser.java:310)
at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:289)
at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:283)
at sun.reflect.generics.parser.SignatureParser.parseTypeSignature(SignatureParser.java:485)
at sun.reflect.generics.parser.SignatureParser.parseTypeSig(SignatureParser.java:188)
at sun.reflect.annotation.AnnotationParser.parseSig(AnnotationParser.java:436)
at sun.reflect.annotation.AnnotationParser.parseClassValue(AnnotationParser.java:420)
at sun.reflect.annotation.AnnotationParser.parseClassArray(AnnotationParser.java:724)
at sun.reflect.annotation.AnnotationParser.parseArray(AnnotationParser.java:531)
at sun.reflect.annotation.AnnotationParser.parseMemberValue(AnnotationParser.java:355)
at sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:286)
at sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:120)
at sun.reflect.annotation.AnnotationParser.parseAnnotations(AnnotationParser.java:72)
at java.lang.reflect.Executable.declaredAnnotations(Executable.java:599)
at java.lang.reflect.Executable.declaredAnnotations(Executable.java:597)
at java.lang.reflect.Executable.getDeclaredAnnotations(Executable.java:588)
at java.lang.reflect.Method.getDeclaredAnnotations(Method.java:630)
at java.lang.reflect.AccessibleObject.getAnnotations(AccessibleObject.java:207)
at org.junit.runners.model.FrameworkMethod.getAnnotations(FrameworkMethod.java:187)
at org.junit.runners.model.TestClass.addToAnnotationLists(TestClass.java:84)
at org.junit.runners.model.TestClass.scanAnnotatedMembers(TestClass.java:66)
at org.junit.runners.model.TestClass.<init>(TestClass.java:57)
at org.junit.runners.ParentRunner.createTestClass(ParentRunner.java:88)
at org.junit.runners.ParentRunner.<init>(ParentRunner.java:83)
at org.junit.runners.BlockJUnit4ClassRunner.<init>(BlockJUnit4ClassRunner.java:65)
at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:33)
at org.junit.internal.requests.FilterRequest.getRunner(FilterRequest.java:36)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:49)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:65)
Process finished with exit code -1
So, maybe one day it will be possible, but not right now.
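In the meantime, a workaround that does run is to keep the backtick-quoted test method name (those are fine) and give only the category interface a conventional, space-free name. A minimal sketch; the SlowTests interface and ExampleTest class are made up for illustration:
import org.junit.Test
import org.junit.experimental.categories.Category

// hypothetical marker interface; any space-free name avoids the error
interface SlowTests

class ExampleTest {
    @Test
    @Category(SlowTests::class)
    fun `Test with spaces`() {
        // method names with spaces work; only the class literal
        // inside @Category trips the JVM's generic-signature parser
    }
}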

Weka ARFF error

I am new to Weka.
I am trying to run my ARFF file in Weka and I keep getting an error:
java.io.IOException: Unable to determine structure as arff (Reason: java.io.IOException: end of line expected, read Token[LINKEDIN], line 1)
I don't see what's wrong with the line, and I have tried to format the file and declare the attributes as best as I can.
I have attached the ARFF file.
Remove the extra whitespace from your first line.
You can also quote the attribute name; ARFF does permit names containing spaces as long as the whole name is quoted.
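For reference, a header along these lines is what the ARFF reader expects; the relation, attribute names, and types below are made-up placeholders:
% minimal sketch of a well-formed ARFF file; names and types are assumptions
@RELATION profiles

@ATTRIBUTE 'LINKEDIN CONNECTIONS' NUMERIC
@ATTRIBUTE name STRING

@DATA
250,'Alice'
17,'Bob'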

Hive: select column with non-alphanumeric characters

I'm using Hive 0.13.0, and I was expecting it to work with table and column names containing non-alphanumeric characters, as the documentation says, but it does not.
I've been able to create a table having column names with dots, for instance:
hive> create external table frb_test (recvTime string, fiwareServicePath string, entityId string, entityType string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad` string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md` array<struct<name:string,type:string,value:string>>) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/frb/test';
OK
Time taken: 0.286 seconds
As you can see, I'm using https://github.com/rcongiu/Hive-JSON-Serde as the JSON serde. Nevertheless, below is the content of hdfs:///user/frb/test:
$ hadoop fs -cat /user/frb/test/deleteme
{"recvTime":"2016-02-09T18:03:48.986Z","fiwareServicePath":"orl_sou","entityId":"ORL.SOU.DH.SSTA10","entityType":"ETS", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md":[{"name":"dofTimestamp","type":"ms","value":"2016-02-08T23:00:00.000Z"},{"name":"tag","type":"text","value":"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad"},{"name":"description","type":"text","value":"Electrical heat load"},{"name":"quality","type":"0:GOOD, +0:ERROR","value":"10813440"},{"name":"max","type":"max","value":"null"},{"name":"min","type":"min","value":"null"},{"name":"lcl","type":"lcl","value":"null"},{"name":"ucl","type":"ucl","value":"null"}]}
I'm not able to select the orl.sou.dh.ssta10.t.hvac.heatload column:
hive> add jar /home/frb/json-serde-1.3.7-jar-with-dependencies.jar;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0008, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0008/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-02-11 17:05:56,150 Stage-1 map = 0%, reduce = 0%
2016-02-11 17:06:23,653 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1455032234756_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455032234756_0008_m_000000 (and more) from job job_1455032234756_0008
Task with the most failures(4):
-----
Task ID:
task_1455032234756_0008_m_000000
URL:
http://namenode.fiware.org:8088/taskdetails.jsp?jobid=job_1455032234756_0008&tipid=task_1455032234756_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
... 22 more
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:424)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:136)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I've seen that the Hive property governing how Hive handles non-alphanumeric characters is hive.support.quoted.identifiers, which can be set to none (then Hive behaves as in version 0.12.0) or column, which I guess is the default for 0.13.0; nevertheless, I've tried setting it and got the same result:
hive> set hive.support.quoted.identifiers=column;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0009, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0009/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0009
...
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I would bet that the HQL parser considers the "dot" character as a way to access the inner fields of a STRUCT, and nothing else.
And I would bet that among all the people involved in the support of "quoted identifiers" in Hive, no one ever thought of a test case with a "dot" in a column name. After all, who on earth would be crazy enough to use a "dot" in a column name??
OK, maybe. Then who would be crazy enough to define a STRUCT column with a "dot" in its name, out of perversity, just to add an extra "dot" to the mix??
OK, let's assume this might happen. Then would that hypothetical person push the perversity even further by insisting on using the first ever version of Hive that supported "quoted identifiers"? With no battle-testing of that feature in actual production systems? And no chance to benefit from later bug fixes??
My 2 cents: since you clearly have no control over that junk JSON you receive, just run a fast sed on it (or a slow Java regular expression, if you wish) to replace these dotted monstrosities with sane column names. And live happily ever after.
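A sketch of that sed pass over the HDFS file (the heatload / heatload_md replacement names and the target directory are assumptions; anchoring on the trailing ": keeps the rewrite to JSON keys, so the "tag" metadata value is left alone):
$ hadoop fs -mkdir /user/frb/test_clean
$ hadoop fs -cat /user/frb/test/deleteme \
    | sed -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad":/"heatload":/g' \
          -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad_md":/"heatload_md":/g' \
    | hadoop fs -put - /user/frb/test_clean/deleteme
Then recreate the external table over /user/frb/test_clean with plain heatload and heatload_md columns, and the dot problem disappears.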