I'm having a problem executing a script in Apache Pig. I have three files: movies.csv, ratings.csv, and tags.csv. First I want to load movies.csv, then load ratings.csv, and join the two tables. But I'm encountering an error while loading the files. My code is as follows:
register 'piggybank-0.15.0.jar'
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
part1 = LOAD '/home/cloudera/ml-20m/movies' as (movieId: chararray, title: chararray, genre: chararray);
cat part1;
When I run the cat command, I get this error:
Pig Stack Trace
ERROR 2997: Encountered IOException. Directory part1 does not exist.
java.io.IOException: Directory part1 does not exist.
at org.apache.pig.tools.grunt.GruntParser.processCat(GruntParser.java:677)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:547)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But I do have the file at the specified location, and I don't know why Pig is not able to recognize the input file. I have also tried placing the input file in HDFS and loading it from there, but the error is the same. Can anyone please help me? Thanks in advance.
part1 is not a file but a relation. When you use the LOAD command in Pig, you are instructing it to load the contents of the file into a relation. You cannot use cat on a relation, since cat reads the contents of files.
To display the content of part1 use
DUMP part1;
Or, if you insist on using cat, specify the full path to the file:
cat /home/cloudera/ml-20m/movies;
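Also note that your script defines CSVLoader but never actually uses it in the LOAD, so Pig falls back to the default tab-delimited PigStorage. Here is a rough sketch of the full flow you described (load both CSV files, then join); it assumes MovieLens-style column layouts with header rows removed, and that the paths are visible to Pig (HDFS paths when running in mapreduce mode):
REGISTER 'piggybank-0.15.0.jar';
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();

-- load both files with the CSVLoader defined above
movies = LOAD '/home/cloudera/ml-20m/movies.csv' USING CSVLoader()
    AS (movieId: chararray, title: chararray, genres: chararray);
ratings = LOAD '/home/cloudera/ml-20m/ratings.csv' USING CSVLoader()
    AS (userId: chararray, movieId: chararray, rating: double, timestamp: long);

-- join the two relations on movieId
joined = JOIN movies BY movieId, ratings BY movieId;

-- DUMP displays a relation; cat only reads files
DUMP joined;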
I have a question: is it possible to annotate JUnit tests written in Kotlin with @Category when the category interface name contains spaces?
I want to have something like this:
@Test
@Category(`Kotlin interface`::class)
fun `Test with spaces`() {
Right now I receive an exception.
Even though class names with spaces are allowed in Kotlin, they go against the naming convention, which is PascalCase (or UpperCamelCase, if you will). I just tested this with the latest JUnit 4, and it doesn't work:
"C:\Program Files\Java\jdk1.8.0_131\bin\java" -ea -Didea.test.cyclic.buffer.size=1048576 "-javaagent:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\lib\idea_rt.jar=56217:C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\bin" -Dfile.encoding=UTF-8 -classpath "C:\Program Files\JetBrains\IntelliJ IDEA Community Edition 2017.3\lib\idea_rt.jar" com.intellij.rt.execution.CommandLineWrapper C:\Users\foo\AppData\Local\Temp\idea_classpath com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 com.foo.bar.http.BarHTTPApplicationTest,systemInfoGET
java.lang.reflect.GenericSignatureFormatError: Signature Parse error: expected '<' or ';' but got
Remaining input: Tests;
at sun.reflect.generics.parser.SignatureParser.error(SignatureParser.java:124)
at sun.reflect.generics.parser.SignatureParser.parsePackageNameAndSimpleClassTypeSignature(SignatureParser.java:348)
at sun.reflect.generics.parser.SignatureParser.parseClassTypeSignature(SignatureParser.java:310)
at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:289)
at sun.reflect.generics.parser.SignatureParser.parseFieldTypeSignature(SignatureParser.java:283)
at sun.reflect.generics.parser.SignatureParser.parseTypeSignature(SignatureParser.java:485)
at sun.reflect.generics.parser.SignatureParser.parseTypeSig(SignatureParser.java:188)
at sun.reflect.annotation.AnnotationParser.parseSig(AnnotationParser.java:436)
at sun.reflect.annotation.AnnotationParser.parseClassValue(AnnotationParser.java:420)
at sun.reflect.annotation.AnnotationParser.parseClassArray(AnnotationParser.java:724)
at sun.reflect.annotation.AnnotationParser.parseArray(AnnotationParser.java:531)
at sun.reflect.annotation.AnnotationParser.parseMemberValue(AnnotationParser.java:355)
at sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:286)
at sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:120)
at sun.reflect.annotation.AnnotationParser.parseAnnotations(AnnotationParser.java:72)
at java.lang.reflect.Executable.declaredAnnotations(Executable.java:599)
at java.lang.reflect.Executable.declaredAnnotations(Executable.java:597)
at java.lang.reflect.Executable.getDeclaredAnnotations(Executable.java:588)
at java.lang.reflect.Method.getDeclaredAnnotations(Method.java:630)
at java.lang.reflect.AccessibleObject.getAnnotations(AccessibleObject.java:207)
at org.junit.runners.model.FrameworkMethod.getAnnotations(FrameworkMethod.java:187)
at org.junit.runners.model.TestClass.addToAnnotationLists(TestClass.java:84)
at org.junit.runners.model.TestClass.scanAnnotatedMembers(TestClass.java:66)
at org.junit.runners.model.TestClass.<init>(TestClass.java:57)
at org.junit.runners.ParentRunner.createTestClass(ParentRunner.java:88)
at org.junit.runners.ParentRunner.<init>(ParentRunner.java:83)
at org.junit.runners.BlockJUnit4ClassRunner.<init>(BlockJUnit4ClassRunner.java:65)
at org.junit.internal.builders.JUnit4Builder.runnerForClass(JUnit4Builder.java:10)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:33)
at org.junit.internal.requests.FilterRequest.getRunner(FilterRequest.java:36)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:49)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:65)
Process finished with exit code -1
So, maybe one day it will be possible, but not right now.
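What does work is giving the category interface a conventional PascalCase name; only the class used as the @Category argument has to satisfy JVM naming rules, while the test method itself can keep its backtick-quoted name. A minimal sketch (the interface and class names here are made up for illustration):
import org.junit.Test
import org.junit.experimental.categories.Category

// PascalCase marker interface: this produces a class signature that
// the JVM's reflection machinery can parse
interface KotlinInterfaceTests

class ExampleTest {
    @Test
    @Category(KotlinInterfaceTests::class)
    fun `Test with spaces`() {
        // backtick-quoted method names with spaces are fine;
        // only the @Category class argument is restricted
    }
}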
I'm using Hive 0.13.0, and I was expecting it to work with table and column names containing non-alphanumeric characters, as stated in the documentation, but it does not.
I've been able to create a table having column names with dots, for instance:
hive> create external table frb_test (
          recvTime string,
          fiwareServicePath string,
          entityId string,
          entityType string,
          `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad` string,
          `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md` array<struct<name:string,type:string,value:string>>)
      row format serde 'org.openx.data.jsonserde.JsonSerDe'
      location '/user/frb/test';
OK
Time taken: 0.286 seconds
As you can see, I'm using https://github.com/rcongiu/Hive-JSON-Serde as the JSON serde. For reference, this is the content of hdfs:///user/frb/test:
$ hadoop fs -cat /user/frb/test/deleteme
{"recvTime":"2016-02-09T18:03:48.986Z","fiwareServicePath":"orl_sou","entityId":"ORL.SOU.DH.SSTA10","entityType":"ETS", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md":[{"name":"dofTimestamp","type":"ms","value":"2016-02-08T23:00:00.000Z"},{"name":"tag","type":"text","value":"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad"},{"name":"description","type":"text","value":"Electrical heat load"},{"name":"quality","type":"0:GOOD, +0:ERROR","value":"10813440"},{"name":"max","type":"max","value":"null"},{"name":"min","type":"min","value":"null"},{"name":"lcl","type":"lcl","value":"null"},{"name":"ucl","type":"ucl","value":"null"}]}
I'm not able to select the orl.sou.dh.ssta10.t.hvac.heatload column:
hive> add jar /home/frb/json-serde-1.3.7-jar-with-dependencies.jar;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0008, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0008/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-02-11 17:05:56,150 Stage-1 map = 0%, reduce = 0%
2016-02-11 17:06:23,653 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1455032234756_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455032234756_0008_m_000000 (and more) from job job_1455032234756_0008
Task with the most failures(4):
-----
Task ID:
task_1455032234756_0008_m_000000
URL:
http://namenode.fiware.org:8088/taskdetails.jsp?jobid=job_1455032234756_0008&tipid=task_1455032234756_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
... 22 more
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:424)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:136)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
The Hive property governing how Hive handles non-alphanumeric characters seems to be hive.support.quoted.identifiers, which can be set to none (Hive then behaves as in version 0.12.0) or column (which I guess is the default in 0.13.0). Nevertheless, I've tried setting it explicitly, with no results:
hive> set hive.support.quoted.identifiers=column;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0009, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0009/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0009
...
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
...
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I would bet that the HQL parser considers the "dot" character as a way to access the inner fields of a STRUCT, and nothing else.
And I would bet that among all the people involved in the support of "quoted identifiers" in Hive, no-one ever thought of a test case with a "dot" in a column name. After all, who on earth would be crazy enough to use a "dot" in a column name??
OK, maybe. Then who would be crazy enough to define a STRUCT column with a "dot" in its name, out of perversity, just to add an extra "dot" in the mix??
OK, let's assume this might happen. Then would that hypothetical person push the perversity even further, by insisting on using the first ever version of Hive that did support "quoted identifiers"? With no battle-testing of that feature in actual production systems? And no chance to benefit from eventual bug fixes??
My 2 cents: since you clearly have no control over that junk JSON you receive, just run a fast sed on it (or a slow Java regular expression, if you wish) to replace these dotted monstrosities with sane column names. And be happy ever after.
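For instance, a minimal sketch of the sed approach, assuming the dotted keys are known in advance and that only keys (not values, such as the "tag" metadata value) should be rewritten:
# rewrite dotted JSON keys (a quoted name followed by a colon)
# to underscore-separated names before loading into HDFS
sed -E 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad(_md)?"[[:space:]]*:/"ORL_SOU_DH_SSTA10_T_HVAC_HeatLoad\1":/g' \
    deleteme > deleteme.clean

# the table DDL then declares the columns as
# ORL_SOU_DH_SSTA10_T_HVAC_HeatLoad and ORL_SOU_DH_SSTA10_T_HVAC_HeatLoad_md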
I'm trying to import an edge list using the CSV importer, but I get an error message due to what I think is a failure to escape a character. Is there a way around this, or do I have to run some script on the CSV files to handle it?
Exception in thread "main" java.lang.NumberFormatException: For input string: "'Abelmoschus esculentus' bunchy to
phytoplasma"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Long.parseLong(Unknown Source)
at java.lang.Long.parseLong(Unknown Source)
at org.neo4j.batchimport.Importer.id(Importer.java:213)
at org.neo4j.batchimport.Importer.id(Importer.java:181)
at org.neo4j.batchimport.Importer.importRelationships(Importer.java:147)
at org.neo4j.batchimport.Importer.doImport(Importer.java:232)
at org.neo4j.batchimport.Importer.main(Importer.java:83)
Use the config option: batch_import.csv.quotes=true
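That is, assuming you're running the jexp/batch-import tool, which reads its settings from a batch.properties file in the working directory (the file name and invocation below are from memory, so treat them as a sketch):
# batch.properties
# honor quoted CSV fields so embedded tabs/newlines don't break parsing
batch_import.csv.quotes=true

# then re-run the importer as before, roughly:
# java -jar batch-import-jar-with-dependencies.jar graph.db nodes.csv rels.csv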
I'm running Pig
example$ pig --version
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
on a very simple dataset:
example$ hadoop fs -cat /user/pavel/trivial.log
1 one
2 two
3 three
I'm trying to save the grouped bag as JSON using the following script:
REGISTER ./pig.jar;
A = LOAD 'trivial.log' USING PigStorage('\t') AS (mynum: int, mynumstr: chararray);
B = GROUP A BY mynum;
DUMP B;
STORE B into 'trivial_json.out' USING JsonStorage();
and I get an error:
Backend error message
---------------------
java.lang.NullPointerException
at org.apache.pig.ResourceSchema.<init>(ResourceSchema.java:239)
at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:154)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
at org.apache.pig.PigServer.execute(PigServer.java:1201)
at org.apache.pig.PigServer.access$100(PigServer.java:129)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1528)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:373)
at org.apache.pig.PigServer.executeBatch(PigServer.java:340)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:396)
at org.apache.pig.Main.main(Main.java:107)
================================================================================
I'm not strong enough in Java to debug this quickly; can somebody suggest what might be going on?
Thanks much!
-Pavel
I got some questions about this by email, so I thought the answer might be useful for others:
It turned out that the JsonStorage class wasn't included in my Pig installation. In fact, it's not even out in a stable branch yet; you won't find it in 0.9.2. But if you get the latest trunk
http://svn.apache.org/repos/asf/pig/trunk/
then trunk/test/org/apache/pig/test/TestJsonLoaderStorage.java shows how it works. If you are not averse to unreleased versions, you could try it. If you go this way, you should also take a look at Avro (a binary format with JSON metadata); it's not officially out either.
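For reference, based on that test, the trunk usage looks roughly like this (a sketch against an unreleased build, so the details may shift):
-- store a grouped relation as JSON (requires a build where
-- org.apache.pig.builtin.JsonStorage actually ships)
A = LOAD 'trivial.log' USING PigStorage('\t')
    AS (mynum: int, mynumstr: chararray);
B = GROUP A BY mynum;
STORE B INTO 'trivial_json.out' USING JsonStorage();

-- JsonLoader needs the schema spelled out to read it back
C = LOAD 'trivial_json.out'
    USING JsonLoader('group: int, A: {(mynum: int, mynumstr: chararray)}');
DUMP C;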
I was trying to run a streaming Python job and wanted to avoid manually specifying the schema. I ended up passing Pig bags around and parsing them by hand.
Hope this helps.
-Pavel