Hive: select column with non-alphanumeric characters - json

I'm using Hive 0.13.0, and I expected it to work with table and column names containing non-alphanumeric characters, as stated in the documentation, but it does not.
I've been able to create a table having column names with dots, for instance:
hive> create external table frb_test (recvTime string, fiwareServicePath string, entityId string, entityType string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad` string, `ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md` array<struct<name:string,type:string,value:string>>) row format serde 'org.openx.data.jsonserde.JsonSerDe' location '/user/frb/test';
OK
Time taken: 0.286 seconds
As you can see, I'm using https://github.com/rcongiu/Hive-JSON-Serde as the JSON SerDe. For reference, below is the content of hdfs:///user/frb/test:
$ hadoop fs -cat /user/frb/test/deleteme
{"recvTime":"2016-02-09T18:03:48.986Z","fiwareServicePath":"orl_sou","entityId":"ORL.SOU.DH.SSTA10","entityType":"ETS", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad":"10.673299789428711", "ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad_md":[{"name":"dofTimestamp","type":"ms","value":"2016-02-08T23:00:00.000Z"},{"name":"tag","type":"text","value":"ORL.SOU.DH.SSTA10.T.HVAC.HeatLoad"},{"name":"description","type":"text","value":"Electrical heat load"},{"name":"quality","type":"0:GOOD, +0:ERROR","value":"10813440"},{"name":"max","type":"max","value":"null"},{"name":"min","type":"min","value":"null"},{"name":"lcl","type":"lcl","value":"null"},{"name":"ucl","type":"ucl","value":"null"}]}
I'm not able to select the orl.sou.dh.ssta10.t.hvac.heatload column:
hive> add jar /home/frb/json-serde-1.3.7-jar-with-dependencies.jar;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0008, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0008/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0008
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-02-11 17:05:56,150 Stage-1 map = 0%, reduce = 0%
2016-02-11 17:06:23,653 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1455032234756_0008 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1455032234756_0008_m_000000 (and more) from job job_1455032234756_0008
Task with the most failures(4):
-----
Task ID:
task_1455032234756_0008_m_000000
URL:
http://namenode.fiware.org:8088/taskdetails.jsp?jobid=job_1455032234756_0008&tipid=task_1455032234756_0008_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:157)
... 22 more
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415)
at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:934)
at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:960)
at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:65)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:460)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:416)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:189)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:424)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:136)
... 22 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
I've seen that the Hive property governing how Hive handles non-alphanumeric characters is hive.support.quoted.identifiers, which can be set to none (Hive then behaves as in version 0.12.0) or column (which I guess is the default for 0.13.0); nevertheless, I've tried setting it explicitly and got the same result:
hive> set hive.support.quoted.identifiers=column;
hive> select `orl.sou.dh.ssta10.t.hvac.heatload` from frb_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1455032234756_0009, Tracking URL = http://namenode.fiware.org:8088/proxy/application_1455032234756_0009/
Kill Command = /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/hadoop/bin/hadoop job -kill job_1455032234756_0009
...
Caused by: java.lang.RuntimeException: cannot find field orl from [0:recvtime, 1:fiwareservicepath, 2:entityid, 3:entitytype, 4:orl.sou.dh.ssta10.t.hvac.heatload, 5:orl.sou.dh.ssta10.t.hvac.heatload_md]
...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

I would bet that the HQL parser considers the "dot" character as a way to access the inner fields of a STRUCT, and nothing else.
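For illustration (a hypothetical table, not taken from the original session), the dot is the syntax Hive reserves for reaching into complex types:
hive> create table t (s struct<a:int,b:string>);
hive> select s.a from t;
Here the dot means "field a of struct column s", so a quoted identifier that itself contains dots collides with that syntax when the column expression is evaluated.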
And I would bet that among all the people involved in the support of "quoted identifiers" in Hive, no-one ever thought of a test case with a "dot" in a column name. After all, who on earth would be crazy enough to use a "dot" in a column name??
OK, maybe. Then who would be crazy enough to define a STRUCT column with a "dot" in its name, out of perversity, just to add an extra "dot" in the mix??
OK, let's assume this might happen. Then would that hypothetical person push the perversity even further, by insisting on using the first ever version of Hive that did support "quoted identifiers"? With no battle-testing of that feature in actual production systems? And no chance to benefit from subsequent bug fixes??
My 2 cents: since you clearly have no control over that junk JSON you receive, just run a fast sed on it (or a slow Java regular expression, if you wish) to replace these dotted monstrosities with sane column names. And live happily ever after.
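A rough sketch of that (assuming heatload and heatload_md are acceptable replacement names and a hypothetical /user/frb/test_clean target directory; matching the trailing colon keeps the substitution to JSON keys, so the "tag" metadata value is left alone):
$ hadoop fs -cat /user/frb/test/deleteme \
    | sed -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad":/"heatload":/g' \
          -e 's/"ORL\.SOU\.DH\.SSTA10\.T\.HVAC\.HeatLoad_md":/"heatload_md":/g' \
    | hadoop fs -put - /user/frb/test_clean/deleteme
The table can then be declared with heatload string and heatload_md array<struct<...>> and queried without backticks.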

Related

Snakemake fails to produce output

I'm trying to run fastqc on two paired files (1.fq.gz and 2.fq.gz). Running:
snakemake --use-conda -np newenv/1_fastqc.html
...produces what looks like a sensible DAG:
Building DAG of jobs...
Job stats:
job           count    min threads    max threads
----------  -------  -------------  -------------
curlewtest        1              1              1
total             1              1              1
[Sat May 21 11:27:40 2022]
rule curlewtest:
input: 1.fq.gz, 2.fq.gz
output: newenv/1_fastqc.html, newenv/2_fastqc.html
jobid: 0
resources: tmpdir=/tmp
fastqc 1.fq.gz 2.fq.gz
Job stats:
job           count    min threads    max threads
----------  -------  -------------  -------------
curlewtest        1              1              1
total             1              1              1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
When I run the job with snakemake --use-conda --cores all newenv/1_fastqc.html, the analysis runs, but the output files fail to appear. Snakemake also throws the following error:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 2 of /mnt/data/kcollier/snakemake-workspace/snakefile:
Job Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
newenv/1_fastqc.html
newenv/2_fastqc.html completed successfully, but some output files are missing. 0
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Increasing the latency wait does not help. The output directory I created beforehand (newenv) also disappears. Does anyone know why this is?
Without your code, it is difficult to answer precisely what causes the error. But this can happen if your shell command (or script) does not produce the output exactly as stated in the output directive of the rule - it may be something as simple as an error in the file paths.
Perhaps try to run the shell command that Snakemake runs and see if the output files you expect get created. You can easily see the commands Snakemake runs by adding the -p/--printshellcmds flag or the --debug flag to your snakemake command.
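As a minimal sketch of what such a rule could look like (assuming fastqc's default report naming, where 1.fq.gz produces 1_fastqc.html, and reusing the rule name from your job stats):
rule curlewtest:
    input:
        "1.fq.gz",
        "2.fq.gz"
    output:
        "newenv/1_fastqc.html",
        "newenv/2_fastqc.html"
    # without -o, fastqc writes its reports next to the input files,
    # not into newenv/, so the declared outputs never appear
    shell:
        "mkdir -p newenv && fastqc -o newenv {input}"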

mariadb c-connector bind / execute messes up memory allocation

I'm using the MariaDB C connector with prepare, bind, and execute. It usually works, but one case ends up with "corrupted unsorted chunks" and a core dump when freeing the bind buffer. I suspect the whole malloc bookkeeping is messed up after calling mysql_stmt_execute(). My test program MysqlDynamic.c shows:
the problem is connected only to the x509cert variable bound by bnd[9]
freeing memory only fails if bnd[9].is_null = 0; if it is null, execute ends normally
freeing memory (using FreeStmt()) after bind and before execute ends normally
printing bnd[9].buffer before execute shows the (void*) points to the correct string buffer
the behavior is the same whether bnd[9].buffer_length is set to STMT_INDICATOR_NTS or strlen()
other similar bindings (picture, bnd[10]) do not lead to corrupted memory or a core dump
I defined a C structure test for the test data in my test program MysqlDynamic.c, which is bound via the MYSQL_BIND structure.
The bindings for x509cert (a string buffer) are set up in bindInsTest():
bnd[9].buffer_type = MYSQL_TYPE_STRING;
bnd[9].buffer_length = STMT_INDICATOR_NTS;
bnd[9].is_null = &para->x509certI;
bnd[9].buffer = (void*) para->x509cert;
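For comparison only, a minimal sketch of how a nul-terminated string is conventionally bound, with an explicit length variable instead of an indicator constant in buffer_length (cert_len is a local variable added here for illustration; whether this changes anything in your case I can't say):
unsigned long cert_len = strlen(para->x509cert);   /* byte length of the string  */
bnd[9].buffer_type   = MYSQL_TYPE_STRING;
bnd[9].buffer        = (void *) para->x509cert;
bnd[9].buffer_length = cert_len;                   /* size of the data in buffer */
bnd[9].length        = &cert_len;                  /* actual data length to send */
bnd[9].is_null       = &para->x509certI;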
Please get the details from the source file MysqlDynamic.c. Please adapt the defines in the source to your environment, verify the content, and run it. You will find compile info in the source code. MysqlDynamic -c will create the table, MysqlDynamic -i will insert 3 records each run, and MysqlDynamic -d will drop the table again.
MysqlDynamic -vc shows:
session set autocommit to <0>
connection id: 175
mariadb server ver:<100408>, client ver:<100408>
connected on localhost to db test by testA
>> if program get stuck - table is locked
table t_test created
mysql connection closed
pgm ended normaly
MysqlDynamic -i shows:
ins2: BufPara <92> name<master> stamp<> epoch<1651313806000>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<0x5596a0f0c220> buf<0x5596a0f0c220> null<0> length<172>
ins1: BufPara <91> name<> stamp<2020-04-30> epoch<1650707701123>
cert is cert<0x5596a0f181d0> buf<0x5596a0f181d0> null<0>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
ins0: BufPara <90> name<gugus> stamp<1988-10-12T18:43:36> epoch<922337203685477580>
cert is cert<(nil)> buf<(nil)> null<1>
picure is pic<(nil)> buf<(nil)> null<1> length<0>
free(): corrupted unsorted chunks
Aborted (core dumped)
Checking the t_test table content shows that all records are inserted as expected.
You can disable loading of x509cert and/or picture by commenting out the defines at lines 57/58; the program then ends normally. You can also comment out line 208; the buffers are then indicated as NULL.
Questions:
Is there a generic coding mistake in the program causing this behavior?
Can you run the program in your environment without a core dump? I'm currently using version 10.04.08.
Any improvement to the code will be welcome.

How to add the SwingLibrary plugin to RobotFramework?

I'm trying to execute the SwingLibrary demo available at https://github.com/robotframework/SwingLibrary/wiki/SwingLibrary-Demo
After setting everything up (Jython, RobotFramework, demo app), I can run the following command:
run_demo.py startapp
and it works (the demo app starts up).
Now if I try to run the sample tests, it fails:
run_demo.py example.txt
[ ERROR ] Error in file '/home/user1/python-scripts/gui_automation/sample-text.txt': Non-existing setting 'Library SwingLibrary'.
[ ERROR ] Error in file '/home/user1/python-scripts/gui_automation/sample-text.txt': Non-existing setting 'Suite Setup Start Test Application'.
==============================================================================
Sample-Text
==============================================================================
Test Add Todo Item | FAIL |
No keyword with name 'Insert Into Text Field description ${arg}' found.
------------------------------------------------------------------------------
Test Delete Todo Item | FAIL |
No keyword with name 'Insert Into Text Field description ${arg}' found.
------------------------------------------------------------------------------
Sample-Text | FAIL |
2 critical tests, 0 passed, 2 failed
2 tests total, 0 passed, 2 failed
==============================================================================
Output: /home/user1/python-scripts/gui_automation/results/output.xml
Log: /home/user1/python-scripts/gui_automation/results/log.html
Report: /home/user1/python-scripts/gui_automation/results/report.html
I suspect that it cannot find swinglibrary.jar, and therefore my plugin installation is probably messed up.
Any ideas?
Take a look at these error messages in the report:
[ ERROR ] Error in file '...': Non-existing setting 'Library SwingLibrary'.
[ ERROR ] Error in file '...': Non-existing setting 'Suite Setup Start Test Application'.
The first rule of debugging is to always assume error messages are telling you the literal truth.
They are telling you that you have an unknown setting. It thinks you are using a setting literally named "Library SwingLibrary" and another named "Suite Setup Start Test Application". Those are obviously incorrect setting names. The question is, why is it saying that?
My guess is that you are using the space-separated text format, and you only have a single space between "Library" and "SwingLibrary". Because there is one space, robot thinks that whole line is in the first column of the settings table, and whatever is in the first column is treated as the setting name.
The fix should be as simple as inserting two or more spaces after "Library", and two or more spaces after "Suite Setup".
This type of error is why I always recommend using the pipe-separated format. It makes the boundaries between cells much easier to see.
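For example, a sketch of just the settings section in the space-separated format, using the names from your output, with at least two spaces between the setting name and its value:
*** Settings ***
Library       SwingLibrary
Suite Setup   Start Test Application
and the same thing in the pipe-separated format:
| *** Settings *** |
| Library          | SwingLibrary |
| Suite Setup      | Start Test Application |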

Error while indexing csv file

I am trying to index a CSV file in Endeca. Indexing works fine when the line length is less than 65536 characters. For larger data it throws the exception below.
FATAL 02/18/14 15:45:53.122 UTC (1392738353122) FORGE {baseline}: TextObjectInputStream: while reading "/opt/soft/endeca/apps/MyApp/data/processing/TestRecord.csv", delimiter " " not found within allowed distance of 65536 characters.
.............................................
..............................................
ERROR 02/17/14 16:10:58.060 UTC (1392653458060) FORGE {baseline}: I/O Exception: Error reading data from Java: EdfException thrown in: edf/src/format/Shared/TextObjectInputStream.cpp:76. Message is: exit called
How can I increase this limit so that I can index large data (having more than 65537 characters in a single line) in Endeca?
I imagine you've fixed this by now. If not: this error occurs when the row delimiter isn't set correctly in your Record Adapter.
If your records are legitimately that long in a CSV file, switch to XML or something else.

how to save pig bag in json format

I'm running Pig
example$ pig --version
Apache Pig version 0.8.1-cdh3u1 (rexported)
compiled Jul 18 2011, 08:29:40
on a very simple dataset:
example$ hadoop fs -cat /user/pavel/trivial.log
1 one
2 two
3 three
I'm trying to save the bag format as JSON using the following script:
REGISTER ./pig.jar;
A = LOAD 'trivial.log' USING PigStorage('\t') AS (mynum: int, mynumstr: chararray);
B = GROUP A BY mynum;
DUMP B;
STORE B into 'trivial_json.out' USING JsonStorage();
and I get an error:
Backend error message
---------------------
java.lang.NullPointerException
at org.apache.pig.ResourceSchema.<init>(ResourceSchema.java:239)
at org.apache.pig.builtin.JsonStorage.prepareToWrite(JsonStorage.java:129)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:85)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:221)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:154)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:337)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:382)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1209)
at org.apache.pig.PigServer.execute(PigServer.java:1201)
at org.apache.pig.PigServer.access$100(PigServer.java:129)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1528)
at org.apache.pig.PigServer.executeBatchEx(PigServer.java:373)
at org.apache.pig.PigServer.executeBatch(PigServer.java:340)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.run(Main.java:396)
at org.apache.pig.Main.main(Main.java:107)
================================================================================
I'm not strong enough in Java to debug this in minutes; can somebody suggest what might be going on?
Thanks much!
-Pavel
I got some questions about this by email, so I thought the answer might be useful for others:
It turned out that the JsonStorage class wasn't included in my Pig installation. In fact, it's not even in a stable branch yet; you won't find it in 0.9.2. But if you get the latest trunk
http://svn.apache.org/repos/asf/pig/trunk/
then trunk/test/org/apache/pig/test/TestJsonLoaderStorage.java shows how it works. If you are not averse to unreleased versions, you could try it. If you go this way, you should also take a look at Avro (a binary format with JSON metadata). It's not officially out either.
I was trying to run a streaming Python job and wanted to avoid manually specifying the schema. I ended up passing Pig bags to the streaming script and parsing them by hand.
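For what it's worth, a rough sketch of that by-hand parsing (assuming the grouped relation is streamed in the default PigStorage-style format, so each line looks like 1<TAB>{(1,one)}, and that no field contains commas or braces):
import json
import sys

for line in sys.stdin:
    key, bag = line.rstrip("\n").split("\t", 1)
    # a bag prints as {(f1,f2),(f1,f2),...}; strip the braces and split the tuples
    tuples = bag.strip("{}").split("),(")
    records = []
    for t in tuples:
        mynum, mynumstr = t.strip("()").split(",", 1)
        records.append({"mynum": int(mynum), "mynumstr": mynumstr})
    print(json.dumps({"group": int(key), "A": records}))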
Hope this helps.
-Pavel