Inserting data into a MySQL table using Pig

I have two text files and I am planning to dump their data into a MySQL table using Pig Latin.
Is there a way to do that?
I have written the following piece of code:
register '/homes/rdheeraj/pig-0.10.0/code/mysql-connector-java-5.1.17-bin.jar'
register '/homes/rdheeraj/pig-0.10.0/code/piggybank.jar';
a = load 'one.txt' using PigStorage('|') as (name:chararray, age:int);
b = load 'two.txt' using PigStorage('|') as (name:chararray, desg:chararray);
c = cogroup a by name, b by name;
d = foreach c generate flatten(a), flatten(b);
e = foreach d generate $0,$1,$3;
store e into 'TEST' using org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver','jdbc:mysql://127.0.0.1/sftool','root','password','insert into TEST (name,age,desg) values (?,?,?)');
The error I am getting is as follows:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Could not resolve org.apache.pig.piggybank.storage.DBStorage using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: Failed to parse: Cannot instantiate: org.apache.pig.piggybank.storage.DBStorage
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
... 14 more
Caused by: java.lang.RuntimeException: Cannot instantiate: org.apache.pig.piggybank.storage.DBStorage
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:510)
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:791)
at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:780)
at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:4583)
at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:6225)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1335)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)

A few suggestions:
I think you've got an error in the path to your jar - shouldn't it be home rather than homes?
You don't need quotes around your paths.
You are missing a semicolon after the registration of the MySQL driver.
So it should look more like this:
REGISTER /home/rdheeraj/pig-0.10.0/code/mysql-connector-java-5.1.17-bin.jar;
REGISTER /home/rdheeraj/pig-0.10.0/code/piggybank.jar;

Related

Hive throws an "ORC does not support type conversion from file type" error on CREATE TABLE AS SELECT with a to_json UDF

With the following simple SQL:
ADD JAR ivy://com.klout:brickhouse:0.6.+?transitive=false;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';
create table test (b boolean);
insert into table test values (true);
create table test1 as select to_json(b) as b from test;
I got the following exception.
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:724)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: org.apache.hadoop.hive.ql.io.orc.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type boolean (1) to reader type string (1)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:254)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:214)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.<init>(SchemaEvolution.java:94)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:225)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:550)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:240)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:164)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1067)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
After days of struggle, I figured out that the issue is to_json(b) as b.
Because I reuse the same column name after the value is converted to a string with to_json, Hive seems to use the original table's schema to derive the column type for the new table, and when a value of a different type is stored into that column it throws an exception. What weird behavior.
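If that diagnosis is right, a simple workaround (a sketch, using a hypothetical column name b_json) is to give the converted column a name that does not collide with the boolean column of the source table:
-- alias the string result to a new name so Hive does not reuse the
-- boolean type from the source table's schema for the new column
create table test1 as select to_json(b) as b_json from test;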

org.apache.spark.sql.AnalysisException when loading JSON file

I've got a large data store of JSON objects that I'm trying to load into a Spark DataFrame, but I'm receiving an AnalysisException when I try to do any processing on the data. The input contains several differently formatted JSON objects; some of them have the same fields in different orders and/or at different levels of the JSON.
I've loaded the JSON in with the following code:
val messagesDF = spark.read.json("data/test")
But once I try to perform any operations on the data I receive the following stack trace:
Caused by: java.lang.reflect.InvocationTargetException: org.apache.spark.sql.AnalysisException: Reference 'SGLN' is ambiguous, could be: SGLN#1099, SGLN#1204.;
at sun.reflect.GeneratedMethodAccessor86.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.zeppelin.spark.ZeppelinContext.showDF(ZeppelinContext.java:235)
... 48 more
Caused by: org.apache.spark.sql.AnalysisException: Reference 'SGLN' is ambiguous, could be: SGLN#1099, SGLN#1204.;
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:264)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:158)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:130)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:96)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:96)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:83)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:62)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2814)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2127)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2342)
... 52 more
Everything I've been able to find on this error involves joining different DataFrames, which is certainly similar to what I'm trying to do, but I can't go in and refactor the DataFrames to provide unique IDs.
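One way to see where the duplicate SGLN references come from (a diagnostic sketch, not a fix) is to print the inferred schema, since Spark merges the differently shaped documents into a single struct:
val messagesDF = spark.read.json("data/test")
// Inspect the merged schema Spark inferred across all the JSON variants;
// look for SGLN appearing under more than one parent or at more than one level.
messagesDF.printSchema()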

Getting Exception while writing Dataset to Hive

I am trying to write a Dataset to a Hive database using Spark with Java, but I am getting an exception in the process.
This is my code:
Dataset<Row> data = spark.read().json(rdd).select("event.event_name");
data.write().mode("overwrite").saveAsTable("telecom.t2");
Here, rdd is the streamed JSON data, and I am able to print the resulting data with the following command:
data.show();
Writing this result into the Hive database does not throw any exception, but I get an exception on the Hive command line when I try to query those values, for example:
select * from telecom.t2;
And the exception is:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:89)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:72)
at parquet.column.Encoding$1.initDictionary(Encoding.java:89)
at parquet.column.Encoding$4.initDictionary(Encoding.java:148)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:337)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:66)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:61)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 48 more
Exception in thread "main" org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:89)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:72)
at parquet.column.Encoding$1.initDictionary(Encoding.java:89)
at parquet.column.Encoding$4.initDictionary(Encoding.java:148)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:337)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:66)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:61)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
2 Jan, 2017 12:02:40 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 12 records.
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 29 ms. row count = 12
Spark saves data in Snappy-compressed Parquet format by default when you call saveAsTable, and it seems you don't have the Snappy native library on Hive's library path. Changing the writer format (for example to JSON) will not work, because Hive expects sequence files in a table created this way.
But you can change the compression codec before saving the data as a table:
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
Gzip compression should be available in Hive by default; in case of any problems, you can still save the data without compression:
spark.conf.set("spark.sql.parquet.compression.codec", "uncompressed")
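Applied to the code from the question, the change might look roughly like this (a sketch; it assumes the SparkSession is named spark as above and uses the standard Spark 2.x Java API):
// Switch the Parquet compression codec from the default snappy to gzip
// before writing, so Hive can read the table without the snappy native library.
spark.conf().set("spark.sql.parquet.compression.codec", "gzip");

Dataset<Row> data = spark.read().json(rdd).select("event.event_name");
data.write().mode("overwrite").saveAsTable("telecom.t2");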
This error
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
is related to this snappy-java issue: https://github.com/xerial/snappy-java/issues/6
The last comment on that issue describes a workaround:
unzip snappy-java-1.0.4.1.jar
cd org/xerial/snappy/native/Mac/x86_64/
cp libsnappyjava.jnilib libsnappyjava.dylib
cd ../../../../../..
cp snappy-java-1.0.4.1.jar snappy-java-1.0.4.1.jar.old
jar cf snappy-java-1.0.4.1.jar org
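Note that the paths in that workaround (org/xerial/snappy/native/Mac/x86_64) are macOS-specific; on another OS you would presumably repackage the matching native directory inside the jar for your platform instead.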

Export more than 100,000 rows to CSV in Vaadin

I am trying to export data from a Vaadin table with more than 100,000 records to a CSV 2007 file because it supports more than 1 million rows.
I tried this code:
CsvExport csvExport = new CsvExport(exporttable,exportedReportName,exportedReportName);
csvExport.excludeCollapsedColumns();
csvExport.setExportFileName(exportedReportName+".csv");
csvExport.setDisplayTotals(false);
csvExport.export();
using tableexport-for-vaadin-1.5.1.5 and poi-3.10-FINAL.jar,
and I got this exception:
Caused by: java.lang.IllegalArgumentException: Invalid row number (65536) outside allowable range (0..65535)
at org.apache.poi.hssf.usermodel.HSSFRow.setRowNum(HSSFRow.java:239)
at org.apache.poi.hssf.usermodel.HSSFRow.<init>(HSSFRow.java:87)
at org.apache.poi.hssf.usermodel.HSSFRow.<init>(HSSFRow.java:71)
at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:232)
at org.apache.poi.hssf.usermodel.HSSFSheet.createRow(HSSFSheet.java:68)
at com.vaadin.addon.tableexport.ExcelExport.addDataRow(ExcelExport.java:518)
at com.vaadin.addon.tableexport.ExcelExport.addDataRows(ExcelExport.java:469)
at com.vaadin.addon.tableexport.ExcelExport.convertTable(ExcelExport.java:264)
at com.vaadin.addon.tableexport.TableExport.export(TableExport.java:80)
at com.mobinets.fm.gui.views.AlarmStatusView$4.buttonClick(AlarmStatusView.java:605)
at sun.reflect.GeneratedMethodAccessor277.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:508)

Neo4j: creating multiple nodes from JSON

Please let me know where I am going wrong.
I am using Neo4j to store my routers' interface and link information. A link is to be created between two interfaces.
I have successfully created the nodes and interfaces but am running into issues creating the links.
This is the query I use to create a link:
MATCH (I:Interface), (I2:Interface)
FOREACH(p in FILTER(z in {props} WHERE z.OrigIPAddress = I.IfIPAddress or z.TermIPAddress = I.IfIPAddress) |
MERGE (I {IfIPAddress:p.OrigIPAddress})-[r:link]->(I2 {IfIPAddress:p.TermIPAddress})
ON CREATE SET r = p
ON MATCH SET r = p)
I have an array of maps called props, which I pass as JSON params; each map contains the properties of a link, i.e. OrigIPAddress (originating interface IP) and TermIPAddress (terminating interface IP).
In the FOREACH I first filter down to those links whose source or destination interface is already present, and then create links out of the props.
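For reference, the {props} parameter has roughly this shape (example addresses):
{
  "props": [
    { "OrigIPAddress": "10.0.0.1", "TermIPAddress": "10.0.0.2" },
    { "OrigIPAddress": "10.0.0.3", "TermIPAddress": "10.0.0.4" }
  ]
}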
When I run this it completes without error, but no links are created, even though both the source and destination interfaces are present.
EDIT 1:
I modified the query and when I run this query
MATCH (I:Interface), (I2:Interface)
FOREACH(p in FILTER(z in {props} WHERE z.OrigIPAddress = I.IfIPAddress and z.TermIPAddress = I2.IfIPAddress) |
MERGE (I {IfIPAddress:p.OrigIPAddress})-[r:link]->(I {IfIPAddress:p.TermIPAddress})
ON CREATE SET r = p
ON MATCH SET r = p)
I do not get any response back from Neo4j, and in the Neo4j web console I can see this message:
"Neo4j disconnected, check you socket..."
Here are the logs
Apr 09, 2014 11:06:46 AM org.neo4j.server.logging.Logger log
WARNING: You are using an unsupported Java runtime. Please use Oracle(R) Java(TM) Runtime Environment 7.
SEVERE: The response of the WebApplicationException cannot be utilized as the response is already committed. Re-throwing to the HTTP container
javax.ws.rs.WebApplicationException: javax.ws.rs.WebApplicationException: org.eclipse.jetty.io.EofException
at org.neo4j.server.rest.repr.OutputFormat$1.write(OutputFormat.java:174)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1506)
at org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1477)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:211)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Thread.java:744)
Caused by: javax.ws.rs.WebApplicationException: org.eclipse.jetty.io.EofException
at org.neo4j.server.rest.repr.formats.StreamingJsonFormat$StreamingRepresentationFormat.flush(StreamingJsonFormat.java:401)
at org.neo4j.server.rest.repr.formats.StreamingJsonFormat$StreamingRepresentationFormat.complete(StreamingJsonFormat.java:389)
at org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:43)
at org.neo4j.server.rest.repr.OutputFormat$1.write(OutputFormat.java:160)
... 30 more
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:186)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:335)
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:125)
at org.eclipse.jetty.server.HttpConnection$ContentCallback.process(HttpConnection.java:784)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:79)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:356)
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:631)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:661)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:151)
at com.sun.jersey.spi.container.servlet.WebComponent$Writer.flush(WebComponent.java:315)
at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.flush(ContainerResponse.java:145)
at org.codehaus.jackson.impl.Utf8Generator.flush(Utf8Generator.java:1091)
at org.neo4j.server.rest.repr.formats.StreamingJsonFormat$StreamingRepresentationFormat.flush(StreamingJsonFormat.java:397)
... 33 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.write(IOUtil.java:148)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:524)
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:167)
... 45 more
Apr 09, 2014 11:10:17 AM com.sun.jersey.server.impl.application.WebApplicationImpl _handleRequest
SEVERE: The response of the WebApplicationException cannot be utilized as the response is already committed. Re-throwing to the HTTP container
javax.ws.rs.WebApplicationException: javax.ws.rs.WebApplicationException: org.eclipse.jetty.io.EofException
at org.neo4j.server.rest.repr.OutputFormat$1.write(OutputFormat.java:174)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71)
at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57)
at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1506)
at org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1477)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:211)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Thread.java:744)
Caused by: javax.ws.rs.WebApplicationException: org.eclipse.jetty.io.EofException
at org.neo4j.server.rest.repr.formats.StreamingJsonFormat$StreamingRepresentationFormat.flush(StreamingJsonFormat.java:401)
at org.neo4j.server.rest.repr.formats.StreamingJsonFormat$StreamingRepresentationFormat.complete(StreamingJsonFormat.java:389)
at org.neo4j.server.rest.repr.MappingRepresentation.serialize(MappingRepresentation.java:43)
at org.neo4j.server.rest.repr.OutputFormat$1.write(OutputFormat.java:160)
... 30 more
Caused by: org.eclipse.jetty.io.EofException
at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:186)
at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:335)
at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:125)
at org.eclipse.jetty.server.HttpConnection$ContentCallback.process(HttpConnection.java:784)
at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:79)
at org.eclipse.jetty.server.HttpConnection.send(HttpConnection.java:356)
at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:631)
at org.eclipse.jetty.server.HttpChannel.write(HttpChannel.java:661)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:151)
at
Let me know where I am going wrong.
Is this closer to what you are trying to do?
MATCH (I:Interface)
FOREACH(
p in FILTER(z in {props}
WHERE z.OrigIPAddress = I.IfIPAddress or z.TermIPAddress = I.IfIPAddress) |
MERGE (I)-[r:link]->(:Interface {IfIPAddress:p.TermIPAddress})
SET I.IfIPAddress = p.OrigIPAddress, r = p
);
Finally, after googling, I found the issue.
It's a server timeout: since the query takes some time to execute, the server is timing out and sending back a response.
I increased the timeout and it worked.
Here is the reference on how to increase it:
http://docs.neo4j.org/chunked/stable/server-configuration.html
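Which property to raise depends on the Neo4j version and on which endpoint you use; on Neo4j 2.x, for example, the transactional endpoint timeout is (if I remember correctly) set in conf/neo4j-server.properties, roughly like this:
# timeout in seconds for idle transactions on the transactional HTTP endpoint
org.neo4j.server.transaction.timeout=120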