Spark: com.mysql.jdbc.Driver does not allow create table as select - mysql

I'm getting the following error when attempting to save to a MySQL database through Spark:
Py4JJavaError: An error occurred while calling o41.saveAsTable.
: java.lang.RuntimeException: com.mysql.jdbc.Driver does not allow create table as select.
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:242)
at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:218)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1121)
at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1091)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
My Python code:
res.saveAsTable(tableName='test.provider_phones', source='com.mysql.jdbc.Driver',driver='com.mysql.jdbc.Driver', mode='append', url='jdbc:mysql://host.amazonaws.com:port/test?user=user&password=pass')
This happens whether or not the table already exists.
I am using Spark 1.3.1.

You can use the createJDBCTable(url: String, table: String, allowExisting: Boolean) or insertIntoJDBC(url: String, table: String, overwrite: Boolean) methods of DataFrame.
http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/

Unfortunately, this is not possible in PySpark 1.3.1. My solution was to switch over to Scala and use DataFrame.insertIntoJDBC, as sketched below.
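A minimal Scala sketch of that approach (Spark 1.3.x API); the sample rows, column names, and connection details are placeholders, not from the original question:
import org.apache.spark.sql.SQLContext

// `sc` is an existing SparkContext (e.g. from spark-shell); the MySQL JDBC driver
// jar must already be on the classpath (e.g. passed via --jars).
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Stand-in for the `res` DataFrame from the question; column names are made up.
val res = sc.parallelize(Seq(("0001", "555-0100"))).toDF("provider_id", "phone")

// Placeholder connection string; host, port, user and password are illustrative only.
val url = "jdbc:mysql://host.amazonaws.com:3306/test?user=user&password=pass"

// Appends the DataFrame's rows into an existing MySQL table (overwrite = false).
// For the initial creation, DataFrame.createJDBCTable(url, table, allowExisting)
// can be used instead.
res.insertIntoJDBC(url, "test.provider_phones", overwrite = false)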

Related

Store Pig result into Hive table using Hcat - ClassNotFoundException: org.datanucleus.NucleusContext

I want to load simple JSON data into a Hive table without using a JSON SerDe,
so I tried to load the JSON file in Pig and store the result into a Hive table.
A = LOAD '/data/pig_files/test.json' Using JsonLoader('merchant_id: chararray, platform_id: chararray, last_cdd_review_date: chararray, risk_factors:{(area:chararray,risk:chararray)}, expected_annual_transaction_number: chararray');
STORE A INTO 'Test_tbl' USING org.apache.hive.hcatalog.pig.HCatStorer();
It worked fine, but after server patching I am getting the error below:
Pig Stack Trace
---------------
ERROR 1115: org.apache.hive.hcatalog.common.HCatException : 2001 :
Error setting output information. Cause :
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1779)
at org.apache.pig.PigServer.registerQuery(PigServer.java:708)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1110)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:512)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:564)
at org.apache.pig.Main.main(Main.java:175)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 1115: <line 3, column 0> Output Location Validation Failed for: 'test_tbl More info to follow:
org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information.
Cause : com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:64)
at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:99)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.relational.LogicalPlan.validate(LogicalPlan.java:234)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1852)
at org.apache.pig.PigServer$Graph.access$300(PigServer.java:1528)
at org.apache.pig.PigServer.execute(PigServer.java:1441)
at org.apache.pig.PigServer.access$500(PigServer.java:119)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1774)
... 14 more
Caused by: org.apache.pig.PigException: ERROR 1115: org.apache.hive.hcatalog.common.HCatException : 2001 :
Error setting output information. Cause : com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:196)
at org.apache.pig.newplan.logical.visitor.InputOutputFileValidatorVisitor.visit(InputOutputFileValidatorVisitor.java:57)
... 26 more
Caused by: org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output information. Cause :
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException:
Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:220)
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:70)
at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:191)
... 27 more
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2263)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
at org.apache.hive.hcatalog.common.HiveClientCache.getOrCreate(HiveClientCache.java:229)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:204)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:563)
at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.setOutput(HCatOutputFormat.java:90)
... 29 more
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1775)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:80)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:115)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:234)
at org.apache.hive.hcatalog.common.HiveClientCache$5.call(HiveClientCache.java:229)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
... 35 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1773)
... 45 more
Caused by: java.lang.NoClassDefFoundError: org/datanucleus/NucleusContext
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1741)
at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:64)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:688)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:654)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:648)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:717)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:420)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:78)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:84)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:7036)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:254)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.<init>(HiveClientCache.java:334)
... 50 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.NucleusContext
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 64 more
================================================================================
Please help me resolve this, or suggest other ways to load JSON data into a Hive table without a SerDe.

Hive throws ORC does not support type conversion from file type error when create table as select with to_json UDF

With the following simple SQL:
ADD JAR ivy://com.klout:brickhouse:0.6.+?transitive=false;
CREATE TEMPORARY FUNCTION to_json AS 'brickhouse.udf.json.ToJsonUDF';
create table test (b boolean);
insert into table test values (true);
create table test1 as select to_json(b) as b from test;
I got the following exception.
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:265)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:212)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:332)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:724)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:251)
... 11 more
Caused by: org.apache.hadoop.hive.ql.io.orc.SchemaEvolution$IllegalEvolutionException: ORC does not support type conversion from file type boolean (1) to reader type string (1)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:254)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.buildConversionFileTypesArray(SchemaEvolution.java:214)
at org.apache.hadoop.hive.ql.io.orc.SchemaEvolution.<init>(SchemaEvolution.java:94)
at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.<init>(RecordReaderImpl.java:225)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.rowsOptions(ReaderImpl.java:550)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.createReaderFromFile(OrcInputFormat.java:240)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.<init>(OrcInputFormat.java:164)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1067)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)
... 16 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
After days of struggling, I figured out that the issue is to_json(b) as b:
I reuse the same column name after converting it to a string with to_json. It seems Hive uses the original table's schema to derive the column type for the new table, and when a value of a different type is stored in that column, it throws this exception. Using a different alias (for example, to_json(b) as b_json) should avoid the collision. What weird behavior.

Spark sql read json file from hdfs failed

My code is like this:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val customers = sqlContext.read.json("jsonfilepath")
In spark-shell this error occurs, and I cannot understand it:
17/06/19 09:59:04 ERROR bonecp.PoolWatchThread: Error in trying to obtain a connection. Retrying in 7000ms
java.sql.SQLException: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.setReadOnly(Unknown Source)
at com.jolbox.bonecp.ConnectionHandle.setReadOnly(ConnectionHandle.java:1324)
at com.jolbox.bonecp.ConnectionHandle.<init>(ConnectionHandle.java:262)
at com.jolbox.bonecp.PoolWatchThread.fillConnections(PoolWatchThread.java:115)
at com.jolbox.bonecp.PoolWatchThread.run(PoolWatchThread.java:82)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: ERROR 25505: A read-only user or a user in a read-only database is not permitted to disable read-only mode on a connection.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.sql.conn.GenericAuthorizer.setReadOnlyConnection(Unknown Source)
at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.setReadOnly(Unknown Source)
... 8 more
How can I solve it? Thanks.
Why are you using HiveContext to read JSON data?
Use SQLContext instead:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val customers = sqlContext.read.json("jsonfilepath")

Getting Exception while writing Dataset to Hive

I am trying to write a Dataset to a Hive database using Spark with Java, but in the process I am getting an exception.
This is my code:
Dataset<Row> data = spark.read().json(rdd).select("event.event_name");
data.write().mode("overwrite").saveAsTable("telecom.t2");
Here, rdd is the streamed JSON data, and I am able to print the result data with the following command.
data.show();
When I write this result into the Hive database I do not get any exceptions, but I do get an exception on the Hive command line when I try to print those values, for example:
select * from telecom.t2;
And the exception is:
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:317)
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:219)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:89)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:72)
at parquet.column.Encoding$1.initDictionary(Encoding.java:89)
at parquet.column.Encoding$4.initDictionary(Encoding.java:148)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:337)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:66)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:61)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
at java.lang.Runtime.loadLibrary0(Runtime.java:870)
at java.lang.System.loadLibrary(System.java:1122)
at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
... 48 more
Exception in thread "main" org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
at parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:204)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:89)
at parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:72)
at parquet.column.Encoding$1.initDictionary(Encoding.java:89)
at parquet.column.Encoding$4.initDictionary(Encoding.java:148)
at parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:337)
at parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:66)
at parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:61)
at parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:134)
at parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:99)
at parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154)
at parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:99)
at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:208)
at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:122)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:85)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:673)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:323)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:445)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:414)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1670)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
2 Jan, 2017 12:02:40 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 12 records.
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
2 Jan, 2017 12:02:40 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 29 ms. row count = 12
Spark saves data in Parquet format with Snappy compression by default when you call saveAsTable, and it seems you don't have Snappy on the Hive library path. Changing the writer format (for example, to JSON) will not work, because Hive expects sequence files in a table created this way.
But you can change the compression algorithm before saving the data as a table:
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
Gzip compression should be available in Hive by default; if you still run into problems, you can save the data without compression:
spark.conf.set("spark.sql.parquet.compression.codec", "uncompressed")
This error
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
is related to this snappy-java issue: https://github.com/xerial/snappy-java/issues/6
The last comment on that issue describes a workaround:
unzip snappy-java-1.0.4.1.jar
cd org/xerial/snappy/native/Mac/x86_64/
cp libsnappyjava.jnilib libsnappyjava.dylib
cd ../../../../../..
cp snappy-java-1.0.4.1.jar snappy-java-1.0.4.1.jar.old
jar cf snappy-java-1.0.4.1.jar org

Slick 2.1 Code generator for MySQL : ClassNotFoundException on driver

I am trying to generate the table classes from existing SQL tables in MySQL.
I am using Slick 2.1 (slick_2.11-2.1.0-M2) and mysql-connector-java-5.1.31-bin.
I have created a simple Scala file:
object MySQLPlayground {
def main(args: Array[String]) {
scala.slick.model.codegen.SourceCodeGenerator.main(
Array("scala.slick.driver.MySQLDriver", "com.mysql.jdbc.Driver",
"jdbc:mysql://localhost/fannuaire", "src/main/scala", "modelGene", "user", "password")
)
}
}
But it seems the driver path is wrong. I have the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/reflect/runtime/package$
at scala.slick.model.codegen.SourceCodeGenerator$.main(SourceCodeGenerator.scala:60)
at com.scala.mysql.MySQLPlayground$.main(MySQLPlayground.scala:10)
at com.scala.mysql.MySQLPlayground.main(MySQLPlayground.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.ClassNotFoundException: scala.reflect.runtime.package$
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 8 more
Since the documentation says:
Slick’s code generator comes with a default runner that can be used
from the command line or from Java/Scala.
I was expecting it to work out of the box.
Is there any other setup I have to do? Is the driver's path correct?
Thanks
Please add scala-reflect to your code generator project dependencies.
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
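For context, a minimal build.sbt sketch for such a code-generator project; the versions below are illustrative guesses based on the jars mentioned in the question, not prescriptive:
// Hypothetical build.sbt; adjust versions to match your environment.
scalaVersion := "2.11.1"

libraryDependencies ++= Seq(
  "com.typesafe.slick" %% "slick" % "2.1.0-M2",
  "mysql" % "mysql-connector-java" % "5.1.31",
  "org.scala-lang" % "scala-reflect" % scalaVersion.value
)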