AWS EMR Spark exception on JDBC data source load (MySQL)

I'm spinning up an AWS EMR cluster from the emr-5.31.0 release with Spark 2.4.6 on board, then logging into spark-shell on the master node and following this tutorial
https://bigdataprogrammers.com/load-data-from-mysql-in-spark-using-jdbc/
to load data from my RDS MySQL instance.
I've uploaded both the connector jar (mysql-connector-java-5.1.49-bin.jar) and the script to the /home/hadoop folder.
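The script isn't reproduced here, but it follows the tutorial's pattern: a small object wrapping a plain JDBC read. A minimal sketch of that kind of read looks roughly like this (the endpoint, database and credentials are placeholders, not my real values, and the tutorial's actual script uses the older SQLContext/HiveContext imports visible in the transcript below):

import org.apache.spark.sql.SparkSession

object ReadDataFromJdbc {
  def main(args: Array[String]): Unit = {
    val table = args(0) // e.g. "batches"
    val spark = SparkSession.builder().appName("ReadDataFromJdbc").getOrCreate()
    println("Started......." + new java.util.Date())
    // plain Spark JDBC read; the driver class has to be visible on the driver and the executors
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://<RDS_ENDPOINT>:3306/<DB_NAME>") // placeholder endpoint
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", table)
      .option("user", "<USER>")
      .option("password", "<PASSWORD>")
      .load()
    df.show()
  }
}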
Then I follow the steps described in the tutorial and get two errors:
[hadoop@ip-172-31-* ~]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/10/09 16:41:31 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://ip-172-31-*.ec2.internal:4040
Spark context available as 'sc' (master = yarn, app id = application_1602254033216_0005).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6-amzn-0
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_265)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :require /home/hadoop/mysql-connector-java-5.1.49-bin.jar
Added '/home/hadoop/mysql-connector-java-5.1.49-bin.jar' to classpath.
scala> :load /home/hadoop/test01.scala
Loading /home/hadoop/test01.scala...
import java.sql.{Connection, DriverManager, ResultSet}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
error: error while loading package, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/execution/package.class)' has location not matching its contents: contains package object execution
error: error while loading QueryExecution, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/execution/QueryExecution.class)' has location not matching its contents: contains class QueryExecution
error: error while loading package, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/plans/package.class)' has location not matching its contents: contains package object plans
error: error while loading LogicalPlan, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.class)' has location not matching its contents: contains class LogicalPlan
error: error while loading package, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/encoders/package.class)' has location not matching its contents: contains package object encoders
error: error while loading ExpressionEncoder, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.class)' has location not matching its contents: contains class ExpressionEncoder
error: error while loading Expression, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/expressions/Expression.class)' has location not matching its contents: contains class Expression
error: error while loading NamedExpression, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/catalyst/expressions/NamedExpression.class)' has location not matching its contents: contains class NamedExpression
error: error while loading DataFrameNaFunctions, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/DataFrameNaFunctions.class)' has location not matching its contents: contains class DataFrameNaFunctions
error: error while loading DataFrameStatFunctions, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/DataFrameStatFunctions.class)' has location not matching its contents: contains class DataFrameStatFunctions
error: error while loading TypedColumn, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/TypedColumn.class)' has location not matching its contents: contains class TypedColumn
error: error while loading package, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/package.class)' has location not matching its contents: contains package object function
error: error while loading ReduceFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/ReduceFunction.class)' has location not matching its contents: contains class ReduceFunction
error: error while loading KeyValueGroupedDataset, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/KeyValueGroupedDataset.class)' has location not matching its contents: contains class KeyValueGroupedDataset
error: error while loading MapFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/MapFunction.class)' has location not matching its contents: contains class MapFunction
error: error while loading Metadata, class file '/usr/lib/spark/jars/spark-catalyst_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/types/Metadata.class)' has location not matching its contents: contains class Metadata
error: error while loading FilterFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/FilterFunction.class)' has location not matching its contents: contains class FilterFunction
error: error while loading MapPartitionsFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/MapPartitionsFunction.class)' has location not matching its contents: contains class MapPartitionsFunction
error: error while loading FlatMapFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/FlatMapFunction.class)' has location not matching its contents: contains class FlatMapFunction
error: error while loading ForeachFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/ForeachFunction.class)' has location not matching its contents: contains class ForeachFunction
error: error while loading ForeachPartitionFunction, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/api/java/function/ForeachPartitionFunction.class)' has location not matching its contents: contains class ForeachPartitionFunction
error: error while loading StorageLevel, class file '/usr/lib/spark/jars/spark-core_2.11-2.4.6-amzn-0.jar(org/apache/spark/storage/StorageLevel.class)' has location not matching its contents: contains class StorageLevel
error: error while loading CreateViewCommand, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/execution/command/CreateViewCommand.class)' has location not matching its contents: contains class CreateViewCommand
error: error while loading DataFrameWriter, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/DataFrameWriter.class)' has location not matching its contents: contains class DataFrameWriter
error: error while loading DataStreamWriter, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/streaming/DataStreamWriter.class)' has location not matching its contents: contains class DataStreamWriter
error: error while loading SparkPlan, class file '/usr/lib/spark/jars/spark-sql_2.11-2.4.6-amzn-0.jar(org/apache/spark/sql/execution/SparkPlan.class)' has location not matching its contents: contains class SparkPlan
scala> :load /home/hadoop/test01.scala
Loading /home/hadoop/test01.scala...
import java.sql.{Connection, DriverManager, ResultSet}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
defined object ReadDataFromJdbc
scala> ReadDataFromJdbc.main(Array("batches"))
Started.......Fri Oct 09 16:42:02 UTC 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
[Stage 0:> (0 + 1) / 1]20/10/09 16:42:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-172-31-20-13.ec2.internal, executor 1): java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:111)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:272)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:106)
... 25 more
[Stage 0:> (0 + 0) / 1]20/10/09 16:42:05 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
(Connectivity Failed for Table ,org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-172-31-27-165.ec2.internal, executor 2): java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:111)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:45)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:54)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:272)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1405)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.lang.ClassLoader.findClass(ClassLoader.java:523)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:106)
... 25 more
Driver stacktrace:)
The first error appears when I load the Scala script: it loads with a batch of errors, but repeating the same :load command seems to fix it.
The second error appears once I request data to be loaded from MySQL: despite the fact that the MySQL JDBC connector was added to the classpath with the :require command earlier, it fails with java.lang.ClassNotFoundException: com.mysql.jdbc.Driver.
While I believe I can find some directory that Spark will search for the JDBC jar, I'm mostly confused by the errors that appear on loading the script - why do they appear and how can they be fixed?

I ended up creating a bootstrap action for the cluster that copies the mysql-connector-java jar to all nodes of the cluster before Spark and Hadoop are even installed.
1. Create the copymysqljar.sh script:
#!/bin/bash
# make sure the target directories exist (the bootstrap action runs before Spark/Hadoop are installed)
sudo mkdir -p /home/hadoop
sudo mkdir -p /usr/lib/spark/jars
sudo mkdir -p /usr/lib/hadoop/lib
# fetch the connector from S3 and make it readable
aws s3 cp s3://<YOUR_BUCKET>/mysql-connector-java-5.1.49-bin.jar /home/hadoop
chmod 777 /home/hadoop/mysql-connector-java-5.1.49-bin.jar
# put it on the Spark and Hadoop classpaths on every node
sudo cp /home/hadoop/mysql-connector-java-5.1.49-bin.jar /usr/lib/spark/jars
sudo cp /home/hadoop/mysql-connector-java-5.1.49-bin.jar /usr/lib/hadoop/lib
2. Save copymysqljar.sh to the S3 bucket identified by s3://<YOUR_BUCKET>.
3. Proceed to cluster creation in AWS with 'Create cluster' - 'Advanced options'.
4. During advanced configuration, on step 4 add a custom bootstrap action with s3://<YOUR_BUCKET>/copymysqljar.sh as the script.
5. Start cluster creation.
Alternatively, instead of steps 3, 4 and 5 you can do the same with the AWS command-line tools.
The official docs on bootstrap actions are at https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html#CustomBootstrapCopyS3Object
In general, this script takes care of everything for AWS EMR 5.31 with Hadoop, Spark and Zeppelin. You might need to copy the jar to additional directories if other tools should connect to MySQL too.
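As a quick sanity check (my own addition, not part of the setup above): once the cluster is up, you can verify from spark-shell that the driver class resolves on the executors and not just on the master node:

// should return the driver class name from every partition instead of throwing ClassNotFoundException
sc.parallelize(1 to 4, 4)
  .map(_ => Class.forName("com.mysql.jdbc.Driver").getName)
  .collect()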

Related

Apache Spark SQL get_json_object java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String

I am trying to read a JSON stream from an MQTT broker in Apache Spark with Structured Streaming, read some properties of the incoming JSON and output them to the console. My code looks like this:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object
import org.apache.spark.sql.types.StringType

val spark = SparkSession
  .builder()
  .appName("BahirStructuredStreaming")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val topic = "temp"
val brokerUrl = "tcp://localhost:1883"

val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .toDF().withColumn("payload", $"payload".cast(StringType))

val jsonDF = lines.select(get_json_object($"payload", "$.eventDate").alias("eventDate"))

val query = jsonDF.writeStream
  .format("console")
  .start()

query.awaitTermination()
However, when the json arrives I get the following errors:
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
=== Streaming Query ===
Identifier: [id = 14d28475-d435-49be-a303-8e47e2f907e3, runId = b5bd28bb-b247-48a9-8a58-cb990edaf139]
Current Committed Offsets: {MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496]: -1}
Current Available Offsets: {MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496]: 0}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
Project [get_json_object(payload#22, $.id) AS eventDate#27]
+- Project [id#10, topic#11, cast(payload#12 as string) AS payload#22, timestamp#13]
+- StreamingExecutionRelation MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496], [id#10, topic#11, payload#12, timestamp#13]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:300)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2783)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:533)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:532)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
... 1 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 8, localhost, executor driver): java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:117)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:64)
... 34 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:117)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I am sending the JSON records using the mosquitto broker and they look like this:
mosquitto_pub -m '{"eventDate": "2020-11-11T15:17:00.000+0200"}' -t "temp"
It seems that every string coming from the Bahir stream source provider raises this error. For instance, the following code also raises it:
spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .select("topic")
  .writeStream
  .format("console")
  .start()
It looks like Spark does not recognize strings coming from Bahir, maybe some kind of weird String class version issue. I've tried the following to make the code work:
- set the Java version to 8
- upgrade the Spark version from 2.4.0 to 2.4.7
- set the Scala version to 2.11.12
- use the decode function with all possible encoding combinations instead of .cast(StringType) to transform the "payload" column to String
- use the substring function on the "payload" column to recreate a compatible String
Finally, I got working code by recreating the string with the String constructor and the Dataset API:
val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .select("payload")
  .as[Array[Byte]]
  .map(payload => new String(payload))
  .toDF("payload")
This solution is rather ugly but at least it works.
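With the payload rebuilt that way, the extraction from the question should work unchanged on top of it; a short sketch (same eventDate field as above, assuming the imports from the question are in scope):

val jsonDF = lines.select(get_json_object($"payload", "$.eventDate").alias("eventDate"))

val query = jsonDF.writeStream
  .format("console")
  .start()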
I believe there is nothing wrong with the code provided in the question, and I suspect a bug on the Bahir or Spark side that prevents Spark from handling Strings coming from the Bahir source.

Spark SQL error reading JSON file: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class

I am trying to read a JSON file using Spark SQL in Java.
This is my code:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
...
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
SQLContext sqlContext = new SQLContext(jsc);
DataFrame df = sqlContext.jsonFile("~/test.json");
df.printSchema();
df.registerTempTable("test");
...
I made a simple JSON file, "test.json", to keep it simple:
{
"name": "myname"
}
and when I try to run the code, this error message comes up:
17/03/30 10:02:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.6.86.82:36824 with 1948.2 MB RAM, BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.6.86.82, 36824)
17/03/30 10:02:26 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at org.apache.spark.sql.sources.CaseInsensitiveMap.<init>(ddl.scala:344)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:572)
at org.apache.spark.sql.SQLContext.jsonFile(SQLContext.scala:553)
at sugi.kau.sparkonjava.SparkSQL.main(SparkSQL.java:32)
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
17/03/30 10:02:26 INFO SparkContext: Invoking stop() from shutdown hook
...
thanks
In the Spark docs, the function jsonFile(String path) is described as:
Loads a JSON file (one object per line), returning the result as a DataFrame. (Note that jsonFile has been replaced by read().json().)
So you should have one object per line, and your source file should look like this:
{"name": "myname"}
{"name": "myname2"}
.....
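As a side note (my own addition, not part of the original answer): since jsonFile is deprecated, the non-deprecated path on Spark 1.4+ goes through the DataFrameReader. A rough sketch in Scala (the Java call sqlContext.read().json(...) is analogous; the path is a placeholder):

// reads one JSON object per line, as described above
val df = sqlContext.read.json("/path/to/test.json")
df.printSchema()
df.registerTempTable("test")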

How to configure an rsyslog template for exception errors for remote logging?

I'm using rsyslog to ship logs to a remote Logstash server, and Logstash on that server expects input data in JSON format. How can I configure an rsyslog template to JSON-ify an exception? For example, I want to send the following exception as a single message.
2017-02-08 21:59:51,727 ERROR :localhost-startStop-1 [jdbc.sqlonly] 1. PreparedStatement.executeBatch() batching 1 statements:
1: insert into CR_CLUSTER_REGISTRY (Cluster_Name, Url, Update_Dttm, Node_Id) values ('customer', 'rmi://ip-10-53-123.123.eu-west-1.compute.internal:1199/2', '02/08/2017 21:59:51.639', '2')
java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 15.00.00.35] [Error 1338] [SQLState HY000] A failure occurred while executing a PreparedStatement batch request. Details of the failure can be found in the exception chain that is accessible with getNextException.
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:148)
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:137)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.executeBatchDMLArray(TDPreparedStatement.java:272)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.executeBatch(TDPreparedStatement.java:2584)
at com.teradata.tal.qes.StatementProxy.executeBatch(StatementProxy.java:186)
at net.sf.log4jdbc.StatementSpy.executeBatch(StatementSpy.java:539)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:70)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:268)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:50)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1028)
at com.teradata.tal.common.persistence.dao.SessionWrapper.flush(SessionWrapper.java:920)
at com.teradata.trm.common.persistence.dao.DaoImpl.save(DaoImpl.java:263)
at com.teradata.trm.common.service.AbstractService.save(AbstractService.java:509)
at com.teradata.trm.common.cluster.Cluster.init(Cluster.java:413)
at com.teradata.trm.common.cluster.NodeConfiguration.initialize(NodeConfiguration.java:182)
at com.teradata.trm.common.context.Initializer.onApplicationEvent(Initializer.java:73)
at com.teradata.trm.common.context.Initializer.onApplicationEvent(Initializer.java:30)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:97)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:324)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:929)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:467)
at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:385)
at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:284)
at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:111)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4973)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5467)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:632)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1247)
at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1898)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: [Teradata Database] [TeraJDBC 15.00.00.35] [Error -2801] [SQLState 23000] Duplicate unique prime key error in CIM_META.CR_CLUSTER_REGISTRY.
at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDatabaseSQLException(ErrorFactory.java:301)
at com.teradata.jdbc.jdbc_4.statemachine.ReceiveInitSubState.action(ReceiveInitSubState.java:114)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.subStateMachine(StatementReceiveState.java:311)
at com.teradata.jdbc.jdbc_4.statemachine.StatementReceiveState.action(StatementReceiveState.java:200)
at com.teradata.jdbc.jdbc_4.statemachine.StatementController.runBody(StatementController.java:137)
at com.teradata.jdbc.jdbc_4.statemachine.PreparedBatchStatementController.run(PreparedBatchStatementController.java:58)
at com.teradata.jdbc.jdbc_4.TDStatement.executeStatement(TDStatement.java:387)
at com.teradata.jdbc.jdbc_4.TDPreparedStatement.executeBatchDMLArray(TDPreparedStatement.java:252)
... 37 more
I have the following rsyslog configuration file. The startmsg.regex aims to "flag" the start of a new message when it sees the "YYYY-mm-dd" date format, and until it sees that format, it should treat any text following the date format as part of the current message.
input(type="imfile"
File="/usr/share/tomcat/dist/logs/trm-error.log*"
Facility="local3"
Tag="trm-error:"
Severity="error"
startmsg.regex="^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}"
escapeLF="on"
)
if $programname == 'trm-error:' then {
action(
type="omfwd"
Target="10.53.234.234"
Port="5514"
Protocol="udp"
template="textLogTemplate"
)
stop
}
...and the following template:
# Template for non json logs, just sends the message wholesale with extra
# # furniture.
template(name="textLogTemplate" type="list") {
constant(value="{ ")
constant(value="\"type\":\"")
property(name="programname")
constant(value="\", ")
constant(value="\"host\":\"")
property(name="hostname")
constant(value="\", ")
constant(value="\"timestamp\":\"")
property(name="timestamp" dateFormat="rfc3339")
constant(value="\", ")
constant(value="\"#version\":\"1\", ")
constant(value="\"customer\":\"customer\", ")
constant(value="\"role\":\"app2\", ")
constant(value="\"sourcefile\":\"")
property(name="$!metadata!filename")
constant(value="\", ")
constant(value="\"message\":\"")
property(name="rawmsg" format="json")
constant(value="\"}\n")
}
However, Logstash complains about a "jsonparseerror" when it tries to parse the log as a json file. Any clues?
The rsyslog configuration files I'm using are correct; that is, the Java exception log is indeed wrapped into valid JSON. However, Logstash is complaining about a _jsonparsefailure, so this problem is probably related to the Logstash Ruby code and not to the rsyslog side.

How to set a bucket password with spring-data couchbase

I have followed the tutorial for spring-data-couchbase and have a successful example project with unit tests for persisting a number of custom entities, with a range of views implemented to query the entities.
This works correctly in both a local dev environment and a CI environment when using the "default" bucket name and no password for authentication.
Moving beyond the example, I want to make use of a different bucket and ultimately a password.
When I create a new bucket (named "test_bucket") and update the property injected into the CouchbaseConfig (extends AbstractCouchbaseConfiguration) to use this new bucket in place of "default", I get the exception below when running the unit tests.
I also tried adding a password to the creation script and adding the same password ("psswd" string in both cases) to the properties used in the CouchbaseConfig but get the same exception shown below.
So is it possible to use a bucket other than "default" (with its no-authentication requirement), and how do I configure a password for use on this bucket?
I have verified from the Admin GUI that the bucket(s) and the expected views have been created correctly in Couchbase.
2015-06-09 16:41:40 INFO ClasspathLoggingApplicationListener:55 - Application failed to start with classpath: [file:/C:/tools/cmd/cygwin64/home/akirby/workspaces/repos/blackjack/persistence/target/surefire/surefirebooter7615727324811258159.jar]
2015-06-09 16:41:40 INFO AutoConfigurationReportLoggingInitializer:107 -
Error starting ApplicationContext. To display the auto-configuration report enabled debug logging (start with --debug)
2015-06-09 16:41:40 ERROR SpringApplication:338 - Application startup failed
java.lang.NoSuchMethodError: org.apache.commons.codec.binary.Base64.encodeBase64String([B)Ljava/lang/String;
at com.couchbase.client.http.HttpUtil.buildAuthHeader(HttpUtil.java:55)
at com.couchbase.client.ViewConnection.addOp(ViewConnection.java:205)
at com.couchbase.client.CouchbaseClient.addOp(CouchbaseClient.java:803)
at com.couchbase.client.CouchbaseClient.asyncGetView(CouchbaseClient.java:342)
at com.couchbase.client.CouchbaseClient.getView(CouchbaseClient.java:430)
at org.springframework.data.couchbase.core.CouchbaseTemplate$2.doInBucket(CouchbaseTemplate.java:223)
at org.springframework.data.couchbase.core.CouchbaseTemplate$2.doInBucket(CouchbaseTemplate.java:220)
at org.springframework.data.couchbase.core.CouchbaseTemplate.execute(CouchbaseTemplate.java:244)
at org.springframework.data.couchbase.core.CouchbaseTemplate.queryView(CouchbaseTemplate.java:220)
at org.springframework.data.couchbase.repository.support.SimpleCouchbaseRepository.deleteAll(SimpleCouchbaseRepository.java:168)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.executeMethodOn(RepositoryFactorySupport.java:416)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:401)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:373)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$DefaultMethodInvokingMethodInterceptor.invoke(RepositoryFactorySupport.java:486)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.data.couchbase.repository.support.ViewPostProcessor$ViewInterceptor.invoke(ViewPostProcessor.java:87)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy50.deleteAll(Unknown Source)
at com.pubtech.cms.persistence.RepositoryService.doWork(RepositoryService.java:47)
at com.pubtech.cms.persistence.ApplicationRepository.lambda$commandLineRunner$0(ApplicationRepository.java:83)
at com.pubtech.cms.persistence.ApplicationRepository$$Lambda$9/594916129.run(Unknown Source)
at org.springframework.boot.SpringApplication.runCommandLineRunners(SpringApplication.java:672)
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:690)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:321)
at org.springframework.boot.test.SpringApplicationContextLoader.loadContext(SpringApplicationContextLoader.java:101)
at org.springframework.test.context.DefaultCacheAwareContextLoaderDelegate.loadContextInternal(DefaultCacheAwareContextLoaderDelegate.java:68)
at org.springframework.test.context.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:86)
at org.springframework.test.context.DefaultTestContext.getApplicationContext(DefaultTestContext.java:72)
at org.springframework.test.context.web.ServletTestExecutionListener.setUpRequestContextIfNecessary(ServletTestExecutionListener.java:170)
at org.springframework.test.context.web.ServletTestExecutionListener.prepareTestInstance(ServletTestExecutionListener.java:110)
at org.springframework.test.context.TestContextManager.prepareTestInstance(TestContextManager.java:212)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.createTest(SpringJUnit4ClassRunner.java:200)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner$1.runReflectiveCall(SpringJUnit4ClassRunner.java:259)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.methodBlock(SpringJUnit4ClassRunner.java:261)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:219)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:83)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:68)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:163)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
2015-06-09 16:41:40 INFO GenericWebApplicationContext:862 - Closing org.springframework.web.context.support.GenericWebApplicationContext#6302bbb1: startup date [Tue Jun 09 16:41:33 BST 2015]; root of context hierarchy
2015-06-09 16:41:40 INFO CouchbaseConnection:87 - Shut down Couchbase client
2015-06-09 16:41:40 INFO ViewConnection:87 - I/O reactor terminated
When using a bucket that requires a password (bucket "t1", password "pswd") I see the following authentication error in the logs. Is there some format, other than plain text, that the password should be encoded with?
2015-06-10 10:55:58 INFO DefaultListableBeanFactory:822 - Overriding bean definition for bean 'beanNameViewResolver': replacing [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/ErrorMvcAutoConfiguration$WhitelabelErrorViewConfiguration.class]] with [Root bean: class [null]; scope=; abstract=false; lazyInit=false; autowireMode=3; dependencyCheck=0; autowireCandidate=true; primary=false; factoryBeanName=org.springframework.boot.autoconfigure.web.WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter; factoryMethodName=beanNameViewResolver; initMethodName=null; destroyMethodName=(inferred); defined in class path resource [org/springframework/boot/autoconfigure/web/WebMvcAutoConfiguration$WebMvcAutoConfigurationAdapter.class]]
2015-06-10 10:55:59 INFO Version:27 - HV000001: Hibernate Validator 5.1.3.Final
2015-06-10 10:56:00 ERROR SASLStepOperationImpl:93 - Error: Auth failure
2015-06-10 10:56:00 WARN BinaryMemcachedNodeImpl:90 - Discarding partially completed op: SASL steps operation
2015-06-10 10:56:00 WARN AuthThread:90 - Authentication failed to localhost/127.0.0.1:11210, Status: {OperationStatus success=false: cancelled}
2015-06-10 10:56:02 WARN AuthThread:90 - Authentication failed to localhost/127.0.0.1:11210, Status: {OperationStatus success=false: Invalid arguments}
2015-06-10 10:56:02 WARN AuthThread:90 - Authentication failed to localhost/127.0.0.1:11210, Status: {OperationStatus success=false: Invalid arguments}
I use the couchbase-cli to create the buckets from a script, using the same script to create the working "default" and the non-working "test_bucket" (properties are correctly injected using an mvn filter):
# Create Bucket
couchbase-cli bucket-create -c $COUCHBASE_HOST:$COUCHBASE_PORT -u $CB_REST_USERNAME -p $CB_REST_PASSWORD \
--bucket=$BUCKET_NAME \
--bucket-type=couchbase \
--bucket-ramsize=200 \
--bucket-replica=1 \
--wait
CouchbaseConfig class:
..
@Configuration
@EnableCouchbaseRepositories(basePackages = {"com.persistence.db"})
@EnableAutoConfiguration
public class CouchbaseConfig extends AbstractCouchbaseConfiguration {

    @Value("${couchbase.bucket:boris}")
    private String bucketName;

    @Value("${couchbase.bucket.password:nopwd}")
    private String password;

    @Value("${couchbase.host:127.0.0.1}")
    private String ip;
..
I think you hit a similar issue to what I was experiencing; the issue for me was that using @Value in an @Configuration class has a slightly special requirement. I was using YAML for my properties file, if that matters at all.
Add this to your class (it must be static as well):
/**
 * this is required for some reason: https://jira.spring.io/browse/SPR-11773
 *
 * @return
 */
@Bean
public static PropertySourcesPlaceholderConfigurer propertyPlaceholderConfigurer() {
    return new PropertySourcesPlaceholderConfigurer();
}

ClassNotFoundException when trying to deploy a DataSource with #DataSourceDefinition

I'm trying to deploy a datasource with the @DataSourceDefinition annotation.
When WildFly deploys the jar, it throws a ClassNotFoundException.
I put the MySQL JDBC driver in the deployment directory. I already use the com.mysql.jdbc.Driver class in datasources configured in standalone.xml. I haven't created a module with the JDBC driver under "modules\system\layers\base".
Here is the class with the annotation:
@Stateless
@DataSourceDefinition(name = "java:global/jdbc/testingDS",
        className = "com.mysql.jdbc.Driver",
        portNumber = 3306,
        serverName = "localhost",
        databaseName = "testing",
        user = "testing",
        password = "testing")
public class DataSourceDeployment {
    public void someMethod() { }
}
And here is the exception (this is the *.failed file):
{
"JBAS014671: Failed services" => {"jboss.deployment.unit.\"DatasourceDeploymentTest-1.jar\".INSTALL" => "org.jboss.msc.service.StartException in service jboss.deployment.unit.\"DatasourceDeploymentTest-1.jar\".INSTALL: JBAS018733: Failed to process phase INSTALL of deployment \"DatasourceDeploymentTest-1.jar\"
Caused by: org.jboss.as.server.deployment.DeploymentUnitProcessingException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver from [Module \"deployment.DatasourceDeploymentTest-1.jar:main\" from Service Module Loader]
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver from [Module \"deployment.DatasourceDeploymentTest-1.jar:main\" from Service Module Loader]"},
"JBAS014771: Services with missing/unavailable dependencies" => [
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.InstanceName is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]",
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.ORB is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]",
"jboss.deployment.unit.\"DatasourceDeploymentTest-1.jar\".weld.weldClassIntrospector is missing [jboss.deployment.unit.\"DatasourceDeploymentTest-1.jar\".beanmanager]",
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.HandleDelegate is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]",
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.ValidatorFactory is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]",
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.InAppClientContainer is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]",
"jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment.Validator is missing [jboss.naming.context.java.comp.DatasourceDeploymentTest-1.DatasourceDeploymentTest-1.DataSourceDeployment]"
]
}
The CNFE (ClassNotFoundException) is just as it should be.
You have the MySQL JDBC driver in a module, but your deployment and its @DataSourceDefinition know nothing about it.
@DataSourceDefinition uses the deployment's classloader to load the JDBC driver, but the driver is not available to that classloader, as it lives in a module.
To solve this you should either
1) add a dependency from your deployment to your MySQL driver module via MANIFEST.MF / jboss-deployment-structure.xml - see https://docs.jboss.org/author/display/WFLY8/Class+Loading+in+WildFly for details on how, or
2) add the JDBC driver to your war's lib directory,
but I would definitely go with 1).