MinIO with AWS SDK 2 for Java refuses to work properly - aws-sdk

I am trying to use MinIO with the AWS SDK 2 for Java. After battling it for a few days I have reached a dead end: the files upload successfully, but the client still throws an error.
The AWS CLI works just fine:
aws s3 cp --profile minio --endpoint-url http://127.0.0.1:9000 helloworld.txt s3://abc/helloworld.txt
upload: ./helloworld.txt to s3://abc/helloworld.txt
Everything also works fine when running against LocalStack.
The Java SDK configuration:
private fun buildCustomS3AsyncClient(): S3AsyncClient {
    val customEndpoint: String = env.getRequiredProperty(S3EndpointPropertyName)
    val customEndpointPort: String = env.getRequiredProperty(S3Port)
    //val region: Region = Region.of(env.getRequiredProperty(S3RegionPropertyName))
    val region: Region = Region.US_EAST_1
    val asyncClientBuilder = S3AsyncClient.builder()
    asyncClientBuilder.region(region)
    asyncClientBuilder.endpointOverride(URI.create("http://${customEndpoint}:${customEndpointPort}"))
    val staticCredentialsProvider: StaticCredentialsProvider = StaticCredentialsProvider.create(
        AwsBasicCredentials.create(
            "AKIAIOSFODNN7EXAMPLE",
            "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
    )
    val confBuilder = software.amazon.awssdk.services.s3.S3Configuration
        .builder()
        .pathStyleAccessEnabled(true)
        .build()
    asyncClientBuilder.serviceConfiguration(confBuilder)
    asyncClientBuilder.credentialsProvider(staticCredentialsProvider)
    return asyncClientBuilder.build()
}
and the upload code:
fun uploadToS3(file: File, bucket: String, key: String): Mono<PutObjectResponse> {
    // Note: the key parameter is ignored here; a random UUID is used as the object key instead
    val putObjectRequest: PutObjectRequest = PutObjectRequest.builder().bucket(bucket).key(UUID.randomUUID().toString()).build()
    logger.info(putObjectRequest.toString())
    val asyncRequestBody: AsyncRequestBody = AsyncRequestBody.fromFile(file)
    return Mono.fromFuture(s3AsyncClient.putObject(putObjectRequest, asyncRequestBody))
}
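For reference, a minimal sketch of how this upload might be exercised directly (the file name, bucket and logging calls are illustrative assumptions, not taken from the original project):

// Illustrative only: blocks on the returned Mono and logs the ETag from the PutObjectResponse
uploadToS3(File("helloworld.txt"), "abc", "helloworld.txt")
    .doOnSuccess { response -> logger.info("Uploaded, ETag = ${response.eTag()}") }
    .doOnError { error -> logger.error("Upload failed", error) }
    .block()

In the failing setup the PutObjectResponse never materializes because the SDK throws while validating the response, which is what surfaces as the 500 in the WebFlux handler below.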
Stack trace:
[sdk-async-response-0-1] ERROR [] [--] o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler - [80ebb08c] 500 Server Error for HTTP POST "/s3"
software.amazon.awssdk.core.exception.SdkClientException: null
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ Handler com.*.kotlinplayground.controllers.ReactiveUploadController#uploadHandler(Flux) [DispatcherHandler]
|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ HTTP POST "/s3" [ExceptionHandlingWebHandler]
Stack trace:
at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:97)
at software.amazon.awssdk.core.internal.util.ThrowableUtils.asSdkException(ThrowableUtils.java:98)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.retryIfNeeded(AsyncRetryableStage.java:123)
at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryExecutor.lambda$execute$0(AsyncRetryableStage.java:105)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$1(MakeAsyncHttpRequestStage.java:137)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException: Invalid base 16 character: '-'
at software.amazon.awssdk.utils.internal.Base16Codec.pos(Base16Codec.java:80)
at software.amazon.awssdk.utils.internal.Base16Codec.decode(Base16Codec.java:66)
at software.amazon.awssdk.utils.internal.Base16Lower.decode(Base16Lower.java:65)
at software.amazon.awssdk.services.s3.checksums.ChecksumsEnabledValidator.validatePutObjectChecksum(ChecksumsEnabledValidator.java:131)
at software.amazon.awssdk.services.s3.internal.handlers.AsyncChecksumValidationInterceptor.afterUnmarshalling(AsyncChecksumValidationInterceptor.java:86)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.lambda$afterUnmarshalling$9(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.reverseForEach(ExecutionInterceptorChain.java:210)
at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.afterUnmarshalling(ExecutionInterceptorChain.java:152)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.runAfterUnmarshallingInterceptors(BaseClientHandler.java:138)
at software.amazon.awssdk.core.client.handler.BaseClientHandler.lambda$interceptorCalling$2(BaseClientHandler.java:151)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:40)
at software.amazon.awssdk.core.client.handler.AttachHttpMetadataResponseHandler.handle(AttachHttpMetadataResponseHandler.java:28)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:88)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:129)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher$1.request(ResponseHandler.java:358)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onSubscribe(AsyncResponseHandler.java:108)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$FullResponseContentPublisher.subscribe(ResponseHandler.java:349)
at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.onStream(AsyncResponseHandler.java:71)
at software.amazon.awssdk.core.internal.http.async.AsyncAfterTransmissionInterceptorCallingResponseHandler.onStream(AsyncAfterTransmissionInterceptorCallingResponseHandler.java:86)
at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage$ResponseHandler.onStream(MakeAsyncHttpRequestStage.java:253)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:107)
at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.channelRead0(ResponseHandler.java:66)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at com.typesafe.netty.http.HttpStreamsHandler.channelRead(HttpStreamsHandler.java:129)
at com.typesafe.netty.http.HttpStreamsClientHandler.channelRead(HttpStreamsClientHandler.java:148)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at software.amazon.awssdk.http.nio.netty.internal.LastHttpContentHandler.channelRead(LastHttpContentHandler.java:43)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:438)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:326)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:300)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:253)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:834)

I have looked deeper into it: the AWS SDK decodes the ETag returned in the response using "Base16Lower", but the ETag returned by MinIO contains a '-' character (for example "afb0b3566f2a2a57d59a1487c413eb30-1"), which the decoder cannot handle. That is why the upload succeeds but parsing the response fails.
SDK version: awssdk:s3:2.5.29
I was able to bypass the issue by disabling checksum validation in the client configuration:
val confBuilder = software.amazon.awssdk.services.s3.S3Configuration.builder().pathStyleAccessEnabled(true).checksumValidationEnabled(false).build()
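Putting it together, the service configuration inside buildCustomS3AsyncClient() ends up looking roughly like this (a sketch based on the code above; the only functional change is checksumValidationEnabled(false), and the comments are mine):

val confBuilder = software.amazon.awssdk.services.s3.S3Configuration
    .builder()
    .pathStyleAccessEnabled(true)       // address MinIO as http://host:port/bucket/key
    .checksumValidationEnabled(false)   // skip the MD5/ETag comparison that fails on MinIO's "...-1" ETags
    .build()
asyncClientBuilder.serviceConfiguration(confBuilder)

With this in place the rest of the client builder stays exactly as shown earlier.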

Related

Apache Spark SQL get_json_object java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String

I am trying to read a JSON stream from an MQTT broker in Apache Spark with Structured Streaming, read some properties of the incoming JSON and output them to the console. My code looks like this:
val spark = SparkSession
  .builder()
  .appName("BahirStructuredStreaming")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val topic = "temp"
val brokerUrl = "tcp://localhost:1883"

val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .toDF().withColumn("payload", $"payload".cast(StringType))

val jsonDF = lines.select(get_json_object($"payload", "$.eventDate").alias("eventDate"))

val query = jsonDF.writeStream
  .format("console")
  .start()

query.awaitTermination()
However, when the JSON arrives I get the following error:
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
=== Streaming Query ===
Identifier: [id = 14d28475-d435-49be-a303-8e47e2f907e3, runId = b5bd28bb-b247-48a9-8a58-cb990edaf139]
Current Committed Offsets: {MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496]: -1}
Current Available Offsets: {MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496]: 0}
Current State: ACTIVE
Thread State: RUNNABLE
Logical Plan:
Project [get_json_object(payload#22, $.id) AS eventDate#27]
+- Project [id#10, topic#11, cast(payload#12 as string) AS payload#22, timestamp#13]
+- StreamingExecutionRelation MQTTStreamSource[brokerUrl: tcp://localhost:1883, topic: temp clientId: paho7247541031496], [id#10, topic#11, payload#12, timestamp#13]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:300)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2783)
at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$14(MicroBatchExecution.scala:533)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:532)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:198)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:349)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
... 1 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 8, localhost, executor driver): java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:117)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:64)
... 34 more
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getUTF8String$(rows.scala:46)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getUTF8String(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.$anonfun$run$2(WriteToDataSourceV2Exec.scala:117)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:116)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.$anonfun$doExecute$2(WriteToDataSourceV2Exec.scala:67)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:405)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I am sending the JSON records using the mosquitto broker and they look like this:
mosquitto_pub -m '{"eventDate": "2020-11-11T15:17:00.000+0200"}' -t "temp"
It seems that every string coming from the Bahir stream source provider raises this error. For instance, the following code also raises it:
spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .select("topic")
  .writeStream
  .format("console")
  .start()
It looks like Spark does not recognize strings coming from Bahir, maybe some kind of weird string class version issue. I have tried the following to make the code work:
- set the Java version to 8
- upgrade the Spark version from 2.4.0 to 2.4.7
- set the Scala version to 2.11.12
- use the decode function with all possible encoding combinations instead of .cast(StringType) to transform the "payload" column to String
- use the substring function on the "payload" column to recreate a compatible String
Finally, I got working code by recreating the string with the String constructor on a typed Dataset:
val lines = spark.readStream
  .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
  .option("topic", topic).option("persistence", "memory")
  .load(brokerUrl)
  .select("payload")
  .as[Array[Byte]]
  .map(payload => new String(payload))
  .toDF("payload")
This solution is rather ugly, but at least it works.
I believe there is nothing wrong with the code provided in the question, and I suspect a bug on the Bahir or Spark side that prevents Spark from handling the Strings produced by the Bahir source.

Spark scala dataframe read and show multiline json file

I am trying to read and show JSON file data in Spark using Scala. I am able to read the file, but when I call dataframe.show() it throws an error. The code is below.
I see that reading multiline JSON files got easier from Spark version 2.2, hence I am using this approach.
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql._

object MostTrendingVideoOnADay {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)

    val spark = SparkSession
      .builder()
      .appName("youtube")
      .master("local[*]")
      .getOrCreate()

    val usCategory = spark.read.option("multiline", true).option("mode", "PERMISSIVE").json("G:/Apache Spark/DataSets/youtube/US_category_id.json")

    usCategory.printSchema()
    usCategory.show()

    spark.stop()
  }
}
JSON File:
{
  "kind": "youtube#videoCategoryListResponse",
  "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/S730Ilt-Fi-emsQJvJAAShlR6hM\"",
  "items": [
    {
      "kind": "youtube#videoCategory",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/Xy1mB4_yLrHy_BmKmPBggty2mZQ\"",
      "id": "1",
      "snippet": {
        "channelId": "UCBR8-60-B28hp2BmDPdntcQ",
        "title": "Film & Animation",
        "assignable": true
      }
    },
    {
      "kind": "youtube#videoCategory",
      "etag": "\"m2yskBQFythfE4irbTIeOgYYfBU/UZ1oLIIz2dxIhO45ZTFR3a3NyTA\"",
      "id": "2",
      "snippet": {
        "channelId": "UCBR8-60-B28hp2BmDPdntcQ",
        "title": "Autos & Vehicles",
        "assignable": true
      }
    }
  ]
}
Error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.io.FileNotFoundException: File file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:245)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2861)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2150)
at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2150)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2842)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2841)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2150)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2363)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
at org.apache.spark.sql.Dataset.show(Dataset.scala:637)
at org.apache.spark.sql.Dataset.show(Dataset.scala:596)
at org.apache.spark.sql.Dataset.show(Dataset.scala:605)
at MostTrendingVideoOnADay$.main(MostTrendingVideoOnADay.scala:21)
at MostTrendingVideoOnADay.main(MostTrendingVideoOnADay.scala)
Caused by: java.io.FileNotFoundException: File file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not exist
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:174)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
As seen in your log file: java.io.FileNotFoundException: File file:/G:/Apache%20Spark/DataSets/youtube/US_category_id.json does not exist
You can see there is a space in the path (Apache%20Spark), which is causing the issue. Can you remove the space from the path?
Make it like ApacheSpark or Apache_Spark; this should solve the issue.
Hope this helps!

Grails: Value out of sequence: expected mode to be OBJECT or ARRAY when writing

I'm trying to parse a JSON object using grails.converters.JSON, but this error appears.
The code:
def str = '{"a": "b"}'
def json = new JSON(str)
or
def map = [:]
map.a = "b"
def json = map as JSON
json = new JSON(json.toString())
both return the following error:
2016-11-22 14:21:34.592 ERROR --- [nio-8080-exec-9] o.g.web.errors.GrailsExceptionResolver : JSONException occurred when processing request: [GET] /test/index
Value out of sequence: expected mode to be OBJECT or ARRAY when writing '{"a":"b"}' but was INIT. Stacktrace follows:
java.lang.reflect.InvocationTargetException: null
at org.grails.core.DefaultGrailsControllerClass$ReflectionInvoker.invoke(DefaultGrailsControllerClass.java:210)
at org.grails.core.DefaultGrailsControllerClass.invoke(DefaultGrailsControllerClass.java:187)
at org.grails.web.mapping.mvc.UrlMappingsInfoHandlerAdapter.handle(UrlMappingsInfoHandlerAdapter.groovy:90)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:963)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:897)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:861)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
at org.springframework.boot.web.filter.ApplicationContextHeaderFilter.doFilterInternal(ApplicationContextHeaderFilter.java:55)
at org.grails.web.servlet.mvc.GrailsWebRequestFilter.doFilterInternal(GrailsWebRequestFilter.java:77)
at org.grails.web.filters.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:67)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.grails.web.converters.exceptions.ConverterException: org.grails.web.json.JSONException: Value out of sequence: expected mode to be OBJECT or ARRAY when writing '{"a":"b"}' but was INIT
at org.grails.web.converters.AbstractConverter.toString(AbstractConverter.java:111)
at grails3.TestController$$EQ3IkyX7.index(TestController.groovy:25)
... 14 common frames omitted
Caused by: org.grails.web.converters.exceptions.ConverterException: org.grails.web.json.JSONException: Value out of sequence: expected mode to be OBJECT or ARRAY when writing '{"a":"b"}' but was INIT
at grails.converters.JSON.value(JSON.java:193)
at grails.converters.JSON.render(JSON.java:119)
at org.grails.web.converters.AbstractConverter.toString(AbstractConverter.java:109)
... 15 common frames omitted
Caused by: org.grails.web.json.JSONException: Value out of sequence: expected mode to be OBJECT or ARRAY when writing '{"a":"b"}' but was INIT
at org.grails.web.json.JSONWriter.append(JSONWriter.java:142)
at org.grails.web.json.JSONWriter.value(JSONWriter.java:353)
at grails.converters.JSON.value(JSON.java:162)
... 17 common frames omitted
Grails version: 3.2.3
Java version: 1.8u45 and 1.8u111
This statement by itself converts the map to JSON:
def val = [a: 101, b: '100'] as JSON
You don't need to do json = new JSON(json.toString()) anymore.

Groovy JsonBuilder appending \u0000's to string

I'm trying to make a simple UDP socket server for a Unity3D game I'm making, and I've got it mostly working. I can send messages to it and read them. But when I try to send the message back to the client (for testing purposes, at the moment), I get a BufferOverflowException.
Before sending the data back, I'm converting it to JSON using groovy.json.JsonBuilder. The data has a very simple structure:
[data: "Hello World"]
But for whatever reason, JsonBuilder is building it as
{
"data": "Hello World\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\..."
}
The \u0000s go on for a while, long enough to make my 1024-byte ByteBuffer overflow.
This is the class that's responsible for sending the data back to the client:
import groovy.json.JsonBuilder
import groovy.transform.CompileStatic
import groovyx.gpars.actor.DynamicDispatchActor

import java.nio.ByteBuffer
import java.nio.channels.DatagramChannel

@CompileStatic
class SenderActor extends DynamicDispatchActor {

    //takes message of type [data: Object, receiver: SocketAddress]
    void onMessage(Map message) {
        println(message.data) //prints "Hello World"

        def json = new JsonBuilder([data: message.data]).toString()
        println("Sending: $json") //prints '{"data": "Hello World\u0000\u0000..."}'

        def channel = DatagramChannel.open()
        channel.connect(message.receiver as SocketAddress)

        def buffer = ByteBuffer.allocate(1024)
        buffer.clear()
        buffer.put(json.getBytes())
        buffer.flip()

        channel.send(buffer, message.receiver as SocketAddress)
    }
}
And this is the stack trace I get:
An exception occurred in the Actor thread Actor Thread 2
java.nio.BufferOverflowException
at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
at java.nio.ByteBuffer.put(ByteBuffer.java:859)
at Croquet.Actors.SenderActor.onMessage(SenderActor.groovy:28)
at Croquet.Actors.SenderActor$onMessage.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at Croquet.Actors.ProcessorActor$onMessage.call(Unknown Source)
at groovyx.gpars.actor.impl.DDAClosure$_createDDAClosure_closure1.doCall(DDAClosure.groovy:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:292)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1016)
at groovy.lang.Closure.call(Closure.java:423)
at groovy.lang.Closure.call(Closure.java:439)
at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The data in question is encoded as UTF-8, if that helps.
This is the client code that is responsible for sending data to the server (written in C#):
void sendTestMessage(UdpClient udpClient, UdpState udpState){
    byte[] data = Encoding.UTF8.GetBytes("Hello World");
    udpClient.BeginSend(
        data,
        data.Length,
        udpState.e, //IPEndPoint
        result => {
            messageSent = true;
            Debug.Log(string.Format("Message '{1}' Sent to {0}", udpState.e, Encoding.UTF8.GetString(data)));
            udpClient.EndSend(result);
        },
        udpState);
}
I solved my problem by changing
def json = new JsonBuilder([data: message.data]).toString()
to
def json = new JsonBuilder([data: message.data]).toString().replace("\\u0000", "")

Solr returns 400 (Bad Request) while indexing a JSON file from localhost

I am new to Solr. I have set it up on my local PC and I am facing a problem with indexing a JSON file. I have one JSON file on my local PC which I want to index in Solr, but it shows the error mentioned below.
Error:
D:\Solr\Example\exampledocs>java -Durl=http://localhost:8983/solr/update -Dtype=application/json -jar post.jar timeline.json
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/json..
POSTing file timeline.json
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/update
SimplePostTool: WARNING: Response: {"responseHeader":{"status":400,"QTime":0},"error":{"msg":"Unknown command: Name [8]","code":400}}
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/update
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:00.167
Please help me: how can I solve it? Thanks in advance.
The log shows:
ERROR - 2014-07-30 18:38:52.330; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unexpected character '{' (code 123) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '{' (code 123) in prolog; expected '<'
at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:648)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2047)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:213)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
... 32 more
JSON file:
{ "Name" : "Matches and Schedule", "timestamp" : { "$date" : 1400825267792 }, "_id" : { "$oid" : "537ee50494" } }
{ "Name" : "meet Modi", "timestamp" : { "$date" : 1401449841192 }, "_id" : { "$oid" : "53886d3a2c" } }
Can't comment yet.
Please post a snippet of your JSON file so we can comment.
Also, in these cases, Solr log files are your best friend; they will tell you exactly what it doesn't like about the data you are posting.
Also, you don't have to specify the host if you are sending to localhost; it is the default.
EDIT: Your JSON doesn't look correct. What do you expect it to do with stuff like:
"timestamp" : { "$date" : 1400825267792 }
First off, if you need timestamp to be a date/time field in Solr, it needs to be in UTC (ISO 8601) format. Second, Solr doesn't support nested elements.
Finally, if you are posting multiple documents via JSON, the format needs to look like this:
[ {"id":"doc1","field2":"val2"} , {"id":"doc2","field2":"val3"} ]
Note that all documents are enclosed in square brackets and separated by a comma.
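Applied to the two documents in the question, that would mean flattening the nested $date/$oid objects and wrapping both documents in a single array, roughly like this (the field names are only illustrative and must exist in your schema; the timestamps are the epoch-millisecond values from the file converted to Solr's ISO 8601 format):

[
  { "id": "537ee50494", "Name": "Matches and Schedule", "timestamp": "2014-05-23T06:07:47.792Z" },
  { "id": "53886d3a2c", "Name": "meet Modi", "timestamp": "2014-05-30T11:37:21.192Z" }
]

With the documents wrapped in an array like this, the "Unknown command: Name" error from the JSON update handler should go away.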