Value schema must be a Struct - JSON

I am sending nested JSON data to a Kafka consumer for a PostgreSQL sink. I am building the sink connector and unfortunately I can't change the data at the source. I want to send the data as-is, without any conversions, using Kafka.
Kafka Connect is showing this error:
[2023-01-04 22:58:15,227] ERROR WorkerSinkTask{id=Kafkapgsink-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: Value schema must be of type Struct (org.apache.kafka.connect.runtime.WorkerSinkTask:609)
org.apache.kafka.connect.errors.ConnectException: Value schema must be of type Struct
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:86)
at io.confluent.connect.jdbc.sink.metadata.FieldsMetadata.extract(FieldsMetadata.java:67)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:115)
at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:74)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:85)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:581)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:333)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:234)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:203)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:244)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
My Kafka Connect worker properties are:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
The sink connector properties are:
name=Kafkapgsink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
task.max=100
connection.url=jdbc:postgresql://localhost:5432/fileintegrity
connection.user=postgres
connection.password=09900
insert.mode=insert
auto.create=true
auto.evolve=true
table.name.format=oi
pk.mode=record_key
delete.enabled=true

Your problem is here:
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
Strings do not have typed key/value pairs (i.e. structure).
More details here if you want to use JSON: https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained
As you can see there, schemas.enable is only a property of the JsonConverter.
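For reference, a minimal sketch of what the worker settings and a message could look like if you go the JsonConverter route (the field names and schema name below are placeholders, not your actual data). The JDBC sink needs a declared schema, which the JsonConverter only provides when schemas.enable=true and every message carries a schema/payload envelope:

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

A matching message value would then embed its own schema, for example:

{
  "schema": {
    "type": "struct",
    "name": "example_record",
    "optional": false,
    "fields": [
      { "field": "id", "type": "string", "optional": false },
      { "field": "amount", "type": "int64", "optional": true }
    ]
  },
  "payload": { "id": "foo", "amount": 1 }
}

Alternatively, produce the data as Avro (or JSON Schema/Protobuf) with Schema Registry and use the matching converter; either way the sink receives a Struct instead of a plain string.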

Related

How to handle serialization exceptions with Kafka Streams

I'm currently struggling to handle serialization exceptions properly in a Kafka Streams application (using the latest version). Exception handling for deserialization and production exceptions works fine; for those I'm using:
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, MyCustomDeserializationExceptionHandler.class);
props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG, MyCustomProductionExceptionHandler.class);
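For context, a minimal sketch of what a handler like MyCustomProductionExceptionHandler might look like (the class name comes from the question; the body is only illustrative, and whether this handler is invoked for serialization errors depends on the Kafka Streams version):

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class MyCustomProductionExceptionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record, Exception exception) {
        // Log the failed record and tell Streams to continue instead of killing the task.
        System.err.println("Failed to send record to topic " + record.topic() + ": " + exception.getMessage());
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch.
    }
}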
I'm using Avro as the output format and have defined a serde like this:
private SpecificAvroSerde<MyClass> avroSerde(Properties envProps) {
    SpecificAvroSerde<MyClass> avroSerde = new SpecificAvroSerde<>();
    final HashMap<String, String> serdeConfig = new HashMap<>();
    serdeConfig.put(SCHEMA_REGISTRY_URL_CONFIG, envProps.getProperty("schema.registry.url"));
    avroSerde.configure(serdeConfig, false);
    return avroSerde;
}
If, for example, a required value is not set, we obviously get an exception. But since we are dynamically mapping input values, it's not really in our hands whether we get correct values. An exception looks like:
Error encountered sending record to topic NAME for task 0_0 due to:
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
org.apache.kafka.streams.errors.StreamsException: Error encountered sending record to topic NAME for task 0_0 due to:
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:167)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:129)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:91)
Finally, the question/problem: how can I handle such serialization exceptions properly? I would like to log the error without having the stream fail.
Thanks in advance!

Exception when deserializing java.time.Instant from Redis cache

I keep getting the following exception while reading data from the cache.
org.springframework.data.redis.serializer.SerializationException: Could not read JSON: Cannot construct instance of `java.time.Instant` (no Creators, like default construct, exist): cannot deserialize from Object value (no delegate- or property-based Creator)
It started as soon as I introduced a new field of type java.time.Instant.
You can use a Jackson JsonSerializer and JsonDeserializer to serialize the Instant either as milliseconds or in a custom text format.
For an example implementation, follow the answer section of "How to set format of string for java.time.Instant using objectMapper?".
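For instance, a minimal sketch of the milliseconds approach with plain Jackson annotations (the entity and field names are made up for illustration):

import java.io.IOException;
import java.time.Instant;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import com.fasterxml.jackson.databind.annotation.JsonSerialize;

class InstantMillisSerializer extends JsonSerializer<Instant> {
    @Override
    public void serialize(Instant value, JsonGenerator gen, SerializerProvider serializers) throws IOException {
        // Store the Instant as plain epoch milliseconds in the cached JSON.
        gen.writeNumber(value.toEpochMilli());
    }
}

class InstantMillisDeserializer extends JsonDeserializer<Instant> {
    @Override
    public Instant deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
        // Read the epoch milliseconds back into an Instant.
        return Instant.ofEpochMilli(p.getLongValue());
    }
}

// Hypothetical cached entity showing where the annotations go.
class CachedEvent {
    @JsonSerialize(using = InstantMillisSerializer.class)
    @JsonDeserialize(using = InstantMillisDeserializer.class)
    private Instant createdAt;

    public Instant getCreatedAt() { return createdAt; }
    public void setCreatedAt(Instant createdAt) { this.createdAt = createdAt; }
}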

Kafka Connect string to JSON in PostgreSQL

I have a topic with strings containing JSON. For example, a message could be:
'{"id":"foo", "datetime":1}'
In this topic everything is treated as a string.
I would like to send the messages to a PostgreSQL table with Kafka Connect. My goal is to let PostgreSQL understand that the messages are JSON; indeed, PostgreSQL handles JSON quite well.
How do I tell Kafka Connect or PostgreSQL that the messages are in fact JSON?
Thanks
EDIT:
For now, I use ./bin/connect-standalone config/connect-standalone.properties config/sink-sql-rules.properties.
With:
connect-standalone.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
rest.port=8084
plugin.path=share/java
sink-sql-rules.properties
name=mysink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
# The topics to consume from - required for sink connectors like this one
topics=mytopic
# Configuration specific to the JDBC sink connector.
connection.url=***
connection.user=***
connection.password=***
mode=timestamp+incremeting
auto.create=true
auto.evolve=true
table.name.format=mytable
batch.size=500
EDIT2:
With this configuration I get the following error:
org.apache.kafka.connect.errors.ConnectException: No fields found using key and value schemas for table

Read multiple JSON schemas with Spark

Software Configuration:
Hadoop distribution: Amazon 2.8.3
Applications: Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Spark 2.3.0
I tried to read files with multiple JSON schemas:
val df = spark.read.option("mergeSchema", "true").json("s3a://s3bucket/2018/01/01/*")
It throws an error:
org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:207)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:206)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:392)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:397)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:340)
How do I read JSON with multiple schemas in Spark?
This sometimes happens when you are pointing to the wrong path (i.e. the data does not exist).
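If the path is correct and the files do exist, the error message itself points at the other common fix: specify the schema manually instead of relying on inference. A minimal sketch using the Java Spark API (the field names are placeholders, not the actual schema):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ReadJsonWithExplicitSchema {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("read-json-with-schema").getOrCreate();

        // Declare the superset of fields expected across the differing JSON files
        // (these field names are placeholders).
        StructType schema = new StructType()
                .add("id", DataTypes.StringType)
                .add("datetime", DataTypes.LongType);

        Dataset<Row> df = spark.read()
                .schema(schema)                        // skip schema inference entirely
                .json("s3a://s3bucket/2018/01/01/*");  // same path as in the question

        df.printSchema();
        spark.stop();
    }
}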

Elasticsearch-kafka-river plugin issue

I am trying to pass data from Kafka to Elasticsearch and then to Kibana. I am using the kafka-river plugin as mentioned in this link: Elasticsearch-river-kafka plugin
After starting the Kafka ZooKeeper, server, and producer, I am sending data such as {"test":"one"}
Then I start Elasticsearch. I am getting the following error in Kafka:
[2016-02-04 00:05:00,094] ERROR Closing socket for /192.168.1.9 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
at kafka.utils.Utils$.read(Utils.scala:380)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
at kafka.network.Processor.read(SocketServer.scala:444)
at kafka.network.Processor.run(SocketServer.scala:340)
at java.lang.Thread.run(Thread.java:745)
And in Elasticsearch the following error:
org.codehaus.jackson.JsonParseException: Unexpected character ('S' (code 83)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
Also, I see this in the Elasticsearch logs:
[2016-02-04 00:14:31,340][WARN ][river.routing ] [ISAAC] no river _meta document found after 5 attempts
Any idea what I am doing wrong? Please help. Thanks.
The concept of rivers is deprecated in Elasticsearch; it adds performance issues. Why don't you look at using the Logstash Kafka plugin for the same thing? You can find out more about it at https://www.elastic.co/blog/logstash-kafka-intro
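A minimal sketch of such a Logstash pipeline, assuming a recent version of the Kafka input plugin (option names have changed across plugin versions) and placeholder topic and index names:

input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["test-topic"]          # placeholder topic name
    codec => "json"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "kafka-messages"         # placeholder index name
  }
}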