Can Kafka Connect value converter (JSONConverter) be used to convert GPB?

I am using Kafka Connect as a sink, writing all topic messages (GPB) into a database.
I am using the default JSONConverter as the value converter (the value.converter field in the properties file). Can this be used to convert GPB objects?
If not, can I use the deserializer class that is used to deserialize this object, or do I need to write some other custom class? Could you please share an example?

No. The JSONConverter expects strictly-formatted JSON; Protocol Buffers (I assume that's what you mean by GPB?) are binary records and need an appropriate converter.
Fortunately, the community has one available here: https://github.com/blueapron/kafka-connect-protobuf-converter/blob/master/README.md
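As a rough sketch only (check the exact property names against that README; the protobuf class name below is a hypothetical placeholder), the sink connector configuration would look something like:

value.converter=com.blueapron.connect.protobuf.ProtobufConverter
value.converter.protoClassName=com.example.protos.MyMessage

With a converter like this in place, the sink receives already-deserialized Connect records instead of raw protobuf bytes, so no JSON converter is involved on the value side.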

Related

Kafka Connect schema format for JSONConverter

I am using Kafka Connect to retrieve an existing schema from the schema registry and then trying to convert the returned schema string using JSONConverter (org.apache.kafka.connect.json.JSONConverter).
Unfortunately, I get an error from JSONConverter:
org.apache.kafka.connect.errors.DataException: Unknown schema type: object
I viewed the JSONConverter code and the error occurs because the schema "type" returned from the schema registry is "object" (see below) but JSONConverter does not recognize that type.
Questions:
Is the retrieved schema usable for JSONConverter? If yes, am I using this incorrectly?
Is JSONConverter expecting a different format? If yes, does someone know what the format JSONConverter is expecting?
Is there a different method of converting the schema registry response into a "Schema"?
Here are the relevant artifacts:
schema registry response (when querying for a particular schema):
[{"subject":"test-schema","version":1,"id":1,"schemaType":"JSON","schema":"{\"title\":\"test-schema\",\"type\":\"object\",\"required\":[\"id\"],\"additionalProperties\":false,\"properties\":{\"id\":{\"type\":\"integer\"}}}"}]
When the text above is cleaned up a bit, the relevant schema component ("schema") is shown below:
{
"title":"test-schema",
"type":"object",
"required":["id"],
"additionalProperties":false,
"properties":{"id":{"type":"integer"}}
}
org.apache.kafka.connect.json.JSONConverter doesn't actually use the JSON Schema specification. It has its own (not well documented) format, and it doesn't integrate with the Schema Registry at all.
An object is a struct type - see https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/#json-schemas
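For reference, a minimal sketch of what JSONConverter (with schemas.enable=true) does expect - a "schema"/"payload" envelope rather than a JSON Schema document. The struct below mirrors the test-schema above, but the exact field layout is illustrative:

{
  "schema": {
    "type": "struct",
    "name": "test-schema",
    "optional": false,
    "fields": [
      {"field": "id", "type": "int32", "optional": false}
    ]
  },
  "payload": {"id": 1}
}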
If you intend to use actual JSON Schema (and the registry), you need to use the converter from Confluent - io.confluent.connect.json.JsonSchemaConverter.
Is there a different method of converting the schema registry response into a "Schema"?
If you use the Schema Registry Java client, then yes: use the getSchemaById method, and the schemaType() and rawSchema() methods of that response should get you close to what you want. With that, you would pass it to some JSON Schema library (e.g. org.everit.json.schema, which is used by the registry).
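A minimal sketch of that client usage (assuming a Confluent Schema Registry client version where getSchemaById returns a ParsedSchema, roughly 5.5+; the registry URL and schema id below are illustrative, taken from a local setup and the registry response shown earlier):

import io.confluent.kafka.schemaregistry.ParsedSchema;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class SchemaLookup {
    public static void main(String[] args) throws Exception {
        // URL and cache size are illustrative; id 1 matches the registry response above.
        SchemaRegistryClient client = new CachedSchemaRegistryClient("http://localhost:8081", 10);
        ParsedSchema schema = client.getSchemaById(1);
        System.out.println(schema.schemaType());   // e.g. "JSON"
        Object raw = schema.rawSchema();           // for JSON schemas this is an org.everit.json.schema.Schema
        System.out.println(raw);
    }
}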

How to write the output from DB to a JSON file using Spring batch?

I am new to Spring Batch and there is a requirement for me to read data from a DB and write it out in JSON format. What's the best way to do this? Are there any APIs for it, or do I need to write a custom writer? Or do I need to use JSON libraries such as Gson or Jackson? Please guide me.
To read data from a relational database, you can use one of the database readers. You can find an example in the spring-batch-samples repository.
To write JSON data, Spring Batch 4.1.0.RC1 provides the JsonFileItemWriter that allows you to write JSON data to a file. It collaborates with a JsonObjectMarshaller to marshal an object to JSON format. Spring Batch provides support for both Gson and Jackson libraries (you need to have the one you want to use in the classpath). You can find more details here.
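As a rough sketch (assumes Spring Batch 4.1+ with Jackson on the classpath; the bean, file path, and the "Trade" domain class are made-up placeholders), the writer can be built like this:

import org.springframework.batch.item.json.JacksonJsonObjectMarshaller;
import org.springframework.batch.item.json.JsonFileItemWriter;
import org.springframework.batch.item.json.builder.JsonFileItemWriterBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class JsonWriterConfig {

    // "Trade" stands in for whatever domain object your database reader returns.
    @Bean
    public JsonFileItemWriter<Trade> jsonFileItemWriter() {
        return new JsonFileItemWriterBuilder<Trade>()
                .name("tradeJsonFileItemWriter")
                .resource(new FileSystemResource("target/trades.json"))
                .jsonObjectMarshaller(new JacksonJsonObjectMarshaller<>())
                .build();
    }
}

Wire this writer into a chunk-oriented step together with one of the database readers mentioned above.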
Hope this helps.
You do not need Gson or Jackson libraries if your DB supports JSON.
Example: in SQL Server there is an option to get data out of the DB as a JSON string instead of a resultset.
Reference - https://learn.microsoft.com/en-us/sql/relational-databases/json/format-query-results-as-json-with-for-json-sql-server?view=sql-server-2017
https://learn.microsoft.com/en-us/sql/relational-databases/json/format-nested-json-output-with-path-mode-sql-server?view=sql-server-2017
Example - select (select * from tableName for json path) as jsonString;
This already gives you the output as a JSON string, which you can write to a file.
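For illustration only (plain JDBC rather than Spring Batch; connection URL, credentials, table name, and output path are placeholders), writing that query result to a file could look like:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JsonExport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; the query is the FOR JSON example from above.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://localhost;databaseName=test", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select (select * from tableName for json path) as jsonString")) {
            if (rs.next()) {
                // Write the JSON string returned by SQL Server straight to a file.
                Files.write(Paths.get("output.json"), rs.getString("jsonString").getBytes());
            }
        }
    }
}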

How to generate Recordio from Java object

I am trying to serialize a list of Java objects (POJOs) into RecordIO format. I have seen BeanIO (http://beanio.org/), but it seems to be outdated. Is there any other Java library that could be used, or a different way to do this?
Once the list of objects is serialized, it will be used to train a model with SageMaker.
Solving my own problem: I decided to use Apache Avro instead of BeanIO. Spark allows serializing with Avro (cf. Spark-Avro). This seems to work; however, it did not fit my use case, as I was trying to serialize an array of numbers.
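For illustration of the Avro route (this is not the poster's exact code and uses plain Apache Avro's reflect API rather than Spark-Avro or RecordIO; the Measurement POJO and file name are hypothetical):

import java.io.File;
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroExport {
    // Hypothetical POJO standing in for the objects to serialize.
    public static class Measurement {
        public double value;
        public long timestamp;
        public Measurement() {}
        public Measurement(double value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    public static void main(String[] args) throws Exception {
        // Derive an Avro schema from the POJO via reflection.
        Schema schema = ReflectData.get().getSchema(Measurement.class);
        try (DataFileWriter<Measurement> writer =
                     new DataFileWriter<>(new ReflectDatumWriter<>(schema))) {
            writer.create(schema, new File("measurements.avro"));
            for (Measurement m : Arrays.asList(new Measurement(1.5, 1L), new Measurement(2.5, 2L))) {
                writer.append(m);
            }
        }
    }
}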

How to stop timezone conversion completely in Jackson / Jackson should not apply the local offset in any case

In a Spring REST API project, I have a scenario where I am using Jackson annotations on a date-type field of a class.
There are scenarios where the front end sends data in JSON format, and this is auto-mapped to the POJO on the API (server) side by Jackson internally.
Jackson then automatically converts that date-time value from the JSON data to the server's timezone.
I know there is an option to stop timezone conversion when timezone info is present in the JSON data (using the line below; a sketch of how this setting is typically applied follows this question):
this.disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE);
But I need a configuration/setting with which Jackson does not apply timezone conversion to incoming JSON date data in any case.
Please provide a snippet if you have one; that would be very helpful.
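For reference, a minimal sketch of how the setting mentioned in the question is typically applied (the standalone ObjectMapper setup and the UTC choice are assumptions; this does not by itself guarantee no conversion for every date type and use case):

import java.util.TimeZone;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonTimeZoneConfig {
    public static ObjectMapper objectMapper() {
        ObjectMapper mapper = new ObjectMapper();
        // Do not shift dates that carry an offset/timezone into the context (server) timezone.
        mapper.disable(DeserializationFeature.ADJUST_DATES_TO_CONTEXT_TIME_ZONE);
        // Pin the mapper's default timezone to UTC instead of the JVM/server default,
        // which otherwise affects parsing of date strings without an explicit offset.
        mapper.setTimeZone(TimeZone.getTimeZone("UTC"));
        return mapper;
    }
}

In a Spring Boot application the same two settings would typically be applied to the auto-configured ObjectMapper rather than to a standalone instance.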

SpringXD JSON parser to Oracle DB

I am trying to use SpringXD to stream some JSON metrics data to an Oracle database.
I am using this example from here: SpringXD Example
HTTP call being made: EarthquakeJsonExample
My shell cmd:
stream create earthData --definition "trigger|usgs| jdbc --columns='mag,place,time,updated,tz,url,felt,cdi,mni,alert,tsunami,status,sig,net,code,ids,souces,types,nst,dmin,rms,gap,magnitude_type' --driverClassName=driver --username=username --password --url=url --tableName=Test_Table" --deploy
I would like to capture just the properties portion of this JSON response into the given table columns. I got it to the point where it doesn't give me an error on the hashing, but it just deposits a bunch of nulls into the columns.
I think my problem is the parsing of the JSON itself, since the properties object is actually inside the features array. Can SpringXD distinguish this for me out of the box, or will I need to write a custom processor?
Here is a look at what the database looks like after a successful cmd.
Any advice? I'm new to parsing JSON in this fashion and I'm not really sure how to find more documentation or examples for SpringXD itself.
Here is reference to the documentation: SpringXD Doc
The transformer in the JDBC sink expects a simple document that can be converted to a map of keys/values. You would need to add a transformer upstream, perhaps in your usgs processor or even a separate processor. You could use a #jsonPath expression to extract the properties key and make it the payload.
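As a rough sketch only (this assumes the standard transform processor and that the payload is the full USGS GeoJSON document; the [0] index grabs only the first feature, so handling every feature would need a splitter or a custom processor, and the column list is truncated here for brevity):

stream create earthData --definition "trigger | usgs | transform --expression=#jsonPath(payload,'$.features[0].properties') | jdbc --columns='mag,place,time' --driverClassName=driver --username=username --password --url=url --tableName=Test_Table" --deploy

The transform makes the extracted properties object the message payload, so the JDBC sink can map its keys onto the matching table columns.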