Importing a json file into Cassandra - json

Hi is it possible to import any random json file into cassandra.
The json file is not exported from sstable2json. The json file is from a different website and needs to be imported into cassandra. Please could anyone advise whether this is possible

JSON support won't be introduced until Cassandra 3.0 (see CASSANDRA-7970) and in this case you still need to define a schema for your json data to map to. You do have some other options:
Use maps which sort of map to JSON. Maps can be indexed as of Cassandra 2.1 (CASSANDRA-4511) There is also a good Stack Exchange post about this.
You mention 'any random json file'. You could just have a string column that contains the raw JSON, but then you lose any query-ability of that data.
Come up with some kind of schema for your JSON data and map it to a CQL table and write some code that parses the JSON and writes it to the CQL table mapping to that data. This doesn't sound like an option for you since you want to be able to import any random JSON file.
If you are looking to only do json document storage, you might want to look at more document-oriented solutions instead of a column-oriented solution like cassandra.

Related

Is there a way to generate json data from json schema with multi source json data?

I am currently working with a project to implement data fusion / data integration. Now I can find the technology to get json schemas from json data. But how could I integrate different json attributes into the target json schema. My final target is to implement the json data fusion by configuration on a web page. Is there any solution or any open source implementation I could take referrence? Thank you so much.
Please allow me to explain it more clearly. I could define a target json schema, and its attributes should be filled by other json data's attributes (the json may come from multi source).

Kafka stream append to JSON as event enrichment

I have a producer that writes a json file to the topic to be read by a kafka consumer stream. Its simple key-value pair.
I want to stream the topic and enrich the event by adding/concatenating more JSON key-value rows and publish to another topic.
None of the values or keys have anything in common by the way.
I am probably overthinking this, but how would I get around this logic?
I suppose you want to decode JSON message at the consumer side.
If you are not concerned about schema and but just want to deal with JSON as a Map, you can use Jackson library to read the JSON string as a Map<String,Object>. For this you can add the fields that you want, convert it back to a JSON string and push it to the new topic.
If you want to have a schema, you need to store the information as to which class it is mapping to or the JSON schema or some id that maps to this, then the following could work.
Store the schema info in headers
For example, you can store the JSON schema or Java class name in the headers of the message while producing and write a deserializer to extract that information from the headers and decode it.
The Deserializer#deserialize() has the Headers argument.
default T deserialize(java.lang.String topic,
Headers headers,
byte[] data)
and you can do something like..
objectMapper.readValue(data,
new Class.forName(
new String(headers.lastHeader("classname").value()
))
Use schema registry
Apart from these, there is also a schema registry from Confluent which can maintain different versions of the schema. You would need to run another process for that, though. If you are going to use this, you may want to look at the subject naming strategy and set it to RecordNameStrategy since you have multiple schemas in the same topic.

Processing json data from kafka using structured streaming

I want to convert incoming JSON data from Kafka into a dataframe.
I am using structured streaming with Scala 2.12
Most people add a hard coded schema, but if the json can have additional fields, it requires changing the code base every-time, which is tedious.
One approach is to write it into a file and infer it with but I rather avoid doing that.
Is there any other way to approach this problem?
Edit: Found a way to turn a json string into a dataframe but cant extract it from the stream source, it is possible to extract it?
One way is to store the schema itself in the message headers (not in the key or value).
Though, this increases message size, it will be easy to parse the JSON value without the need for any external resource like a file or a schema registry.
New messages can have new schemas while at the same time old messages can still be processed using their old schema itself, because the schema is within the message itself.
Alternatively, you can version the schemas and include an id for every schema in the message headers (or) a magic byte in the key or value and infer the schema from there.
This approach is followed by Confluent Schema registry. It allows you to basically go through different versions of same schema and see how your schema has evolved over time.
Read the data as string and then convert it to map[string,String], this way you can process the any json without even knowing its schema
based on JavaTechnical answer , the best approach would be to use a schema registry and
avro data instead of json, there is no going around hardcoding a schema (for now).
include your schema name and id as a header and use them to read the schema from the schema registry.
use the from_avro fucntion to turn that data into a df!

JSON with too many characters error

I am building an application that I am dumping the data from one table into a json object. When I try to use this json object via angular I dont see the data. So I took the data from the json object and ran it in a parser and was told that there are too many characters in my json object. What is this limit?
Part 2, if this is the case, any ideas on how I could break up the data? Thanks

Javascript in place of json input step

I am loading data from a mongodb collection to a mysql table through Kettle transformation.
First I extract them using MongodbInput and then I use json input step.
But since json input step has very low performance, I wanted to replace it with a
javacript script.
I am a beginner in Javascript and even though i tried somethings, the kettle javascript script is not recognizing any keywords.
can anyone give me sample code to convert Json data to different columns using javascript?
To solve your problem you need to see three aspects:
Reading from MongoDB
Reading from JSON
Reading from (probably) String
Reading from MongoDB Except if you changed the interface, MongoDB returns not JSON but BSON files (~binary JSON). You need to see the MongoDB documentation about reading and writing BSON: probably something like BSON.to() and BSON.from() but I don't know it by heart.
Reading from JSON Once you have your BSON in JSON format, you can read it using JSON.stringify() which returns a String.
Reading from (probably) String If you want to use the capabilities of JSON (why else would you use JSON?), you also want to use JSON.parse() which returns a JSON object.
My experience is that to send a JSON object from one step to the other, using a String is not a bad idea, i.e. at the end of a JavaScript step, you write your JSON object to a String and at the beginning of the next JavaScript step (can be further down the stream) you parse it back to JSON to work with it.
I hope this answers your question.
PS: writing JavaScript steps requires you to learn JavaScript. You don't have to be a master, but the basics are required. There is no way around it.
you could use the json input step to get the values of this json and put in common rows