In one of our projects, we have to get data from Cassandra tables and return it in JSON format in the response. What are the possible ways to do this? Sometimes we also need to get data from more than one Cassandra table. What options are available for that?
In particular, what are the ways to connect to Cassandra?
You can query your data and retrieve a JSON string with the following type of query:
SELECT JSON keyspace_name, durable_writes FROM system_schema.keyspaces ;
This will return a JSON string that maps each key (column name) to its corresponding value.
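For example, one returned row could look like this (the actual values will depend on your cluster):
{"keyspace_name": "system_schema", "durable_writes": true}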
See the doc here: http://cassandra.apache.org/doc/latest/cql/json.html
Then you could re-insert the JSON string into Cassandra, if that's what you want.
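The insert side works the same way. A minimal sketch, assuming a hypothetical table my_keyspace.my_table whose columns match the JSON keys:
INSERT INTO my_keyspace.my_table JSON '{"keyspace_name": "test", "durable_writes": true}';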
If you need to do this at scale, or as a streaming job, you would want to look at using Spark on top of Cassandra: load your Cassandra data into Spark, use Spark to transform it into JSON strings, and reinsert the result into Cassandra or another database.
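A rough sketch of that pipeline, assuming the spark-cassandra-connector is on the classpath; the keyspace, table, and output path below are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cassandra-to-json")
  .config("spark.cassandra.connection.host", "127.0.0.1")   // your Cassandra contact point
  .getOrCreate()

// Load a Cassandra table into a DataFrame
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
  .load()

// Turn each row into a JSON string
val jsonStrings = df.toJSON

// From here you can write the JSON back to Cassandra, to HDFS, or to another store
jsonStrings.write.text("hdfs://namenode/path/my_table_json")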
I am trying to implement a Snowflake sink connector so that I can load messages arriving on a Kafka topic directly into an appropriate Snowflake table. So far, I have only got to the point of loading the raw JSON into a table with two columns (RECORD_METADATA and RECORD_CONTENT). My goal is to load the JSON messages directly into an appropriate table by flattening them. I already know the structure the table should have, so I could create that table and load directly into it. But I need a way for the load process to flatten the messages.
I have been looking online and through the documentation, but haven't found a clear way to do this.
Is this possible, or do I have to load the raw JSON first and then run transformations to get the table that I want?
Thanks
You have to load the raw JSON first; then you can do transformations.
Each Kafka message is passed to Snowflake in JSON or Avro format. The Kafka connector stores that formatted information in a single column of type VARIANT. The data is not parsed or split into multiple columns in the Snowflake table.
For more information, see the Snowflake documentation on the Kafka connector.
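As a rough illustration of the transformation step (the table and field names below are hypothetical; RECORD_CONTENT is the VARIANT column the connector populates), a flattening query could look like:

CREATE TABLE orders_flat AS
SELECT
    RECORD_CONTENT:order_id::NUMBER  AS order_id,
    RECORD_CONTENT:customer::STRING  AS customer,
    RECORD_CONTENT:amount::FLOAT     AS amount
FROM kafka_raw_orders;

You could run something like this on a schedule to keep the flattened table up to date.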
I have some JSON data which I am getting from a particular API, and I am using PostgreSQL as the database. What is the best way to store the JSON data: using a normal row/column format, or saving the complete JSON document in a single field of type jsonb?
From my experience with Django on PostgreSQL: I usually store the raw JSON in a single field.
Next, I parse it according to my needs.
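For example, if the raw document is stored in a jsonb column (the table and key names here are made up), individual values can also be pulled out directly in SQL:

SELECT payload->>'user_id' AS user_id
FROM api_responses
WHERE payload->>'status' = 'active';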
I have a lot of data (JSON strings) per day (around 150-200B).
I want to insert the JSON into Hadoop. What is the best way to do it? (I need fast inserts and fast queries on the JSON fields.)
Do I need to use Hive and create an Avro schema for my JSON? Or do I need to insert the JSON as a string into a specific column?
If you want to make the data available in Hive to perform mostly aggregations on top of it, I would suggest one of the following methods using Spark.
If you have multi-line JSON files:
val df = spark.read.json(sc.wholeTextFiles("hdfs://your/hdfs/path/*.json").values)
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
If you have single-line JSON files:
val df = spark.read.json("hdfs://your/hdfs/path/*.json")
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
Spark will automatically infer the table schema for you. If you are using a Cloudera distribution, you will be able to read the data using Impala (depending on your Cloudera version, it may not support complex structures).
I want to insert the JSON into Hadoop
You just put it in HDFS. Since you have data over a time period, you'll want to create partitions for Hive to read, e.g.:
jsondata/dt=20180619/foo.json
jsondata/dt=20180620/bar.json
Do I need to use Hive and create an Avro schema for my JSON?
Nope, and I'm not sure where you got mixed up between Avro and JSON. That said, if you converted the JSON into Avro with a defined schema, it would help Hive query performance, since querying a structured binary format is better than parsing JSON text.
do I need to insert the JSON as a string into a specific column?
Not recommended. You could, but then you lose the ability to query the individual JSON fields the way Hive's JSON SerDe support lets you.
Don't forget that with the above directory structure you'll need PARTITIONED BY (dt STRING) in the table definition. And in order for partitions to be created for existing files, you'll need to manually (and daily) run an MSCK REPAIR TABLE command.
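Putting the pieces together, a sketch of what the table definition could look like; the location, columns, and SerDe class are assumptions (the hcatalog JSON SerDe is one option, but your distribution may bundle a different one):

CREATE EXTERNAL TABLE jsondata (
  id STRING,
  payload STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/you/jsondata';

-- register newly added dt=YYYYMMDD directories
MSCK REPAIR TABLE jsondata;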
I have JSON as a string (from Kafka)
Don't use Spark for that (at least, don't reinvent the wheel). My suggestion would be to use Confluent's HDFS Kafka Connect sink, which comes with Hive table creation support.
I am new to the PostgreSQL database (9.5.0). I need to store my JSON data in PostgreSQL, so I need to create a table for it. How can I create a table to store the JSON object that is submitted from my front end when the submit button is clicked, so that it ends up in that table? My sample JSON object is as follows (key/value pairs, which can also include file names). Please help me.
{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}
PostgreSQL has the json and jsonb (b for binary) data types. You can use these in your table definitions just like any other data type. If you plan to merely store the json data then the json data type is best. If, on the other hand, you plan to do a lot of analysis with the json data inside the PG server, then the jsonb data type is best.
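A minimal sketch for the sample object above (the table and column names are only placeholders):

CREATE TABLE form_submissions (
    id      serial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO form_submissions (payload)
VALUES ('{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}');

-- pull a single key back out
SELECT payload->>'key4_file_name' FROM form_submissions;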
Is it possible to read MongoDB data with the Hadoop connector but save the output as a MySQL table? I want to read some data from a MongoDB collection with Hadoop, process it with Hadoop, and output it NOT back into MongoDB but into MySQL.
I have used it for fetching data from MongoDB as input and storing the result at a different MongoDB address. For that you need to specify something like:
MongoConfigUtil.setInputURI(discussConf,"mongodb://ipaddress1/Database.Collection");
MongoConfigUtil.setOutputURI(discussConf,"mongodb://ipaddress2/Database.Collection");
For MongoDB to MySQL:
My suggestion is that you can write normal Java code to insert whatever data you need into MySQL; that code can go in your reduce or map function.
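For illustration only, a stripped-down JDBC insert of the kind you could call from the reduce (or map) method; the connection string, table, and columns are hypothetical, and in a real job you would open the connection once (for example in setup()) rather than per record:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class MySqlWriter {

    // Hypothetical MySQL endpoint and table; replace with your own
    public static void insertResult(String word, long total) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://ipaddress3:3306/resultdb", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO results (word, total) VALUES (?, ?)")) {
            ps.setString(1, word);
            ps.setLong(2, total);
            ps.executeUpdate();
        }
    }
}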