I have some JSON data that I am getting from a particular API, and I am using PostgreSQL as the database. What is the best way to store the JSON data: breaking it out into a row/column format, or saving the complete JSON document in a single field of type jsonb?
From my experience with Django on PostgreSQL: I usually store the raw JSON in a single field.
Next, I parse it according to my needs.
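A minimal sketch of that approach in plain SQL, in case it helps (the table and column names are only placeholders):

-- The whole API payload goes into one jsonb column.
CREATE TABLE api_responses (
    id         bigserial PRIMARY KEY,
    fetched_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb NOT NULL
);

-- Store the raw response as-is.
INSERT INTO api_responses (payload)
VALUES ('{"key1": "value1", "key2": "value2"}');

-- Parse it later with the jsonb operators, e.g. containment and extraction.
SELECT payload->>'key1' AS key1
FROM api_responses
WHERE payload @> '{"key2": "value2"}';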
I am trying to implement a Snowflake sink connector so that I can load messages arriving on a Kafka topic directly into an appropriate Snowflake table. So far, I have only gotten to the point of loading the raw JSON into a table with two columns (RECORD_METADATA and RECORD_CONTENT). My goal is to load the JSON messages directly into an appropriate table by flattening them. I know what the structure of the table should be, so I could create that table and load directly into it, but I need a way for the load process to flatten the messages.
I have been looking online and through the documentation, but haven't found a clear way to do this.
Is it possible or do I have to first load the raw json and then do transformations to get the table that I want?
Thanks
You have to load the raw JSON first; then you can do the transformations.
Each Kafka message is passed to Snowflake in JSON format or Avro format. The Kafka connector stores that formatted information in a single column of type VARIANT. The data is not parsed, and the data is not split into multiple columns in the Snowflake table.
For more information you can read here
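As a rough illustration of the "load raw, then transform" pattern, assuming a hypothetical target table and the two-column landing table created by the connector:

-- Hypothetical target table with the flattened structure you want.
CREATE TABLE orders_flat (
    order_id   STRING,
    amount     NUMBER(10,2),
    created_at TIMESTAMP_NTZ
);

-- Pull the fields out of the VARIANT column the connector populated.
INSERT INTO orders_flat
SELECT
    record_content:order_id::STRING,
    record_content:amount::NUMBER(10,2),
    record_content:created_at::TIMESTAMP_NTZ
FROM kafka_landing_table;  -- the two-column table loaded by the connector

In practice you could schedule a statement like this (for example with a Snowflake task) so the flattening runs after each load.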
I can't seem to find out whether this is implemented in MySQL; I can only find information relating to PostgreSQL.
So, can you use JSONB in MySQL, or is it just JSON?
The main difference between the json and jsonb types in Postgres is that the latter is stored in a decomposed binary format. From the MySQL documentation, it appears that MySQL's JSON type already has at least some of the behavior of Postgres' jsonb:
The JSON data type provides these advantages over storing JSON-format strings in a string column:
Optimized storage format. JSON documents stored in JSON columns are converted to an internal format that permits quick read access to document elements. When the server later must read a JSON value stored in this binary format, the value need not be parsed from a text representation. The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index without reading all values before or after them in the document.
If I recall correctly, the MySQL JSON functions will still work correctly on JSON text (e.g. stored as varchar), so perhaps MySQL's analogue to Postgres' json would simply be storing the JSON content as plain text.
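To illustrate (the table and column names here are made up), MySQL's JSON functions work on both a native JSON column and JSON kept as plain text:

-- Native JSON column: stored in MySQL's internal binary format.
CREATE TABLE events (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    doc     JSON,
    doc_txt VARCHAR(1000)  -- the same content kept as plain text, for comparison
);

INSERT INTO events (doc, doc_txt)
VALUES ('{"type": "click", "x": 10}', '{"type": "click", "x": 10}');

-- JSON_EXTRACT (and the -> / ->> shorthand) work on both columns.
SELECT doc->>'$.type'                  AS from_json_column,
       JSON_EXTRACT(doc_txt, '$.type') AS from_text_column
FROM events;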
I have a lot of data (JSON string) per day (around 150-200B).
I want to insert the JSON into Hadoop; what is the best way to do it? (I need fast inserts and fast queries on the JSON fields.)
Do I need to use Hive and create an Avro schema for my JSON? Or do I need to insert the JSON as a string into a specific column?
If you want to make the data available in Hive to perform mostly aggregations on top of it, I would suggest one of the following methods using Spark.
If you have multi-line JSON files:
val df = spark.read.json(sc.wholeTextFiles("hdfs://your/hdfs/path/*.json").values)
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
If you have single-line JSON files:
val df = spark.read.json("hdfs://ypur/hdfs/path/*.json")
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
Spark will automatically infer the table schema for you. If you are using a Cloudera distribution, you will also be able to read the data with Impala (depending on your Cloudera version, it may not support complex structures).
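Once the table has been written, it can be queried from Hive or Impala like any other table; a hypothetical example (the column names depend on whatever schema Spark inferred):

-- Nested JSON objects become struct columns; access them with dot notation.
SELECT meta.source AS source,
       COUNT(*)    AS events
FROM yourhivedb.tablename
GROUP BY meta.source;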
I want to insert the JSON into Hadoop
You just put it in HDFS... Since you have data over a time period, you'll want to create partitions for Hive to read:
jsondata/dt=20180619/foo.json
jsondata/dt=20180620/bar.json
Do I need to use Hive and create an Avro schema for my JSON?
Nope. Not sure where you got mixed up between Avro and JSON. That said, if you could convert the JSON into Avro with a defined schema, it would help Hive query performance, since querying structured binary data is better than parsing JSON text.
do I need to insert the JSON as a string into a specific column?
Not recommended. You could, but then you would not be able to query it via Hive's JSON SerDe support.
Don't forget that with the above structure you'll need PARTITIONED BY (dt STRING), and in order for partitions to be created on the table for existing files, you'll need to run an MSCK REPAIR TABLE command manually (and daily).
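A sketch of the corresponding DDL, assuming the directory layout above and made-up column names:

-- External table over the raw JSON files; one partition per day.
CREATE EXTERNAL TABLE jsondata (
    id      STRING,
    payload STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/path/to/jsondata';

-- Pick up partitions for files that are already in place (run this daily).
MSCK REPAIR TABLE jsondata;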
I have JSON as a string (from Kafka)
Don't use Spark for that (at least, don't reinvent the wheel). My suggestion would be to use Confluent's HDFS Kafka Connect sink, which comes with Hive table creation support.
In one of our projects, we have to get data from Cassandra tables and return it in JSON format in the response. What are the possible ways to do this? Sometimes we need to get data from more than one Cassandra table. What options are available for that,
and, in particular, what are the ways to connect to Cassandra?
You can query your data and retrieve a JSON string with the following type of queries:
SELECT JSON keyspace_name, durable_writes FROM system_schema.keyspaces;
This will return a JSON string that maps the keys (column names) to the corresponding values.
See the doc here: http://cassandra.apache.org/doc/latest/cql/json.html
Then you could re-insert the JSON string into Cassandra, if that's what you want.
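For the re-insert side, CQL can also accept JSON directly; a small sketch with a made-up table:

-- Hypothetical table.
CREATE TABLE ks.users (
    id   uuid PRIMARY KEY,
    name text,
    age  int
);

-- INSERT ... JSON maps the JSON keys to the column names.
INSERT INTO ks.users JSON
  '{"id": "c37d661d-7e61-49ea-96a5-68c34e83db3a", "name": "alice", "age": 30}';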
If you need to do this at scale, or as a streaming job, you would want to look at using Spark on top of Cassandra: load your Cassandra data into Spark, use Spark to transform it into a JSON string, and reinsert it into Cassandra or another database.
I am new to the PostgreSQL database (9.5.0). I need to store my JSON data in a PostgreSQL database, so I need to create a table for it. How can I create a table to store my JSON object when the submit button is clicked on my front end, so that it gets saved in the table? My sample JSON object is as follows (key/value pairs; it also contains file names). Please help me.
{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}
PostgreSQL has the json and jsonb ("b" for binary) data types. You can use these in your table definitions just like any other data type. If you plan to merely store the JSON data, then the json data type is best. If, on the other hand, you plan to do a lot of analysis of the JSON data inside the PostgreSQL server, then the jsonb data type is best.
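A minimal sketch for your sample object (the table and column names are only suggestions):

-- One row per submitted form; the whole object lives in a jsonb column.
CREATE TABLE submissions (
    id           serial PRIMARY KEY,
    submitted_at timestamptz NOT NULL DEFAULT now(),
    data         jsonb NOT NULL
);

-- What your back end would run when the submit button is clicked.
INSERT INTO submissions (data)
VALUES ('{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}');

-- Example lookup on one of the keys.
SELECT data->>'key4_file_name' AS file_name
FROM submissions;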