Create table structure in PostgreSQL from JSON file

I would like to know if there is a way to create a table structure in PostgreSQL from a JSON file. The background: I exported JSON data from MongoDB, where it follows a schema, and I would now like to recreate that structure as a table in PostgreSQL so I can then import the data from the JSON file into it. Is there a way to do that based on the JSON file, or should I just create the table and its structure myself using the Postgres JSON type and then import the data? I'm just looking for opinions, suggestions, or articles related to this; any help would be really appreciated. Thanks.
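If I go the manual route, I imagine something like this minimal sketch (assuming the export is newline-delimited JSON, as mongoexport produces, and using a made-up table name):

-- one jsonb column holding each exported Mongo document
CREATE TABLE mongo_import (
    doc jsonb
);

-- each input line becomes one row; from psql, \copy reads a client-side file
COPY mongo_import (doc) FROM '/path/to/export.json';
-- caveat: COPY's text format interprets backslashes, so documents containing
-- backslash escapes may need CSV mode or pre-processing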

Related

Insert JSON into Hadoop

I have a lot of data (JSON strings) per day (around 150-200B).
I want to insert the JSON into Hadoop. What is the best way to do it (I need fast inserts and fast queries on the JSON fields)?
Do I need to use Hive and create an Avro schema for my JSON? Or do I need to insert the JSON as a string into a specific column?
If you want to make the data available in Hive to perform mostly aggregations on top of it, I would suggest one of the following methods using Spark.
If you have multi-line JSON files (one document spread across several lines):
val df = spark.read.json(sc.wholeTextFiles("hdfs://your/hdfs/path/*.json").values)
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
If you have single-line JSON files (one document per line):
val df = spark.read.json("hdfs://your/hdfs/path/*.json")
df.write.format("parquet").mode("overwrite").saveAsTable("yourhivedb.tablename")
Spark will automatically infer the table schema for you. If you are using a Cloudera distribution, you will be able to read the data using Impala (depending on your Cloudera version, it may not support complex structures).
I want to insert the JSON into Hadoop
You just put it in HDFS... Since you have data over a time period, you'll want to create partitions for Hive to read, for example:
jsondata/dt=20180619/foo.json
jsondata/dt=20180620/bar.json
Do I need to use Hive and create an Avro schema for my JSON?
Nope. Not sure where you got mixed up between Avro and JSON. Now, if you could convert the JSON into Avro with a defined schema, that would help improve Hive queries, since querying structured binary is better than parsing JSON text.
do I need to insert the JSON as a string into a specific column?
Not recommended. You could, but then you cannot query into the fields; Hive's JSON SerDe support is the way to expose them as queryable columns.
Don't forget that with the above layout you'll need PARTITIONED BY (dt STRING) in the table definition. And in order for partitions to be created on the table for existing files, you'll need to manually (and daily) run an MSCK REPAIR TABLE command.
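Putting that together, a rough HiveQL sketch (the column names and location are made up, and the HCatalog JsonSerDe may need its jar registered depending on your setup):

CREATE EXTERNAL TABLE jsondata (
    foo STRING,
    bar BIGINT
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/you/jsondata';

-- pick up partition directories such as dt=20180619 that already exist
MSCK REPAIR TABLE jsondata;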
I have JSON as a string (from Kafka)
Don't use Spark for that (at least, don't reinvent the wheel). My suggestion would be to use Confluent's HDFS Kafka Connect connector, which comes with Hive table creation support.

DataFrames reading JSON files with changing schema

I am currently reading JSON files that have a variable schema in each file. We use the following logic: first we read a base schema file that has all the fields, then we read the actual data. We take this approach because Spark infers the schema from the first file it reads, and no single data file contains all the fields. So we are just tricking the code into understanding the schema first and then reading the actual data.
val rdd = sc.textFile("baseSchemaWithAllColumns.json").union(sc.textFile("pathToActualFile.json"))
val df = sqlContext.read.json(rdd)
// create the DataFrame, then save it as a temp table and query it
I know the above is just a workaround, and we need a cleaner solution to accept JSON files with varying schemas.
I understand that there are two other ways to discover the schema, as mentioned here.
However, for those it looks like we need to parse the JSON and map each field to the data received.
There also seems to be an option for Parquet schema merging, but that looks like it applies mostly when reading into the DataFrame - or am I missing something here?
What is the best way to read a changing schema of JSON files and work with Spark SQL for querying?
Can I just read the JSON file as-is, save it as a temp table, and then use mergeSchema=true while querying?

How to create a table to store JSON object data in a PostgreSQL database?

I am new to the PostgreSQL database (9.5.0). I need to store my JSON data in PostgreSQL, so I need to create a table for it. How can I create a table that stores my JSON object when the submit button is clicked on my front-end, so that it ends up in the database table? My sample JSON object is as follows (key/value pairs, including file references). Please help me.
{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}
PostgreSQL has the json and jsonb (b for binary) data types. You can use these in your table definitions just like any other data type. If you plan to merely store the json data then the json data type is best. If, on the other hand, you plan to do a lot of analysis with the json data inside the PG server, then the jsonb data type is best.
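As a minimal illustration (the table and column names here are made up):

CREATE TABLE form_submissions (
    id      serial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO form_submissions (payload)
VALUES ('{"key1":"value1","key2":"value2","key3":"value3","key4_file_name":"Test.txt"}');

-- ->> extracts a field as text
SELECT payload->>'key4_file_name' FROM form_submissions;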

What is a standard way of importing XML data into MySQL

I did some research on how to import XML data into MySQL possibly with the Workbench.
However, I was unable to find an easy tutorial on how to do that. I have 6 XML files, all containing data and no schema.
From what I understood, the process consists of 2 parts:
1. Making the table (this is the part that is unclear to me) - is there a way to make the table from only the XML data file? (See the sketch after the example below.)
2. Importing the data into the MySQL table. I think I understand this one; it can be done by executing this query:
LOAD XML LOCAL INFILE '/pathtofile/file.xml'
INTO TABLE my_tablename(personal_number, firstname, ...);
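For point 1, from what I can tell MySQL cannot derive a table definition from an XML data file by itself, so the CREATE TABLE has to be written by hand after inspecting the elements - something like this sketch, with guessed column names:

CREATE TABLE my_tablename (
    personal_number INT,
    firstname       VARCHAR(100)
    -- one column per XML element/attribute you want to load
);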
I've done this before by reading the XML files into a MySQL table where the column type was set to BLOB.

Convert database table structure to XSD format

Is there any way I can convert a table structure in a MySQL or Oracle database to XSD (XML Schema Definition) format?
Thank you.
Use XML Spy.
http://williamjxj.wordpress.com/2011/05/25/1004/
Yes, but it's fairly complicated. You'll want to run the query SHOW CREATE TABLE <tablename> and it will return the full table creation statement (in tidy CREATE TABLE syntax).
Then you'll want to parse each line of the CREATE TABLE syntax in your language of choice. Thankfully the fields are neatly separated by newlines.
The types should be fairly easy to map to XSD types.
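For instance, with a hypothetical table:

SHOW CREATE TABLE employees;

-- returns the full definition to parse, along the lines of:
-- CREATE TABLE `employees` (
--   `id` int(11) NOT NULL,
--   `name` varchar(100) DEFAULT NULL,
--   PRIMARY KEY (`id`)
-- )
-- here int maps naturally to xs:int and varchar to xs:string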
Where it gets complicated is when you're parsing foreign key relationships - then you'll need to define custom types in your XSD and reference them accordingly.
It really comes down to your implementation. If you're looking for a portable data format that you can easily import into and export from your database, then there are a number of other solutions.