I am using the spark-redshift library, and I want to load the data in either array or JSON format, as described in this link: querying-json-data-in-amazon-redshift.
I am reading data from MongoDB into a DataFrame, converting the DataFrame to JSON, and then pushing it to Redshift. The load completes without any errors or exceptions, but in Redshift the column values show up as NULL.
I am using the spark-mongodb connector, and I want to know how I can store the MongoDB data in array or JSON format. Is it possible to pull MongoDB data and load it into Redshift in either array or JSON format?
I am trying to access the data inside an object. The table is a PySpark DataFrame, but some column values are in JSON format. I need to access the date from them and convert it into a meaningful format. Any help or ideas would be a great relief.
This is what I'm working with:
I was able to extract the data that is in array format using:
data_df=df_deidentifieddocuments_tst.select("_id",explode("annotationId").alias("annotationId")).select("_id","annotationId.*")
The above code doesn't work for the date field, as it throws a type mismatch error:
AnalysisException: cannot resolve 'explode(createdAt)' due to data type mismatch: input to function explode should be array or map type, not struct<$date:bigint>;
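The error says `createdAt` is a `struct<$date:bigint>`, not an array, so it can be selected directly (e.g. `df.select(col("createdAt.$date"))`) rather than exploded. The `bigint` is a Unix epoch in milliseconds, as in MongoDB extended JSON. A minimal pure-Python sketch of that millisecond-to-timestamp conversion (the field value below is made up for illustration):

```python
from datetime import datetime, timezone

def mongo_date_to_iso(millis):
    # MongoDB extended JSON stores dates as {"$date": <epoch milliseconds>};
    # divide by 1000 to get seconds and interpret as UTC.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()

record = {"createdAt": {"$date": 1609315126000}}  # hypothetical value
print(mongo_date_to_iso(record["createdAt"]["$date"]))  # 2020-12-30T07:58:46+00:00
```

In Spark itself, the equivalent step is converting the milliseconds column to a timestamp after selecting `createdAt.$date`.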
I have the following JSON stored in S3:
{"data":"this is a test for firehose"}
I have created the table test_firehose with a varchar column data, and a file format called JSON with type JSON and the rest left at default values. I want to copy the content from S3 to Snowflake, and I have tried the following statement:
COPY INTO test_firehose
FROM 's3://s3_bucket/firehose/2020/12/30/09/tracking-1-2020-12-30-09-38-46'
FILE_FORMAT = 'JSON';
And I receive the error:
SQL compilation error: JSON file format can produce one and only one column of type
variant or object or array. Use CSV file format if you want to load more than one column.
How could I solve this? Thanks
If you want to keep your data as JSON (rather than just as text) then you need to load it into a column with a datatype of VARIANT, not VARCHAR. For example, recreate the table with `data VARIANT` and run the same COPY statement again.
I need to import JSON data into a WordPress database. I found the correct table in the database, but the example data present in the table is not in normal JSON format.
I need to import:
{"nome": "Pippo","cognome": "Paperino"}
but the example data in the table is:
a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}
How can I convert my JSON to this "WP JSON" format?
The data is serialized; that's why it looks weird. You can use maybe_unserialize() in WordPress; this function will unserialize the data if it was serialized.
https://developer.wordpress.org/reference/functions/maybe_unserialize/
Some functions serialize data before saving it in WordPress, and some will also unserialize it when pulling from the DB. So depending on how you save the data and how you later extract it, you might end up with serialized data.
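The conversion the question asks for can also be sketched outside PHP. A minimal Python sketch, assuming a flat object with only string keys and values (real WordPress data can nest and mix types, so letting PHP's serialize()/maybe_unserialize() do the work is the safer route):

```python
import json

def php_serialize_flat(d):
    # PHP's serialize() encodes a flat associative array as
    # a:<count>:{s:<bytelen>:"<key>";s:<bytelen>:"<value>"; ...}
    # Lengths are byte lengths, hence the .encode() calls.
    parts = []
    for key, value in d.items():
        parts.append('s:%d:"%s";s:%d:"%s";'
                     % (len(key.encode("utf-8")), key,
                        len(value.encode("utf-8")), value))
    return "a:%d:{%s}" % (len(d), "".join(parts))

data = json.loads('{"nome": "Pippo", "cognome": "Paperino"}')
print(php_serialize_flat(data))
# a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}
```

This reproduces exactly the example row from the table above.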
I've tried loading simple JSON records from a file into Hive tables, as shown below. Each JSON record is on a separate line.
{"Industry":"Manufacturing","Phone":null,"Id":null,"type":"Account","Name":"Manufacturing"}
{"Industry":null,"Phone":"(738) 244-5566","Id":null,"type":"Account","Name":"Sales"}
{"Industry":"Government","Phone":null,"Id":null,"type":"Account","Name":"Kansas City Brewery & Co"}
But I couldn't find any SerDe that loads an array of comma-separated JSON records into a Hive table. The input is a file containing JSON records as shown below:
[{"Industry":"Manufacturing","Phone":null,"Id":null,"type":"Account","Name":"Manufacturing"},{"Industry":null,"Phone":"(738) 244-5566","Id":null,"type":"Account","Name":"Sales"},{"Industry":"Government","Phone":null,"Id":null,"type":"Account","Name":"Kansas City Brewery & Co"}]
Can someone suggest a SerDe that can parse this JSON file?
Thanks
You can check this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
Another related post: Parse json arrays using HIVE
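An alternative to a special SerDe is to pre-process the file: flatten the JSON array into newline-delimited records (one object per line), the layout the per-line SerDes already handle. A minimal Python sketch, using an inline string in place of the actual file:

```python
import json

# A shortened stand-in for the file contents from the question.
array_text = ('[{"Industry":"Manufacturing","Phone":null,"Id":null,'
              '"type":"Account","Name":"Manufacturing"},'
              '{"Industry":null,"Phone":"(738) 244-5566","Id":null,'
              '"type":"Account","Name":"Sales"}]')

# Re-serialize each element of the array onto its own line.
ndjson = "\n".join(json.dumps(record) for record in json.loads(array_text))
print(ndjson)
```

The same loop works on the real file by reading it with `json.load` and writing one `json.dumps(record)` per line before pointing Hive at the output.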
I am using Spark Jobserver https://github.com/spark-jobserver/spark-jobserver and Apache Spark for some analytic processing.
I am receiving back the following structure from the jobserver when a job finishes:
{
  "status": "OK",
  "result": [
    "[17799.91015625,null,hello there how areyou?]",
    "[50000.0,null,Hi, im fine]",
    "[0.0,null,All good]"
  ]
}
The result doesn't contain valid JSON, as explained here:
https://github.com/spark-jobserver/spark-jobserver/issues/176
So I'm trying to convert the returned structure into valid JSON. However, I can't simply insert quotes into the result string based on the comma delimiter, as the values themselves sometimes contain commas.
How can I convert a Spark SQL row into a JSON object in this situation?
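One way around the quoting problem is to never splice quotes into the bracketed string at all: serialize each row's raw values with a JSON library, which escapes embedded commas and quotes itself. A minimal pure-Python sketch using one of the rows from the result above:

```python
import json

# Raw values for one row; json.dumps handles the embedded comma in the
# string safely, which naive quote-insertion at every "," cannot.
row_values = [50000.0, None, "Hi, im fine"]
encoded = json.dumps(row_values)
print(encoded)  # [50000.0, null, "Hi, im fine"]

# Round-trips back to the same values.
assert json.loads(encoded) == row_values
```

The key point is to serialize from the values, not patch the already-flattened string.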
I actually found a better way in the end: from Spark 1.3.0 onwards you can use .toJSON on a DataFrame to convert it to JSON:
df.toJSON.collect()
To output a DataFrame's schema as JSON, you can use:
df.schema.json