I am using the spark-redshift library, and I want to load the data in either array or JSON format, as described in this link: querying-json-data-in-amazon-redshift.
I am reading data from MongoDB into a DataFrame, converting the DataFrame to JSON, and then pushing it to Redshift. The load completes without any errors or exceptions, but in Redshift the column values show up as NULL.
I am using the spark-mongodb connector, and I want to know how I can store the MongoDB data in array or JSON format. Is it possible to pull MongoDB data and load it into Redshift in either array or JSON format?
I am trying to access the data inside an object. The table is a PySpark DataFrame, but some column values are in JSON format. I need to access the date from them and convert it into a meaningful format. Any help or ideas would be a great relief.
This is what I'm working with:
I was able to extract the data that is in array format using:
data_df=df_deidentifieddocuments_tst.select("_id",explode("annotationId").alias("annotationId")).select("_id","annotationId.*")
The above code doesn't work for the date field, as it throws a type mismatch error:
AnalysisException: cannot resolve 'explode(createdAt)' due to data type mismatch: input to function explode should be array or map type, not struct<$date:bigint>;
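The error says `createdAt` is a `struct<$date:bigint>`, not an array, so it can be selected directly (e.g. `df.select(col("createdAt.$date"))`) rather than exploded. The `bigint` is a Unix epoch in milliseconds, as in MongoDB extended JSON. A minimal pure-Python sketch of that millisecond-to-timestamp conversion (the field value below is made up for illustration):

```python
from datetime import datetime, timezone

def mongo_date_to_iso(millis):
    # MongoDB extended JSON stores dates as {"$date": <epoch milliseconds>};
    # divide by 1000 to get seconds and interpret as UTC.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()

record = {"createdAt": {"$date": 1609315126000}}  # hypothetical value
print(mongo_date_to_iso(record["createdAt"]["$date"]))  # 2020-12-30T07:58:46+00:00
```

In Spark itself, the equivalent step is converting the milliseconds column to a timestamp after selecting `createdAt.$date`.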
I have the following JSON stored in S3:
{"data":"this is a test for firehose"}
I have created the table test_firehose with a varchar column data, and a file format called JSON with type JSON and the rest left at default values. I want to copy the content from S3 to Snowflake, and I have tried the following statement:
COPY INTO test_firehose
FROM 's3://s3_bucket/firehose/2020/12/30/09/tracking-1-2020-12-30-09-38-46'
FILE_FORMAT = 'JSON';
And I receive the error:
SQL compilation error: JSON file format can produce one and only one column of type
variant or object or array. Use CSV file format if you want to load more than one column.
How could I solve this? Thanks
If you want to keep your data as JSON (rather than just as text) then you need to load it into a column with a datatype of VARIANT, not VARCHAR. For example, recreate the table with `data VARIANT` and run the same COPY statement again.
I need to import JSON data into a WordPress database. I found the correct table in the database, but the example data present in the table is not in normal JSON format.
I need to import:
{"nome": "Pippo","cognome": "Paperino"}
but the example data in the table is:
a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}
How can I convert my JSON to this "WP JSON" format?
The data is serialized; that's why it looks weird. You can use maybe_unserialize() in WordPress; this function will unserialize the data if it was serialized.
https://developer.wordpress.org/reference/functions/maybe_unserialize/
Some functions serialize data before saving it in WordPress, and some will also unserialize it when pulling from the DB. So depending on how you save the data and how you later extract it, you might end up with serialized data.
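The conversion the question asks for can also be sketched outside PHP. A minimal Python sketch, assuming a flat object with only string keys and values (real WordPress data can nest and mix types, so letting PHP's serialize()/maybe_unserialize() do the work is the safer route):

```python
import json

def php_serialize_flat(d):
    # PHP's serialize() encodes a flat associative array as
    # a:<count>:{s:<bytelen>:"<key>";s:<bytelen>:"<value>"; ...}
    # Lengths are byte lengths, hence the .encode() calls.
    parts = []
    for key, value in d.items():
        parts.append('s:%d:"%s";s:%d:"%s";'
                     % (len(key.encode("utf-8")), key,
                        len(value.encode("utf-8")), value))
    return "a:%d:{%s}" % (len(d), "".join(parts))

data = json.loads('{"nome": "Pippo", "cognome": "Paperino"}')
print(php_serialize_flat(data))
# a:2:{s:4:"nome";s:5:"Pippo";s:7:"cognome";s:8:"Paperino";}
```

This reproduces exactly the example row from the table above.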
I've tried loading simple JSON records from a file into Hive tables, as shown below. Each JSON record is on a separate line.
{"Industry":"Manufacturing","Phone":null,"Id":null,"type":"Account","Name":"Manufacturing"}
{"Industry":null,"Phone":"(738) 244-5566","Id":null,"type":"Account","Name":"Sales"}
{"Industry":"Government","Phone":null,"Id":null,"type":"Account","Name":"Kansas City Brewery & Co"}
But I couldn't find any SerDe that loads an array of comma-separated JSON records into a Hive table. The input is a file containing JSON records as shown below:
[{"Industry":"Manufacturing","Phone":null,"Id":null,"type":"Account","Name":"Manufacturing"},{"Industry":null,"Phone":"(738) 244-5566","Id":null,"type":"Account","Name":"Sales"},{"Industry":"Government","Phone":null,"Id":null,"type":"Account","Name":"Kansas City Brewery & Co"}]
Can someone suggest a SerDe that can parse this JSON file?
Thanks
You can check this SerDe: https://github.com/rcongiu/Hive-JSON-Serde
Another related post: Parse json arrays using HIVE
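An alternative to a special SerDe is to pre-process the file: flatten the JSON array into newline-delimited records (one object per line), the layout the per-line SerDes already handle. A minimal Python sketch, using an inline string in place of the actual file:

```python
import json

# A shortened stand-in for the file contents from the question.
array_text = ('[{"Industry":"Manufacturing","Phone":null,"Id":null,'
              '"type":"Account","Name":"Manufacturing"},'
              '{"Industry":null,"Phone":"(738) 244-5566","Id":null,'
              '"type":"Account","Name":"Sales"}]')

# Re-serialize each element of the array onto its own line.
ndjson = "\n".join(json.dumps(record) for record in json.loads(array_text))
print(ndjson)
```

The same loop works on the real file by reading it with `json.load` and writing one `json.dumps(record)` per line before pointing Hive at the output.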
I am using Spark Jobserver https://github.com/spark-jobserver/spark-jobserver and Apache Spark for some analytic processing.
I am receiving back the following structure from the jobserver when a job finishes:
{
  "status": "OK",
  "result": [
    "[17799.91015625,null,hello there how areyou?]",
    "[50000.0,null,Hi, im fine]",
    "[0.0,null,All good]"
  ]
}
The result doesn't contain valid JSON, as explained here:
https://github.com/spark-jobserver/spark-jobserver/issues/176
So I'm trying to convert the returned structure into valid JSON. However, I can't simply insert quotes into the result string based on the comma delimiter, as the values themselves sometimes contain commas.
How can I convert a Spark SQL row into a JSON object in this situation?
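One way around the quoting problem is to never splice quotes into the bracketed string at all: serialize each row's raw values with a JSON library, which escapes embedded commas and quotes itself. A minimal pure-Python sketch using one of the rows from the result above:

```python
import json

# Raw values for one row; json.dumps handles the embedded comma in the
# string safely, which naive quote-insertion at every "," cannot.
row_values = [50000.0, None, "Hi, im fine"]
encoded = json.dumps(row_values)
print(encoded)  # [50000.0, null, "Hi, im fine"]

# Round-trips back to the same values.
assert json.loads(encoded) == row_values
```

The key point is to serialize from the values, not patch the already-flattened string.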
I actually found a better way in the end: from Spark 1.3.0 onwards you can use .toJSON on a DataFrame to convert it to JSON:
df.toJSON.collect()
To output a DataFrame's schema as JSON, you can use:
df.schema.json