Flatten a JSON string to a nested table in BigQuery

I have a JSON string and I managed to transpose it into the following table:
(screenshot: SQL I used and the outcome)
However, there is a sub-tree in the JSON string called 'activequeue'. It is an array, and with this SQL query I'm not getting a nested table attached to the column named 'activequeue' with columns such as 'id', etc.
Any ways I can achieve this?
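One way (a minimal sketch, not the asker's original query): pull the array out with JSON_EXTRACT_ARRAY and rebuild it as a repeated STRUCT column. The src CTE, the payload column, and the sample JSON below are made up for illustration; only 'activequeue' and 'id' come from the question.

-- Hypothetical input: payload is a STRING column holding the JSON.
WITH src AS (
  SELECT '{"name":"q1","activequeue":[{"id":1},{"id":2}]}' AS payload
)
SELECT
  JSON_EXTRACT_SCALAR(payload, '$.name') AS name,
  -- Turn the 'activequeue' JSON array into an ARRAY<STRUCT<id STRING>>,
  -- i.e. a nested (repeated) column with an 'id' field.
  ARRAY(
    SELECT AS STRUCT JSON_EXTRACT_SCALAR(item, '$.id') AS id
    FROM UNNEST(JSON_EXTRACT_ARRAY(payload, '$.activequeue')) AS item
  ) AS activequeue
FROM src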

Related

How to store dynamically generated JSON object in Big Query Table?

I have a use case to store dynamic JSON objects in a column in Big Query. The schema of the object is dynamically generated by the source and not known beforehand. The number of key value pairs in the object can differ as well, as shown below.
Example JSON objects:
{"Fruit":"Apple","Price":"10","Sale":"No"}
{"Movie":"Avatar","Genre":"Fiction"}
I could achieve the same in Hive by defining the column as a map<string, string> object, and I could query the data in the column like col_name["Fruit"] or col_name["Movie"] for the corresponding row.
Is there an equivalent to this usage in BigQuery? I came across the 'RECORD' data type, but its schema needs to be the same for all the objects in the column.
Note: Storing the column as string datatype is not an option as the users need to query the data on the keys directly without parsing after retrieving the data.
Storing the data as a JSON string seems to be the only way to implement your requirement, at the moment. As a workaround, you can create a JavaScript UDF that parses the JSON string and extracts the necessary information. Below is a sample UDF.
CREATE TEMP FUNCTION extract_from_json(json STRING, key STRING)
RETURNS STRING
LANGUAGE js AS """
  // Parse the JSON string and return the value stored under `key`
  // (undefined, i.e. NULL, when the key is absent).
  const obj = JSON.parse(json);
  return obj[key];
""";

WITH json_table AS (
  SELECT '{"Fruit":"Apple","Price":"10","Sale":"No"}' AS json_data UNION ALL
  SELECT '{"Movie":"Avatar","Genre":"Fiction"}' AS json_data
)
SELECT extract_from_json(json_data, 'Movie') AS movie
FROM json_table
You can also check out the newly introduced JSON data type in BigQuery. It offers more flexibility when handling JSON data, but note that the type is still in preview and not recommended for production; you will have to enroll in the preview. For more information on working with JSON data, refer to the documentation.
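For illustration, a minimal sketch of what the JSON type allows (the dataset, table, and column names are hypothetical):

-- Hypothetical table using the preview JSON type.
CREATE TABLE my_dataset.events (json_data JSON);

INSERT INTO my_dataset.events VALUES
  (JSON '{"Fruit":"Apple","Price":"10","Sale":"No"}'),
  (JSON '{"Movie":"Avatar","Genre":"Fiction"}');

-- Keys can be queried directly, no UDF required.
SELECT json_data.Movie AS movie,
       JSON_VALUE(json_data.Fruit) AS fruit
FROM my_dataset.events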

How to extract a value from a column which is in JSON format using pyspark

I have a pyspark dataframe with one column (quite long strings) that holds a JSON string with many keys, of which I am only interested in one. May I know how to extract the value for that key?
Here is an example of the string in the column userbehavior:
[{"num":"1234","Projections":"test", "intent":"test", "Mtime":11333.....}]
I wish to extract the value for "Mtime" only. I tried using:
user_hist_df=user_hist_df.select(get_json_object(user_hist_df.userbehavior, '$.Mtime').alias("Time"))
However it does not work.
You are almost right. It isn't working because your JSON is an array of objects. Just change to this:
get_json_object('userbehavior', '$[*].Mtime').alias("Time")
In order to extract from a JSON column you can use from_json() and specify the schema, e.g.:

from pyspark.sql.functions import from_json, col
from pyspark.sql.types import MapType, StringType

df = df.withColumn("parsed_col", from_json(col("Body"), MapType(StringType(), StringType())))

Once you parse the JSON as per the schema, just extract the column as per your need:

df = df.withColumn("col_1", col("parsed_col").getItem("col_1"))

How to select part of a text column as a new column in a MySQL query

I have a text column that stores a JSON string.
I want to select a specific element of the JSON as a new column, and I do not want to change the type of this column to JSON.
Is it possible? How can I do that?
My table name is 'logs', my column name is 'response', and my target element in the JSON string is 'server_response_time'.
If you have a valid JSON string stored in a string column, you can directly use JSON functions on it. MySQL will happily convert it to JSON under the hood.
So:
with t as (select '{"foo": "bar", "baz": "zoo"}' col)
select col, col ->> '$.foo' as foo
from t
If your string is not valid JSON, this generates a runtime error. This is one of the reasons why I would still recommend storing your data as JSON rather than string: that way, data integrity is enforced at the time when your data is stored, rather than delayed until it is read.
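Applied to your table, with the names from your question (and assuming every row of response holds valid JSON):

select response ->> '$.server_response_time' as server_response_time
from logs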

Postgresql merge/aggregate json column and return json object as output

I am using Postgresql 9.6
I have a table, where my column is of type json.
create table test (
  my_data json,
  ...
);
When queried, the column is shown as JSON for each row.
I want to aggregate the data; for the sake of simplicity I selected only 2 columns, col1_data and col2_data, and I need to group by col1_data so that each group ends up with a single merged JSON array.
I tried to use json_agg, but that aggregates and outputs an array of JSON arrays:
select col1_data, json_agg(col2_data) as col2_data
from test
group by col1_data;
Can someone help to convert this array of JSON arrays into a single JSON array?
This should do it:
select col1_data, json_agg(col2_element) as col2_data
from test, json_array_elements(col2_data) as col2_element
group by col1_data;
Alternatively, you can write your own aggregate function that concatenates json arrays.
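For instance, a sketch of such an aggregate using jsonb, where jsonb_concat is the built-in function behind the || operator (cast the json column as needed):

-- One-time setup: an aggregate that concatenates jsonb arrays.
CREATE AGGREGATE jsonb_array_concat_agg(jsonb) (
  SFUNC = jsonb_concat,
  STYPE = jsonb,
  INITCOND = '[]'
);

select col1_data, jsonb_array_concat_agg(col2_data::jsonb) as col2_data
from test
group by col1_data;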

Complex JSON schema into custom Spark dataframe

Ok so I'm getting a big JSON string from an API call, and I want to save some of that string into Cassandra. I'm trying to parse the JSON string into a more table-like structure, but with only some fields. The overall schema (shown in a screenshot, omitted here) has a 'register' array whose elements carry '#attributes.regnum' together with 'data.date' and 'data.value' arrays.
And I want my table structure to use the regnum, date and value fields.
With

sqlContext.read.json(vals)
  .select(explode('register) as 'reg)
  .select("reg.#attributes.regnum", "reg.data.date", "reg.data.value")
  .show

I can get a table with those three columns. But as you can see, the date and value fields are arrays. I would like to have one element per record, and duplicate the corresponding regnum for each record. Any help is very much appreciated.
You can cast your DataFrame to a Dataset and then flatMap on it:

import spark.implicits._  // tuple encoders needed by .as and flatMap

df.select("reg.#attributes.regnum", "reg.data.date", "reg.data.value")
  .as[(Long, Array[String], Array[String])]
  // zip date(i) with value(i), repeating regnum for every pair
  .flatMap(s => s._2.zip(s._3).map(p => (s._1, p._1, p._2)))
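On Spark 2.4+, a DataFrame-only alternative is arrays_zip plus explode; here is a sketch reusing the same column paths:

import org.apache.spark.sql.functions.{arrays_zip, col, explode}

df.select(
    col("reg.#attributes.regnum").as("regnum"),
    col("reg.data.date").as("date"),
    col("reg.data.value").as("value"))
  // one (date, value) struct per array element
  .withColumn("pair", explode(arrays_zip(col("date"), col("value"))))
  .select(col("regnum"), col("pair.date").as("date"), col("pair.value").as("value"))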