How to remove {} and [] from a json column in PostgreSQL

I have a column in PostgreSQL with the json data type. Until today there were no rows containing {} or [].
However, due to a new implementation I have started to see {} and [], and I want to remove those rows.
Example: the following is what my table looks like; json is of the json data type:
 id | json
----+------------------
 a  | {"st":[{"State": "TX", "Value":"0.02"}, {"State": "CA", "Value":"0.2" ...
 b  | {"st":[{"State": "TX", "Value":"0.32"}, {"State": "CA", "Value":"0.47" ...
 d  | {}
 e  | []
What I want is the following:
 id | json
----+------------------
 a  | {"st":[{"State": "TX", "Value":"0.02"}, {"State": "CA", "Value":"0.2" ...
 b  | {"st":[{"State": "TX", "Value":"0.32"}, {"State": "CA", "Value":"0.47" ...
How should I be able to do this?
I have written the following query:
SELECT *
FROM tableA
WHERE json::text <> '[]'::text
With this I am able to filter out the rows that contain [], but I am still seeing the {} rows.

Very easy: just select all rows that don't contain either of those values:
SELECT *
FROM tableA
WHERE json::text NOT IN ('{}', '[]')
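Since the goal is to remove those rows rather than just filter them out of a SELECT, here is a minimal sketch of the corresponding DELETE (assuming the same table and column names as above):
DELETE FROM tableA
WHERE json::text IN ('{}', '[]'); -- removes only rows whose value is exactly {} or []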

Related

Pyspark: How to create a nested Json by adding dynamic prefix to each column based on a row value

I have a dataframe in the below format.
Input:
id | Name_type | Name | Car
---+-----------+------+---------
 1 | First     | rob  | Nissan
 2 | First     | joe  | Hyundai
 1 | Last      | dent | Infiniti
 2 | Last      | Kent | Genesis
I need to transform this into a json column for a given key column (id), prefixing each field name with the row's Name_type value, as shown below.
Result expected:
id | json_column
---+------------------------------------------------------------------------------------
 1 | {"First_Name":"rob","First_Car":"Nissan","Last_Name":"dent","Last_Car":"Infiniti"}
 2 | {"First_Name":"joe","First_Car":"Hyundai","Last_Name":"kent","Last_Car":"Genesis"}
With the piece of code below
column_set = ['Name','Car']
df = df.withColumn("json_data", to_json(struct([df[x] for x in column_set])))
I was able to generate data as
id | Name_type | Json_data
---+-----------+------------------------------------
 1 | First     | {"Name":"rob", "Car": "Nissan"}
 2 | First     | {"Name":"joe", "Car": "Hyundai"}
 1 | Last      | {"Name":"dent", "Car": "infiniti"}
 2 | Last      | {"Name":"kent", "Car": "Genesis"}
I was able to create a json column using to_json for a given row, but I am not able to figure out how to prepend the row value to the column names and convert them to a nested json for a given key column.
To do what you want, you first need to manipulate your input dataframe a little. You can do this by grouping by the id column and pivoting on the Name_type column, like so:
from pyspark.sql.functions import first
df = spark.createDataFrame(
[
("1", "First", "rob", "Nissan"),
("2", "First", "joe", "Hyundai"),
("1", "Last", "dent", "Infiniti"),
("2", "Last", "Kent", "Genesis")
],
["id", "Name_type", "Name", "Car"]
)
output = df.groupBy("id").pivot("Name_type").agg(first("Name").alias('Name'), first("Car").alias('Car'))
output.show()
+---+----------+---------+---------+--------+
| id|First_Name|First_Car|Last_Name|Last_Car|
+---+----------+---------+---------+--------+
| 1| rob| Nissan| dent|Infiniti|
| 2| joe| Hyundai| Kent| Genesis|
+---+----------+---------+---------+--------+
Then you can use the exact same code as what you used to get your wanted result, but using 4 columns instead of 2:
from pyspark.sql.functions import to_json, struct
column_set = ['First_Name','First_Car', 'Last_Name', 'Last_Car']
output = output.withColumn("json_data", to_json(struct([output[x] for x in column_set])))
output.show(truncate=False)
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
|id |First_Name|First_Car|Last_Name|Last_Car|json_data |
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
|1 |rob |Nissan |dent |Infiniti|{"First_Name":"rob","First_Car":"Nissan","Last_Name":"dent","Last_Car":"Infiniti"}|
|2 |joe |Hyundai |Kent |Genesis |{"First_Name":"joe","First_Car":"Hyundai","Last_Name":"Kent","Last_Car":"Genesis"}|
+---+----------+---------+---------+--------+----------------------------------------------------------------------------------+
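If you would rather not hardcode the pivoted column names, here is a small sketch that derives column_set from the pivoted dataframe itself (assuming id is the only non-pivoted column):
from pyspark.sql.functions import to_json, struct

# every column except the grouping key, in the order pivot produced them
column_set = [c for c in output.columns if c != "id"]
output = output.withColumn("json_data", to_json(struct([output[c] for c in column_set])))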

SQL json_extract returns null

I am attempting to extract from my json object
hits = [{“title”: “Facebook”,
“domain”: “facebook.com”},
{“title”: “Linkedin”,
“domain”: “linkedin.com”}]
When I use:
json_extract(hits,'$.title') as title,
nothing is returned. I would like the result to be: [Facebook, Linkedin].
However, when I extract by a scalar value, ex.:
json_extract_scalar(hits,'$[0].title') as title,
it works and Facebook is returned.
hits contains a lot of values, so I need to use json_extract to get all of them; I can't do each scalar individually. Any suggestions to fix this would be greatly appreciated.
I get INVALID_FUNCTION_ARGUMENT: Invalid JSON path: '$.**.title' as an error for $.**.title (double stars). When I try unnest I get INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar and INVALID_FUNCTION_ARGUMENT: Cannot unnest type: json as errors. When I try double quotes I get SYNTAX_ERROR: line 26:19: Column '$.title' cannot be resolved.
The correct json path to extract all the titles is $.[*].title (or $.*.title), though neither is supported by Athena. One option is to cast your json to an array of json and use transform on it:
WITH dataset AS (
    SELECT * FROM (VALUES
        (JSON '[{"title": "Facebook",
                 "domain": "facebook.com"},
                {"title": "Linkedin",
                 "domain": "linkedin.com"}]')
    ) AS t (json_string))
SELECT transform(cast(json_string as ARRAY(JSON)), js -> json_extract_scalar(js, '$.title'))
FROM dataset
Output:
_col0
[Facebook, Linkedin]
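If you need one row per title instead of a single array, here is a sketch that builds on the same cast but unnests the array (Presto/Athena syntax, same sample data):
WITH dataset AS (
    SELECT JSON '[{"title": "Facebook", "domain": "facebook.com"},
                  {"title": "Linkedin", "domain": "linkedin.com"}]' AS json_string)
SELECT json_extract_scalar(js, '$.title') AS title
FROM dataset
CROSS JOIN UNNEST(cast(json_string AS ARRAY(JSON))) AS t (js)
-- each element of the array becomes its own row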
First, you have an array, so $.title doesn't exist; see below.
Second, you do not have valid JSON: it must use straight double quotes ", as the example shows.
SET @a := '[{
    "title": "Facebook",
    "domain": "facebook.com"
},
{
    "title": "Linkedin",
    "domain": "linkedin.com"
}]';
SELECT json_extract(@a, '$[0]') as title
| title |
| :---------------------------------------------- |
| {"title": "Facebook", "domain": "facebook.com"} |
SELECT JSON_EXTRACT(@a, "$[0].title") AS 'from'
| from |
| :--------- |
| "Facebook" |
SELECT @a
| @a |
| :-- |
| [{ "title": "Facebook", "domain": "facebook.com" }, { "title": "Linkedin", "domain": "linkedin.com" }] |
db<>fiddle here
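For completeness, MySQL's JSON path syntax does support a wildcard over array elements, so all of the titles can be pulled in one call (a sketch against the same @a):
SELECT JSON_EXTRACT(@a, '$[*].title') AS titles;
-- returns ["Facebook", "Linkedin"]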

how to extract nested json using sqlite json-extract

How could I extract nested json using sqlite's json_extract or another sqlite json command?
Here I'd like to extract given_id:
"invoices": [{
........
"items": [{
"given_id": "TBC0003B",
...
}
]
}
]
Thanks.
In SQLite you can use json_extract() as follows:
select json_extract(my_json_col, '$.invoices[0].items[0].given_id') my_given_id from mytable
This gives you the given_id attribute of the first element of the items array under first element of the invoices array.
Demo on DB Fiddle:
with mytable as (select '{
"invoices": [{
"items": [{ "given_id": "TBC0003B" }]
}]
}' my_json_col)
select json_extract(my_json_col, '$.invoices[0].items[0].given_id') my_given_id from mytable
| my_given_id |
| :---------- |
| TBC0003B |
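If you need the given_id of every item in every invoice rather than just the first one, here is a sketch using SQLite's json_tree() table-valued function, which walks the whole document recursively (same column and table names as the demo):
select j.value as my_given_id
from mytable, json_tree(my_json_col) as j
where j.key = 'given_id'
-- json_tree emits one row per node; filtering on key picks out every given_id at any depth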

Postgres 9.4: Include sibling column in jsonb array on SELECT

If I have a table like this:
office_id int
employees jsonb
and the data looks something like this:
office_id | employees
----------+-------------------------------------------
        1 | [{ "name" : "John" }, { "name" : "Jane" }]
Is there an easy way to query so that the results look like this:
office_id,employees
1,[{ "name" : "John", "office_id" : 1 }, { "name" : "Jane", "office_id" : 1 }]
For example data, check out this sqlfiddle: http://sqlfiddle.com/#!15/ac37b/1/0
The results should actually look like this:
id | employees
---+------------------------------------------------------------------------------
 1 | [{ "name" : "John", "office_id" : 1 }, { "name" : "Jane", "office_id" : 1 }]
 2 | [{ "name" : "Jamal", "office_id" : 1 }]
I've been reading through the json functions and it seems like it's possible, but I can't seem to figure it out. I would rather not have to store the office_id on each nested object.
Note: This is similar to my other question about jsonb arrays, but the desired output is different.
I'm not sure if you are selecting from a Postgres table or a json object table. Doing a normal query and converting it to json can be done with json_agg().
Here is a normal query:
ao_db=# SELECT * FROM record.instance;
id | created_by | created_on | modified_by | modified_on
--------------------------------------+------------+-------------------------------+-------------+-------------------------------
18d8ca56-87b6-11e5-9c15-48d22415d991 | sysop | 2015-11-10 23:19:47.181026+09 | sysop | 2015-11-10 23:19:47.181026+09
190a0e86-87b6-11e5-9c15-48d22415d991 | sysop | 2015-11-10 23:19:47.56517+09 | sysop | 2015-11-10 23:19:47.56517+09
57611c9c-87b6-11e5-8c4b-48d22415d991 | admin | 2015-11-10 23:21:32.399775+09 | admin | 2015-11-10 23:22:27.975266+09
(3 rows)
Here is the same query passed through json_agg():
ao_db=# WITH j AS (SELECT * FROM record.instance) SELECT json_agg(j) FROM j;
json_agg
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"id":"18d8ca56-87b6-11e5-9c15-48d22415d991","created_by":"sysop","created_on":"2015-11-10T23:19:47.181026+09:00","modified_by":"sysop","modified_on":"2015-11-10T23:19:47.181026+09:00"}, +
{"id":"190a0e86-87b6-11e5-9c15-48d22415d991","created_by":"sysop","created_on":"2015-11-10T23:19:47.56517+09:00","modified_by":"sysop","modified_on":"2015-11-10T23:19:47.56517+09:00"}, +
{"id":"57611c9c-87b6-11e5-8c4b-48d22415d991","created_by":"admin","created_on":"2015-11-10T23:21:32.399775+09:00","modified_by":"admin","modified_on":"2015-11-10T23:22:27.975266+09:00"}]

Create DF/RDD from nested other DF/RDD (Nested Json) in Spark

I'm a total newbie in Spark & Scala stuff; it would be great if someone could explain this to me.
Let's take the following JSON:
{
"id": 1,
"persons": [{
"name": "n1",
"lastname": "l1",
"hobbies": [{
"name": "h1",
"activity": "a1"
},
{
"name": "h2",
"activity": "a2"
}]
},
{
"name": "n2",
"lastname": "l2",
"hobbies": [{
"name": "h3",
"activity": "a3"
},
{
"name": "h4",
"activity": "a4"
}]
}]
}
I'm loading this Json into an RDD via sc.parallelize(file.json) and into a DF via sqlContext.read.json(file.json). So far so good; this gives me an RDD and a DF (with schema) for the mentioned Json, but I want to create another RDD/DF from the existing one that contains all the distinct "hobbies" records. How can I achieve something like that?
The only things I get from my operations are multiple WrappedArrays for hobbies, but I cannot go deeper nor assign them to a DF/RDD.
Code for SqlContext I have so far
val jsonData = sqlContext.read.json("path/file.json")
jsonData.registerTempTable("jsonData") // I receive the schema for the whole file
val hobbies = sqlContext.sql("SELECT persons.hobbies FROM jsonData") // subschema for hobbies
hobbies.show()
That leaves me with
+--------------------+
| hobbies|
+--------------------+
|[WrappedArray([a1...|
+--------------------+
What I expect is more like:
+------+----------+
| name | activity |
+------+----------+
|   h1 |       a1 |
|   h2 |       a2 |
|   h3 |       a3 |
|   h4 |       a4 |
+------+----------+
I loaded your example into the dataframe hobbies exactly as you do it and worked with it. You could run something like the following:
val distinctHobbies = hobbies.rdd.flatMap {row => row.getSeq[List[Row]](0).flatten}.map(row => (row.getString(0), row.getString(1))).distinct
val dhDF = distinctHobbies.toDF("activity", "name")
This essentially flattens your hobbies struct, transforms it into a tuple, and runs a distinct on the returned tuples. We then turn it back into a dataframe under the correct column aliases. Because we are doing this through the underlying RDD, there may also be a more efficient way to do it using just the DataFrame API.
Regardless, when I run on your example, I see:
scala> val distinctHobbies = hobbies.rdd.flatMap {row => row.getSeq[List[Row]](0).flatten}.map(row => (row.getString(0), row.getString(1))).distinct
distinctHobbies: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[121] at distinct at <console>:24
scala> val dhDF = distinctHobbies.toDF("activity", "name")
dhDF: org.apache.spark.sql.DataFrame = [activity: string, name: string]
scala> dhDF.show
...
+--------+----+
|activity|name|
+--------+----+
| a2| h2|
| a1| h1|
| a3| h3|
| a4| h4|
+--------+----+
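For reference, a similar result can be had without dropping down to the underlying RDD, by exploding twice with the DataFrame API (a sketch; the column names follow the schema Spark infers from the JSON above):
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

val distinctHobbiesDF = jsonData
  .select(explode($"persons").as("person"))        // one row per person
  .select(explode($"person.hobbies").as("hobby"))  // one row per hobby
  .select($"hobby.name", $"hobby.activity")
  .distinct()
distinctHobbiesDF.show()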