MySQL: get all rows and separate by column value - mysql

Hi, I am still new to working with databases, and I need a way to fetch all rows and partition them into arrays of rows by a column's value.
For example:
id col1 col2 ..
1 type1 abc ..
2 type1 def ..
3 type2 ghi ..
4 type3 jkl ..
Grouping by col1, I am looking for my output to be [ [ {1}, {2} ], [ {3} ], [ {4} ] ] with the numbers representing the entire row. Any suggestions on an efficient way to do this without needing to parse the output?
Edit:
Expected Output:
[
[ {id:1, col1:type1, col2:abc, ...} , {id:2, col1:type1, col2:def, ...} ],
[ {id:3, col1:type2, col2:ghi, ...} ],
[ {id:4, col1:type3, col2:jkl, ...} ]
]
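Since MySQL itself returns flat rows, one common approach is to `ORDER BY col1` in the query and bucket the rows client-side. A minimal Python sketch, with a hypothetical `rows` list standing in for a dictionary cursor's results:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical result set, as a dictionary cursor would return it;
# the column names match the question's example.
rows = [
    {'id': 1, 'col1': 'type1', 'col2': 'abc'},
    {'id': 2, 'col1': 'type1', 'col2': 'def'},
    {'id': 3, 'col1': 'type2', 'col2': 'ghi'},
    {'id': 4, 'col1': 'type3', 'col2': 'jkl'},
]

# groupby only merges adjacent equal keys, so sort here
# (or ORDER BY col1 in the SQL query) first.
rows.sort(key=itemgetter('col1'))
grouped = [list(group) for _, group in groupby(rows, key=itemgetter('col1'))]
print([[r['id'] for r in g] for g in grouped])  # [[1, 2], [3], [4]]
```

This keeps the SQL trivial and does the grouping in one linear pass, so nothing in the result needs to be re-parsed afterwards.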

Related

Oracle: How to load hierarchical JSON data into related tables

I have JSON data in a JSON column of an Oracle database. The JSON contains arrays, and I have to load the data into tables by means of PL/SQL, e.g.
{
"A1": "V1",
"A2": true,
"A3": 42,
"A4": [
{
"B1": "Q1",
"B4": [
{
"C1": "R1",
"C2": false
},
{
"C1": "R2",
"C2": false
}
]
},
{
"B1": "Q2",
"B4": [
{
"C1": "R3",
"C2": false
},
{
"C1": "R4",
"C2": true
}
]
}
]
}
{
"A1": "V2",
"A2": false,
"A3": 42,
"A4": [
{
"B1": "T1",
"B4": [
{
"C1": "S1",
"C2": false
},
{
"C1": "S2",
"C2": false
}
]
},
{
"B1": "T2",
"B4": [
{
"C1": "S3",
"C2": false
},
{
"C1": "S4",
"C2": true
}
]
}
]
}
The data should be loaded into three tables, related as follows:

A
|  0..1
|
|  1..n
B   (no (business) key available)
|  0..1
|
|  1..n
C
Table A:
ID  A1  A2     A3
1   V1  true   42
2   V2  false  42

Table B:
ID  ID_A  B1
1   1     Q1
2   1     Q2
3   2     T1
4   2     T2

Table C:
ID  ID_B  C1  C2
1   1     R1  false
2   1     R2  false
3   2     R3  false
4   2     R4  true
5   3     S1  false
6   3     S2  false
7   4     S3  false
8   4     S4  true
I am trying to use JSON_TABLE with NESTED PATH to flatten the data for the implicit cursor of a for loop. Now imagine we are in the for loop, i.e. for every "record" of C there is a record in the cursor/loop. While the data of C appears "as is", the data of B gets duplicated, and the data of A likewise, i.e.
A1  A2     A3  B1  C1  C2
V1  true   42  Q1  R1  false
V1  true   42  Q1  R2  false
V1  true   42  Q2  R3  false
V1  true   42  Q2  R4  true
V2  false  42  T1  S1  false
V2  false  42  T1  S2  false
V2  false  42  T2  S3  false
V2  false  42  T2  S4  true
My initial idea was to insert into table "A" only the first occurrence of each A-tuple, saving the generated ID via RETURNING .. INTO, to insert the first occurrence of each B-tuple into "B", saving its ID likewise, and finally to insert the C-tuple into "C". The problem is detecting the change of the A-tuple and of the B-tuple, respectively. "A" should not be too difficult, as it has a (business) key I can preserve for the checks. However, B has no (business) key.
My idea for B was to use some (possibly nonexistent) pseudocolumn provided by JSON_TABLE/NESTED PATH that tells, e.g., to which "B" expression the current "C" belongs. In effect, it would be a key for "B". Yet I have not found anything in the Oracle docs or on the internet. Do you happen to know of such a column?
As an alternative, I am thinking about not using NESTED PATH for "C" but returning the array as a JSON column and then somehow creating an inner for loop. However, I do not know how to turn a JSON-typed PL/SQL variable holding an array of objects into a nested table of some sort that can serve as a for-loop cursor. Do you happen to know examples of this?
Or is there an approach I am missing completely?
Kind regards
Thiemo
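One avenue worth testing for the missing "B" key: a JSON_TABLE COLUMNS clause, including one under NESTED PATH, accepts a FOR ORDINALITY column, which numbers the rows that clause generates. Combined with A's business key, that ordinal could serve as a surrogate key for "B". Below is an untested sketch, assuming the documents sit in a hypothetical JSON column doc of a table src; whether the counter restarts for each parent document is worth verifying against the Oracle docs before relying on it:

```sql
SELECT jt.*
  FROM src,
       JSON_TABLE(src.doc, '$'
         COLUMNS (
           A1 VARCHAR2(10) PATH '$.A1',
           A2 VARCHAR2(5)  PATH '$.A2',
           A3 NUMBER       PATH '$.A3',
           NESTED PATH '$.A4[*]' COLUMNS (
             b_seq FOR ORDINALITY,            -- ordinal of the B entry
             B1 VARCHAR2(10) PATH '$.B1',
             NESTED PATH '$.B4[*]' COLUMNS (
               C1 VARCHAR2(10) PATH '$.C1',
               C2 VARCHAR2(5)  PATH '$.C2'
             )
           )
         )
       ) jt;
```

In the loop, a change of (A-key, b_seq) would then mark the start of a new B-tuple without needing a business key on B.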

Read complex JSON to extract key values

I have a JSON and I'm trying to read part of it to extract keys and values.
Assuming response is my JSON data, here is my code:
data_dump = json.dumps(response)
data = json.loads(data_dump)
Here my data object becomes a list and I'm trying to get the keys as below
id = [key for key in data.keys()]
This fails with the error:
AttributeError: 'list' object has no attribute 'keys'
How can I get past this to produce the output below?
Here is my JSON:
{
"1": {
"task": [
"wakeup",
"getready"
]
},
"2": {
"task": [
"brush",
"shower"
]
},
"3": {
"task": [
"brush",
"shower"
]
},
"activites": ["standup", "play", "sitdown"],
"statuscheck": {
"time": 60,
"color": 1002,
"change(me)": 9898
},
"action": ["1", "2", "3", "4"]
}
The output I need is as below; I do not need data from the rest of the JSON.
id  task
1   wakeup, getready
2   brush, shower
If you know that the keys you need are "1" and "2", you could try reading the JSON string as a dataframe, unpivoting it, exploding and grouping:
from pyspark.sql import functions as F

df = (spark.read.json(sc.parallelize([data_dump]))
      .selectExpr("stack(2, '1', `1`, '2', `2`) (id, task)")
      .withColumn('task', F.explode('task.task'))
      .groupBy('id').agg(F.collect_list('task').alias('task')))
df.show()
# +---+------------------+
# | id| task|
# +---+------------------+
# | 1|[wakeup, getready]|
# | 2| [brush, shower]|
# +---+------------------+
However, it may be easier to deal with it in Python:
data = json.loads(data_dump)
data2 = [(k, v['task']) for k, v in data.items() if k in ['1', '2']]
df = spark.createDataFrame(data2, ['id', 'task'])
df.show()
# +---+------------------+
# | id| task|
# +---+------------------+
# | 1|[wakeup, getready]|
# | 2| [brush, shower]|
# +---+------------------+
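And if Spark is not needed at all, the same pairs can be produced in plain Python; a sketch, inlining a trimmed version of the question's JSON as data_dump:

```python
import json

# Trimmed version of the JSON string from the question.
data_dump = ('{"1": {"task": ["wakeup", "getready"]},'
             ' "2": {"task": ["brush", "shower"]},'
             ' "activites": ["standup", "play", "sitdown"]}')

data = json.loads(data_dump)
# Keep only the keys of interest and join each task list into one string.
rows = [(k, ', '.join(v['task'])) for k, v in data.items() if k in ('1', '2')]
print(rows)  # [('1', 'wakeup, getready'), ('2', 'brush, shower')]
```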

Explode multiple columns from nested JSON but it is giving extra records

I have a JSON document like below:
{
"Data": [{
"Code": "ABC",
"ID": 123456,
"Type": "Yes",
"Geo": "East"
}, {
"Code": "XYZ",
"ID": 987654,
"Type": "No",
"Geo": "West"
}],
"Total": 2,
"AggregateResults": null,
"Errors": null
}
My PySpark sample code:
getjsonresponsedata=json.dumps(getjsondata)
jsonDataList.append(getjsonresponsedata)
jsonRDD = sc.parallelize(jsonDataList)
df_Json=spark.read.json(jsonRDD)
display(df_Json.withColumn("Code",explode(col("Data.Code"))).withColumn("ID",explode(col("Data.ID"))).select('Code','ID'))
When I explode the JSON, I get the records below (it looks like a cross join):
Code ID
ABC 123456
ABC 987654
XYZ 123456
XYZ 987654
But I expect the records below:
Code ID
ABC 123456
XYZ 987654
Could you please help me on how to get the expected result?
You only need to explode Data column, then you can select fields from the resulting struct column (Code, Id...). What duplicates the rows here is that you're exploding 2 arrays Data.Code and Data.Id.
Try this instead:
import pyspark.sql.functions as F
df_Json.withColumn("Data", F.explode("Data")).select("Data.Code", "Data.Id").show()
#+----+------+
#|Code| Id|
#+----+------+
#| ABC|123456|
#| XYZ|987654|
#+----+------+
Or using the inline function directly on the Data array:
df_Json.selectExpr("inline(Data)").show()
#+----+----+------+----+
#|Code| Geo| ID|Type|
#+----+----+------+----+
#| ABC|East|123456| Yes|
#| XYZ|West|987654| No|
#+----+----+------+----+
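The cross-join effect in the question can be reproduced in plain Python: exploding the Code and ID arrays independently takes their product, while exploding the array of structs once (as explode("Data") or inline(Data) does) keeps the fields of each element paired:

```python
from itertools import product

codes = ["ABC", "XYZ"]
ids = [123456, 987654]

# Two independent explodes behave like a cross join: 4 rows.
crossed = list(product(codes, ids))

# One explode of the struct array keeps fields paired positionally: 2 rows.
paired = list(zip(codes, ids))

print(paired)  # [('ABC', 123456), ('XYZ', 987654)]
```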

How to deal with not existing values using JSON_EXTRACT?

I have a list of objects. Each object contains several properties. Now I want to write a SELECT statement that gives me a list of a single property's values. The simplified list looks like this:
[
[
{
"day": "2021-10-01",
"entries": [
{
"name": "Start of competition",
"startTimeDelta": "08:30:00"
}
]
},
{
"day": "2021-10-02",
"entries": [
{
"name": "Start of competition",
"startTimeDelta": "03:30:00"
}
]
},
{
"day": "2021-10-03",
"entries": [
{
"name": "Start of competition"
}
]
}
]
]
The working SELECT is now
SELECT
JSON_EXTRACT(column, '$.days[*].entries[0].startTimeDelta') AS list
FROM table
The returned result is
[
"08:30:00",
"03:30:00"
]
But what I want to get (and also have expected) is
[
"08:30:00",
"03:30:00",
null
]
What can I do or how can I change the SELECT statement so that I also get NULL values in the list?
SELECT startTimeDelta
FROM test
CROSS JOIN JSON_TABLE(val,
'$[*][*].entries[*]' COLUMNS (startTimeDelta TIME PATH '$.startTimeDelta')) jsontable
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=491f0f978d200a8a8522e3200509460e
Do you also have a working idea for MySQL < 8? – Lars
What is max amount of objects in the array on the 2nd level? – Akina
Well it's usually less than 10 – Lars
SELECT JSON_EXTRACT(val, CONCAT('$[0][', num, '].entries[0].startTimeDelta')) startTimeDelta
FROM test
-- up to 4 - increase if needed
CROSS JOIN (SELECT 0 num UNION SELECT 1 UNION SELECT 2 UNION SELECT 3) nums
WHERE JSON_EXTRACT(val, CONCAT('$[0][', num, '].entries[0]')) IS NOT NULL;
https://www.db-fiddle.com/f/xnCCSTGQXevcpfPH1GAbUo/0
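For context on why the original JSON_EXTRACT drops the third value: the wildcard path simply has no match for a day without startTimeDelta, so nothing is emitted for it. When the extraction can happen application-side instead, the gap is easy to preserve; a sketch, assuming val holds the JSON from the question:

```python
import json

val = '''[[
  {"day": "2021-10-01",
   "entries": [{"name": "Start of competition", "startTimeDelta": "08:30:00"}]},
  {"day": "2021-10-02",
   "entries": [{"name": "Start of competition", "startTimeDelta": "03:30:00"}]},
  {"day": "2021-10-03",
   "entries": [{"name": "Start of competition"}]}
]]'''

# dict.get returns None for the missing key, so the gap survives.
times = [day["entries"][0].get("startTimeDelta")
         for inner in json.loads(val) for day in inner]
print(times)  # ['08:30:00', '03:30:00', None]
```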

Subset a JSON Object with MySQL Query

I have a MySQL database and one of the tables is called 'my_table'. In this table, one of the columns is called 'my_json_column' and this column is stored as a JSON object in MySQL. The JSON object has about 17 key:value pairs (see below). I simply want to return a "slimmed-down" JSON Object from a MySQL query that returns 4 of the 17 fields.
I have tried many different MySQL queries (see below), but I can't seem to get a subset JSON object returned. I am sure it is simple, but I have been unsuccessful so far.
Something like this:
SELECT
json_extract(my_json_column, '$.X'),
json_extract(my_json_column, '$.Y'),
json_extract(my_json_column, '$.KB'),
json_extract(my_json_column, '$.Name')
FROM my_table;
yields:
5990.510000 90313.550000 2105 "Well 1"
I want to get this result instead (a returned JSON object with key-value pairs):
[ { X: 5990.510000, Y: 90313.550, KB: 2105, Name: "Well 1" } ]
Sample data:
{
"Comment" : "No Comment",
"Country" : "USA",
"County" : "County 1",
"Field" : "Field 1",
"GroundElevation" : "5400",
"Identifier" : "11435358700000",
"Interpreter" : "Interpreter 1",
"KB" : 2105,
"Name" : "Well 1",
"Operator" : "Operator 1",
"Owner" : "me",
"SpudDate" : "NA",
"State" : "MI",
"Status" : "ACTIVE",
"TotalDepth" : 5678,
"X" : 5990.510000,
"Y" : 90313.550
}
Thank you in advance.
Use JSON_OBJECT(), available since MySQL 5.7:
Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs
SELECT
JSON_OBJECT(
'X', json_extract(my_json_column, '$.X'),
'Y', json_extract(my_json_column, '$.Y'),
'KB', json_extract(my_json_column, '$.KB'),
'Name', json_extract(my_json_column, '$.Name')
) my_new_json
FROM my_table;
This demo on DB Fiddle with your sample data returns:
| my_new_json |
| ----------------------------------------------------------- |
| {"X": 5990.51, "Y": 90313.55, "KB": 2105, "Name": "Well 1"} |
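As a side note, since MySQL 5.7 the column->path operator is shorthand for JSON_EXTRACT, which makes the same query a bit more compact; a sketch against the question's my_table:

```sql
SELECT
  JSON_OBJECT(
    'X',    my_json_column->'$.X',
    'Y',    my_json_column->'$.Y',
    'KB',   my_json_column->'$.KB',
    'Name', my_json_column->'$.Name'
  ) AS my_new_json
FROM my_table;
```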