Representing a DB schema in JSON - json

Let's say I have two tables in my database, employee and car defined thusly.
employee:
+--------------+------------+
| col_name | data_type |
+--------------+------------+
| eid | int |
| name | string |
| salary | int |
| destination | string |
+--------------+------------+
car:
+------------+----------------+
| col_name | data_type |
+------------+----------------+
| cid | int |
| name | string |
| model | string |
| cylinders | int |
| price | int |
+------------+----------------+
I would like to export this schema to a JSON object so that I can populate an HTML dropdown menu based on the table - for instance, the table menu would have employee and car. Selecting employee would populate another dropdown with the column names and types corresponding to that table.
Given this use case, would the optimal json representation of the database be this?
{
"employee": {
"salary": "int",
"destination": "string",
"eid": "int",
"name": "string"
},
"car": {
"price": "int",
"model": "string",
"cylinders": "int",
"name": "string",
"cid": "int"
}
}
EDIT:
Or would this be more appropriate?
{
"employee": [
{
"type": "int",
"colname": "eid"
},
{
"type": "string",
"colname": "name"
},
{
"type": "int",
"colname": "salary"
},
{
"type": "string",
"colname": "destination"
}
],
"car": [
{
"type": "int",
"colname": "cid"
},
{
"type": "string",
"colname": "name"
},
{
"type": "string",
"colname": "model"
},
{
"type": "int",
"colname": "cylinders"
},
{
"type": "int",
"colname": "price"
}
]
}

In the first example, all your data is stored in objects. Assuming the structure is stored in a var mytables, you can get the names with Object.keys(mytables), which returns ['employee', 'car']. Equivalent for the columns inside: Object.keys(mytables['employee'].cols) returns ['salary','destination','eid','name'].
In the second example I would suggest to also store the tables in an array as the columns, like
[name: 'employee',
cols: [ {
"type": "int",
"colname": "cid"
}, ...]
Then you can easily iterate over the arrays and get the names by accessing mytables[i].name
for (t in tables){
console.log(tables[t].name);
for (c in tables[t].cols)
console.log(" - ",tables[t].cols[c].colname, ": ", tables[t].cols[c].type);
}

Related

how to extract and modify inner array objects with parent object data in jq

We are tying to format a json similar to this:
[
{"id": 1,
"type": "A",
"changes": [
{"id": 12},
{"id": 13}
],
"wanted_key": "good",
"unwanted_key": "aaa"
},
{"id": 2,
"type": "A",
"unwanted_key": "aaa"
},
{"id": 3,
"type": "B",
"changes": [
{"id": 31},
{"id": 32}
],
"unwanted_key": "aaa",
"unwanted_key2": "aaa"
},
{"id": 4,
"type": "B",
"unwanted_key3": "aaa"
},
null,
null,
{"id": 7}
]
into something like this:
[
{
"id": 1,
"type": "A",
"wanted_key": true # every record must have this key/value
},
{
"id": 12, # note: this was in the "changes" property of record id 1
"type": "A", # type should be the same type than record id 1
"wanted_key": true
},
{
"id": 13, # note: this was in the "changes" property of record id 1
"type": "A", # type should be the same type than record id 1
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31, # note: this was in the "changes" property of record id 3
"type": "B", # type should be the same type than record id 3
"wanted_key": true
},
{
"id": 32, # note: this was in the "changes" property of record id 3
"type": "B", # type should be the same type than record id 3
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN", # records without a type should have this type
"wanted_key": true
}
]
So far, I've been able to:
remove null records
obtain the keys we need with their default
give records without a type a default type
What we are missing:
from records having a changes key, create new records with the type of their parent record
join all records in a single array
Unfortunately we are not entirely sure how to proceed... Any help would be appreciated.
So far our jq goes like this:
del(..|nulls) | map({id, type: (.type // "UNKN"), wanted_key: (true)}) | del(..|nulls)
Here's our test code:
https://jqplay.org/s/eLAWwP1ha8P
The following should work:
map(select(values))
| map(., .type as $type | (.changes[]? + {$type}))
| map({id, type: (.type // "UNKN"), wanted_key: true})
Only select non-null values
Return the original items followed by their inner changes array (+ outer type)
Extract 3 properties for output
Multiple map calls can usually be combined, so this becomes:
map(
select(values)
| ., (.type as $type | (.changes[]? + {$type}))
| {id, type: (.type // "UNKN"), wanted_key: true}
)
Another option without variables:
map(
select(values)
| ., .changes[]? + {type}
| {id, type: (.type // "UNKN"), wanted_key: true}
)
# or:
map(select(values))
| map(., .changes[]? + {type})
| map({id, type: (.type // "UNKN"), wanted_key: true})
or even with a separate normalization step for the unknown type:
map(select(values))
| map(.type //= "UNKN")
| map(., .changes[]? + {type})
| map({id, type, wanted_key: true})
# condensed to a single line:
map(select(values) | .type //= "UNKN" | ., .changes[]? + {type} | {id, type, wanted_key: true})
Explanation:
Select only non-null values from the array
If type is not set, create the property with value "UNKN"
Produce the original array items, followed by their nested changes elements extended with the parent type
Reshape objects to only contain properties id, type, and wanted_key.
Here's one way:
map(
select(values)
| (.type // "UNKN") as $type
| ., .changes[]?
| {id, $type, wanted_key: true}
)
[
{
"id": 1,
"type": "A",
"wanted_key": true
},
{
"id": 12,
"type": "A",
"wanted_key": true
},
{
"id": 13,
"type": "A",
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31,
"type": "B",
"wanted_key": true
},
{
"id": 32,
"type": "B",
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN",
"wanted_key": true
}
]
Demo
Something like below should work
map(
select(type == "object") |
( {id}, {id : ( .changes[]? .id )} ) +
{ type: (.type // "UNKN"), wanted_key: true }
)
jq play - demo

Parse nested Json to splunk query which has string

I have a multiple result for a macAddress which contains the device details.
This is the sample data
"data": {
"a1:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy-J7",
"mac": "a1:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
},
"a0:b2:c3:d4:11:22": {
"deviceIcons": {
"type": "Phone",
"icons": {
"3x": null,
"2x": "image.png"
}
},
"advancedDeviceId": {
"agentId": 113,
"partnerAgentId": "131",
"dhcpHostname": "Galaxy",
"mac": "a0:b2:c3:d4:11:22",
"lastSeen": 12,
"model": "Android Phoe",
"id": 1
}
}
}
}
How can I query in splunk for all the kind of above sample results to get the advancedDeviceId.model and advancedDeviceId.id in tabular format?
I think this will do what you want
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address
Here is a "run anywhere" example that has two events each with two addresses:
| makeresults
| eval _raw="{\"data\":{\"a1:b2:c3:d4:11:21\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Galaxy-J7\",\"mac\":\"a1:b2:c3:d4:11:21\",\"lastSeen\":12,\"model\":\"Android Phoe\",\"id\":1}},\"a0:b2:c3:d4:11:22\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"iPhone 6\",\"mac\":\"a0:b2:c3:d4:11:22\",\"lastSeen\":12,\"model\":\"Apple Phoe\",\"id\":2}}}}"
| append [
| makeresults
| eval _raw="{\"data\":{\"b1:b2:c3:d4:11:23\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Nokia\",\"mac\":\"b1:b2:c3:d4:11:23\",\"lastSeen\":12,\"model\":\"Symbian Phoe\",\"id\":3}},\"b0:b2:c3:d4:11:24\":{\"deviceIcons\":{\"type\":\"Phone\",\"icons\":{\"3x\":null,\"2x\":\"image.png\"}},\"advancedDeviceId\":{\"agentId\":113,\"partnerAgentId\":\"131\",\"dhcpHostname\":\"Windows\",\"mac\":\"b0:b2:c3:d4:11:24\",\"lastSeen\":12,\"model\":\"Windows Phoe\",\"id\":4}}}}"
]
| spath
| untable _time column value
| rex field=column "data.(?<address>[^.]+)\.advancedDeviceId\.(?<item>[^.]+)"
| table _time address item value
| eval {item}=value
| stats list(model) as model
list(id) as id
list(dhcpHostname) as dhcpHostname
list(mac) as mac
by address

migrate mongodb table by specefic columns only to mysql using aws dms

I have a schema named reports in mongo and a collection named totals.
The keys in it looks like:
{ "_id" : { "dt" : "2018-12-02", "dt2" : "2018-04-08", "num" : 1312312312 }, "str" : 1 }
I would like to use DMS to migrate this collection into mysql instance on aws. The table should look like:
create table tab(
dt date,
dt2 date,
num bigint)
Currently, I'm using dms with simple rule:
{
"rules": [
{
"rule-type": "transformation",
"rule-id": "1",
"rule-name": "1",
"rule-target": "table",
"object-locator": {
"schema-name": "reports",
"table-name": "totals"
},
"rule-action": "rename",
"value": "tab",
"old-value": null
},
{
"rule-type": "selection",
"rule-id": "2",
"rule-name": "2",
"object-locator": {
"schema-name": "reports",
"table-name": "totals"
},
"rule-action": "include",
"filters": []
}
]
}
The result is not what I wanted:
MySQL [stats]> desc tab;
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| _doc | longtext | YES | | NULL | |
+-------+----------+------+-----+---------+-------+
MySQL [(none)]> select * from tab limit 1;
+------------------------------------------------------------------------------------------+
| _doc |
+------------------------------------------------------------------------------------------+
| { "_id" : { "dt" : "2018-12-02", "dt2" : "2018-04-08", "num" : 1312312312 }, "str" : 1 } |
+------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Endpoint needed to have nestingLevel=ONE; instead of nestingLevel=NONE;.
Basically it means look at the data as a table instead of a document.

PostgreSQL return JSON objects as key-value pairs

A PostgreSQL instance stores data in JSONB format:
CREATE TABLE myschema.mytable (
id BIGSERIAL PRIMARY KEY,
data JSONB NOT NULL
)
The data array might contain objects like:
{
"observations": {
"temperature": {
"type": "float",
"unit": "C",
"value": "23.1"
},
"pressure": {
"type": "float",
"unit": "mbar",
"value": "1011.3"
}
}
}
A selected row should be returned as key-value pairs in a format like:
temperature,type,float,value,23.1,unit,C,pressure,type,float,value,1011.3,unit,mbar
The following query returns at least each object, while still JSON:
SELECT id, value FROM mytable JOIN jsonb_each_text(mytable.data->'observations') ON true;
1 | {"type": "float", "unit": "mbar", "value": 1140.5}
1 | {"type": "float", "unit": "C", "value": -0.9}
5 | {"type": "float", "unit": "mbar", "value": "1011.3"}
5 | {"type": "float", "unit": "C", "value": "23.1"}
But the results are splitted and not in text format.
How can I return key-value pairs of all objects in data?
This will flatten the json structure and effectively just concatenate the values, along with the top-level key names (e.g. temperature and pressure), for the expected "depth" level. See if this is what you had in mind.
SELECT
id,
(
SELECT STRING_AGG(conc, ',')
FROM (
SELECT CONCAT_WS(',', key, STRING_AGG(value, ',')) AS conc
FROM (
SELECT key, (jsonb_each_text(value)).value
FROM jsonb_each(data->'observations')
) AS x
GROUP BY key
) AS csv
) AS csv
FROM mytable
Result:
| id | csv |
| --- | --------------------------------------------------- |
| 1 | pressure,float,mbar,1011.3,temperature,float,C,23.1 |
| 2 | pressure,bigint,unk,455,temperature,int,F,45 |
https://www.db-fiddle.com/f/ada5mtMgYn5acshi3WLR7S/0

Error in Nested JSON in HIve

I was trying to load this json data in hive
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
using DDL commands
ADD JAR /home/cloudera/Downloads/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;
CREATE EXTERNAL TABLE format.json_serde (
`id` string,
`type` string,
`name` string,
`ppu` float,
batters` struct < `batter`:array < struct <`bid`:string, `btype`:string >>>,
`topping`:array < struct<`tid`:int, `ttype`:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';
is throwing me error
FAILED: ParseException line 7:11 cannot recognize input near ':' 'array' '<' in column type </b>
You got typos
ttype`:string should be ttype:string
battersstruct should be batters struct
topping:array should be topping array
JSON SerDe mapping is done by name.
Your structs fields names should match the actual names, e.g. id and not bid or tid, otherwise you'll get NULL values for these fields.
There is already a JSON SerDe whicg is part of the Hive installation.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
create external table json_serde
(
id string
,type string
,name string
,ppu float
,batters struct<batter:array<struct<id:string,type:string>>>
,topping array<struct<id:string,type:string>>
)
row format serde
'org.apache.hive.hcatalog.data.JsonSerDe'
stored as textfile
;
select * from json_serde
;
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | type | name | ppu | batters | topping |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0001 | donut | Cake | 0.550000011920929 | {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil'sFood"}]} | [{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"PowderedSugar"},{"id":"5006","type":"ChocolatewithSprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}] |
+------+-------+------+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
It worked when i removed the semicolons near topping. Thanks
CREATE EXTERNAL TABLE format.json_serde (
id string,
type string,
name string,
ppu float,
batters struct<batter:array<
struct<bid:string, btype:string >>>,
topping array< struct<tid:string, ttype:string>>
)