I have the contacts.json file:
{
"emergencyContacts": [
{
"name": "Jane Doe",
"phone": "888-555-1212",
"relationship": "spouse"
},
{
"name": "Justin Doe",
"phone": "877-123-1212",
"relationship": "parent"
}
]
}
I want to access the "name" key of each entry in the emergencyContacts array in Julia. I'm trying this:
import JSON
dict = Dict()
open("contacts.json", "r") do f
    global dict
    dicttxt = read(f, String)   # file contents as a string (readstring was removed in Julia 1.0)
    dict = JSON.parse(dicttxt)  # parse and transform data
end
for values in dict["emergencyContacts"]
    println(values)
end
This is a poorly specified question:
There is no "firstname" key.
There is no "Employees" array.
Presumably, you are looking for
julia> first_names = String[]
0-element Array{String,1}
julia> for contact in dict["emergencyContacts"]
           push!(first_names, split(contact["name"], " ")[1])
       end
julia> first_names
2-element Array{String,1}:
"Jane"
"Justin"
The "nested" key called "name" can be extracted for a single array element using dict["emergencyContacts"][n]["name"], where n is the array index (1-based in Julia).
I'm new to Couchbase and N1QL syntax and I'm facing an issue.
Let's say we have three types of documents:
Doc1 of TypeA with key = typeA:Doc1
{
"type": "typeA",
"id": "Doc1",
"sequences": [
"typeB:Doc2"
]
}
Doc2 of TypeB with key = typeB:Doc2
{
"id": "Doc2",
"processors": [
{
"order": 1,
"id": "typeC:Doc3"
}
]
}
Doc3 of TypeC with key = typeC:Doc3
{
"id": "Doc3",
"prop": "value"
}
What I want to achieve is to nest these three objects by their document keys in order to get a single document with this structure:
{
"id": "Doc1",
"sequences": [
{
"id": "Doc2",
"processors": [
{
"order": 1,
"id": "Doc3",
"prop": "value"
}
]
}
]
}
What I've done is nest the first two documents to obtain a partial result, but I'm also trying to integrate the third document.
Here's my attempt:
SELECT dev.*,
ARRAY sq_i FOR sq_i IN prseq END AS sequences
FROM data dev
NEST data prseq ON KEYS dev.sequences
WHERE dev.type = 'typeA'
Can anyone help me with the third level of nesting?
Thank you.
Use subqueries:
SELECT dt.*,
(SELECT ds.*,
(ARRAY OBJECT_ADD((SELECT RAW dp FROM data AS dp USE KEYS v.id)[0], "order", v.`order`)
FOR v IN ds.processors
END) AS processors
FROM data AS ds USE KEYS dt.sequences) AS sequences
FROM data AS dt
WHERE dt.type = 'typeA';
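To make the key resolution easier to follow, here is a rough Python sketch of the same logic. It is not Couchbase code: the `bucket` dict is a hypothetical in-memory stand-in for the `data` bucket, keyed by document key, and `nest_document` is a made-up helper name.

```python
# Hypothetical in-memory stand-in for the "data" bucket, keyed by document key.
bucket = {
    "typeA:Doc1": {"type": "typeA", "id": "Doc1", "sequences": ["typeB:Doc2"]},
    "typeB:Doc2": {"id": "Doc2", "processors": [{"order": 1, "id": "typeC:Doc3"}]},
    "typeC:Doc3": {"id": "Doc3", "prop": "value"},
}


def nest_document(doc, bucket):
    """Resolve sequence keys, then processor keys, mirroring the nested subqueries."""
    sequences = []
    for seq_key in doc["sequences"]:
        # Corresponds to the inner SELECT ... USE KEYS dt.sequences.
        seq = dict(bucket[seq_key])  # shallow copy so the bucket is not mutated
        seq["processors"] = [
            # Corresponds to OBJECT_ADD(<doc fetched by v.id>, "order", v.order).
            {**bucket[proc["id"]], "order": proc["order"]}
            for proc in seq["processors"]
        ]
        sequences.append(seq)
    return {**doc, "sequences": sequences}


# Corresponds to the outer WHERE on the document type.
nested = [nest_document(d, bucket) for d in bucket.values() if d.get("type") == "typeA"]
print(nested)
```

Unlike USE KEYS, this sketch raises a KeyError when a referenced key is missing instead of skipping it.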
I have about 100 JSON files, all titled with different dates and I need to merge them into one CSV file that has headers "date", "real_name", "text".
There are no dates listed in the JSON itself, and the real_name is nested. I haven't worked with JSON in a while and am a little lost.
The basic structure of the JSON looks more or less like this:
Filename: 2021-01-18.json
[
{
"client_msg_id": "xxxx",
"type": "message",
"text": "THIS IS THE TEXT I WANT TO PULL",
"user": "XXX",
"user_profile": {
"first_name": "XXX",
"real_name": "THIS IS THE NAME I WANT TO PULL",
"display_name": "XXX",
"is_restricted": false,
"is_ultra_restricted": false
},
"blocks": [
{
"type": "rich_text",
"block_id": "yf=A9"
}
]
}
]
So far I have
import glob
import json

read_files = glob.glob("*.json")
output_list = []
all_items = []
for f in read_files:
    with open(f, "rb") as infile:
        output_list.append(json.load(infile))
data = {}
for obj in output_list:
    data['date'] = f
    data['text'] = 'text'            # placeholder: should come from obj
    data['real_name'] = 'real_name'  # placeholder: nested under user_profile
    all_items.append(data)
Once you've read the JSON object, just index into the dictionaries for the data. If each file really holds a list, you would need obj[0]['text'] and so on, but that seems odd, so I'm assuming the sample was pasted from output_list after the data was collected. Assuming each file's content is exactly as below:
{
"client_msg_id": "xxxx",
"type": "message",
"text": "THIS IS THE TEXT I WANT TO PULL",
"user": "XXX",
"user_profile": {
"first_name": "XXX",
"real_name": "THIS IS THE NAME I WANT TO PULL",
"display_name": "XXX",
"is_restricted": false,
"is_ultra_restricted": false
},
"blocks": [
{
"type": "rich_text",
"block_id": "yf=A9"
}
]
}
test.py:
import json
import glob
from pathlib import Path

all_items = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        obj = json.load(infile)
    # Build a fresh dict per file; the date comes from the file name.
    all_items.append({
        'date': Path(f).stem,
        'text': obj['text'],
        'real_name': obj['user_profile']['real_name'],
    })
print(all_items)
Output:
[{'date': '2021-01-18', 'text': 'THIS IS THE TEXT I WANT TO PULL', 'real_name': 'THIS IS THE NAME I WANT TO PULL'}]
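Since the end goal was a single CSV with the headers "date", "real_name" and "text", here is a minimal sketch of that last step using the standard csv module. The function name `merge_to_csv` and its arguments are made up for illustration. It accepts both layouts (one object per file, or a list of messages per file) and assumes some messages may lack user_profile, in which case the name is left empty.

```python
import csv
import json
from pathlib import Path


def merge_to_csv(json_dir, csv_path):
    """Write one CSV row per message; the date is taken from each file name."""
    rows = []
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as infile:
            content = json.load(infile)
        # Accept either a single message object or a list of messages.
        messages = content if isinstance(content, list) else [content]
        for msg in messages:
            rows.append({
                "date": path.stem,
                # Assumption: user_profile may be missing on some messages.
                "real_name": msg.get("user_profile", {}).get("real_name", ""),
                "text": msg.get("text", ""),
            })
    with open(csv_path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=["date", "real_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling merge_to_csv(".", "merged.csv") from the folder that holds the JSON files would write merged.csv next to them.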
I'm using Spark 2.4.3 and Scala 2.11.
Below is my current JSON string in a DataFrame column.
I'm trying to store the schema of this JSON string in another column using the schema_of_json function,
but it's throwing the error below. How could I resolve this?
{
"company": {
"companyId": "123",
"companyName": "ABC"
},
"customer": {
"customerDetails": {
"customerId": "CUST-100",
"customerName": "CUST-AAA",
"status": "ACTIVE",
"phone": {
"phoneDetails": {
"home": {
"phoneno": "666-777-9999"
},
"mobile": {
"phoneno": "333-444-5555"
}
}
}
},
"address": {
"loc": "NORTH",
"adressDetails": [
{
"street": "BBB",
"city": "YYYYY",
"province": "AB",
"country": "US"
},
{
"street": "UUU",
"city": "GGGGG",
"province": "NB",
"country": "US"
}
]
}
}
}
Code:
val df = spark.read.textFile("./src/main/resources/json/company.txt")
df.printSchema()
df.show()
root
|-- value: string (nullable = true)
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"company":{"companyId":"123","companyName":"ABC"},"customer":{"customerDetails":{"customerId":"CUST-100","customerName":"CUST-AAA","status":"ACTIVE","phone":{"phoneDetails":{"home":{"phoneno":"666-777-9999"},"mobile":{"phoneno":"333-444-5555"}}}},"address":{"loc":"NORTH","adressDetails":[{"street":"BBB","city":"YYYYY","province":"AB","country":"US"},{"street":"UUU","city":"GGGGG","province":"NB","country":"US"}]}}}|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
df.withColumn("jsonSchema",schema_of_json(col("value")))
Error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'schemaofjson(`value`)' due to data type mismatch: The input json should be a string literal and not null; however, got `value`.;;
'Project [value#0, schemaofjson(value#0) AS jsonSchema#10]
+- Project [value#0]
+- Relation[value#0] text
The workaround that I found was to pass the first value of the column to schema_of_json as a plain string, as below.
df.withColumn("jsonSchema",schema_of_json(df.select(col("value")).first.getString(0)))
Courtesy:
Implicit schema discovery on a JSON-formatted Spark DataFrame column
Since SPARK-24709 was introduced, schema_of_json accepts only string literals. You can extract the schema of the string column in DDL format by calling
import spark.implicits._  // provides the encoder needed by .as[String]

spark.read
  .json(df.select("value").as[String])
  .schema
  .toDDL
If you are looking for a PySpark answer:
import pyspark.sql.functions as F
import pyspark.sql.types as T
import json

def process(json_content):
    if json_content is None:
        return []
    try:
        # Parse the JSON content and extract the top-level keys only
        keys = json.loads(json_content).keys()
        return list(keys)
    except Exception as e:
        # Return the error as a string so it fits ArrayType(StringType())
        return [str(e)]

udf_function = F.udf(process, T.ArrayType(T.StringType()))
my_df = my_df.withColumn("schema", udf_function(F.col("json_raw")))
In my Django project, my View is converting a ValuesQuerySet to a JSON string:
import json
# ...
device_list = list(Device.objects.values())
device_json = json.dumps(device_list)
The resulting JSON string:
[{"field1": "value", "location_id": 1, "id": 1, "field2": "value"},
{...}]
How can I include the data within the location object represented by "location_id": 1, instead of the ID number? Something like this:
[{"field1": "value", "location_name": "name", "location_region": "region", "another_location_field": "value", "id": 1, "field2": "value"},
{...}]
I found that you can use Field Lookups to follow relationships and access fields in another related model:
import json
# ...
device_list = list(Device.objects.values('field1', 'field2', 'location__name', 'location__region'))
json.dumps(device_list)
The resulting JSON string:
[{"field1": "value", "field2": "value", "location__name": "name", "location__region": "region"},
{...}]
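Note that the keys come back as "location__name" rather than the "location_name" asked for in the question. One way to get the single-underscore names is to rename the keys before serializing. The sketch below is plain Python with a made-up helper name (`flatten_lookup_keys`) and a hand-written list standing in for the queryset result. It assumes no two keys collide after the rename.

```python
import json


def flatten_lookup_keys(rows):
    """Replace the double-underscore lookup separator with a single underscore."""
    return [
        {key.replace("__", "_"): value for key, value in row.items()}
        for row in rows
    ]


# Stand-in for list(Device.objects.values(...)); the values are made up.
device_list = [
    {"field1": "value", "field2": "value",
     "location__name": "name", "location__region": "region"},
]
device_json = json.dumps(flatten_lookup_keys(device_list))
print(device_json)
```

Django's values() also accepts keyword arguments with F() expressions, for example values('field1', location_name=F('location__name')), which should do the rename in the query itself.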
I want to store an array of key value items. A common way to do this could be something like:
// the JSON data may store several data types, not just key value lists,
// but, must be able to identify some data as a key value list
// --> more "common" way to store a key value array
{
[
{"key": "slide0001.html", "value": "Looking Ahead"},
{"key": "slide0008.html", "value": "Forecast"},
{"key": "slide0021.html", "value": "Summary"},
// another THOUSANDS KEY VALUE PAIRS
// ...
],
"otherdata" : { "one": "1", "two": "2", "three": "3" }
}
But when there are many pairs / items, the string length becomes prohibitive,
and I want a more compact way. This could be an example:
// --> (1) a "compact" way to store a key value array
{
[
{"slide0001.html", "Looking Ahead"},
{"slide0008.html", "Forecast"},
{"slide0021.html", "Summary"},
// another THOUSANDS KEY VALUE PAIRS
// ...
],
"otherdata" : { "one": "1", "two": "2", "three": "3" }
}
Additionally, I want a way to identify the data as a keyvalue array,
because, I may want to store other data in the same JSON file.
I have these examples:
// --> (2) a "compact" way to store a key value array
{
"keyvaluelist":
[
{"slide0001.html", "Looking Ahead"},
{"slide0008.html", "Forecast"},
{"slide0021.html", "Summary"},
// another THOUSANDS KEY VALUE PAIRS
// ...
],
"otherdata" : { "one": "1", "two": "2", "three": "3" }
}
// --> (3) a "compact" way to store a key value array
{
"mylist":
{
"type": "keyvaluearray",
"data":
[
{"slide0001.html", "Looking Ahead"},
{"slide0008.html", "Forecast"},
{"slide0021.html", "Summary"},
// another THOUSANDS KEY VALUE PAIRS
// ...
]
},
"otherdata" : { "one": "1", "two": "2", "three": "3" }
}
What do you think? Which one do you suggest? Do you have another way?
Thanks.
UPDATE 1: Remove invalid code. Javascript => JSON
UPDATE 2: Add non key value data
UPDATE 3: Replace "[" and "]" for "{" and "}" in each key value pair
So why don't you simply use a key-value literal?
var params = {
'slide0001.html': 'Looking Ahead',
'slide0002.html': 'Forecast',
...
};
return params['slide0001.html']; // returns: Looking Ahead
If the logic parsing this knows that {"key": "slide0001.html", "value": "Looking Ahead"} is a key/value pair, then you could transform it in an array and hold a few constants specifying which index maps to which key.
For example:
var data = ["slide0001.html", "Looking Ahead"];
var C_KEY = 0;
var C_VALUE = 1;
var value = data[C_VALUE];
So, now, your data can be:
[
["slide0001.html", "Looking Ahead"],
["slide0008.html", "Forecast"],
["slide0021.html", "Summary"]
]
If your parsing logic doesn't know ahead of time about the structure of the data, you can add some metadata to describe it. For example:
{ meta: { keys: [ "key", "value" ] },
data: [
["slide0001.html", "Looking Ahead"],
["slide0008.html", "Forecast"],
["slide0021.html", "Summary"]
]
}
... which would then be handled by the parser.
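As a rough illustration of what "handled by the parser" could mean, here is a minimal Python sketch that expands the compact rows back into key/value records using the meta.keys list. The function name `expand_rows` is hypothetical, and the payload is the example above written as strict JSON (quoted property names).

```python
import json

# The metadata-plus-rows layout from above, as strict JSON.
payload = json.loads("""
{ "meta": { "keys": ["key", "value"] },
  "data": [
    ["slide0001.html", "Looking Ahead"],
    ["slide0008.html", "Forecast"],
    ["slide0021.html", "Summary"]
  ]
}
""")


def expand_rows(payload):
    """Pair each row's positions with the field names listed in meta.keys."""
    keys = payload["meta"]["keys"]
    return [dict(zip(keys, row)) for row in payload["data"]]


records = expand_rows(payload)
# Optional: build a direct lookup from key to value.
lookup = {record["key"]: record["value"] for record in records}
print(records[0])
print(lookup["slide0008.html"])
```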
To me, this is the most "natural" way to structure such data in JSON, provided that all of the keys are strings.
{
"keyvaluelist": {
"slide0001.html": "Looking Ahead",
"slide0008.html": "Forecast",
"slide0021.html": "Summary"
},
"otherdata": {
"one": "1",
"two": "2",
"three": "3"
},
"anotherthing": "thing1",
"onelastthing": "thing2"
}
I read this as
a JSON object with four elements
element 1 is a map of key/value pairs named "keyvaluelist",
element 2 is a map of key/value pairs named "otherdata",
element 3 is a string named "anotherthing",
element 4 is a string named "onelastthing"
The first element or second element could alternatively be described as objects themselves, of course, with three elements each.
For key/value pairs in JSON, use an object rather than an array.
Finding a name/value pair in an array is hard, but in an object it is easy.
Example:
var exObj = {
"mainData": {
"slide0001.html": "Looking Ahead",
"slide0008.html": "Forecast",
"slide0021.html": "Summary",
// another THOUSANDS KEY VALUE PAIRS
// ...
},
"otherdata" : { "one": "1", "two": "2", "three": "3" }
};
var mainData = exObj.mainData;
// for use:
Object.keys(mainData).forEach(function(n,i){
var v = mainData[n];
console.log('name' + i + ': ' + n + ', value' + i + ': ' + v);
});
// and string length is minimum
console.log(JSON.stringify(exObj));
console.log(JSON.stringify(exObj).length);
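To put rough numbers on the size argument, the following Python sketch serializes the same three pairs in the three layouts discussed in this thread (objects with "key"/"value" fields, two-element arrays, and a plain object) and compares the compact string lengths. The variable and function names are made up for the example.

```python
import json

pairs = [
    ("slide0001.html", "Looking Ahead"),
    ("slide0008.html", "Forecast"),
    ("slide0021.html", "Summary"),
]

# The three layouts discussed in this thread.
as_objects = [{"key": k, "value": v} for k, v in pairs]
as_arrays = [[k, v] for k, v in pairs]
as_mapping = dict(pairs)


def compact_len(obj):
    """Length of the JSON text with no whitespace between tokens."""
    return len(json.dumps(obj, separators=(",", ":")))


sizes = {
    "objects": compact_len(as_objects),
    "arrays": compact_len(as_arrays),
    "mapping": compact_len(as_mapping),
}
print(sizes)
```

For this sample the plain object is the shortest and the "key"/"value" objects are the longest, because the field names are repeated for every pair.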