I have a large pandas DataFrame to convert into JSON.
The standard .to_json() function does not produce a compact JSON format.
How can I get JSON output in a format like this, using pandas only?
{"index": [ 0, 1 ,3 ],
"col1": [ "250", "1" ,"3" ],
"col2": [ "250", "1" ,"3" ]
}
This is a much more compact JSON format for tabular data.
(I could write a loop over the rows... but I'm hoping to avoid that.)
It seems you need to_dict first and then convert the dict to JSON:
df = pd.DataFrame({"index": [ 0, 1 ,3 ],
"col1": [ "250", "1" ,"3" ],
"col2": [ "250", "1" ,"3" ]
})
print (df)
col1 col2 index
0 250 250 0
1 1 1 1
2 3 3 3
print (df.to_dict(orient='list'))
{'col1': ['250', '1', '3'], 'col2': ['250', '1', '3'], 'index': [0, 1, 3]}
import json
print (json.dumps(df.to_dict(orient='list')))
{"col1": ["250", "1", "3"], "col2": ["250", "1", "3"], "index": [0, 1, 3]}
Going through to_dict is necessary because orient='list' is not implemented in to_json yet:
print (df.to_json(orient='list'))
ValueError: Invalid value 'list' for option 'orient'
EDIT:
If the index is not a column, add reset_index:
df = pd.DataFrame({"col1": [250, 1, 3],
"col2": [250, 1, 3]})
print (df)
col1 col2
0 250 250
1 1 1
2 3 3
print (df.reset_index().to_dict(orient='list'))
{'col1': [250, 1, 3], 'index': [0, 1, 2], 'col2': [250, 1, 3]}
You can use to_dict and json (and add the index as an extra column, if required, via assign):
import json
df = pd.DataFrame({"col1": [250, 1, 3],
"col2": [250, 1, 3]})
json_dict = df.assign(index=df.index).to_dict(orient="list")
print(json.dumps(json_dict))
{"col1": [250, 1, 3], "col2": [250, 1, 3], "index": [0, 1, 2]}
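As a quick check (a sketch using the same toy frame, not part of the original answer), the compact column-oriented JSON also round-trips cleanly back into a DataFrame:

```python
import json

import pandas as pd

df = pd.DataFrame({"col1": [250, 1, 3], "col2": [250, 1, 3]})

# Build the compact column-oriented dict, adding the index as a column
compact = df.assign(index=df.index).to_dict(orient="list")
payload = json.dumps(compact)

# Round trip: the compact JSON rebuilds an equivalent DataFrame
restored = pd.DataFrame(json.loads(payload)).set_index("index")
```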
Related
I want to extract a value from a column using its key. The data is stored as a dictionary, and I want the values of "hotel_id" from the column.
The data in the column is stored like this (examples):
["search_params": {"region": "PK", "sort_by": "POPULARITY", "currency": "PKR", "hotel_id": "688867", "slot_only": false, "segment_id": "6f9a9cc5-be52-4ae4-b5d2-cc19c6753085", "occupancies": [{"rooms": 1, "adults": 2, "children": 0, "child_age_list": []}], "check_in_date": "2022-11-13"]
[ "search_result": {"hotel_dict": {"241825": {"source": 3, "hotel_id": 241825, "availability": {"rooms": {"rooms_id_list": [3187909, 3187910, 3187911, 3187912], "rooms_data_dict": {"3187909": {"room_id": 3187909, "rate_options": [{"adult": 3, "child": 0, "bar_rate": 7500, "rate_key": "542984|991065", "extra_bed": 0, "occupancy": {"2": 1}]
I tried a query, but it only works for a specific number of characters and does not handle spaces and commas (the data is dirty).
SELECT substr(substr(info, instr(info, 'hotel_id') + 11),
2,
instr(substr(info, instr(info, 'hotel_id') + 1), '"') - 2
) result
from df
The result I get is:
result
0 688867
1 41825,
2 41771,
3 394910
4 ull, "
5 394910
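One way around the fixed-offset substr arithmetic is a pattern match that tolerates the dirty data. A hedged sketch outside SQL, assuming the raw strings are accessible from Python: a regular expression that accepts hotel_id values whether quoted or bare, with optional whitespace around the colon:

```python
import re

# Trimmed samples mimicking the dirty column values above
samples = [
    '"search_params": {"region": "PK", "hotel_id": "688867", "slot_only": false}',
    '"search_result": {"hotel_dict": {"241825": {"source": 3, "hotel_id": 241825}}}',
]

# hotel_id may be quoted or bare; allow whitespace around the colon
pattern = re.compile(r'"hotel_id"\s*:\s*"?(\d+)"?')

ids = [match for s in samples for match in pattern.findall(s)]
```

The same pattern could be used with a database engine that supports regular expressions; the fixed `+ 11` offset in the substr approach is what breaks on quoting and whitespace variations.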
We are trying to reformat a JSON similar to this:
[
{"id": 1,
"type": "A",
"changes": [
{"id": 12},
{"id": 13}
],
"wanted_key": "good",
"unwanted_key": "aaa"
},
{"id": 2,
"type": "A",
"unwanted_key": "aaa"
},
{"id": 3,
"type": "B",
"changes": [
{"id": 31},
{"id": 32}
],
"unwanted_key": "aaa",
"unwanted_key2": "aaa"
},
{"id": 4,
"type": "B",
"unwanted_key3": "aaa"
},
null,
null,
{"id": 7}
]
into something like this:
[
{
"id": 1,
"type": "A",
"wanted_key": true # every record must have this key/value
},
{
"id": 12, # note: this was in the "changes" property of record id 1
"type": "A", # the type should be the same as record id 1
"wanted_key": true
},
{
"id": 13, # note: this was in the "changes" property of record id 1
"type": "A", # the type should be the same as record id 1
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31, # note: this was in the "changes" property of record id 3
"type": "B", # the type should be the same as record id 3
"wanted_key": true
},
{
"id": 32, # note: this was in the "changes" property of record id 3
"type": "B", # the type should be the same as record id 3
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN", # records without a type should have this type
"wanted_key": true
}
]
So far, I've been able to:
remove null records
obtain the keys we need with their default
give records without a type a default type
What we are missing:
from records having a changes key, create new records with the type of their parent record
join all records in a single array
Unfortunately we are not entirely sure how to proceed... Any help would be appreciated.
So far our jq goes like this:
del(..|nulls) | map({id, type: (.type // "UNKN"), wanted_key: (true)}) | del(..|nulls)
Here's our test code:
https://jqplay.org/s/eLAWwP1ha8P
The following should work:
map(select(values))
| map(., .type as $type | (.changes[]? + {$type}))
| map({id, type: (.type // "UNKN"), wanted_key: true})
Only select non-null values
Return the original items followed by their inner changes array (+ outer type)
Extract 3 properties for output
Multiple map calls can usually be combined, so this becomes:
map(
select(values)
| ., (.type as $type | (.changes[]? + {$type}))
| {id, type: (.type // "UNKN"), wanted_key: true}
)
Another option without variables:
map(
select(values)
| ., .changes[]? + {type}
| {id, type: (.type // "UNKN"), wanted_key: true}
)
# or:
map(select(values))
| map(., .changes[]? + {type})
| map({id, type: (.type // "UNKN"), wanted_key: true})
or even with a separate normalization step for the unknown type:
map(select(values))
| map(.type //= "UNKN")
| map(., .changes[]? + {type})
| map({id, type, wanted_key: true})
# condensed to a single line:
map(select(values) | .type //= "UNKN" | ., .changes[]? + {type} | {id, type, wanted_key: true})
Explanation:
Select only non-null values from the array
If type is not set, create the property with value "UNKN"
Produce the original array items, followed by their nested changes elements extended with the parent type
Reshape objects to only contain properties id, type, and wanted_key.
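If jq is not a hard requirement, those same four steps translate almost line for line into Python (a sketch over a trimmed version of the input above):

```python
import json

records = json.loads("""
[
  {"id": 1, "type": "A", "changes": [{"id": 12}, {"id": 13}], "unwanted_key": "aaa"},
  {"id": 2, "type": "A"},
  null,
  {"id": 7}
]
""")

out = []
for rec in records:
    if rec is None:                       # drop null records
        continue
    rtype = rec.get("type", "UNKN")       # default type for records without one
    out.append({"id": rec["id"], "type": rtype, "wanted_key": True})
    for child in rec.get("changes", []):  # lift children, inheriting the parent type
        out.append({"id": child["id"], "type": rtype, "wanted_key": True})
```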
Here's one way:
map(
select(values)
| (.type // "UNKN") as $type
| ., .changes[]?
| {id, $type, wanted_key: true}
)
[
{
"id": 1,
"type": "A",
"wanted_key": true
},
{
"id": 12,
"type": "A",
"wanted_key": true
},
{
"id": 13,
"type": "A",
"wanted_key": true
},
{
"id": 2,
"type": "A",
"wanted_key": true
},
{
"id": 3,
"type": "B",
"wanted_key": true
},
{
"id": 31,
"type": "B",
"wanted_key": true
},
{
"id": 32,
"type": "B",
"wanted_key": true
},
{
"id": 4,
"type": "B",
"wanted_key": true
},
{
"id": 7,
"type": "UNKN",
"wanted_key": true
}
]
Demo
Something like the filter below should work:
map(
select(type == "object") |
( {id}, {id : ( .changes[]? .id )} ) +
{ type: (.type // "UNKN"), wanted_key: true }
)
jq play - demo
I have a cart table with 2 columns (user_num, data).
user_num holds the user's phone number, and
data holds an array of objects like [{ "id": 1, "quantity": 1 }, { "id": 2, "quantity": 2 }, { "id": 3, "quantity": 3 }], where id is the product id.
user_num | data
----------+--------------------------------------------------------------------------------------
1 | [{ "id": 1, "quantity": 1 }, { "id": 2, "quantity": 2 }, { "id": 3, "quantity": 3 }]
I want to append more product objects to the above array in PostgreSQL.
Thanks!
To append the values, use the JSONB concatenation operator ||:
Demo
update
test
set
data = data || '[{"id": 4, "quantity": 4}, {"id": 5, "quantity": 5}]'
where
user_num = 1;
I have an array of objects, and want to filter the arrays in each object's b property down to only the elements matching that object's a property.
[
{
"a": 3,
"b": [
1,
2,
3
]
},
{
"a": 5,
"b": [
3,
5,
4,
3,
5
]
}
]
produces
[
{
"a": 3,
"b": [
3
]
},
{
"a": 5,
"b": [
5,
5
]
}
]
Currently, I've arrived at
[.[] | (.a as $a | .b |= [.[] | select(. == $a)])]
That works, but I'm wondering if there's a better (shorter, more readable) way.
I can think of two ways to do this with less code and both are variants of what you have already figured out on your own.
map(.a as $a | .b |= map(select(. == $a)))
del(.[] | .a as $a | .b[] | select(. != $a))
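For comparison, the same per-object filtering is a one-liner in Python as well (a sketch, not part of the jq answers):

```python
data = [
    {"a": 3, "b": [1, 2, 3]},
    {"a": 5, "b": [3, 5, 4, 3, 5]},
]

# Keep only the elements of b that equal the object's own a
result = [{**obj, "b": [x for x in obj["b"] if x == obj["a"]]} for obj in data]
```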
I have a JSON input as follows:
{
"unique": 1924,
"coordinates": [
{
"time": "2015-01-25T00:00:01.683",
"xyz": [
{
"z": 4,
"y": 2,
"x": 1,
"id": 99,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
},
{
"z": 9,
"y": 9,
"x": 8,
"id": 100,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
},
{
"z": 9,
"y": 6,
"x": 10,
"id": 101,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
}
]
},
{
"time": "2015-01-25T00:00:02.790",
"xyz": [
{
"z": 0,
"y": 3,
"x": 7,
"id": 99,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
},
{
"z": 4,
"y": 6,
"x": 2,
"id": 100,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
},
{
"z": 2,
"y": 9,
"x": 51,
"id": 101,
"inner_arr" : [
{
"a": 1,
"b": 2
},
{
"a": 3,
"b": 4
}
]
}
]
}
]
}
I want to parse this input with jq and store values in bash arrays:
#!/bin/bash
z=()
x=()
y=()
id=()
a=()
b=()
jq --raw-output '.coordinates[] | .xyz[] | (.z) as $z | (.y) as $y | (.x) as $x | (.id) as $id | .inner_arr[].a as $a | .inner_arr[].b as $b | $z, $y, $x, $id, $a, $b' <<< "$input"
echo -e "${z}"
Expected output for above echo command:
4
9
9
0
4
2
echo -e "${a}"
Expected output for above echo command:
1
3
1
3
1
3
1
3
1
3
1
3
How can I do this with a single jq call, looping through all arrays in a cascading fashion?
I want to save CPU by calling jq just once to extract all the scalar and array values.
You cannot set environment variables directly from jq (cf. the manual). What you can do is generate a series of bash declarations for the declare builtin. I suggest storing the declarations in an intermediate bash array (with mapfile) that is processed directly by declare, so that you stay away from hazardous commands like eval.
mapfile -t < <(
jq --raw-output '
def m(exp): first(.[0] | path(exp)[-1]) + "=(" + (map(exp) | @sh) + ")";
[ .coordinates[].xyz[] ]
| m(.x), m(.y), m(.z), m(.id), m(.inner_arr[].a), m(.inner_arr[].b)
' input
)
declare -a "${MAPFILE[@]}"
The jq script packs all xyz objects into a single array and filters it with the m function for each field, given as a path expression. The function returns a string formatted as field=(val1 val2... valN), where the field name is the last component of the path expression, i.e. x for .x and a for .inner_arr[].a (extracted from the first item of the array).
Then you can check the shell variables with declare -p var or ${var[@]}; ${var} alone refers to the first element only.
declare -p MAPFILE
declare -p z
echo a: "${a[@]}" / size = ${#a[@]}
declare -a MAPFILE=([0]="x=(1 8 10 7 2 51)" [1]="y=(2 9 6 3 6 9)" [2]="z=(4 9 9 0 4 2)" [3]="id=(99 100 101 99 100 101)" [4]="a=(1 3 1 3 1 3 1 3 1 3 1 3)" [5]="b=(2 4 2 4 2 4 2 4 2 4 2 4)")
declare -a z=([0]="4" [1]="9" [2]="9" [3]="0" [4]="4" [5]="2")
a: 1 3 1 3 1 3 1 3 1 3 1 3 / size = 12
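If staying inside bash is not a requirement, the same cascading extraction is a few comprehensions in Python (a sketch over a trimmed sample of the input above):

```python
import json

doc = json.loads("""
{"coordinates": [
  {"xyz": [
    {"z": 4, "y": 2, "x": 1, "id": 99,  "inner_arr": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]},
    {"z": 9, "y": 9, "x": 8, "id": 100, "inner_arr": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]}
  ]}
]}
""")

# Flatten all xyz objects across coordinates, then project each field
xyz = [p for coord in doc["coordinates"] for p in coord["xyz"]]
z = [p["z"] for p in xyz]
ids = [p["id"] for p in xyz]
a = [inner["a"] for p in xyz for inner in p["inner_arr"]]
```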