fromJSON encoding issue - json

I am trying to convert a JSON object to an R data frame. Here is the JSON object:
json <-
'[
{"Name" : "a", "Age" : 32, "Occupation" : "凡达"},
{"Name" : "b", "Age" : 21, "Occupation" : "打蜡设计费"},
{"Name" : "c", "Age" : 20, "Occupation" : "的拉斯克奖飞"}
]'
Then I call fromJSON with mydf <- jsonlite::fromJSON(json), and the result is:
Name Age Occupation
1 a 32 <U+51E1><U+8FBE>
2 b 21 <U+6253><U+8721><U+8BBE><U+8BA1><U+8D39>
3 c 20 <U+7684><U+62C9><U+65AF><U+514B><U+5956><U+98DE>
I am wondering why this happens and whether there is a solution.
Using the rjson package avoids the problem, but its output is a list and I want a data frame.
Thank you.
I've also tried Sys.setlocale(locale = "Chinese"). The characters are then treated as Chinese, but the result is still garbled, as shown below:
Name Age Occupation
1 a 32 ·²´ï
2 b 21 ´òÀ¯Éè¼Æ·Ñ
3 c 20 µÄÀ­Ë¹¿Ë½±·É
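For what it's worth, both symptoms can be reproduced outside R. The sketch below is in Python rather than R and is only an illustration, not a fix for jsonlite itself. It assumes that `<U+51E1>` is an escaped code point (the data is intact, only the printing is escaped) and that the second output is what GBK-encoded bytes look like when displayed as Latin-1. The helper names `unescape_codepoints` and `undo_gbk_as_latin1` are made up for this example.

```python
import re


def unescape_codepoints(text: str) -> str:
    """Turn <U+XXXX> escapes back into the characters they stand for."""
    return re.sub(
        r"<U\+([0-9A-Fa-f]{4,6})>",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


def undo_gbk_as_latin1(garbled: str) -> str:
    """Reverse mojibake caused by showing GBK bytes as Latin-1 text."""
    return garbled.encode("latin-1").decode("gbk")


# First symptom: the code points are correct, they are just printed escaped.
restored = unescape_codepoints("<U+51E1><U+8FBE>")
print(restored)

# Second symptom, simulated: encode as GBK, then misread the bytes as Latin-1.
garbled = "凡达".encode("gbk").decode("latin-1")
print(garbled)
print(undo_gbk_as_latin1(garbled))
```

If this reading is right, the strings returned by fromJSON are fine and the problem lies in how the console or locale renders them.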

Related

creating json array using column name as key and column as value in postgres

I have a table named list in a PostgreSQL database:
create table list (firstname text,lastname text,age integer);
insert into list values ('SHARON','XAVIER',25);
insert into list values ('RON','PETER',17);
insert into list values ('KIM','BENNY',14);
select * from list;
firstname | lastname | age
-----------+----------+-----
SHARON | XAVIER | 25
RON | PETER | 17
KIM | BENNY | 14
I need to create a JSON array from this table in which each row becomes an object of column name : column value pairs, like this:
[
{ "firstname" : "SHARON","lastname" : "XAVIER" , "age" : 25},
{ "firstname" : "RON","lastname" : "PETER" , "age" : 17},
{ "firstname" : "KIM","lastname" : "BENNY" , "age" : 14}
]
Are there any options for doing this?
You can use to_jsonb() to convert an entire row to a JSON value, then use jsonb_agg() to aggregate all those into a single JSON array:
select jsonb_agg(to_jsonb(l))
from list l;
Online example
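To make the two steps concrete, here is a rough client-side analogue in Python. It is only a sketch of what the query does conceptually, not how PostgreSQL implements it. The `columns` and `rows` values are hypothetical stand-ins for what a database driver might return for `select * from list`.

```python
import json

# Hypothetical result set for: select * from list
columns = ["firstname", "lastname", "age"]
rows = [("SHARON", "XAVIER", 25), ("RON", "PETER", 17), ("KIM", "BENNY", 14)]

# Analogue of to_jsonb(l): one object per row, keyed by column name.
objects = [dict(zip(columns, row)) for row in rows]

# Analogue of jsonb_agg(...): collect all row objects into one JSON array.
json_array = json.dumps(objects)
print(json_array)
```

Note that jsonb does not preserve key order, so if the order of keys in the output matters, to_json() with json_agg() may be the better pair.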

Sort and Select Top 5 JSON values

I have a two-fold issue and am looking for clues as to how to approach it.
I have a json file that is formatted as such:
{
    "code": 2000,
    "data": {
        "1": {
            "attribute1": 40,
            "attribute2": 1.4,
            "attribute3": 5.2,
            "attribute4": 124,
            "attribute5": "65.53%"
        },
        "94": {
            "attribute1": 10,
            "attribute2": 4.4,
            "attribute3": 2.2,
            "attribute4": 12,
            "attribute5": "45.53%"
        },
        "96": {
            "attribute1": 17,
            "attribute2": 9.64,
            "attribute3": 5.2,
            "attribute4": 62,
            "attribute5": "51.53%"
        }
    },
    "message": "SUCCESS"
}
My goals are to:
Sort the data by any one of the attributes.
Grab the top 5 entries (there are around 100 in total), according to how they are sorted.
Output the data in a table, e.g.:
These are sorted by: attribute5
---
attribute1 | attribute2 | attribute3 | attribute4 | attribute5
40 |1.4 |5.2|124|65.53%
17 |9.64|5.2|62 |51.53%
10 |4.4 |2.2|12 |45.53%
Note that attribute5 above is a string value.
Admittedly, my knowledge here is very limited.
I attempted to mimic the method used here:
python sort list of json by value
I managed to open the file and I can extract the key values from a sample row:
import json
jsonfile = "path-to-my-file.json"
with open(jsonfile) as j:
    data = json.load(j)
k = data["data"]["1"].keys()
print(k)
total = data["data"]
for row in total:
    v = data["data"][str(row)].values()
    print(v)
this outputs:
dict_keys(['attribute1', 'attribute2', 'attribute3', 'attribute4', 'attribute5'])
dict_values([1, 40, 1.4, 5.2, 124, '65.53%'])
dict_values([94, 10, 4.4, 2.2, 12, '45.53%'])
dict_values([96, 17, 9.64, 5.2, 62, '51.53%'])
Any point in the right direction would be GREATLY appreciated.
Thanks!
If you don't mind using pandas, you could do it like this:
import pandas as pd
rows = [v for k,v in data["data"].items()]
df = pd.DataFrame(rows)
# then to get the top 5 values by attribute can choose either ascending
# or descending with the ascending keyword and head prints the top 5 rows
df.sort_values('attribute1', ascending=True).head()
This will allow you to sort by any attribute you need at any time and print out a table.
It produces output like this, depending on what you sort by:
attribute1 attribute2 attribute3 attribute4 attribute5
0 40 1.40 5.2 124 65.53%
1 10 4.40 2.2 12 45.53%
2 17 9.64 5.2 62 51.53%
I'll leave this answer here in case you don't want to use pandas, but the answer from @MatthewBarlowe is far less complicated and I recommend it.
For sorting by a specific attribute, this should work:
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
Now, sorted_keys is a list of the keys in order of the attribute they were sorted by.
Then, to print this as a table, I used the tabulate library. The final code for me looked like this:
from tabulate import tabulate
import json
SORT_BY = "attribute4"
with open("test.json") as j:
    data = json.load(j)
items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))
print(f"\nSorted by: {SORT_BY}")
print(
    tabulate(
        [
            [sorted_keys[i], *items[sorted_keys[i]].values()]
            for i, _ in enumerate(items)
        ],
        headers=["Column", *items["1"].keys()],
    )
)
When sorting by 'attribute5', this outputs:
Sorted by: attribute5
Column attribute1 attribute2 attribute3 attribute4 attribute5
-------- ------------ ------------ ------------ ------------ ------------
1 40 1.4 5.2 124 65.53%
96 17 9.64 5.2 62 51.53%
94 10 4.4 2.2 12 45.53%
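One caveat with the approaches above: attribute5 is a string, so it sorts as text, and a value such as "9.5%" would rank above "65.53%". The sketch below, using only the standard library, converts percent strings to numbers before sorting and keeps just the top entries. The helper names `as_number` and `top_n` are made up for this example, and the sample is inlined instead of being read from a file.

```python
import json

# Inline sample shaped like the question's file; real code would json.load() it.
raw = """
{"code": 2000,
 "data": {
   "1":  {"attribute1": 40, "attribute2": 1.4,  "attribute3": 5.2,
          "attribute4": 124, "attribute5": "65.53%"},
   "94": {"attribute1": 10, "attribute2": 4.4,  "attribute3": 2.2,
          "attribute4": 12,  "attribute5": "45.53%"},
   "96": {"attribute1": 17, "attribute2": 9.64, "attribute3": 5.2,
          "attribute4": 62,  "attribute5": "51.53%"}
 },
 "message": "SUCCESS"}
"""


def as_number(value):
    """Return a float usable as a sort key; strips a trailing '%' from strings."""
    if isinstance(value, str):
        return float(value.rstrip("%"))
    return float(value)


def top_n(items, sort_by, n=5):
    """Return the n (key, row) pairs with the largest value for sort_by."""
    ranked = sorted(
        items.items(),
        key=lambda pair: as_number(pair[1][sort_by]),
        reverse=True,
    )
    return ranked[:n]


items = json.loads(raw)["data"]
top = top_n(items, "attribute5")
for key, row in top:
    print(key, row)
```

The resulting list of (key, row) pairs can be passed to tabulate in the same way as above.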

Pandas converts string-typed JSON value to INT

I have a list of objects as JSON. Each object has two properties: id (string) and arg (number).
When I use pandas.read_json(...), the resulting DataFrame has the id interpreted as number as well, which causes problems, since information is lost.
import pandas as pd
json = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'
df = pd.read_json(json)
I'd expect to have a DataFrame like this:
id arg
0 "1" 1
1 "1_1" 2
2 "11" 2
I get
id arg
0 1 1
1 11 2
2 11 2
and suddenly, the once unique id is not so unique anymore.
How can I tell pandas to stop doing that?
My search so far has only turned up results where people were trying to achieve the opposite: having columns of strings interpreted as numbers. I definitely don't want that in this case!
If you set the dtype parameter to False, read_json will not infer the types automatically:
df = pd.read_json(json, dtype=False)
Use the dtype parameter to prevent the id column from being cast to numbers:
df = pd.read_json(json, dtype={'id':str})
print (df)
id arg
0 1 1
1 1_1 2
2 11 2
print (df.dtypes)
id object
arg int64
dtype: object
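A likely explanation for "1_1" turning into 11 (this is an assumption about what happens inside pandas, not something confirmed in this thread) is that Python's own int() accepts underscores as digit separators. The short sketch below shows that behavior, and also that the standard json module performs no type inference, so the ids stay distinct strings.

```python
import json

raw = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'

# int() treats underscores as digit separators, so "1_1" parses as 11.
print(int("1_1"))

# json.loads keeps JSON strings as Python strings; nothing is coerced.
records = json.loads(raw)
ids = [record["id"] for record in records]
print(ids)
```

Passing `records` to pd.DataFrame(records) should be another way to keep id as a string column, since the values are already typed when pandas receives them.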

Subset a JSON Object with MySQL Query

I have a MySQL database and one of the tables is called 'my_table'. In this table, one of the columns is called 'my_json_column' and this column is stored as a JSON object in MySQL. The JSON object has about 17 key:value pairs (see below). I simply want to return a "slimmed-down" JSON Object from a MySQL query that returns 4 of the 17 fields.
I have tried many different MySQL queries, see below, but I can't seem to get a returned subset JSON Object. I am sure it is simple, but I have been unsuccessful.
Something like this:
SELECT
json_extract(my_json_column, '$.X'),
json_extract(my_json_column, '$.Y'),
json_extract(my_json_column, '$.KB'),
json_extract(my_json_column, '$.Name')
FROM my_table;
yields:
5990.510000 90313.550000 5990.510000 "Operator 1"
I want to get this result instead (a returned JSON Object) with key value pairs:
[ { X: 5990.510000, Y: 90313.550, KB: 2105, Name: "Well 1" } ]
Sample data:
{
"Comment" : "No Comment",
"Country" : "USA",
"County" : "County 1",
"Field" : "Field 1",
"GroundElevation" : "5400",
"Identifier" : "11435358700000",
"Interpreter" : "Interpreter 1",
"KB" : 2105,
"Name" : "Well 1",
"Operator" : "Operator 1",
"Owner" : "me",
"SpudDate" : "NA",
"State" : "MI",
"Status" : "ACTIVE",
"TotalDepth" : 5678,
"X" : 5990.510000,
"Y" : 90313.550
}
Thank you in advance.
Use JSON_OBJECT(), available since MySQL 5.7:
Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs
SELECT
JSON_OBJECT(
'X', json_extract(my_json_column, '$.X'),
'Y', json_extract(my_json_column, '$.Y'),
'KB', json_extract(my_json_column, '$.KB'),
'Name', json_extract(my_json_column, '$.Name')
) my_new_json
FROM my_table;
This demo on DB Fiddle with your sample data returns:
| my_new_json |
| ----------------------------------------------------------- |
| {"X": 5990.51, "Y": 90313.55, "KB": 2105, "Name": "Well 1"} |
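If the subsetting ever has to happen in application code instead of in the query, the same idea can be sketched in Python as below. This is an illustration only; `subset_json` is a made-up helper, and the sample is a shortened version of the data from the question.

```python
import json


def subset_json(document: str, keys):
    """Return a JSON string holding only the requested keys present in document."""
    full = json.loads(document)
    return json.dumps({key: full[key] for key in keys if key in full})


sample = (
    '{"Country": "USA", "KB": 2105, "Name": "Well 1", '
    '"Operator": "Operator 1", "X": 5990.510000, "Y": 90313.550}'
)
slim = subset_json(sample, ["X", "Y", "KB", "Name"])
print(slim)
```

One difference to be aware of: this sketch silently drops keys that are missing from the document, whereas the JSON_OBJECT() query would still emit the key with a null value.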

How to export pandas dataframe to json in specific format

My dataframe is:
'col1' , 'col2'
A , 89
A , 232
C , 545
D , 998
and I would like to export it as follows:
{
'A' : [ 89, 232 ],
'C' : [545],
'D' : [998]
}
However, none of the to_json options (orient='records', ...) produces this format.
Is there a way to output it like this?
Use groupby to collect each group's values into a list, then call to_json:
json = df.groupby('col1')['col2'].apply(list).to_json()
print (json)
{"A":[89,232],"C":[545],"D":[998]}
Detail:
print (df.groupby('col1')['col2'].apply(list))
col1
A [89, 232]
C [545]
D [998]
Name: col2, dtype: object
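For readers who want to see what the groupby step amounts to, here is a rough standard-library equivalent. It is a sketch only; `rows` is a hypothetical stand-in for the (col1, col2) pairs of the dataframe.

```python
import json
from collections import defaultdict

# Hypothetical (col1, col2) pairs taken from the dataframe in the question.
rows = [("A", 89), ("A", 232), ("C", 545), ("D", 998)]

# Group col2 values under their col1 key, preserving first-seen order.
grouped = defaultdict(list)
for key, value in rows:
    grouped[key].append(value)

result = json.dumps(grouped)
print(result)
```

The pandas one-liner above does the same grouping and serialization in a single expression, which is usually preferable once the data is already in a dataframe.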