JSON to R data frame: preserve repeated values

I have a JSON data source that is a list of objects. Some of the object properties are themselves lists. I want to turn the whole thing into a data frame, preserving the lists as data frame values.
Example JSON data:
[{
"id": "A",
"p1": [1, 2, 3],
"p2": "foo"
},{
"id": "B",
"p1": [4, 5, 6],
"p2": "bar"
}]
Desired data frame:
id p2 p1
1 A foo 1, 2, 3
2 B bar 4, 5, 6
Failed attempt 1
I have found this nicely straightforward way of parsing my JSON:
unlisted_data <- lapply(fromJSON(json_str), function(x){unlist(x)})
data.frame(do.call("rbind", unlisted_data))
However, the unlisting process spreads my repeated value across multiple columns:
id p11 p12 p13 p2
1 A 1 2 3 foo
2 B 4 5 6 bar
I expected that calling unlist with the recursive = FALSE option would take care of this, but it doesn't.
Failed attempt 2
I noticed that I can almost do this with the I function:
> data.frame(I(parsed_json[[1]]))
parsed_json..1..
id A
p1 1, 2, 3
p2 foo
But the rows and columns are reversed. Transposing the result mangles the repeated data:
> t(data.frame(I(parsed_json[[1]])))
id p1 p2
parsed_json..1.. "A" Numeric,3 "foo"

The jsonlite package can handle this just fine:
library(jsonlite)
fromJSON(txt)
# id p1 p2
#1 A 1, 2, 3 foo
#2 B 4, 5, 6 bar
fromJSON(txt)$p1
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 4 5 6

Related

MySQL merge json field with new data while removing duplicates, where the json values are simple scalar values

Suppose that I have a MySQL table with a JSON field that contains only numbers, like this (note: using MySQL 8):
CREATE TABLE my_table (
id int,
some_field json
);
Sample data:
id: 1
some_field: [1, 2, 3, 4, 5]
id: 2
some_field: [3, 6, 7]
id: 3
some_field: null
I would like to merge another array of data with the existing values of some_field, while removing duplicates. I was hoping that this might work, but it didn't:
update my_table set some_field = JSON_MERGE([1, 2, 3], some_field)
The result I am after is:
id: 1
some_field: [1, 2, 3, 4, 5]
id: 2
some_field: [1, 2, 3, 6, 7]
id: 3
some_field: [1, 2, 3]
Assuming your table holds the three records from your example and you want to merge the arrays of records 1 and 2, you can do the merge in application code.
I hope JavaScript is easy enough for you to follow (db.execute stands in for whatever database client you use).
// Fetch both records.
const records = db.execute("SELECT id, some_field FROM my_table WHERE id=1 OR id=2");
// For plain numbers like in your example, a Set removes the duplicates. If the entries were JSON objects you could loop with a Map instead. Since your field is an array, I'll just show how to merge two arrays.
// Concatenate with the spread operator; using + on two arrays would turn them into strings.
const merged = Array.from(new Set([...records[0].some_field, ...records[1].some_field]));
records[0].some_field = merged;
// Same for the second record.
records[1].some_field = merged;
// Now update both records in the database one by one.
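The result the question asks for (merging [1, 2, 3] into every row, including the null one) can be sketched in Python as well. This is only a minimal sketch: it assumes the JSON column has already been decoded into a Python list (or None for a NULL field), and merge_unique is a hypothetical helper name, not part of any MySQL client library.

```python
def merge_unique(new_values, existing):
    """Merge two lists, keeping first-seen order and dropping duplicates."""
    # Assumption: a NULL JSON field arrives as None, so treat it as empty.
    combined = list(new_values) + list(existing or [])
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(combined))


# Sample rows from the question, keyed by id.
rows = {1: [1, 2, 3, 4, 5], 2: [3, 6, 7], 3: None}
merged = {row_id: merge_unique([1, 2, 3], field) for row_id, field in rows.items()}
print(merged)
# {1: [1, 2, 3, 4, 5], 2: [1, 2, 3, 6, 7], 3: [1, 2, 3]}
```

Each merged list would still have to be written back with an UPDATE per row, as in the answer above.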

data transformation from pandas to json

I have a dataframe df:
d = {'col1': [1, 2,0,55,12,3], 'col3': ['A','A','A','B','B','B'] }
df = pd.DataFrame(data=d)
df
col1 col3
0 1 A
1 2 A
2 0 A
3 55 B
4 12 B
5 3 B
and I want to build a JSON object from it, so that the result looks like this:
json_result = { 'A' : [1,2,0], 'B': [55,12,3] }
Basically, for each group in col3, I would like to assign an array of its corresponding col1 values from the dataframe.
Aggregate each group into a list and then use Series.to_json:
print (df.groupby('col3')['col1'].agg(list).to_json())
{"A":[1,2,0],"B":[55,12,3]}
or, if you need a dictionary, use Series.to_dict:
print (df.groupby('col3')['col1'].agg(list).to_dict())
{'A': [1, 2, 0], 'B': [55, 12, 3]}
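To make clear what groupby followed by agg(list) computes, here is a rough equivalent that uses only the standard library. It is a sketch for illustration, not how pandas implements the aggregation; the variable names are made up for this example.

```python
import json
from collections import defaultdict

# The two columns from the question's dataframe.
col1 = [1, 2, 0, 55, 12, 3]
col3 = ['A', 'A', 'A', 'B', 'B', 'B']

# Collect the col1 values under their col3 key, in row order.
groups = defaultdict(list)
for key, value in zip(col3, col1):
    groups[key].append(value)

as_dict = dict(groups)
as_json = json.dumps(as_dict)
print(as_dict)  # {'A': [1, 2, 0], 'B': [55, 12, 3]}
print(as_json)  # {"A": [1, 2, 0], "B": [55, 12, 3]}
```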

Select if array contains an element of another array

I have a table with a JSON-type field in which I store a number array like [1, 2, 3, 4].
I want to select the records whose array contains at least one element of another array that I have in a PHP script.
I know that the JSON_CONTAINS function can be used to check whether my array contains an element, but how can I select the rows where both arrays have at least one number in common (regardless of its index)?
For example:
[1, 2, 3] and [5, 0, 2] -> True
[9, 2, 1] and [0, 5, 3] -> False
[4, 0, 2] and [4, 2, 6] -> True
Currently, I'm using multiple JSON_CONTAINS calls to check for common elements, like this:
SELECT *
FROM mytable
WHERE JSON_CONTAINS(ar, 0, '$') OR
JSON_CONTAINS(ar, 1, '$') OR
JSON_CONTAINS(ar, 2, '$')
But I guess there may be a more elegant way of doing this.
I searched but couldn't find the appropriate function; if this is a duplicate, let me know.
Thanks in advance!
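Use JSON_OVERLAPS (available since MySQL 8.0.17), which returns 1 if the two JSON arrays have at least one element in common:
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-overlaps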
mysql> SELECT JSON_OVERLAPS("[1,3,5,7]", "[2,5,7]");
+---------------------------------------+
| JSON_OVERLAPS("[1,3,5,7]", "[2,5,7]") |
+---------------------------------------+
| 1 |
+---------------------------------------+
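For flat arrays of numbers, the overlap test boils down to "the two sets are not disjoint". The sketch below checks the three examples from the question on the client side and builds the JSON text you could bind as the second argument of JSON_OVERLAPS. The helper name arrays_overlap is hypothetical, and the %s placeholder style is an assumption that depends on your database driver.

```python
import json


def arrays_overlap(a, b):
    """Return True if the two flat arrays share at least one element."""
    return not set(a).isdisjoint(b)


# The three pairs from the question.
pairs = [([1, 2, 3], [5, 0, 2]), ([9, 2, 1], [0, 5, 3]), ([4, 0, 2], [4, 2, 6])]
results = [arrays_overlap(a, b) for a, b in pairs]
print(results)  # [True, False, True]

# Serialize the script-side array so it can be bound as a query parameter.
wanted = [0, 1, 2]
param = json.dumps(wanted)
# Assumption: the driver uses %s placeholders; adjust to your client.
query = "SELECT * FROM mytable WHERE JSON_OVERLAPS(ar, %s)"
print(param)  # [0, 1, 2]
```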

Pandas converts string-typed JSON value to INT

I have a list of objects as JSON. Each object has two properties: id (string) and arg (number).
When I use pandas.read_json(...), the resulting DataFrame has the id interpreted as number as well, which causes problems, since information is lost.
import pandas as pd
json = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'
df = pd.read_json(json)
I'd expect to have a DataFrame like this:
id arg
0 "1" 1
1 "1_1" 2
2 "11" 2
I get
id arg
0 1 1
1 11 2
2 11 2
and suddenly, the once unique id is not so unique anymore.
How can I tell pandas to stop doing that?
My search so far has only turned up results where people were trying to achieve the opposite: having string columns interpreted as numbers. That is exactly what I don't want in this case!
If you set the dtype parameter to False, read_json will not infer the types automatically:
df = pd.read_json(json, dtype=False)
Use the dtype parameter to prevent id from being cast to a number:
df = pd.read_json(json, dtype={'id':str})
print (df)
id arg
0 1 1
1 1_1 2
2 11 2
print (df.dtypes)
id object
arg int64
dtype: object
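As a quick sanity check that the ids really are distinct strings in the source, you can parse the same text with the standard library json module, which does no type inference on string values. This is only a sketch; the resulting list of dicts could then be handed to pd.DataFrame, which should keep the id column as strings, though that step is not shown or tested here.

```python
import json

# Same JSON text as in the question.
json_str = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'

records = json.loads(json_str)
ids = [record["id"] for record in records]
print(ids)  # ['1', '1_1', '11']
# Every id is still a str, and all three are unique.
print(all(isinstance(i, str) for i in ids), len(set(ids)) == len(ids))  # True True
```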

Double Nested JSON to DF

I'm unable to convert this JSON:
{
"profiles": {
"1": {
"id": "1",
"property1": "value1",
"property2": "value2"
},
"2": {
"id": "2",
"property1": "value21",
"property2": "value22"
}
}}
into this format:
Desired output
Id Property1 Property2
1 Value1 Value2
2 Value21 Value22
I've attempted different approaches, but they all just result in a single column containing all the data.
Can someone please point me in the right direction?
Based on this example:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
I would suggest something like:
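import pandas as pd
# your_json is the parsed JSON from the question (a dict)
your_json = {'profiles': {'1': {'id': '1', 'property1': 'value1', 'property2': 'value2'}, '2': {'id': '2', 'property1': 'value21', 'property2': 'value22'}}}
ids = []
property1 = []
property2 = []
# The outer loop visits the single "profiles" key; the inner loop visits each profile.
for key, value in your_json.items():
    for k, v in value.items():
        ids.append(v['id'])
        property1.append(v['property1'])
        property2.append(v['property2'])
data = {'id': ids, 'property1': property1, 'property2': property2}
tt = pd.DataFrame.from_dict(data)
print(tt)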
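A shorter route is to note that the values under "profiles" already form one record per row. The sketch below uses only the standard library to flatten them into a list of dicts and a plain table; passing rows to pd.DataFrame should then give one column per property, but that last step is an assumption left out of the sketch so that it runs without pandas.

```python
import json

# The JSON from the question, with plain ASCII quotes.
raw = '''
{"profiles": {
    "1": {"id": "1", "property1": "value1", "property2": "value2"},
    "2": {"id": "2", "property1": "value21", "property2": "value22"}
}}
'''

data = json.loads(raw)
# Drop the outer keys ("1", "2"); each inner dict is one row.
rows = list(data["profiles"].values())

columns = ["id", "property1", "property2"]
table = [[row[column] for column in columns] for row in rows]
print(columns)
for line in table:
    print(line)
# ['id', 'property1', 'property2']
# ['1', 'value1', 'value2']
# ['2', 'value21', 'value22']
```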