JSON to R data frame: preserve repeated values

I have a JSON data source that is a list of objects. Some of the object properties are themselves lists. I want to turn the whole thing into a data frame, preserving the lists as data frame values.
Example JSON data:
[{
"id": "A",
"p1": [1, 2, 3],
"p2": "foo"
},{
"id": "B",
"p1": [4, 5, 6],
"p2": "bar"
}]
Desired data frame:
id p2 p1
1 A foo 1, 2, 3
2 B bar 4, 5, 6
Failed attempt 1
I have found this nicely straightforward way of parsing my JSON:
unlisted_data <- lapply(fromJSON(json_str), function(x){unlist(x)})
data.frame(do.call("rbind", unlisted_data))
However, the unlisting process spreads my repeated value across multiple columns:
id p11 p12 p13 p2
1 A 1 2 3 foo
2 B 4 5 6 bar
I expected that calling unlist with the recursive = FALSE option would take care of this, but it doesn't.
Failed attempt 2
I noticed that I can almost do this with the I function:
> data.frame(I(parsed_json[[1]]))
parsed_json..1..
id A
p1 1, 2, 3
p2 foo
But the rows and columns are reversed. Transposing the result mangles the repeated data:
> t(data.frame(I(parsed_json[[1]])))
id p1 p2
parsed_json..1.. "A" Numeric,3 "foo"

The jsonlite package can handle this just fine:
library(jsonlite)
fromJSON(txt)
# id p1 p2
#1 A 1, 2, 3 foo
#2 B 4, 5, 6 bar
fromJSON(txt)$p1
#[[1]]
#[1] 1 2 3
#
#[[2]]
#[1] 4 5 6

Related

MySQL merge json field with new data while removing duplicates, where the json values are simple scalar values

Suppose that I have a MySQL table with a JSON field that contains only numbers, like this (note: using MySQL 8):
CREATE TABLE my_table (
id int,
some_field json
);
Sample data:
id: 1
some_field: [1, 2, 3, 4, 5]
id: 2
some_field: [3, 6, 7]
id: 3
some_field: null
I would like to merge another array of data with the existing values of some_field, while removing duplicates. I was hoping that this might work, but it didn't:
update my_table set some_field = JSON_MERGE([1, 2, 3], some_field)
The result I am after would be:
id: 1
some_field: [1, 2, 3, 4, 5]
id: 2
some_field: [1, 2, 3, 6, 7]
id: 3
some_field: [1, 2, 3]

Suppose your table holds the three records above and you want to merge records 1 and 2, as in your example.
I hope JavaScript is suitable for you to follow along.
// Fetch both records (db.execute stands in for whatever database client you use).
const records = db.execute("SELECT id, some_field FROM my_table WHERE id=1 OR id=2");
// records now holds both rows.
// Since some_field is an array of numbers, concatenate the two arrays with the spread operator and let a Set drop the duplicates. For arrays of objects you would have to loop and compare entries yourself.
const merged = Array.from(new Set([...records[0].some_field, ...records[1].some_field]));
// Assign the merged array to both records.
records[0].some_field = merged;
records[1].some_field = merged;
// Now update both records in the database one by one.
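
The merge the question describes (new values first, then any existing values not already present, with a null field treated as empty) can also be sketched in a few lines. The sketch below is in Python rather than JavaScript, and the function name merge_unique is hypothetical. It only illustrates the de-duplication step, not the database round trip.

```python
def merge_unique(new_values, existing):
    """Merge two lists, keeping only the first occurrence of each value.

    `existing` may be None, mirroring a NULL JSON column.
    """
    combined = list(new_values) + list(existing or [])
    # A dict preserves insertion order, so fromkeys drops later duplicates
    # while keeping the order of first appearance.
    return list(dict.fromkeys(combined))


# Sample rows from the question: id -> some_field
rows = {1: [1, 2, 3, 4, 5], 2: [3, 6, 7], 3: None}
merged = {row_id: merge_unique([1, 2, 3], field) for row_id, field in rows.items()}
print(merged)
```

Each merged list would then be written back with an ordinary UPDATE per row.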

data transformation from pandas to json

I have a dataframe df:
d = {'col1': [1, 2,0,55,12,3], 'col3': ['A','A','A','B','B','B'] }
df = pd.DataFrame(data=d)
df
col1 col3
0 1 A
1 2 A
2 0 A
3 55 B
4 12 B
5 3 B
and I want to build JSON from it, so that the result looks like this:
json_result = { 'A' : [1,2,0], 'B': [55,12,3] }
Basically, for each group in col3, I would like to assign an array of its corresponding col1 values from the dataframe.

Aggregate each group into a list, then use Series.to_json:
print (df.groupby('col3')['col1'].agg(list).to_json())
{"A":[1,2,0],"B":[55,12,3]}
Or, if you need a dictionary, use Series.to_dict:
print (df.groupby('col3')['col1'].agg(list).to_dict())
{'A': [1, 2, 0], 'B': [55, 12, 3]}
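
To make clear what the groupby/agg(list) step does, here is a rough standard-library equivalent. It is a sketch for illustration only, not a replacement for the pandas one-liner; the variable names are taken from the question, and the columns are written out as plain lists.

```python
import json
from collections import defaultdict

# The two columns from the question, as plain lists.
col1 = [1, 2, 0, 55, 12, 3]
col3 = ["A", "A", "A", "B", "B", "B"]

# Collect the col1 values under their col3 key, in row order.
groups = defaultdict(list)
for key, value in zip(col3, col1):
    groups[key].append(value)

json_result = json.dumps(groups)
print(json_result)
```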

Select if array contains an element of another array

I have a table which has a JSON type field where I save a number array like [1, 2, 3, 4].
I want to select the records whose array contains at least one element of another array that I have in a PHP script.
I know that the JSON_CONTAINS function can be used to check whether my array contains an element, but how can I select rows where both arrays have at least one number in common (no matter at which index)?
For example:
[1, 2, 3] and [5, 0, 2] -> True
[9, 2, 1] and [0, 5, 3] -> False
[4, 0, 2] and [4, 2, 6] -> True
Currently, I'm using multiple JSON_CONTAINS calls to check for common elements, like this:
SELECT *
FROM mytable
WHERE JSON_CONTAINS(ar, 0, '$') OR
JSON_CONTAINS(ar, 1, '$') OR
JSON_CONTAINS(ar, 2, '$')
But I guess there may be a more elegant way of doing this.
I searched but couldn't find the appropriate function; if this is a duplicate, let me know.
Thanks in advance!

Use JSON_OVERLAPS, which is available in MySQL 8.0.17 and later:
https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-overlaps
mysql> SELECT JSON_OVERLAPS("[1,3,5,7]", "[2,5,7]");
+---------------------------------------+
| JSON_OVERLAPS("[1,3,5,7]", "[2,5,7]") |
+---------------------------------------+
|                                     1 |
+---------------------------------------+
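
For arrays of plain numbers, the check amounts to asking whether the two arrays share at least one element. The Python sketch below mirrors that behaviour on the three examples from the question. The helper name overlaps is hypothetical, and the sketch does not cover the additional rules JSON_OVERLAPS applies to objects or mixed types.

```python
def overlaps(a, b):
    """Return True if the two lists share at least one element."""
    return not set(a).isdisjoint(b)


# The three pairs from the question.
pairs = [
    ([1, 2, 3], [5, 0, 2]),
    ([9, 2, 1], [0, 5, 3]),
    ([4, 0, 2], [4, 2, 6]),
]
results = [overlaps(a, b) for a, b in pairs]
print(results)  # [True, False, True]
```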

Pandas converts string-typed JSON value to INT

I have a list of objects as JSON. Each object has two properties: id (string) and arg (number).
When I use pandas.read_json(...), the resulting DataFrame has the id interpreted as a number as well, which causes problems, since information is lost.
import pandas as pd
json = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'
df = pd.read_json(json)
I'd expect to have a DataFrame like this:
id arg
0 "1" 1
1 "1_1" 2
2 "11" 2
I get
id arg
0 1 1
1 11 2
2 11 2
and suddenly, the once unique id is not so unique anymore.
How can I tell pandas to stop doing that?
My search so far has only turned up results where people were trying to achieve the opposite: having columns of strings interpreted as numbers. That is exactly what I don't want in this case!

If you set the dtype parameter to False, read_json will not infer the types automatically:
df = pd.read_json(json, dtype=False)

Use the dtype parameter to prevent id from being cast to a number:
df = pd.read_json(json, dtype={'id':str})
print (df)
id arg
0 1 1
1 1_1 2
2 11 2
print (df.dtypes)
id object
arg int64
dtype: object
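
One plausible reason the underscore vanishes (an assumption about what happens inside pandas, not something confirmed in this thread) is that Python's own int() accepts underscores as digit separators, so "1_1" is a valid spelling of 11. The standard-library sketch below shows that the JSON parser itself keeps the ids as strings, and that the collision only appears once they are coerced to integers.

```python
import json

# Same JSON text as in the question.
json_text = '[{ "id" : "1", "arg": 1 },{ "id" : "1_1", "arg": 2}, { "id" : "11", "arg": 2}]'

records = json.loads(json_text)
ids_as_strings = [record["id"] for record in records]

# int() treats "_" as a digit separator, so "1_1" becomes 11.
ids_as_ints = [int(value) for value in ids_as_strings]

print(ids_as_strings)  # ['1', '1_1', '11']
print(ids_as_ints)     # [1, 11, 11]
```

This is why keeping the column as a string, via either dtype option above, preserves the distinct ids.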

Double Nested JSON to DF

I'm unable to convert this JSON:
{
  "profiles": {
    "1": {
      "id": "1",
      "property1": "value1",
      "property2": "value2"
    },
    "2": {
      "id": "2",
      "property1": "value21",
      "property2": "value22"
    }
  }
}
into this format.
Desired output:
Id Property1 Property2
1 Value1 Value2
2 Value21 Value22
I've attempted different approaches, but they all result in a single column containing all the data.
Can someone please point me in the right direction?

Based on this example:
data = {'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']}
pd.DataFrame.from_dict(data)
col_1 col_2
0 3 a
1 2 b
2 1 c
3 0 d
I would suggest something like:
your_json = {<your_json>}  # placeholder: the parsed JSON dict from the question
property1 = []
property2 = []
# The outer loop visits "profiles"; the inner loop visits each profile under it.
for key, value in your_json.items():
    for k, v in value.items():
        property1.append(v['property1'])
        property2.append(v['property2'])
data = {'property1': property1, 'property2': property2}
tt = pd.DataFrame.from_dict(data)
print(tt)
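
A shorter route, sketched here with only the standard library, is to take the values of the "profiles" mapping directly: each value is already one row, with id included. The JSON text is copied from the question with straight quotes. The variable names are illustrative, not from the original answer.

```python
import json

# JSON from the question (straight quotes are required for valid JSON).
raw = """
{
  "profiles": {
    "1": {"id": "1", "property1": "value1", "property2": "value2"},
    "2": {"id": "2", "property1": "value21", "property2": "value22"}
  }
}
"""

data = json.loads(raw)

# Each value under "profiles" is one record; the outer keys repeat the id.
rows = list(data["profiles"].values())
columns = list(rows[0].keys())

print(columns)
for row in rows:
    print([row[column] for column in columns])
```

Assuming pandas is installed, passing rows to pd.DataFrame(rows) should then give one row per profile with the columns id, property1 and property2.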
