Dataframe in R to be converted to sequence of JSON objects

I asked this question before and edited it twice; I am sorry for the bad usage of this website. I have flagged that question for deletion and am posting a proper new question here. Please look into this.
I am basically working on recommender system code whose output has to be converted to a sequence of JSON objects. I have a matrix that serves as a lookup table: for every item ID, it holds the list of the closest items it is related to and the similarity scores associated with those combinations.
Let me explain through an example.
Suppose I have a matrix
In the example below, Item 1 is similar to Items 22 and 23, with similarity scores 0.8 and 0.5 respectively. The remaining rows follow the same structure.
X1 X2 X3  X4  X5
 1 22 23 0.8 0.5
34  4 87 0.4 0.4
23  7 92 0.6 0.5
For every item (the X1 value of each row), I want a JSON structure holding the recommended items and the similarity score for each combination, with each row emitted as a separate JSON entity, one after another. I don't want one enclosing JSON object that contains all the individual ones.
Assume there is one more field called "coid" that will be given as input to the code. Here I assume it is XYZ and that it is the same for all rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As shown above, each entity is a valid JSON structure/object on its own, but they are not put together into one enclosing JSON object.
I appreciate all the help given on the previous question, but I feel this alteration is not covered by those answers: in the end, if you do a toJSON(some entity), it converts the entire thing to one JSON object, and I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.

library(rjson)
## Your matrix
mat <- matrix(c(1, 34, 23,
                22, 4, 7,
                23, 87, 92,
                0.8, 0.4, 0.6,
                0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (with the not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It makes two list objects, `_id` and items, and combines them into a JSON object:
makejson <- function(x, coid="ABC") {
  `_id` <- list(coid = coid, iid = x[1])
  nitem <- (length(x) - 1) / 2  # Number of items
  items <- list()
  for (i in seq(1, nitem)) {
    items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
  }
  toJSON(list(`_id` = `_id`, items = items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
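If the for-loop route is preferred, each line can also be written out as it is generated; a minimal sketch of that variant (the connection handling is my addition, not part of the original answer):

con <- file("out.json", open = "w")
for (i in seq_len(nrow(mat))) {
  ## one JSON object per line, written as it is produced
  writeLines(makejson(mat[i, ], coid = "XYZ"), con)
}
close(con)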
There is a small difference between your output and mine: in yours, the numbers are in quotes ("). If you want to have it like that, mat has to be a character matrix.
## mat <- matrix(as.character(c(1,34,23, ...
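Spelled out with the same values as above, with one caveat: a plain as.character() conversion quotes the scores too, so the sketch below uses a hypothetical variant makejson2 that converts scores back with as.numeric() to match the question's desired output exactly.

## Character version of the same matrix, so ids and items come out quoted
mat <- matrix(as.character(c(1, 34, 23,
                             22, 4, 7,
                             23, 87, 92,
                             0.8, 0.4, 0.6,
                             0.5, 0.4, 0.5)), byrow=FALSE, nrow=3)

## Variant of makejson that keeps the scores numeric
makejson2 <- function(x, coid="ABC") {
  `_id` <- list(coid = coid, iid = x[1])
  nitem <- (length(x) - 1) / 2
  items <- list()
  for (i in seq(1, nitem)) {
    items[[i]] <- list(item = x[i + 1], score = as.numeric(x[i + 1 + nitem]))
  }
  toJSON(list(`_id` = `_id`, items = items))
}

res <- apply(mat, 1, makejson2, coid="XYZ")
cat(res, sep="\n")
## {"_id":{"coid":"XYZ","iid":"1"},"items":[{"item":"22","score":0.8},{"item":"23","score":0.5}]}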
Hope it helps,
alex

Related

Postgres json to view

I have a table like this (with a jsonb column):
https://dbfiddle.uk/lGuHdHEJ
If I load this json with python into a dataframe:
import pandas as pd
import json
data = {
    "id": [1, 2],
    "myyear": [2016, 2017],
    "value": [5, 9]
}
data = json.dumps(data)
df = pd.read_json(data)
print(df)
I get this result:
   id  myyear  value
0   1    2016      5
1   2    2017      9
How can I get this result directly from the json column via SQL in a Postgres view?
Note: this assumes that your id, my_year, and value arrays are consistent and have the same length.
This answer uses PostgreSQL's jsonb_array_elements_text function to explode array elements into rows.
select jsonb_array_elements_text(payload -> 'id') as "id",
       jsonb_array_elements_text(payload -> 'bv_year') as "myyear",
       jsonb_array_elements_text(payload -> 'value') as "value"
from main
This gives the output below:

id  myyear  value
1   2016    5
2   2017    9
That said, storing the properties in a jsonb object like this is not the best design and could lead to data inconsistencies later. If it's in your control, I would suggest storing the data so that each property's mapping is clear. Some suggestions:
You can instead have separate columns for each property.
If you want to store it as jsonb only, then consider [{"id": "", "year": "", "value": ""}]

How to Change a value in a Dataframe based on a lookup from a json file

I want to practice building models and I figured that I'd do it with something that I am familiar with: League of Legends. I'm having trouble replacing an integer in a dataframe with a value in a json.
The dataset I'm using comes from Kaggle. You can grab it and run this for yourself.
https://www.kaggle.com/datasnaek/league-of-legends
I have a JSON file of the form (it's actually much bigger, but I shortened it):
{
    "type": "champion",
    "version": "7.17.2",
    "data": {
        "1": {
            "title": "the Dark Child",
            "id": 1,
            "key": "Annie",
            "name": "Annie"
        },
        "2": {
            "title": "the Berserker",
            "id": 2,
            "key": "Olaf",
            "name": "Olaf"
        }
    }
}
and a dataframe of the form:
print df
   gameDuration  t1_champ1id
0          1949            1
1          1851            2
2          1493            1
3          1758            1
4          2094            2
I want to replace the ID in t1_champ1id with the lookup value in the json.
If both of these were dataframes, then I could use merge.
This is what I've tried. I don't know if this is the best way to read in the json file.
import pandas
df = pandas.read_csv("lol_file.csv", header=0)
champ = pandas.read_json("champion_info.json", typ='series')
for i in champ.data[0]:
    for j in df:
        if df.loc[j, ('t1_champ1id')] == i:
            df.loc[j, ('t1_champ1id')] = champ[0][i]['name']
I get the below error:
'the label [gameDuration] is not in the [index]'
I'm not sure that this is the most efficient way to do this, but I'm not sure how to do it at all either.
What do y'all think?
Thanks!
for j in df: iterates over the column names in df, which is unnecessary, since you're only looking to match against the column 't1_champ1id'. A better use of pandas functionality is to condense the id:name pairs from your JSON file into a dictionary, and then map it to df['t1_champ1id'].
# json_file here is the parsed JSON (e.g. from json.load); on Python 3 use .values() instead of .itervalues()
player_names = {v['id']: v['name'] for v in json_file['data'].itervalues()}
df.loc[:, 't1_champ1id'] = df['t1_champ1id'].map(player_names)
#    gameDuration t1_champ1id
# 0          1949       Annie
# 1          1851        Olaf
# 2          1493       Annie
# 3          1758       Annie
# 4          2094        Olaf
This creates a dataframe from the 'data' in the JSON file (transposing the result and setting the index to what you want to map on, the id), then maps that onto the original df:
import json
import pandas as pd

with open('champion_info.json') as data_file:
    champ_json = json.load(data_file)
champs = pd.DataFrame(champ_json['data']).T
champs.set_index('id', inplace=True)
df['champ_name'] = df.t1_champ1id.map(champs['name'])

Loading JSON data to a list in a particular order using PyMongo

Let's say I have the following document in a MongoDB database:
{
    "assist_leaders" : {
        "Steve Nash" : {
            "team" : "Phoenix Suns",
            "position" : "PG",
            "draft_data" : {
                "class" : 1996,
                "pick" : 15,
                "selected_by" : "Phoenix Suns",
                "college" : "Santa Clara"
            }
        },
        "LeBron James" : {
            "team" : "Cleveland Cavaliers",
            "position" : "SF",
            "draft_data" : {
                "class" : 2003,
                "pick" : 1,
                "selected_by" : "Cleveland Cavaliers",
                "college" : "None"
            }
        }
    }
}
I'm trying to collect a few values under "draft_data" for each player in an ORDERED list. The list needs to look like the following for this particular document:
[ [1996, 15, "Phoenix Suns"], [2003, 1, "Cleveland Cavaliers"] ]
That is, each nested list must contain the values corresponding to the "class", "pick", and "selected_by" keys, in that order. I also need the "Steve Nash" data to come before the "LeBron James" data.
How can I achieve this using pymongo? Note that the structure of the data is not set in stone so I can change this if that makes the code simpler.
I'd extract the data and turn it into a list in Python, once you've retrieved the document from MongoDB:
for doc in db.collection.find():
    for name, info in doc['assist_leaders'].items():
        draft_data = info['draft_data']
        lst = [draft_data['class'], draft_data['pick'], draft_data['selected_by']]
        print name, lst
A list comprehension is the way to go here (note: use .iteritems() in Python 2 or .items() in Python 3, or you'll get a ValueError: too many values to unpack).
import pymongo
import numpy as np

client = pymongo.MongoClient()
db = client[database_name]

dataList = [v for i in ["Steve Nash", "LeBron James"]
              for key in ["class", "pick", "selected_by"]
              for document in db.collection_name.find({"assist_leaders": {"$exists": 1}})
              for k, v in document["assist_leaders"][i]["draft_data"].iteritems()
              if k == key]

print dataList
# [1996, 15, "Phoenix Suns", 2003, 1, "Cleveland Cavaliers"]

matrix = np.reshape(dataList, [2, 3])
print matrix
# [ [1996, 15, "Phoenix Suns"],
#   [2003, 1, "Cleveland Cavaliers"] ]

Ragged list or data frame to JSON

I am trying to create a ragged list in R that corresponds to the D3 tree structure of flare.json. My data is in a data.frame:
path <- data.frame(P1=c("direct","direct","organic","direct"),
                   P2=c("direct","direct","end","end"),
                   P3=c("direct","organic","",""),
                   P4=c("end","end","",""), size=c(5,12,23,45))
path
       P1     P2      P3  P4 size
1  direct direct  direct end    5
2  direct direct organic end   12
3 organic    end              23
4  direct    end              45
but it could also be a list or reshaped if necessary:
path <- list()
path[[1]] <- list(name=c("direct","direct","direct","end"),size=5)
path[[2]] <- list(name=c("direct","direct","organic","end"), size=12)
path[[3]] <- list(name=c("organic", "end"), size=23)
path[[4]] <- list(name=c("direct", "end"), size=45)
The desired output is:
rl <- list()
rl <- list(name="root", children=list())
rl$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[1] <- list(list(name="direct", children=list()))
rl$children[[1]]$children[[1]]$children[[1]]$children[1] <- list(list(name="end", size=5))
rl$children[[1]]$children[[1]]$children[2] <- list(list(name="organic", children=list()))
rl$children[[1]]$children[[1]]$children[[2]]$children[1] <- list(list(name="end", size=12))
rl$children[[1]]$children[2] <- list(list(name="end", size=23))
rl$children[2] = list(list(name="organic", children=list()))
rl$children[[2]]$children[1] <- list(list(name="end", size=45))
So that when I print it to JSON, it's:
require(RJSONIO)
cat(toJSON(rl, pretty=T))
{
  "name" : "root",
  "children" : [
    {
      "name" : "direct",
      "children" : [
        {
          "name" : "direct",
          "children" : [
            {
              "name" : "direct",
              "children" : [
                {
                  "name" : "end",
                  "size" : 5
                }
              ]
            },
            {
              "name" : "organic",
              "children" : [
                {
                  "name" : "end",
                  "size" : 12
                }
              ]
            }
          ]
        },
        {
          "name" : "end",
          "size" : 23
        }
      ]
    },
    {
      "name" : "organic",
      "children" : [
        {
          "name" : "end",
          "size" : 45
        }
      ]
    }
  ]
}
I am having a hard time wrapping my head around the recursive steps that are necessary to create this list structure in R. In JS I can pretty easily move around the nodes and at each node determine whether to add a new node or keep moving down the tree by using push as needed, eg: new = {"name": node, "children": []}; or new = {"name": node, "size": size}; as in this example. I tried to split the data.frame as in this example:
makeList <- function(x){
  if(ncol(x) > 2){
    listSplit <- split(x, x[1], drop=T)
    lapply(names(listSplit), function(y){list(name=y, children=makeList(listSplit[[y]]))})
  } else {
    lapply(seq(nrow(x[1])), function(y){list(name=x[,1][y], size=x[,2][y])})
  }
}
jsonOut <- toJSON(list(name="root", children=makeList(path)))
but it gives me an error
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
The function given in the linked Q&A is essentially what you need; however, it was failing on your data set because of the null values in the later columns of some rows. Instead of blindly repeating the recursion until you run out of columns, you need to check for your "end" value and use that to switch to making leaves:
makeList <- function(x){
  listSplit <- split(x[-1], x[1], drop=TRUE);
  lapply(names(listSplit), function(y){
    if (y == "end") {
      l <- list();
      rows = listSplit[[y]];
      for(i in 1:nrow(rows)) {
        l <- c(l, list(name=y, size=rows[i, "size"]));
      }
      l;
    }
    else {
      list(name=y, children=makeList(listSplit[[y]]))
    }
  });
}
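With this change, the original call from the question should run without the infinite-recursion error, e.g.:

jsonOut <- toJSON(list(name="root", children=makeList(path)))
cat(jsonOut)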
I believe this does what you want, though it has some limitations. In particular, it is assumed that every branch in your network is unique (i.e. there can't be two rows in your data frame that are equal for every column other than size):
df.split <- function(p.df) {
  p.lst.tmp <- unname(split(p.df, p.df[, 1]))
  p.lst <- lapply(
    p.lst.tmp,
    function(x) {
      if(ncol(x) == 2L && nrow(x) == 1L) {
        return(list(name=x[1, 1], size=unname(x[, 2])))
      } else if (isTRUE(is.na(unname(x[, 2])))) {
        return(list(name=x[1, 1], size=unname(x[, ncol(x)])))
      }
      list(name=x[1, 1], children=df.split(x[, -1, drop=F]))
    }
  )
  p.lst
}
all.equal(rl, df.split(path)[[1]])
# [1] TRUE
Though note you had the organic size switched, so I had to fix your rl to get this result (rl has it as 45, but your path as 23). Also, I modified your path data.frame slightly:
path <- data.frame(
  root=rep("root", 4),
  P1=c("direct","direct","organic","direct"),
  P2=c("direct","direct","end","end"),
  P3=c("direct","organic",NA,NA),
  P4=c("end","end",NA,NA),
  size=c(5,12,23,45),
  stringsAsFactors=F
)
WARNING: I haven't tested this with other structures, so it's possible it will hit corner cases that you'll need to debug.
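As a quick check of the output, you can print the result the same way as in the question (df.split(path)[[1]] is equal to rl, so this should reproduce the JSON shown above):

cat(toJSON(df.split(path)[[1]], pretty=TRUE))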

R list(structure(list())) to data frame

I have a JSON data source providing a list of hashes:
[
  { "a": "foo",
    "b": "sdfshk"
  },
  { "a": "foo",
    "b": "ihlkyhul"
  }
]
I use fromJSON() in the rjson package to convert that to an R data structure. It returns:
list(
  structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
  structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b"))
)
I need to get this into an R data frame, but data.frame() turns that into a single-row data frame with four columns instead of a 2x2 data frame as expected. I lack the R-fu to do the transform from one to the other, though it looks like it should be straightforward.
Bonus points:
The actual problem is a bit more complex, because the JSON data source isn't as regular as I show above. The objects it returns vary in type. That is, the field set in each can be one of a few different types:
[
  { "a": "foo",
    "b": "asdfhalsdhfla"
  },
  { "a": "bar",
    "c": "akjdhflakjhsdlfkah",
    "d": "jfhglskhfglskd"
  },
  { "a": "foo",
    "b": "dfhlkhldsfg"
  }
]
As you can see, the "a" field in each object is a type tag, indicating which other fields the object will have.
I'm not too particular how the solution copes with this.
It wouldn't be horrible if the two object types were just mooshed together, so you get columns a, b, c, and d, and the rows simply have N/A or NULL values where the JSON source object doesn't have a value for a given field. I believe I can clean the resulting data frame with subset(df, a == "foo"). I'll end up with some empty columns that way, but it won't matter to my program.
It would be better if the solution provides a way to select which JSON source rows go into the data frame and which get rejected, so the result has only the columns and rows actually required.
If you have a jagged list you want converted to a data.frame, you can use rbind.fill from Hadley Wickham's plyr package. It has saved my neck on a couple of occasions. Let me know if this is what you're looking for. Notice that I modified your first example to include only "b" in the third element, to make it jagged.
> x <- list(
+ structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
+ structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b")),
+ structure(list(b = "asdf"), .Names = "b")
+ )
>
> library(plyr)
> do.call("rbind.fill", lapply(x, as.data.frame))
     a        b
1  foo   sdfshk
2  foo ihlkyhul
3 <NA>     asdf
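For the bonus part, the subset(df, a == "foo") cleanup you describe should work on this result. Alternatively, a small sketch of filtering the source list before binding (my suggestion, not part of the original answer), which avoids the empty columns entirely:

## keep only the objects whose type tag "a" is "foo", then bind
keep <- Filter(function(el) identical(el$a, "foo"), x)
do.call("rbind.fill", lapply(keep, as.data.frame))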