unexpected predictions behavior using VW cover - reinforcement-learning

I am using vowpalwabbit for a contextual bandit problem. I want to use the cover option as explained here.
I am facing two issues with this:
1. Once the learning phase is over and I use the VW model to make predictions, the predictions (in this case, a pmf over the actions) are not stable.
2. If I save the model and reload it into memory, the predictions are different.
Here is an example (using the python wrapper of VW):
import vowpalwabbit.pyvw as pyvw
data_train = ["1:0:0.5 |features a b", "2:-1:0.5 |features a c", "2:0:0.5 |features b c",
"1:-2:0.5 |features b d", "2:0:0.5 |features a d", "1:0:0.5 |features a c d",
"1:-1:0.5 |features a c", "2:-1:0.5 |features a c"]
data_test = ["|features a b", "|features a b"]
model1 = pyvw.vw(cb_explore=2, cover=10)
for data in data_train:
    model1.learn(data)
model1.save("saved_model.model")
model2 = pyvw.vw(cb_explore=2, cover=10, i="saved_model.model")
for data in data_test:
    print(data)
    print(model1.predict(data))
    print(model2.predict(data))
I get the following output:
|features a b
[0.75, 0.25]
[0.5, 0.5]
|features a b
[0.7642977237701416, 0.2357022762298584]
[0.5, 0.5]
As you can see, predictions for model 1 are changing (slightly) while predictions for model 2 (which should be the same as model 1) are different.
If I replace cover with bag, I do not get this problem. What is the explanation for this, and is there a way to fix it in VW?

Thank you for reporting this, this seems like a bug.
I have opened an issue for this here so you can follow the progress.

Related

How to constrain a parameter in a mixed effect model in R?

I'm trying to fit a mixed-effects model with a constrained parameter, and am struggling to make it work. A small added complication is that one of the terms should be a polynomial.
Essentially, what I'm looking for is something like the following, where the coefficient of var1 is fixed at a certain value.
mod1 <- lmer(outcome ~ var1 + poly(var2,2) + (1 | Study), df)
It seems like it can be done using lmer with the Nelder-Mead option, but I can't quite wrap my head around how to make it work.
I've also tried using the lavaan package, but I've never used it before and am getting hung up somewhere. Here is an example...
library(lavaan)
reprex_df <- structure(list(outcome = c(0.54, 5.06, 15.35, 5.4, 5.3, 1.57,
2.11, 2.71, 9.09, 7.96, 28.8, 4.4, 3.38, 15.43, 4.05), var1 = c(0.55,
3.42, 2.24, 2.24, 3.44, 1.82, 1.82, 2.23, 5.41, 2.61, 6.94, 3.98,
2.23, 5.29, 3.28), var2 = c(111, 235, 60, 197, 369, 342.78, 240.99,
406.5, 264, 263.8, 76, 679, 338, 116, 683), study = c("Study 1",
"Study 2", "Study 2", "Study 2", "Study 3",
"Study 4", "Study 4", "Study 6", "Study 5",
"Study 7", "Study 2", "Study 7", "Study 6",
"Study 5", "Study 2")), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"))
I think I can make a basic model (without the polynomial)
reprex_df
test.model <- ' outcome ~ var1 + var2 + study'
test.model <- sem(test.model,
data = reprex_df, cluster = "study")
coef(test.model)
But when I try and constrain var1 to a specific value I'm getting an error
test.model.constr <- ' outcome ~ var1 + var2 + study
var1 == 4.87
'
test.model.constr <- sem(test.model.constr,
data = reprex_df, cluster = "study")
Any help in constraining the parameter (using either lmer or lavaan) and/or adding a polynomial term in lavaan would be very much appreciated.
It's tricky because lmer "profiles out" the fixed-effect parameters, i.e. they're not explicitly fitted as part of the nonlinear optimization step.
Assuming var1 is numeric/continuous and we want to fix its coefficient at b, how about the following?
mod1 <- lmer(outcome ~ 1 + offset(b*var1) + poly(var2,2) + (1 | Study), df)
This adds the term b*var1 directly to the model, with no coefficient estimated for it.
The glmmTMB package has a map argument that allows the user to fix any of the parameters explicitly to a particular value (or to constrain several parameters to have a common value).
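To see what the offset approach amounts to, here is a minimal Python sketch for a plain linear model. It is not the lmer fit: it ignores the random intercept and the polynomial term, and the data are synthetic and noise-free (both assumptions made purely for illustration). With a Gaussian response, holding a coefficient fixed through an offset is the same as subtracting the fixed term from the response and fitting the remaining terms.

```python
# Sketch: fixing a coefficient via an offset in a plain linear model.
# Synthetic, noise-free data (an assumption for illustration):
#   outcome = 2.0 + 4.87 * var1 + 0.5 * var2

b_fixed = 4.87  # coefficient of var1 that we want to hold fixed

var1 = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
var2 = [10.0, 40.0, 20.0, 60.0, 30.0, 50.0]
outcome = [2.0 + b_fixed * x1 + 0.5 * x2 for x1, x2 in zip(var1, var2)]

# Step 1: move the fixed term to the left-hand side.
# This is what the offset does for a Gaussian response.
adjusted = [y - b_fixed * x1 for y, x1 in zip(outcome, var1)]

# Step 2: ordinary least squares of the adjusted response on var2,
# using the closed-form solution for simple linear regression.
n = len(var2)
mean_x = sum(var2) / n
mean_y = sum(adjusted) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(var2, adjusted))
sxx = sum((x - mean_x) ** 2 for x in var2)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Only the intercept and the var2 slope are estimated; var1 stays at b_fixed.
print(round(intercept, 6), round(slope, 6))
```

Because the data were generated without noise, the estimated intercept and slope should come back very close to the values used to build them (2.0 and 0.5).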

How to Change a value in a Dataframe based on a lookup from a json file

I want to practice building models, and I figured I'd do it with something I'm familiar with: League of Legends. I'm having trouble replacing an integer in a dataframe with a value from a JSON file.
The datasets I'm using come from Kaggle. You can grab them and run this for yourself.
https://www.kaggle.com/datasnaek/league-of-legends
I have a JSON file of the form (it's actually much bigger, but I shortened it):
{
  "type": "champion",
  "version": "7.17.2",
  "data": {
    "1": {
      "title": "the Dark Child",
      "id": 1,
      "key": "Annie",
      "name": "Annie"
    },
    "2": {
      "title": "the Berserker",
      "id": 2,
      "key": "Olaf",
      "name": "Olaf"
    }
  }
}
and a dataframe of the form
print df
   gameDuration  t1_champ1id
0          1949            1
1          1851            2
2          1493            1
3          1758            1
4          2094            2
I want to replace the ID in t1_champ1id with the lookup value in the json.
If both of these were dataframes, then I could use the merge option.
This is what I've tried. I don't know if this is the best way to read in the json file.
import pandas
df = pandas.read_csv("lol_file.csv",header=0)
champ = pandas.read_json("champion_info.json", typ='series')
for i in champ.data[0]:
    for j in df:
        if df.loc[j,('t1_champ1id')] == i:
            df.loc[j,('t1_champ1id')] = champ[0][i]['name']
I get the below error:
'the label [gameDuration] is not in the [index]'
I'm not sure that this is the most efficient way to do this, but I'm not sure how to do it at all either.
What do y'all think?
Thanks!
for j in df: iterates over the column names in df, which is unnecessary, since you're only looking to match against the column 't1_champ1id'. A better use of pandas functionality is to condense the id:name pairs from your JSON file into a dictionary, and then map it to df['t1_champ1id'].
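import json
# Load the JSON file into a plain dict (file name taken from the question)
with open('champion_info.json') as f:
    json_file = json.load(f)
# Build an id -> name lookup and map it onto the column
player_names = {v['id']: v['name'] for v in json_file['data'].values()}
df['t1_champ1id'] = df['t1_champ1id'].map(player_names)
#    gameDuration t1_champ1id
# 0          1949       Annie
# 1          1851        Olaf
# 2          1493       Annie
# 3          1758       Annie
# 4          2094        Olaf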
Create a dataframe from the 'data' entry in the JSON file, transpose it, set the index to the column you want to map on (the id), and then map that onto the original df.
import json
import pandas as pd
with open('champion_info.json') as data_file:
    champ_json = json.load(data_file)
# One row per champion after transposing
champs = pd.DataFrame(champ_json['data']).T
champs.set_index('id', inplace=True)
df['champ_name'] = df.t1_champ1id.map(champs['name'])
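The lookup step in both answers can be sketched with the standard library alone, which makes it easier to see what happens to IDs that have no entry in the JSON. The inline JSON is the shortened sample from the question; the list `champ_ids` stands in for `df['t1_champ1id']`, and the ID 99 is made up for illustration.

```python
import json

# Shortened champion data, copied from the question.
champion_json = """
{
  "type": "champion",
  "version": "7.17.2",
  "data": {
    "1": {"title": "the Dark Child", "id": 1, "key": "Annie", "name": "Annie"},
    "2": {"title": "the Berserker", "id": 2, "key": "Olaf", "name": "Olaf"}
  }
}
"""

# Build the id -> name lookup once.
id_to_name = {
    entry["id"]: entry["name"]
    for entry in json.loads(champion_json)["data"].values()
}

# Stand-in for df['t1_champ1id']; 99 is a made-up ID with no entry.
champ_ids = [1, 2, 1, 1, 2, 99]

# dict.get returns None for unknown IDs instead of raising KeyError.
champ_names = [id_to_name.get(champ_id) for champ_id in champ_ids]
print(champ_names)
```

When the same dictionary is passed to a pandas `Series.map`, IDs that are missing from it come back as NaN rather than None, so it is worth checking for missing values after the mapping.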

igraph layout_reingold_tilford gives errors

I get the following error message when trying to use the layout_reingold_tilford layout
File "C:\Python27\lib\site-packages\igraph\layout.py", line 80, in __init__
self._coords = [list(coord) for coord in coords]
TypeError: 'int' object is not iterable
I found the following question, "Plot a tree-like graph with root node at the top", which has a simple answer, but when I try its example I get the same error:
import igraph as ig
g = ig.Graph(n = 12, directed=True)
g.add_edges([(1,0),(2,1), (3,2), (4,3),
(5,1),
(6,2), (7,6), (8,7),
(9,0),
(10,0), (11,10)])
g.vs["label"] = ["A", "B", "A", "B", "C", "F", "C", "B", "D", "C", "D", "F"]
layout = g.layout_reingold_tilford(mode="in", root=0)
ig.plot(g, layout=layout)
Looking at the C implementation of this function, root is expected to be an iterable only, even though the documentation is a bit confusing: "the index of the root vertex or root vertices".
Try to use root=[0] instead.
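If the root sometimes arrives as a single index and sometimes as a list, a small wrapper can normalize it before the layout call. This is only a sketch: `as_root_list` is a hypothetical helper name, not part of igraph, and the igraph call is left as a comment because it needs python-igraph and the graph `g` from the question.

```python
def as_root_list(root):
    """Return root as a list of vertex indices, accepting an int or an iterable."""
    if isinstance(root, int):
        return [root]
    return list(root)


print(as_root_list(0))
print(as_root_list((0, 9)))

# Usage with the graph from the question (requires python-igraph):
# layout = g.layout_reingold_tilford(mode="in", root=as_root_list(0))
```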

R: Converting a nested list with empty elements to data.frame (from json)

I have imported a json file like this one:
library(rjson)
json_str <- '[{"id": 1, "code": 7909, "text": [{"col1": "a", "col2": "some text"}], "date": "2015-12-01"}, {"id": 2, "code": 7651, "text": [], "date": "2015-12-01"}, {"id": 3, "code": 4768, "text": [{"col1": "aaa", "col2": "Blah, blah"}, {"col1": "bbb", "col2": "Blah, blah, blah"}], "date": "2015-12-01"}]'
my.list <- fromJSON(json_str)
str(my.list)
Needless to say the real file is much longer.
As a result I get a nested list of 3 elements where each element is a list of 4, and then, the element $text is a list of variable length from nothing to any number of elements, in my case, usually no more than 3.
After some research I have found several answers about converting a list to a data.frame, for example here and here. However, none of them works when one or more of the nested lists in $text is empty.
do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE))
library(data.table)
rbindlist(my.list, fill=TRUE)
Both return an error.
I would like to either convert the list in $text to several columns of the data.frame or just one (pasting the content).
Another option would be to be able to skip some elements (say $text) and convert the rest of the list, then in a separate line convert those elements (say $text) to a different data.frame. I think I could somehow relate one data.frame to the other.
Can anyone give me an idea of how to do this?
Thanks
By the sounds of it, something like the following should work:
do.call(rbind.data.frame, lapply(my.list, function(x) {
  x[["text"]] <- toString(unlist(x[["text"]]))
  x
}))
##    id code                                   text       date
## 2   1 7909                           a, some text 2015-12-01
## 21  2 7651                                        2015-12-01
## 3   3 4768 aaa, Blah, blah, bbb, Blah, blah, blah 2015-12-01
This follows your idea of pasting the values together (here using toString) to form a single column in the data.frame.
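For comparison, the same flattening idea can be sketched in Python with the standard library. It uses the JSON string from the question and joins the values of each "text" entry into one string, so an empty list simply becomes an empty string. The variable names are illustrative.

```python
import json

# The JSON string from the question, split across lines for readability.
json_str = (
    '[{"id": 1, "code": 7909, "text": [{"col1": "a", "col2": "some text"}], '
    '"date": "2015-12-01"}, '
    '{"id": 2, "code": 7651, "text": [], "date": "2015-12-01"}, '
    '{"id": 3, "code": 4768, "text": [{"col1": "aaa", "col2": "Blah, blah"}, '
    '{"col1": "bbb", "col2": "Blah, blah, blah"}], "date": "2015-12-01"}]'
)
records = json.loads(json_str)

rows = []
for record in records:
    row = dict(record)  # shallow copy so the parsed input is left untouched
    # Flatten the list of {"col1": ..., "col2": ...} dicts into one string;
    # an empty list becomes "".
    row["text"] = ", ".join(
        value for item in record["text"] for value in item.values()
    )
    rows.append(row)

for row in rows:
    print(row)
```

Every row now has the same four scalar fields, so the list can be handed to a tabular structure without the empty "text" entries causing problems.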

Dataframe in R to be converted to sequence of JSON objects

I asked this same question by editing a previous question of mine twice. I am sorry for the bad usage of this website. I have flagged that one for deletion and am posting a proper new question here. Please look into this.
I am working on recommender system code. The output has to be converted to a sequence of JSON objects. I have a matrix that serves as a lookup table for every item ID, with the list of the closest items it is related to and the similarity scores associated with their combinations.
Let me explain through an example.
Suppose I have the matrix below.
In it, Item 1 is similar to Items 22 and 23 with similarity scores 0.8 and 0.5 respectively, and the remaining rows follow the same structure.
X1  X2  X3  X4   X5
 1  22  23  0.8  0.5
34   4  87  0.4  0.4
23   7  92  0.6  0.5
I want a JSON structure for every item (every X1 for every row) along with the recommended items and the similarity scores for each combination as a separate JSON entity and this being done in sequence. I don't want an entire JSON object containing these individual ones.
Assume there is one more entity called "coid" that will be given as input to the code. I assume it is XYZ and it is same for all the rows.
{ "_id" : { "coid" : "XYZ", "iid" : "1"}, "items" : [ { "item" : "22", "score" : 0.8},{ "item": "23", "score" : 0.5}] }
{ "_id" : { "coid" : "XYZ", "iid" : "34"},"items" : [ { "item" : "4", "score" : 0.4},{ "item": "87", "score" : 0.4}] }
{ "_id" : { "coid" : "XYZ", "iid" : "23"},"items" : [ { "item" : "7", "score" : 0.6},{ "item": "92", "score" : 0.5}] }
As in the above, each entity is a valid JSON structure/object but they are not put together into a separate JSON object as a whole.
I appreciate all the help done for the previous question but somehow I feel this new alteration I have here is not related to them because in the end, if you do a toJSON(some entity), then it converts the entire thing to one JSON object. I don't want that.
I want individual ones like these to be written to a file.
I am very sorry for my ignorance and inconvenience. Please help.
Thanks.
library(rjson)
## Your matrix
mat <- matrix(c(1,34,23,
22, 4, 7,
23,87,92,
0.8, 0.4, 0.6,
0.5, 0.4, 0.5), byrow=FALSE, nrow=3)
I use a function (with the not very interesting name makejson) that takes a row of the matrix and returns a JSON object. It builds two list objects, _id and items, and combines them into a JSON object.
makejson <- function(x, coid="ABC") {
  `_id` <- list(coid = coid, iid=x[1])
  nitem <- (length(x) - 1) / 2 # Number of items
  items <- list()
  for(i in seq(1, nitem)) {
    items[[i]] <- list(item = x[i + 1], score = x[i + 1 + nitem])
  }
  toJSON(list(`_id`=`_id`, items=items))
}
Then using apply (or a for loop) I use the function for each row of the matrix.
res <- apply(mat, 1, makejson, coid="XYZ")
cat(res, sep = "\n")
## {"_id":{"coid":"XYZ","iid":1},"items":[{"item":22,"score":0.8},{"item":23,"score":0.5}]}
## {"_id":{"coid":"XYZ","iid":34},"items":[{"item":4,"score":0.4},{"item":87,"score":0.4}]}
## {"_id":{"coid":"XYZ","iid":23},"items":[{"item":7,"score":0.6},{"item":92,"score":0.5}]}
The result can be saved to a file with cat by specifying the file argument.
## cat(res, sep="\n", file="out.json")
There is a small difference between your output and mine: in yours, the numbers are in quotes ("). If you want that, mat has to be a character matrix.
## mat <- matrix(as.character(c(1,34,23, ...
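For comparison, here is a minimal Python sketch of the same idea: serialize each row on its own and join the results with newlines, rather than serializing the whole collection at once. The function name `make_json` and the output file name are assumptions for illustration. Item IDs are converted to strings so that they appear in quotes, as in the desired output.

```python
import json

# Rows of the matrix from the question: item ID, related items, then their scores.
matrix = [
    [1, 22, 23, 0.8, 0.5],
    [34, 4, 87, 0.4, 0.4],
    [23, 7, 92, 0.6, 0.5],
]


def make_json(row, coid="ABC"):
    """Serialize one matrix row as a standalone JSON object (IDs as strings)."""
    n_items = (len(row) - 1) // 2
    related = row[1:1 + n_items]
    scores = row[1 + n_items:]
    return json.dumps({
        "_id": {"coid": coid, "iid": str(row[0])},
        "items": [
            {"item": str(item), "score": score}
            for item, score in zip(related, scores)
        ],
    })


# One JSON object per row, never wrapped in an enclosing array or object.
lines = [make_json(row, coid="XYZ") for row in matrix]
print("\n".join(lines))

# To write one object per line to a file (the file name is an assumption):
# with open("out.json", "w") as fh:
#     fh.write("\n".join(lines) + "\n")
```

Each line is a complete JSON document by itself, so a consumer can parse the file line by line.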
Hope it helps,
alex