Confusion with Vowpal Wabbit Contextual Bandit training data formatting - reinforcement-learning

I am new to Vowpal Wabbit and am working on a multi-armed bandit model to recommend different CTAs for sign-up pop-ups. I already completed the walkthrough on the main site but am a bit confused about what the training data is supposed to look like for the --cb_explore_adf version. So far, for the regular versions (with a fixed number of actions) the data looks like:
action:cost:probability | features
which makes sense, but then when you get to the adf version, it becomes:
| a:1 b:0.5
0:0.1:0.75 | a:0.5 b:1 c:2

shared | s_1 s_2
0:1.0:0.5 | a:1 b:1 c:1
| a:0.5 b:2 c:1
I've read the documentation numerous times and I still don't understand how this works.
I think it would be great to see an example of how data similar to mine would be adapted to the ADF format.
Example of my use case:
2 actions: 1 and 2
3 features: language, country, favorite sport
Some of the docs that I've looked at:
https://vowpalwabbit.org/tutorials/cb_simulation.html
[EDIT]:
Playing around with it, I created a train.txt with this input:
shared |user language=en nation=CAN
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11
shared |user language=it nation=ITA
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11
shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11
shared |user language=it nation=ITA
0:0:0.5 |action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11
But when I run this:
vw = pyvw.vw("-d full_data.txt --cb_explore_adf -q ua --quiet --epsilon 0.2")
vw.predict("|user language=en nation=USA")
I get [1.0], which doesn't make sense. I am sure that I am doing something wrong.

ADF stands for action-dependent features. So each event/example consists of multiple lines, with the first line being an optional set of shared features (marked with shared).
Apart from the shared line, each line corresponds with an action.
So, when you provide VW with the input:
|user language=en nation=USA
You are asking for a prediction over only one action (since there is no shared line), which is why you are getting back a PMF (probability mass function, i.e. the probability of choosing each distinct action) that is simply [1.0]. This states that the single action should be chosen with probability 1.0. However, reading the features, it looks as though you are actually passing what should be the shared features.
For each prediction you need to provide all of the features for each action, since the action itself is essentially defined by the set of its features (ADF).
Your prediction input should look something like this (notice the label is omitted):
shared |user language=it nation=ITA
|action arm=10-OC-ValueProp10
|action arm=11-OC-ValueProp11
VW will then emit something that looks like [0.9, 0.1]. You should then sample from this PMF (to allow for exploration) to determine the chosen action, as in the sketch below.
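A minimal sketch of that predict-and-sample step, assuming the pre-9.x pyvw bindings used in the question (the feature names are the ones from this example; the PMF values are illustrative):

import random
from vowpalwabbit import pyvw

vw = pyvw.vw("--cb_explore_adf -q ua --quiet --epsilon 0.2")

# One multi-line example: shared features first, then one line per action.
predict_example = [
    "shared |user language=it nation=ITA",
    "|action arm=10-OC-ValueProp10",
    "|action arm=11-OC-ValueProp11",
]
pmf = vw.predict(predict_example)   # e.g. [0.9, 0.1]

# Sample an action index from the PMF so exploration still happens.
chosen_index = random.choices(range(len(pmf)), weights=pmf, k=1)[0]
print(pmf, chosen_index)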
Training data
The format of the training data is a bit confusing since the same format was reused from non-ADF. The action portion of the label is actually unused, since the label must be on the same line as the action it is for.
shared |user language=en nation=CAN
|action arm=10-OC-ValueProp10
0:0:0.5 |action arm=11-OC-ValueProp11
In the above example, the label says that the second action had a cost of 0 and that, when it was picked, the probability of choosing it was 0.5 (the value in the PMF).
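Putting it together, training from Python could look roughly like this (a sketch under the same assumptions as above; in newer bindings the class names differ):

from vowpalwabbit import pyvw

vw = pyvw.vw("--cb_explore_adf -q ua --quiet --epsilon 0.2")

# One logged event: the label 0:0:0.5 on the second action line means
# cost 0 and logged probability 0.5 for that action.
logged_event = [
    "shared |user language=en nation=CAN",
    "|action arm=10-OC-ValueProp10",
    "0:0:0.5 |action arm=11-OC-ValueProp11",
]
vw.learn(logged_event)

# Predictions again list every action (no labels) and return a PMF to sample from.
pmf = vw.predict([
    "shared |user language=en nation=CAN",
    "|action arm=10-OC-ValueProp10",
    "|action arm=11-OC-ValueProp11",
])
print(pmf)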


Create regression tables with estout/esttab for interactions in Stata

In one of my models I use the standard built-in notation for interaction terms in Stata; in another, I have to code the interaction manually. In the end, I would like to present nice regression tables using esttab. How can I show identical, but slightly differently coded, interaction terms in the same row? Or, imagine it's actually another interaction: how can I force esttab to ignore that?
// interaction model 1
sysuse auto, clear
regress price weight c.mpg##c.mpg foreign
estimates store model1
// interaction model 2
gen int_mpg_mpg = mpg*mpg
regress price weight mpg int_mpg_mpg foreign
estimates store model2
// make nice regression table side-by-side
// manually label the manual interaction
label variable int_mpg_mpg "Mileage (mpg) # Mileage (mpg)"
esttab model1 model2, label
// export to latex
label variable int_mpg_mpg "Mileage (mpg) $\times$ Mileage (mpg) "
esttab model1 model2 using "table.tex", ///
label nobaselevels beta not interaction(" $\times$ ") style(tex) replace
In both the console output and the LaTeX output (screenshots not reproduced here), the manual variable label shows up as a name in the regression tables, but identical variables are not aligned in the same row. I am more interested in the solution for the LaTeX output, but the problem seems to be unrelated to LaTeX.
esttab won't be able to ignore something like that, as the variables are unique in how they're specified. I would recommend specifying all your interaction terms in a way that works across both specifications, such as in interaction model 2.
For multiple different interaction terms, you can rename the interaction variables themselves before the regressions. For example, to estimate heterogeneous treatment effects by different covariates, you could run:
foreach var of varlist age education {
    cap drop interaction
    gen interaction = `var'                  // same variable name in every run
    reg outcome i.treatment##c.interaction   // so the interaction row lines up
    est store `var'
}
In an esttab or estout there will be one row for the interaction effect, and one row for the main effect. This is a bit of a crude workaround, but normally does the job.
The issue should be addressed at the level of how Stata names the equations and coefficients across estimators. I adapted the code from Andrew:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1551586-align-nls-and-mle-estimates-for-the-same-variable-in-the-same-row-esttab
He is using Ben Jann's program erepost from SSC (ssc install erepost).
* model 1
sysuse auto, clear
eststo clear
gen const=1
qui regress price weight c.mpg##c.mpg foreign
mat b=e(b)
* store estimates
eststo model1
* model 2
gen int_mpg_mpg = mpg*mpg // generate interaction manually
qui regress price weight mpg int_mpg_mpg foreign
* rename interaction with additional package erepost
local coln "b:weight b:mpg b:c.mpg#c.mpg b:foreign b:_cons"
mat colnames b= `coln'
capt prog drop replace_b
program replace_b, eclass
    erepost b = b, rename
end
replace_b
eststo model2
esttab model1 model2, mtitle("Interaction V1" "Interaction V2")
Now, all interactions (automatic and manual) are aligned:
--------------------------------------------
                      (1)             (2)
             Interactio~1    Interactio~2
--------------------------------------------
main
weight              3.038***        3.038***
                   (3.84)          (3.84)
mpg                -298.1          -298.1
                  (-0.82)         (-0.82)
c.mpg#c.mpg         5.862           5.862
                   (0.90)          (0.90)
foreign            3420.2***       3420.2***
                   (4.62)          (4.62)
_cons              -527.0          -527.0
                  (-0.08)         (-0.08)
--------------------------------------------
N                      74              74
--------------------------------------------

Google Fit estimated steps through REST API decrease in time with some users

We are using the Google Fit REST API in a process with thousands of users to get daily steps. With most users the process is OK, although we are finding some users with this specific behaviour: their steps increase during the day, but at some point they decrease significantly.
We are seeing issues related to this mainly with Huawei Health apps (and some Xiaomi health apps).
We use this dataSourceId to get daily steps: derived:com.google.step_count.delta:com.google.android.gms:estimated_steps
An example of one of our requests to get data for 15th March (Spanish time):
POST https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate
Accept: application/json
Content-Type: application/json;encoding=utf-8
Authorization: Bearer XXXXXXX
{
"aggregateBy": [{
"dataTypeName": "com.google.step_count.delta",
"dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps"
}],
"bucketByTime": { "durationMillis": 86400000 },
"startTimeMillis": 1615244400000,
"endTimeMillis": 1615330800000
}
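For reference, the same aggregate call can be reproduced with a short Python script along these lines (a sketch: the access token is a placeholder and the time range is simply copied from the request above):

import requests

ACCESS_TOKEN = "XXXXXXX"  # placeholder

body = {
    "aggregateBy": [{
        "dataTypeName": "com.google.step_count.delta",
        "dataSourceId": "derived:com.google.step_count.delta:com.google.android.gms:estimated_steps",
    }],
    "bucketByTime": {"durationMillis": 86400000},
    "startTimeMillis": 1615244400000,  # start of day, as in the request above
    "endTimeMillis": 1615330800000,    # end of day, as in the request above
}

resp = requests.post(
    "https://www.googleapis.com/fitness/v1/users/me/dataset:aggregate",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=body,
)
resp.raise_for_status()

# Sum the step deltas in each daily bucket.
for bucket in resp.json().get("bucket", []):
    steps = sum(
        point["value"][0]["intVal"]
        for dataset in bucket["dataset"]
        for point in dataset.get("point", [])
    )
    print(bucket["startTimeMillis"], steps)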
With most users this goes well (it returns the same data that the user sees in the Google Fit app), but with some users, as described, the numbers increase during the day at first and decrease later. For those users, the data shown in the Google Fit app is significantly greater than what we get through the REST API.
We have even traced this with a specific user during the day. Using buckets of "durationMillis": 3600000, we plotted a histogram of hourly steps in one day (with a custom-made process).
For the same day, queried at different moments in time (a couple of hours apart in this case), we get this for the EXACT SAME USER:
20210315-07 | ########################################################## | 1568
20210315-08 | ############################################################ | 1628
20210315-09 | ########################################################## | 1574
20210315-10 | ####################### | 636
20210315-11 | ################################################### | 1383
20210315-12 | ###################################################### | 1477
20210315-13 | ############################################### | 1284
20210315-14 | #################### | 552
vs. this, retrieved A COUPLE OF HOURS LATER:
20210315-08 | ################# | 430
20210315-09 | ######### | 229
20210315-10 | ################# | 410
20210315-11 | ###################################################### | 1337
20210315-12 | ############################################################ | 1477
20210315-13 | #################################################### | 1284
20210315-14 | ###################### | 552
("20210315-14" means 14.00 at 15th March of 2021)
This is the returning JSON in the first case:
[{"startTimeNanos":"1615763400000000000","endTimeNanos":"1615763460000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":6,"mapVal":[]}]},
{"startTimeNanos":"1615788060000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1568,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615795080000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1628,"mapVal":[]}]},
{"startTimeNanos":"1615795200000000000","endTimeNanos":"1615798500000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1574,"mapVal":[]}]},
{"startTimeNanos":"1615798860000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":636,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1383,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
This is the returning JSON in the latter case:
[{"startTimeNanos":"1615788300000000000","endTimeNanos":"1615791600000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":517,"mapVal":[]}]},
{"startTimeNanos":"1615791600000000000","endTimeNanos":"1615794540000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":430,"mapVal":[]}]},
{"startTimeNanos":"1615796400000000000","endTimeNanos":"1615798200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":229,"mapVal":[]}]},
{"startTimeNanos":"1615798980000000000","endTimeNanos":"1615802400000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":410,"mapVal":[]}]},
{"startTimeNanos":"1615802400000000000","endTimeNanos":"1615806000000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1337,"mapVal":[]}]},
{"startTimeNanos":"1615806000000000000","endTimeNanos":"1615809480000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1477,"mapVal":[]}]},
{"startTimeNanos":"1615809660000000000","endTimeNanos":"1615813200000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":1284,"mapVal":[]}]},
{"startTimeNanos":"1615813380000000000","endTimeNanos":"1615815420000000000","dataTypeName":"com.google.step_count.delta","originDataSourceId":"raw:com.google.step_count.delta:com.huawei.health:","value":[{"intVal":552,"mapVal":[]}]}]
As you can see, all points always come from originDataSourceId: "raw:com.google.step_count.delta:com.huawei.health".
It looks like some Google Fit process is making adjustments, removing some steps or data points, but we cannot find a way to detect what is removed or why, and we cannot explain to the user what is happening or what he or we can do to make his app data match ours (or the other way around). His Google Fit app shows a number that is not the same as the one the REST API returns.
The user has already disabled the "Google Fit app tracking activities" option.
I would love to know, or at least get some hints about:
What can I do to debug this further?
Any hint about why this is happening?
Is there any way, from a configuration point of view (for the user), to prevent this from happening?
Is there any way, from a development point of view, to prevent this from happening?
Thanks and regards.
UPDATE AFTER Andy Turner's question (thanks for the comment!):
We were able to "catch" this over several hours: 18.58 (around 6K steps), 21.58 (around 25K steps), 22.58 (around 17K steps), 23.58 (around 26K steps). We exported the datasets for those times, and here is the result.
Another important piece of information: data is coming only from "raw:com.google.step_count.delta:com.huawei.health". We went through other datasets that might look suspicious, and all of them were empty (apart from the derived ones and so on).
If we interpret this correctly, it is probably Huawei that sends one value at one moment and a different one the next time, so it is probably some misconfiguration on the Huawei side.
Here are the datasets exported:
https://gist.github.com/jmarti-theinit/8d98996873a9c499a14899a9b62162f3
Result of the GIST is:
Length of 18.58 points 165
Length of 21.58 points 503
Length of 22.58 points 294
Length of 23.58 points 537
How many points in 21.58 that exist in 18.58 => 165
How many points in 22.58 that exist in 18.58 => 57
How many points in 22.58 that exist in 21.58 => 294
How many points in 23.58 that exist in 18.58 => 165
How many points in 23.58 that exist in 21.58 => 503
How many points in 23.58 that exist in 22.58 => 294
So our bet is that points are removed and added by the devices behind Huawei (for example, only 57 points are common between 18.58 and 22.58), and there is nothing more we can control on Google Fit's side. Is that correct? Is there anything else we could check?
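For reference, the overlap counts in the gist can be reproduced with a script along these lines (a sketch only: it assumes each export is a JSON array of points with "startTimeNanos"/"endTimeNanos" fields, and the file names are placeholders):

import json

def load_points(path):
    # Identify each data point by its (start, end) nanosecond timestamps.
    with open(path) as f:
        return {(p["startTimeNanos"], p["endTimeNanos"]) for p in json.load(f)}

snapshots = {name: load_points(name + ".json") for name in ["1858", "2158", "2258", "2358"]}

for name, points in snapshots.items():
    print("Length of", name, "points", len(points))

# How many points of one snapshot also exist in an earlier one.
print("Common points 18.58 vs 22.58:", len(snapshots["1858"] & snapshots["2258"]))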
We're having similar issues using the REST API.
Here is what coincides with Jordi's case:
- we are also from Spain (and our users too), although we use servers in Spain and the US
- we get the same daily steps value as the Google Fit app for some users, but not for other users
- daily steps increase during the current day, but when we make the request again on following days, the daily steps sometimes decrease
- we are making the same request, from the start of the day to the end of the day, with 86400000 as the bucket time and the same data type and data source id
We are in the final development phase, so we're testing with a few users only. Our users have Xiaomi Mi Band devices.
We think that the problem could be a desynchronization of the servers that we're hitting, because if we test with other apps like this one, they show the correct values. We've created other Google Cloud Console OAuth client credentials and new email accounts to test with brand new users and OAuth clients, but the results are the same.
This is the recommended way to get the daily steps, and we are using exactly the same request:
https://developers.google.com/fit/scenarios/read-daily-step-total
Even with the "Try it" option in the documentation, the results are wrong.
What else can we do to help you resolve the issue?
Thank you very much!

What is the difference between `kur test` and `kur evaluate`

How exactly do kur test and kur evaluate differ?
The differences we see from the console:
(dlnd-tf-lab) ->kur evaluate mnist.yml
Evaluating: 100%|████████████████████████████| 10000/10000 [00:04<00:00, 2417.95samples/s]
LABEL    CORRECT    TOTAL    ACCURACY
0            949      980       96.8%
1           1096     1135       96.6%
2            861     1032       83.4%
3            868     1010       85.9%
4            929      982       94.6%
5            761      892       85.3%
6            849      958       88.6%
7            935     1028       91.0%
8            828      974       85.0%
9            859     1009       85.1%
ALL         8935    10000       89.3%
Focus on one: /Users/Natsume/Downloads/kur/examples
(dlnd-tf-lab) ->kur test mnist.yml
Testing, loss=0.458: 100%|█████████████████████| 3200/3200 [00:01<00:00, 2427.42samples/s]
Without understanding the source code behind kur test and kur evaluate, how can we understand exactly how they differ?
@ajsyp, the developer of Kur (a deep learning library), provided the following answer, which I found very helpful.
kur test is used when you know what the "correct answer" is, and you
simply want to see how well your model performs on a held-out sample.
kur evaluate is pure inference: it is for generating results from
your trained model.
Typically in machine learning you split your available data into 3
sets: training, validation, and testing (people sometimes call these
different things, just so you're aware). For a particular model
architecture / selection of model hyperparameters, you train on the
training set, and use the validation set to measure how well the model
performs (is it learning correctly? is it overtraining? etc). But you
usually want to compare many different model hyperparameters: maybe
you tweak the number of layers, or their size, for example.
So how do you select the "best" model? The most naive thing to do is
to pick the model with the lowest validation loss. But then you run
the risk of optimizing/tweaking your model to work well on the
validation set.
So the test set comes into play: you use the test set as a very final,
end of the day, test of how well each of your models is performing.
It's very important to hide that test set for as long as possible,
otherwise you have no impartial way of knowing how good your model is
or how it might compare to other models.
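As a tiny illustration of the three-way split described above (not Kur-specific; the fractions and sample count are arbitrary):

import random

data = list(range(10000))          # stand-in for your labelled samples
random.shuffle(data)

n_train = int(0.8 * len(data))
n_val = int(0.1 * len(data))

train = data[:n_train]                  # fit model parameters
val = data[n_train:n_train + n_val]     # tune hyperparameters / pick a model
test = data[n_train + n_val:]           # hidden until the very final comparison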
kur test is intended to be used to run a test set through the model
to calculate loss (and run any applicable hooks).
But now let's say you have a trained model, say an image recognition
model, and now you want to actually use it! You get some new data (you
probably don't even have "truth" labels for them, just the raw
images), and you want the model to classify the images. That's what
kur evaluate is for: it takes a trained model and uses it "in
production mode," where you don't have/need truth values.

In Vowpal Wabbit, what is the difference between a namespace and feature?

While carrying out analysis in R or Python, we only deal with feature names (and their values) and use them directly. In Vowpal Wabbit we also have namespaces.
I am unable to understand:
a. what is meant by Namespace;
b. how is it different from features;
c. when is it used? And when not used? That is, can we avoid using it.
d. And how is it used?
Will be grateful for one or two examples. Sorry for so many questions.
In Vowpal Wabbit, name-spaces are used to conveniently generate interaction features on the fly at run time, without the need to predeclare them.
A simple example format, without a name space is:
1 | a:2 b:3
where 1 is the label, and a, b are regular input features.
Note that there's a space after the |.
Contrast the above with using two name spaces x and y (note no space between the | separator and the name-spaces):
1 |x a:2 |y b:3
This example is essentially equivalent (except for feature hash locations) to the first example. It still has two features with the same values as the original example. The difference is that now with these name-spaces, we can cross features by passing options to vw. For example:
vw -q xy
will generate additional features on-the-fly by crossing all features in name-space x with all features in name-space y. The names of the auto-generated features will be the concatenation of the names from the two name-spaces and the values will be the products of their respective values. In this particular case, it would be as if our data-set had one additional feature: ab:6 (*)
Obviously, this is a very simple example; imagine that you have an example with 3 features in a name-space:
1 |x a:2 b:3 c:5
By adding -q xx to vw you could automatically generate 6 additional interaction features: aa, ab, ac, bb, bc, cc on the fly. And if you had 3 name-spaces, say: x, y, z, you could cross any (or any wanted subset) of them: -q xx -q xy -q xz -q yz -q yy -q zz on the command-line to get all possible interactions between the separate sets of features.
That's all there is to it. It is a powerful feature allowing you to experiment and add interaction features on the fly.
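As a quick illustration from Python (a sketch assuming the vowpalwabbit bindings with the older pyvw interface; the tiny example and its values are made up):

from vowpalwabbit import pyvw

# -q xy crosses every feature in name-space x with every feature in name-space y,
# so the model effectively also sees an interaction feature with value 2*3 = 6.
model = pyvw.vw("-q xy --audit")
model.learn("1 |x a:2 |y b:3")   # --audit prints the generated interaction feature
prediction = model.predict("|x a:2 |y b:3")
print(prediction)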
There are several options which accept (1st letters of) name-spaces as arguments, among them:
-q
--cubic
--ignore
--keep
--redefine (very new)
--lrq
Check out the vw command line arguments wiki for more details.
(*) In practice, the feature names will have the name-spaces prepended to them with a ^ separator in between, so the actual hashed string would be x^a^y^b:6 rather than ab:6 (you may verify this by using the --audit option), but this is just a detail.

Single-cycle MIPS timing questions

I read the book "Computer Organization and Design"; in chapter 4, it describes a single-cycle MIPS machine. However, I have several doubts about it.
If the data memory and the instruction memory in the design are SRAMs, how can any instruction be finished in a single clock cycle? Take a load instruction as an example: I think the single-cycle MIPS design still has to go through the following stages, with only the ID and EXE stages merged.
|  1 |  2 |   3    |  4  |
| WB |    |        |     |
|    | IF |        |     |
|    |    | ID\EXE |     |
|    |    |        | MEM |
If the data memory is updated at the negative clock edge, the ID, EXE, and MEM stages can be merged, but there are still three stages left.
Can anyone explain how the single-cycle design works? Thanks!
The single-cycle processor that you read about in chapter 4 is a slight oversimplification of what is implementable in reality. They don't show some of the tricky details. For instance, one timing assumption you could make is that your memory reads are combinational and memory writes take one positive edge to complete, i.e. similar to the register file. In that case, when the clock edge arrives you have your IF stage populated with an instruction. Then, for the duration of that cycle, you decode and execute the instruction, and writeback happens on the next clock edge. In the case of a data store the same thing is true: the memory will be written on the next clock edge. In the case of loads, you are assuming a combinational memory read, so your data will arrive before the clock edge, and on the edge it will be written to the register file.
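A toy Python model of that timing assumption (not the book's datapath; the instruction encoding and memories are made-up stand-ins): everything inside the function is "combinational" work done within the cycle, and the register write represents the commit at the next clock edge.

def single_cycle_lw(pc, regs, imem, dmem):
    # IF: combinational instruction-memory read
    op, rt, base, offset = imem[pc]       # e.g. ("lw", 2, 1, 4) for lw $2, 4($1)
    assert op == "lw"
    # ID/EX: register read and address computation, combinational
    addr = regs[base] + offset
    # MEM: combinational data-memory read
    data = dmem[addr]
    # WB: committed at the next clock edge
    regs[rt] = data
    return pc + 4

regs = {0: 0, 1: 0x100, 2: 0}
imem = {0: ("lw", 2, 1, 4)}
dmem = {0x104: 42}
pc = single_cycle_lw(0, regs, imem, dmem)
print(regs[2], pc)   # 42 4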
Now, this is not the best way to implement this and you have to make several assumptions. In a slightly more realistic unpipelined processor you could have a stall signal that would roll over to the next cycle if you are waiting for the memory request. So you can imagine you would have a Stall_on_IF signal and Stall_on_LD signal that would tell you to stall this cycle until your instruction/data arrives. When they do arrive, you latch them and continue execution next cycle.