I have a large dataset and I want to apply comparegroups function in R. The dataset has number of grouping variables that I want to compare other variables on it; I wanted to loop over these grouping variables. the function is not accepting the loop
`vars <- c("fibrosis_stage", "Steatosis_stage", "patient_classification")
for (var in vars){
model <- compareGroups(var~.,data = data)
result.model <- createTable(model)
export2xls(result.model,paste0(var,"comparisons.xlsx"))
}`
but I got the following error:
Error in model.frame.default(formula = var ~ ., data = list(Gender = c(2L, :
variable lengths differ
I tried even to make it in a function; with the grouping varaible as input but I had an error also.
Any one can help?
Related
x = fopen('pm10_data.txt');
fseek(x, 8,0);
dat = fscanf (x,'%f',[2,1000]);
dat = transpose(dat);
a = dat(:,1);
b = dat(:,2);
[r,p] = cor_test (a,b)
fclose(x);
r
p
this is what i got,
r =
scalar structure containing the fields:
method = Pearson's product moment correlation
params = 76
stat = 6.2156
dist = t
pval = 2.5292e-08
alternative = !=
Run error
error: element number 2 undefined in return list
error: called from
tester.octave at line 7 column 6
Presumably you're referring to the cor_test function from the statistics package, even though you don't show loading this in your workspace.
According to the documentation of cor_test:
The output is a structure with the following elements:
PVAL The p-value of the test.
STAT The value of the test statistic.
DIST The distribution of the test statistic.
PARAMS The parameters of the null distribution of the test statistic.
ALTERNATIVE The alternative hypothesis.
METHOD The method used for testing.
If no output argument is given, the p-value is displayed.
This seems to be what you're getting too.
If you want the p value explicitly from that structure, you can access that as r.pval
The syntax [a, b, ...] = functionname( args, ... ) expects the function to return more than one argument, and capture all the returned arguments into the named variables (i.e. a, b, etc).
In this case, cor_test only returns a single argument, even though that argument is a struct (which means it has fields you can access).
The error you're getting effectively means you requested a second output argument p, but the function you're using does not return a second output argument. It only returns that struct you already captured in r.
Here is a ranger call:
rf_fit <-
rf_mod %>%
fit(my_outcome_factor ~ ., data = data_train)
and the output:
Error in parse.formula(formula, data, env = parent.frame()) :
Error: Illegal column names in formula interface. Fix column names or use alternative interface in ranger.
How can I tell which columns are illegal?
I tried setting up a function foo() to debug:
foo <- function() {
browser()
ranger:::parse.formula(my_outcome_factor ~ ., data = data_train)
}
This function didn't help me much, because I don't know how to get a breakpoint in the right spot.
It's ranger version 0.12.1.
I have data loaded from JSON and am trying to extract arbitrary nested values using a list as input, where the list corresponds to the names of successive children. I want a function get_value(data,lookup) that returns the value from data by treating each entry in lookup as a nested child.
In the example below, when lookup=['alldata','TimeSeries','rates'], the return value should be [1.3241,1.3233].
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data,lookup):
res = data
for item in lookup:
res = res[item]
return res
lookup = ['alldata','TimeSeries','rates']
get_value(json_data,lookup)
My example works, but there are two problems:
It's inefficient - In my for loop, I copy the whole TimeSeries object to res, only to then replace it with the rates list. As #Andrej Kesely explained, res is a reference at each iteration, so data isn't being copied.
It's not concise - I was hoping to be able to find a concise (eg one or two line) way of extracting the data using something like list comprehension syntax
If you want one-liner and you are using Python 3.8, you can use assignment expression ("walrus operator"):
json_data = {'alldata':{'name':'CAD/USD','TimeSeries':{'dates':['2018-01-01','2018-01-02'],'rates':[1.3241,1.3233]}}}
def get_value(data,lookup):
return [data:=data[item] for item in lookup][-1]
lookup = ['alldata','TimeSeries','rates']
print( get_value(json_data,lookup) )
Prints:
[1.3241, 1.3233]
I don't think you can do it without a loop, but you could use a reducer here to increase readability.
functools.reduce(dict.get, lookup, json_data)
I have written a loop to retrieve data from an API (Limesurvey) based on a list of ids and fill a row of a dataframe with the result of each loop.
I have a list with ids like this:
# list of ids
ids = ['1','427',... ,'847']
My code to query the API based on each item of the list looks as follows:
method = "get_participant_properties"
params = OrderedDict([
("sSessionKey", api.session_key),
("iSurveyID", 12345),
("aTokenQueryProperties", t),
])
# loop through API query with the 'aTokenQueryProperties' stored in the list 'tids'.
attributes = []
for t in ids:
attributes.append(api.query(method=method, params=params))
pd.DataFrame(attributes)
Unfortunately, the result is a dataframe with 158 rows, and each row is the same, i.e. the query result of the last id in my list (847).
You are not passing in the t from the loop. The t in the params definition is unrelated; if I were to run your code now I'd get a NameError exception because t hasn't been set at that point. The t expression in the params mapping is not live, it is not updated each loop iteration.
Set the 'aTokenQueryProperties' key in the loop:
method = "get_participant_properties"
params = OrderedDict([
("sSessionKey", api.session_key),
("iSurveyID", 12345),
("aTokenQueryProperties", None),
])
attributes = []
for t in ids:
params["aTokenQueryProperties"] = t
attributes.append(api.query(method=method, params=params))
Setting "aTokenQueryProperties" to None in the params OrderedDict object at the start is optional; you'd only need to do this if reserving its exact location in the params order is important and even then, because in your example it is the last element in the mapping, you'd end up with the same output anyway.
For an assignment, I am supposed to use SQL to get a list of unique values from a table as a vector in R. I wrote the following code in R:
selection = dbSendQuery(con, statement = "SELECT user_id FROM twitter_message")
user_id = c(dbFetch(selection))
I am supposed to then randomly generate 3 values, preferably using the sample() function. However, when I do that, it generates vectors the size of the original vector (approximately 500 values) rather than selecting 3 values from the vector. I do not know if the error is from how I put the data in a vector or not. I tried writing the following code:
sample(user_id, size = 3, replace = FALSE, prob = NULL)
However, I get an the error:
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
Need to sample from rows not from your dataframe.
user_id[sample(nrow(user_id), 3, replace = FALSE, prob = NULL),]