Computing a multinomial logistic multilevel regression using glmer from R's lme4

Problem: I'm trying to compute a multinomial logistic multilevel regression, following this approach:
multinomial logistic multilevel models in R
Details: Therefore I computed six separate models with glmer from the lme4 package in R.
I would like to investigate the influence that meaning in life has on people's everyday lives.
As dependent variables I have pleasant days, meaningful days, pleasant-meaningful days and meaningful-unpleasant days.
I have always made pairwise comparisons as described in the link and have always excluded the other cases.
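For illustration, this is roughly how I built the comparison variables (simplified: day_type stands in for my outcome variable, and the level names are placeholders):
# sketch: comparision_1 codes meaningful-pleasant days (1) vs. pleasant days (0)
# day_type is a placeholder name for the factor holding the four day categories
data$comparision_1 <- ifelse(data$day_type == "meaningful_pleasant", 1,
                             ifelse(data$day_type == "pleasant", 0, NA))
# the remaining categories become NA and are excluded via na.action = na.omit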
Question I: Is my approach correct?
#Model-1
#comparision_1: meaningful-pleasant days vs. pleasant days
Model1.1 <- glmer(comparision_1 ~ 1 + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
summary(Model1.1)
#comparision_2: meaningful-pleasant days vs. meaningful days
Model1.2 <- glmer(comparision_2 ~ 1 + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
... and so on.
#Model-2
#comparision_1: meaningful-pleasant days vs. pleasant days
Model2.1 <- glmer(comparision_1 ~ meaning_in_life + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
summary(Model2.1)
#comparision_2: meaningful-pleasant days vs. meaningful days
Model2.2 <- glmer(comparision_2 ~ meaning_in_life + (1 | subject_id), data = data, family = binomial(), na.action = na.omit)
... and so on.
Question II: Are the estimates in the output the log-odds, or do I have to compute them myself?
Thanks for your help,
Christoph

Related

Repeated-measures ANCOVA in 2x2 Mixed Design

I am computing an ANOVA with repeated measures in a 2x2 mixed design. For this I use one of the following inputs in R:
(1)
res.aov <- anova_test(data = datac, dv = Stress, wid = REF, between = Gruppe, within = time)
get_anova_table(res.aov)
(2)
aov <- datac %>%
  anova_test(dv = Stress, wid = REF, between = Gruppe, within = time, type = 3)
aov
Both lead to the same results. Now I want to add a covariate from the first measurement time point. So far I have not been able to find a suitable R input for the ANCOVA in this repeated-measures design.
Does anyone perhaps have an idea?
Best regards
ANOVA <- aov(Stress ~ time + covariate, data = data)
summary(ANOVA)
For a simple ANCOVA, the usual ANOVA call with the covariate added (as above) applies. Unfortunately, I have no idea how this works in the repeated-measures design.
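One thing that might be worth trying is the covariate argument of anova_test, if your rstatix version supports it (a sketch only; Stress_t1 is a made-up column name for the baseline measurement):
# sketch: mixed-design ANOVA with a baseline covariate via rstatix
library(rstatix)
res.ancova <- anova_test(data = datac, dv = Stress, wid = REF,
                         between = Gruppe, within = time,
                         covariate = Stress_t1, type = 3)
get_anova_table(res.ancova)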

anova_test not returning Mauchly's for three way within subject ANOVA

I am using a data set called sleep (found here: https://drive.google.com/file/d/15ZnsWtzbPpUBQN9qr-KZCnyX-0CYJHL5/view) to run a three-way within-subject ANOVA comparing Performance based on Stimulation, Deprivation, and Time. I have successfully done this before using anova_test from rstatix. I want to look at the sphericity output, but it doesn't appear in the output. I have gotten it to come up with other three-way within-subject datasets, so I'm not sure why this is happening. Here is my code:
anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
I also tried to save it to an object and use get_anova_table, but that didn't look any different.
sleep_aov <- anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
get_anova_table(sleep_aov, correction = "GG")
This is an ideal dataset I pulled from the internet, so I'm starting to think the data had a W of 1 (perfect sphericity) and so rstatix is skipping this output. Is this something anova_test does?
Here also is my code using a dataset that does return Mauchly's:
weight_loss_long <- pivot_longer(data = weightloss, cols = c(t1, t2, t3), names_to = "time", values_to = "loss")
weight_loss_long$time <- factor(weight_loss_long$time)
anova_test(data = weight_loss_long, dv = loss, wid = id, within = c(diet, exercises, time))
Not an expert at all, but it might be because your factors have only two levels.
From anova_summary() help:
"Value
return an object of class anova_test a data frame containing the ANOVA table for independent measures ANOVA. However, for repeated/mixed measures ANOVA, it is a list containing the following components are returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values. "
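A quick way to check whether that is what is happening with your data (a minimal sketch; it assumes the three within-subject columns in sleep can be treated as factors):
# Mauchly's test is only reported for within-subject factors with more than 2 levels
sapply(sleep[, c("Stimulation", "Deprivation", "Time")],
       function(x) nlevels(as.factor(x)))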

How to get dataset into array

I have worked through all the tutorials and searched for "load csv tensorflow" but just can't get the logic of it all. I'm not a total beginner, but I don't have much time to complete this, and I've been suddenly thrown into TensorFlow, which is unexpectedly difficult.
Let me lay it out:
A very simple CSV file of 184 columns that are all float numbers. A row is simply today's price, three buy signals, and the previous 180 days' prices:
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
This article: https://www.tensorflow.org/guide/datasets
It covers how to load data pretty well. It even has a section on converting to numpy arrays, which is what I need to train and test the net. However, as the author says in the article leading to this web page, it is pretty complex. It seems like everything is geared toward data manipulation, whereas we have already normalized our data (nothing has really changed in AI since 1983 in terms of inputs, outputs, and layers).
Here is a way to load it, but not into numpy, and there is no example of leaving the data unmanipulated.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    with open('/BTC1.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        line_count = 0
        for row in csv_reader:
            ?????????
            line_count += 1
I need to know how to get the CSV file into the
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
so that I can follow the tutorials to train and test the net.
Your question is not entirely clear to me. You might be asking, tell me if I'm wrong, how to feed data into your model? There are several ways to do so.
Use placeholders with feed_dict during the session. This is the basic and easiest one, but it often suffers from training performance issues. For further explanation, check this post.
Use queues. They are hard to implement and poorly documented; I don't suggest them, because they have been superseded by the third method.
The tf.data API.
...
So, to answer your question with the first method:
# get your array outside the session
with open('/BTC1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    # convert the string fields to floats so they can be fed to the placeholders
    dataset = np.asarray([data for data in csv_reader], dtype=np.float32)
close_col = dataset[:, 0]        # today's price
signal_cols = dataset[:, 1:4]    # the three buy signals
previous_cols = dataset[:, 4:]   # the previous 180 days' prices
# let's say you load 100 rows each time for training
batch_size = 100
# define placeholders like you did
...
with tf.Session() as sess:
    ...
    for i in range(number_iter):
        start = i * batch_size
        end = (i + 1) * batch_size
        sess.run(train_operation, feed_dict={close: close_col[start:end],
                                             signals: signal_cols[start:end],
                                             previous: previous_cols[start:end]})
With the third method:
# retrieve your columns like before
...
# let's say you load 100 rows each time for training
batch_size = 100
# construct your input pipeline from the column arrays
batch = tf.data.Dataset.from_tensor_slices((close_col, signal_cols, previous_cols))
# mix the data --> assemble batches of batch_size rows
batch = batch.shuffle(close_col.shape[0]).batch(batch_size)
iterator = batch.make_initializable_iterator()
iter_init_operation = iterator.initializer
# get-next-batch operation, evaluated automatically at each iteration within the session
c_it, s_it, p_it = iterator.get_next()
# replace the close, signals, previous placeholders by c_it, s_it, p_it when you define your model
...
with tf.Session() as sess:
    # you need to initialize the variables and the iterator
    sess.run([tf.global_variables_initializer(), iter_init_operation])
    ...
    for i in range(number_iter):
        sess.run(train_operation)
Good luck!

R - Vectorize a JSON call

Working with the Mapquest directions API to plot thousands of routes using ggplot2 in R.
Basic code theory: I have a list of end locations and a single start location. For each end location, a call to fromJSON returns routing coordinates from Mapquest. From there, I have already vectorized the assignment of the coordinates (read as lists of lists) to the geom_path geom of ggplot2.
Right now, running this on a location set of ~1200 records takes ~4 minutes. I would love to get that down. Any thoughts on how to vectorize the call to fromJSON (which returns a list of lists)?
Windows 7, 64-bit, R 2.14.2
libraries: plyr, ggplot2, rjson, mapproj, XML
k = 0
start_loc = "263+NORTH+CENTER+ST.,+MESA+ARIZ."
end_loc = funder_trunc[, length(funder_trunc)]
route_urls = paste(mapquest_baseurl, "&from=", start_loc, "&to=", end_loc, "&ambiguities=ignore", sep = "")
for (n in route_urls) {
  route_legs = fromJSON(file = url(n))$route$legs[[1]]$maneuvers
  lats = unlist(lapply(route_legs, function(x) return(x$startPoint[[2]])))
  lngs = unlist(lapply(route_legs, function(x) return(x$startPoint[[1]])))
  frame = data.frame(cbind(lngs, lats))
  path_added = geom_path(aes(lngs, lats), data = frame)
  p = p + path_added
  k = k + 1
  print(paste("Processed ", k, " of ", nrow(funder_trunc), " records in set.", sep = ""))
}
Going out on a limb here since I don't use rjson or mapproj, but it seems like calling the server thousands of times is the real culprit. If the mapquest server doesn't have an API that allows you to send multiple requests in one go, you are in trouble. If it does, then you need to find out how to use/modify rjson and/or mapproj to call it...
As #Chase said, you might be able to call it in parallel, but the server won't like getting too many parallel requests from the same client - it might ban you. Btw, it might not even like getting thousands of serial requests in rapid succession from the same client either - but apparently your current code works so I guess it doesn't mind.
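If you do try the parallel route, something along these lines might work (a rough sketch only, not tested against the Mapquest API: it reuses route_urls and the coordinate extraction from your loop, and the server may still throttle or block a burst of parallel requests):
library(parallel)
library(rjson)
# fetch and parse one route; the network round-trips, not the JSON parsing,
# are what dominate the ~4 minutes
get_route <- function(u) {
  legs <- fromJSON(file = url(u))$route$legs[[1]]$maneuvers
  data.frame(lngs = sapply(legs, function(x) x$startPoint[[1]]),
             lats = sapply(legs, function(x) x$startPoint[[2]]))
}
cl <- makeCluster(4)              # a handful of workers is plenty
clusterEvalQ(cl, library(rjson))  # each worker needs rjson loaded
routes <- parLapply(cl, route_urls, get_route)
stopCluster(cl)
# add the paths to the plot after all downloads have finished
for (d in routes) p <- p + geom_path(aes(lngs, lats), data = d)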

Performance problem transforming JSON data

I've got some data in JSON format that I want to do some visualization on. The data (approximately 10MB of JSON) loads pretty fast, but reshaping it into a usable form takes a couple of minutes for just under 100,000 rows. I have something that works, but I think it can be done much better.
It may be easiest to understand by starting with my sample data.
Assuming you run the following command in /tmp:
curl http://public.west.spy.net/so/time-series.json.gz \
| gzip -dc - > time-series.json
You should be able to see my desired output (after a while) here:
require(rjson)
trades <- fromJSON(file="/tmp/time-series.json")$rows
data <- do.call(rbind,
                lapply(trades,
                       function(row)
                         data.frame(date=strptime(unlist(row$key)[2], "%FT%X"),
                                    price=unlist(row$value)[1],
                                    volume=unlist(row$value)[2])))
someColors <- colorRampPalette(c("#000099", "blue", "orange", "red"),
                               space="Lab")
smoothScatter(data, colramp=someColors, xaxt="n")
days <- seq(min(data$date), max(data$date), by = 'month')
smoothScatter(data, colramp=someColors, xaxt="n")
axis(1, at=days,
     labels=strftime(days, "%F"),
     tick=FALSE)
You can get a 40x speedup by using plyr. Here is the code and the benchmarking comparison. The conversion to date can be done once you have the data frame and hence I have removed it from the code to facilitate apples-to-apples comparison. I am sure a faster solution exists.
f_ramnath = function(n) plyr::ldply(trades[1:n], unlist)[, -c(1, 2)]

f_dustin = function(n) do.call(rbind, lapply(trades[1:n],
              function(row) data.frame(
                  date = unlist(row$key)[2],
                  price = unlist(row$value)[1],
                  volume = unlist(row$value)[2]))
           )

f_mrflick = function(n) as.data.frame(do.call(rbind, lapply(trades[1:n],
              function(x) {
                  list(date = x$key[2], price = x$value[1], volume = x$value[2])})))

f_mbq = function(n) data.frame(
              t(sapply(trades[1:n], '[[', 'key')),
              t(sapply(trades[1:n], '[[', 'value')))

rbenchmark::benchmark(f_ramnath(100), f_dustin(100), f_mrflick(100), f_mbq(100),
                      replications = 50)
           test elapsed   relative
 f_ramnath(100)   0.144   3.692308
  f_dustin(100)   6.244 160.102564
 f_mrflick(100)   0.039   1.000000
     f_mbq(100)   0.074   1.897436
EDIT. MrFlick's solution leads to an additional 3.5x speedup. I have updated my tests.
I received another transformation from MrFlick on IRC that was significantly faster and is worth mentioning here:
data <- as.data.frame(do.call(rbind,
                              lapply(trades,
                                     function(x) {list(date=x$key[2],
                                                       price=x$value[1],
                                                       volume=x$value[2])})))
It seems to be significantly faster because it does not build the inner data frames.
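A rough way to see why that helps (a sketch; absolute timings will vary, but constructing a data.frame per element is far more expensive than constructing a list):
# building a data.frame is much heavier than building a list,
# so doing it once per row adds up over ~100,000 rows
rbenchmark::benchmark(
  df  = data.frame(date = 1, price = 2, volume = 3),
  lst = list(date = 1, price = 2, volume = 3),
  replications = 10000)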
You are doing vectorized operations on single elements, which is very inefficient. Price and volume can be extracted like this:
t(sapply(trades,'[[','value'))
And dates like this:
strptime(sapply(trades,'[[','key')[c(F,T)],'%FT%X')
Now, with a bit of sugar, the complete code looks like this:
data <- data.frame(
  strptime(sapply(trades, '[[', 'key')[c(F, T)], '%FT%X'),
  t(sapply(trades, '[[', 'value')))
names(data) <- c('date', 'price', 'volume')
On my notebook, the whole set gets converted in about 0.7 s, while the first 10k rows (10%) take roughly 8 s with the original algorithm.
Is batching an option? Process 1000 rows at a time, perhaps, depending on how deep your JSON is. Do you really need to transform all the data? I am not sure about R and what exactly you are dealing with, but I am thinking of a generic approach.
Also, take a look at this: http://jackson.codehaus.org/ : a high-performance JSON processor.