VIPER: some genes in the pathway gene list are missing in the result - viper

I have a problem when I use VIPER with the aracne networks.
I used regulonbrca to calculate protein activity.
regulonbrca contains 6054 genes, and 5918 of those genes are also present in my TPM file.
But I got only 4506 genes in the viper result.
Why are some genes missing when running viper? Is there a default setting that causes this?
Here is my code.
expdat= "TPM-brca(entrez_avg).csv"
clsdat= "immunesubtype-BRCA.csv"
regulon= regulonbrca
dat0 = data.frame(read.csv(expdat))
dat1 = dat0[,-c(1,2)]
rownames(dat1) = dat0[,1]
cls0 = data.frame(read.csv(clsdat))
colnames(cls0) = c("id", "description")
cls1 = cls0 %>% dplyr::filter(id %in% colnames(dat1))
dat = dat1[,cls1$id]
cls = data.frame(description=cls1[,2])
rownames(cls) = cls1[,1]
meta = data.frame(labelDescription=c("description"), row.names = colnames(cls))
pheno = new("AnnotatedDataFrame", data=cls, varMetadata=meta)
dset = ExpressionSet(assayData = as.matrix(dat), phenoData = pheno)
signature = rowTtest(dset, "description", "TRUE", "FALSE")
signature = (qnorm(signature$p.value/2, lower.tail = FALSE) * sign(signature$statistic))[, 1]
nullmodel = ttestNull(dset, "description", "TRUE", "FALSE", per = 1000, repos = TRUE, verbose = FALSE)
vpres = viper(dset, regulon, verbose = FALSE) ## single sample viper
res_ss0 = data.frame(id=row.names(vpres@assayData$exprs), vpres@assayData$exprs)
geneid_ss0 = ldply(sapply(row.names(res_ss0), converter), data.frame) ## gene mapping
colnames(geneid_ss0) = c('id', 'gene')
res_ss = merge(geneid_ss0, res_ss0, by = 'id')[, -1]
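One possible explanation worth checking: viper() drops regulators whose regulons become too small after intersecting with the expression matrix (the minsize argument, which I believe defaults to 25), and eset.filter = TRUE additionally restricts the expression data to genes represented in the network. A hedged sketch, assuming those argument names match your viper version (please verify with ?viper):
## Relax the regulon-size filter so fewer regulators are silently dropped
vpres_all = viper(dset, regulon, minsize = 2, eset.filter = FALSE, verbose = FALSE)
nrow(vpres_all) ## should now be closer to the 5918 genes shared with the TPM file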

Related

How to resume training in spacy transformers for NER

I have created a spaCy transformer model for named entity recognition. Last time I trained it until it reached 90% accuracy, and I also have a model-best directory from which I can load my trained model for predictions. But now I have some more data samples and I wish to resume training this spaCy transformer. I saw that we can do it by changing the config.cfg, but I am clueless about what to change.
This is my config.cfg after running python -m spacy init fill-config ./base_config.cfg ./config.cfg:
[paths]
train = null
dev = null
vectors = null
init_tok2vec = null
[system]
gpu_allocator = "pytorch"
seed = 0
[nlp]
lang = "en"
pipeline = ["transformer","ner"]
batch_size = 128
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"#tokenizers":"spacy.Tokenizer.v1"}
[components]
[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"#scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100
[components.ner.model]
#architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = false
nO = null
[components.ner.model.tok2vec]
#architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"#layers":"reduce_mean.v1"}
upstream = "*"
[components.transformer]
factory = "transformer"
max_batch_items = 4096
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}
[components.transformer.model]
#architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
mixed_precision = false
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
[components.transformer.model.grad_scaler_config]
[components.transformer.model.tokenizer_config]
use_fast = true
[components.transformer.model.transformer_config]
[corpora]
[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null
[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null
[pretraining]
[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null
[initialize.components]
[initialize.tokenizer]
As you can see there is a 'vectors' parameter under [initialize], so I tried giving vectors from 'model-best' like this:
But it gave me this error:
OSError: [E884] The pipeline could not be initialized because the vectors could not be found at './model-best/ner'. If your pipeline was already initialized/trained before, call 'resume_training' instead of 'initialize', or initialize only the components that are new.
For those wondering whether I have given the wrong path: no, that directory exists, as you can see from the directory structure.
So, please guide me on how I can successfully resume the training from previous weights.
Thank you!
The vectors setting is not related to the transformer or what you're trying to do.
In the new config, you want to use the source option to load the components from the existing pipeline. You would modify the [components] blocks to contain only the source setting and no other settings:
[components.ner]
source = "/path/to/model-best"
[components.transformer]
source = "/path/to/model-best"
See: https://spacy.io/usage/training#config-components
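With both components sourced from model-best, training then resumes from the existing weights via the normal CLI. A minimal invocation might look like this (output directory and .spacy paths are placeholders):
python -m spacy train config.cfg --output ./output_resumed --paths.train ./train.spacy --paths.dev ./dev.spacy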
The vectors setting refers to word vectors here. To reuse the vocabulary from the previously trained spaCy pipeline, you can use the following code:
[components.ner]
source = "/path/to/model-best"
[initialize]
vectors = ${paths.vectors}
[initialize.before_init]
#callbacks: "spacy.copy_from_base_model.v1"
tokenizer: "/path/to/model-best"
vocab: "/path/to/model-best"

Difference in Computation Speed and Results Between MLR and MLR3

I don't get similar results when I use the same data and models with mlr and mlr3, and I find mlr runs at least 20-fold faster. Since I can't share my own data, I used the lung data from the survival package, which replicates the difference in computation speed and results.
mlr completed in 1 min with a generally lower C-index, while mlr3 took 21 min with a much higher C-index, despite using the same data, the same preprocessing, the same models, and the same seed.
library(tidyverse)
library(tidymodels)
library(PKPDmisc)
library(mlr)
library(parallelMap)
library(survival)
# Data and Data Splitting
data = as_tibble(lung) %>%
mutate(status = if_else(status==1, 0, 1),
sex = factor(sex, levels = c(1:2), labels = c("male", "female")),
ph.ecog = factor(ph.ecog))
na <- sample(1:228, 228*0.1)
data$sex[na] <- NA
data$ph.ecog[na]<- NA
set.seed(123)
split <- data %>% initial_split(prop = 0.8, strata = status)
train <- split %>% training()
test <- split %>% testing()
# Task
task = makeSurvTask(id = "Survival", data = train, target = c("time", "status"))
# Resample
# For model assessment before external validation on test data
set.seed(123)
outer_cv = makeResampleDesc("CV", iters = 4, stratify.cols = c("status")) %>%
makeResampleInstance(task)
# For feature selection and parameter tuning
set.seed(123)
inner_cv = makeResampleDesc("CV", iters = 4, stratify.cols = c("status"))
# Learners
cox1 = makeLearner(id = "COX1", "surv.coxph") %>%
makeImputeWrapper(classes = list(factor = imputeMode(), numeric = imputeMedian()),
# Create dummy variable for factor features
dummy.classes = "factor") %>%
makePreprocWrapperCaret(ppc.center = TRUE, ppc.scale = TRUE) %>%
makeFeatSelWrapper(resampling = inner_cv, show.info = TRUE,
control = makeFeatSelControlSequential(method = "sfs"))
cox_lasso = makeLearner(id = "COX LASSO", "surv.glmnet") %>%
makeImputeWrapper(classes = list(factor = imputeMode(), numeric = imputeMedian()),
# Create dummy variable for factor features
dummy.classes = "factor") %>%
# Normalize numeric features
makePreprocWrapperCaret(ppc.center = TRUE, ppc.scale = TRUE) %>%
makeTuneWrapper(resampling = inner_cv, show.info = TRUE,
par.set = makeParamSet(makeNumericParam("lambda",lower = -3, upper = 0,
trafo = function(x) 10^x)),
control = makeTuneControlGrid(resolution = 10L))
cox_net = makeLearner(id = "COX NET", "surv.glmnet") %>%
makeImputeWrapper(classes = list(factor = imputeMode(), numeric = imputeMedian()),
# Create dummy variable for factor features
dummy.classes = "factor") %>%
# Normalize numeric features
makePreprocWrapperCaret(ppc.center = TRUE, ppc.scale = TRUE) %>%
makeTuneWrapper(resampling = inner_cv, show.info = TRUE,
par.set = makeParamSet(makeNumericParam("alpha", lower = 0, upper = 1,
trafo = function(x) round(x,2)),
makeNumericParam("lambda",lower = -3, upper = 1,
trafo = function(x) 10^x)),
control = makeTuneControlGrid(resolution = 10L))
# Benchmark
# parallelStartSocket(4)
start_time <- Sys.time()
set.seed(123)
mlr_bmr = benchmark(learners = list(cox1, cox_lasso, cox_net),
tasks = task,
resamplings = outer_cv,
keep.extract= TRUE,
models = TRUE)
end_time <- Sys.time()
mlr_time = end_time - start_time
# parallelStop()
mlr_res <- getBMRPerformances(mlr_bmr, as.df = TRUE) %>%
select(Learner = learner.id, Task = task.id, Cindex = cindex) %>%
mutate(Color_Package = "mlr",
Learner = word(str_replace(Learner, "\\.", " "), 1, -2))
##################################################################
library(mlr3verse)
# Task
task2 = TaskSurv$new(id = "Survival2", backend = train, time = "time", event = "status")
task2$col_roles$stratum = c("status")
# Resample
set.seed(123)
outer_cv2 = rsmp("cv", folds = 4)$instantiate(task2)
# For feature selection and parameter tuning
set.seed(123)
inner_cv2 = rsmp("cv", folds = 4)
# Learners
preproc = po("imputemedian", affect_columns = selector_type("numeric")) %>>%
po("imputemode", affect_columns = selector_type("factor")) %>>%
po("scale") %>>%
po("encode")
cox2 = AutoFSelector$new(learner = as_learner(preproc %>>%
lrn("surv.coxph")),
resampling = inner_cv2,
measure = msr("surv.cindex"),
terminator = trm("none"), # need to increase later
fselector = fs("sequential", strategy = "sfs")) # sfs is the default
cox2$id = "COX1"
cox_lasso2 = AutoTuner$new(learner = as_learner(preproc %>>%
lrn("surv.glmnet",
lambda = to_tune(p_dbl(lower = -3, upper = 0,
trafo = function(x) 10^x)))),
resampling = inner_cv2,
measure = msr("surv.cindex"),
terminator = trm("none"),
tuner = tnr("grid_search", resolution = 10))
cox_lasso2$id = "COX LASSO"
cox_net2 = AutoTuner$new(learner = as_learner(preproc %>>%
lrn("surv.glmnet",
alpha = to_tune(p_dbl(lower = 0, upper = 1)),
lambda = to_tune(p_dbl(lower = -3, upper = 1,
trafo = function(x) 10^x)))),
resampling = inner_cv2,
measure = msr("surv.cindex"),
terminator = trm("none"),
tuner = tnr("grid_search", resolution = 10))
cox_net2$id = "COX NET"
# Benchmark
design = benchmark_grid(tasks = task2,
learners = c(cox2, cox_lasso2, cox_net2),
resamplings = outer_cv2)
# future::plan("multisession")
# Error: Output type of PipeOp select during training (Task) incompatible with input type of PipeOp surv.coxph (TaskSurv)
start_time <- Sys.time()
set.seed(123)
mlr3_bmr = mlr3::benchmark(design)
end_time <- Sys.time()
mlr3_time = end_time - start_time
mlr3_res <- as.data.table(mlr3_bmr$score()) %>%
select(Task=task_id, Learner=learner_id, Cindex=surv.harrell_c) %>%
mutate(Color_Package = "mlr3")
mlr_res %>%
bind_rows(mlr3_res) %>%
ggplot(aes(Learner, Cindex, fill= Color_Package )) +
geom_boxplot(position=position_dodge(.8)) +
stat_summary(fun= mean, geom = "point", aes(group = Color_Package ),
position=position_dodge(.8), size = 3) +
labs(x="", y = " C-Index") +
theme_bw() + base_theme() + theme(legend.position = "top")

Android Room how to query related table?

First thing, my data is POKEMON!!! enjoy 😉
I need to do this on the database side; filtering and sorting the returned data isn't an option as I'm using Paging...
I'm using Room and I have my database working well, but I now want to query the pokemonType list in my relation.
Given this data class
data class PokemonWithTypesAndSpecies @JvmOverloads constructor(
    @Ignore
    var matches: Int = 0,
    @Embedded
    val pokemon: Pokemon,
    @Relation(
        parentColumn = Pokemon.POKEMON_ID,
        entity = PokemonType::class,
        entityColumn = PokemonType.TYPE_ID,
        associateBy = Junction(
            value = PokemonTypesJoin::class,
            parentColumn = Pokemon.POKEMON_ID,
            entityColumn = PokemonType.TYPE_ID
        )
    )
    val types: List<PokemonType>,
    @Relation(
        parentColumn = Pokemon.POKEMON_ID,
        entity = PokemonSpecies::class,
        entityColumn = PokemonSpecies.SPECIES_ID,
        associateBy = Junction(
            value = PokemonSpeciesJoin::class,
            parentColumn = Pokemon.POKEMON_ID,
            entityColumn = PokemonSpecies.SPECIES_ID
        )
    )
    val species: PokemonSpecies?
)
I can get my data with a simple query and even search it
#Query("SELECT * FROM Pokemon WHERE pokemon_name LIKE :search")
fun searchPokemonWithTypesAndSpecies(search: String): LiveData<List<PokemonWithTypesAndSpecies>>
But now what I want is to add filtering on pokemon types, which as you can see is a list (probably fetched in a transaction under the hood) and lives in a separate table. So, given a list of strings called filters, I want to:
only return pokemon whose types contain an item in filters
sort pokemon by both number of matching types and by ID
So I want my tests to look something like this
val bulbasaurSpeciesID = 1
val squirtleSpeciesID = 2
val charmanderSpeciesID = 3
val charizardSpeciesID = 4
val pidgeySpeciesID = 5
val moltresSpeciesID = 6
val bulbasaurID = 1
val squirtleID = 2
val charmanderID = 3
val charizardID = 4
val pidgeyID = 5
val moltresID = 6
val grassTypeID = 1
val poisonTypeID = 2
val fireTypeID = 3
val waterTypeID = 4
val flyingTypeID = 5
val emptySearch = "%%"
val allPokemonTypes = listOf(
"normal",
"water",
"fire",
"grass",
"electric",
"ice",
"fighting",
"poison",
"ground",
"flying",
"psychic",
"bug",
"rock",
"ghost",
"dark",
"dragon",
"steel",
"fairy",
"unknown",
)
@Before
fun createDb() {
    val context = ApplicationProvider.getApplicationContext<Context>()
    db = Room.inMemoryDatabaseBuilder(
        context, PokemonRoomDatabase::class.java
    ).setTransactionExecutor(Executors.newSingleThreadExecutor())
        .allowMainThreadQueries()
        .build()
    pokemonDao = db.pokemonDao()
    speciesDao = db.pokemonSpeciesDao()
    speciesJoinDao = db.pokemonSpeciesJoinDao()
    pokemonTypeDao = db.pokemonTypeDao()
    pokemonTypeJoinDao = db.pokemonTypeJoinDao()
}
@After
@Throws(IOException::class)
fun closeDb() {
    db.close()
}
@Test
@Throws(Exception::class)
fun testFiltering() {
    insertPokemonForFilterTest()
    val pokemon =
        pokemonDao.searchAndFilterPokemon(search = emptySearch, filters = allPokemonTypes)
            .getValueBlocking(scope)
    assertThat(pokemon?.size, equalTo(6)) // This fails: list size is 9 with the current query
    val pokemonFiltered =
        pokemonDao.searchAndFilterPokemon(search = emptySearch, filters = listOf("fire", "flying"))
            .getValueBlocking(scope)
    assertThat(pokemonFiltered?.size, equalTo(4))
    assertThat(pokemonFiltered!![0].pokemon.name, equalTo("charizard")) // matches 2 filters and ID is 4
    assertThat(pokemonFiltered!![1].pokemon.name, equalTo("moltres")) // matches 2 filters and ID is 6
    assertThat(pokemonFiltered!![2].pokemon.name, equalTo("charmander")) // matches one filter and ID is 3
    assertThat(pokemonFiltered!![3].pokemon.name, equalTo("pidgey")) // matches one filter and ID is 5
}
private fun insertPokemonForFilterTest() = runBlocking {
    insertBulbasaur()
    insertSquirtle()
    insertCharmander()
    insertCharizard()
    insertMoltres()
    insertPidgey()
}
private fun insertBulbasaur() = runBlocking {
    val bulbasaur = bulbasaur()
    val grassJoin = PokemonTypesJoin(pokemon_id = bulbasaurID, type_id = grassTypeID)
    val poisonJoin = PokemonTypesJoin(pokemon_id = bulbasaurID, type_id = poisonTypeID)
    val bulbasaurSpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = bulbasaurID, species_id = bulbasaurSpeciesID)
    pokemonDao.insertPokemon(bulbasaur.pokemon)
    speciesDao.insertSpecies(bulbasaur.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(bulbasaurSpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = bulbasaur.types[0])
    pokemonTypeDao.insertPokemonType(pokemonType = bulbasaur.types[1])
    pokemonTypeJoinDao.insertPokemonTypeJoin(grassJoin)
    pokemonTypeJoinDao.insertPokemonTypeJoin(poisonJoin)
}
private fun insertSquirtle() = runBlocking {
    val squirtle = squirtle()
    val squirtleSpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = squirtleID, species_id = squirtleSpeciesID)
    val waterJoin = PokemonTypesJoin(pokemon_id = squirtleID, type_id = waterTypeID)
    pokemonDao.insertPokemon(squirtle.pokemon)
    speciesDao.insertSpecies(squirtle.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(squirtleSpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = squirtle.types[0])
    pokemonTypeJoinDao.insertPokemonTypeJoin(waterJoin)
}
private fun insertCharmander() = runBlocking {
    val charmander = charmander()
    val fireJoin = PokemonTypesJoin(pokemon_id = charmanderID, type_id = fireTypeID)
    val charmanderSpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = charmanderID, species_id = charmanderSpeciesID)
    pokemonDao.insertPokemon(charmander.pokemon)
    speciesDao.insertSpecies(charmander.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(charmanderSpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = charmander.types[0])
    pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
}
private fun insertCharizard() = runBlocking {
    val charizard = charizard()
    val charizardSpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = charizardID, species_id = charizardSpeciesID)
    val fireJoin = PokemonTypesJoin(pokemon_id = charizardID, type_id = fireTypeID)
    val flyingJoin = PokemonTypesJoin(pokemon_id = charizardID, type_id = flyingTypeID)
    pokemonDao.insertPokemon(charizard.pokemon)
    speciesDao.insertSpecies(charizard.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(charizardSpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = charizard.types[0])
    pokemonTypeDao.insertPokemonType(pokemonType = charizard.types[1])
    pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
    pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
private fun insertPidgey() = runBlocking {
    val pidgey = pidgey()
    val pidgeySpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = pidgeyID, species_id = pidgeySpeciesID)
    val flyingJoin = PokemonTypesJoin(pokemon_id = pidgeyID, type_id = flyingTypeID)
    pokemonDao.insertPokemon(pidgey.pokemon)
    speciesDao.insertSpecies(pidgey.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(pidgeySpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = pidgey.types[0])
    pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
private fun insertMoltres() = runBlocking {
    val moltres = moltres()
    val moltresSpeciesJoin =
        PokemonSpeciesJoin(pokemon_id = moltresID, species_id = moltresSpeciesID)
    val fireJoin = PokemonTypesJoin(pokemon_id = moltresID, type_id = fireTypeID)
    val flyingJoin = PokemonTypesJoin(pokemon_id = moltresID, type_id = flyingTypeID)
    pokemonDao.insertPokemon(moltres.pokemon)
    speciesDao.insertSpecies(moltres.species!!)
    speciesJoinDao.insertPokemonSpeciesJoin(moltresSpeciesJoin)
    pokemonTypeDao.insertPokemonType(pokemonType = moltres.types[0])
    pokemonTypeDao.insertPokemonType(pokemonType = moltres.types[1])
    pokemonTypeJoinDao.insertPokemonTypeJoin(fireJoin)
    pokemonTypeJoinDao.insertPokemonTypeJoin(flyingJoin)
}
fun bulbasaur(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = bulbasaurID, name = "bulbasaur"),
    species = PokemonSpecies(
        id = bulbasaurSpeciesID,
        species = "Seed pokemon",
        pokedexEntry = "There is a plant seed on its back right from the day this Pokémon is born. The seed slowly grows larger."
    ),
    types = listOf(
        PokemonType(id = poisonTypeID, name = "poison", slot = 1),
        PokemonType(id = grassTypeID, name = "grass", slot = 2)
    )
)
fun squirtle(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = squirtleID, name = "squirtle"),
    species = PokemonSpecies(
        id = squirtleSpeciesID,
        species = "Turtle pokemon",
        pokedexEntry = "Small shell pokemon"
    ),
    types = listOf(PokemonType(id = waterTypeID, name = "water", slot = 1))
)
fun charmander(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = charmanderID, name = "charmander"),
    species = PokemonSpecies(
        id = charmanderSpeciesID,
        species = "Fire lizard pokemon",
        pokedexEntry = "If the flame on this pokemon's tail goes out it will die"
    ),
    types = listOf(PokemonType(id = fireTypeID, name = "fire", slot = 1))
)
fun charizard(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = charizardID, name = "charizard"),
    species = PokemonSpecies(
        id = charizardSpeciesID,
        species = "Fire flying lizard pokemon",
        pokedexEntry = "Spits fire that is hot enough to melt boulders. Known to cause forest fires unintentionally"
    ),
    types = listOf(
        PokemonType(id = fireTypeID, name = "fire", slot = 1),
        PokemonType(id = flyingTypeID, name = "flying", slot = 2)
    )
)
fun moltres(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = moltresID, name = "moltres"),
    species = PokemonSpecies(
        id = moltresSpeciesID,
        species = "Fire bird pokemon",
        pokedexEntry = "Known as the legendary bird of fire. Every flap of its wings creates a dazzling flash of flames"
    ),
    types = listOf(
        PokemonType(id = fireTypeID, name = "fire", slot = 1),
        PokemonType(id = flyingTypeID, name = "flying", slot = 2)
    )
)
fun pidgey(): PokemonWithTypesAndSpecies = PokemonWithTypesAndSpecies(
    pokemon = Pokemon(id = pidgeyID, name = "pidgey"),
    species = PokemonSpecies(
        id = pidgeySpeciesID,
        species = "Bird pokemon",
        pokedexEntry = "Pidgey is a Flying Pokémon. Among all the Flying Pokémon, it is the gentlest and easiest to capture. A perfect target for the beginning Pokémon Trainer to test his Pokémon's skills."
    ),
    types = listOf(PokemonType(id = flyingTypeID, name = "flying", slot = 1))
)
And the query would be
#Query("SELECT * FROM Pokemon INNER JOIN PokemonType, PokemonTypesJoin ON Pokemon.pokemon_id = PokemonTypesJoin.pokemon_id AND PokemonType.type_id = PokemonTypesJoin.type_id WHERE pokemon_name LIKE :search AND type_name IN (:filters) ORDER BY pokemon_id ASC")
fun searchAndFilterPokemon(search: String, filters: List<String>): LiveData<List<PokemonWithTypesAndSpecies>>
I'm guessing this doesn't work because at this point Room hasn't collected the types from the other table, and it's probably not even querying a list. I think this part
type_name IN (:filters)
is checking a column against a list, when what I want is a list against a list 🤷‍♂️ but honestly I'm happy to just say I've fallen and can't get up 🤣 Can anyone help? Any help appreciated.
I may have gotten some column names wrong, but try this query:
#Query("SELECT pok.id, pok.name FROM Pokemon AS pok
INNER JOIN PokemonTypesJoin AS p_join ON pok.id = p_join.pokemon_id
INNER JOIN PokemonType AS pok_type ON pok_type.id = p_join.type_id
WHERE pok.name LIKE :search AND pok_type.name IN (:filters)
GROUP BY pok.id, pok.name ORDER BY count(*) DESC, pok.id ASC")
Have you compared Room to Cmobilecom-JPA for Android? JPA is very good at querying relationships. The advantage of using JPA (a standard) is that it makes your code reusable in Android, server-side Java, or Swing projects.
Thanks to @serglytikhonov my query works and now looks like this:
#Query("""SELECT * FROM Pokemon
INNER JOIN PokemonTypesJoin
ON Pokemon.pokemon_id = PokemonTypesJoin.pokemon_id
INNER JOIN PokemonType
ON PokemonType.type_id = PokemonTypesJoin.type_id
WHERE pokemon_name LIKE :search AND type_name IN (:filters)
GROUP BY Pokemon.pokemon_id, Pokemon.pokemon_name
ORDER BY count(*) DESC, pokemon_id ASC""")
fun searchAndFilterPokemon(search: String, filters: List<String>): LiveData<List<PokemonWithTypesAndSpecies>>
The main piece being the count(*) and GROUP BY. Many thanks!
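One hedged addition: because the return type carries @Relation fields, Room generally recommends wrapping such DAO methods in a transaction so the parent query and the relation sub-queries stay consistent. A sketch of the final method with that annotation added:
@Transaction // runs the parent query and the @Relation sub-queries atomically
@Query("""SELECT * FROM Pokemon
    INNER JOIN PokemonTypesJoin ON Pokemon.pokemon_id = PokemonTypesJoin.pokemon_id
    INNER JOIN PokemonType ON PokemonType.type_id = PokemonTypesJoin.type_id
    WHERE pokemon_name LIKE :search AND type_name IN (:filters)
    GROUP BY Pokemon.pokemon_id, Pokemon.pokemon_name
    ORDER BY count(*) DESC, pokemon_id ASC""")
fun searchAndFilterPokemon(search: String, filters: List<String>): LiveData<List<PokemonWithTypesAndSpecies>>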

How do I add significance asterisks next to my values in a correlation matrix heat map?

I found this code online at http://www.sthda.com/english/wiki/ggplot2-quick-correlation-matrix-heatmap-r-software-and-data-visualization
It provides instructions for creating a correlation matrix heat map, and it works well. However, I was wondering how to get little stars * next to the values in the matrix that are significant. How would I go about doing that? Any help is greatly appreciated!!
mydata <- mtcars[, c(1,3,4,5,6,7)]
head(mydata)
cormat <- round(cor(mydata),2)
head(cormat)
library(reshape2)
melted_cormat <- melt(cormat)
head(melted_cormat)
library(ggplot2)
ggplot(data = melted_cormat, aes(x=Var1, y=Var2, fill=value)) +
geom_tile()
# Get lower triangle of the correlation matrix
get_lower_tri <- function(cormat){
  cormat[upper.tri(cormat)] <- NA
  return(cormat)
}
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}
upper_tri <- get_upper_tri(cormat)
# Melt the correlation matrix
library(reshape2)
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Heatmap
library(ggplot2)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
reorder_cormat <- function(cormat){
  # Use correlation between variables as distance
  dd <- as.dist((1 - cormat)/2)
  hc <- hclust(dd)
  cormat <- cormat[hc$order, hc$order]
}
# Reorder the correlation matrix
cormat <- reorder_cormat(cormat)
upper_tri <- get_upper_tri(cormat)
# Melt the correlation matrix
melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Create a ggheatmap
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value))+
geom_tile(color = "white")+
scale_fill_gradient2(low = "blue", high = "red", mid = "white",
midpoint = 0, limit = c(-1,1), space = "Lab",
name="Pearson\nCorrelation") +
theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(angle = 45, vjust = 1,
size = 12, hjust = 1))+
coord_fixed()
# Print the heatmap
print(ggheatmap)
ggheatmap +
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.major = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.ticks = element_blank(),
legend.justification = c(1, 0),
legend.position = c(0.6, 0.7),
legend.direction = "horizontal")+
guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
title.position = "top", title.hjust = 0.5))
cor() doesn't compute significance levels; you may have to use rcorr() from the Hmisc package.
This is quite similar to what you want (the graphic output is not so nice though)
library(ggplot2)
library(reshape2)
library(Hmisc)
library(stats)
abbreviateSTR <- function(value, prefix){ # format string more concisely
  lst = c()
  for (item in value) {
    if (is.nan(item) || is.na(item)) { # if item is NaN return empty string
      lst <- c(lst, '')
      next
    }
    item <- round(item, 2) # round to two digits
    if (item == 0) { # if rounding results in 0 clarify
      item = '<.01'
    }
    item <- as.character(item)
    item <- sub("(^[0])+", "", item) # remove leading 0: 0.05 -> .05
    item <- sub("(^-[0])+", "-", item) # remove leading -0: -0.05 -> -.05
    lst <- c(lst, paste(prefix, item, sep = ""))
  }
  return(lst)
}
d <- mtcars
cormatrix = rcorr(as.matrix(d), type='spearman')
cordata = melt(cormatrix$r)
cordata$labelr = abbreviateSTR(melt(cormatrix$r)$value, 'r')
cordata$labelP = abbreviateSTR(melt(cormatrix$P)$value, 'P')
cordata$label = paste(cordata$labelr, "\n",
cordata$labelP, sep = "")
cordata$strike = ""
cordata$strike[cormatrix$P > 0.05] = "X"
txtsize <- par('din')[2] / 2
ggplot(cordata, aes(x=Var1, y=Var2, fill=value)) + geom_tile() +
theme(axis.text.x = element_text(angle=90, hjust=TRUE)) +
xlab("") + ylab("") +
geom_text(label=cordata$label, size=txtsize) +
geom_text(label=cordata$strike, size=txtsize * 4, color="red", alpha=0.4)
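If you specifically want significance asterisks rather than printed p-values, a small extension of the same idea is to bin the p-values into the usual star codes and overlay them with geom_text(). A hedged sketch, reusing cormatrix and cordata from above:
pvals <- melt(cormatrix$P)$value
cordata$stars <- cut(pvals, breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
                     labels = c("***", "**", "*", ""))
cordata$stars[is.na(pvals)] <- "" # the diagonal has NA p-values
ggplot(cordata, aes(x = Var1, y = Var2, fill = value)) + geom_tile() +
  geom_text(label = paste0(round(cordata$value, 2), cordata$stars), size = txtsize) +
  xlab("") + ylab("")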
Here correlation_p is the p-value matrix of the correlation matrix, and ax4 is the Axes object returned by sns.heatmap; this marks every cell with p < 0.1:
data = correlation_p
for y in range(data.shape[0]):
    for x in range(data.shape[1]):
        if data[y, x] < 0.1:
            ax4.text(x + 0.5, y + 0.5, '-', size=48,
                     horizontalalignment='center',
                     verticalalignment='center')

Scrapy returns no output - just a [

I'm trying to run the spider found in this crawler, and for simplicity's sake I'm using this start_url because it is just a list of 320 movies (so the crawler won't run for 5 hours as stated on the GitHub page).
I crawl using scrapy crawl imdb -o output.json, but the output.json file contains nothing. It has just a [ in it.
import scrapy
from texteval.items import ImdbMovie, ImdbReview
import urlparse
import math
import re

class ImdbSpider(scrapy.Spider):
    name = "imdb"
    allowed_domains = ["imdb.com"]
    start_urls = [
        # "http://www.imdb.com/chart/top",
        # "http://www.imdb.com/chart/bottom"
        "http://www.imdb.com/search/title?countries=csxx&sort=moviemeter,asc"
    ]
    DOWNLOADER_MIDDLEWARES = {
        'scrapy.contrib.downloadermiddleware.robotstxt.ROBOTSTXT_OBEY': True,
    }
    base_url = "http://www.imdb.com"

    def parse(self, response):
        movies = response.xpath("//*[@id='main']/table/tr/td[3]/a/@href")
        for i in xrange(len(movies)):
            l = self.base_url + movies[i].extract()
            print l
            request = scrapy.Request(l, callback=self.parse_movie)
            yield request
        next = response.xpath("//*[@id='right']/span/a")[-1]
        next_url = self.base_url + next.xpath(".//@href")[0].extract()
        next_text = next.xpath(".//text()").extract()[0][:4]
        if next_text == "Next":
            request = scrapy.Request(next_url, callback=self.parse)
            yield request
        '''
        for sel in response.xpath("//table[@class='chart']/tbody/tr"):
            url = urlparse.urljoin(response.url, sel.xpath("td[2]/a/@href").extract()[0].strip())
            request = scrapy.Request(url, callback=self.parse_movie)
            yield request
        '''

    def parse_movie(self, response):
        movie = ImdbMovie()
        i1 = response.url.find('/tt') + 1
        i2 = response.url.find('?')
        i2 = i2 - 1 if i2 > -1 else i2
        movie['id'] = response.url[i1:i2]
        movie['url'] = "http://www.imdb.com/title/" + movie['id']
        r_tmp = response.xpath("//div[@class='titlePageSprite star-box-giga-star']/text()")
        if r_tmp is None or r_tmp == "" or len(r_tmp) < 1:
            return
        movie['rating'] = int(float(r_tmp.extract()[0].strip()) * 10)
        movie['title'] = response.xpath("//span[@itemprop='name']/text()").extract()[0]
        movie['reviews_url'] = movie['url'] + "/reviews"
        # Number of reviews associated with this movie
        n = response.xpath("//*[@id='titleUserReviewsTeaser']/div/div[3]/a[2]/text()")
        if n is None or n == "" or len(n) < 1:
            return
        n = n[0].extract().replace("See all ", "").replace(" user reviews", "")\
            .replace(" user review", "").replace(",", "").replace(".", "").replace("See ", "")
        if n == "one":
            n = 1
        else:
            n = int(n)
        movie['number_of_reviews'] = n
        r = int(math.ceil(n / 10))
        for x in xrange(1, r):
            start = x * 10 - 10
            url = movie['reviews_url'] + "?start=" + str(start)
            request = scrapy.Request(url, callback=self.parse_review)
            request.meta['movieObj'] = movie
            yield request

    def parse_review(self, response):
        ranks = response.xpath("//*[@id='tn15content']/div")[0::2]
        texts = response.xpath("//*[@id='tn15content']/p")
        del texts[-1]
        if len(ranks) != len(texts):
            return
        for i in xrange(0, len(ranks) - 1):
            review = ImdbReview()
            review['movieObj'] = response.meta['movieObj']
            review['text'] = texts[i].xpath("text()").extract()
            rating = ranks[i].xpath(".//img[2]/@src").re("-?\\d+")
            if rating is None or rating == "" or len(rating) < 1:
                return
            review['rating'] = int(rating[0])
            yield review
Can someone tell me where I'm going wrong?
In my opinion, this website probably loads the list of movies via JavaScript. First, I suggest you check the output of movies = response.xpath("//*[@id='main']/table/tr/td[3]/a/@href"). If you want to get JS-rendered content, you can use WebKit in Scrapy as a downloader middleware.
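A quick way to verify that suggestion is to test the same XPath in scrapy shell (a hedged sketch; the URL is copied from the question):
scrapy shell "http://www.imdb.com/search/title?countries=csxx&sort=moviemeter,asc"
>>> response.xpath("//*[@id='main']/table/tr/td[3]/a/@href").extract()
# An empty list here would mean the XPath no longer matches the served markup,
# or the rows are rendered by JavaScript and never reach the spider.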