R pheatmap: row annotation and title font size questions

I have been trying to add row annotations to a heatmap created with pheatmap in R. I have a CSV file with one particular column (Group) that should be used for the row annotation. However, I'm having trouble with the following code. I also have two other issues. First, the font size of the title is too big, but I could not find a way to decrease it. Second, I wanted values of exactly zero to be colored pure white, but I am not sure they really are white in my output file. The input CSV file and output PDF files are linked. I am sticking with pheatmap here since I found that it creates a heatmap that fits my needs better than other heatmap functions. Suggestions are appreciated.
> library("pheatmap")
> data <- read.csv("/Users/neo/Test_BP_052215.csv", header = TRUE, row.names = 2, stringsAsFactors=F)
> head(data)
Group WT KO1 KO2
GO:0018904 organic ether metabolic process Metabolism 12.17372951 0.000000 -15.006995
GO:0006641 triglyceride metabolic process Metabolism 5.200847907 0.000000 0.000000
GO:0045444 fat cell differentiation Metabolism 6.374521098 0.000000 -7.927192
GO:0006639 acylglycerol metabolic process Metabolism 6.028616852 0.000000 0.000000
GO:0016125 sterol metabolic process Metabolism 5.760678325 8.262778 0.000000
GO:0016126 sterol biosynthetic process Metabolism -6.237114754 9.622373 0.000000
> heatdata <- data[,-1]
> head(heatdata)
WT KO1 KO2
GO:0018904 organic ether metabolic process 12.17372951 0.000000 -15.006995
GO:0006641 triglyceride metabolic process 5.200847907 0.000000 0.000000
GO:0045444 fat cell differentiation 6.374521098 0.000000 -7.927192
GO:0006639 acylglycerol metabolic process 6.028616852 0.000000 0.000000
GO:0016125 sterol metabolic process 5.760678325 8.262778 0.000000
GO:0016126 sterol biosynthetic process -6.237114754 9.622373 0.000000
> annotation_row <- data.frame(Group = data[,1])
> rownames(annotation_row) = paste("Group", 1:38, sep = "")
> ann_colors = list( Group = c(Metabolism="navy", Cellular="skyblue", Signal="steelblue", Transport="green", Cell="purple", Protein="yellow", Other="firebrick") )
> head(annotation_row)
Group
Group1 Metabolism
Group2 Metabolism
Group3 Metabolism
Group4 Metabolism
Group5 Metabolism
Group6 Metabolism
> col_breaks = unique(c(seq(-16,-0.5,length=200), seq(-0.5,0.5,length=200), seq(0.5,20,length=200)))
> my_palette <- colorRampPalette(c("blue", "white", "red"))(n = 599)
> pheatmap(heatdata, main="Enrichment", color=my_palette, breaks=col_breaks, border_color = "grey20", cellwidth = 15, cellheight = 12, scale = "none", annotation_row = annotation_row, annotation_colors = ann_colors, cluster_rows = F, cluster_cols=F, fontsize_row=10, filename="heatmap_BP_test.pdf")
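Whether zeros really come out white can be checked with a little arithmetic. The Python sketch below is not pheatmap; it re-creates col_breaks and my_palette under these assumptions: R's seq(a, b, length = n) ends exactly at b, unique() drops the repeated -0.5 and 0.5, colorRampPalette interpolates linearly in RGB, and each value gets the color of the break interval it falls in, using right-closed intervals as R's cut() does. r_seq and color_ramp are hypothetical helpers written only for this illustration.

```python
from bisect import bisect_left


def r_seq(start, stop, n):
    """Hypothetical helper mimicking R's seq(start, stop, length.out = n): last value is exactly stop."""
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n - 1)] + [stop]


def color_ramp(anchors, n):
    """Hypothetical helper mimicking colorRampPalette(anchors)(n) with linear RGB interpolation."""
    segs = len(anchors) - 1
    out = []
    for i in range(n):
        pos = i / (n - 1) * segs        # position along the whole ramp, 0..segs
        s = min(int(pos), segs - 1)     # which pair of anchor colors we are between
        t = pos - s                     # fraction of the way from anchors[s] to anchors[s + 1]
        rgb = [int(a + (b - a) * t + 0.5) for a, b in zip(anchors[s], anchors[s + 1])]
        out.append("#{:02X}{:02X}{:02X}".format(*rgb))
    return out


# col_breaks from the question: unique() keeps first occurrences, dropping the repeated -0.5 and 0.5
col_breaks = list(dict.fromkeys(
    r_seq(-16, -0.5, 200) + r_seq(-0.5, 0.5, 200) + r_seq(0.5, 20, 200)))
# my_palette from the question: blue -> white -> red in 599 steps
my_palette = color_ramp([(0, 0, 255), (255, 255, 255), (255, 0, 0)], 599)

print(len(col_breaks) - 1, "intervals for", len(my_palette), "colors")
# prints: 597 intervals for 599 colors

# Right-closed intervals (b[i], b[i+1]]: bisect_left returns the 1-based interval number
zero_interval = max(bisect_left(col_breaks, 0.0), 1)
zero_color = my_palette[zero_interval - 1]
print("0 falls in interval", zero_interval, "->", zero_color)
# prints: 0 falls in interval 299 -> #FEFEFF
```

Under those assumptions there are 597 intervals for 599 colors (pheatmap's documentation asks for breaks to be one element longer than color), and 0 lands in an interval whose color is #FEFEFF: almost, but not exactly, white. The exact middle of the palette (#FFFFFF) ends up on the interval just above zero. Note also that a single blue-white-red ramp gives the -0.5 to 0.5 band a gradient rather than solid white.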


How to use HuggingFace nlp library's GLUE for CoLA

I've been trying to use the HuggingFace nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence, but I'm getting an error and am stuck.
What I've tried so far:
reference and prediction are two text sentences:
!pip install transformers
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
reference="Security has been beefed across the country as a 2 day nation wide curfew came into effect."
prediction="Security has been tightened across the country as a 2-day nationwide curfew came into effect."
import nlp
glue_metric = nlp.load_metric('glue',name="cola")
#Using BertTokenizer
encoded_reference=tokenizer.encode(reference, add_special_tokens=False)
encoded_prediction=tokenizer.encode(prediction, add_special_tokens=False)
glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
Error I'm getting:
ValueError Traceback (most recent call last)
<ipython-input-9-4c3a3ce7b583> in <module>()
----> 1 glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
6 frames
/usr/local/lib/python3.6/dist-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
198 predictions = self.data["predictions"]
199 references = self.data["references"]
--> 200 output = self._compute(predictions=predictions, references=references, **metrics_kwargs)
201 return output
202
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in _compute(self, predictions, references)
101 return pearson_and_spearman(predictions, references)
102 elif self.config_name in ["mrpc", "qqp"]:
--> 103 return acc_and_f1(predictions, references)
104 elif self.config_name in ["sst2", "mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]:
105 return {"accuracy": simple_accuracy(predictions, references)}
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in acc_and_f1(preds, labels)
60 def acc_and_f1(preds, labels):
61 acc = simple_accuracy(preds, labels)
---> 62 f1 = f1_score(y_true=labels, y_pred=preds)
63 return {
64 "accuracy": acc,
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
1097 pos_label=pos_label, average=average,
1098 sample_weight=sample_weight,
-> 1099 zero_division=zero_division)
1100
1101
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight, zero_division)
1224 warn_for=('f-score',),
1225 sample_weight=sample_weight,
-> 1226 zero_division=zero_division)
1227 return f
1228
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
1482 raise ValueError("beta should be >=0 in the F-beta score")
1483 labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1484 pos_label)
1485
1486 # Calculate tp_sum, pred_sum, true_sum ###
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1314 raise ValueError("Target is %s but average='binary'. Please "
1315 "choose another average setting, one of %r."
-> 1316 % (y_type, average_options))
1317 elif pos_label not in (None, 1):
1318 warnings.warn("Note that pos_label (set to %r) is ignored when "
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
However, I'm able to get results (pearson and spearmanr) for 'stsb' with the same approach as above.
Any help or a workaround for CoLA would be really appreciated. Thank you.
In general, if you are seeing this error with HuggingFace, you are trying to use the F-score as a metric on a classification problem with more than two classes. Pick a different metric, such as "accuracy".
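The error message itself names the other way out: F1 over more than two classes needs an average other than 'binary', for example 'macro'. To show what that means, here is a pure-Python sketch (not sklearn's implementation) of plain accuracy and of macro-averaged F1, where per-class F1 scores are computed and then averaged without weights. The function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / len(scores)


y_true = [0, 1, 2, 2]
y_pred = [0, 2, 2, 1]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

With three classes, 'binary' F1 is undefined, which is what sklearn's check rejects; accuracy or a macro average both work on any number of classes.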
For this specific question:
Despite what you entered, it is computing the F-score: the traceback goes through the "mrpc"/"qqp" branch of _compute, which calls acc_and_f1. Following the example notebook, you should set the metric name as:
metric_name = "pearson" if task == "stsb" else "matthews_correlation" if task == "cola" else "accuracy"
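CoLA is scored with the Matthews correlation coefficient on binary acceptability labels (1 = acceptable, 0 = unacceptable), which is why the line above picks "matthews_correlation" for "cola". The metric compares predicted labels with gold labels, not tokenizer output: passing tokenizer.encode(...) IDs gives sklearn many distinct integers, which is presumably why it reports a multiclass target. A pure-Python sketch of the coefficient (sklearn.metrics.matthews_corrcoef is the library version, and as far as I know the GLUE metric relies on it):

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) for 0/1 labels; 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Gold acceptability labels vs. a classifier's predicted labels (made-up example values)
references = [1, 1, 0, 0, 1]
predictions = [1, 0, 0, 0, 1]
print(matthews_corrcoef(references, predictions))
```

Here TP = 2, TN = 2, FP = 0 and FN = 1, so the score is 4 / 6. With the nlp metric, the equivalent would be passing lists like these as predictions= and references= to compute(), where the predictions come from a model that classifies each sentence.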

What type of model can I use to train on this data?

I have downloaded and labeled data from
http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring
My task is to gain insight into the data from what is given. I have around 34 attributes in a data frame (all clean, no NaN values)
and want to train a model that predicts one target attribute, 'heart_rate', given the rest of the attributes (all are numbers recorded while a participant performs various activities).
I wanted to use a linear regression model but cannot use my DataFrame for some reason. However, I do not mind starting from scratch if you think I am doing it wrong.
My DataFrame columns:
Index(['timestamp', 'activity_ID', 'heart_rate', 'IMU_hand_temp',
'hand_acceleration_16_1', 'hand_acceleration_16_2',
'hand_acceleration_16_3', 'hand_gyroscope_rad_7',
'hand_gyroscope_rad_8', 'hand_gyroscope_rad_9',
'hand_magnetometer_μT_10', 'hand_magnetometer_μT_11',
'hand_magnetometer_μT_12', 'IMU_chest_temp', 'chest_acceleration_16_1',
'chest_acceleration_16_2', 'chest_acceleration_16_3',
'chest_gyroscope_rad_7', 'chest_gyroscope_rad_8',
'chest_gyroscope_rad_9', 'chest_magnetometer_μT_10',
'chest_magnetometer_μT_11', 'chest_magnetometer_μT_12',
'IMU_ankle_temp', 'ankle_acceleration_16_1', 'ankle_acceleration_16_2',
'ankle_acceleration_16_3', 'ankle_gyroscope_rad_7',
'ankle_gyroscope_rad_8', 'ankle_gyroscope_rad_9',
'ankle_magnetometer_μT_10', 'ankle_magnetometer_μT_11',
'ankle_magnetometer_μT_12', 'Intensity'],
dtype='object')
First 5 rows:
timestamp activity_ID heart_rate IMU_hand_temp hand_acceleration_16_1 hand_acceleration_16_2 hand_acceleration_16_3 hand_gyroscope_rad_7 hand_gyroscope_rad_8 hand_gyroscope_rad_9 ... ankle_acceleration_16_1 ankle_acceleration_16_2 ankle_acceleration_16_3 ankle_gyroscope_rad_7 ankle_gyroscope_rad_8 ankle_gyroscope_rad_9 ankle_magnetometer_μT_10 ankle_magnetometer_μT_11 ankle_magnetometer_μT_12 Intensity
2928 37.66 lying 100.0 30.375 2.21530 8.27915 5.58753 -0.004750 0.037579 -0.011145 ... 9.73855 -1.84761 0.095156 0.002908 -0.027714 0.001752 -61.1081 -36.8636 -58.3696 low
2929 37.67 lying 100.0 30.375 2.29196 7.67288 5.74467 -0.171710 0.025479 -0.009538 ... 9.69762 -1.88438 -0.020804 0.020882 0.000945 0.006007 -60.8916 -36.3197 -58.3656 low
2930 37.68 lying 100.0 30.375 2.29090 7.14240 5.82342 -0.238241 0.011214 0.000831 ... 9.69633 -1.92203 -0.059173 -0.035392 -0.052422 -0.004882 -60.3407 -35.7842 -58.6119 low
2931 37.69 lying 100.0 30.375 2.21800 7.14365 5.89930 -0.192912 0.019053 0.013374 ... 9.66370 -1.84714 0.094385 -0.032514 -0.018844 0.026950 -60.7646 -37.1028 -57.8799 low
2932 37.70 lying 100.0 30.375 2.30106 7.25857 6.09259 -0.069961 -0.018328 0.004582 ... 9.77578 -1.88582 0.095775 0.001351 -0.048878 -0.006328 -60.2040 -37.1225 -57.8847 low
If you check the timestamp attribute, you will see that consecutive samples are only milliseconds apart, so it might be a good idea to use the data from this DataFrame in windows of 2-5 seconds and train the model on that.
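A minimal sketch of the windowing idea, assuming each row is a dict of column values and that timestamp is in seconds (the sample rows step by 0.01). Rows are bucketed into fixed, non-overlapping windows and every numeric column is averaged per window; text columns such as activity_ID and Intensity are dropped. window_means is a hypothetical helper, not a pandas function.

```python
from collections import defaultdict
from statistics import mean


def window_means(rows, window_s=2.0, time_key="timestamp"):
    """Average every numeric column over non-overlapping windows of window_s seconds."""
    buckets = defaultdict(list)
    for row in rows:
        # Window number: 0 for [0, window_s), 1 for [window_s, 2 * window_s), ...
        buckets[int(row[time_key] // window_s)].append(row)
    out = []
    for key in sorted(buckets):
        group = buckets[key]
        numeric = [k for k, v in group[0].items() if isinstance(v, (int, float))]
        out.append({k: mean(r[k] for r in group) for k in numeric})
    return out


demo = [
    {"timestamp": 37.66, "activity_ID": "lying", "heart_rate": 100.0},
    {"timestamp": 37.67, "activity_ID": "lying", "heart_rate": 100.0},
    {"timestamp": 39.10, "activity_ID": "lying", "heart_rate": 102.0},
]
for window in window_means(demo, window_s=2.0):
    print(window)
```

In pandas, something like df.groupby(df["timestamp"] // 2).mean(numeric_only=True) should give a comparable per-window summary.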
Also, as an option, I want to use one of these models for this task: linear, polynomial, or multiple linear regression, agglomerative clustering, or k-means clustering.
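Of the models listed, the regression ones predict heart_rate from the other columns, while agglomerative and k-means clustering are unsupervised and do not use a target at all. For the simplest regression case, one feature, least squares has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. A pure-Python sketch (fit_line is a hypothetical helper; sklearn's LinearRegression generalizes this to many features):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # prints: 2.0 1.0
```

Polynomial regression is the same least-squares idea applied to extra columns such as x², x³.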
My code:
target = subject1.DataFrame(data.target, columns=["heart_rate"])
X = df
y = target["heart_rate"]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
predictions = lm.predict(X)
print(predictions)[0:5]
Error:
AttributeError Traceback (most recent call last)
<ipython-input-93-b0c3faad3a98> in <module>()
3 #heart_rate
4 # Put the target (housing value -- MEDV) in another DataFrame
----> 5 target = subject1.DataFrame(data.target, columns=["heart_rate"])
c:\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'DataFrame'
To fix the error, I have tried:
subject1.columns = subject1.columns.str.strip()
but it still did not work.
Thank you, and sorry if I was not precise enough.
Try this:
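from scipy.stats import zscore
from sklearn import linear_model
from sklearn.model_selection import train_test_split
# Features: everything except the target and the text columns
# ('activity_ID' and 'Intensity' hold strings such as "lying" and "low")
X = df.drop(["heart_rate", "activity_ID", "Intensity"], axis=1)
y = df["heart_rate"]
# Standardize each feature column (z-scores)
X = X.apply(zscore)
test_size = 0.30
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
lm = linear_model.LinearRegression()
# Fit on the training split only, then predict the held-out split
model = lm.fit(X_train, y_train)
predictions = lm.predict(X_test)
print(predictions[0:5])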

Write directly to MySQL DB from zipped file

I have a series of large zipped files that I have been unzipping to load directly into a MySQL database for querying from R.
I'll illustrate with this example (on x86_64 GNU/Linux):
> write.csv(iris, file = "iris.csv", row.names = FALSE, quote = FALSE)
> system("gzip iris.csv")
> list.files(pattern = "iris")
[1] "iris.csv" "iris.csv.gz"
I currently load the unzipped file in the following way:
> library(RSQLite)
> con <- dbConnect(RSQLite::SQLite(), dbname = "test_db")
> dbWriteTable(con, name = "iris", value = "iris.csv", field.types = list(Sepal.Length = "decimal(6, 2)", Sepal.Width = "decimal(6, 2)", Petal.Length = "decimal(6, 2)", Petal.Width = "decimal(6, 2)", Species = "varchar(15)"), row.names = FALSE)
[1] TRUE
I am wondering whether it is possible to write the table directly to the DB from the zipped file iris.csv.gz.
EDIT:
I am aware of gzfile, but to my understanding using it would require bringing the file into memory before writing to the MySQL DB, which is something I want to avoid (correct me if I'm misunderstanding).
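The question is about R, but the streaming idea can be shown in a few lines of Python: decompress while reading, and insert rows in batches, so at most batch_size rows are held in memory at once. This sketch uses the standard-library gzip, csv and sqlite3 modules and an iris-like demo file; load_gz_csv and the table layout are hypothetical. For MySQL, the common Python drivers use %s placeholders instead of ?.

```python
import csv
import gzip
import os
import sqlite3
import tempfile


def load_gz_csv(con, table, path, batch_size=1000):
    """Stream a gzipped CSV (header row first) into an existing table, batch_size rows at a time."""
    with gzip.open(path, "rt", newline="") as fh:  # decompresses lazily while reading
        reader = csv.reader(fh)
        header = next(reader)
        # The table name is interpolated, so it must come from trusted code, not user input
        sql = "INSERT INTO {} VALUES ({})".format(table, ", ".join("?" * len(header)))
        batch, total = [], 0
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                con.executemany(sql, batch)
                total += len(batch)
                batch = []
        if batch:
            con.executemany(sql, batch)
            total += len(batch)
    con.commit()
    return total


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE iris (Sepal_Length REAL, Sepal_Width REAL, "
            "Petal_Length REAL, Petal_Width REAL, Species TEXT)")
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "iris.csv.gz")
    with gzip.open(path, "wt", newline="") as fh:  # tiny demo file, not the real iris data
        fh.write("Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species\n")
        fh.write("5.1,3.5,1.4,0.2,setosa\n7.0,3.2,4.7,1.4,versicolor\n6.3,3.3,6.0,2.5,virginica\n")
    n_loaded = load_gz_csv(con, "iris", path, batch_size=2)
print(n_loaded, "rows loaded")
```

The REAL column types let SQLite convert the CSV strings to numbers on insert.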

Importing a *.dae file with multiple textures into Papervision3D

Usually, when you export a 3D object in *.dae format, a folder comes with the file that contains the object's textures. Does anybody know how to add a *.dae file and its textures to a project?
You should place the textures into the folder with the *.dae file and load your object and textures like this:
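import org.papervision3d.materials.BitmapFileMaterial;
import org.papervision3d.materials.utils.MaterialsList;
import org.papervision3d.objects.parsers.DAE;
// One BitmapFileMaterial per texture file; true enables precise texture mapping
var bm2:BitmapFileMaterial = new BitmapFileMaterial('PATH_TO_TEXTURE', true);
var bm3:BitmapFileMaterial = new BitmapFileMaterial('PATH_TO_ANOTHER_TEXTURE', true);
// Register each texture material under the material name used in the .dae
var mat:MaterialsList = new MaterialsList();
mat.addMaterial(bm2, 'MATERIAL_NAME');
mat.addMaterial(bm3, 'ANOTHER_MATERIAL_NAME');
var obj:DAE = new DAE();
obj.useOwnContainer = true;
obj.load('PATH_TO_DAE', mat);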
Also, the materials must be correctly linked in the *.dae file, something like this:
...
<library_images>
<image id="TEXTURE_NAME-image" name="TEXTURE_NAME">
<init_from>2/TEXTURE_NAME.png</init_from>
</image>
</library_images>
<library_materials>
<material id="TEXTURE_NAME" name="TEXTURE_NAME">
<instance_effect url="#TEXTURE_NAME-fx"/>
</material>
</library_materials>
...
<library_visual_scenes>
<visual_scene id="RootNode" name="RootNode">
<node id="TEXTURE_NAME_tp3_Mesh01" name="TEXTURE_NAME_tp3_Mesh01">
<matrix sid="matrix">1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 -1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000</matrix>
<instance_geometry url="#TEXTURE_NAME_tp3_Mesh01-lib">
<bind_material>
<technique_common>
<instance_material symbol="MATERIAL_NAME" target="#MATERIAL_NAME"/>
</technique_common>
</bind_material>
</instance_geometry>
</node>
</visual_scene>
</library_visual_scenes>
...
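To check that linking before loading a file into Papervision3D, a small Python sketch can list the texture paths a COLLADA 1.4 file references (the files that must sit next to the *.dae) and report any instance_material whose target does not point at a material id. check_dae and the "wood" names in the sample are hypothetical; the sketch assumes the image/init_from layout shown in the snippet above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the namespace ElementTree prepends, e.g. '{http://www.collada.org/...}image' -> 'image'."""
    return tag.rsplit("}", 1)[-1]


def check_dae(xml_text):
    """Return (texture paths, material ids, instance_material targets with no matching material id)."""
    root = ET.fromstring(xml_text)
    textures, material_ids, targets = [], set(), []
    for el in root.iter():
        name = local_name(el.tag)
        if name == "image":
            # Texture file paths live in <image><init_from>...</init_from></image>
            for child in el:
                text = (child.text or "").strip()
                if local_name(child.tag) == "init_from" and text:
                    textures.append(text)
        elif name == "material":
            material_ids.add(el.get("id"))
        elif name == "instance_material":
            # target is a URI fragment such as "#MATERIAL_NAME"
            targets.append(el.get("target", "").lstrip("#"))
    missing = [t for t in targets if t not in material_ids]
    return textures, material_ids, missing


SAMPLE = """<COLLADA xmlns="http://www.collada.org/2005/11/COLLADASchema" version="1.4.1">
  <library_images>
    <image id="wood-image" name="wood"><init_from>2/wood.png</init_from></image>
  </library_images>
  <library_materials>
    <material id="wood" name="wood"><instance_effect url="#wood-fx"/></material>
  </library_materials>
  <library_visual_scenes>
    <visual_scene id="RootNode" name="RootNode">
      <node id="mesh01" name="mesh01">
        <instance_geometry url="#mesh01-lib">
          <bind_material><technique_common>
            <instance_material symbol="wood" target="#wood"/>
          </technique_common></bind_material>
        </instance_geometry>
      </node>
    </visual_scene>
  </library_visual_scenes>
</COLLADA>"""

textures, material_ids, missing = check_dae(SAMPLE)
print("textures to place next to the .dae:", textures)
print("unlinked instance_material targets:", missing)
```

In COLLADA, instance_material's target is a reference to a material element, so an entry in missing means the geometry is bound to a material the file never declares.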

How do multi-texture OBJ->JSON converted files keep track of the face-texture mapping?

I'm trying to manually load a JSON 3D model into my WebGL code (without libraries such as Three.js), just for fun, but I'm having a hard time when my models have more than one texture.
In an OBJ->JSON converted file, how do I know which texture is "active" for the faces that follow? OBJ files use the 'usemtl' tag to identify the texture/material in use, but I can't seem to find that kind of pointer when working with JSON files.
By the way, I'm using the OBJ->JSON converter written by alteredq.
Thanks a bunch,
Rod
Take a look at this file: three.js/src/extras/loaders/JSONLoader.js.
The first element of each face in the faces array of the JSON file is a bit field. The first bit (bit 0) says whether that face has three or four indices, and the second bit (bit 1) says whether that face has a material assigned. The material index, if any, appears after the vertex indices.
Example: faces: [2, 46, 44, 42, 0, 1, 45, 46, 48, 3, ...
First face (triangle with material):
Type: 2 (00000010b)
Indices: 46, 44, 42
Material index: 0
Second face (quad without material):
Type: 1 (00000001b)
Indices: 45, 46, 48, 3 (a quad takes four indices)
The third face starts with the next value in the array, which is elided above.
Check the source code for the full meaning of that bit field.
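A minimal Python sketch of that decoding, assuming only bit 0 (quad) and bit 1 (material) are set. As far as I recall from JSONLoader.js, the higher bits add per-face UVs, normals and colors; this sketch does not parse those and raises instead. decode_faces is a hypothetical helper, and under this reading a quad always consumes four indices.

```python
def decode_faces(faces):
    """Decode a three.js JSON faces array that only uses the quad (bit 0) and material (bit 1) flags."""
    out, i = [], 0
    while i < len(faces):
        ftype = faces[i]
        i += 1
        if ftype & ~0b11:
            raise ValueError(f"face type {ftype} uses bits this sketch does not parse")
        n = 4 if ftype & 1 else 3      # bit 0: quad (4 indices) vs. triangle (3)
        indices = faces[i:i + n]
        i += n
        material = None
        if ftype & 2:                  # bit 1: a material index follows the vertex indices
            material = faces[i]
            i += 1
        out.append({"quad": bool(ftype & 1), "indices": indices, "material": material})
    return out


for face in decode_faces([2, 46, 44, 42, 0, 1, 45, 46, 48, 3]):
    print(face)
```

To draw per texture, group the decoded faces by their material index and issue one draw call per group.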
In the OBJ->JSON converter I have written for the KickJS game engine, each material has its own range of indices.
This means a simple OBJ model such as
mtllib plane.mtl
o Plane
v 1.000000 0.000000 -1.000000
v 1.000000 0.000000 1.000000
v -1.000000 0.000000 1.000000
v -1.000000 0.000000 -1.000000
usemtl Material
s 1
f 2 3 4
usemtl Material.001
f 1 2 4
would be translated into this (with two index arrays, one for each material):
[
{
"vertex": [1,0,1,-1,0,1,-1,0,-1,1,0,-1],
"name": "Plane mesh",
"normal": [0,-1,0,0,-1,0,0,-1,0,0,0,0],
"indices0": [0,1,2],
"indices1": [3,0,2]
}
]
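The grouping step behind those per-material index arrays can be sketched in Python: every usemtl starts a new group, and each following f line appends its 0-based vertex indices to that group. split_by_material is a hypothetical helper, not the KickJS converter; it assumes triangle faces and positive indices and keeps OBJ vertex order.

```python
def split_by_material(obj_text):
    """Return (vertices, {material name: flat list of 0-based vertex indices}) for an OBJ string."""
    vertices, groups, current = [], {}, None
    for line in obj_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "usemtl":
            current = parts[1]
            groups.setdefault(current, [])
        elif parts[0] == "f":
            # "f 2/1/1 ..." -> keep only the vertex index; OBJ is 1-based, WebGL index buffers are 0-based
            groups.setdefault(current, []).extend(int(p.split("/")[0]) - 1 for p in parts[1:])
    return vertices, groups


PLANE_OBJ = """mtllib plane.mtl
o Plane
v 1.000000 0.000000 -1.000000
v 1.000000 0.000000 1.000000
v -1.000000 0.000000 1.000000
v -1.000000 0.000000 -1.000000
usemtl Material
s 1
f 2 3 4
usemtl Material.001
f 1 2 4
"""

vertices, groups = split_by_material(PLANE_OBJ)
print(groups)  # prints: {'Material': [1, 2, 3], 'Material.001': [0, 1, 3]}
```

The numbers differ from "indices0": [0,1,2] and "indices1": [3,0,2] only because the KickJS output lists the vertices in a different order (OBJ vertex 2 first); each group can then be drawn with its own texture bound.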
Use the online model viewer for the conversion:
http://www.kickjs.org/example/model_viewer/model_viewer.html