Error: 'multiclass.roc.default' is not an exported object from 'namespace:pROC' - mlr

I'm using the mlr R package (v2.3) and trying to use multiclass AUC as one of the measures during cross-validation in a 3-class classification problem, but I'm getting the following error:
Error: 'multiclass.roc.default' is not an exported object from 'namespace:pROC'
I believe I have all the dependencies installed already (and don't believe that is the actual issue). The error is produced when trying to run the 'resample' function, as follows:
tsk <- makeClassifTask(data = dat, target = "best.model")
lrn <- makeLearner("classif.rpart", predict.type = "prob")
rdesc <- makeResampleDesc("CV", iters=10)
r <- resample(learner = lrn, task = tsk, resampling = rdesc, show.info = T, measures = list(acc, multiclass.auc))
Any suggestions?

This works fine with the latest version of mlr (2.4), so updating the package should resolve the error.

Related

Convert pb file to tflite file for running on Coral Dev Board (Segmentation fault (core dumped))

How can I convert a pb file into a tflite file, using python3 or in the terminal?
I don't know any of the details of the model. Here is the link to the pb file.
(Edited)
I have tried converting the pb file to a tflite file with the following code:
import tensorflow.compat.v1 as tf
import numpy as np

graph_def_file = "./models/20170512-110547.pb"

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]

converter = tf.lite.TFLiteConverter.from_frozen_graph(graph_def_file,
                                                      input_arrays=["input", "phase_train"],
                                                      output_arrays=["embeddings"],
                                                      input_shapes={"input": [1, 160, 160, 3], "phase_train": False})
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
print("converting")
open("./models/converted_model.tflite", "wb").write(tflite_model)
print("Done")
Error: Getting Segmentation fault (core dumped)
2020-01-20 11:42:18.153263: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-01-20 11:42:18.153363: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-01-20 11:42:18.153385: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-01-20 11:42:18.905028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-20 11:42:18.906845: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-01-20 11:42:18.906874: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (kalgudi-GA-78LMT-USB3-6-0): /proc/driver/nvidia/version does not exist
2020-01-20 11:42:18.934144: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3616020000 Hz
2020-01-20 11:42:18.934849: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x39aa0f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-20 11:42:18.934910: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Segmentation fault (core dumped)
Without any details of the model, you cannot convert it to a .tflite model. I suggest that you go through this document on Post Training Quantization again, as there are too many details that would be redundant to show here.
Here is an example of post-training quantization of a frozen graph. The model is taken from here (you can see that it's a full tarball of information about the model).
import sys, os, glob
import tensorflow as tf
import pathlib
import numpy as np

if len(sys.argv) != 2:
    print('Usage: <' + sys.argv[0] + '> <frozen_graph_file>')
    exit()

tf.compat.v1.enable_eager_execution()
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)

def fake_representative_data_gen():
    # Random images stand in for a real calibration dataset.
    for _ in range(100):
        fake_image = np.random.random((1, 192, 192, 3)).astype(np.float32)
        yield [fake_image]

frozen_graph = sys.argv[1]
input_array = ['input']
output_array = ['MobilenetV1/Predictions/Reshape_1']

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(frozen_graph, input_array, output_array)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = fake_representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()

quant_dir = pathlib.Path(os.getcwd(), 'output')
quant_dir.mkdir(exist_ok=True, parents=True)
tflite_model_file = quant_dir / 'mobilenet_v1_0.25_192_quant.tflite'
tflite_model_file.write_bytes(tflite_model)
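Once the conversion succeeds, you can sanity-check the quantized model with the TFLite interpreter and a dummy inference. A minimal sketch, assuming the output path produced by the script above:

import numpy as np
import tensorflow as tf

# Load the quantized model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="output/mobilenet_v1_0.25_192_quant.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one random uint8 image of the expected shape and run inference.
fake_image = np.random.randint(0, 256, size=input_details[0]['shape'], dtype=np.uint8)
interpreter.set_tensor(input_details[0]['index'], fake_image)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)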

How to predict Sentiments after training and testing the model by using NLTK NaiveBayesClassifier in Python?

I am doing sentiment classification using the NLTK NaiveBayesClassifier. I trained and tested the model with labeled data. Now I want to predict the sentiments of data that is not labeled, but I run into an error.
The line that is giving the error is:
score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))
The error is:
ValueError: not enough values to unpack (expected 2, got 1)
Below is the code:
import random
import pandas as pd

data = pd.read_csv("label data for testing .csv", header=0)
sentiment_data = list(zip(data['Articles'], data['Sentiment']))
random.shuffle(sentiment_data)
new_data = pd.read_csv("Japan Data.csv", header=0)
train_x, train_y = zip(*sentiment_data[:350])
test_x, test_y = zip(*sentiment_data[350:])

from unidecode import unidecode
from nltk import word_tokenize
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import extract_unigram_feats

TRAINING_COUNT = 350

def clean_text(text):
    text = text.replace("<br />", " ")
    return text

analyzer = SentimentAnalyzer()
vocabulary = analyzer.all_words([word_tokenize(unidecode(clean_text(instance)))
                                 for instance in train_x[:TRAINING_COUNT]])
print("Vocabulary: ", len(vocabulary))
print("Computing Unigram Features ...")
unigram_features = analyzer.unigram_word_feats(vocabulary, min_freq=10)
print("Unigram Features: ", len(unigram_features))
analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_features)

# Build the training set
_train_X = analyzer.apply_features([word_tokenize(unidecode(clean_text(instance)))
                                    for instance in train_x[:TRAINING_COUNT]], labeled=False)
# Build the test set
_test_X = analyzer.apply_features([word_tokenize(unidecode(clean_text(instance)))
                                   for instance in test_x], labeled=False)

trainer = NaiveBayesClassifier.train
classifier = analyzer.train(trainer, zip(_train_X, train_y[:TRAINING_COUNT]))
score = analyzer.evaluate(list(zip(_test_X, test_y)))
print("Accuracy: ", score['Accuracy'])
score_1 = analyzer.evaluate(list(zip(new_data['Articles'])))
print(score_1)
I understand that the problem arises because I have to pass two values per item in the line that gives the error, but I don't know how to do this.
Thanks in advance.
Documentation and example
The line that gives you the error calls the method SentimentAnalyzer.evaluate(...).
This method does the following.
Evaluate and print classifier performance on the test set.
See SentimentAnalyzer.evaluate.
The method has one mandatory parameter: test_set.
test_set – A list of (tokens, label) tuples to use as gold set.
In the example at http://www.nltk.org/howto/sentiment.html, test_set has the following structure:
[({'contains(,)': False, 'contains(.)': True, 'contains(and)': False, 'contains(the)': True}, 'subj'), ({'contains(,)': True, 'contains(.)': True, 'contains(and)': False, 'contains(the)': True}, 'subj'), ...]
Here is a symbolic representation of the structure.
[(dictionary,label), ... , (dictionary,label)]
Error in your code
You are passing
list(zip(new_data['Articles']))
to SentimentAnalyzer.evaluate. I assume you're getting the error because
list(zip(new_data['Articles']))
does not create a list of (tokens, label) tuples. You can check this by assigning the list to a variable and printing it, or by inspecting the variable's value while debugging.
E.g.:
test_set = list(zip(new_data['Articles']))
print("begin test_set")
print(test_set)
print("end test_set")
You are already calling evaluate correctly a few lines above the one that gives the error:
score = analyzer.evaluate(list(zip(_test_X, test_y)))
I guess you want to call SentimentAnalyzer.classify(instance) to predict unlabeled data. See SentimentAnalyzer.classify.
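For example, a minimal sketch (assuming the analyzer, clean_text, and new_data from the question; SentimentAnalyzer.classify applies the stored feature extractors itself, so it takes a tokenized document):

# Predict a sentiment label for each unlabeled article.
predictions = [analyzer.classify(word_tokenize(unidecode(clean_text(article))))
               for article in new_data['Articles']]
print(predictions[:5])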

Import chosen columns of a JSON file in R

I can't find any solution for loading a huge JSON file. I'm trying with the well-known Yelp dataset. It's 3.2 GB, and I want to analyse 9 out of its 10 columns. I need to skip importing the $text column, which should give me a much lighter file to load, probably about 70% smaller. I don't want to manipulate the file.
I tried many libraries and got stuck. I've found a solution for data.frame that applies a pipe function:
df <- read.table(pipe("cut -f1,5,28 myFile.txt"))
from this thread: Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?)
How do I do this for JSON? I'd like to do:
json <- read.table(pipe("cut -text yelp_academic_dataset_review.json"))
but this of course throws an error due to the wrong format. Is there any possibility of doing this without parsing the whole file with regex?
EDIT
Structure of one row (I can't even count them all):
{"review_id":"vT2PALXWX794iUOoSnNXSA","user_id":"uKGWRd4fONB1cXXpU73urg","business_id":"D7FK-xpG4LFIxpMauvUStQ","stars":1,"date":"2016-10-31","text":"some long text here","useful":0,"funny":0,"cool":0,"type":"review"}
SECOND EDIT
Finally, I've created a loop that converts all the data into a csv file, omitting the unwanted column. It's slow, but I've got 150 MB (zipped) from 3.2 GB.
# files to process
filepath <- jfile1
fcsv <- "c:/Users/cp/Documents/R DATA/Yelp/tests/reviews.csv"
write.table(x = paste(colns, collapse=","), file = fcsv, quote = F, row.names = F, col.names = F)
con = file(filepath, "r")
while ( TRUE ) {
  line = readLines(con, n = 1)
  if ( length(line) == 0 ) {
    break
  }
  # regex process
  d <- NULL
  for (i in rcols) {
    pst <- paste(".*\"", colns[i], "\":*(.*?) *,\"", colns[i+1], "\":.*", sep="")
    w <- sub(pst, "\\1", line)
    d <- cbind(d, noquote(w))
  }
  # save on the fly
  write.table(x = paste(d, collapse = ","), file = fcsv, append = T, quote = F, row.names = F, col.names = F)
}
close(con)
It can be saved to JSON as well. I wonder if it's the most efficient way, but the other scripts I tested were slow and often had some encoding issues.
Try this:
library(jsonlite)
df <- as.data.frame(fromJSON('yelp_academic_dataset_review.json', flatten=TRUE))
Then, once it is a data frame, delete the column(s) you don't need.
If you don't want to manipulate the file in advance of importing it, I'm not sure what options you have in R. Alternatively, you could make a copy of the file, delete the text column with this script, import the copy into R, and then delete the copy.

FiPy Simple Convection

I am trying to understand how FiPy works by working through an example; in particular, I would like to solve the following simple convection equation with a periodic boundary:
$$\partial_t u + \partial_x u = 0$$
If the initial data is given by $u(x, 0) = F(x)$, then the analytical solution is $u(x, t) = F(x - t)$ (substituting gives $\partial_t u = -F'(x - t)$ and $\partial_x u = F'(x - t)$, which sum to zero). I do get a solution, but it is not correct.
What am I missing? Is there a better resource for understanding FiPy than the documentation? It is very sparse...
Here is my attempt:
from fipy import *
import numpy as np

# Generate mesh
nx = 20
dx = 2*np.pi/nx
mesh = PeriodicGrid1D(nx=nx, dx=dx)

# Generate solution object with initial discontinuity
phi = CellVariable(name="solution variable", mesh=mesh)
phiAnalytical = CellVariable(name="analytical value", mesh=mesh)
phi.setValue(1.)
phi.setValue(0., where=mesh.x > 1.)

# Define the pde
D = [[-1.]]
eq = TransientTerm() == ConvectionTerm(coeff=D)

# Set discretization so analytical solution is exactly one cell translation
dt = 0.01*dx
steps = 2*int(dx/dt)

# Set the analytical value at the end of simulation
phiAnalytical.setValue(np.roll(phi.value, 1))

for step in range(steps):
    eq.solve(var=phi, dt=dt)

print(phi.allclose(phiAnalytical, atol=1e-1))
As addressed on the FiPy mailing list, FiPy is not great at handling convection-only (pure hyperbolic, diffusion-free) PDEs, as it's missing higher-order convection schemes. It is better to use CLAWPACK for this class of problem.
FiPy does have one second-order scheme that might help with this problem, the VanLeerConvectionTerm; see an example.
If the VanLeerConvectionTerm is used in the above problem, it does do a better job of preserving the shock.
import numpy as np
import fipy

# Generate mesh
nx = 20
dx = 2*np.pi/nx
mesh = fipy.PeriodicGrid1D(nx=nx, dx=dx)

# Generate solution object with initial discontinuity
phi = fipy.CellVariable(name="solution variable", mesh=mesh)
phiAnalytical = fipy.CellVariable(name="analytical value", mesh=mesh)
phi.setValue(1.)
phi.setValue(0., where=mesh.x > 1.)

# Define the pde
D = [[-1.]]
eq = fipy.TransientTerm() == fipy.VanLeerConvectionTerm(coeff=D)

# Set discretization so analytical solution is exactly one cell translation
dt = 0.01*dx
steps = 2*int(dx/dt)

# Set the analytical value at the end of simulation
phiAnalytical.setValue(np.roll(phi.value, 1))

viewer = fipy.Viewer(phi)
for step in range(steps):
    eq.solve(var=phi, dt=dt)
    viewer.plot()

input('stopped')
print(phi.allclose(phiAnalytical, atol=1e-1))
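A side note on the time step (an observation about the snippet above, not a claim from the original answer): with unit convection speed, the Courant number for this discretization is

$$\frac{|c|\,\Delta t}{\Delta x} = 0.01,$$

since $\Delta t = 0.01\,\Delta x$ and $|c| = 1$, so the traveling discontinuity is generously resolved in time.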

Calculate inverse log-weighted similarity in a bimodal network, igraph in Python

I'm trying to calculate the Adamic-Adar similarity for a network which has two types of nodes. I'm only interested in calculating similarity between nodes that have outgoing connections. Nodes with incoming connections are a kind of connector, and I'm not interested in them.
Data size and characteristics:
> summary(g)
IGRAPH DNW- 3852 24478 --
+ attr: name (v/c), weight (e/n)
Prototype code in Python 2.7:
import glob
import os
import pandas as pd
from igraph import *

os.chdir("data/")
for file in glob.glob("*.graphml"):
    print(file)
    g = Graph.Read_GraphML(file)
    indegree = Graph.degree(g, mode="in")
    g['indegree'] = indegree
    dev = g.vs.select(indegree == 0)
    m = Graph.similarity_inverse_log_weighted(dev.subgraph())
    df = pd.melt(m)
    df.to_csv(file.split("_only.graphml")[0] + "_similarity.csv", sep=',')
There is something wrong with this code: dev has length 1 and m is 0.0, so it doesn't work as expected.
Hint
I have working code in R, but it seems I'm unable to rewrite it in Python (which I'm doing for the sake of performance; the networks are huge). Here it is:
# make sure g is your network
indegree <- degree(g, mode="in")
V(g)$indegree <- indegree
dev <- V(g)[indegree==0]
m <- similarity.invlogweighted(g, dev)
x.m <- melt(m)
colnames(x.m) <- c("dev1", "dev2", "value")
x.m <- x.m[x.m$value > 0, ]
write.csv(x.m, file = sub(".csv", "_similarity.csv", filename))
You are assigning the in-degrees as a graph attribute, not as a vertex attribute, so you cannot reasonably call g.vs.select() later on. You need this instead:
indegree = g.degree(mode="in")
g.vs["indegree"] = indegree
dev = g.vs.select(indegree=0)
But actually, you could simply write this:
dev = g.vs.select(_indegree=0)
This works because of how the select method works:
Attribute names inferred from keyword arguments are treated specially
if they start with an underscore (_). These are not real attributes
but refer to specific properties of the vertices, e.g., its degree.
The rule is as follows: if an attribute name starts with an underscore,
the rest of the name is interpreted as a method of the Graph object.
This method is called with the vertex sequence as its first argument
(all others left at default values) and vertices are filtered
according to the value returned by the method.
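Putting it together, the loop from the question might become the following sketch (not from the original answer; the vertices argument makes the call mirror similarity.invlogweighted(g, dev) from the R version, computing similarity on the full graph rather than on a subgraph; file handling is taken from the question):

import glob
import os
import pandas as pd
from igraph import Graph

os.chdir("data/")
for file in glob.glob("*.graphml"):
    g = Graph.Read_GraphML(file)
    # Vertices with no incoming edges; _indegree calls Graph.indegree() under the hood.
    dev = g.vs.select(_indegree=0)
    # Similarity of the selected vertices within the whole graph,
    # the counterpart of similarity.invlogweighted(g, dev) in R.
    m = g.similarity_inverse_log_weighted(vertices=[v.index for v in dev])
    df = pd.melt(pd.DataFrame(m))
    df.to_csv(file.split("_only.graphml")[0] + "_similarity.csv", sep=',')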