Problems writing an xarray Dataset to NETCDF4

I've been writing out ensemble means computed from CMIP6 data. While writing the first batch, it failed with an error I do not understand. Specifically, I'm doing the following:
ds=xr.open_mfdataset(fnames,concat_dim='ensemble',combine='nested',decode_times=True)
ds_emean=ds.mean(dim='ensemble') #creates ensemble mean
#write out ensemble mean
file_out=datapath+'ensemble/'+f+'_'+substr+'_ensemble.nc'
ds_emean.to_netcdf(path=file_out,mode='w',format='NETCDF4')
The ensemble mean it's stumbling on has 10 ensemble members:
<xarray.Dataset>
Dimensions: (time: 1032, ensemble: 10, bnds: 2, j: 300, i: 360, vertices: 4)
Coordinates:
time (time) datetime64[ns] 2015-01-16T12:00:00 ... 2100-12...
j (j) int32 0 1 2 3 4 5 6 ... 293 294 295 296 297 298 299
i (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359
latitude (j, i) float64 dask.array<chunksize=(300, 360), meta=np.ndarray>
longitude (j, i) float64 dask.array<chunksize=(300, 360), meta=np.ndarray>
Dimensions without coordinates: ensemble, bnds, vertices
Data variables:
time_bnds (ensemble, time, bnds) datetime64[ns] dask.array<chunksize=(1, 1032, 2), meta=np.ndarray>
vertices_latitude (ensemble, j, i, vertices) float64 dask.array<chunksize=(1, 300, 360, 4), meta=np.ndarray>
vertices_longitude (ensemble, j, i, vertices) float64 dask.array<chunksize=(1, 300, 360, 4), meta=np.ndarray>
sithick (ensemble, time, j, i) float32 dask.array<chunksize=(1, 1032, 300, 360), meta=np.ndarray>
Attributes: (12/47)
Conventions: CF-1.7 CMIP-6.2
activity_id: ScenarioMIP
branch_method: standard
branch_time_in_child: 60265.0
branch_time_in_parent: 60265.0
creation_date: 2020-09-08T10:22:01Z
...
variable_id: sithick
variant_label: r10i1p1f1
version: v20200908
license: CMIP6 model data produced by CSIRO is licensed un...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/4052b32c-80e4-468d-b940-52c98fedda73
This is the error I'm getting:
writing out /Volumes/LaCie/IPCC/CMIP6/ssp585/ensemble/sithick_ACCESS-ESM1_ensemble.nc
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 123145419157504:
#000: H5Dio.c line 199 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: H5Dio.c line 601 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#002: H5Dchunk.c line 2229 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#003: H5Dchunk.c line 3609 in H5D__chunk_lock(): data pipeline read failed
major: Dataset
minor: Filter operation failed
#004: H5Z.c line 1326 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#005: H5Zdeflate.c line 123 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 123145419157504:
#000: H5Dio.c line 199 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#001: H5Dio.c line 601 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#002: H5Dchunk.c line 2229 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#003: H5Dchunk.c line 3609 in H5D__chunk_lock(): data pipeline read failed
major: Dataset
minor: Filter operation failed
#004: H5Z.c line 1326 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#005: H5Zdeflate.c line 123 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
Traceback (most recent call last):
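The innermost frames of the HDF5 trace (H5Dread, then H5Z_filter_deflate: inflate() failed) point at reading a compressed chunk rather than at writing. That suggests one of the input files may be truncated or corrupted, with to_netcdf merely triggering the lazy read. A minimal sketch, assuming that diagnosis, for locating the offending file: load each input on its own and record which ones fail. The helper names find_unreadable and load_fully_with_xarray are hypothetical and not part of xarray.

```python
def find_unreadable(paths, load):
    """Return (path, error message) for every path whose load(path) raises."""
    failures = []
    for path in paths:
        try:
            load(path)
        except Exception as exc:  # HDF5 read failures usually surface as OSError/RuntimeError
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures


def load_fully_with_xarray(path):
    """Force every variable of one file into memory so bad chunks are hit."""
    import xarray as xr  # imported lazily; only needed for real NetCDF files

    with xr.open_dataset(path) as ds:
        ds.load()


# Self-contained demonstration with a stand-in loader (no NetCDF files needed).
# The file names and the simulated failure are made up for illustration.
def fake_load(path):
    if path == "member_03.nc":
        raise OSError("inflate() failed")


demo_failures = find_unreadable(
    ["member_01.nc", "member_02.nc", "member_03.nc"], fake_load
)
print(demo_failures)
```

With the real inputs this would be called as find_unreadable(fnames, load_fully_with_xarray). Re-downloading any file it reports is a reasonable next step.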

Coordinate "conversion of nan to int64" error

I am trying to correct the topography of some GPR data using GPR-O. The topographic data I have is in the form of coordinates and elevations for the four corners of each survey square. To try it out, I entered the coordinates manually instead of reading them from an xlsx file, using the following:
initialize
surveyparams.minline=01; % Lowest line number
surveyparams.nmorelines=4; % Would be 3 if you had 4 parallel lines with the numbers 0 1 2 3
surveyparams.lineincr=1.0; % Distance between the lines in meters
surveyparams.pnameraw='data/raw/Square_N_1/'; % Directory for the raw data
surveyparams.pnametrf='data/processed/Square_N_1/'; % Directory for the processed data
preprawdata(surveyparams,3);
data=readdata(surveyparams);
vel=0.13;
elevdat=[226746,134520,867;226749,134524,867;226740,134529,866;226738,134525,866]
xpos=[]
elev=makeElev(elevdat,xpos,data,surveyparams);
data=elevCorrect(data,elev,vel);
maxelev=max(elev);
plotGPRline(data,0,2,vel,maxelev)
However, I receive the following error after the matrix has been created:
error: conversion of nan to int64_t value failed
error: called from
elevCorrect at line 44 column 8
Square_N_1 at line 22 column 12

Julia CSV.read not recognizing "select" keyword

I am reading in a space-delimited file using the CSV library in Julia.
edgeList = CSV.read(
joinpath(dataDirectory, "out.file"),
types=[Int, Int],
header=["node1", "node2"],
skipto=3,
select=[1,2]
)
This yields the following error:
MethodError: no method matching CSV.File(::String; types=DataType[Int64, Int64], header=["node1", "node2"], skipto=3, select=[1, 2])
Closest candidates are:
CSV.File(::Any; header, normalizenames, datarow, skipto, footerskip, limit, transpose, comment, use_mmap, ignoreemptylines, missingstrings, missingstring, delim, ignorerepeated, quotechar, openquotechar, closequotechar, escapechar, dateformat, decimal, truestrings, falsestrings, type, types, typemap, categorical, pool, strict, silencewarnings, threaded, debug, parsingdebug, allowmissing) at /Users/n.jordanjameson/.julia/packages/CSV/4GOjG/src/CSV.jl:221 got unsupported keyword argument "select"
I am using Julia v. 1.6.2. Here is the output of versioninfo():
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-5650U CPU @ 2.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
The version of CSV is 0.10.4. The wiki for this version of CSV is here: https://csv.juliadata.org/stable/reading.html#CSV.read, and it has a select / drop entry.
The file I am trying to read is from here: http://konect.cc/networks/moreno_crime/ (the file I'm using is called "out.moreno_crime_crime"). The first few lines are:
% bip unweighted
% 1476 829 551
1 1
1 2
1 3
1 4
2 5
2 6
2 7
2 8
2 9
2 10
I get a different error than you. Can you restart Julia and check again?
julia> CSV.read("/home/akako/Downloads/moreno_crime/out.moreno_crime_crime"; types=[Int, Int],
header=["node1", "node2"],
skipto=3,
select=[1,2]
)
ERROR: ArgumentError: provide a valid sink argument, like `using DataFrames; CSV.read(source, DataFrame)`
Stacktrace:
[1] read(source::String, sink::Nothing; copycols::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:types, :header, :skipto, :select), Tuple{Vector{DataType}, Vector{String}, Int64, Vector{Int64}}}})
@ CSV ~/.julia/packages/CSV/jFiCn/src/CSV.jl:89
[2] top-level scope
@ REPL[8]:1
Stacktrace:
This error is telling you that you can't call CSV.read without a target sink. You might want to use CSV.File instead:
julia> CSV.File("/home/akako/Downloads/moreno_crime/out.moreno_crime_crime"; types=[Int, Int],
header=["node1", "node2"],
skipto=3,
select=[1,2]
)
┌ Warning: thread = 1 warning: parsed expected 2 columns, but didn't reach end of line around data row: 1. Parsing extra columns and widening final columnset
└ @ CSV ~/.julia/packages/CSV/jFiCn/src/file.jl:579
1476-element CSV.File:
CSV.Row: (node1 = 1, node2 = 1, Column3 = missing)
CSV.Row: (node1 = 1, node2 = 2, Column3 = missing)
CSV.Row: (node1 = 1, node2 = 3, Column3 = missing)
CSV.Row: (node1 = 1, node2 = 4, Column3 = missing)

How to use HuggingFace nlp library's GLUE for CoLA

I've been trying to use the HuggingFace nlp library's GLUE metric to check whether a given sentence is a grammatical English sentence, but I'm getting an error and am stuck.
What I've tried so far:
reference and prediction are two text sentences
!pip install transformers
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
reference="Security has been beefed across the country as a 2 day nation wide curfew came into effect."
prediction="Security has been tightened across the country as a 2-day nationwide curfew came into effect."
import nlp
glue_metric = nlp.load_metric('glue',name="cola")
#Using BertTokenizer
encoded_reference=tokenizer.encode(reference, add_special_tokens=False)
encoded_prediction=tokenizer.encode(prediction, add_special_tokens=False)
glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
The error I'm getting:
ValueError Traceback (most recent call last)
<ipython-input-9-4c3a3ce7b583> in <module>()
----> 1 glue_score = glue_metric.compute(encoded_prediction, encoded_reference)
6 frames
/usr/local/lib/python3.6/dist-packages/nlp/metric.py in compute(self, predictions, references, timeout, **metrics_kwargs)
198 predictions = self.data["predictions"]
199 references = self.data["references"]
--> 200 output = self._compute(predictions=predictions, references=references, **metrics_kwargs)
201 return output
202
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in _compute(self, predictions, references)
101 return pearson_and_spearman(predictions, references)
102 elif self.config_name in ["mrpc", "qqp"]:
--> 103 return acc_and_f1(predictions, references)
104 elif self.config_name in ["sst2", "mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]:
105 return {"accuracy": simple_accuracy(predictions, references)}
/usr/local/lib/python3.6/dist-packages/nlp/metrics/glue/27b1bc63e520833054bd0d7a8d0bc7f6aab84cc9eed1b576e98c806f9466d302/glue.py in acc_and_f1(preds, labels)
60 def acc_and_f1(preds, labels):
61 acc = simple_accuracy(preds, labels)
---> 62 f1 = f1_score(y_true=labels, y_pred=preds)
63 return {
64 "accuracy": acc,
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in f1_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
1097 pos_label=pos_label, average=average,
1098 sample_weight=sample_weight,
-> 1099 zero_division=zero_division)
1100
1101
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in fbeta_score(y_true, y_pred, beta, labels, pos_label, average, sample_weight, zero_division)
1224 warn_for=('f-score',),
1225 sample_weight=sample_weight,
-> 1226 zero_division=zero_division)
1227 return f
1228
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
1482 raise ValueError("beta should be >=0 in the F-beta score")
1483 labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1484 pos_label)
1485
1486 # Calculate tp_sum, pred_sum, true_sum ###
/usr/local/lib/python3.6/dist-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
1314 raise ValueError("Target is %s but average='binary'. Please "
1315 "choose another average setting, one of %r."
-> 1316 % (y_type, average_options))
1317 elif pos_label not in (None, 1):
1318 warnings.warn("Note that pos_label (set to %r) is ignored when "
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
However, I'm able to get results (pearson and spearmanr) for 'stsb' with the same approach as given above.
Some help and a workaround for this (cola) would be really appreciated. Thank you.
In general, if you are seeing this error with HuggingFace, you are trying to use the f-score as a metric on a text classification problem with more than 2 classes. Pick a different metric, like "accuracy".
For this specific question:
Despite what you entered, it is trying to compute the f-score. From the example notebook, you should set the metric name as:
metric_name = "pearson" if task == "stsb" else "matthews_correlation" if task == "cola" else "accuracy"
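Note also that the CoLA metric compares class labels, not token ids: predictions are a classifier's 0/1 acceptability labels, one per sentence, and references are the gold labels. Below is a minimal sketch of the Matthews correlation those labels would be scored with, written in plain Python so it runs without the nlp library. The label lists are made-up illustration data, and matthews_correlation is a hypothetical helper, not the library's own function.

```python
import math


def matthews_correlation(predictions, references):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # true positives
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # true negatives
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false positives
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: return 0.0 when the coefficient is undefined (zero denominator).
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical example: 1 = grammatical, 0 = ungrammatical, one label per sentence.
predictions = [1, 0, 1, 1, 0, 1]  # labels produced by some classifier
references = [1, 0, 0, 1, 0, 1]   # gold labels
score = matthews_correlation(predictions, references)
print(round(score, 4))
```

Scoring a single sentence pair, as in the question, would therefore need a CoLA-finetuned classifier to produce the label first; the tokenizer output alone is not something this metric can score.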

Error in eval(expr, envir, enclos) while using Predict function

When I try to run predict() on the dataset, it keeps giving me this error:
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Here is part of the dataset:
       LoanRange    Loan.Type    N   WAFICO    WALTV WAOrigRev WAPTValue
 1       0-99999 Conventional  109 722.5216 63.55385  6068.239 0.6031879
 2       0-99999          FHA   30 696.6348 80.00100  7129.650 0.5623650
 3       0-99999           VA   13 698.6986 74.40525  7838.894 0.4892977
 4 100000-149999 Conventional  860 731.2333 68.25817  6438.330 0.5962638
 5 100000-149999          FHA  285 673.2256 82.42225  8145.068 0.5211495
 6 100000-149999           VA  125 704.1686 87.71306  8911.461 0.5020074
 7 150000-199999 Conventional 1291 738.7164 70.08944  8125.979 0.6045117
 8 150000-199999          FHA  403 672.0891 84.65318 10112.192 0.5199632
 9 150000-199999           VA  195 694.1885 90.77495 10909.393 0.5250807
10 200000-249999 Conventional 1162 740.8614 70.65027  8832.563 0.6111419
11 200000-249999          FHA  348 667.6291 85.13457 11013.856 0.5374226
12 200000-249999           VA  221 702.9796 91.76759 11753.642 0.5078298
13 250000-299999 Conventional  948 742.0405 72.22742  9903.160 0.6106858
The following is the code used for predicting the count data N after determining the overdispersion:
model2=glm(N~Loan.Type+WAFICO+WALTV+WAOrigRev+WAPTValue, family=quasipoisson(link = "log"), data = DF)
summary(model2)
This is what I have done to create a sequence of counts and use the predict function:
countaxis <- seq (0,1500,150)
Y <- predict(model2, list(N=countaxis), type = "response")
At this step, I get the error:
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Can someone please point out where the problem is?
Think about what exactly you are trying to predict. You are providing the predict function values of N (via countaxis), but in fact the way you set up your model, N is your response variable and the remaining variables are the predictors. That's why R is asking for LoanRange. It actually needs values for LoanRange, Loan.Type, ..., WAPTValue in order to predict N. So you need to feed predict inputs that let the model try to predict N.
For example, you could do something like this:
# create some fake data to predict N
newdata1 = data.frame(rbind(c("0-99999", "Conventional", 722.5216, 63.55385, 6068.239, 0.6031879),
c("150000-199999", "VA", 12.5216, 3.55385, 60.239, 0.0031879)))
colnames(newdata1) = c("LoanRange" ,"Loan.Type", "WAFICO" ,"WALTV" , "WAOrigRev" ,"WAPTValue")
# ensure that numeric variables are indeed numeric and not factors
newdata1$WAFICO = as.numeric(as.character(newdata1$WAFICO))
newdata1$WALTV = as.numeric(as.character(newdata1$WALTV))
newdata1$WAPTValue = as.numeric(as.character(newdata1$WAPTValue))
newdata1$WAOrigRev = as.numeric(as.character(newdata1$WAOrigRev))
# make predictions - this will output values of N
predict(model2, newdata = newdata1, type = "response")

Error in FUN(X[[1L]], ...) : as.edgelist.sna input must be an adjacency matrix/array, edgelist matrix, network, or sparse matrix, or list thereof

I am trying to learn a few basic functions in igraph, but I am having problems computing the degrees from a graph. See the example below (I copied the following example from this site):
Example of data set:
edges <- matrix(c(103, 86, 24, 103, 103, 2, 92, 103, 87, 103, 103, 101, 103, 44), ncol=2, byrow=T)
Create graph
g <- graph(as.vector(t(edges)))
I can compute the degrees from the matrix edges:
degree(edges)
[1] 378 254 210 390 380 408 294 1230 1084
But I cannot compute the degrees from the graph g:
degree(g)
I am getting the following error:
Error in FUN(X[[1L]], ...) :
as.edgelist.sna input must be an adjacency matrix/array, edgelist matrix, network, or sparse matrix, or list thereof.
Does anyone know why I am getting this error?
What happened here is that igraph::degree is masked by sna::degree.
Just call it with the namespace prefix:
igraph::degree(g)
and it should work.
I ran into the same issue.
This worked for me:
net <- make_ring(10)
deg <- centralization.degree(net)$res