Are there differences between .img and .tif formats that affect how GDAL .SetNoDataValue() works? - gis

I'm having an issue masking NoData values out of a statistical analysis of raster image data.
The input file is a 3-dimensional (rows, columns and layers) .img file.
It is opened using the gdal.Open() function, and I then want to extract some per-layer statistics from each rows × columns image it contains (layer).
Some of these images contain junk (or NoData) pixels, which have been given a value of -0.0 by the dataset originators. I do not want these pixel values included in the statistics calculations, and need to screen them out.
For this I'm using the SetNoDataValue() function, which works well on .tif imagery.
Unfortunately, with .img, -0.0 keeps showing up as each layer's minimum value (it should be around 270.something) in the minCubeValues variable, and the mean and standard deviation values are incorrect as a consequence.
When I run this sequence on .tif files it works fine, but it does not for .img files, and I can't figure out why.
I need to get it working directly on .img files, as the bulk of my image data are in .img and are made available as such.
I suspect it comes down to structural differences between .img and .tif files, but I cannot understand how this is being drawn into the ds2 variable.
Has anyone experienced this issue before and figured out what was happening?
Code is as follows:
from osgeo import gdal

inputFilePathName = str("C:\\ResearchProcess\\2011sst_sstklcube20170628.img")
noDataValue = (-0.0)
ds2 = gdal.Open(inputFilePathName)
layerList = []
minCubeValues = []
maxCubeValues = []
layerMeans = []
for layer in range(ds2.RasterCount):
    layer += 1
    srcLayer = ds2.GetRasterBand(layer)
    srcLayer.SetNoDataValue(noDataValue)  # screens out the No Data value in .tif; for .img, it doesn't :/
    stats = srcLayer.ComputeStatistics(0)
    layerList.append(layer)
    minCubeValues.append(stats[0])
    maxCubeValues.append(stats[1])
    layerMeans.append(stats[2])
print minCubeValues
ds2 = None

Gabriella Giordano's suggestion to adapt the approach here works beautifully. I had to create a third variable, ds3Array, as I have to run through the layered data arrays layer by layer in the wider algorithm.
Cheers :)
Working code as follows.
from osgeo import gdal
import numpy as np

ds2 = gdal.Open(inputFilePathName)
ds2Array = ds2.ReadAsArray()
layerList = []
minCubeValues = []
maxCubeValues = []
layerMeans = []
layerStdevs = []
for layer in range(ds2.RasterCount):
    layer += 1
    ds3Array = ds2Array[layer - 1]
    ds3Array = np.ma.masked_equal(ds3Array, noDataValue)
    minCubeValues.append(ds3Array.min())
    layerMeans.append(ds3Array.mean())
    layerStdevs.append(ds3Array.std())
Thank you :)

Unfortunately, ERDAS IMAGINE (.img) does not support nodata values the way GeoTIFF and other formats that define a mask to screen them out do.
As stated in the documentation of the SetNoDataValue() API, which you can find here, the effect of setting a nodata value is therefore driver-dependent.
Nevertheless, for this specific case you could try this solution (go to the second answer, provided by user Mike T).
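For reference, a minimal standalone sketch of the masked-array workaround (the pixel values here are made up). One caveat worth knowing for a nodata value of -0.0: IEEE 754 defines -0.0 == 0.0, so masking on -0.0 will also mask any genuine +0.0 pixels.

```python
import numpy as np

# Made-up stand-in for one layer of the SST cube; -0.0 marks NoData
layer = np.array([[270.5, -0.0, 271.2],
                  [0.0, 272.8, -0.0]])

# masked_equal uses ==, so both -0.0 and +0.0 pixels get masked here
masked = np.ma.masked_equal(layer, -0.0)

print(masked.min())    # 270.5 -- minimum over unmasked pixels only
print(masked.count())  # 3 -- pixels that survive the mask
```

If zero is a legitimate value in your data, a different sentinel (e.g. a large negative number or NaN) avoids this ambiguity.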

Related

Is there a way to change the y axis of the calc.ece function when plotting?

I am trying to plot the calc.ece function and have been successful with just the basic plot function. However, I need to be able to zoom in to show the observed and calibrated lines better, such as in this example here:
LR.same = c(4135, 4135, 4135, 4135, 4135, 4135)
LR.different = c(0.00334, 0.00334, 0.00334, 0.00334, 0.00334)
ece.1 = calc.ece(LR.same, LR.different)
plot(ece.1)
I cannot use ylim, as I get this error: Error in xy.coords(x, y, xlabel, ylabel, log) : argument "x" is missing, with no default. I am unsure what to do. Any ideas?
Well, I still cannot change the y-axis in R, but if you just unlist() the object and export it, you get the data. I then put the data into Excel and added a column of log10 odds, calculated as log10(prior/(1-prior)). Then I just created a scatter plot with the log10 odds as the x-axis. That seemed to work.
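The Excel step can also be scripted. A minimal Python sketch of the log10-odds transform described above (the prior values are made up for illustration):

```python
import math

# Hypothetical prior probabilities (made-up values)
priors = [0.1, 0.25, 0.5, 0.75, 0.9]

# log10 odds as described above: log10(prior / (1 - prior))
log10_odds = [math.log10(p / (1 - p)) for p in priors]

print(log10_odds)  # 0.0 at prior = 0.5, symmetric either side
```

These log10-odds values then serve as the x-axis for the scatter plot.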

Read every nth batch in pyarrow.dataset.Dataset

In Pyarrow now you can do:
a = ds.dataset("blah.parquet")
b = a.to_batches()
first_batch = next(b)
What if I want the iterator to return every Nth batch instead of every single one? It seems like this could be something in FragmentScanOptions, but that's not documented at all.
No, there is no way to do that today. I'm not sure what you're after, but if you are trying to sample your data there are a few choices, though none achieves quite this effect.
To load only a fraction of your data from disk you can use pyarrow.dataset.Dataset.head.
There is a request in place for randomly sampling a dataset, although the proposed implementation would still load all of the data into memory (and just drop rows according to some random probability).
Update: If your dataset is only parquet files then there are some rather custom parts and pieces that you can cobble together to achieve what you want.
a = ds.dataset("blah.parquet")
all_fragments = []
for fragment in a.get_fragments():
    for row_group_fragment in fragment.split_by_row_group():
        all_fragments.append(row_group_fragment)
sampled_fragments = all_fragments[::2]
# Have to construct the sampled dataset manually
sampled_dataset = ds.FileSystemDataset(sampled_fragments, schema=a.schema, format=a.format)
# Iterator which will only return some of the batches
# of the source dataset
sampled_dataset.to_batches()
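If it is acceptable to read every batch and merely skip most of them (no I/O savings, unlike the fragment approach above), the standard library's itertools.islice can thin any batch iterator. A sketch with a stand-in iterator; in practice you would pass dataset.to_batches() instead:

```python
from itertools import islice

def every_nth(batches, n):
    # Yields batch 0, n, 2n, ... from any iterator of batches.
    # Note: the skipped batches are still produced upstream.
    return islice(batches, 0, None, n)

# Stand-in for dataset.to_batches(); any iterator behaves the same
fake_batches = iter(range(10))
print(list(every_nth(fake_batches, 3)))  # [0, 3, 6, 9]
```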

Tf-slim: ValueError: Variable vgg_19/conv1/conv1_1/weights already exists, disallowed. Did you mean to set reuse=True in VarScope?

I am using tf-slim to extract features from several batches of images. The problem is that my code works for the first batch; after that I get the error in the title. My code is something like this:
for i in range(0, num_batches):
    # Obtain the starting and ending image numbers for each batch
    batch_start = i * training_batch_size
    batch_end = min((i + 1) * training_batch_size, read_images_number)
    # Obtain the images from the batch
    images = preprocessed_images[batch_start:batch_end]
    with slim.arg_scope(vgg.vgg_arg_scope()) as sc:
        _, end_points = vgg.vgg_19(tf.to_float(images), num_classes=1000, is_training=False)
        init_fn = slim.assign_from_checkpoint_fn(os.path.join(checkpoints_dir, 'vgg_19.ckpt'), slim.get_model_variables('vgg_19'))
        feature_conv_2_2 = end_points['vgg_19/pool5']
So as you can see, in each iteration I select a batch of images and use the VGG-19 model to extract features from the pool5 layer. But after the first iteration I get the error on the line where I obtain the end points. One solution I found on the internet is to reset the graph each time, but I don't want to do that, because later in the code I train some weights in my graph using these extracted features, and I don't want to reset them. Any leads highly appreciated. Thanks!
You should create your graph once, not in a loop. The error message tells you exactly that - you try to build the same graph twice.
So it should be (in pseudocode)
create_graph()
load_checkpoint()
for each batch:
    process_data()

Seq2Seq in CNTK : Run Time Error Function Only supports 2 dynamic axis

I am trying to implement a basic translation model where input and output are sentences in different languages in CNTK using LSTMs.
To achieve this I am creating the model as follows:
def create_model(x):
    with c.layers.default_options():
        m = c.layers.Recurrence(c.layers.LSTM(input_vocab_size))(x)
        m = sequence.last(m)
        y = c.layers.Recurrence(c.layers.LSTM(label_vocab_size))(m)
        return m

batch_axis = Axis.default_batch_axis()
input_seq_axis = Axis('inputAxis')
input_dynamic_axes = [batch_axis, input_seq_axis]
raw_input = input_variable(shape=(input_vocab_dim), dynamic_axes=input_dynamic_axes, name='raw_input')
z = create_model(raw_input)
But I am getting the following error:
RuntimeError: Currently PastValue/FutureValue Function only supports input operand with 2 dynamic axis (1 sequence-axis and 1 batch-axis)
As I understand it, dynamic axes are basically those axes which get decided after the data is loaded, in this case the batch size and the length of the input sentence. I don't think I am changing the dynamic axes of the input anywhere.
Any help is highly appreciated.
The last() operation strips the dynamic axis, since it reduces the input sequence to a single value (the thought vector).
The thought vector should then become the initial state for the second recurrence. So it should not be passed as the data argument to the second recurrence.
In the current version, the initial_state argument of Recurrence() cannot be data-dependent. This will soon be possible; it is already under code review and will be merged to master shortly.
Until then, there is a more complicated way to pass a data-dependent initial state, where you manually construct the recurrence (without the Recurrence() layer) and manually add the initial hidden state in the recurrence. It is illustrated in the sequence-to-sequence sample.
This might be:
input_dynamic_axes = [Axis.default_batch_axis(), Axis.default_dynamic_axis()]
The first one will be the number of samples in your minibatch, the second will be the sequence length, automagically inferred by CNTK.

d3.js How to not graph values outside of range?

I have a multi-bar graph with 7 different bar listings. Dates are on the x axis and decimal values are on the y axis. Some of these listings have empty strings ("") for their decimal values, and they are graphed as 0.000. I don't want these to show up at all. I tried using chart.yDomain([0, 3]) and setting the empty values to -1; they don't show up on the graph, but the spacing between the bars is the same as if they were graphed.
I also tried not putting empty value pairs into the graph datum array, but that messed up the date sorting since not every listing has a value for each date.
Here's an example of the JSON data I am using for the graphing:
"x_data":["08\/15\/13","11\/11\/13","11\/13\/13","11\/14\/13","11\/18\/13","11\/19\/13","11\/20\/13","11\/25\/13","12\/05\/13","12\/09\/13","12\/11\/13","12\/12\/13"],
"y_data":[[["","","","","","","",0.875,"",0.41,"",""]],[["","","","","","","","",0.285,"",0.92,""]],[["",0.203,0.17,0.223,0.193,0.303,0.263,"","","","",""]],[["",0.433,0.333,0.665,0.353,0.413,0.458,"","","","",""]],[["",0.355,0.3,0.263,0.258,0.355,0.215,"","","","",""]],[["",0.195,0.43,0.243,0.28,0.44,0.4,"","","","",""]],[[1.218,"","","","","","","","","","",""]]]}
Here is a screen shot of how it looks without setting the domain:
http://i.imgur.com/TO3wwWF.png?1
Here is a screen shot of what it looks like when I do set the domain:
http://i.imgur.com/NEwgkJf.png?1
Since you haven't provided a fiddle or equivalent, it's not possible to provide a copy-and-paste answer, but a general approach would be to remove the null values from the data before creating the chart.
Since the data in your example isn't formatted exactly as D3.js expects, I'll assume you're not simply fetching it using D3's built-in request function (e.g. d3.json('url/to/data.json')) but, rather, have the data in a local variable. Assuming you also want to preserve the structure above, you could do something like the following. (It's left unoptimized to keep the logic as clear as possible.)
var cleandata = {
    x_data: [],
    y_data: []
};
data.y_data.forEach(function(y_value, idx){
    if (y_value) {
        cleandata.x_data.push(data.x_data[idx]);
        cleandata.y_data.push(data.y_data[idx]);
    }
})
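For clarity, here is the same cleaning idea in a short Python sketch: within each series, drop the (date, value) pairs whose value is an empty string. The data shape mirrors a trimmed-down version of the JSON above; the values are taken from it but the variable names are illustrative.

```python
# One list of dates, plus nested per-series value lists where "" marks
# a missing measurement (same shape as the question's JSON)
x_data = ["08/15/13", "11/11/13", "11/13/13"]
y_data = [[["", 0.203, 0.17]], [[1.218, "", ""]]]

clean_series = []
for series in y_data:
    values = series[0]
    # Keep only (date, value) pairs whose value is not an empty string
    kept = [(x, y) for x, y in zip(x_data, values) if y != ""]
    clean_series.append(kept)

print(clean_series)
# [[('11/11/13', 0.203), ('11/13/13', 0.17)], [('08/15/13', 1.218)]]
```

Note that, as the question observes, dropping pairs per series means different series no longer share a common date axis, so the chart needs to place bars by date rather than by array position.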