what is the difference between "torch.utils.data.TensorDataset" and "torch.utils.data.Dataset" - the docs are not clear about that and I could not find any answers on google.
The Dataset class is an abstract class that is used to define new types of (customs) datasets. Instead, the TensorDataset is a ready to use class to represent your data as list of tensors.
You can define your custom dataset in the following way:
class CustomDataset(torch.utils.data.Dataset):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Your code
self.instances = your_data
def __getitem__(self, idx):
return self.instances[idx] # In case you stored your data on a list called instances
def __len__(self):
return len(self.instances)
If you just want to create a dataset that contains tensors for input features and labels, then use the TensorDataset directly:
dataset = TensorDataset(input_features, labels)
Note that input_features and labels must match on the length of the first dimension.
Related
This might be a stupid question as nobody on fastai is trying to answer it. If it is one you can go ahead and tell me but also please tell me the answer, because right now I am completely lost.
I am currently working on a U-Net model for the segmentation of cells in microscopy images. Due to class imbalances and to amplify the importance of cell boundaries, I calculated a pixelwise weightmap for each image that I pass into fastai. Therefore I created a new ItemBase class to save labels and weights together:
class WeightedLabels(ItemBase):
"""
Custom ItemBase to store and process labels and pixelwise weights together.
Also handling the target_size of the labels.
"""
def __init__(self, lbl: Image, wgt: Image, target_size: Tuple = None):
self.lbl, self.wgt = lbl, wgt
self.obj, self.data = (lbl, wgt), [lbl.data, wgt.data]
self.target_size = target_size
...
I use extensive augmentation, like elastic deformation, mirroring and rotations on both weights and labels, as well as the original image. I determine the Loss with a custom Cross-entropy loss function that uses the weights to get the weighted loss for each pixel and averages them.
My problem is, that I do not get a very good performace. I have the feeling that might be because of fastai trying to predict the weights as well. My questions are:
Am I right to assume my model tries to predict both?
If so, how do I tell the learner what to use for updating the layers and to only predict part of my labels, while still applying augmentation to both?
Here's the code for how I implemented my custom LabelList and my custom ItemList:
class CustomSegmentationLabelList(ImageList):
"'Item List' suitable for WeightedLabels containing labels and pixelweights"
_processor = vision.data.SegmentationProcessor
def __init__(self,
items: Iterator,
wghts = None,
classes: Collection = None,
target_size: Tuple = None,
loss_func=CrossEntropyFlat(axis=1),
**kwargs):
super().__init__(items, **kwargs)
self.copy_new.append('classes')
self.copy_new.append('wghts')
self.classes, self.loss_func, self.wghts = classes, loss_func, wghts
self.target_size = target_size
def open(self, fn):
res = io.imread(fn)
res = pil2tensor(res, np.float32)
return Image(res)
def get(self, i):
fn = super().get(i)
wt = self.wghts[i]
return WeightedLabels(fn, self.open(wt), self.target_size)
def reconstruct(self, t: Tensor):
return WeightedLabels(Image(t[0]), Image(t[1]), self.target_size)
class CustomSegmentationItemList(ImageList):
"'ItemList' suitable for segmentation with pixelwise weighted loss"
_label_cls, _square_show_res = CustomSegmentationLabelList, False
def label_from_funcs(self, get_labels: Callable, get_weights: Callable,
label_cls: Callable = None, classes=None,
target_size: Tuple = None, **kwargs) -> 'LabelList':
"Get weights and labels from two functions. Saves them in a CustomSegmentationLabelList"
kwargs = {}
wghts = [get_weights(o) for o in self.items]
labels = [get_labels(o) for o in self.items]
if target_size:
print(
f'Masks will be cropped to {target_size}. Choose \'target_size \\= None \' to keep initial size.')
else:
print(f'Masks will not be cropped.')
y = CustomSegmentationLabelList(
labels, wghts, classes, target_size, path=self.path)
res = self._label_list(x=self, y=y)
return res
Also here the part, where I initiate my databunch object:
data = (CustomSegmentationItemList.from_df(img_df,IMG_PATH, convert_mode=IMAGE_TYPE)
.split_by_rand_pct(valid_pct=(1/N_SPLITS),seed=SEED)
.label_from_funcs(get_labels, get_weights, target_size=MASK_SHAPE, classes = array(['background','cell']))
.transform(tfms=tfms, tfm_y=True)
.databunch(bs=BATCH_SIZE))
I use the regular learner, where I pass in my U-Net model, the data, my loss function and some additional arguments that shouldn't really matter here. When trying to apply my model, after training it, it gives me two identical output tensors. I assume that one is probably for the labels and the other for the weights. Both have the following dimensions: (WxHxC). I do not understand why this is happening, because my model is supposed to only have one output in form of (WxHxC). If this happens during prediction, this probably also happens during training. How can I overcome this?
What methods can be used to enlarge spartial dimension of the layer output blob?
As far as I can see from documentation it can be:
UpSampling2D
Does it only upsample with power of 2?
Also it Repeats the rows and columns of the data by size[0] and size[1] respectively. and it's not very smart.
Conv2DTranspose
Can it have arbitary output size (not power of 2 upsapmple)?
How bilinear interpolation upscale with arbitary dimensions can be done (it can be Conv2DTranspose with fixed weights?)
What other options can be used to enlarge spartial dimension of the layer output blob?
Expanding on the answer from y300, here is a complete example of wrapping TensorFlow bilinear image resizing in a Keras Lambda layer:
from keras import Sequential
from keras.layers import Lambda
import tensorflow as tf
def UpSampling2DBilinear(size):
return Lambda(lambda x: tf.image.resize_bilinear(x, size, align_corners=True))
upsampler = Sequential([UpSampling2DBilinear((256, 256))])
upsampled = upsampler.predict(images)
Note that align_corners=True to get similar performance to other bilinear image upsampling algorithms, as described in this post.
To use bicubic resampling, create a new function and replace resize_bilinear with resize_bicubic.
For an implementation even more similar to UpSampling2D, try this:
from keras import backend as K
def UpSampling2DBilinear(stride, **kwargs):
def layer(x):
input_shape = K.int_shape(x)
output_shape = (stride * input_shape[1], stride * input_shape[2])
return tf.image.resize_bilinear(x, output_shape, align_corners=True)
return Lambda(layer, **kwargs)
This will allow you to use name='', input_shape='' and other arguments for Lamba, and allows you to pass an integer stride/upsample amount.
You can define your own resize Layer:
from keras import layers, models, utils
from keras.backend import tf as ktf
class Interp(layers.Layer):
def __init__(self, new_size, **kwargs):
self.new_size = new_size
super(Interp, self).__init__(**kwargs)
def build(self, input_shape):
super(Interp, self).build(input_shape)
def call(self, inputs, **kwargs):
new_height, new_width = self.new_size
resized = ktf.image.resize_images(inputs, [new_height, new_width],
align_corners=True)
return resized
def compute_output_shape(self, input_shape):
return tuple([None, self.new_size[0], self.new_size[1], input_shape[3]])
def get_config(self):
config = super(Interp, self).get_config()
config['new_size'] = self.new_size
return config
I discovered the new generated columns functionality of MySQL 5.7, and wanted to replace some properties of my models by those kind of columns. Here is a sample of a model:
class Ligne_commande(models.Model):
Quantite = models.IntegerField()
Prix = models.DecimalField(max_digits=8, decimal_places=3)
Discount = models.DecimalField(max_digits=5, decimal_places=3, blank=True, null=True)
#property
def Prix_net(self):
if self.Discount:
return (1 - self.Discount) * self.Prix
return self.Prix
#property
def Prix_total(self):
return self.Quantite * self.Prix_net
I defined generated field classes as subclasses of Django fields (e.g. GeneratedDecimalField as a subclass of DecimalField). This worked in a read-only context and Django migrations handles it correctly, except a detail : generated columns of MySQL does not support forward references and django migrations does not respect the order the fields are defined in a model, so the migration file must be edited to reorder operations.
After that, trying to create or modify an instance returned the mysql error : 'error totally whack'. I suppose Django tries to write generated field and MySQL doesn't like that. After taking a look to django code I realized that, at the lowest level, django uses the _meta.local_concrete_fields list and send it to MySQL. Removing the generated fields from this list fixed the problem.
I encountered another problem: during the modification of an instance, generated fields don't reflect the change that have been made to the fields from which they are computed. If generated fields are used during instance modification, as in my case, this is problematic. To fix that point, I created a "generated field descriptor".
Here is the final code of all of this.
Creation of generated fields in the model, replacing the properties defined above:
Prix_net = mk_generated_field(models.DecimalField, max_digits=8, decimal_places=3,
sql_expr='if(Discount is null, Prix, (1.0 - Discount) * Prix)',
pyfunc=lambda x: x.Prix if not x.Discount else (1 - x.Discount) * x.Prix)
Prix_total = mk_generated_field(models.DecimalField, max_digits=10, decimal_places=2,
sql_expr='Prix_net * Quantite',
pyfunc=lambda x: x.Prix_net * x.Quantite)
Function that creates generated fields. Classes are dynamically created for simplicity:
from django.db.models import fields
def mk_generated_field(field_klass, *args, sql_expr=None, pyfunc=None, **kwargs):
assert issubclass(field_klass, fields.Field)
assert sql_expr
generated_name = 'Generated' + field_klass.__name__
try:
generated_klass = globals()[generated_name]
except KeyError:
globals()[generated_name] = generated_klass = type(generated_name, (field_klass,), {})
def __init__(self, sql_expr, pyfunc=None, *args, **kwargs):
self.sql_expr = sql_expr
self.pyfunc = pyfunc
self.is_generated = True # mark the field
# null must be True otherwise migration will ask for a default value
kwargs.update(null=True, editable=False)
super(generated_klass, self).__init__(*args, **kwargs)
def db_type(self, connection):
assert connection.settings_dict['ENGINE'] == 'django.db.backends.mysql'
result = super(generated_klass, self).db_type(connection)
# double single '%' if any because it will clash with later Django format
sql_expr = re.sub('(?<!%)%(?!%)', '%%', self.sql_expr)
result += ' GENERATED ALWAYS AS (%s)' % sql_expr
return result
def deconstruct(self):
name, path, args, kwargs = super(generated_klass, self).deconstruct()
kwargs.update(sql_expr=self.sql_expr)
return name, path, args, kwargs
generated_klass.__init__ = __init__
generated_klass.db_type = db_type
generated_klass.deconstruct = deconstruct
return generated_klass(sql_expr, pyfunc, *args, **kwargs)
The function to register generated fields in a model. It must be called at django start-up, for example in the ready method of the AppConfig of the application.
from django.utils.datastructures import ImmutableList
def register_generated_fields(model):
local_concrete_fields = list(model._meta.local_concrete_fields[:])
generated_fields = []
for field in model._meta.fields:
if hasattr(field, 'is_generated'):
local_concrete_fields.remove(field)
generated_fields.append(field)
if field.pyfunc:
setattr(model, field.name, GeneratedFieldDescriptor(field.pyfunc))
if generated_fields:
model._meta.local_concrete_fields = ImmutableList(local_concrete_fields)
And the descriptor. Note that it is used only if a pyfunc is defined for the field.
class GeneratedFieldDescriptor(object):
attr_prefix = '_GFD_'
def __init__(self, pyfunc, name=None):
self.pyfunc = pyfunc
self.nickname = self.attr_prefix + (name or str(id(self)))
def __get__(self, instance, owner):
if instance is None:
return self
if hasattr(instance, self.nickname) and not instance.has_changed:
return getattr(instance, self.nickname)
return self.pyfunc(instance)
def __set__(self, instance, value):
setattr(instance, self.nickname, value)
def __delete__(self, instance):
delattr(instance, self.nickname)
Note the instance.has_changed that must tell if the instance is being modified. If found a solution for this here.
I have done extensive tests of my application and it works fine, but I am far from using all django functionalities. My question is: could this settings clash with some use cases of django ?
I have a declarative mapping:
class User(base):
username = Column(Unicode(30), unique=True)
How can I tell sqlalchemy that this attribute may not be modified?
The workaround I came up with is kindof hacky:
from werkzeug.utils import cached_property
# regular #property works, too
class User(base):
_username = Column('username', Unicode(30), unique=True)
#cached_property
def username(self):
return self._username
def __init__(self, username, **kw):
super(User,self).__init__(**kw)
self._username=username
Doing this on the database column permission level will not work because not all databases support that.
You can use the validates SQLAlchemy feature.
from sqlalchemy.orm import validates
...
class User(base):
...
#validates('username')
def validates_username(self, key, value):
if self.username: # Field already exists
raise ValueError('Username cannot be modified.')
return value
reference: https://docs.sqlalchemy.org/en/13/orm/mapped_attributes.html#simple-validators
I can suggest the following ways do protect column from modification:
First is using hook when any attribute is being set:
In case above all column in all tables of Base declarative will be hooked, so you need somehow to store information about whether column can be modified or not. For example you could inherit sqlalchemy.Column class to add some attribute to it and then check attribute in the hook.
class Column(sqlalchemy.Column):
def __init__(self, *args, **kwargs):
self.readonly = kwargs.pop("readonly", False)
super(Column, self).__init__(*args, **kwargs)
# noinspection PyUnusedLocal
#event.listens_for(Base, 'attribute_instrument')
def configure_listener(class_, key, inst):
"""This event is called whenever an attribute on a class is instrumented"""
if not hasattr(inst.property, 'columns'):
return
# noinspection PyUnusedLocal
#event.listens_for(inst, "set", retval=True)
def set_column_value(instance, value, oldvalue, initiator):
"""This event is called whenever a "set" occurs on that instrumented attribute"""
logging.info("%s: %s -> %s" % (inst.property.columns[0], oldvalue, value))
column = inst.property.columns[0]
# CHECK HERE ON CAN COLUMN BE MODIFIED IF NO RAISE ERROR
if not column.readonly:
raise RuntimeError("Column %s can't be changed!" % column.name)
return value
To hook concrete attributes you can do the next way (adding attribute to column not required):
# standard decorator style
#event.listens_for(SomeClass.some_attribute, 'set')
def receive_set(target, value, oldvalue, initiator):
"listen for the 'set' event"
# ... (event handling logic) ...
Here is guide about SQLAlchemy events.
Second way that I can suggest is using standard Python property or SQLAlchemy hybrid_property as you have shown in your question, but using this approach result in code growing.
P.S. I suppose that compact way is add attribute to column and hook all set event.
Slight correction to #AlexQueue answer.
#validates('username')
def validates_username(self, key, value):
if self.username and self.username != value: # Field already exists
raise ValueError('Username cannot be modified.')
return value
We are doing a project which is using the django framework with MySQL database. I wanted to make an array in the models by using
CommaSeparatedIntegerField.
eg:
class MyModel(models.Model):
values = CommaSeparatedIntegerField(max_length = 200)
How will this be represented in MySQL?
There are no arrays in MySQL. If you store array as a comma separated string, then you will have problems with selecting, modifying data and optimization.
So, I'd suggest you to store items in table's rows.
You might be better off avoiding that: https://en.wikipedia.org/wiki/1NF
I had to do something of this sort some time back and what i did was used the way suggested here.
http://justcramer.com/2008/08/08/custom-fields-in-django/
from django.db import models
class SeparatedValuesField(models.TextField):
__metaclass__ = models.SubfieldBase
def __init__(self, *args, **kwargs):
self.token = kwargs.pop('token', ',')
super(SeparatedValuesField, self).__init__(*args, **kwargs)
def to_python(self, value):
if not value: return
if isinstance(value, list):
return value
return value.split(self.token)
def get_db_prep_value(self, value):
if not value: return
assert(isinstance(value, list) or isinstance(value, tuple))
return self.token.join([unicode(s) for s in value])
def value_to_string(self, obj):
value = self._get_val_from_obj(obj)
return self.get_db_prep_value(value)