In a Django application, I'm trying to access an existing MySQL database created with Hibernate (a Java ORM). I reverse engineered the model using:
$ manage.py inspectdb > models.py
This created a nice models file from the database, and most of it was fine. But I can't find how to properly access the boolean fields, which Hibernate mapped to columns of type BIT(1).
By default, the inspectdb script creates these fields in the model as TextField and adds a comment saying that it couldn't reliably obtain the field type. I changed them to BooleanField and opened my model objects in the admin, but it doesn't work (the model objects always fetch a value of True for these fields). Using IntegerField doesn't work either (e.g. in the admin these fields show strange non-ASCII characters).
Any hints on doing this without changing the database? (I need the existing Hibernate mappings and Java application to keep working with the database.)
Further info: I left these fields as BooleanField and used the interactive shell to look at the fetched values. They are returned as '\x00' (when the Java/Hibernate value is false) and '\x01' (when it is true), instead of the Python booleans True and False.
>>> u = AppUser.objects.all()[0]
>>> u.account_expired
'\x00'
>>> u.account_enabled
'\x01'
Where the model includes:
class AppUser(models.Model):
    account_expired = models.BooleanField()
    account_enabled = models.BooleanField(blank=True)
    # etc...
This is the detailed solution, based on Dmitry's suggestion:
My derived field class:
class MySQLBooleanField(models.BooleanField):
    __metaclass__ = models.SubfieldBase

    def to_python(self, value):
        if isinstance(value, bool):
            return value
        # MySQL returns BIT(1) values as the byte strings '\x00'/'\x01'
        return bytearray(value)[0] == 1

    def get_db_prep_value(self, value):
        return '\x01' if value else '\x00'
Fields in my model:
account_enabled = MySQLBooleanField()
account_expired = MySQLBooleanField()
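With the custom field in place, the same shell queries from above should now come back as real booleans:

>>> u = AppUser.objects.all()[0]
>>> u.account_expired
False
>>> u.account_enabled
True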
I had to deal with the same problem, but rather than subclassing the field, I extended the MySQL backend to understand the Hibernate convention. It's only a few lines of code, and it has the advantage that the DB introspection can be made to work correctly as well.
See it here.
hibernateboolsbackend / backends / mysql / base.py
# We want to import everything since we are basically subclassing the module.
from django.db.backends.mysql.base import *
django_conversions.update({
    FIELD_TYPE.BIT: lambda x: x != '\x00',
})

DatabaseIntrospection.data_types_reverse.update({
    FIELD_TYPE.BIT: 'BooleanField',
})
The django-mysql package provides a BooleanField subclass called Bit1BooleanField that solves this:
from django.db.models import Model
from django_mysql.models import Bit1BooleanField

class AppUser(Model):
    bit1bool = Bit1BooleanField()
Easier than rolling your own, and tested on several Django and Python versions.
I guess the only way is to subclass, say, BooleanField, and override the to_python/get_prep_value methods, so the field works seamlessly with Django and your DB.
To make it work on Django 1.7.1 I had to change the to_python function, because it was not reading the data from the DB correctly:
def to_python(self, value):
    if value in (True, False):
        return value
    if value in ('t', 'True', '1', '\x01'):
        return True
    if value in ('f', 'False', '0', '\x00'):
        return False
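Note that SubfieldBase was deprecated in Django 1.8 and later removed; on modern versions the read-path conversion belongs in from_db_value instead. A minimal, untested sketch (signature as of Django 2.0+):

from django.db import models

class MySQLBooleanField(models.BooleanField):
    def from_db_value(self, value, expression, connection):
        # MySQL returns BIT(1) values as b'\x00' / b'\x01'
        if value is None:
            return None
        return value in (1, '\x01', b'\x01')

    def get_db_prep_value(self, value, connection, prepared=False):
        if value is None:
            return None
        return b'\x01' if value else b'\x00'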
In Python, the boolean type is a subclass of int, whereas in Java (via Hibernate) it is mapped to the BIT type. So the database column behind a Django BooleanField should have the datatype TINYINT instead of BIT. This is why the application behaves unexpectedly, returning the \x00 and \x01 values. If you add a boolean field to a Django model and run makemigrations and then migrate, Django will add a column of type TINYINT instead of BIT. So updating the column type to TINYINT should resolve the issue.
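If you do go down that road (note that the question wanted to avoid changing the database, and the Hibernate mapping would need updating too), the ALTER could be wrapped in a Django migration. A sketch only; the app, table, and column names here are made up:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]  # hypothetical

    operations = [
        migrations.RunSQL(
            sql="ALTER TABLE app_user MODIFY account_enabled TINYINT(1) NOT NULL;",
            reverse_sql="ALTER TABLE app_user MODIFY account_enabled BIT(1) NOT NULL;",
        ),
    ]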
I am trying to create a BentoML service for a CatBoostClassifier model that was trained using a column as a categorical feature. If I save the model and try to make some predictions with the saved model (not as a BentoML service), everything works as expected, but when I create the service using BentoML I get an error:
_catboost.CatBoostError: Bad value for num_feature[non_default_doc_idx=0,feature_idx=2]="Tertiary": Cannot convert 'b'Tertiary'' to float
The value is found in a column named 'road_type' and the model was trained using 'object' as the data type for the column.
If I try to give a float or an integer for the 'road_type' column I get the following error:
_catboost.CatBoostError: catboost/libs/data/model_dataset_compatibility.cpp:53: Feature road_type is Categorical in model but marked different in the dataset
If someone has encountered the same issue and found a solution I would appreciate it. Thanks!
I have tried different approaches to saving and loading the model, but unfortunately none of them worked.
You can try to explicitly pass the cat_features to the bentoml runner.
It would be something like this:
import bentoml
from catboost import Pool

runner = bentoml.catboost.get("bentoml_catboost_model:latest").to_runner()

cat_features = [2]  # specify your cat_features indexes
# Wrapping the input in a Pool tells CatBoost which columns are categorical
prediction = runner.predict.run(Pool(input_data, cat_features=cat_features))
I have a Django application. Sometimes in production I get an error when uploading data that one of the values is too long. It would be very helpful for debugging if I could see which value was the one that went over the limit. Can I configure this somehow? I'm using MySQL.
It would also be nice if I could enable/disable this on a per-model or column basis so that I don't leak user data to error logs.
When creating model instances from outside sources, one must take care to validate the input or have other guarantees that this data cannot violate constraints.
If you do not call at least full_clean() on the model but directly call save(), you bypass Django's validators and will only be alerted to the problem by the database driver, at which point it is harder to obtain diagnostics:
import json

from django.core.exceptions import ValidationError
from django.db import models

class JsonImportManager(models.Manager):
    # Named import_json because "import" is a reserved word in Python
    def import_json(self, json_string: str) -> int:
        data_list = json.loads(json_string)  # list of objects => list of dicts
        failed = 0
        for data in data_list:
            obj = self.model(**data)
            try:
                obj.full_clean()
            except ValidationError as e:
                print(e.message_dict)  # or use a better formatting function
                failed += 1
            else:
                obj.save()
        return failed
This is of course very simple, but it's a good boilerplate to get started with.
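Hypothetical usage, assuming the manager is attached to a model named Widget as its objects manager (both the model and the payload are made-up examples):

payload = '[{"name": "ok"}, {"name": "a value that is far too long for the field"}]'
failed = Widget.objects.import_json(payload)
print(f"{failed} object(s) failed validation")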
Django version: 3.1.0, MySQL backend
I have a JSONField on my model:
class Employee(models.Model):
    address = models.JSONField(
        encoder=AddressEncoder,
        decoder=AddressDecoder,
        default=address_default,
    )
Then the encoder looks like this:
class AddressEncoder(DjangoJSONEncoder):
    def default(self, o):
        if isinstance(o, Address):
            return dataclasses.asdict(o)
        raise TypeError("An Address instance is required, got {0}".format(type(o)))
Then the address_default looks like this:
def address_default():
    encoder = AddressEncoder()
    address = Address(...)
    return encoder.encode(address)
Currently I have set address_default to return a dict, although it should actually return an Address instance. When I change address_default so that it returns an instance of Address, an error is raised: TypeError: Object of type Address is not JSON serializable. However, in other parts of the code where the address is in fact an instance of Address, no errors are raised. So the custom AddressEncoder does not seem to be applied to the value provided by address_default.
When the address attribute on Employee is set to e.g. a string, no error is thrown. This might have to do with what is explained in Why is Django not using my custom encoder class. The code in AddressEncoder is not executed.
Question:
What is the correct way to set up the address_default, and Encoder/Decoder so that the address attribute can be, and only be, an instance of Address?
I solved the problem. It had to do with my migrations. One of my migrations contained a definition of the address field without the encoders. Hence, changing address_default to return a non-JSON-serializable object threw the corresponding error.
I had to manually find and change that migration so that the definition of the address field includes the custom encoder.
The check isinstance(self.address, Address) is then done in an overridden save() method on the Employee model.
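For reference, a minimal sketch of such a save() override, based on the field definition from the question (the exact error handling is an assumption):

class Employee(models.Model):
    address = models.JSONField(
        encoder=AddressEncoder,
        decoder=AddressDecoder,
        default=address_default,
    )

    def save(self, *args, **kwargs):
        # Reject anything that is not an Address instance
        if not isinstance(self.address, Address):
            raise TypeError("Employee.address must be an Address instance")
        super().save(*args, **kwargs)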
We are working on a top-down-RPG-like multiplayer game for learning purposes (and fun!) with some friends. We already have some entities in the game and inputs are working, but the network implementation gives us headaches :D
The Issues
When trying to convert with dict, some values will still contain pygame.Surface objects, which I don't want to transfer, and they cause errors when trying to JSONify them. Other objects that I would like to transfer in a simplified way, like Rectangle, cannot be converted automatically.
Already functional
Client-Server connection
Transfering JSON objects in both directions
Async networking and synchronized putting into a Queue
Situation
A new player connects to the server and wants to get the current game state with all objects.
Data-Structure
We use a "Entity-Component" based architecture, so we separated the game logic very strictly into "systems", while the data is stored in the "components" of each Entity. The Entity is a very simple container and has nothing more than a ID and a list of components. Example Entity (shorten for better readability):
Entity
|-- Component (Moveable)
|-- Component (Graphic)
| |- complex datatypes like pygame.SURFACE
| `- (...)
`- Component (Inventory)
We tried different approaches, but none seems to fit very well, or they feel "hacky".
pickle
Very Python-specific, so it would not be easy to implement clients in other languages in the future. And I've read about security risks when creating objects from network data in the dynamic way pickle offers. It does not even solve the Surface/Rectangle issue.
__dict__
Still contains references to the original objects, so a "cleanup" or "filter" for unwanted datatypes would also affect the origin. A deepcopy throws an exception:
...\Python\Python36\lib\copy.py", line 169, in deepcopy
rv = reductor(4)
TypeError: can't pickle pygame.Surface objects
Show some code
The method of the EntityManager class which should generate the snapshot of all entities, including their components. This snapshot should be converted to JSON without any errors - and, if possible, without much configuration in this core class.
class EntityManager:
    def generate_world_snapshot(self):
        """ Returns a dictionary with all Entities and their components to send
        this to the client. This function will probably generate a lot of data,
        but, its to send the whole current game state when a new player
        connects or when a complete refresh is required """
        # It should be possible to add more objects to the snapshot, so we
        # create our own Snapshot-Datastructure
        result = {'entities': {}}
        entities = self.get_all_entities()
        for e in entities:
            result['entities'][e.id] = deepcopy(e.__dict__)
            # Components are objects, but a dictionary is required for transfer
            cmp_obj_list = result['entities'][e.id]['components']
            # Empty the current list of components; it is going to be filled
            # with a dictionary for each cmp, cleaned for the dump, because of
            # the errors when directly converting the whole datastructure to JSON
            result['entities'][e.id]['components'] = {}
            for cmp in cmp_obj_list:
                cmp_copy = deepcopy(cmp)
                cmp_dict = cmp_copy.__dict__
                # Only list, dict, int, str, float and None will stay, while
                # other types are simply deleted, including their key.
                # Lists and dictionaries are cleaned up recursively as well.
                cmp_dict = self.clean_complex_recursive(cmp_dict)
                result['entities'][e.id]['components'][type(cmp_copy).__name__] \
                    = cmp_dict
        logging.debug("EntityMgr: Entity#3: %s" % result['entities'][3])
        return result
Expectation and actual results
We can find a way to manually override the elements which we don't want. But as the list of components grows, we would have to put all the filter logic into this core class, which should not contain any component specializations.
Do we really have to put all the logic into the EntityManager for filtering the right objects? This does not feel good, as I would like to have all conversion to JSON done without any hardcoded configuration.
How can we convert all this complex data in the most generic way possible?
Thanks for reading so far and thank you very much for your help in advance!
Interesting articles which we already worked through, and which may be helpful for others with similar issues:
https://gafferongames.com/post/what_every_programmer_needs_to_know_about_game_networking/
http://code.activestate.com/recipes/408859/
https://docs.python.org/3/library/pickle.html
UPDATE: Solution - thx 2 sloth
We used a combination of the following, which works really great so far and is also easy to maintain!
The EntityManager now calls the get_state() function of each entity.
class EntityManager:
    def generate_world_snapshot(self):
        """ Returns a dictionary with all Entities and their components to send
        this to the client. This function will probably generate a lot of data,
        but, its to send the whole current game state when a new player
        connects or when a complete refresh is required """
        # It should be possible to add more objects to the snapshot, so we
        # create our own Snapshot-Datastructure
        result = {'entities': {}}
        entities = self.get_all_entities()
        for e in entities:
            result['entities'][e.id] = e.get_state()
        return result
The Entity has only some basic attributes to add to the state and forwards the get_state() call to all the Components:
class Entity:
    def get_state(self):
        state = {'name': self.name, 'id': self.id, 'components': {}}
        for cmp in self.components:
            state['components'][type(cmp).__name__] = cmp.get_state()
        return state
The components themselves now inherit their get_state() method from their new superclass Component, which simply takes care of all the simple datatypes:
class Component:
    def __init__(self):
        logging.debug('generic component created')

    def get_state(self):
        state = {}
        for attr, value in self.__dict__.items():
            if value is None or isinstance(value, (str, int, float, bool)):
                state[attr] = value
            elif isinstance(value, (list, dict)):
                # logging.warn("Generating state: not supporting lists yet")
                pass
        return state


class GraphicComponent(Component):
    # (...)
    pass
Now every developer has the opportunity to override this method and create a more detailed get_state() function for complex types directly in the component classes (like Graphic, Movement, Inventory, etc.) if it is required to save the state in a more accurate way - which is a huge win for maintaining the code in the future, because these code pieces live in one class.
The next step is to implement the static method for creating the components from the state in the same class. This makes the whole thing work really smoothly.
Thank you so much sloth for your help.
Do we really have to put all the logic into the EntityManager for filtering the right objects?
No, you should use polymorphism.
You need a way to represent your game state in a form that can be shared between different systems; so maybe give your components a method that returns all of their state, and a factory method that allows you to create the component instances out of that very state.
(Python already has the __repr__ magic method, but you don't have to use it)
So instead of doing all the filtering in the entity manager, just let it call this new method on all components and let each component decide what the result will look like.
Something like this:
...
result = {'entities': {}}
entities = self.get_all_entities()
for e in entities:
    result['entities'][e.id] = {'components': {}}
    for cmp in e.components:
        result['entities'][e.id]['components'][type(cmp).__name__] = cmp.get_state()
...
And a component could implement it like this:
class GraphicComponent:
    def __init__(self, pos=...):
        self.image = ...
        self.rect = ...
        self.whatever = ...

    def get_state(self):
        return {'pos_x': self.rect.x, 'pos_y': self.rect.y, 'image': 'name_of_image.jpg'}

    @staticmethod
    def from_state(state):
        return GraphicComponent(pos=(state['pos_x'], state['pos_y']), ...)
And a client's EntityManager that receives the state from the server would iterate over the component list of each entity and call from_state to create the instances.
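A sketch of that receiving side, under the assumption that the component classes are registered in a name-to-class mapping and that the manager has some add_entity method (both are assumptions, not from the answer):

# Maps the component class names stored in the snapshot back to classes
COMPONENT_CLASSES = {cls.__name__: cls for cls in (GraphicComponent,)}

class EntityManager:
    def restore_world_snapshot(self, snapshot):
        for entity_id, entity_state in snapshot['entities'].items():
            entity = Entity()
            entity.id = entity_id
            for cmp_name, cmp_state in entity_state['components'].items():
                entity.components.append(
                    COMPONENT_CLASSES[cmp_name].from_state(cmp_state))
            self.add_entity(entity)  # hypothetical registration method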
In my PostgreSQL database I have:
CREATE TABLE category (
    -- ...
    category_name_localization JSON NOT NULL
);
In Java, I have a JDO class like so:
@javax.jdo.annotations.PersistenceCapable(table = "category")
public class Category extends _BlueEntity implements Serializable {
    //...
    private org.json.simple.JSONObject category_name_localization;

    @javax.jdo.annotations.Column(name = "category_name_localization")
    public org.json.simple.JSONObject getCategoryNameLocalization() {
        return category_name_localization;
    }
}
When I use this class, DataNucleus gives the following exception:
org.datanucleus.exceptions.NucleusUserException: Field "com.advantagegroup.blue.ui.entity.Category.category_name_localization" is a map that has been specified without a join table and neither the key nor the value has a mapped-by specified. This is invalid!
    at org.datanucleus.store.rdbms.RDBMSStoreManager.newJoinTable(RDBMSStoreManager.java:2720)
    at org.datanucleus.store.rdbms.mapping.java.AbstractContainerMapping.initialize(AbstractContainerMapping.java:82)
    at org.datanucleus.store.rdbms.mapping.MappingManagerImpl.getMapping(MappingManagerImpl.java:680)
    at org.datanucleus.store.rdbms.table.ClassTable.manageMembers(ClassTable.java:518)
    at org.datanucleus.store.rdbms.table.ClassTable.manageClass(ClassTable.java:424)
    at org.datanucleus.store.rdbms.table.ClassTable.initializeForClass(ClassTable.java:1250)
    at org.datanucleus.store.rdbms.table.ClassTable.initialize(ClassTable.java:271)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3288)
    at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2897)
    at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:118)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1637)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:665)
    at org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2098)
    at org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1278)
    at org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3668)
    at org.datanucleus.state.StateManagerImpl.setIdentity(StateManagerImpl.java:2276)
    at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:482)
    at org.datanucleus.state.StateManagerImpl.initialiseForPersistentNew(StateManagerImpl.java:122)
    at org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:218)
    at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:1986)
    at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1830)
    at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1685)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:712)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:738)
    at com.advantagegroup.blue.ui.jdo._BlueJdo.insert(_BlueJdo.java:40)
    at ...
This error makes sense in a way, because org.json.simple.JSONObject extends Map. However, this field is not part of any relationship -- it is of type JSON, and therefore it is natural to back it with a JSONObject.
How do I tell JDO / DataNucleus to chill and treat org.json.simple.JSONObject the same way it would a String or a Date?
Thanks!
DC
My understanding of this is that your default attempt is trying to persist a normal Map (since, while it doesn't know what a JSONObject is, it does know what a Map is), and it needs a join table for that on RDBMS.
Since you presumably want the JSONObject persisted into a single column, you need to create a JDO AttributeConverter. I've done similar things with my own types and it works fine (I'm on v5.0.5 IIRC).
I also found this in their docs, for when you have your own Map class that it doesn't know how to handle by default in terms of replacing it with a proxy (to intercept the calls to put, putAll etc.). If you add that line, it will not try to wrap the field with a proxy (which it doesn't know how to do for that type, unless you tell it how). If you want to auto-detect the JSONObject becoming "dirty", you would need to write a proxy wrapper, as per this page.
This doesn't answer how to map the column for that converter to use a "json" type in PostgreSQL, but I'd guess that if you set the sqlType you may get success in that respect.