Synchronize SQLAlchemy ORM objects with external files on disk

I am using the SQLAlchemy ORM to represent the data model of a MySQL database. Suppose that for one of my ORM entity types, there is an external file of auxiliary data stored on disk, with a one-to-one association between database rows and files.
Is there a good software pattern to follow for keeping instances of the ORM objects in-sync with the external files?
For example, it would be ideal to have a way of attaching the data to an instance of the ORM object with some type of setter, which would then hold onto the data internally, and write it to disk when the object is persisted into the database.
Also, deletion of the file should happen in-sync with deletion of the ORM object from the database.
I am guessing I need to make use of the event-listener system in SQLAlchemy to register callbacks on these state transitions. But I'm not sure of the best way to do this: where the event listeners should be registered, how to encapsulate this logic within a specific ORM entity type, and how to store arbitrary data on an ORM instance that does not map to a database column.

This library, https://pypi.org/project/sqlalchemy-media/, might do what you need.
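If you'd rather wire this up yourself with the event system mentioned in the question, a minimal sketch might look like the following (assuming SQLAlchemy 1.4+; the `Document` class and its `payload_path` column are hypothetical). The auxiliary data is held in a plain Python attribute that the mapper ignores, and mapper events write or remove the file alongside the row:

```python
import os
from sqlalchemy import Column, Integer, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Document(Base):
    __tablename__ = "documents"

    id = Column(Integer, primary_key=True)
    payload_path = Column(String(255))  # where the companion file lives on disk

    # Plain class attribute, not a Column, so the mapper ignores it entirely.
    _pending_payload = None

    def set_payload(self, data: bytes) -> None:
        """Attach auxiliary data; it is written to disk when the row is flushed."""
        self._pending_payload = data

@event.listens_for(Document, "after_insert")
def _write_payload(mapper, connection, target):
    # Fires during the flush; if the surrounding transaction rolls back afterwards,
    # the file is left behind, so real code would also hook session rollback.
    if target._pending_payload is not None and target.payload_path:
        with open(target.payload_path, "wb") as fh:
            fh.write(target._pending_payload)
        target._pending_payload = None

@event.listens_for(Document, "after_delete")
def _remove_payload(mapper, connection, target):
    # Keep the filesystem in step with row deletion.
    if target.payload_path and os.path.exists(target.payload_path):
        os.remove(target.payload_path)
```

Registering the listeners at module level, next to the entity class, keeps the behaviour encapsulated with that one type. Note that the file writes are not transactional, which is the kind of bookkeeping a library like the one linked above is meant to take off your hands.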

Related

What's stopping me from using a standalone JSON file instead of a local db?

I need to store data for a native mobile app I'm writing, and I was wondering: why do I need to bother with DB setup when I can just read/write a JSON file? All the interactions are basic and the data could most likely be handled as JSON objects rather than queried.
What are the advantages?
Databases are intended for structured data or large data sets. If you know there are only a few properties to read and they don't change, JSON may be easier; but if you have a list of items, a database can optimize queries with an index or ensure consistency across multiple tables.
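As a rough illustration of that tradeoff in Python (file and table names here are just placeholders): a flat JSON file covers a handful of fixed settings, while even an embedded database such as SQLite lets you index the column you look things up by:

```python
import json
import sqlite3

# A few fixed properties: reading a flat JSON file is the simplest thing that works.
with open("settings.json") as fh:   # placeholder file name
    settings = json.load(fh)
theme = settings["theme"]

# A growing list of items: a database can index the lookup column, so finding
# one record does not mean scanning and parsing the whole file.
conn = sqlite3.connect("app.db")    # placeholder database file
conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_items_name ON items (name)")
row = conn.execute("SELECT id FROM items WHERE name = ?", ("widget",)).fetchone()
```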

Storage: database vs in-memory objects vs in-memory database

I'm doing a project where I have to store data for a Node.js Express server. It's not a LOT of data, but I have to save it somewhere.
I always hear that a database is good for that kind of stuff, but I thought about just keeping all the data in objects in Node.js and backing them up to disk as JSON every minute (or every 5 minutes). Would that be a good idea?
What I'm thinking here is that the response time from objects like that is way faster than from a database, and saving them is easy. But then I heard that there are in-memory databases as well, so my question is:
Are in-memory databases faster than JavaScript objects? Are JSON-based data backups a good idea here? Or should I simply go with a normal database because the performance doesn't really matter in this case?
Thanks!
If this is nothing but a school assignment or toy project with very simple models and access patterns, then sure, rolling your own data persistence might make sense.
However, I'd advocate for using a database if:
you have a lot of objects or different types of objects
you need to query or filter objects by various criteria
you need more reliable data persistence
you need multiple services to access the same data
you need access controls
you need any other database feature
Since you ask about speed: for trivial stuff, in-memory objects will likely be faster to access. But for more complicated stuff (lots of data, object relations, pagination, etc.), a database could start being faster.
You mention in-memory databases, but those are only worth using if you want the database features without the persistence; they would be closer to your in-memory objects, just without the file writing. So it really depends on whether you care about keeping the data or not.
Also if you haven't ever worked with any kind of database, now's a perfect time to learn :).
What I'm thinking here is that the response time from objects like that is way faster than from a database, and saving them is easy.
That's not true. Databases are persistent storage, so there will always be I/O latency. I would recommend MySQL for a SQL database and MongoDB or Cassandra for NoSQL.
An in-memory database is definitely faster, but you still need persistent storage for that data. Redis is a very popular in-memory database.
MongoDB stores data in BSON (a superset of JSON), so it would be a good choice in your case.

Redis and MongoDB: how should I store large JSON objects? (Performance issue)

I am currently developing a Node.js app. It has a MySQL database server which I use to store all of the app's data. However, I find myself storing a lot of data that pertains to the User in session storage. I have been doing this by using express-session to store the contents of my User class; however, these User classes can be quite large. I was thinking about writing a middleware that saves the User class as JSON to either Redis or MongoDB and stores the key to that record in the session cookie. When I retrieve the JSON from Redis or MongoDB, I will then parse it and use it to reconstruct my User class.
My question is which method would perform better and scale better: storing JSON strings in Redis, or storing a Mongo document representation of my User class in MongoDB? Thanks!
EDIT: I am planning to use MongoDB in another part of the app to solve a different problem. Also, will the JSON parsing from Redis be more time-consuming and memory-intensive than parsing from Mongo? At what recurring user count would keeping sessions in server memory become a problem?
express-session has various session store options to save the session data to.
AFAIK, these all work on the same principle: they serialize the session object to a JSON string and store that string in the store, using the session ID as the key.
In other words, your idea of storing the user data as a JSON string in either Redis or MongoDB under a second key is essentially what express-session already does when using the Redis or MongoDB stores, so I wouldn't expect any performance benefit from it.
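The principle itself is simple; here is a rough sketch of it in Python with redis-py, purely for illustration (the key prefix, TTL, and user fields are made up):

```python
import json
import uuid
import redis  # redis-py; assumes a Redis server on localhost

r = redis.Redis()

# Serialize the user object to a JSON string and store it under the session id;
# this is essentially what an express-session Redis store does on your behalf.
session_id = str(uuid.uuid4())
user = {"id": 42, "name": "alice", "roles": ["admin"]}
r.setex(f"sess:{session_id}", 3600, json.dumps(user))  # expire along with the session

# Later requests look the session up by id and deserialize it.
raw = r.get(f"sess:{session_id}")
user = json.loads(raw) if raw else None
```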
Another option would be to store the user data as a proper MongoDB document (not serialized to a JSON string). Under the hood this would still require (de)serialization, although from and to BSON rather than JSON. I have never benchmarked which of the two is faster, but my guess is that JSON might be a tad quicker.
There's also a difference between Redis and MongoDB, in that Redis is primarily in-memory and more lightweight. However, MongoDB is more of a "real" database that allows for more elaborate queries and has more options in terms of scalability.
Since it seems to me that you're only storing transient data in your sessions (as the actual data is stored on MySQL), I would suggest the following:
use the Redis session store if the total amount of data you're storing in the sessions will fit in memory;
use the MongoDB session store if not.
tj/connect-redis in conjunction with express-session does the job well! Redis is incredibly fast with JSON and is superb for handling sessions.

A class should support an interface, but this requires adding logic to the class in an intrusive way. Can we avoid this?

I have a C++ application that loads lots of data from a database, then executes algorithms on that data (these algorithms are quite CPU- and data-intensive, which is why I load all the data beforehand), then saves all the data that has been changed back to the database.
The database part is nicely separated from the rest of the application. In fact, the application does not need to know where the data comes from. The application could even be run on files (in that case a separate file module loads the files into the application and at the end saves all the data back to the files).
Now:
the database layer only wants to save the changed instances back to the database (not the full data set), therefore it needs to know what has been changed by the application.
on the other hand, the application doesn't need to know where the data comes from, and hence shouldn't be forced to keep a change-state per instance of its data.
To keep my application and its data structures as separate as possible from the layer that loads and saves the data (which could be a database or files), I don't want to pollute the application data structures with information about whether instances have changed since startup.
But to make the database layer as efficient as possible, it needs a way to determine which data has been changed by the application.
Duplicating all data and comparing the data while saving is not an option since the data could easily fill several GB of memory.
Adding observers to the application data structures is not an option either, since performance within the application algorithms is critical (and looping over all observers and calling virtual functions could become a significant bottleneck in the algorithms).
Any other solution? Or am I trying to be too 'modular' if I don't want to add logic to my application classes in an intrusive way? Is it better to be pragmatic in these cases?
How do ORM tools solve this problem? Do they also force application classes to keep a kind of change-state, or do they force the classes to have change-observers?
If you can't copy the data and compare, then clearly you need some kind of record somewhere of what has changed. The question, then, is how to update those records.
ORM tools can (if they want) solve the problem by keeping flags in the objects, saying whether the data has been changed or not, and if so what. It sounds as though you're making raw data structures available to the application, rather than objects with neatly encapsulated mutators that could update flags.
So an ORM doesn't normally require applications to track changes in any great detail. The application generally has to say which object(s) to save, but the ORM then works out what needs persisting to the DB in order to do that, and might apply optimizations there.
I guess that means that in your terms, the ORM is adding observers to the data structures in some loose sense. It's not an external observer, it's the object knowing how to mutate itself, but of course there's some overhead to recording what has changed.
One option would be to provide "slow" mutators for your data structures, which update the flags, alongside "fast" direct access plus a function that marks the object dirty. It would then be the application's choice whether to use the potentially slower mutators that let it ignore the issue, or the potentially faster direct access, which requires it to mark the object dirty before it starts (or after it finishes, perhaps, depending on what you do about transactions and inconsistent intermediate states). A sketch of this two-tier idea follows the list below.
You would then have two basic situations:
I'm looping over a very large set of objects, conditionally making a single change to a few of them. Use the "slow" mutators, for application simplicity.
I'm making lots of different changes to the same object, and I really care about the performance of the accessors. Use the "fast" mutators, which perhaps directly expose some array in the data. You gain performance in return for knowing more about the persistence model.
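Here is a minimal sketch of that two-tier approach, written in Python rather than C++ purely to keep it short (all names are illustrative); the persistence layer then saves only the objects whose dirty flag is set:

```python
class Record:
    """Illustrative record with both "slow" tracked mutators and "fast" raw access."""

    __slots__ = ("_values", "_dirty")

    def __init__(self, values):
        self._values = list(values)
        self._dirty = False

    # "Slow" path: the mutator records the change on behalf of the caller.
    def set_value(self, index, value):
        self._values[index] = value
        self._dirty = True

    # "Fast" path: direct access to the underlying storage; the caller takes on
    # the responsibility of calling mark_dirty() itself.
    def raw_values(self):
        return self._values

    def mark_dirty(self):
        self._dirty = True

    @property
    def dirty(self):
        return self._dirty


# The database layer only persists what has changed.
def save_changed(records, persist):
    for record in (r for r in records if r.dirty):
        persist(record)
```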
There are only two hard problems in Computer Science: cache invalidation and naming things.
Phil Karlton

Storing serialized Ruby objects in a database

I would like to store very large sets of serialized Ruby objects in a database (MySQL).
1) What are the cons and pros?
2) Is there any alternative way?
3) What are technical difficulties if the objects are really big?
4) Will I face memory issues while serializing and deserializing if the objects are really big?
Pros
Allows you to store arbitrary complex objects
Simplifies your db schema (no need to represent those complex objects relationally)
Cons
Complicates your models and data layer
Potentially need to handle multiple versions of serialized objects (changes to object definition over time)
Inability to directly query serialized columns
Alternatives
As the previous answer stated, an object database or document oriented database may meet your requirements.
Difficulties
If your objects are quite large, you may run into difficulties when moving the data between your DBMS and your program. You can minimize this by separating the storage of the object data from the metadata related to the object.
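One way to picture that split, sketched here with SQLAlchemy (the library from the question at the top) purely as an illustration; the table and column names are made up:

```python
from sqlalchemy import Column, Integer, String, LargeBinary
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class StoredObject(Base):
    __tablename__ = "stored_objects"

    id = Column(Integer, primary_key=True)
    kind = Column(String(50), index=True)   # metadata you can query and filter on
    label = Column(String(255))             # more lightweight metadata
    payload = Column(LargeBinary)           # the large serialized object itself
```

Routine queries can then select just the metadata columns and leave the large payload untouched until it is actually needed.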
Memory Issues
Running out of memory is definitely a possibility with large enough objects. It also depends on the type of serialization you use. To know how much memory you'd be using, you'd need to profile your app. I'd suggest ruby-prof, bleak_house or memprof.
I'd suggest using a non-binary serialization wherever possible. You don't have to use a single type of serialization for your entire database, but mixing several can get complex and messy.
If this is how you want to proceed, using an object-oriented DBMS like ObjectStore or a document-oriented DBMS like CouchDB would probably be your best option. They are designed and targeted for object serialization.
As an alternative, you could use any of the multitude of NoSQL databases. If you can serialize your object to JSON, then it should be easy to store in CouchDB.
You have to bear in mind that serialized objects take up far more disk space than data you save and load in your own format. I/O from the hard drive is very slow, and if you're looking at complex objects that take a lot of processing power to build, it may actually be faster to load the raw file(s) and process them on each startup, or to save the data in a form that is easy to load.