For better testability and other reasons, it is good to keep the SQLAlchemy database session configuration non-global, as described very well in the following question:
how to setup sqlalchemy session in celery tasks with no global variable (and also discussed in https://github.com/celery/celery/issues/3561 )
Now, the question is: how to handle metadata elegantly? If my understanding is correct, metadata can be obtained once, e.g.:
engine = create_engine(DB_URL, encoding='utf-8', pool_recycle=3600, pool_size=10)
# db_session = get_session() # this is old global session
meta = MetaData()
meta.reflect(bind=engine)
Reflecting on each task execution is not good for performance reasons; metadata is a more or less stable and thread-safe structure (as long as we only read it).
However, metadata sometimes changes (Celery is not the "owner" of the db schema), causing errors in workers.
What could be an elegant way to deal with meta in a testable way, plus still be able to react to underlying db changes? (alembic in use, if it is relevant).
I was thinking of using an alembic version change as a signal to re-reflect, but I am not quite sure how to make it work nicely in Celery. For instance, if more than one worker senses a change at once, the global meta may be treated in a non-thread-safe way.
If it matters, Celery usage in this case is standalone; no web framework modules/apps are present in the Celery app. The problem is also simplified because only SQLAlchemy Core is in use, not the object mapper.
This is only a partial solution, and it's for the SQLAlchemy ORM (but I guess something similar is easy to implement for Core).
Main points (a code sketch follows the list):
the engine is at module level, but its config (access URL, parameters) comes from os.environ
the session is in its own factory function
at module level: BaseModel = automap_base(), and table classes then use that BaseModel as a superclass, usually with just one attribute, __tablename__ - but arbitrary relationships and attributes can be added there (very similar to normal ORM use)
at module level: BaseModel.prepare(ENGINE, reflect=True)
Tests (using pytest) inject the environment variable (e.g. DB_URL) in conftest.py at module level.
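A minimal sketch of the points above (the table, task, and function names are illustrative assumptions, not from the original code):

# db.py - module-level engine and automapped metadata; config from os.environ
import os

from celery import Celery
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

ENGINE = create_engine(os.environ['DB_URL'], pool_recycle=3600, pool_size=10)

BaseModel = automap_base()

class Order(BaseModel):              # usually just __tablename__; relationships
    __tablename__ = 'orders'         # and attributes can be added as needed

BaseModel.prepare(ENGINE, reflect=True)   # reflection happens once, at import

def get_session():
    """Session factory - called once per task, one unit of work each."""
    return Session(bind=ENGINE)

app = Celery()

def run_task(db_session, order_id):
    # the "task function": receives the session explicitly, easy to test
    db_session.query(Order).get(order_id)

@app.task
def order_task(order_id):
    db_session = get_session()
    try:
        run_task(db_session, order_id)
        db_session.commit()
    finally:
        db_session.close()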
One important point: the database session is always initiated (that is, the factory function is called) in the task function and propagated into all other functions explicitly. This allows units of work to be controlled naturally, usually one transaction per task. It also simplifies testing, because every database-using function can be given a fake or a real (test) database session.
"Task function" in the above means a function that is called from the function decorated with task - this way the task function can be tested without the task machinery.
This is only a partial solution because re-reflection is not covered. If task workers can be stopped for a moment, this does not pose a problem: these are usually background tasks, and the database experiences downtime during schema changes anyway. Workers can also be restarted by some external watchdog, which can monitor database changes. This can be made convenient by using supervisord or some other way to control Celery workers running in the foreground.
All in all, after I solved the problem as described above, I value the "explicit is better than implicit" philosophy even more. All those magical "app"s and "request"s, be it in Celery or Flask, may bring minuscule abbreviations to function signatures, but I'd rather pass some kind of context down the call chain for improved testability and better understanding and management of context.
Related
I am trying to plan out where to put my MySQL connection code -
MCPConnection *conn = [[MCPConnection alloc] initToHost:@"" withLogin:@"" usingPort:@""];
etc. from MCPKit on the mac. My aim is to make as few connection instances as possible through the whole app (if possible just one will be awesome).
The only 3 places I can think of putting it are:
The app delegate - but then how would I use the same connection in other classes (data encapsulation)?
Placing the code in each and every window/view controller that requires it to fulfil each individual purpose (but this means a lot more code and a large number of connections, which I do not want).
Or finally, setting up an individual class for all the MySQL queries and a single connection (but then how to do that so that the methods in this class are easily callable, and so that the SQL statements/queries can be adaptive enough for every other class that uses the methods).
I hope that hasn't been too confusing - the main point is this: where do I put the connection code, does it carry between classes well, or does it need to be called individually and successively throughout use of the app?
Thank you for your time.
As a simple solution, you can use the Singleton pattern: replace the init code with a call to a static getInstance. The only public method of the singleton wrapper would be getConnection, which you then use as normal. Yes, you can pull common SQL-building methods into that class, but that leads to code bloat; better to use the Flyweight pattern for separate-type queries.
The only drawback of such a design is that access to that single shared database connection must always be serialized, which may render some of your UI slow and unresponsive if multiple threads access the database simultaneously.
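The shape of that design, sketched in Python for brevity (the pattern carries over to Objective-C directly; open_database_connection is a stand-in for the MCPConnection init call):

import threading

def open_database_connection():
    # placeholder for [[MCPConnection alloc] initToHost:... withLogin:... usingPort:...]
    return object()

class DatabaseManager:
    """Singleton wrapper: one shared connection for the whole app."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls):
        with cls._lock:                  # guard lazy creation across threads
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def __init__(self):
        self.connection = open_database_connection()

    def get_connection(self):            # the only public accessor
        return self.connection

conn = DatabaseManager.get_instance().get_connection()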
Using Linq-to-SQL, I'd like to prefetch some data.
1) The common solution is to deal with DataLoadOptions, but in my architecture it won't work because:
the options have to be set before the first query
I'm using IoC, so I don't directly instantiate the DataContext (I cannot execute code at instantiation)
my DataContext is persistent for the duration of a web request
2) I have seen another possibility based on loading the data and its children in a method, then returning only the data (so the children are already loaded) - see an example here
Nonetheless, in my architecture, it cannot work:
My queries are cascaded out of my repository and can be consumed by many services that will add clauses
I work with interfaces, the concrete instances of the linq-to-sql objects do not leave the repositories (yes, you can work with interfaces AND add clauses)
My repositories are generic
Yes, this architecture is quite complicated, but it's very cool, as I can play with the code like Lego ;)
My question is: what are the other possibilities to prefetch data?
In my app I use perhaps a variation of your potential solution #2. It's somewhat difficult to explain, but simply: I chain and defer lazy loading in my model with custom lazy classes, so as to abstract away from the LinqToSql-specific deferred execution that I take advantage of with IQueryable. Benefits:
My domain model and service layer upwards do not have to depend on the LinqToSql provider (I can swap out my DAL with interfaces if I want to)
My service methods can and do return complete object graphs with multiple 'anchor points' for lazy loading, using classes that abstract away a particular lazy loading implementation - so I can use LinqToSql-specific deferred execution or something else (e.g. anonymous delegates; again, refer to this answer)
I can maintain IQueryable results throughout my app (even to the UI if i want to) thus allowing infinite LINQ query chaining without having to worry about performance.
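For illustration, a rough sketch (in Python, with invented names) of what such a custom lazy 'anchor point' can look like: the domain object holds a loader callable rather than a provider-specific query:

class Lazy:
    """Defers evaluation behind a provider-neutral interface."""
    def __init__(self, loader):
        self._loader = loader        # any callable: a query, a delegate, ...
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:         # deferred execution happens here
            self._value = self._loader()
            self._loaded = True
        return self._value

class Order:
    def __init__(self, order_id, lines_loader):
        self.order_id = order_id
        self.lines = Lazy(lines_loader)   # anchor point for lazy loading

order = Order(1, lambda: ["line1", "line2"])   # the loader could wrap any DAL
print(order.lines.get())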
I'm not aware of other possibilities; it seems like you've pushed LinqToSql to its limits (I may be wrong, however).
I think your best options at this point are:
Add some "non-generic" methods to your application to handle just the
specific scenarios where you want/need eager loading and don't
use your "normal", "generic" infrastructure for those methods.
Use an ORM that has more sophisticated support for eager and lazy loading.
I found a solution.
My answer is 'Dependency injection'.
It generally ships with IoC, and means you can have your IoC container manage the injection of classes at instantiation.
All I need is to inject a CustomDCParameter class when I instantiate a DC.
That class will contain the rules, and the constructor will apply all of them.
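A rough sketch of that idea (Python, with illustrative names following the description above rather than any real Linq-to-SQL API):

class CustomDCParameter:
    """Carries the rules the DataContext must apply at construction."""
    def __init__(self, rules):
        self.rules = rules                   # e.g. prefetch/load options

class DataContext:
    def __init__(self, params):
        for rule in params.rules:            # the constructor applies them all
            rule(self)

# the IoC container would be configured to supply the parameter object:
params = CustomDCParameter([lambda dc: setattr(dc, 'prefetch_orders', True)])
dc = DataContext(params)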
After reading the nice answers in this question, I watched the screencasts by Justin Etheredge. It all seems very nice, with a minimum of setup you get DI right from your code.
Now the question that creeps up on me is: why would you want to use a DI framework that doesn't use configuration files? Isn't that the whole point of using a DI infrastructure - so that you can alter the behaviour (the "strategy", so to speak) after building/releasing/whatever the code?
Can anyone give me a good use case that validates using a non-configured DI like Ninject?
I don't think you want a DI-framework without configuration. I think you want a DI-framework with the configuration you need.
I'll take spring as an example. Back in the "old days" we used to put everything in XML files to make everything configurable.
When switching to a fully annotated regime, you basically define which component roles your application contains. A given service may, for instance, have one implementation for the "regular runtime", while another implementation belongs in the "stubbed" version of the application. Furthermore, when wiring for integration tests you may be using a third implementation.
When looking at the problem this way, you quickly realize that most applications only contain a very limited set of component roles at runtime - these are the things that actually cause different versions of a component to be used. And usually a given implementation of a component is always bound to this role; it is really the reason of existence of that implementation.
So if you let the "configuration" simply specify which component roles you require, you can get away without much more configuration at all.
Of course, there's always going to be exceptions, but then you just handle the exceptions instead.
I'm on a path with krosenvold here, only with less text: within most applications, you have exactly one implementation per required "service". We simply don't write applications where each object needs 10 or more implementations of each service. So it would make sense to have a simple way to say "this is the default implementation; 99% of all objects using this service will be happy with it".
In tests, you usually use a specific mockup, so no need for any config there either (since you do the wiring manually).
This is what convention-over-configuration is all about. Most of the time, the configuration is simply a dumb repetition of something that the DI framework should know already :)
In my apps, I use the class object as the key to look up implementations and the "key" happens to be the default implementation. If my DI framework can't find an override in the config, it will just try to instantiate the key. With over 1000 "services", I need four overrides. That would be a lot of useless XML to write.
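A sketch of that lookup scheme (invented names): the requested class is the dictionary key, and it doubles as the default implementation whenever no override is configured:

class Container:
    def __init__(self, overrides=None):
        self.overrides = overrides or {}     # {key class: implementation class}

    def resolve(self, key):
        impl = self.overrides.get(key, key)  # no override: instantiate the key
        return impl()

class MailService:                           # key and default implementation
    def send(self, message):
        print("sending:", message)

class StubMailService(MailService):         # one of the few overrides
    def send(self, message):
        pass

Container().resolve(MailService).send("hi")                      # default
Container({MailService: StubMailService}).resolve(MailService)   # override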
With dependency injection, unit tests become very simple to set up, because you can inject mocks instead of real objects into your object under test. You don't need configuration for that; just create and inject the mocks in the unit test code.
I received this comment on my blog, from Nate Kohari:
Glad you're considering using Ninject! Ninject takes the stance that the configuration of your DI framework is actually part of your application, and shouldn't be publicly configurable. If you want certain bindings to be configurable, you can easily make your Ninject modules read your app.config. Having your bindings in code saves you from the verbosity of XML, and gives you type-safety, refactorability, and intellisense.
You don't even need to use a DI framework to apply the dependency injection pattern. You can simply make use of static factory methods for creating your objects, if you don't need configurability apart from recompiling code.
So it all depends on how configurable you want your application to be. If you want it to be configurable/pluggable without code recompilation, you'll want something you can configure via text or XML files.
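A minimal sketch of that approach (invented names): the static factory is the single place where concrete classes are chosen, so swapping an implementation means editing the factory and recompiling:

class SmtpMailer:
    def send(self, message): ...

class ReportService:
    def __init__(self, mailer):           # dependency comes in via constructor
        self.mailer = mailer

class Factory:
    @staticmethod
    def create_mailer():
        return SmtpMailer()               # change this line to swap implementations

    @staticmethod
    def create_report_service():
        return ReportService(Factory.create_mailer())

service = Factory.create_report_service()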
I'll second the use of DI for testing. I only really consider using DI at the moment for testing, as our application doesn't require any configuration-based flexibility - it's also far too large to consider at the moment.
DI tends to lead to cleaner, more separated design - and that gives advantages all round.
If you want to change the behavior after a release build, then you will need a DI framework that supports external configurations, yes.
But I can think of other scenarios in which this configuration isn't necessary: for example control the injection of the components in your business logic. Or use a DI framework to make unit testing easier.
You should read about PRISM in .NET (it's the set of best practices for building composite applications in .NET). Following these practices, each module "exposes" its implementation type inside a shared container. This way each module has clear responsibilities over "who provides the implementation for this interface". I think it will be clear enough once you understand how PRISM works.
When you use inversion of control you are helping to make your class do as little as possible. Let's say you have some Windows service that waits for files and then performs a series of processes on the file. One of the processes is to ZIP it and then email it.
public class ZipProcessor : IFileProcessor
{
    private readonly IZipService ZipService;
    private readonly IEmailService EmailService;

    public ZipProcessor(IZipService zipService, IEmailService emailService)
    {
        ZipService = zipService;
        EmailService = emailService;
    }

    public void Process(string fileName)
    {
        ZipService.Zip(fileName, Path.ChangeExtension(fileName, ".zip"));
        EmailService.SendEmailTo(................);
    }
}
Why would this class need to actually do the zipping and the emailing when you could have dedicated classes to do this for you? Obviously you wouldn't, but that's only a lead-up to my point :-)
In addition to not implementing the zipping and emailing itself, why should the class know which class implements each service? If you pass interfaces to the constructor of this processor, then it never needs to create an instance of a specific class; it is given everything it needs to do the job.
Using a D.I.C. you can configure which classes implement certain interfaces and then just get it to create an instance for you; it will inject the dependencies into the class.
var processor = Container.Resolve<ZipProcessor>();
So now, not only have you cleanly separated the class's functionality from shared functionality, but you have also prevented the consumer/provider from having any explicit knowledge of each other. This makes the code easier to understand, because there are fewer factors to consider at the same time.
Finally, when unit testing you can pass in mocked dependencies. When you test your ZipProcessor your mocked services will merely assert that the class attempted to send an email rather than it really trying to send one.
//Mock the ZIP
var mockZipService = MockRepository.GenerateMock<IZipService>();
mockZipService.Expect(x => x.Zip("Hello.xml", "Hello.zip"));

//Mock the email send
var mockEmailService = MockRepository.GenerateMock<IEmailService>();
mockEmailService.Expect(x => x.SendEmailTo(.................));

//Test the processor
var testSubject = new ZipProcessor(mockZipService, mockEmailService);
testSubject.Process("Hello.xml");

//Assert it used the services in the correct way
mockZipService.VerifyAllExpectations();
mockEmailService.VerifyAllExpectations();
So in short, you would want to do it to:
01: Prevent consumers from knowing explicitly which provider implements the services they need, which means there's less to understand at once when you read the code.
02: Make unit testing easier.
Pete
In all my projects till now, I have used the singleton pattern to access the application configuration throughout the application. Lately I have seen a lot of articles saying not to use the singleton pattern, because it does not promote testability and it hides the component's dependencies.
My question is: what is the best way to store application configuration that is easily accessible throughout the application, without passing the configuration object all over the application?
Thanks in Advance
Madhu
I think an application configuration is an excellent use of the Singleton pattern. I tend to use it myself to avoid rereading the configuration each time I access it, and because I like the configuration to be strongly typed (i.e., non-string values don't have to be converted each time). I usually build a few backdoor methods into my Singleton to support testability - i.e., the ability to inject an XML configuration so I can set it in my test, and the ability to destroy the Singleton so that it gets recreated when needed. Typically these are private methods that I access via reflection so that they are hidden from the public interface.
EDIT We live and learn. While I think application configuration is one of the few places to use a Singleton, I don't do this any more. Typically, now, I will create an interface and a standard class implementation using static, Lazy<T> backing fields for the configuration properties. This allows me to have the "initialize once" behavior for each property with a better design for testability.
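A rough Python analogue of that design (functools.cached_property playing the role of the Lazy<T> backing fields; the property names are invented): each value is read and converted once, on first access:

import os
from functools import cached_property

class AppConfig:
    """Initialize-once, strongly typed configuration properties."""

    @cached_property
    def connection_string(self):
        return os.environ.get('CONNECTION_STRING', '')

    @cached_property
    def timeout_seconds(self):            # converted from string exactly once
        return int(os.environ.get('TIMEOUT_SECONDS', '30'))

config = AppConfig()
print(config.timeout_seconds)            # first access reads and caches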
Use dependency injection to inject the single configuration object into any classes that need it. This way you can use a mock configuration for testing or whatever you want... you're not explicitly going out and getting something that needs to be initialized with configuration files. With dependency injection, you are not passing the object around either.
For that specific situation I would create one configuration object and pass it around to those who need it.
Since it is the configuration, it should be used only in certain parts of the app and does not necessarily need to be omnipresent.
However if you haven't had problems using them, and don't want to test it that hard, you should keep going as you did until today.
Read the discussion about why they are considered harmful. I think most of the problems come up when a lot of resources are being held by the singleton.
For the app configuration I think it would be safe to keep it like it is.
The singleton pattern seems to be the way to go. Here's a Setting class that I wrote that works well for me.
If any component relies on configuration that can be changed at runtime (for example theme support for widgets), you need to provide some callback or signaling mechanism to notify about the changed config. That's why it is not enough to pass only the needed parameters to the component at creation time (like color).
You also need to provide access to the config from inside of the component (pass complete config to component), or make a component factory that stores references to the config and all its created components so it can eventually apply the changes.
The former has the big downside that it clutters the constructors or blows up the interface, though it is maybe the fastest route for prototyping. If you take the "Law of Demeter" into account, it is a big no, because it violates encapsulation.
The latter has the advantage that components keep their specific interfaces and take only what they need, and as a bonus it gives you a central place for refactoring (the factory). In the long run, code maintenance will likely benefit from the factory pattern.
Also, even if the factory was a singleton, it would likely be used in far fewer places than a configuration singleton would have been.
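A sketch of the factory variant (invented names): the factory stores references to the config and to every component it created, so a changed config can be pushed to all of them:

class Widget:
    def __init__(self, color):
        self.color = color                 # takes only what it needs

    def apply_config(self, config):
        self.color = config['color']

class WidgetFactory:
    def __init__(self, config):
        self.config = config
        self.created = []                  # references to created components

    def create_widget(self):
        widget = Widget(self.config['color'])
        self.created.append(widget)
        return widget

    def config_changed(self, new_config):  # notify every created component
        self.config = new_config
        for widget in self.created:
            widget.apply_config(new_config)

factory = WidgetFactory({'color': 'red'})
w = factory.create_widget()
factory.config_changed({'color': 'blue'})  # w.color is now 'blue'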
Here is an example done using Castle.Core >> DictionaryAdapter and StructureMap
Here is a problem I've struggled with ever since I first started learning object-oriented programming: how should one implement a logger in "proper" OOP code?
By this, I mean an object that has a method that we want every other object in the code to be able to access; this method would output to console/file/whatever, which we would use for logging--hence, this object would be the logger object.
We don't want to establish the logger object as a global variable, because global variables are bad, right? But we also don't want to have to pass the logger object in the parameters of every single method we call on every single object.
In college, when I brought this up to the professor, he couldn't actually give me an answer. I realize that there are actually packages (for say, Java) that might implement this functionality. What I am ultimately looking for, though, is the knowledge of how to properly and in the OOP way implement this myself.
You do want to establish the logger as a global variable, because global variables are not bad. At least, they aren't inherently bad. A logger is a great example of the proper use of a globally accessible object. Read about the Singleton design pattern if you want more information.
There are some very well thought out solutions. Some involve bypassing OO and using another mechanism (AOP).
Logging doesn't really lend itself too well to OO (which is okay, not everything does). If you have to implement it yourself, I suggest just instantiating "Log" at the top of each class:
private final Log log = new Log(this);
and all your logging calls are then trivial: log.print("Hey");
Which makes it much easier to use than a singleton.
Have your logger figure out what class you are passing in and use that to annotate the log. Since you then have an instance of log, you can then do things like:
log.addTag("Bill");
And log can add the tag "Bill" to each entry so that you can implement better filtering for your display.
log4j and Chainsaw are a perfect out-of-the-box solution, though - if you aren't just being academic, use those.
A globally accessible logger is a pain for testing. If you need a "centralized" logging facility create it on program startup and inject it into the classes/methods that need logging.
How do you test methods that use something like this:
public class MyLogger
{
public static void Log(String Message) {}
}
How do you replace it with a mock?
Better:
public interface ILog
{
void Log(String message);
}
public class MyLog : ILog
{
public void Log(String message) {}
}
I've always used the Singleton pattern to implement a logging object.
You could look at the Singleton pattern.
Create the logger as a singleton class and then access it using a static method.
I think you should use AOP (aspect-oriented programming) for this, rather than OOP.
In practice a singleton / global method works fine, in my opinion. Preferably the global thing is just a framework to which you can connect different listeners (observer pattern), e.g. one for console output, one for database output, one for Windows EventLog output, etc.
Beware of overdesign though; I find that in practice a single class with just global methods can work quite nicely.
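A minimal sketch of that shape (invented names): one globally accessible entry point, with output targets attached as listeners:

class Log:
    _listeners = []

    @classmethod
    def add_listener(cls, listener):      # console, database, event log, ...
        cls._listeners.append(listener)

    @classmethod
    def write(cls, message):
        for listener in cls._listeners:   # broadcast to every target
            listener(message)

Log.add_listener(print)                   # a file or database sink attaches the same way
Log.write("application started")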
Or you could use the infrastructure the particular framework you work in offers.
The Enterprise Library Logging Application Block from Microsoft's Patterns & Practices group is a great example of implementing a logging framework in an OOP environment. They have some great documentation on how they implemented their logging application block, and all the source code is available for your own review or modification.
There are other similar implementations: log4net, log4j, log4cxx
The way they have implemented the Enterprise Library Logging Application Block is to have a static Logger class with a number of different methods that actually perform the log operation. If you were looking at patterns, this would probably be one of the better uses of the Singleton pattern.
I am all for AOP together with log4*. This really helped us.
Google gave me this article for instance. You can try to search more on that subject.
(IMHO) how 'logging' happens isn't part of your solution design; it's more a part of whatever environment you happen to be running in - like System and Calendar in Java.
Your 'good' solution is one that is as loosely coupled to any particular logging implementation as possible so think interfaces. I'd check out the trail here for an example of how Sun tackled it as they probably came up with a pretty good design and laid it all out for you to learn from!
use a static class; it has the least overhead and is accessible from all project types with a simple assembly reference
note that a Singleton is equivalent, but involves unnecessary allocation
if you are using multiple app domains, beware that you may need a proxy object to access the static class from domains other than the main one
also if you have multiple threads you may need to lock around the logging functions to avoid interleaving the output
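A sketch of that locking point (invented names): a static-style class that serializes writes so concurrent threads don't interleave their output:

import threading

class StaticLog:
    _lock = threading.Lock()

    @staticmethod
    def write(message):
        with StaticLog._lock:             # one writer at a time
            with open("app.log", "a") as f:
                f.write(message + "\n")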
IMHO logging alone is insufficient, that's why I wrote CALM
good luck!
Inserting logging in a transparent way would perhaps belong more to the Aspect-Oriented Programming idiom. But we're talking OO design here...
The Singleton pattern may be the most useful, in my opinion: you can access the Logging service from any context through a public, static method of a LoggingService class.
Though this may seem a lot like a global variable, it is not: it's properly encapsulated within the singleton class, and not everyone has access to it. This enables you to change the way logging is handled even at runtime, while protecting the workings of the logging from 'villain' code.
In the system I work on, we create a number of Logging 'singletons', in order to be able to distinguish messages from different subsystems. These can be switched on/off at runtime, filters can be defined, writing to file is possible... you name it.
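A sketch of that setup (invented names): one LoggingService class behind static access, with a named instance per subsystem that can be switched on and off at runtime:

class LoggingService:
    _instances = {}

    @classmethod
    def get(cls, subsystem):              # public, static access point
        if subsystem not in cls._instances:
            cls._instances[subsystem] = cls(subsystem)
        return cls._instances[subsystem]

    def __init__(self, subsystem):
        self.subsystem = subsystem
        self.enabled = True               # can be toggled at runtime

    def log(self, message):
        if self.enabled:
            print(f"[{self.subsystem}] {message}")

LoggingService.get("network").log("connected")
LoggingService.get("ui").enabled = False   # silence one subsystem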
I've solved this in the past by adding an instance of a logging class to the base class(es) (or interface, if the language supports that) for the classes that need to access logging. When you log something, the logger looks at the current call stack and determines the invoking code from that, setting the proper metadata about the logging statement (source method, line of code if available, class that logged, etc.) This way a minimal number of classes have loggers, and the loggers don't need to be specifically configured with the metadata that can be determined automatically.
This does add considerable overhead, so it is not necessarily a wise choice for production logging, but aspects of the logger can be disabled conditionally if you design it in such a way.
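A sketch of that approach using Python's inspect module (class names invented): the logger walks the call stack to derive the invoking code's metadata itself, which is exactly the overhead mentioned:

import inspect

class Logger:
    def log(self, message):
        caller = inspect.stack()[1]       # frame of the invoking code
        print(f"{caller.filename}:{caller.lineno} "
              f"in {caller.function}: {message}")

class LoggingBase:
    """Base class that hands every subclass a ready-made logger."""
    def __init__(self):
        self.log = Logger().log

class OrderService(LoggingBase):
    def place(self):
        self.log("order placed")          # metadata is derived automatically

OrderService().place()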
Realistically, I use commons-logging most of the time (I do a lot of work in Java), but there are aspects of the design I described above that I find beneficial. The benefits of having a robust logging system that someone else has already spent significant time debugging have outweighed the need for what could be considered a cleaner design (that's obviously subjective, especially given the lack of detail in this post).
I have had issues with static loggers causing permgen memory issues (at least, I think that's what the problem is), so I'll probably be revisiting loggers soon.
To avoid global variables, I propose to create a global REGISTRY and register your globals there.
For logging, I prefer to provide a singleton class or a class which provides some static methods for logging.
Actually, I'd use one of the existing logging frameworks.
One other possible solution is to have a Log class which encapsulates the logging / stored-procedure call. That way you can just instantiate a new Log() whenever you need it, without having to use a singleton.
This is my preferred solution, because the only dependency you need to inject is the database, if you're logging via the database. If you're using files, you potentially don't need to inject any dependencies at all. You can also entirely avoid a global or static logging class/function.