How to add a new dialect to Alembic besides the built-in dialects?

Alembic supports only 5 built-in dialects: https://github.com/sqlalchemy/alembic/tree/master/alembic/ddl
Now I want to manage the schema in Apache Hive via Alembic, and I noticed that PyHive supports the SQLAlchemy interfaces, so technically Alembic could support Hive as a new dialect. I found the post "Integrate PyHive and Alembic", but it seems to require hacking alembic/ddl/impl.py inside the alembic package.
Is there a working way to do this? I don't mind contributing PRs to either Alembic or PyHive, but I need guidance.

I used this thread on the original mailing list to get enough information:
Does that mean a package that supports the SQLAlchemy interfaces (e.g., PyHive) must introduce a dependency on Alembic (since it uses alembic.ddl.impl.DefaultImpl)?
Well, you have to put it in a try/except ImportError block so that if Alembic isn't installed, it silently passes.
Is there any guidance on supporting this at the Alembic level in a pluggable way? E.g., declaring a HiveImpl class in the env.py of a project that uses Alembic?
You could put one in your env.py as well, but if you are the person working on the dialect you can have this built in; see the example in sqlalchemy-redshift: https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/blob/master/sqlalchemy_redshift/dialect.py#L27
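For reference, a minimal sketch of such a registration, following the sqlalchemy-redshift pattern (the HiveImpl name is made up here, and the 'hive' string assumes that is the name PyHive registers its SQLAlchemy dialect under):

# Could live in the dialect package (e.g. PyHive) or in your project's env.py.
try:
    from alembic.ddl.impl import DefaultImpl
except ImportError:
    # Alembic is an optional dependency here; skip silently if it isn't installed.
    pass
else:
    class HiveImpl(DefaultImpl):
        # Registration happens when the class is defined; __dialect__ must match
        # the SQLAlchemy dialect name (assumed to be "hive").
        __dialect__ = 'hive'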

Is it possible to write a dual pass checkstyle check?

I have two situations I need a Checkstyle check for. Let's say I have a bunch of objects with the annotation @BusinessLogic. I want to do a first pass through all *.java files, creating a Set with the fully qualified class names of these objects. Let's say ONE of the classes here is MyBusinessLogic. NEXT, as part of a custom Checkstyle checker, I want to fail the build if there are any lines of code that say "new MyBusinessLogic()" anywhere in the code. We want to force DI when objects are annotated with @BusinessLogic. Is this possible with Checkstyle? I am not sure Checkstyle does a dual pass.
Another option I am considering is a Gradle plugin that scans all Java files and writes the list of classes annotated with @BusinessLogic to a file, and then running Checkstyle after that, with my checker reading in that file.
My next situation is that I have a library delivered as a jar. In that jar, I also have classes annotated with @BusinessLogic, and I need to make sure those are also added to my list of classes that should not be newed up manually and should only be created via dependency injection.
Follow-up question from the previous question here, after reading through the Checkstyle docs:
How do I enforce this pattern via Gradle plugins?
thanks,
Dean
Is it possible to write a dual pass checkstyle check?
Possible, yes, but not officially supported. Support would come via https://github.com/checkstyle/checkstyle/issues/3540, but it hasn't been agreed on.
Multi-file validation is possible with FileSets (still not officially supported), but it becomes harder with TreeWalker checks. This is because TreeWalker doesn't chain finishProcessing to the checks. You can implement your own TreeWalker that chains finishProcessing to the implementations of AbstractCheck.
You will have to do everything in one pass with this method. Log all "new XXX" occurrences and all classes with the annotation @YYY. In the finishProcessing method, correlate the information between the two and print a violation when you have a match.
I have a library delivered as a jar
Checkstyle does not support reading JARs or bytecode. You can always create a hard-coded list as an alternative. The only other way is to build your own reader into Checkstyle.

Reflected SQLAlchemy metadata in celery tasks?

For better testability and other reasons, it is good to keep the SQLAlchemy database session configuration non-global, as described very well in the following question:
how to setup sqlalchemy session in celery tasks with no global variable (and also discussed in https://github.com/celery/celery/issues/3561)
Now the question is how to handle metadata elegantly. If my understanding is correct, metadata can be obtained once, e.g.:
from sqlalchemy import create_engine, MetaData

engine = create_engine(DB_URL, encoding='utf-8', pool_recycle=3600,
                       pool_size=10)
# db_session = get_session()  # this is the old global session
meta = MetaData()
meta.reflect(bind=engine)
Reflecting on each task execution is not good for performance reasons; metadata is a more or less stable and thread-safe structure (as long as we only read it).
However, the metadata sometimes changes (Celery is not the "owner" of the DB schema), causing errors in workers.
What could be an elegant way to deal with meta in a testable way, while still being able to react to underlying DB changes? (Alembic is in use, if that is relevant.)
I was thinking of using the Alembic version change as a signal to re-reflect, but I am not quite sure how to make it work nicely in Celery. For instance, if more than one worker senses a change at the same time, the global meta may be handled in a non-thread-safe way. A rough sketch of this idea is included below.
If it matters, Celery use in this case is standalone; no web framework modules/apps are present in the Celery app. The problem is also simplified in that only SQLAlchemy Core is in use, not the ORM.
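Roughly, the re-reflection I had in mind would look something like this (just a sketch; it assumes the default alembic_version table name and uses a per-process lock, so each prefork worker would keep its own copy of meta):

import threading

from sqlalchemy import MetaData, text

_meta_lock = threading.Lock()
_meta_version = None
meta = MetaData()

def get_meta(engine):
    # Return cached metadata, re-reflecting when the Alembic version changes.
    global meta, _meta_version
    with engine.connect() as conn:
        version = conn.execute(
            text("SELECT version_num FROM alembic_version")).scalar()
    if version != _meta_version:
        with _meta_lock:
            # re-check inside the lock so only one thread re-reflects
            if version != _meta_version:
                fresh = MetaData()
                fresh.reflect(bind=engine)
                meta = fresh
                _meta_version = version
    return meta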
This is only a partial solution, and it is for the SQLAlchemy ORM (but I guess something similar is easy to implement for Core).
Main points:
the engine is at module level, but its configuration (access URL, parameters) comes from os.environ
the session is created by its own factory function
at module level: BaseModel = automap_base(); table classes then use that BaseModel as a superclass, usually with just one attribute, __tablename__, but arbitrary relationships and attributes can be added there (very similar to normal ORM use)
at module level: BaseModel.prepare(ENGINE, reflect=True)
Tests (using pytest) inject the environment variable (e.g. DB_URL) in conftest.py at module level.
One important point: the database_session is always initiated (that is, the factory function is called) in the task function and propagated into all other functions explicitly. This allows units of work to be controlled naturally, usually one transaction per task. It also simplifies testing, because all database-using functions can be given either a fake or a real (test) database session.
The "task function" above is the function that is called from the function decorated with the task decorator; this way the task function can be tested without the task machinery.
This is only a partial solution because redoing the reflection is not covered. If the task workers can be stopped for a moment (the database experiences downtime due to schema changes anyway), this does not pose a problem, since these are usually background tasks. Workers can also be restarted by some external watchdog that monitors database changes. This can be made convenient by using supervisord or some other way to control Celery workers running in the foreground.
All in all, after solving the problem as described above, I value the "explicit is better than implicit" philosophy even more. All those magical "app"s and "request"s, be it in Celery or Flask, may shave a little off function signatures, but I'd rather pass some kind of context down the call chain for improved testability and better understanding and management of that context.
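A condensed sketch of the setup described above (the module split, the Order class, and the DB_URL variable are only placeholders):

# db.py -- module-level engine and automapped base, configured from the environment
import os

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import sessionmaker

ENGINE = create_engine(os.environ['DB_URL'], pool_recycle=3600, pool_size=10)

BaseModel = automap_base()

class Order(BaseModel):
    # columns are filled in by reflection; relationships and extra attributes
    # can be added here, much like with normal ORM classes
    __tablename__ = 'orders'

BaseModel.prepare(ENGINE, reflect=True)  # SQLAlchemy 1.x style, as above

def get_session():
    # Factory function: each task creates (and later closes) its own session.
    return sessionmaker(bind=ENGINE)()


# tasks.py -- the Celery task only wraps a plain, testable "task function"
from celery import Celery

from db import get_session

app = Celery('tasks')

def process_order(database_session, order_id):
    # all database access goes through the explicitly passed session
    ...

@app.task
def process_order_task(order_id):
    database_session = get_session()
    try:
        process_order(database_session, order_id)
        database_session.commit()
    finally:
        database_session.close()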

Writing unit tests for Solr plugin using JUnit4, including creating collections

I wrote a plugin for Solr which contains new stream expressions.
Now I'm trying to understand the best way to write unit tests for them:
Unit tests that need to include creating collections in Solr, so I can check whether my new stream expressions return the data they are supposed to.
I saw on the web that there is a class called "SolrTestCaseJ4", but I couldn't find out how to use it to create new collections in Solr, add data to them, and so on...
Can you please recommend which class I could use for that purpose, or any other way to test my new classes?
BTW, we are using Solr 7.1 in cloud mode and JUnit4.
Thanks in advance.
You could use MiniSolrCloudCluster.
Here is an example of how to create collections (all for unit tests):
https://github.com/lucidworks/solr-hadoop-common/blob/159cce044c1907e646c2644083096150d27c5fd2/solr-hadoop-testbase/src/main/java/com/lucidworks/hadoop/utils/SolrCloudClusterSupport.java#L132
Eventually I found a better class, which simplifies everything and implements more functionality than MiniSolrCloudCluster (it actually contains a MiniSolrCloudCluster as a member).
This class is called SolrCloudTestCase, and as you can see here, even Solr's own source code uses it in its unit tests.

.Net Core 1.0.0, multiple projects, and IConfiguration

TL;DR version:
What's the best way to use the .Net Core IConfiguration framework so values in a single appsettings.json file can be used in multiple projects within the solution? Additionally, I need to be able to access the values without using constructor injection (explanation below).
Long version:
To simplify things, let's say we have a solution with 3 projects:
Data: responsible for setting up the ApplicationDbContext, generates migrations
Services: class library with some business logic
WebApi: REST API with a Startup.cs
With this architecture, we have to use a work-around for the "add-migration" issue that remains in Core 1.0.0. Part of this work-around is that we have an ApplicationDbContextFactory class that must have a parameterless constructor (no DI) in order for the migration CLI to use it.
Problem: Right now we have connection strings living in two places:
ApplicationDbContextFactory for the migration work-around
in the WebApi's "appsettings.json" file
Prior to .Net Core, we could use ConfigurationManager to pull connection strings for all solution projects from one web.config file, based on the startup project. How do we use this new IConfiguration framework to store the connection strings in one place so that they can be used all over the solution? Additionally, I can't inject into the ApplicationDbContextFactory class's constructor... so that further complicates things (more so since they changed how the [FromServices] attribute works).
Side note: I would like to avoid implementing an entire DI middleware just to get attribute injection, since Core includes its own DI framework. If I can avoid that and still access appsettings.json values, that would be ideal.
If I need to add code, let me know; this post is already long enough, so I'll hold off on examples until requested. ;)

Gradle: configuration injection vs inheritance

The Gradle docs state (49.9):
Properties and methods declared in a project are inherited to all its subprojects. This is an alternative to configuration injection. But we think that the model of inheritance does not reflect the problem space of multi-project builds very well. In a future edition of this user guide we might write more about this.
I understand what configuration injection is doing in principle, but I'd like to understand more about the distinctions from inheritance, and why it's a better fit for multi-project builds.
Can anyone give me a few bullets on this?
Got the answer on the Gradle forums.
Essentially, configuration injection allows you to selectively apply properties to subprojects.