SQLAlchemy: exclude few columns during column reflection? - sqlalchemy

I am mapping a table using declarative base and reflection. The DB table has 1k+ columns, but I want to map only a few hundred of them, whose names are available via a SQL query.
Using reflection, I get the column info in my event handler function, which allows me to modify the column's attributes, but I am unable to exclude the column from the mapping.
def column_reflect(inspector, table, column_info):
    # ...
class MYCLASS1(Base):
    __table__ = Table('MYTABLE1', mymetadata, autoload_with=myengine, autoload=True,
                      listeners=[('column_reflect', column_reflect)])
Does SQLAlchemy support skipping certain columns while using reflection?
SQLA versions: 0.8.3 and 0.9.0b1.

'van' answered my question, so I am marking it as answered.
The solution was to use the include_columns parameter of Table().
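A minimal sketch of that approach, assuming a recent SQLAlchemy (1.4 or later) and an in-memory SQLite database. The table name mytable1, its columns, and the wanted list are made up for illustration; in the real case the list of names would come from the SQL query mentioned in the question.

```python
from sqlalchemy import MetaData, Table, create_engine, text

# Hypothetical stand-in for the real database: an in-memory SQLite table.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE mytable1 ("
        "id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT, col_c TEXT)"
    ))

# Names of the columns to keep (assumed here; normally loaded from a query).
wanted = ["id", "col_a"]

metadata = MetaData()
# include_columns restricts reflection to the listed column names.
table = Table("mytable1", metadata, autoload_with=engine, include_columns=wanted)

reflected_names = [column.name for column in table.columns]
print(reflected_names)
```

The reflected Table can then be assigned to __table__ on a declarative class as in the question. The primary key column should stay in the list, since the ORM needs a primary key to map the class.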

Related

JSON flattening in AWS Glue ETL job creates inferred schema with duplicated columns

I'm relatively new to AWS Glue and using the visual AWS Glue studio at the moment. Kind of a niche issue I'm having here...
Context:
I'm building an ETL job that, among other things, should parse/flatten json from a string column to replace it with different fields in different format which I can select to load in my datawarehouse table.
Approach:
I first extract my data from the Glue catalog as a dynamicFrame (in this case only one table).
Then I'm trying to use the approach of unboxing and unnesting.
Let's call that json column data:
def transformTable(glueContext, dfc) -> DynamicFrameCollection:
    dyf = dfc.select(list(dfc.keys())[0])
    dyf = Unbox.apply(frame=dyf, path="data", format="json")
    dyf = UnnestFrame.apply(frame=dyf)
    return DynamicFrameCollection({"TranformedTable": dyf}, glueContext)
(Then I have a step to select the right frame from the frame collection, and then I can apply mapping to my fields and load.)
My issue:
Glue automatically infers the data types of my frame's schema (rather successfully), but it splits certain fields into several when the data type is ambiguous (similar to make_cols in the resolveChoice method). For example, I end up with two fields in the output schema, price_int and price_double, where price_int contains only the values that happened to be round numbers and is null everywhere else.
So it seems like the default behavior of this method is to split columns in case of data type doubt (make_cols).
I understand that I could write a resolveChoice for each field, but with this approach they are already split in separate columns in the output schema.
Note: There are dozens of fields in this json, so I'm trying to devise a blanket solution that automatically makes all the fields of the json available in the schema to select and map in the next step, and avoid having to add one line of code for each field I want to extract. (And the json structure will grow with new fields in the future, so I'm trying to limit future ETL maintenance...)
Questions/help needed:
Any idea if there's a way to change this default behavior (like in the resolveChoice method)?
Alternatively, is there a way to apply a kind of default resolveChoice to all problematic fields from the json unboxing? For instance, I could force all problematic fields into string (similar to 'project:string'), and then reformat if needed in the applyMapping step. But resolveChoice seems to need to be applied field by field...
What's a different/better approach I could try? I would like to keep it as dynamic/automated as possible... e.g.:
I think I could maybe extract specific fields from the JSON line by line, but I'm not sure how (looks like the Unbox method is already splitting columns by format). And as explained, it's dozens of fields and growing... so it requires updating the code regularly, instead of just ticking boxes in the list of available fields.
The Relationalize method could be an option, but it creates distinct frames and this quickly becomes much more complex (there are actually several columns with json, which all need to be flattened...).
Creating crawlers or classifiers which are run automatically regularly for extracting the schema from that specific string column from a table should be an option as well...
Thanks in advance!

CakePHP: retrieve data from one table excluding the associated tables

I am struggling with a basic problem. I am using CakePHP 2.5. I run a find query on the Company model and receive all the data from companies together with its associations, but I only want the data from the company table, without the data from the related models. Can anyone help me with this? Below is my query.
$this->loadModel('Company');
$fields=array('id','name','logo','status');
$conditions=array('status'=>1);
$search_companies = $this->Company->find('first',
compact(array('conditions'=>$conditions,'fields'=>$fields)));
print_r($search_companies);die();
echo json_encode($search_companies);die();
Without seeing your data output, I am just going to take a stab at the problem.
Inside your $search_companies variable you are getting a multidimensional array probably with the other values of the other tables.
Why not just select the one array:
$wantedData = $search_companies['Company'];
// The key Company (which is the model) should be the data you are wanting.
Try setting the model's recursive value to -1:
$this->Company->recursive = -1;
// compact() expects variable names, so pass the names as strings
$search_companies = $this->Company->find('first',
    compact('conditions', 'fields'));
With this, the join queries are not fired, so you retrieve only the model's own data.
CakePHP lets you unbind some or all associations on any model using unbindModel. Inside unbindModel you specify the association type and the name(s) of the model(s) you want to unbind for that association type.
$this->CurrentModelName->unbindModel(array('AssociationType' => array('ModelNameYouWantToUnbind')));

Sequelize (v1.5) and node

How can I check whether a field exists in my table via the Sequelize ORM?
Note that I have already defined the full object model. I just need to check whether a particular field exists or not.
You can see what is inside your database via:
sequelize.getQueryInterface().describeTable('nameOfTableHere').success(function(data){})
If you want to check the table of a specific model you could also do this:
sequelize.getQueryInterface().describeTable(Model.tableName).success(function(data) {})
Since I had already defined the object model, the following expression gives an array of field names defined in the model.
Object.keys(Model.rawAttributes)

Multilingual text fields with SQLAlchemy

I am currently evaluating SQLAlchemy for a project. Here is my schema:
a LANGUAGE table, with a row for each language supported
a TRANSLATION table with (ID, LANGUAGE_ID, STR)
various tables will, instead of storing text, store TRANSLATION_IDs, for example, BOOK(ID, TITLE_TRANSLATION_ID, ABSTRACT_TRANSLATION_ID)
Now, assuming each request has the current language ID available (for example, through a thread variable...), I would need SQLAlchemy to automatically join the TRANSLATION table, and thus have text fields in the current language. Something like:
class Book(Base):
id = Column(Integer, primary_key=True)
title = TranslatableText()
abstract = TranslatableText()
When retrieving, the ORM would automatically join to the TRANSLATION table with the current language ID, and my_book.title would give me the title in the current language.
I also need this to work across relations: if a class contains foreign keys to other classes that also contain translatable text fields, I would ideally like those to be retrieved too.
Lastly, I would also need to be able to get to the TRANSLATION_ID for each field, for example through my_book.title_translation_id.
I am not expecting a complete solution, but I'd like to know if something like this is feasible, and where to start.
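To make the target concrete, here is a sketch of the join the ORM would have to emit, written with the standard sqlite3 module rather than SQLAlchemy. The lower-case table names, the sample rows, and the load_book helper are assumptions for illustration; it also assumes TRANSLATION is keyed by (ID, LANGUAGE_ID), so one translation ID covers all languages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.executescript("""
CREATE TABLE language (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE translation (
    id INTEGER,
    language_id INTEGER REFERENCES language(id),
    str TEXT,
    PRIMARY KEY (id, language_id)
);
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title_translation_id INTEGER,
    abstract_translation_id INTEGER
);
INSERT INTO language VALUES (1, 'en'), (2, 'fr');
INSERT INTO translation VALUES
    (10, 1, 'The Plague'), (10, 2, 'La Peste'),
    (11, 1, 'A short novel.'), (11, 2, 'Un court roman.');
INSERT INTO book VALUES (1, 10, 11);
""")


def load_book(book_id, language_id):
    """Return one book row with title and abstract in the given language."""
    return conn.execute(
        """
        SELECT b.id,
               b.title_translation_id,
               t.str AS title,
               b.abstract_translation_id,
               a.str AS abstract
        FROM book AS b
        LEFT JOIN translation AS t
               ON t.id = b.title_translation_id AND t.language_id = :lang
        LEFT JOIN translation AS a
               ON a.id = b.abstract_translation_id AND a.language_id = :lang
        WHERE b.id = :book
        """,
        {"lang": language_id, "book": book_id},
    ).fetchone()


book_fr = load_book(1, 2)
print(book_fr["title"], book_fr["title_translation_id"])
```

Each translatable field needs its own join against TRANSLATION, filtered by the current language, while the raw *_translation_id columns stay available on the row.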
You can use mixin and custom base classes: http://docs.sqlalchemy.org/en/latest/orm/extensions/declarative.html#mixin-and-custom-base-classes
Create one top-level class and write functions such as read, write and create on it. Always call those functions to create or read data from the database.
If you don't want to implement mixin classes, you can also use an event: http://docs.sqlalchemy.org/en/latest/orm/events.html#sqlalchemy.orm.events.MapperEvents.translate_row

Define custom POST method for MyDAC

I have three tables: objects (primary key object_ID), flags (primary key flag_ID), and object_flags (a cross table between objects and flags with some extra info).
I have a query returning all flags, and a one or zero if a given object has a certain flag:
SELECT
  f.*,
  of.*,
  of.object_ID IS NOT NULL AS object_has_flag
FROM
  flags f
  LEFT JOIN object_flags of
    ON (f.flag_ID = of.flag_ID) AND (of.object_ID = :objectID);
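The behaviour of this kind of query can be checked with a small sketch using Python's standard sqlite3 module. The sample rows, the name and note columns, and the alias ofl are made up for illustration, and SQLite stands in for MySQL here. Putting the object filter in the ON clause rather than in WHERE is what keeps flags the object does not have in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (flag_ID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE object_flags (object_ID INTEGER, flag_ID INTEGER, note TEXT);
INSERT INTO flags VALUES (1, 'fragile'), (2, 'archived');
INSERT INTO object_flags VALUES (7, 1, 'handle with care');
""")

QUERY = """
SELECT f.flag_ID,
       f.name,
       ofl.object_ID IS NOT NULL AS object_has_flag
FROM flags f
LEFT JOIN object_flags ofl
       ON f.flag_ID = ofl.flag_ID AND ofl.object_ID = :objectID
ORDER BY f.flag_ID
"""


def flags_for_object(object_id):
    """Return every flag with 1/0 telling whether the object has it."""
    return conn.execute(QUERY, {"objectID": object_id}).fetchall()


print(flags_for_object(7))
```

Every flag appears exactly once per call, with object_has_flag set to 1 only where a matching object_flags row exists for the given object.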
In the application (which is written in Delphi), all rows are loaded in a component. The user can assign flags by clicking check boxes in a table, modifying the data.
Suppose one line is edited. Depending on the value of object_has_flag, the following things have to be done:
If object_has_flag was true and still is true, an UPDATE should be done on the relevant row in object_flags.
If object_has_flag was false but is now true, an INSERT should be done.
If object_has_flag was true but is now false, the row should be deleted.
It seems that this cannot be done in one query https://stackoverflow.com/questions/7927114/conditional-replace-or-delete-in-one-query.
I'm using MyDAC's TMyQuery as a dataset. I have written separate code that executes the necessary queries to save changes to a row, but how do I couple this to the dataset? What event handler should I use, and how do I tell the TMyQuery that it should refresh instead of post?
EDIT: apparently, it is not completely clear what the problem is. The standard UpdateSQL, DeleteSQL and InsertSQL cannot be used because sometimes after editing a line (not deleting it or inserting a line), an INSERT or DELETE has to be done.
The short answer is, to paraphrase your answer here:
Look up the documentation for "Updating Data with MyDAC Dataset Components" (as of MyDAC 5.80).
Every TCustomDADataSet (such as TMyQuery) descendant has the capability to set update SQL statements using SQLInsert, SQLUpdate and SQLDelete properties.
TMyUpdateSQL is also a promising component for custom update operations.
It seems that the easiest way is to use the BeforePost event, and determine what has to be done using the OldValue and NewValue properties of several fields.
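The decision such a handler has to make can be sketched as follows. Python is used here only to illustrate the logic; the real handler would be Delphi code reading OldValue and NewValue of the object_has_flag field. The function name and the returned labels are hypothetical.

```python
def plan_flag_change(old_has_flag: bool, new_has_flag: bool) -> str:
    """Pick the statement to run on object_flags for one edited row."""
    if old_has_flag and new_has_flag:
        return "UPDATE"  # flag stays assigned, only the extra info changed
    if not old_has_flag and new_has_flag:
        return "INSERT"  # flag was newly ticked
    if old_has_flag and not new_has_flag:
        return "DELETE"  # flag was unticked
    return "NONE"        # flag was never assigned, nothing to save


for old, new in [(True, True), (False, True), (True, False), (False, False)]:
    print(old, new, plan_flag_change(old, new))
```

The fourth case, where the flag was unset and remains unset, is not listed in the question; treating it as a no-op is an assumption.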