SQL Alchemy and generating ALTER TABLE statements - sqlalchemy

I want to programatically generate ALTER TABLE statements in SQL Alchemy to add a new column to a table. The column to be added should take its definition from an existing mapped class.
So, given an SQL Alchemy Column instance, can I generate the SQL schema definition(s) I would need for ALTER TABLE ... ADD COLUMN ... and CREATE INDEX ...?
I've played at a Python prompt and been able to see a human-readable description of the data I'm after:
>>> DBChain.__table__.c.rName
Column('rName', String(length=40, convert_unicode=False, assert_unicode=None, unicode_error=None, _warn_on_bytestring=False), table=<Chain>)
When I call engine.create_all() the debug log includes the SQL statements I'm looking to generate:
CREATE TABLE "Chain" (
...
"rName" VARCHAR(40),
...
)
CREATE INDEX "ix_Chain_rName" ON "Chain" ("rName")
I've heard of sqlalchemy-migrate, but that seems to be built around static changes and I'm looking to dynamically generate schema-changes.
(I'm not interested in defending this design, I'm just looking for a dialect-portable way to add a column to an existing table.)

After tracing engine.create_all() with a debugger I've discovered a possible answer:
>>> engine.dialect.ddl_compiler(
... engine.dialect,
... DBChain.__table__.c.rName ) \
... .get_column_specification(
... DBChain.__table__.c.rName )
'"rName" VARCHAR(40)'
The index can be created with:
sColumnElement = DBChain.__table__.c.rName
if sColumnElement.index:
sIndex = sa.schema.Index(
"ix_%s_%s" % (rTableName, sColumnElement.name),
sColumnElement,
unique=sColumnElement.unique)
sIndex.create(engine)

Related

Rails - How to reference model's own column value during update statement?

Is it possible to achieve something like this?
Suppose name and plural_name are fields of Animal's table.
Suppose pluralise_animal is a helper function which takes a string and returns its plural literal.
I cannot loop over the animal records for technical reasons.
This is just an example
Animal.update_all("plural_name = ?", pluralise_animal("I WANT THE ANIMAL NAME HERE, the `name` column's value"))
I want something similar to how you can use functions in MySQL while modifying column values. Is this out-of-scope or possible?
UPDATE animals SET plural_name = CONCAT(name, 's') -- just an example to explain what I mean by referencing a column. I'm aware of the problems in this example.
Thanks in advance
I cannot loop over the animal records for technical reasons.
Sorry, this cannot be done with this restriction.
If your pluralizing helper function is implemented in the client, then you have to fetch data values back to the client, pluralize them, and then post them back to the database.
If you want the UPDATE to run against a set of rows without fetching data values back to the client, then you must implement the pluralization logic in an SQL expression, or a stored function or something.
UPDATE statements run in the database engine. They cannot call functions in the client.
Use a ruby script to generate a SQL script that INSERTS the plural values into a temp table
File.open(filename, 'w') do |file|
file.puts "CREATE TEMPORARY TABLE pluralised_animals(id INT, plural varchar(50));"
file.puts "INSERT INTO pluralised_animals(id, plural) VALUES"
Animal.each.do |animal|
file.puts( "( #{animal.id}, #{pluralise_animal(animal.name)}),"
end
end
Note: replace the trailing comma(,) with a semicolon (;)
Then run the generated SQL script in the database to populate the temp table.
Finally run a SQL update statement in the database that joins the temp table to the main table...
UPDATE animals a
INNER JOIN pluralised_animals pa
ON a.id = pa.id
SET a.plural_name = pa.plural;

PyFlink Error/Exception: "Hive Table doesn't support consuming update changes which is produced by node PythonGroupAggregate"

Using Flink 1.13.1 and a pyFlink and a user-defined table aggregate function (UDTAGG) with Hive tables as source and sinks, I've been encountering an error:
pyflink.util.exceptions.TableException: org.apache.flink.table.api.TableException:
Table sink 'myhive.mydb.flink_tmp_model' doesn't support consuming update changes
which is produced by node PythonGroupAggregate
This is the SQL CREATE TABLE for the sink
table_env.execute_sql(
"""
CREATE TABLE IF NOT EXISTS flink_tmp_model (
run_id STRING,
model_blob BINARY,
roc_auc FLOAT
) PARTITIONED BY (dt STRING) STORED AS parquet TBLPROPERTIES (
'sink.partition-commit.delay'='1 s',
'sink.partition-commit.policy.kind'='success-file'
)
"""
)
What's wrong here?
I imagine you are executing a streaming query that is doing some sort of aggregation that requires updating previously emitted results. The parquet/hive sink does not support this -- once results are written, they are final.
One solution would be to execute the query in batch mode. Another would be to use a sink (or a format) that can handle updates. Or modify the query so that it only produces final results -- e.g., a time-windowed aggregation rather than an unbounded one.

update SQL table from foreign data source without first deleting all entries (but do delete entries no longer present)

I have a bunch of MySQL tables I work with where the ultimate data source from a very slow SQL server administered by someone else. My predecessors' solution to dealing with this is to do queries more-or-less like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table;')
other_python_wrapper('INSERT INTO local_table VALUES() %s;' % results)
The problem is this means you can never use values in local_table as foreign key constraints for other tables because they are constantly being deleted and added back into the table whenever you update it from the foreign source. However, if a record really does dis sapper in the results to the query on the foreign server, than that usually means you would want to trigger a cascade effect to drop records in other local tables that you've linked with a foreign key constraint to data propagated from the foreign table.
The only semi-reasonable solution I've come up with is to do something like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table_temp;')
other_python_wrapper('INSERT INTO local_table_temp VALUES() %s;' % results)
other_python_wrapper('DELETE FROM local_table WHERE primary_key NOT IN local_table_temp;')
other_python_wrapper('INSERT INTO local_table SELECT * FROM local_table_temp ON DUPLICATE KEY UPDATE local_table.col2 = local_table_temp.col2, local_table.col3 = local_table_temp.col3
The problem is there's a fair number of these tables and many of the tables have a large number of columns that need to be updated so it's tedious to write the same boiler-plate over & over. And if the table schema changes, there's more than one place you need to update the listing of all columns.
Is there any more concise way to do this with the SQL code?
Thanks!
I have a somewhat un-satisfactory answer to my own question. Since I'm using python to query the foreign Oracle database and put that into SQL, and I trust the format of the table and column names to be pretty well behaved, I can just wrap the whole procedure in python code and have python generate the update SQL update queries based off inspecting the tables.
For a number of reasons, I'd still like to see a better way to do this, but it works for me because:
I'm using an external scripting language that can inspect the database schema anyway.
I trust the database, column, and table names I'm working with to be well-behaved because these are all things I have direct control over.
My solution depends on the local SQL table structure; specifically which keys are primary keys. The code won't work without properly structured tables. But that's OK, because I can restructure the MySQL tables to make my python code work.
While I do hope someone else can think up a more-elegant and/or general-purpose solution, I will offer up my own python code to anyone who is working on a similar problem who can safely make the same assumptions I did above.
Below is a python wrapper I use to do simple SQL queries in python:
import config, MySQLdb
class SimpleSQLConn(SimpleConn):
'''simplified wrapper around a MySQLdb.connection'''
def __init__(self, **kwargs):
self._connection = MySQLdb.connect(host=config.mysql_host,
user=config.mysql_user,
passwd=config.mysql_pass,
**kwargs)
self._cursor = self._connection.cursor()
def query(self, query_str):
self._cursor.execute(query_str)
self._connection.commit()
return self._cursor.fetchall()
def columns(self, database, table):
return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table))g]
def primary_keys(self, database, table):
return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table)) if 'PRI' in x]
And here is the actual update function, using the SQL wrapper class above:
def update_table(database,
table,
mysql_insert_with_dbtable_placeholder):
'''update a mysql table without first deleting all the old records
mysql_insert_with_dbtable_placeholder should be set to a string with
placeholders for database and table, something like:
mysql_insert_with_dbtable_placeholder = "
INSERT INTO `%(database)s`.`%(table)s` VALUES (a, b, c);
note: code as is will update all the non-primary keys, structure
your tables accordingly
'''
sql = SimpleSQLConn()
query ='DROP TABLE IF EXISTS `%(database)s`.`%(table)s_temp_for_update`' %\
{'database': database, 'table': table}
sql.query(query)
query ='CREATE TABLE `%(database)s`.`%(table)s_temp_for_update` LIKE `%(database)s`.`%(table)s`'%\
{'database': database, 'table': table}
sql.query(query)
query = mysql_insert_with_dbtable_placeholder %\
{'database': database, 'table': '%s_temp_for_update' % table}
sql.query(query)
query = '''DELETE FROM `%(database)s`.`%(table)s` WHERE
(%(primary_keys)s) NOT IN
(SELECT %(primary_keys)s FROM `%(database)s`.`%(table)s_temp_for_update`);
''' % {'database': database,
'table': table,
'primary_keys': ', '.join(['`%s`' % key for key in sql.primary_keys(database, table)])}
sql.query(query)
update_columns = [col for col in sql.columns(database, table)
if col not in sql.primary_keys(database, table)]
query = '''INSERT into `%(database)s`.`%(table)s`
SELECT * FROM `%(database)s`.`%(table)s_temp_for_update`
ON DUPLICATE KEY UPDATE
%(update_cols)s
''' % {'database': database,
'table': table,
'update_cols' : ',\n'.join(['`%(table)s`.`%(col)s` = `%(table)s_temp_for_update`.`%(col)s`' \
% {'table': table, 'col': col} for col in update_columns])}
sql.query(query)

How to create table without schema in BigQuery by API?

Simply speaking I would create table with given name providing only data.
I have some JUnit's with sample data (jsons)
I have to provide schema for above files to create tables for them
I suppose that don't need provide above schemas.
Why? Because in BigQuery console I can create table from query (even such simple like: select 1, 'test') or I can upload json to create table with schema autodetection => probably could also do it programatically
I saw https://chartio.com/resources/tutorials/how-to-create-a-table-from-a-query-in-google-bigquery/#using-the-api and know that could parse jsons with data to queries and use Jobs.insert API to run them but it's over engineered and has some other disadvanteges e.g. boilerplate code.
After some research I found possibly simpler way of creating table on fly, but it doesn't work for me, code below:
Insert insert = bigquery.jobs().insert(projectId,
new Job().setConfiguration(
new JobConfiguration().setLoad(
new JobConfigurationLoad()
.setSourceFormat("NEWLINE_DELIMITED_JSON")
.setDestinationTable(
new TableReference()
.setProjectId(projectId)
.setDatasetId(dataSetId)
.setTableId(tableId)
)
.setCreateDisposition("CREATE_IF_NEEDED")
.setWriteDisposition(writeDisposition)
.setSourceUris(Collections.singletonList(sourceUri))
.setAutodetect(true)
)
));
Job myInsertJob = insert.execute();
JSON file which is used as a source data is pointed by sourceUri, looks like:
[
{
"stringField1": "value1",
"numberField2": "123456789"
}
]
Even if I used setCreateDisposition("CREATE_IF_NEEDED") I still receive error: "Not found: Table ..."
Is there any other method in API or better approach than above to exclude schema?
The code in your question is perfectly fine, and it does create table if it doesn't exist. However, it fails when you use partition id in place of table id, i.e. when destination table id is "table$20170323" which is what you used in your job. In order to write to partition, you will have to create table first.

Fluent NHibernate Schema output with errors when using list

I have two tables which are Many-To-One mapped. However, it is important to maintain the order of the second table, so when I use automapping, Fluent automapper creates a bag. I changed this to force a list by using this command:
.Override(Of ingredients)(Function(map) map.HasMany(Function(x) x.PolygonData).AsList())
(VB.NET syntax)
So I say "AsList" and instead of using a bag, the mapping xml which gets generated contains a list now. Fine so far. However,
the statement generated cannot be handled by MySQL. I use MySQL55Dialect to create the statements and I use exactly that version. But it creates the following create:
create table `ingredients` (
Id INTEGER NOT NULL AUTO_INCREMENT,
Name FLOAT,
Amout FLOAT,
Soup_id INTEGER,
Index INTEGER,
primary key (Id)
)
It crashes because of the line "Index INTEGER," but I don't know what to do here. Any ideas?
Thanks!!
Best,
Chris
I would suspect that Index could be a keyword for MySQL. To avoid such conflict, we can define different Index column name (sorry for C# notation)
HasMany(x => x.PolygonData)
.AsList(idx => idx.Column("indexColumnName").Type<int>())