update SQL table from foreign data source without first deleting all entries (but do delete entries no longer present) - mysql

I have a bunch of MySQL tables I work with where the ultimate data source is a very slow SQL server administered by someone else. My predecessors' solution to dealing with this is to do queries more-or-less like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table;')
other_python_wrapper('INSERT INTO local_table VALUES() %s;' % results)
The problem is this means you can never use values in local_table as foreign key constraints for other tables because they are constantly being deleted and added back into the table whenever you update it from the foreign source. However, if a record really does disappear from the results of the query on the foreign server, then that usually means you would want to trigger a cascade effect to drop records in other local tables that you've linked with a foreign key constraint to data propagated from the foreign table.
The only semi-reasonable solution I've come up with is to do something like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table_temp;')
other_python_wrapper('INSERT INTO local_table_temp VALUES() %s;' % results)
other_python_wrapper('DELETE FROM local_table WHERE primary_key NOT IN (SELECT primary_key FROM local_table_temp);')
other_python_wrapper('INSERT INTO local_table SELECT * FROM local_table_temp ON DUPLICATE KEY UPDATE local_table.col2 = local_table_temp.col2, local_table.col3 = local_table_temp.col3;')
The problem is there's a fair number of these tables and many of the tables have a large number of columns that need to be updated so it's tedious to write the same boiler-plate over & over. And if the table schema changes, there's more than one place you need to update the listing of all columns.
Is there any more concise way to do this with the SQL code?
Thanks!

I have a somewhat unsatisfactory answer to my own question. Since I'm using Python to query the foreign Oracle database and put that into SQL, and I trust the format of the table and column names to be pretty well behaved, I can just wrap the whole procedure in Python code and have Python generate the SQL update queries by inspecting the tables.
For a number of reasons, I'd still like to see a better way to do this, but it works for me because:
I'm using an external scripting language that can inspect the database schema anyway.
I trust the database, column, and table names I'm working with to be well-behaved because these are all things I have direct control over.
My solution depends on the local SQL table structure; specifically which keys are primary keys. The code won't work without properly structured tables. But that's OK, because I can restructure the MySQL tables to make my python code work.
While I do hope someone else can think up a more-elegant and/or general-purpose solution, I will offer up my own python code to anyone who is working on a similar problem who can safely make the same assumptions I did above.
Below is a python wrapper I use to do simple SQL queries in python:
import config, MySQLdb

# SimpleConn is the author's own base class; its definition is not shown here.
class SimpleSQLConn(SimpleConn):
    '''simplified wrapper around a MySQLdb.connection'''

    def __init__(self, **kwargs):
        self._connection = MySQLdb.connect(host=config.mysql_host,
                                           user=config.mysql_user,
                                           passwd=config.mysql_pass,
                                           **kwargs)
        self._cursor = self._connection.cursor()

    def query(self, query_str):
        self._cursor.execute(query_str)
        self._connection.commit()
        return self._cursor.fetchall()

    def columns(self, database, table):
        # the first field of each DESCRIBE row is the column name
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table))]

    def primary_keys(self, database, table):
        # DESCRIBE marks primary-key columns with 'PRI' in the Key field
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table)) if 'PRI' in x]
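For reference, each row that MySQL's DESCRIBE returns has the form (Field, Type, Null, Key, Default, Extra), so the primary-key test can also look at the Key field directly instead of searching the whole row. Below is a minimal sketch on hard-coded rows; the sample rows and the helper name split_columns are made up for illustration, and no database is needed to run it.

```python
# Rows shaped like MySQL's DESCRIBE output:
# (Field, Type, Null, Key, Default, Extra). The sample data is invented.
SAMPLE_DESCRIBE_ROWS = [
    ("primary_key", "int(11)", "NO", "PRI", None, ""),
    ("col2", "varchar(40)", "YES", "", None, ""),
    ("col3", "varchar(40)", "YES", "MUL", None, ""),
]


def split_columns(describe_rows):
    """Return (primary_key_columns, other_columns) from DESCRIBE-style rows."""
    # Index 0 is the column name, index 3 is the Key field.
    primary = [row[0] for row in describe_rows if row[3] == "PRI"]
    others = [row[0] for row in describe_rows if row[3] != "PRI"]
    return primary, others


primary, others = split_columns(SAMPLE_DESCRIBE_ROWS)
print(primary)  # ['primary_key']
print(others)   # ['col2', 'col3']
```

Checking only the Key field avoids a false match if some other field (a default value, say) ever happened to contain the string 'PRI'.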
And here is the actual update function, using the SQL wrapper class above:
def update_table(database,
                 table,
                 mysql_insert_with_dbtable_placeholder):
    '''update a mysql table without first deleting all the old records

    mysql_insert_with_dbtable_placeholder should be set to a string with
    placeholders for database and table, something like:
    mysql_insert_with_dbtable_placeholder = "
    INSERT INTO `%(database)s`.`%(table)s` VALUES (a, b, c);"

    note: code as is will update all the non-primary keys, structure
    your tables accordingly
    '''
    sql = SimpleSQLConn()
    primary_keys = sql.primary_keys(database, table)

    # start from an empty staging table with the same structure
    query = 'DROP TABLE IF EXISTS `%(database)s`.`%(table)s_temp_for_update`' % \
        {'database': database, 'table': table}
    sql.query(query)
    query = 'CREATE TABLE `%(database)s`.`%(table)s_temp_for_update` LIKE `%(database)s`.`%(table)s`' % \
        {'database': database, 'table': table}
    sql.query(query)

    # load the fresh data into the staging table
    query = mysql_insert_with_dbtable_placeholder % \
        {'database': database, 'table': '%s_temp_for_update' % table}
    sql.query(query)

    # delete local rows whose primary key is no longer in the fresh data
    query = '''DELETE FROM `%(database)s`.`%(table)s` WHERE
        (%(primary_keys)s) NOT IN
        (SELECT %(primary_keys)s FROM `%(database)s`.`%(table)s_temp_for_update`);
        ''' % {'database': database,
               'table': table,
               'primary_keys': ', '.join(['`%s`' % key for key in primary_keys])}
    sql.query(query)

    # insert new rows and update the non-key columns of existing ones
    update_columns = [col for col in sql.columns(database, table)
                      if col not in primary_keys]
    query = '''INSERT INTO `%(database)s`.`%(table)s`
        SELECT * FROM `%(database)s`.`%(table)s_temp_for_update`
        ON DUPLICATE KEY UPDATE
        %(update_cols)s
        ''' % {'database': database,
               'table': table,
               'update_cols': ',\n'.join(['`%(table)s`.`%(col)s` = `%(table)s_temp_for_update`.`%(col)s`'
                                          % {'table': table, 'col': col} for col in update_columns])}
    sql.query(query)
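To see the effect of the delete-then-upsert sequence on foreign keys without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the code above: SQLite stands in for MySQL, the upsert is written as a separate UPDATE and INSERT instead of ON DUPLICATE KEY UPDATE, and the notes table and the sync_table name are invented for the example.

```python
import sqlite3


def sync_table(conn, fresh_rows):
    """Bring local_table in line with fresh_rows without emptying it first."""
    cur = conn.cursor()
    # Stage the freshly fetched rows.
    cur.execute("DELETE FROM local_table_temp")
    cur.executemany("INSERT INTO local_table_temp VALUES (?, ?, ?)", fresh_rows)
    # Drop rows that vanished upstream; ON DELETE CASCADE cleans up children.
    cur.execute(
        "DELETE FROM local_table WHERE primary_key NOT IN "
        "(SELECT primary_key FROM local_table_temp)"
    )
    # Update the rows that still exist (every remaining row is in the staging table).
    cur.execute(
        "UPDATE local_table SET "
        "col2 = (SELECT t.col2 FROM local_table_temp t "
        "        WHERE t.primary_key = local_table.primary_key), "
        "col3 = (SELECT t.col3 FROM local_table_temp t "
        "        WHERE t.primary_key = local_table.primary_key)"
    )
    # Insert the rows that are new.
    cur.execute(
        "INSERT INTO local_table SELECT * FROM local_table_temp "
        "WHERE primary_key NOT IN (SELECT primary_key FROM local_table)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces cascades when this is on
conn.executescript("""
    CREATE TABLE local_table (primary_key INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
    CREATE TABLE local_table_temp (primary_key INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT);
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        item INTEGER REFERENCES local_table(primary_key) ON DELETE CASCADE
    );
""")

sync_table(conn, [(1, "a", "b"), (2, "c", "d")])
conn.execute("INSERT INTO notes VALUES (10, 1), (20, 2)")
conn.commit()

# Row 1 has disappeared upstream, row 2 changed, row 3 is new.
sync_table(conn, [(2, "c2", "d"), (3, "e", "f")])

local_rows = conn.execute("SELECT * FROM local_table ORDER BY primary_key").fetchall()
note_rows = conn.execute("SELECT * FROM notes").fetchall()
print(local_rows)  # [(2, 'c2', 'd'), (3, 'e', 'f')]
print(note_rows)   # [(20, 2)]
```

The note attached to row 2 survives the refresh because row 2 is updated in place, while the note attached to row 1 is removed by the cascade.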

Related

Rails - How to reference model's own column value during update statement?

Is it possible to achieve something like this?
Suppose name and plural_name are fields of Animal's table.
Suppose pluralise_animal is a helper function which takes a string and returns its plural literal.
I cannot loop over the animal records for technical reasons.
This is just an example
Animal.update_all("plural_name = ?", pluralise_animal("I WANT THE ANIMAL NAME HERE, the `name` column's value"))
I want something similar to how you can use functions in MySQL while modifying column values. Is this out-of-scope or possible?
UPDATE animals SET plural_name = CONCAT(name, 's') -- just an example to explain what I mean by referencing a column. I'm aware of the problems in this example.
Thanks in advance
"I cannot loop over the animal records for technical reasons."
Sorry, this cannot be done with this restriction.
If your pluralizing helper function is implemented in the client, then you have to fetch data values back to the client, pluralize them, and then post them back to the database.
If you want the UPDATE to run against a set of rows without fetching data values back to the client, then you must implement the pluralization logic in an SQL expression, or a stored function or something.
UPDATE statements run in the database engine. They cannot call functions in the client.
Use a Ruby script to generate a SQL script that inserts the plural values into a temp table:
File.open(filename, 'w') do |file|
  file.puts "CREATE TEMPORARY TABLE pluralised_animals(id INT, plural VARCHAR(50));"
  file.puts "INSERT INTO pluralised_animals(id, plural) VALUES"
  Animal.find_each do |animal|
    # quote the string so the generated SQL is valid
    file.puts "(#{animal.id}, #{Animal.connection.quote(pluralise_animal(animal.name))}),"
  end
end
Note: replace the final trailing comma (,) with a semicolon (;)
Then run the generated SQL script in the database to populate the temp table.
Finally run a SQL update statement in the database that joins the temp table to the main table...
UPDATE animals a
INNER JOIN pluralised_animals pa
ON a.id = pa.id
SET a.plural_name = pa.plural;

SqlAlchemy table name reflection using an efficient method

I am using the code below to extract the table names in a database on a GET call in a Flask app:
session = db.session()
qry = session.query(models.BaseTableModel)
results = session.execute(qry)
table_names = []
for row in results:
    for column, value in row.items():
        # this seems like a bit of a hack
        if column == "tables_table_name":
            table_names.append(value)
print('{0}: '.format(table_names))
Given that tables in the database may be added or deleted regularly, is the code above an efficient and reliable way to get the names of the tables in the database?
One obvious optimization is to use row["tables_table_name"] instead of the second loop.
Assuming that BaseTableModel is a table that contains the names of all other tables, then you're using the fastest approach to get this data.
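If the goal is simply the list of tables that exist right now, SQLAlchemy's runtime inspector may be an alternative to maintaining a table of table names. The sketch below is only an illustration: it uses an in-memory SQLite engine with two made-up tables, and the helper name current_table_names is hypothetical. In the Flask app the engine would presumably come from the existing db object instead.

```python
from sqlalchemy import create_engine, inspect, text

# In-memory SQLite engine and sample tables, used only for illustration.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE players (id INTEGER PRIMARY KEY)"))
    conn.execute(text("CREATE TABLE teams (id INTEGER PRIMARY KEY)"))


def current_table_names(engine):
    """Ask the database itself which tables exist at the time of the call."""
    # A fresh inspector per call avoids reusing earlier reflection results.
    return sorted(inspect(engine).get_table_names())


names = current_table_names(engine)
print(names)  # ['players', 'teams']
```

Because the names are read from the database catalog on each call, tables that are added or dropped between requests are picked up without any bookkeeping.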

sql alchemy column value dependent on other table

Suppose one table has a column that should hold the number of times that table's 'id' occurs in a column of another table. Is there a way to do that?
So if there was a table 'player' of every player, and a table 'goals' that listed every goal scored, is there an easy way to autoupdate the player column every time a goal they score is added to the goal table?
Another example would be a 'team' and 'players' table, where the table updates team.number_of_players every time a player is added with player.team_name == team.name or something like that.
Would using JSON as a way of holding {'username': True} or something like that for each user be worthwhile?
You have several ways to implement your idea:
Easiest way: you can update your columns with an update query, something like this:
try:
    player = Player(name='New_player_name', team_id=3)
    Session.add(player)
    Session.flush()
    # filter on the new player's team, not on the Player.team_id column
    Session.query(Team).filter(Team.id == player.team_id).update({Team.players_number: Team.players_number + 1})
    Session.commit()
except SQLAlchemyError:
    Session.rollback()
    # error processing
You can implement an SQL trigger. The implementation differs between DBMSs, so read about it in the documentation of your DBMS.
You can implement an SQLAlchemy trigger, like this:
from sqlalchemy import event

class Team(Base):
    ...

class Player(Base):
    ...

    @staticmethod
    def increment_players_number(mapper, connection, player):
        try:
            Session.query(Team).filter(Team.id == player.team_id)\
                .update({Team.players_number: Team.players_number + 1})
        except SQLAlchemyError:
            Session.rollback()
            # error processing

event.listen(Player, 'after_insert', Player.increment_players_number)
As you see, there are always two queries, because you should perform two procedures: insert and update. I think (but I'm not sure) that some DBMS can process queries like this:
UPDATE table1 SET column = column + 1 WHERE id = SOMEID AND (INSERT INTO table2 values (VALUES))
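The SQL-trigger option mentioned above can be sketched with Python's built-in sqlite3 module. This is only an illustration in SQLite's trigger syntax, which differs from other DBMSs; the table layout, the trigger names and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (
        id INTEGER PRIMARY KEY,
        name TEXT,
        players_number INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE player (
        id INTEGER PRIMARY KEY,
        name TEXT,
        team_id INTEGER REFERENCES team(id)
    );
    -- Keep the counter in step with inserts and deletes on player.
    CREATE TRIGGER player_added AFTER INSERT ON player
    BEGIN
        UPDATE team SET players_number = players_number + 1
        WHERE id = NEW.team_id;
    END;
    CREATE TRIGGER player_removed AFTER DELETE ON player
    BEGIN
        UPDATE team SET players_number = players_number - 1
        WHERE id = OLD.team_id;
    END;
""")

conn.execute("INSERT INTO team (id, name) VALUES (3, 'Reds')")
conn.executemany(
    "INSERT INTO player (name, team_id) VALUES (?, 3)", [("Ann",), ("Bo",)]
)
conn.execute("DELETE FROM player WHERE name = 'Bo'")

count = conn.execute("SELECT players_number FROM team WHERE id = 3").fetchone()[0]
print(count)  # 1
```

With a database-side trigger the counter stays correct even when rows are inserted by code that does not go through the ORM.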

SQL Alchemy and generating ALTER TABLE statements

I want to programmatically generate ALTER TABLE statements in SQL Alchemy to add a new column to a table. The column to be added should take its definition from an existing mapped class.
So, given an SQL Alchemy Column instance, can I generate the SQL schema definition(s) I would need for ALTER TABLE ... ADD COLUMN ... and CREATE INDEX ...?
I've played at a Python prompt and been able to see a human-readable description of the data I'm after:
>>> DBChain.__table__.c.rName
Column('rName', String(length=40, convert_unicode=False, assert_unicode=None, unicode_error=None, _warn_on_bytestring=False), table=<Chain>)
When I call engine.create_all() the debug log includes the SQL statements I'm looking to generate:
CREATE TABLE "Chain" (
...
"rName" VARCHAR(40),
...
)
CREATE INDEX "ix_Chain_rName" ON "Chain" ("rName")
I've heard of sqlalchemy-migrate, but that seems to be built around static changes and I'm looking to dynamically generate schema-changes.
(I'm not interested in defending this design, I'm just looking for a dialect-portable way to add a column to an existing table.)
After tracing engine.create_all() with a debugger I've discovered a possible answer:
>>> engine.dialect.ddl_compiler(
... engine.dialect,
... DBChain.__table__.c.rName ) \
... .get_column_specification(
... DBChain.__table__.c.rName )
'"rName" VARCHAR(40)'
The index can be created with:
sColumnElement = DBChain.__table__.c.rName
if sColumnElement.index:
    sIndex = sa.schema.Index(
        "ix_%s_%s" % (rTableName, sColumnElement.name),
        sColumnElement,
        unique=sColumnElement.unique)
    sIndex.create(engine)
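The column specification can probably also be obtained through the documented CreateColumn construct rather than by calling the DDL compiler directly. The sketch below makes several assumptions: the Chain table is a stand-in for DBChain.__table__ (only rName comes from the question), the SQLite dialect is picked arbitrarily, and add_column_sql is a hypothetical helper name. It only builds the statement text and does not execute it.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import sqlite
from sqlalchemy.schema import CreateColumn

metadata = MetaData()
# Stand-in for DBChain.__table__; only the rName column comes from the question.
chain = Table(
    "Chain",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("rName", String(40), index=True),
)


def add_column_sql(column, dialect):
    """Build an ALTER TABLE ... ADD COLUMN statement for an existing Column."""
    # Let the dialect quote the table name the same way it quotes the column.
    table_name = dialect.identifier_preparer.format_table(column.table)
    column_spec = CreateColumn(column).compile(dialect=dialect)
    return "ALTER TABLE %s ADD COLUMN %s" % (table_name, column_spec)


statement = add_column_sql(chain.c.rName, sqlite.dialect())
print(statement)
```

Passing a different dialect object (for example the one from engine.dialect) should produce the quoting and type names of that backend, which is the dialect-portable part of the idea.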

linq-to-sql How can I get a few rows that don't match my existing rows?

I have a few rows of data pulled into business objects via linq-to-sql from large tables.
Now I want to get a few rows that don't match to test my comparison functions.
Using what I thought would work I get a NotSupportedException:
Local sequence cannot be used in LINQ to SQL implementation of query operators except the Contains() operator.
Here's the code:
//This table has a 2 field primary key, the other has a single
var AllNonMatches = from c in dc.Acaps
                    where !Matches.Rows.Any((row) => row.Key.Key == c.AppId & row.Key.Value == c.SeqNbr)
                    select c;
foreach (var item in AllNonMatches.Take(100)) //Exception here
{}
The table has a compound primary key: AppId and SeqNbr.
Matches.Rows is defined as a dictionary of KeyValuePair(AppId, SeqNbr), and the local sequence the exception refers to appears to be that local dictionary.
Could you provide more information on the structure and the name(s) of the table(s) plz?
Not sure what you're trying to do...
edit:
Ok.. I think I get it now...
It appears you can't merge/join local tables (dictionary) with a SQL table.
If you can, I'm afraid I don't know how to do it.
The simplest solution I can think of is to put those results in a table ("Match" for instance) with foreign keys related to your table "Acaps" and then use linq-to-sql, like:
var AllNonMatches = dc.Acaps.Where(p=>p.Matchs==null).Take(100).ToList();
Sorry I couldn't come up with any better =(
What about this:
var AllNonMatches = from c in dc.Acaps
where !(Matches.Rows.ContainsKey(c.AppId) && Matches.Rows.ContainsValue(c.SeqNbr))
select c;
That will work fine. I have also used the conditional AND operator (&&), which short-circuits, instead of the non-short-circuiting & operator; I think that helps performance.