I'm trying to make some schema changes to my db, using the SQLAlchemy table.create and sqlalchemy-migrate table.rename methods, plus some INSERT INTO ... SELECT statements. I want to wrap all of this in a transaction, but I can't figure out how. This is what I tried:
engine = sqlalchemy.engine_from_config(conf.local_conf, 'sqlalchemy.')
trans = engine.connect().begin()
try:
    old_metadata.tables['address'].rename('address_migrate_tmp', connection=trans)
    new_metadata.tables['address'].create(connection=trans)
except:
    trans.rollback()
    raise
else:
    trans.commit()
But it errors with:
AttributeError: 'RootTransaction' object has no attribute '_run_visitor'
(I tried using sqlalchemy-migrate's column.alter(name='newname'), but that errors, does not work in a transaction, and so leaves my db in a broken state. I also need to rename multiple columns, so I decided to roll my own code.)
Ah - I need to simply use the connection that the transaction was created on.
engine = sqlalchemy.engine_from_config(conf.local_conf, 'sqlalchemy.')
conn = engine.connect()
trans = conn.begin()
try:
    old_metadata.tables['address'].rename('address_migrate_tmp', connection=conn)
    new_metadata.tables['address'].create(bind=conn)
except:
    trans.rollback()
    raise
else:
    trans.commit()
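For what it's worth, the same pattern can be written more compactly with SQLAlchemy's engine.begin() context manager, which commits on success and rolls back on an exception (a sketch; rename() is still the sqlalchemy-migrate method from above). Note that rolling back DDL only works on databases with transactional DDL such as PostgreSQL; MySQL implicitly commits around most DDL statements.

engine = sqlalchemy.engine_from_config(conf.local_conf, 'sqlalchemy.')

# engine.begin() opens a connection and a transaction, commits when the
# block exits normally, and rolls back if an exception escapes it.
with engine.begin() as conn:
    old_metadata.tables['address'].rename('address_migrate_tmp', connection=conn)
    new_metadata.tables['address'].create(bind=conn)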
Related
I have a simple FastAPI endpoint that connects to a MySQL database using SQLAlchemy (based on the tutorial: https://fastapi.tiangolo.com/tutorial/sql-databases/).
I create a session using:
engine = create_engine(
    SQLALCHEMY_DATABASE_URL
)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
I create the dependency:
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
In my route I want to execute an arbitrary SQL statement, but I am not sure how to handle the session, connection, cursor etc. correctly (including closing them), which I learned the hard way is super important for performance.
@app.get("/get_data")
def get_data(db: Session = Depends(get_db)):
    ???
Ultimately the reason for this is that my table contains machine learning features with columns that are not determined beforehand. If there is a way to define a Base model with "all columns", that would work too, but I couldn't find that either.
I solved this using the https://www.encode.io/databases/ package instead. It handles all connections / sessions etc. under the hood. Simplified snippet:
database = databases.Database(DATABASE_URL)

@app.get("/read_db")
async def read_db():
    data = await database.fetch_all("SELECT * FROM USER_TABLE")
    return data
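One detail worth noting from the package's documentation: the Database object must be connected before the first query, typically in FastAPI's startup/shutdown events:

@app.on_event("startup")
async def startup():
    await database.connect()     # open the connection pool once

@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()  # release all pooled connections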
import pymysql
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://' + uname + ':' + password + '@' + server + ':' + port + '/' + db)
con = engine.connect()
df = pd.read_sql('SELECT schema_name FROM information_schema.schemata', con)
con.close()  # release the connection back to the pool
return df
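Alternatively, if you would rather stay on the Session injected by get_db from the question, here is a minimal sketch of running arbitrary SQL through it (assuming SQLAlchemy 1.4+, where results expose .mappings(); the table name is just an example):

from sqlalchemy import text

@app.get("/get_data")
def get_data(db: Session = Depends(get_db)):
    result = db.execute(text("SELECT * FROM USER_TABLE"))
    # .mappings() yields dict-like rows that FastAPI can serialize;
    # the session itself is closed by the get_db dependency
    return [dict(row) for row in result.mappings()]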
I am using Celery to run async jobs in Python. My code flow is as follows:
1. A Celery task gets some data from a remote API.
2. Celery beat picks up the task result from the Celery backend (Redis) and then inserts the result into MySQL.
But in step 2, before I insert the result data into MySQL, I check whether the data already exists. Although I do the check, duplicate data still gets inserted.
My code is as follows:
def get_task_result(logger=None):
    db = MySQLdb.connect(host=MYSQL_HOST, port=MYSQL_PORT, user=MYSQL_USER, passwd=MYSQL_PASSWD, db=MYSQL_DB, cursorclass=MySQLdb.cursors.DictCursor, use_unicode=True, charset='utf8')
    cursor = db.cursor()
    ....
    ....
    store_subdomain_result(db, cursor, asset_id, celery_task_result)
    ....
    ....
    cursor.close()
    db.close()
def store_subdomain_result(db, cursor, top_domain_id, celery_task_result, logger=None):
    subdomain_list = celery_task_result.get('result').get('subdomain_list')
    source = celery_task_result.get('result').get('source')
    for domain in subdomain_list:
        query_subdomain_sql = f'SELECT * FROM nw_asset WHERE domain="{domain}"'
        cursor.execute(query_subdomain_sql)
        sub_domain_result = cursor.fetchone()
        if sub_domain_result:
            asset_id = sub_domain_result.get('id')
            existed_source = sub_domain_result.get('source')
            if source not in existed_source:
                new_source = f'{existed_source},{source}'
                update_domain_sql = f'UPDATE nw_asset SET source="{new_source}" WHERE id={asset_id}'
                cursor.execute(update_domain_sql)
                db.commit()
        else:
            insert_subdomain_sql = f'INSERT INTO nw_asset(domain) values("{domain}")'
            cursor.execute(insert_subdomain_sql)
            db.commit()
I first select to check whether the data exists; if it does not exist, I do the insert. The code is as follows:
query_subdomain_sql = f'SELECT * FROM nw_asset WHERE domain="{domain}"'
cursor.execute(query_subdomain_sql)
sub_domain_result = cursor.fetchone()
I do this, but duplicate data still gets inserted, and I can't understand why.
I googled this question and some people suggest using INSERT IGNORE, REPLACE INTO, or a unique index, but I want to know why the code does not work as expected.
Also, could there be some cache in MySQL, so that when I do the SELECT the data is not really in MySQL yet (it is still being flushed) and the SELECT returns none?
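For reference, the unique index plus INSERT IGNORE suggestion mentioned above would look roughly like this (a sketch; the index name uniq_domain is made up). The usual explanation for duplicates despite a SELECT check is a race: two workers can both pass the check before either commits its insert, and only a database-level constraint closes that window.

# One-time schema change, e.g. in a migration:
#   ALTER TABLE nw_asset ADD UNIQUE INDEX uniq_domain (domain);

# With the unique index in place, INSERT IGNORE silently skips rows
# whose domain already exists instead of raising a duplicate-key error.
insert_subdomain_sql = 'INSERT IGNORE INTO nw_asset(domain) VALUES (%s)'
cursor.execute(insert_subdomain_sql, (domain,))
db.commit()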
I'd like to update a table with Django - something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first result is something like this - but that's nasty, isn't it?
list = ModelClass.objects.filter(name='bar')
for obj in list:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 version now has a bulk_update.
Old answer:
Refer to the following django documentation section
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't call the ModelClass.save() method (so if you have some logic inside it, it won't be triggered).
No django signals will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
Consider using django-bulk-update found here on GitHub.
Install: pip install django-bulk-update
Implement: (code taken directly from the project's README file)
import random
from bulk_update.helper import bulk_update

random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once, though it is suitable for smaller batches of tens to hundreds. The batch size that is right for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
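If query size becomes a concern, the project's README also shows restricting which columns are written and capping the rows per query, roughly:

# update only the name column, 100 rows per UPDATE query
bulk_update(people, update_fields=['name'], batch_size=100)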
Django 2.2 version now has a bulk_update method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, the best case is to batch the updates according to values. Say you have 1000 rows where you want to set a column to one of X values; you could prepare the batches beforehand and then run only X update queries (each essentially having the form of the first example above) plus the initial SELECT query.
If every row requires a unique value, there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event Sourcing if you need performance in this latter case.
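A sketch of that batching idea (compute_new_name is a hypothetical helper that decides the target value for a row):

from collections import defaultdict

# Group primary keys by the value each row should receive...
batches = defaultdict(list)
for obj in ModelClass.objects.filter(name='bar').only('pk', 'name'):
    batches[compute_new_name(obj)].append(obj.pk)

# ...then run one UPDATE per distinct value instead of one per row.
for new_value, pks in batches.items():
    ModelClass.objects.filter(pk__in=pks).update(name=new_value)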
Here is some useful content I found on the internet regarding the above question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name='bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name='bar').update(name="foo")  # one query, for the single value 'foo'
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name='bar')
for model_obj in model_qs:
    model_obj.name = "foo"  # or whatever the value is; for simplicity I'm using "foo" only
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list, ['name'])
Using an atomic transaction
from django.db import transaction

with transaction.atomic():
    model_qs = ModelClass.objects.filter(name='bar')
    for obj in model_qs:
        obj.name = 'foo'
        obj.save()
To update with same value we can simply use this
ModelClass.objects.filter(name = 'bar').update(name='foo')
To update with different values
obj_list = ModelClass.objects.filter(name='bar')
obj_to_be_update = []
for obj in obj_list:
    obj.name = "Dear " + obj.name
    obj_to_be_update.append(obj)
ModelClass.objects.bulk_update(obj_to_be_update, ['name'], batch_size=1000)
This won't trigger save() for every object; instead we collect all the objects to be updated in a list and write them in one bulk operation.
update() returns the number of rows updated in the table:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
You can refer to this link for more information on bulk update and create:
Bulk update and Create
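For completeness, the companion bulk-create call from the standard queryset API looks like this (field values are just examples):

ModelClass.objects.bulk_create(
    [ModelClass(name='foo'), ModelClass(name='bar')],
    batch_size=1000,  # insert in chunks of 1000 rows
)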
In one of the projects I am working on, I am using the transactions explicitly as follows:
from django.db import transaction

@transaction.commit_on_success
def some_view(request):
    """ Renders some view
    """
I am using Django 1.5.5 and in the docs it says:
The recommended way to handle transactions in Web requests is to tie them to the request and response phases via Django’s TransactionMiddleware.
It works like this: When a request starts, Django starts a transaction. If the response is produced without problems, Django commits any pending transactions. If the view function produces an exception, Django rolls back any pending transactions.
To activate this feature, just add the TransactionMiddleware middleware to your MIDDLEWARE_CLASSES setting:
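Roughly, that settings change looks like this (a sketch; your other middleware entries will differ):

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    # ... your other middleware ...
    # wraps each request in a transaction: commit on success, roll back on exception
    'django.middleware.transaction.TransactionMiddleware',
)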
I want to use the transactions on requests instead of tying them to a particular view that requires it but I am a little confused about how this'd work. Say I have a view as follows:
def some_view(request):
    """ Creates a user object.
    """
    context = {}
    first_name = request.POST['first_name']
    last_name = request.POST['last_name']
    email = request.POST['email']
    try:
        create_user(first_name, last_name, email)
        context['success'] = 'User %s was added to the database.' % (first_name)
    except IntegrityError, err:
        context['failure'] = 'An error occurred while adding user to the database: %s' % (str(err))
    except Exception, err:
        context['failure'] = '%s' % (str(err))
    return json_response(context)
In the above view we are handling the exceptions and returning a response and in the docs it states:
If the response is produced without problems, Django commits any pending transactions.
Q: Will the transactions be committed in the above mentioned view even if it raises an exception ?
What if we want to create multiple objects in a single request and only want to roll back the single entry that raises an exception, but commit all the others? For example, we read data from a file, and for each row we want to create a user object; we want all the users to be inserted into the database except for the ones that raise an error:
def some_view(request):
    """ Creates a user object.
    """
    context = {}
    data = # Read from file
    for row in data:
        first_name, last_name, email = row.split(",")
        try:
            create_user(first_name, last_name, email)
            context['success'] = 'User %s was added to the database.' % (first_name)
        except IntegrityError, err:
            context['failure'] = 'An error occurred while adding user to the database: %s' % (str(err))
        except Exception, err:
            context['failure'] = '%s' % (str(err))
    return json_response(context)
Q: How would the transactions work in this case ? Is it better to explicitly use transactions here ?
Update:
In my case I am using an Inherited model class. Example:
class BaseUser(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    email = models.EmailField(max_length=100, unique=True)

class UserA(BaseUser):
    phone = models.BigIntegerField()
    type = models.CharField(max_length=32)
So if I try to create a UserA object using the above view and it raises an exception, the BaseUser object is created with the given data but the UserA object is not. What I am trying to do is either create the UserA object or not commit any changes. Currently I am using transactions manually (as follows) and it seems to work fine; it's just that I want to switch to using transactions on HTTP requests instead.
from django.db import transaction

@transaction.commit_on_success
def some_view(request):
    """ Creates a user object.
    """
    context = {}
    data = # Read from file
    for row in data:
        first_name, last_name, email = row.split(",")
        try:
            sid = transaction.savepoint()
            create_user(first_name, last_name, email)
            context['success'] = 'User %s was added to the database.' % (first_name)
            transaction.savepoint_commit(sid)
        except IntegrityError, err:
            context['failure'] = 'An error occurred while adding user to the database: %s' % (str(err))
            transaction.savepoint_rollback(sid)
        except Exception, err:
            transaction.savepoint_rollback(sid)
            context['failure'] = '%s' % (str(err))
    return json_response(context)
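For reference, Django 1.5 also allows commit_on_success to be used as a context manager, which gives the same per-row all-or-nothing behaviour without manual savepoints. A sketch, with create_user as above:

from django.db import transaction

for row in data:
    first_name, last_name, email = row.split(",")
    try:
        # the BaseUser and UserA inserts commit together or roll back together
        with transaction.commit_on_success():
            create_user(first_name, last_name, email)
        context['success'] = 'User %s was added to the database.' % first_name
    except IntegrityError as err:
        context['failure'] = 'An error occurred while adding user to the database: %s' % str(err)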
You don't need transactions at all here. If an IntegrityError is raised, that means the database update couldn't even be done, so there is nothing to roll back.
Transactions are useful if you want to roll back all the updates if a single one fails.
I'm new to SqlAlchemy. We were working primarily with Flask, but in a particular case I needed a manual database connection. So I launched a new db connection with something like this:
write_engine = create_engine("mysql://user:pass@localhost/db?charset=utf8")
write_session = scoped_session(sessionmaker(autocommit=False,
                                            autoflush=False, bind=write_engine))
nlabel = write_session.query(Label).filter(Label.id==label.id).first() # Works
#Later in code
ms = Message("Some message")
write_session.add(ms) # Works fine
write_session.commit() # Errors out
Error looks like "AttributeError: 'SessionMaker' object has no attribute '_model_changes'"
What am I doing wrong?
From the documentation I think you might be missing the initialization of the Session object.
Try:
Session = scoped_session(sessionmaker(autocommit=False, autoflush=False,bind=write_engine))
write_session = Session()
It's a shot in the dark; I'm not intimately familiar with SQLAlchemy. Best of luck!
Your issue is that you are missing this line:
db_session._model_changes = {}
This appears to come from mixing a plain sessionmaker session with Flask-SQLAlchemy: Flask-SQLAlchemy's signalling session tracks model changes in a _model_changes dict, and its commit hook raises this AttributeError when the session was created outside Flask-SQLAlchemy.
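In context, a sketch of where that shim goes (assuming the setup from the question; the empty dict mimics what Flask-SQLAlchemy's own session provides):

db_session = write_session()    # the actual Session behind the scoped_session proxy
db_session._model_changes = {}  # dict that Flask-SQLAlchemy's commit hook expects

ms = Message("Some message")
db_session.add(ms)
db_session.commit()  # no longer raises AttributeError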