I have a model:
    class Foo(models.Model):
        first = models.CharField()
        second = models.CharField()
The data I have is:
    first  second
    1      2
    1      2
    1      2
    3      4
Now I want to delete all duplicate rows and keep one entry. The end result should be:
    first  second
    1      2
    3      4
How do I do this? I checked this question but could not figure it out properly.
I have tried:
    foo_ids = Foo.objects.annotate(first_c=Count('first'), second_c=Count('second')).filter(first_c__gt=1, second_c__gt=1).values('first', 'second', 'id')
Then I tried to figure out how to avoid deleting one entry from each list of duplicates.
I ended up doing this:
    from django.db.models import Count

    # Find every (first, second) combination that occurs more than once.
    duplicate_foo = Foo.objects.values('first', 'second').annotate(id_c=Count('id')).filter(id_c__gt=1)
    for dups in duplicate_foo:
        matches = Foo.objects.filter(first=dups['first'], second=dups['second'])
        for i, val in enumerate(matches):
            if i == 0:
                continue  # keep the first row of each group
            val.delete()
Not the most optimized solution, but it works.
It's an older thread, but neither answer fits the bill for large datasets: both lead to a huge number of queries.
You can use this generic method instead:
    from django.apps import apps
    from django.db.models import Count, Max

    def get_duplicates_from_fields(app, model, fields, delete=False, **filters):
        """
        Returns duplicate records based on a list of fields,
        optionally deletes duplicate records.
        """
        Model = apps.get_model(app, model)
        duplicates = (Model.objects.filter(**filters).values(*fields)
                      .order_by()
                      .annotate(_max_id=Max('id'), _count=Count('id'))
                      .filter(_count__gt=1))
        for duplicate in duplicates:
            if delete:
                (
                    Model.objects
                    .filter(**filters)
                    .filter(**{x: duplicate[x] for x in fields})
                    .exclude(id=duplicate['_max_id'])
                    .delete()
                )
            else:
                print(duplicate)
You can use the method like this:
    get_duplicates_from_fields('myapp', 'Foo', ['first', 'second'], delete=True)
This lets you find and delete duplicate records based on any number of fields.
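For comparison, the same keep-one-row-per-group idea can be pushed into the database as a single DELETE statement. What follows is only a minimal sketch using the standard-library sqlite3 module rather than the Django ORM; the table name foo and the sample rows are assumptions that mirror the question, and, like the method above, it keeps the row with the highest id in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, first TEXT, second TEXT)")
conn.executemany(
    "INSERT INTO foo (first, second) VALUES (?, ?)",
    [("1", "2"), ("1", "2"), ("1", "2"), ("3", "4")],
)

# Keep the row with the highest id in each (first, second) group
# and delete every other row in one statement.
conn.execute(
    """
    DELETE FROM foo
    WHERE id NOT IN (
        SELECT MAX(id) FROM foo GROUP BY first, second
    )
    """
)

remaining = conn.execute("SELECT id, first, second FROM foo ORDER BY id").fetchall()
print(remaining)
```

Whether a single statement like this is preferable to per-group deletes probably depends on the database and on whether model-level delete signals need to fire.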
I have a very simple and small db table with 2 columns only: Id and name.
Given the id, I have to find and return the corresponding name. I want to save the contents in a dictionary and look for the corresponding value from there instead of querying the database each time.
I came across "How to convert SQL query results into a python dictionary", but that one saves each row as a dict, whereas I think I do not need a list of dictionaries, just a single dict of key-value pairs.
    def get_name(db_conn):
        cursor = db_conn.cursor()
        cursor.execute("SELECT id, name FROM table")
        rows = cursor.fetchall()
        d = {}
        for row in rows:
            id = row[0]
            name = row[1]
            d[id] = name
        print(d)
What would be the best approach given the task I have?
A dictionary comprehension should be straightforward enough:
    d = {row[0]: row[1] for row in cursor.fetchall()}
or:
    d = {id: name for (id, name) in cursor.fetchall()}
This processes every row in cursor.fetchall() and puts the key: value pairs directly into a dictionary.
It's very similar to your loop, just more compact, and a comprehension usually carries slightly less interpreter overhead than an explicit for loop with item assignment.
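Since each row is already a two-item (id, name) tuple, the built-in dict() constructor can also consume the rows directly. Here is a small self-contained sketch using sqlite3; the table name people, the sample rows, and the function name load_names are made up for illustration.

```python
import sqlite3

def load_names(db_conn):
    """Return an {id: name} dict built from a two-column result set."""
    cursor = db_conn.cursor()
    cursor.execute("SELECT id, name FROM people")
    # dict() accepts any iterable of (key, value) pairs.
    return dict(cursor.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

names = load_names(conn)
print(names[2])
```

Returning the dict, rather than printing it inside the function, lets the caller keep it around and do lookups without touching the database again.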
In SQLAlchemy, how do I build a filter condition based on user selections? This is what I tried, but it doesn't seem to work:
    conditions = []
    if input.userid:
        conditions.append(userdata.uid == input.userid)
    if input.location:
        conditions.append(userdata.location.like(f"{input.location}%"))
    if input.username:
        conditions.append(userdata.username.like(f"%{input.username}%"))
So, based on user inputs, there may be 1, 2 or 3 filter conditions in the and_ operator. Below is my query:
    records = session.query(userdata).filter(and_(conditions)).all()
Or is it better to use from sqlalchemy.sql import text and generate a normal SQL query?
I like to do it like this:
    query = session.query(userdata)
    if input.userid:
        query = query.filter(userdata.uid == input.userid)
    if input.location:
        query = query.filter(userdata.location.like(f"{input.location}%"))
    records = query.all()
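The list-based approach from the question can also be made to work: and_() expects each condition as a separate positional argument, so the list has to be unpacked with *. With the ORM, Query.filter(*conditions) takes several criteria the same way. Below is a minimal sketch using SQLAlchemy Core and an in-memory SQLite database; the table definition, the sample rows, and the helper name build_conditions are assumptions made for illustration.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, and_, create_engine

metadata = MetaData()
userdata = Table(
    "userdata", metadata,
    Column("uid", Integer, primary_key=True),
    Column("location", String),
    Column("username", String),
)

def build_conditions(userid=None, location=None, username=None):
    """Collect one filter expression per value the user actually supplied."""
    conditions = []
    if userid:
        conditions.append(userdata.c.uid == userid)
    if location:
        conditions.append(userdata.c.location.like(f"{location}%"))
    if username:
        conditions.append(userdata.c.username.like(f"%{username}%"))
    return conditions

engine = create_engine("sqlite://")
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(userdata.insert(), [
        {"uid": 1, "location": "Berlin", "username": "alice"},
        {"uid": 2, "location": "Bern", "username": "bob"},
        {"uid": 3, "location": "Paris", "username": "alicia"},
    ])
    conditions = build_conditions(location="Ber", username="ali")
    stmt = userdata.select()
    if conditions:
        # Unpack the list: and_() takes each condition as a separate argument.
        stmt = stmt.where(and_(*conditions))
    rows = [tuple(row) for row in conn.execute(stmt)]

print(rows)
```

Guarding with "if conditions" avoids calling and_() with no arguments when the user supplied nothing.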
In the usual Customer with Orders example, this kind of SQLAlchemy code...
    data = db.query(Customer)\
        .join(Order, Customer.id == Order.cst_id)\
        .filter(Order.amount > 1000)
...would provide instances of the Customer model that are associated with e.g. large orders (amount > 1000). The resulting Customer instances would also include a list of their orders, since in this example we used backref for that reason:
    class Order:
        ...
        customer = relationship("customers", backref=backref('orders'))
The problem with this is that iterating over Customer.orders means that the DB will return complete instances of Order, basically doing a 'select *' on all the columns of Order.
What if, for performance reasons, one wants to read only one field from Order (e.g. the id) and have the .orders field inside Customer instances be a simple list of IDs?
    customers = db.query(Customer)....
    ...
    pdb> print customers[0].orders
    [2,4,7]
Is that possible with SQLAlchemy?
What you could do is write the query this way:
    (
        session.query(Customer.id, Order.id)
        .select_from(Customer)
        .join(Customer.orders)
        .filter(Order.amount > 1000)
    )
It doesn't produce exactly the result you asked for, but it gives you a list of tuples that looks like [(customer_id, order_id), ...].
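If a per-customer list of IDs is what is needed, the (customer_id, order_id) tuples can be grouped in plain Python afterwards. A minimal sketch follows; the pairs list is a hypothetical stand-in for what the query above would return.

```python
from collections import defaultdict

# Hypothetical result of the (Customer.id, Order.id) query.
pairs = [(1, 2), (1, 4), (2, 7), (1, 9)]

# Group order IDs under the customer they belong to.
order_ids_by_customer = defaultdict(list)
for customer_id, order_id in pairs:
    order_ids_by_customer[customer_id].append(order_id)

print(dict(order_ids_by_customer))
```

This keeps the query cheap (two integer columns per row) at the cost of not having full Customer instances.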
I am not entirely sure whether you can eagerly load just the order IDs into the Customer object, but I think it should be possible. You might want to look at joinedload and subqueryload, and perhaps go through the relationship-loading docs.
If eager loading works for your case, you could write it as:
    (
        session.query(Customer)
        .select_from(Customer)
        .join(Customer.orders)
        .options(joinedload(Customer.orders))  # joinedload comes from sqlalchemy.orm
        .filter(Order.amount > 1000)
    )
You can also use a loader option such as load_only to avoid loading other columns (noload, by contrast, skips loading a relationship entirely).
I ended up doing this with array aggregation, which turned out to be the best fit:
    data = db.query(Customer).with_entities(
        Customer,
        func.ARRAY_AGG(
            Order.id,
            type_=ARRAY(Integer, as_tuple=True)).label('order_ids')
    ).outerjoin(
        Order, Customer.id == Order.cst_id
    ).group_by(
        Customer.id
    )
This returns tuples of (CustomerEntity, list), which is exactly what I wanted.
I want to insert multiple records into my belongs table, where the records are selected from two tables via SQLFORM.grid.
My tables:
    db.define_table('problem',
        Field('title', 'string', unique=True, length=255),
        format='%(title)s')
    db.define_table('tasks',
        Field('title', 'string', unique=True, length=255),
        format='%(title)s')
    db.define_table('belongs',
        Field('task_id', 'reference tasks'),
        Field('problem_id', 'reference problem'))
I want to select some records from the problem table and one record from the tasks table, then insert the combinations into the belongs table. Can this be done with SQLFORM.grid?
    def problemtask():
        form = SQLFORM.grid(db.problem, selectable=lambda ids: insert(ids, ids1))
        form1 = SQLFORM.grid(db.tasks, selectable=lambda ids1: insert(ids, ids1))
        return dict(form=form, form1=form1)
    def insert(ids, ids1):
Thanks!
Select one record from one table, then select some records from another table, and finally insert the combinations into the third table.
    def showtask():
        id = request.args(0, cast=int)  # id is the course_id
        db.tasks._common_filter = lambda query: db.tasks.course_id == id
        links = [lambda row: A('createproblem', _href=URL("default", "addproblem", args=[row.id])),
                 lambda row: A('showproblem', _href=URL("default", "showproblem", args=[row.id]))]
        form = SQLFORM.smartgrid(db.tasks, args=request.args[:1], links=links, linked_tables=[''], csv=False)
        return dict(form=form)

    def mulassignproblem():
        taskid = request.args(0, cast=int)
        form = SQLFORM.grid(db.problem, args=request.args[:1], selectable=lambda ids: mulproblem(ids, taskid))
        return dict(form=form)

    def mulproblem(ids, taskid):
        # Link each selected problem to the task, skipping pairs that already exist.
        for problemid in ids:
            exists = db((db.belongs.problem_id == problemid) & (db.belongs.task_id == taskid)).count()
            if not exists:
                db.belongs.insert(task_id=taskid, problem_id=problemid)
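The check-then-insert step in mulproblem can alternatively be delegated to the database with a uniqueness constraint on the (task_id, problem_id) pair. The sketch below is not web2py code; it uses the standard-library sqlite3 module and SQLite's INSERT OR IGNORE purely to illustrate the idea, and the function name assign_problems is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE belongs (task_id INTEGER, problem_id INTEGER, "
    "UNIQUE (task_id, problem_id))"
)

def assign_problems(conn, task_id, problem_ids):
    """Link each problem to the task, silently skipping existing pairs."""
    conn.executemany(
        "INSERT OR IGNORE INTO belongs (task_id, problem_id) VALUES (?, ?)",
        [(task_id, pid) for pid in problem_ids],
    )

assign_problems(conn, 1, [10, 11])
assign_problems(conn, 1, [11, 12])  # 11 is already linked and is skipped

links = conn.execute(
    "SELECT task_id, problem_id FROM belongs ORDER BY problem_id"
).fetchall()
print(links)
```

A constraint like this also protects against duplicates created by concurrent requests, which a count-then-insert check alone may not.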
I was wondering if it's possible to delete some random rows from a Query object before doing a bulk update.
Example:
    writerRes = self.session.query(table)
    writerRes = writerRes.filter(table.userID == 3)
    # -> delete some of the rows randomly here
    writerRes.update({"userID": 4})
Is there an easy way to do that?
Selecting random rows with SQLAlchemy depends on the database; the following is based on that answer.
PostgreSQL and SQLite3:
    from sqlalchemy import func

    number_of_random_rows = 3
    rand_rows = session.query(table.userid).order_by(func.random()).limit(number_of_random_rows).subquery()
    session.query(table).filter(table.userid.in_(rand_rows)).delete(synchronize_session='fetch')
MySQL:
    from sqlalchemy import func

    number_of_random_rows = 3
    rand_rows = session.query(table.userid).order_by(func.rand()).limit(number_of_random_rows).subquery()
    session.query(table).filter(table.userid.in_(rand_rows)).delete(synchronize_session='fetch')
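To make the delete-then-update sequence concrete, here is a minimal sketch in plain SQL through the standard-library sqlite3 module. The table name items, its columns, and the sample rows are assumptions. It picks the random rows by primary key and restricts the pick to the rows about to be updated, which should avoid removing more rows than intended when the filtered column is not unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO items (user_id) VALUES (?)", [(3,)] * 5 + [(7,)])

number_of_random_rows = 2

# Pick random rows by primary key, restricted to the rows being migrated.
conn.execute(
    """
    DELETE FROM items
    WHERE id IN (
        SELECT id FROM items WHERE user_id = 3
        ORDER BY RANDOM() LIMIT ?
    )
    """,
    (number_of_random_rows,),
)
# Bulk-update whatever survived the delete.
conn.execute("UPDATE items SET user_id = 4 WHERE user_id = 3")

counts = dict(conn.execute("SELECT user_id, COUNT(*) FROM items GROUP BY user_id"))
print(counts)
```

Which rows are deleted varies from run to run, but the number of surviving rows per user does not.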
...