How to delete rows with duplicate columns in django

I have a model:
from django.db import models
class Foo(models.Model):
    first = models.CharField(max_length=255)
    second = models.CharField(max_length=255)
The data I have is:
first  second
1      2
1      2
1      2
3      4
Now I want to delete all duplicate rows and keep one entry of each. The end result should be:
first  second
1      2
3      4
How do I do this? I checked this question (Annotate) but could not figure it out properly.
I have tried:
foo_ids = Foo.objects.annotate(first_c=Count('first'), second_c=Count('second')).filter(first_c__gt=1, second_c__gt=1).values('first', 'second', 'id')
Then I tried to figure out how to keep one row from each group of duplicates while deleting the rest.

I ended up doing this:
from django.db.models import Count
# One row per (first, second) pair that occurs more than once.
duplicate_foo = (Foo.objects.values('first', 'second')
                 .order_by()
                 .annotate(id_c=Count('id'))
                 .filter(id_c__gt=1))
for dups in duplicate_foo:
    # Keep the first row of each group and delete the others one by one.
    for i, val in enumerate(Foo.objects.filter(first=dups['first'],
                                               second=dups['second']).order_by('id')):
        if i == 0:
            continue
        val.delete()
Not the most optimized solution, but it works.

It's an older thread, but neither answer fits the bill for large datasets: both lead to a huge number of queries.
You can use this generic method instead:
from django.apps import apps
from django.db.models import Count, Max
def get_duplicates_from_fields(app, model, fields, delete=False, **filters):
    """
    Returns duplicate records based on a list of fields,
    optionally deletes duplicate records.
    """
    Model = apps.get_model(app, model)
    duplicates = (Model.objects.filter(**filters).values(*fields)
                  .order_by()
                  .annotate(_max_id=Max('id'), _count=Count('id'))
                  .filter(_count__gt=1))
    for duplicate in duplicates:
        if delete:
            (
                Model.objects
                .filter(**filters)
                .filter(**{x: duplicate[x] for x in fields})
                .exclude(id=duplicate['_max_id'])
                .delete()
            )
        else:
            print(duplicate)
You can use this method like this:
get_duplicates_from_fields('myapp', 'Foo', ['first', 'second'], delete=True)
This lets you find and delete duplicate records based on any number of fields.
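For comparison, the same "keep the row with the highest id in each group" rule that the generic method applies with Max('id') can be written as a single set-based DELETE. This is a minimal sketch, assuming a table foo with columns id, first and second, and using Python's built-in sqlite3 module in place of MySQL. MySQL typically rejects a subquery that reads from the table being deleted from (error 1093); wrapping the subquery in a derived table, as shown in the comment, is the usual workaround.

```python
import sqlite3

def delete_duplicates(conn):
    """Delete all but the highest-id row in each (first, second) group."""
    # On MySQL, wrap the subquery in a derived table to avoid error 1093:
    #   DELETE FROM foo WHERE id NOT IN (
    #       SELECT keep_id FROM (
    #           SELECT MAX(id) AS keep_id FROM foo GROUP BY first, second
    #       ) AS keep)
    conn.execute(
        """
        DELETE FROM foo
        WHERE id NOT IN (SELECT MAX(id) FROM foo GROUP BY first, second)
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, first TEXT, second TEXT)")
conn.executemany(
    "INSERT INTO foo (first, second) VALUES (?, ?)",
    [("1", "2"), ("1", "2"), ("1", "2"), ("3", "4")],
)
delete_duplicates(conn)
rows = conn.execute("SELECT id, first, second FROM foo ORDER BY id").fetchall()
print(rows)  # [(3, '1', '2'), (4, '3', '4')]
```

This runs one statement regardless of how many duplicate groups exist, at the cost of bypassing Django's ORM (no model signals, no cascades handled by Django).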

Related

MySQL dump a table or saving table contents as a python dictionary

I have a very simple and small db table with 2 columns only: Id and name.
Given the id, I have to find and return the corresponding name. I want to save the contents in a dictionary and look up the corresponding value there instead of querying the database each time.
I came across How to convert SQL query results into a python dictionary, but it aims to save each row as a dict, whereas I don't need a list of dictionaries, just a single dict of key-value pairs.
def get_name(db_conn):
    cursor = db_conn.cursor()
    cursor.execute("SELECT id, name FROM table")
    rows = cursor.fetchall()
    d = {}
    for row in rows:
        id = row[0]
        name = row[1]
        d[id] = name
    print(d)
What would be the best approach given the task I have?

Dictionary comprehensions should be straightforward enough:
d = {row[0]: row[1] for row in cursor.fetchall()}
or:
d = {id: name for (id, name) in cursor.fetchall()}
This processes every row in cursor.fetchall() and puts the key: value pairs directly into a dictionary.
It does the same work as your loop, just more concisely. The comprehension still runs as Python bytecode rather than in C, but it avoids some per-iteration overhead, so it is usually slightly faster.
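A minimal runnable sketch, using Python's built-in sqlite3 module in place of MySQL; the table name names, its rows and the load_names helper are made up for illustration. Because each fetched row is a 2-tuple, dict(cursor.fetchall()) builds the same mapping as the comprehension.

```python
import sqlite3

def load_names(conn):
    """Return {id: name} for every row of the (hypothetical) names table."""
    cursor = conn.cursor()
    cursor.execute("SELECT id, name FROM names")
    # Each row is an (id, name) tuple, so it unpacks straight into key/value.
    return {row_id: name for row_id, name in cursor.fetchall()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [(1, "alice"), (2, "bob")])

names = load_names(conn)
print(names[2])  # bob

# dict() accepts an iterable of 2-tuples, so this is equivalent:
same = dict(conn.execute("SELECT id, name FROM names").fetchall())
```

Load the dictionary once (for example at startup) and use names.get(some_id) for lookups, so the database is only queried again when the table changes.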

SQLAlchemy build filter condition based on user inputs

In SQLAlchemy, how can I build a filter condition based on user selections? This is what I tried, but it doesn't seem to work:
conditions = []
if input.userid:
    conditions.append(userdata.uid == input.userid)
if input.location:
    conditions.append(userdata.location.like(f"{input.location}%"))
if input.username:
    conditions.append(userdata.username.like(f"%{input.username}%"))
So depending on the user's inputs, there may be 1, 2 or 3 filter conditions in the and_ operator. Below is my query:
records = session.query(userdata).filter(and_(conditions)).all()
Or is it better to use from sqlalchemy.sql import text and generate a plain SQL query?

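I like to do it like this:
query = session.query(userdata)
if input.userid:
    query = query.filter(userdata.uid == input.userid)
if input.location:
    query = query.filter(userdata.location.like(f"{input.location}%"))
if input.username:
    query = query.filter(userdata.username.like(f"%{input.username}%"))
records = query.all()
Each .filter() call is combined with the previous ones using AND, so only the conditions the user actually supplied end up in the WHERE clause.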
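The list-based version from the question also works once the list is unpacked, i.e. and_(*conditions) instead of and_(conditions); Query.filter() likewise accepts several criteria at once, so filter(*conditions) is equivalent. A minimal sketch, assuming SQLAlchemy 1.4 or later and using a Core table plus an in-memory SQLite database as stand-ins for the question's userdata model; the build_query helper and the sample rows are hypothetical.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, and_,
                        create_engine, insert, select)

metadata = MetaData()
# Hypothetical stand-in for the question's userdata model.
userdata = Table(
    "userdata",
    metadata,
    Column("uid", Integer, primary_key=True),
    Column("username", String),
    Column("location", String),
)

def build_query(userid=None, location=None, username=None):
    """Build a SELECT whose WHERE clause only contains the supplied filters."""
    conditions = []
    if userid:
        conditions.append(userdata.c.uid == userid)
    if location:
        conditions.append(userdata.c.location.like(f"{location}%"))
    if username:
        conditions.append(userdata.c.username.like(f"%{username}%"))
    stmt = select(userdata)
    if conditions:
        # Unpack the list: and_(*conditions), not and_(conditions).
        stmt = stmt.where(and_(*conditions))
    return stmt

engine = create_engine("sqlite://")
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(
        insert(userdata),
        [
            {"uid": 1, "username": "alice", "location": "Berlin"},
            {"uid": 2, "username": "bob", "location": "Bern"},
            {"uid": 3, "username": "carol", "location": "Paris"},
        ],
    )
    everyone = conn.execute(build_query()).all()
    in_ber = conn.execute(build_query(location="Ber")).all()
    alice = conn.execute(build_query(location="Ber", username="lic")).all()
```

Guarding with if conditions avoids calling and_() with no arguments when the user supplies no filters at all.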

SQLAlchemy foreign keys mapped to list of ids, not entities

In the usual Customer with Orders example, this kind of SQLAlchemy code...
data = db.query(Customer)\
.join(Order, Customer.id == Order.cst_id)\
.filter(Order.amount>1000)
...would provide instances of the Customer model that are associated with, e.g., large orders (amount > 1000). The resulting Customer instances would also include a list of their orders, since in this example we used backref for that reason:
class Order:
    ...
    customer = relationship("Customer", backref=backref('orders'))
The problem with this is that iterating over Customer.orders means the DB will return complete instances of Order, basically doing a 'select *' on all the columns of Order.
What if, for performance reasons, one wants to e.g. read only 1 field from Order (e.g. the id) and have the .orders field inside Customer instances be a simple list of IDs?
customers = db.query(Customer)....
...
pdb> print customers[0].orders
[2,4,7]
Is that possible with SQLAlchemy?

What you could do is make a query this way:
(
    session.query(Customer.id, Order.id)
    .select_from(Customer)
    .join(Customer.orders)
    .filter(Order.amount > 1000)
)
It doesn't produce exactly the result you asked for, but it gives you a list of tuples like [(customer_id, order_id), ...].
I am not entirely sure whether you can eagerly load just the order ids into the Customer object, but I think it should be possible; you might want to look at joinedload and subqueryload, and perhaps go through the relationship-loading docs.
If that works in this case, you could write it as:
(
    session.query(Customer)
    .select_from(Customer)
    .join(Customer.orders)
    .options(db.joinedload(Customer.orders))
    .filter(Order.amount > 1000)
)
and also use load_only to avoid loading the other Order columns (noload, by contrast, skips loading the relationship entirely).

I ended up solving this efficiently with array aggregation (PostgreSQL's ARRAY_AGG):
from sqlalchemy import Integer, func
from sqlalchemy.dialects.postgresql import ARRAY
data = db.query(Customer).with_entities(
    Customer,
    func.ARRAY_AGG(
        Order.id,
        type_=ARRAY(Integer, as_tuple=True)).label('order_ids')
).outerjoin(
    Order, Customer.id == Order.cst_id
).group_by(
    Customer.id
)
This returns (Customer, order_ids) pairs, where order_ids holds that customer's order ids, which is exactly what I wanted.
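SQLite has no ARRAY_AGG, but its group_concat aggregate gives the same one-row-per-customer shape. A minimal sketch with Python's built-in sqlite3 module; it swaps in group_concat for PostgreSQL's ARRAY_AGG, so the ids come back as a comma-separated string that has to be split. The table names, rows and the customers_with_order_ids helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cst_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'ben'), (3, 'cy');
    INSERT INTO orders VALUES (2, 1, 1500), (4, 1, 2000), (7, 1, 1200), (5, 2, 50);
    """
)

def customers_with_order_ids(conn):
    """Return {customer_id: [order ids]} using a single grouped query."""
    rows = conn.execute(
        """
        SELECT c.id, group_concat(o.id)
        FROM customers AS c
        LEFT OUTER JOIN orders AS o ON o.cst_id = c.id
        GROUP BY c.id
        """
    ).fetchall()
    result = {}
    for customer_id, ids in rows:
        # group_concat yields NULL (None) for customers without orders and
        # does not guarantee element order, hence the check and the sort.
        result[customer_id] = sorted(int(x) for x in ids.split(",")) if ids else []
    return result

order_ids = customers_with_order_ids(conn)
```

The outer join keeps customers that have no orders; with an inner join they would simply be missing from the result.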

Select records from two SQLFORM.grid and insert the records into the third table

I want to insert multiple records into my belongs table, where the records are selected from two tables with SQLFORM.grid.
My tables:
db.define_table('problem',
                Field('title', 'string', unique=True, length=255),
                format='%(title)s')
db.define_table('tasks',
                Field('title', 'string', unique=True, length=255),
                format='%(title)s')
db.define_table('belongs',
                Field('task_id', 'reference tasks'),
                Field('problem_id', 'reference problem'))
I want to select some records from the problem table and one record from the tasks table, then insert the combinations into the belongs table. Can this be done with SQLFORM.grid?
def problemtask():
    form = SQLFORM.grid(db.problem, selectable=lambda ids: insert(ids, ids1))
    form1 = SQLFORM.grid(db.tasks, selectable=lambda ids1: insert(ids, ids1))
    return dict(form=form, form1=form1)
def insert(ids, ids1):
    ...
Thanks!

Select one record from one table, then select some records from another table, and finally insert the combinations into the third table.
def showtask():
    id = request.args(0, cast=int)  # id is the course_id
    db.tasks._common_filter = lambda query: db.tasks.course_id == id
    links = [lambda row: A('createproblem', _href=URL("default", "addproblem", args=[row.id])),
             lambda row: A('showproblem', _href=URL("default", "showproblem", args=[row.id]))]
    form = SQLFORM.smartgrid(db.tasks, args=request.args[:1], links=links,
                             linked_tables=[''], csv=False)
    return dict(form=form)
def mulassignproblem():
    taskid = request.args(0, cast=int)
    form = SQLFORM.grid(db.problem, args=request.args[:1],
                        selectable=lambda ids: mulproblem(ids, taskid))
    return dict(form=form)
def mulproblem(ids, taskid):
    # Insert each (task, problem) pair only if it is not already in belongs.
    for problemid in ids:
        if not db((db.belongs.problem_id == problemid) &
                  (db.belongs.task_id == taskid)).count():
            db.belongs.insert(task_id=taskid, problem_id=problemid)
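The de-duplication logic in mulproblem does not depend on web2py itself. A minimal sketch of the same check-then-insert loop in plain SQL, using Python's built-in sqlite3 module in place of web2py's DAL; assign_problems is a hypothetical name. A UNIQUE constraint on (task_id, problem_id) would additionally enforce the rule inside the database.

```python
import sqlite3

def assign_problems(conn, task_id, problem_ids):
    """Link each selected problem to task_id, skipping pairs that already exist."""
    for problem_id in problem_ids:
        exists = conn.execute(
            "SELECT COUNT(*) FROM belongs WHERE task_id = ? AND problem_id = ?",
            (task_id, problem_id),
        ).fetchone()[0]
        if not exists:
            conn.execute(
                "INSERT INTO belongs (task_id, problem_id) VALUES (?, ?)",
                (task_id, problem_id),
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE belongs (id INTEGER PRIMARY KEY, task_id INTEGER, problem_id INTEGER)"
)
assign_problems(conn, 1, [10, 11])
assign_problems(conn, 1, [11, 12])  # 11 is already linked, so only 12 is added
pairs = conn.execute(
    "SELECT task_id, problem_id FROM belongs ORDER BY problem_id"
).fetchall()
print(pairs)  # [(1, 10), (1, 11), (1, 12)]
```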

Delete rows from Query Object

I was wondering if it's possible to delete some random rows from a Query Object before doing a bulk update.
Example:
writerRes = self.session.query(table)
writerRes = writerRes.filter(table.userID == 3)
# -> Delete some of the rows randomly
writerRes.update({"userID": 4})
Is there an easy way to do that?

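Selecting random rows with SQLAlchemy depends on the database; this is based on that answer. The code assumes table has an id primary key, picks the random ids among the rows that match the filter, and then deletes exactly those rows. Fetching the ids first also sidesteps MySQL's refusal to use LIMIT inside an IN subquery.
PostgreSQL and SQLite:
from sqlalchemy import func
number_of_random_rows = 3
# Pick random primary keys among the rows that match the filter.
rand_ids = [row.id for row in session.query(table.id).filter(table.userID == 3)
            .order_by(func.random()).limit(number_of_random_rows)]
session.query(table).filter(table.id.in_(rand_ids)).delete(synchronize_session='fetch')
MySQL (RAND() instead of RANDOM()):
number_of_random_rows = 3
rand_ids = [row.id for row in session.query(table.id).filter(table.userID == 3)
            .order_by(func.rand()).limit(number_of_random_rows)]
session.query(table).filter(table.id.in_(rand_ids)).delete(synchronize_session='fetch')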
...
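The same delete-some-then-update flow in plain SQL, as a minimal sketch with Python's built-in sqlite3 module; the writes table, its id primary key and the delete_random_then_update helper are assumptions for illustration. Picking the ids first and deleting by primary key guarantees that only rows matching the original filter are removed before the bulk update.

```python
import sqlite3

def delete_random_then_update(conn, old_user, new_user, n):
    """Delete n random rows owned by old_user, then move the rest to new_user."""
    # Pick primary keys only among the rows that match the filter.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM writes WHERE userID = ? ORDER BY RANDOM() LIMIT ?",
        (old_user, n),
    )]
    # Delete exactly those rows, by primary key.
    conn.executemany("DELETE FROM writes WHERE id = ?", [(i,) for i in ids])
    # Bulk-update whatever is left of the filtered set.
    conn.execute(
        "UPDATE writes SET userID = ? WHERE userID = ?", (new_user, old_user)
    )
    conn.commit()
    return ids

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE writes (id INTEGER PRIMARY KEY, userID INTEGER)")
conn.executemany(
    "INSERT INTO writes (userID) VALUES (?)",
    [(3,), (3,), (3,), (3,), (3,), (7,), (7,)],
)
deleted = delete_random_then_update(conn, old_user=3, new_user=4, n=3)
```

Rows belonging to other users (userID 7 here) are never candidates for deletion, because the random pick is restricted by the same WHERE clause as the update.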
