Remove duplicate entries in peewee

I have a quick function that I threw together to remove duplicates from my table, given a particular combination of fields:
for l in table.select():
    if table.select().where((table.Field1==l.Field1) & (table.Field2==l.Field2) & ....).count()>1:
        l.delete()
        l.save()
But I imagine that there's a better way to do this.

You could add a unique constraint on the columns you wish to be unique, then let the database enforce the rules for you. That'd be the best way.
For peewee, that looks like:
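from peewee import CharField, DateField, Model

class MyModel(Model):
    first_name = CharField()
    last_name = CharField()
    dob = DateField()
    class Meta:
        # The trailing True marks this multi-column index as unique.
        indexes = (
            (('first_name', 'last_name', 'dob'), True),
        )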
Docs: http://docs.peewee-orm.com/en/latest/peewee/models.html#indexes-and-unique-constraints
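A unique index cannot be created while duplicate rows still exist, so an existing table has to be cleaned up first. Here is a minimal sketch of that clean-up at the SQL level, using the standard-library sqlite3 module rather than peewee. The table name mymodel, the index name, and the sample rows are assumptions for illustration. It keeps the row with the lowest id in each group, and other databases may need a different form of the DELETE statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, dob TEXT)"
)
conn.executemany(
    "INSERT INTO mymodel (first_name, last_name, dob) VALUES (?, ?, ?)",
    [
        ("Ada", "Lovelace", "1815-12-10"),
        ("Ada", "Lovelace", "1815-12-10"),  # duplicate of the first row
        ("Alan", "Turing", "1912-06-23"),
    ],
)

# Keep the lowest id in each (first_name, last_name, dob) group; delete the rest.
conn.execute(
    "DELETE FROM mymodel WHERE id NOT IN ("
    "SELECT MIN(id) FROM mymodel GROUP BY first_name, last_name, dob)"
)

# With the duplicates gone, the unique index can be created.
conn.execute(
    "CREATE UNIQUE INDEX mymodel_name_dob "
    "ON mymodel (first_name, last_name, dob)"
)

remaining = conn.execute("SELECT COUNT(*) FROM mymodel").fetchone()[0]

# From now on the database itself rejects a repeated combination.
try:
    conn.execute(
        "INSERT INTO mymodel (first_name, last_name, dob) VALUES (?, ?, ?)",
        ("Ada", "Lovelace", "1815-12-10"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Doing the clean-up in one set-based statement avoids the per-row count query of the original loop.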

Related

joinedload and load_only but with filtering

I have two models with a simple FK relationship, Stock and Restriction (Restriction.stock_id FK to Stock).
class Restriction(Model):
    __tablename__ = "restrictions"
    id = db.Column(db.Integer, primary_key=True)
    stock_id = FK("stocks.id", nullable=True)
    name = db.Column(db.String(50), nullable=False)
class Stock(Model):
    __tablename__ = "stocks"
    id = db.Column(db.Integer, primary_key=True)
    ticker = db.Column(db.String(50), nullable=False, index=True)
I would like to retrieve a Restriction object and its related Stock, but only the Stock's ticker (there are other fields, omitted here). I can do this simply with:
from sqlalchemy.orm import joinedload, load_only
my_query = Restriction.query.options(
    joinedload(Restriction.stock).load_only(Stock.ticker)
)
r = my_query.first()
With the above I get all columns for Restriction and only ticker for Stock. I can see this in the SQL that is run, and I can access r.stock.ticker without triggering a new query, since it is loaded eagerly.
The problem is that I cannot filter on stocks now: SQLAlchemy adds another FROM clause if I do my_query.filter(Stock.ticker == 'GME'). This produces a cross product, which is not what I want.
On the other hand, I cannot load selected columns from the relationship using join, i.e.
Restriction.query.join(Restriction.stock).options(load_only(Restriction.stock))
does not seem to work with a relationship property. How can I apply a single filter and still load only selected columns from the related table?
The SQL I want to run is:
SELECT restrictions.*, stocks.ticker
FROM restrictions LEFT OUTER JOIN stocks ON stocks.id = restrictions.stock_id
WHERE stocks.ticker = 'GME'
And I'd like to get a Restriction object back with its stock and stock's ticker. Is this possible?

joinedload is not meant to be combined with a filter on the joined table. Join explicitly and use the contains_eager option instead, which tells SQLAlchemy to populate the relationship from the join you wrote:
from sqlalchemy.orm import contains_eager
my_query = Restriction.query.join(Restriction.stock).options(
    contains_eager(Restriction.stock).load_only(Stock.ticker)
).filter(Stock.ticker == 'GME')
r = my_query.first()
Stock.id will also be in the results beside Stock.ticker, because the primary key is always loaded. The other fields are omitted, as you wish.
I recently wrote a short post about it, if you are interested: https://jorzel.hashnode.dev/an-orm-can-bite-you

Joined selection from two tables

I'm using Django with a mysql database.
I have the table Question that contains the fields: id, text, section_id and I have the table CompletedQuestion that has the fields: id, question_id where the field question_id is foreign key to Question.id.
My models.py contains:
class Question(mixins.OrdMixin, mixins.EqMixin, models.Model):
    section = models.ForeignKey('Section',
                                on_delete=models.CASCADE,
                                related_name='owner')
    text = models.TextField()
class CompletedQuestion(models.Model):
    question = models.ForeignKey('Question',
                                 on_delete=models.CASCADE,
                                 related_name='question_source')
I want to check whether CompletedQuestion contains any completed questions that belong to a specific section_id from Question.
My current query is the following, but it's not the right one:
quest_id = Question.objects.filter(section_id=section_id)

You can use the __isnull=True|False filter to check whether any related model exists. I don't fully understand what you mean, but something like:
Question.objects.filter(section_id=section_id, question_source__isnull=False)
Or come from the other direction:
CompletedQuestion.objects.filter(question__section_id=section_id) \
    .values_list("question_id", flat=True).distinct()
to get a list of the question IDs that have any related CompletedQuestion.

Django ORM: Tried to do inner join with foreign key but causes FieldError

I am new to the Django ORM.
I have tables that look like this:
class Product(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    name = models.CharField(max_length=60)
class ProductOption(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4)
    product_id = models.ForeignKey(Product, on_delete=models.CASCADE, null=True, blank=True)
I would like to query the ProductOption ids related to a product. I wrote this query to do an inner join:
Query = Product.objects.select_related('product_id').filter(name='a')
It gave me this error message:
django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'product_id'. Choices are: (none)
I want to know whether something is wrong in the models or in the query.

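select_related only follows forward foreign keys, and Product has none (hence "Choices are: (none)"). For the reverse direction, use prefetch_related: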
Product.objects.filter(name='a').prefetch_related('productoption_set')

This is not how you query a related object. Since you used a foreign key, and if I understand correctly, you probably want the reverse manager. It lives on a model instance, not on a queryset, so fetch the product first:
Product.objects.get(name='a').productoption_set.all()

Two "one-to-one" references to same table in SQLAlchemy

Suppose I am modelling postal address changes. I'd like each AddressChange to have a before relationship to an Address, as well as an after relationship to another Address. And I'd like a reference back from the Address to the AddressChange it is associated with.
class AddressChange(Base):
    __tablename__ = 'AddressChanges'
    id = Column(Integer, primary_key=True)
    before_id = Column(Integer, ForeignKey('Addresses.id'))
    before = relationship('Address', foreign_keys=before_id, uselist=False,
                          back_populates='change')
    after_id = Column(Integer, ForeignKey('Addresses.id'))
    after = relationship('Address', foreign_keys=after_id, uselist=False,
                         back_populates='change')
class Address(Base):
    __tablename__ = 'Addresses'
    id = Column(Integer, primary_key=True)
    street, city, etc = Column(String), Column(String), Column(String)
    change = relationship('AddressChange')
However SQLAlchemy complains:
Could not determine join condition between parent/child tables on relationship Address.change - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
My Address does not have a foreign key reference to the parent table, and it's not clear to me why it should need one. If I add one, I get
Address.change and back-reference AddressChange.before are both of the same direction symbol('MANYTOONE'). Did you mean to set remote_side on the many-to-one side ?
Which is starting to get confusing, because the documentation for remote_side is for "self-referential relationships."

Thanks to #alex-grönholm for helping on #sqlalchemy.
This can be solved by adding a primaryjoin parameter to Address's side of the relationship to teach it how to map back to the parent AddressChange:
    change = relationship('AddressChange', uselist=False,
                          viewonly=True,
                          primaryjoin=or_(
                              AddressChange.before_id == id,
                              AddressChange.after_id == id
                          ))

Does the order of index_together matter in a Django model?

When defining a model with an index_together Meta property, does the order of the columns matter?
In other words, is there a difference between
class myModel(models.Model):
    name = models.CharField(max_length=100)
    address = models.CharField(max_length=100)
    favorite_color = models.CharField(max_length=20)
    class Meta:
        index_together = ('name', 'address', 'favorite_color')
vs
class myModel(models.Model):
    name = models.CharField(max_length=100)
    address = models.CharField(max_length=100)
    favorite_color = models.CharField(max_length=20)
    class Meta:
        index_together = ('favorite_color', 'name', 'address')
I only ask because I've noticed when looking at the structure of a table, each column in the key has an "index in key" property. Does MySQL/PostgreSQL expect the columns to be queried in that order?
Just as an aside, is there a great deal of difference between indexing the columns together vs separately?

The order in index_together determines the column order of the composite index.
Queries benefit from the index only when they filter on a leftmost prefix of those columns.
So with your first index_together:
index_together = ('name', 'address', 'favorite_color')
the index is used if you filter by name, and also if you filter by name and address.
But if you filter by address alone, or by address and favorite_color, the index can't be used for the lookup, because the leading column name is missing. The order in which the conditions are written in the query does not matter.
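One way to check this on a real database is to ask it for the query plan. The sketch below does so with the standard-library sqlite3 module. SQLite is an assumption here: the question asks about MySQL and PostgreSQL, which have their own EXPLAIN output. The table name, the index name, the extra notes column, and the seeks_index helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT, address TEXT, "
    "favorite_color TEXT, notes TEXT)"
)
# Same column order as index_together = ('name', 'address', 'favorite_color').
conn.execute(
    "CREATE INDEX mymodel_idx ON mymodel (name, address, favorite_color)"
)

def seeks_index(where_clause):
    """Return True if the plan is an index SEARCH rather than a full SCAN."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM mymodel WHERE " + where_clause
    ).fetchall()
    # The fourth column of each plan row is the human-readable detail text.
    return any(row[3].startswith("SEARCH") for row in plan)

results = {
    "name": seeks_index("name = 'a'"),
    "name, address": seeks_index("name = 'a' AND address = 'b'"),
    # Same two columns, written in the opposite order in the WHERE clause.
    "address, name": seeks_index("address = 'b' AND name = 'a'"),
    "address": seeks_index("address = 'b'"),
    "address, favorite_color": seeks_index(
        "address = 'b' AND favorite_color = 'c'"
    ),
}

for columns, seek in results.items():
    print(f"{columns}: {'index search' if seek else 'full scan'}")
```

On a stock SQLite build, the first three filters should report an index search and the last two a full scan. What matters is which columns are filtered, not the order the conditions are written in.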
