Django Combining AND and OR Queries with ManyToMany Field - mysql

Hoping someone can help me out with this.
I'm trying to figure out whether I can construct a query that will allow me to retrieve items from my db based on a ForeignKey field and a ManyToManyField at the same time. The challenging part is that it will need to filter on multiple ManyToMany objects.
An example will hopefully make this clearer. Here are my (simplified) models:
class Item(models.Model):
    name = models.CharField(max_length=200)
    brand = models.ForeignKey(User, related_name='brand')
    tags = models.ManyToManyField(Tag, blank=True, null=True)
    def __unicode__(self):
        return self.name
    class Meta:
        ordering = ['-id']
class Tag(models.Model):
    name = models.CharField(max_length=64, unique=True)
    def __unicode__(self):
        return self.name
I would like to build a query that retrieves items based on two criteria:
1. Items that were uploaded by users that a user is following (called 'brand' in the model). So for example if a user is following the Paramount user account, I would want all items where brand = Paramount.
2. Items that match the keywords in saved searches. For example the user could make and save the following search: "80s comedy". In this case I would want all items where the tags include both "80s" and "comedy".
Now I know how to construct the query for each independently. For #1, it's:
items = Item.objects.filter(brand=brand)
And for #2 (based on the docs):
items = Item.objects.filter(tags__name='80s').filter(tags__name='comedy')
My question is: is it possible to construct this as a single query so that I don't have to take the hit of converting each query into a list, joining them, and removing duplicates?
The challenge seems to be that there is no way to use Q objects to construct queries where you need an item's manytomany field (in this case tags) to match multiple values. The following query:
items = Item.objects.filter(Q(tags__name='80s') & Q(tags__name='comedy'))
does NOT work.
(See: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)
Thanks in advance for your help on this!

After much research I could not find a way to combine this into a single query, so I ended up converting my QuerySets into lists and combining them.

Django's filters automatically AND their conditions. Q objects are only needed if you're trying to add ORs. Also, the __in query filter will help you out a lot.
Assuming users have multiple brands they like and you want to return any brand they like, you should use:
`brand__in=brands`
Where brands is the queryset returned by something like someuser.brands.all().
The same can be used for your search parameters:
`tags__name__in=['80s','comedy']`
That will return things tagged with either '80s' or 'comedy'. If you need both (things tagged with both '80s' AND 'comedy'), you'll have to pass each one in a successive filter:
keywords = ['80s','comedy']
qs = Item.objects.all()
for keyword in keywords:
    qs = qs.filter(tags__name=keyword)
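To make the difference between the two forms concrete, here is a minimal sketch at the SQL level using Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical stand-ins for the Item and Tag models, not what Django actually generates. `single_join_ids` mirrors the single-filter form from the question (both conditions applied to the same tag row), while `combined_item_ids` shows one possible shape for the combined "followed brand OR has every keyword" query, using a GROUP BY / HAVING subquery rather than chained joins. In ORM terms, a comparable shape would probably be `Item.objects.filter(Q(brand__in=brands) | Q(pk__in=tagged))`, where `tagged` is the chained-filter queryset above restricted with `.values('pk')`; that ORM line is an assumption and is not exercised here.

```python
import sqlite3

# Hypothetical, simplified tables standing in for Item, Tag and the M2M join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, brand_id INTEGER);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE item_tags (item_id INTEGER, tag_id INTEGER);
    INSERT INTO item VALUES (1, 'Airplane!', 10), (2, 'Top Gun', 10),
                            (3, 'Ghostbusters', 20), (4, 'Alien', 20);
    INSERT INTO tag VALUES (1, '80s'), (2, 'comedy'), (3, 'scifi');
    INSERT INTO item_tags VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (4, 3);
""")


def single_join_ids(conn, first, second):
    """One join, two conditions on the same tag row: two different names never match."""
    sql = """
        SELECT DISTINCT i.id FROM item i
        JOIN item_tags it ON it.item_id = i.id
        JOIN tag t ON t.id = it.tag_id
        WHERE t.name = ? AND t.name = ?
    """
    return [row[0] for row in conn.execute(sql, (first, second))]


def combined_item_ids(conn, brand_ids, keywords):
    """Items from followed brands OR items carrying every keyword as a tag."""
    brand_marks = ",".join("?" * len(brand_ids))
    keyword_marks = ",".join("?" * len(keywords))
    sql = f"""
        SELECT i.id FROM item i
        WHERE i.brand_id IN ({brand_marks})
           OR i.id IN (
                SELECT it.item_id FROM item_tags it
                JOIN tag t ON t.id = it.tag_id
                WHERE t.name IN ({keyword_marks})
                GROUP BY it.item_id
                HAVING COUNT(DISTINCT t.name) = ?
           )
        ORDER BY i.id
    """
    params = [*brand_ids, *keywords, len(keywords)]
    return [row[0] for row in conn.execute(sql, params)]


print(single_join_ids(conn, "80s", "comedy"))
print(combined_item_ids(conn, [10], ["80s", "comedy"]))
```

Because the tag condition lives in a subquery on the item id, no duplicate rows are produced and no list merging is needed on the Python side.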
P.S. related_name values should always specify the opposite relationship. You're going to have logic problems with the way you're doing it currently. For example:
brand = models.ForeignKey(User, related_name='brand')
Means that somebrand.brand.all() will actually return Item objects. It should be:
brand = models.ForeignKey(User, related_name='items')
Then, you can get a brand's items with somebrand.items.all(). Makes much more sense.

Related

Django Serialize to Json from Super class

I'm trying to figure out if there is any efficient way to serialize a queryset from superclass. My models:
class CampaignContact(models.Model):
    campaign = models.ForeignKey(Campaign, related_name="campaign_contacts", null=False)
    schedule_day = models.DateField(null=True)
    schedule_time = models.TimeField(null=True)
    completed = models.BooleanField(default=False, null=False)
class CampaignContactCompany(CampaignContact):
    company = models.ForeignKey(Company, related_name='company_contacts', null=False)
class CampaignContactLead(CampaignContact):
    lead = models.ForeignKey(Lead, related_name='lead_contacts', null=False)
I want to create a JSON document with all campaign contacts, whether they belong to leads or to companies.
Django has a built in serializer documented here but it might not work as well considering how you structured your models:
from django.core import serializers
data = serializers.serialize("json", CampaignContactCompany.objects.all())
I imagine you could run that on both tables and combine the two sets but it would introduce a bit of overhead. You could also create a static to_json method in CampaignContact which takes two query sets from the other two tables and formats/combines them into json.
Maybe you have reason to model your tables as you did but based on observation it looks like you will have 3 tables, one never used and two with only a company and lead field different which is probably not ideal. Typically when relating a record to multiple objects you would simply put the lead and company field on the CampaignContact table and let them be null. To get only company contacts you could query company_contacts = CampaignContact.objects.filter(company__isnull=False).all()
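The "run it on both tables and combine the two sets" idea can be sketched as follows. Django's JSON serializer emits a JSON array of objects with "model", "pk" and "fields" keys, so two serialized strings can be merged by parsing and concatenating them. The two sample payloads and the `combine_serialized` helper are hypothetical, written by hand to stand in for real serializer output. One caveat worth checking against the serialization docs: with multi-table inheritance, serializing a child model is documented to include only the child's own fields, so the parent fields (schedule_day and so on) would need to be serialized separately.

```python
import json

# Hand-written stand-ins for serializers.serialize("json", <queryset>) output;
# the app label "app" and the field values are assumptions for illustration.
company_json = (
    '[{"model": "app.campaigncontactcompany", "pk": 1, "fields": {"company": 7}}]'
)
lead_json = (
    '[{"model": "app.campaigncontactlead", "pk": 2, "fields": {"lead": 3}}]'
)


def combine_serialized(*json_arrays):
    """Merge several serialized JSON arrays into a single JSON array string."""
    combined = []
    for payload in json_arrays:
        combined.extend(json.loads(payload))
    return json.dumps(combined)


all_contacts_json = combine_serialized(company_json, lead_json)
print(all_contacts_json)
```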

Django GenericForeignKey vs custom manual fields performance/optimization

I'm trying to build a typical social networking site. There are mainly two types of objects:
photo
status
A user can like a photo or a status. (Note that the two are mutually exclusive.)
That is, we have two tables: one for images only and the other for statuses only.
Now, when a user likes an object (it could be a photo or a status), how should I store that information?
I want to design an efficient SQL schema for this.
Currently I'm using GenericForeignKey (GFK):
class LikedObject(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
But yesterday I wondered whether I could do this efficiently without using GFK:
class LikedObject(models.Model):
    OBJECT_TYPE = (
        ('status', 'Status'),
        ('image', 'Image/Photo'),
    )
    user = models.ForeignKey(User, related_name="liked_objects")
    obj_id = models.PositiveIntegerField()
    obj_type = models.CharField(max_length=63, choices=OBJECT_TYPE)
The only difference I can see is that I have to make two queries if I want to get all liked statuses of a particular user:
status_ids = LikedObject.objects.filter(user=user_obj, obj_type='status').values_list('obj_id', flat=True)
status_objs = Status.objects.filter(id__in=status_ids)
Am I correct? What would be the best approach in terms of easy querying/inserting, performance, etc.?
You are basically implementing your own Generic Object, only you limit your ContentType to your hard coded OBJECT_TYPE.
If you are only going to access the database as in your example (get all status objects liked by user x), or a couple specific queries, then your own implementation can be a little faster, of course. But obviously, if later you have to add more objects, or do other things, you may find yourself implementing your whole full generic solution. And like they say, why reinvent the wheel.
If you want better performance, and really only have those two Models to worry about, you may just want to have two different Like tables (StatusLike and ImageLike) and use inheritance to share functionality.
class LikedObject(models.Model):
    common_field = ...
    class Meta:
        abstract = True
    def some_share_function(self):
        ...
class StatusLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="status_liked_objects")
    status = models.ForeignKey(Status, related_name="liked_objects")
class ImageLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="image_liked_objects")
    image = models.ForeignKey(Image, related_name="liked_objects")
Basically, either you have a lot of Models to worry about, and then you probably want to use the more Django generic object implementation, or you only have two models, and why even bother with a half generic solution. Just use two tables.
In this case, I would check if your data objects Status and Photo may have many common data fields, e.g. Status.user and Photo.user, Status.title and Photo.title, Status.pub_date and Photo.pub_date, Status.text and Photo.caption, etc.
Could you combine them into an Item object maybe? That Item would have a Item.type field, either "photo" or "status"? Then you would only have a single table and a single object type a user can "like". Much simpler at basically no cost.
Edit:
from django.db import models
from django.utils.timezone import now
class Item(models.Model):
    data_type = models.SmallIntegerField(
        choices=((1, 'Status'), (2, 'Photo')), default=1)
    user = models.ForeignKey(User)
    title = models.CharField(max_length=100)
    pub_date = models.DateTimeField(default=now)
    # ...etc...
class Like(models.Model):
    user = models.ForeignKey(User, related_name="liked_objects")
    item = models.ForeignKey(Item)

Django - how to order a sequence of videos within a topic?

I am putting together a video library in django. So far I have a few courses, each subgrouped to a topic, then each topic has 10-20 videos each.
I would like to provide ordering for each grouping of videos within a topic. So when a user goes through the lessons there is a specific sequence to the videos.
My challenge is how to store the ordering of videos in my database backend (mysql) and further how to look up and retrieve the order? I also want to have (previous) and (next) links so a user can easily click through the sequence.
At first I was thinking I would just have a "sequence" int field on my video model, then look that up and simply find the next or previous one. However, what if a number is skipped when entering the sequence? I'd like to enforce the numbering in each sequence too.
Here's some code so far:
class Video(models.Model):
    title = models.CharField(max_length=200)
    description = models.CharField(max_length=500, blank=True)
    topic = models.ForeignKey('VideoTopic')
    sequence = models.IntegerField(default=1, null=True, help_text="(Sequence within the topic)")
class VideoTopic(models.Model):
    name = models.CharField(max_length=200, blank=True)
    description = models.TextField(blank=True)
    course = models.ForeignKey('course.Course')
While this somewhat fulfills the base requirements of sequencing, I don't see how to enforce the ordering, and I'm also having difficulty looking up my previous/next videos.
I'm thinking I need to store sequence in a separate table, but I can't quite conceive how this should be done. Something like:
class VideoSequence(models.Model):
    topic = models.ForeignKey('VideoTopic')
    sequence = models.IntegerField()
    video = models.ForeignKey('Video')
If I use the above separate table, how can I go about looking up the sequence in django?
SIMPLE ANSWER:
https://github.com/iambrandontaylor/django-admin-sortable
LESS SIMPLE ANSWER:
The standard way to do this is what you already have, with a numeric ordering field. If you want to enforce consecutive order indices you could use a signal to 'reindex' the sequences each time one is changed.
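from django.db.models.signals import post_save
def update_sequence(sender, instance, created, **kwargs):
    # Renumber the topic's videos 1..n in their current order.
    qs = instance.topic.video_set.order_by('sequence')
    for counter, video in enumerate(qs, start=1):
        if video.sequence != counter:
            # update() bypasses save(), so this handler does not re-trigger itself
            Video.objects.filter(pk=video.pk).update(sequence=counter)
post_save.connect(update_sequence, sender=Video)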
Then to get the next/previous item in the sequence:
models.py:
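class Video(models.Model):
    ....
    class Meta:
        ordering = ['sequence']
    def get_next(self):
        # Next video within the same topic, or None if this is the last one
        return Video.objects.filter(topic=self.topic, sequence__gt=self.sequence).order_by('sequence').first()
    def get_prev(self):
        # Previous video within the same topic, or None if this is the first one
        return Video.objects.filter(topic=self.topic, sequence__lt=self.sequence).order_by('-sequence').first()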
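The reindexing and previous/next logic can also be sketched without the ORM, which makes the intended behaviour easy to check. The snippet below is a plain-Python illustration only: the dict-based video records and the helper names `reindex` and `neighbours` are hypothetical and stand in for rows of a single topic.

```python
def reindex(videos):
    """Return the videos sorted by sequence, renumbered consecutively from 1."""
    ordered = sorted(videos, key=lambda video: video["sequence"])
    return [dict(video, sequence=position)
            for position, video in enumerate(ordered, start=1)]


def neighbours(videos, current_id):
    """Return (previous_id, next_id) for a video; None where no neighbour exists."""
    ordered = sorted(videos, key=lambda video: video["sequence"])
    ids = [video["id"] for video in ordered]
    position = ids.index(current_id)
    previous_id = ids[position - 1] if position > 0 else None
    next_id = ids[position + 1] if position + 1 < len(ids) else None
    return previous_id, next_id


# Sequence numbers with gaps, as might happen after manual entry.
topic_videos = [
    {"id": "intro", "sequence": 1},
    {"id": "setup", "sequence": 4},
    {"id": "basics", "sequence": 7},
]
print(reindex(topic_videos))
print(neighbours(topic_videos, "setup"))
```

Note that previous/next lookups work even when the numbering has gaps, since they only rely on ordering; reindexing matters mainly for presentation and data hygiene.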

sqlalchemy relations and query on relations

Suppose I have 3 tables in sqlalchemy. Users, Roles and UserRoles defined in declarative way. How would one suggest on doing something like this:
user = Users.query.get(1) # get user with id = 1
user_roles = user.roles.query.limit(10).all()
Currently, if I want to get the user roles I have to query any of the 3 tables and perform the joins in order to get the expected results. Calling directly user.roles brings a list of items that I cannot filter or limit so it's not very helpful. Joining things is not very helpful either since I'm trying to make a rest interface with requests such as:
localhost/users/1/roles so just by that query I need to be able to do Users.query.get(1).roles.limit(10) etc etc which really should 'smart up' my rest interface without too much bloated code and if conditionals and without having to know which table to join on what. Users model already has the roles as a relationship property so why can't I simply query on a relationship property like I do with normal models?
Simply use Dynamic Relationship Loaders. Code below verbatim from the documentation linked to above:
class User(Base):
    __tablename__ = 'user'
    posts = relationship(Post, lazy="dynamic")
jack = session.query(User).get(id)
# filter Jack's blog posts
posts = jack.posts.filter(Post.headline=='this is a post')
# apply array slices
posts = jack.posts[5:20]

Django: retrieve distinct QuerySet

I've got the following models in my app. The Addition model is used to govern the many-to-many relationship between the Book model and the Collection model, since I need to include extra fields on the intermediate model.
class Book(models.Model):
    name = models.CharField(max_length=200)
    picture = models.ImageField(upload_to='img', max_length=1000)
    price = models.DecimalField(max_digits=8, decimal_places=2)
class Collection(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField(max_length=100)
    books = models.ManyToManyField(Book, through='Addition')
    subscribers = models.ManyToManyField(User, related_name='collection_subscriptions', blank=True, null=True)
class Addition(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
    collection = models.ForeignKey(Collection)
    created = models.DateTimeField(auto_now=False, auto_now_add=True)
    updated = models.DateTimeField(auto_now=True, auto_now_add=True)
In my app users can add books to collections that they create (for example fiction, history, etc.). Other users can then follow those collections that they like.
When a user logs into the site, I'd like to display all of the books that have been recently added to the collections that they follow. With each book, I'd also like to display the name of the person who added it, and the name of the collection it's in.
I can get all of the additions as follows...
additions = Addition.objects.filter(collection__subscribers=user).select_related()
... but this results in duplicate books being retrieved and displayed to the user, often side by side.
Is there a way to retrieve a distinct list of books that are in collections the user is following?
I'm using Django 1.3 + MySQL.
Thanks.
UPDATE
I should add that in general I'm not looking for any 'loop through the results and de-duplicate that way' solutions, for a couple of reasons.
There are likely to be tens or even hundreds of thousands of additions (I am also displaying this information on pages that list all new additions added by users), and response time is extremely important.
This solution may become more practical when limiting the initial result set, but it creates problems with pagination, which is also required. Namely how do you paginate the entire result set while also de-duplicating only a small portion of that set. I'm open to any ideas here that may solve this problem.
UPDATE
I should also mention that if the same book gets added by multiple users, I actually don't have a preference for which addition gets used, either the original or the most recent addition would work fine.
How about the following - it's not a pure SQL solution, and it'll cost you an extra database query and some loop time, but it should still perform ok, and it'll give you a lot more control over which additions take precedence over others:
def filter_additions(additions):
    # Use a ValuesQuerySet for performance
    additions_values = additions.values()
    # The following code just eliminates duplicates. You could do
    # something much more powerful/interesting here if you like,
    # e.g. give preference to additions by a user's friends
    book_pk_registry = {}
    excluded_addition_pks = []
    for addition in additions_values:
        addition_pk = addition['id']
        book_pk = addition['book_id']
        if book_pk not in book_pk_registry:
            book_pk_registry[book_pk] = True
        else:
            excluded_addition_pks.append(addition_pk)
    # Return the de-duplicated queryset to the caller
    return additions.exclude(pk__in=excluded_addition_pks)
additions = Addition.objects.filter(collection__subscribers=user)
additions = filter_additions(additions)
If there are likely to be more than a thousand or so books involved, you may want to put a limit on the initial additions query. Passing massive lists of ids over in the exclude isn't such a great idea. Using 'values()' is quite important, because Python can cycle through a basic list of dicts a LOT faster than a queryset and it uses a lot less memory.
Assuming there won't be huge amounts of additions to display, this could easily do the trick:
# duplicated..
additions = Addition.objects.filter(collection__subscribers=user, created__gt=DATE_LAST_LOGIN).select_related()
# remove duplication
added_books = {}
for addition in additions:
    added_books[addition.book] = True
added_books = added_books.keys()
By the description you gave of the problem, performance would not be a problem.
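from django.db.models import Min
# Annotation names must not clash with model field names, hence first_user / first_collection
additions = Addition.objects.filter(collection__subscribers=user).values('book').annotate(first_user=Min('user'), first_collection=Min('collection')).order_by()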
This query will give you list of unique books with their users and collections. Books, collections, users will be pk's, not objects. But I hope you will store them in cache so that won't be a problem.
But for your workload I'd think about denormalization. My query is very heavy, and it isn't easy to cache its results if you will have frequent additions. My first approach will be to add latest_additions field to Collection model and to update with signals (not adding duplicates). The format of this field is up to you.
Sometimes it's OK to drop into SQL, especially when the ORM-only solution is not performant. It's easy to get the non-duplicate Addition row IDs in SQL, and then you can switch back to the ORM to select the data. It's two queries, but will outperform any of the single query solutions I've seen so far.
from django.db import connection
from operator import itemgetter
cursor = connection.cursor()
# Select non-duplicate book additions, preferring the most recently updated
query = '''SELECT id, MAX(updated) FROM %s
           GROUP BY book_id''' % Addition._meta.db_table
cursor.execute(query)
# Flatten the results to an id list (list() is needed on Python 3, where map() is lazy)
addition_ids = list(map(itemgetter(0), cursor.fetchall()))
additions = Addition.objects.filter(
    collection__subscribers=user, id__in=addition_ids).select_related()
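One thing to watch with the raw-SQL approach: the grouping runs over the whole additions table, so the row picked for a book may sit in a collection the user does not follow, and the second query would then drop that book entirely. A possible variation is to restrict the grouping to followed collections and pick the highest id per book (the question says either addition is acceptable). The sketch below uses Python's built-in sqlite3 module with hypothetical, simplified table names and sample rows rather than the real Django tables; `latest_addition_ids` is an illustrative helper, and the resulting ids could be passed to an `id__in` filter as in the answer above.

```python
import sqlite3

# Simplified stand-ins for the Addition table and the Collection.subscribers join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addition (id INTEGER PRIMARY KEY, user_id INTEGER,
                           book_id INTEGER, collection_id INTEGER);
    CREATE TABLE collection_subscribers (collection_id INTEGER, user_id INTEGER);
    INSERT INTO addition VALUES (1, 1, 100, 1), (2, 2, 100, 2), (3, 1, 200, 1),
                                (4, 3, 100, 3), (5, 3, 300, 3);
    INSERT INTO collection_subscribers VALUES (1, 9), (2, 9);
""")


def latest_addition_ids(conn, subscriber_id):
    """One addition id per book, considering only collections the user follows."""
    sql = """
        SELECT MAX(a.id) AS latest_id
        FROM addition a
        JOIN collection_subscribers s ON s.collection_id = a.collection_id
        WHERE s.user_id = ?
        GROUP BY a.book_id
        ORDER BY latest_id
    """
    return [row[0] for row in conn.execute(sql, (subscriber_id,))]


def unrestricted_ids(conn):
    """One addition id per book over the whole table, ignoring subscriptions."""
    sql = "SELECT MAX(id) AS latest_id FROM addition GROUP BY book_id ORDER BY latest_id"
    return [row[0] for row in conn.execute(sql)]


print(latest_addition_ids(conn, 9))
print(unrestricted_ids(conn))
```

In the sample data, book 100 also appears in a collection that user 9 does not follow; the unrestricted grouping picks that row, which is why the restricted version is the safer starting point.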