sqlalchemy relations and query on relations - sqlalchemy

Suppose I have 3 tables in sqlalchemy. Users, Roles and UserRoles defined in declarative way. How would one suggest on doing something like this:
user = Users.query.get(1) # get user with id = 1
user_roles = user.roles.query.limit(10).all()
Currently, if I want to get the user roles I have to query any of the 3 tables and perform the joins in order to get the expected results. Calling directly user.roles brings a list of items that I cannot filter or limit so it's not very helpful. Joining things is not very helpful either since I'm trying to make a rest interface with requests such as:
localhost/users/1/roles so just by that query I need to be able to do Users.query.get(1).roles.limit(10) etc etc which really should 'smart up' my rest interface without too much bloated code and if conditionals and without having to know which table to join on what. Users model already has the roles as a relationship property so why can't I simply query on a relationship property like I do with normal models?

Simply use Dynamic Relationship Loaders. Code below verbatim from the documentation linked to above:
class User(Base):
__tablename__ = 'user'
posts = relationship(Post, lazy="dynamic")
jack = session.query(User).get(id)
# filter Jack's blog posts
posts = jack.posts.filter(Post.headline=='this is a post')
# apply array slices
posts = jack.posts[5:20]

Related

Django GenereicForeignKey v/s custom manual fields performance/optimization

I'm trying to build a typical social networking site. there are two types of objects mainly.
photo
status
a user can like photo and status. (Note that these two are mutually exclusive)
means, We have two table (1) for Image only and other for status only.
now when a user likes an object(it could be a photo or status) how should I store that info.
I want to design a efficient SQL schema for this.
Currently I'm using Genericforeignkey(GFK)
class LikedObject(models.Model):
content_type = models.ForeignKey(ContentType)
object_id = models.PositiveIntegerField()
content_object = GenericForeignKey('content_type', 'object_id')
but yesterday I thought if I can do this without using GFK efficiently?
class LikedObject(models.Model):
OBJECT_TYPE = (
('status', 'Status'),
('image', 'Image/Photo'),
)
user = models.ForeignKey(User, related_name="liked_objects")
obj_id = models.PositiveIntegerField()
obj_type = models.CharField(max_length=63, choices=OBJECT_TYPE)
the only difference I can understand is that I have to make two queries if I want to get all liked_status of a particular user
status_ids = LikedObject.objects.filter(user=user_obj, obj_type='status').values_list('object_id', flat=True)
status_objs = Status.objects.filter(id__in=status_ids)
Am I correct? so What would be the best approach in terms of easy querying/inserting or performance, etc.
You are basically implementing your own Generic Object, only you limit your ContentType to your hard coded OBJECT_TYPE.
If you are only going to access the database as in your example (get all status objects liked by user x), or a couple specific queries, then your own implementation can be a little faster, of course. But obviously, if later you have to add more objects, or do other things, you may find yourself implementing your whole full generic solution. And like they say, why reinvent the wheel.
If you want better performance, and really only have those two Models to worry about, you may just want to have two different Like tables (StatusLike and ImageLike) and use inheritance to share functionality.
class LikedObject(models.Model):
common_field = ...
class Meta:
abstract = True
def some_share_function():
...
class StatusLikeObject(LikedObject):
user = models.ForeignKey(User, related_name="status_liked_objects")
status = models.ForeignKey(Status, related_name="liked_objects")
class ImageLikeObject(LikedObject):
user = models.ForeignKey(User, related_name="image_liked_objects")
image = models.ForeignKey(Image, related_name="liked_objects")
Basically, either you have a lot of Models to worry about, and then you probably want to use the more Django generic object implementation, or you only have two models, and why even bother with a half generic solution. Just use two tables.
In this case, I would check if your data objects Status and Photo may have many common data fields, e.g. Status.user and Photo.user, Status.title and Photo.title, Status.pub_date and Photo.pub_date, Status.text and Photo.caption, etc.
Could you combine them into an Item object maybe? That Item would have a Item.type field, either "photo" or "status"? Then you would only have a single table and a single object type a user can "like". Much simpler at basically no cost.
Edit:
from django.db import models
from django.utils.timezone import now
class Item(models.Model):
data_type = models.SmallIntegerField(
choices=((1, 'Status'), (2, 'Photo')), default=1)
user = models.ForeignKey(User)
title = models.CharField(max_length=100)
pub_date = models.DateTimeField(default=now)
...etc...
class Like(models.Model):
user = models.ForeignKey(User, related_name="liked_objects")
item = models.ForeignKey(Item)

What is the best way to merge 2 tables with Active Record and Mysql

We need to allow users to customize their entities like products... so my intention was to have a product table and a custom_product table with just the information the users are allowed to change.
When a client goes to the product I want to merge the information, means I want to merge the two tables - the custom overwrites the default Products table.
I know that in mysql there exists a ifnull(a.title, b.title) way but I was wondering if there is any nice and efficient way to solve this in Rails 4 with Active Record. Assume that the products and custom products table have just 2 columns, ID and TITLE
I think you can convert both objects to JSON and then handle their params as a hash, using the merge method:
class Product
end
class Customization
belongs_to :product
end
a = Product.find(...)
b = a.customization
c = JSON(a.to_json).merge(JSON(b.to_json).reject!{|k,v| v.nil?})
Therefore c will contain all params from Product eventually overridden by those in Customization which are not nil.
If you still want to use a Product object with hybrid values (taken from Customization) you can try this:
a.attributes = a.attributes.merge(b.attributes.reject!{|k,v| v.nil?})
In this case a will still be a Product instance. I would recommend to keep the same attributes in both models when doing this.

Can I create sperate queries for different views?

I'm learning sqlalchemy and not sure if I grasp it fully yet(I'm more used to writing queries by hand but I like the idea of abstracting the queries and getting objects). I'm going through the tutorial and trying to apply it to my code and ran into this part when defining a model:
def __repr__(self):
return "<User('%s','%s', '%s')>" % (self.name, self.fullname, self.password)
Its useful because I can just search for a username and get only the info about the user that I want but is there a way to either have multiple of these type of views that I can call? or am I using it wrong and should be writing a specific query for getting different data for different views?
Some context to why I'm asking my site has different templates, and most pages will just need the usersname, first/last name but some pages will require things like twitter or Facebook urls(also fields in the model).
First of all, __repr__ is not a view, so if you have a simple model User with defined columns, and you query for a User, all the columns will get loaded from the database, and not only those used in __repr__.
Lets take model Book (from the example refered to later) as a basis:
class Book(Base):
book_id = Column(Integer, primary_key=True)
title = Column(String(200), nullable=False)
summary = Column(String(2000))
excerpt = Column(Text)
photo = Column(Binary)
The first option to skip loading some columns is to use Deferred Column Loading:
class Book(Base):
# ...
excerpt = deferred(Column(Text))
photo = deferred(Column(Binary))
In this case when you execute query session.query(Book).get(1), the photo and excerpt columns will not be loaded until accessed from the code, at which point another query against the database will be executed to load the missing data.
But if you know before you query for the Book that you need the column photo immediately, you can still override the deferred behavior with undefer option: query = session.query(Book).options(undefer('photo')).get(1).
Basically, the suggestion here is to defer all the columns (in your case: except username, password etc) and in each use case (view) override with undefer those you know you need for that particular view. Please also see the group parameter of deferred, so that you can group the attributes by use case (view).
Another way would be to query only some columns, but in this case you are getting the tuple instance instead of the model instance (in your case User), so it is potentially OK for form filling, but not so good for model validation: session.query(Book.id, Book.title).all()

Django: retrieve distinct QuerySet

I've got the following models in my app. The Addition model is used to govern the many-to-many relationship between the Book model and the Collection model, since I need to include extra fields on the intermediate model.
class Book(models.Model):
name = models.CharField(max_length=200)
picture = models.ImageField(upload_to='img', max_length=1000)
price = models.DecimalField(max_digits=8, decimal_places=2)
class Collection(models.Model):
user = models.ForeignKey(User)
name = models.CharField(max_length=100)
books = models.ManyToManyField(Book, through='Addition')
subscribers = models.ManyToManyField(User, related_name='collection_subscriptions', blank=True, null=True)
class Addition(models.Model):
user = models.ForeignKey(User)
book = models.ForeignKey(Book)
collection = models.ForeignKey(Collection)
created = models.DateTimeField(auto_now=False, auto_now_add=True)
updated = models.DateTimeField(auto_now=True, auto_now_add=True)
In my app users can add books to collections that they create (for example fiction, history, etc.). Other users can then follow those collections that they like.
When a user logs into the site, I'd like to display all of the books that have been recently added to the collections that they follow. With each book, I'd also like to display the name of the person who added it, and the name of the collection it's in.
I can get all of the additions as follows...
additions = Addition.objects.filter(collection__subscribers=user).select_related()
... but this results in duplicate books being retrieved and displayed to the user, often side by side.
If there a way to retrieve a distinct list of books that are in collections the user is following?
I'm using Django 1.3 + MySQL.
Thanks.
UPDATE
I should add that in general I'm not looking for any 'loop through the results and de-duplicate that way' solutions, for a couple of reasons.
There are likely to be tens or even hundreds of thousands of additions (I am also displaying this information on pages that list all new additions added by users), and response time is extremely important.
This solution may become more practical when limiting the initial result set, but it creates problems with pagination, which is also required. Namely how do you paginate the entire result set while also de-duplicating only a small portion of that set. I'm open to any ideas here that may solve this problem.
UPDATE
I should also mention that if the same book gets added by multiple users, I actually don't have a preference for which addition gets used, either the original or the most recent addition would work fine.
How about the following - it's not a pure SQL solution, and it'll cost you an extra database query and some loop time, but it should still perform ok, and it'll give you a lot more control over which additions take precedence over others:
def filter_additions(additions):
# Use a ValuesQuerySet for performance
additions_values = additions.values()
# The following code just eliminates duplicates. You could do
# something much more powerful/interesting here if you like,
# e.g. give preference to additions by a user`s friends
book_pk_registry = {}
excluded_addition_pks = []
for addition in additions_values:
addition_pk = addition['id']
book_pk = addition['book_id']
if book_pk not in book_pk_registry:
book_pk_registry[book_pk] = True
else:
excluded_addition_pks.append(addition_pk)
additions = additions.exclude(pk__in=excluded_addition_pks)
additions = Addition.objects.filter(collection__subscribers=user)
additions = filter_additions(additions)
If there are likely to be more than a thousand or so books involved, you may want to put a limit on the initial additions query. Passing massive lists of ids over in the exclude isn't such a great idea. Using 'values()' is quite important, because Python can cycle through a basic list of dicts a LOT faster than a queryset and it uses a lot less memory.
Assuming there won`t be huge amounts of additions to display, this could easily to the trick:
# duplicated..
additions = Addition.objects.filter(collection__subscribers=user, created__gt=DATE_LAST_LOGIN).select_related()
# remove duplication
added_books = {}
for addition in additions:
added_books[addition.book] = True
added_books = added_books.keys()
By the description you gave of the problem, performance would not be a problem.
additions = Addition.objects.filter(collection__subscribers=user).values('book').annotate(user=Min('user'), collection=Min('collection')).order_by()
This query will give you list of unique books with their users and collections. Books, collections, users will be pk's, not objects. But I hope you will store them in cache so that won't be a problem.
But for your workload I'd think about denormalization. My query is very heavy, and it isn't easy to cache its results if you will have frequent additions. My first approach will be to add latest_additions field to Collection model and to update with signals (not adding duplicates). The format of this field is up to you.
Sometimes it's OK to drop into SQL, especially when the ORM-only solution is not performant. It's easy to get the non-duplicate Addition row IDs in SQL, and then you can switch back to the ORM to select the data. It's two queries, but will outperform any of the single query solutions I've seen so far.
from django.db import connection
from operator import itemgetter
cursor = connection.cursor()
# Select non-duplicate book additions, preferring for most recently updated
query = '''SELECT id, MAX(updated) FROM %s
GROUP BY book_id''' % Addition._meta.db_table
cursor.execute(query)
# Flatten the results to an id list
addition_ids = map(itemgetter(0), cursor.fetchall())
additions = Addition.objects.filter(
collection__subscribers=user, id__in=addition_ids).select_related()

Django Combining AND and OR Queries with ManyToMany Field

hoping someone can help me out with this.
I'm trying to figure out whether I can construct a query that will allow me to retrieve items from my db based on a ForeignKey field and a ManyToManyField at the same time. The challenging part is that it will need to filter on multiple ManyToMany objects.
An example will hopefully make this clearer. Here are my (simplified) models:
class Item(models.Model):
name = models.CharField(max_length=200)
brand = models.ForeignKey(User, related_name='brand')
tags = models.ManyToManyField(Tag, blank=True, null=True)
def __unicode__(self):
return self.name
class Meta:
ordering = ['-id']
class Tag(models.Model):
name = models.CharField(max_length=64, unique=True)
def __unicode__(self):
return self.name
I would like to build a query that retrieves items based on two criteria:
Items that were uploaded by users that a user is following (called 'brand' in the model). So for example if a user is following the Paramount user account, I would want all items where brand = Paramount.
Items that match the keywords in saved searches. For example the user could make and save the following search: "80s comedy". In this case I would want all items where the tags include both "80s" and "comedy".
Now I know how to construct the query for each independently. For #1, it's:
items = Item.objects.filter(brand=brand)
And for #2 (based on the docs):
items = Item.objects.filter(tags__name='80s').filter(tags__name='comedy')
My question is: is it possible to construct this as a single query so that I don't have to take the hit of converting each query into a list, joining them, and removing duplicates?
The challenge seems to be that there is no way to use Q objects to construct queries where you need an item's manytomany field (in this case tags) to match multiple values. The following query:
items = Item.objects.filter(Q(tags__name='80s') & Q(tags__name='comedy'))
does NOT work.
(See: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships)
Thanks in advance for your help on this!
After much research I could not find a way to combine this into a single query, so I ended up converting my QuerySets into lists and combining them.
Django's filters automatically AND. Q objects are only needed if you're trying to add ORs. Also, the __in query filter will help you out alot.
Assuming users have multiple brands they like and you want to return any brand they like, you should use:
`brand__in=brands`
Where brands is the queryset returned by something like someuser.brands.all().
The same can be used for your search parameters:
`tags__name__in=['80s','comedy']`
That will return things tagged with either '80s' or 'comedy'. If you need both (things tagged with both '80s' AND 'comedy'), you'll have to pass each one in a successive filter:
keywords = ['80s','comedy']
for keyword in keywords:
qs = qs.filter(tags__name=keyword)
P.S. related_name values should always specify the opposite relationship. You're going to have logic problems with the way you're doing it currently. For example:
brand = models.ForeignKey(User, related_name='brand')
Means that somebrand.brand.all() will actually return Item objects. It should be:
brand = models.ForeignKey(User, related_name='items')
Then, you can get a brands's items with somebrand.items.all(). Makes much more sense.