How to eagerly load all relationships in SQLAlchemy

I have the following model:
class Item(Base):
    a = relationship(<whatever>)
    b = relationship(<whatever>)
    c = relationship(<whatever>)
    d = relationship(<whatever>)
    other_stuff = Column(<whatever>)
Most of the time, I just want to see the other_stuff column, so I don't specify lazy='joined' in the relationship. But sometimes, I want to see all the joined fields, and I want them to be loaded in one SQL query. I could do the following:
query(Item).options(joinedload('a')).options(joinedload('b')).options(joinedload('c')).options(joinedload('d'))
But I feel like this is a common enough use case that there has to be a prettier way to do it.

You can simply say .options(joinedload('*')).
For reference: http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#wildcard-loading-strategies

Related

Contao CMS Query a 'checkboxWizard' BLOB field

I have a question about how to query a 'checkboxWizard' BLOB field. I have added such a field to tl_member, and it works fine: I can assign “0 to N” selections to each member. Let’s call this field “myBlob”.
Now the question is how to query “myBlob” the Contao way. Let’s say I want all members that are in the postal code “12120” and that have the id “2” of “myBlob” selected. Not only “2”, but at least this one.
$arrColumn[] = "tl_member.postal=?";
$arrValues[] = 12120;
$arrColumn[] = "tl_member.myBlob=?"; // <- how to say “contains in the blob” here?
$arrValues[] = 2;
self::findBy($arrColumn, $arrValues)

The only way to do this (when using the default Contao method for such relationships) is to create a query like:
… WHERE myBlob LIKE '%"2"%'
So in your case it might be:
$arrColumn[] = "tl_member.myBlob LIKE ?";
$arrValues[] = '%"2"%';
However, this is of course cumbersome and might not work in all cases.
A better way may be to use codefog/contao-haste with its 'many to many' helper: https://github.com/codefog/contao-haste/blob/master/docs/Model/index.md
This way you will have a separate table containing the references.
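To see why the quotes inside the LIKE pattern matter, here is a minimal sketch that uses Python's sqlite3 module in place of Contao and MySQL. It assumes the checkboxWizard selection is stored as a PHP-serialized array of strings; the rows and the helper name members_with_option are made up for illustration.

```python
import sqlite3

# In-memory stand-in for tl_member; the serialized values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tl_member (id INTEGER PRIMARY KEY, postal TEXT, myBlob TEXT)"
)
rows = [
    (1, "12120", 'a:2:{i:0;s:1:"2";i:1;s:1:"5";}'),   # options 2 and 5
    (2, "12120", 'a:1:{i:0;s:2:"12";}'),               # option 12 only
    (3, "12120", 'a:2:{i:0;s:1:"1";i:1;s:2:"22";}'),  # options 1 and 22
    (4, "99999", 'a:1:{i:0;s:1:"2";}'),                # option 2, other postal code
]
conn.executemany("INSERT INTO tl_member VALUES (?, ?, ?)", rows)


def members_with_option(conn, postal, option_id):
    """Return ids of members in `postal` whose blob contains "option_id"."""
    # The surrounding double quotes keep "2" from matching "12" or "22".
    pattern = '%"{}"%'.format(option_id)
    cur = conn.execute(
        "SELECT id FROM tl_member WHERE postal = ? AND myBlob LIKE ? ORDER BY id",
        (postal, pattern),
    )
    return [row[0] for row in cur.fetchall()]


matching_ids = members_with_option(conn, "12120", 2)
print(matching_ids)
```

Only member 1 matches here, because the pattern includes the quotes that delimit each serialized string value. The approach still depends on the exact storage format, which is why a separate reference table is more robust.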

Django GenericForeignKey vs. custom manual fields performance/optimization

I'm trying to build a typical social networking site. There are mainly two types of objects:
photo
status
A user can like a photo or a status. (Note that these two are mutually exclusive.)
That is, we have two tables: one for images only and the other for statuses only.
Now, when a user likes an object (it could be a photo or a status), how should I store that information?
I want to design an efficient SQL schema for this.
Currently I'm using GenericForeignKey (GFK):
class LikedObject(models.Model):
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
But yesterday I wondered whether I could do this efficiently without using GFK:
class LikedObject(models.Model):
    OBJECT_TYPE = (
        ('status', 'Status'),
        ('image', 'Image/Photo'),
    )
    user = models.ForeignKey(User, related_name="liked_objects")
    obj_id = models.PositiveIntegerField()
    obj_type = models.CharField(max_length=63, choices=OBJECT_TYPE)
The only difference I can see is that I have to make two queries if I want to get all statuses liked by a particular user:
status_ids = LikedObject.objects.filter(user=user_obj, obj_type='status').values_list('obj_id', flat=True)
status_objs = Status.objects.filter(id__in=status_ids)
Am I correct? What would be the best approach in terms of easy querying/inserting, performance, etc.?

You are basically implementing your own Generic Object, only you limit your ContentType to your hard coded OBJECT_TYPE.
If you are only going to access the database as in your example (get all status objects liked by user x), or a couple specific queries, then your own implementation can be a little faster, of course. But obviously, if later you have to add more objects, or do other things, you may find yourself implementing your whole full generic solution. And like they say, why reinvent the wheel.
If you want better performance, and really only have those two Models to worry about, you may just want to have two different Like tables (StatusLike and ImageLike) and use inheritance to share functionality.
class LikedObject(models.Model):
    common_field = ...
    class Meta:
        abstract = True
    def some_share_function(self):
        ...
class StatusLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="status_liked_objects")
    status = models.ForeignKey(Status, related_name="liked_objects")
class ImageLikeObject(LikedObject):
    user = models.ForeignKey(User, related_name="image_liked_objects")
    image = models.ForeignKey(Image, related_name="liked_objects")
Basically, either you have a lot of Models to worry about, and then you probably want to use the more Django generic object implementation, or you only have two models, and why even bother with a half generic solution. Just use two tables.

In this case, I would check if your data objects Status and Photo may have many common data fields, e.g. Status.user and Photo.user, Status.title and Photo.title, Status.pub_date and Photo.pub_date, Status.text and Photo.caption, etc.
Could you combine them into an Item object maybe? That Item would have a Item.type field, either "photo" or "status"? Then you would only have a single table and a single object type a user can "like". Much simpler at basically no cost.
Edit:
from django.contrib.auth.models import User
from django.db import models
from django.utils.timezone import now
class Item(models.Model):
    data_type = models.SmallIntegerField(
        choices=((1, 'Status'), (2, 'Photo')), default=1)
    user = models.ForeignKey(User)
    title = models.CharField(max_length=100)
    pub_date = models.DateTimeField(default=now)
    # ...etc...
class Like(models.Model):
    user = models.ForeignKey(User, related_name="liked_objects")
    item = models.ForeignKey(Item)

What is the best way to merge 2 tables with Active Record and Mysql

We need to allow users to customize their entities, such as products, so my intention was to have a product table and a custom_product table containing just the information the users are allowed to change.
When a client views a product, I want to merge the information, i.e. merge the two tables so that the custom values overwrite those in the default products table.
I know that MySQL has IFNULL(a.title, b.title), but I was wondering if there is a nice and efficient way to solve this in Rails 4 with Active Record. Assume that the products and custom products tables have just 2 columns, ID and TITLE.

I think you can convert both objects to JSON and then handle their params as a hash, using the merge method:
class Product < ActiveRecord::Base
  has_one :customization
end
class Customization < ActiveRecord::Base
  belongs_to :product
end
a = Product.find(...)
b = a.customization
# reject (not reject!) is used because reject! returns nil when nothing is removed
c = JSON(a.to_json).merge(JSON(b.to_json).reject { |k, v| v.nil? })
Therefore c will contain all params from Product, overridden by those in Customization that are not nil.
If you still want to use a Product object with hybrid values (taken from Customization) you can try this:
a.attributes = a.attributes.merge(b.attributes.reject { |k, v| v.nil? })
In this case a will still be a Product instance. I would recommend keeping the same attributes in both models when doing this.
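The override rule described above (take the default value unless the customization provides a non-nil one) can be sketched independently of Rails. The following is a plain-Python illustration, not Active Record code; the function name merge_overrides, the skip parameter, and the sample fields are assumptions made for the example. Skipping id is an addition of this sketch so that the customization's own primary key does not replace the product's.

```python
def merge_overrides(defaults, overrides, skip=("id",)):
    """Return a copy of `defaults` updated with every non-None override.

    Keys listed in `skip` are never overridden (hypothetical safeguard so
    that the customization's own id does not clobber the product's id).
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if key in skip or value is None:
            continue
        merged[key] = value
    return merged


# Sample rows represented as dicts; the field names are illustrative.
product = {"id": 1, "title": "Default title", "price": 10}
customization = {"id": 7, "title": "Custom title", "price": None}

merged = merge_overrides(product, customization)
print(merged)
```

Here the title comes from the customization, while price falls back to the product because the custom value is None.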

sqlalchemy relations and query on relations

Suppose I have 3 tables in SQLAlchemy: Users, Roles and UserRoles, defined declaratively. How would one go about doing something like this:
user = Users.query.get(1) # get user with id = 1
user_roles = user.roles.query.limit(10).all()
Currently, if I want to get the user's roles I have to query one of the 3 tables and perform the joins myself to get the expected results. Accessing user.roles directly returns a plain list that I cannot filter or limit, so it's not very helpful. Writing the joins by hand is not very helpful either, since I'm trying to build a REST interface with requests such as:
localhost/users/1/roles
For that request I need to be able to do something like Users.query.get(1).roles.limit(10), which would make my REST interface smarter without bloated code full of conditionals, and without having to know which table to join on what. The Users model already has roles as a relationship property, so why can't I simply query on a relationship property like I do with normal models?

Simply use dynamic relationship loaders (lazy="dynamic"). The code below is taken verbatim from the SQLAlchemy documentation:
class User(Base):
    __tablename__ = 'user'
    posts = relationship(Post, lazy="dynamic")
jack = session.query(User).get(id)
# filter Jack's blog posts
posts = jack.posts.filter(Post.headline=='this is a post')
# apply array slices
posts = jack.posts[5:20]

Django: retrieve distinct QuerySet

I've got the following models in my app. The Addition model is used to govern the many-to-many relationship between the Book model and the Collection model, since I need to include extra fields on the intermediate model.
class Book(models.Model):
    name = models.CharField(max_length=200)
    picture = models.ImageField(upload_to='img', max_length=1000)
    price = models.DecimalField(max_digits=8, decimal_places=2)
class Collection(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField(max_length=100)
    books = models.ManyToManyField(Book, through='Addition')
    subscribers = models.ManyToManyField(User, related_name='collection_subscriptions', blank=True, null=True)
class Addition(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
    collection = models.ForeignKey(Collection)
    created = models.DateTimeField(auto_now=False, auto_now_add=True)
    updated = models.DateTimeField(auto_now=True, auto_now_add=True)
In my app users can add books to collections that they create (for example fiction, history, etc.). Other users can then follow those collections that they like.
When a user logs into the site, I'd like to display all of the books that have been recently added to the collections that they follow. With each book, I'd also like to display the name of the person who added it, and the name of the collection it's in.
I can get all of the additions as follows...
additions = Addition.objects.filter(collection__subscribers=user).select_related()
... but this results in duplicate books being retrieved and displayed to the user, often side by side.
Is there a way to retrieve a distinct list of books that are in collections the user is following?
I'm using Django 1.3 + MySQL.
Thanks.
UPDATE
I should add that in general I'm not looking for any 'loop through the results and de-duplicate that way' solutions, for a couple of reasons.
There are likely to be tens or even hundreds of thousands of additions (I am also displaying this information on pages that list all new additions added by users), and response time is extremely important.
This solution may become more practical when limiting the initial result set, but it creates problems with pagination, which is also required. Namely how do you paginate the entire result set while also de-duplicating only a small portion of that set. I'm open to any ideas here that may solve this problem.
UPDATE
I should also mention that if the same book gets added by multiple users, I actually don't have a preference for which addition gets used, either the original or the most recent addition would work fine.

How about the following - it's not a pure SQL solution, and it'll cost you an extra database query and some loop time, but it should still perform ok, and it'll give you a lot more control over which additions take precedence over others:
def filter_additions(additions):
    # Use a ValuesQuerySet for performance
    additions_values = additions.values()
    # The following code just eliminates duplicates. You could do
    # something much more powerful/interesting here if you like,
    # e.g. give preference to additions by a user's friends
    book_pk_registry = {}
    excluded_addition_pks = []
    for addition in additions_values:
        addition_pk = addition['id']
        book_pk = addition['book_id']
        if book_pk not in book_pk_registry:
            book_pk_registry[book_pk] = True
        else:
            excluded_addition_pks.append(addition_pk)
    additions = additions.exclude(pk__in=excluded_addition_pks)
    # Return the filtered queryset so the caller can use it
    return additions
additions = Addition.objects.filter(collection__subscribers=user)
additions = filter_additions(additions)
If there are likely to be more than a thousand or so books involved, you may want to put a limit on the initial additions query. Passing massive lists of ids over in the exclude isn't such a great idea. Using 'values()' is quite important, because Python can cycle through a basic list of dicts a LOT faster than a queryset and it uses a lot less memory.
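The de-duplication loop above can be tried out without a database. This is a minimal sketch in plain Python that assumes the rows look like what values() would return (dicts with 'id' and 'book_id' keys); the function name duplicate_addition_pks and the sample rows are made up for illustration.

```python
def duplicate_addition_pks(addition_rows):
    """Return the pks of additions whose book was already seen earlier.

    The first addition encountered for each book is kept, so the order of
    `addition_rows` decides which addition wins.
    """
    seen_book_pks = set()
    excluded_addition_pks = []
    for row in addition_rows:
        if row["book_id"] in seen_book_pks:
            excluded_addition_pks.append(row["id"])
        else:
            seen_book_pks.add(row["book_id"])
    return excluded_addition_pks


# Hypothetical rows: books 10 and 11 were each added twice.
rows = [
    {"id": 1, "book_id": 10},
    {"id": 2, "book_id": 11},
    {"id": 3, "book_id": 10},
    {"id": 4, "book_id": 11},
    {"id": 5, "book_id": 12},
]

excluded = duplicate_addition_pks(rows)
print(excluded)
```

Because the first row per book is kept, sorting the rows beforehand (for example, newest first) is one way to control which addition takes precedence.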

Assuming there won't be huge amounts of additions to display, this could easily do the trick:
# duplicated..
additions = Addition.objects.filter(collection__subscribers=user, created__gt=DATE_LAST_LOGIN).select_related()
# remove duplication
added_books = {}
for addition in additions:
    added_books[addition.book] = True
added_books = added_books.keys()
By the description you gave of the problem, performance would not be a problem.

from django.db.models import Min
additions = Addition.objects.filter(collection__subscribers=user).values('book').annotate(user=Min('user'), collection=Min('collection')).order_by()
This query will give you a list of unique books with their users and collections. Books, collections and users will be primary keys, not objects, but I hope you will store them in a cache so that won't be a problem.
For your workload, though, I'd think about denormalization. My query is very heavy, and its results aren't easy to cache if additions are frequent. My first approach would be to add a latest_additions field to the Collection model and update it with signals (without adding duplicates). The format of this field is up to you.

Sometimes it's OK to drop into SQL, especially when the ORM-only solution is not performant. It's easy to get the non-duplicate Addition row IDs in SQL, and then you can switch back to the ORM to select the data. It's two queries, but will outperform any of the single query solutions I've seen so far.
from django.db import connection
from operator import itemgetter
cursor = connection.cursor()
# Select one addition per book. MAX(id) is used because a bare `id` selected
# next to MAX(updated) is not guaranteed to come from the most recently
# updated row; this assumes that ids increase over time.
query = '''SELECT MAX(id) FROM %s
           GROUP BY book_id''' % Addition._meta.db_table
cursor.execute(query)
# Flatten the results to an id list
addition_ids = list(map(itemgetter(0), cursor.fetchall()))
additions = Addition.objects.filter(
    collection__subscribers=user, id__in=addition_ids).select_related()
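One caveat with picking ids across the whole table first: the addition chosen for a book may belong to a collection the user does not follow, in which case that book drops out of the final result. A minimal sketch of filtering by subscription inside the grouped query follows. It uses sqlite3 rather than Django and MySQL, the table names addition and collection_subscribers are hypothetical stand-ins for the real Django tables, and it assumes that a higher id means a more recent addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addition (
        id INTEGER PRIMARY KEY, book_id INTEGER, collection_id INTEGER
    );
    CREATE TABLE collection_subscribers (collection_id INTEGER, user_id INTEGER);
    """
)
# Book 10 appears in collections 1, 2 and 3; user 42 follows only 1 and 2.
conn.executemany(
    "INSERT INTO addition VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 11, 1), (4, 10, 3), (5, 12, 3)],
)
conn.executemany(
    "INSERT INTO collection_subscribers VALUES (?, ?)",
    [(1, 42), (2, 42), (3, 7)],
)


def latest_addition_ids(conn, user_id):
    """Return one addition id per book, limited to followed collections."""
    cur = conn.execute(
        """
        SELECT MAX(a.id)
        FROM addition AS a
        JOIN collection_subscribers AS s ON s.collection_id = a.collection_id
        WHERE s.user_id = ?
        GROUP BY a.book_id
        ORDER BY MAX(a.id) DESC
        """,
        (user_id,),
    )
    return [row[0] for row in cur.fetchall()]


ids_for_user_42 = latest_addition_ids(conn, 42)
print(ids_for_user_42)
```

For user 42 the addition with id 4 is ignored even though it is the newest one for book 10, because it sits in a collection that user does not follow. The resulting id list could then be passed to an id__in filter as in the answer above.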
