I am creating a project similar to a house booking system in Django (rest-framework, so only an API for now), with a few particularities: you publish the house but rent the rooms individually, and you can mark when you are on holiday and the house is not available. The house model can be simplified to:
class House(models.Model):
    title = models.CharField(max_length=127)
    n_rooms = models.PositiveSmallIntegerField()
There is a calendar model to record when the house is not available:
class Unavailability(models.Model):
    house = models.ForeignKey(House, on_delete=models.CASCADE, related_name="house_unavailability")
    unavailable_rooms = models.SmallIntegerField()
    from_date = models.DateField()
    to_date = models.DateField()
and a model to save when there have been bookings:
class Booking(models.Model):
    house = models.ForeignKey(House, on_delete=models.CASCADE, related_name='booking')
    booker = models.ForeignKey(UserProfile, on_delete=models.CASCADE)
    start_date = models.DateField()
    end_date = models.DateField()
    n_rooms = models.PositiveSmallIntegerField(default=1)
    rent = models.DecimalField(decimal_places=2, max_digits=8)
I am now trying to create an API to search for houses that have at least one room available on the selected dates (not on holiday and not booked).
I have seen some similar cases without my particularities, and using other languages, but none using Django. (I am using MySQL, so I could fall back to SQL if a clean solution with Django does not arise.)
Here is an attempt at a query that should produce the expected result. I have not tested it, so there may be syntax errors, but I think it will help.
from django.db.models import Sum, Value, IntegerField, Q, F
from django.db.models.functions import Coalesce
from django.utils import timezone
given_start_date = timezone.now().date() # assuming
given_end_date = timezone.now().date() # assuming
houses_at_least_one_room_available = House.objects.annotate(
    total_unavailable_rooms=Coalesce(
        Sum("house_unavailability__unavailable_rooms", filter=Q(
            house_unavailability__from_date__range=(given_start_date, given_end_date),
            house_unavailability__to_date__range=(given_start_date, given_end_date)
        ), distinct=True),
        Value(0),
        output_field=IntegerField()
    ),
    total_booked_rooms=Coalesce(
        Sum("booking__n_rooms", filter=Q(
            booking__start_date__range=(given_start_date, given_end_date),
            booking__end_date__range=(given_start_date, given_end_date)
        ), distinct=True),
        Value(0),
        output_field=IntegerField()
    ),
    available_rooms=F("n_rooms") - F("total_unavailable_rooms") - F("total_booked_rooms")
).filter(available_rooms__gt=0)
Here, you can update the filter query inside Sum according to your use case.
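Note that the __range lookups above only count Unavailability and Booking rows whose dates fall entirely inside the searched window. If you also want to count rows that merely overlap it, a sketch of the usual interval-overlap condition (untested) could be dropped into the filter arguments above:
# A period overlaps the searched window if it starts on or before the
# window's end date and ends on or after the window's start date.
overlapping_unavailability = Q(
    house_unavailability__from_date__lte=given_end_date,
    house_unavailability__to_date__gte=given_start_date,
)
overlapping_booking = Q(
    booking__start_date__lte=given_end_date,
    booking__end_date__gte=given_start_date,
)
# e.g. Sum("house_unavailability__unavailable_rooms", filter=overlapping_unavailability)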
This is my first question on Stack Overflow ever. :P Everything works just fine except the crawl order; I added a priority method, but it didn't work correctly. I need to first write all author data, then all album and song data, and store them in the DB in that order. I want to order items in one MySQL table by an item from another one.
Database structure: https://i.postimg.cc/GhF4w32x/db.jpg
Example: first write all author items to the Author table, and then order album items in the Album table by authorId from the Author table.
Github repository: https://github.com/markostalma/discogs/tree/master/discogs
P.S. I have three item classes for the author, album and song parsers.
I also tried making another spider flow and putting everything in one item class, but with no success. The order was the same. :(
Sorry for my bad English.
You need to set up an item pipeline for this. I would suggest using SQLAlchemy to build the SQL item and connect to the DB. Your SQLAlchemy class will reflect all the table relationships you have in your DB schema. Let me show you. This is a working example of a similar pipeline that I have, except you would set up your SQLAlchemy class to contain the m2m or foreign-key relationships you need. You'll have to refer to their documentation [1].
An even more Pythonic way of doing this would be to keep your SQLAlchemy column and item field names the same and do something like for k, v in item.items(): (there is a short sketch of this after the pipeline code below).
That way you can just loop over the item and set whatever is there. The code below is long and violates DRY on purpose, though.
# -*- coding: utf-8 -*-
from scrapy.exceptions import DropItem
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Boolean, Sequence, Date, Text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
import datetime
DeclarativeBase = declarative_base()
def db_connect():
    """
    This function connects to the database. Tables will automatically be created if they do not exist.
    See __tablename__ under the GoogleReviewItem class.
    MySQL example: engine = create_engine('mysql://scott:tiger@localhost/foo')
    """
    return create_engine('sqlite:///reviews.sqlite', echo=True)
class GoogleReviewItem(DeclarativeBase):
    __tablename__ = 'google_review_item'
    pk = Column('pk', String, primary_key=True)
    query = Column('query', String(500))
    entity_name = Column('entity_name', String(500))
    user = Column('user', String(500))
    review_score = Column('review_score', Integer)
    description = Column('description', String(5000))
    top_words = Column('top_words', String(10000), nullable=True)
    bigrams = Column('bigrams', String(10000), nullable=True)
    trigrams = Column('trigrams', String(10000), nullable=True)
    google_average = Column('google_average', Integer)
    total_reviews = Column('total_reviews', Integer)
    review_date = Column('review_date', DateTime)
    created_on = Column('created_on', DateTime, default=datetime.datetime.now)
engine = db_connect()
Session = sessionmaker(bind=engine)
def create_individual_table(engine):
    # checks for table existence and creates the tables if they do not already exist
    DeclarativeBase.metadata.create_all(engine)

create_individual_table(engine)
session = Session()

def get_row_by_pk(pk, model):
    review = session.query(model).get(pk)
    return review
class GooglePipeline(object):
    def process_item(self, item, spider):
        review = get_row_by_pk(item['pk'], GoogleReviewItem)
        if review is None:
            googlesite = GoogleReviewItem(
                pk=item['pk'],
                query=item['query'],
                entity_name=item['entity_name'],
                user=item['user'],
                review_score=item['review_score'],
                description=item['description'],
                top_words=item['top_words'],
                bigrams=item['bigrams'],
                trigrams=item['trigrams'],
                google_average=item['google_average'],
                total_reviews=item['total_reviews'],
                review_date=item['review_date']
            )
            session.add(googlesite)
            session.commit()
            return item
        else:
            raise DropItem()
[1]: https://docs.sqlalchemy.org/en/13/core/constraints.html
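For the more Pythonic item.items() approach mentioned above, a rough sketch (assuming your Scrapy item fields use exactly the same names as the GoogleReviewItem columns) could look like this:
class GooglePipeline(object):
    def process_item(self, item, spider):
        if get_row_by_pk(item['pk'], GoogleReviewItem) is None:
            # item keys are assumed to match the SQLAlchemy column names one-to-one
            googlesite = GoogleReviewItem(**{k: v for k, v in item.items()})
            session.add(googlesite)
            session.commit()
            return item
        raise DropItem()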
I am trying to optimize my DB queries (MySQL) in a Django app.
This is the situation:
I need to retrieve some data about sales and stock for some products on a monthly basis. This is the function:
def get_magazzino_month(year, month):
    from magazzino.models import ddt_in_item, omaggi_item, inventario_item
    from corrispettivi.models import corrispettivi_item, corrispettivi
    from fatture.models import fatture_item, fatture, fatture_laboratori_item
    from prodotti.models import prodotti
    qt = 0
    val = 0
    products = prodotti.objects.all()
    invents = inventario_item.objects.all().filter(id_inventario__data__year=year-1)
    fatture_lab = fatture_laboratori_item.objects.all().order_by("-id_fattura__data")
    for product in products:
        inv_instance = filter_for_product(invents, product)
        if inv_instance:
            qt += inv_instance[0].quantita
        lab_instance = fatture_lab.filter(id_prodotti=product).first()
        prezzo_prodotto = (lab_instance.costo_acquisto/lab_instance.quantita - ((lab_instance.costo_acquisto/lab_instance.quantita) * lab_instance.sconto / 100)) if lab_instance else product.costo_acquisto
    return val, qt
The problem is where I need to filter all the data to get only the product I need. It seems that the .filter call makes Django re-query the database, although all of the data is already there. I tried writing a function to filter it myself, but although the number of queries goes down, loading time increases dramatically.
This is the function to filter:
def filter_for_product(array, product):
    result = []
    for instance in array:
        if instance.id_prodotti.id == product.id:
            result.append(instance)
    return result
Has anyone ever dealt with this kind of problem?
You can use prefetch_related() to return a queryset of related objects and Prefetch() to further control the operation.
from django.db.models import Prefetch

products = prodotti.objects.all().prefetch_related(
    Prefetch(
        'inventario_item_set',  # default reverse name for the FK from inventario_item; adjust if you set a related_name
        queryset=inventario_item.objects.all().filter(id_inventario__data__year=year-1),
        to_attr='invent'
    )
)
Then you can access each product's invent like products[0].invent
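For example, a sketch of how the loop from the question could use the prefetched list instead of filter_for_product (assuming the reverse-relation name above matches your models):
for product in products:
    if product.invent:  # list filled by the Prefetch above; no extra query per product
        qt += product.invent[0].quantita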
Using select_related() will help optimize your queries.
A good example of what select_related() does and how to use it is available at simpleisbetterthancomplex.
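As a rough sketch against the models in the question (assuming id_fattura and id_prodotti are ForeignKey fields on fatture_laboratori_item), select_related() fetches the related rows in the same query:
fatture_lab = (fatture_laboratori_item.objects
               .select_related('id_fattura', 'id_prodotti')
               .order_by('-id_fattura__data'))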
Say that I have this structure:
class Distinct_Alert(models.Model):
    entities = models.ManyToManyField(to='Entity', through='Entity_To_Alert_Map')
    objects = Distinct_Alert_Manager()
    has_unattended = models.BooleanField(default=True)
    latest_alert_datetime = models.DateTimeField(null=True)

class Entity_To_Alert_Map(models.Model):
    entity = models.ForeignKey(Entity, on_delete=models.CASCADE)
    distinct_alert = models.ForeignKey(Distinct_Alert, on_delete=models.CASCADE)
    entity_alert_relationship_label = models.TextField()

class Entity(models.Model):
    label = models.CharField(max_length=700, blank=False)
    related_entities = models.ManyToManyField('self')
    identical_entities = models.ManyToManyField('self')
    objects = Entity_Manager()
Disregarding the other fields, what I'm trying to do is get all the unique entities from a selection of distinct alerts. Say that I pull 3 distinct alerts and each of them has 4 entities in its many-to-many entities field, and that a couple of those are shared across them; I want to get only the distinct ones.
I'm doing this:
ret_list = map(lambda x: x.get_dictionary(),
               itertools.chain(*[alert.entities.all() for alert in
                                 Distinct_Alert.objects.filter(
                                     has_unattended=True,
                                     entities__related_entities__label=threat_group_label)]))
return [dict(t) for t in set([tuple(d.items()) for d in ret_list])]
But as you can imagine, this isn't optimal at all, since I end up pulling a lot of dupes and then deduping at the end. I've tried pulling distinct entity values, but that gives me a Long that's used as a key to map the entities to the distinct alert table. Any way to improve this?
Can you try this?
entity_ids = Entity_To_Alert_Map.objects.filter(
    distinct_alert__has_unattended=True,
    entity__related_entities__label=threat_group_label
).values_list('entity', flat=True).distinct()

Entity.objects.filter(id__in=entity_ids)
Django doc about values_list.
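If you still need the dictionaries from the question, a short sketch (assuming get_dictionary() is defined on Entity) would then replace the dedup step:
entities = Entity.objects.filter(id__in=entity_ids)
ret_list = [entity.get_dictionary() for entity in entities]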
I'm a beginner in Django and trying to display distinct values through a ForeignKey.
Here are my environment and example models.
Django 1.8
MySQL 5
Python 2.7
class a_group(models.Model):
    num = models.AutoField(primary_key=True)
    title = models.CharField(max_length=50)

    def __unicode__(self):
        return self.title

class b_group(models.Model):
    no = models.AutoField(primary_key=True)
    group = models.ForeignKey(a_group)
I then tried to get distinct values of the group field like this.
g = b_group.objects.values('group').distinct()
But, as mentioned here https://docs.djangoproject.com/en/dev/ref/models/querysets/#values , it only returns the pk, not the title.
Is there any way to get the title field value as well?
You can also refer to fields on related models with reverse relations through OneToOneField, ForeignKey and ManyToManyField attributes. You can do it as follows:
g = b_group.objects.values('group__title').distinct()
By convention, Django uses a double underscore to access a field of a related model.
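If you need the primary key alongside the title, a small sketch of the same idea lists both lookups:
g = b_group.objects.values('group__num', 'group__title').distinct()
# each result is a dict, e.g. {'group__num': 1, 'group__title': 'first group'}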
Now that Google has announced availability of Cloud SQL storage for app engine, what will be the best way to migrate existing data from BigTable/GAE Datastore to MySQL?
To respond to the excellent questions brought up by Peter:
In my particular scenario, I kept my data model very relational.
I am fine with taking my site down for a few hours to make the transition, or at least warning people that any changes they make for the next few hours will be lost due to database maintenance, etc.
My data set is not very large - the main dashboard for my app says 0.67 GB, and the datastore statistics page says it's more like 200 MB.
I am using python.
I am not using the blobstore (although I think that is a separate question from a pure datastore migration - one could migrate datastore usage to MySql while maintaining the blobstore).
I would be fine with paying a reasonable amount (say, less than $100).
I believe my application is Master/Slave - it was created during the preview period of App Engine. I can't seem to find an easy way to verify that though.
It seems like the bulk uploader should be able to be used to download the data into a text format that could then be loaded with mysqlimport, but I don't have any experience with either technology. Also, it appears that Cloud SQL only supports importing mysqldumps, so I would have to install MySQL locally, mysqlimport the data, then dump it, then import the dump?
An example of my current model code, in case it's required:
class OilPatternCategory(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    default = db.BooleanProperty(default=False)

class OilPattern(db.Model):
    version = db.IntegerProperty(default=2)
    user = db.UserProperty()
    name = db.StringProperty(required=True)
    length = db.IntegerProperty()
    description = db.TextProperty()
    sport = db.BooleanProperty(default=False)
    default = db.BooleanProperty(default=False)
    retired = db.BooleanProperty(default=False)
    category = db.CategoryProperty()

class League(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Tournament(db.Model):
    version = db.IntegerProperty(default=1)
    user = db.UserProperty(required=True)
    name = db.StringProperty(required=True)
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()

class Series(db.Model):
    version = db.IntegerProperty(default=3)
    created = db.DateTimeProperty(auto_now_add=True)
    user = db.UserProperty(required=True)
    date = db.DateProperty()
    name = db.StringProperty()
    center = db.ReferenceProperty(Center)
    pattern = db.ReferenceProperty(OilPattern)
    league = db.ReferenceProperty(League)
    tournament = db.ReferenceProperty(Tournament)
    public = db.BooleanProperty(default=True)
    notes = db.TextProperty()
    allow_comments = db.BooleanProperty(default=True)
    complete = db.BooleanProperty(default=False)
    score = db.IntegerProperty(default=0)

class Game(db.Model):
    version = db.IntegerProperty(default=5)
    user = db.UserProperty(required=True)
    series = db.ReferenceProperty(Series)
    score = db.IntegerProperty()
    game_number = db.IntegerProperty()
    pair = db.StringProperty()
    notes = db.TextProperty()
    entry_mode = db.StringProperty(choices=entry_modes, default=default_entry_mode)
Have you considered using the MapReduce framework?
You could write mappers that store the datastore entities in Cloud SQL.
Do not forget to add a column for the datastore key; this might help you avoid duplicate rows or identify missing rows.
You might have a look at https://github.com/hudora/gaetk_replication for inspiration on the mapper functions.
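As a very rough sketch of that idea (untested; the rdbms instance, database, table and column names are placeholders to adapt), a mapper for the Series model might look like:
from google.appengine.api import rdbms  # App Engine's built-in DB-API driver for Cloud SQL

def series_to_cloudsql(entity):
    """Mapper called by the mapreduce framework once per Series entity."""
    # In a real job you would reuse one connection per shard rather than open one per entity.
    conn = rdbms.connect(instance='your-instance-name', database='your_database')
    cursor = conn.cursor()
    cursor.execute(
        'REPLACE INTO series (datastore_key, user, date, name, score) '
        'VALUES (%s, %s, %s, %s, %s)',
        (str(entity.key()), entity.user.email(), entity.date, entity.name, entity.score)
    )
    conn.commit()
    conn.close()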