Calculating and DB Design for getting an average

I'm trying to wrap my head around the proper design to calculate an average for multiple items, in my case beers. Users of the website can review various beers and all beers are given a rating (avg of all reviews for that beer) based on those reviews. Each beer review has 5 criteria that it's rated on, and those criteria are weighted and then calculated into an overall rating for that particular review (by that user).
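The per-review weighting can be sketched in plain Python. The five weights below are hypothetical, because the question does not give them. Dividing by the sum of the weights keeps the overall rating on the same scale as the individual criteria, even if the weights do not add up to exactly 1.

```python
# Hypothetical weights for criteria1..criteria5; the real values are not given.
WEIGHTS = (0.30, 0.25, 0.20, 0.15, 0.10)


def overall_rating(scores, weights=WEIGHTS):
    """Weighted average of one review's criterion scores."""
    if len(scores) != len(weights):
        raise ValueError("expected exactly one score per criterion")
    weighted_sum = sum(score * weight for score, weight in zip(scores, weights))
    # Normalising by the weight total keeps the result on the criteria's scale.
    return weighted_sum / sum(weights)


print(overall_rating((4, 5, 3, 4, 2)))
```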
Here are some of the relevant models as they currently stand. My current thinking is that all beer reviews will be in their own table like you see below.
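from django.contrib.auth.models import User
from django.db import models

# Brewery and Style are models defined elsewhere in the app.
class Beer(models.Model):
    name = models.CharField(max_length=200)
    brewer = models.ForeignKey(Brewery, on_delete=models.CASCADE)
    style = models.ForeignKey(Style, on_delete=models.CASCADE)
    # .....

class Beerrating(models.Model):
    thebeer = models.ForeignKey(Beer, on_delete=models.CASCADE)
    theuser = models.ForeignKey(User, on_delete=models.CASCADE)
    beerstyle = models.ForeignKey(Style, on_delete=models.CASCADE)
    criteria1 = models.IntegerField()
    # ...
    criteria5 = models.IntegerField()
    overallrating = models.DecimalField(max_digits=4, decimal_places=2)  # precision is illustrative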
My real question is: how do I calculate the overall beer average based on all the reviews for that beer? Do I keep a running tally in the Beer model (e.g. number of reviews and total points, updated after every review), or do I just calculate the average on the fly? Is my current DB design way off the mark?
I'll also be calculating a top beer list (100 highest rated beers), so that's another calculation I'll be doing with the ratings.
Any help is much appreciated. This is my first web app, so please forgive my noob-ness. I haven't chosen a DB yet, so if MySQL or PostgreSQL is better than the other in some way, please give your preference and, if you have time, why. I'll be choosing between those two DBs. I'm also using Django. Thank you.
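The two options in the question can be compared with a small plain-Python sketch. The field names review_count and rating_total are hypothetical stand-ins for columns that could live on the Beer row; average_on_the_fly plays the role of an aggregate query over all reviews.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BeerTally:
    """Running totals that could be stored on the Beer row (hypothetical fields)."""

    review_count: int = 0
    rating_total: float = 0.0

    def add_review(self, overall_rating: float) -> None:
        # Constant work per new review: no need to re-read existing reviews.
        self.review_count += 1
        self.rating_total += overall_rating

    def remove_review(self, overall_rating: float) -> None:
        # Deleting (or editing) a review must adjust the totals as well.
        self.review_count -= 1
        self.rating_total -= overall_rating

    @property
    def average(self) -> Optional[float]:
        if self.review_count == 0:
            return None
        return self.rating_total / self.review_count


def average_on_the_fly(ratings):
    """Recompute from every review each time, as an aggregate query would."""
    return sum(ratings) / len(ratings) if ratings else None


ratings = [3.85, 4.25, 2.9]
tally = BeerTally()
for r in ratings:
    tally.add_review(r)
print(tally.average, average_on_the_fly(ratings))
```

The tally is effectively a cache: it is cheap to read, but every insert, edit and delete of a review has to keep it in sync. Computing on the fly is always consistent and is usually fast enough for a single beer's reviews.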

As long as you're using Django 1.1 or later, you can use the aggregation features to calculate the average whenever you need it.
Something like:
from django.db.models import Avg
beers_with_ratings = Beer.objects.annotate(avg_rating=Avg('beerrating__overallrating'))
Now each Beer object has an avg_rating attribute: the average of the overallrating values of its associated Beerrating objects (beerrating is the default reverse-lookup name for the thebeer foreign key), or None if the beer has no ratings yet.
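Then, to get the top 100 (the leading '-' sorts highest first; excluding unrated beers stops them from sorting to the top in PostgreSQL, where NULLs come first in descending order):
top_100 = beers_with_ratings.filter(avg_rating__isnull=False).order_by('-avg_rating')[:100]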
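Roughly the SQL that this kind of aggregation boils down to can be tried with Python's built-in sqlite3 module. The table and column names are simplified stand-ins for what Django would generate, and the sample ratings are made up. Unlike Django's annotate, the inner JOIN here simply leaves out beers that have no ratings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE beer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE beerrating (
    id            INTEGER PRIMARY KEY,
    thebeer_id    INTEGER NOT NULL REFERENCES beer(id),
    overallrating REAL    NOT NULL
);
INSERT INTO beer (id, name) VALUES (1, 'Stout'), (2, 'Lager'), (3, 'IPA');
INSERT INTO beerrating (thebeer_id, overallrating) VALUES
    (1, 4.0), (1, 5.0), (2, 3.0), (3, 4.5), (3, 3.5);
""")

TOP_N_SQL = """
SELECT b.name, AVG(r.overallrating) AS avg_rating
FROM beer AS b
JOIN beerrating AS r ON r.thebeer_id = b.id
GROUP BY b.id, b.name
ORDER BY avg_rating DESC   -- highest first; Django's equivalent is order_by('-avg_rating')
LIMIT ?
"""

top = conn.execute(TOP_N_SQL, (100,)).fetchall()
for name, avg in top:
    print(name, avg)
```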
As for the database choice, either is perfectly fine for this sort of thing. Aggregation is a basic feature of relational databases, and both PostgreSQL and MySQL handle it with no problem.

You might want to have a look at the django-ratings module. It's very nicely structured and provides a powerful ratings system without being overly complicated (although if this is your first web app it might look slightly intimidating).
You won't have to deal with calculating averages etc. directly.
Edit: To be a bit more helpful
If you use django-ratings, your models.py would probably look something like this:
from djangoratings.fields import RatingField

class Beer(models.Model):
    name = models.CharField(max_length=200)
    brewer = models.ForeignKey(Brewery, on_delete=models.CASCADE)
    style = models.ForeignKey(Style, on_delete=models.CASCADE)
    # .....
    criteria1 = RatingField(range=5)  # possible rating values, 1-5
    # ...
    criteria5 = RatingField(range=5)
There is no need for the Beerrating model. Instead, all the rating information will be stored in django-ratings' Vote and Score models.

Related

Rails - Saving Daily Metrics

I'm currently developing a Rails application, on top of PostgreSQL, that stores daily data for our company. We run ads on Facebook, and we have a few hundred ads running at any one time. I pull metrics every day and import them into my application, which then either creates or updates each record depending on whether it already exists. However, I want to be able to see daily performance over the course of, say, a week or a month. What would be the easiest way to accomplish this?
My facebook_ad model has X rows, one for each ad campaign. Each column holds a specific metric, e.g. amount spent, clicks, etc. Should I create a new table for each date? Is there a way to timestamp every entry and include the time in my queries? I've made good progress up to here, and no amount of searching has turned up a strategy I could use.
Side note: we are hoping to get access to their API, which would probably solve most of this. But we want to build something in the interim so we can be as efficient as possible until then, which could be 6 months or more.
Edit:
I want to query and graph the data based on the daily data. For example, grab the metrics from 10/01/14 - 10/08/14 for one ad, and be able to see 10/01/14: MetricA = 1, MetricB = 2; 10/02/14: MetricA = 4, MetricB = 5; 10/03/14: MetricA = 6, MetricB = 3, etc. We want to be able to see trends and see how changes affect performance.

I would definitely not recommend creating a new table for each date; that would be a data management nightmare. Based on what you've said above, there's no reason you can't keep every ad campaign in the same table. You could have created and updated columns in the table that default to now(), and whenever you update a row for any reason, set its updated column to now() again. (I like to add those columns to just about every table I create; they're often useful for a variety of queries.)
You could then query that table based on the desired timeframe to get your performance statistics. Depending upon the exact nature of what you want to query, Window Functions may prove to be quite useful.
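A minimal sketch of the timeframe query, using Python's sqlite3 module as a stand-in for PostgreSQL. It assumes a variant of the timestamp idea: a dedicated metric_date column with one row per ad per day, keyed on (ad_id, metric_date), so each daily import adds a row instead of overwriting the previous day's numbers. The table name, columns and data are hypothetical. LAG is one of the window functions mentioned above; it needs SQLite 3.25 or later, and PostgreSQL supports it as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facebook_ad_metrics (
    ad_id       INTEGER NOT NULL,
    metric_date TEXT    NOT NULL,   -- ISO date: one row per ad per day
    spend       REAL    NOT NULL,
    clicks      INTEGER NOT NULL,
    PRIMARY KEY (ad_id, metric_date)
);
INSERT INTO facebook_ad_metrics VALUES
    (7, '2014-10-01', 10.0, 1),
    (7, '2014-10-02', 12.5, 4),
    (7, '2014-10-03',  9.0, 6),
    (7, '2014-10-09', 11.0, 2),
    (8, '2014-10-01', 20.0, 9);
""")


def daily_clicks(ad_id, start, end):
    """Clicks per day for one ad in [start, end], plus the change from the previous row."""
    return conn.execute(
        """
        SELECT metric_date,
               clicks,
               clicks - LAG(clicks) OVER (ORDER BY metric_date) AS clicks_change
        FROM facebook_ad_metrics
        WHERE ad_id = ? AND metric_date BETWEEN ? AND ?
        ORDER BY metric_date
        """,
        (ad_id, start, end),
    ).fetchall()


for row in daily_clicks(7, "2014-10-01", "2014-10-08"):
    print(row)
```

The first row in the range has no previous row, so its clicks_change is NULL (None in Python). In PostgreSQL, INSERT ... ON CONFLICT (ad_id, metric_date) DO UPDATE can handle the create-or-update step of the daily import in one statement.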

Correlate and Sum Table Data as Summary

I want to summarize rows from one end of a relationship tree with a table on the other side. Is "correlate" the correct term? Really just knowing the terms would help me solve this problem.
I am using MySQL and am extending an existing DB structure, though I would have the liberty to rearrange data if needed. I'm getting better at creating "filtering" queries using JOINs, and I'm sure this next piece will be straightforward once I understand it (without performing tons of queries : )
I made a simplified schema (and theme!) for this example, but the idea is the same.
Say there are many DietPlans, each of which is related to a bunch of MenuItems, and each MenuItem has an ItemType (such as 'Healthy', 'Fast', 'Normal', etc.). On the other side of DietPlan there are Persons, who each store how many DailyCalories they consume, and another table, MenuAllocations, where a Person stores what percentage of their daily intake comes from which MenuItem.
As examples of scale, there could be 1000 MenuItems, with 50 of those associated with each of 200 DietPlans. Also, each DietPlan might have 10,000 Persons, each of whom will have 5-10 MenuAllocations of various types.
What I'd like to do feels complex to me. I want to create a dashboard for each DietPlan (there could be many), gathering data from the Persons of that DietPlan, and tabulating the number of calories for each item type.
The math is simple: tblPerson.dailyCalories * tblMenuAllocations.percent. But I want to do that for each Person in the DietPlan, for each ItemType.
I understand the JOINs required to 'filter' from tblItemType around to tblMenuAllocation and think it would be similar to this:
SELECT *
FROM tblMenuAllocation
INNER JOIN tblPerson
ON tblMenuAllocation.personId = tblPerson.PersonId
INNER JOIN tblDietPlan
ON tblPerson.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblMenuItem
ON tblMenuItem.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblItemType
ON tblItemType.ItemTypeId = tblMenuItem.itemTypeId
WHERE tblItemType.ItemTypeId = 2
It feels like one query for each tblItemType, which could be a LOT of Person and MenuAllocation data to sort through, and doing that many consecutive queries feels like I'm missing something. Also, I think math can be handled in the query to sum values, but I've never done that. Where can I begin?
EDIT: The final results would be something like this:
---------------------------------
ItemId | ItemDesc | TotalCalories
---------------------------------
1      | Healthy  |       450,876
2      | Fast     |     1,987,948
3      | Vegan    |       349,123
etc.
I would be willing to accept some manipulation of data outside the query, but the Person's specific dailyCalories is very important to the tblMenuAllocation.percent calculation. Some tblMenuAllocation rows might be of the same ItemType!

I think you are looking for these topics:
Aggregate Functions and
GROUP BY Modifiers
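Putting those two topics together, the whole dashboard can come from one query: SUM over dailyCalories times percent, grouped by item type and restricted to one diet plan. This is a sketch in SQLite via Python's sqlite3 module. It assumes tblMenuAllocation has a menuItemId column and that percent is stored as a whole number (50 means 50%), hence the division by 100.0. Column names beyond those in the question are guesses, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblItemType (ItemTypeId INTEGER PRIMARY KEY, ItemDesc TEXT);
CREATE TABLE tblMenuItem (MenuItemId INTEGER PRIMARY KEY, dietPlanId INTEGER, itemTypeId INTEGER);
CREATE TABLE tblPerson (PersonId INTEGER PRIMARY KEY, dietPlanId INTEGER, dailyCalories INTEGER);
CREATE TABLE tblMenuAllocation (personId INTEGER, menuItemId INTEGER, percent INTEGER);

INSERT INTO tblItemType VALUES (1, 'Healthy'), (2, 'Fast');
INSERT INTO tblMenuItem VALUES (10, 1, 1), (11, 1, 2), (12, 1, 1), (20, 2, 2);
INSERT INTO tblPerson VALUES (1, 1, 2000), (2, 1, 1500), (3, 2, 3000);
INSERT INTO tblMenuAllocation VALUES
    (1, 10, 50), (1, 11, 30), (1, 12, 20),   -- two rows of the same ItemType
    (2, 10, 40), (2, 11, 60),
    (3, 20, 100);
""")


def calories_by_item_type(diet_plan_id):
    """Total calories per ItemType across all Persons of one DietPlan."""
    return conn.execute(
        """
        SELECT t.ItemTypeId, t.ItemDesc,
               SUM(p.dailyCalories * ma.percent / 100.0) AS TotalCalories
        FROM tblMenuAllocation AS ma
        JOIN tblPerson   AS p  ON p.PersonId    = ma.personId
        JOIN tblMenuItem AS mi ON mi.MenuItemId = ma.menuItemId
        JOIN tblItemType AS t  ON t.ItemTypeId  = mi.itemTypeId
        WHERE p.dietPlanId = ?
        GROUP BY t.ItemTypeId, t.ItemDesc
        ORDER BY t.ItemTypeId
        """,
        (diet_plan_id,),
    ).fetchall()


print(calories_by_item_type(1))
```

Because each allocation row is multiplied by its own Person's dailyCalories before summing, allocations that share an ItemType are simply added into the same group.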

Flexible requests to database in Rails

I have the following problem in Rails and I'm not sure how to solve it. I'm building a test web application, a bulletin board with real estate ads, like a simple version of http://www.trulia.com/. Users can add advertisements to the site, which they can then find in their "office", and that's where the problem appears. I have 8 types of advertisements (flats, offices, garages, etc.), so I need to be able to retrieve the ads that belong to a particular user, but I don't want to make SQL requests to all 8 tables to show a user's advertisements if the user has, for example, just 1 ad about selling a flat.
So I need something instead of
@garages = Garage.where(user_id: current_user.id)
@flats = Flat.where(user_id: current_user.id)
@offices = Office.where(user_id: current_user.id)
# ..... and so on
I have a User model, so all ads belong to some user, and I am thinking of creating a polymorphic table that would belong to the user and contain information about all the ads the user posted.
It would be named, for example, "Advertisement" and would have 3 columns: user_id, advertisable_id and advertisable_type. It would be very easy to get all rows that belong to a particular user, but I have no idea whether it's possible to make Rails fetch ads only from the tables named in advertisable_type, with the ids in advertisable_id. I hope you understand what I mean. So, any advice for a newbie? :)

If your existing models Garage, Flat and Office really are all advertisements and share a lot of the same logic and columns, then obviously you need a data redesign.
If they are 99% the same thing, why not just have an Advertisement table with an advertisement_type column? If you want a classifier for the advertisement type, simply use a separate table/model AdvertisementTypes and reference it by advertisement_type_id in your advertisements table.
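What the single-table design buys is one query per user instead of eight. A minimal sketch with Python's sqlite3 module standing in for the Rails database; the columns other than user_id and advertisement_type, and the sample rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE advertisements (
    id                 INTEGER PRIMARY KEY,
    user_id            INTEGER NOT NULL,
    advertisement_type TEXT    NOT NULL,   -- 'flat', 'garage', 'office', ...
    title              TEXT    NOT NULL
);
CREATE INDEX index_advertisements_on_user_id ON advertisements (user_id);
INSERT INTO advertisements (user_id, advertisement_type, title) VALUES
    (1, 'flat',   'Two-room flat downtown'),
    (2, 'garage', 'Garage near the station'),
    (2, 'office', 'Small office');
""")


def ads_for_user(user_id):
    # One parameterized query covers every advertisement type.
    return conn.execute(
        "SELECT advertisement_type, title FROM advertisements WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()


print(ads_for_user(2))
```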
If you feel there are (or will be) a lot of things in common, but also a lot of distinct logic, it might be an ideal case for STI (single table inheritance).
With STI, you have different model classes that inherit from the same parent class and are stored in the same table.
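# STI: the advertisements table needs a string column named `type`,
# which Rails fills with the subclass name (e.g. "GarageAdvertisement").
class Advertisement < ActiveRecord::Base
  def ad_text
    "We are selling property: #{read_attribute(:ad_text)}"
  end
end

class GarageAdvertisement < Advertisement
  def ad_text
    "We are selling a garage: #{read_attribute(:ad_text)}"
  end
end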

Database design to store lottery information

I am designing a system where I am supposed to store different types of lottery (results and tickets).
Currently focusing on US Mega Millions and Singapore Pool Toto. They both have a similar format.
Mega Millions: Five different numbers from 1 to 56 and one number from 1 to 46.
Toto: 6 numbers from 1 to 45
I need to come up with an elegant database design to store the user tickets and corresponding results.
I thought of two ways to go about it.
Store the six numbers in six columns.
OR
Create another (many-to-many) table which has ball_number and ticket_id.
I need to store the ball numbers for the results as well.
For TOTO, if your numbers match 4 or more winning numbers, you win a prize.
For Mega Millions there is a similar process.
I'm looking for the pros and cons or possibly a better solution?
I have done a lot of research and paper work, but I am still confused which way to go about it.

Two tables
tickets
    ball_number
    ticket_id
player
    player_id
    ticket_id
results (optional)
    ball_number
    lottery_id
With two tables you could use a query like:
select ticket_id, count(ball_number) hits
from tickets
where ball_number in (wn1, wn2, ...) -- wn = winning number
group by ticket_id
having hits = x
Of course you could take winning numbers from lottery results table (or store them in the balls_table under special ticket numbers).
Also, preparing statistics would be easier. With
select ball_number, count(ticket_id)
from tickets
group by ball_number
you could easily see which numbers are picked most often.
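Both two-table queries can be tried end to end with Python's built-in sqlite3 module. SQLite stands in for MySQL here, the sample tickets are made up, and the threshold of 4 mirrors the TOTO rule from the question. The HAVING clause repeats COUNT(...) instead of using the alias, which keeps it valid in databases such as PostgreSQL that do not accept output aliases there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (
    ticket_id   INTEGER NOT NULL,
    ball_number INTEGER NOT NULL,
    PRIMARY KEY (ticket_id, ball_number)
);
INSERT INTO tickets VALUES
    (1, 3), (1, 8), (1, 21), (1, 30), (1, 40), (1, 44),
    (2, 3), (2, 8), (2, 10), (2, 11), (2, 12), (2, 13),
    (3, 3), (3, 8), (3, 21), (3, 24), (3, 27), (3, 44);
""")


def tickets_with_hits(winning_numbers, min_hits):
    """(ticket_id, hits) for every ticket matching at least min_hits winning numbers."""
    placeholders = ", ".join(["?"] * len(winning_numbers))
    sql = f"""
        SELECT ticket_id, COUNT(ball_number) AS hits
        FROM tickets
        WHERE ball_number IN ({placeholders})
        GROUP BY ticket_id
        HAVING COUNT(ball_number) >= ?
        ORDER BY ticket_id
    """
    return conn.execute(sql, (*winning_numbers, min_hits)).fetchall()


def most_picked(limit):
    """(ball_number, times_picked), most popular numbers first."""
    return conn.execute(
        """
        SELECT ball_number, COUNT(ticket_id) AS times_picked
        FROM tickets
        GROUP BY ball_number
        ORDER BY times_picked DESC, ball_number
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


print(tickets_with_hits((3, 8, 21, 24, 27, 44), 4))
print(most_picked(3))
```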
You might also use some field like lottery number to be able to narrow down the queries as most of them would concern just one lottery.
One table
Using one table with a column for each number might make the queries much more complex, especially since, I believe, the numbers are sorted and there are prizes for hitting all but one (or two) numbers. Then you might have to compare 1, 2, 3, ... with 2, 3, 4, ..., which is not as short or straightforward as the queries above.
One column
Storing all entries as a string in just one column violates all normalization practices, forces you to split the column for most of the queries and takes away all the optimization carried out by the database. Also, storing numbers requires less disk space than storing text.

Since this is a once a day thing, I think I'd store the data in an easy to edit, maintain, visualize way. Your many-many approach would work. Mainly, I'd want it easy to find users that chose a particular ball_number.
users
    id
    name
drawings
    id
    type  # Mega Millions or Singapore (maybe subclass Drawing)
    drawing_on
winning_picks
    drawing_id
    ball_number
tickets
    id
    drawing_id
    user_id
    correct_count
picks
    id
    ticket_id
    ball_number
Once you get the numbers in, find all user_ids that picked a particular number in a drawing.
Get the drawing by date
drawing = Drawing.find_by_drawing_on(drawing_date)
Get the users by ball_number and drawing.
picked_1 = User.picked(1,drawing)
picked_2 = User.picked(2,drawing)
picked_3 = User.picked(3,drawing)
This is a class method on User that works like a scope:
# Assumes User has_many :tickets and Ticket has_many :picks.
class User < ActiveRecord::Base
  def self.picked(ball_number, drawing)
    joins(:tickets => :picks).where(:picks => {:ball_number => ball_number}, :tickets => {:drawing_id => drawing.id})
  end
end
Then do quick array intersections to get the user_ids that got 3, 4, 5 or 6 picks correct. You'd loop through the combinations of the winning numbers (order doesn't matter, so these are combinations rather than permutations).
For example if the winning numbers were 3,8,21,24,27,44
some_3_correct_winner_ids = picked_3 & picked_8 & picked_21 # Array intersection
For each winner, update the ticket's correct_count.
I may potentially store winners separately, but with an index on correct_count, and not too much data in tickets, this would probably be ok for now.
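The intersection-over-combinations step can be sketched in Python with sets. Here picked_by_number is a hypothetical dict from ball number to the set of user ids who picked it, standing in for the results of User.picked(n, drawing); the sample data is made up.

```python
from itertools import combinations


def winners_with_at_least(k, winning_numbers, picked_by_number):
    """User ids who picked at least k of the winning numbers."""
    if k < 1:
        raise ValueError("k must be at least 1")
    winners = set()
    for combo in combinations(winning_numbers, k):
        # Users who picked every number in this combination.
        pick_sets = [picked_by_number.get(n, set()) for n in combo]
        winners |= set.intersection(*pick_sets)
    return winners


winning = (3, 8, 21, 24, 27, 44)
picked_by_number = {3: {1, 2, 3}, 8: {1, 2, 3}, 21: {1, 3}, 24: {3}, 27: {3}, 44: {1, 3}}

# "Exactly k correct" is "at least k" minus "at least k + 1".
exactly_4 = (winners_with_at_least(4, winning, picked_by_number)
             - winners_with_at_least(5, winning, picked_by_number))
print(exactly_4)
```

With six winning numbers there are at most 20 combinations per k, so this stays cheap; counting hits per ticket directly, as in a GROUP BY query, is an alternative that avoids enumerating combinations at all.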

I would just concatenate them using a convention and store them in one column.
Something like '10~20~30~40~50~!60'
~ separates numbers
! indicates special number ( powerball, etc)
If you really need the numbers in separate columns, split the string with a SQL table-valued function.
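The '~' / '!' convention can be sketched as a pair of Python helpers; the function names are hypothetical, and the splitting here happens in application code rather than in the SQL table-valued function the answer suggests. Sorting the regular numbers before joining is an added assumption so that the same set of picks always produces the same string.

```python
def encode_pick(numbers, special=None):
    """Join numbers with '~' and mark the special ball with a leading '!'."""
    parts = [str(n) for n in sorted(numbers)]
    if special is not None:
        parts.append("!" + str(special))
    return "~".join(parts)


def decode_pick(text):
    """Inverse of encode_pick: returns (regular_numbers, special_or_None)."""
    numbers, special = [], None
    for part in text.split("~"):
        if part.startswith("!"):
            special = int(part[1:])
        else:
            numbers.append(int(part))
    return numbers, special


print(encode_pick([50, 10, 30, 20, 40], 60))
```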

Firstly, let me say that I'm an Oracle person, not a MySQL person.
Secondly, I'd usually say to go for a normalised design, but here I'm tempted to float a very unconventional alternative for comment.
How about denormalising it to the extent of using one column for all the number choices?
ticket_id integer
nums bit(56)
special_number integer
It would be a pretty compact representation, and you could perhaps use bit-wise operations to find the winners or potential winners.
No idea if this is workable ... open for comments.
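The bit-wise idea can be sketched with Python integers: bit n - 1 is set when number n (1 to 56) is chosen, so AND-ing a ticket mask with the winning mask leaves exactly the matched numbers, and counting the set bits gives the number of hits. The helper names are hypothetical; in MySQL, BIT_COUNT(nums & winning_mask) would be the analogous expression, though that is untested here.

```python
def to_mask(numbers):
    """Set bit (n - 1) for every chosen number n in 1..56."""
    mask = 0
    for n in numbers:
        if not 1 <= n <= 56:
            raise ValueError(f"ball number out of range: {n}")
        mask |= 1 << (n - 1)
    return mask


def matches(ticket_mask, winning_mask):
    # Bits set in both masks are the numbers this ticket got right.
    return bin(ticket_mask & winning_mask).count("1")


winning_mask = to_mask([3, 8, 21, 24, 27, 44])
print(matches(to_mask([3, 8, 21, 30, 40, 44]), winning_mask))
```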

Filter a MySQL Result in Delphi

I'm having an issue with a certain requirement in one of my homework assignments. I am required to take a list of students and print out all of the students with 12 or more credit hours. The credit hours are stored in a separate table and referenced through a third table:
basically, a students table, a classes table with hours, and an enrolled table matching student ID to course ID.
I used a SUM aggregate grouped by first name, and that all works great, but I don't quite understand how to filter out the people with fewer than 12 hours, since the SQL doesn't know how many hours each person is taking until it's done with the query.
My query string looks like this:
'SELECT Students.Fname, SUM(Classes.Crhrs) AS Credits
FROM Students, Classes, Enrolled
WHERE Students.ID = Enrolled.StudentID AND Classes.ID = Enrolled.CourseID
GROUP BY Students.Fname;'
It works fine and shows the results in a grid in the Delphi project, but I don't know where to go from here to filter the results, since each query run replaces the previous one.

Since it's a homework exercise, I'm going to give a very short answer: look up the documentation for HAVING.
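The general shape of HAVING: WHERE filters rows before grouping, while HAVING filters the groups after SUM has been computed. A sketch with Python's sqlite3 module and made-up data follows. It groups by Students.ID as well as Fname so that two students who share a first name are not merged into one total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (ID INTEGER PRIMARY KEY, Fname TEXT);
CREATE TABLE Classes  (ID INTEGER PRIMARY KEY, Crhrs INTEGER);
CREATE TABLE Enrolled (StudentID INTEGER, CourseID INTEGER);
INSERT INTO Students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Classes  VALUES (10, 4), (11, 4), (12, 3), (13, 3), (14, 4);
INSERT INTO Enrolled VALUES
    (1, 10), (1, 11), (1, 12), (1, 13),   -- 14 hours
    (2, 10), (2, 12),                     -- 7 hours
    (3, 10), (3, 11), (3, 14);            -- 12 hours
""")


def students_with_min_hours(min_hours):
    return conn.execute(
        """
        SELECT s.Fname, SUM(c.Crhrs) AS Credits
        FROM Students AS s
        JOIN Enrolled AS e ON e.StudentID = s.ID
        JOIN Classes  AS c ON c.ID = e.CourseID
        GROUP BY s.ID, s.Fname
        HAVING SUM(c.Crhrs) >= ?   -- applied to each group after SUM
        ORDER BY s.Fname
        """,
        (min_hours,),
    ).fetchall()


print(students_with_min_hours(12))
```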

Besides getting the desired result directly from SQL, as Martijn suggested, Delphi datasets also have ways to filter data on the "client side". Check the Filter and Filtered properties and the OnFilterRecord event.
Anyway, remember it is usually better to apply the best "filter" on the database side using the proper SQL, and then use client side "filters" only to allow for different views on an already obtained data set, without re-querying the same data, thus saving some database resources and bandwidth (as long as the data on the server didn't change meanwhile...)
