Best Practice for Master-Detail relationship when ALL combinations are required - mysql

I need to understand best practice for below scenario:
1) A school can have multiple classes (grades)
2) A school can have multiple events
3) An event is associated with at least one class, or it can be an all-school event (i.e. applicable to all classes in the school)
I have typical table structure
Event table (to store events), Class (to store class) and event_class (association table)
1) I insert a row inside 'event_class' table when an event gets associated with class
2) If it is a school event then, assuming a school has 20 classes, I insert 20 records inside 'event_class' table
In theory I know above is correct and would work.
My question is: when the number of classes increases from 20 to <>, what should the approach be? If it is an all-school event, should I just store a flag at the header level and use a left/right join to get the list of events? I am trying to understand what the usual practice is.
Thanks in advance
Manisha

I don't know if this would count as 'Best' practice, but I've certainly seen it a lot in the wild, so it is probably at least 'Good' practice.
The Event table should probably have a 'type' column - defined by you, could be numeric or text ('class' or 'school', 0 or 1, etc.). Then, there would only be entries in the event_class table for events of type 'class'.
When retrieving class-specific data for class_id, your join logic would look like:
SELECT * FROM event, event_class
WHERE event_class.class_id = this_class.id
AND event.type = 'class'
AND event.id = event_class.event_id
If you wanted all the class AND school event data for class_id, it would look like:
SELECT * FROM event
LEFT JOIN event_class ON event_class.event_id = event.id
WHERE (event.type = 'class'
AND event_class.class_id = this_class.id)
OR event.type = 'school'
Mind the parentheses on the second one to make sure the boolean logic works correctly. None of this is tested I'm afraid - just an idea. There are probably ways of optimising the joins, but for 20 classes, it's likely not worth the effort.
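The lookup rule behind the 'type' column can be sketched outside the database. The following is a minimal Ruby illustration, not a tested implementation of the queries above: the EVENTS and EVENT_CLASS arrays and the events_for_class helper are hypothetical stand-ins for the two tables.

```ruby
# Hypothetical in-memory stand-ins for the event and event_class tables.
EVENTS = [
  { id: 1, name: "Sports day",   type: "school" },
  { id: 2, name: "Grade 3 trip", type: "class"  },
  { id: 3, name: "Grade 5 quiz", type: "class"  }
].freeze

# Only 'class' events get association rows; 'school' events have none.
EVENT_CLASS = [
  { event_id: 2, class_id: 3 },
  { event_id: 3, class_id: 5 }
].freeze

# Returns the class-specific events for class_id plus every school-wide event.
def events_for_class(class_id, events = EVENTS, event_class = EVENT_CLASS)
  linked_ids = event_class.select { |row| row[:class_id] == class_id }
                          .map { |row| row[:event_id] }
  events.select do |event|
    event[:type] == "school" ||
      (event[:type] == "class" && linked_ids.include?(event[:id]))
  end
end

puts events_for_class(3).map { |event| event[:name] }.inspect
```

With this layout, adding a new class requires no extra rows for existing school-wide events, which is the point of storing the flag on the event itself.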

Query ActiveRecord for records and relation calculations at once

TL;DR? See Edit 2
I've got a little Rails application that has a few different sort of games people can play: it's based around sports, so they can pick the winners of each game every week (model PickEm, attribute correct boolean with nil for unfinished games), and predict the outcome of a specific team's game (model Guess, attribute score with integer, nil for unfinished games). Every User has_many PickEms and Guesses. And I'm trying to display standings (correct/total - total being all non-nil, score/total possible).
What I'm finding is that I can gather the users and their associated records, but in trying to display standings I'm discovering that every single User triggers another query, which is slow and not sustainable as the user base grows. That's because @user.pick_em_score is pick_ems.where(correct: true).size and @user.guesses_score is guesses.where.not(score: nil).sum(:score). So I call user.pick_em_score and it runs that query. I feel like there should be a way to get every User, as well as these specific counts, at once, rather than buffering a whole bunch of needless extra stuff.
What I need:
User record
User.pick_em_score (calculated by counting correct records)
User.pick_ems count where NOT NULL
User.guesses_score (calculated by guesses.sum(:score))
User.guesses count where NOT NULL
Most of the stuff I find on Rails's ActiveRecord helpers, especially related to calculations, is for retrieving only the calculation. It looks like I'll probably need to delve directly into select() etc. But I can't get it working. Can someone point me in the right direction?
Edit
For clarification: I'm aware that I can write this information to the User model, but this is overly restrictive: next season, I'll need to add a new column to the User for that year's results, etc. In addition, this is a third degree of callback updating related models – the Match model already updates related PickEms and Guesses on save. I'm looking for the simplest ActiveRecord query or queries to be able to work with this information, as indicated by the title. Ideally one query that returns the above information, but if it needs to be a few, that's OK.
I used to work directly in MySQL with PHP, but those skills have rusted (in raw MySQL, I imagine, I'd have several sub-select statements to help pull these counts) and I'd also like to be able to use Rails's ActiveRecord helpers and such, and avoid constructing raw SQL as much as possible.
Second Edit:
I seem to have it down to one call that starts to work, but I'm writing a lot of SQL. It's also brittle, IMO, and trying to run with it has failed. It also looks like I'm just pushing the million singular SELECT queries from Rails right into SQL, but that may still be a step up.
User.unscoped.select('users.*',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems',
'(SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score',
'(SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count' )
The issue seems to be: is there a way to use Rails, and not raw SQL, to link up users.id that we see there with these subqueries? Or just … a better way to construct this, in general?
In addition, I'm running another set of SELECTs for the WHERE, which would hinge on total_pick_ems and guesses_count being > 0 but since I can't use those aliased columns, I have to call the SELECT one more time.
Welcome to AR. It's really only good for simple CRUD-like queries. Once you actually want to query your data in anger, it just doesn't have the capabilities to do the queries you want without resorting to wholesale SQL strings, often abandoning the ability to chain as a result.
It's precisely why I moved to Sequel, as it does have the features to compose queries using a much fuller SQL feature set, including join conditions, window functions, recursive common table expressions, and advanced eager loading. The author is incredibly responsive and the documentation is excellent compared to AR and Arel.
I don't expect you will like this answer, but a time will come when you will start to look outside the opinionated components that come with Rails, which I have to say are hardly best of breed. Sequel also sped my application up many times over what I was able to get with AR; it's not just developer happiness, it means fewer servers to run. Yes, it will be a learning curve, but IMO it's better to learn tools that have your back covered.
Joins might work. Something like the following:
# Caveat: joining both associations pairs every guess with every pick_em of
# the same user, so the sum and counts are inflated unless the rows are deduplicated.
User.unscoped.joins(:guesses).joins(:pick_ems).
  where("guesses.score IS NOT NULL").
  select("users.*,
    sum(guesses.score) as guesses_score,
    count(guesses.id) as guesses_count,
    count(case when pick_ems.correct = true then 1 else null end)
      as correct_pick_ems,
    count(case when pick_ems.correct is not null then 1 else null end)
      as total_pick_ems").
  group("users.id")
If you need this information for a limited number of users at a time, then the above query, or eager loading (User.includes(:guesses, :pick_ems)) with instance methods like
def correct_pick_ems
pick_ems.count(&:correct)
end
would work.
However, if you need this information for all the users most of the time, cached counters within the users table would be more efficient.
What you need is some sort of custom (smart) counter_cache that counts only under certain conditions (e.g. correct is true).
You can achieve this using conditional after_save and after_destroy callbacks to build your own custom counter_cache that looks like this:
class PickEm < ActiveRecord::Base
  belongs_to :user
  # Note: after_save fires on every save while correct is true,
  # so saving the same record twice would count it twice.
  after_save :increment_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }
  after_destroy :decrement_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }

  private

  # update_column should not trigger any validations or callbacks
  def increment_finished_counter_cache
    user.update_column(:finished_games_counter, user.finished_games_counter + 1)
  end

  def decrement_finished_counter_cache
    user.update_column(:finished_games_counter, user.finished_games_counter - 1)
  end
end
Notes:
Code not tested (only to show the idea)
Some people say it's better to avoid naming custom counters the way Rails names its own (foos_count)
You should benchmark it, but my hunch is that adding all of that data into a single SELECT isn't going to be much faster than breaking it up into separate SELECTs (I've actually had cases where the latter was faster). By breaking it up, you can also stick to more ActiveRecord and less raw SQL, e.g.:
user_ids_to_pick_em_score = User.joins(:pick_ems).where(pick_ems: {correct: true}).group(:user_id).count
user_ids_to_pick_ems_count = User.joins(:pick_ems).where.not(pick_ems: {correct: nil}).group(:user_id).count
user_ids_to_guesses_score = Hash[User.select("users.id, SUM(guesses.score) AS total_score").joins(:guesses).group(:user_id).map{|u| [u.id, u.total_score]}]
user_ids_to_guesses_count = User.joins(:guesses).where.not(guesses: {score: nil}).group(:user_id).count
Edit: To display them, you could do like so:
<%- User.select(:id, :name).find_each do |u| -%>
Name: <%= u.name %>
Picks Correct: <%= user_ids_to_pick_em_score[u.id] %>/<%= user_ids_to_pick_ems_count[u.id] %>
Total Score: <%= user_ids_to_guesses_score[u.id] %>/<%= user_ids_to_guesses_count[u.id] %>
<%- end -%>
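One detail worth handling when displaying these hashes: a user with no finished picks or guesses is missing from the grouped results, so a plain [] lookup returns nil. A minimal plain-Ruby sketch of a fallback follows; the SCORES data and the standings_row helper are hypothetical and stand in for the four query results above.

```ruby
# Hypothetical results of the four grouped queries, keyed by user id.
# User 3 has no finished picks or guesses, so it is absent from every hash.
SCORES = {
  pick_em_score:  { 1 => 7,  2 => 4 },
  pick_ems_count: { 1 => 10, 2 => 9 },
  guesses_score:  { 1 => 21, 2 => 14 },
  guesses_count:  { 1 => 3,  2 => 2 }
}.freeze

# Builds one standings row for a user, treating a missing entry as 0.
def standings_row(user_id, name, scores = SCORES)
  value = ->(key) { scores.fetch(key).fetch(user_id, 0) }
  {
    name:    name,
    picks:   "#{value.call(:pick_em_score)}/#{value.call(:pick_ems_count)}",
    guesses: "#{value.call(:guesses_score)}/#{value.call(:guesses_count)}"
  }
end

[[1, "Ann"], [3, "Cy"]].each do |id, name|
  row = standings_row(id, name)
  puts "#{row[:name]}: picks #{row[:picks]}, score #{row[:guesses]}"
end
```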

What is the best way to merge 2 tables with Active Record and Mysql

We need to allow users to customize their entities like products... so my intention was to have a product table and a custom_product table with just the information the users are allowed to change.
When a client goes to the product I want to merge the information, meaning I want to merge the two tables so that the custom values overwrite the defaults in the products table.
I know that MySQL has IFNULL(a.title, b.title), but I was wondering if there is a nice and efficient way to solve this in Rails 4 with Active Record. Assume that the products and custom products tables have just 2 columns, ID and TITLE.
I think you can convert both objects to JSON and then handle their params as a hash, using the merge method:
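class Product < ActiveRecord::Base
  has_one :customization
end

class Customization < ActiveRecord::Base
  belongs_to :product
end

a = Product.find(...)
b = a.customization
# reject (not reject!) is used because reject! returns nil when nothing is removed
c = JSON(a.to_json).merge(JSON(b.to_json).reject { |k, v| v.nil? })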
Therefore c will contain all params from Product, overridden where applicable by those in Customization which are not nil.
If you still want to use a Product object with hybrid values (taken from Customization) you can try this:
a.attributes = a.attributes.merge(b.attributes.except("id").reject { |k, v| v.nil? })
In this case a will still be a Product instance. I would recommend keeping the same attributes in both models when doing this, since assigning an attribute that Product does not have will fail.
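The difference between reject and reject! matters for this merge. Here is a small plain-Ruby sketch using hypothetical attribute hashes in place of real model attributes; merge_custom is an illustrative helper, not part of Rails.

```ruby
# Hypothetical attribute hashes standing in for Product and Customization rows.
product       = { "title" => "Default title", "price" => 10 }
customization = { "title" => "Custom title",  "price" => nil }
complete      = { "title" => "Custom title",  "price" => 12 }

# Overlays the non-nil custom values on top of the defaults.
def merge_custom(defaults, custom)
  defaults.merge(custom.reject { |_key, value| value.nil? })
end

merged = merge_custom(product, customization)

# reject! returns nil when no pair is removed, so passing its result to
# merge would raise a TypeError for a customization with no nil values.
bang_result = complete.dup.reject! { |_key, value| value.nil? }

puts merged.inspect
puts bang_result.inspect
```

The non-destructive reject always returns a hash, so the merge works whether or not the customization contains nil values.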

How to do a MYSQL conditional select statement

Background
I'm faced with the following problem, relating to three tables
class_sectors table contains three categories of classes
classes table contains a list of classes students can attend
class_choices contains the first, second and third class choice of the student, for each sector. So for sector 1 Student_A has class_1 as first choice, class_3 as second choice and class_10 as third choice for example, then for sector 2 he has another three choices, etc...
The class_choices table has these columns:
kp_choice_id | kf_personID | kf_sectorID | kf_classID | preference | assigned
I think the column names are self explanatory. preference is either 1, 2 or 3. And assigned is a boolean set to 1 once we have reviewed a student's choices and assigned them to a class.
Problem:
Writing an sql query that tells the students what class they are assigned to for each sector. If their class hasn't been assigned, it should default to show their first preference.
I have actually got this to work, but using two (very bloated??) sql queries as follows:
$choices = $db -> Q("SELECT
*, concat_ws(':', `kf_personID`, `kf_sectorID`) AS `concatids`
FROM
`class_choices`
WHERE
(`assigned` = '1')
GROUP BY
`concatids`
ORDER BY
`kf_personID` ASC,
`kf_sectorID` ASC;");
$choices2 = $db -> Q("SELECT
*, concat_ws(':', `kf_personID`, `kf_sectorID`) AS `concatids`
FROM
`class_choices`
WHERE
`preference` = '1'
GROUP BY
`concatids`
HAVING
`concatids` NOT IN (".iimplode($choices).")
ORDER BY
`kf_personID` ASC,
`kf_sectorID` ASC;");
if(is_array($choices2)){
$choices = array_merge($choices,$choices2);
}
Now $choices does have what I want.
But I'm sure there is a way to simplify this, merge the two SQL queries, and so it's a bit more lightweight.
Is there some kind of conditional SQL query that can do this???
Your solution uses two steps to enable you to filter the data as needed. Since you are generating a report, this is a pretty good approach even if it looks a bit more verbose than you might like.
The advantage of this approach is that it is much easier to debug and maintain, a big plus.
To improve the situation, you need to consider the data structure itself. When I look at the class_choices table, I see the following fields: kf_classID, preference, assigned which contain the key information.
For each class, the assigned field is either 0 (default) or 1 (when the class preference is assigned for the student). By default, the class with preference = 1 is the assigned one since you display it in the report when assigned=0 for all the student's class choices in a particular sector.
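The reporting rule described here (show the assigned class, otherwise fall back to the first preference) can be illustrated in a few lines of Ruby applied to rows fetched once. This is only a sketch under assumed data: the CHOICES rows and the effective_choices helper are hypothetical, and the column names are shortened.

```ruby
# Hypothetical rows from class_choices (only the columns the rule needs).
CHOICES = [
  { person: 1, sector: 1, class_id: 1,  preference: 1, assigned: 0 },
  { person: 1, sector: 1, class_id: 3,  preference: 2, assigned: 1 },
  { person: 1, sector: 2, class_id: 7,  preference: 1, assigned: 0 },
  { person: 1, sector: 2, class_id: 10, preference: 2, assigned: 0 }
].freeze

# For each (person, sector) pair, pick the assigned row if there is one,
# otherwise fall back to the first preference.
def effective_choices(rows)
  rows.group_by { |row| [row[:person], row[:sector]] }.map do |key, group|
    chosen = group.find { |row| row[:assigned] == 1 } ||
             group.find { |row| row[:preference] == 1 }
    [key, chosen && chosen[:class_id]]
  end.to_h
end

puts effective_choices(CHOICES).inspect
```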
The data model could be improved by imposing a business rule as follows:
For preference=1 set the default value assigned=1. When the class selection process takes place, and if the student gets assigned the 2nd or 3rd choice, then preference 1 is unassigned and the alternate choice assigned.
This means a bit more code in the application but it makes the reporting a bit easier.
The source of the difficulty is that the assignment process does not explicitly assign the 1st preference. It only updates assigned if the student cannot get the 1st choice.
In summary, your SQL is good and the improvements come from taking another look at the data model.
Hope this helps, and good luck with the work!

Is it better to use database polling or events for the following system?

I'm working on an ordering system that works exactly the way Netflix's service works (see end of this question if you're not familiar with Netflix). I have two approaches and I am unsure which approach is the right one; one relies on database polling and the other is event driven.
The following two approaches assume this simplified schema:
member(id, planId)
plan(id, moviesPerMonthLimit, moviesAtHomeLimit)
wishlist(memberId, movieId, rank, shippedOn, returnedOn)
Polling: I would run the following count queries in wishlist
Count movies shippedThisMonth (where shippedOn IS NOT NULL, for @memberId)
Count moviesAtHome (where shippedOn IS NOT NULL, and returnedOn IS NULL, for @memberId)
Count moviesInList (for @memberId)
The following function will determine how many movies to ship:
moviesToShip = Min(moviesPerMonthLimit - shippedThisMonth, moviesAtHomeLimit - moviesAtHome, moviesInList)
I will loop through each member, run the counts, and loop through their list as many times as moviesToShip. Seems like a pain in the neck, but it works.
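The Min(...) rule above is simple enough to express as a small function. A minimal Ruby sketch follows; movies_to_ship and its keyword names are illustrative, and clamping the result at zero is an added assumption so that a member already over a limit never produces a negative count.

```ruby
# Computes how many movies to ship for one member, following the Min(...) rule.
# Clamping at zero is an added assumption, not part of the original formula.
def movies_to_ship(per_month_limit:, at_home_limit:, shipped_this_month:, at_home:, in_list:)
  remaining = [
    per_month_limit - shipped_this_month, # room left in the monthly quota
    at_home_limit - at_home,              # room left in the at-home quota
    in_list                               # cannot ship more than is listed
  ].min
  [remaining, 0].max
end

puts movies_to_ship(per_month_limit: 8, at_home_limit: 3,
                    shipped_this_month: 6, at_home: 1, in_list: 5)
```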
Event Driven: This approach involves adding an extra column "queuedForShipping" and setting it to 0 or 1 every time an event takes place. I will do the following counts:
Count movies shippedThisMonth (where shippedOn IS NOT NULL, for @memberId)
Count moviesAtHome (where shippedOn IS NOT NULL, and returnedOn IS NULL, for @memberId)
Count moviesQueuedForShipping (where queuedForShipping = 1, for @memberId)
Instead of using min, I have to use the following if statements
If moviesPerMonthLimit > (shippedThisMonth + moviesQueuedForShipping)
AND moviesAtHomeLimit > (moviesAtHome + moviesQueuedForShipping)
If both conditions are true, I will select a row from wishlist where queuedForShipping = 0, and set its queuedForShipping to 1. I will run this function every time someone adds to, deletes from, or reorders their list. When it's time to ship, I would select for @memberId where queuedForShipping = 1. I would also run this when updating shippedOn and returnedOn.
Approach one is simple. It also allows members to mess around with their ranks until someone decides to run the polling. That way what to ship is always decided by rank. But people keep telling me polling is bad.
The event driven approach is self-sustaining, but it seems like a waste of time to ping the database with all those counts every time a person changes their list. I would also have to write to the column queuedForShipping. It also means when a member re-ranks their list and they have pending shipments (shippedOn IS NULL, queuedForShipping = 1) I would have to update those rows and set queuedForShipping back to 1 based on the new ranks. (What if someone added 5 movies, and then suddenly went to change the order? Well, queuedForShipping would already be set to 1 on the first two movies he or she added)
Can someone please give me their opinion on the best approach here and the pros and cons of polling versus event driven?
Netflix is a monthly subscription service where you create a movie list, and your movies are shipped to you based on your service plan limits.
Based on what you described, there's no reason to keep the data "ready to use" (event) when you can create it very easily when needed (poll).
Reasons to cache it:
If you needed to display the next item to the user.
If the detailed data was being removed due to some retention policy.
If the polling queries were too slow.

DynamicQuery: How to select a column with linq query that takes parameters

We want to set up a directory of all the organizations working with us. They are incredibly diverse (government, embassy, private companies, and organizations depending on them ). So, I've resolved to create 2 tables. Table 1 will treat all the organizations equally, i.e. it'll collect all the basic information (name, address, phone number, etc.). Table 2 will establish the hierarchy among all the organizations. For instance, Program for illiterate adults depends on the National Institute for Social Security which depends on the Labor Ministry.
In the Hierarchy table, each column represents a level. So, for the example above, (i)Labor Ministry - Level1(column1), (ii)National Institute for Social Security - Level2(column2), (iii)Program for illiterate adults - Level3(column3).
To attach an organization to a hierarchy, the user needs to go level by level (i.e. column by column). So, there will be at least 3 situations:
If an adequate hierarchy exists for an organization (for instance, level1: US Embassy), that organization can be added (for instance, level2: USAID) --> US Embassy/USAID, and so on.
What if one or more levels are missing? Then they need to be added.
What if the hierarchy needs to be modified? Not everything needs to be modified.
I do not have any choice but to work by level (i.e. column by column). It does not make sense to have all the levels in one form, as the user needs to navigate hierarchies to find the right one to attach an organization to.
Let's say, I have those queries in my repository (just that you get the idea).
Query1
var orgHierarchy = (from orgH in db.Hierarchy
select orgH.Level1).FirstOrDefault();
Query2
var orgHierarchy = (from orgH in db.Hierarchy
select orgH.Level2).FirstOrDefault();
Query3, Query4, etc.
The above queries are the same except for the property queried (level1, level2, level3, etc.)
Question: Is there a general way of writing the above queries in one? So that the user can track an hierarchy level by level to attach an organization.
In other words, not knowing in advance which column to query, I still need to be able to do so depending on some conditions. For instance, an organization X depends on Y. Knowing that Y is somewhere on the 3rd level, I'll go to the 4th level, linking X to Y.
I need to select (not manually) a column with only one query that takes parameters.
=======================
EDIT
As I just said to @Mark Byers, all I want is just to be able to query a column not knowing in advance which one. Check this out:
How about this
public Hierarchy GetHierarchy(string name)
{
    var myHierarchy = from hierarc in db.Hierarchy
                      where hierarc.Level1 == name
                      select hierarc;
    // The query yields a sequence, so take the first match to return a single Hierarchy
    return myHierarchy.FirstOrDefault();
}
Above, the query depends on name, which is a variable. It might be Planning Ministry, Embassy, Local Phone, etc.
Can I write the same query, but this time, instead of looking to match a value in the DB, have my query select a particular column?
var myVar = from orgH in db.Hierarchy
where (orgH.Level1 == "Government")
select orgH.where(level == myVariable);
return myVar;
I don't pretend that select orgH.where(level == myVariable) is even close to being valid. But that is what I want: to be able to select a column depending on a variable (i.e. the value is not known in advance, like with name).
Thanks for helping
How about using DynamicQueryable?
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
Your database is not normalized, so you should start by changing the hierarchy table to, for example:
OrganizationId  Parent
1               NULL
2               1
3               1
4               3
To query this you might need to use recursive queries. This is difficult (but not impossible) using LINQ, so you might instead prefer to create a parameterized stored procedure using a recursive CTE and put the query there.
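To show what the parent-pointer layout buys you, here is a minimal Ruby sketch that walks from an organization up to its root using the sample rows above. It is an in-memory illustration of the idea rather than a replacement for a recursive CTE; PARENT_OF and path_to_root are hypothetical names.

```ruby
# The sample rows above as a child => parent map (nil marks a root).
PARENT_OF = { 1 => nil, 2 => 1, 3 => 1, 4 => 3 }.freeze

# Walks parent links from an organization up to its root.
# Returns the path root-first; its length is the organization's level.
# An unknown id raises KeyError via fetch.
def path_to_root(org_id, parent_of = PARENT_OF)
  path = []
  current = org_id
  while current
    raise ArgumentError, "cycle detected at #{current}" if path.include?(current)
    path.unshift(current)
    current = parent_of.fetch(current)
  end
  path
end

puts path_to_root(4).inspect
```

Because the level falls out of the path length, no Level1, Level2, ... columns are needed, and attaching an organization under another is a single row with the right Parent value.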