I need to convert a relatively simple query to display a total quiz average for a given user in a table set up in Rails/HAML. We have users take quizzes, record the scores, and display the average per quiz. We now want the total average across all quizzes. Easy:
SELECT (ROUND(AVG(`score`*100), 1)) FROM `quiz_results` WHERE `user_id`=$user
The results need to display in a table cell that is already set up, but I cannot figure this out.
Perhaps this line will help. It's pre-existing code that calculates the average of a particular quiz for that user:
%td.separate="#{(((lesson.quiz_results.average('score', :conditions => "user_id = #{@user.id}")) * 100).to_i)}%"
I have Rails 2.3.x.
Well, as I can see, all you need is to remove the particular-quiz restriction, which is imposed by the association usage lesson.quiz_results. Instead, just use the model class directly, which is most likely QuizResult.
Also, there is a tiny bug in your existing code: .to_i rounds down (truncates), while you should use .round. See the difference:
irb(main):002:0> 1.6.to_i
=> 1
irb(main):003:0> 1.6.round
=> 2
So, full code should be:
(QuizResult.average('score', :conditions => "user_id = #{@user.id}") * 100).round
(I also removed some unnecessary brackets)
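For reference, the raw SQL equivalent of that call (a sketch against the quiz_results table from the question; ROUND() without a decimals argument matches Ruby's .round):
SELECT ROUND(AVG(score) * 100) AS total_average
FROM quiz_results
WHERE user_id = 42; -- substitute the actual user id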
I'm trying to set up the ability to get some numbers from my Sphinx indexes, but I'm not sure how to get the info I want.
I have a MySQL DB with articles, a Sphinx index set up for that DB, and full-text search, all working. What I want is to get some numbers:
How many times a search text (keyword or key phrase) appears over all articles for all time (more likely limited to articles from a time interval from X to Y)
Same as the previous, but for how many times 2 keywords or key phrases (so "x AND y") appear in the same articles
I was doing something similar to the first one manually, using a .bat file I made:
indexer ind_core -c c:\%SOME_PATH%\development.sphinx.conf --buildstops stats.txt 10000 --buildfreqs
This generated a txt file with all repeating keywords and how often they appear, which helped to form a list of keywords I'm interested in during the early development stages. Now I'm trying to do the same, but just for a finite list of predetermined keywords, and integrated into my Rails project so I can build charts in the future.
I tried running some queries like
@testing = Article.search 'Keyword1 AND Keyword2', :ranker => :wordcount
but I'm not sure how it works and how to process the result, as well as if that's what I'm looking for.
Another approach I tried was manual MySQL queries, such as:
SELECT id,title,WEIGHT() AS w FROM ind_core WHERE MATCH('@title keyword1 | keyword2') OPTION ranker=expr('sum(hit_count)');
but I'm not sure how to process the results from here either (or how to actually implement it in my existing Rails project), and it's limited to 20 rows per query (which I think I can change somewhere in the settings?). But at least, looking at the MySQL results, what I'm interested in is hit_count over all articles (or all articles from a set timeframe).
Any ideas on how to do this?
UPDATE:
The current way I found was to add
@testing = Article.search params[:search], :without => {:is_active => false}, :ranker => :bm25
to the controller, with some conditions (so it doesn't bug out on a nil search). :is_active is my soft-delete flag; I don't want to search deleted entries, so don't mind it. And in the view I simply displayed
<%= @testing.total_entries %>
which, if I understand it correctly, shows me the number of matches Sphinx found (so pretty much what I was looking for).
So, to figure out the number of hits per document, you're pretty much on the right track; it's just a matter of getting it into Ruby/Thinking Sphinx.
To get the raw Sphinx results (if you don't need the ActiveRecord objects):
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
… this will return an array of hashes, and you can use the weight() string key for the hit count, and the sphinx_internal_id string key for the model's primary key (id is Sphinx's own primary key, which isn't so useful).
Or, if you want to use the ActiveRecord objects, Thinking Sphinx has the ability to wrap each search result in a helper object which passes appropriate methods through to the underlying model instances, but lets weight respond with the values from Sphinx:
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()"; ""
search.context[:panes] << ThinkingSphinx::Panes::WeightPane
search.each do |article|
puts article.weight
end
Keep in mind that panes must be added before the search is evaluated, so if you're testing this in a Rails console, you'll want to avoid letting the console inspect the search variable (which I usually do by adding ; "" at the end of the initial search call).
In both of these cases, as you've noted, the search results are paginated - you can use the :page option to determine which page of results you want, and :per_page to determine the number of records returned in each request. There is a standard limit of 1000 results overall, but that can be altered using the max_matches setting.
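At the SphinxQL level (as in the manual query from the question), pagination maps to LIMIT plus the max_matches option; a sketch reusing the ind_core index name:
SELECT id, title, WEIGHT() AS w
FROM ind_core
WHERE MATCH('@title keyword1 | keyword2')
LIMIT 0, 100
OPTION ranker=expr('sum(hit_count)'), max_matches=1000;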
Now, if you want the number of times the keywords appear across all Sphinx records, the best way to do that while also taking advantage of Thinking Sphinx's search options is to get the raw results of an aggregate SUM, similar to the first option above:
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "SUM(weight()) AS count",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
search.first["count"]
Disclaimer: I'm self-taught. I got my rudimentary knowledge of PHP reading forums. I'm an SQL newb and know next to nothing about Yii.
I've got a controller that shows the products on our webstore. I would like the out of stock products to show up on the last pages.
I know I could sort by stock quantity, but I would like the in-stock products to change order every time the page is reloaded.
My solution (probably wrong, but it kinda works) is to run two queries: one for the products that have stock, sorted randomly, and one for the out-of-stock products, also ordered randomly. I then merge the two resulting arrays. This much has worked using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination. Every product returned is listed on the same page, and changing pages shows the same results. As far as I can tell, the pagination only works for one CDbCriteria at a time. I've looked at the Yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
    'models' => $models,
    'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria that includes both the in-stock and out-of-stock items could be used for the pagination (as it would include the same number of products as the combined results of the first two). So I tried adding this (criteria1 and criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
    'models' => $models,
    'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.
So you are running into what is, IMO, one of the potential drawbacks of natural-language query builder frameworks. They can send your thinking on how you might approach a SQL problem down a bad path when trying to work with the "out of the box" methods for building queries. Sometimes you need to think about using the raw SQL query capabilities that most every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest you approach your problem. You can either work this into your query builder style (if possible) or make a raw query.
You could easily form a calculated field representing binary inventory status for sorting, then also sort by another criterion secondarily.
SELECT
field1,
field2,
/* other fields */
IF(inventory_avail > 0, 1, 0) AS in_inventory
FROM product
WHERE /* where conditions */
ORDER BY
in_inventory DESC, /* sort items in inventory first */
other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ? /* pagination row limit and offset */
Note that this approach only returns the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets and such.
I do question the use of RAND() for pagination purposes, as doing so will yield products potentially appearing on one page after another as the user paginates through the pages, with other products perhaps not showing up at all. Either that, or you need to add some complexity to your application to somehow track the "randomized" version of the entire result set for each specific user. For this reason, it is really unusual to see order randomization for paginated results display.
I know you mentioned you might like to show a randomized view to the user on a "first page". If this is desired, that is OK, but perhaps you should decouple or differentiate that specific view from the wider paginated view of the product listing, so as to not confuse the end user with a seemingly unpredictable pagination interface.
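That said, if you do want a shuffle that holds still across page requests, one workaround (my own suggestion, not something from your code) is MySQL's seeded RAND(N): store a random seed per visitor session and reuse it on every page request, assuming a stable scan order:
SELECT id
FROM product
ORDER BY IF(inventory_avail > 0 OR inventoried = 0, 1, 0) DESC,
         RAND(42) -- 42 stands in for the per-session seed
LIMIT 30 OFFSET 0;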
In your ORDER BY clause, you should always have enough sorting conditions that the final (most specific) condition guarantees a predictable order result. Oftentimes this means you have to include an autoincrementing primary key field, or a similar field that provides uniqueness for the row.
So let's say, for example, I had the ability for the user to sort items by price, but you still obviously wanted to show all inventoried items first. Now let's say you have 100K products, such that you will have many "pages" of products with a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because a more specific criterion than price was not given, and ordering beyond that criterion is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
Such that the order is totally predictable (even though the user may not even know there is sorting being applied by unique id).
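Putting the pieces together against the conditions from your criteria (a sketch; price stands in for whatever secondary sort you choose, and id is the unique tiebreaker):
SELECT p.*,
       IF(inventory_avail > 0 OR inventoried = 0, 1, 0) AS in_inventory
FROM product p
WHERE p.parent IS NULL
  AND web = 1
  AND current = 1
  AND sell > sell_web
ORDER BY in_inventory DESC, price ASC, p.id ASC
LIMIT 30 OFFSET 30; -- page 2 at your pageSize of 30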
I have not been able to find any information on this. I could do this on its own, but I feel keeping it in the query might be the best option, if it's possible.
Basically, I want to try to add a top-level "statistics" portion to a query.
So when I get the results, I would see them like so:
num_rows = 900
distinct_col = 9
results = array()
This way I can loop the results normally, and then pull out information that I would only need once outside of it. Is this possible?
EDIT:
I am not looking for the normal MySQL statistics like num_rows exactly. Rather, in a case where, let's say, you limit the results to ten, num_rows would return 10, but you want the total results, so 900. In most cases I would just use another query and look just for the amount; however, combining it all into one query logically seems faster to me. There is also more than just num_rows I may need. Say they are all products and each has a specific category: I would need to count the number of categories all the items fall under. So looping over the raw results when there is only one result for those columns is silliness.
EDIT 2:
To clarify further: I need to get some counts on some columns, and maybe a min-max result on a join. Having them returned on every loop would work, but the same exact value uselessly returning on every loop when it's only needed once does not seem logical. I am no MySQL expert and am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Here is a PHP return example:
array(
    [num_rows] => 900,
    [categories] => 9,
    [min_price] => 400,
    [max_price] => 900,
    [results] => array(
        [0] => // row array
        [1] => // row array
    )
);
MySQL returns its default num_rows before you "fetch" the results; having custom results added there may be sufficient.
Dunno why you need that, but it's very easy to get.
Assuming you are using safeMysql (though you can use whatever way to get the data into an array):
$results = $db->getAll("SELECT * FROM t");
$num_rows = count($results);
$num_cols = count($results[0]);
that's all
I am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Yes, you are.
Nothing wrong with getting aggregated data with every loop.
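For instance, if repeating the one-off values on each row is acceptable, a single query can attach them via a cross join (a sketch; the products, category, and price names are assumed from your example):
SELECT p.*, s.categories, s.min_price, s.max_price
FROM products p
CROSS JOIN (
    SELECT COUNT(DISTINCT category) AS categories,
           MIN(price) AS min_price,
           MAX(price) AS max_price
    FROM products
) s
LIMIT 10;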
As for the count beyond LIMITs: when you need it, you can use MySQL's SQL_CALC_FOUND_ROWS / FOUND_ROWS() feature.
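A sketch of that feature (two statements on the same connection; the second returns the total that the LIMIT hid):
SELECT SQL_CALC_FOUND_ROWS * FROM products LIMIT 10;
SELECT FOUND_ROWS() AS num_rows; -- 900 in your example, not 10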
I have a couple of systems which contain a users table along with some form of karma/weight/reputation. Sometimes it's the number of posts a user has made; sometimes it's the number of up/down votes a user has received across all their activity on the site.
USER {
    id int
    name string
    karma int
}
How do I use these numbers to calculate that user's "weight" or "authority"? For example, the vote of one long-time member is often worth much more than 4 votes from brand new users.
I was thinking about adding up the total points/karma/reputation of all members and then trying to come up with a 1-100 scale.
SUM(user.points) / COUNT(user.*) = average user points
Then something like
CEIL(userA.points / average user points) = their weight on an issue
However, there also needs to be a curve on the points this way, as I don't want someone with 5,000 posts/karma to outweigh 20 new users' votes.
Mathematically, your best bet is to weight by the log of the percentile ranking of the user in question. However, that is painful in SQL.
Simpler would be to cheat and assume the mean is the same as the median (a very bad assumption statistically, but much simpler programmatically):
SELECT 1 - LOG10(
           (SELECT COUNT(*) FROM user u2 WHERE u2.points > u1.points)
           / (SELECT COUNT(*) FROM user)
       ) AS weight
FROM user u1
WHERE u1.id = 42; -- the user being weighted
This way, your top 10% of karma holders would have about one and a half times the impact of your average user, and almost twice the impact of a noob.
Changing the log base scales this: natural log (LOG() in MySQL) would give the upper 10% about 3 times the impact of a noob, and twice the impact of an average user. LOG2() is even more extreme. (Note: the subtraction is required because the log will be negative.)
If you want a more severe effect, you might try squaring the log. (Note: squaring makes it positive, so addition is appropriate here.)
If you want a hyper-precise rule, you can go into standard deviations, but the SQL gets cumbersome and slow. It all depends on how far down the rabbit hole you want to go...
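For completeness, a minimal sketch of the standard-deviation route (assuming MySQL's STDDEV_POP() and the user table from the question):
SELECT (u.karma - stats.avg_karma) / stats.std_karma AS z_score
FROM user u
CROSS JOIN (
    SELECT AVG(karma) AS avg_karma, STDDEV_POP(karma) AS std_karma
    FROM user
) stats
WHERE u.id = 42; -- the user being weighted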
There are probably some resources that can provide you with parameters for this, but you should probably decide exactly what you want rather than using some predefined model. I suggest you define some rules for which sets of users should be equivalent or which should outweigh each other (e.g. 10 users with 0 karma = 1 user with 5k karma; equivalence is much easier to work with), which will very quickly produce parameters for some chosen equation.
Using log (as already suggested), some (fractional) power (like square root) or even just linear can work.
I suggest something like newKarma = a*karma^b + c, and it shouldn't be too difficult to solve for a, b, and c. I suggest you pick b rather than trying to calculate it. Using new users (with karma = 0) should make this quite easy to solve; for example, with b = 1/2 and c = 1, the rule "10 users with 0 karma = 1 user with 5k karma" gives a*sqrt(5000) + 1 = 10, so a = 9/sqrt(5000) ≈ 0.127. Guessing values to get close to what you want can be easier than determining them mathematically (since some rules together won't fit any simple equation).
Note that c above is an offset to karma, which will give many new users more total karma than high-karma users. You may also want to think about a*(karma + c)^b, or a*(karma + c)^b + d. Analysing the rules you defined should tell you which one to use.
UPDATE: Added alternatives for c
EDIT: You have some options for SQL. A temp table (with the sums) might actually be the fastest. You can also just use a view. A join on the same table might also be possible, though I'm not sure. Using a view would look something like this (for some chosen a, b, c, and d; you may also want to add indices to the view):
Votes(issueID, userID) // table structure
User(userID, karma, ...) // table structure
CREATE VIEW Sums AS
SELECT issueID, SUM(1*POWER(karma + 2, 3) + 4) AS sumVal
FROM Votes JOIN User ON User.userID = Votes.userID
GROUP BY issueID
Query:
SELECT (1*POWER(karma + 2, 3) + 4)/sumVal AS influenceOnIssue
FROM Votes JOIN User ON User.userID = Votes.userID
JOIN Sums on Sums.issueID = Votes.issueID
WHERE Votes.userID = @UserID AND Votes.issueID = @IssueID
A simplification may be to have a computed column equal to 1*POWER(karma + 2, 3) + 4.
The faster option would be to calculate the derived karma on insert/update, either by having an additional column and using triggers, or by just calculating it before you call insert/update and calling insert/update with the new value.
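A minimal sketch of the trigger variant (assuming a derived_karma column and the placeholder constants a=1, b=3, c=2, d=4 used above):
ALTER TABLE User ADD COLUMN derived_karma DOUBLE;

CREATE TRIGGER user_derived_karma_upd
BEFORE UPDATE ON User
FOR EACH ROW
SET NEW.derived_karma = 1 * POWER(NEW.karma + 2, 3) + 4;
-- plus an identical BEFORE INSERT trigger for new rows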
I'm building a rating system where a user can rate something from 1-5 stars.
I was wondering if there's a way to automatically calculate all of a specific item's ratings (from the ratings table where model='x' and foreign_key='y') on afterSave or something similar.
I can do it in the ratings_controller just fine... just thought it might be more ideal to be done automatically in the model. Can anyone point me in the right direction for this?
I would LOVE to hear that there's some kind of association setting in CakePHP that allows it to do this for you - something like:
//Rating model
var $belongsTo = array(
    'Restaurant' => array(
        'averageValue' => 'rating'
    )
);
But I'm sure that's asking too much :)
If you want to save the average into a field in the items table, then afterSave would probably be the best solution right now.
The only thing Cake can automatically do for you is keep track of how many ratings an item has (counterCache), but not other aggregate functions.
virtualField may be good, but I have never used that for aggregate functions, so I'm not sure. Besides, if your ratings don't change often, it would put unnecessary work on the system.
In Rating model:
function afterSave($created) {
    $restaurantId = $this->data['Rating']['restaurant_id'];
    $avgValue = $this->query('SELECT AVG(rating) AS rating FROM ratings WHERE ratings.restaurant_id = ' . $restaurantId);
    $this->Restaurant->updateRatingAverage($restaurantId, $avgValue[0][0]['rating']);
}
In the Restaurant model:
function updateRatingAverage($id, $avg) {
    $this->id = $id;
    $this->saveField('your_average_field_here', $avg);
}
you might want to log the $avgValue to see how it's structured, but I think I got that right.
I need to post an answer because I can't comment yet.
So your (for example) Restaurant hasMany Ratings? Try a virtualField in the Restaurant model, where you calculate the rating every time restaurants are fetched from the database. There might be a need for GROUP BY, as ypercube mentioned, if you need to use AVG().
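Either way, the underlying aggregate is the same; a sketch against the ratings table used earlier in this thread:
SELECT restaurant_id, AVG(rating) AS avg_rating
FROM ratings
GROUP BY restaurant_id;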