I've been developing web applications for a while and I am quite comfortable with MySQL; in fact, like many developers, I use some form of SQL almost every day. I like the syntax and have zero problems writing queries or optimizing my tables. I have enjoyed working with the MySQL API.
The thing that has been bugging me is that Ruby on Rails uses ActiveRecord and migrations, so you use methods to query the database. I suppose the idea is that you "never have to look at SQL again". Maybe this isn't KISS (keep it simple, stupid), but is the ActiveRecord interface really best? If so, why?
Is development without ever having to write a SQL statement healthy? What if you have to look something up that isn't already defined as a Rails method? I know there is a method that allows me to run a custom query. I guess I really want to know what people think the advantages of ActiveRecord over plain MySQL are, and whether anyone else feels that this might be for the Rails community what the calculator was for the math community, where some people might forget how to do long division.
You're right that hiding the SQL behind ActiveRecord's layer means people might forget to check the generated SQL. I've been bitten by this myself: missing indexes, inefficient queries, etc.
What ActiveRecord allows is making the easy things easy:
Post.find(1)
vs
SELECT * FROM posts WHERE posts.id = 1
You, the developer, have less to type, and thus have fewer chances for error.
Validation is another thing that ActiveRecord makes easy. You have to do it anyway, so why not have an easy way to do it? With the repetitive, boring, parts abstracted out?
class Post < ActiveRecord::Base
validates_presence_of :title
validates_length_of :title, :maximum => 80
end
vs
if params[:post][:title].blank? then
# complain
elsif params[:post][:title].length > 80 then
# complain again
end
Again, easy to specify, easy to validate. Want more validation? A single line to add to an ActiveRecord model. Convoluted code with multiple conditions is always harder to debug and test. Why not make it easy on you?
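To make the "one line per rule" idea concrete, here is a minimal sketch of declarative validations in plain Ruby. It is not how ActiveRecord implements them; the TinyValidations module and its internals are hypothetical and only mirror the two macros used above.

```ruby
# Hypothetical, simplified stand-in for ActiveRecord-style validations.
module TinyValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Each rule is a lambda that returns an error message or nil.
    def validations
      @validations ||= []
    end

    def validates_presence_of(attr)
      validations << lambda do |record|
        value = record.public_send(attr)
        "#{attr} can't be blank" if value.nil? || value.to_s.strip.empty?
      end
    end

    def validates_length_of(attr, maximum:)
      validations << lambda do |record|
        value = record.public_send(attr).to_s
        "#{attr} is too long (maximum is #{maximum})" if value.length > maximum
      end
    end
  end

  # Runs every declared rule and collects the messages that apply.
  def errors
    self.class.validations.map { |rule| rule.call(self) }.compact
  end

  def valid?
    errors.empty?
  end
end

class Post
  include TinyValidations
  attr_accessor :title

  validates_presence_of :title
  validates_length_of :title, maximum: 80

  def initialize(title)
    @title = title
  end
end
```

Adding another rule is one more declaration in the class body, and the checking logic stays in a single place that can be tested on its own.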
The final thing I really like about ActiveRecord compared with plain SQL is callbacks. Callbacks can be emulated with SQL triggers (which are only available in MySQL 5.0 or above), while ActiveRecord has had callbacks since its early days (I started on 0.13).
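As a rough illustration of what a callback buys you, here is a plain-Ruby sketch of a before_save hook. TinyRecord, Article, and generate_slug are hypothetical names, and the save method only pretends to persist; ActiveRecord's real callback chain is considerably richer.

```ruby
# Hypothetical base class that runs registered hooks before "saving".
class TinyRecord
  def self.before_save(method_name)
    before_save_callbacks << method_name
  end

  def self.before_save_callbacks
    @before_save_callbacks ||= []
  end

  # Runs each registered callback, then marks the record as saved.
  # No database is involved in this sketch.
  def save
    self.class.before_save_callbacks.each { |name| send(name) }
    @saved = true
  end

  def saved?
    @saved == true
  end
end

class Article < TinyRecord
  attr_accessor :title, :slug

  before_save :generate_slug

  def initialize(title)
    @title = title
  end

  private

  # Derives a URL-friendly slug from the title just before saving.
  def generate_slug
    self.slug = title.downcase.strip.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
  end
end
```

The derived value is computed in application code next to the model, rather than in a trigger that lives in the database.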
To summarize:
ActiveRecord makes the easy things easy;
ActiveRecord removes the boring, repetitive parts;
ActiveRecord does not prevent you from writing your own SQL (usually for performance reasons), and finally;
ActiveRecord is portable across most database engines, while SQL itself sometimes is not.
I know in your case you are talking specifically about MySQL, but still. Having the option is nice.
The idea here is that by putting your DB logic inside your Active Records, you're dealing with SQL code in one place, rather than spread all over your application. This makes it easier for the various layers of your application to follow the Single Responsibility Principle (that an object should have only one reason to change).
Here's an article on the Active Record pattern.
Avoiding SQL helps you when you decide to change the database schema. The abstraction is also necessary for all kinds of things, like validation. It doesn't mean you don't get to write SQL: you can always do that if you feel the need for it. But you don't have to write a five-line query when all you need is user.save. It's the Rails philosophy to avoid unnecessary code.
I understand the basic process of SQL injection attack. My question is related to SQL injection prevention. I was told that one way to prevent such an attack is by frequently changing the table name! Is that possible?
If so, can someone provide me a link to read about it more because I couldn't find an explanation about it on the web.
No. That makes no sense. You'd either have to change every line of code that references the table or you'd have to leave in place something like a view with the old table name that acts exactly like the old table. No reasonable person would do that. Plus, it's not like there are a ton of reasonable names for tables so you'd be doing crazy things like saying table A stores customer data and AA stores employer data and AAA was the intersection between customers and employers.
SQL injection is almost comically simple to prevent. Use prepared statements with bind variables. Don't dynamically build SQL statements. Done. Of course, in reality, making sure that the new developer doesn't violate this dictum either because they don't know any better or because they can hack something out in a bit less time if they just do a bit of string concatenation makes it a bit more complex. But the basic approach is very simple.
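The difference between concatenation and bind variables can be sketched without a database. In the snippet below, PreparedQuery is a hypothetical stand-in for whatever prepared-statement API your driver provides; the point is only that the SQL text stays constant while the values travel separately.

```ruby
# Attacker-controlled input for a hypothetical login form.
ATTACK_INPUT = "' OR '1'='1"

# Unsafe: string interpolation lets the input rewrite the statement itself.
def unsafe_query(name)
  "SELECT * FROM users WHERE name = '#{name}'"
end

# Safer shape: constant SQL with a placeholder, values passed alongside.
# A real driver would send both pieces to the database; this Struct is
# only an illustration of that separation.
PreparedQuery = Struct.new(:sql, :binds)

def safe_query(name)
  PreparedQuery.new('SELECT * FROM users WHERE name = ?', [name])
end

puts unsafe_query(ATTACK_INPUT)
puts safe_query(ATTACK_INPUT).sql
```

With concatenation, the attacker's quote characters become part of the SQL and add an extra OR clause. With a placeholder, the statement text never changes, whatever the input contains.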
Pffft. What? Frequently changing a table name?
That's bogus advice, as far as "preventing SQL Injection".
The only prevention for SQL Injection vulnerabilities is to write code that isn't vulnerable. And in the vast majority of cases, that is very easy to do.
Changing table names doesn't do anything to close a SQL Injection vulnerability. It might make a successful attack vector less repeatable, requiring an attacker to make some adjustments. But it does nothing to prevent SQL Injection.
As a starting point for research on SQL Injection, I recommend OWASP (Open Web Application Security Project)
Start here: https://www.owasp.org/index.php/SQL_Injection
If you run across "changing a table name" as a mitigation, let me know. I've never run across that as a prevention or mitigation for SQL Injection vulnerability.
Here are things you can do to prevent SQL injection:
Use an ORM that encapsulates your SQL calls and provides a friendly layer to your database records. Most of these are very good at writing high quality queries and protecting you from injection bugs simply because of how you use them.
Use prepared statements with placeholder values whenever possible. Write queries like this:
INSERT INTO table_name (name, age) VALUES (:name, :age)
Be very careful to properly escape any and all values that are inserted into SQL through any other method. This is always a risky thing to do, so any code you write like this should make its escaping blindingly obvious, so that a quick code review can verify it's working properly. Never hide escaping behind abstractions or methods with cute names like scrub or clean. Those methods might be subtly broken and you'd never notice.
Be absolutely certain any table name parameters, if dynamic, are tested versus a white list of known-good values. For example, if you can create records of more than one type, or put data into more than one table ensure that the parameter supplied is valid.
Trust nothing supplied by the user. Presume every single bit of data is tainted and hostile unless you've taken the trouble to clean it up. This goes doubly for anything that's in your database if you got your database from some other source, like inheriting a historical project. Paranoia is not unfounded, it's expected.
Write your code such that deleting a line does not introduce a security problem. That means never doing this:
$value = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value')");
You're one line away from failure here. If you must do this, write it like so:
$value_escaped = $db->escaped($value);
$db->query("INSERT INTO table (value) VALUES ('$value_escaped')");
That way deleting the line that does the escaping does not immediately cause an injection bug. The default here is to fail safely.
Make every effort to block direct access to your database server by aggressively firewalling it and restricting access to those that actually need access. In practice this means blocking port 3306 and using SSH for any external connections. If you can, eliminate SSH and use a secured VPN to connect to it.
Never generate errors which spew out stack traces that often contain information highly useful to attackers. For example, an error that includes a table name, a script path, or a server identifier is providing way too much information. Have these for development, and ensure these messages are suppressed on production servers.
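The allow-list check for dynamic table names mentioned in the list above could look roughly like this in Ruby. The table names and the helper names (ALLOWED_TABLES, checked_table_name, count_query) are made up for the sketch.

```ruby
# Hypothetical allow-list: the only tables this code may ever reference.
ALLOWED_TABLES = %w[customers employers].freeze

# Returns the table name unchanged if it is on the allow-list, and raises
# otherwise, so an unknown value never reaches a SQL string.
def checked_table_name(name)
  candidate = name.to_s
  unless ALLOWED_TABLES.include?(candidate)
    raise ArgumentError, "unknown table: #{candidate.inspect}"
  end
  candidate
end

# Table names cannot be bound as parameters, so they are interpolated
# only after passing the allow-list check.
def count_query(table_param)
  "SELECT COUNT(*) FROM #{checked_table_name(table_param)}"
end
```

Anything not explicitly listed fails loudly, which matches the "fail safely" default described above.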
Randomly changing table names is utterly pointless and will make your code a total nightmare. It will be very hard to keep all your code in sync with whatever random name the table is assuming at any particular moment. It will also make backing up and restoring your data almost impossible without some kind of decoder utility.
Anyone who recommends doing this is proposing a pointless and naïve solution to an already solved problem.
Suggesting that randomly changing the table names fixes anything demonstrates a profound lack of understanding of the form SQL injection bugs take. Knowing the table name is nice to have and makes an attacker's life easier, but many attacks need no knowledge of it. A common attack is to force a login as an administrator by injecting additional clauses into the WHERE condition; the table name is irrelevant there.
I want to sort a collection using a custom proc. I know Rails has the order method, but I don't believe this works with procs, so I'm just using sort_by instead. Can someone go into detail about the speed I'm sacrificing, or suggest alternatives? My understanding is that the exact implementation of order will depend on the adapter (which, in my case, is mysql), but I'm wondering if there are ways to take advantage of this to speed the sort up.
As an example, I want to do this:
Model.order(|m| m.get_priority )
but am forced to do this
Model.all.sort_by{|m| m.get_priority}
sort_by is implemented at Ruby level and it's part of Ruby, not ActiveRecord. Therefore, the sorting will not be executed by the database, rather by the Ruby interpreter.
This is not an optimal solution as DBMS are generally more efficient at sorting data as they may use existing indexes.
If get_priority performs some sort of computation outside the database, then you don't have a lot of alternatives to the code you posted here unless you want to cache the result of the get_priority as a column in the Model table and sort against it using the ActiveRecord order statement that will result in an ORDER BY SQL statement.
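For reference, this is what the in-Ruby approach does, shown on a plain Struct instead of a model. Task and its get_priority formula are hypothetical stand-ins for your Model.

```ruby
# Hypothetical stand-in for a model whose priority is computed in Ruby.
Task = Struct.new(:name, :due_in_days, :urgent) do
  def get_priority
    (urgent ? 0 : 100) + due_in_days
  end
end

TASKS = [
  Task.new('write report', 5, false),
  Task.new('fix outage', 1, true),
  Task.new('plan sprint', 2, false)
].freeze

# sort_by evaluates the block once per element and sorts on those keys,
# but every record must already be loaded into memory.
def by_priority(tasks)
  tasks.sort_by(&:get_priority)
end

puts by_priority(TASKS).map(&:name).inspect
```

If the computed value were stored in a column (say a hypothetical priority column), the same ordering could be expressed as Model.order(:priority) and the database would do the sorting.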
I have an application that allows users to filter applicants based on very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using active record models I thought it was best to use pure sql and put the bulk of the work in the database. In order to do this I have to construct a rather complex sql query based on the criteria that the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while also having maintainable and non brittle code at the same time? Any help would be greatly appreciated.
As @hazzit said, it is difficult to answer without more details, but here's my two cents. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search and filtering features, I often find raw SQL overkill and not very maintainable.
The key question here is: can you break your problem down into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, ->{ where something: true }
scope :another_scope, ->( option ){ where an_option: option }
scope :using_arel, ->{ joins(:assoc).where Assoc.arel_table[:some_field].not_eq "foo" }
# cue a bunch of scopes
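def self.search( options = {} )
  # start from an unfiltered relation (on Rails 3, use `scoped` instead of `all`)
  relation = all
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope( options[:another_option] ) unless options[:flag]
  # add logic as you need it
  relation # return the composed relation so callers can keep chaining
end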
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and which returns a relation. Breaking the query into multiple, reusable scopes helps keep it readable and maintainable; a search class method ties it all together and gives you a single place to document. And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
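The control flow of that search method (apply a filter only when its option is set, and always hand back something that can be filtered further) can be tried out without Rails. The sketch below uses in-memory arrays purely to show the shape; Applicant, FILTERS, and the option names are hypothetical. In a real app each step would be a scope on a relation, so the filtering would still happen in the database.

```ruby
Applicant = Struct.new(:name, :has_degree, :willing_to_relocate, :years)

# Each filter is small, named, and independent, much like a scope.
FILTERS = {
  has_degree:          ->(list, _value) { list.select(&:has_degree) },
  willing_to_relocate: ->(list, _value) { list.select(&:willing_to_relocate) },
  min_years:           ->(list, value)  { list.select { |a| a.years >= value } }
}.freeze

# Applies only the filters whose option is present and truthy;
# unknown option keys are ignored.
def search(applicants, options = {})
  options.reduce(applicants) do |result, (key, value)|
    filter = FILTERS[key]
    filter && value ? filter.call(result, value) : result
  end
end

APPLICANTS = [
  Applicant.new('Ann', true, false, 5),
  Applicant.new('Bob', true, true, 1),
  Applicant.new('Cy', false, true, 7)
].freeze
```

For example, search(APPLICANTS, has_degree: true, min_years: 3) narrows the list in two independent steps, and adding a new criterion means adding one entry to FILTERS.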
If this method is not suitable for your needs, there's another option: use a full-fledged search and filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially when it comes to high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system will have to retrieve all of the unfiltered data from the database, which will cause tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never actually be retrieved from disk, handed over to RoR, and filtered there. The main reason indexes exist is to avoid expensive operations and speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database a bit to do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
That said, there are a few types of queries that a given database is not good at. Those are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among those.
I am a .NET guy with 6 years of experience. Recently I started working on a RoR project and realized that stored procedures and SQL functions are not being used at all. When I asked about it, I was told that this is common practice and that, in general, nobody on the team writes SQL queries at all; everything is done using ActiveRecord.
I googled for possible reasons but didn't find much information. So I am just curious to know:
Is it common practice that stored procedures/sql functions are not preferred to use?
What are pros and cons of using stored procedures?
Is it common practice that stored procedures/sql functions are not preferred to use?
It is very common. Most Rails apps will never need to use anything more than ActiveRecord.
One of the chief philosophies behind Rails is that it's more important to get a working product to market today than it is to get a "fast" product to market 6 months from now. Your product will almost certainly never be popular enough for performance to be a concern. If that does become a problem, you can shore up the performance side of things later, but the immediate concern is to be able to build an app quickly, and to be able to rapidly refactor some or all of it in response to your market.
What are pros and cons of using stored procedures?
They're slower to write and more difficult to change, and therefore front-load your development costs. However, they can be faster to execute.
It might not be the "Rails way" to use stored procedures, but it was also once not the "Rails way" to use foreign key constraints, and we all know what a monumentally bad design decision that turned out to be.
So I would take "the Rails way" with a grain of salt. If stored procedures work for you, use them.
Here's what I do. Since ORMs often don't really "understand" stored procedures without some more in-depth magic, I avoid calling the procedure directly and instead create a materialized view that encapsulates the stored procedure and presents it as a regular table. Correctly set up, this gives the ORM something it can better understand while still keeping database logic inside the layer it's supposed to live in: the database, an engine that will always outperform the web framework at data crunching.
You can call stored procedures from Rails, but you are going to lose most of the benefits of ActiveRecord, as the standard generated SQL will not work. You can use the native database connection and call it, but it's going to be a leaky abstraction. You may want to consider DataMapper.
taken from >> Using Stored Procedures in Rails
To sum up, it's not the "Rails way" to use stored procedures.
Is it common practice that stored procedures/sql functions are not preferred to use?
True. Building queries with Active Record allows you to manage them all in your application code.
What are pros and cons of using stored procedures?
Pros: you can hide complex query logic from your application code.
Cons: you have to create and execute a migration if you want to rewrite a procedure.
See this example on hiding logic in a database view, also applicable to procedures.
Pros Example:
You need to select all hotels with rooms available between start_time and end_time. Every hotel has total_rooms (an integer attribute), hotel_times (an entity that defines operating hours for a hotel) and some bookings (an entity that defines a user who booked a room in a hotel). Some hotels are big and offer daily bookings. Other hotels are small and offer hourly bookings. You ask the user when they want to book, which can be either a date or a date with a time.
This involves some joins and sub-queries and would create a big ugly piece of Active Record code. Instead, you can write a procedure and call it like this:
Hotel.find_by_sql ['SELECT * FROM hotels_available_between(?, ?)', start_time, end_time]
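Wrap it in a scope to make the call more Ruby-ish (note that find_by_sql returns an array of records, so the result cannot be chained like other scopes):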
class Hotel < ActiveRecord::Base
scope :available_between, -> start_time, end_time do
find_by_sql ['SELECT * FROM hotels_available_between(?, ?)', start_time, end_time]
end
end
Hotel.available_between start_time, end_time
In Rails 3 with MySQL, suppose I have two models, Customer and Purchase, where a purchase obviously belongs_to a customer. I want to find all the customers with 2 or more purchases. I can simply say:
Customer.includes(:purchases).all.select{|c| c.purchases.count >= 2}
Effectively, though, the line above runs queries on the order of Customer.all and Purchase.all, then does the "select" processing in Ruby. In a large database, I would much prefer to avoid doing all this "select" calculation in Ruby, and have MySQL do the processing and give me only the list of qualifying customers. That is both much faster (since MySQL is better tuned for this) and significantly reduces the amount of data transferred from the database.
Unfortunately, I am unable to conjure up the code from the Rails building blocks (where, having, group, etc.) to make this happen, something along the lines of (pseudo-code):
Customer.joins(:purchases).where("count(purchases) >= 2").all
I will settle for a straight MySQL solution, though I would much prefer to figure this out within the elegant framework of Rails.
No need to install a gem to get this to work (though metawhere is cool)
Customer.joins(:purchases).group("customers.id").having("count(purchases.id) >= ?", 2)
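If the GROUP BY / HAVING combination is unfamiliar: it groups the purchase rows per customer and keeps only the groups whose count passes a threshold. The plain-Ruby sketch below mimics that on a small in-memory list, only to show the semantics. The Purchase struct and its data are made up, and for real data you would let the database do this as in the query above.

```ruby
Purchase = Struct.new(:id, :customer_id)

PURCHASES = [
  Purchase.new(1, 10), Purchase.new(2, 10),
  Purchase.new(3, 11),
  Purchase.new(4, 12), Purchase.new(5, 12), Purchase.new(6, 12)
].freeze

# Equivalent in spirit to:
#   GROUP BY customer_id HAVING COUNT(id) >= minimum
def customer_ids_with_at_least(purchases, minimum)
  purchases
    .group_by(&:customer_id)                             # one bucket per customer
    .select { |_customer_id, rows| rows.size >= minimum } # the HAVING filter
    .keys
end

puts customer_ids_with_at_least(PURCHASES, 2).inspect
```

The threshold is a parameter here for the same reason it is a bind value in the query: the grouping logic stays fixed while the cut-off varies.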
The documentation on this stuff is fairly sparse at this point. I'd look into using Metawhere if you'll be doing any more queries that are similar to this. Using Metawhere, you can do this (or something similar, not sure if the syntax is exactly correct):
Customer.includes(:purchases).where(:purchases => {:count.gte => 2})
The beauty of this is that MetaWhere still uses ActiveRecord and arel to perform the query, so it works with the 'new' rails 3 way of doing queries.
Additionally, you probably don't want to call .all on the end as this will cause the query to ping the database. Instead, you want to use lazy loading and not hit the db until you actually require the data (in the view, or some other method that is processing actual data.)
This is a bit more verbose, but if you want customers where the count is 0, or need more flexible SQL, you can use a LEFT JOIN:
Customer.joins('LEFT JOIN purchases on purchases.customer_id = customers.id').group('customers.id').having('count(purchases.id) = 0').length