Stored procedures in Ruby on Rails - mysql

I am a .NET guy with 6 years of experience. Recently I started working on an RoR project and realized that stored procedures and SQL functions are not being used at all. On inquiring about it, I learned that this is common practice and that, in general, nobody on the team writes SQL queries at all; everything is done using ActiveRecord.
I googled possible reasons for this but didn't find much information, so I am just curious to know:
Is it common practice that stored procedures/sql functions are not preferred to use?
What are pros and cons of using stored procedures?

Is it common practice that stored procedures/sql functions are not preferred to use?
It is very common; most Rails apps will never need to use anything more than ActiveRecord.
One of the chief philosophies behind Rails is that it's more important to get a working product to market today than it is to get a "fast" product to market 6 months from now. Your product will almost certainly never be popular enough for performance to be a concern. If that does become a problem, you can shore up the performance side of things later, but the immediate concern is to be able to build an app quickly, and to be able to rapidly refactor some or all of it in response to your market.
What are pros and cons of using stored procedures?
They're slower to write and more difficult to change, and therefore front-load your development costs. However, they can be faster to execute.

It might not be the "Rails way" to use stored procedures, but it was also once not "the Rails way" to use foreign key constraints, and we all know what a monumentally bad design decision that turned out to be.
So I would take "the Rails way" with a grain of salt. If stored procedures work for you, use them.
Here's what I do. Understanding that ORMs often don't really 'understand' stored procedures without slightly more in-depth magic, I avoid calling them directly, and instead create a materialized view that encapsulates the stored procedure and presents it as a regular table. Correctly set up, this gives the ORM something it can better understand whilst still keeping database logic inside the layer it's supposed to live in: the database, an engine that will always outperform the web framework at data crunching.

You can call stored procedures from Rails, but you are going to lose most of the benefits of ActiveRecord, as the standard generated SQL will not work. You can use the native database connection and call it, but it's going to be a leaky abstraction. You may want to consider DataMapper.
taken from >> Using Stored Procedures in Rails
To sum up, it's not the "RAILS WAY" to use stored procedures.

Is it common practice that stored procedures/sql functions are not preferred to use?
True. Building queries with Active Record allows you to manage them all in your application code.
What are pros and cons of using stored procedures?
Pros: you can hide complex query logic from your application code.
Cons: you have to create and execute a migration if you want to rewrite a procedure.
See this example on hiding logic in a database view, also applicable to procedures.
Pros Example:
You need to select all hotels with rooms available between start_time and end_time. Every hotel has total_rooms (integer attribute), hotel_times (entity that define operating hours for a hotel) and some bookings (entity that define a user that booked a room in a hotel). Some hotels are big and offer daily bookings. Other hotels are small and offer hourly bookings. You ask the user when he wants to book, which can be either a date or a date-with-time.
This involves some joins and sub-queries and would create a big ugly piece of Active Record code. Instead, you can write a procedure and call it like this:
Hotel.find_by_sql ['SELECT * FROM hotels_available_between(?, ?)', start_time, end_time]
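Wrap it in a scope to make it more Ruby-ish (note that find_by_sql returns an Array, so the result cannot be chained with other scopes):
class Hotel < ActiveRecord::Base
  # Delegates the availability logic to the hotels_available_between database function.
  scope :available_between, ->(start_time, end_time) {
    find_by_sql ['SELECT * FROM hotels_available_between(?, ?)', start_time, end_time]
  }
end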
Hotel.available_between start_time, end_time

Related

MySQL - Best methods to provide fast Dynamic filter support for large-scale database record lists?

I am curious what techniques Database Developers and Architects use to create dynamic filter data response Stored Procedures (or Functions) for large-scale databases.
For example, let's take a database with millions of people in it, and we want to provide a stored procedure "get-person-list" which takes a JSON parameter. Within this JSON parameter, we can define filters such as $.filter.name.first, $.filter.name.last, $.filter.phone.number, $.filter.address.city, etc.
The frontend (web solution) allows the user to define one or more filters, so the front-end can say "Show me everyone with a First name of Ted and last name of Smith in San Diego."
The payload would look like this:
{
"filter": {
"name": {
"last": "smith",
"first": "ted"
},
"address": {
"city": "san diego"
}
}
}
Now, what would the best technique be to write a single stored procedure capable of handling numerous (dozens or more) filter settings (dynamically) and returning the proper result set all with the best optimization/speed?
Is it possible to do this with CTE, or are prepared statements based on IF/THEN logic (building out the SQL to be executed based on filter value) the best/only real method?
How do big companies with huge databases and thousands of users write their calls to return complex dynamic lists of data as quickly as possible?
Everything Bill wrote is true, and good advice.
I'll take it a little further. You're proposing building a search layer into your system, which is fine.
You're proposing an interface in which you pass a JSON object to code inside the DBMS. That's not fine. That code will either have a bunch of canned queries handling the various search scenarios, or will have a mess of string-handling code that reads JSON, puts together appropriate queries, then uses MySQL's PREPARE statement to run them. From my experience that is, with respect, a really bad idea.
Here's why:
1. The stored-procedure language has very weak string-handling support compared to host languages. No sprintf. No arrays of strings. No join or implode operators. Clunky regex, and not always present on every server. You're going to need string handling to build search queries.
2. Stored procedures are trickier to debug, test, deploy, and maintain than ordinary application code. That work requires special skills and special access.
3. You will need to maintain this code, especially if your system proves successful. You'll add requirements that will require expanding your search capabilities.
4. It's impossible (seriously, impossible) to know what your actual application usage patterns will be at scale. You surely will, as a consequence of growth, find usage patterns that surprise you. My point is that you can't design and build a search system and then forget about it. It will evolve along with your app.
5. To keep up with evolving usage patterns, you'll need to refactor some queries and add some indexes. You will be under pressure when you do that work: People will be complaining about performance. See points 1 and 2 above.
6. MySQL / MariaDB's stored procedures aren't compiled with an optimizing compiler, unlike Oracle and SQL Server's. So there's no compelling performance win.
So don't use a stored procedure for this. Please. Ask me how I know this sometime.
If you need a search module with a JSON interface, implement it in your favorite language (php, C#, nodejs, java, whatever). It will be easier to debug, test, deploy, and maintain.
To write a query that searches a variety of columns, you would have to write dynamic SQL. That is, write code to parse your JSON payload for the filter keys and values, and format SQL expressions in a string that is part of a dynamic SQL statement. Then prepare and execute that string.
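As a rough illustration of that dynamic-SQL step done in application code, here is a minimal Ruby sketch. The `people` table, the column names, the `FILTER_COLUMNS` mapping, and the `build_person_query` helper are all hypothetical names invented for this example; only the JSON filter shape comes from the question. It produces a statement with `?` placeholders plus a list of values to bind, rather than interpolating user input into the SQL text.

```ruby
require "json"

# Hypothetical mapping from JSON filter paths to column names.
# Only whitelisted paths can ever contribute to the SQL text.
FILTER_COLUMNS = {
  %w[name first]   => "first_name",
  %w[name last]    => "last_name",
  %w[phone number] => "phone_number",
  %w[address city] => "city"
}.freeze

# Returns [sql, binds]: a SELECT with ? placeholders and the values to bind.
# Filters that are absent or empty in the payload are simply left out.
def build_person_query(json_payload)
  filter = JSON.parse(json_payload).fetch("filter", {})
  clauses = []
  binds = []

  FILTER_COLUMNS.each do |path, column|
    value = filter.dig(*path)
    next if value.nil? || value.to_s.empty?

    clauses << "#{column} = ?"
    binds << value
  end

  sql = "SELECT * FROM people"
  sql += " WHERE " + clauses.join(" AND ") unless clauses.empty?
  [sql, binds]
end

payload = '{"filter": {"name": {"last": "smith", "first": "ted"}, "address": {"city": "san diego"}}}'
sql, binds = build_person_query(payload)
puts sql
p binds
```

The resulting pair can be handed to whatever prepared-statement API the database driver offers. Keeping the whitelist in one place also makes it easy to see which columns are candidates for indexing.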
In general, you can't "optimize for everything." Trying to optimize when you don't know in advance which queries your users will submit is a nigh-impossible task. There's no perfect solution.
The most common method of optimizing search is to create indexes. But you need to know the types of search in advance to create indexes. You need to know which columns will be included, and which types of search operations will be used, because the column order in an index affects optimization.
For N columns, there are N-factorial permutations of columns, but clearly this is impractical because MySQL only allows 64 indexes per table. You simply can't create all the indexes needed to optimize every possible query your users attempt.
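To make the arithmetic concrete, this small Ruby sketch counts the orderings of an index that covers all N filterable columns and compares the count with the 64-index limit mentioned above. The helper name `column_orderings` is made up for the example, and it only counts full-length orderings, so the real number of candidate indexes (which includes shorter column subsets) is larger still.

```ruby
# Number of distinct column orderings for an index covering all n columns (n!).
def column_orderings(n)
  (1..n).reduce(1, :*)
end

MAX_INDEXES_PER_TABLE = 64 # the per-table limit cited in the text

(1..6).each do |n|
  count = column_orderings(n)
  note = count > MAX_INDEXES_PER_TABLE ? "exceeds the limit" : "fits"
  puts "#{n} columns: #{count} orderings (#{note})"
end
```

With only five filterable columns the count already passes the limit, which is why partial indexing of the most common combinations is the practical route.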
The alternative is to optimize queries partially, by indexing a few combinations of columns, and hope that these help the users' most common queries. Use application logs to determine what the most common queries are.
There are other types of indexes. You could use fulltext indexing, either the implementation built in to MySQL, or else supplement your MySQL database with ElasticSearch or similar technology. These provide a different type of index that effectively indexes everything with one index, so you can search based on multiple columns.
There's no single product that is "best." Which fulltext indexing technology meets your needs requires you to evaluate different products. This is some of the unglamorous work of software development — testing, benchmarking, and matching product features to your application requirements. There are few types of work that I enjoy less. It's a toss-up between this and resolving git merge conflicts.
It's also more work to manage copies of data in multiple datastores, making sure data changes in your SQL database are also copied into the fulltext search index. This involves techniques like ETL (extract, transform, load) and CDC (change data capture).
But you asked how big companies with huge databases do this, and this is how.
Input
I do that "all the time". The web page has a <form>. When submitted, I look for fields of that form that were filled in, then build
WHERE this = "..."
AND that = "..."
into the suitable SELECT statement.
Note: I leave out any fields that were not specified in the form; I make sure to escape the strings.
I'm walking through $_GET[] instead of JSON, so it is quite easy.
INDEXing
If you have a column for each possible field, then it is a matter of providing indexes only for the most likely columns to search on. (There are practical and even hard-coded limits on indexes.)
If you have stored the attributes in an EAV table structure, you have my condolences. Search the [entity-attribute-value] tag for many other poor souls who wandered into that swamp.
If you store the attributes in JSON, well, that is likely to be an order of magnitude worse than EAV.
If you throw all the information into a FULLTEXT column and use MATCH, then you can get enough speed for "millions" of rows. But it comes with various caveats (word length, stoplist, endings, surprise matches, etc).
If you would like to discuss further, then scale back your expectations and make a list of likely search keys. We can then discuss what technique might be best.

Complex filtering in rails app. Not sure complex sql is the answer?

I have an application that allows users to filter applicants based on very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using active record models I thought it was best to use pure sql and put the bulk of the work in the database. In order to do this I have to construct a rather complex sql query based on the criteria that the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while also having maintainable and non brittle code at the same time? Any help would be greatly appreciated.
As @hazzit said, it is difficult to answer without more details, but here's my two cents on this. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search / filtering features, I often find using raw SQL overkill and not quite maintainable.
The key question here is: can you break down your problem into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, -> { where something: true }
scope :another_scope, ->(option) { where an_option: option }
scope :using_arel, -> { joins(:assoc).where(Assoc.arel_table[:some_field].not_eq("foo")) }
# cue a bunch of scopes
def self.search(options = {})
  relation = all # start from an unfiltered relation
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope(options[:another_option]) unless options[:flag]
  # add logic as you need it
  relation # return the relation so callers can keep chaining
end
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and that returns a relation. Breaking the query into multiple, reusable scopes helps keep the thing readable and maintainable; using a search class method ties it all together and allows thorough documentation. And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suitable for your needs, there's another option: use a full-fledged search / filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially when it comes to high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system will have to retrieve all of the unfiltered data from the database, which will cause tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never actually be retrieved from disk, won't be handed over to RoR and won't then be filtered. The main reason indexes exist is to avoid expensive operations in order to speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database a bit to do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
That said, there actually are a few types of queries that a given database is not good at doing. Those are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among those.

Advice required - using entity framework with normalised data

I've recently gone through the process of revamping my database, normalising a lot of entities. Obviously I now have a few more tables than I had. A lot of data I use on the website is readonly so this is simple to denormalise using a view, however there are entities that could benefit from denormalised retrieval but still need to be updated.
Here's an example.
A User may be a Member
A Member may have a Profile
A Member may have an Account
In addition I have 3 further lookup tables.
In total there are 3 tables for User and 4 tables for Member.
Ideally, I can create 2 views from the above tables.
However, User needs to be updated as do the entities belonging to Member. Additionally there are 6 separate tables associated with Users/Members, e.g. FavouriteCategories, that also need to be retrieved and updated from time to time.
I'm struggling to come up with the best, most efficient way of doing this.
I could simply not use views and bring all the entities and lookups into the model, but I would be reliant on EF to produce the retrieval queries. The stuff I've read suggests that EF is not the best at dealing with joined data.
I could add both the view and tables, using the tables for updates only. This seems sloppy due to the duplication, complication of the model, as well as underutilising the EF model functionality.
Maybe I could use the readonly view for data retrieval and create stored procs. I believe that the process of using EF with stored procs is a bit of a hack, so I'd probably keep the stored procs distinct from EF and simply pass params and call the SP via traditional methods. This again seems like a bit of a halfway house.
I'm not that experienced with .NET or EF, so would appreciate some solid advice on either the methods I've referred to above or any better technique to achieve this. I don't want to go hacking the edmx file at this stage because... well it's just wrong.
I have a few entities that would benefit from the right solution. The User example is amongst the simplest, so there's a lot to gain from the right approach.
Help and advice would be very much appreciated.
Do you want to use EF? If yes, use either the first approach (no views at all, letting EF handle everything) or the last approach (views for reading, with stored procedures mapped for insert, update and delete operations).
Combining mapped views for reading and mapped tables for modifications is possible as well, but it is mostly the first solution (allowing EF to handle everything) with additional views for some query optimization.
You will not find cleaner approaches. All the mentioned approaches are valid solutions to your problem. The only question is whether you want to write the SQL yourself (views and stored procedures) or let EF do that.
The worst approach is using EF for querying and manually calling stored procedures for updating, but in some cases it can also be useful.

Best practises : is sql views really worth it? [duplicate]

I am building a new web application with data stored in a database. As with many web applications, I need to expose data from complex SQL queries (queries with conditions across multiple tables). I was wondering whether it would be a good idea to build my queries in the database as SQL views instead of building them in the application. What would be the benefit of that? Database performance? Will coding take longer? Will debugging take longer?
Thank you
This cannot really be answered objectively, since it depends on the case.
With views, functions, triggers and stored procedures you can move part of the logic of your application into the database layer.
This can have several benefits:
performance -- you might avoid round trips of data, and certain processing is handled more efficiently using the whole range of DBMS features.
conciseness -- some data processing is expressed more easily with DBMS features.
But also certain drawbacks:
portability -- the more you rely on specific features of the DBMS, the less portable the application becomes.
maintainability -- the logic is scattered across two different technologies, which implies more skills are needed for maintenance, and local reasoning is harder.
If you stick to the SQL92 standard it's a good trade-off.
My 2 cents.
I think your question is a little bit confusing in what you are trying to achieve (Gain knowledge regarding SQL Views or how to structure your application).
I believe all database logic should be stored at the database tier, ideally in a stored procedure or function rather than in the application logic. The application logic should then connect to the database, retrieve the data from these procedures/functions, and expose this data in your application.
One of the benefits of storing this at the database tier is taking advantage of the execution plans via SQL Server (which you may not get by accessing it directly via the application logic). Another advantage is that you can separate the application, i.e. if the database needs to be changed you don't have to modify the application directly.
For a direct point on Views, the advantages of using them include:
Restrict a user to specific rows in a table.
For example, allow an employee to see only the rows recording his or her work in a labor-tracking table.
Restrict a user to specific columns.
For example, allow employees who do not work in payroll to see the name, office, work phone, and department columns in an employee table, but do not allow them to see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
http://msdn.microsoft.com/en-us/library/aa214068(v=sql.80).aspx
Personally I prefer views, especially for reports/apps as if there are any problems in the data you only have to update a single view rather than re-building the app or manually editing the queries.
SQL views have many uses. Try first reading about them and then asking a more specific question:
http://odetocode.com/code/299.aspx
http://msdn.microsoft.com/en-us/library/ms187956.aspx
I have seen that views are used a lot to do two things:
Simplify queries, if you have a HUGE select with multiple joins and joins and joins, you can create a view that will have the same performance but the query will be only a couple of lines.
For security reasons: if you have a table with some information that shouldn't be accessible to all the developers, you can create views and grant privileges on the views and not on the main table. For example:
table 1: name, last_name, user_id, credit_card, social_security. You create a view on that table exposing only: name, last_name, user_id.
You can run into performance issues and constraints on the types of queries you can run against a view.
Restrictions on what you can do with views.
http://dev.mysql.com/doc/refman/5.6/en/view-restrictions.html
Looks like the big one is that you cannot create an index on the view. This could cause a big performance hit if your final result table is large.
This is also a good forum discussing views: http://forums.mysql.com/read.php?100,22967,22967#msg-22967
In my experience a well indexed table, using the right engine, and properly tuned (for example setting an appropriate key_buffer value) can perform better than a view.
Alternatively you could create a trigger that updates a table based on the results of other tables. http://dev.mysql.com/doc/refman/5.6/en/triggers.html
The technique you are describing is called denormalization. Cal Henderson, software engineer from Flickr, openly supports this technique.
In theory the JOIN operation is one of the most expensive operations, so it is a good practice to denormalize, since you are transforming n queries with m JOINs into 1 query with m JOINs and n queries that select from a view.
That said, the best way is to test it for yourself, because what could be incredibly good for Flickr may not be so good for your application.
Moreover, the performance of views may vary a lot from one RDBMS to another. For instance, depending on the RDBMS, views can be updated when the original table is changed, can have indexes, etc.

MySQL: Views vs Stored Procedures

Since MySQL started supporting stored procedures, I've never really used them. Partly because I'm not a great query writer, partly because I often work with DBAs who make those choices for me, partly because I'm just comfy with What I Know.
In terms of doing data selection, specifically when considering a select that is essentially a de-normalization (joins) and aggregate (avg or max, subqueries w/counts, etc) selection of data, what is the right choice in MySQL 5.x? A view? Or a stored procedure?
Views I'm comfortable with - you know what your SELECT query is supposed to look like so you just create that, make sure it's indexed and whatnot, then just do a CREATE VIEW [View] AS SELECT [...]. Then, in my application, I treat the view as a read-only table - it represents a de-normalized version of my normalized data.
What are the disadvantages here - if any? And what would change (gains or losses) if I moved that exact same SELECT statement into a stored procedure?
I'm hoping to find some good 'under the hood' info that has been difficult to find while googling this topic but really I welcome all comments and answers.
In my opinion, stored procedures should be used solely for data manipulation when the same routine needs to be used amongst several different applications, or for ETL between databases or tables, nothing more. Basically, do as much in code as you can until you run into the DRY principle or what you are doing is simply moving data from one place to another within the DB.
Views can be used to provide an alternate or simplified "view" into the data. As such, I would go with a view as you are not really manipulating the data as much as finding a different method of displaying it.
Not sure if it's an either/or choice. Stored procedures can do a wide variety of things that views would struggle with (think populating data in a temp table, then running a cursor on it, then doing aggregation and returning a result set).
Views on the other hand can hide complex sql / access rights and present a modified view of the schema.
I think both have a place in the scheme of things and both are useful for a successful schema implementation.
I use views for de-normalisation or output formatting and stored procedures for filtering and data manipulation (things that require parameter inputs) or iteration (cursors).
I often access a view inside a stored procedure when both de-normalisation and filtering are required.
One thing to note: at least with MySQL, view results can be stored in a temporary table (the TEMPTABLE algorithm, used when the view cannot be merged into the outer query), and unlike most decent database engines this table is not indexed. So if you are using views just to simplify queries, they are great when your program is going to grab all of the results from the view. However, if you're then searching the results of that view based on parameters, it is incredibly slow, especially if there are millions of records to sift through, and even worse if the view is built on top of other views and so on.
With a stored procedure, however, you can pass those search parameters in and run the query directly against the underlying (indexed) tables. The downside is that the results will need to be fetched every time the procedure is run, which may also occur with a view anyway depending on server configuration.
So basically, if you're using a view, try to minimise the number of results (if you then need to search it); otherwise use a stored procedure.