MySQL: Views vs Stored Procedures

Since MySQL started supporting stored procedures, I've never really used them. Partly because I'm not a great query writer, partly because I often work with DBAs who make those choices for me, partly because I'm just comfy with What I Know.
In terms of doing data selection, specifically when considering a select that is essentially a de-normalization (joins) and aggregate (avg or max, subqueries w/counts, etc) selection of data, what is the right choice in MySQL 5.x? A view? Or a stored procedure?
Views I'm comfortable with - you know what your SELECT query is supposed to look like, so you just create that, make sure it's indexed and whatnot, then just do a CREATE VIEW [View] AS SELECT [...]. Then, in my application, I treat the view as a read-only table - it represents a de-normalized version of my normalized data.
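For illustration, a minimal sketch of the sort of view I mean (table and column names are made up):

-- De-normalizing view: joins plus aggregates (a count, AVG, MAX).
CREATE VIEW customer_order_summary AS
SELECT c.customer_id,
       c.name,
       COUNT(o.order_id) AS order_count,
       AVG(o.total)      AS avg_order_total,
       MAX(o.placed_at)  AS last_order_at
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;

-- The application then treats it as a read-only table:
SELECT * FROM customer_order_summary WHERE order_count > 5;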
What are the disadvantages here - if any? And what would change (gains or losses) if I moved that exact same SELECT statement into a stored procedure?
I'm hoping to find some good 'under the hood' info that has been difficult to find while googling this topic but really I welcome all comments and answers.

In my opinion, Stored Procedures should be used solely for data manipulation when the same routine needs to be used among several different applications, or for ETL between databases or tables, nothing more. Basically, do as much in code as you can until you run into the DRY principle, or until what you are doing is simply moving data from one place to another within the DB.
Views can be used to provide an alternate or simplified "view" into the data. As such, I would go with a view as you are not really manipulating the data as much as finding a different method of displaying it.

Not sure if it's an either/or choice. Stored procedures can do a wide variety of things that views would struggle with (think populating data into a temp table, running a cursor over it, then doing aggregation and returning a result set).
Views, on the other hand, can hide complex SQL, control access rights, and present a modified view of the schema.
I think both have a place in the scheme of things and both are useful for a successful schema implementation.

I use views for de-normalisation or output formatting and stored procedures for filtering and data manipulation (things that require parameter inputs) or iteration (cursors).
I often access a view inside a stored procedure when both de-normalisation and filtering are required.
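A rough sketch of that pattern (all names invented), assuming MySQL 5.x:

-- De-normalising view: the joins and aggregation live here.
CREATE VIEW order_summary AS
SELECT o.order_id, c.name AS customer_name, SUM(i.qty * i.price) AS order_total
FROM orders o
JOIN customers c   ON c.customer_id = o.customer_id
JOIN order_items i ON i.order_id = o.order_id
GROUP BY o.order_id, c.name;

-- Filtering stored procedure: takes the parameter and queries the view.
DELIMITER //
CREATE PROCEDURE get_orders_above(IN min_total DECIMAL(10,2))
BEGIN
  SELECT * FROM order_summary WHERE order_total >= min_total;
END //
DELIMITER ;

CALL get_orders_above(100.00);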

One thing to note: at least with MySQL, view results can be stored in an internal temporary table (when the TEMPTABLE algorithm is used), and unlike most decent database engines this table is not indexed. So if you're using a view just to simplify queries, views are great when your program is going to grab all of the results from the view. But if you're then searching the results of that view based on parameters, it is incredibly slow, especially if there are millions of records to sift through, and even worse if the view is built on top of other views, and so on.
With a stored procedure, however, you can pass those search parameters in and run the query directly against the underlying (indexed) tables. The downside is that the results will need to be fetched every time the procedure is run, which may also happen with a view anyway, depending on server configuration.
So basically, if you're using a view, try to minimise the number of results (if you then need to search them); otherwise use a stored procedure.

Related

MySQL - Best methods to provide fast Dynamic filter support for large-scale database record lists?

I am curious what techniques Database Developers and Architects use to create dynamic filter data response Stored Procedures (or Functions) for large-scale databases.
For example, let's take a database with millions of people in it, and we want to provide a stored procedure "get-person-list" which takes a JSON parameter. Within this JSON parameter, we can define filters such as $.filter.name.first, $.filter.name.last, $.filter.phone.number, $.filter.address.city, etc.
The frontend (web solution) allows the user to define one or more filters, so the front-end can say "Show me everyone with a First name of Ted and last name of Smith in San Diego."
The payload would look like this:
{
  "filter": {
    "name": {
      "last": "smith",
      "first": "ted"
    },
    "address": {
      "city": "san diego"
    }
  }
}
Now, what would the best technique be to write a single stored procedure capable of handling numerous (dozens or more) filter settings (dynamically) and returning the proper result set all with the best optimization/speed?
Is it possible to do this with CTE, or are prepared statements based on IF/THEN logic (building out the SQL to be executed based on filter value) the best/only real method?
How do big companies with huge databases and thousands of users write their calls to return complex dynamic lists of data as quickly as possible?
Everything Bill wrote is true, and good advice.
I'll take it a little further. You're proposing building a search layer into your system, which is fine.
You're proposing an interface in which you pass a JSON object to code inside the DBMS. That's not fine. That code will either have a bunch of canned queries handling the various search scenarios, or will have a mess of string-handling code that reads JSON, puts together appropriate queries, then uses MySQL's PREPARE statement to run them. From my experience, that is, with respect, a really bad idea.
Here's why:
The stored-procedure language has very weak string-handling support compared to host languages. No sprintf. No arrays of strings. No join or implode operators. Clunky regex, and not always present on every server. You're going to need string handling to build search queries.
Stored procedures are trickier to debug, test, deploy, and maintain than ordinary application code. That work requires special skills and special access.
You will need to maintain this code, especially if your system proves successful. You'll add requirements that will require expanding your search capabilities.
It's impossible (seriously, impossible) to know what your actual application usage patterns will be at scale. You surely will, as a consequence of growth, find usage patterns that surprise you. My point is that you can't design and build a search system and then forget about it. It will evolve along with your app.
To keep up with evolving usage patterns, you'll need to refactor some queries and add some indexes. You will be under pressure when you do that work: People will be complaining about performance. See points 1 and 2 above.
MySQL / MariaDB's stored procedures aren't compiled with an optimizing compiler, unlike Oracle and SQL Server's. So there's no compelling performance win.
So don't use a stored procedure for this. Please. Ask me how I know this sometime.
If you need a search module with a JSON interface, implement it in your favorite language (php, C#, nodejs, java, whatever). It will be easier to debug, test, deploy, and maintain.
To write a query that searches a variety of columns, you would have to write dynamic SQL. That is, write code to parse your JSON payload for the filter keys and values, and format SQL expressions in a string that is part of a dynamic SQL statement. Then prepare and execute that string.
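Just to illustrate the mechanics (the string-building itself would happen in your application code, and the people table and its columns are only assumptions based on your example payload), the statement you end up preparing might look like this:

-- Built from the JSON: one equality predicate and one placeholder per filter key.
SET @sql := 'SELECT * FROM people WHERE last_name = ? AND first_name = ? AND city = ?';
PREPARE stmt FROM @sql;
SET @last := 'smith', @first := 'ted', @city := 'san diego';
EXECUTE stmt USING @last, @first, @city;
DEALLOCATE PREPARE stmt;

Each additional filter key simply appends another AND column = ? and binds one more value.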
In general, you can't "optimize for everything." Trying to optimize when you don't know in advance which queries your users will submit is a nigh-impossible task. There's no perfect solution.
The most common method of optimizing search is to create indexes. But you need to know the types of search in advance to create indexes. You need to know which columns will be included, and which types of search operations will be used, because the column order in an index affects optimization.
For N columns, there are N-factorial possible orderings of columns, so indexing every permutation is clearly impractical -- and MySQL only allows 64 indexes per table anyway. You simply can't create all the indexes needed to optimize every possible query your users attempt.
The alternative is to optimize queries partially, by indexing a few combinations of columns, and hope that these help the users' most common queries. Use application logs to determine what the most common queries are.
There are other types of indexes. You could use fulltext indexing, either the implementation built in to MySQL, or else supplement your MySQL database with ElasticSearch or similar technology. These provide a different type of index that effectively indexes everything with one index, so you can search based on multiple columns.
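As a small sketch of the built-in MySQL option (column names assumed from the example above; InnoDB supports FULLTEXT indexes from MySQL 5.6):

-- One FULLTEXT index covering several searchable columns.
ALTER TABLE people
  ADD FULLTEXT INDEX ft_people (first_name, last_name, city);

-- A single search that matches against all indexed columns at once.
SELECT person_id, first_name, last_name, city
FROM people
WHERE MATCH(first_name, last_name, city)
      AGAINST ('+ted +smith "san diego"' IN BOOLEAN MODE);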
There's no single product that is "best." Which fulltext indexing technology meets your needs requires you to evaluate different products. This is some of the unglamorous work of software development — testing, benchmarking, and matching product features to your application requirements. There are few types of work that I enjoy less. It's a toss-up between this and resolving git merge conflicts.
It's also more work to manage copies of data in multiple datastores, making sure data changes in your SQL database are also copied into the fulltext search index. This involves techniques like ETL (extract, transform, load) and CDC (change data capture).
But you asked how big companies with huge databases do this, and this is how.
Input
I do that "all the time". The web page has a <form>. When submitted, I look for fields of that form that were filled in, then build
WHERE this = "..."
AND that = "..."
into the suitable SELECT statement.
Note: I leave out any fields that were not specified in the form; I make sure to escape the strings.
I'm walking through $_GET[] instead of JSON, so it is quite easy.
INDEXing
If you have a column for each possible field, then it is a matter of providing indexes only for the columns most likely to be searched on. (There are practical and even hard-coded limits on indexes.)
If you have stored the attributes in an EAV table structure, you have my condolences. Search the [entity-attribute-value] tag for many other poor souls who wandered into that swamp.
If you store the attributes in JSON, well that is likely to be an order of magnitude worse than EAV.
If you throw all the information into a FULLTEXT column and use MATCH, then you can get enough speed for "millions" of rows. But it comes with various caveats (word length, stoplist, endings, surprise matches, etc.).
If you would like to discuss further, then scale back your expectations and make a list of likely search keys. We can then discuss what technique might be best.

Database design to create tables on the fly

I need to create dynamic tables in the database on the fly. For example, in the database I will have tables named:
Table
Column
DataType
TextData
NumberData
DateTimedata
BitData
Here I can add a table to the table named Table, then I can add all of that table's columns to the Column table and associate a datatype with each column.
Basically I want to create tables without actually creating a table in the database. Is this even possible? If so, can you direct me to the right place so I can research? Also, I would prefer sql server or any free database software.
Thanks
What you are describing is an entity-attribute-value model (EAV). It is a very poor way to design a data model.
Although the data model is quite flexible, querying such a data model is quite complicated. You frequently end up having to self-join a table n times if you want to select or filter on n different attributes. That gets slow rather quickly and becomes rather hard to optimize.
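To make that concrete, filtering on just two attributes against a hypothetical EAV table already needs a self-join:

-- Hypothetical EAV table: one row per (entity, attribute, value).
-- entity_attr(entity_id, attr_name, attr_value)
SELECT a1.entity_id
FROM entity_attr AS a1
JOIN entity_attr AS a2 ON a2.entity_id = a1.entity_id
WHERE a1.attr_name = 'first_name' AND a1.attr_value = 'ted'
  AND a2.attr_name = 'city'       AND a2.attr_value = 'san diego';
-- Every additional attribute you filter or select on adds another self-join.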
Plus, you generally end up building a lot of functionality that the database or your ORM would provide.
I'm not sure what the real problem you're having is, but the solution you proposed is the "database within a database" antipattern which makes so many people cringe.
Depending on how you're querying your data, if you were to structure things like you're planning, you'd either need a bunch of piece-wise queries which are joined in the middleware (slow) or one monster monolithic query (either slow or creates massive index bloat), if one is even possible.
If you must create tables on the fly, learn the CREATE TABLE, ALTER TABLE, and DROP TABLE DDL statements for the particular database engine you're using. Better yet, find an ORM that will do this for you. If your real problem is that you need to store unstructured data, check out MongoDB, Redis, or some of the other NoSQL variants.
My final advice is to write up the actual problem you're trying to solve as a separate question, and you'll probably learn a lot more.
Doing this with documents might be easier. Perhaps you should look at a NoSQL solution such as MongoDB.
Or you can still create the temporary tables, but use a cron job to create them every %% hours and rename them to the correct name after the queries are done, so your site stays up.
What you are trying to achieve is not bad, but you must use it in the correct, logical way.
*sorry for my bad english
I did something like this in LedgerSMB. While we use EAV modelling for a few things (where the flexibility is needed and the sort of querying we are doing is straight-forward, for example menu nodes use this in part), in general, you want to stay away from this as much as possible.
A better approach is to do all of what you are doing except for the data columns. Then you can (shock of shocks) just create the tables. This gives you a catalog of what you have added so your app knows this (and you can diff from the system catalogs if you ever have to check!) but at the same time you get actual relational modelling.
What we did in LedgerSMB was to have stored procedures that accept a table name and check whether a table named ('extends_' || name supplied) exists. If so, they add a column with the datatype required and write this to the application catalogs. This gives us relational modelling of extended attributes. At load time, the application loads the application catalogs and writes queries as appropriate, at the appropriate points, to load/save the data. It works pretty well, actually.
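LedgerSMB itself runs on PostgreSQL, but a rough MySQL-flavoured sketch of the same idea (all table and catalog names here are invented) would look something like this:

DELIMITER //
CREATE PROCEDURE add_extended_attribute(
  IN base_table VARCHAR(64),
  IN col_name   VARCHAR(64),
  IN col_type   VARCHAR(64)
)
BEGIN
  -- Only proceed if the companion "extends_" table actually exists.
  IF EXISTS (SELECT 1 FROM information_schema.tables
             WHERE table_schema = DATABASE()
               AND table_name = CONCAT('extends_', base_table)) THEN
    -- Add the new column with the requested datatype...
    SET @ddl := CONCAT('ALTER TABLE extends_', base_table,
                       ' ADD COLUMN ', col_name, ' ', col_type);
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    -- ...and record it in an application catalog so the app can build its
    -- load/save queries from the catalog at load time.
    INSERT INTO app_attribute_catalog (table_name, column_name, data_type)
    VALUES (base_table, col_name, col_type);
  END IF;
END //
DELIMITER ;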

How can I store a query in a MySQL column then subsequently use that query?

For example, say I'm creating a table which stores promo codes for a shipping site. I want to have the table match a promo code with the code's validator, e.g.
PROMO1: Order must have 3 items
PROMO2: Order subtotal must be greater than $50
I would like to store the query and/or routine in a column in the table, and be able to use the content to validate, in the sense of
SELECT * FROM Orders
WHERE Promo.ID = 2 AND Promo.Validation = True
Or something to that effect. Any ideas?
I wouldn't save the query in the database, there are far better possibilities.
You have to decide which best fits your needs (it's not clear to me based on your question). You can use
Views
or
Prepared Statements
or
Stored Procedures
There's probably a better way to solve the issue, but the answer to your question is to write stored procedures that return the results you want. Where I work (and I hate this design), they actually store all queries and dml used by the application in stored procedures.
You can also dynamically build your queries using dynamic SQL. For MySQL, see the post below, which might be of some help to you.
How To have Dynamic SQL in MySQL Stored Procedure
Otherwise, you can also store your queries in string format in the database, retrieve them, and execute them using the EXECUTE statement, as that post points out.
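For example, a minimal sketch of that retrieve-and-execute flow (the promos table and validation_sql column are just illustrative names):

-- Fetch the stored validation query for promo 2 into a user variable...
SELECT validation_sql INTO @sql FROM promos WHERE promo_id = 2;

-- ...then prepare and run it.
PREPARE validate_promo FROM @sql;
EXECUTE validate_promo;
DEALLOCATE PREPARE validate_promo;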
I'd personally stay away from designs like that, though. Storing queries in XML isn't a bad alternative: have your app written to be extensible and configurable from XML, so you don't need to make code changes to add new validation logic and instead just configure it in XML.

Advice required - using entity framework with normalised data

I've recently gone through the process of revamping my database, normalising a lot of entities. Obviously I now have a few more tables than I had. A lot of the data I use on the website is read-only, so this is simple to denormalise using a view; however, there are entities that could benefit from denormalised retrieval but still need to be updated.
Here's an example.
A User may be a Member
A Member may have a Profile
A Member may have an Account
In addition I have 3 further lookup tables.
In total there are 3 tables for User and 4 tables for Member.
Ideally, I can create 2 views from the above tables.
However, User needs to be updated, as do the entities belonging to Member. Additionally there are 6 separate tables associated with Users/Members, e.g. FavouriteCategories, that also need to be retrieved and updated from time to time.
I'm struggling to come up with the best, most efficient way of doing this.
I could simply not use views and bring all the entities and lookups into the model, but I would be reliant on EF to produce the retrieval queries. The material I've read suggests that EF is not at its best when dealing with joined data.
I could add both the view and tables, using the tables for updates only. This seems sloppy due to the duplication, complication of the model, as well as underutilising the EF model functionality.
Maybe I could use the read-only view for data retrieval and create stored procs. I believe that the process of using EF with stored procs is a bit of a hack, so I'd probably keep the stored procs distinct from EF and simply pass params and call the SP via traditional methods. This again seems like a bit of a halfway house.
I'm not that experienced with .NET or EF, so I would appreciate some solid advice on either the methods I've referred to above or any better technique to achieve this. I don't want to go hacking the edmx file at this stage because... well, it's just wrong.
I have a few entities that would benefit from the right solution. The User example is amongst the simplest, so there's a lot to gain from the right approach.
Help and advice would be very much appreciated.
Do you want to use EF? If yes, use either the first approach (not using views at all and allowing EF to handle everything) or the last approach (using views and mapping stored procedures for insert, update, and delete operations).
Combining mapped views for reading and mapped tables for modifications is possible as well but it is mostly the first solution (allowing EF to handle everything) with additional views for some query optimization.
You will not find cleaner approaches. The mentioned approaches are valid solutions to your problem. The only question is whether you want to write the SQL yourself (views and stored procedures) or let EF do that.
The worst approach is using EF for querying and manually calling stored procedures for updating, but in some cases even that can be useful.

Best practises : is sql views really worth it? [duplicate]

I am building a new web application with data stored in the database. As with many web applications, I need to expose data from complex SQL queries (queries with conditions across multiple tables). I was wondering if it would be a good idea to build my queries in the database as SQL views instead of building them in the application. What would be the benefit of that? Database performance? Will I spend longer coding? Debugging?
Thank you
This cannot really be answered objectively, since it varies from case to case.
With views, functions, triggers and stored procedures you can move part of the logic of your application into the database layer.
This can have several benefits:
performance -- you might avoid round trips of data, and certain treatments are handled more efficiently using the whole range of DBMS features.
conciseness -- some treatments of data are expressed more easily with DBMS features.
But also certain drawback:
portability -- the more you rely on specific features of the DBMS, the less portable the application becomes.
maintainability -- the logic is scattered across two different technologies, which means more skills are needed for maintenance, and local reasoning is harder.
If you stick to the SQL92 standard it's a good trade-off.
My 2 cents.
I think your question is a little bit confusing about what you are trying to achieve (gain knowledge regarding SQL views, or decide how to structure your application).
I believe all database logic should be stored at the database tier, ideally in a stored procedure or function, rather than in the application logic. The application logic should then connect to the database, retrieve the data from these procedures/functions, and expose this data in your application.
One of the benefits of storing this at the database tier is taking advantage of the execution plans via SQL Server (which you may not get by accessing it directly via the application logic). Another advantage is that you decouple the application from the database, i.e. if the database needs to be changed you don't have to modify the application directly.
For a direct point on Views, the advantages of using them include:
Restrict a user to specific rows in a table.
For example, allow an employee to see only the rows recording his or her work in a labor-tracking table.
Restrict a user to specific columns.
For example, allow employees who do not work in payroll to see the name, office, work phone, and department columns in an employee table, but do not allow them to see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
http://msdn.microsoft.com/en-us/library/aa214068(v=sql.80).aspx
Personally I prefer views, especially for reports/apps, because if there are any problems in the data you only have to update a single view rather than rebuilding the app or manually editing the queries.
SQL views have many uses. Try first reading about them and then asking a more specific question:
http://odetocode.com/code/299.aspx
http://msdn.microsoft.com/en-us/library/ms187956.aspx
I have seen that views are used a lot to do two things:
Simplify queries: if you have a HUGE select with join after join after join, you can create a view that will have the same performance but where the query is only a couple of lines.
For security reasons: if you have a table with some information that shouldn't be accessible to all the developers, you can create views and grant privileges to see the views and not the main table, e.g.:
table1: name, last_name, user_id, credit_card, social_security. You create a view on that table, view1: name, last_name, user_id.
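Spelled out (the database, view, and account names are just placeholders):

-- table1 holds the sensitive columns; the view exposes only the safe ones.
CREATE VIEW view1 AS
SELECT name, last_name, user_id
FROM table1;

-- Developers get SELECT on the view, not on the underlying table.
GRANT SELECT ON mydb.view1 TO 'dev_user'@'%';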
You can run into performance issues and constraints on the types of queries you can run against a view.
Restrictions on what you can do with views.
http://dev.mysql.com/doc/refman/5.6/en/view-restrictions.html
Looks like the big one is that you cannot create an index on a view. This could cause a big performance hit if your final result table is large.
This is also a good forum discussing views: http://forums.mysql.com/read.php?100,22967,22967#msg-22967
In my experience a well indexed table, using the right engine, and properly tuned (for example setting an appropriate key_buffer value) can perform better than a view.
Alternatively you could create a trigger that updates a table based on the results of other tables. http://dev.mysql.com/doc/refman/5.6/en/triggers.html
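For example, a sketch of a trigger maintaining a small summary table (all names invented):

-- Summary table maintained by triggers instead of being computed through a view.
CREATE TABLE customer_totals (
  customer_id INT PRIMARY KEY,
  order_total DECIMAL(12,2) NOT NULL DEFAULT 0
);

DELIMITER //
CREATE TRIGGER orders_after_insert
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  -- Keep a running total per customer as orders are inserted.
  INSERT INTO customer_totals (customer_id, order_total)
  VALUES (NEW.customer_id, NEW.total)
  ON DUPLICATE KEY UPDATE order_total = order_total + NEW.total;
END //
DELIMITER ;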
The technique you are describing is called denormalization. Cal Henderson, software engineer from Flickr, openly supports this technique.
In theory the JOIN is one of the most expensive operations, so it can be good practice to denormalize: instead of n queries each doing m JOINs, you define the m JOINs once in a view and run n simple queries that select from it.
That said, the best way is to test it for yourself. Because what could be incredibly good for Flickr may not be so good for your application.
Moreover, the performance of views may vary a lot from one RDBMS to another. For instance, depending on the RDBMS, views can be updated when the original table is changed, can have indexes, etc.