MySql views performance [closed] - mysql

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
If you are going down the road of using views, how can you ensure good performance?
Or is it better not to use views in the first place and just incorporate the equivalent into your select statements?

It Depends.
It depends entirely on what you are exposing through the view. Most probably a view will reduce your effort and give you better performance. When a SQL statement references a nonindexed view, the parser and query optimizer analyze the source of both the SQL statement and the view and then resolve them into a single execution plan. There is not one plan for the SQL statement and a separate plan for the view.
A view is not compiled. It's a virtual table made up of other tables. When you create it, it doesn't reside somewhere on your server. The underlying queries that make up the view are subject to the same performance gains or dings of the query optimizer. I've never tested the performance of a view versus its underlying query, but I would imagine it may vary slightly. You can get better performance from an indexed view if the data is relatively static (indexed views are a SQL Server feature; MySQL does not let you index a view). This may be what you are thinking of in terms of "compiled".
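A minimal sketch of how you could check the "single execution plan" claim yourself. It uses Python's built-in sqlite3 module as a stand-in because it runs anywhere; on MySQL you would run EXPLAIN against your own view and query instead. The employees table and managers view are made-up names for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_manager INTEGER NOT NULL DEFAULT 0
    );
    -- Hypothetical view: a simple row filter over one table.
    CREATE VIEW managers AS
        SELECT id, name FROM employees WHERE is_manager = 1;
""")

# Same question asked through the view and against the base table.
view_plan = plan(conn, "SELECT name FROM managers WHERE id = 5")
direct_plan = plan(
    conn, "SELECT name FROM employees WHERE is_manager = 1 AND id = 5"
)

print("via view:", view_plan)
print("direct:  ", direct_plan)
```

If the optimizer merges the view into the outer query, both plans should describe the same access path on the base table and neither should mention materializing the view. A view with GROUP BY, DISTINCT or LIMIT is much less likely to be merged, so it is worth repeating the check for those.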
Advantages of views:
View the data without storing the data into the object.
Restrict the view of a table, i.e. hide some of the columns in the table.
Join two or more tables and show it as one object to user.
Restrict the access of a table so that nobody can insert the rows into the table.
See these useful links:
Performance of VIEW vs. SQL statement
Is a view faster than a simple query?
Mysql VIEWS vs. PHP query
Are MySql Views Dynamic and Efficient?
Materialized View vs. Tables: What are the advantages?
Is querying over a view slower than executing SQL directly?
A workaround for the performance problems of TEMPTABLE views
See performance gains by using indexed views in SQL Server

Here's a tl;dr summary; you can find detailed evaluations from Peter Zaitsev and elsewhere.
Views in MySQL are generally a bad idea. At Grooveshark we consider them to be harmful and always avoid them. If you are careful you can make them work but at best they are a way to remember how to select data or keep you from having to retype complicated joins. At worst they can cause massive inefficiencies, hide complexity, cause accidental nested subselects (requiring temporary tables and leading to disk thrashing), etc.
It's best to just avoid them, and keep your queries in code.

I think the blog by Peter Zaitsev has most of the details. Speaking from personal experience, views can perform well if you generally keep them simple. One of my clients kept layering one view on top of another and it ended up a performance nightmare.
Generally I use views to show a different aspect of a table. For example, from my employees table, show me only the managers, or hide the salary field from non-HR employees. Also, always make sure you run an EXPLAIN on the query and the view to understand exactly what is happening inside MySQL.
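A small sketch of the two uses just described, a row filter (managers only) and a column filter (no salary). It runs on Python's built-in sqlite3 so it is self-contained; the CREATE VIEW statements should carry over to MySQL more or less unchanged, but the table layout and view names here are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary INTEGER NOT NULL,
        is_manager INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO employees (name, department, salary, is_manager) VALUES
        ('Ada', 'Engineering', 120000, 1),
        ('Bob', 'Engineering', 90000, 0),
        ('Cy', 'HR', 70000, 0);

    -- Row filter: only the managers.
    CREATE VIEW managers AS
        SELECT id, name, department FROM employees WHERE is_manager = 1;

    -- Column filter: everyone, but without the salary column.
    CREATE VIEW employee_directory AS
        SELECT id, name, department FROM employees;
""")

manager_names = [row[0] for row in conn.execute("SELECT name FROM managers")]

cursor = conn.execute("SELECT * FROM employee_directory")
directory_columns = [col[0] for col in cursor.description]

print(manager_names)
print(directory_columns)
```

Both views are single-table and simple, which is the kind I'd expect to stay cheap. On MySQL you would additionally grant users access to the view rather than the base table; SQLite has no GRANT, so that part is not shown.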
If you want solid proof for your scenario, I would suggest that you test. It is really hard to say that using views is always a performance killer; then again, a badly written view is probably going to kill your performance.

They serve their purpose, but the hidden complexities and inefficiencies usually outweigh a more direct approach. I once encountered a SQL statement that was joining two views and sorting the results. The views were sorting as well, so the execution time seemed to be measured in hours.

A thing not mentioned so far but making a huge difference is adequate indexing of the views' source tables.
As mentioned above, views do not reside in your DB but are rebuilt every time. Thus everything that makes the rebuild easier for the DB increases the performance of the view.
Often, views combine data in a way that is very bad for storage (no normal form) but very good for further usage (doing analysis, presenting data to the user, ...), and to do so they join and aggregate data from different tables.
Whether the columns these operations work on are indexed makes a huge difference to the performance of a view. If the tables and their relevant columns are already indexed, accessing the view does not mean computing the equivalent lookups from scratch every time. (The downside is that the index maintenance happens whenever data is changed in the source tables.)
! Index all columns used in JOINS and GROUP BY clauses in your CREATE VIEW statement !
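To illustrate the effect, here is a rough sketch that looks at the plan for a query against a joining view before and after indexing the join column of the source table. It uses Python's sqlite3 as a runnable stand-in; on MySQL you would use EXPLAIN and CREATE INDEX on your own tables. All table, view and index names are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    -- The view joins on orders.customer_id, which has no index yet.
    CREATE VIEW customer_orders AS
        SELECT o.id AS order_id, o.customer_id, c.name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

query = "SELECT * FROM customer_orders WHERE customer_id = 42"

plan_before = plan(conn, query)

# Index the column used in the view's JOIN condition.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = plan(conn, query)

print("before:", plan_before)
print("after: ", plan_after)
```

The index is created on the base table, not on the view, and the view picks it up without being redefined. The same before/after comparison works for columns used in GROUP BY.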

If we are discussing "if you use views, how to ensure performance", and not the performance effect of views in general, I think that it boils down to restraint (as in yourself).
You can get into big trouble if you just write views to make your queries simple in all cases, but do not take care that your views are actually useful performance-wise. Any queries you end up running should still run sanely (see the comment example from that link by #eggyal). Of course that's a tautology, but that doesn't make it any less valuable.
You especially need to be careful not to build views on top of views just because that makes the new view easier to write.
In the end you need to look at the reason you are using views. Any time you do this to make life easier on the programming end, you might be better off with a stored procedure, IMHO.
To keep things under control you might want to write down why you have a certain view, and decide why you are using it. For every 'new' use within your programming, recheck if you actually need the view, why you need it, and if this would still give you a sane execution-path. Keep on checking your uses to keep it speedy, and keep checking if you really need that view.

Related

Does a complex sql query ever become complex enough that it would be more machine-efficient to do multiple queries?

I have this sense that - inefficiently written queries aside - getting the information you want out of a database is always faster the fewer queries you make to do so. I don't know where I got that idea from, and it gets challenged the more complicated the queries I produce become (am I really doing MySQL any favors with all these joins?). I'm not asking for an opinion on ease for the programmer or best coding practices, but do conditions exist under which a program would perform faster with a query broken out into multiple steps? If so, how might one make an educated guess that a query is reaching such an upper limit before going through the effort of coding and comparing?
Yes, although it is less likely with MySQL. The reason is that MySQL doesn't have a really sophisticated cost-based optimizer. The advantage to intermediate tables is that the sizes are known. A cost-based optimizer can take advantage of this information to improve the query plan.
One place where this can help is when a subquery is repeated multiple times in a query. An intermediate table ensures that it is processed only once (although CTEs would normally do the same thing).
Another place where this can really help is when you add indexes to the intermediate tables. Adding the indexes -- and using them -- can be a big cost savings, more than making up for the cost of creating the index.
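A minimal sketch of the intermediate-table idea: compute an aggregate once, index the result, and reuse it from several queries. Python's sqlite3 stands in for MySQL here (MySQL has CREATE TEMPORARY TABLE ... AS SELECT for the same purpose); the table and index names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total REAL NOT NULL
    );
    INSERT INTO orders (customer_id, total) VALUES
        (1, 10.0), (1, 15.0), (2, 7.5), (3, 100.0), (3, 1.0);

    -- Step 1: run the expensive subquery once and keep the result.
    CREATE TEMP TABLE customer_totals AS
        SELECT customer_id, SUM(total) AS spent, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id;

    -- Step 2: index the intermediate result for the lookups that follow.
    CREATE INDEX temp.idx_customer_totals_spent ON customer_totals (spent);
""")

# Step 3: several queries reuse the same intermediate table.
big_spenders = [
    row[0] for row in conn.execute(
        "SELECT customer_id FROM customer_totals "
        "WHERE spent > 20 ORDER BY spent DESC"
    )
]
repeat_customers = [
    row[0] for row in conn.execute(
        "SELECT customer_id FROM customer_totals "
        "WHERE order_count > 1 ORDER BY customer_id"
    )
]

# Clean up explicitly so a stale table cannot confuse later debugging.
conn.execute("DROP TABLE customer_totals")

print(big_spenders, repeat_customers)
```

With five rows none of this matters for speed, of course; the point is only the shape of the technique. Whether it beats a single query on real data is something to measure.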
That said, I generally discourage using intermediate tables for this purpose, unless the results are needed for multiple queries. I find that just the overhead in debugging makes it not worth it -- for some reason, I don't always delete the intermediate tables and then waste time wondering why some modification doesn't work.
More importantly, as the data changes, modifying the queries can be a pain. I find that changing a column name, for instance, is simpler in a single query than when the logic is spread across multiple queries.

Best practises : is sql views really worth it? [duplicate]

This question already has answers here:
Why do you create a View in a database?
(25 answers)
Closed 8 years ago.
I am building a new web application with data stored in a database. As in many web applications, I need to expose data from complex SQL queries (queries with conditions across multiple tables). I was wondering whether it would be a good idea to build my queries in the database as SQL views instead of building them in the application. What would be the benefit of that? Database performance? Will I spend longer coding? Longer debugging?
Thank you
This cannot really be answered objectively, since it depends on the case.
With views, functions, triggers and stored procedures you can move part of the logic of your application into the database layer.
This can have several benefits:
performance -- you might avoid round trips of data, and certain operations are handled more efficiently using the whole range of DBMS features.
conciseness -- some operations on data are expressed more easily with the DBMS features.
But also certain drawbacks:
portability -- the more you rely on specific features of the DBMS, the less portable the application becomes.
maintainability -- the logic is scattered across two different technologies, which implies more skills are needed for maintenance, and local reasoning is harder.
If you stick to the SQL92 standard it's a good trade-off.
My 2 cents.
I think your question is a little bit confusing in what you are trying to achieve (Gain knowledge regarding SQL Views or how to structure your application).
I believe all database logic should be stored at the database tier, ideally in a stored procedure or function rather than in the application logic. The application logic should then connect to the database, retrieve the data from these procedures/functions, and expose this data in your application.
One of the benefits of storing this at the database tier is taking advantage of the execution plans in SQL Server (which you may not get by accessing the data directly from the application logic). Another advantage is that you can decouple the application, i.e. if the database needs to be changed you don't have to modify the application directly.
For a direct point on Views, the advantages of using them include:
Restrict a user to specific rows in a table.
For example, allow an employee to see only the rows recording his or her work in a labor-tracking table.
Restrict a user to specific columns.
For example, allow employees who do not work in payroll to see the name, office, work phone, and department columns in an employee table, but do not allow them to see any columns with salary information or personal information.
Join columns from multiple tables so that they look like a single table.
http://msdn.microsoft.com/en-us/library/aa214068(v=sql.80).aspx
Personally I prefer views, especially for reports/apps as if there are any problems in the data you only have to update a single view rather than re-building the app or manually editing the queries.
SQL views have many uses. Try first reading about them and then asking a more specific question:
http://odetocode.com/code/299.aspx
http://msdn.microsoft.com/en-us/library/ms187956.aspx
I have seen that views are used a lot to do two things:
Simplify queries: if you have a HUGE select with joins upon joins, you can create a view that will have the same performance, but the query against it will be only a couple of lines.
Security: if you have a table with some information that shouldn't be accessible to all the developers, you can create views and grant privileges on the views and not on the main table. For example:
Table 1: Name, Last_name, User_ID, credit_card, social_security. You create a view on that table that exposes only name, last_name, user_id.
You can run into performance issues and constraints on the types of queries you can run against a view.
Restrictions on what you can do with views.
http://dev.mysql.com/doc/refman/5.6/en/view-restrictions.html
It looks like the big one is that you cannot create an index on the view. This could cause a big performance hit if your final result table is large.
This is also a good forum discussing views: http://forums.mysql.com/read.php?100,22967,22967#msg-22967
In my experience a well indexed table, using the right engine, and properly tuned (for example setting an appropriate key_buffer value) can perform better than a view.
Alternatively you could create a trigger that updates a table based on the results of other tables. http://dev.mysql.com/doc/refman/5.6/en/triggers.html
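A sketch of the trigger approach, assuming a hypothetical customer_totals summary table kept in step with orders. It runs on Python's sqlite3 so you can try it directly; MySQL trigger syntax differs in places (it requires FOR EACH ROW, and you would typically use INSERT ... ON DUPLICATE KEY UPDATE for the upsert), so treat this as the idea rather than copy-paste MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total REAL NOT NULL
    );

    -- Precomputed summary that reads can hit instead of a view.
    CREATE TABLE customer_totals (
        customer_id INTEGER PRIMARY KEY,
        spent REAL NOT NULL
    );

    -- Keep the summary current whenever an order is inserted.
    CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
    BEGIN
        INSERT OR IGNORE INTO customer_totals (customer_id, spent)
            VALUES (NEW.customer_id, 0);
        UPDATE customer_totals
            SET spent = spent + NEW.total
            WHERE customer_id = NEW.customer_id;
    END;
""")

conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5)],
)

totals = dict(conn.execute("SELECT customer_id, spent FROM customer_totals"))
print(totals)
```

Only inserts are handled here. Updates and deletes on orders would need their own triggers, which is exactly the bookkeeping cost you take on with this design.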
The technique you are describing is called denormalization. Cal Henderson, software engineer at Flickr, openly supports this technique.
In theory, JOIN is one of the most expensive operations, so it is good practice to denormalize, since you are transforming n queries with m JOINs into 1 query with m JOINs plus n queries that select from a view.
That said, the best way is to test it for yourself. Because what could be incredibly good for Flickr may not be so good for your application.
Moreover, the performance of views may vary a lot from one RDBMS to another. For instance, depending on the RDBMS, views can be updated when the original table is changed, can have indexes, etc.

Vertical partitioning of tables in MySQL

Another question.
Is it better to vertically partition a wide table (in my case I am thinking about splitting the login details from the address, personal and other details of the user) at the design stage, or to leave it as is and partition it after having some data and doing some profiling?
The answer seems obvious, but I am concerned that splitting the table's rows somewhere down the line will mean additional work to rewrite the user model, and it seems reasonable to split the often-accessed login details from the more static personal details.
Does anyone have experience-backed advice on how to proceed? :) Thanks in advance.
Premature optimization is...
Splitting columns off to a different table has drawbacks:
Some operations that required a single query now require two queries or a join
It's not trivial to enforce that every row in each table needs to have a corresponding row in the other. Thus, you might face integrity problems
On the other hand, it's dubious at best that doing it will improve performance. Unless you can prove it beforehand (and creating a 10 million records table with random data and running some queries is trivial), I wouldn't do it. Doug Kress' suggestions of encapsulation and avoiding SELECT * are the right way.
The only reason to do it is if your single table design is not normalized and normalization implies breaking up the table.
I believe it would be better to keep it as a single table, but encapsulate your access to the data as much as possible, so that it would be easy to refactor later.
When you do access the data, be sure to only gather the information you need in the query (avoid 'SELECT *').
Having said that, be sure that the data saved with the table is normalized appropriately. You may find that you want to store multiple addresses for a user, for instance - in which case you should put it in a separate table.

How can I optimize my database?

I am creating a platform for some clients. Each client needs to have contacts and manage them in groups, categories (which depends of the group) and subcategories (which depends of the category).
The database is going to be very big, and I'm worried about the performance. I want to optimize the database; right now, I have these options:
Manage only one database with multiple tables (as we manage now)
Create a database for each client (each database will have the same multiple tables as the option 1)
Manage multiple XML files (like option 2, each client will have a directory with an XML for contacts, another XML file for groups, another for categories, and so on)
Which is the best option for performance and management of the data (CRUD: create, read, update, delete)?
Thanks!!
I think one database with multiple tables is the way to go, because duplicating the database and schema for each new client doesn't scale well. XML files sounds cool but so far I haven't seen an XML read/write engine which is as fast as most RDBMSes, so bin that one.
To make this work (lots of tables in one database) you should pay attention to indexing and optimizing the one database; indexes in particular will help you maintain speed as you scale up.
Use clustered indexing on the clientId in whichever table it might exist as a foreign key. This will give you the best client-centric performance, because you would (usually) be pulling a particular client's info in a single page fetch.
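One way to get that clustering, sketched under assumptions: in MySQL's InnoDB the primary key is the clustered index, so leading a composite primary key with the client id keeps each client's rows physically together. The sketch below uses Python's sqlite3, where a WITHOUT ROWID table behaves similarly; the contacts table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows are stored in primary-key order, so all rows for one client
    -- sit next to each other.
    CREATE TABLE contacts (
        client_id INTEGER NOT NULL,
        contact_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (client_id, contact_id)
    ) WITHOUT ROWID;

    INSERT INTO contacts VALUES (1, 1, 'Ann'), (2, 1, 'Raj'), (1, 2, 'Bea');
""")

client_one = [
    row[0] for row in conn.execute(
        "SELECT name FROM contacts WHERE client_id = ? ORDER BY contact_id",
        (1,),
    )
]

access_path = [
    row[3] for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT name FROM contacts WHERE client_id = 1"
    )
]

print(client_one)
print(access_path)
```

The trade-off is that every secondary index carries the full composite key, so keep the key columns small.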
For #2, I would suggest making that a premium service to your clients. If they want "priority hosting" on a separate server of "their own" then they pay extra. That will make the maintenance headache worthwhile.
Have you tried actually implementing 1 (which is the easiest)?
Did you profile the code?
What is the performance now?
Did you use EXPLAIN to see how the queries are performing?
Do you use indexes (often correct indexes are enough to give excellent performance changes)?
Optimize when you hit a bottleneck (or when you set certain benchmarks for performance), not during design phase...
UPDATE: You mentioned "millions of entries". That's nothing for mysql (provided you use correct indexes on your tables). I have a table with about 40 million rows & although it's not lightning fast it gives me results in a couple of seconds. So there you go...
3 is not advisable. Search etc. is not what XML files do efficiently.
2 is a maintenance problem.
1 should be doable. "very big" means what? I have a database with a table that currently holds 1.5 billion entries - that is "big", not "very big". What do you define as very big?
As far as ongoing maintenance and support goes I think only option 1 makes sense for you.
Index all columns you need to but nothing more. Look at your code and see how tables are being JOINed and index the columns which will otherwise require a table scan.
Indices will speed up the read operations but slow down your write operations, as you need to update the indices as well as the column. They also need more space in the DB.
As suggested above use EXPLAIN to see how your queries are executing and what can be optimized there.
Finally performance tuning only works well after you baseline your existing performance, make a change, then baseline performance again to see if it helped. If not roll back and try something else. But always start with a known level of performance, otherwise you might end up making multiple changes which in total slow things down. Good luck!
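The baseline, change, re-measure loop can be as simple as the sketch below. It times one query before and after a single change (adding an index) on a throwaway SQLite database; the helper name time_query, the table and the row counts are all made up for illustration, and against MySQL you would point the same loop at your real connection and queries.

```python
import sqlite3
import time

def time_query(conn, sql, repeats=20):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, client_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (client_id, name) VALUES (?, ?)",
    ((i % 500, f"contact {i}") for i in range(50000)),
)

query = "SELECT COUNT(*) FROM contacts WHERE client_id = 42"

# 1. Baseline before touching anything.
baseline = time_query(conn, query)

# 2. Make exactly one change.
conn.execute("CREATE INDEX idx_contacts_client ON contacts (client_id)")

# 3. Measure again with the same query and the same data.
after_change = time_query(conn, query)

print(f"baseline: {baseline:.6f}s, after index: {after_change:.6f}s")
```

Taking the best of several runs reduces noise from caches and other processes. If the second number is not better, drop the index again before trying the next idea.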

Database structure - To join or not to join

We're drawing up the database structure for a new app with the help of MySQL Workbench, and the number of joins required to produce a listing of the data is increasing drastically as the number of many-to-many relationships increases.
The application will be quite read-heavy and have a couple of hundred thousand rows per table.
The questions:
Is it really that bad to merge tables where needed and thereby reducing joins?
Should we start looking at horizontal partitioning? (in conjunction with merging tables)
Is there a better way than pivot tables to take care of many-to-many relationships?
We discussed storing all data in serialized text columns instead and having the application do the sorting rather than the database, but this seems like a very bad idea, even though the database will be heavily cached. What do you think?
Go with the normalized form of the database. For most tasks you won't need more than 3 or 4 joins, and you can still write views for the most common joins. Denormalization will force you to always think about updating fields in multiple places/tables when changing one property, and will surely lead to more problems than benefits.
If you worry about reporting performance then you still can extract the data in timed batches into separate tables to get the desired performance for your reporting queries. If it's for query simplicity you can use views.
In inverse order:
Forget it. Use the database. People saying "make it in the application" are pretty often those ignorant of the amount of work that goes into writing databases.
Depends on exact need.
Depends on exact need. OLTP (Transaction processing) - go for fifth normal form. OLAP (Analytical processing) - go for a proper star diagram and denormalize to get optimal performance. Mixed - forget it. Does not work for larger installs because the theories are different... except if you make the database OLTP and then use a special OLAP cube database (which MySQL does not have).
Databases are designed to handle lots of joins. Use this feature as it will make many kinds of data manipulation in the database much easier. Otherwise, why not just use a flat file?
As always, it depends on your application, but in general, too much denormalisation can come back and bite you later on. A well normalised database means that you should be able to query your data in most ways that you may need later on, particularly for reporting (which often is an afterthought).
If you stick all your data in serialized text columns and your client asks for a report showing all rows that have a particular attribute, then you're going to have to do a bunch of string manipulation to get this data out.
If you're worried about too many joins for your queries, you could consider exposing certain sets of the data as a view...
If you make sure to index the foreign keys (you did set up foreign keys didn't you?) and have proper where clauses in your queries, 10-15 joins should be easily handled by a database. Especially with so few rows. I have queries with that many joins on tables with millions of rows and they run fine.
Usually it is better to partition data than to denormalize.
As far as denormalizing goes, don't do it unless you also institute a strategy for keeping the denormalized data in sync with the parent table.
As to whether you really need that many tables or if your design is bad, well the only way we could comment on that is if we saw the table structure.
Unless you have clear evidence that performance is suffering because of the joins, stay normalised. Otherwise, as others have said, you'll have to worry about multiple updates.
Especially if the database is heavily cached, as you say, you'll be surprised how quick the DBMS is at doing this kind of thing - it is what it's designed for, after all.
Unless it's the sort of monster application, with huge amounts of data, that demands special performance optimisations, you'll find that keeping down the development, testing, and later, maintenance effort, will be much more important.
Joins are good, usually, not bad. They allow you to keep the data where it should be, which gives you maximum flexibility.
And as has been said many times, premature optimisation is usually bad, not good.