How to implement Materialized View with MySQL? - mysql

Does MySQL support materialized views?
If not, how can I implement a materialized view with MySQL?
Update:
Would the following work? This doesn't occur in a transaction, is that a problem?
DROP TABLE IF EXISTS `myDatabase`.`myMaterializedView`;
CREATE TABLE `myDatabase`.`myMaterializedView` SELECT * from `myDatabase`.`myRegularView`;

I maintain LeapDB (http://www.leapdb.com) which adds incrementally refreshable materialized views to MySQL (aka fast refresh), even for views that use joins and aggregation. I've been working on this project for 13 years. It includes a change data capture utility to read the database logs. No triggers are used.
It includes two refresh methods. The first is similar to your method, except a new version is built, and then RENAME TABLE is used to swap the new for the old. At no point is the view unavailable for querying, but 2x the space is used for a short time.
The second method is true "fast refresh", and it supports views that use aggregation and joins.
LeapDB is significantly more advanced than the FromDual example referenced by astander.

Your example approximates a "full refresh" materialized view. You may need a "fast refresh" view, often used in a data warehouse setting, if the source tables include millions or billions of rows.
You could approximate a fast refresh in one of two ways: use an upsert (insert / update) that joins the existing "view table" against the primary keys of the source views (assuming they are key-preserved), or keep a date_time of the last update and use it in the criteria of the refresh SQL so that only recently changed rows are processed.
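A minimal sketch of the date_time variant described above, in MySQL syntax. The tables `orders` and `orders_mview` and their columns are hypothetical and only serve the illustration; the sketch also assumes rows are never deleted from the source, since a timestamp filter cannot see deletes.

```sql
-- Hypothetical source table; updated_at is maintained by MySQL on every change.
CREATE TABLE orders (
  id         INT PRIMARY KEY,
  customer   VARCHAR(100)  NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
             ON UPDATE CURRENT_TIMESTAMP
);

-- Hypothetical "view table" that shares the source's primary key.
CREATE TABLE orders_mview (
  id         INT PRIMARY KEY,
  customer   VARCHAR(100)  NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  updated_at DATETIME NOT NULL
);

-- High-water mark: the newest change already present in the view table.
SET @last_refresh = (
  SELECT COALESCE(MAX(updated_at), '1000-01-01 00:00:00') FROM orders_mview
);

-- Upsert only the rows changed since the last refresh.
-- ">=" re-reads rows from the boundary second; that is harmless because
-- the upsert is idempotent.
-- VALUES() in this position is deprecated in recent MySQL 8.0 releases
-- but is still accepted.
INSERT INTO orders_mview (id, customer, amount, updated_at)
SELECT o.id, o.customer, o.amount, o.updated_at
FROM orders AS o
WHERE o.updated_at >= @last_refresh
ON DUPLICATE KEY UPDATE
  customer   = VALUES(customer),
  amount     = VALUES(amount),
  updated_at = VALUES(updated_at);
```

An index on `orders.updated_at` would likely be needed for the filter to actually reduce the refresh time on large tables.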
Also, consider using table renaming, rather than drop/create, so the new view can be built and put in place with nearly no gap of unavailability. Build a new table 'mview_new' first, then rename the 'mview' to 'mview_old' (or drop it), and rename 'mview_new' to 'mview'. In your above sample, your view will be unavailable while your SQL populate is running.
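The rename approach could look like the sketch below, reusing the table and view names from the question. The `_new` and `_old` suffixes are made up for the illustration, and the sketch assumes `myMaterializedView` already exists from an earlier build.

```sql
-- Clean up leftovers from a previously failed run, if any.
DROP TABLE IF EXISTS `myDatabase`.`myMaterializedView_new`;
DROP TABLE IF EXISTS `myDatabase`.`myMaterializedView_old`;

-- Build the new copy while the current table stays available for queries.
CREATE TABLE `myDatabase`.`myMaterializedView_new`
  SELECT * FROM `myDatabase`.`myRegularView`;

-- Swap both names in a single RENAME TABLE statement, which MySQL
-- performs atomically, so readers never see a missing table.
RENAME TABLE
  `myDatabase`.`myMaterializedView`     TO `myDatabase`.`myMaterializedView_old`,
  `myDatabase`.`myMaterializedView_new` TO `myDatabase`.`myMaterializedView`;

-- The old copy is no longer needed.
DROP TABLE `myDatabase`.`myMaterializedView_old`;
```

Note that CREATE TABLE ... SELECT does not carry over indexes, so any indexes the view table needs would have to be added to the `_new` table before the swap.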

This thread is rather old, so I will try to refresh it a bit:
I've been experimenting and even deployed in production several methods for having materialized views in MySQL. Basically all methods assume that you create a normal view and transfer the data to a normal table - the actual materialized view. Then, it's only a question of how you refresh the materialized view.
Here's what I've had success with so far:
1. Using triggers - you can set triggers on the source tables on which you build the view. This minimizes resource usage, as the refresh is only done when needed. Also, the data in the materialized view is realtime-ish.
2. Using cron jobs with stored procedures or SQL scripts - the refresh is done on a regular basis. You have more control over when resources are used. Obviously your data is only as fresh as the refresh rate allows.
3. Using MySQL scheduled events - similar to 2, but runs inside the database.
4. Flexviews - using FlexDC mentioned by Justin. The closest thing to real materialized views.
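To make the trigger method concrete, here is a small MySQL sketch that keeps a per-user total in step with its source table. The tables `sales` and `sales_totals_mview` are hypothetical, and only inserts and deletes are covered; an AFTER UPDATE trigger would be needed as well for a complete solution.

```sql
-- Hypothetical source table.
CREATE TABLE sales (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  amount  DECIMAL(10,2) NOT NULL
);

-- Hypothetical "materialized view": one running total per user.
CREATE TABLE sales_totals_mview (
  user_id INT PRIMARY KEY,
  total   DECIMAL(12,2) NOT NULL
);

-- On insert: create the user's row or add to the existing total.
CREATE TRIGGER sales_after_insert AFTER INSERT ON sales
FOR EACH ROW
  INSERT INTO sales_totals_mview (user_id, total)
  VALUES (NEW.user_id, NEW.amount)
  ON DUPLICATE KEY UPDATE total = total + NEW.amount;

-- On delete: subtract the removed amount from the user's total.
CREATE TRIGGER sales_after_delete AFTER DELETE ON sales
FOR EACH ROW
  UPDATE sales_totals_mview
  SET total = total - OLD.amount
  WHERE user_id = OLD.user_id;
```

Each trigger body is a single statement, so no DELIMITER change is needed in the mysql client. The cost is that every write to `sales` now also writes to the summary table.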
I've been collecting and analyzing these methods and their pros and cons in my article Creating MySQL materialized views.
I'm looking forward to feedback or proposals for other methods of creating materialized views in MySQL.

According to the MySQL docs and the comments at the bottom of the page, it seems that people are just creating views and then creating tables from those views. I'm not sure whether this is equivalent to creating a materialized view, but it seems to be the only avenue available at this time.

Related

How do calculated columns in views impact performance?

I understand from this question that SQL language does support calculated columns in views.
I have a requirement where I have a table with multiple columns, and I need to calculate a sorting column in order to simplify my queries. I am thinking of creating a view for my origin table with those sorting columns calculated. But I am afraid that could be a performance nightmare as my table grows bigger.
Does anyone have an idea of how that would affect performance?
Is it possible to create an index on a calculated column in a view?
UPDATE 1:
I am planning on using PostgreSQL, but I am open to other open-source alternatives like MySQL.
UPDATE 2:
as N.B. suggested:
I'm not a Postgres user, but the docs here are showing how to create that view and how to index it. If you're using Postgres and are familiar with it - stick with it. All databases work nearly the same, but if you're more proficient with one - no reason to change it. As for how it affects the performance - be it a view or a query that you construct dynamically - it's the same thing. A view is just a huge help when querying, and if you can index it, it means some memory will be spent on the index. You have to measure.
I am now thinking that materialized views are the way to go for my functional requirements. I can set up a trigger to refresh the materialized view on each and every update to my table, once I confirm this point:
How does REFRESH MATERIALIZED VIEW work? Does it drop the data and recreate the view from scratch, or does it do some kind of differential refresh?
Disclaimer: I have used both MySQL and PostgreSQL Database on a remote server for about 8 months only, and I have a preference for PostgreSQL for your use case.
TL;DR
According to the documentation, the REFRESH MATERIALIZED VIEW command completely replaces the contents of the materialized view: it discards the old data and re-runs the view's query (this is the WITH DATA behavior, which is the default). It is not a differential refresh.
You can create indexes on a materialized view. An index can be on the calculated fields that are stored in its columns.
You cannot index a (non-materialized) view.
You can create different types of materialized views depending on your needs (see URL link below).
Long Explanation
A) Materialized Views types and performance
I have a requirement where I have a table with multiple columns, and I need to calculate a sorting column in order to simplify my queries. I am thinking of creating a view for my origin table with those sorting columns calculated. But I am afraid that could be a performance nightmare as my table grows bigger.
If the calculations are very expensive, consider consuming more memory to store the results in materialized views or tables.
A materialized view is like a table that stores the result of a query. In the case of PostgreSQL materialized view, indexes can be created on it to speed up queries and it can be vacuumed to update the meta-data.
The materialized view that PostgreSQL provides is a naive one because you must manually refresh the data with the REFRESH MATERIALIZED VIEW command. According to the documentation, this discards all existing data and re-populates the view from its query (WITH DATA, which is the default behavior).
After that, you need to consider the performance needed for insert, update, delete operations:
1. If you have no real-time requirements (i.e. a full table re-population is acceptable), then this option is fine.
2. Otherwise, you might want to read this post on different setups of materialized views, some of which allow lazy refresh of the data (a trigger refreshes the data row by row):
https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql
The second point also applies to MySQL (and is actually the traditional, customized way of building materialized views). To my knowledge, MySQL does not support materialized views out of the box (it requires plugins). The convenience provided in (1) is one of the reasons why I chose PostgreSQL.
Is it possible to create index on a calculated column in a view ?
It is possible to index the columns of a materialized view, just as you do for a table.
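As a rough PostgreSQL sketch of the points in this section, the statements below create a materialized view with a calculated sorting column, index it, and refresh it. The table `items`, the view `items_sorted` and the `sort_key` formula are hypothetical stand-ins for the asker's real schema.

```sql
-- Hypothetical base table.
CREATE TABLE items (
  id       serial PRIMARY KEY,
  name     text    NOT NULL,
  priority integer NOT NULL
);

-- The calculated sorting column is computed once and stored.
CREATE MATERIALIZED VIEW items_sorted AS
SELECT id,
       name,
       priority * 1000 + length(name) AS sort_key  -- made-up formula
FROM items
WITH DATA;

-- Indexes work just as they do on a table.
CREATE UNIQUE INDEX items_sorted_id_idx ON items_sorted (id);
CREATE INDEX items_sorted_sort_key_idx ON items_sorted (sort_key);

-- Full refresh: re-runs the query and replaces all stored rows.
-- Readers are blocked while it runs.
REFRESH MATERIALIZED VIEW items_sorted;

-- PostgreSQL 9.4+ variant that does not block readers. It requires a
-- unique index on the view (created above) and is still not a
-- differential refresh: the whole query is re-run.
REFRESH MATERIALIZED VIEW CONCURRENTLY items_sorted;
```

Queries such as `SELECT * FROM items_sorted ORDER BY sort_key` can then use the index on the stored column instead of recalculating the expression.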
B) Window functions in PostgreSQL
The second reason for choosing PostgreSQL over MySQL is because the former provides extended-SQL functions (or I would like to call them OLAP functions) that help with complex queries like ranking of rows and so on.
I shall leave it to you to explore this option (just do a Google Search on "PostgreSQL Window Functions").
To my knowledge at the time of writing, MySQL had no built-in support for this (you had to rely on plugins or your own code); window functions were later added in MySQL 8.0.
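For a taste of what such a ranking query looks like, here is a short PostgreSQL sketch. The `employees` table and its contents are invented purely for the illustration.

```sql
-- Hypothetical table and sample rows.
CREATE TABLE employees (
  name       text    NOT NULL,
  department text    NOT NULL,
  salary     numeric NOT NULL
);

INSERT INTO employees (name, department, salary) VALUES
  ('Ann',  'Engineering', 120),
  ('Bob',  'Engineering', 100),
  ('Cleo', 'Sales',        90),
  ('Dan',  'Sales',        95);

-- rank() numbers the rows within each department, highest salary first,
-- without collapsing the rows the way GROUP BY would.
SELECT name,
       department,
       salary,
       rank() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees
ORDER BY department, salary_rank;
```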

SQL Server View Access Speed Versus Writing View to Table

I have a SQL Server 2008 DB that has a set of views that are accessed by a program. Our goal is to optimize the access speed of the program (which pulls in data on a user request), to minimize end-user impact.
Right now we are writing all of our views to tables, and passing those mappings to the application (we found the application performed better reading from tables as opposed to views). We are soon going to implement indexes (still need to discuss with the application vendor what indexes will speed up their import), but for now we're trying to figure out the best way to optimize the import.
The plan currently is to write the views to tables, add the proper indexes and then run a (select *) statement to force them into memory. My question is whether A) writing them to tables is necessary once we have the indices and the select * and B) what are some methods that we are missing?
Edited to clarify question goal.
OK I think I follow
SELECT INTO implies that you are dropping the table and letting SELECT INTO recreate it.
You are probably better off with a TRUNCATE.
If the table is referenced by a foreign key, you need DELETE instead of TRUNCATE, but such tables tend to be smaller.
And then just do an INSERT INTO.
This way you also don't need to drop and recreate views.
If you can take the hit, you are better off taking a TABLOCK on the whole table.
If the linked is slow then insert into some local staging tables
From the staging table load the production table
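The truncate-and-reload steps above might be sketched in T-SQL as follows. The names `dbo.ReportView` and `dbo.ReportTable` and their columns are hypothetical, and wrapping the two statements in a transaction is an addition of this sketch, not something the answer prescribes.

```sql
-- Assumes dbo.ReportTable already exists with these three columns
-- and is not referenced by a foreign key (otherwise use DELETE).
BEGIN TRANSACTION;

-- Empty the table without dropping it, so its indexes and
-- permissions stay in place.
TRUNCATE TABLE dbo.ReportTable;

-- Reload from the view, taking a table-level lock for the load.
INSERT INTO dbo.ReportTable WITH (TABLOCK)
       (CustomerId, OrderCount, TotalAmount)
SELECT CustomerId, OrderCount, TotalAmount
FROM dbo.ReportView;

COMMIT TRANSACTION;
```

With the transaction in place, readers are blocked for the duration of the load rather than seeing an empty table; whether that trade-off is acceptable depends on how long the view takes to run.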
I totally don't get why you would materialize a view into table.
If you have performance issue with a view then first optimize the view.
What is going on in the view that is slow?

Automatically Loading the View

Can we create a materialized view in MySQL or SQL Server that is automatically reloaded with data from the underlying base tables, without hitting the base tables on every read?
Elaboration:
I have created a view, viewMasterTable, which is a join of 3 tables:
TableA, TableB, Table = viewMasterTable
Now I want this view to be reloaded with the data whenever a change (i.e. an update, insert or delete) is made to a base table, without hitting the base tables.
Will this view concept help increase performance?
You can create materialized views in SQL Server Enterprise Edition. In MySQL you cannot create materialized views. Thus this is only applicable to SQL Server, and to one specific edition.
Now, you don't get something for nothing. If you materialize a view, the data stored in the view has to remain in sync with the data in the base tables. Thus any updates/inserts/deletes on the base tables will be affected, as the server now has to write to the base tables and update the view. So you have an extra operation to complete for every write, and this incurs a performance penalty on the server itself. Depending on the size of the tables and views and the frequency of updates, this might or might not be a small penalty.
You can index materialized views, and this is where their power really shines. Say you have a very complex view that can be filtered by various columns: a materialized view lets you index the fields in the view, allowing a user to filter much faster. The downside is that every index you create on a materialized view incurs additional write penalties, as the server needs to update the indexes when updating the view.
So while it can be a really good way to increase read performance on a complex query, you will see a performance penalty on writes. How bad will this penalty be? That depends on how you have arranged your disk I/O pathways; for example, placing your indexes, views and tables on separate physical spindles will help alleviate some of the write overhead.
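In SQL Server this kind of materialized view is called an indexed view. A minimal T-SQL sketch follows; it joins only two of the asker's tables for brevity, and all column names are hypothetical.

```sql
-- Hypothetical base tables.
CREATE TABLE dbo.TableA (Id INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE dbo.TableB (Id INT PRIMARY KEY, AId INT NOT NULL, Amount INT NOT NULL);
GO  -- batch separator used by SSMS/sqlcmd; CREATE VIEW must start a batch

-- An indexed view needs SCHEMABINDING, two-part table names,
-- an explicit column list and inner joins only.
CREATE VIEW dbo.viewMasterTable
WITH SCHEMABINDING
AS
SELECT a.Id AS AId, b.Id AS BId, a.Name, b.Amount
FROM dbo.TableA AS a
INNER JOIN dbo.TableB AS b ON b.AId = a.Id;
GO

-- The unique clustered index is what materializes the view; from here
-- on every write to TableA or TableB also maintains the stored rows.
CREATE UNIQUE CLUSTERED INDEX IX_viewMasterTable
  ON dbo.viewMasterTable (AId, BId);

-- Further indexes speed up filtering but add to the write penalty.
CREATE NONCLUSTERED INDEX IX_viewMasterTable_Name
  ON dbo.viewMasterTable (Name);
```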

Should I use a MySQL view or a report cronjob

At my work my colleagues always build report cronjobs for heavy tables. With the cronjob we get all data from 1 day per user and insert the totals into a report table. The report overview page is not always up to date, because it can lag by up to 1 hour.
The cronjob runs 24 times a day (every hour).
Is it better to use a MySQL view? When a record is added to the master table, the MySQL view will be updated, right? That is a very heavy operation. Will it affect the users using the dashboard?
Kind regards,
Joost
Okay so some terminology first.
The cron jobs are most likely appending data to existing tables (perhaps using an upsert method like INSERT ... ON DUPLICATE KEY UPDATE). The data you write to these tables can be indexed, just like in any normal MySQL table, and it is persisted on disk.
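As a guess at what such an hourly job might run, here is a MySQL sketch. The tables `payments` and `user_daily_report` and their columns are hypothetical, not taken from the question.

```sql
-- Hypothetical master table.
CREATE TABLE payments (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  user_id    INT NOT NULL,
  amount     DECIMAL(10,2) NOT NULL,
  created_at DATETIME NOT NULL,
  KEY idx_created_at (created_at)
);

-- Hypothetical report table: one row per user per day.
CREATE TABLE user_daily_report (
  user_id     INT  NOT NULL,
  report_date DATE NOT NULL,
  total       DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (user_id, report_date)
);

-- Recompute today's totals; rows that already exist are overwritten
-- thanks to the primary key on (user_id, report_date).
-- VALUES() in this position is deprecated in recent MySQL 8.0 releases
-- but is still accepted.
INSERT INTO user_daily_report (user_id, report_date, total)
SELECT user_id, DATE(created_at), SUM(amount)
FROM payments
WHERE created_at >= CURRENT_DATE
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE total = VALUES(total);
```

The same statement could also be placed after `DO` in a `CREATE EVENT ... ON SCHEDULE EVERY 1 HOUR` definition, which moves the schedule from cron into MySQL's event scheduler.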
Views, on the other hand, are really nothing more than saved queries in MySQL. Every time you open a view, you run the query again. Views aren't really useful for performance optimization as much as they are useful for small, efficient queries that otherwise might be a pain to remember. Views cannot have indices (although they are effectively saved queries, so the query itself can make use of the indices on the tables it's referencing) and they are not persisted to disk. Every time you load the view, you will be running the query that makes up the view again.
Now, in between views and tables populated by cron jobs, you could also install a plugin for MySQL called Flexviews (https://github.com/greenlion/swanhart-tools). Flexviews allows MySQL to use what are called materialized views (e.g. http://en.wikipedia.org/wiki/Materialized_view). Materialized views are basically views that are persisted to disk as tables. And, since they are tables, they can also use indices.
Materialized views are not native to MySQL, but the developer who maintains that plugin is well known in the MySQL community, and he tends to write good, reliable SQL tools. Obviously it would be a mistake to test the plugin in a production environment, or without using backups. But there are plenty of folks who use Flexviews in production to accomplish exactly what it seems like you'd like to do: obtain near-real-time updates of dashboard/summary tables in a way that doesn't murder DB performance.
I'd definitely check Flexviews out... you can learn more about it
here: http://www.percona.com/blog/2011/03/23/using-flexviews-part-one-introduction-to-materialized-views/
and here: http://www.percona.com/blog/2011/03/25/using-flexviews-part-two-change-data-capture/

Where are views stored in MySQL?

I have some questions on views -
Where are views created/stored in MySQL? Or are they only virtual and deleted after some time period?
When is the data of a view refreshed? (Does it refresh automatically when we insert data into the actual table, or do we have to update the view each time?)
Is using views a good idea, or should we run the queries each time?
Views are pure metadata. MySQL doesn't copy any data when a view is created, and the view is not deleted after a period of time.
When you run a select on a view, MySQL (or any other database) runs the query defined at creation time.
There's no performance difference (or almost none) between running a query on a table and running it on a view.
Some databases, such as Oracle, support something called materialized views. These views do copy the data, so they have to be refreshed to keep the data from becoming stale.
Leaving this as it turned up in Google results.
To see view definitions in MySQL you can use this query:
SELECT * FROM information_schema.VIEWS;
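To narrow that down, the query can be limited to one schema, or a single view can be inspected directly. The names `myDatabase` and `myRegularView` below are placeholders for your own schema and view.

```sql
-- Definitions of all views in one schema (placeholder schema name).
SELECT TABLE_NAME, VIEW_DEFINITION
FROM information_schema.VIEWS
WHERE TABLE_SCHEMA = 'myDatabase';

-- The full CREATE VIEW statement for one view (placeholder names).
SHOW CREATE VIEW `myDatabase`.`myRegularView`;
```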
Regards,
James