Using indexes on/through a MySQL view - mysql

We've got a MySQL table in which rows are never updated, but instead new rows are added and the old ones marked obsolete. Think Rails' acts_as_paranoid, but for every update.
To make working with Rails sane, we've got a view which selects only the rows which are "current". That makes a much better "table" for our ActiveRecord model.
The snag: our indexes aren't being used anymore.
Queries on the view don't use the underlying tables' indexes. You can't add an index to a view. Without indexes, the app is unbearably slow.
The only solution we've come up with is to build a materialized view, but that's a pain in MySQL because they're not natively supported.
Is there a better way to do this?

Since MySQL executes the query underlying the view, it should still use the indexes on the query that composes the view. Do an explain on the query that you used to create the view and post here if it's not indexing.

Related

How does views calculated columns impact performance?

I understand from this question that SQL language does support calculated columns in views.
I have a requirement where I have a table with multiple columns, and I need to calculate a sorting column in order to simplify my queries. I am thinking of creating a view for my origin table with those sorting columns calculated. But I am afraid that could be a performance nightmare as my table grows bigger.
Does any one have an idea on how that would affect performance?
Is it possible to create index on a calculated column in a view ?
UPDATE 1:
I am planning on using postgresql, but I am open to other opensource alternatives like MySQL
UPDATE 2:
as N.B. suggested:
I'm not a Postgres user, but the docs here are showing how to create that view and how to index it. If you're using Postgres and are familiar with it - stick with it. All databases work nearly the same, but if you're more proficient with one - no reason to change it. As for how it affects the performance - be it a view or a query that you construct dynamically - it's the same thing. View is just a huge help when querying, and if you can index it it means some memory will be spent on index. You have to measure
I am thinking now that materialized views are the way to go for my functional requirements, I can setup a trigger to refresh the Materialized View on each and every update on my table once I confirm this point:
How does REFRESH MATERIALIZED VIEW work ? does it drop the data and recreate the view from scratch ? or does it do some kind of differential refresh ?
Disclaimer: I have used both MySQL and PostgreSQL Database on a remote server for about 8 months only, and I have a preference for PostgreSQL for your use case.
TL;DR
According to the documentation, REFRESH MATERIALIZED VIEW
command will drop all data and re-populate the entire query's data if you add the WITH DATA clause.
You can create indexes for materialized view. The index could be on the calculated fields that are stored in the columns.
You cannot index a view (non-materialized)
You can create different types of materialized views depending on your needs (see URL link below).
Long Explanation
A) Materialized Views types and performance
I have a requirement where I have a table with multiple columns, and I need to calculate a sorting column in order to simplify my queries. I am thinking of creating a view for my origin table with those sorting columns calculated. But I am afraid that could be a performance nightmare as my table grows bigger.
If the calculations are very expensive, consider consuming more memory to store the results in materialized views or tables.
A materialized view is like a table that stores the result of a query. In the case of PostgreSQL materialized view, indexes can be created on it to speed up queries and it can be vacuumed to update the meta-data.
The materialized view that PostgreSQL provides is a naive one because you must manually refresh the data with REFRESH MATERIALIZED VIEW command. According to the documentation, this will drop all data and re-populate the data if you add the WITH DATA clause.
After that, you need to consider the performance needed for insert, update, delete operations:
If you have no real-time requirements (i.e. a full table
re-population is acceptable) then this option is fine.
Else, you might want to see this website post for different setup of materialized views, some of which allows for lazy refresh of data (trigger refresh data by rows)
https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql
The second point also applies to MySQL as well (and is actually the traditional and customized way of building materialized views). To my knowledge, MySQL does not support materialized views out-of-the-box (require plugins). The convenience provided in (1) is one of the reasons why I chose PostgreSQL.
Is it possible to create index on a calculated column in a view ?
It is possible to index the columns of a materialized view, just as you do for a table.
B) Window functions in PostgreSQL
The second reason for choosing PostgreSQL over MySQL is because the former provides extended-SQL functions (or I would like to call them OLAP functions) that help with complex queries like ranking of rows and so on.
I shall leave it to you to explore this option (just do a Google Search on "PostgreSQL Window Functions").
According to my latest knowledge, MySQL has no built in support for this (maybe rely on plugins or own coding?).

MySQL: Adding indexes on a table with existing records

I have a query that is running very slowly. The table is was querying has about 100k records and no indexes on most of the columns used in the where clause. I just added indexes on those columns but the query hasn't gotten any faster.
I think this is because when a column is indexed, it's value is written in the index at the time of insertion. I just added the indexes now after all those records were added. So is there a way to "re-run the indexes" on the table?
Edit
Here is the query and explain result:
Oddly enough when I copy the query and run in directly in my SQL manager tool it runs quite fast so may bye the problem is in my application code and not in the query itself.
Mysql keeps consistent indexes. It does not matter if the data is added first, the index is added first, or the data is changed at any time. The same final index will result (assuming the same final data and index type).
Your slow query is not caused by adding the index later. There will be some other reason.
This is an extremely common problem.
Use MySQL explain http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
When you precede a SELECT statement with the keyword EXPLAIN, MySQL displays information from the optimizer about the query execution plan. That is, MySQL explains how it would process the statement, including information about how tables are joined and in which order.
Using these results... verify the index you created is functioning the way you expected.
If not, you will want to tweak your index until you have it working as expected.
You might want to create a new table, create indexes, then insert all elements from old table to new while testing this. It's easier than dropping and re-adding indices a million times.

Optimizing MySQL queries/database

I have two tables TABLE A and TABLE B.
TABLE A contain 1 million (1,000,000) records and 4 fields while TABLE 2 contain 60,000 and 3 fields.
I am running a query which joins these two tables and usees WHERE clause to find specific products like WHERE product like '%Bags%' and product like 'Bags%' e.t.c.
When I run the query directly in phpMyAdmin then it returns records in around 1 or 2 seconds. But when they are being used on website, they are sometime taking 9 or 10 seconds according to MySQL 'slow query' log. Actually my website response was very slow at times so upon investigation I found out it is due to MySQL as I came to know about 'slow query log'.
The slow query log consists of all SQL statements that took more than long_query_time seconds to execute and required at least min_examined_row_limit rows to be examined.
So according to that log "query_time" for above query was 13 seconds while in some cases they even had "query_time" exceeding 50 seconds.
Both my tables are using PRIMARY keys as well as INDEXES. So I want to know how can I optimize them more or is there any way I can optimize MySQL settings in general?
This slowness of website doesn't happen all the time but sometimes (may be once in a week) and lasts for around 1 or 2 minutes. It gets decent amount of traffic and there are many other queries too, the above I posted was just one example.
Thanks
For all things MySQL and performance related, check out http://www.mysqlperformanceblog.com/
Check your queries with EXPLAIN, see here and here for info on how to use EXPLAIN as query diagnostic tool.
It's not enough to just have indexes. Are you indexing the fields searched in the WHERE clause? Also do you have indexes for the fields used in the WHERE clause (including the fields you mention in ORDER BY, GROUP BY, and HAVING clauses as well as JOINs)? If you have grouped fields in a single index, that index won't be hit unless you have a query that searches all those fields together. If you group fields in an index make sure they the index will actually be used in your query (EXPLAIN is your friend).
That said, it could be many other things as well: poorly configured MySQL server, poorly tuned server, bad schema. But your queries and your indexes are good place to start your investigation.
Here is a nice summary of performance best practices from Jay Pipes of MySQL.
like '%Bags%' query cannot be optimized using indexes.
The only way to improve performance here is to use fulltext indexes or get sphinx to search.
Its because of some other queries are run at the time when you are going to refresh the page of your website. so if for example your website going to run 8-10 queries at time of page refresh then it will take some more time than you run single query in phpmyadmin. and if its take 1-1.5 min to execute then its may not the query problem but it may have prob with the server speed also.
and you also can use MATCH() AGAINST() statement for optimize this type of search queries.
Otherwise you are already using PRIMARY KEY, INDEXES and JOINS so there is no need to worry about other things.
just check it out.
Thanks.
There are many ways to optimize Databases and queries. My method is the following.
Look at the DB Schema and see if it makes sense
Most often, Databases have bad designs and are not normalized. This can greatly affect the speed of your Database. As a general case, learn the 3 Normal Forms and apply them at all times. The normal forms above 3rd Normal Form are often called de-normalization forms but what this really means is that they break some rules to make the Database faster.
What I suggest is to stick to the 3rd normal form except if you are a DBA (which means you know subsequent forms and know what you're doing). Normalization after the 3rd NF is often done at a later time, not during design.
Only query what you really need
Filter as much as possible
Your Where Clause is the most important part for optimization.
Select only the fields you need
Never use "Select *" -- Specify only the fields you need; it will be faster and will use less bandwidth.
Be careful with joins
Joins are expensive in terms of time. Make sure that you use all the keys that relate the two tables together and don't join to unused tables -- always try to join on indexed fields. The join type is important as well (INNER, OUTER,... ).
Optimize queries and stored procedures (Most Run First)
Queries are very fast. Generally, you can retrieve many records in less than a second, even with joins, sorting and calculations. As a rule of thumb, if your query is longer than a second, you can probably optimize it.
Start with the Queries that are most often used as well as the Queries that take the most time to execute.
Add, remove or modify indexes
If your query does Full Table Scans, indexes and proper filtering can solve what is normally a very time-consuming process. All primary keys need indexes because they makes joins faster. This also means that all tables need a primary key. You can also add indexes on fields you often use for filtering in the Where Clauses.
You especially want to use Indexes on Integers, Booleans, and Numbers. On the other hand, you probably don't want to use indexes on Blobs, VarChars and Long Strings.
Be careful with adding indexes because they need to be maintained by the database. If you do many updates on that field, maintaining indexes might take more time than it saves.
In the Internet world, read-only tables are very common. When a table is read-only, you can add indexes with less negative impact because indexes don't need to be maintained (or only rarely need maintenance).
Move Queries to Stored Procedures (SP)
Stored Procedures are usually better and faster than queries for the following reasons:
Stored Procedures are compiled (SQL Code is not), making them faster than SQL code.
SPs don't use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned.
Stored Procedures are run on the server, which is typically faster.
Calculations in code (VB, Java, C++, ...) are not as fast as SP in most cases.
It keeps your DB access code separate from your presentation layer, which makes it easier to maintain (3 tiers model).
Remove unneeded Views
Views are a special type of Query -- they are not tables. They are logical and not physical so every time you run select * from MyView, you run the query that makes the view and your query on the view.
If you always need the same information, views could be good.
If you have to filter the View, it's like running a query on a query -- it's slower.
Tune DB settings
You can tune the DB in many ways. Update statistics used by the optimizer, run optimization options, make the DB read-only, etc... That takes a broader knowledge of the DB you work with and is mostly done by the DBA.
****> Using Query Analysers****
In many Databases, there is a tool for running and optimizing queries. SQL Server has a tool called the Query Analyser, which is very useful for optimizing. You can write queries, execute them and, more importantly, see the execution plan. You use the execution to understand what SQL Server does with your query.

MySQL: Data query from multiple tables with views

I want to create a query result page for a simple search, and i don't know , should i use views in my db, would it be better if i would write a query into my code with the same syntax like i would create my view.
What is the better solution for merging 7 tables, when i want to build a search module for my site witch has lots of users and pageloads?
(I'm searching in more tables at the same time)
you would be better off using a plain query with joins, instead of a view. VIEWS in MySQL are not optimized. be sure to have your tables properly indexed on the fields being used in the joins
If you always use all 7 tables, i think you should use views. Be aware that mysql changes your original query when creating the view so its always good practice to save your query elsewhere.
Also, remember you can tweak mysql's query cache env var so that it stores more data, therefore making your queries respond faster. However, I would suggest that you used some other method for caching like memcached. The paying version of mysql supports memcached natively, but Im sure you can implement it in the application layer no problem.
Good luck!

How to implement Materialized View with MySQL?

How to implement Materialized Views?
If not, how can I implement Materialized View with MySQL?
Update:
Would the following work? This doesn't occur in a transaction, is that a problem?
DROP TABLE IF EXISTS `myDatabase`.`myMaterializedView`;
CREATE TABLE `myDatabase`.`myMaterializedView` SELECT * from `myDatabase`.`myRegularView`;
I maintain LeapDB (http://www.leapdb.com) which adds incrementally refreshable materialized views to MySQL (aka fast refresh), even for views that use joins and aggregation. I've been working on this project for 13 years. It includes a change data capture utility to read the database logs. No triggers are used.
It includes two refresh methods. The first is similar to your method, except a new version is built, and then RENAME TABLE is used to swap the new for the old. At no point is the view unavailable for querying, but 2x the space is used for a short time.
The second method is true "fast refresh", it even has support for aggregation and joins.
LeapDB is significantly more advanced than the FromDual example referenced by astander.
Your example approximates a "full refresh" materialized view. You may need a "fast refresh" view, often used in a data warehouse setting, if the source tables include millions or billions of rows.
You would approximate a fast refresh by instead using insert / update (upsert) joining the existing "view table" against the primary keys of the source views (assuming they can be key preserved) or keeping a date_time of the last update, and using that in the criteria of the refresh SQL to reduce the refresh time.
Also, consider using table renaming, rather than drop/create, so the new view can be built and put in place with nearly no gap of unavailability. Build a new table 'mview_new' first, then rename the 'mview' to 'mview_old' (or drop it), and rename 'mview_new' to 'mview'. In your above sample, your view will be unavailable while your SQL populate is running.
This thread is rather old, so I will try to re-fresh it a bit:
I've been experimenting and even deployed in production several methods for having materialized views in MySQL. Basically all methods assume that you create a normal view and transfer the data to a normal table - the actual materialized view. Then, it's only a question of how you refresh the materialized view.
Here's what I've success with so far:
Using triggers - you can set triggers on the source tables on which you build the view. This minimizes the resource usage as the refresh is only done when needed. Also, data in the materialized view is realtime-ish
Using cron jobs with stored procedures or SQL scripts - refresh is done on a regular basis. You have more control as to when resources are used. Obviously you data is only as fresh as the refresh-rate allows.
Using MySQL scheduled events - similar to 2, but runs inside the database
Flexviews - using FlexDC mentioned by Justin. The closest thing to real materialized
I've been collecting and analyzing these methods, their pros and cons in my article Creating MySQL materialized views
looking forwards for feedback or proposals for other methods for creating materialized views in MySQL
According to the mySQL docs and comments at the bottom of the page, it just seems like people are creating views then creating tables from those views. Not sure if this solution is the equivalent of creating a materialized view, but it seems to be the only avenue available at this time.