I have an 8.6M-row table with full-text search, but it is still practically unusable. I could split this table in two, but I think there must be a better way to use it.
I tried to create a view with the TEMPTABLE algorithm, but it didn't create a physical table.
Table 1 - 8.6M rows

id  name  age
1   john  20
2   jean  25

View 1 - 200K rows - only records where age = 25

id  name  age
2   jean  25
In MySQL, views are not "materialized views." Every time you query the view, it's like querying the base table. Some other RDBMS products have materialized views, where the subset of the table is also stored, but MySQL does not have this feature.
You have misunderstood the TEMPTABLE algorithm for views. It means MySQL creates a temporary table every time you query the view. This is probably not going to improve performance.
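If you really need a materialized subset, the usual workaround is to build and refresh it yourself with CREATE TABLE ... AS SELECT. A minimal sketch, assuming the example table above is named `people` (a hypothetical name):

-- Materialize the age = 25 subset by hand; MySQL has no built-in
-- materialized views, so this copy must be refreshed manually.
CREATE TABLE people_age_25 AS
SELECT id, name, age
FROM people
WHERE age = 25;

-- Index the copy so queries against the subset stay fast.
ALTER TABLE people_age_25 ADD PRIMARY KEY (id);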
I'm not sure why you say that it's impossible to use the table. Do you mean that queries are not fast enough? That's not really the biggest table that MySQL can handle. There are tables that have hundreds of millions of rows and are still usable.
You may need different indexes to serve your query. You might benefit from partitioning (although I don't think table partitioning is compatible with fulltext indexes). You might need server hardware with more RAM or CPU horsepower.
If fulltext searches are important, you could also consider copying searchable data to a fulltext search technology like Sphinx Search. See my presentation Full Text Search Throwdown.
If you want a view, then create a VIEW as described here: http://dev.mysql.com/doc/refman/5.0/en/create-view.html
CREATE VIEW ....
If you want a temporary table, then create a TEMPORARY TABLE as described here: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
CREATE TEMPORARY TABLE ...
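Concretely, using the example table above (the table name `people` is an assumption):

-- A view stores no data; each query against it hits the base table.
CREATE VIEW v_age_25 AS
SELECT id, name, age FROM people WHERE age = 25;

-- A temporary table stores data, but vanishes when the connection closes.
CREATE TEMPORARY TABLE tmp_age_25 AS
SELECT id, name, age FROM people WHERE age = 25;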
A view is permanent, but it is a view, not a table: it only executes a SQL statement for you in the background each time you access it. Its purpose is not just to make your queries look prettier; it can, for example, act as a filter that hides certain information.
A view will be accessible all the time. A temporary table has to be created within the DB connection cycle in order to be used.
Hopefully this answer will help you to decide what you really need - What to use? View or temporary Table
Related
Which one is faster, an index or a view? Both are used for optimization, and both are applied to a table's columns. Can anyone explain which is faster, what the difference between them is, and in which scenarios we should use a view versus an index?
VIEW
A view is a logical table. It is not a physical object: it presents data logically, simply referring to data that is stored in base tables.
A view is a logical entity: a SQL statement stored in the database's data dictionary. Data for a view is assembled at query time; with the TEMPTABLE algorithm, the database engine materializes it in a temporary table.
INDEX
Indexes are pointers that map to the physical location of data, so data manipulation becomes faster when they are used.
An index is a performance-tuning method of allowing faster retrieval of records. An index creates an entry for each value that appears in the indexed columns.
ANALOGY:
Suppose in a shop you have multiple racks. Categorizing each rack based on the items stored in it is like creating an index: you know exactly where to look to find a particular item. That is indexing.
In the same shop, if you want to see several kinds of data, say products, inventory, and sales, as one consolidated report, that report can be compared to a view.
Hope this analogy explains when you have to use a view and when you have to use an index!
Views and indexes are different things from the perspective of SQL.
VIEWS
A view is nothing more than a SQL statement that is stored in the database with an associated name. A view is actually a composition of a table in the form of a predefined SQL query.
A view can contain all rows of a table or only selected rows, and it can be created from one or many tables, depending on the SQL query used to create it. Views, which are a kind of virtual table, allow users to do the following:
Structure data in a way that users or classes of users find natural or intuitive.
Restrict access to the data such that a user can see and (sometimes) modify exactly what they need and no more.
Summarize data from various tables which can be used to generate reports.
INDEXES
Indexes, meanwhile, are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.
For example, if you want to find all pages in a book that discuss a certain topic, you first check the index, which lists all topics alphabetically, and are then directed to one or more specific page numbers.
An index helps speed up SELECT queries and WHERE clauses, but it slows down data modification with UPDATE and INSERT statements. Indexes can be created or dropped with no effect on the data.
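To make the contrast concrete, here is a minimal sketch (the `customers` table and its columns are assumptions):

-- A view: a stored SELECT with a name; it holds no data of its own.
CREATE VIEW ny_customers AS
SELECT id, name, state FROM customers WHERE state = 'NY';

-- An index: a lookup structure that speeds up that same filter.
CREATE INDEX idx_customers_state ON customers (state);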
view:
1) A view is also one of the database objects.
A view contains the logical data of a base table, whereas the base table holds the actual (physical) data. Put another way, a view is like a window through which data from a table can be viewed or changed.
2) It is just simply a stored SQL statement with an object name. It can be used in any SELECT statement like a table.
index:
1) Indexes are created on columns; by using indexes, rows are fetched quickly.
2) It is a way of cataloging the table's info based on one or more columns. One table may contain one or more indexes. An index is like a 2-D structure holding ROWIDs and the indexed column (ordered). When table data is retrieved based on that column (i.e. when it is used in the WHERE clause), the index comes into the picture automatically and its pointers locate the required ROWIDs. These ROWIDs are then matched against the actual table's ROWIDs, and the matching records are returned.
I am building an analytics platform where users can create reports and such against a MySQL database. Some of the tables in this database are pretty huge (billions of rows), so for all of the features so far I have indexes built to speed up each query.
However, the next feature is to add the ability for a user to define their own query so that they can analyze data in ways that we haven't pre-defined. They have full read permission to the relevant database, so basically any SELECT query is a valid query for them to enter. This creates problems, however, if a query is defined that filters or joins on a column we haven't currently indexed - sometimes to the point of taking over a minute for a simple query to execute - something as basic as:
SELECT tbl1.a, tbl2.b, SUM(tbl3.c)
FROM
    tbl1
    JOIN tbl2 ON tbl1.id = tbl2.id
    JOIN tbl3 ON tbl1.id = tbl3.id
WHERE
    tbl1.d > 0
GROUP BY
    tbl1.a, tbl2.b
Now, assume that we've only created indexes on columns not appearing in this query so far. Also, we don't want too many indexes slowing down inserts, updates, and deletes (otherwise the simple solution would be to build an index on every column accessible by the users).
My question is, what is the best way to handle this? Currently, I'm thinking that we should scan the query, build indexes on anything appearing in a WHERE or JOIN that isn't already indexed, execute the query, and then drop the indexes that were built afterwards. However, the main things I'm unsure about are a) is there already some best practice for this sort of use case that I don't know about? and b) would the overhead of building these indexes be enough that it would negate any performance gains the indexes provide?
If this strategy doesn't work, the next option I can see working is to collect statistics on what types of queries the users run, and have some regular job periodically check what commonly used columns are missing indexes and create them.
If using MyISAM, then performing an ALTER statement on tables with billions of rows in order to add an index will take a considerable amount of time, probably far longer than the one minute you quoted for the statement above (and you'll need another ALTER to drop the index afterwards). During that time the table will be locked, meaning other users can't execute their own queries.
If your tables use the InnoDB engine and you're running MySQL 5.1+, then CREATE / DROP index statements shouldn't lock the table, but it still may take some time to execute.
There's a good rundown of the history of ALTER TABLE available online.
I'd also suggest that automated query analysis to identify and build indexes would be quite difficult to get right. For example, what about cases such as selecting by foo.a but ordering by foo.b? This kind of query often needs a composite index over both columns; otherwise you may find your server attempts a filesort on a huge result set, which can cause big problems.
Giving your users an "explain query" option would be a good first step. If they know enough SQL to perform custom queries then they should be able to analyse EXPLAIN in order to best execute their query (or at least realise that a given query will take ages).
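For instance, an "explain query" feature could simply prepend EXPLAIN to whatever the user submits; with the sample query from the question it would look like this:

-- Show the optimizer's plan without executing the query, so users
-- can spot missing indexes or full table scans before running it.
EXPLAIN
SELECT tbl1.a, tbl2.b, SUM(tbl3.c)
FROM tbl1
JOIN tbl2 ON tbl1.id = tbl2.id
JOIN tbl3 ON tbl1.id = tbl3.id
WHERE tbl1.d > 0
GROUP BY tbl1.a, tbl2.b;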
So, going further with my idea, I propose you segment your data into well-identified views. You used abstract names so I can't reuse your business model, but I'll take a fictional example.
Say you have 3 tables:
customer (gender, social category, date of birth, ...)
invoice (date, amount, ...)
product (price, date of creation, ...)
You would create a sort of materialized view for each specific segment. It's like adding a business layer on top of the raw data representation layer.
For example, we could identify the following segments:
seniors having at least 2 invoices
invoices of 2013 with more than 1 product
How to do that? And how to do that efficiently? Regular views won't help your problem because they will have poor explain plans on random queries. What we need is a real physical representation of these segments. We could do something like this:
CREATE TABLE MV_SENIORS_WITH_2_INVOICES AS
SELECT ... /* select from the existing tables */
;
/* add indexes: */
ALTER TABLE MV_SENIORS_WITH_2_INVOICES ADD CONSTRAINT...
... etc.
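To make that sketch concrete, it could look like the following (all column names, such as date_of_birth and customer_id, are assumptions based on the fictional schema above):

CREATE TABLE MV_SENIORS_WITH_2_INVOICES AS
SELECT c.id, c.name, c.date_of_birth
FROM customer c
JOIN invoice i ON i.customer_id = c.id
WHERE c.date_of_birth <= '1955-01-01'    -- hypothetical "senior" cutoff
GROUP BY c.id, c.name, c.date_of_birth
HAVING COUNT(*) >= 2;

-- Indexes make queries on the segment fast.
ALTER TABLE MV_SENIORS_WITH_2_INVOICES ADD PRIMARY KEY (id);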
So now your users just have to query MV_SENIORS_WITH_2_INVOICES instead of the original tables. Since there are fewer records, and probably more useful indexes, performance will be better.
We're done! Oh wait, no :-)
We need to refresh this data, a bit like a FAST REFRESH in Oracle. MySQL does not have (not that I know of... correct me, someone?) a similar system, so we have to create triggers for that.
CREATE TRIGGER ... AFTER INSERT ON `invoice`
... /* insert the data into MV_SENIORS_WITH_2_INVOICES if it matches the segment */
END;
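Fleshed out, such a trigger might look like this (column names are assumptions, and symmetrical triggers would be needed for UPDATE and DELETE):

DELIMITER //
CREATE TRIGGER trg_invoice_ai AFTER INSERT ON invoice
FOR EACH ROW
BEGIN
    -- Re-evaluate the segment for this one customer only.
    DELETE FROM MV_SENIORS_WITH_2_INVOICES WHERE id = NEW.customer_id;
    INSERT INTO MV_SENIORS_WITH_2_INVOICES (id, name, date_of_birth)
    SELECT c.id, c.name, c.date_of_birth
    FROM customer c
    JOIN invoice i ON i.customer_id = c.id
    WHERE c.id = NEW.customer_id
      AND c.date_of_birth <= '1955-01-01'
    GROUP BY c.id, c.name, c.date_of_birth
    HAVING COUNT(*) >= 2;
END//
DELIMITER ;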
Now we're done!
I have a few MySQL tables; they have around 300 columns and 100 million rows each. These store data from log files, hence the size. I am using the InnoDB engine. A few queries involving joins of these tables simply do not work. I tried adding indexes to them, but the index-creation statements never finish.
I wanted to know if there is any other way to speed up performance, or some way to make the 'create index' work on the tables?
Thank you.
Creating an index takes time, proportional to the number of rows in the table. 100 million rows is quite a lot for a MySQL table. It will probably take many hours to create an index on that table. Exactly how long varies, based on other factors including your server hardware, the data type of the columns you are creating the index for, other current load on the database, etc.
One tool that can help you is pt-online-schema-change. It actually takes longer to build the index, but you can continue to read and write the original table while it's working. Test with a smaller table so you get some experience with using this tool.
You can view a webinar about this tool here: Zero-Downtime Schema Changes in MySQL (free to view, but requires registration).
Another technique is to create an empty table like your original, create the index in that table, and then start copying data from your original table into the new table gradually. If this is a log table, it's likely that you write to the table more than you read from it, so you can probably swap the tables and start logging new events to the new one immediately, backfilling the old rows over time.
A tool like pt-archiver can help you to copy data gradually without putting too much load on the server. Simply doing INSERT INTO... SELECT is not good for your database server's health if you try to copy 100 million rows in one transaction. It also puts locks on the original table. pt-archiver works by copying just a bite-sized chunk of rows at a time, so it avoids the high cost of such a large transaction.
If you use an auto-increment primary key, take care to adjust its value to be higher than the max value in the original table before you let log events start writing to the new table, so you don't accidentally use the same id values more than once.
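A minimal sketch of that adjustment (table names are assumptions):

-- Find the highest id in the original table...
SELECT MAX(id) FROM old_log_table;

-- ...and start the new table's counter safely above it
-- (using a placeholder value here).
ALTER TABLE new_log_table AUTO_INCREMENT = 100000001;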
Use
create table newtable like oldtable;
Then apply the index to the newtable while it is empty.
Then
insert into newtable select * from oldtable;
This may also take a long time to finish.
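Put together, the whole sequence might look like this (names are placeholders, and the final swap assumes writes are paused or redirected during the copy):

CREATE TABLE newtable LIKE oldtable;

-- Add the index while the table is empty, which is cheap.
ALTER TABLE newtable ADD INDEX idx_example (some_column);

-- Copy the data; on 100 million rows this is the slow step.
INSERT INTO newtable SELECT * FROM oldtable;

-- Atomically swap the tables once the copy is complete.
RENAME TABLE oldtable TO oldtable_bak, newtable TO oldtable;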
The Bug
On MySQL tables using the MyISAM engine there are some problems with creating a new secondary index.
A known issue with the MyISAM engine on some MySQL versions, like 5.7.24 (shipped with Wamp, for instance), means that creating an index not only causes a table scan, as expected, but needs a full table rebuild. If you just drop an index, the table is also rebuilt :-(
Ref: https://bugs.mysql.com/bug.php?id=93530
Alternative
Sometimes you cannot upgrade MySQL, or can't ask your customers to do so in order to run your solution. Changing the engine to InnoDB can lead to other problems if you don't need all the features InnoDB provides.
The Index Table
So, there is an approach that consists of creating an "index table" manually, with the benefit that you can filter the records you really need, as I explain below.
Imagine you have 100M records of companies of the world in a table, where about 30M are companies from the USA and 10M from Canada, plus other companies.
Each company has a COUNTRY and a STATE field that you want to index, because you need to search for US or Canadian companies by their state.
In MySQL, if you create an index on Country and State, all 100M records will be indexed, even those with NULL states.
To solve this you create an index-table and a real index, like this:
create table index_tb_companies (
company_id int unique,
company_country char(2), -- US/CA
company_state char(2) -- AL/AK/.../WI/WY
);
create index index_tb_companies_index
on index_tb_companies (company_country, company_state);
Fill the Index Table
Now you can import the original data into the index-table, with a simple INSERT INTO or REPLACE INTO using a filtered SELECT.
replace into index_tb_companies(
    company_id, company_country, company_state)
(select
    company_id, company_country, company_state
from original_company_table
where company_country in ('US', 'CA')
);
This will take a while, since you probably don't have an index on country yet and it needs a full table scan. But the final index-table will be smaller than a full MySQL index, since only US/CA data will be in there.
How to Select
Now, the final part is to make use of the index-table in your specific reports on US and CA companies, since other countries are not covered by the index.
select o.*
from
original_company_table o INNER JOIN
index_tb_companies idx ON idx.company_id = o.company_id
where
idx.company_country = 'US'
and idx.company_state = 'NY'
This approach is particularly good when you want to index a tiny portion of your data on MySQL, so the index size is small.
Partial Index
Other databases, like PostgreSQL, have "partial indexes": you can create a regular index and pass a WHERE clause at its creation.
PG Partial Indexes: https://www.postgresql.org/docs/8.0/indexes-partial.html
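For comparison, in PostgreSQL the whole index-table workaround above collapses into a single statement:

-- PostgreSQL partial index: only US/CA rows are indexed.
CREATE INDEX idx_companies_us_ca
ON original_company_table (company_country, company_state)
WHERE company_country IN ('US', 'CA');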
This is not exactly a problem; it's about site optimization. I have 110K hotel records. When I run a SELECT query, it pulls data from all 110K records.
If I search for hotels with a rating above 3 stars, a price between $100 and $300, and located in Mexico City, suppose I get 45 matching results.
Is there any way, when I add more refinements, to pull data from only those 45 matches instead of going through the 110K records again?
The key is indexes, my friend... make sure you have indexes on all columns used in the WHERE clause, and this will reduce the number of rows examined when selecting...
On a side note... 110k rows is still an extremely small data set for MySQL, so it shouldn't pose much of a performance issue even without correct indexing on the table.
It depends more on how often your data updates.
See:
The MySQL Query Cache
Query Caching in MySQL
Caching question MySQL or Filesystem
I am saying: is there any other way, when I add more refinement, to pull data from only the 45 matches and not go through the 110K records?
Then make a view of those 45 rows and apply your queries to it.
Create a view using a query:
Create view refined as select * from ....
and after that run further select queries against that view, like:
Select * from refined where ...
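Concretely, for the hotel example (table and column names are assumptions):

-- The view captures the first refinement...
Create view refined as
select * from hotels
where stars > 3 and price between 100 and 300
  and city = 'Mexico City';

-- ...and further refinements are queries against the view:
Select * from refined where name like '%Plaza%';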
First of all, I tend to agree with Brian: indexes matter.
Check what kinds of queries are most frequent, and construct multi-column indexes on the table accordingly. Note that the order of columns in an index matters (the index is a tree, and the first column is at its root, so if your query does not use that column, the whole tree is useless), as the sketch below shows.
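A sketch of the column-order effect, using assumed names from the hotel example:

-- This index can serve filters on (city), (city, stars), or
-- (city, stars, price)...
CREATE INDEX idx_city_stars_price ON hotels (city, stars, price);

-- ...so this query can use it:
SELECT * FROM hotels
WHERE city = 'Mexico City' AND stars > 3;

-- ...but this one cannot, because it skips the leading column:
SELECT * FROM hotels WHERE stars > 3 AND price < 300;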
Enable the slow query log to see which queries actually take long (if any) or don't use indexes, so you can improve your indexes over time.
Having said this, the query cache is a real performance boost if your table data is mostly read. Here is a useful article on the MySQL query cache.
BACKGROUND
I'm working with a MySQL InnoDB database with 60+ tables, and I'm creating different views in order to make dynamic queries fast and easy in the code. I have a couple of views with INNER JOINs (no many-to-many relationships) of 20 to 28 tables, SELECTing 100 to 120 columns, with a row count below 5,000, and they work lightning fast.
ACTUAL PROBLEM
I'm creating a master view with INNER JOINs (no many-to-many relationships) of 34 tables, SELECTing about 150 columns, with a row count below 5,000, and it seems that this is too much: a single SELECT takes forever. I'm wondering if I've hit some kind of view-size limit, and whether there is any way of increasing it, or any trick that would get me past this apparent limit.
It's important to note that I'm NOT USING aggregate functions, because I know about their negative impact on performance, which, by the way, I'm very concerned about.
MySQL does not use the "System R algorithm" (used by PostgreSQL, Oracle, and SQL Server, I think), which considers not only different merge algorithms (MySQL only has nested-loop joins, although you can fake a hash join by using a hash index), but also the possible orders of joining the tables and possible index combinations. The result seems to be that parsing and execution of queries can be very quick up to a point, but performance can drop off dramatically once the optimizer chooses the wrong path through the data.
Take a look at your EXPLAIN plans and try to see whether a) the drop in performance is due to the number of columns you are returning (just do SELECT 1 or something), or b) it is due to the optimizer choosing a table scan instead of using an index.
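Both tests are quick to run (the view name master_view is a placeholder):

-- (a) Is the cost in returning ~150 columns? Time this; if it is
--     still slow, the column count is not the problem.
SELECT 1 FROM master_view;

-- (b) Inspect join order and index usage; look for "ALL" in the
--     type column, which signals a full table scan.
EXPLAIN SELECT * FROM master_view;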
A view is just a named query. When you refer to a view in MySQL, it just replaces the name with the actual query and runs it.
It seems that you're confusing it with materialized views, which are tables created from a query. Afterwards you can query that table and don't have to run the original query again.
Materialized views are not implemented in MySQL.
To improve the performance, try using the EXPLAIN keyword to see where you can optimize your query/view.