MySQL FULLTEXT index, search locks table

Consider this scenario: my database table has 300,000 rows and has a FULLTEXT index. Whenever a search is run, it locks the table and doesn't allow anyone else to log in to the portal.
Any advice on how to get things sorted out here would be much appreciated.

Does logging on perform a write to the table, e.g. a 'last visit' time?
If so, you can expect behaviour like this, because MyISAM writes take a lock over the entire table. Usually this is avoided by not using noddy MyISAM and going to InnoDB instead, which has row-level locking (amongst other desirable database features).
The problem, of course, is that you only get fulltext search with MyISAM.
So you'll need to split your tables up. If you can keep the read-heavy and fulltext stuff in a different table to the stuff that needs writing (but linked using the same primary key), you can probably make it so that the two operations don't affect each other.
Better, migrate the bulk of the table to InnoDB, leaving only a fulltext field in MyISAM. Everything except fulltext searches can then steer clear of the MyISAM table, and use only the InnoDB table which exhibits much better locking performance. Personally, I now tend to store everything in the InnoDB table, including the text, and store a second copy of the text in the MyISAM table purely for fulltext searchbait purposes; this simplifies queries and code and brings the advantages of InnoDB's consistency to the text content, and I also use it to process the searchbait to get stemming and other features MySQL's fulltext doesn't normally support. But it does mean you have to spend a lot more space on storage.
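A minimal sketch of that split, using made-up table and column names (posts, posts_search and searchbait are illustrative, not anything from the question):

    -- Main table in InnoDB: all normal reads and writes go here.
    CREATE TABLE posts (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        author_id INT UNSIGNED NOT NULL,
        title VARCHAR(255) NOT NULL,
        body TEXT NOT NULL
    ) ENGINE=InnoDB;

    -- Searchbait copy in MyISAM: used only for FULLTEXT queries,
    -- linked by the same primary key.
    CREATE TABLE posts_search (
        id INT UNSIGNED NOT NULL PRIMARY KEY,
        searchbait TEXT NOT NULL,
        FULLTEXT KEY ft_searchbait (searchbait)
    ) ENGINE=MyISAM;

    -- A search touches only the MyISAM table for the fulltext match,
    -- then joins back to the InnoDB table for everything else.
    SELECT p.*
    FROM posts_search s
    JOIN posts p ON p.id = s.id
    WHERE MATCH(s.searchbait) AGAINST ('some query');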
You can also improve matters by cutting down the number of writes. For example, if it is a 'last visit' timestamp you're writing, you can avoid writing it unless, say, a minute has passed since the previous value, on the basis that no-one needs to know the exact second someone last accessed the site.
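A sketch of that throttled write, assuming a hypothetical users table with a last_visit column:

    -- Only write if the stored value is missing or at least a minute old,
    -- so repeated page hits don't each take a write lock.
    UPDATE users
    SET last_visit = NOW()
    WHERE id = 123
      AND (last_visit IS NULL OR last_visit < NOW() - INTERVAL 1 MINUTE);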

If you use an external search engine or MySQL search plug-in such as Lucene or Sphinx, they should be able to read and index without locking the table. They store a local copy of the indexed records, so they don't have to read the table very often, and never need to write to it.

Related

MySQL XtraDB build indexes by sort instead of via insertion

This post says:
If you’re running Innodb Plugin on Percona Server with XtraDB you get benefit of a great new feature – ability to build indexes by sort instead of via insertion
However, I could not find any info on this. I'd like the ability to reorganise how a table is laid out physically, similar to PostgreSQL's CLUSTER command or MyISAM's "ALTER TABLE ... ORDER BY". For example, the table "posts" has millions of rows in random insertion order, most queries use "where userid = ", and I want rows belonging to one user to be stored physically close together on disk, so that common queries require little IO. Is it possible with XtraDB?
Clarification concerning the blog post
The feature you are basically looking at is fast index creation. This feature speeds up the creation of secondary indexes on InnoDB tables, but it is only used in very specific cases. For example, the feature is not used during OPTIMIZE TABLE, which can therefore be dramatically sped up by dropping the indexes first, then running OPTIMIZE TABLE, and then recreating the indexes with fast index creation (this is what the post you linked is about).
Percona Server adds some automation for the cases that otherwise have to be handled manually as above, in the form of a system variable named expand_fast_index_creation. If activated, the server uses fast index creation not only in the very specific built-in cases, but in all cases where it might help, such as the OPTIMIZE TABLE problem mentioned in the linked blog article.
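A rough sketch of the manual workflow, with made-up table and index names; the variable at the end is Percona Server only, as described above:

    -- Manual approach: drop the secondary indexes, rebuild the table,
    -- then recreate the indexes so fast index creation kicks in.
    ALTER TABLE posts DROP INDEX idx_user, DROP INDEX idx_created;
    OPTIMIZE TABLE posts;
    ALTER TABLE posts
        ADD INDEX idx_user (user_id),
        ADD INDEX idx_created (created_at);

    -- Percona Server automation: let the server apply the same trick itself.
    SET expand_fast_index_creation = ON;  -- scope/availability depends on the Percona Server version
    OPTIMIZE TABLE posts;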
Concerning your question
What you are actually asking is whether it is possible to store InnoDB tables in a custom order, to speed up specific kinds of queries by exploiting locality on disk.
This is not possible. InnoDB rows are stored in pages, ordered by the clustered index (which is essentially the primary key). The rows/pages may end up in a chaotic order, in which case you can OPTIMIZE TABLE the InnoDB table. This command actually recreates the table in primary key order, which gathers rows that are close in primary key order onto the same or neighbouring pages.
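For illustration (the table name is made up; for InnoDB, OPTIMIZE TABLE amounts to a full table rebuild):

    -- Rebuilds the InnoDB table, laying rows out in primary key order.
    OPTIMIZE TABLE posts;
    -- Classic equivalent rebuild for InnoDB:
    ALTER TABLE posts ENGINE=InnoDB;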
That is all you can force InnoDB to do. You can read the manual section about the clustered index, another manual page that gives a definitive answer that this is not possible ("ORDER BY does not make sense for InnoDB tables because InnoDB always orders table rows according to the clustered index."), and the same question on dba.stackexchange, whose answers might interest you.

How should I set up these tables for searching?

My PHP site is an online store with about 5k products. Products belong to a vendor, a category, and possibly a subcategory. Each of those items has a name and the products have descriptions.
The search queries we've set up work wonderfully, but tend to run pretty slow. They range between 0.20s and 30s (yes 30 seconds). We've optimized like crazy and I'm starting to think we're out of room to improve on that front, so we're caching them and that's making life a lot easier.
But when they do run they still kill the server, because of what appears to be all the table locking that comes with MyISAM.
So on to my question: Is there a way for us to use InnoDB (row-level locking) and still maintain FULLTEXT? Should we move our DB offsite and use a service like DB2? Is there some other search engine type software we should use instead?
Any help is greatly appreciated :)
InnoDB does have full-text indexing now: http://blogs.innodb.com/wp/2011/07/innodb-full-text-search-tutorial/
but since it's basically brand new, most MySQL installs will not support it yet.
The standard workaround is to have a 'mirror' MyISAM table that contains copies of the searchable data with a fulltext index. You then join the original InnoDB table against the MyISAM copies, with fulltext searches on the MyISAM fields and regular 'WHERE' clauses on the InnoDB columns.
With appropriate triggers on the InnoDB table, there's no reason the MyISAM copies should go stale or become incorrect; alternatively, you can simply rebuild them on a schedule so that the staleness window matches the rebuild interval.
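A rough sketch of that setup, assuming a hypothetical InnoDB products table with name, description and vendor_id columns (all names here are illustrative):

    -- Mirror table: MyISAM copy of just the searchable columns, same primary key.
    CREATE TABLE products_search (
        id INT UNSIGNED NOT NULL PRIMARY KEY,
        name VARCHAR(255) NOT NULL,
        description TEXT NOT NULL,
        FULLTEXT KEY ft_name_desc (name, description)
    ) ENGINE=MyISAM;

    -- Triggers on the InnoDB table keep the mirror current.
    CREATE TRIGGER products_search_ai AFTER INSERT ON products FOR EACH ROW
        INSERT INTO products_search (id, name, description)
        VALUES (NEW.id, NEW.name, NEW.description);

    CREATE TRIGGER products_search_au AFTER UPDATE ON products FOR EACH ROW
        UPDATE products_search
        SET name = NEW.name, description = NEW.description
        WHERE id = NEW.id;

    CREATE TRIGGER products_search_ad AFTER DELETE ON products FOR EACH ROW
        DELETE FROM products_search WHERE id = OLD.id;

    -- Search: fulltext MATCH on the MyISAM mirror, other WHERE clauses on InnoDB.
    SELECT p.*
    FROM products_search s
    JOIN products p ON p.id = s.id
    WHERE MATCH(s.name, s.description) AGAINST ('blue widget')
      AND p.vendor_id = 42;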

Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?

So, one of my MySQL tables, which uses the InnoDB storage engine, will contain multi-billion rows (with potentially no limit to how many will be inserted).
Can you tell me what sort of optimizations I can do to help speed things up?
Because with a few million rows it is already starting to get slow.
Of course you may suggest using something else, but the only options I have are PostgreSQL and SQLite3, and I've been told that SQLite3 is not a good choice for this.
As for PostgreSQL, I have absolutely no idea how it performs, as I've never used it.
I expect at least about 1,000-1,500 inserts per second into that table.
A simple answer to your question would be: yes, InnoDB would be a perfectly good choice for a multi-billion-row data set.
There is a host of optimizations that are possible.
The most obvious optimization is a large buffer pool; the buffer pool is the single most important setting for InnoDB because InnoDB caches both data and indexes in it. If you have a dedicated MySQL server with only InnoDB tables, you should allow InnoDB to use up to 80% of the available RAM.
Another very important optimization is having proper indexes on the table (keeping the data access/update pattern in mind), both primary and secondary. (Remember that the primary key columns are automatically appended to every secondary index.)
With InnoDB there are some extra goodies, such as protection from data corruption, auto-recovery etc.
To increase write performance, set up your transaction (redo) log files to total up to 4 GB.
One other thing that you can do is partition the table.
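A sketch of what that could look like, with an invented append-only table partitioned by date range:

    -- Range-partition a large table so queries and maintenance touch
    -- only a few partitions instead of the whole table.
    CREATE TABLE events (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        created_at DATETIME NOT NULL,
        payload VARBINARY(255) NOT NULL,
        PRIMARY KEY (id, created_at)   -- the partition column must be part of every unique key
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p2011q1 VALUES LESS THAN (TO_DAYS('2011-04-01')),
        PARTITION p2011q2 VALUES LESS THAN (TO_DAYS('2011-07-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );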
You can eke out more performance by setting binlog_format to "row" and setting innodb_autoinc_lock_mode to 2 (which ensures InnoDB does not take table-level locks when inserting into auto-increment columns).
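A hedged my.cnf sketch pulling the settings above together; the actual values are examples only and depend entirely on your hardware and MySQL version (in 5.5 the combined redo log size must stay below 4 GB):

    [mysqld]
    # ~80% of RAM on a dedicated InnoDB-only server (example: 24G out of 32G)
    innodb_buffer_pool_size   = 24G
    # Larger redo logs for write-heavy loads (2 files x 1900M, just under 4G total)
    innodb_log_file_size      = 1900M
    innodb_log_files_in_group = 2
    # Row-based binary logging
    binlog_format             = ROW
    # Interleaved auto-increment locking: no table-level AUTO-INC locks on inserts
    innodb_autoinc_lock_mode  = 2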
If you need any specific advice you can contact me, I would be more than willing to help.
Optimizations
Take care not to have too many indexes. They are expensive when inserting.
Make your datatypes fit your data, as tight a fit as you can (so don't go saving IP addresses in a TEXT or a BLOB, if you know what I mean; see the sketch after these points). Look into VARCHAR vs CHAR. Don't forget that because VARCHAR is more flexible, you are trading in some things. If you know a lot about your data it might help to use CHAR, or it might be clearly better to use VARCHAR, etc.
Do you read at all from this table? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
If you have big inserts (aside from the sheer number of inserts), make sure your IO is actually quick enough to handle the load.
I don't think there is any reason MySQL wouldn't support this. Things that can slow you down going from "thousands" to "millions" to "billions" of rows are things like the aforementioned indexes. There is, as far as I know, no "MySQL is full" problem.
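To make the datatype point above concrete, a small sketch (invented table; IPv4 addresses stored as unsigned integers rather than text):

    CREATE TABLE access_log (
        id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        ip INT UNSIGNED NOT NULL,          -- 4 bytes instead of a 7-15 character string
        country CHAR(2) NOT NULL,          -- fixed width: CHAR is a tight fit
        user_agent VARCHAR(255) NOT NULL   -- variable width: VARCHAR
    ) ENGINE=InnoDB;

    INSERT INTO access_log (ip, country, user_agent)
    VALUES (INET_ATON('192.0.2.10'), 'NL', 'Mozilla/5.0');

    SELECT INET_NTOA(ip), country FROM access_log WHERE ip = INET_ATON('192.0.2.10');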
Look into partial indexes. From Wikipedia (quickest source I could find, didn't check the references, but I'm sure you can manage :)
MySQL as of version 5.4 does not support partial indexes.[3] In MySQL, the term "partial index" is sometimes used to refer to prefix indexes, where only a truncated prefix of each value is stored in the index. This is another technique for reducing index size.[4]
No idea on the MySQL/InnoDB part (I'd assume it'll cope). But if you end up looking at alternatives, PostgreSQL can manage a DB of unlimited size on paper. (At least one 32TB database exists according to the FAQ.)
Can you tell me what sort of optimizations I can do to help speed things up?
Your mileage will vary depending on your application. But with billions of rows, you're at least looking at partitioning your data in order to work on smaller tables.
In the case of PostgreSQL, you'd also look into creating partial indexes where appropriate.
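For reference, a partial index in PostgreSQL looks like this (table and predicate are invented):

    -- Index only the rows most queries actually touch.
    CREATE INDEX idx_orders_pending ON orders (customer_id) WHERE status = 'pending';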
You may want to have a look at:
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
http://forums.whirlpool.net.au/archive/954126
If you have a very large table (billions of records) and need to data-mine the table (queries that read lots of data), MySQL can slow to a crawl.
Large databases (200+ GB) are fine, but they are bound by IO, temporary tables spilling to disk, and multiple other issues when attempting to read large sets of rows that don't fit in memory.

InnoDB or MyISAM - Why not both?

I've read various threads about which is better, InnoDB or MyISAM. It seems the debates are about using one or the other. Is it not possible to use both, depending on the table?
What would be the disadvantages of doing this? As far as I can tell, the engine can be set in the CREATE TABLE statement. Therefore, certain tables which are often read could be set to MyISAM, but tables that need transaction support could use InnoDB.
You can have both MyISAM and InnoDB tables in the same database. What you'll find, though, with very large tables is that MyISAM causes table-lock issues: it locks the whole table for a single transaction, which is very bad if you have many users on your website. E.g. if a user searches for something on your site and the query takes minutes to complete, no other user can use the site during that period because the whole table is locked.
InnoDB, on the other hand, uses row-level locking, meaning that only the row being touched is locked during a transaction. InnoDB can be slower at searches because it doesn't offer fulltext search like MyISAM does, but that isn't a big problem when you compare it to MyISAM's table-level locking. If you use InnoDB, like many large sites do, you could use a server-side search engine like Sphinx for fulltext searches, or even use a MyISAM table to do the searching, as f00 suggested. I'd personally recommend InnoDB, mainly because of the row-level locking, but also because you can implement fulltext searching in many other ways.
Basically, if you have a message board application with lots of selects, inserts and updates, InnoDB is probably the right general choice.
But if you're not building something like that (or any other thing with registered users) and you're working mostly with static content (or more reads than writes), then you could use MyISAM.
Yes indeed, you may use both in the same database; you can choose the engine for each table separately.
In short, InnoDB is good if you are working on something that needs a reliable database that can handle a lot of INSERT and UPDATE statements.
MyISAM is good if you need a database that will mostly be serving a lot of read (SELECT) statements rather than writes (INSERT and UPDATE), considering its drawback of table-level locking.
You may want to check out:
Pros and Cons of InnoDB
Pros and Cons of MyISAM
You don't choose InnoDB or MyISAM at the database level, but at the table level. So within one database you could have some tables running the InnoDB engine and some running MyISAM. As you pointed out, you could use InnoDB for the tables that require transactions etc., and MyISAM where you need other features such as fulltext searching.
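For example (made-up tables, just to show the per-table ENGINE clause):

    -- Transactional data: InnoDB.
    CREATE TABLE orders (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        user_id INT UNSIGNED NOT NULL,
        total DECIMAL(10,2) NOT NULL
    ) ENGINE=InnoDB;

    -- Read-heavy, fulltext-searched data: MyISAM.
    CREATE TABLE articles (
        id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255) NOT NULL,
        body TEXT NOT NULL,
        FULLTEXT KEY ft_title_body (title, body)
    ) ENGINE=MyISAM;

    -- An existing table can be switched later:
    ALTER TABLE articles ENGINE=InnoDB;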

Does this case call for InnoDB or MyISAM?

I'm doing a search on a table (with a few inner joins) which takes at most 5 seconds to run (7.5 million rows). It's a MyISAM table and I'm not using full-text indexing on it, as I found no difference in speed between MATCH AGAINST and a normal "LIKE" statement in this case, from what I can see.
I'm now "suffering" from locked tables and queries running for several minutes before they complete because of it.
Would it benefit me at all to try and switch the engine to InnoDB? Or does that only help if I need to insert or update rows... not just select them? This whole table-locking thing is busy grinding my balls...
InnoDB supports row-level locking instead of table-level locking... so that should alleviate your problem (although I'm not sure it will remove it entirely).
Your best bet would be to use a dedicated search system (like Sphinx, Lucene, or Solr)
The difference between row-level and table-level locking is only important for insert and update queries. If you mostly do selects (so inserts/updates don't happen often enough to lock the table), the difference will not be all that big (even though in recent benchmarks InnoDB seems to outperform MyISAM).
Something else you could think about is reorganising your data structure, perhaps adding an additional lookup table with 'tags' or 'keywords', or implementing a more efficient full-text engine as suggested by webdestroya.
Last but not least, I'm also surprised that you got similar results with FULLTEXT vs LIKE. This can happen if the fields you're searching are not really wide, in which case maybe a standard B-tree index with an = search would be enough?
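For example (the column and index names are invented), an ordinary index plus an equality or prefix comparison:

    ALTER TABLE products ADD INDEX idx_sku (sku);

    -- Uses the B-tree index:
    SELECT * FROM products WHERE sku = 'ABC-123';
    -- A leading-prefix LIKE can also use it; a '%...' pattern cannot:
    SELECT * FROM products WHERE sku LIKE 'ABC-%';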