How to use FULLTEXT index in MySQL?

Suppose that I have two tables stuff and search_index. For various reasons I would prefer to generate a custom index table with schema along the lines of the following.
I believe that this has to be a separate table as it would require MyISAM storage engine for FULLTEXT support. All of my other tables are InnoDB for transaction support.
search_index (MyISAM):

CREATE TABLE search_index (
    stuff_id BIGINT NOT NULL PRIMARY KEY,
    keywords TEXT,  -- not the same text as in `stuff`; optimised & space-delimited
    FULLTEXT KEY ft_keywords (keywords)
) ENGINE=MyISAM;
How often should the search_index table be updated?
1. Whenever stuff records are created or updated.
2. Periodically scan "dirty" stuff records and update search_index accordingly.
3. Other...
My view is that option 1 would be easier to maintain, and searches would reflect changes immediately, but at the cost of an immediate re-index inside the database on every write. Option 2 would not be as fresh, but all reindexing could happen in one hit. Is this true?
How efficient is MySQL at inserting/updating records that have FULLTEXT indexing?
Additional note: I am trying to keep the schema as portable as possible (different RDBMS drivers with PDO).
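For concreteness, option 1 above could be implemented with triggers. This is only a sketch: it assumes stuff has columns id and body (names are illustrative), and it copies the raw text where a real implementation would store the optimised keyword list. Note that writes to a MyISAM table made from a trigger are not rolled back if the surrounding InnoDB transaction aborts.

DELIMITER //
CREATE TRIGGER stuff_after_insert AFTER INSERT ON stuff
FOR EACH ROW
BEGIN
    -- REPLACE keeps exactly one row per stuff_id (the primary key)
    REPLACE INTO search_index (stuff_id, keywords) VALUES (NEW.id, NEW.body);
END//
CREATE TRIGGER stuff_after_update AFTER UPDATE ON stuff
FOR EACH ROW
BEGIN
    REPLACE INTO search_index (stuff_id, keywords) VALUES (NEW.id, NEW.body);
END//
DELIMITER ;

Searches then go against the index table:

SELECT stuff_id FROM search_index WHERE MATCH(keywords) AGAINST ('some words');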

Related

Create index and then insert or insert and then create index?

I'm inserting a big volume of data into a table in MySQL, and I need to create an index to access the data quickly. However, I would like to know if there is a difference (in performance) between these scenarios:
Create an index and then insert all data
Insert all data and then create an index
Thanks in advance!
For the InnoDB storage engine, it is faster to specify the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.
This is because if a clustered index (PRIMARY KEY) is not defined on the table, InnoDB uses a hidden 6-byte, auto-incremented row ID as the clustered index. If a PRIMARY KEY is later specified, the entire table will need to be rebuilt.
For secondary indexes (i.e. non-cluster indexes) with InnoDB, it is usually faster to insert data without secondary indexes defined, and then build the secondary indexes after the data is loaded.
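A minimal sketch of that pattern (table, column, and index names are illustrative):

CREATE TABLE t (
    id   BIGINT NOT NULL PRIMARY KEY,  -- clustered index defined before the load
    col1 INT,
    col2 VARCHAR(64)
) ENGINE=InnoDB;

-- ... bulk load the data here ...

-- build secondary indexes only after the data is loaded
ALTER TABLE t ADD INDEX idx_col1 (col1), ADD INDEX idx_col2 (col2);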
FOLLOWUP
As far as the speed of loading a table (in particular, a table that is truncated/emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL, but with other RDBMS such as Oracle.
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
For a table containing millions of rows where we're adding a couple hundred rows, dropping and rebuilding indexes is likely going to be a lot slower, because of all of the extra work to re-index all of the existing rows. It would be faster to do the index maintenance while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see by using LOAD DATA in place of INSERT statements, nor using multi-row INSERT statements vs a series of singleton INSERT statements.
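For example (a sketch; the table, columns, and file path are illustrative, and LOAD DATA INFILE is subject to the server's secure_file_priv setting), both of these typically beat a long series of singleton INSERT statements:

-- one multi-row INSERT instead of many singletons
INSERT INTO t (id, col1) VALUES (1, 10), (2, 20), (3, 30);

-- or bulk load from a file
LOAD DATA INFILE '/tmp/t.csv' INTO TABLE t
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';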

In MySQL/MariaDB, are indexes stored at the database level or at the table level?

I'm in the process of moving a SQL Server database to MariaDB.
As part of that I'm now doing the index naming, and I have to modify some names because they are longer than 64 characters.
That got me wondering: in MariaDB, are indexes stored at the table level, or at the database level as in SQL Server?
To rephrase the question another way: do index names need to be unique per database or per table?
The storage engine I'm using is InnoDB.
Index names (in MySQL) are almost useless. About the only use is for DROP INDEX, which is rarely done. So, I recommend spending very little time on naming indexes. The names only need to be unique within the table.
The PRIMARY KEY (which has no other name than that) is "clustered" with the data. That is, the PK and the data are in the same BTree.
Each secondary key is a separate BTree. The BTree is sorted according to the column(s) specified. The leaf node 'records' contain the columns of the PK, thereby providing a way to get to the actual record.
FULLTEXT and SPATIAL indexes work differently.
PARTITIONing... First of all, partitioning is rarely useful. But if you have any partitioned tables, then here are some details about indexes. A Partitioned table is essentially a collection of sub-tables, each identical (including index names). There is no "global index" across the table; each index for a sub-table refers only to the sub-table.
Keys belong to a table, not a database.
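To illustrate (names are arbitrary), the same index name can be used in two tables without conflict:

CREATE TABLE a (x INT, INDEX idx_x (x)) ENGINE=InnoDB;
CREATE TABLE b (x INT, INDEX idx_x (x)) ENGINE=InnoDB;  -- same index name, no conflict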

Why does my MySQL table need optimizing so frequently?

I have a MySQL table with 12 columns, one primary key, and two unique keys. I have more or less 86,000 rows/records in this table.
I use this mysql code:
INSERT INTO table (col2, col3, ..., col12) VALUES ($val2, $val3, ..., $val12) ON DUPLICATE KEY UPDATE col2 = VALUES(col2), col3 = VALUES(col3), ..., col12 = VALUES(col12)
When I view the structure of this table from cPanel phpMyAdmin, I can see an 'Optimize table' link just below the table's index information. If I click the link, the table is optimized.
But my question is why the 'Optimize table' link appears so frequently for this table (it reappears within 3-4 days), while the other tables of this database show the link only once a month, or even once every two months or more.
As I am not deleting rows from this table, just inserting (and updating when a duplicate key is found), why is optimization required so frequently?
Short answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and stores them in separate index files. Every time you insert a lot of data, these indexes are changed, and that is why you need to optimize your table: to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space compared to InnoDB.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to InnoDB over MyISAM, but that is another topic.
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds entries to the leaf nodes of the index B-tree. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing the table recalculates these statistics after the defragmenting and index rebuilding are done.
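For reference, the statements involved (the table name is illustrative): OPTIMIZE TABLE defragments the data and rebuilds the indexes, while ANALYZE TABLE refreshes only the key-distribution statistics.

OPTIMIZE TABLE mytable;
ANALYZE TABLE mytable;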
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at insert time) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans). Secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case for sticking with MyISAM. InnoDB has superior read performance due to its clustered indexes, and better update and append performance due to row-level locking versus MyISAM's table-level locking (assuming concurrent users). Also, InnoDB offers ACID guarantees.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.

MySQL InnoDB or MyISAM? Which one is best for this case?

I'm really lost on which type of database engine should I pick for my table.
+-----------------------+
| id | userid | content |
+-----------------------+
Imagine this table. userid holds user IDs which are stored in another table, and some other tables use the id field of this table. Therefore, I thought that setting id as the primary key and userid as a foreign key would speed up the joins. However, if I make my table InnoDB in order to set foreign keys, then I cannot do a FULLTEXT search on content (which is a TEXT field).
So basically, if I switch back to MyISAM to use the FULLTEXT searches, will I have problems when joining, say, 3-4 tables of hundreds of millions of rows?
PS: If there is another viable way to create the tables so that they handle both joins and full-text searches, please tell me, and I can change the table structure as well.
Take a look at the answer for this question: Fulltext Search with InnoDB
In short, MyISAM locks an entire table when you write to it, so that will be bad for performance when you have a lot of writes to the table. The solution is to go with InnoDB tables for the referential integrity, and use a dedicated search engine for the indexing/searching of the content (for example, Lucene).
InnoDB scales better than MyISAM. If you're talking about hundreds of millions of rows, then go for InnoDB and adopt a search engine; AFAIK, FULLTEXT becomes really slow after a certain point. Therefore, go for InnoDB + a search engine of your choice, or the hybrid sketched below.
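If you stay entirely inside MySQL instead, one possible sketch of the hybrid approach is an InnoDB content table mirrored by a MyISAM table that carries the FULLTEXT index (table and column names are illustrative):

SELECT c.id, c.userid
FROM content AS c
JOIN content_search AS s ON s.content_id = c.id
WHERE MATCH(s.content) AGAINST ('search terms' IN NATURAL LANGUAGE MODE);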

When should you choose to use InnoDB in MySQL?

I am rather confused by the fear-mongering here.
I know how to create these tables (see below), but I have no idea why I would. What are they for?
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no)) ENGINE=InnoDB;
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no));
InnoDB is a storage engine in MySQL. There are quite a few of them, and they all have their pros and cons. InnoDB's greatest strengths are:
Support for transactions (giving you the ACID properties).
Row-level locking. Having a more fine-grained locking mechanism gives you higher concurrency compared to, for instance, MyISAM.
Foreign key constraints. These let the database ensure the integrity of the state of the database, and the relationships between tables (see the sketch below).
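As a small illustration of the foreign-key point, using the orders/customer tables from the question: with the constraint in place, MySQL refuses orphan rows.

-- assuming no customer row has cust_no = 9999
INSERT INTO orders (FK_cust_no) VALUES (9999);
-- ERROR 1452 (23000): Cannot add or update a child row:
-- a foreign key constraint fails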
Always. Unless you need to use MySQL's full-text search or InnoDB is disabled in your shared webhost.
InnoDB:
The InnoDB storage engine in MySQL.
InnoDB is a high-reliability and high-performance storage engine for MySQL. Key advantages of InnoDB include:
Its design follows the ACID model, with transactions featuring commit, rollback, and crash-recovery capabilities to protect user data.
Row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent reads increase multi-user concurrency and performance.
InnoDB tables arrange your data on disk to optimize common queries based on primary keys. Each InnoDB table has a primary key index called the clustered index that organizes the data to minimize I/O for primary key lookups.
To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints.
You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement. For example, you can use a join operation to combine data from InnoDB and MEMORY tables in a single query.
InnoDB Limitations:
No full-text indexing (in MySQL versions below 5.6)
Cannot be compressed into a fast, read-only format (as MyISAM tables can be with myisampack)
I think you are confused about two different issues: when to use InnoDB instead of MyISAM, and when to use foreign key (FK) constraints.
As for the first issue, there have been many answers that do a great job of explaining the differences between MyISAM and InnoDB. I'll just reiterate that, as in tvanfosson's quote from an article, MyISAM is better suited for systems with mostly reads. This is because it uses table-level locking instead of row-level locking like InnoDB, so MyISAM can't handle high concurrency as well, plus it's missing features that help with data integrity, such as transactions and foreign keys (again, already mentioned by others).
You don't have to use FK constraints in your data model. If you know what the relationships between your tables are, and your application is free of bugs, then you'll get by without FKs just fine. However, using FKs gives you extra insurance at the database layer, because MySQL won't let your application insert bad data that violates the constraints you created.
In case you aren't clear on why to use primary keys (PK): making a column such as id_order the PK of the orders table means that MySQL won't let you INSERT the same id_order value more than once, because every value in a PK column must be unique.
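For instance (values illustrative):

INSERT INTO orders (id_order) VALUES (100);  -- succeeds
INSERT INTO orders (id_order) VALUES (100);  -- fails:
-- ERROR 1062 (23000): Duplicate entry '100' for key 'PRIMARY'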
A FK would be used on a table that has a dependency on another table. For example, order_items has a dependency on orders (below). id_order_items is the PK of order_items, and you could make id_order_items a FK in the orders table to establish a one-to-many relationship between orders and order_items. Likewise, id_item could be a FK in the order_items table and the PK in the items table, to establish a one-to-many relationship between order_items and items.
Then, what the FK constraint does is prevent you from adding an id_item value to the order_items table that isn't in the items table, or from adding an id_order_items value to orders that isn't in order_items.
All a FK does is ensure data integrity. It also helps convey relationships among your tables to other developers who didn't write the system (and to yourself months later when you forget!), but mainly it's for data integrity.
Extra credit: so why use transactions? Well, you already mentioned a quote that says they are useful for banking systems, but they are useful in far more situations than that.
Basically, in a relational database, especially if it's normalized, a routine operation such as adding, updating, or deleting an order often touches more than one table and/or involves more than one SQL statement. You could even end up touching the same table multiple times (as the example below does). By the way, the individual Data Manipulation Language (DML) statements here (INSERT/UPDATE/DELETE) each touch only one table at a time.
An example of adding an order:
I recommend an orders table and an order_items table. This makes it so you can have a PK on id_order in the orders table, which means id_order cannot be repeated in orders. Without the one-to-many orders-to-order_items relationship, you'd have to have multiple rows in the orders table for every order that had multiple items associated with it (you need an items table too for this e-commerce system, by the way). This example adds one order, touching two tables with four different INSERT statements.
(no key constraints for illustration purposes)
-- insert #1
INSERT INTO orders (id_order, id_order_items, id_customer)
VALUES (100, 4, 1)
-- insert #2
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 1)
-- insert #3
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 2)
-- insert #4
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 3)
So what if the insert #1 and insert #2 queries run successfully, but the insert #3 statement fails? You'd end up with an order that was missing an item, and that would be garbage data. If, in that case, you want to roll back all the queries so the database is in the same state it was in before adding the order, and then start over, that's exactly what transactions are for. You group into a transaction the queries that you want either all completed or, in case of an exception, not applied at all.
So, like PK/FK constraints, transactions help ensure data integrity.
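A minimal sketch of wrapping the four inserts above in a transaction (InnoDB tables assumed):

START TRANSACTION;
-- run insert #1 through insert #4 here;
-- if any statement fails, issue ROLLBACK instead of COMMIT
COMMIT;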
In your example, you create foreign keys. Foreign keys are only supported for InnoDB tables, not for MyISAM tables.
You may be interested in this article from Database Journal which discusses the InnoDB table type in MySQL.
Excerpt:
Last month we looked at the HEAP table type, a table type which runs entirely in memory. This month we look at setting up the InnoDB table type, the type of most interest to serious users. The standard MyISAM table type is ideal for website use, where there are many reads in comparison to writes, and no transactions. Where these conditions do not apply (and besides websites, they do not apply often in the database world), the InnoDB table is likely to be the table type of choice. This article is aimed at users who are familiar with MySQL, but have only used the default MyISAM table type.
I wouldn't be put off by the other question. Keep proper backups of your database, of any type -- and don't drop tables by accident ;-) -- and you'll be ok whatever table type you choose.
In general, for me the most important point is that InnoDB offers row-level locking, while MyISAM locks per table. On big tables with a lot of writes this can make a big performance difference.
On the other hand, MyISAM tables have a simpler file structure; copying and repairing tables at the file level is much easier.
A comment has a command to convert your databases to InnoDB.
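For reference, the per-table conversion statement (table name illustrative) is:

ALTER TABLE mytable ENGINE=InnoDB;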
Everywhere! Deprecate MyISAM; InnoDB is the way to go. It's not only about performance, but also data integrity and ACID transactions.
A supplement to Machine and knoopx's answer about transactions:
The default MySQL table type, MyISAM, does not support transactions. BerkeleyDB and InnoDB are the transaction-safe table types available in open source MySQL, version 3.23.34 and greater.
The definition of a transaction, and a banking example:
A transaction is a sequence of individual database operations that are grouped together. A good example where transactions are useful is in banking.