MySQL search optimization with HEX function

I have an InnoDB table and need to search in a VARCHAR column, but searching is very, very slow. I cannot add a FULLTEXT index :( and MyISAM is not an option.
Someone advised me to use HEX like in the code below.
Is that right? Does it perform better? I don't see any improvement in my application.
SELECT *
FROM order_line
WHERE HEX(description) LIKE '%6B616C6B7A616E64737465656E%'

No: wrapping the column in HEX() prevents MySQL from using any index on it, and a LIKE pattern with a leading wildcard cannot use a B-tree index anyway, so the query still scans every row. Even if the table in question requires InnoDB and you are restricted to a MySQL version that predates InnoDB fulltext support (added in 5.6), there are at least two possible improvements (and a third that builds on one of them):
1. Copy the primary key(s) and the searchable text column(s) into a MyISAM table with a FULLTEXT index and run the searches against that (see the sketch after this list). This will obviously require you to update the search table every time the original is updated.
2. Create cache tables storing information about the executed searches and their hits for later reuse (one to track the cached searches and the other to cache the actual results). Again, this will require you to periodically update the cache or purge it completely.
3. Since you're using LIKE to search, you can combine #2 with an additional hack: first check whether there are cached searches for substrings of the current search query, and limit the scope of your search to only the items that matched those previous searches.
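A minimal sketch of option 1, assuming the order_line table from the question has an integer primary key named id (the search-table and trigger names are made up; incidentally, the hex string in the question decodes to 'kalkzandsteen'):

CREATE TABLE order_line_search (
    id INT NOT NULL PRIMARY KEY,
    description VARCHAR(255) NOT NULL,
    FULLTEXT KEY ft_description (description)
) ENGINE=MyISAM;

-- Keep the search table in sync (matching UPDATE/DELETE triggers are needed too):
CREATE TRIGGER order_line_search_ins AFTER INSERT ON order_line
FOR EACH ROW
    INSERT INTO order_line_search (id, description)
    VALUES (NEW.id, NEW.description);

-- Search against the MyISAM copy, then join back for the full rows:
SELECT ol.*
FROM order_line_search s
JOIN order_line ol ON ol.id = s.id
WHERE MATCH(s.description) AGAINST ('kalkzandsteen');

Note that MATCH ... AGAINST does word matching rather than the arbitrary substring matching of LIKE '%...%', which is usually what a search feature actually wants.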

Related

MySQL indexing has no speed effect through PHP but does on PhpMyAdmin

I am trying to speed up a simple SELECT query on a table that has around 2 million rows, in a MariaDB database. It took over 1.5s until I created an index for the columns that I need; running the query through phpMyAdmin then showed a significant speed boost (it now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm measuring execution time with microtime() before and after the query, and it takes ~1.5s regardless of whether the index exists (I tried removing/re-adding it to see the difference).
Query example:
SELECT `pair`, `price`, `time`
FROM `live_prices` FORCE INDEX (pairPriceTime)
WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the value tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
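For reference, a sketch of how those suggestions translate into DDL (the index names are made up):

ALTER TABLE `live_prices` ADD INDEX time_only (`time`);
-- covering index: the query can be answered from the index alone
ALTER TABLE `live_prices` ADD INDEX time_covering (`time`, `pair`, `price`);
-- for queries that filter on pair, possibly over a range of times
ALTER TABLE `live_prices` ADD INDEX pair_time (`pair`, `time`);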
To discuss various options further, please provide EXPLAIN SELECT ...
PDO, mysqli, phpMyAdmin -- these all work the same way. (A possible exception is an implicit LIMIT that phpMyAdmin adds.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The MySQL documentation says:
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation on FORCE INDEX says this:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition (the time), the optimizer can choose to use some other index for the fetch. I would suggest that you either add another condition to the WHERE clause or add an ORDER BY:
ORDER BY pair, price, time
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index. The weird part is that the first index worked great in phpMyAdmin but not through PHP, while the second one now applies to both approaches.
Thank you all for the kind replies :)

MySQL XtraDB: build indexes by sort instead of via insertion

This post says:
If you’re running Innodb Plugin on Percona Server with XtraDB you get
benefit of a great new feature – ability to build indexes by sort
instead of via insertion
However, I could not find any info on this. I'd like to be able to reorganize how a table is laid out physically, similar to PostgreSQL's CLUSTER command or MyISAM's "ALTER TABLE ... ORDER BY". For example, the table "posts" has millions of rows in random insertion order; most queries use "where userid = ", and I want rows belonging to one user to sit physically near each other on disk, so that common queries require little IO. Is that possible with XtraDB?
Clarification concerning the blog post
The feature you are looking at is basically fast index creation. This feature speeds up the creation of secondary indexes on InnoDB tables, but it is only used in very specific cases. For example, it is not used during OPTIMIZE TABLE, which can therefore be dramatically sped up by dropping the indexes first, then running OPTIMIZE TABLE, and then recreating the indexes with fast index creation (this is what the post you linked is about).
Some automation for the cases that can be improved by applying this feature manually, as above, was added to Percona Server as a system variable named expand_fast_index_creation. If activated, the server uses fast index creation not only in the very specific cases, but in all cases where it might help, such as OPTIMIZE TABLE (the problem mentioned in the linked blog article).
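A sketch of the manual pattern, with a made-up index name, followed by the automated Percona Server variant:

ALTER TABLE posts DROP INDEX idx_userid;
OPTIMIZE TABLE posts;                              -- rebuilds the table without the secondary index
ALTER TABLE posts ADD INDEX idx_userid (userid);   -- recreated via fast index creation

-- Percona Server only: let OPTIMIZE TABLE do this automatically
SET GLOBAL expand_fast_index_creation = ON;
OPTIMIZE TABLE posts;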
Concerning your question
Your question was actually if it is possible to save InnoDB tables in a custom order to speed up specific kind of queries by exploiting locality on the disk.
This is not possible. InnoDB rows are saved in pages, ordered by the clustered index (which is essentially the primary key). The rows/pages might end up in chaotic order, in which case one can run OPTIMIZE TABLE on the InnoDB table. With this command the table is recreated in primary-key order, which gathers rows with nearby primary keys on the same or neighboring pages.
That is all you can force InnoDB to do. You can read the manual about the clustered index, another page in the manual with the definitive answer that this is not possible ("ORDER BY does not make sense for InnoDB tables because InnoDB always orders table rows according to the clustered index.") and the same question on dba.stackexchange, whose answers might interest you.
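That said, because InnoDB clusters rows by the primary key, you can get the locality asked about by making userid the leading column of the primary key. A sketch under that assumption (the columns beyond userid are made up):

CREATE TABLE posts (
    userid INT NOT NULL,
    post_id INT NOT NULL AUTO_INCREMENT,
    body TEXT,
    PRIMARY KEY (userid, post_id),   -- rows for one user are stored together on disk
    KEY (post_id)                    -- InnoDB requires the AUTO_INCREMENT column to lead some index
) ENGINE=InnoDB;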

Optimizing php/mysql translation lookup with huge database and hash indexes

I'm currently using a utf8 mysql database. It checks if a translation is already in the database and if not, it does a translation and stores it in the database.
SELECT * FROM `translations` WHERE `input_text`=? AND `input_lang`=? AND `output_lang`=?;
(The other field is "output_text".) For a basic database, it would first compare, letter by letter, the input text with the "input_text" "TEXT" field. As long as the characters are matching it would keep comparing them. If they stop matching, it would go onto the next row.
I don't know how databases work at a low level but I would assume that for a basic database, it would search at least one character from every row in the database before it decides that the input text isn't in the database.
Ideally the input text would be converted to a hash code (e.g. using sha1) and each "input_text" would also be a hash. Then if the database is sorted properly it could rapidly find all of the rows that match the hash and then check the actual text. If there are no matching hashes then it would return no results even though each row wasn't manually checked.
Is there a type of mysql storage engine that can do something like this or is there some additional php that can optimize things? Should "input_text" be set to some kind of "index"? (PRIMARY/UNIQUE/INDEX/FULLTEXT)
Is there an alternative type of database that is compatible with php that is far superior than mysql?
edit:
This talks about B-Tree vs Hash indexes for MySQL:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
None of the limitations for hash indexes are a problem for me. It also says
They are used only for equality comparisons that use the = or <=> operators (but are very fast)
["very" was italicized by them]
NEW QUESTION:
How do I set up "input_text" TEXT to be a hash index? BTW multiple rows contain the same "input_text"... is that alright for a hash index?
http://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
Says "The MEMORY storage engine uses HASH indexes by default" - does that mean I've just got to change the storage engine and set the column index to INDEX?
A normal INDEX clause should be enough (be sure to index all the fields you compare on; it'll be big on disk, but faster). FULLTEXT indexes are for word searches with MATCH ... AGAINST rather than for exact lookups like this ;-)
Anyway, for that kind of lookup, you could use a NoSQL store like Redis: it's blazingly fast, keeps its store in memory, and also does data persistence through snapshots.
There is an extension for php here : https://github.com/nicolasff/phpredis
And you'll have redis keys in the following form: YOUR_PROJECT:INPUT_LANG:WORD:OUTPUT_LANG for better data management, just replace each value with your values and you're good to go ;)
An index will speed up the lookups a lot.
By default, indexes in InnoDB and MyISAM use search trees (B-trees). There is a limitation on the length of an index key, so you will have to index only the first ~700 bytes of the text.
CREATE INDEX txt_lookup ON translations (input_lang, output_lang, input_text(255));
This will create an index on input_lang, output_lang and the first 255 characters of input_text.
When you run your example query, MySQL will use the index to quickly find the rows with the appropriate languages and the same first 255 characters, and then it will do the slower full-length string comparison only on the small set of rows it got from the index.

How can I access MySQL InnoDB index values directly without the MySQL client?

I've got an index on columns a VARCHAR(255), b INT in an InnoDB table. Given two (a, b) pairs, can I use the MySQL index to determine whether the pairs are the same from a C program (i.e. without using a strcmp and a numerical comparison)?
Where is a MySQL InnoDB index stored in the file system?
Can it be read and used from a separate program? What is the format?
How can I use an index to determine if two keys are the same?
Note: An answer to this question should either a) provide a method for accessing a MySQL index in order to accomplish this task or b) explain why the MySQL index cannot practically be accessed/used in this way. A platform-specific answer is fine, and I'm on Red Hat 5.8.
Below is the previous version of this question, which provides more context but seems to distract from the actual question. I understand that there are other ways to accomplish this example within MySQL, and I provide two. This is not a question about optimization, but rather of factoring out a piece of complexity that exists across many different dynamically generated queries.
I could accomplish my query using a subselect with a subgrouping, e.g.
SELECT c, AVG(max_val)
FROM (
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b) AS t
GROUP BY c
But I've written a UDF that allows me to do it with a single select, e.g.
SELECT b, MY_UDF(a, b, val)
FROM table
GROUP BY c
The key here is that I pass the fields a and b to the UDF, and I manually manage the a,b subgroups within each group. Column a is a varchar, so this involves a call to strncmp to check for matches, but it's reasonably fast.
However, I have an index my_key (a ASC, b ASC). Instead of checking for matches on a and b manually, can I just access and use the MySQL index? That is, can I get the index value in my_key for a given row or a,b pair in C (inside the UDF)? And if so, would the index value be guaranteed to be unique for any a,b value?
I would like to call MY_UDF(a, b, val) and then look up the MySQL index value of (a,b) in C from the UDF.
Look back at your original query
SELECT c, AVG(max_val)
FROM
(
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b
) AS t
GROUP BY c;
You should first make sure the subselect gives you what you want by running
SELECT c, MAX(val) AS max_val
FROM table
GROUP BY a, b;
If the result of the subselect is correct, then run your full query. If that result is correct, then you should do the following:
ALTER TABLE `table` ADD INDEX abc_ndx (a,b,c,val);
This will speed up the query by getting all needed data from the index only. The source table never needs to be consulted.
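To verify that the index alone is being used, EXPLAIN the query (the backticks are needed because table is a reserved word; the question uses it as a placeholder name):

EXPLAIN SELECT c, AVG(max_val)
FROM (
    SELECT c, MAX(val) AS max_val
    FROM `table`
    GROUP BY a, b
) AS t
GROUP BY c;

-- With abc_ndx in place, the row for `table` should report "Using index"
-- in the Extra column, meaning an index-only scan.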
Writing a UDF and calling it in a single SELECT is just masquerading a subselect and creates more overhead than the query needs. Simply placing your full query (one nested pass over the data) in a stored procedure will be more effective than computing most of the data in the UDF and executing single-row selects iteratively (something like O(n log n) running time, with possibly longer 'Sending data' states).
UPDATE 2012-11-27 13:46 EDT
You can access the index without touching the table by doing two things
Create a decent Covering Index
ALTER TABLE table ADD INDEX abc_ndx (a,b,c,val);
Run the SELECT query I mentioned before
Since all the columns of the query are in the index, the Query Optimizer will only touch the index (or precached index pages). If the table is MyISAM, you can do the following (a sketch comes after these two steps):
set up the MyISAM table to have a dedicated key cache that can be preloaded on mysqld startup
run SELECT a,b,c,val FROM table; to load index pages into MyISAM's default keycache
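A sketch of those two steps (the cache name my_cache and its size are made up):

SET GLOBAL my_cache.key_buffer_size = 128 * 1024 * 1024;  -- create/size a dedicated key cache
CACHE INDEX `table` IN my_cache;                          -- assign the table's indexes to it
LOAD INDEX INTO CACHE `table`;                            -- preload the index pages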
Trust me, you really do not want to access index pages against mysqld's will. What do I mean by that?
For MyISAM, the index pages for a MyISAM table are stored in the .MYI file of the table. Each DML statement will summon a full table lock.
For InnoDB, the index pages are loaded into the InnoDB Buffer Pool. Consequently, the associated data pages will load into the InnoDB Buffer Pool as well.
You should not have to circumvent access to index pages using Python, Perl, PHP, C++, or Java because of the constant I/O needed by MyISAM or the constant MVCC protocols being exercised by InnoDB.
There is a NoSQL paradigm (called HandlerSocket) that would permit low-level access to MySQL tables that can cleanly bypass mysqld's normal access patterns. I would not recommend it since there was a bug in it when using it to issue writes.
UPDATE 2012-11-30 12:11 EDT
From your last comment
I'm using InnoDB, and I can see how the MVCC model complicates things. However, apparently InnoDB stores only one version (the most recent) in the index. The access pattern for the relevant tables is write-once, read-many, so if the index could be accessed, it could provide a single, reliable datum for each key.
When it comes to InnoDB, MVCC is not complicating anything. It can actually become your best friend provided:
if you have autocommit enabled (It should be enabled by default)
the access pattern for the relevant tables is write-once, read-many
I would expect the accessed index pages to be sitting in the InnoDB Buffer Pool virtually forever if it is read repeatedly. I would just make sure your innodb_buffer_pool_size is set high enough to hold necessary InnoDB data.
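To check the current setting (on MySQL of this vintage it can only be changed in my.cnf followed by a restart):

SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;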
If you just want to access an index outside of MySQL, you will have to use the API for one of the MySQL storage engines. The default engine is InnoDB. See overview here: InnoDB Internals. This describes (at a very high level) both the data layout on disk and the APIs to access it. A more detailed description is here: Embedded InnoDB.
However, rather than write your own program that uses InnoDB APIs directly (which is a lot of work), you might use one of the projects that have already done that work:
HandlerSocket: gives NoSQL access to InnoDB tables, runs in a UDF. See a very informative blog post from the developer. The goal of HandlerSocket is to provide a NoSQL interface exposed as a network daemon, but you could use the same technique (and much of the same code) to provide something that would be used by a query within MySQL.
memcached InnoDB plugin: gives memcached-style access to InnoDB tables.
HailDB: gives NoSQL access to InnoDB tables, runs on top of Embedded InnoDB. see conference presentation. EDIT: HailDB probably won't work running side-by-side with MySQL.
I believe any of these can run side-by-side with MySQL (using the same tables live), and can be used from C, so they do meet your requirements.
If you can use/migrate to MySQL Cluster, see also NDB API, a direct API, and ndbmemcache, a way to access MySQL Cluster using memcache API.
This is hard to answer without knowing why you are trying to do this, because the implications of different approaches are very different.
You probably cannot access the key directly.
I don't think this would actually make any difference performance-wise.
If you set up covering indexes in the right order, MySQL will not fetch a single page from the hard disk but deliver the result directly out of the index. There's nothing faster than this.
Note that your subselect may end up in a temporary table on disk if its result gets larger than your tmp_table_size or max_heap_table_size.
Check the Created_tmp_disk_tables status variable if you're not sure.
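For example:

SHOW GLOBAL STATUS LIKE 'Created_tmp%';
-- Compare Created_tmp_disk_tables against Created_tmp_tables to see how often
-- internal temporary tables spill to disk.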
You can find more on how MySQL uses internal temporary tables here:
http://dev.mysql.com/doc/refman/5.5/en/internal-temporary-tables.html
If you want, post your table structure for a review.
No. There is no practical way to make use of a MySQL index from within a C program, accessing the index by any means other than the MySQL engine, to check whether two (a,b) pairs (keys) are the same.
There are more practical solutions which don't require accessing MySQL datafiles outside of the MySQL engine or writing a user-defined function.
Q: Do you know where the mysql index is stored in the file system?
The location of the index within the file system depends on the storage engine for the table. For the MyISAM engine, indexes are stored in .MYI files under the datadir/database directory; InnoDB indexes are stored within an InnoDB-managed tablespace file. If the innodb_file_per_table variable was set when the table was created, there will be a separate .ibd file for each table under the innodb_data_home_dir/database subdirectory.
Q: Do you know what the format is?
The storage format of each storage engine is different (MyISAM, InnoDB, et al.) and also depends on the version. I have some familiarity with how the data is stored, in terms of what MySQL requires of the storage engine; detailed information about the internals would be specific to each engine.
Q: What makes it impractical?
It's impractical because it's a whole lot of work, and it's going to be dependent on details of storage engines that are likely to change in the future. It would be much more practical to define the problem space, and to write a SQL statement that would return what you want.
As Quassnoi pointed out in his comment to your question, it's not at all clear what particular problem you are trying to solve by creating a UDF or accessing MySQL indexes from outside of MySQL. I'm certain that Quassnoi would have a good way to accomplish what you need with an efficient SQL statement.

MySQL FULLTEXT index, search locks table

Consider this scenario: my database table has 300000 rows and a fulltext index. Whenever a search is done, it locks the database and doesn't allow anyone else to log in to the portal.
Any advice on how to get things sorted out here would be really appreciated.
Does logging on perform a write to the table, e.g. a 'last visit' time?
If so, you may expect behaviour like this, because MyISAM writes take a lock over the entire table. Usually this is avoided by not using noddy MyISAM and going to InnoDB instead, which has row-level locking (amongst other desirable database features).
The problem, of course, is that you only get fulltext search with MyISAM.
So you'll need to split your tables up. If you can keep the read-heavy and fulltext stuff in a different table to the stuff that needs writing (but linked using the same primary key), you can probably make it so that the two operations don't affect each other.
Better, migrate the bulk of the table to InnoDB, leaving only a fulltext field in MyISAM. Everything except fulltext searches can then steer clear of the MyISAM table, and use only the InnoDB table which exhibits much better locking performance. Personally, I now tend to store everything in the InnoDB table, including the text, and store a second copy of the text in the MyISAM table purely for fulltext searchbait purposes; this simplifies queries and code and brings the advantages of InnoDB's consistency to the text content, and I also use it to process the searchbait to get stemming and other features MySQL's fulltext doesn't normally support. But it does mean you have to spend a lot more space on storage.
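A sketch of that split, with made-up table and column names:

-- Main data lives in InnoDB (row-level locking for writes):
CREATE TABLE articles (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    body MEDIUMTEXT
) ENGINE=InnoDB;

-- Searchbait copy lives in MyISAM, linked by the same primary key:
CREATE TABLE articles_search (
    id INT NOT NULL PRIMARY KEY,
    searchbait MEDIUMTEXT,
    FULLTEXT KEY ft_searchbait (searchbait)
) ENGINE=MyISAM;

-- Fulltext searches touch only the MyISAM copy, then join back:
SELECT a.*
FROM articles_search s
JOIN articles a ON a.id = s.id
WHERE MATCH(s.searchbait) AGAINST ('some words');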
You can also improve matters by cutting down the number of writes. For example, if it is a 'last visit' timestamp you're writing, you can skip the write unless, say, a minute has passed since the previous one, on the basis that no-one needs to know the exact second someone last accessed the site.
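For instance, a throttled update along these lines (the users table and last_visit column are assumed):

UPDATE users
SET last_visit = NOW()
WHERE id = ?
  AND (last_visit IS NULL OR last_visit < NOW() - INTERVAL 1 MINUTE);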
If you use an external search engine such as Lucene or Sphinx, it should be able to read and index without locking the table. These engines store a local copy of the indexed records, so they don't have to read the table very often and never need to write to it.