The webpage in question is https://www.christart.com/poetry/
I have a MySQL table with a little over 7,000 poem entries. I'm getting requests from my users to be able to run queries against the body of the poems, but the bodies are saved in a 'text' column.
I know how to write the SQL statement. That's easy enough. My concern is the load on the database. I always index columns that are queried or joined on, but I can't index a 'text' column.
There must be a way. How should I approach this?
You could use a full text index:
CREATE FULLTEXT INDEX poem_contents ON poems(body);
And then search using match:
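-- In boolean mode, double quotes inside the string match the exact phrase;
-- without them, 'some phrase' matches rows containing either word.
SELECT *
FROM poems
WHERE MATCH(body) AGAINST ('"some phrase"' IN BOOLEAN MODE);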
There's no reason that you can't index a text field. That said, there's probably very little value in indexing a text field that contains entire poems.
If your table only has 7,000 rows, you probably won't see a massive performance hit unless it grows much larger than it currently is. At a larger scale, a better solution would probably be to extract keywords from the body and search on those.
I think you should explore Apache Lucene or a similar project that provides full text search. Alternatively, you could look at MongoDB instead of MySQL; it has a number of index types. There are also Solr and ElasticSearch, which use Lucene behind the scenes.
I assume the poem body is stored in a varchar column. I don't know whether indexing is possible on varchar, and I don't think it is wise to index the entire poem body. Something like Lucene/Solr provides a better option.
Please note, I am not affiliated with any of the products mentioned above.
So I have used MySQL a lot in small projects for school; however, I'm now taking over an enterprise-ish scale project, and speed matters, not just getting the right information back. I have Googled around a lot trying to learn how indexes might make my website faster, and I am hoping to further understand how they work, not just when to use them.
So, I find myself doing a lot of SELECT DISTINCTs in order to get all the distinct values, so I can populate my dropdowns. I have heard that this would be faster if the column was indexed; however, I don't completely understand why. If the values in this column were ints, I would totally understand; basically a data structure like a BST would be created, and search times could be O(log n). However, if my column is strings, how can it put a string in a BST? This doesn't seem possible, since there is no metric to compare a string against another string (like there is with numbers). It seems like an index would just create a list of all the possible values for that column, but it seems as if the search would still require the database to go through every single row, making this search linear, just as if the database scanned a regular table.
My second question is what does the database do once it finds the right value in the index data structure. For example, let's say I'm doing a where age = 42. So, the database goes through the data structure until it finds 42, but how does it map that lookup to the whole row? Does the index have some sort of row number associated with it?
Lastly, if I am doing these frequent SELECT DISTINCT statements, is adding an index going to help? I feel like this must be a common task for websites, as many sites have dropdowns where you can filter results, I'm just trying to figure out if I'm approaching it the right way.
Thanks in advance.
Your logic is good; however, your assumption that there is no metric for comparing one string to another is incorrect. Strings can simply be compared in alphabetical order, giving them a perfectly usable comparison metric that can be used to build the index.
It takes a tiny bit longer to compare strings than it does ints; however, having an index still speeds things up, regardless of the comparison cost.
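To make the point about string ordering concrete, here is a minimal Python sketch, not how MySQL actually implements its indexes. It keeps (value, row position) pairs in sorted order, so a lookup can binary-search on strings and then follow the stored position back to the full row. The rows, the category column, and the function names are all made up for illustration.

```python
import bisect

# Hypothetical rows; the list position plays the role of a row pointer.
rows = [
    {"id": 1, "category": "sonnet"},
    {"id": 2, "category": "haiku"},
    {"id": 3, "category": "ode"},
    {"id": 4, "category": "haiku"},
    {"id": 5, "category": "sonnet"},
]

def build_index(rows, column):
    """Return a sorted list of (value, row_position) pairs."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))

def lookup(index, rows, value):
    """Binary-search the index for value, then follow positions to the rows."""
    start = bisect.bisect_left(index, (value, -1))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break
        matches.append(rows[pos])
    return matches

def distinct_values(index):
    """Walk the sorted index once; duplicate values are adjacent."""
    values = []
    for key, _ in index:
        if not values or values[-1] != key:
            values.append(key)
    return values

category_index = build_index(rows, "category")
haiku_rows = lookup(category_index, rows, "haiku")
categories = distinct_values(category_index)
```

Because the index is already sorted, collecting the distinct values only needs one pass over the index entries rather than a sort of the whole table, which is roughly why an index can help SELECT DISTINCT.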
I would like to mention however that if you are using SELECT DISTINCT as much as you say, there are probably problems with your database schema.
You should learn about normalizing your database. I recommend starting with this link: http://databases.about.com/od/specificproducts/a/normalization.htm
Normalization will give you a querying mechanism whose benefits can vastly outweigh those you get from indexing.
If your strings are something small like categories, then an index will help. If you have large chunks of random text, then you will likely want a full text index. If you are having to use select distinct a lot, your database may not be properly normalized for what you are doing. You could also put the distinct values in a separate table (that only has the distinct values), but this only helps if the content does not change a lot. Indexing strategies are particular to your application's access patterns, the data itself, and how the tables are normalized (or not).
HTH
I use InnoDB on RDS, which unfortunately does not yet support MySQL full text search. I'm therefore looking into alternatives. My app is on Heroku and I have considered the various addons that provide search capabilities, but I have a very large table of companies (~100M records) and I think that they are prohibitively expensive. I only need to be able to search one field on the table -- company name.
I am therefore considering creating my own 'keyword' table. Essentially this would list every word contained in every company name. There would then be another table that shows association between these keywords and the company_id.
Does this sound like a good idea? Are there any better alternatives?
What would be the most efficient way of creating the keyword table and the association table? I'd like to do it using T-SQL, if possible.
You can do it, and it's far better than using LIKE '%word%' queries.
But it's not nearly as good as using proper fulltext indexing.
See my presentation Full Text Search Throwdown, where I compare the fulltext solutions for MySQL, including trigrams, which is approximately like the keyword solution you're considering.
The fastest solution -- by far -- was Sphinx Search.
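For reference, the keyword-table idea from the question can be sketched in a few lines of Python. This is only an in-memory illustration of the keyword-to-company_id association, under the assumption that names are split on whitespace and lowercased; the company data and function names are hypothetical, and a real implementation would live in two database tables.

```python
from collections import defaultdict

# Hypothetical company rows: (company_id, name).
companies = [
    (1, "Acme Steel Works"),
    (2, "Northern Steel"),
    (3, "Acme Paper Company"),
]

def build_keyword_index(companies):
    """Map each lowercased word to the set of company ids whose name contains it."""
    index = defaultdict(set)
    for company_id, name in companies:
        for word in name.lower().split():
            index[word].add(company_id)
    return index

def search(index, query):
    """Return ids of companies whose names contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

keyword_index = build_keyword_index(companies)
steel_ids = search(keyword_index, "steel")
acme_steel_ids = search(keyword_index, "Acme Steel")
```

Each query word becomes an exact-match lookup, which an ordinary index can serve, instead of a LIKE '%word%' scan over every company name.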
I am wondering how MySQL finds the rows in a table when searching like so:
select * from table where field = 'text';
Does it use a particular search algorithm? Is it practically the fastest way to look up information in a table? Or would building a search macro using another algorithm (like Boyer-Moore) work faster?
If there is an index on field, then databases often use a B-tree for indexed searches. If there is no index, then the entire table is scanned. This page describes some of the techniques used in MySQL:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
Many hours of work have gone into optimizing MySQL. Take advantage of the work already done, and resist trying to redo it.
For that query, if field has no index, it can do nothing other than search every entry of that table and compare its field column against that string.
Boyer-Moore isn't needed because it's exact equality that's requested and not asking whether the field contains that string.
If you are interested in how it found those records try executing using the EXPLAIN keyword:
EXPLAIN select * from table where field = 'text';
I would recommend looking at this article to get a better understanding what is happening in the background.
I would be very surprised if you would be able to write something on your own that is faster. You could look at creating indexes on the table in question to speed up selects.
Okay, mysql indexing. Is indexing nothing more than having a unique ID for each row that will be used in the WHERE clause?
When indexing a table does the process add any information to the table? For instance, another column or value somewhere.
Does indexing happen on the fly when retrieving values or are values placed into the table much like an insert or update function?
Any more information to clearly explain mysql indexing would be appreciated. And please don't just place a link to the mysql documentation; it is confusing, and it is always better to get a personal response from a professional.
Lastly, why is indexing different from telling mysql to look for values between two values. For Example: WHERE create_time >= 'AweekAgo'
I'm asking because one of my tables is 220,000+ rows and it takes more than a minute to return values with a very simple mysql select statement and I'm hoping indexing will speed this up.
Thanks in advance.
You were downvoted because you didn't make an effort to read up on or search for what you are asking. A simple Google search would have shown you the benefits and drawbacks of database indexes. Here is a related question on StackOverflow. I am sure there are numerous questions like that.
To put it without the jargon: it is easier to locate books in a library if you arrange them on shelves numbered according to their subject area. You can then tell somebody to go to a specific location and pick up the book. That is what an index does.
Another example: imagine an alphabetically ordered admission list. If your name starts with Z, you can just skip A to Y and go straight to Z, which is much faster. If the list were unordered, you would have to search and search, and might not even find your name if you didn't look carefully.
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
You can create an index like this:
CREATE INDEX index_name
ON table_name ( column1, column2,...);
You might be working on a more complex database, so it's good to remember a few simple rules.
Indexes slow down inserts and updates, so you want to use them carefully on columns that are FREQUENTLY updated.
Indexes speed up where clauses and order by.
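As a rough illustration of why an index speeds up a WHERE clause such as create_time >= some cutoff, the Python sketch below compares a full scan with a binary search over a sorted list of (create_time, id) pairs. It is a simplified model rather than MySQL's actual B-tree, and the table contents and function names are invented for the example.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical table of (id, create_time) rows, stored in insertion order.
now = datetime(2024, 1, 15)
rows = [(i, now - timedelta(days=(i * 3) % 20)) for i in range(1, 11)]

def full_scan(rows, cutoff):
    """No index: every row has to be examined."""
    return sorted(row_id for row_id, created in rows if created >= cutoff)

# The "index": (create_time, id) pairs kept in sorted order.
index = sorted((created, row_id) for row_id, created in rows)

def index_range(index, cutoff):
    """With an index: binary-search to the first match, then read to the end."""
    start = bisect.bisect_left(index, (cutoff,))
    return sorted(row_id for _, row_id in index[start:])

week_ago = now - timedelta(days=7)
recent_by_scan = full_scan(rows, week_ago)
recent_by_index = index_range(index, week_ago)
```

Both functions return the same ids, but the indexed version only touches the matching entries. The same sorted order is what lets an index serve ORDER BY, and it is also why inserts and updates get slower: the sorted structure has to be maintained on every write.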
For further detail, you can read:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
There are many kinds of index, for example a hash, a trie, or a spatial index. Which one is used depends on the values. Most likely it's a hash or a search tree (MySQL's default index type is a B-tree). Nothing really fancy, because the fancy options tend to be expensive.
I have an authors table in my database that lists an author's whole name, e.g. "Charles Dickinson". I would like to sort of "decatenate" at the space, so that I can get "Charles" and "Dickinson" separately. I know there is the explode function in PHP, but is there anything similar for a straight mysql query? Thanks.
No, don't do that. Seriously. That is a performance killer. If you ever find yourself having to process a sub-column (part of a column) in some way, your DB design is flawed. It may well work okay on a home address book application or any of myriad other small databases but it will not be scalable.
Store the components of the name in separate columns. It's almost invariably a lot faster to join columns together with a simple concatenation (when you need the full name) than it is to split them apart with a character search.
If, for some reason you cannot split the field, at least put in the extra columns and use an insert/update trigger to populate them. While not 3NF, this will guarantee that the data is still consistent and will massively speed up your queries. You could also ensure that the extra columns are lower-cased (and indexed if you're searching on them) at the same time so as to not have to fiddle around with case issues.
This is related: MySQL Split String