Acronyms are a pain in my database, especially when doing a search. I haven't decided if I should accept periods during search queries. These are the problems I face when searching:
'IRQ' will not find 'I.R.Q.'
'I.R.Q' will not find 'IRQ'
'IRQ.' or 'IR.Q' will not find 'IRQ' or 'I.R.Q.'
etc...
The same problem applies to ellipses (...), i.e. series of three periods.
I just need to know what directions should I take with this issue:
Is it better to remove all periods when inserting the string to the database?
If so, what regex can I use to match the periods that should be removed, without also removing ellipses (series of three periods)?
If it is possible to keep the periods in acronyms, how can I write a query that finds 'I.R.Q' when I enter 'IRQ' in the search field, using a MySQL regex or perhaps a MySQL function I don't know about?
My responses for each question:
Is it better to remove all periods when inserting the string to the database?
Yes and no. You want the database to keep the original text. If you like, add a separate column holding a "cleaned-up" version to search against; there you can remove periods, lowercase everything, and so on.
If so, what regex can I use to match the periods that should be removed, without also removing ellipses (series of three periods)?
/\.+/
That finds one or more periods in a given spot, but you'll want to integrate it with your search logic.
Note: regex matching in a database cannot use an index, so it tends to be slow. Be cautious with this.
Other note: you may want to use MySQL's FULLTEXT search. It, too, is not known for high performance with data sets of more than about 1,000 entries. If you have big data and need full-text search, use Sphinx (available as a MySQL plugin and a RAM-based indexing system).
If it is possible to keep the periods in acronyms, how can I write a query that finds 'I.R.Q' when I enter 'IRQ' in the search field, using a MySQL regex or perhaps a MySQL function I don't know about?
Yes, by having the 2 fields I described in the first bullet's answer.
You need to consider the sanctity of your input: if it is not yours to alter, then don't alter it. Instead, keep a separate system for text searching, one that can transform the text however it sees fit to handle these kinds of issues.
Have a read up on Lucene, and specifically Lucene's standard analyzer, to see the types of changes that are commonly carried out to allow successful searching of complex text.
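To make the "separate search system" idea concrete, here is a minimal sketch in Python of the kind of normalization an analyzer performs before indexing. `normalize_for_search` is a hypothetical helper, and the exact rules (what to do with ellipses, case, and whitespace) are assumptions you would tune for your own data:

```python
import re

def normalize_for_search(text: str) -> str:
    """Produce the 'cleaned' copy stored alongside the original text."""
    # Treat runs of three or more periods as ellipses and turn them into
    # spaces first, so they are not mistaken for acronym periods.
    text = re.sub(r"\.{3,}", " ", text)
    # Drop the remaining periods so 'I.R.Q.' and 'IRQ' normalize the same way.
    text = text.replace(".", "")
    # Lowercase and collapse whitespace.
    return " ".join(text.lower().split())
```

Store the original text untouched in one column, this normalized form in another, and run the same function over every search query before comparing.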
I think you can use MySQL's REGEXP operator to match an acronym (note that REGEXP takes a bare pattern, with no # delimiters, and backslashes must be doubled inside a string literal):
SELECT col1, col2, ..., coln FROM yourTable WHERE colWithAcronym REGEXP 'I\\.?R\\.?Q\\.?'
If you use PHP you can build the pattern with this simple loop (foreach cannot iterate a string directly, so split it into characters first):
$result = '';
foreach (str_split($yourAcronym) as $char) {
    $result .= preg_quote($char) . '\\.?';
}
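The same pattern-building idea, sketched in Python for illustration; Python's re dialect is close enough to MySQL's REGEXP for a pattern this simple, though the two engines are not identical:

```python
import re

def acronym_pattern(acronym: str) -> str:
    """Turn 'IRQ' into 'I\\.?R\\.?Q\\.?': each letter followed by an
    optional literal period."""
    return "".join(re.escape(ch) + r"\.?" for ch in acronym)

pattern = acronym_pattern("IRQ")
# The resulting string can be handed to MySQL's REGEXP as-is (no delimiters).
assert re.fullmatch(pattern, "I.R.Q.")
assert re.fullmatch(pattern, "IRQ")
assert not re.fullmatch(pattern, "IRQX")
```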
The functionality you are searching for is full-text search. MySQL supports this for MyISAM tables, but not for InnoDB (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html).
Alternatively, you could go for an external framework that provides that functionality; Lucene is a popular open-source one (lucene.apache.org).
There are two methods:
1. Save the data with symbols stripped from the text, and match against that.
2. Use a regex, for example:
select * from table where acronym regexp '^[A-Z]+[.]?[A-Z]+[.]?[A-Z]+[.]?$';
Please note, however, that this requires the acronym to be stored in uppercase. If case should not matter, change [A-Z] to [A-Za-z].
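A quick way to sanity-check that pattern, using Python's re here as a stand-in for MySQL's regex engine (which behaves the same for this class of pattern):

```python
import re

# The pattern from the answer above: at least three letters,
# each group optionally followed by a period.
upper_only = re.compile(r"^[A-Z]+[.]?[A-Z]+[.]?[A-Z]+[.]?$")
any_case = re.compile(r"^[A-Za-z]+[.]?[A-Za-z]+[.]?[A-Za-z]+[.]?$")

assert upper_only.match("IRQ")
assert upper_only.match("I.R.Q.")
assert not upper_only.match("irq")   # uppercase only, as noted
assert any_case.match("irq")
```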
Related
To start off with, I have looked into this issue and gone through quite a few suggestions here on SO, but many leave me in doubt as to whether they perform well.
So to my problem:
I have a table with usernames and want to provide users the possibility to search for others by their name. As these names are taken from Steam though, the names not containing some form of special character are in the minority.
The easiest solution would be to use LIKE 'name%', but with the table size constantly increasing, I don't see this as the best solution, even though it may be the only one.
I tried using a fulltext search, but the many special characters crushed that idea.
Any other solutions or am I stuck with LIKE?
Current table rows: 120k+
Well, I don't believe that string functions are faster, but at the moment I don't have any big database to test performance on. Let's give it a try:
WHERE substr(name, 1, CHAR_LENGTH('ste')) = 'ste'
I would like to suggest one solution which I applied before.
First of all, I clean all special characters out of the string in the name column.
Then I store the cleaned string in another column (called cleaned_name) and put a fulltext index on that column instead of the original one.
Finally, I use the same function from step 1 to clean the queried name before executing a fulltext search on cleaned_name.
I hope that this solution is suitable for you.
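A sketch of that symmetry in Python; `clean_name` and the in-memory dict are stand-ins for the cleaning function and the indexed cleaned_name column, and the exact character class is an assumption:

```python
import re

def clean_name(name: str) -> str:
    """Strip everything except letters and digits, the way the
    hypothetical cleaned_name column would be populated."""
    return re.sub(r"[^0-9a-z]+", "", name.lower())

# Stand-in for the table: cleaned_name -> original name.
names = ["|St3am| ~Player~", "Alice", "=^.^= Bob"]
index = {clean_name(n): n for n in names}

def find(query: str):
    # The crucial step: clean the query with the SAME function
    # used when the rows were stored.
    return index.get(clean_name(query))
```

The point is that the cleaning function must be applied identically on the write path and the read path, or 'I.R.Q.'-style mismatches reappear.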
I want to use a wysiwyg html editor (like this) and save it to my mysql database.
What is the best way to store the content? (I guess all the HTML code goes in a TEXT-type field?)
The content should then be searchable, like on blogs, Taringa, Stack Overflow, and so on.
If you store HTML code in the database, how can you write the query so it searches only the text content and not the HTML tags?
Note: I have a Laravel 4 project (preferably using Eloquent).
So now you're getting into search engine type of searching. You can go for DB simplicity or performance based searching. This answer will assume you have space to spare and you're not trying to condense as much space as possible.
DB Simplicity:
For this method you can really just throw the (sanitized) text into the DB, and upon getting it back out you can print it unescaped with {{ $txt }} (in Laravel 4 Blade, {{{ ... }}} escapes output while {{ ... }} does not). As for searching, you just do a full-text scan on the entirety of the column for whatever you're searching for: "%query%". You'll need to look into some raw querying, as you can optimize it a bit.
Performance:
Upon entry you keep two editions of the text, which helps with printing and searching. Since you don't care about tags in the search copy, strip them all out; a rough regex just deletes anything between angle brackets, brackets included, e.g. preg_replace('/<[^>]*>/', '', $txt).
You might as well also remove punctuation, since it can interfere with searching (your vs. you're). After you sanitize the text for search, keep the printable column and run your searches against the search column. It will still be slow, since you're doing full-text scanning, but faster than also having to deal with tags.
Another strategy is to add yet another column holding unique words; these are usually called tags. Pull your data through one more filter, starting from the previous search text, and drop all the common words that carry little meaning (the, or, by, it, is). You are left with a list of semi-related words for the article, possibly with duplicates; merge the duplicates with a count and order them from greatest to least in another column.
You now have multiple granularities of search, depending on what your goal is. You can also enhance this with fuzzy searching, which increases search time; you'll probably need to create dedicated index tables to keep the time spent searching down.
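The pipeline described above (strip tags, strip punctuation, drop low-meaning words, count the rest) can be sketched like this; the regexes and the stop-word list are illustrative, not a production tokenizer:

```python
import re
from collections import Counter

STOPWORDS = {"the", "or", "by", "it", "is", "a", "an", "and", "of"}

def search_text(html: str) -> str:
    """The 'search column': tags and punctuation removed, lowercased."""
    text = re.sub(r"<[^>]*>", " ", html)              # rough tag removal
    text = re.sub(r"[^0-9a-z\s]", " ", text.lower())  # drop punctuation
    return " ".join(text.split())

def tag_counts(text: str) -> Counter:
    """The 'tags' column: meaningful words with their frequencies."""
    return Counter(w for w in text.split() if w not in STOPWORDS)

plain = search_text("<p>The IRQ handler is <b>fast</b>, really fast.</p>")
```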
How do I optimize a MySQL table with roughly 1 million records for sub-string search ('%xxx', 'xx%', '%xxx%')? All records contain just one word (11 characters on average, 41 max).
I know that a LIKE '%xxx' query is the problem, but I do not see any way to avoid it.
So the question is: is there any way to help MySQL minimize the effort for these queries? Or is there another way to query the data that can make use of some index?
Available technologies: MySQL, PHP, JavaScript (MySQL and PHP are used commercially, so reconfiguring them in some specific way is not possible).
Background: it is a "complete" list of the unique words used during the last 15 years in literature written in my native language. I want to give users the chance to find all relevant words by entering just part of a word (any part).
You can't use a standard MySQL index for sub-string matching. It won't work for anything except prefix matches.
You could maybe generate a SOUNDEX() for the word, but that is probably not what you want.
You could generate all possible substrings for each row and store them in another table. That would be a lot of rows (maybe 50 million), especially if you include single characters as substrings (EDIT: see below)
After that, you could try looking for a free text match library that does fuzzy matching to plug in to your application. I don't know of anything in PHP. FREJ is something in Java.
Quick and dirty solution:
1M rows * 11 characters is about 11MB of memory (22MB even at two bytes per character), i.e., nothing.
Load it into memory and scan it.
EDIT: as suggested, you could store just the suffixes of each word (every substring that runs to the end of the word), indexed, and then use prefix matching to return a candidate set. This requires only n index entries per word, where n is the word length.
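A sketch of the suffix trick in Python; the dict stands in for the extra table with an ordinary index on the suffix column:

```python
def build_suffix_index(words):
    """Store every suffix of every word, mapped back to the word."""
    index = {}
    for w in words:
        for i in range(len(w)):
            index.setdefault(w[i:], set()).add(w)
    return index

def substring_search(index, term):
    """Prefix-match the stored suffixes (what WHERE suffix LIKE 'term%'
    can do with a B-tree index) to answer a '%term%' query."""
    return sorted({w for suffix, ws in index.items()
                   if suffix.startswith(term) for w in ws})

idx = build_suffix_index(["hello", "yellow", "low"])
```

For the data set above this means roughly 11 million suffix rows (1M words at 11 characters average), which an ordinary index handles comfortably.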
For really efficient use of storage, you need to look at advanced techniques using n-grams.
I've looked for this question on Stack Overflow, but didn't find a really good answer to it.
I have a MySQL database with a few tables with information about a specific product. When end users use the search function in my application, it should search for all the tables, in specific columns.
Because the joins and many WHERE clauses were not performing well, I created a stored procedure that splits up all the single words in these tables and columns and inserts them into a separate table as ('word', 'productID') pairs.
This table contains now over 3.3 million records.
At the moment, I can search pretty quick if I match on the whole word, or the beginning of the word (LIKE 'searchterm%'). This is obvious, because it uses an index right now.
However, my client wants to search on partial words (LIKE '%searchterm%'), and that isn't performing at all. FULLTEXT search isn't an option either, because it can only match from the beginning of a word, with a wildcard after it.
So what is the best practice for a search function like this?
While more work to set up, using a dedicated fulltext search package like Lucene or Solr may be what you are looking for.
MySQL is not well tailored for text search; use other software to do that. For example, use Sphinx to index the data for text search. It will do a great job and is very simple to set up. If you use MySQL 5.1 you can use Sphinx as a storage engine.
There are other servers that perform text search better than Sphinx, but they are either not free or require other software to be installed.
You can read more about: ElasticSearch, Sphinx, Lucene, Solr, Xapian. Which fits for which usage?
I have an authors table in my database that lists an author's whole name, e.g. "Charles Dickinson". I would like to sort of "decatenate" at the space, so that I can get 'Charles" and "Dickinson" separately. I know there is the explode function in PHP, but is there anything similar for a straight mysql query? Thanks.
No, don't do that. Seriously. It is a performance killer. If you ever find yourself having to process a sub-column (part of a column) in some way, your DB design is flawed. It may well work okay in a home address-book application or any of a myriad of other small databases, but it will not be scalable.
Store the components of the name in separate columns. It's almost invariably a lot faster to join columns together with a simple concatenation (when you need the full name) than it is to split them apart with a character search.
If, for some reason you cannot split the field, at least put in the extra columns and use an insert/update trigger to populate them. While not 3NF, this will guarantee that the data is still consistent and will massively speed up your queries. You could also ensure that the extra columns are lower-cased (and indexed if you're searching on them) at the same time so as to not have to fiddle around with case issues.
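For completeness, the splitting and re-joining the answer describes, sketched in Python. (MySQL itself can do the split with SUBSTRING_INDEX(name, ' ', 1) and SUBSTRING_INDEX(name, ' ', -1), which is roughly what such a trigger would use; the two-part first/last split below is a simplifying assumption.)

```python
def split_full_name(full_name: str):
    """Split once at the first space, the job a trigger would do
    when populating the extra columns."""
    first, _, last = full_name.partition(" ")
    return first, last

def join_name(first: str, last: str) -> str:
    """Re-joining stored components is a cheap concatenation."""
    return f"{first} {last}".strip()
```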
This is related: MySQL Split String