Is it possible to make some characters stored in MySQL invisible to search queries?
Of course, I can do this in the application, but is there maybe a setting in MySQL for this?
I am still not sure I am following what you want. It sounds like a query like
SELECT * FROM `table` WHERE REPLACE(string_field, "#", "") = "user query"
might be what you are looking for.
See REPLACE. For more complicated matching, there's also regular expressions, although that would probably be rather messy for what you are describing.
EDIT: Just saw your comment. It sounds like you want to blacklist certain characters from the user's query as they are special to your system. No, there's no way to do that. Somewhere you are going to want a string replace operation to remove those characters; either in your application or in a stored procedure/function if you want to put it in the database.
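If you go the application route, the string replace can be very small. Here is a minimal sketch in Python; the function names and the "#" character set are made up for illustration, so swap in whatever characters are special in your system.

```python
# Characters treated as "invisible" for search; this set is a hypothetical example.
INVISIBLE_CHARS = "#"


def strip_invisible(text: str, chars: str = INVISIBLE_CHARS) -> str:
    """Return text with every character listed in chars removed."""
    return text.translate(str.maketrans("", "", chars))


def matches(stored: str, user_query: str) -> bool:
    """Compare a stored value with a query after stripping both sides."""
    return strip_invisible(stored) == strip_invisible(user_query)
```

Stripping both sides in one helper keeps the stored value and the user's query normalized the same way, which is the same idea as the REPLACE() query, just done outside the database.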
I have a table column with general text values, e.g.:
"This is Gerald's Sample Text: With some special chars"
I need to convert this text to:
"this-is-geralds-sample-text-with-some-special-chars"
with MySQL InnoDB and save the value in a separate unique column in the same table. Is there a simpler way of achieving this with a query without using procedures?
The short answer is "No", at least before MySQL 8.0. You're looking for something that behaves exactly like a regular expression, and older MySQL versions do not support regex replace natively; REGEXP_REPLACE only arrived in 8.0.
The longer answer is "No, but there are workarounds." You have a couple of options, and I don't terribly like either. The first is to create a function like in this question. The second is to come up with a list of bad characters and then use a set of REPLACE calls. It's ugly, but it will work.
On a side note: you might consider creating this value in your application and then just storing it along with the original. That would be cleaner in some ways than using a custom MySQL function.
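For the application-side option, something like the following would do. It is only a sketch in Python: slugify is a name I made up, and it assumes plain ASCII text (any other character is treated as a separator).

```python
import re


def slugify(text: str) -> str:
    """Lowercase, drop apostrophes, and join the remaining words with hyphens."""
    text = text.lower().replace("'", "")
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Remove hyphens left over at either end.
    return text.strip("-")


print(slugify("This is Gerald's Sample Text: With some special chars"))
# this-is-geralds-sample-text-with-some-special-chars
```

Two different titles can still produce the same slug, so the unique column needs its own collision handling (for example, appending a counter).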
Acronyms are a pain in my database, especially when doing a search. I haven't decided if I should accept periods during search queries. These are the problems I face when searching:
'IRQ' will not find 'I.R.Q.'
'I.R.Q' will not find 'IRQ'
'IRQ.' or 'IR.Q' will not find 'IRQ' or 'I.R.Q.'
etc...
The same problem goes for ellipses (...) or three series of periods.
I just need to know what directions should I take with this issue:
Is it better to remove all periods when inserting the string to the database?
If so what regex can I use to identify periods (instead of ellipses or three series of periods) to identify what needs to be removed?
If it is possible to keep the periods in acronyms, how can it be scripted in a query to find 'I.R.Q' if I input 'IRQ' in the search field, through MySQL using regex or maybe a MySQL function I don't know about?
My responses for each question:
Is it better to remove all periods when inserting the string to the database?
Yes and no. You want the database to have the original text. If you want, create a separate field that is "cleaned up" to search against. Here, you can remove periods, make everything lowercase, etc.
If so what regex can I use to identify periods (instead of ellipses or three series of periods) to identify what needs to be removed?
/\.+/
That finds one or more periods in a given spot. But you'll want to integrate it with your search formula.
Note: regex on a database isn't known to have high performance. Be cautious with this.
Other note: you may want to use FullText search in MySQL. This, too, isn't known for high performance on data sets of more than about 1000 entries. If you have big data and need full-text search, use Sphinx (available as a MySQL plug-in and RAM-based indexing system).
If it is possible to keep the periods in acronyms, how can it be scripted in a query to find 'I.R.Q' if I input 'IRQ' in the search field, through MySQL using regex or maybe a MySQL function I don't know about?
Yes, by having the 2 fields I described in the first bullet's answer.
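To make the two-field idea concrete, here is a rough sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for MySQL, and the table, column and function names are invented for the example.

```python
import sqlite3


def search_key(text: str) -> str:
    """Normalize text for searching: drop periods and lowercase."""
    return text.replace(".", "").lower()


conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.execute("CREATE TABLE terms (original TEXT, search_text TEXT)")
for term in ["I.R.Q.", "IRQ", "Wait..."]:
    # Keep the original untouched and store the cleaned copy next to it.
    conn.execute("INSERT INTO terms VALUES (?, ?)", (term, search_key(term)))


def find(user_input: str) -> list:
    """Return original strings whose cleaned form equals the cleaned input."""
    rows = conn.execute(
        "SELECT original FROM terms WHERE search_text = ? ORDER BY original",
        (search_key(user_input),),
    )
    return [row[0] for row in rows]
```

With this setup find("IRQ"), find("I.R.Q") and find("IR.Q") all return both stored acronyms, because the input goes through the same cleanup as the stored search_text.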
You need to consider the sanctity of your input. If it is not yours to alter then don't alter it. Instead you should have a separate system to allow for text searching, and that can alter the text as it sees fit to be able to handle these types of issues.
Have a read up on Lucene, and specifically Lucene's standard analyzer, to see the types of changes that are commonly carried out to allow successful searching of complex text.
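I think you can use MySQL's REGEXP operator to match an acronym:
SELECT col1, col2...coln FROM yourTable WHERE colWithAcronym REGEXP 'I\\.?R\\.?Q\\.?'
Note that MySQL patterns take no delimiters such as #, and the backslash must be doubled inside a string literal.
If you use PHP you can build the regexp with this simple loop (assuming $yourAcronym is a string):
$result = "";
// str_split() turns the acronym string into an array of characters
foreach (str_split($yourAcronym) as $char) {
    $result .= $char . "\\.?"; // "\\.?" in PHP source yields \.? (an optional period)
}
// Pass $result to the query as a bound parameter so it needs no further escaping.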
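The same idea in Python, as a sketch: acronym_pattern is a made-up helper, and it also drops any periods the user typed so that 'I.R.Q' and 'IRQ' produce the same pattern. The check below uses Python's own re module; I'd expect a pattern this simple to behave the same in MySQL's REGEXP, but treat that as an assumption and try it on your server.

```python
import re


def acronym_pattern(acronym: str) -> str:
    """Build a regex that allows an optional period after each character."""
    letters = acronym.replace(".", "")  # ignore periods the user typed
    return "".join(re.escape(ch) + r"\.?" for ch in letters)


pattern = acronym_pattern("I.R.Q")  # same result as acronym_pattern("IRQ")

for candidate in ["IRQ", "I.R.Q.", "IR.Q", "IRQS"]:
    print(candidate, bool(re.fullmatch(pattern, candidate)))
```

Without anchors the MySQL REGEXP operator matches anywhere in the column, so wrap the pattern in ^ and $ if the column should contain only the acronym.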
The functionality you are searching for is a fulltext search. MySQL supports this for MyISAM tables, and since version 5.6 for InnoDB tables as well. (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html)
Alternatively you could go for an external framework that provides that functionality. Lucene is a popular open-source one. (lucene.apache.org)
There are two methods:
1. Save the data with the symbols removed from the text and match against that.
2. Use a regex, for example:
select * from table where acronym regexp '^[A-Z]+[.]?[A-Z]+[.]?[A-Z]+[.]?$';
Please note, however, that where the comparison is case-sensitive (for example on a binary column), this requires the acronym to be stored in uppercase. If you don't want the case to matter, just change [A-Z] to [A-Za-z].
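It is worth checking what that pattern accepts before relying on it. The sketch below runs the same expression through Python's re module (an assumption: for this simple syntax it should agree with MySQL, but verify on your server); the sample strings are mine.

```python
import re

# The pattern from the query above, unchanged.
ACRONYM_RE = re.compile(r"^[A-Z]+[.]?[A-Z]+[.]?[A-Z]+[.]?$")


def looks_like_acronym(text: str) -> bool:
    """True if text matches the three-group acronym pattern."""
    return ACRONYM_RE.match(text) is not None


for sample in ["IRQ", "I.R.Q.", "IR.Q", "I..R.Q", "IQ", "HELLO"]:
    print(sample, looks_like_acronym(sample))
```

Note that it needs at least three letters and that any plain uppercase word such as 'HELLO' also matches, so it tests the shape of the value rather than finding one particular acronym.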
I was recently told that it is not recommended to use the "LIKE" keyword in SQL. Is this true? If so, why? And if it is true, are there any alternatives to it?
The reason is primarily performance. However, on the other side of the argument, LIKE is standard SQL and should work in all databases. Because LIKE has to parse the pattern string, it is a bit less efficient than looking for a substring in a longer string (using charindex or instr or your database's favorite function). However, processors are so fast that this rarely makes a difference now, except perhaps for the largest queries.
The one caution with LIKE is in a join statement (and this is true of the alternatives as well). In general, database engines will not use an index for a LIKE in a join. So, if you can express the join clause in a more index-friendly way, then you might see a substantial increase in performance.
By the way, I'm something of an old-timer with the SQL language, and I tend to avoid using LIKE myself. However, this is not a habit that should be passed on, because there is little basis anymore for avoiding it.
Specifically in MySQL (and since this has a MySQL tag I guess that's what you are using), when using LIKE on an indexed column, be careful not to put a % at the start of the pattern unless you have to, because a leading wildcard prevents MySQL from using the index for the lookup. Otherwise there is no problem in using LIKE. E.g.:
BAD:
col_with_index LIKE '%someText'
GOOD:
col_with_index LIKE 'someText%'
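If the search text comes from a user, a small helper can build the index-friendly form for you. This is a sketch in Python; like_prefix is a hypothetical name, and it assumes backslash is the LIKE escape character (MySQL's default).

```python
def like_prefix(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards in term and append a trailing % for a prefix match."""
    escaped = (
        term.replace(escape, escape + escape)  # escape the escape character first
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return escaped + "%"


print(like_prefix("someText"))  # someText%
```

Escaping % and _ means a user cannot sneak in a leading wildcard, and the only % in the pattern is the trailing one. Pass the result as a bound parameter rather than pasting it into the SQL string.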
There are no valid reasons not to use LIKE!
The only exception is when the equals (=) operator achieves the same result, i.e. when the pattern contains no wildcards (my_column LIKE 'XYZ').
If you do need LIKE, any alternative that achieves the same result will cause the same (or even worse) performance problems.
So in those cases, just check whether LIKE is really necessary, and if it is, use it without hesitation.
Because of the way our databases are collated they ignore case for usernames and passwords, which we're currently unable to fix at the database level. It appears from this thread that WHERE BINARY 'something' = 'Something' should fix the problem, but I haven't been able to figure out how to get Django to insert BINARY. Any tips?
I don't think there's an easy way to force Django to add something into a query.
You may want to simply write a raw SQL query within Django to get objects filtered with a case-sensitive comparison and then use them in normal queries.
The other approach is to use Django's case-sensitive filters to achieve the same result. E.g. contains/startswith both use BINARY LIKE and may be used when comparing two strings of the same length (like a password hash). Finally, you can use a regexp to do the case-sensitive comparison. But these are rather ugly methods in that situation. They have an unnecessary overhead and you should avoid them as long as possible.
Under what conditions do I need to single quote a variable in a Mysql statement in PHP?
If you put values directly in the query, as in SELECT * FROM users WHERE age > 25, then single quotes are needed only for strings. If you write SELECT * FROM users WHERE age > '25', the query works the same, but you are forcing MySQL to convert the string to an integer (if that field is an integer), which is an unnecessary operation.
In theory only varchars, texts, and BLOBs, I think, but I say quote 'em all. By the way, that has nothing to do with PHP, only with the way you build your MySQL query, unless you mean something completely different.
Not a direct answer, but I suggest a database class like Zend DB to interact with your database. I have found this to be a great way to abstract away some of the grunt work like figuring out what to do with variables.
For example:
$db->select()->from('users', array('uid'))->where('email = ?', $indata['email'])->where('actkey = 0')
Makes a cleaner query than building the same by hand, and also takes care of making those variables safe a lot better than I would.
Hope that's helpful info.
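To show what "taking care of the variables" looks like without a full database class, here is a sketch using bound parameters. It is in Python with the standard-library sqlite3 module as a stand-in for MySQL; the users table and helper names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("O'Brien", 31), ("Ann", 22)],
)


def names_older_than(min_age: int) -> list:
    """The integer is bound as-is; no quotes are written in the SQL."""
    rows = conn.execute(
        "SELECT name FROM users WHERE age > ? ORDER BY name", (min_age,)
    )
    return [row[0] for row in rows]


def age_of(name: str):
    """The string is bound too, so an apostrophe in it cannot break the query."""
    row = conn.execute("SELECT age FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None
```

With placeholders the question of when to quote goes away, because the driver handles each value according to its type. sqlite3 uses ? as the placeholder; MySQL drivers for Python commonly use %s instead.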