Does Active Record provide a way to generate SQL that forces a text search to be case-sensitive?
Rails generators asked to create a string column produce a plain VARCHAR(255) field in a MySQL database. It turns out that queries on such columns are case-insensitive by default.
Thus, an Active Record search such as:
Secret.where(token: 'abcde')
will match records with tokens abcde, ABcdE, etc.
Without changing the underlying database column (e.g. specifying a utf8_bin collation) searches can be made case sensitive by explicitly tweaking the where clause:
Secret.where('binary token = ?', 'abcde')
However, this is database-specific, and I am wondering if Active Record has an idiom to accomplish the same for any database. Just as an example, something resembling the where.not construct:
Secret.where.binary(token: 'abcde')
Wouldn't this be a common enough need?
In short: there is NO ActiveRecord idiom for case-sensitive search.
For case-insensitive search you can try this.
It still works, but the source code has changed a bit, so use it at your own risk.
In general, case sensitivity is one aspect of the broader concept of collation.
Different DBMSs use very different default collations for string (text) data types, including different defaults for case sensitivity.
Detailed description for MySQL.
There is an SQL operator, COLLATE, which is very common across DBMSs (a COLLATE clause has been part of the SQL standard since SQL-92, but collation names and support vary by system).
But ActiveRecord sources show it only in schema creation code.
Neither the ActiveRecord gem nor the Arel gem uses COLLATE in where searches (sad).
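To illustrate what a per-query COLLATE override looks like, here is a minimal sketch in Python using the standard-library sqlite3 module rather than MySQL or Active Record. The secrets table and the find_tokens helper are made up for this example. The collation names NOCASE and BINARY are SQLite's; MySQL would use names such as utf8_bin. The column is declared NOCASE only to imitate a case-insensitive default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declaring the column NOCASE imitates a case-insensitive default collation.
conn.execute("CREATE TABLE secrets (token TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO secrets (token) VALUES (?)",
    [("abcde",), ("ABcdE",), ("xyz",)],
)


def find_tokens(where_clause, value):
    """Return the matching tokens, sorted for a stable result."""
    sql = "SELECT token FROM secrets WHERE " + where_clause
    return sorted(row[0] for row in conn.execute(sql, (value,)))


# The column's own collation applies: both spellings match.
insensitive = find_tokens("token = ?", "abcde")
# An explicit COLLATE in the comparison overrides it for this query only.
sensitive = find_tokens("token COLLATE BINARY = ?", "abcde")

print(insensitive)
print(sensitive)
```

From Active Record, the same idea would have to go through a raw SQL fragment in where, exactly like the binary example in the question, so it remains database-specific.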
Note: please don't leave out the database tag (mysql etc.) in a question.
There are many questions and answers on SO without such tags (or with only the sql tag) that are irrelevant to most DBMSs other than the author's own.
Related
Intro
I have been using new databases such as MySQL and MariaDB for the last few years.
For a project I am now using Oracle again after many years, and I was surprised to see that Oracle's ORDER BY is case-sensitive: uppercase characters sort before lowercase ones in ASC order, for example.
Is this behavior following the SQL specifications?
Do the general SQL language specs outline what the behavior should be, or is it up to each vendor?
I'm asking because many years of using MySQL/MariaDB made me consider ORDER BY to be case-insensitive.
Mysql
I see for example that in the documentation for mysql:
On character type columns, sorting—like all other comparison
operations—is normally performed in a case-insensitive fashion. This
means that the order is undefined for columns that are identical
except for their case. You can force a case-sensitive sort for a
column by using BINARY like so: ORDER BY BINARY col_name.
From: http://dev.mysql.com/doc/refman/5.7/en/sorting-rows.html
From Oracle Database SQL Language Reference, 12c Release 1 (12.1):
When character values are compared linguistically for the ORDER BY
clause, they are first transformed to collation keys and then
compared like RAW values. The collation keys are generated either
explicitly as specified in NLSSORT or implicitly using the same
method that NLSSORT uses. Both explicitly and implicitly generated
collation keys are subject to the same restrictions that are
described in "NLSSORT" on page 7-207. As a result of these
restrictions, two values may compare as linguistically equal if they
do not differ in the prefix that was used to produce the collation
key, even if they differ in the rest of the value.
Hope this helps!
Have a nice day!
I found that the "sortBy" method of the Scala Slick package is not case-sensitive. For example,
after running q.sortBy(columnMap("name").desc), I got:
TestingIsFun,
testing foo1,
Testing foo,
Is this expected behavior? How can I make it case sensitive? Thx.
I think that, as it currently stands, Slick simply depends on the RDBMS's default handling of case when sorting. You did not mention which RDBMS you use, but in MySQL, for example, sorting is case-insensitive by default. However, in MySQL you can define the column to be sorted in a way that overrides this, as per "Altering Mysql Table column to be case sensitive". This works without touching the query or the Slick parameters, because the solution sits at the schema definition level. If needed, it should also be possible to define the column as a binary string in the first place with Slick:
O.DBType("binary") in the slick column definition should work for that.
When it comes to the database, a particular column is sorted according to the collation of that column. By default, MySQL uses a case-insensitive collation (unless you specify a binary charset). You can override the default collation at any of the four levels (server, database, table or column), or even just in a specific ORDER BY clause. Which way is most efficient depends on your particular use case. Using a case-sensitive collation can affect performance, so most of the time it makes sense to set it at either the table or the column level.
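Overriding the collation only in the ORDER BY clause can be sketched as follows. This uses Python's standard-library sqlite3 module rather than MySQL or Slick, so the collation names (NOCASE, BINARY) are SQLite's, and the items table and names_desc helper are hypothetical. The three names are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("TestingIsFun",), ("testing foo1",), ("Testing foo",)],
)


def names_desc(collation):
    """Sort descending with the collation applied only in ORDER BY."""
    # The collation name is interpolated because it is an identifier, not a
    # value; it comes from the fixed strings below, never from user input.
    sql = f"SELECT name FROM items ORDER BY name COLLATE {collation} DESC"
    return [row[0] for row in conn.execute(sql)]


case_insensitive = names_desc("NOCASE")
case_sensitive = names_desc("BINARY")

print(case_insensitive)  # ['TestingIsFun', 'testing foo1', 'Testing foo']
print(case_sensitive)    # ['testing foo1', 'TestingIsFun', 'Testing foo']
```

The case-insensitive order reproduces the result reported in the question, while the binary collation puts the lowercase "testing foo1" first.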
I have a database of occupation titles I'm trying to run some queries on. I'm using Match() to try and find the best match occupational title for a user-entered string using this SQL:
SELECT *, MATCH (occupation_title) AGAINST ('EGG PROCESSOR')
AS score FROM occupational_titles WHERE MATCH (occupation_title)
AGAINST ('EGG PROCESSOR') ORDER BY score DESC;
When I run this query against my database, the first three results are "Processor", "Egg Processor", and "COPRA Processor". The first two have the exact same match score of 6.04861688613892. Why on earth would MySQL not rank an exact match hit as the number one result? Is there anything I can do to refine the search algorithm?
You probably want to use one of the modifier modes in your searches. Check the fulltext documentation.
In particular, by default it uses "natural language" searching, while you probably want to consider "boolean mode" and prefixing each keyword with a plus sign to make it mandatory in results, or using double quotes to search for the exact phrase. Check the boolean mode documentation for more information on the syntax.
You can also consider performing multiple searches using a variety of modes and doing your own weighting.
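The two boolean-mode suggestions above (a plus sign on every keyword, or double quotes around the phrase) could be sketched like this in Python. The helper names require_all_words and exact_phrase are made up for this example, the word splitting is deliberately naive, and the statement is only built here, not executed. The %s placeholders assume a MySQL driver that uses that parameter style.

```python
import re


def require_all_words(text):
    """Prefix every word with '+' so boolean mode requires each of them."""
    words = re.findall(r"\w+", text)
    return " ".join("+" + word for word in words)


def exact_phrase(text):
    """Wrap the words in double quotes to search for the literal phrase."""
    words = re.findall(r"\w+", text)
    return '"' + " ".join(words) + '"'


# Parameterised statement; the search expression is passed in twice.
QUERY = (
    "SELECT *, MATCH (occupation_title) AGAINST (%s IN BOOLEAN MODE) AS score "
    "FROM occupational_titles "
    "WHERE MATCH (occupation_title) AGAINST (%s IN BOOLEAN MODE) "
    "ORDER BY score DESC"
)

all_words = require_all_words("EGG PROCESSOR")
phrase = exact_phrase("EGG PROCESSOR")

print(all_words)  # +EGG +PROCESSOR
print(phrase)     # "EGG PROCESSOR"
```

Running both variants and merging the results with your own weights is one way to do the custom weighting mentioned above.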
I guess you should change the collation of your column to a case-insensitive one,
e.g. from latin1_bin to latin1_swedish_ci.
A case-sensitive match is being done in your case.
Have a look here:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
I would like to know whether the Sphinx engine treats any characters as delimiters (the way commas and periods are treated in plain MySQL). I don't want to use them as delimiters at all; I would rather escape them, or at least make sure that they don't cause conflicts when performing MATCH operations in FULLTEXT searches. I have problems with them in MySQL by default, and I would prefer not to be forced to replace those delimiters with other characters to get a good set of results.
Sorry if I'm saying something stupid, but I don't have experience with Sphinx or other complementary (?) search engines.
To give you an example, if I perform a search with
"Passat 2.0 TDI"
MySQL by default would identify the period in this case as a delimiter and since the "2" and "0" are too short to be considered words by default, the results would be a bit messed up.
Is it easy to handle with Sphinx (or other search engine)? I'm open to suggestions.
This is for a large project, with probably more than 500.000 possible records (not trivial at all).
Cheers!
You can effectively control which characters are delimiters by specifying the charset table of a specific sphinx index.
If you exclude a character from your charset table, it effectively acts as a delimiter. If you include it in your charset table (even a space, as U+0020), it no longer acts as a delimiter and becomes part of your token strings.
Each index (which uses one or more Sphinx data sources) can have a different charset table for flexibility.
NB: If you want single-character words, you can set min_word_len for each Sphinx index.
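The effect of the charset table and of min_word_len can be pictured with a toy tokenizer. This is a rough Python model of the idea only, not Sphinx's actual tokenizer or configuration syntax. The tokenize function and its extra_chars parameter are invented for the example.

```python
import string

# Characters that are always part of a token in this toy model.
BASE_CHARS = set(string.ascii_lowercase + string.digits)


def tokenize(text, extra_chars="", min_word_len=1):
    """Split on every character outside the allowed set, then drop short tokens."""
    allowed = BASE_CHARS | set(extra_chars)
    tokens, current = [], []
    for ch in text.lower():
        if ch in allowed:
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return [t for t in tokens if len(t) >= min_word_len]


default_tokens = tokenize("Passat 2.0 TDI")
short_dropped = tokenize("Passat 2.0 TDI", min_word_len=3)
with_period = tokenize("Passat 2.0 TDI", extra_chars=".", min_word_len=3)

print(default_tokens)  # ['passat', '2', '0', 'tdi']
print(short_dropped)   # ['passat', 'tdi']
print(with_period)     # ['passat', '2.0', 'tdi']
```

With the period treated as a delimiter, "2" and "0" fall below the minimum length and disappear, which is the problem described in the question. Once the period is an ordinary character, "2.0" survives as a single token.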
This is probably the best section of the documentation to read. As Sphinx is primarily a fulltext engine, it is highly tunable in how it handles phrases and in how you pass them in.
I'm trying to find a collation in MySQL (my version is 5.0) where strings that differ in case are considered the same but there're no other rules like:
á = a
and so on.
I tried to find the proper collation here: http://www.collation-charts.org/mysql60/by-charset.html but it seems that the collation I'm looking for doesn't exist.
I can't use SELECT ... WHERE lower(column1) = lower(column2) in the SQL query, because then the indices on column1 and column2 are not used and my query is terribly slow.
Thanks for any suggestion!
I was given this advice: simply have a table like this: id, word, word_in_lowercase. It's true that the data is redundant, but otherwise it fulfils all my needs.
The automatic update of word_in_lowercase can be done via a trigger or some additional programming.
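A minimal sketch of the redundant-column approach, using Python's standard-library sqlite3 module instead of MySQL 5.0. Here the lowercase copy is maintained by "additional programming" in the add_word helper rather than by a trigger. The helper names and the index name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, word_in_lowercase TEXT)"
)
# Index the lowercased copy so lookups need no function call on the column.
conn.execute("CREATE INDEX idx_words_lower ON words (word_in_lowercase)")


def add_word(word):
    """Store the word together with its lowercased copy."""
    conn.execute(
        "INSERT INTO words (word, word_in_lowercase) VALUES (?, ?)",
        (word, word.lower()),
    )


def find_ignoring_case(word):
    """Compare only the lowercased column, so the index on it is usable."""
    rows = conn.execute(
        "SELECT word FROM words WHERE word_in_lowercase = ? ORDER BY id",
        (word.lower(),),
    )
    return [row[0] for row in rows]


for w in ["Casa", "CASA", "cása", "other"]:
    add_word(w)

matches = find_ignoring_case("casa")
accented = find_ignoring_case("CÁSA")

print(matches)   # ['Casa', 'CASA']
print(accented)  # ['cása']
```

Because only case is folded, "cása" does not match "casa", which is the behavior asked for in the question.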
Which collation is set on the tables in question? I currently use utf8_hungarian_ci on a lot of tables because it is case-insensitive.
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html indicates that nonbinary strings are case insensitive by default. Have you tested to see that it is not working properly without using lower()?
Why don't you use the full text search functions of MySQL for your search query?
For tasks like yours I am using the MATCH AGAINST function.
Read the Specifications at mysql.com to make it clear - Link
One example:
SELECT * FROM customer WHERE status = 1 AND MATCH (person, city, company, zipcode, tags) AGAINST ('".$searchstring."' IN BOOLEAN MODE)
This will be executed case-insensitively.