MySQL case sensitive foreign key for textual data - mysql

I have ISO 639-2 language codes (eng, fre, hin, etc.) as the primary key in my master table. This column is a foreign key in many other tables. My issue is that even though the master table only has lower-case values, some rows in the other tables were added with the language id in mixed case due to human error. Even though there was a foreign key, it didn't prevent this from happening.
This is the first time I am working with MySQL; previously I worked on Oracle, which applies case sensitivity to keys. What should be done to get the same behaviour in MySQL?
Also, what should the column type be?
Right now it is varchar(3). Should I convert it to something else? I am not going to use any LIKE conditions in my queries, only = and IN.

It happens because the collation of the column is case insensitive - something like latin1_swedish_ci.
Change it to a case-sensitive collation, e.g. latin1_general_cs:
ALTER TABLE t1 MODIFY col1 VARCHAR(3)
    CHARACTER SET latin1
    COLLATE latin1_general_cs;
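For example (a sketch with hypothetical table and column names, language_master and book), once both sides of the foreign key use the same case-sensitive character set and collation, a mixed-case value no longer matches any parent row and the insert is rejected:

-- you may need to drop and recreate the foreign key (or SET foreign_key_checks = 0)
-- around these ALTERs, depending on your MySQL version
ALTER TABLE language_master MODIFY lang_id VARCHAR(3)
    CHARACTER SET latin1 COLLATE latin1_general_cs;
ALTER TABLE book MODIFY lang_id VARCHAR(3)
    CHARACTER SET latin1 COLLATE latin1_general_cs;

-- matches the parent row 'eng', succeeds
INSERT INTO book (title, lang_id) VALUES ('Some title', 'eng');
-- 'Eng' no longer matches 'eng', so this now fails the foreign key check
INSERT INTO book (title, lang_id) VALUES ('Some title', 'Eng');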

Related

What does [cs] mean in DDL for MariaDB

I have a DDL for a table
create table AIRPORT (
  AIRP_CODE varchar(10)[cs] not null,
  AIRP_NAME nvarchar(60)[cs] not null,
  GEOR_ID_LOCATED integer not null,
  PRCC_CONST integer,
  AIRP_TIME_ZONE char(5),
  AIRP_TRANSLATION mediumtext,
  LCOUNT integer default 0
);
I am trying to figure out what [cs] means in it. I think it's for collation, but I am not sure how it works. The table DDL wasn't written by me and I can't figure it out.
In that position you would normally find CHARACTER SET and/or COLLATE.
An "airport code" would best be CHARACTER SET ascii. Depending on whether you want to allow case folding, you could use COLLATE ascii_bin (disallow folding) or COLLATE ascii_general_ci (allow folding).
For the airport name, it would probably be best to use UTF-8:
AIRP_NAME varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci not null
Note: NVARCHAR is a notation from non-MySQL vendors; for MySQL it is the charset that matters.
Perhaps you also want to specify a charset for AIRP_TRANSLATION? Again, utf8mb4 is probably appropriate.
(I have never seen "[cs]"; my advice is aimed at what should be specified in that position.)
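Putting that advice together, one possible way to write the DDL with explicit clauses in place of [cs] (a sketch, not necessarily what the original author intended) would be:

create table AIRPORT (
  AIRP_CODE varchar(10) CHARACTER SET ascii COLLATE ascii_bin not null,
  AIRP_NAME varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci not null,
  GEOR_ID_LOCATED integer not null,
  PRCC_CONST integer,
  AIRP_TIME_ZONE char(5),
  AIRP_TRANSLATION mediumtext CHARACTER SET utf8mb4,
  LCOUNT integer default 0
);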
That SQL code is invalid, period.
You must be missing information. Wherever you got the code from, there is probably some documentation that explains what it is and how to use it. If it was handed to you by someone else, they didn't share that information with you.
Judging from the column names and types, I suspect the code comes from the README file of an airport database available for download, perhaps in CSV format, and that it's just a recommended table structure you are meant to take as a starting point and adapt to your own system. My educated guess about [cs] is that it's an annotation indicating that those fields are case sensitive, meaning that your application should use e.g. MAD and not mad.
In any case, with no further context it's impossible to tell.

How to make a column in SQL case insensitive?

The problem is the page-linking process in MediaWiki, whereby creating a link [[like this]] or [[Like This]] creates two different links, and a third, separate link would be [[LIKE this]]. I was hoping to make the database case insensitive so they would all link to the same page. Some proposed solutions are listed here (I was attempting solution #6): https://meta.wikimedia.org/wiki/Case_sensitivity_of_page_names
http://archive.is/Dm5YI#selection-376.1-393.32
Case insensitive means:
https://iglooo.000webhostapp.com/index.php?title=Computer_science
would return the same results as:
https://iglooo.000webhostapp.com/index.php?title=Computer_Science
instead of a separate page, as it currently does.
'collate latin1_bin' actually forces the MySQL column to be case sensitive.
Here is what I did; dropping the key is required for the ALTER TABLE:
alter table page drop key name_title;
alter table page modify column page_title varchar(255) NOT NULL default '';
There is a table named 'page' that needs to become case insensitive. The above attempt at this has failed. Please help.
I am using the latest MediaWiki distribution.
If you don't clean the data going into the system, there is no way to store it in a case insensitive way. That being said, you can query on the data using a function.
select foo.bar
from foo
where upper(foo.bar) = upper(MY_PARAMETER)
Note: the right-hand upper() clause would be better handled in your application logic, but it's perfectly possible to do it here.
Avoiding the ALTER TABLE and the change of format for your column, you can simply use a COLLATE clause in your query:
col_name = 'Computer_science' COLLATE latin1_general_ci
This way the selected column is compared case-insensitively, bypassing the bin collation in the table definition.
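For example, against the page table from the question (a sketch assuming page_title ended up as a latin1 VARCHAR column), the full query would be:

SELECT page_id, page_title
FROM page
WHERE page_title = 'Computer_science' COLLATE latin1_general_ci;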

Changing collation on indexed columns without changing data

I am trying to change the collation on a bunch of columns. I don't want to mess up my current data, so I've been looking at doing something like in this answer:
ALTER TABLE something MODIFY name BLOB;
The next step is to convert the column to a nonbinary data type with the proper character set:
ALTER TABLE something MODIFY name VARCHAR(12) CHARACTER SET hebrew COLLATE hebrew_bin;
Or try this:
ALTER TABLE something MODIFY name VARCHAR(12) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Unfortunately, MySQL won't let me convert an indexed column to a BLOB.
SQL error: [1170] - BLOB/TEXT column 'baz' used in key specification without a key length
Query: ALTER TABLE foo.bar MODIFY baz blob;
Is there a way around this? Or do I need to somehow remove my indexes and rebuild them after the conversion?
Don't do the "2-step ALTER" unless the CHARACTER SET was wrong but the data was 'right'. If you want to convert both the CHARACTER SET and the data, use
ALTER TABLE something CONVERT TO CHARACTER SET ...;
See Case 5 in my blog, which, as you might guess, has other cases.
As for the error 1170 -- that had to do with some INDEX. Before I help you with that, decide whether it is relevant.
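As a sketch of the two options (baz_idx is a hypothetical index name; check SHOW CREATE TABLE for the real one):

-- Option 1: convert character set, collation and data for the whole table in one step
ALTER TABLE foo.bar CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Option 2: if the 2-step ALTER is really what you need, drop the index first,
-- do the conversions, then recreate the index
ALTER TABLE foo.bar DROP INDEX baz_idx;
ALTER TABLE foo.bar MODIFY baz BLOB;
ALTER TABLE foo.bar MODIFY baz VARCHAR(12) CHARACTER SET hebrew COLLATE hebrew_bin;
ALTER TABLE foo.bar ADD INDEX baz_idx (baz);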

MySQL considers 'е' and 'ё' equal, how do I set it to consider them different?

I have a table with a unique constraint on a varchar field. When I try to insert 'е' and 'ё' in two different rows, I get a unique constraint violation. Executing the following select shows that MySQL considers the letters equivalent, even though their hex values are D0B5 and D191 respectively.
select 'е' = 'ё',
hex('е'),
hex('ё');
Following a fair amount of Googling I came across this MySQL bug report which seems to deal with this issue. The very last response by Sveta Smirnova states that this behavior is by design and refers to the Collation chart for utf8_unicode_ci, European alphabets (MySQL 6.0.4).
How do I tell MySQL that 'е' is not equal to 'ё' for query purposes and how do I change the unique constraint to take note of this fact?
You may wish to check this answer: Is it possible to remove mysql table collation?
The behavior you're seeing is standard. In most cases it produces the best results. Out of interest, do you have an example of how this is causing a problem for you? Have you found two words which match except for the diacritic?
Either way the only thing you can do about it is to change the collation. This can be done at the server, database, table or even field level.
Rather than duplicating the manual on how to do this, please follow this link: http://dev.mysql.com/doc/refman/5.7/en/charset-syntax.html
There's a listing here of the different collations supported: http://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
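For instance (a sketch with placeholder names), at the database and table level that looks like:

-- default for tables created in this database from now on
ALTER DATABASE mydb CHARACTER SET utf8 COLLATE utf8_bin;

-- convert an existing table, its data and all of its string columns
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;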
If you need that for a specific field, you could add a duplicate of the column with a different collation to prevent this issue.
ALTER TABLE yourTable ADD COLUMN `copiedColumn` VARCHAR(100) CHARACTER SET 'binary' COLLATE 'binary';
Also, you can change the collation of your column if you don't need your current collation on this field:
ALTER TABLE yourTable CHANGE COLUMN yourColumn yourColumn VARCHAR(100) CHARACTER SET 'binary' COLLATE 'binary';
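To verify the effect (a quick sketch; it assumes the connection character set is utf8mb4, use utf8_bin on a utf8 connection), the comparison from the question returns 0 once a binary collation is forced:

SELECT 'е' = 'ё' COLLATE utf8mb4_bin AS are_equal;  -- 0: the byte sequences D0B5 and D191 differ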

Does a utf8_unicode_cs collation exist?

Does anyone know if a utf8_unicode_cs collation for MySQL exists? So far, my searches have come up dry. If it simply doesn't exist yet, is it fairly straight-forward to create one? Or somehow use utf8_unicode_ci or utf8_bin but "simulate" what one would expect from a utf8_unicode_cs collation?
I came across the same issue and after some Googling, it seems that MySQL doesn't include it. To "simulate it", as you put it:
1) To ensure case-sensitivity in the DB: set the table column to utf8_bin collation
This allows:
strict SELECTs: SELECT "Joe" will NOT return rows with "joe" / "joE" / "jOe" / etc
strict UNIQUE index: a column with a UNIQUE index will treat case differences as different values. For example, if a utf8_unicode_ci collation is used, inserting "Joe" on a table that already has "joe" will trigger a "Duplicate key" error. If utf8_bin is used, inserting "Joe" will work fine.
2) To get the proper ordering in results: add the collation to the SQL query:
SELECT ... ORDER BY column COLLATE utf8_unicode_ci
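Putting both pieces together (a sketch with a hypothetical users table):

CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  UNIQUE KEY uk_name (name)
);

-- 'Joe' and 'joe' can now coexist, and WHERE name = 'Joe' matches only 'Joe'
INSERT INTO users (name) VALUES ('Joe'), ('joe');

-- case-insensitive ordering despite the binary column collation
SELECT name FROM users ORDER BY name COLLATE utf8_unicode_ci;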
This is an old question but does not seem to be superseded by any other, so I thought it worth posting that things have changed.
MySQL version 8 now has the following collations for utf8mb4:
utf8mb4_0900_ai_ci
utf8mb4_0900_as_ci
utf8mb4_0900_as_cs
... and many language-specific variants of same.
(no _ai_cs as far as I know, but that would in any case be less useful: few reasons to group [a] and [a-acute] and then separately group [A] and [A-acute]).
The purpose of the original question's hypothetical "utf8_unicode_cs" is fulfilled by utf8mb4_0900_as_cs. (The 0900 means it uses Unicode v 9.0.0 as opposed to 4.0.0 used by utf8_unicode_ci.)
To use these you'd need to change the field from utf8 to utf8mb4 character set - but that's a generally good idea anyway because the old 3-byte-max encoding can't handle e.g. emoji and other non-BMP characters.
Source: https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html
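For example (a sketch; requires MySQL 8.0, and the table/column names are placeholders):

-- convert a single column to the case- and accent-sensitive collation
ALTER TABLE mytable
  MODIFY mycolumn VARCHAR(100)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;

-- or force it for a single comparison (assumes a utf8mb4 connection character set)
SELECT * FROM mytable
WHERE mycolumn = 'Joe' COLLATE utf8mb4_0900_as_cs;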