I have a DDL for a table
create table AIRPORT (
AIRP_CODE varchar(10)[cs] not null,
AIRP_NAME nvarchar(60)[cs] not null,
GEOR_ID_LOCATED integer not null,
PRCC_CONST integer,
AIRP_TIME_ZONE char(5),
AIRP_TRANSLATION mediumtext,
LCOUNT integer default 0
);
I am trying to figure out what [cs] means in it. I think it's for collation, but I am not sure how it works. The table DDL wasn't written by me and I can't figure it out.
In that position would normally be a CHARACTER SET and/or COLLATE clause.
An "airport code" would best be CHARACTER SET ascii. Depending on whether you want to allow case folding, you could use COLLATE ascii_bin (disallow folding) or COLLATE ascii_general_ci (allow folding).
For the airport name, it would probably be best to use UTF-8:
AIRP_NAME varchar(60) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci not null
Note: NVARCHAR is a notation from non-MySQL vendors; for MySQL the charset is what matters.
Perhaps you also want to specify a charset for AIRP_TRANSLATION? Again, utf8mb4 is probably appropriate.
(I have never seen "[cs]"; my advice is aimed at what should be specified in that context.)
That SQL code is invalid, period.
You must be missing information. Wherever you got the code from, there's probably some documentation that explains what it is and how to use it. If it was handed to you by someone else, they didn't think to share that information with you.
Judging from the column names and types, I suspect the code comes from the README file of an airport database available for download, perhaps in CSV format, and it's just a recommended table structure you are meant to take as a starting point and adapt to your own system. My educated guess about [cs] is that it's an annotation implying that those fields are case sensitive, meaning that your application should use e.g. MAD and not mad.
In any case, having no further context it's impossible to tell.
Related
I am working on two servers with similar configurations, including the MySQL variables specific to character set and collation; both are running MySQL server and client 5.6.x. By default all tables are in latin1, including tables with only integer columns. But when I run
ALTER TABLE `table_name` CONVERT TO CHARACTER SET `utf8` COLLATE `utf8_unicode_ci`
for all tables on each server, only one of the servers converts all the tables to utf8.
What I already tried:
Converted the default database character set (character_set_database) to utf8 before running the command listed above.
A solution that already worked for me (though I'm still unsure why it worked):
ALTER TABLE `table_name` CHARACTER SET = `utf8` COLLATE `utf8_unicode_ci`
Finally, there are two questions:
Why is CONVERT TO CHARACTER SET working on one server and not on the other?
Why does the solution that worked for me behave like CONVERT TO CHARACTER SET except for the one difference I have come across: it doesn't implicitly convert all the columns to the specified character set?
Can someone please help me understand what is happening?
Thank you in advance.
IIRC, that was a bug that was eventually fixed. See bugs.mysql.com. (The bug probably existed since version 4.1, when CHARACTER SETs were really added.)
I prefer to be explicit in two places, thereby avoiding the issue you raise:
When doing CREATE TABLE, I explicitly say what CHARACTER SET I need. This avoids depending on the default established when the database was created, perhaps years ago.
When adding a column (ALTER TABLE ADD COLUMN ...), I check (via SHOW CREATE TABLE) to see if the table already has the desired charset. Even so, I might explicitly state CHARACTER SET for the column. Again, I don't trust the history of the table.
Note: I am performing these queries from explicit SQL, not from some UI that might be "helping" me.
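For example, both practices together (a sketch; the table and column names are made up):
CREATE TABLE city (
    name VARCHAR(50) NOT NULL
) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;  -- stated explicitly, not inherited

ALTER TABLE city ADD COLUMN region VARCHAR(50)
    CHARACTER SET utf8 COLLATE utf8_unicode_ci;  -- repeated at the column level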
Follow on
#HBK found http://bugs.mysql.com/bug.php?id=73153. From it, I suspect this is what 'should be' done by the user:
ALTER TABLE ...
CONVERT TO ...
DEFAULT CHARACTER SET ...; -- Do this also
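Spelled out against a hypothetical table (the table name and charsets below are placeholders, not taken from the bug report):
-- convert the data in the existing columns
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
-- and also pin the table default, so future columns get the same charset
ALTER TABLE my_table DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;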
I have a table encoded in latin1 and collate latin1_bin.
In my table there is a column comments of type TEXT; as you know, this column inherits the table's encoding and collation. But from now on it should be utf8 with utf8_general_ci, because I'm starting to store special characters in comments.
Would there be any downside if I used a command like the following?
alter table notebooks modify comments text CHARACTER SET utf8 COLLATE utf8_general_ci;
Thank you for your answer.
Danger: I think that ALTER will destroy existing text.
Also, ... Your 'name' looks Chinese, so I would guess that you want to store Chinese characters? In that case, you should use utf8mb4, not just utf8. This is because some of the Chinese characters take 4 bytes (and are not in the Unicode BMP).
I believe you need 2 steps:
ALTER TABLE notebooks MODIFY comments BLOB;
ALTER TABLE notebooks MODIFY comments TEXT
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
Otherwise the latin1 characters will be "converted" to utf8. But if you really have Chinese in the column, you do not have latin1. The 2-step ALTER above (1) turns off any knowledge of character set, then (2) establishes that the bytes are really utf8mb4-encoded.
To be safer, first do
RENAME TABLE notebooks TO old;
CREATE TABLE notebooks LIKE old;
INSERT INTO notebooks SELECT * FROM old;
Then do the two ALTERs and test the result. If there is trouble, you can RENAME to get back the old copy.
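A quick sanity check after the conversion (a sketch; assumes you can eyeball a few rows): inspect the raw bytes with HEX(). Chinese text correctly stored as utf8mb4 shows up as 3- and 4-byte sequences (e.g. E4B8AD for 中), not as latin1 mojibake.
SELECT comments, HEX(comments) FROM notebooks LIMIT 10;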
Specifying any collating sequence that does not involve direct integer comparison of a native character set will slow down your query. Whether it will slow it down noticeably is another issue. Still, looking up the rank of one character and the rank of another in a collation table and comparing the results is much, MUCH faster than retrieving on-disk information from a database, wouldn't you imagine?
I'm writing a set of SQL statements in MySQL to create and modify a few tables. I need to get my output to match a document of sample output exactly (this is for school).
When I show my create table statements, all varchar columns need to look like this:
`name` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
but they weren't showing the collation. I tried changing the declaration to
name varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL,
but this caused the output to show both the charset and the collation, and I need it to show just the collation. The sample output document was created on Unix, while I am on Windows, so this could be the source of the difference, but I need to know for sure.
Is there a way I can alter my queries to show the collation, or is this just a Unix/Windows inconsistency?
To be honest, I doubt very much that anyone intends for you to obtain output that is identical verbatim; it's more likely that they require it to be identical semantically. However, you might play around with the table's default charset/collation to see whether that makes a difference to the output obtained from SHOW CREATE TABLE:
ALTER TABLE foo CHARACTER SET utf8 COLLATE utf8_bin;
Failing that, it could be a difference between MySQL versions.
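One experiment worth trying (a sketch; exact SHOW CREATE TABLE output varies by server version): when a column's charset matches the table default but its collation is not that charset's default, MySQL tends to print only the COLLATE clause for the column.
CREATE TABLE demo (
    `name` varchar(10) COLLATE utf8_unicode_ci DEFAULT NULL
) DEFAULT CHARSET=utf8;
SHOW CREATE TABLE demo;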
Does anyone know if a utf8_unicode_cs collation for MySQL exists? So far, my searches have come up dry. If it simply doesn't exist yet, is it fairly straightforward to create one? Or can I somehow use utf8_unicode_ci or utf8_bin to "simulate" what one would expect from a utf8_unicode_cs collation?
I came across the same issue, and after some Googling it seems that MySQL doesn't include one. To "simulate" it, as you put it:
1) To ensure case-sensitivity in the DB: set the table column to utf8_bin collation
This allows:
strict SELECTs: SELECT ... WHERE name = "Joe" will NOT return rows with "joe" / "joE" / "jOe" / etc.
strict UNIQUE index: a column with a UNIQUE index will treat case differences as different values. For example, if a utf8_unicode_ci collation is used, inserting "Joe" into a table that already has "joe" will trigger a "Duplicate key" error. If utf8_bin is used, inserting "Joe" will work fine.
2) To get the proper ordering in results: add the collation to the SQL query:
SELECT ... ORDER BY column COLLATE utf8_unicode_ci
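Putting both pieces together (a sketch; the table and data are made up):
CREATE TABLE users (
    -- utf8_bin compares bytes exactly, so 'Joe' and 'joe' can coexist
    username VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_bin,
    UNIQUE KEY (username)
);
-- strict, case-sensitive match:
SELECT * FROM users WHERE username = 'Joe';
-- linguistic ordering only where it is needed:
SELECT username FROM users ORDER BY username COLLATE utf8_unicode_ci;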
This is an old question but does not seem to be superseded by any other, so I thought it worth posting that things have changed.
MySQL version 8 now has the following collations for utf8mb4:
utf8mb4_0900_ai_ci
utf8mb4_0900_as_ci
utf8mb4_0900_as_cs
... and many language-specific variants of same.
(no _ai_cs as far as I know, but that would in any case be less useful: few reasons to group [a] and [a-acute] and then separately group [A] and [A-acute]).
The purpose of the original question's hypothetical "utf8_unicode_cs" is fulfilled by utf8mb4_0900_as_cs. (The 0900 means it uses Unicode v 9.0.0 as opposed to 4.0.0 used by utf8_unicode_ci.)
To use these you'd need to change the field from utf8 to utf8mb4 character set - but that's a generally good idea anyway because the old 3-byte-max encoding can't handle e.g. emoji and other non-BMP characters.
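Such a conversion might look like this (a sketch; the table and column names are placeholders):
ALTER TABLE t MODIFY name VARCHAR(100)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs;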
Source: https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html
I have ISO 639-2 language codes (eng, fre, hin, etc.) in English as the primary key in my master table. This column is a foreign key in many other tables. Now my issue is that even though my master table has only lowercase values, due to human error some values were added to other tables with the language id in mixed case. Even though there was a foreign key, it didn't prevent this from happening.
This is the first time I am working with MySQL; previously I worked on Oracle, which applies case sensitivity to keys. What should be done to get the same behavior in MySQL?
Also what should be the column type?
Right now it is varchar(3). Should I convert it to something else? I am not going to use any LIKE conditions in my queries, only = and IN.
It happens because the collation of the column is case-insensitive, something like latin1_swedish_ci.
Change it to a case-sensitive collation such as latin1_swedish_cs:
ALTER TABLE t1 MODIFY
col1 VARCHAR(3)
CHARACTER SET latin1
COLLATE latin1_swedish_cs;
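A quick way to see the difference (a literal-only sketch; the _latin1 introducer keeps the literals in the column's charset):
SELECT _latin1 'eng' = _latin1 'Eng' COLLATE latin1_swedish_cs;  -- 0: treated as distinct
SELECT _latin1 'eng' = _latin1 'Eng' COLLATE latin1_swedish_ci;  -- 1: case-folded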