When attempting to insert 💩 (as an example of a character that takes 4 bytes in UTF-8), both MySQL (5.7) and MariaDB (10.2/10.3/10.4) give the same error:
Incorrect string value: '\xF0\x9F\x92\xA9'
The statement:
mysql> insert into bob (test) values ('💩');
Here's my database's charset/collation:
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
1 row in set (0.00 sec)
The server's character set:
mysql> show global variables like '%character_set_server%'\G;
*************************** 1. row ***************************
Variable_name: character_set_server
Value: utf8mb4
The table:
create table bob ( `test` TEXT NOT NULL );
mysql> SHOW FULL COLUMNS FROM bob;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
Can anyone point me in the right direction?
Yes, as you commented, you need to use SET NAMES utf8mb4.
Your 4-byte character must pass from your client through the database connection and into a table. All of those must support utf8mb4. If any one of them does not support utf8mb4, then 4-byte characters will not be able to get through.
SET NAMES utf8mb4 makes the database session expect clients to send strings using that encoding. The default for character_set_client on MySQL 5.7 is utf8, so you need to set it to utf8mb4.
In MySQL 8.0.1 and later, the default character_set_client is utf8mb4 already, so you won't need to change it.
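To see why a 3-byte utf8 link in that chain rejects this character, it can help to look at the bytes. The following is a minimal Python sketch, not anything MySQL itself provides: the helper names utf8_hex and needs_utf8mb4 are made up for illustration. It relies on the fact that MySQL's utf8 stores at most 3 bytes per character, which covers code points up to U+FFFF only.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text in the \\xNN form MySQL uses in error messages."""
    return "".join(f"\\x{b:02X}" for b in text.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


pile = "\U0001F4A9"  # the 💩 character from the question
print(utf8_hex(pile))        # \xF0\x9F\x92\xA9
print(needs_utf8mb4(pile))   # True
print(needs_utf8mb4("abc"))  # False
```

The printed bytes match the ones in the "Incorrect string value" error, which suggests the character reached the server intact and was rejected by a 3-byte step along the way.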
In my.ini I've changed properties from latin1 to cp1251 (then restarted the server)
[mysql]
default-character-set=cp1251
............................
[mysqld]
default-character-set=cp1251
I create database
CREATE DATABASE library DEFAULT CHARSET=cp1251;
Make request to check out the encoding:
SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| cp1251 | cp1251_general_ci |
+--------------------------+----------------------+
show variables like "char%";
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp1251 |
| character_set_connection | cp1251 |
| character_set_database | cp1251 |
| character_set_filesystem | binary |
| character_set_results | cp1251 |
| character_set_server | cp1251 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+---------------------------------------------------------+
Create a table
CREATE TABLE genres (g_id INT, g_name VARCHAR(150)) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
As I try to insert cyrillic data, the Command Line window gets stuck:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Поэзия');
'>
'>
'>
'>
Latin strings get inserted ok:
mysql> INSERT INTO genres (g_id, g_name) VALUES (1, 'Poetry');
Query OK, 1 row affected (0.06 sec)
Yesterday, after the whole day of trying and testing, I got it working well. Created some more tables and inserted some Cyrillic strings. But next morning and the whole day long I can't get it working again. The previously inserted data wouldn't display. After firing
set names utf8
the Cyrillic words appeared, but the numeric columns didn't display correctly. What have I missed?
It's not just one change.
character_set_client, character_set_connection, and character_set_results (but not the other two that you changed) specify the encoding of the client.
The column definitions in the database tables need to have a character set that can handle Cyrillic. One way is to do this to each table:
ALTER TABLE t CONVERT TO CHARACTER SET cp1251;
Have you already stored Cyrillic in latin1 columns?
Check by doing SELECT HEX(col) .... You may need the 2-step Alter as discussed in http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
It would be best to switch to utf8mb4; that way you could handle all character sets throughout the world.
See also Trouble with UTF-8 characters; what I see is not what I stored
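The point about the client encoding can be illustrated outside MySQL. This is a small Python sketch using the standard cp866 and cp1251 codecs; the variable names are illustrative only. It shows that the word from the question has different bytes in each code page, so bytes typed under one code page and declared to the server as the other come out as different characters.

```python
word = "Поэзия"  # the value from the INSERT in the question

as_cp866 = word.encode("cp866")    # bytes a cp866 console would send
as_cp1251 = word.encode("cp1251")  # bytes the server expects after SET NAMES cp1251

print(as_cp866.hex(), as_cp1251.hex())

# Misinterpreting cp866 bytes as cp1251 yields different characters.
misread = as_cp866.decode("cp1251")
print(misread == word)  # False

# Declaring the encoding the client really uses round-trips cleanly.
print(as_cp866.decode("cp866") == word)  # True
```

This is only a model of the mismatch; what the Windows console actually sends depends on its active code page.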
I have found a workaround. After starting cmd
C:\Users\nikol>chcp 866
Active code page: 866
Then after starting mysql
mysql> set names cp866;
Query OK, 0 rows affected (0.00 sec)
But when I select the data, there are multiple trailing spaces
mysql> SELECT * FROM genres;
+------+------------------+
| g_id | g_name |
+------+------------------+
| 1 | Поэзия |
| 2 | Программирование |
| 3 | Психология |
| 4 | Наука |
| 5 | Классика |
| 6 | Фантастика |
+------+------------------+
6 rows in set (0.00 sec)
I guess I'll have to TRIM
I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.
Is there a way to select the data so that it looks just like data stored with utf8_general_ci collation?
I.e. I need something like this
SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable
Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.
mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
mysql> insert into weirdTable values ('ЂЃЉЌ');
mysql> select * from weirdTable;
+-------------+
| weirdColumn |
+-------------+
| ЂЃЉЌ |
+-------------+
Use MySQL's CONVERT() function to force the characters to a different encoding:
mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:
mysql> create table w2
as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
Query OK, 1 row affected (0.07 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> show create table w2\G
*************************** 1. row ***************************
Table: w2
Create Table: CREATE TABLE `w2` (
`weirdColumnUtf8` longtext CHARACTER SET utf8
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> select * from w2;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. However, if you use utf8, there's generally no reason not to use utf8mb4 instead.
If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:
mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------------+---------+-----+---------+----------+---------+---------------+
...
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | PAD SPACE |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | PAD SPACE |
...
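Conceptually, CONVERT(... USING utf8) is a transcoding step: the characters stay the same while the bytes that represent them change. The Python sketch below mimics that with the standard codecs; it is an illustration of the idea, not MySQL's implementation.

```python
stored = "ЂЃЉЌ".encode("cp1251")  # bytes as held in the cp1251 column

# Transcode: decode with the source charset, encode with the target charset.
converted = stored.decode("cp1251").encode("utf-8")

print(stored.hex())     # 80818a8d
print(converted.hex())  # d082d083d089d08c
print(converted.decode("utf-8"))  # ЂЃЉЌ
```

Each of these characters takes one byte in cp1251 and two bytes in UTF-8, which is why the converted value is twice as long even though the text is unchanged.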
In a Django application with a MySQL DB back-end, users try to insert notes that contain smileys, hearts, and other Unicode characters. MySQL refuses the operations with an error:
(1366, "Incorrect string value: '\\xE2\\x9D\\xA4\\xEF\\xB8\\x8F' for column 'note' at row 1")
(The column in question has longtext type. The Unicode characters in this case are valid: a heart and a modifier, https://codepoints.net/U+2764 and https://codepoints.net/U+FE0F, so it's not that they are 4-byte UTF-8 characters. I made sure that MySQL's default character set is utf-8.)
What is interesting is that I cannot fully reproduce this error on my local developer environment. One particular difference is that it only emits a warning for that anomaly.
Update1:
This is still bothering me:
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name="sblive";
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1 |
+----------------------------+
1 row in set (0.00 sec)
I converted the specific table's charset to utf-8:
mysql> alter table uploads_uploads convert to character set utf8 COLLATE utf8_general_ci;
Query OK, 1209036 rows affected (1 min 10.31 sec)
Records: 1209036 Duplicates: 0 Warnings: 0
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "sblive" AND table_name = "uploads_uploads" AND column_name = "note";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE '%colla%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
You are asking for ❤️ followed by a "non-spacing" "VARIATION SELECTOR-16".
Your bytes are utf8 -- good
Your connection needs to specify utf8 -- does it?
Your TEXT column needs to be declared CHARACTER SET utf8 -- is it? Use SHOW CREATE TABLE to verify.
If you are using HTML, it needs to say charset=UTF-8 -- does it?
Suggest you switch to utf8mb4 if the 'back-end users' are likely to enter more emoticons -- the 'Emoji' will need it.
Addenda
Let's check the data... Please run this
SELECT col, HEX(col) FROM ...
Those two characters should deliver hex E29DA4 and EFB88F. If you see C3A2C29DC2A4C3AFC2B8C28F, you have "double encoding", which is a messier problem. 2764FE0F would indicate utf16, I think.
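The three hex strings above can be reproduced with a short Python sketch. Python's latin-1 codec is used here as a stand-in for MySQL's latin1 (an assumption; for these particular bytes it yields the same hex as quoted above), and the variable names are illustrative only.

```python
heart = "\u2764\ufe0f"  # HEAVY BLACK HEART + VARIATION SELECTOR-16

correct = heart.encode("utf-8")
# Double encoding: UTF-8 bytes misread as latin1, then encoded as UTF-8 again.
double = correct.decode("latin-1").encode("utf-8")
utf16 = heart.encode("utf-16-be")

print(correct.hex().upper())  # E29DA4EFB88F
print(double.hex().upper())   # C3A2C29DC2A4C3AFC2B8C28F
print(utf16.hex().upper())    # 2764FE0F
```

Comparing the output of SELECT HEX(col) against these values indicates which of the three cases applies to the stored data.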
I am using MySQL, and in the table 'items', updates to the column image_url 'succeed' with no warnings. But in reality the update is failing: it prepends a space to the value and deletes the last character of the value I give it.
Here is the update:
UPDATE items
SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg'
WHERE id=38;
Here is the select:
select * from items\G;
Here is one line of the output:
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
...
Notice the missing 'g' at the end and the extra space at the beginning.
How do I stop this?
Here is some system info that may help:
mysql> show variables LIKE '%version%';
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.46 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.46-0ubuntu0.14.04.2 |
| version_comment | (Ubuntu) |
| version_compile_machine | i686 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
EDIT 1 Table description:
mysql> desc items;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
...
| image_url | varchar(255) | NO | | NULL | |
...
EDIT 2 Checking for triggers:
mysql> show triggers \G
Empty set (0.00 sec)
EDIT 3 Another example:
I am doing all these commands from command line. Another example:
UPDATE items SET image_url = 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg33333333333333' WHERE id=38;
select * from items\G;
...
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg3333333333333
...
EDIT 4 Checking length of inputs and outputs:
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
1 row in set (0.00 sec)
http://www.lettercount.com/ gives http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg 61 characters as well, which makes sense given that the update is not changing the length of the string, just deleting the last character and adding a space to the beginning.
EDIT 5 Trying encoding:
base64 encoding:
aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==
mysql> UPDATE items SET image_url = 'aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw==' WHERE id=38;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select image_url,length(image_url) from items where id=38\G;
*************************** 1. row ***************************
image_url: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
length(image_url): 84
1 row in set (0.00 sec)
decoding: aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS82MUR6NXQ4d2pRTC5fU1g1MjJfLmpwZw=
gives:
http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg
EDIT 6 Checking if Insert fails as well:
mysql> INSERT INTO items (url, image_url) VALUES('http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg', 'http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jpg');
Query OK, 1 row affected, 2 warnings (0.03 sec)
The warnings appear because this insert did not supply values for all of the NOT NULL columns:
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field 'created_at' doesn't have a default value |
| Warning | 1364 | Field 'updated_at' doesn't have a default value |
+---------+------+-------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select image_url,length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
So, it also fails on insert.
EDIT 7 create table information
mysql> show create table items\G;
*************************** 1. row ***************************
Table: items
Create Table: CREATE TABLE `items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`image_url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`color` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
...
`store` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_items_on_id` (`id`),
KEY `index_items_on_url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 8 More table and column information
select * from information_schema.columns where table_name='items' and column_name='image_url'\G;
*************************** 2. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: development_database
TABLE_NAME: items
COLUMN_NAME: image_url
ORDINAL_POSITION: 5
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: varchar
CHARACTER_MAXIMUM_LENGTH: 255
CHARACTER_OCTET_LENGTH: 765
NUMERIC_PRECISION: NULL
NUMERIC_SCALE: NULL
CHARACTER_SET_NAME: utf8
COLLATION_NAME: utf8_unicode_ci
COLUMN_TYPE: varchar(255)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
2 rows in set (0.01 sec)
ERROR:
No query specified
EDIT 9 Charlength readouts
mysql> select image_url,length(image_url),char_length(image_url),url from items where id=39\G;
*************************** 1. row ***************************
image_url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
length(image_url): 61
char_length(image_url): 61
url: http://ecx.images-amazon.com/images/I/61Dz5t8wjQL._SX522_.jp
1 row in set (0.00 sec)
ERROR:
No query specified
EDIT 10 showing variables like character
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
EDIT 11: THE POTENTIAL ISSUE
The error does not appear in the users table, but it does occur in the items table. Here is the difference that I think may be causing the issue. (I do not yet have a solution since the item table has that UTF-8 for a reason: urls can have some funky characters)
show create table users\G;
ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1
show create table items\G;
ENGINE=InnoDB AUTO_INCREMENT=41 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
To be honest I think this should be a Community Answer, as I was a
little later on the scene and others had done some important ground work
establishing what was and was not a factor in this issue.
This link may be relevant, as your table character set is utf8 so the last character in the string may be getting skewed (and not saving correctly, thus disappearing).
All of the rows in EDIT 10 that show a latin1 or utf8 character set should match, and ideally should all be utf8mb4. I would now hazard a guess that saving UTF-8 characters under a character set that is not true UTF-8 leaves the final character of a string as an incomplete sequence, so it is not displayed.
So to solve your issue run the command:
ALTER TABLE items CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
For info / background:
utf8mb4 is the full and complete UTF-8 character set and so will show any and every character that can be used in a web address. If there are some obscure characters in the data I suggest you change the column to a BLOB column before then changing it to a utf8mb4 column, because this will preserve the correct character definitions as input rather than as assumed by MySQL on the data already entered.
You do not want the utf8_ character sets; in MySQL they are as good as broken. What you want is utf8mb4. The standard utf8 definition in MySQL is compromised because it stores at most 3 bytes per character, so 4-byte characters cannot be saved correctly.
I'm trying to export some data from a MySQL database, but weird and wonderful things are happening to unicode in that table.
I will focus on one character, the left smartquote: “
When I use SELECT from the console, it is printed without issue:
mysql> SELECT text FROM posts;
+-------+
| text |
+-------+
| “foo” |
+-------+
This means the data are being sent to my terminal as utf-8[0] (which is correct).
However, when I use SELECT * FROM posts INTO OUTFILE '/tmp/x.csv' …;, the output file is not correctly encoded:
$ cat /tmp/x.csv
“fooâ€
Specifically, the “ is encoded with seven (7!) bytes: \xc3\xa2\xe2\x82\xac\xc5\x93.
What encoding is this? Or how could I tell MySQL to use a less unreasonable encoding?
Also, some miscellaneous facts:
SELECT @@character_set_database returns latin1
The text column is a VARCHAR(42):
mysql> DESCRIBE posts;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| text | varchar(42) | NO | MUL | | |
+-------+-------------+------+-----+---------+-------+
“ encoded as utf-8 yields \xe2\x80\x9c
\xe2\x80\x9c decoded as latin1 then re-encoded as utf-8 yields \xc3\xa2\xc2\x80\xc2\x9c (6 bytes).
Another data point: … (utf-8: \xe2\x80\xa6) is encoded to \xc3\xa2\xe2\x82\xac\xc2\xa6
[0]: as smart quotes aren't included in any 8-bit encoding, and my terminal correctly renders utf-8 characters.
Newer versions of MySQL have an option to set the character set in the outfile clause:
SELECT col1,col2,col3
FROM table1
INTO OUTFILE '/tmp/out.txt'
CHARACTER SET utf8
FIELDS TERMINATED BY ','
Many programs/standards (including MySQL) assume that "latin1" means "cp1252", so the 0x80 byte is interpreted as a Euro symbol, which is where that \xe2\x82\xac bit (U+20AC) comes from in the middle.
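The byte sequences in the question can be reproduced with a short Python sketch. The helper name mojibake is made up for illustration, and Python's cp1252 and latin-1 codecs stand in for how MySQL treats latin1 (an assumption; they agree for the bytes used here). The closing quote is left out because its 0x9D byte is undefined in Python's cp1252 codec.

```python
def mojibake(text: str, wrong_charset: str) -> bytes:
    """Encode as UTF-8, misread the bytes as wrong_charset, encode as UTF-8 again."""
    return text.encode("utf-8").decode(wrong_charset).encode("utf-8")


quote = "\u201c"     # left smart quote
ellipsis = "\u2026"  # horizontal ellipsis

print(mojibake(quote, "cp1252").hex())     # c3a2e282acc593 (7 bytes)
print(mojibake(quote, "latin-1").hex())    # c3a2c280c29c (6 bytes)
print(mojibake(ellipsis, "cp1252").hex())  # c3a2e282acc2a6

# Reversing the steps recovers the original character.
garbled = mojibake(quote, "cp1252")
print(garbled.decode("utf-8").encode("cp1252").decode("utf-8") == quote)  # True
```

The cp1252 results match the 7-byte sequences reported in the question, while strict latin-1 gives the 6-byte form, which is consistent with the cp1252 explanation above.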
When I try this, it works properly (but note how I put data in, and the variables set on the db server):
mysql> set names utf8; -- http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
mysql> create table sq (c varchar(10)) character set utf8;
mysql> show create table sq\G
*************************** 1. row ***************************
Table: sq
Create Table: CREATE TABLE `sq` (
`c` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.19 sec)
mysql> insert into sq values (unhex('E2809C'));
Query OK, 1 row affected (0.00 sec)
mysql> select hex(c), c from sq;
+--------+------+
| hex(c) | c |
+--------+------+
| E2809C | “ |
+--------+------+
1 row in set (0.00 sec)
mysql> select * from sq into outfile '/tmp/x.csv';
Query OK, 1 row affected (0.02 sec)
mysql> show variables like "%char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
And from the shell:
/tmp$ hexdump -C x.csv
00000000 e2 80 9c 0a |....|
00000004
Hopefully there's a useful tidbit in there…
I've found that this works well.
SELECT convert(col_name USING latin1) FROM posts INTO OUTFILE '/tmp/x.csv' …;
To specifically address your question "What is this?", you have answered it yourself:
I suspect this is because “Column values are dumped using the binary character set. In effect, there is no character set conversion.” - dev.mysql.com/doc/refman/5.0/en/select-into.html
That is the way MySQL stores utf8 encoded data internally. It's a terribly inefficient variation of Unicode storage, apparently using a full three bytes for most characters, and not supporting four byte UTF-8 sequences.
As for how to convert it to real UTF-8 using INTO OUTFILE... I don't know. Using other mysqldump methods will do it though.
As you can see, my MySQL database uses latin1 and the system character set is utf8.
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
Every time I tried to export a table, I got a strangely encoded CSV file.
So, I put:
mysql_query("SET NAMES CP1252");
header('Content-Type: text/csv; charset=cp1252');
header('Content-Disposition: attachment;filename=output.csv');
in my export script.
Then I have pure UTF-8 output.
Try SET CHARACTER SET <blah> before your SELECT, where <blah> is utf8, latin1, etc.
See: http://dev.mysql.com/doc/refman/5.6/en/charset-connection.html
Or SET NAMES utf8; might work...
You can execute MySQL queries using the CLI tool (I believe even with an output format so it prints out CSV) and redirect to a file. Should do charset conversion and still give you access to do joins, etc.
You need to issue charset utf8 at the MySQL prompt before running the SELECT. This tells the server what to output the results as.