I'm trying to import data into SQL from a CSV in phpMyAdmin, so it may be a phpMyAdmin problem. The problem I'm having is that some of the columns use special characters. For instance:
"Adán, Antonio"
Ends up as just "Ad".
The column structure is as follows:
CREATE TABLE IF NOT EXISTS `players` (
`player_name` varchar(255) COLLATE utf8_general_ci NOT NULL,
`player_nation` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_club` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_position` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_age` tinyint(2) NOT NULL,
`player_dob` varchar(32) CHARACTER SET utf8 NOT NULL,
`player_based` varchar(255) CHARACTER SET utf8 NOT NULL,
`player_id` int(10) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
I'm guessing it's something to do with the character set, but mysql.com just suggests altering the table to character set utf8, which it already is.
Any ideas how else I can prevent this?
UPDATE
Inserting into the database is fine, so I'm guessing it's not to do with my table structure. It seems to be specific to importing from a CSV.
This is the query for LOAD DATA, as generated by phpMyAdmin:
LOAD DATA LOCAL INFILE 'C:\\Windows\\Temp\\php21E4.tmp' INTO TABLE `players` FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n'
SOLUTION
I discovered the problem: my CSV was encoded with a Western European charset. After faffing around for a bit to convert it to UTF-8, it imported just fine. Not an SQL problem at all.
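For comparison, a hedged, untested sketch: instead of re-encoding the CSV, LOAD DATA accepts a CHARACTER SET clause declaring the file's actual encoding (assuming the file really was latin1/Windows-1252), and MySQL converts to the table's utf8 on the fly:

```sql
-- Untested sketch: latin1 here is an assumption about the CSV's encoding.
LOAD DATA LOCAL INFILE 'C:\\Windows\\Temp\\php21E4.tmp' INTO TABLE `players`
  CHARACTER SET latin1
  FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\r\n';
```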
Can you track where the truncation is happening? It's quite possible that the issue isn't with your DB at all.
Try a simple insert into your DB table from the command line with the special chars and see if it succeeds.
Then try logging the various steps in the import to track where the issue occurs...
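The direct-insert test above can be sketched as follows (the values other than player_name are hypothetical placeholders; only the accented name matters for the check):

```sql
-- Hypothetical test row: does a direct INSERT preserve the accent?
INSERT INTO players
  (player_name, player_nation, player_club, player_position,
   player_age, player_dob, player_based, player_id)
VALUES
  ('Adán, Antonio', 'Spain', 'Some Club', 'GK', 25, '1987-01-01', 'Spain', 1);

-- HEX() exposes the stored bytes: á in utf8 is C3A1, so a correct store
-- contains C3A1 rather than a string truncated at the accent.
SELECT player_name, HEX(player_name) FROM players WHERE player_id = 1;
```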
Related
I'm trying to create a dump file from an MS Access database to import into a MySQL database. I can create the dump file, but when I try to import it I get this error: Incorrect string value: '\xA325- R...' for column...
I'm not really sure what it means or how to fix it. I know the dump partially works, as other tables and data get imported. The import gets to this error and then stops.
I've tried setting DEFAULT CHARACTER SET = utf8 on the table.
I've also tried CHARACTER SET utf8 COLLATE utf8_unicode_ci on the column.
Here's an example from my dump file:
CREATE TABLE Agent_Table(
AgentID INT NOT NULL AUTO_INCREMENT ,
Agent VARCHAR(255) ,
Archive VARCHAR(255) ,
AgentEmail VARCHAR(255) ,
AgentMobile VARCHAR(255) ,
PRIMARY KEY (`AgentID`)
);
I also tried
CREATE TABLE Agent_Table(
AgentID INT NOT NULL AUTO_INCREMENT ,
Agent VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
Archive VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci ,
AgentEmail VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
AgentMobile VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
PRIMARY KEY (`AgentID`)
)
DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
I've tried a number of the utf8 options as well and still no joy
Nothing seems to work.
I'm writing a dump file because I've exhausted all other methods of importing/exporting the data.
All the methods/tools I have tried fail with some form of ODBC error. I'd list them here, but there are so many, as I have been at this all day.
If anyone knows how to fix this problem, or another way I can import this data, I would appreciate it.
UPDATE
The problem characters are £, `, and ’, as far as I have seen. If I do a find-and-replace on those characters, they no longer throw errors. But it seems that no matter what encoding I set the database to, it just does not like these characters during the import.
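A hedged sketch of one likely cause: £ is byte 0xA3 in latin1/Windows-1252, which matches the '\xA3...' bytes in the error, so the dump file is probably latin1-encoded while the import session expects utf8. Declaring the dump's encoding at import time may avoid the find-and-replace entirely (latin1 is an assumption here):

```sql
-- Untested sketch: tell the session the dump's bytes are latin1, so
-- mysqld converts £, ’ etc. into the utf8 columns during the import.
SET NAMES latin1;
SOURCE dump.sql;
-- Equivalent from the shell: mysql --default-character-set=latin1 mydb < dump.sql
```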
When trying to insert a Unicode emoji character (😎) into a MySQL table, the insert fails with the error:
Incorrect string value: '\\xF0\\x9F\\x98\\x8E\\xF0\\x9F...' for column 'Title' at row 1
From what I've read about this issue, it's apparently caused by the table's default character set, and possibly the column's character set, being set incorrectly. This post suggests using utf8mb4, which I've tried, but the insert still fails.
Here's my table configuration:
CREATE TABLE `TestTable` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`InsertDate` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Title` text,
`Description` text,
`Info` varchar(250) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`Id`),
KEY `xId_TestTablePK` (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=2191 DEFAULT CHARSET=utf8mb4;
Note that the Title and Description columns don't have an explicitly stated character set. Initially I had no default table character set, and had these two columns set up with CHARACTER SET utf8mb4. However, when I altered the table's default charset to the same, the column-level declarations were removed (presumably because the columns inherit the charset from the table?)
Can anyone please help me understand how I can store these unicode values in my table?
It's worth noting that I'm on Windows, trying to perform this insert in MySQL Workbench. I have also tried using C# to insert into the database, specifying the character set with CHARSET=utf8mb4, but this returned the same error.
EDIT
To try to insert this data, I am executing the following:
INSERT INTO TestTable (Title) SELECT '😎😎';
Edit
Not sure if this is relevant or not, but my database is also set up with the same default character set:
CREATE DATABASE `TestDB` /*!40100 DEFAULT CHARACTER SET utf8mb4 */;
The connection needs to establish that the client is talking utf8mb4, not just utf8. This involves changing the parameters used at connection time, or executing SET NAMES utf8mb4 just after connecting.
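A minimal sketch of the fix above, run in a single session against the question's TestTable:

```sql
-- Tell the server the client is sending utf8mb4-encoded bytes.
SET NAMES utf8mb4;

INSERT INTO TestTable (Title) SELECT '😎😎';

-- HEX() should show F09F988E (the utf8mb4 bytes of 😎) once per emoji;
-- anything else means the connection charset is still wrong.
SELECT Title, HEX(Title) FROM TestTable ORDER BY Id DESC LIMIT 1;
```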
I have a MySQL Database (myDB; ~2GB in size) with 4 Tables (tab1, tab2, tab3, tab4). Currently, the data that is stored in the tables was added using the charset ISO-8859-1 (i.e. Latin-1).
I'd like to convert the data in all tables to UTF-8 and use UTF-8 as default charset of the tables/database/columns.
On https://blogs.harvard.edu/djcp/2010/01/convert-mysql-database-from-latin1-to-utf8-the-right-way/ I found an interesting approach:
mysqldump myDB | sed 's/CHARSET=latin1/CHARSET=utf8/g' | iconv -f latin1 -t utf8 | mysql myDB2
I haven't tried it yet, but are there any caveats?
Is there a way to do it directly in the MySQL shell?
[EDIT:]
Result of SHOW CREATE TABLE messages; after running ALTER TABLE messages CONVERT TO CHARACTER SET utf8mb4;
CREATE TABLE `messages` (
`number` int(11) NOT NULL AUTO_INCREMENT,
`status` enum('0','1','2') NOT NULL DEFAULT '1',
`user` varchar(30) NOT NULL DEFAULT '',
`comment` varchar(250) NOT NULL DEFAULT '',
`text` mediumtext NOT NULL,
`date` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`number`),
KEY `index_user_status_date` (`user`,`status`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=3285217 DEFAULT CHARSET=utf8mb4
It is possible to convert the tables. But then you need to convert the application, too.
ALTER TABLE tab1 CONVERT TO CHARACTER SET utf8mb4;
etc.
To check, do SHOW CREATE TABLE tab1; it should show you CHARACTER SET utf8mb4.
Note: There are 3 things going on:
Convert the encoding of the data in any VARCHAR and TEXT columns.
Change the CHARACTER SET for such columns.
Change the DEFAULT CHARACTER SET for the table -- this comes into play if you add any new columns without specifying a charset.
The application...
When you connect from a client to MySQL, you need to tell it, in an app-specific way or via SET NAMES, the encoding of the bytes in the client. This does not have to be the same as the column declarations; conversion will occur during INSERT and SELECT, if necessary.
I recommend you take a backup and/or test a copy of one of the tables. Be sure to go all the way through -- insert, select, display, etc.
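For the four tables in the question, the steps above might look like this (the collation choice is an assumption; pick whichever utf8mb4 collation suits your comparisons):

```sql
-- Convert each table's stored data, column charsets, and table default.
ALTER TABLE tab1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tab2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tab3 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE tab4 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Change the database default, so tables created later inherit it.
ALTER DATABASE myDB CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```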
I am trying to import a small data set of Berlin street addresses using MySQL's LOAD DATA statement. The problem is that after the import runs, all of the beautiful ß characters in the German street names have become two-character ÃŸ sequences.
Here's the create-table statement I used for this table:
CREATE TABLE `subway_distances` (
`STN` varchar(255) DEFAULT NULL,
`HNR` int(9) DEFAULT NULL,
`Lat` decimal(36,15) DEFAULT NULL,
`Lon` decimal(36,15) DEFAULT NULL,
`Distance` decimal(45,20) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
... and here is my MySQL shell code:
charset utf8;
TRUNCATE TABLE subway_distances;
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
SELECT * FROM subway_distances LIMIT 0,10;
I have looked at output.csv in vim, and the eszett character appears to be fine there.
I am assuming that I simply need a different encoding declaration in MySQL, but I'm not sure where to start.
I am also assuming that collation doesn't matter yet, since I'm not comparing values -- just purely trying to get a valid import.
I found an answer to this relatively quickly. It looks like I just need to specify the CHARACTER SET value in my LOAD DATA statement. So the new statement looks like this:
LOAD DATA LOCAL INFILE '/path/to/output.csv'
INTO TABLE berlin.subway_distances
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\';
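A quick way to confirm the fix (a sketch; it assumes STN holds the street names): ß stored correctly in utf8 is the two bytes C39F, while a double-encoded ß shows up as C383C5B8.

```sql
-- Inspect the stored bytes after the import.
SELECT STN, HEX(STN) FROM berlin.subway_distances LIMIT 10;
```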
I am using Navicat 11.0.8, and I am trying to insert values into a table using a query. The insert works, but the character encoding is messed up!
As you can see in my code, the table is 'table' and I am inserting an ID, a VNUM and a NAME. The VNUM is '体字' and the NAME is 'Versão'.
INSERT INTO `table` VALUES ('1', '体字', 'Versão');
Instead of showing '体字' in the VNUM and 'Versão' in the NAME, it shows 'ä½“å­—' and 'VersÃ£o'.
This is very bad for me because I am trying to insert more than 5000 lines with a lot of information.
I have tried to set the character encoding of the table using these commands:
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
&
ALTER TABLE table CONVERT TO CHARACTER SET big5 COLLATE big5_chinese_ci;
&
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
I have also tried deleting my table and creating a new one with the character encoding already set to UTF-8, then sending the values:
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `table`;
CREATE TABLE `table` (
`vnum` int(11) unsigned NOT NULL default '0',
`name` varbinary(200) NOT NULL default 'Noname',
`locale_name` varbinary(24) NOT NULL default 'Noname'
) ENGINE=MyISAM DEFAULT CHARSET=big5;
INSERT INTO `table` VALUES ('1', '体字', 'Versão');
It still shows 'ä½“å­—' and 'VersÃ£o'.
If I edit the table manually, it shows correctly! But I am not going to edit 5000+ lines...
it shows 'ä½“å­—' and 'VersÃ£o'. -- Sounds like you had SET NAMES latin1. Do
SET NAMES utf8;
after connecting and before INSERTing. That tells mysqld what encoding the client is using.
Verify the stored data by doing SELECT col, HEX(col) .... The utf8 (or utf8mb4) hex for 体字 is E4BD93E5AD97. Interpreting those same bytes as latin1 gives you ä½“å­—.
utf8 can handle those, plus ã; I don't know if big5 can.
Actually, I suggest you use utf8mb4 instead of utf8. This is in case you encounter some of the 4-byte Chinese characters.
If you still have latin1 columns that need changing to utf8mb4, see my blog, which discusses the "2-step ALTER", but using BINARY or BLOB (not big5) as the intermediate.
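A hedged sketch of that "2-step ALTER" (table, column name, and size are placeholders; the point is that MODIFY to BINARY/VARBINARY keeps the bytes while dropping the charset interpretation):

```sql
-- Step 1: drop the latin1 interpretation without changing the bytes.
ALTER TABLE tbl MODIFY col VARBINARY(255);

-- Step 2: reinterpret those same bytes as utf8mb4.
ALTER TABLE tbl MODIFY col VARCHAR(255) CHARACTER SET utf8mb4;
```

This is only correct when the column's bytes are already valid UTF-8 that was merely declared latin1; if the bytes really are latin1-encoded, use ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 instead, which re-encodes the data.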