Drupal encoding and node insert - mysql

I have a CCK type for storing mentions (social media search mentions). I believe some of the mentions are ASCII (I know little about this stuff).
I retrieve data from APIs, which I then save to Drupal using node_save.
My question is, what should I use to safely convert whatever I am getting into a format Drupal and MySQL are happy with?
The particular db_query error I get is unhelpful: "Warning in test1\includes\common.inc on line 3538". Nice. I have traced it to encoding: I used the following code to make the input safe, but it does not work with all input.
$node->title = htmlentities($item['title'], ENT_COMPAT, 'UTF-8');
It worked well for some ASCII characters, like those square ones [] etc, but not for this "行けなくてもずっとユーミンは聴きつづけます".
I'm really stuck. :(
UPDATE: The EXACT error I get from PHP is "Warning in D:\sites\test1\includes\common.inc on line 3538", and the line reads "if (db_query($query, $values)) {".
UPDATE 2: I've confirmed that the encoding of the data I am receiving is UTF8. This really doesn't make sense now, and I've confirmed that the collation in the db is utf8_general_ci.
UPDATE 3: One of the titles is: How Much Does A Facebook Fan Cost?� $1.07
The output of:
var_export(array_map('ord', str_split($node->title)));
gave me the character 160 for the funny question mark (which is a square like [] in eclipse).
UPDATE 4: MySQL version is 5.1.41, and the collation on the columns is utf8_general_ci.
UPDATE 5: I managed to get Drupal to print the query with db_queryd. Funny thing is now I get the exact error message and not "Warning in", but Drupal still doesn't have this error in the log! WTF. So the exact sql is:
INSERT INTO node (vid, type, language, title, uid, status, created, changed, comment, promote, moderate, sticky, tnid, translate) VALUES (0, 'sm_mention', '', 'How Much Does A Facebook Fan Cost?� $1.07 (Geoffrey A. Fowler/Digits)', 1, 1, 1298395302, 1298395302, 0, 0, 0, 0, 0, 0)
And the error given is: Incorrect string value: '\xA0 $1.0...' for column 'title' at row 1
This honestly sounds like something doesn't like extended ascii characters.
UPDATE 6:
SHOW CREATE TABLE node:
CREATE TABLE `node` (
`nid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`vid` int(10) unsigned NOT NULL DEFAULT '0',
`type` varchar(32) NOT NULL DEFAULT '',
`language` varchar(12) NOT NULL DEFAULT '',
`title` varchar(255) NOT NULL DEFAULT '',
`uid` int(11) NOT NULL DEFAULT '0',
`status` int(11) NOT NULL DEFAULT '1',
`created` int(11) NOT NULL DEFAULT '0',
`changed` int(11) NOT NULL DEFAULT '0',
`comment` int(11) NOT NULL DEFAULT '0',
`promote` int(11) NOT NULL DEFAULT '0',
`moderate` int(11) NOT NULL DEFAULT '0',
`sticky` int(11) NOT NULL DEFAULT '0',
`tnid` int(10) unsigned NOT NULL DEFAULT '0',
`translate` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`nid`),
UNIQUE KEY `vid` (`vid`),
KEY `node_changed` (`changed`),
KEY `node_created` (`created`),
KEY `node_moderate` (`moderate`),
KEY `node_promote_status` (`promote`,`status`),
KEY `node_status_type` (`status`,`type`,`nid`),
KEY `node_title_type` (`title`,`type`(4)),
KEY `node_type` (`type`(4)),
KEY `uid` (`uid`),
KEY `tnid` (`tnid`),
KEY `translate` (`translate`)
) ENGINE=InnoDB AUTO_INCREMENT=1700 DEFAULT CHARSET=utf8

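\xA0 is not a valid start of a UTF-8 sequence.
The character NO-BREAK SPACE, Unicode code point U+00A0, is encoded in UTF-8 as the two bytes 0xC2 0xA0. A lone 0xA0 byte is how ISO-8859-1 and Windows-1252 encode that character, so at least part of your input is probably in one of those encodings.
That said, your input string is broken: it is not valid UTF-8.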
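As a rough illustration of the point about 0xA0, here is a minimal sketch in Python (the question uses PHP; Python is used here only to show the byte-level idea). It assumes the invalid input is really Windows-1252, which is a guess based on the single 0xA0 byte, and the helper name to_valid_utf8 is hypothetical.

```python
def to_valid_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return raw unchanged if it is already valid UTF-8.

    Otherwise reinterpret the bytes using the assumed fallback encoding
    and re-encode the result as UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: input that is not UTF-8 is Windows-1252.
        # Bytes undefined in that code page become U+FFFD.
        return raw.decode(fallback, errors="replace").encode("utf-8")


# The title from the question, with the lone 0xA0 byte.
title = b"How Much Does A Facebook Fan Cost?\xa0 $1.07"
fixed = to_valid_utf8(title)
print(fixed)

# Text that is already valid UTF-8 passes through untouched.
japanese = "行けなくても".encode("utf-8")
print(to_valid_utf8(japanese) == japanese)
```

In PHP the same two steps would likely be mb_check_encoding() followed by mb_convert_encoding(); calling htmlentities() does not help, because it expects input that is already valid in the encoding you name.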

MySQL collation case INSENSITIVE search for utf8mb4_bin [duplicate]

I found out that queries against one of my tables are case sensitive, so I tried to change the collation (I'm using Workbench on Windows). I right-clicked on the table -> alter table -> collation and changed it from utf8mb4_default to utf8mb4_general_ci.
But it didn't work and the queries are still case sensitive. When I right-click on the table -> alter table -> collation again, it still shows utf8mb4_default, and when I change it to utf8mb4_general_ci again and apply the change, it says no changes detected!
The column type is VARBINARY, I tried this:
MySQL case insensitive search on varbinary field?
but it takes a lot of time, it is not acceptable.
This is the create statement:
CREATE TABLE `page` (
`page_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
`page_namespace` int(11) NOT NULL DEFAULT '0',
`page_title` varbinary(255) NOT NULL DEFAULT '',
`page_restrictions` tinyblob NOT NULL,
`page_counter` bigint(20) unsigned NOT NULL DEFAULT '0',
`page_is_redirect` tinyint(1) unsigned NOT NULL DEFAULT '0',
`page_is_new` tinyint(1) unsigned NOT NULL DEFAULT '0',
`page_random` double unsigned NOT NULL DEFAULT '0',
`page_touched` varbinary(14) NOT NULL DEFAULT '',
`page_links_updated` varbinary(14) DEFAULT NULL,
`page_latest` int(8) unsigned NOT NULL DEFAULT '0',
`page_len` int(8) unsigned NOT NULL DEFAULT '0',
`page_content_model` varbinary(32) DEFAULT NULL,
PRIMARY KEY (`page_id`),
UNIQUE KEY `name_title` (`page_namespace`,`page_title`),
KEY `page_random` (`page_random`),
KEY `page_len` (`page_len`),
KEY `page_redirect_namespace_len` (`page_is_redirect`,`page_namespace`,`page_len`),
KEY `idx_page_page_is_new` (`page_is_new`),
KEY `idx_page_page_title_is_new` (`page_title`,`page_is_new`)
) ENGINE=InnoDB AUTO_INCREMENT=44062999 DEFAULT CHARSET=utf8mb4;
Any other suggestions?
Looks like you have the following options:
1. Convert your binary column to a non-binary text column, using a temp column, because binary columns cannot be case-insensitive.
2. Use the CONVERT function, as in the link you mentioned.
3. Use the LOWER or UPPER functions.
If you really want the column to always be case-insensitive, I'd say go for option 1.
In MySQL there is a collation for each column in addition to the overall collation of the table. You will need to change the collation for each individual column.
(I believe the overall table collation determines the default collation if you create a new column, but don't quote me on that.)

MySQL INSERT results in duplicate key, but no duplicate exists

I have read through many entries that people have claimed to have this problem and they've solved their issue but none of their answers solve MY issue. I am using phpMyAdmin to update the email address of a user. The "user_email" field is marked as UNIQUE. Whenever I update the email address to his actual email, I get:
Duplicate entry 'user@example.com' for key 'user_email'
I have Analyzed, Optimized, and Repaired the table and no errors appear -- everything comes up as OK.
I have run several SQL statements to find any duplication only to find out that none exists.
I even exported the table and imported the records again. I add the indexes and try and update... duplicate entry message. Here's the table structure:
CREATE TABLE IF NOT EXISTS `users` (
`id` bigint(20) NOT NULL,
`md5_id` varchar(200) NOT NULL DEFAULT '',
`full_name` tinytext,
`user_name` varchar(200) DEFAULT NULL,
`user_email` varchar(220) DEFAULT NULL,
`user_level` tinyint(4) NOT NULL DEFAULT '1',
`pwd` varchar(220) NOT NULL DEFAULT '',
`address` text COLLATE latin1_general_ci,
`country` varchar(200) DEFAULT NULL,
`tel` varchar(200) NOT NULL DEFAULT '',
`fax` varchar(200) DEFAULT NULL,
`website` text,
`date` date NOT NULL DEFAULT '0000-00-00',
`users_ip` varchar(200) NOT NULL DEFAULT '',
`approved` int(1) NOT NULL DEFAULT '0',
`activation_code` int(10) NOT NULL DEFAULT '0',
`banned` int(1) NOT NULL DEFAULT '0',
`ckey` varchar(220) NOT NULL DEFAULT '',
`ctime` varchar(220) NOT NULL DEFAULT '',
`location` tinyint(4) NOT NULL DEFAULT '9'
) ENGINE=MyISAM AUTO_INCREMENT=210 DEFAULT CHARSET=latin1;
ALTER TABLE `users` ADD PRIMARY KEY (`id`);
ALTER TABLE `users` MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT,AUTO_INCREMENT=210;
Even now that I have REMOVED the UNIQUE index from the 'user_email' field, the error is STILL coming up. I REALLY don't understand that (Maybe something residual...? I'm just guessing).
Picture me with wads of hair in my hands. Can anyone please help?
UPDATE:
Here's the output from SHOW CREATE TABLE users
Here's the output from SHOW INDEX FROM users
Error message while editing:
Error message without using database name:
Output of: SHOW CREATE TABLE proctor.users

Union 2 tables inner join on a third with 2 legacy applications

I have a legacy Access front end connected to a MySQL database. The legacy app has a lot of dangerous macros assigned to onclose triggers. I also have a web application under development running on the same database. There are a couple of modules in the web app that are in production use. My testing is being done on a separate development machine with a separate dedicated development version of the database.
A new module I'm installing into my web app comes with its own set of tables. It will happily exist in the same database but wants its own copy of the data in its own tables. I hesitate to extensively modify the new tables or code base for that module.
There are a total of 6 tables that hold similar data for different objects in the legacy database. I am only working on the 2 most important of those tables now. The below represents only a very small subset of the columns in these 2 tables.
CREATE TABLE IF NOT EXISTS `agent` (
`age_id` int(11) NOT NULL AUTO_INCREMENT,
`age_agent_email_address` varchar(255) DEFAULT NULL,
`age_welcome_email_sent_y_or_n` varchar(255) DEFAULT 'No',
`age_status` varchar(255) DEFAULT 'Active',
PRIMARY KEY (`age_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=1854 ;
And
CREATE TABLE IF NOT EXISTS `prospecting_contacts` (
`psp_prospect_id` varchar(255) NOT NULL DEFAULT '',
`psp_prospecting_status` varchar(255) DEFAULT 'Active',
`psp_prospect_email_address` varchar(255) DEFAULT NULL,
`psp_remove_from_email_marketing` varchar(255) DEFAULT 'No',
`psp_id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`psp_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=2050793 ;
There are several related tables that came with the new module. I believe only one of them needs to be updated.
CREATE TABLE IF NOT EXISTS `phplist_user_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) CHARACTER SET latin1 NOT NULL,
`confirmed` tinyint(4) DEFAULT '0',
`blacklisted` tinyint(4) DEFAULT '0',
`bouncecount` int(11) DEFAULT '0',
`entered` datetime DEFAULT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`uniqid` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`htmlemail` tinyint(4) DEFAULT '0',
`subscribepage` int(11) DEFAULT NULL,
`rssfrequency` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`password` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`passwordchanged` date DEFAULT NULL,
`disabled` tinyint(4) DEFAULT '0',
`extradata` text CHARACTER SET latin1,
`foreignkey` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`optedin` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `foreignkey` (`foreignkey`),
KEY `idx_phplist_user_user_uniqid` (`uniqid`),
KEY `emailidx` (`email`),
KEY `enteredindex` (`entered`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
The phplist_user_user table would include data that is a result of this query:
SELECT `age_agent_email_address` AS `email` FROM `agent`
WHERE `age_status` = 'Active'
UNION DISTINCT
SELECT `psp_prospect_email_address` FROM `prospecting_contacts`
WHERE `psp_prospecting_status` = 'Active'
The legacy Access application updates the agent and prospecting_contacts tables. The new module updates the phplist_user_user table. I believe I can copy the information back and forth using TRIGGER. But I'm looking for a way that doesn't duplicate data.
I had thought of CREATE VIEW, but the MySQL manual says that unions and joins break its updatability. http://dev.mysql.com/doc/refman/5.1/en/view-updatability.html
So, is there a way to update these 3 tables without duplicating data? Or should I just duplicate the email addresses and use TRIGGERs on INSERT and UPDATE?
You might be able to do something clever with foreign keys though they are more attuned to keeping tables consistent rather than preventing duplicates. http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
It may seem counter-intuitive, but another solution would be to maintain a lookup table that indicates where a specific value can be found. You could join with all three of the (sub)tables to prevent duplicates.
A trigger would work too.

Very odd characters in a mysql dump -- what to do?

I've got weird data mangling my data migration. These weird characters are embedded as-is in the actual mysql dump file:
北京东方å›悦大酒店<br />\n<br />\n“The impetus
I've been given mysql data dumps with those kinds of chars in them. I'm importing the data into Drupal, by first recreating the mysql tables, and then querying against them using Drupal's Migrate module.
Code looks like this:
DROP TABLE IF EXISTS `news`;
SET @saved_cs_client = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `news` (
`id` int(11) NOT NULL auto_increment,
`uid` int(11) NOT NULL,
`pid` int(11) default NULL,
`puid` int(11) default NULL,
`headline` varchar(255) NOT NULL,
`teaser` varchar(500) NOT NULL,
`status` char(1) default NULL,
`date` datetime NOT NULL,
`url` varchar(255) default NULL,
`url_title` varchar(255) default NULL,
`body` text,
`caption` varchar(255) default NULL,
`gid` int(11) default NULL,
`feature` text,
`related` varchar(255) default NULL,
`change1_time` int(11) default NULL,
`change2_time` int(11) default NULL,
`change1_user` varchar(255) default NULL,
`change2_user` varchar(255) default NULL,
`expires` datetime default NULL,
`rank` char(1) default NULL,
PRIMARY KEY (`id`),
KEY `uid` (`uid`),
KEY `status` (`status`),
KEY `expires` (`expires`),
KEY `rank` (`rank`),
KEY `puid` (`puid`),
FULLTEXT KEY `headline` (`headline`,`teaser`,`body`)
) ENGINE=MyISAM AUTO_INCREMENT=6976 DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;
Fastest solution is the winner here -- I'm on a tight deadline, and really suffering over here! I've tried a search and replace solution, but there appear to be too many different types of weird data. I can orchestrate a new data dump, if I know what to tell them (how to do the data dump).
Thanks,
John
This isn't a direct answer to your question, but I played a bit with the mojibake you quoted in your post. It looks like it was originally Chinese text in UTF-8 encoding, which was interpreted as Latin text in Windows-1252 encoding, re-encoded in UTF-8 and again interpreted as Windows-1252 (and finally once more encoded as UTF-8 when you posted it here). So it's not just mojibake, it's double mojibake.
Also, at some point, a byte was lost from the middle of the string (probably because it was one of the undefined code points in Windows-1252), mangling one of the original characters. Running the text through the encoding chain in reverse (encode as Windows-1252, decode as UTF-8, repeat), I get the output:
北京东方�悦大酒店<br />\n<br />\n“The impetus
where the replacement character � stands for the mangled character.
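The reverse chain described above can be sketched in Python roughly as follows. This is only an illustration under the assumption that the damage really is two rounds of "UTF-8 bytes read as Windows-1252"; the function names mangle and unmangle are hypothetical, and the sample string is just the first two characters of the quoted text.

```python
def mangle(text: str, rounds: int = 2) -> str:
    """Simulate the damage: UTF-8 bytes misread as Windows-1252.

    Strict decoding raises UnicodeDecodeError if a byte is undefined in
    Windows-1252, which is one way a character can be lost in practice.
    """
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text


def unmangle(text: str, rounds: int = 2) -> str:
    """Run the chain in reverse: encode as Windows-1252, decode as UTF-8."""
    for _ in range(rounds):
        raw = text.encode("cp1252", errors="replace")
        # Byte sequences that are no longer valid UTF-8 become U+FFFD.
        text = raw.decode("utf-8", errors="replace")
    return text


original = "北京"
damaged = mangle(original)
print(damaged)
print(unmangle(damaged))
```

If a new dump can be requested, it is probably simpler to fix the cause than to repair the text afterwards: the dump and the import should both use the character set the data is actually stored in.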

Create Table Syntax Error

I received a MySQL data dump and am trying to insert the data into a set of temporary tables. The creation statement for the first table is shown below. When I run this I receive the error: "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''temp_books'( 'ID'int( 11 ) NOT NULL AUTO_INCREMENT, 'start'varchar( 20 ) ' at line 1". I've checked the documentation for MySQL syntax, and I don't see what the problem is.
CREATE TABLE 'temp_books' (
'ID' int(11) NOT NULL AUTO_INCREMENT,
'start' varchar(20) NOT NULL,
'customer_id' int(11) NOT NULL DEFAULT '0',
'total_num' int(11) NOT NULL,
'amount' double(5,2) NOT NULL DEFAULT '0.00',
'changed' timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY ('ID'),
UNIQUE KEY 'start' ('start')
) ENGINE=MyISAM AUTO_INCREMENT=4853 DEFAULT CHARSET=latin1;
You shouldn't put single-quotes on your identifiers. If you're going to quote them use the "back tick" character (“`”). You can also use double-quotes but you have to specify that mode:
SET sql_mode='ANSI_QUOTES';
http://dev.mysql.com/doc/refman/5.0/en/identifiers.html
I've ALWAYS had issues with CREATE TABLE. Not sure why. Takes some trial-and-error.
Try this:
CREATE TABLE temp_books (
ID int(11) NOT NULL AUTO_INCREMENT,
start varchar(20) NOT NULL,
customer_id int(11) NOT NULL DEFAULT '0',
total_num int(11) NOT NULL,
amount double(5,2) NOT NULL DEFAULT '0.00',
changed timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (ID),
UNIQUE KEY start (start)
) ENGINE=MyISAM AUTO_INCREMENT=4853 DEFAULT CHARSET=latin1;
I had to delete the quote marks, as well as the default for the changed field, as well as the default charset. Hopefully that won't affect the data.
Here's another way of writing it that might work for some (most of the columns are left out for brevity):
create table temp_books
(
id int not null,
start varchar(255) null,
constraint six_cb_datasource_pk
primary key (id)
);