MySQL 0-terminated string showing up in XML as Unicode 0x0

In the process of moving some applications from PHP and Delphi to Java-RS, some column entries are spoiling the show.
After reading the data from MySQL with JPA into Java POJOs, converting the result to XML using JAXB, and trying to read the result back into jQuery / jqGrid, a failure happens.
In the browser the problem simply shows up as "not well formed".
Looking at the details with the Eclipse XML editor gives the error message:
An invalid XML character (Unicode 0x0) was found in the element content of the document
Now I'd like to proceed and fix the original data.
1) What would an SQL query look like that finds the rows that have invalid entries?
2) What would an SQL query look like that fixes these rows?
Let's assume the column with the problem is "name" in table "Customer".
For question #1 I found:
SELECT name from customer where hex(name) like "%00";
to work. For question #2 I am assuming that update with a left substring might work. I am not sure about the length in this case. It looks like length(name) will return the length including the terminating zero character. Will the update with left(length(name)-1) work correctly in this case?
I am not sure whether backup and restore of the database would keep the current somewhat corrupted database in shape. So it would also be helpful to know how the situation can be reproduced with an insert statement that creates null terminated strings on purpose.

I think you should be able to transform the HEX() with something like
UPDATE customer SET name=UNHEX(REPLACE(HEX(name), '00', ''));
or simply
UPDATE customer SET name=REPLACE(name, CHAR(0), '');
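One caveat with the HEX-based variant that is worth checking before running it on real data: REPLACE works on the hex text, so it can also match a '00' that straddles two bytes (for example 0x70 followed by 0x09 reads as '7009'). The Python sketch below only mimics that string logic outside the database; the function names are made up for illustration and it is not a substitute for testing on a copy of the table.

```python
def strip_nul_via_hex(data: bytes) -> bytes:
    """Mimic UNHEX(REPLACE(HEX(x), '00', '')) on a raw byte string."""
    hex_text = data.hex().upper()         # like HEX(name)
    cleaned = hex_text.replace("00", "")  # like REPLACE(..., '00', '')
    return bytes.fromhex(cleaned)         # like UNHEX(...)


def strip_nul_bytes(data: bytes) -> bytes:
    """Mimic REPLACE(x, CHAR(0), ''): remove only real 0x00 bytes."""
    return data.replace(b"\x00", b"")


# Plain name with a trailing NUL: both approaches agree.
print(strip_nul_via_hex(b"Bob\x00"), strip_nul_bytes(b"Bob\x00"))

# 'p' (0x70) followed by a tab (0x09) gives the hex text 7009...,
# so the hex-based variant removes a '00' that is not a NUL byte.
print(strip_nul_via_hex(b"p\tq\x00"), strip_nul_bytes(b"p\tq\x00"))
```

If the data can contain tabs, newlines or other bytes below 0x10, the plain REPLACE(name, CHAR(0), '') form looks like the safer of the two.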

update customer set name=left(name,char_length(name)-1) where hex(name) like "%00";

Syntax error in date in query expression for non-date fields

I'm having trouble building a query in Access 2013. The database isn't mine and the only thing I really have control over is this query. There is a table, I'm pulling 7 fields from it and eventually adding an 8th field to the query to do some string manipulation.
However, I keep getting a "Syntax error in date in query expression 'fieldname'." error whenever I click on the arrow to sort the fields. The odd thing is that these errors pop up when sorting non-date fields. When sorting the date field I get "Syntax error (missing operator) in query expression 'Release Date'."
This happens after a fresh build. I have no WHERE conditions, just SELECT and FROM. Ideas?
Here's the sql query, though I'm mainly working in the query design view:
SELECT Transmissions.[Job#], Transmissions.[Part#], Transmissions.TransmissionSN, Transmissions.Status, Transmissions.[Release Date], Transmissions.[Build Book Printed], Transmissions.[ID Tags Required]
FROM Transmissions;
Well... it seems you are the lucky inheritor of a poorly designed database.
Using special characters in a field name is just asking for trouble. And you've found what that trouble is.
Access uses the # sign to designate a Date type for query comparisons. Such as:
dtSomeDate = #2/20/2017#
You surround the date with the # signs.
In your case, the query thinks [Job#] and [Part#] are trying to wrap dates. But of course, that's not the case and thus it fails.
You can try a couple of workarounds. (I leave it to you to experiment.)
1) You can try to rename the problem fields within your query. So that:
Transmissions.[Job#] becomes Transmissions.[Job#] as JobNum
and
Transmissions.[Part#] becomes Transmissions.[Part#] as PartNum
2) You can try to copy the [Transmissions] table to a new table that you create
that does not have the naming problems.
3) Export the [Transmissions] table to a CSV file and re-import it to a new
table (or possibly new database) without the naming problems.
Here is a link to a Microsoft article that tells you why to avoid special characters in Access:
Big Bad Special Chars.
Hope that puts you on the right track. :)
Typically, this means that the field names are missing or misspelled.
Try running this to see:
SELECT * FROM Transmissions;

ms access ascii conversion unable to filter

I'm using the "ASC" function in Access (Version 365 Proplus, 32 bit).
I have created a query that uses a table with a postcode that needs validating. I'm looking at the first character of the postcode, converting it to its ASCII code, and then planning to filter out the ones I don't want.
The formula looks like this:-
Site_PostCode_String_Validation_P1: Asc(Left([site_postcode],1))
This works fine and converts as expected. However, when I try sorting or filtering using the Query Criteria on my Ascii list I get the following message:-
"Data Type Mismatch in Criteria Expression"
I have tried converting to a string, for example, using the below:-
Str(Asc(Left([site_postcode],1)))
But this has made no difference, get the same error message when applying criteria or sorting.
I have tried filtering using text and numbers but get the same error.
I have searched here and have Googled but can not see anything relating to the above.
Thanks for any suggestions.
You might simplify it a bit:
Site_PostCode_String_Validation_P1: Asc(Nz([site_postcode], Chr(0)))
I tried creating a new table based on the results, thinking that this would let me apply my filtering. During this process I got a more detailed error message stating that the error was due to a Null type conversion failure. So I realized that I need to make sure there are no Null entries before doing the ASCII conversion. Hopefully this will be of help to someone else. The final formula now works as below:
IIf(IsNull([site_postcode])=-1,Null,Asc(Left([site_postcode],1)))
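The same guard, written as a small Python sketch for anyone who wants to see the logic outside Access: the conversion is only attempted when there is a value to convert, and missing values are passed through as "no result". The function name and the sample postcode are made up for illustration, and treating an empty string like Null is an extra assumption that the formula above does not cover.

```python
from typing import Optional


def first_char_code(postcode: Optional[str]) -> Optional[int]:
    """Return the character code of the first character, or None if absent."""
    if not postcode:           # None (Null) or empty string: nothing to convert
        return None
    return ord(postcode[0])    # comparable to Asc(Left(postcode, 1))


rows = ["SW1A 1AA", None, "", "9XX 1AB"]
codes = [first_char_code(p) for p in rows]

# Filtering now works because every row produced either a number or None.
starts_with_letter = [
    p for p, c in zip(rows, codes) if c is not None and 65 <= c <= 90
]
print(codes, starts_with_letter)
```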

MySQL does not identify character '?' on select

I got a table in MySQL with the following columns:
id name email address borningDate
I have a form in an HTML page that submits this data to a servlet, which is responsible for saving it in the database. Due to charset issues (already fixed), I saved a row like this when trying to store letters with accents:
19 ? ? ? 2015-03-01
and now I want to delete this row.
Yeah, doing this:
DELETE FROM table WHERE id=19;
works nicely. My didactic question is: why, if I try something like this:
DELETE FROM table WHERE name='?';
it returns 0 rows affected, as if it can't see ? as a valid character?
Try doing
SELECT id, HEX(name), HEX(email), HEX(address), borningDate FROM table
This will tell you what's actually in the database. It probably isn't actually ASCII question marks. The question marks are probably substitution characters applied when MySQL tries to convert the column's character set to the connection's character set.
To manage this more specifically, do SHOW CREATE TABLE table and look for the character set being used for the text columns. This probably shows up at the end of the table definition as something like DEFAULT CHARSET utf8 or some such thing. But it might be specified in the column definition.
Once you know the character set, issue the command SET NAMES charset, for example, SET NAMES utf8. Then reissue your commands and see if you get better results than the ? substitution character. That assumes, of course, that the client program you are using can handle the character set mentioned.
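To illustrate why a displayed ? need not match '?' in a WHERE clause, here is a small Python sketch of the substitution scenario described above. It assumes the column actually holds UTF-8 bytes and that the ? only appears when the text is converted to a narrower character set; the sample name is made up, and your data may have been damaged differently (the HEX() output will tell).

```python
def hex_of(text: str, charset: str = "utf-8") -> str:
    """Rough equivalent of HEX(column) for text stored in `charset`."""
    return text.encode(charset).hex().upper()


stored = "Jos\u00e9"  # "José" as it might sit in the table

# Converting to a character set without the accented letter substitutes '?'.
displayed = stored.encode("ascii", errors="replace").decode("ascii")

print(displayed)          # what the client shows
print(hex_of(stored))     # bytes actually stored
print(hex_of(displayed))  # bytes a literal '?' comparison would look for
```

The two hex strings differ, so a comparison against the literal question mark finds nothing even though the client prints one.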

Why would error say Column 'n' does not belong to table when it clearly does?

I am running a Query from ASP.NET on a Microsoft Access database. The error states: Column 'Area1' does not belong to table.
It looks like the error occurs here:
cmd.Parameters.AddWithValue("@" + columnName, progressRow[columnName]);
The sql command looks like this:
Extract.exe Warning: 0 : 03/09/2014 10:48:28 | CSUtilities | serengeti | Add Query : INSERT INTO Property (Address1,Address2,Address3,Address4,Area1,Area10,Area11,Area12,Area13,Area14,Area15,Area16,Area17,Area18,Area19,Area2,Area20,Area3,Area4,Area5,Area6,Area7,Area8,Area9,BlockFlag,CapitalValue,ConstructionYear,CurrentStock,Department,DepartmentFIS,DisposalDate,DisposalReason,DoubleBeds,DwellingType,DwellingTypeCode,EastingGridRef,ElevationGridRef,HeatingCode,HouseNumber,KeyBlockProperty,KeyCurrTcyId,KeyProperty,KeyPropertyTypeCode,KeyStreet,LocalPerm,NorthingGridRef,OwningDepartment,Postcode,PropertyClassCode,PropertyGroup,QLX,RepairResponsibility,RiskLevel,RTBCode,SingleBeds,StatPerm,Suffix,Telephone,TenureCode,UserCode,Void,WardCode,WaterCode, UpdateFlag) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
Area1 is clearly defined in the Design View as a Short Text field.
I ran the following test manually and it works:
INSERT INTO Property (Address1,Address2,Address3,Address4,Area1) VALUES ('Test1','Test','Test','Test','Test')
What is it playing at?
It MIGHT be getting confused because the named parameters are the same as the column names (I've seen it before, but can't remember with which engine). Try changing the call that adds each parameter to use "@x" + column name... this way the OleDb driver won't be trying to look for a column Area1 to match with @Area1, but will look for @xArea1 (and likewise for the rest).
One OTHER possibility is the data you are inserting too. Might there be something with special characters like an "#" or "?" or similar in such fields that are confusing the post?

MSSQL to MySQL migration - char encoding issues with UCS-2 surrogate pairs, how can I remove these from MSSQL database?

I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (these are both database servers running locally) and would really appreciate some help.
-MSSQL source database has latin1 collation (so has ISO 8859-1 character set right?) but doesn't have any char/varchar fields (any string field is nvarchar/nchar) so all this data should be using the UCS-2 character set.
-MySQL target database wants the character set UTF-8
I decided to use the database migration toolkit in the latest version of MySQL Workbench. At first it worked fine and migrated everything as expected. But I have been totally tripped up upon encountering UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
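As a cross-check of the byte-pair reading described above, the two code units can be decoded outside the database. This is only a sketch in Python, assuming the four bytes appear in the field as C0 DB 83 DC (that is, DBC0 and DC83 in little-endian order); the variable names are made up.

```python
# DBC0 and DC83 written as little-endian byte pairs, as seen in the binary data.
raw = bytes.fromhex("C0DB83DC")

text = raw.decode("utf-16-le")  # a valid high/low surrogate pair
code_point = ord(text)          # the single character the pair stands for

print(len(text), hex(code_point))
print(len(text.encode("utf-8")))  # number of bytes this character needs in UTF-8
```

The pair decodes to one character outside the Basic Multilingual Plane, which is why it takes two 16-bit units in the nvarchar field and four bytes in UTF-8.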
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings in an nvarchar field) to prepare a replacement script and keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
Will return any surrogate pair character (whether or not it uses DC83), but obviously, only if it is the only character (or part of the pair) in that field. This isn't a big deal since I would like to remove any instance of these anyway (I dont like to remove data like this but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
Will return every row! Regardless of whether it even has a unicode character present in the field let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target database, table, and field character set to UCS-2 but it seems as though it does not make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and now I have discovered the source of the problem. I had a hard time finding out, so maybe this will be useful to someone, even though I realize, my problem and workaround may not be spot on matching op's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is html-content from Sitecore CMS (target CMS is Drupal, btw).
I've found that I get this error when converting the database and hitting records that contain Instagram embeds. Instagram embeds work in such a way that the embedded post data is copied into the embed code (instead of being loaded asynchronously; even the image is included as base64 CSS), and the young people nowadays tend to put a lot of emoji in their image descriptions (using their iPhones with the emoji keyboard). Emoji are represented by 4-byte encoded characters, but MySQL utf8 only allows for 3-byte encoded Unicode characters.
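A quick way to see which strings would trip over the 3-byte limit is to count the UTF-8 bytes per character. The following Python sketch does that; the helper name fits_mysql_utf8 is hypothetical and the sample strings are made up.

```python
def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


samples = [
    "caf\u00e9 \u20ac",   # accented letter (2 bytes) and euro sign (3 bytes)
    "nice \U0001F600",    # an emoji outside the BMP (4 bytes)
]

for s in samples:
    widths = [len(ch.encode("utf-8")) for ch in s]
    print(fits_mysql_utf8(s), max(widths))
```

Anything for which the check fails either needs a utf8mb4 column or has to be cleaned before the copy.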
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting table and column (alas, not row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway - a solution *could* be to use utf8mb4 which would allow for the emoji's. Read more here.
But it looks like, it's a bad idea to do this in e.g. my case with Drupal.
So - the solution I ended up with was simply to strip these characters in my migrate-script. There is no point in keeping these for users of the site in question, since they are being displayed as rectangles on the webpage anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and c# .NET, and I found the help here (thanks a ton, Jon Skeet) - turns out there is a regex-pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, @"\p{Cs}", string.Empty);
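If the migration script is not written in C#, the same idea can be sketched in Python. Note that this is an analogue rather than a translation: Python strings hold whole code points, so a surrogate pair shows up as one character above U+FFFF, and the pattern below removes those plus any stray lone surrogates. The function name is made up.

```python
import re

# Characters outside the BMP (stored as surrogate pairs in UTF-16)
# and unpaired surrogate code points.
_OUTSIDE_BMP = re.compile("[\ud800-\udfff\U00010000-\U0010FFFF]")


def strip_surrogate_chars(text: str) -> str:
    """Remove everything that would need a surrogate in UTF-16."""
    return _OUTSIDE_BMP.sub("", text)


print(strip_surrogate_chars("hi \U0001F600!"))
print(strip_surrogate_chars("caf\u00e9 \u20ac"))  # BMP text is left alone
```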
I had a very similar problem today, and I found that it was caused by empty strings, replaced them with NULLs or a value representing no data and the migration worked fine.
I solved just editing the "import data script.cmd" where it reads columns "As NVARCHAR" by replacing those with "VARCHAR" only.
Note: My table columns were VARCHAR type already, so... for some stupid reason the migration script improperly cast them to UNICODE (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)
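The alignment check in the WHERE clause can be tried out on plain bytes. Below is a Python sketch of the same idea, with a made-up helper name. Offsets are 0-based here, so an aligned match sits at an even offset, whereas CHARINDEX is 1-based and the query tests for odd positions. Unlike the query, which looks only at the first match, the helper keeps searching when the first hit straddles two characters.

```python
def find_aligned(haystack: bytes, needle: bytes) -> int:
    """Return the first even byte offset of needle in UTF-16LE data, or -1."""
    pos = haystack.find(needle)
    while pos != -1 and pos % 2 != 0:
        # Hit spans the second byte of one character and the first of the next.
        pos = haystack.find(needle, pos + 1)
    return pos


# U+8300 then U+00DC encode to 00 83 DC 00, which contains 83 DC at a
# misaligned offset; the following character is the real DBC0/DC83 pair.
data = ("\u8300\u00dc" + chr(0x100083)).encode("utf-16-le")

offset = find_aligned(data, b"\x83\xdc")
print(data.find(b"\x83\xdc"), offset, offset // 2)  # raw hit, aligned hit, code unit index
```

A row like this sample would be skipped by a check on the first match alone, so it may be worth re-running the search after the update to see whether any rows remain.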