Converting MS SQL db from ASCII to Unicode - sql-server-2008

My SQL Server 2008 R2 database has string columns (nvarchar), and some of the old data is showing as ASCII. I need to show it to users on my site, and I would prefer to convert the data in the database to Unicode. Is there a quick way to do this? Are there downsides that I should be aware of?
Examples of my issue:
In the database, I see special chars instead of regular chars. One user's name is supposed to be Amédée, and instead it shows as Am?d??.
In other cases I see &quot; instead of a quotation mark ("), or the chars &# instead of the word "and".

Related

smart solution of SQL injection

There is a keyword conflict issue in the query module of my application. Please see if you can suggest a smart solution.
First, in the query module, each query condition has three parts in the UI:
1. Field name: its value is fixed, e.g. origin, finalDest...
2. Operator: a select list which includes "like", "not like", "in", "not in", "=", "!="
3. Value: this part is input by the user.
The back-end then assembles the SQL statement according to the UI's query criteria. E.g. if the user types/selects the following in the UI:
Field Name    Operator    Value
origin        like        CHI
finalDest     in          SEL
The back-end will generate the following SQL:
select * from Booking where origin like '%CHI%' and finalDest in ('SEL').
But there is a bug: if the user types a special symbol such as ' or _ in "value", the generated SQL will also contain ' or _, e.g.:
select * from Booking where origin like '%C_HI%' and finalDest in ('S'EL').
As you can see, because there is a special symbol in the "where" clause, the SQL can't be executed.
My solution to this problem is to add the escape character "/" in front of the special symbol before executing the statement. But the only symbols I know of that conflict with SQL are ' and _. Do you know of any other similar symbols that I need to handle, or do you have a better idea for avoiding the injection?
Sorry, I forgot to tell you what language I am using: Java, with MySQL as the DB, and I also use Hibernate. A lot of people have asked why I don't use PreparedStatement. This is a little complex. Simply put, my company has a framework called dynamic query: we pre-define the SQL fragments in an XML file, then assemble the SQL from the criteria passed in by the UI, using jxel expressions. Since the SQL is more or less pre-defined, I am afraid that switching to PreparedStatement would involve a lot of changes to our framework, so what we care about is just how to fix the SQL injection issue in a simple way.
The code should try to stop SQL injection on the server side before sending any information to the database. I'm not sure what language you are using, but this is normally accomplished by creating a statement that contains bind variables of some sort. In Java, this is a PreparedStatement; other languages contain similar features.
Using bind variables or parameters in a statement leverages built-in protection against SQL injection, which honestly is going to be better than anything you or I write on the database. If you're doing any String concatenation on the server side to form a complete SQL statement, that is an indicator of a SQL injection risk.
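To illustrate the bind-variable idea, here is a minimal sketch in Python rather than Java. It assumes an in-memory SQLite database standing in for MySQL, and the helper name find_bookings is hypothetical; the point is only that the value S'EL from the question no longer breaks the statement.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Booking (origin TEXT, finalDest TEXT)")
conn.executemany(
    "INSERT INTO Booking VALUES (?, ?)",
    [("CHICAGO", "SEL"), ("CHICAGO", "S'EL"), ("BOSTON", "SEL")],
)

def find_bookings(origin_part, final_dest):
    # The SQL text is fixed; user input travels separately as bound values,
    # so a quote inside final_dest can never terminate the string literal.
    sql = "SELECT origin, finalDest FROM Booking WHERE origin LIKE ? AND finalDest IN (?)"
    return conn.execute(sql, ("%" + origin_part + "%", final_dest)).fetchall()

rows = find_bookings("CHI", "S'EL")
print(rows)
```

A JDBC PreparedStatement follows the same pattern: the statement text contains only ? placeholders and the values are set separately.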
\0    An ASCII NUL (0x00) character.
\'    A single quote (“'”) character.
\"    A double quote (“"”) character.
\b    A backspace character.
\n    A newline (linefeed) character.
\r    A carriage return character.
\t    A tab character.
\Z    ASCII 26 (Control+Z). See note following the table.
\\    A backslash (“\”) character.
\%    A “%” character. See note following the table.
\_    A “_” character. See note following the table.
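The % and _ entries matter only inside LIKE patterns, where they act as wildcards. A minimal sketch of escaping them before binding the pattern follows; it assumes SQLite as a stand-in database, and the helper names escape_like and origin_contains are hypothetical. SQLite needs an explicit ESCAPE clause, whereas MySQL uses the backslash as the LIKE escape character by default.

```python
import sqlite3

def escape_like(value, escape_char="\\"):
    # Escape the escape character first, then the two LIKE wildcards.
    for ch in (escape_char, "%", "_"):
        value = value.replace(ch, escape_char + ch)
    return value

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Booking (origin TEXT)")
conn.executemany("INSERT INTO Booking VALUES (?)", [("C_HI",), ("CAHI",), ("50%",)])

def origin_contains(text):
    # The surrounding % stay as real wildcards; anything the user typed is literal.
    pattern = "%" + escape_like(text) + "%"
    # The SQL sent to SQLite reads: ... LIKE ? ESCAPE '\'
    sql = "SELECT origin FROM Booking WHERE origin LIKE ? ESCAPE '\\'"
    return [row[0] for row in conn.execute(sql, (pattern,))]

print(origin_contains("C_HI"))
```

Without the escaping step, the pattern %C_HI% would also match CAHI, because _ matches any single character.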
You should use bind variables in your SQL statement. As already mentioned, this is done with PreparedStatements in Java.
To make sure only valid column names are used, you can validate the input against the database. MySQL provides schema information, such as the columns of each table, as part of the INFORMATION_SCHEMA. For further information, check the MySQL documentation:
"The INFORMATION_SCHEMA COLUMNS Table"
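The validation idea can be sketched as a whitelist check combined with placeholders. This is only an illustration in Python: ALLOWED_FIELDS is a hypothetical hard-coded set (in practice it could be loaded from INFORMATION_SCHEMA.COLUMNS), the comma-splitting of "in" values is an assumption about the UI, and the ? placeholder style is the one JDBC uses.

```python
ALLOWED_FIELDS = {"origin", "finalDest"}  # hypothetical; could come from INFORMATION_SCHEMA.COLUMNS
ALLOWED_OPERATORS = {"like", "not like", "in", "not in", "=", "!="}

def build_condition(field, operator, value):
    """Return (sql_fragment, params) for one UI criterion."""
    if field not in ALLOWED_FIELDS:
        raise ValueError("unknown field: %r" % field)
    op = operator.lower()
    if op not in ALLOWED_OPERATORS:
        raise ValueError("unknown operator: %r" % operator)
    if op in ("like", "not like"):
        return "%s %s ?" % (field, op), ["%" + value + "%"]
    if op in ("in", "not in"):
        # Assumption: the UI passes several values as a comma-separated string.
        values = [v.strip() for v in value.split(",")]
        placeholders = ", ".join("?" for _ in values)
        return "%s %s (%s)" % (field, op, placeholders), values
    return "%s %s ?" % (field, op), [value]

def build_query(criteria):
    """Assemble the full statement; only whitelisted names ever reach the SQL text."""
    fragments, params = [], []
    for field, operator, value in criteria:
        fragment, values = build_condition(field, operator, value)
        fragments.append(fragment)
        params.extend(values)
    return "select * from Booking where " + " and ".join(fragments), params

sql, params = build_query([("origin", "like", "CHI"), ("finalDest", "in", "S'EL")])
print(sql, params)
```

Field names and operators cannot be bound as parameters, which is why they are checked against fixed sets, while the user-typed values are always bound.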

MSSQL to MySQL migration - char encoding issues with UCS-2 surrogate pairs, how can I remove these from MSSQL database?

I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (both database servers are running locally) and would really appreciate some help.
- The MSSQL source database has latin1 collation (so it has the ISO 8859-1 character set, right?) but doesn't have any char/varchar fields (every string field is nvarchar/nchar), so all this data should be using the UCS-2 character set.
- The MySQL target database wants the UTF-8 character set.
I decided to use the database migration toolkit in the latest version of MySQL Workbench. At first it worked fine and migrated everything as expected. But I have been totally tripped up upon encountering UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings in an nvarchar field) to prepare a replacement script and keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
will return any surrogate pair character (whether or not it uses DC83), but obviously only if it is the only character (or part of the pair) in that field. This isn't a big deal, since I would like to remove any instance of these anyway (I don't like to remove data like this, but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
will return every row! Regardless of whether it even has a Unicode character present in the field, let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target database, table, and field character set to UCS-2, but it does not seem to make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and now I have discovered the source of the problem. I had a hard time finding it, so maybe this will be useful to someone, even though I realize my problem and workaround may not exactly match the OP's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is HTML content from Sitecore CMS (the target CMS is Drupal, btw).
I've found that I get this error when the conversion hits records that contain Instagram embeds. Instagram embeds work in such a way that the embedded post data is copied into the embed code (instead of being loaded asynchronously etc.; even the image is included as base64 CSS...), and young people nowadays tend to put a lot of emojis in their image descriptions (using their iPhones with the Emoji keyboard). Emojis are represented by 4-byte encoded characters, but MySQL utf8 only allows for 3-byte encoded Unicode characters.
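The 3-byte versus 4-byte distinction can be checked outside the database. The following is a small Python sketch (the function names are hypothetical) that flags characters needing 4 bytes in UTF-8; these are the same characters that UTF-16 stores as a surrogate pair.

```python
def needs_utf8mb4(text):
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)

def four_byte_chars(text):
    """List the characters that a 3-byte UTF-8 column cannot store."""
    return [ch for ch in text if len(ch.encode("utf-8")) == 4]

sample = "caf\u00e9 \U0001F600"  # 'é' (2 bytes in UTF-8) and a grinning-face emoji (4 bytes)

print(needs_utf8mb4(sample))
print([hex(ord(ch)) for ch in four_byte_chars(sample)])
```

Running a check like this over exported rows would be one way to locate the offending records before the copy step fails.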
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to a recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting at the table and column (alas, not the row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway - a solution *could* be to use utf8mb4, which would allow for the emojis. Read more here.
But it looks like it's a bad idea to do this in e.g. my case with Drupal.
So the solution I ended up with was simply to strip these characters in my migration script. There is no point in keeping them for users of the site in question, since they are displayed as rectangles on the web page anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and C# .NET, and I found the help here (thanks a ton, Jon Skeet): it turns out there is a regex pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, @"\p{Cs}", string.Empty);
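For comparison, a rough Python equivalent is sketched below. It is not a direct port: Python strings hold whole code points rather than UTF-16 code units, and the standard re module has no \p{Cs} class, so this version removes every character outside the Basic Multilingual Plane instead. The name strip_non_bmp is hypothetical.

```python
import re

# Matches every code point outside the Basic Multilingual Plane, i.e. every
# character that UTF-16 has to encode as a surrogate pair.
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")

def strip_non_bmp(text):
    """Remove characters that would need 4 bytes in UTF-8."""
    return NON_BMP.sub("", text)

print(strip_non_bmp("nice pic \U0001F600!"))
```

Accented letters and other BMP characters pass through unchanged, so only emoji-style characters are lost.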
I had a very similar problem today and found that it was caused by empty strings. I replaced them with NULLs or a value representing no data, and the migration worked fine.
I solved it just by editing the "import data script.cmd": where it reads columns "As NVARCHAR", I replaced those with "VARCHAR" only.
Note: My table columns were VARCHAR type already, so... for some stupid reason the migration script improperly cast them to the Unicode (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)
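The odd-position check in the WHERE clause guards against byte matches that straddle two neighbouring characters. Here is a small Python sketch of the same idea on UTF-16LE bytes; the function name find_code_unit is hypothetical. Note that Python offsets are 0-based, so an aligned match sits at an even offset, whereas CHARINDEX is 1-based and an aligned match is odd.

```python
def find_code_unit(text, code_unit):
    """Return the 0-based UTF-16 code-unit index of code_unit in text, or -1."""
    data = text.encode("utf-16-le")
    needle = code_unit.to_bytes(2, "little")  # 0xDC83 becomes the bytes 83 DC
    start = 0
    while True:
        pos = data.find(needle, start)
        if pos == -1:
            return -1
        if pos % 2 == 0:  # aligned: a real code unit, not a straddling match
            return pos // 2
        start = pos + 1

# U+8300 followed by U+00DC encodes as 00 83 DC 00: the bytes 83 DC appear,
# but they belong to two different characters.
print(find_code_unit("\u8300\u00dc", 0xDC83))
# U+100083 encodes as the surrogate pair DBC0 DC83.
print(find_code_unit("ab\U00100083", 0xDC83))
```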

Spanish characters in SQL select

I'm working on a Spanish language website where some text is stored in a MS SQL 2008 database table.
The text is stored in the db table with characters such as á, í and ñ.
When I retrieve the data, the characters don't display on the page.
This is probably a very simple fix but please educate me.
You must use Unicode instead of ANSI strings and functions, and must choose a web page encoding that has the required character set. Some searches on those terms will yield all you need. Look up content type 1252 and 8859 as well in case you get stuck (examples, not answers).
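As a small illustration of why the encodings have to match, this Python sketch shows the same three letters encoded as UTF-8 and as Windows-1252, and what happens when the bytes are decoded with the wrong charset or forced through ASCII. It does not reproduce the original site's setup, which the question does not describe.

```python
text = "á, í, ñ"

utf8_bytes = text.encode("utf-8")     # 2 bytes per accented letter
cp1252_bytes = text.encode("cp1252")  # 1 byte per accented letter

# Decoding with the wrong charset is one common way pages end up garbled.
mojibake = utf8_bytes.decode("cp1252")

# Forcing the text through ASCII loses the letters entirely.
lossy = text.encode("ascii", errors="replace").decode("ascii")

print(len(utf8_bytes), len(cp1252_bytes))
print(mojibake)
print(lossy)
```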

Escape characters in MySQL, in Ruby

I have a couple of escaped characters in user-entered fields that I can't figure out.
I know they are the "smart" single and double quotes, but I don't know how to search for them in MySQL.
The characters, when output from Ruby, look like \222, \223, \224 etc.
irb> "\222".length => 1
So - do you know how to search for these in mysql? When I look in mysql, they look like '?'.
I'd like to find all records that have this character in the text field. I tried
mysql> select id from table where field LIKE '%\222%'
but that did not work.
Some more information - after doing a mysqldump, this is how one of the characters is represented - '\\xE2\\x80\\x99'. It's the smart single quote.
Ultimately, I'm building an RTF file and the characters are coming out completely wrong, so I'm trying to replace them with 'dumb' quotes for now. I was able to do a gsub(/\222/, "'").
Thanks.
I don't quite understand your problem, but here is some info for you:
- First, there are no escaped characters in the database. Every character is stored as is, with no escaping.
- They don't "look like ?". That is just wrong terminal settings. A SET NAMES query should always be executed first, to match the client encoding.
- You have to determine the character set and use it at every stage: in the database, in the mysql client, in Ruby.
- You should distinguish Ruby's string representation from the character itself.
- To enter a character in a MySQL query, you can use the CHAR function, but only in the terminal. In Ruby, just use the character itself.
- Smart quotes look like they are 2-byte encoded in Unicode. You have to determine your encoding first.
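To make the encoding point concrete, here is a sketch in Python rather than Ruby. It assumes the \222-style bytes are Windows-1252, which the question does not confirm; under that assumption, the octal \222 byte and the mysqldump bytes E2 80 99 are the same character, U+2019. The name dumb_quotes is hypothetical.

```python
# Map the four common "smart" quotes to their plain ASCII counterparts.
SMART_TO_DUMB = str.maketrans({
    "\u2018": "'",  # left single quotation mark  (cp1252 byte 0x91, octal \221)
    "\u2019": "'",  # right single quotation mark (cp1252 byte 0x92, octal \222)
    "\u201c": '"',  # left double quotation mark  (cp1252 byte 0x93, octal \223)
    "\u201d": '"',  # right double quotation mark (cp1252 byte 0x94, octal \224)
})

def dumb_quotes(text):
    return text.translate(SMART_TO_DUMB)

# The same character seen through two encodings:
from_cp1252 = b"\x92".decode("cp1252")       # the byte Ruby prints as \222
from_utf8 = b"\xe2\x80\x99".decode("utf-8")  # the bytes shown by mysqldump

print(from_cp1252 == from_utf8)
print(dumb_quotes("\u201cit\u2019s\u201d"))
```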

Remove invalid characters in Unicode in SQL Server 2008

I want to remove invalid Unicode characters from a field in SQL Server.
How can I achieve that?
What exactly do you count as "invalid" characters?
What is the data type of the field: (char/varchar) or (nchar/nvarchar)?
You may find that this is a case for a SQLCLR function that takes a SqlString as input and returns a SqlString.
You can then use the more powerful .NET string-manipulation, Encoding, and Globalization features.
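Since "invalid" is not defined in the question, any filter has to start from an assumption. The sketch below, in Python rather than .NET, assumes that invalid means control characters, lone surrogates, and unassigned code points, and removes them by Unicode general category. The names and the chosen categories are hypothetical; a SQLCLR function could apply the same kind of test.

```python
import unicodedata

# Assumed definition of "invalid" for this sketch: control characters (Cc),
# lone surrogates (Cs) and unassigned code points (Cn).
# Note that tab and newline are also category Cc and would be removed.
DEFAULT_BAD_CATEGORIES = frozenset({"Cc", "Cs", "Cn"})

def remove_by_category(text, bad_categories=DEFAULT_BAD_CATEGORIES):
    """Drop every character whose general category is in bad_categories."""
    return "".join(ch for ch in text if unicodedata.category(ch) not in bad_categories)

print(remove_by_category("a\x00b\ud800c"))
```

Passing a different set, for example only {"Cs"}, narrows the filter to lone surrogates.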