I'm trying to write a CREATE TABLE statement for Microsoft Access (to be executed via a C# / .NET app using an OleDbConnection), utilizing the WITH COMPRESSION attribute to cause character columns (TEXT) to be created using single-byte characters rather than Unicode double-byte characters, as documented on MSDN here.
The WITH COMPRESSION attribute can be used only with the CHARACTER and MEMO (also known as TEXT) data types and their synonyms.
The WITH COMPRESSION attribute was added for CHARACTER columns because of the change to the Unicode character representation format. Unicode characters uniformly require two bytes for each character. For existing Microsoft® Jet databases that contain predominately character data, this could mean that the database file would nearly double in size when converted to the Microsoft Access database engine format. However, Unicode representation of many character sets, those formerly denoted as Single-Byte Character Sets (SBCS) can easily be compressed to a single byte. If you define a CHARACTER column with this attribute, data will automatically be compressed as it is stored and uncompressed when retrieved from the column.
When I try to execute the following statement (which I believe to be syntactically correct per MSDN) via an OleDbConnection, I get a syntax error.
CREATE TABLE [Foo] ([COL1] TEXT(255) WITH COMPRESSION)
Likewise, executing the same statement directly within MS Access 2013 as a query gives a syntax error at WITH.
Executing
CurrentProject.Connection.Execute("CREATE TABLE [Foo1] ([COL1] TEXT(255) WITH COMPRESSION)")
from Access VBA does work, however.
If I take out the WITH COMPRESSION attribute, the statement executes without error both via OleDb and directly in MS Access.
Any ideas what I'm doing wrong?
My problem turned out to be a syntax error that wasn't reflected properly in my original question.
However, solving that problem revealed that the documentation for MS Access CREATE TABLE on MSDN https://msdn.microsoft.com/en-us/library/office/ff837200.aspx is incorrect regarding the sequence of attributes for the CREATE TABLE statement. According to the documentation, the syntax is:
CREATE [TEMPORARY] TABLE table (field1 type [(size)] [NOT NULL] [WITH COMPRESSION | WITH COMP] [index1] [, field2 type [(size)] [NOT NULL] [index2] [, …]] [, CONSTRAINT multifieldindex [, …]])
but in fact, [WITH COMPRESSION | WITH COMP] must appear before [NOT NULL] or you get a syntax error.
Additionally, it's not possible to execute the CREATE TABLE statement using the WITH COMPRESSION attribute from a query directly within MS Access. You have to either use VBA or (as in my case) an external program via OleDbConnection.
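The clause order described above can be captured in a small helper so that generated DDL always puts WITH COMPRESSION before NOT NULL. This is only a sketch in Python (not the C# used in the question); the function names `column_ddl` and `create_table_ddl` are hypothetical, and the bracket quoting assumes trusted table and column names.

```python
def column_ddl(name, sql_type, size=None, compressed=False, not_null=False):
    """Build one column definition for an Access CREATE TABLE statement.

    WITH COMPRESSION is emitted before NOT NULL, the order reported to work
    above (the documented order raised a syntax error).
    """
    parts = [f"[{name}]", sql_type if size is None else f"{sql_type}({size})"]
    if compressed:
        parts.append("WITH COMPRESSION")
    if not_null:
        parts.append("NOT NULL")
    return " ".join(parts)


def create_table_ddl(table, columns):
    """Join already-built column definitions into a CREATE TABLE statement."""
    return f"CREATE TABLE [{table}] ({', '.join(columns)})"


ddl = create_table_ddl(
    "Foo", [column_ddl("COL1", "TEXT", 255, compressed=True, not_null=True)]
)
print(ddl)
```

The resulting string would still have to be executed through a path that accepts the attribute, such as OleDbConnection or CurrentProject.Connection.Execute.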
My experience with "WITH COMPRESSION" and MS-ACCESS 2013
Impossible to run such a script from query window.
Possible from VBA but with limitations:
currentdb.Execute "... WITH COMPRESSION" -> "Syntax error in CREATE TABLE"
CurrentProject.Connection.Execute "..." -> OK
I confirm what "Mr. T" says: WITH COMPRESSION must appear before NOT NULL
I would like to know how to handle Japanese characters in a query to a Microsoft Access database. I am trying to use a query selecting variable names written in Japanese using the function odbcQuery from RODBC package in R.
I am working with Windows. My version of RStudio is 1.1.383, and my version of Access is 14.0.7015.1000 (32-bit).
I think R understands the Japanese characters in my query, but when I try to actually carry out the query I get the following error message:
> query <- "SELECT [LOA-FTD_1_5_1_CALCULATE_LOA_query].月日 FROM [LOA-FTD_1_5_1_CALCULATE_LOA_query]"
> sqlQuery(channel,query)
[1] "42000 -3100 [Microsoft][ODBC Microsoft Access Driver] Syntax error in query expression '[LOA-FTD_1_5_1_CALCULATE_LOA_query].<U+6708><U+65E5>'."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT [LOA-FTD_1_5_1_CALCULATE_LOA_query].<U+6708><U+65E5> FROM [LOA-FTD_1_5_1_CALCULATE_LOA_query]'"
Here, 月日 was converted into U+6708 and U+65E5 in the error message. These are the Unicode code points of the two characters, so I guess the string is sent to MS Access in a Unicode encoding such as UTF-8, but MS Access is then unable to read it? Is MS Access even part of the process of carrying out the query?
So it must be an encoding issue, where RStudio and MS Access do not understand each other. When I looked at similar issues with Japanese characters, the problem was usually with displaying values in a table. Here the variable names themselves are in Japanese, so the query does not work at all.
I am quite lost, so I am open to any idea or remark.
Thank you.
I found an answer that works for me in this post.
The trick (at least in my case) was to set locale to Japanese_Japan.932 before any data importing.
Here is the code for this command:
Sys.setlocale("LC_ALL", locale = "Japanese_Japan.932")
Then I imported my data from Access without having to change encoding, and the Japanese characters are displayed correctly in the resulting data frame. Moreover, this allows Japanese characters in the query to be understood.
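The <U+6708><U+65E5> notation in the error message denotes Unicode code points, not bytes in any particular encoding. The following sketch (in Python, purely for illustration; it is not part of the R workflow) shows how the same two characters become different byte sequences under UTF-8 and under code page 932, which is presumably why the locale setting matters to the ODBC driver.

```python
text = "月日"

# Code points, as printed in the RODBC error message (<U+6708><U+65E5>).
code_points = [f"U+{ord(ch):04X}" for ch in text]

# The same two characters as bytes in two different encodings.
utf8_bytes = text.encode("utf-8")
cp932_bytes = text.encode("cp932")  # Windows code page 932 (Shift-JIS family)

print(code_points)
print(utf8_bytes.hex(" "))
print(cp932_bytes.hex(" "))
```

UTF-8 needs three bytes per character here, while code page 932 needs two, so a receiver that assumes the wrong encoding cannot recover the column name.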
I have a MySQL table with a VARCHAR(100) column, using the utf8_general_ci collation.
I can see rows where this column contains arbitrary byte sequences (i.e. data that contains invalid UTF8 character sequences), but I can't figure out how to write an UPDATE or INSERT statement that allows this type of data to be entered.
For example, I've tried the following:
UPDATE DataTable SET Data = CAST(BINARY(X'16d7a4fca7442dda3ad93c9a726597e4') AS CHAR(100)) WHERE Id = 1;
But I get the error:
Incorrect string value: '\xFC\xA7D-\xDA:...' for column 'Data' at row 1
How can I write an INSERT or UPDATE statement that bypasses the destination column's collation, allowing me to insert arbitrary byte sequences?
Have you considered using one of the Blob data types instead of varchar? I believe that this'd take a lot of the pain away from your use-case.
EDIT: Alternatively, there are the HEX and UNHEX functions, which MySQL supports. HEX takes either a string or a numeric argument and returns the hexadecimal representation of your argument as a string. UNHEX does the inverse, taking a hexadecimal string and returning a binary string.
The short answer is that it shouldn't be possible to insert values with invalid UTF8 characters into a VARCHAR column declared to use the UTF8 characterset.
That's the design goal of MySQL, to disallow invalid values. When there's an attempt to do that, MySQL will return either an error or a warning, or (more leniently?) silently truncate the supplied value at the first invalid character encountered.
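To see where the value from the question stops being valid UTF-8, the bytes can be checked outside the database. This is a minimal sketch using Python's strict UTF-8 decoder; the helper name `first_invalid_utf8_offset` is hypothetical, and Python's notion of valid UTF-8 is not identical to MySQL's 3-byte utf8 character set, so treat it as an approximation.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset where UTF-8 decoding fails, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


payload = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")
offset = first_invalid_utf8_offset(payload)
print(offset, payload[offset:offset + 6] if offset is not None else None)
```

The bytes starting at the reported offset are the same ones quoted in the "Incorrect string value" error from the question.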
The more usual variety of characterset issues are with MySQL performing a characterset conversion when a characterset conversion isn't required.
But the issue you are reporting is that invalid characters were inserted into a UTF8 column. It's as if a latin1 (ISO-8859) encoding was supplied, and a characterset conversion was required, but was not performed.
As far as working around that... I believe it was possible in earlier versions of MySQL. I believe it was possible to cast a value to BINARY, and then wrap that in CONVERT( ... USING UTF8), and MySQL wouldn't perform a validation of the characterset. I don't know if that's still possible with the current MySQL Connectors.
If it is possible, then that's (IMO) a bug in the Connector.
The only way I can think of getting around that characterset check/validation would be to get the MySQL server to trust the client, and determine that no check of the characterset is required. (That would also mean the MySQL server wouldn't be doing a characterset conversion: the client would be lying to the server, telling it that it's supplying valid UTF8 characters.)
Basically, the client would be telling the server "Hey server, I'm going to be sending UTF8 character encodings".
And the server says "Okay. I'll not do any characterset conversion then, since we match. And I'll just trust that what you send is valid UTF8".
And then the client mischievously chuckles to itself, "Heh, heh, I lied. I'm actually sending character encodings that aren't valid UTF8".
And I think it's much more likely to be able to achieve such mischief using prepared statements with the old school MySQL C API (mysql_stmt_prepare, mysql_stmt_execute), supplying invalid UTF8 encodings as values for string bind parameters. (The onus is really on the client to supply valid values for bind parameters.)
You should base64 encode your value beforehand so you can generate valid SQL with it:
UPDATE DataTable SET Data = from_base64('mybase64-encoded-representation-of-my-value') WHERE Id = 1;
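One way to produce the base64 literal for such a statement is sketched below in Python, using the byte sequence from the question. Whether MySQL then accepts the decoded bytes into a utf8 column likely depends on the server's SQL mode and has not been verified here.

```python
import base64

payload = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")
encoded = base64.b64encode(payload).decode("ascii")

# Standard base64 uses only A-Z, a-z, 0-9, '+', '/' and '=', so the literal
# contains no quote or backslash characters.
statement = f"UPDATE DataTable SET Data = FROM_BASE64('{encoded}') WHERE Id = 1;"
print(statement)
```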
There is a keyword conflict issue in the query module of my application; please see if you can suggest a smart solution.
First, in the query module, each query condition contains three parts in the UI:
1. Field name: its value is fixed, e.g. origin, finalDest...
2. Operator: a select list which includes "like", "not like", "in", "not in", "=", "!="
3. Value: this part is input by the user.
Then, in the back-end, the SQL statement is assembled according to the UI's query criteria. E.g. if the user types/selects the following in the UI:
Field Name    Operator    Value
origin        like        CHI
finalDest     in          SEL
In the back-end, it will generate the following SQL:
select * from Booking where origin like '%CHI%' and finalDest in ('SEL').
But there is a bug: e.g. if the user types a special symbol in "value", such as "'" or "_", the generated SQL will also contain ' or _, e.g.:
select * from Booking where origin like '%C_HI%' and finalDest in ('S'EL').
As you can see, because there is a special symbol in the "where" block, the SQL can't be executed.
For this problem, my solution is to add the escape character "/" in front of the special symbol before executing the statement, but the only symbols I know of that would conflict with SQL keywords are ' and _. Do you know if there are any other similar symbols that I need to handle, or do you have any better idea for avoiding the injection?
Sorry, I forgot to tell you what language I am using: I am using Java, the DB is MySQL, and I also use Hibernate. A lot of people have asked why I don't use PreparedStatement. This is a little complex. Simply speaking, in my company we have a framework (FW) called dynamic query: we pre-define the SQL fragments in an XML file, then assemble the SQL according to the criteria passed in from the UI with the jxel expression. As the SQL is kind of pre-defined, I'm afraid that changing to PreparedStatement would involve a lot of changes to our FW, so what we care about is just how to fix the SQL injection issue in a simple way.
The code should begin attempting to stop SQL injection on the server side prior to sending any information to the database. I'm not sure what language you are using, but this is normally accomplished by creating a statement that contains bind variables of some sort. In Java, this is a PreparedStatement; other languages contain similar features.
Using bind variables or parameters in a statement will leverage built-in protection against SQL injection, which honestly is going to be better than anything you or I write on the database. If you're doing any String concatenation on the server side to form a complete SQL statement, this is an indicator of a SQL injection risk.
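The idea of bind parameters can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in for MySQL, since it runs without a server; the table contents, the `escape_like` helper, and the choice of "!" as the LIKE escape character are all assumptions made for illustration, and the Java PreparedStatement equivalent would follow the same pattern.

```python
import sqlite3


def escape_like(value, escape="!"):
    """Escape the LIKE wildcards % and _ (and the escape character itself)."""
    for ch in (escape, "%", "_"):
        value = value.replace(ch, escape + ch)
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Booking (origin TEXT, finalDest TEXT)")
conn.executemany(
    "INSERT INTO Booking VALUES (?, ?)",
    [("C_HI", "S'EL"), ("CXHI", "S'EL"), ("CHI", "SEL")],
)

# User input is passed only as parameters; it never becomes part of the SQL text.
sql = (
    "SELECT origin, finalDest FROM Booking "
    "WHERE origin LIKE ? ESCAPE '!' AND finalDest IN (?)"
)
rows = conn.execute(sql, ("%" + escape_like("C_HI") + "%", "S'EL")).fetchall()
print(rows)
```

The single quote in S'EL needs no escaping at all because it travels as a parameter value, and the underscore only needs escaping because it is a LIKE wildcard, not because of injection.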
\0    An ASCII NUL (0x00) character.
\'    A single quote (“'”) character.
\"    A double quote (“"”) character.
\b    A backspace character.
\n    A newline (linefeed) character.
\r    A carriage return character.
\t    A tab character.
\Z    ASCII 26 (Control+Z). See note following the table.
\\    A backslash (“\”) character.
\%    A “%” character. See note following the table.
\_    A “_” character. See note following the table.
Reference
Stack Similar Question
You should use bind variables in your SQL statement. As already mentioned, this is done with PreparedStatements in Java.
To make sure only valid column names are used, you can validate the input against the database. MySQL provides schema information, such as the columns of each table, as part of the INFORMATION_SCHEMA. For further information, check the MySQL documentation:
"The INFORMATION_SCHEMA COLUMNS Table"
I have been tasked with migrating a Microsoft SQL Server 2005 database to MySQL 5.6 (these are both database servers running locally) and would really appreciate some help.
-MSSQL source database has latin1 collation (so has ISO 8859-1 character set, right?) but doesn't have any char/varchar fields (any string field is nvarchar/nchar), so all this data should be using the UCS-2 character set.
-MySQL target database wants the character set UTF-8
I decided to use the database migration toolkit in the latest version of MySQL Workbench. At first it worked fine and migrated everything as expected. But I have been totally tripped up upon encountering UCS-2 surrogate pair characters in the MSSQL database.
The migration toolkit copytable program did not provide a very useful error message: "Error during charset conversion of wstring: No error". It also did not provide any field/row information on the problem-causing data and would fail within chunks of 100 rows. So after searching through the 100 rows after the last successful insert I found that the issue seemed to be caused by two UCS-2 characters in one of the nvarchar fields. They are listed as surrogate pairs in the UCS-2 character set. They were specifically the characters DBC0 and DC83 (I got this by looking at the binary data for the field and comparing byte pairs (little endian) with data that was being migrated successfully).
When this surrogate pair was removed from the MSSQL database the row was migrated successfully to MySQL.
Here is the problem:
I have tried to search for these characters in a test MSSQL table (this chartest table is just various test strings in an nvarchar field) to prepare a replacement script and keep getting strange results... I must be doing something incorrectly.
Searching for
SELECT * FROM chartest WHERE text LIKE NCHAR(0xdc83)
Will return any surrogate pair character (whether or not it uses DC83), but obviously, only if it is the only character (or part of the pair) in that field. This isn't a big deal since I would like to remove any instance of these anyway (I don't like to remove data like this but I think we can afford it).
Searching for
SELECT * FROM chartest WHERE text LIKE '%' + (NCHAR(0xdc83))+ '%'
Will return every row! Regardless of whether it even has a unicode character present in the field let alone the DC83 character. Is there a better way to find and replace these characters? Or something else I should try?
I have also tried setting the target database, table, and field character set to UCS-2 but it seems as though it does not make a difference.
I should also mention that this migration is using live data (~50GB database!) while one of the sites that feeds it is taken offline so any solutions to this need to have a quick running time...
I would appreciate any suggestions very much! Please let me know if there is any information I have left out.
I had this error, and now I have discovered the source of the problem. I had a hard time finding out, so maybe this will be useful to someone, even though I realize my problem and workaround may not exactly match the OP's original trouble.
I am migrating data from MSSQL to MySQL, and the content being migrated is HTML content from Sitecore CMS (target CMS is Drupal, btw).
I've found that I get this error when converting the database and hitting records that contain Instagram embeds. Instagram embeds work in such a way that the embedded post data is copied into the embed code (instead of being loaded asynchronously, etc.; even the image is included as base64 CSS...), and young people nowadays tend to put a lot of emojis in their image descriptions (using their iPhones with the emoji keyboard). Emojis are represented by 4-byte encoded characters, but MySQL utf8 only allows for 3-byte encoded Unicode characters.
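The size mismatch can be checked with a few lines of Python (used here only as a calculator; it is not part of the migration tooling). The emoji U+1F600 is an arbitrary example, and the second half decodes the DBC0/DC83 pair reported in the question.

```python
emoji = "\U0001F600"  # an arbitrary emoji outside the Basic Multilingual Plane

utf8 = emoji.encode("utf-8")
utf16 = emoji.encode("utf-16-le")

# Split the UTF-16 bytes into 16-bit code units to expose the surrogate pair.
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]
print(len(utf8), [f"{u:04X}" for u in units])

# The pair from the question (DBC0 DC83, stored little-endian) is likewise
# a single character above U+FFFF.
question_char = bytes.fromhex("c0db83dc").decode("utf-16-le")
print(f"U+{ord(question_char):06X}", len(question_char.encode("utf-8")))
```

Both characters need four bytes in UTF-8, which is one more than MySQL's utf8 character set stores per character.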
My initial error from running wbcopytables.exe (which is the non-GUI way of doing Migration Wizard in MySQL Workbench) was the
Error during charset conversion of wstring: No error
but upgrading MySQL Workbench to recent version (from 5.something to 6.x) makes the error a bit more descriptive, hinting table and column (alas, not row):
ERROR: Could not successfully convert UCS-2 string to UTF-8 in table
[MyDatabase].[dbo].[MyTable] (column MyColumn).
Original string: ...
Anyway, a solution *could* be to use utf8mb4, which would allow for the emojis. Read more here.
But it looks like it's a bad idea to do this in e.g. my case with Drupal.
So the solution I ended up with was simply to strip these characters in my migration script. There is no point in keeping them for users of the site in question, since they are displayed as rectangles on the webpage anyway. Since you can't search-and-replace with regex in SQL Server, I processed the data using a DAL and C# .NET, and I found the help here (thanks a ton, Jon Skeet): it turns out there is a regex pattern for matching one half of a surrogate pair in UTF-16. See below (and use the pattern in another language if needed).
var noUcs2SurrogatePairsString = Regex.Replace(stringWithUcs2SurrogatePairs, @"\p{Cs}", string.Empty);
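For comparison, a rough Python counterpart is sketched below. Python's re module has no \p{Cs} class and Python strings hold whole code points rather than UTF-16 code units, so this version matches the supplementary-plane range directly; the function name `strip_non_bmp` is hypothetical.

```python
import re

# Characters above U+FFFF are the ones UTF-16 stores as surrogate pairs.
NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def strip_non_bmp(text):
    """Remove every character that would need a surrogate pair in UTF-16."""
    return NON_BMP.sub("", text)


print(strip_non_bmp("photo \U0001F600 caption"))
```

Characters inside the Basic Multilingual Plane, including accented Latin and CJK text, pass through untouched.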
I had a very similar problem today, and I found that it was caused by empty strings. I replaced them with NULLs (or a value representing no data) and the migration worked fine.
I solved it just by editing the "import data script.cmd": where it reads columns "As NVARCHAR", I replaced those with "VARCHAR" only.
Note: My table columns were VARCHAR type already, so... for some stupid reason the migration script improperly cast them to UNICODE (NVARCHAR) type.
This issue has now been resolved. I used user Remus Rusanu's suggestion here for finding the rows with these surrogate pair characters using CHARINDEX and have decided to use SUBSTRING to exclude the troublesome characters like so:
UPDATE test
SET a = SUBSTRING(a, 1, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 - 1) -- string before the unwanted character
+ SUBSTRING(a, (CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000)))+1)/2 +1, LEN(a) ) -- string after the unwanted character
WHERE CHARINDEX(0x83dc, CAST(a AS VARBINARY(8000))) % 2 = 1 -- only odd numbered charindexes (to signify match at beginning of byte pair character)
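The odd-index test in the WHERE clause can be illustrated outside SQL Server. The sketch below mirrors the same arithmetic in Python on UTF-16LE bytes: a match only counts when it starts on a code-unit boundary, and the byte offset is converted to a 1-based character position. The function name `find_code_unit` is hypothetical, and this is an illustration of the alignment logic, not a replacement for the T-SQL.

```python
def find_code_unit(text, code_unit):
    """Return 1-based positions (in UTF-16 code units) where code_unit occurs."""
    data = text.encode("utf-16-le", "surrogatepass")
    needle = code_unit.to_bytes(2, "little")  # 0xDC83 -> bytes 83 DC
    positions = []
    start = data.find(needle)
    while start != -1:
        # A 0-based even offset equals a 1-based odd CHARINDEX result,
        # i.e. the match begins at the start of a byte pair.
        if start % 2 == 0:
            positions.append(start // 2 + 1)
        start = data.find(needle, start + 1)
    return positions


print(find_code_unit("ab\U00100083cd", 0xDC83))
print(find_code_unit("\u8300\u00dc", 0xDC83))
```

The second call contains the bytes 83 DC only across the boundary of two unrelated characters, which is exactly the false positive the modulo check filters out.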
I'm building a PHP/HTML front end to a MySQL database.
The table I'm attempting to work with is defined with a column that is varchar(15). I can run (without error) an insert statement with a character string that is 20 characters long. The resulting record's column is truncated to 15 characters, but no error is generated.
How do I get this to generate an error?
I know that the interface can do the error checking, but I want to know how to get the database to reject the data as well.
MySQL's fairly forgiving and will try to gracefully accept anything you pass it as best it can, silently converting/truncating/nulling if need be.
Since you don't want that, you need to enable one of the "strict" SQL modes (STRICT_TRANS_TABLES or STRICT_ALL_TABLES): http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html