RODBC string getting truncated - mysql

I am fetching data from a MySQL server into R using RODBC.
One column of the table holds long character strings:
SELECT MAX(CHAR_LENGTH(column)) FROM reqtable;
RETURNS 26566
Here is an example of how I run into the problem:
`library(RODBC)
con <- odbcConnect("mysqlcon")
rslts <- as.numeric(sqlQuery(con,
"SELECT CHAR_LENGTH(column) FROM reqtable LIMIT 10",
as.is=TRUE)[,1])
`
returns
> rslts
[1] 62 31 17 103 30 741 28 73 25 357
whereas
rslts <- nchar(as.character(sqlQuery(con,
"SELECT column FROM reqtable LIMIT 10",
as.is=TRUE)[,1]))
returns
> rslts
[1] 62 31 17 103 30 255 28 73 25 255
So strings longer than 255 characters are getting truncated at 255. Is there a way I can get the full string?
Thanks

The PostgreSQL ODBC driver has a variable called MaxLongVarcharSize that I have found set to 8190 by default (I've used it both on Windows and Ubuntu). It is possible that the MySQL ODBC driver has a similar variable set to 255.

You could try to use another db driver such as JDBC. In my experience this has sometimes solved the problem.
Also, try the RMySQL package (current binaries need to be compiled; if you do compile them yourself, please share them with the community).
The source of the RODBC package could provide insights into the default length limitations, if any. (I haven't looked at it yet, but I will soon and post an update here.)

Another possibility why the retrieved number of characters might be limited is a 'sanity' check restriction to 65535 bytes in the RODBC package itself -- as mentioned here.

Related

MySQL CONCAT() Returns Unreadable Text

I'm having some problems with the MySQL CONCAT() function. I ran the following:
SELECT CONCAT(now(),now())
And this is what I got back:
323031342d30352d30352031343a33393a3535323031342d30352d30352031343a33393a3535
I'm not sure what exactly is going on. Has anyone seen this before? This happens when concatenating anything (columns, strings, MySQL functions like now()).
My server version is "5.1.63 - SUSE MySQL RPM" and the client version is libmysql - mysqlnd 5.0.7-dev - 091210 - $Revision: 304625 $
Looks like a hexadecimal representation of printable ASCII characters:
hex:  32 30 31 34 2d 30 35 2d 30 35 20 31 34 3a 33 39 3a 35 35
char: 2  0  1  4  -  0  5  -  0  5     1  4  :  3  9  :  5  5
I can't explain why the client is displaying character data as hexadecimal; I'd investigate the possibility that there's a mismatch in character set encoding.
Possibly the MySQL client library is using latin1, but the application is using a different encoding; but we'd expect this would affect all character expressions, not just CONCAT() expressions.
Actually, it's more likely the client is displaying hexadecimal for binary strings, and the value returned from CONCAT() is being reported as a binary string.
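To check the hexadecimal reading above, here is a minimal Python sketch that decodes the exact string from the question. It only assumes the bytes are plain ASCII, which holds for a date/time value.

```python
# Hex string exactly as the client displayed it in the question.
hex_text = (
    "323031342d30352d30352031343a33393a3535"
    "323031342d30352d30352031343a33393a3535"
)

# Every pair of hex digits is one byte; decode those bytes as ASCII text.
decoded = bytes.fromhex(hex_text).decode("ascii")
print(decoded)
```

The decoded text is the NOW() timestamp twice in a row, which is what CONCAT(now(),now()) should produce, so the data is intact and only the display is hexadecimal.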
Here is an excerpt from MySQL 5.1 documentation for CONCAT() function:
Returns the string that results from concatenating the arguments. May
have one or more arguments. If all arguments are nonbinary strings,
the result is a nonbinary string. If the arguments include any binary
strings, the result is a binary string. A numeric argument is
converted to its equivalent binary string form; if you want to avoid
that, you can use an explicit type cast, as in this example:
SELECT CONCAT(CAST(int_col AS CHAR), char_col);
So, the workaround might be to convert the value of NOW() to a character string, either with a CAST or possibly with the DATE_FORMAT function (using %H for the 24-hour clock that NOW() returns; %h would give a 12-hour value), e.g.
CONCAT(DATE_FORMAT(NOW(),'%Y-%m-%d %H:%i:%s'),DATE_FORMAT(NOW(),'%Y-%m-%d %H:%i:%s'))

W-apriori in Rapidminer

I need to create association rules using apriori algorithm in Rapidminer, but I can't seem to make it work. I'm using the 5.3.1 weka extension.
I've already created the association rules using the built-in FP-Growth and Create Associations operators, and it worked as expected. This is what the process looks like:
Because all my attributes are already of binomial type I could use FP-Growth directly. But if I use the same approach for apriori (confidence=0.1, support=0.1):
As a result I'm not getting what I was looking for:
Minimum support: 0.1 (26 instances)
Minimum metric <confidence>: 0.1
Number of cycles performed: 18
(...)
Best rules found:
1. A=FALSE 53 ==> E=FALSE 26 conf:(0.49)
2. H=FALSE 74 ==> E=FALSE 30 conf:(0.41)
3. E=FALSE 75 ==> H=FALSE 30 conf:(0.4)
4. C=FALSE 68 ==> E=FALSE 27 conf:(0.4)
5. D=FALSE 67 ==> H=FALSE 26 conf:(0.39)
6. E=FALSE 75 ==> C=FALSE 27 conf:(0.36)
7. H=FALSE 74 ==> D=FALSE 26 conf:(0.35)
8. E=FALSE 75 ==> A=FALSE 26 conf:(0.35)
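For anyone reading the Weka output above: the number after the left-hand side is how many instances match it, the number after the right-hand side is how many match both sides, and conf is their ratio. A small Python sketch that recomputes the confidences from the counts copied out of the listing (the tuple layout is just my own choice for illustration):

```python
# (antecedent, antecedent count, consequent, count matching both sides),
# copied from the "Best rules found" listing.
rules = [
    ("A=FALSE", 53, "E=FALSE", 26),
    ("H=FALSE", 74, "E=FALSE", 30),
    ("E=FALSE", 75, "H=FALSE", 30),
    ("C=FALSE", 68, "E=FALSE", 27),
    ("D=FALSE", 67, "H=FALSE", 26),
    ("E=FALSE", 75, "C=FALSE", 27),
    ("H=FALSE", 74, "D=FALSE", 26),
    ("E=FALSE", 75, "A=FALSE", 26),
]


def confidence(antecedent_count: int, joint_count: int) -> float:
    """Confidence of a rule: joint support divided by antecedent support."""
    return joint_count / antecedent_count


confidences = [round(confidence(a, j), 2) for _, a, _, j in rules]
for (lhs, a, rhs, j), conf in zip(rules, confidences):
    print(f"{lhs} {a} ==> {rhs} {j}  conf:({conf})")
```

The recomputed values match the listing, and every rule is about an attribute being FALSE, which is not what was being looked for.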
When you run the W-Apriori algorithm in RapidMiner, the data set you feed into the process must not contain numeric attributes.
A solution would be as follows:
Add this operator to your process. After you load the data:
Data Transformation > Type Conversion > Numerical to Polynominal
On the operator, select
attribute type filter = single
name of your attribute
Here's a pictorial example of what I mean:

is hex-encoding standard in MySQL

I was just asking myself if this is standard, because I was setting a column to type "Char 40" to store a SHA1 value. Is this true? Or do I have to pay more attention when I do this, in case I work with a MySQL database other than my own?
Thanks
EDIT
The best possible answer is that SHA1 just works that way. I thought it was returning 160 bits and some other config setting converted it into a 40-character string, but it always returns that 40-digit string. See doc.
SHA1 returns 40 characters, yes.
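The 40 characters and the 160 bits are the same value in two notations: a SHA-1 digest is 20 bytes, and each byte takes two hexadecimal digits. A quick Python sketch with the standard hashlib module (the input string is arbitrary, chosen only for illustration):

```python
import hashlib

message = b"example"  # arbitrary input, only for illustration

raw_digest = hashlib.sha1(message).digest()      # the 160-bit value as raw bytes
hex_digest = hashlib.sha1(message).hexdigest()   # the same value written in hex

print(len(raw_digest) * 8, "bits")
print(len(hex_digest), "hex characters")
```

So a CHAR(40) column fits the hex form, which is what the SHA1() function in MySQL returns.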

Does MySQL DECIMAL datatype force decimal points for integers?

Bit of an unusual question, but I have set up a field inside a MySQL table that is of the datatype "DECIMAL(5,2)". As far as I understand this, what I have done is to only allow numbers from -999.99 up to 999.99 to be inserted into this field.
However, when I insert an integer to the value of 26 (which is valid) I am shown this inside the database as 26 not 26.00 - is this normal MySQL behaviour? I (perhaps naively) thought that because I have set the scale to 2, my numbers would always be shown with 2 decimal places?
My question is - do integers inside MySQL "DECIMAL" datatypes always display without any decimal places? Or is this my database manager tool formatting 26.00 to 26 for me?
This may seem like a bit of a weird question but I am still trying to get my head around MySQL DECIMALs. Thanks.
I have found the answer: it seems my MySQL database manager is helpfully (or not so helpfully) hiding the decimals from me. For anyone who finds this vaguely useful, I am using EMS SQL Manager 2010 on Windows 7. Credit due to Cylindric, who prompted me to check my tools!
In answer to my original question - yes, even integers inside a "DECIMAL" datatype will display the decimal places. For example:
DECIMAL(3,1)
42 = 42.0
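531.2 = 99.9 (out of range, so clamped to the maximum; with strict SQL mode enabled the insert is rejected with an error instead)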
5 = 5.0
Hope this helps someone somewhere!
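The DECIMAL(3,1) examples above can be reproduced with a rough Python sketch. This is only a model of the behaviour, not MySQL code: `store_decimal` is a hypothetical helper of mine, and it assumes strict SQL mode is off, so out-of-range values are clamped rather than rejected.

```python
from decimal import Decimal, ROUND_HALF_UP


def store_decimal(value, precision: int, scale: int) -> Decimal:
    """Model storing `value` into DECIMAL(precision, scale), strict mode off."""
    step = Decimal(1).scaleb(-scale)                    # smallest unit, e.g. 0.1
    limit = Decimal(10) ** (precision - scale) - step   # largest value, e.g. 99.9
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)
    # Clamp to the column's range, as non-strict mode does (with a warning).
    return max(-limit, min(limit, rounded))


for value in (42, 531.2, 5):
    print(value, "=", store_decimal(value, 3, 1))
print(26, "=", store_decimal(26, 5, 2))
```

The stored value always carries `scale` decimal places, so a tool that shows 26 instead of 26.00 is formatting the value on its own.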

AES Encryption in Oracle and MySQL are giving different results

I need to compare data between an Oracle database and a MySQL database.
In Oracle, the data is first encrypted with the AES-128 algorithm and then hashed, which means it is not possible to recover the data and decrypt it.
The same data is available in MySQL, and in plain text. So to compare the data, I tried encrypting and then hashing the MySQL data while following the same steps done in Oracle.
After lots of tries, I finally found out that the aes_encrypt in MySQL returns different results than the one in Oracle.
-- ORACLE:
-- First the key is hashed with md5 to make it a 128bit key:
raw_key := DBMS_CRYPTO.Hash (UTL_I18N.STRING_TO_RAW ('test_key', 'AL32UTF8'), DBMS_CRYPTO.HASH_MD5);
-- Initialize the encrypted result
encryption_type:= DBMS_CRYPTO.ENCRYPT_AES128 + DBMS_CRYPTO.CHAIN_CBC + DBMS_CRYPTO.PAD_PKCS5;
-- Then the data is being encrypted with AES:
encrypted_result := DBMS_CRYPTO.ENCRYPT(UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8'), encryption_type, raw_key);
The result for the oracle code will be: 8FCA326C25C8908446D28884394F2E22
-- MySQL
-- While doing the same with MySQL, I have tried the following:
SELECT hex(aes_encrypt('test-data', MD5('test_key')));
The result for the MySQL code will be: DC7ACAC07F04BBE0ECEC6B6934CF79FE
Am I missing something? Or are the encryption methods between different languages not the same?
UPDATE:
According to the comments below, I believe I should mention the fact that the result of DBMS_CRYPTO.Hash in Oracle is the same as the result returned by the MD5 function in MySQL.
Also, using CBC or ECB in Oracle gives the same result, since no IV is passed to the function, so the default IV value (NULL) is used.
BOUNTY:
Whoever can verify my last comment, and confirm that using the same padding on both sides yields the same results, gets the bounty:
@rossum The default padding in MySQL is PKCS7, mmm... Oh.. In Oracle
it's using PKCS5, can't believe I didn't notice that. Thanks. (Btw
Oracle doesn't have the PAD_PKCS7 option, not in 11g at least)
MySQL's MD5 function returns a string of 32 hexadecimal characters. It's marked as a binary string but it isn't the 16 byte binary data one would expect.
So to fix it, this string must be converted back to the binary data:
SELECT hex(aes_encrypt('test-data', unhex(MD5('test_key'))));
The result is:
8FCA326C25C8908446D28884394F2E22
It's again a string of 32 hexadecimal characters. But otherwise it's the same result as with Oracle.
And BTW:
MySQL uses PKCS7 padding.
PKCS5 padding and PKCS7 padding are one and the same. So the Oracle padding option is correct.
MySQL uses ECB block cipher mode. So you'll have to adapt the code accordingly. (It doesn't make any difference for the first 16 bytes.)
MySQL uses no initialization vector (the same as your Oracle code).
MySQL uses a non-standard folding of keys. So to achieve the same result in MySQL and Oracle (or .NET or Java), only use keys that are 16 bytes long.
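To make the key issue concrete, here is a small Python sketch. `mysql_fold_key` is a hypothetical helper that models the XOR folding as I understand it from the older MySQL sources (for the default 128-bit ECB mode); treat it as an assumption, not as MySQL's actual code.

```python
import hashlib


def mysql_fold_key(key: bytes, size: int = 16) -> bytes:
    """XOR every key byte into a zeroed buffer of `size` bytes, wrapping around.

    Assumed model of how AES_ENCRYPT derives its 128-bit key.
    """
    folded = bytearray(size)
    for i, byte in enumerate(key):
        folded[i % size] ^= byte
    return bytes(folded)


# What MD5('test_key') hands to AES_ENCRYPT: 32 ASCII hex characters.
hex_key = hashlib.md5(b"test_key").hexdigest().encode("ascii")
# What UNHEX(MD5('test_key')) hands over: the 16 raw digest bytes.
raw_key = bytes.fromhex(hex_key.decode("ascii"))

print(len(hex_key), "bytes ->", mysql_fold_key(hex_key).hex())
print(len(raw_key), "bytes ->", mysql_fold_key(raw_key).hex())
```

A 16-byte key passes through the folding unchanged, which is why UNHEX(MD5(...)) lines up with Oracle, while the 32-character hex string is folded into a different 16-byte key.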
I would just like to give the complete solution for dummies, based on @Codo's very didactic answer.
EDIT:
For being exact in general cases, I found this:
- "PKCS#5 padding is a subset of PKCS#7 padding for 8 byte block sizes".
So strictly PKCS5 can't be applied to AES; they mean PKCS7 but use their
names interchangeably.
About PKCS5 and PKCS7
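As an illustration of the padding rule, here is a minimal Python sketch of PKCS#7 padding for a 16-byte AES block. The function names are my own and this is not taken from either database; with an 8-byte block size the same rule is what PKCS#5 describes.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes) -> bytes:
    """Strip the padding; the last byte says how many bytes to drop."""
    return padded[: -padded[-1]]


padded = pkcs7_pad(b"test-data")
print(len(padded), padded.hex())
```

'test-data' is 9 bytes, so 7 bytes of value 0x07 are appended to reach one full 16-byte block, which is consistent with the encrypted result being 32 hexadecimal characters long.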
/* MySQL uses a non-standard folding of the key.
* So to achieve the same result in MySQL and Oracle (or .NET or Java),
only use keys that are 16 bytes long (32 hexadecimal symbols) = 128 bits,
the key size of 128-bit AES, which is the MySQL AES_ENCRYPT default.
*
* This means MySQL admits any key length between 16 and 32 bytes
for 128-bit AES encryption, but standard AES-128 does not allow
a key that is not 16 bytes long. So do not use such keys: you would not be able
to use standard AES decryption on another platform for keys longer
than 16 bytes, and would be obliged to reimplement MySQL's key folding
on that platform, with the XOR steps, etc.
(Implementations are already out there, but why do weird non-standard things that
may change whenever MySQL decides, etc.)
Moreover, I think the algorithm MySQL chose for those
cases is said to be a really bad choice from a security standpoint...
*/
-- ### ORACLE:
-- First the key is hashed with md5 to make it a 128 bit key (16 bytes, 32 hex symbols):
raw_key := DBMS_CRYPTO.Hash (UTL_I18N.STRING_TO_RAW ('test_key', 'AL32UTF8'), DBMS_CRYPTO.HASH_MD5);
-- MySQL uses AL32UTF8, at least by default
-- Configure the encryption parameters:
encryption_type:= DBMS_CRYPTO.ENCRYPT_AES128 + DBMS_CRYPTO.CHAIN_ECB + DBMS_CRYPTO.PAD_PKCS5;
-- Strictly speaking, it's really PKCS7.
/* I chose ECB because it should be faster and
@Codo said it is the correct mode. However, since standard (Oracle) AES128 only accepts
16-byte keys, CBC also works, as I believe the chaining modes do not come into play with a 16-byte key.
Could someone confirm this? */
-- Then the data is encrypted with AES:
encrypted_result := DBMS_CRYPTO.ENCRYPT(UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8'), encryption_type, raw_key);
-- The result is binary (varbinary, blob).
-- Use RAWTOHEX() if you want to represent it in hex characters.
In case you use directly the 16 bytes hashed passphrase in hex characters representation or 32 hex random chars:
raw_key := HEXTORAW(32_hex_key)
encryption_type := 6 + 768 + 4096 -- (same as above in numbers; see Oracle Docum.)
raw_data := UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8')
encrypted_result := DBMS_CRYPTO.ENCRYPT( raw_data, encryption_type, raw_key )
-- ORACLE Decryption:
decrypted_result := UTL_I18N.RAW_TO_CHAR( DBMS_CRYPTO.DECRYPT( encrypted_result, encryption_type, raw_key ), 'AL32UTF8' )
-- In SQL (encrypted_hex is a placeholder for the encrypted value as a hex string):
SELECT
UTL_I18N.RAW_TO_CHAR(
DBMS_CRYPTO.DECRYPT(
HEXTORAW(encrypted_hex),
6 + 768 + 4096,
HEXTORAW(32_hex_key)
) , 'AL32UTF8') as "decrypted"
FROM DUAL;
-- ### MySQL encryption:
-- MySQL's MD5 function returns a string of 32 hexadecimal characters (=16 bytes=128 bits).
-- It's marked as a binary string but it isn't the 16 bytes binary data one would expect.
-- NOTE: The return type of the MD5, SHA1, etc. functions changed as of MySQL 5.5.3. See the MySQL 5.7 manual.
-- So to fix it, this string must be converted back from hex to binary data with unHex():
SELECT hex(aes_encrypt('test-data', unhex(MD5('test_key'))));
P.S.:
I recommend reading the improved explanation in the MySQL 5.7 manual, which also documents many more configuration options.
MySQL AES_ENCRYPT improved explanation from v5.7 manual
Could be CBC vs ECB. Comment at the bottom of this page: http://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html says mysql function uses ECB