MySQL MD5 LONGTEXT with Binary Data

Is there a way to get the md5 hash of a binary value that's stored in a MySQL LONGTEXT field?
Example:
CREATE TABLE table_name (
    data_path VARCHAR(255) NOT NULL,
    data_column LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    PRIMARY KEY (data_path)
);
Then in application code (example PHP):
function insert($path) {
    // ...
    $raw_file_data = file_get_contents($path);
    $stmt = $dbh->prepare(
        "INSERT INTO table_name (data_path, data_column) VALUES (:path, :data)"
    );
    $stmt->bindParam(':path', $path);
    $stmt->bindParam(':data', $raw_file_data);
    $stmt->execute();
    // ...
}
insert('/path/to/binary_file.jpg');
insert('/path/to/text_file.txt');
Later, we query the md5 hashes of the inserted rows:
SELECT MD5(data_column) FROM table_name WHERE data_path = '/path/to/binary_file.jpg';
SELECT MD5(data_column) FROM table_name WHERE data_path = '/path/to/text_file.txt';
However, the hashes from MySQL do not match the md5sum of the actual files for any non-plaintext file:
md5sum /path/to/binary_file.jpg # does not match!
md5sum /path/to/text_file.txt # matches!
As far as I understand, this has to do with the way MySQL encodes the data for the column's character set.
I also understand this field should be a binary field (BLOB, LONGBLOB, etc.), but this is in a legacy system which uses the same table to store binary and text files and depends on being able to search those text files.
My question is: Is there a way to get the md5 hash of the binary value of what is stored in the data_column?
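One approach (a sketch, not from the original thread) is to cast the column back to binary before hashing, so that MD5() sees the stored bytes rather than a utf8-interpreted string:

SELECT MD5(CONVERT(data_column USING binary)) -- hash the raw stored bytes
FROM table_name
WHERE data_path = '/path/to/binary_file.jpg';

Note that if MySQL already altered the bytes while inserting them into the utf8 column, no cast at SELECT time can recover the original file's hash.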

Related

inner join two datasets but return nothing without any error (date format issue)?

I'm new to SQL. I'm currently working on a task that joins two datasets, one of which I created myself. Here's the query I used:
USE `abcde`;
CREATE TABLE `test_01` (
    `ID` varchar(50) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
    `NUMBER01` bigint(20) NOT NULL DEFAULT '0',
    `NUMBER02` bigint(20) NOT NULL,
    `date01` date DEFAULT NULL,
    PRIMARY KEY (`ID`, `date01`)
);
Then I load the data from a CSV file into this table. The CSV file looks like this:
ID      NUMBER01   NUMBER02    DATE01
aaa=ee  12345678   235896578   2009-01-01T00:00:00
If I query this newly-created table, it looks like this (the format of DATE01 changes):
ID      NUMBER01   NUMBER02    DATE01
aaa=ee  12345678   235896578   2009-01-01
For the other dataset, which I queried and exported to a CSV file, the date01 column looks like 01/12/1979 in the CSV and like 1979-12-01 in SQL.
I also used select * from information_schema.columns to check the data types of the columns I need to join (screenshots of the output for both datasets omitted here).
The differences are:
1. The format of the date column in the CSV files appears different.
2. The COLUMN_DEFAULT values are different: one is 0000-00-00, the other one is NULL.
I suspect the reason I got empty output is the difference in the date format, but I'm not sure how to make the two match so that the join returns something. Can someone give me a hint? Thank you.
the format of DATE01 changes
Of course; the DATE datatype does not contain a time (or timezone) component.
I suspect the reason I got empty output is the difference in the date format
If an input value has a problem (like a wrong date format), then the corresponding value is truncated or set to NULL. You should have received a bunch of warnings during the import, similar to "truncated incorrect value". A quick check is shown below.
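You can inspect those warnings right after the import (a sketch; run it in the same session as the LOAD DATA):

SHOW WARNINGS;           -- lists messages such as "Incorrect date value ... truncated"
SELECT @@warning_count;  -- just the number of warnings from the last statement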
If the date field in the CSV has the wrong format, then you must read the raw value into an intermediate user-defined variable and apply a suitable conversion expression to it in the SET clause, like:
LOAD DATA INFILE ...
INTO TABLE tablename (field1, ..., @date01)
SET date01 = STR_TO_DATE(@date01, '%d/%m/%Y');
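For the ISO-style timestamps shown in the question (2009-01-01T00:00:00), the same pattern should work with a matching format string. A minimal sketch, assuming a tab-delimited file with one header line:

LOAD DATA INFILE '/path/to/test_01.csv'
INTO TABLE test_01
IGNORE 1 LINES  -- skip the header row
(ID, NUMBER01, NUMBER02, @date01)
SET date01 = STR_TO_DATE(@date01, '%Y-%m-%dT%H:%i:%s');  -- time part is dropped in a DATE column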

How to store a .docx file into MySQL - and open it?

MySQL
CREATE TABLE document_control (
id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
person VARCHAR(40),
dateSent TIMESTAMP,
fileAttachment MEDIUMBLOB
);
MySQL Insert record query
INSERT INTO DOCUMENT_CONTROL (fileattachment) values (load_file('C:\Users\<user>\Desktop\test.docx'));
Retrieving record
If I run this query: SELECT * FROM document_control, everything is NULL, even after the insert query above.
Question
Why are the values NULL? And how can I properly store a .docx file in MySQL and open the file?
You need to look into the SQL BLOB data types.
You could also read the file as bytes, convert it into a string or base64 encoding or something, and then save that as a string in the database.
You could also choose to save a file reference (the file path) and use it to refer to the file.
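For what it's worth (not from the original answers), LOAD_FILE() silently returns NULL in several common situations, which would explain the NULL column. A sketch of things to check:

-- LOAD_FILE() returns NULL when the MySQL *server* cannot read the file:
-- the user lacks the FILE privilege, the file lies outside secure_file_priv,
-- or the file is larger than max_allowed_packet.
SHOW VARIABLES LIKE 'secure_file_priv';
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Also escape the backslashes in the path (or use forward slashes):
-- in a string literal, \t becomes a tab and \U loses its backslash.
INSERT INTO document_control (fileAttachment)
VALUES (LOAD_FILE('C:\\Users\\<user>\\Desktop\\test.docx'));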

Why is AES_DECRYPT returning null?

I've found similar questions, but no clear answer for this question. I have this table:
CREATE DATABASE testDB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE testTable
(
firstName binary(32) not null,
lastName binary(32) not null
/* Other non-binary fields omitted */
)
engine=INNODB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
This statement executes just fine:
INSERT INTO testTable (firstName) VALUES (AES_ENCRYPT('Testname', 'test'));
But, this returns NULL:
SELECT AES_DECRYPT(firstName, 'test') FROM testTable;
Why does this return NULL?
Fwiw, this returns "testValue" as expected:
SELECT AES_DECRYPT(AES_ENCRYPT('testValue','thekey'), 'thekey');
The answer is that the columns are BINARY when they should be VARBINARY, because:
Because if AES_DECRYPT() detects invalid data or incorrect padding, it will return NULL.
BINARY columns are fixed length: a shorter value is right-padded with 0x00 bytes up to the declared length, so the 16-byte ciphertext stored in binary(32) comes back as 32 bytes and no longer decrypts. For values of unknown length, use VARBINARY, which stores the value without padding.
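A minimal corrected schema along those lines (a sketch; the sizes are assumptions, but AES_ENCRYPT output is always a whole multiple of 16 bytes, so VARBINARY(255) is ample for short values):

CREATE TABLE testTable
(
    firstName VARBINARY(255) NOT NULL, -- stores the exact ciphertext length, no zero padding
    lastName VARBINARY(255) NOT NULL
    /* Other non-binary fields omitted */
)
ENGINE=INNODB DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;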
When you insert binary data into a VARCHAR field, there are some byte sequences that a VARCHAR cannot represent, and they corrupt the inserted value. The value you retrieve is then no longer the one you inserted.
1. SELECT HEX(AES_ENCRYPT(file, 'key'));
2. SELECT AES_DECRYPT(UNHEX(file), 'key');
Check whether the type of your field is BLOB instead of binary(32).
Did you try different values other than 'Testname'?
Do other values work?
I ask because I had a situation while testing 2 test credit card numbers where one decrypted fine and the other returned null.
The answer was to HEX and UNHEX, as suggested by "abhinai raj".

String compare exact in query MySQL

I created a table like this in MySQL:
DROP TABLE IF EXISTS `barcode`;
CREATE TABLE `barcode` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(40) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `barcode` VALUES ('1', 'abc');
INSERT INTO `barcode` VALUES ('2', 'abc ');
Then I query data from table barcode:
SELECT * FROM barcode WHERE `code` = 'abc ';
The result is:
+----+------+
| id | code |
+----+------+
|  1 | abc  |
|  2 | abc  |
+----+------+
But I want the result set to contain only 1 record. My workaround is:
SELECT * FROM barcode WHERE `code` = binary 'abc ';
The result is 1 record. But I'm using NHibernate with MySQL to generate queries from the table mappings, so how can I resolve this case?
There is no other fix for it. Either you specify a single comparison as being binary, or you set the whole database connection to binary (using SET NAMES binary, which may have other side effects!).
Basically, that "lazy" comparison is a feature of MySQL which is hard-coded. To disable it (on demand!), you can use a binary compare, which you apparently already do. This is not a "workaround" but the real fix.
from the MySQL Manual:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces
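The effect is easy to demonstrate in isolation (a quick sketch you can run directly):

SELECT 'abc' = 'abc ';     -- 1: the = comparison ignores trailing spaces (PADSPACE)
SELECT 'abc' LIKE 'abc ';  -- 0: LIKE treats trailing spaces as significant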
Of course there are plenty of other possibilities to achieve the same result from a user's perspective, e.g.:
WHERE field = 'abc ' AND CHAR_LENGTH(field) = CHAR_LENGTH('abc ')
WHERE field REGEXP 'abc[[:space:]]'
The problem with these is that they effectively disable fast index lookups, so your query always results in a full table scan. With huge datasets that makes a big difference.
Again: PADSPACE is the default for MySQL's [VAR]CHAR comparisons. You can (and should) disable it by using BINARY. This is the intended way of doing this.
You can try matching with a regular expression:
SELECT * FROM barcode WHERE `code` REGEXP 'abc[[:space:]]'
I was just working on a case like that, where using LIKE with the wildcard (%) produced an unexpected result. While searching I also found STRCMP(text1, text2) among MySQL's string comparison functions, which compares two strings. However, using BINARY with LIKE solved the problem for me.
SELECT * FROM barcode WHERE `code` LIKE BINARY 'abc ';
You could do this:
SELECT * FROM barcode WHERE `code` = 'abc '
AND CHAR_LENGTH(`code`)=CHAR_LENGTH('abc ');
Assuming you only want one result, you could use LIMIT:
SELECT * FROM barcode WHERE `code` = 'abc ' LIMIT 1;
To do exact string matching you could use a collation:
SELECT *
FROM barcode
WHERE code COLLATE utf8_bin = 'abc';
The sentence right after the one quoted by Kaii basically says "use LIKE":
β€œComparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant
and the example below shows that 'Monty' = 'Monty ' is true, but not 'Monty' LIKE 'Monty '.
However, if you use LIKE, beware of literal strings containing the '%', '_' or '\' characters: '%' and '_' are wildcard characters, and '\' is used for escape sequences.
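Putting that together, a minimal sketch (assuming the search term is a plain string that may itself contain wildcard characters):

-- Exact match including trailing spaces; matches only id 2:
SELECT * FROM barcode WHERE `code` LIKE 'abc ';

-- If the term may contain %, _ or \, escape them before comparing:
SET @term = 'abc ';
SELECT * FROM barcode
WHERE `code` LIKE REPLACE(REPLACE(REPLACE(@term, '\\', '\\\\'), '%', '\\%'), '_', '\\_');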

Importing data from geonames.org database into MySQL DB

Does anyone know how to import geonames.org data into my database? The one I'm trying to import is http://download.geonames.org/export/dump/DO.zip, and my DB is a MySQL DB.
I found the following in the readme file included in the zip file you linked to, in the section called "The main 'GeoName' table has the following fields:".
First create the database and table on your MySQL instance. The field types are given in each row of the section whose title I just quoted above.
CREATE DATABASE DO_test;
CREATE TABLE `DO_test`.`DO_table` (
`geonameid` INT,
`name` varchar(200),
`asciiname` varchar(200),
`alternatenames` varchar(5000),
`latitude` DECIMAL(10,7),
`longitude` DECIMAL(10,7),
`feature class` char(1),
`feature code` varchar(10),
`country code` char(2),
`cc2` char(60),
`admin1 code` varchar(20),
`admin2 code` varchar(80),
`admin3 code` varchar(20),
`admin4 code` varchar(20),
`population` bigint,
`elevation` INT,
`gtopo30` INT,
`timezone` varchar(100),
`modification date` date
)
CHARACTER SET utf8;
After the table is created you can import the data from the file. Fields are delimited by tabs and rows by newlines, which matches the LOAD DATA defaults:
LOAD DATA INFILE '/path/to/your/file/DO.txt' INTO TABLE `DO_test`.`DO_table`;
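Spelling out those defaults, together with the character set, does no harm; a more explicit variant (a sketch):

LOAD DATA INFILE '/path/to/your/file/DO.txt'
INTO TABLE `DO_test`.`DO_table`
CHARACTER SET utf8
FIELDS TERMINATED BY '\t'  -- tab-separated columns
LINES TERMINATED BY '\n';  -- one record per line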
I recently made a shell script that downloads the latest data from the geonames site and imports it into a MySQL database. It is based on the knowledge in the GeoNames Forum and saved me a lot of time.
It is in its first version but fully functional. Maybe it can help.
You can access it at http://codigofuerte.github.com/GeoNames-MySQL-DataImport/
For everyone in the future:
On the geonames.org forum in 2008: "import all geonames dump into MySQL"
http://forum.geonames.org/gforum/posts/list/732.page
Also google this: import dump into [postgresql OR SQL server OR MySQL] site:forum.geonames.org
to find more answers, even from 2006.
Edited to provide a synopsis:
In the official geonames readme at http://download.geonames.org/export/dump/ we will find a good description of the dump files and their contents.
Dump files can be imported into the MySQL tables directly, for example:
SET character_set_database=utf8;
LOAD DATA INFILE '/home/data/countryInfo.txt' INTO TABLE _geo_countries IGNORE 51 LINES(ISO2,ISO3,ISO_Numeric,FIPSCode,AsciiName,Capital,Area_SqKm,Population,ContinentCode,TLD,CurrencyCode,CurrencyName,PhoneCodes,PostalCodeFormats,PostalCodeRegex,Languages,GeonameID,Neighbours,EquivalentFIPSCodes);
SET character_set_database=default;
Be careful about the character set: if we use the ready-made CSV LOAD DATA importer of an old phpMyAdmin from 2012, we may lose UTF characters even if the collation of the columns was set to utf8_general_ci.
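The statement above assumes a _geo_countries table already exists. A hypothetical definition matching that column list (types and sizes are my assumptions, not from the geonames readme):

CREATE TABLE _geo_countries (
    ISO2 CHAR(2),                    -- two-letter country code
    ISO3 CHAR(3),
    ISO_Numeric SMALLINT UNSIGNED,
    FIPSCode VARCHAR(3),
    AsciiName VARCHAR(200),
    Capital VARCHAR(200),
    Area_SqKm DOUBLE,
    Population BIGINT,
    ContinentCode CHAR(2),
    TLD VARCHAR(10),
    CurrencyCode CHAR(3),
    CurrencyName VARCHAR(50),
    PhoneCodes VARCHAR(50),
    PostalCodeFormats VARCHAR(100),
    PostalCodeRegex VARCHAR(255),
    Languages VARCHAR(200),          -- comma-separated language codes
    GeonameID INT,
    Neighbours VARCHAR(100),         -- comma-separated ISO2 codes
    EquivalentFIPSCodes VARCHAR(10)
) CHARACTER SET utf8;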
Currently there are 4 essential tables: continents, countries (countryInfo.txt), divisions (admin1), and cities or locations (geonames).
The admin1, 2, 3, 4 dump files are the different levels of internal divisions of countries: admin1 covers the states of the US or the provinces of other countries; admin2 is more detailed and covers the internal divisions of a state or province; and so on for 3 and 4.
The per-country dump files listed there contain not only cities but all the locations in that country, even down to a shopping center. There is also a huge file, allCountries.txt, which is more than 1GB after extracting from the zip file. If we want only the cities, we should choose one of the dump files cities1000.txt, cities5000.txt, or cities15000.txt, where the number represents the minimum population of the listed cities. We store cities in the geonames table (you may call it geo locations or geo cities).
Before importing the *.txt dump files, do a little research on the LOAD DATA syntax in the MySQL documentation.
The readme text file (also in the footer of the dump page) provides enough description, for example:
The main 'geoname' table has the following fields :
---------------------------------------------------
geonameid : integer id of record in geonames database
name : name of geographical point (utf8) varchar(200)
asciiname : name of geographical point in plain ascii characters, varchar(200)
alternatenames : alternatenames, comma separated varchar(5000)
latitude : latitude in decimal degrees (wgs84)
longitude : longitude in decimal degrees (wgs84)
feature class : see http://www.geonames.org/export/codes.html, char(1)
feature code : see http://www.geonames.org/export/codes.html, varchar(10)
country code : ISO-3166 2-letter country code, 2 characters
cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 60 characters
admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)
admin3 code : code for third level administrative division, varchar(20)
admin4 code : code for fourth level administrative division, varchar(20)
population : bigint (8 byte int)
elevation : in meters, integer
dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone : the timezone id (see file timeZone.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format
Also, regarding varchar(5000), we should keep in mind the 65,535-byte limit on the size of each row in MySQL 5.0 and later (a utf8 varchar(5000) column can take up to 15,000 bytes of that):
Is a VARCHAR(20000) valid in MySQL?
These are my notes after a successful import.
At the time of writing I was testing with MySQL 5.7.16 on Windows 7. Follow these steps to import:
Download the desired data file from the official download page. In my case I chose cities1000.zip because it's much smaller (21MB) than the all-inclusive allCountries.zip (1.4GB).
Create the following schema and table according to readme.txt on the download page, where the fields are specified below the text "the main 'geoname' table has the following fields".
CREATE SCHEMA geonames DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE TABLE geonames.cities1000 (
id INT,
name VARCHAR(200),
ascii_name VARCHAR(200),
alternate_names VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
latitude DECIMAL(10, 7),
longitude DECIMAL(10, 7),
feature_class CHAR(1),
feature_code VARCHAR(10),
country_code CHAR(2),
cc2 CHAR(60),
admin1_code VARCHAR(20),
admin2_code VARCHAR(80),
admin3_code VARCHAR(20),
admin4_code VARCHAR(20),
population BIGINT,
elevation INT,
dem INT,
timezone VARCHAR(100),
modification_date DATE
)
CHARACTER SET utf8;
Field names are arbitrary as long as the column sizes and field types match the specification. alternate_names is specifically defined with the character set utf8mb4 because the values for this column in the file contain 4-byte Unicode characters, which are not supported by MySQL's utf8 character set.
Check the values of these parameters: character_set_client, character_set_results, character_set_connection [7]:
SHOW VARIABLES LIKE '%char%';
If they are not utf8mb4, then change them:
SET character_set_client = utf8mb4;
SET character_set_results = utf8mb4;
SET character_set_connection = utf8mb4;
Import data from file using LOAD DATA INFILE ...
USE geonames;
LOAD DATA INFILE 'C:\\ProgramData\\MySQL\\MySQL Server 5.7\\Uploads\\cities1000.txt' INTO TABLE cities1000
CHARACTER SET utf8mb4 (id, name, ascii_name, alternate_names, latitude, longitude, feature_class, feature_code,
country_code, cc2, admin1_code, admin2_code, admin3_code, admin4_code, population, @val1,
@val2, timezone, modification_date)
SET elevation = IF(@val1 = '', NULL, @val1), dem = IF(@val2 = '', NULL, @val2);
Explanation for the statement:
The file should be placed in the location MySQL designates for importing data from files. You can check the location with SHOW VARIABLES LIKE 'secure_file_priv';. In my case it's C:\ProgramData\MySQL\MySQL Server 5.7\Uploads. On Windows you need to use double backslashes to represent one backslash in the path. This error is shown when the path is not given correctly: [HY000][1290] The MySQL server is running with the --secure-file-priv option so it cannot execute this statement.
With CHARACTER SET utf8mb4 you're telling MySQL what encoding to expect from the file. When this is not given explicitly, or the column encoding is not utf8mb4, an error prompt like this will be seen: [HY000][1300] Invalid utf8 character string: 'Gorad Safija,SOF,Serdica,Sofi,Sofia,Sofiae,Sofie,Sofii,Sofij,Sof'. [5] In my case I found it's due to the existence of Gothic letters in the alternate names, such as 𐍃𐍉𐍆𐌹𐌰 (id 727011), 𐌺𐌿𐍂𐌹𐍄𐌹𐌱𐌰 (id 3464975), and 𐌺𐍉𐌽𐌸𐌴𐍀𐌸𐌹𐍉𐌽 (id 3893894). These letters need to be stored as 4-byte characters (utf8mb4), while the encoding I was using at the time was utf8, which only supports up to 3-byte characters. [6] You can change the column encoding after the table is created:
ALTER TABLE cities1000 MODIFY alternate_names VARCHAR(10000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To check the encoding of a column:
SELECT character_set_name, COLLATION_NAME FROM information_schema.COLUMNS WHERE table_schema = 'geonames' AND table_name = 'cities1000' AND column_name = 'alternate_names';
To test if the characters can be stored:
UPDATE cities1000 SET alternate_names = '𐍃𐍉𐍆𐌹𐌰' WHERE id = 1;
Values for some columns need to be "improved" before they are inserted, such as elevation and dem. They are of type INT, and the values for them in the file can be empty strings, which can't be stored in an INT column. So you need to convert those empty strings to NULL for those columns; the latter part of the statement serves this purpose. This error is shown when the values are not properly converted first: [HY000][1366] Incorrect integer value: '' for column 'elevation' at row 1. [3], [4]
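A couple of sanity checks after the import (a sketch; id 727011 is the Sofia record mentioned above):

-- The row count should roughly match the line count of cities1000.txt:
SELECT COUNT(*) FROM geonames.cities1000;

-- Spot-check that the 4-byte Gothic alternate names survived:
SELECT name, alternate_names FROM geonames.cities1000 WHERE id = 727011;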
References
[1] http://www.geonames.org/
[2] http://download.geonames.org/export/dump/
[3] https://dev.mysql.com/doc/refman/8.0/en/load-data.html
[4] https://dba.stackexchange.com/a/111044/94778
[5] https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html
[6] https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
[7] https://stackoverflow.com/a/35156926/4357087
[8] https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html