SELECT on varbinary column using string representation of binary search term - mysql

I am inserting IPs into a varbinary column. Sequel PRO displays these values as gibberish ( Ü×L%>¨€NóP). When searching manually, I'd like to use that gibberish to find the matching rows:
SELECT * FROM `IP_MAP` WHERE `ip` = BINARY(" Ü×L%>¨€NóP")
This query does not return any rows, although I copied and pasted the varbinary value from the Sequel PRO interface. What is the correct way to search a varbinary column when given a string representation of the varbinary value?
Sample table:
CREATE TABLE `IP_MAP` (
`id` bigint(11) unsigned NOT NULL,
`ip` varbinary(16) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ip` (`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

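Use the INET6_ATON() and INET6_NTOA() functions to convert between the printable address and its binary form, and search with the printable address instead of the raw bytes: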
SELECT *
FROM `IP_MAP`
WHERE `ip` = INET6_ATON('48f3::d432:1431:ba23:846f')
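
To see why pasting the displayed characters fails, it helps to look at what such a column holds. The following is a minimal Python sketch, assuming the rows were written with INET6_ATON(). The helper names `inet6_aton` and `inet6_ntoa` are hypothetical stand-ins built on Python's standard `ipaddress` module; they are not MySQL calls.

```python
import ipaddress

def inet6_aton(text):
    """Pack an IPv4 or IPv6 address string into its raw bytes (4 or 16)."""
    return ipaddress.ip_address(text).packed

def inet6_ntoa(raw):
    """Turn 4 or 16 raw bytes back into a printable address string."""
    return str(ipaddress.ip_address(raw))

packed = inet6_aton("48f3::d432:1431:ba23:846f")
print(len(packed))         # 16
print(packed.hex())        # 48f3000000000000d4321431ba23846f
print(inet6_ntoa(packed))  # 48f3::d432:1431:ba23:846f
```

Most of those 16 bytes are not printable characters, so whatever a GUI renders for them is unlikely to survive a copy and paste. The printable address, or a hex dump such as the one MySQL's HEX() produces, can be pasted safely.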

Related

Improve order by json field performance on mysql

I have this table:
CREATE TABLE `mytable` (
`session_id` mediumint(8) UNSIGNED NOT NULL,
`data` json NOT NULL,
`jobname` varchar(100) COLLATE utf8_unicode_ci GENERATED ALWAYS AS
(json_unquote(json_extract(`data`,'$.jobname'))) VIRTUAL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY HASH (session_id)
PARTITIONS 10;
ALTER TABLE `mytable`
ADD KEY `session` (`session_id`),
ADD KEY `jobname` (`jobname`);
It has 2 million rows.
When I execute this query, it takes around 23 seconds to return the result.
SELECT JSON_EXTRACT(f.data, '$.jobdesc') AS jobdesc
FROM mytable f
WHERE f.session_id = 1
ORDER BY jobdesc DESC
I understand that it is slow because there is no index on the jobdesc field.
The data column holds 12 fields, and I want to let the user sort by any of them. Is adding an index for each field a good approach?
Is there any way to improve it?
I am using MySQL 5.7.13.
You would have to create a virtual column with an index for each of your 12 fields if you want the user to have the option of sorting by any of them.
CREATE TABLE `mytable` (
`session_id` mediumint(8) UNSIGNED NOT NULL,
`data` json NOT NULL,
`jobname` varchar(100) AS (json_unquote(json_extract(`data`,'$.jobname'))),
`jobdesc` varchar(100) AS (json_unquote(json_extract(`data`,'$.jobdesc'))),
...other extracted virtual fields...
KEY (`jobname`),
KEY (`jobdesc`),
...other indexes on virtual columns...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY HASH (session_id)
PARTITIONS 10;
This makes me wonder: why bother using JSON? Why not just declare 12 conventional, non-virtual columns with indexes?
CREATE TABLE `mytable` (
`session_id` mediumint(8) UNSIGNED NOT NULL,
...no `data` column needed...
`jobname` varchar(100),
`jobdesc` varchar(100),
...
KEY (`jobname`),
KEY (`jobdesc`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
PARTITION BY HASH (session_id)
PARTITIONS 10;
JSON is best when you treat it as a single atomic document, and don't try to use SQL operations on fields within it. If you regularly need to access fields within your JSON, make them into conventional columns.

Hash of two columns in mysql

I have a MySQL table with 5 columns in it:
id bigint
name varchar
description varchar
slug
Can I get MySQL to automatically generate the value of slug as a 256 Bit Hash of name+description?
I am now using PHP to generate an SHA256 value of the slug prior to saving it.
Edit:
By automatic, I mean see if it's possible to change the default value of the slug field, to be a computed field that's the sha256 of name+description.
I already know how to create it as part of an insert operation.
MySQL 5.7 supports generated columns so you can define an expression, and it will be updated automatically for every row you insert or update.
CREATE TABLE IF NOT EXISTS MyTable (
id int NOT NULL AUTO_INCREMENT,
name varchar(50) NOT NULL,
description varchar(50) NOT NULL,
slug varchar(64) AS (SHA2(CONCAT(name, description), 256)) STORED NOT NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
If you use an earlier version of MySQL, you could do this with TRIGGERs:
CREATE TRIGGER MySlugIns BEFORE INSERT ON MyTable
FOR EACH ROW SET NEW.slug = SHA2(CONCAT(NEW.name, NEW.description), 256);
CREATE TRIGGER MySlugUpd BEFORE UPDATE ON MyTable
FOR EACH ROW SET NEW.slug = SHA2(CONCAT(NEW.name, NEW.description), 256);
Beware that concat returns NULL if any one column in the input is NULL. So, to hash in a null-safe way, use concat_ws. For example:
select md5(concat_ws('', col_1, .. , col_n));
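
The NULL behaviour described above can be illustrated outside the database. This is a rough Python emulation, not MySQL itself: `concat`, `concat_ws` and `sha256_hex` are hypothetical helpers that mimic the documented NULL handling of CONCAT(), CONCAT_WS() and SHA2(), with Python's `None` standing in for NULL and the text assumed to be UTF-8 encoded.

```python
import hashlib

def concat(*parts):
    """Mimic CONCAT(): the result is NULL (None) if any argument is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def concat_ws(sep, *parts):
    """Mimic CONCAT_WS(): NULL arguments after the separator are skipped."""
    if sep is None:
        return None
    return sep.join(p for p in parts if p is not None)

def sha256_hex(text):
    """Mimic SHA2(text, 256): hashing NULL yields NULL (assumes UTF-8 bytes)."""
    if text is None:
        return None
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(sha256_hex(concat("Fred", None)))              # None
print(len(sha256_hex(concat_ws("", "Fred", None))))  # 64
```

With plain concatenation a single NULL input wipes out the whole hash, whereas the separator variant still produces a 64-character hex digest from the remaining values.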
Use MySQL's CONCAT() to combine the two values and SHA2() to generate a 256 bit hash.
CREATE TABLE IF NOT EXISTS `mytable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL,
`description` varchar(50) NOT NULL,
`slug` varchar(64) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `mytable` (`name`,`description`,`slug`)
VALUES ('Fred','A Person',SHA2(CONCAT(`name`,`description`),256));
SELECT * FROM `mytable`
OUTPUT:
COLUMN VALUE
id 1
name Fred
description A Person
slug ea76b5b09b0e004781b569f88fc8434fe25ae3ad17807904cfb975a3be71bd89

MySQL + PHP ensure unique username (auto increment suffix)

Hi, I am looking for the most performant way to ensure a unique username.
I have already checked similar questions, but none of them satisfied me.
So here I came up with my solution. I appreciate your comments.
CREATE TABLE IF NOT EXISTS `user` (
`guid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`firstname` varchar(48) NOT NULL,
`lastname` varchar(48) NOT NULL,
`username` varchar(128) NOT NULL,
`unique_username` varchar(128) NOT NULL,
PRIMARY KEY (`guid`),
KEY `firstname` (`firstname`),
KEY `lastname` (`lastname`),
KEY `username` (`username`),
UNIQUE KEY `unique_username` (`unique_username`),
UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
username contains firstname.lastname without a numeric suffix, while unique_username contains firstname.lastname.(count of equal usernames).
To get the count of equal usernames, I run the following query against the user table before the insert:
SELECT COUNT(*) FROM user WHERE username = 'username'
Unfortunately I can't use a lookup against firstname and lastname since they are case sensitive.
The docs say “nonbinary strings (CHAR, VARCHAR, TEXT), string searches use the collation of the comparison operands… nonbinary string comparisons are case insensitive by default”, so you should be able to do this:
SELECT COUNT(*) FROM user WHERE CONCAT_WS('.', `firstname`, `lastname`) = 'username'
To get around the case sensitivity you can use LCASE(column) to compare lower case values:
SELECT COUNT(*) FROM user
WHERE LCASE(lastname) = LCASE('Lastname')
AND LCASE(firstname) = LCASE('firstName');
You could also use LIKE to check the username field:
SELECT COUNT(*) FROM user WHERE username LIKE 'username%';
That way 'this.name', 'this.name.1' and 'this.name.2' would all get counted together.
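The LCASE() comparison wraps the indexed columns in a function, so the optimizer cannot use the indexes on firstname and lastname; the LIKE 'username%' query has a constant prefix and can still use the index on username. Either way, the performance cost might be a non-issue.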
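
The counting scheme discussed in this thread can be sketched in Python to make the matching rule explicit. This is only an illustration under assumptions: the function names are hypothetical, names are compared case-insensitively, and the first user with a given base is assumed to get no suffix. It counts only `base` and `base.<digits>`, because a plain prefix match such as LIKE 'this.name%' would also count an unrelated name like 'this.namesake'.

```python
import re

def count_same_base(base, unique_usernames):
    """Count names equal to `base` or `base.<digits>`, ignoring case."""
    pattern = re.compile(re.escape(base) + r"(\.\d+)?", re.IGNORECASE)
    return sum(1 for name in unique_usernames if pattern.fullmatch(name))

def next_unique_username(base, unique_usernames):
    """Return `base` if unused, else `base.<count>` (hypothetical naming rule)."""
    count = count_same_base(base, unique_usernames)
    return base if count == 0 else f"{base}.{count}"

existing = ["this.name", "This.Name.1", "this.namesake"]
print(next_unique_username("this.name", existing))   # this.name.2
print(next_unique_username("other.name", existing))  # other.name
```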

MySQL SHA1 hash does not match

I have a weird problem with a MySQL users table. I have quickly created a simplified version as a testcase.
I have the following table
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`identity` varchar(255) NOT NULL,
`credential` varchar(255) NOT NULL,
`credentialSalt` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=ucs2 AUTO_INCREMENT=2 ;
INSERT INTO `users` (`id`, `identity`, `credential`, `credentialSalt`) VALUES
(1, 'test', '7288edd0fc3ffcbe93a0cf06e3568e28521687bc', '123');
And I run the following query
SELECT id,
IF (credential = SHA1(CONCAT('test', credentialSalt)), 1, 0) AS dynamicSaltMatches,
credentialSalt AS dynamicSalt,
SHA1(CONCAT('test', credentialSalt)) AS dynamicSaltHash,
IF (credential = SHA1(CONCAT('test', 123)), 1, 0) AS staticSaltMatches,
123 AS staticSalt,
SHA1(CONCAT('test', 123)) AS staticSaltHash
FROM users
WHERE identity = 'test'
This gives me the following result: the dynamic salt does NOT match, while the static salt DOES match.
This is blowing my mind. Can someone help me point out the cause of this?
My MySQL version is 5.5.29
It's because of the default character set of your table, which is ucs2. You appear to be running this on a UTF8 database, so SHA1() ends up working with strings in differing character sets.
If you change your table declaration to the following it will match again:
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`identity` varchar(255) NOT NULL,
`credential` varchar(255) NOT NULL,
`credentialSalt` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
As robertklep commented, explicitly casting your string to a character type will also work. Basically, ensure you're using the same character set when doing comparisons using SHA1().
As the encryption functions documentation says:
Many encryption and compression functions return strings for which the result might contain arbitrary byte values. If you want to store these results, use a column with a VARBINARY or BLOB binary string data type. This will avoid potential problems with trailing space removal or character set conversion that would change data values, such as may occur if you use a nonbinary string data type (CHAR, VARCHAR, TEXT).
This was changed in version 5.5.3:
As of MySQL 5.5.3, the return value is a nonbinary string in the connection character set. Before 5.5.3, the return value is a binary string; see the notes at the beginning of this section about using the value as a nonbinary string.
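
The effect of the character set on the hash can be reproduced outside MySQL. The sketch below is an illustration under an assumption: that SHA1() hashes the bytes of its argument in whatever character set the string carries, so a value concatenated with a ucs2 column is hashed in its two-byte form. Python's `utf-16-be` codec stands in for ucs2 here, and `sha1_hex` is a hypothetical helper.

```python
import hashlib

def sha1_hex(text, encoding):
    """Hash the bytes of `text` after encoding it with `encoding`."""
    return hashlib.sha1(text.encode(encoding)).hexdigest()

salted = "test" + "123"

# Same characters, different byte sequences.
print(len(salted.encode("utf-8")))      # 7
print(len(salted.encode("utf-16-be")))  # 14

single_byte = sha1_hex(salted, "utf-8")
two_byte = sha1_hex(salted, "utf-16-be")
print(single_byte == two_byte)          # False
```

The characters are identical, but the byte sequences fed to the hash are not, which is consistent with the literal-only expression matching the stored hash while the expression involving the ucs2 column does not.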

Optimising a slow MySQL query

I have a MySQL query as follows:
SELECT KeywordText, SUM(Frequency) AS Frequency FROM Keyword, Keyword_Polling_Frequency_Index
WHERE Keyword.KeywordText
IN ('deal', 'obama' and other keywords...)
AND RSSFeedNo IN (106, 107 and other RSS feeds)
AND PollingDateTime
BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND Keyword.KeywordNo = Keyword_Polling_Frequency_Index.KeywordNo
GROUP BY Keyword.KeywordText
ORDER BY Keyword.KeywordText ASC
The query is used by an hourly batch program which involves two tables and is meant to get the frequencies of a list of keywords from a list of RSS feeds for a given hour. The Keyword_Polling_Frequency_Index table has a composite primary key of KeywordNo, RSSFeedNo and PollingDateTime. The query joins this table to the Keyword table, which contains the KeywordText. The KeywordText column has a MySQL MyISAM full-text index.
In testing this was found to perform satisfactorily but has now started running very slowly and affects the interactive speed of pages of the application. When I check the MySQL logs, I find that MySQL is creating temporary tables.
So, my question is, given that this query has to handle dozens of keywords in dozens of RSS feeds to calculate the frequencies, can anyone suggest an optimisation?
I have thought of breaking the query up by keyword but am not convinced of the practicality of this.
Can anyone help?
I am using MySQL Community Edition 5.X and an EXTENDED EXPLAIN of a version of this query is shown above.
SQL for the tables is as follows:
CREATE TABLE `keyword` (
`KeywordNo` int(10) unsigned NOT NULL AUTO_INCREMENT,
`KeywordText` varchar(64) NOT NULL,
`UserOriginated` enum('TRUE','FALSE') NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
`StopWord` enum('TRUE','FALSE') NOT NULL,
`CreatedDate` date NOT NULL,
`CreatedTime` time NOT NULL,
PRIMARY KEY (`KeywordNo`),
FULLTEXT KEY `KEYWORDTEXT` (`KeywordText`)
) ENGINE=MyISAM AUTO_INCREMENT=44047 DEFAULT CHARSET=latin1$$
CREATE TABLE `keyword_polling_frequency_index` (
`KeywordNo` int(10) unsigned NOT NULL,
`RSSFeedNo` int(10) unsigned NOT NULL,
`PollingDateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Frequency` int(10) NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
PRIMARY KEY (`KeywordNo`,`RSSFeedNo`,`PollingDateTime`),
KEY `FK_keyword_polling_frequency_index_1` (`UserNo`),
CONSTRAINT `FK_keyword_polling_frequency_index_1` FOREIGN KEY (`UserNo`) REFERENCES `user` (`UserNo`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1$$
As mentioned previously, add an index to the PollingDateTime field in the order mentioned as well. This is my suggestion:
SELECT
K.KeywordText,
SUM(F.Frequency) AS Frequency
FROM
Keyword K, Keyword_Polling_Frequency_Index F
WHERE
EXISTS
(
SELECT 1
FROM Keyword K1
WHERE
MATCH (K1.KeywordText) AGAINST ('deal obama "another keyword" yetanother' IN BOOLEAN MODE)
AND K1.KeywordNo = K.KeywordNo
)
AND K.KeywordNo = F.KeywordNo
AND F.PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND F.RSSFeedNo IN (106, 107, 110)
GROUP BY K.KeywordText
ORDER BY K.KeywordText ASC
This will probably reduce the number of records for the comparison (SQL inside-out parsing) instead of directly matching two tables (N x N).
If you don't have any indexes, you should create the relevant ones.
At a minimum, add an index on keyword_polling_frequency_index.PollingDateTime.