MySQL index based on last digit of INT column

Is it possible to create an index in MySQL for the last digit of an int column?
Based on this answer, I have created partitions based on the last digit of an INT column:
CREATE TABLE partition_test (
    textfield INT,
    cltext TEXT,
    reindexedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    indexedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status TINYINT(2),
    postId INT
)
PARTITION BY HASH(MOD(postId, 10))
PARTITIONS 10;
I'm trying to create an index on the last digit of postId to optimize query time. Is there any way to do this, or is a simple index on postId enough?
Some failed tries:
CREATE INDEX postLastDigit USING HASH ON partition_test (MOD(postId, 10));
(1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'MOD(postId, 10))' at line 1")
and
CREATE INDEX postLastDigit ON partition_test (MOD(postId, 10));
(1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'MOD(postId, 10))' at line 1")
UPDATE:
The table has more than 100M row.
My goal is to optimize queries like:
1)
SELECT cltext FROM partition_test
WHERE postId IN (<INT>, <INT>)
AND status IS NOT NULL
2)
SELECT cltext FROM partition_test
WHERE postId IN (<INT>, <INT>)
AND status IS NOT NULL
AND reindexedAt BETWEEN <DATE> AND <DATE>
MariaDB version: 10.1.23-MariaDB-9+deb9u1

What query are you trying to speed up? Without any indexes on the table, any query will have to scan the entire table! If you want speed, first look to indexing.
If your query is SELECT ... WHERE post_id = 123, your Partitioning might make it run about 10 times as fast. But INDEX(post_id), with or without partitioning, will make it run hundreds of times as fast.
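For example, adding that plain index is a one-line change (the index name is illustrative):
ALTER TABLE partition_test ADD INDEX idx_postId (postId);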
Please provide the SELECTs so we can help you speed them up.
(OK, if you are just playing around with partitioning, the others have given you viable answers.)
"Partition Pruning" is rarely faster than a suitable index that starts with the pruning column.
After you solve your stated hashing problem, please report back whether the queries are any faster than using an index. Even pitted against an index, I predict partitioning will not run faster, and may even run a little slower.

You have tagged your question with mariadb and mysql. If you are using a reasonably recent version of MariaDB, you can use generated columns for indexing. If you are using MySQL, you can do the same, provided your MySQL version is at least 5.7.
If you are using a lower version of MySQL, you could create an additional column in your table where you store the last digit of postId for each row, and use that column for indexing / partitioning.
This would mean minimal changes to your application code: before inserting or updating, get the last digit of postId first, and insert / update one more field. As an alternative, you could eventually use triggers to automatically fill that additional column, as sketched below.
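A minimal sketch of the trigger variant (the extra column, trigger names, and index name are illustrative, not from the question):
ALTER TABLE partition_test ADD COLUMN postIdLastDigit TINYINT;

-- keep the new column in sync automatically
CREATE TRIGGER partition_test_bi BEFORE INSERT ON partition_test
FOR EACH ROW SET NEW.postIdLastDigit = MOD(NEW.postId, 10);

CREATE TRIGGER partition_test_bu BEFORE UPDATE ON partition_test
FOR EACH ROW SET NEW.postIdLastDigit = MOD(NEW.postId, 10);

CREATE INDEX postLastDigit ON partition_test (postIdLastDigit);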

Use virtual columns. In MariaDB 10.2, you can create an index on a virtual (aka generated) column, like this:
CREATE TABLE t (
    num int,
    last_digit int(1) AS (num % 10) VIRTUAL,
    KEY index_last_digit (last_digit)
);
Then you can use last_digit in your queries, e.g. SELECT ... WHERE last_digit = 1.
In older versions of MariaDB (5.2 to 10.1), you need to specify the PERSISTENT attribute rather than VIRTUAL, because non-persistent generated columns cannot be indexed there.
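A sketch of the same table for those older versions, assuming nothing beyond the PERSISTENT keyword differs:
CREATE TABLE t (
    num int,
    last_digit int(1) AS (num % 10) PERSISTENT,  -- stored on disk, so it can be indexed
    KEY index_last_digit (last_digit)
);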

Related

Indexing an array in a MySQL JSON column with variable length elements

I am trying to index JSON arrays where the contents are variable-length strings, and I can't figure out if it's possible, let alone scalable.
A very similar question about indexing JSON data using the new multi-valued index is here: Indexing JSON column in MySQL 8
The syntax from that question executes, but using CHAR isn't right for me and ends in an error anyway. After changing names and adjusting the CHAR length for my data:
ALTER TABLE catalog ADD INDEX idx_30144( (CAST( j_data->>'$."30144"' AS char(250) ARRAY)) );
I get this error
1034 - Incorrect key file for table 'catalog'; try to repair it
Trying this:
ALTER TABLE catalog ADD INDEX idx_30144( (CAST( j_data->>'$."30144"' AS varchar(250) ARRAY)) );
Gives this error:
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'varchar(250) ARRAY)) )' at line 1
This is an InnoDB table, so the 1034 error clearly isn't accurate. The statement completes in around 2 seconds, so while it could in principle be running out of space, it fails too fast to tell, and there's 350 GB free on the drive.
I have over 200 JSON nodes like this that I would ideally like to index. If this is a huge storage suck I can be happy with a subset of them, but I need to know if it's possible in the first place.
You can only index such values by generating a column and indexing that, like so:
CREATE TABLE jempn (
    id BIGINT(20) NOT NULL AUTO_INCREMENT PRIMARY KEY,
    j_data JSON DEFAULT NULL,
    g varchar(250) GENERATED ALWAYS AS (j_data->'$."30144"') STORED,
    INDEX idx_30144 (g)
) ENGINE=INNODB;
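A hedged usage sketch: an equality lookup on the generated column can use idx_30144. Note that g holds the raw JSON text of the extracted value (for an array, brackets and quotes included); using ->> (JSON_UNQUOTE) in the column expression instead may be more convenient if you want to compare against plain strings.
SELECT id FROM jempn WHERE g = '["a", "b"]';  -- illustrative value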

How to optimise a MySQL query when SHOW FULL PROCESSLIST shows "Sending data" for over 24 hours

I have the following query that runs forever, and I am looking to see if there is any way I can optimise it. It runs on a table that has 1,406,480 rows of data in total; apart from the Filename and Ref_No columns, the ID and End_Date have both been indexed.
My Query:
INSERT INTO UniqueIDs
(
    SELECT T1.ID
    FROM master_table T1
    LEFT JOIN master_table T2
        ON  T1.Ref_No = T2.Ref_No
        AND T1.End_Date = T2.End_Date
        AND T1.Filename = T2.Filename
        AND T1.ID > T2.ID
    WHERE T2.ID IS NULL
      AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
);
Explain Results: (the EXPLAIN output was attached as a screenshot in the original post and is not reproduced here)
The reason for not indexing Ref_No is that it is a TEXT column, and therefore I get a BLOB/TEXT error when I try to index it.
Would really appreciate it if somebody could advise on how I can speed up this query.
Thanks
Thanks to Bill with regard to multi-column indexes, I have managed to make some headway. I first ran this code:
CREATE INDEX I_DELETE_DUPS ON master_table(id, End_Date);
I then added a new column to hold the length of Ref_No, but had to change it from the query Bill mentioned because my version of MySQL is 5.5. So I ran it in 3 steps:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED;
UPDATE master_table SET Ref_No_length = LENGTH(Ref_No);
ALTER TABLE master_table ADD INDEX (Ref_No_length);
The last step was to change the length condition in the WHERE clause of my insert query. It became:
AND t1.Ref_No_length between 5 and 10;
I then ran this query, and within 15 mins I had 280k IDs inserted into my UniqueIDs table. I went on to change my insert script to see if I could cover more lengths, with the following:
AND t1.Ref_No_length IN (5,6,7,8,9,10,13);
This was to also bring in the values where the length was equal to 13. This query took a lot longer, 2 hr 50 mins to be precise, but the additional ask of finding all rows with a length of 13 gave me an extra 700k unique IDs.
I am still looking at ways to optimise the query with the IN clause, but this is a big improvement over a query that kept running for 24 hours. So thank you so much, Bill.
For the JOIN, you should have a multi-column index on (Ref_No, End_Date, Filename).
You can create a prefix index on a TEXT column like this:
ALTER TABLE master_table ADD INDEX (Ref_No(10));
But that won't help you search based on the LENGTH(). Indexing only helps search by value indexed, not by functions on the column.
In MySQL 5.7 or later, you can create a virtual column like this, with an index on the values calculated for the virtual column:
ALTER TABLE master_table
ADD COLUMN Ref_No_length SMALLINT UNSIGNED AS (LENGTH(Ref_No)),
ADD INDEX (Ref_No_length);
Then MySQL will recognize that your condition in your query is the same as the expression for the virtual column, and it will automatically use the index (exception: in my experience, this doesn't work for expressions using JSON functions).
But this is no guarantee that the index will help. If most of the rows match the condition of the length being between 5 and 10, the optimizer will not bother with the index. It may be more work to use the index than to do a table-scan.
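A quick way to gauge that selectivity before relying on the index (a sketch; the exact point at which the optimizer prefers a scan varies):
SELECT COUNT(*) FROM master_table WHERE LENGTH(Ref_No) BETWEEN 5 AND 10;
SELECT COUNT(*) FROM master_table;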
the ID and End_Date have both been indexed.
You have PRIMARY KEY(id) and redundantly INDEX(id)? A PK is a unique key.
"have both been indexed" -- INDEX(a), INDEX(b) is not the same as INDEX(a,b) -- they have different uses. Read about "composite" indexes.
That query smells a lot like "group-wise" max done in a very slow way. (Alas, that may have come from the online docs.)
I have compiled the fastest ways to do that task here: http://mysql.rjweb.org/doc.php/groupwise_max (There are multiple versions, based on MySQL version and what issues your code can/cannot tolerate.)
Please provide SHOW CREATE TABLE. One important question: Is id the PRIMARY KEY?
This composite index may be useful:
(Filename, End_Date, Ref_No, -- first, in any order
ID) -- last
This, as others have noted, is unlikely to be helped by any index, hence T1 will need a full-table-scan:
AND LENGTH(T1.Ref_No) BETWEEN 5 AND 10
If Ref_No cannot be bigger than 191 characters, change it to a VARCHAR so that it can be used in an index. Oh, did I ask for SHOW CREATE TABLE? If you can't make it VARCHAR, then my recommended composite index is
INDEX(Filename, End_Date, ID)
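A hedged sketch of that conversion (191 characters fits within older InnoDB's 767-byte index limit even with utf8mb4; verify your data first, re-apply any NOT NULL or other column attributes, and note the composite index also assumes Filename is an indexable type such as VARCHAR):
SELECT MAX(LENGTH(Ref_No)) FROM master_table;  -- verify before converting
ALTER TABLE master_table MODIFY Ref_No VARCHAR(191);
ALTER TABLE master_table ADD INDEX (Filename, End_Date, Ref_No, ID);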

Will MySQL use Multiple-column index if I use columns in different order?

Reading the MySQL docs, we see this example table with a multiple-column index named name:
CREATE TABLE test (
    id INT NOT NULL,
    last_name CHAR(30) NOT NULL,
    first_name CHAR(30) NOT NULL,
    PRIMARY KEY (id),
    INDEX name (last_name, first_name)
);
The docs explain, with examples, in which cases the index will or will not be used. For example, it will be used for such a query:
SELECT * FROM test
WHERE last_name='Widenius' AND first_name='Michael';
My question is, would it work for this query (which is effectively the same):
SELECT * FROM test
WHERE first_name='Michael' AND last_name='Widenius';
I couldn't find any word about that in the documentation - does MySQL try to swap the columns to find an appropriate index, or is it all up to the query?
It should be the same, because (from the MySQL docs) the query optimizer works by looking at:
Each table index is queried, and the best index is used unless the
optimizer believes that it is more efficient to use a table scan. At
one time, a scan was used based on whether the best index spanned more
than 30% of the table, but a fixed percentage no longer determines the
choice between using an index or a scan. The optimizer now is more
complex and bases its estimate on additional factors such as table
size, number of rows, and I/O block size.
http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
In some cases, MySQL can read rows from the index without even
consulting the data file.
and this should be your case:
Without ICP, the storage engine traverses the index to locate rows in
the base table and returns them to the MySQL server which evaluates
the WHERE condition for the rows. With ICP enabled, and if parts of
the WHERE condition can be evaluated by using only fields from the
index, the MySQL server pushes this part of the WHERE condition down
to the storage engine. The storage engine then evaluates the pushed
index condition by using the index entry and only if this is satisfied
is the row read from the table. ICP can reduce the number of times the
storage engine must access the base table and the number of times the
MySQL server must access the storage engine.
http://dev.mysql.com/doc/refman/5.7/en/index-condition-pushdown-optimization.html
For the two queries you stated, it will work the same.
However, for queries which have only one of the columns, the order of the index matters.
For example, this will use the index:
SELECT * FROM test WHERE last_name='Widenius';
But this won't:
SELECT * FROM test WHERE first_name='Michael';
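An easy way to confirm this on your own server rather than guessing (inspect the type and key columns in the output):
EXPLAIN SELECT * FROM test WHERE last_name='Widenius' AND first_name='Michael';
-- expect a ref lookup on the name index
EXPLAIN SELECT * FROM test WHERE first_name='Michael';
-- expect a scan: the index cannot be used for a ref lookup on first_name alone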

Creating Unique Compound Index where one field can be null in Mysql

I created a unique compound index:
Alter Table TableA Add Unique Index `UniqueRecord` (A, B, C, D);
The issue is that sometimes C can be NULL.
I noticed that INSERT IGNORE was still, in some cases, adding duplicate records, and this turned out to be when the incoming records had C as NULL.
I tested the hypothesis that this was an issue by doing:
Select concat(A,B,C,D) as `Index` from TableA where C is NULL
And `Index` in each of those cases was in fact NULL. Once I removed the NULL field from the select:
Select concat(A,B,D) as `Index` from TableA where C is NULL
I got the expected string values instead of NULLs.
So the question is: other than doing an update like SET C = '' WHERE C IS NULL, is there some way to set up the index so that it works? I am loath to simply make the index (A, B, D), as that might introduce unwanted dupes when C is in fact not NULL.
Update:
I did try using IfNull in the index creation, but MySQL did not like that:
Alter Table TableA Add Unique Index UniqueLocator (A, B, IfNull(C,''), D)
Mysql said:
[Err] 1064 - You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version
for the right syntax to use near 'C,''),D)' at line 1
Yes, MySQL allows NULLs in unique indexes, which is the right thing to do. But you can define column C as NOT NULL if you don't like that.
MySQL -- but not all databases -- allows duplicate NULL values in unique indexes. I believe the ANSI standard is rather ambiguous on this point (or perhaps even contradictory). You basically have two choices.
The first is to define a default value for the column. This may not be appealing in terms of code, but it will at least generate an error on duplicate insert. For instance, if "C" is a foreign key reference to an auto-incremented id, then you might use -1 or 0 as the default value. If it is a date, you might use the zero date.
The other solution is a trigger, where you manually check for the duplicate values before doing an insert (or update).
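A minimal sketch of that trigger approach (the trigger name is illustrative; <=> is MySQL's NULL-safe equality, so two NULL C values compare as equal):
DELIMITER //
CREATE TRIGGER TableA_before_insert BEFORE INSERT ON TableA
FOR EACH ROW
BEGIN
    -- reject the row if an (A, B, C, D) match already exists, treating NULLs in C as equal
    IF EXISTS (SELECT 1 FROM TableA
               WHERE A = NEW.A AND B = NEW.B
                 AND C <=> NEW.C AND D = NEW.D) THEN
        SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Duplicate (A,B,C,D) row';
    END IF;
END//
DELIMITER ;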

MYSQL indexing issue

I am having some difficulty finding an answer to this question...
For simplicity, let's use this situation.
I create a table like this:
CREATE TABLE `test` (
    `MerchID` int(10) DEFAULT NULL,
    KEY `MerchID` (`MerchID`)
) ENGINE=InnoDB AUTO_INCREMENT=32769 DEFAULT CHARSET=utf8;
I will insert some data into the column of this table...
INSERT INTO test
SELECT 1
UNION
SELECT 2
UNION
SELECT NULL;
Now I examine the query using MySQL's EXPLAIN feature...
EXPLAIN
SELECT * FROM test
WHERE merchid IS NOT NULL
Resulting in:
id: 1
select_type: SIMPLE
table: test
type: index
possible_keys: MerchID
key: MerchID
key_len: 5
ref: NULL
rows: 3
Extra: Using where; Using index
In production, in my real procedure, something like this takes a long time with this index. If I re-declare the table with the index line reading KEY `MerchID` (`MerchID`) USING BTREE, I get much better results, yet the EXPLAIN feature seems to return the same output. I have read some basics about the BTREE, HASH and RTREE storage types for indexes/keys. When no storage type is specified, I was under the assumption that BTREE would be assumed, so I am somewhat stumped as to why modifying my index to use this storage type makes my procedure fly. Any ideas?
I am using MySQL 5.1 and coding in MySQL Workbench. The part of the procedure that appears to be held up is like the one illustrated above, where the column of a joined table is tested for NULL.
I think you are on the wrong path. For InnoDB storage, the only available index method is BTREE, so you are safe to omit the BTREE keyword from your table create script. The MySQL manual lists the supported index types per storage engine, along with other useful information.
The performance issue is coming from a different place.
Whenever testing performance, be sure to always use the SQL_NO_CACHE directive, otherwise, with query caching, the second time you run a query, your results may be returned a lot faster simply due to caching.
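For example, when timing the query from above:
SELECT SQL_NO_CACHE * FROM test WHERE MerchID IS NOT NULL;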
With a covering index (all of the selected and filtered columns are in the index), the query is rather efficient. Using index in the EXPLAIN result shows that it's being used as a covering index.
However, if the index were not a covering index, MySQL would have to perform a seek for each row returned by the index in order to grab the actual table data. While this would still be fast for a small result set, with a result set of 1 million rows, that would be 1 million seeks. If the number of NULL rows were a high percentage, MySQL would abandon the index altogether to avoid the seeks.
Ensure that your real "production" index is a covering index as well.