Optimize index on my table - mysql

I have a table `friends`:
CREATE TABLE IF NOT EXISTS `friends` (
`fr_id` int(11) NOT NULL AUTO_INCREMENT,
`fr_sender` int(11) NOT NULL,
`fr_receiver` int(11) NOT NULL,
`fr_validate` enum('0','1','2') NOT NULL DEFAULT '0',
PRIMARY KEY (`fr_id`),
KEY `fr_sender` (`fr_sender`),
KEY `fr_receiver` (`fr_receiver`),
KEY `fr_validate` (`fr_validate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=2397953 ;
fr_id => index
fr_sender => sender of the friend request
fr_receiver => the receiver
fr_validate => 0 = no reply, 1 = request accepted, 2 = request refused.
My mysql-slow.log has many entries for this query:
SELECT fr_id FROM friends WHERE (fr_sender = '113405' OR fr_receiver = '113405') && fr_validate = "1";
# Query_time: 5.607869 Lock_time: 0.000052 Rows_sent: 106 Rows_examined: 833517
How can I optimise my index for this query?
Thank you.

It's tricky to optimize when you use OR between conditions for two different columns. It often results in a costly table-scan.
Here's a workaround:
ALTER TABLE friends
ADD INDEX (fr_validate, fr_sender),
ADD INDEX (fr_validate, fr_receiver);
SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_sender = '113405'
UNION
SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_receiver = '113405'
The reason to create two indexes is so that each subquery has its own index to minimize the rows examined as much as possible. The results from each subquery are then combined, which gives a result set equivalent to your original query.
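If you want to sanity-check the rewrite, here is a small sketch using SQLite's in-memory engine as a stand-in for MySQL (the rewrite itself is engine-agnostic; the sample user id 113405 and a few made-up rows follow the question):

```python
import sqlite3

# Sanity check of the OR -> UNION rewrite, with SQLite standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (
    fr_id INTEGER PRIMARY KEY,
    fr_sender INTEGER NOT NULL,
    fr_receiver INTEGER NOT NULL,
    fr_validate TEXT NOT NULL DEFAULT '0'
);
INSERT INTO friends (fr_sender, fr_receiver, fr_validate) VALUES
    (113405, 2, '1'),  -- request sent by the user, accepted
    (3, 113405, '1'),  -- request received by the user, accepted
    (113405, 4, '0'),  -- pending: must not match
    (5, 6, '1');       -- unrelated users
""")

or_rows = sorted(conn.execute(
    """SELECT fr_id FROM friends
       WHERE (fr_sender = 113405 OR fr_receiver = 113405)
         AND fr_validate = '1'"""))
union_rows = sorted(conn.execute(
    """SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_sender = 113405
       UNION
       SELECT fr_id FROM friends WHERE fr_validate = '1' AND fr_receiver = 113405"""))

# UNION also deduplicates, so both forms return the same id set.
print(or_rows)  # [(1,), (2,)]
assert or_rows == union_rows
```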
PS: Please use single-quotes for string literals and date literals. MySQL allows double-quotes to serve the same role by default, but if you use another RDBMS brand, or if you SET SQL_MODE=ANSI_QUOTES in MySQL, you'll find the standard meaning of double-quotes is for delimiting table names and column names, not strings.

MySQL - Select only the rows that have not been selected in the last read

Problem description
I have a table, say trans_flow:
CREATE TABLE trans_flow (
id BIGINT(20) AUTO_INCREMENT PRIMARY KEY,
card_no VARCHAR(50) DEFAULT NULL,
money INT(20) DEFAULT NULL
)
New data is inserted into this table constantly.
Now, I want to fetch only the rows that have not been fetched in the last query. For example, at 5:00, id ranges from 1 to 100, and I read the rows 80 - 100 and do some processing. Then, at 5:01, the id comes to 150, and I want to get exactly the rows 101 - 150. Otherwise, the processing program will read in old and already processed data. Note that such queries are committed continuously. From a certain perspective, I want to implement "streaming process" on MySQL.
A tentative idea
I have a simple but maybe ugly solution. I create an auxiliary table query_cursor which stores the beginning and end ids of one query:
CREATE TABLE query_cursor (
task_id VARCHAR(20) PRIMARY KEY COMMENT 'Specify which task is reading this table',
first_row_id BIGINT(20) DEFAULT NULL,
last_row_id BIGINT(20) DEFAULT NULL
)
During each query, I first update the query range stored in this table by:
UPDATE query_cursor
SET first_row_id = (SELECT last_row_id + 1 FROM query_cursor WHERE task_id = 'xxx'),
last_row_id = (SELECT MAX(id) FROM trans_flow)
WHERE task_id = 'xxx'
And then, doing query on table trans_flow using stored cursors:
SELECT * FROM trans_flow
WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = 'xxx')
AND (SELECT last_row_id FROM query_cursor WHERE task_id = 'xxx')
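Put together, the two steps above can be sketched end to end in Python, with SQLite standing in for MySQL (the task id 'xxx' follows the question; note the sketch reads `last_row_id` directly in the UPDATE rather than through the self-referencing subquery, which MySQL would likely reject with a "can't specify target table" error anyway):

```python
import sqlite3

# End-to-end sketch of the query_cursor bookkeeping described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trans_flow (
    id INTEGER PRIMARY KEY,
    card_no TEXT,
    money INT
);
CREATE TABLE query_cursor (
    task_id TEXT PRIMARY KEY,
    first_row_id INT,
    last_row_id INT
);
INSERT INTO query_cursor VALUES ('xxx', NULL, 0);
""")

def fetch_new_rows(task_id):
    # Advance the window to (old last_row_id, MAX(id)], then read it.
    conn.execute("""
        UPDATE query_cursor
        SET first_row_id = last_row_id + 1,
            last_row_id = (SELECT COALESCE(MAX(id), 0) FROM trans_flow)
        WHERE task_id = ?""", (task_id,))
    return conn.execute("""
        SELECT * FROM trans_flow
        WHERE id BETWEEN (SELECT first_row_id FROM query_cursor WHERE task_id = ?)
                     AND (SELECT last_row_id FROM query_cursor WHERE task_id = ?)""",
        (task_id, task_id)).fetchall()

conn.executemany("INSERT INTO trans_flow (card_no, money) VALUES (?, ?)",
                 [("A", 10), ("B", 20)])
batch1 = fetch_new_rows("xxx")  # rows 1-2
conn.execute("INSERT INTO trans_flow (card_no, money) VALUES ('C', 30)")
batch2 = fetch_new_rows("xxx")  # only the new row 3
print(batch1, batch2)
```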
Question for help
Is there a simpler and more elegant implementation that can achieve the same effect (the best if no need to use an auxiliary table)? The version of MySQL is 5.7.

Serious MySQL query performance issues after adding condition

My problem is that I have a MySQL query that runs really fast (0.3 seconds) even though it has a large number of left joins and a few conditions on the joined columns, but when I add one more condition the query takes upwards of 180 seconds! I understand that the condition means the execution plan has to adjust to pull all potential records first and then apply the condition in a loop. What's weird to me is that the fast query without the additional condition only returns 16 rows, and even just wrapping the query and applying the condition in an outer query takes a crazy amount of time, when you would think it would only add one more pass over 16 rows.
If it matters, this is using Amazon Aurora Serverless, which should align with MySQL 5.7.
Here's what the query looks like; the additional condition is commented out. (The general table structure of the DB itself cannot change currently, so please refrain from suggesting a full database restructuring.)
select
e1.entityId as _id,
v1.Value,
v2.Value,
v3.Value,
v4.Value,
v5.Value,
v6.Value,
v7.Value,
v8.Value,
v9.Value,
v10.Value,
v11.Value,
v12.Value
from entity e1
left join val as v1 on (v1.entityId = e1.entityId and v1.attributeId = 1189)
left join val as v2 on (v2.entityId = e1.entityId and v2.attributeId = 1190)
left join entity as e2 on e2.entityId = (select entityId from entity where code = v1.Value and type = 88 limit 1)
left join val as v3 on (v3.entityId = e2.entityId and v3.attributeId = 507)
left join val as v4 on (v4.entityId = e2.entityId and v4.attributeId = 522)
left join val as v5 on (v5.entityId = e2.entityId and v5.attributeId = 558)
left join val as v6 on (v6.entityId = e2.entityId and v6.attributeId = 516)
left join val as v7 on (v7.entityId = e2.entityId and v7.attributeId = 518)
left join val as v8 on (v8.entityId = e2.entityId and v8.attributeId = 1384)
left join val as v9 on (v9.entityId = e2.entityId and v9.attributeId = 659)
left join val as v10 on (v10.entityId = e2.entityId and v10.attributeId = 519)
left join val as v11 on (v11.entityId = e2.entityId and v11.attributeId = 1614)
left join entity as e3 on e3.entityId = (select entityId from entity where code = v9.Value and type = 97 limit 1)
left join val as v12 on (v12.entityId = e3.entityId and v12.attributeId = 661)
where e1.type = 154
and v2.Value = 'foo'
and v5.Value = 'bar'
and v10.Value = 'foo2'
-- and v11.Value = 'bar2'
order by v3.Value asc;
And wrapping that in something like this still takes forever...
select *
from (
<query from above>
) sub
where sub.v11 = 'bar2';
query execution plan with the condition commented out (fast)
query execution plan with the condition included (slow)
I'm going to fiddle around with indexing on the "entity" tables to improve the execution plan regardless which will likely help... but can someone explain what's going on here and what I should be looking at in the execution plan that would indicate such bad performance? And why wrapping the fast query in a subquery so that the outer query should only loop over 16 rows takes a really long time?
EDIT: I noticed in the slow plan that the far-left execution step uses a non-unique key lookup (which is on val.entityId) for "68e9145e-43eb-4581-9727-4212be41bef5" (v11) instead of the unique key lookup the rest use (which is the composite index on entityId,attributeId). I presume this might be part of the issue, but why can't it use the composite index there like it does for the rest?
PS: For now since we know the result set will be small, we are implementing that last condition server side with a filter on the result set in our nodeJS server.
Here's the results of "SHOW CREATE TABLE entity" and "SHOW CREATE TABLE val"
CREATE TABLE `entity` (
`entityId` int(11) NOT NULL AUTO_INCREMENT,
`UID` varchar(64) NOT NULL,
`type` int(11) NOT NULL,
`code` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`entityId`),
UNIQUE KEY `UID` (`UID`),
KEY `IX_Entity_Type` (`type`),
CONSTRAINT `FK_Entities_Types` FOREIGN KEY (`type`) REFERENCES `entityTypes` (`typeId`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=296138 DEFAULT CHARSET=latin1
CREATE TABLE `val` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`UID` varchar(64) NOT NULL,
`attributeId` int(11) NOT NULL,
`entityId` int(11) NOT NULL,
`Value` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`),
UNIQUE KEY `UID` (`UID`),
UNIQUE KEY `idx_val_entityId_attributeId` (`entityId`,`attributeId`),
KEY `IX_val_attributeId` (`attributeId`),
KEY `IX_val_entityId` (`entityId`)
) ENGINE=InnoDB AUTO_INCREMENT=2325375 DEFAULT CHARSET=latin1
Please provide SHOW CREATE TABLE.
I would hope to see these composite indexes:
`val`: (entityId, attributeId) -- order is not critical
Alas, because `code` is LONGTEXT, the index I would want for `entity` -- INDEX(type, code, entityId) -- is not possible. Hence this will not be very efficient:
SELECT entityId
from entity
where code = v9.Value
and type = 97
limit 1
I see LIMIT without an ORDER BY -- do you care which row you get?
Probably that would be better written as
WHERE EXISTS ( SELECT 1 FROM entity
WHERE entityID = e3.entityID
AND code = v9.Value
AND type = 97 )
(Are you sure about the mixture of e3 and v9?)
Wrapping...
This forces the LEFT JOIN to become JOIN, and it gets rid of the now-inner ORDER BY.
Then the Optimizer probably decides it is best to start with 68e9145e-43eb-4581-9727-4212be41bef5, which I call val AS v11:
JOIN val AS v11 ON (v11.entityId = e2.entityId
               AND v11.attributeId = 1614
               AND v11.Value = 'bar2')
If this is an EAV table, then all it does is verify that attribute 1614 for that entity has the value 'bar2'. This does not seem like a sensible test.
This is in addition to my former recommendation.
I would prefer EXPLAIN SELECT ....
EAV
Assuming val is a traditional EAV table, this would probably be much better:
CREATE TABLE `val` (
`attributeId` int(11) NOT NULL,
`entityId` int(11) NOT NULL,
`Value` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`entityId`,`attributeId`),
KEY `IX_val_attributeId` (`attributeId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The two IDs have no practical use (unless I am missing something). If you are forced to use them because of a framework, that is unfortunate. Promoting (entityId, attributeId) to be the PK makes fetching value a little faster.
There is no useful way to include a LONGTEXT in any index, so some of my previous suggestions need changing.

Using SQL to select records with a single "true" bit field from several bit fields

If I have a table like this:
CREATE TABLE `Suppression` (
`SuppressionId` int(11) NOT NULL AUTO_INCREMENT,
`Address` varchar(255) DEFAULT NULL,
`BooleanOne` bit(1) NOT NULL DEFAULT '0',
`BooleanTwo` bit(1) NOT NULL DEFAULT '0',
`BooleanThree` bit(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`SuppressionId`)
)
Is there a set-based way in which I can select all records which have exactly one of the three bit fields = 1 without writing out the field names?
For example given:
SuppressionId  Address            BooleanOne  BooleanTwo  BooleanThree
1              10 Pretend Street  1           1           1
2              11 Pretend Street  0           0           0
3              12 Pretend Street  1           1           0
4              13 Pretend Street  0           1           0
5              14 Pretend Street  1           0           1
6              14 Pretend Street  1           0           0
I want to return records 4 and 6.
You could "add them up":
where cast(booleanone as unsigned) + cast(booleantwo as unsigned) + cast(booleanthree as unsigned) = 1
Or, use tuples:
where ( (booleanone, booleantwo, booleanthree) ) in ( (0b1, 0b0, 0b0), (0b0, 0b1, 0b0), (0b0, 0b0, 0b1) )
I'm not sure what you mean by "set-based".
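Both predicates can be checked against the sample rows; here is a sketch with SQLite and plain 0/1 integers standing in for MySQL and its bit(1) columns:

```python
import sqlite3

# The "add them up" predicate against the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppression (
    SuppressionId INTEGER PRIMARY KEY,
    Address TEXT,
    BooleanOne INT NOT NULL DEFAULT 0,
    BooleanTwo INT NOT NULL DEFAULT 0,
    BooleanThree INT NOT NULL DEFAULT 0
);
INSERT INTO Suppression VALUES
    (1, '10 Pretend Street', 1, 1, 1),
    (2, '11 Pretend Street', 0, 0, 0),
    (3, '12 Pretend Street', 1, 1, 0),
    (4, '13 Pretend Street', 0, 1, 0),
    (5, '14 Pretend Street', 1, 0, 1),
    (6, '14 Pretend Street', 1, 0, 0);
""")
ids = [row[0] for row in conn.execute("""
    SELECT SuppressionId FROM Suppression
    WHERE BooleanOne + BooleanTwo + BooleanThree = 1""")]
print(ids)  # [4, 6]
```

(In MySQL itself you would keep the CAST(... AS UNSIGNED) wrappers, since bit(1) columns don't add up directly.)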
If your number of booleans can vary over time and you don't want to update your code, I suggest you make them rows and not columns.
For example:
CREATE TABLE `Suppression` (
`SuppressionId` int(11) NOT NULL AUTO_INCREMENT,
`Address` varchar(255) DEFAULT NULL,
`BooleanId` int(11) NOT NULL,
`BooleanValue` bit(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`SuppressionId`,`BooleanId`)
)
So with one query and a GROUP BY you can check all the values of your booleans, however numerous they are. Of course, this makes your table bigger.
EDIT: Just came up with another idea: why not add a checksum column whose value is the sum of all your bits? You would update it on every write to the table, and just check it in your SELECT.
If you
must use this denormalized way of representing these flags, and you
must be able to add new flag columns to your table in production, and you
cannot rewrite your queries by hand when you add columns,
then you must figure out how to write a program to write your queries.
You can use this query to retrieve a result set of boolean-valued columns, then you can use that result set in a program to write a query involving all those columns.
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'Suppression'
AND COLUMN_NAME LIKE 'Boolean%'
AND DATA_TYPE = 'bit'
AND NUMERIC_PRECISION=1
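The code-generation step itself can be sketched like this; the column list is hardcoded here in place of actually running the INFORMATION_SCHEMA query above:

```python
# Build the exactly-one-flag predicate from a list of flag column names,
# as returned by the INFORMATION_SCHEMA query (hardcoded for this sketch).
flag_columns = ["BooleanOne", "BooleanTwo", "BooleanThree"]

total = " + ".join(f"CAST(`{col}` AS UNSIGNED)" for col in flag_columns)
query = f"SELECT * FROM `Suppression` WHERE {total} = 1"
print(query)
```

Rerunning the generator whenever columns change keeps the query in step with the table.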
The approach you have proposed here will work exponentially more poorly as you add columns, unfortunately. Any time a software engineer says "exponential" it's time to run away screaming. Seriously.
A much more scalable approach is to build a one-to-many relationship between your Suppression rows and your flags. Add this table.
CREATE TABLE SuppressionFlags (
SuppressionId int(11) NOT NULL,
FlagName varchar(31) NOT NULL,
Value bit(1) NOT NULL DEFAULT '0',
PRIMARY KEY (SuppressionID, FlagName)
)
Then, when you want to insert a row with some flag variables, do this sequence of queries.
INSERT INTO Suppression (Address) VALUES ('some address');
SET @SuppressionId := LAST_INSERT_ID();
INSERT INTO SuppressionFlags (SuppressionId, FlagName, Value)
VALUES (@SuppressionId, 'BooleanOne', 1);
INSERT INTO SuppressionFlags (SuppressionId, FlagName, Value)
VALUES (@SuppressionId, 'BooleanTwo', 0);
INSERT INTO SuppressionFlags (SuppressionId, FlagName, Value)
VALUES (@SuppressionId, 'BooleanThree', 0);
This gives you one Suppression row with three flags set in the SuppressionFlags table. Note the use of @SuppressionId to set the Id values in the second table.
Then to find all rows with just one flag set, do this.
SELECT Suppression.SuppressionId, Suppression.Address
FROM Suppression
JOIN SuppressionFlags ON Suppression.SuppressionId = SuppressionFlags.SuppressionId
GROUP BY Suppression.SuppressionId, Suppression.Address
HAVING SUM(SuppressionFlags.Value) = 1
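The insert sequence and the HAVING query can be exercised together; a sketch with SQLite standing in for MySQL (INT 0/1 in place of bit(1), and made-up addresses):

```python
import sqlite3

# One row with exactly one flag set, one row with two; HAVING SUM(...) = 1
# should return only the first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppression (SuppressionId INTEGER PRIMARY KEY, Address TEXT);
CREATE TABLE SuppressionFlags (
    SuppressionId INT NOT NULL,
    FlagName TEXT NOT NULL,
    Value INT NOT NULL DEFAULT 0,
    PRIMARY KEY (SuppressionId, FlagName)
);
""")

samples = [
    ("13 Pretend Street", {"BooleanOne": 0, "BooleanTwo": 1, "BooleanThree": 0}),
    ("14 Pretend Street", {"BooleanOne": 1, "BooleanTwo": 0, "BooleanThree": 1}),
]
for address, flags in samples:
    cur = conn.execute("INSERT INTO Suppression (Address) VALUES (?)", (address,))
    conn.executemany("INSERT INTO SuppressionFlags VALUES (?, ?, ?)",
                     [(cur.lastrowid, name, val) for name, val in flags.items()])

one_flag = conn.execute("""
    SELECT Suppression.SuppressionId, Suppression.Address
    FROM Suppression
    JOIN SuppressionFlags ON Suppression.SuppressionId = SuppressionFlags.SuppressionId
    GROUP BY Suppression.SuppressionId, Suppression.Address
    HAVING SUM(SuppressionFlags.Value) = 1""").fetchall()
print(one_flag)  # [(1, '13 Pretend Street')]
```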
It gets a little trickier if you want more elaborate combinations. For example, if you want all rows with BooleanOne and either BooleanTwo or BooleanThree set, you need to do something like this.
SELECT S.SuppressionId, S.Address
FROM Suppression S
JOIN SuppressionFlags A ON S.SuppressionId=A.SuppressionId AND A.FlagName='BooleanOne'
JOIN SuppressionFlags B ON S.SuppressionId=B.SuppressionId AND B.FlagName='BooleanTwo'
JOIN SuppressionFlags C ON S.SuppressionId=C.SuppressionId AND C.FlagName='BooleanThree'
WHERE A.Value = 1 AND (B.Value = 1 OR C.Value = 1)
This common database pattern is called the attribute / value pattern. Because SQL doesn't easily let you use variables for column names (it doesn't really have reflection) this kind of way of naming your attributes is your best path to extensibility.
It's a little more SQL. But you can add as many new flags as you need, in production, without rewriting queries or getting a combinatorial explosion of flag-matching. And SQL is built to handle this kind of query.

MYSQL CSV column check for exclude

I need to find records that don't have a specific value in a CSV column. Below is the table structure:
CREATE TABLE `employee` (
`id` int NOT NULL AUTO_INCREMENT,
`first_name` varchar(100) NOT NULL,
`last_name` varchar(100) NOT NULL,
`keywords` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample record1: 100, Sam, Thompson, "50,51,52,53"
Sample record2: 100, Wan, Thompson, "50,52,53"
Sample record3: 100, Kan, Thompson, "53,52,50"
50 = sports
51 = cricket
52 = soccer
53 = baseball
I need to find the names of the employees who have the tags "sports, soccer, baseball" (50, 52, 53) but not cricket (51).
So the result should return only the 2nd and 3rd records in this example, as they don't have 51 (cricket) but do have the other three, though in a different order.
My query is below, but I couldn't get it to work:
SELECT t.first_name FROM `User` `t` WHERE (keywords like '50,52,53') LIMIT 10
Is there anything like an "unlike" option? I am confused about how to get this working.
You could use FIND_IN_SET:
SELECT t.first_name
FROM `User` `t`
WHERE FIND_IN_SET('50', `keywords`) > 0
AND FIND_IN_SET('52', `keywords`) > 0
AND FIND_IN_SET('53', `keywords`) > 0
AND FIND_IN_SET('51', `keywords`) = 0;
Keep in mind it could be slow. The correct way is to normalize your table structure.
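To see what FIND_IN_SET does differently from LIKE, here is a Python model of its semantics and of the WHERE clause above, run against the question's three sample keyword strings:

```python
# Model of MySQL's FIND_IN_SET: exact-element matching on a comma-separated
# string, returning the 1-based position of the element, or 0 if absent.
def find_in_set(needle, csv):
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0

records = [("Sam", "50,51,52,53"), ("Wan", "50,52,53"), ("Kan", "53,52,50")]
matches = [name for name, keywords in records
           if all(find_in_set(tag, keywords) > 0 for tag in ("50", "52", "53"))
           and find_in_set("51", keywords) == 0]
print(matches)  # ['Wan', 'Kan']
```

A LIKE '50,52,53' comparison would have matched none of the rows whose elements are in a different order, which is why the query in the question fails.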
FIND_IN_SET will do the job for you, but it does not use indexes. This is not a bug, it's a feature.
SUBSTRING_INDEX can use an index and return the data as you wish. You don't have an index on it at the moment, but the catch here is that TEXT fields cannot be fully indexed, and what you have is a TEXT field.
Normalize!
This is what you really should be doing. It's not a good idea to store comma separated values in a database. You really should be having a keywords table and since the keywords will be short, you can have a char or varchar narrow column which can be fully indexed.

Subtract from zero not working in query

I have this table:
CREATE TABLE `page` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`sortorder` SMALLINT(5) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This is the data I have:
id sortorder
1 0
2 1
And I want to run this query:
select id from page where (sortorder = (select sortorder from page where id = 1) - 1)
(I'm trying to find the previous page, ie the one with the lower sortorder, if it exists. If none exists, I want an empty result set.)
The error I receive from mysql:
SQL Error (1690): BIGINT UNSIGNED value is out of range in '((select '0' from `page` where 1) - 1)'
And more specifically when I run:
select sortorder - 1 from page where id = 1
I get:
SQL Error (1690): BIGINT UNSIGNED value is out of range in '('0' - 1)'
What can I do to prevent this?
I usually use JOINs for this goal because they can be optimized better than the sub-queries. This query should produce the same result as yours but probably faster:
SELECT pp.*
FROM page cp # 'cp' from 'current page'
LEFT JOIN page pp # 'pp' from 'previous page'
ON pp.sortorder = cp.sortorder - 1
WHERE cp.id = 1
Unfortunately it fails running with the same error message about -1 not being UNSIGNED.
It can be fixed by writing the JOIN condition as:
ON pp.sortorder + 1 = cp.sortorder
I moved the -1 to the other side of the equal sign and it turned to +1.
You can also fix your original query by using the same trick: moving -1 to the other side of the equal sign; this way it becomes +1 and there is no error any more:
select id
from page
where sortorder + 1 = (select sortorder from page where id = 1)
The problem with both queries now is that, because there is no index on column sortorder, MySQL is forced to check all the rows one by one until it finds one matching the WHERE (or ON) condition and this takes a lot of time and uses a lot of resources.
Fortunately, this can be fixed easily by adding an index on column sortorder:
ALTER TABLE page ADD INDEX(sortorder);
Now both queries can be used. The one using JOIN (and the ON condition with +1) is slightly faster.
The original query doesn't return any rows when the condition is not met. The JOIN query returns a row full of NULLs. It can be modified to return no rows by replacing LEFT JOIN with INNER JOIN.
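The "+1 on the other side" join, written as an inner join so the first page yields an empty result, can be sketched like this (SQLite standing in for MySQL; SQLite has no UNSIGNED types, so the original underflow error itself is MySQL-specific):

```python
import sqlite3

# Previous-page lookup via pp.sortorder + 1 = cp.sortorder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (id INTEGER PRIMARY KEY, sortorder INTEGER NOT NULL);
INSERT INTO page VALUES (1, 0), (2, 1);
""")

prev_page = """SELECT pp.id
               FROM page cp
               JOIN page pp ON pp.sortorder + 1 = cp.sortorder
               WHERE cp.id = ?"""
print(conn.execute(prev_page, (2,)).fetchall())  # [(1,)] -- page 1 precedes page 2
print(conn.execute(prev_page, (1,)).fetchall())  # [] -- first page has no predecessor
```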
You can circumvent the error altogether (and use any version of these queries) by removing the UNSIGNED attribute from column sortorder:
ALTER TABLE page
CHANGE COLUMN `sortorder` `sortorder` SMALLINT(5) NOT NULL;
Try setting your SQL mode to NO_UNSIGNED_SUBTRACTION:
SET sql_mode = 'NO_UNSIGNED_SUBTRACTION'