How to design a simple and efficient MySQL blacklist table

Suppose I have a user table my_users in which there is a primary key id. Also, I wish to design (in MySQL) a simple blacklist table, whose declaration looks like this:
CREATE TABLE IF NOT EXISTS black_list (
    user_id INT NOT NULL,
    bad_string VARCHAR(100) NOT NULL,
    FOREIGN KEY (user_id) REFERENCES my_users(id),
    PRIMARY KEY (user_id, bad_string)
);
The interpretation of any row in black_list is that the user with ID user_id wants to blacklist the string bad_string. Obviously, user_id cannot be unique, since a single user may have more than one blacklisted string. The other way around, bad_string cannot be unique either, since more than one user may have blacklisted the same string. However, the pair (user_id, bad_string) should be unique, since it makes no sense for a user to blacklist the same string more than once.
When we select a blacklist by user ID (SELECT * FROM black_list WHERE user_id = X), in the worst case MySQL will have to scan the entire black_list table.
My question here is: is there a way to run the above SELECT statement in sublinear time with respect to the number of rows in black_list? If so, how can I accomplish that?

Your assertion that SELECT * FROM black_list WHERE user_id = X will have to scan the entire black_list table is incorrect.
In this SQL Fiddle, you can see it's using an index:
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | black_list | ref  | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using index |
+----+-------------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
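The reason is that user_id is the leftmost column of the composite PRIMARY KEY (user_id, bad_string), so InnoDB can descend the primary key's B-tree straight to the matching rows, which is logarithmic rather than linear in the table size. The reverse lookup would not get this benefit: a search by bad_string alone skips the leftmost column and cannot use the primary key. A minimal sketch, only worth adding if that access pattern exists in your application (the index name is made up):
-- Hypothetical secondary index; the composite PRIMARY KEY already
-- covers lookups by user_id, but not lookups by bad_string alone.
CREATE INDEX idx_bad_string ON black_list (bad_string);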

Related

MySQL UNIQUE for 2 columns and index

I create the following table:
CREATE TABLE ta
(
    id BIGINT NOT NULL auto_increment,
    company_id BIGINT NOT NULL,
    language VARCHAR(10) NOT NULL,
    created_at DATETIME,
    modified_at DATETIME,
    version BIGINT,
    PRIMARY KEY (id),
    UNIQUE KEY unique_ta (company_id, language)
) engine = InnoDB;
CREATE INDEX ta_company_id on `ta` (company_id);
My question is: do I need this line?
CREATE INDEX ta_company_id on `ta` (company_id);
Does UNIQUE create indexes on company_id, language automatically?
You probably don't need the extra index on company_id.
The UNIQUE KEY creates an index on the pair of columns (company_id, language) in that order. So any query you would run searching for a specific value of company_id would be able to use that index, even though it only references the first column of the unique key index.
You can see this in EXPLAIN:
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234;
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key       | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | ta    | NULL       | ref  | unique_ta     | unique_ta | 8       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
You can see key_len: 8 meaning it is using 8 bytes of the index, and the first BIGINT for company_id is 8 bytes.
Whereas searching for both columns will use the full 50-byte size of the index (8 bytes for the BIGINT + 10 characters for the VARCHAR, 4 bytes per character using utf8mb4, plus a couple of bytes for the VARCHAR length):
mysql> EXPLAIN SELECT * FROM ta WHERE company_id = 1234 AND language = 'EN';
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref         | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
|  1 | SIMPLE      | ta    | NULL       | const | unique_ta     | unique_ta | 50      | const,const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+-----------+---------+-------------+------+----------+-------+
I said at the top "probably" because there is an exception case, for a specific form of query:
SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
This type of query would need id to be the second column of the index, so it could be assured of reading rows in primary key order. All indexes implicitly have the primary key column appended, even if you don't declare it. So your unique key index would really store the columns (company_id, language, id), and the single-column index really stores the columns (company_id, id). The latter index would optimize the query I show above, sorting by primary key efficiently.
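If you want to check this exception on your own data, a sketch (assuming the ta table from the question; whether the sort shows up depends on your version and data):
-- With ta_company_id present, InnoDB can read its implicit
-- (company_id, id) ordering and return rows already sorted by id.
EXPLAIN SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;
-- After dropping it, the same query may report "Using filesort" in
-- the Extra column, because (company_id, language, id) is not ordered
-- by id within a given company_id.
ALTER TABLE ta DROP INDEX ta_company_id;
EXPLAIN SELECT * FROM ta WHERE company_id = 1234 ORDER BY id;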

Improving select statement performance

I have a table with around 500,000 rows, with a composite primary key index. I'm trying a simple select statement, as follows:
select * from transactions where account_id='1' and transaction_id='003a4955acdbcf72b5909f122f84d';
The EXPLAIN statement gives me this:
id | select_type | table        | partitions | type  | possible_keys | key     | key_len | ref         | rows | filtered | Extra
---------------------------------------------------------------------------------------------------------------------------------
1  | SIMPLE      | transactions | NULL       | const | PRIMARY       | PRIMARY | 94      | const,const | 1    | 100.00   | NULL
My primary index is on account_id and transaction_id.
My engine is InnoDB.
When I run my query it takes around 156 milliseconds.
Given that explain shows that only one row needs to be examined, I'm not sure how to optimize this further. What changes can I do to significantly improve this?
I'm going to speculate a bit, given the information provided: your primary key is composed of an integer field account_id and a varchar one called transaction_id.
Since both columns are components of the PRIMARY index created when you defined them as PRIMARY KEY (account_id, transaction_id), the lookup is already as direct as it can get.
I think the bottleneck here is transaction_id: as a string, it requires more effort to index and to search. Changing its type to something easier to search (e.g. a number) would probably help.
The only other improvement I see is to simplify the PRIMARY KEY itself, either by removing the account_id field (it seems redundant to me, since transaction_id looks like it is unique on its own, but that depends on your design) or by substituting the whole key with an integer AUTO_INCREMENT value (not recommended).
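A gentler variant of the same idea, if the transaction IDs only ever contain ASCII characters (hex digits, for example), is to shrink the per-character footprint of the index instead of changing the type; a sketch, with the VARCHAR length being an assumption:
-- Assuming transaction IDs are plain ASCII: storing them in the ascii
-- character set costs 1 byte per character in the index instead of the
-- 3 (utf8) or 4 (utf8mb4) that a multibyte charset reserves.
ALTER TABLE transactions
    MODIFY transaction_id VARCHAR(32) CHARACTER SET ascii NOT NULL;
That said, with type const and rows 1 in the EXPLAIN, most of the 156 ms is unlikely to be index traversal; it is worth measuring client round-trip and cold-cache effects before redesigning the key.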

Why are no keys used in this EXPLAIN?

I was expecting this query to use a key.
mysql> DESCRIBE Foo;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | bigint(20)  | NO   | PRI | NULL    | auto_increment |
| name  | varchar(50) | NO   | UNI | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
mysql> EXPLAIN SELECT id FROM Foo WHERE name='foo';
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                               |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE noticed after reading const tables |
+----+-------------+-------+------+---------------+------+---------+------+------+-----------------------------------------------------+
Foo has a unique index on name, so why isn't the index being used in the SELECT?
From the MySQL Manual page entitled EXPLAIN Output Format:
Impossible WHERE noticed after reading const tables (JSON property:
message)
MySQL has read all const (and system) tables and noticed that the WHERE
clause is always false.
and the definition of const tables, from the page entitled Constants and Constant Tables:
A MySQL constant is something more than a mere literal in the query.
It can also be the contents of a constant table, which is defined as
follows:
A table with zero rows, or with only one row
A table expression that is restricted with a WHERE condition,
containing expressions of the form column = constant, for all the
columns of the table's primary key, or for all the columns of any of
the table's unique keys (provided that the unique columns are also
defined as NOT NULL).
The second reference is a page and a half long. Please refer to it.
const
The table has at most one matching row, which is read at the start of
the query. Because there is only one row, values from the column in
this row can be regarded as constants by the rest of the optimizer.
const tables are very fast because they are read only once.
const is used when you compare all parts of a PRIMARY KEY or UNIQUE
index to constant values. In the following queries, tbl_name can be
used as a const table:
SELECT * FROM tbl_name WHERE primary_key=1;
SELECT * FROM tbl_name WHERE primary_key_part1=1 AND primary_key_part2=2;
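Putting the two quotes together: WHERE name='foo' compares all columns of a NOT NULL UNIQUE key to a constant, so Foo itself qualifies as a const table. MySQL probes the name index once while planning the query, finds no matching row, and short-circuits everything else, which is why the final plan looks index-free. A quick sketch to confirm (the inserted value is hypothetical):
-- Once a matching row exists, the same EXPLAIN should report the
-- unique index on name (type const) instead of "Impossible WHERE".
INSERT INTO Foo (name) VALUES ('foo');
EXPLAIN SELECT id FROM Foo WHERE name='foo';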
It could also be that the table Foo holds very little data. In that case the optimizer may choose to do a table scan rather than go through the index.
As the MySQL documentation says:
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.

mysql does not use primary bigint index

I am fighting some performance problems on a very simple table which seems to be very slow when fetching data by its primary key (bigint).
I have this table with 124 million entries:
CREATE TABLE `nodes` (
    `id` bigint(20) NOT NULL,
    `lat` float(13,7) NOT NULL,
    `lon` float(13,7) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `lat_index` (`lat`),
    KEY `lon_index` (`lon`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
and a simple query which takes some IDs from another table using an IN clause to fetch data from the nodes table. But it takes about an hour just to fetch a few rows from this table.
EXPLAIN shows me it's not using the PRIMARY key as an index; it's simply scanning the whole table. Why is that? id and the node_id column from the other table are both of type bigint(20).
mysql> EXPLAIN SELECT lat, lon FROM nodes WHERE id IN (SELECT node_id FROM ways_elements WHERE way_id = '4962890');
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
| id | select_type        | table             | type | possible_keys | key    | key_len | ref   | rows      | Extra       |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
|  1 | PRIMARY            | nodes             | ALL  | NULL          | NULL   | NULL    | NULL  | 124035228 | Using where |
|  2 | DEPENDENT SUBQUERY | ways_elements     | ref  | way_id        | way_id | 8       | const |         2 | Using where |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
The query SELECT node_id FROM ways_elements WHERE way_id = '4962890' simply returns two node IDs, so the whole query should return only two rows, but it takes more or less one hour.
Using FORCE INDEX (PRIMARY) didn't help either. And even apart from that, why does MySQL not pick that index on its own, since it's the primary key? EXPLAIN doesn't even mention anything in the possible_keys column, yet select_type shows PRIMARY.
Am I doing something wrong?
How does this perform?
SELECT lat, lon FROM nodes t1 JOIN ways_elements t2 ON (t1.id = t2.node_id) WHERE t2.way_id = '4962890';
I suspect that your query is checking each row in nodes against each item in the "IN" clause.
This is what is called a correlated subquery. See this reference, or this popular question posted on Stack Overflow. A better query to use is:
SELECT lat,
lon
FROM nodes n
JOIN ways_elements w ON n.id = w.node_id
WHERE way_id = '4962890'
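As a side note, MySQL 5.6 and later usually transform such an IN subquery into a semijoin on their own, so the rewrite above mainly matters on older servers. On those older versions, a commonly suggested workaround that keeps the IN form is to force the subquery to materialize by wrapping it in a derived table (the alias t is made up):
-- Sketch for pre-5.6 servers: the derived table makes MySQL run the
-- inner query once, so the outer lookup can use the PRIMARY KEY on
-- nodes.id instead of re-evaluating the subquery per row.
SELECT lat, lon
FROM nodes
WHERE id IN (SELECT node_id
             FROM (SELECT node_id
                   FROM ways_elements
                   WHERE way_id = '4962890') AS t);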

mysql log_queries_not_using_indexes working incorrectly?

I have the following procedure:
CREATE PROCEDURE getProjectTeams(IN p_idProject INTEGER)
BEGIN
    SELECT idTeam, name, workersCount, confirmersCount, isConfirm
    FROM Teams JOIN team_project USING (idTeam)
    WHERE idProject = p_idProject;
END $$
And here are the CREATE TABLE scripts for tables Teams and team_project:
CREATE TABLE Teams (
    idTeam INT PRIMARY KEY auto_increment,
    name CHAR(20) NOT NULL UNIQUE,
    isConfirm BOOL DEFAULT 0,
    workersCount SMALLINT DEFAULT 0,
    confirmersCount SMALLINT DEFAULT 0
) engine = innodb DEFAULT CHARACTER SET=utf8 COLLATE=utf8_polish_ci;
CREATE TABLE team_project (
    idTeam INT NOT NULL,
    idProject INT NOT NULL,
    FOREIGN KEY (idTeam) REFERENCES Teams(idTeam)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    FOREIGN KEY (idProject) REFERENCES Projects(idProject)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    PRIMARY KEY (idTeam, idProject)
) engine = innodb DEFAULT CHARACTER SET=utf8 COLLATE=utf8_polish_ci;
I have a few databases with identical schemas on the server, but this procedure is being logged (as a slow query) only when it is called from one particular database. Calls made from the other databases are not logged. It's not a question of the query being slow or not (it always takes about 0.0001s); it's about why it is logged as not using indexes. How is that possible?
As Zagor23 suggested, I ran the EXPLAIN, and here are the results.
a) in the database where the procedure is logged:
| id | select_type | table        | type | possible_keys     | key       | key_len | ref   | rows | Extra                          |
|  1 | SIMPLE      | team_project | ref  | PRIMARY,idProject | idProject | 4       | const |    3 | Using index                    |
|  1 | SIMPLE      | Teams        | ALL  | PRIMARY           | NULL      | NULL    | NULL  |    4 | Using where; Using join buffer |
b) in the database where the procedure is not logged:
| id | select_type | table        | type   | possible_keys     | key       | key_len | ref                          | rows | Extra       |
|  1 | SIMPLE      | team_project | ref    | PRIMARY,idProject | idProject | 4       | const                        |    1 | Using index |
|  1 | SIMPLE      | Teams        | eq_ref | PRIMARY           | PRIMARY   | 4       | ecovbase.team_project.idTeam |    1 |             |
The fact is, the data are a bit different, but not that much. The GoodDB (the one that is not logging the procedure) has 11 rows in Teams and 420 rows in team_project; the BadDB has 4 rows in Teams and about 800 in team_project. It doesn't seem like a big difference. Is there a way to avoid logging that procedure?
Maybe it's not being logged because it uses indexes in those cases.
Try running
EXPLAIN SELECT idTeam, name, workersCount, confirmersCount, isConfirm
FROM Teams JOIN team_project USING (idTeam)
WHERE idProject = p_idProject;
on database where you feel it shouldn't use index and see if it really does.
MySQL will use an index if there is one available and suitable for the query, and if the returned result set is up to about 7-8% of the entire table.
You say that information_schema is identical, but if the data isn't, that could be a reason for different behavior.
@Zagor23 explains why this can happen. Your tables are probably much bigger in this database and you don't have the appropriate indices.
My advice would be to add a UNIQUE index on table team_project, on (idProject, idTeam).
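As SQL, that suggestion would look like the statement below (the index name is my own; note that the EXPLAIN outputs above already show an idProject index, which MySQL likely created for the foreign key):
-- Reverses the column order of PRIMARY KEY (idTeam, idProject) so a
-- WHERE idProject = ? lookup hits the leftmost column of an index.
ALTER TABLE team_project
    ADD UNIQUE INDEX idx_project_team (idProject, idTeam);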
After the added EXPLAIN outputs, it seems that in the logged case the MySQL optimizer chooses a plan that doesn't need any index from the Teams table and just scans that whole (4-row!) table, which is most probably faster since the table has only 4 rows.
Now, the slow log has a setting, log_queries_not_using_indexes, that adds to the log any query that doesn't use an index, even if the query takes 0.0001 sec to finish.
You can simply ignore this logging, or change the slow-log settings so that such queries are not logged. See the MySQL documentation: The Slow Query Log.
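Both adjustments are runtime settings; a sketch (these are standard MySQL system variables, but availability depends on your server version):
-- Stop logging queries merely because they use no index:
SET GLOBAL log_queries_not_using_indexes = OFF;
-- Or keep that logging, but skip queries that examine only a few rows:
SET GLOBAL min_examined_row_limit = 100;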