Which table is faster in MySQL? - mysql

Two tables:
CREATE TABLE `htmlcode_1` (
`global_id` int(11) NOT NULL,
`site_id` int(11) NOT NULL,
PRIMARY KEY (`global_id`),
KEY `k_site` (`site_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `htmlcode_2` (
`global_id` int(11) NOT NULL,
`site_id` int(11) NOT NULL,
PRIMARY KEY (`site_id`,`global_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
which one should be faster for selects and why?
'select * from table where site_id=%s'

The latter table is probably slightly faster for that SELECT query, assuming the table has a nontrivial number of rows.
When querying InnoDB by primary key, the lookup is against the clustered index for the table.
A secondary key lookup first searches the secondary index; that yields the primary key value, which is then used for a second lookup against the clustered index. So it costs two lookups instead of one.
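You can verify which access path each table gets with EXPLAIN; a quick sketch (the actual plans depend on your data and MySQL version):
EXPLAIN SELECT * FROM htmlcode_1 WHERE site_id = 1; -- expect ref access via the secondary key k_site
EXPLAIN SELECT * FROM htmlcode_2 WHERE site_id = 1; -- expect ref access directly on the clustered PRIMARY KEY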

The reason to use a PRIMARY KEY is to allow for either quick access or referential integrity (CONSTRAINT ... FOREIGN KEY ...).
In your second example, you do not have the proper key for referential integrity if any other table refers to your table; in that case, those operations will be very slow.
The speed difference in your particular case should be trivially small, but proper design dictates the first approach.

The first table represents many "globals" in each "site"; that is, a many-to-one relationship. But it is the "wrong" way to do it: instead, the Globals table should have a site_id column to represent that relationship to the Sites table, and then the existence of htmlcode_1 is an inefficient waste.
The second table may be representing a many-to-many relationship between "sites" and "globals". If that is what you really want, then see my tips. Since you are also likely to map from globals to sites, another index is needed, as sketched below.
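A sketch of that extra index; since InnoDB secondary indexes implicitly carry the primary-key columns, this one also covers global_id-to-site_id lookups (the index name is hypothetical):
ALTER TABLE `htmlcode_2` ADD INDEX `k_global` (`global_id`);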

Related

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT COUNT(`id`) AS AGGREGATE
FROM `accounts`
WHERE `city_id` = 1
  AND `country_id` = 7
  AND `age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output (shown here vertically, one column per line):
id: 1
select_type: SIMPLE
table: accounts
partitions: NULL
type: index_merge
possible_keys: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key_len: 9,2,9
ref: NULL
rows: 15542
filtered: 100.00
Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns, making it impossible to cover everything with one composite index, as that would exceed the 16-column limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
Indexes are there to help your querying. So as suggested by Marko and agreed by others, having an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you really going to filter on 16+ criteria at once? I doubt it. And of the queries you would be running, even with multiple composite indexes to optimize them, how many columns might you need at any single time? 4, 5, 6? After that, how granular do you plan on getting with your data: country, state/province, city, town, village, neighborhood, street, house? By the time you are that low in the data, you would be at the page-level data anyhow, wouldn't you?
So your filter of country_id = 7 already chops off a ton of rows, and narrowing to a given city within that country brings you down to a manageable set.
If you are going to be running aggregation queries against large data that is rather fixed from a historical perspective, having pre-aggregated tables keyed by some common elements might help long term.
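For example, a minimal sketch of such a rollup (the table name and refresh strategy are hypothetical, and it assumes rows with NULL ids can be excluded):
CREATE TABLE accounts_rollup (
`city_id` bigint unsigned NOT NULL,
`country_id` bigint unsigned NOT NULL,
`age` tinyint unsigned NOT NULL,
`account_count` int unsigned NOT NULL,
PRIMARY KEY (`city_id`,`country_id`,`age`)
) ENGINE=InnoDB;
-- refreshed periodically (TRUNCATE, then re-run):
INSERT INTO accounts_rollup
SELECT city_id, country_id, age, COUNT(*)
FROM accounts
WHERE city_id IS NOT NULL AND country_id IS NOT NULL AND age IS NOT NULL
GROUP BY city_id, country_id, age;
The original COUNT query then becomes a single-row primary-key lookup against accounts_rollup.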
FEEDBACK
The performance hit is not necessarily in querying; it is in the inserts, updates, and deletes, since whatever changes has to update every index on the table, single or composite. If you are getting more than 5 columns in an index, ask yourself: really? How granular does the index need to be? Querying the data should be very fast with proper indexes. Updating indexes is also quick, but if you are dealing with millions of inserts in a month, quarter, or year, each user may see a slight delay (a quarter second?), and a million such delays add up. Then again, consider over what period of time those inserts, updates, and deletes happen anyhow.
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You create indexes designed to support the specific queries you need to run. You don't add columns to an index unless they help the given query, and you may have multiple composite indexes in the same table, each created to help a different query, as sketched below.
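For instance, a sketch of two composite indexes serving two different known query shapes (the index names and the second query are illustrative):
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
-- serves: WHERE city_id = ? AND country_id = ? AND age = ?
CREATE INDEX idx_school_age ON accounts (school_id, age);
-- serves: WHERE school_id = ? AND age = ?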
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
I won't know every possible query combination ahead of time.
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

Is it good practice to add primary keys to a table in an ALTER statement, or does it make no difference to add them when creating the table?

I created/defined an admin table, but I have since seen other programmers alter the table to add keys:
CREATE TABLE `admin` (
`admin_id` int(11) NOT NULL AUTO_INCREMENT,
`admin_name` varchar(255) NOT NULL,
`admin_surname` varchar(255) NOT NULL,
`phone` CHAR(10) NOT NULL,
`admin_email` varchar(255) NOT NULL,
`password` varchar(255) NOT NULL,
PRIMARY KEY (`admin_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `admin`
ADD PRIMARY KEY (`admin_id`),
ADD UNIQUE KEY `admin_email` (`admin_email`);
If I have already defined the table why should I alter the definition again here?
In InnoDB a clustered index always exists.
When a primary key exists in the table, it is used as the clustered index.
When there is no primary key but there are unique indexes whose expressions include no NULLable columns, the first such unique index in the table definition is used as the clustered index.
When there is no such unique index, an internal hidden row number is used as the expression for the clustered index.
Hence, if you create a table (so some expression is chosen for the clustered index) and then use ALTER TABLE to add a primary key, the table must be rebuilt. That doesn't matter while the table is empty, but once it contains data the process may take a long time (because the COPY method is used).
If you create the primary key as part of CREATE TABLE, it is always fast.
I like to put all the index definitions inside the CREATE TABLE, and put them at the end instead of sitting on the column definitions.
Potential problem 1:
But I notice that some dump utilities like to add the indexes later. This may be a kludge to handle FOREIGN KEY definitions. Those have trouble if the tables are not created in just the right order.
It would seem better to simply ADD FOREIGN KEY... after all the tables are created and indexed.
Potential problem 2:
If you will be inserting a huge number of rows, it is usually more efficient to create the secondary keys after loading the data, rather than augmenting the indexes as each row arrives. For small tables (under, say, a million rows), this is not a big deal. A sketch of the pattern follows.
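A sketch of the load-then-index pattern, using the admin table above (the CSV path and format are hypothetical):
-- 1) create the table with only the PRIMARY KEY (no secondary keys yet)
-- 2) bulk load the rows:
LOAD DATA INFILE '/tmp/admin.csv' INTO TABLE admin FIELDS TERMINATED BY ',';
-- 3) build the secondary index in one pass over the loaded data:
ALTER TABLE admin ADD UNIQUE KEY admin_email (admin_email);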
I do not understand why they ADD PRIMARY KEY after loading the data. That requires (as Akina points out) tossing the fabricated PK, sorting the data, and adding the real PK. That seems like extra work, even for a huge table.
If the rows are sorted in PK order, the loading is more efficient. The table is ordered by the PK (for InnoDB); inserting in that order is faster than jumping around. (mysqldump will necessarily provide them in PK order, so it is usually a non-issue.)

Declaring MySQL PK as only Unique Key

Can creating a UNIQUE index on an id, as shown in the CREATE TABLE below, be enough to make the id a primary key? To be more specific, can you say that the table below has a primary key?
CREATE TABLE `test` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`role` varchar(32) NOT NULL,
`resources_name` varchar(32) NOT NULL,
`access_name` varchar(32) NOT NULL,
`allowed` int(3) NOT NULL,
UNIQUE KEY `id` (`id`),
UNIQUE KEY `roles_name` (`role`,`resources_name`,`access_name`)
) ENGINE=InnoDB AUTO_INCREMENT=32 DEFAULT CHARSET=utf8;
What query can you use to prove whether or not this table has a PK?
Logically speaking if a relational table has at least one candidate key enforced in it (minimally unique and non-nullable) then de facto it has a "primary" key. There is no absolute need to single out any one key using a special "primary" label because in principle all keys are equal (historically the term "primary key" used to be used for any and all candidate keys and not just one key per table).
There is a constraint in SQL called a PRIMARY KEY constraint. Logically speaking, the SQL PRIMARY KEY isn't much more than syntactical sugar because the NOT NULL UNIQUE syntax achieves essentially the same thing. Technically the PRIMARY KEY constraint doesn't have to refer to the same thing as the relational concept of a "primary key" but clearly if you are going to designate any one key as primary and if you feel you need a syntactical way of indicating that choice then the PRIMARY KEY constraint is the generally recognised way to do it.
So perhaps the best answer is "it depends". It depends to a large extent on your motivation for defining a primary key in the first place. If you intend to single out one key to developers and users of the database then maybe the NOT NULL UNIQUE syntax won't achieve that for you. If you don't find the need to do that using SQL syntax then maybe NOT NULL UNIQUE is just as good a way to define your keys as the PRIMARY KEY constraint is.
This is either too long or too short for a comment: No.
A primary key and a unique key -- although similar -- are not the same. So, no your table does not have a primary key. The biggest functional difference is that primary keys cannot be NULL whereas unique keys can be.
Primary keys are also typically clustered (if the underlying storage engine supports clustered indexes). This means that the data is actually physically stored on the page in the order of the primary key. Unique keys are just another index with the characteristic of having no repeated values.
EDIT:
Interesting. SHOW COLUMNS documents this behavior:
A UNIQUE index may be displayed as PRI if it cannot contain NULL values and there is no PRIMARY KEY in the table.
I wasn't aware of this.
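As for the "what query can prove it" part of the question, here are two checks (a sketch); both should come back empty for the table above, whereas a table with an explicit PRIMARY KEY would return a row from each:
SHOW INDEX FROM test WHERE Key_name = 'PRIMARY';
SELECT CONSTRAINT_NAME
FROM information_schema.TABLE_CONSTRAINTS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'test'
AND CONSTRAINT_TYPE = 'PRIMARY KEY';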

MySQL design for a log table

I would like some advice about a MySQL table design for an event logger.
Our needs:
- track a lot of actions
- 10,000 actions / second
- 1 billion rows at this time
Our hardware:
- 2*Xeon (seen as 32 CPUs by the system)
- 128 GB RAM
- 6*600 SSD in RAID 10
Our table design:
CREATE TABLE IF NOT EXISTS `log_event` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_event` smallint(6) NOT NULL,
`id_user` bigint(20) NOT NULL,
`date` int(11) NOT NULL,
`data` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_event_2` (`id_event`,`data`),
KEY `id_user` (`id_user`),
KEY `date` (`date`),
KEY `id_event_4` (`id_event`,`date`,`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE `log_event`
ADD CONSTRAINT `log_event_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `inscription` (`id_inscri`) ON DELETE CASCADE ON UPDATE CASCADE;
Our problem:
- We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it, so how would we identify a row?
- We would like to do partitioning, but with the foreign key it seems to be impossible?
- We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data over every 5 minutes?
- Do you have any ideas for optimization? Do you have best practices for this kind of system?
Thanks!
François
Primary keys of relational tables (relations) come in two types:
Natural - exists in the subject area and completely determines each row of the relational table.
Natural primary keys may be simple (consisting of only one column) or composite (consisting of more than one column). It is not recommended to put a natural primary key on a large string column.
Artificial - a special column injected by the database designer / developer to boost table performance: when the natural key is composite and has to be used in related tables (as a foreign key), when it is simple but large and would produce data overhead when copied into related tables as a foreign key, or when it is expensive to search (for example, CRUD operations on VARCHAR IDs may be slower than on INT IDs). There may be other reasons. TL;DR: an artificial key is one special column serving to completely determine each row of a relational table and boost its performance for CRUD operations.
We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it, so how would we identify a row?
If you do not need to reference this table from other tables (as a source), then you can probably remove the artificial key without any consequences. Still, I recommend you set some other PRIMARY KEY on this table to avoid data duplication, and for clarity (if it matters).
Your table by itself (if properly normalized) will have a natural key among its candidate keys. It may be composite (consist of a few columns); that is normal. But don't put strings in the primary key, because a PRIMARY KEY always has an index, which would produce data overhead. A combination of INT or "small" VARCHAR columns is fine.
Consider as an option: id_event + id_user + date. A sketch of the change follows.
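A hedged sketch of that change, assuming (id_event, id_user, date) really is unique in your data:
-- remove the AUTO_INCREMENT attribute first, so the primary key can be dropped:
ALTER TABLE log_event MODIFY `id` bigint NOT NULL;
-- promote the natural key:
ALTER TABLE log_event DROP PRIMARY KEY, ADD PRIMARY KEY (`id_event`, `id_user`, `date`);
-- optionally drop the now-unused column:
ALTER TABLE log_event DROP COLUMN `id`;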
We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data over every 5 minutes?
It is not a bad idea. But it is not a good idea either until it has been properly tested: perform a load test before real use.
If you do not reference the MEMORY table from other tables, you can still join it with any InnoDB table. But you will lose InnoDB functionality (referential integrity). If losing the parent table's ON DELETE CASCADE ON UPDATE CASCADE is not a concern, then it can be done. As for me, InnoDB is not so slow that switching the table engine is worth it in your case.
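If you do try the MEMORY staging approach, here is a minimal sketch (the staging table names are hypothetical; load-test it before relying on it):
CREATE TABLE log_event_mem (
`id_event` smallint NOT NULL,
`id_user` bigint NOT NULL,
`date` int NOT NULL,
`data` bigint NOT NULL
) ENGINE=MEMORY;
CREATE TABLE log_event_mem_spare LIKE log_event_mem;
-- every 5 minutes: atomically swap in the empty spare, then flush the full copy
RENAME TABLE log_event_mem TO log_event_full, log_event_mem_spare TO log_event_mem;
INSERT INTO log_event (`id_event`, `id_user`, `date`, `data`)
SELECT `id_event`, `id_user`, `date`, `data` FROM log_event_full;
TRUNCATE TABLE log_event_full;
RENAME TABLE log_event_full TO log_event_mem_spare;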

MySQL - One To One Relationship?

I'm trying to achieve a "One to one" relationship in a MySQL database. For example, let's say I have a Users table and an Accounts table. I want to be sure that a User can have only one Account. And that there can be only one Account per User.
I found two solutions but I don't know which to use, or whether there are other options.
First solution:
DROP DATABASE IF EXISTS test;
CREATE DATABASE test CHARSET = utf8 COLLATE = utf8_general_ci;
USE test;
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
user_name VARCHAR(45) NOT NULL,
PRIMARY KEY(id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
CREATE TABLE accounts(
id INT NOT NULL AUTO_INCREMENT,
account_name VARCHAR(45) NOT NULL,
user_id INT UNIQUE,
PRIMARY KEY(id),
FOREIGN KEY(user_id) REFERENCES users(id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
In this example, I define the foreign key in accounts pointing to the primary key in users.
And then I make the foreign key UNIQUE, so the same user cannot appear twice in accounts.
To join tables I would use this query:
SELECT * FROM users JOIN accounts ON users.id = accounts.user_id;
Second solution:
DROP DATABASE IF EXISTS test;
CREATE DATABASE test CHARSET = utf8 COLLATE = utf8_general_ci;
USE test;
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
user_name VARCHAR(45) NOT NULL,
PRIMARY KEY(id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
CREATE TABLE accounts(
id INT NOT NULL AUTO_INCREMENT,
account_name VARCHAR(45) NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(id) REFERENCES users(id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;
In this example, I create a foreign key that points from the primary key to a primary key in another table. Since Primary Keys are UNIQUE by default, this makes this relation One to One.
To join tables I can use this:
SELECT * FROM users JOIN accounts ON users.id = accounts.id;
Now the questions:
What is the best way to create One to One relation in MySQL?
Are there any other solutions other than these two?
I'm using MySQL Workbench, and when I design a One To One relation in the EER diagram and let MySQL Workbench generate the SQL code, I get a One to Many relation :S That's what's confusing me :S
And if I import either of these solutions into a MySQL Workbench EER diagram, it recognizes the relations as One to Many :S That's also confusing.
So, what would be the best way to define a One to One relation in MySQL DDL, and what options are there to achieve this?
Since Primary Keys are UNIQUE by default, this makes this relation One to One.
No, that makes the relation "one to zero or one". Is that what you actually need?
If yes, then your "second solution" is better:
it's simpler,
it takes less storage1 (and therefore makes the cache effectively "larger"),
it has fewer indexes to maintain2, which benefits data manipulation,
and (since you are using InnoDB) naturally clusters the data, so users that are close together will have their accounts stored close together as well, which may benefit cache locality and certain kinds of range scans.
BTW, you'll need to make accounts.id an ordinary integer (not auto-increment) for this to work.
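A sketch of the second solution with that fix applied:
CREATE TABLE accounts(
id INT NOT NULL, -- no AUTO_INCREMENT: the value is supplied from users.id
account_name VARCHAR(45) NOT NULL,
PRIMARY KEY(id),
FOREIGN KEY(id) REFERENCES users(id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;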
If no, see below...
What is the best way to create One to One relation in MySQL?
Well, "best" is an overloaded word, but the "standard" solution would be the same as in any other database: put both entities (user and account in your case) in the same physical table.
Are there any other solutions other than these two?
Theoretically, you could make circular FKs between the two PKs, but that would require deferred constraints to resolve the chicken-and-egg problem, which are unfortunately not supported under MySQL.
And if I import any of these solutions into MySQL Workbench EER diagram, it recognizes relations as One to Many :S That's also confusing.
I don't have much practical experience with that particular modeling tool, but I'm guessing that's because it is "one to many" where "many" side was capped at 1 by making it unique. Please remember that "many" doesn't mean "1 or many", it means "0 or many", so the "capped" version really means "0 or 1".
1 Not just in the storage expense for the additional field, but for the secondary index as well. And since you are using InnoDB which always clusters tables, beware that secondary indexes are even more expensive in clustered tables than they are in heap-based tables.
2 InnoDB requires indexes on foreign keys.
Your first approach creates two candidate keys in the accounts table: id and user_id.
I therefore suggest the second approach i.e. using the foreign key as the primary key. This:
uses one less column
allows you to uniquely identify each row
allows you to match account with user
What about the following approach?
Create Table user
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Create Table account:
CREATE TABLE `account` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Create Table user2account, with a unique index on user_id and on account_id, foreign key relations to user and account, and a primary key on (user_id, account_id):
CREATE TABLE `user2account` (
`user_id` int(11) NOT NULL,
`account_id` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`account_id`),
UNIQUE KEY `FK_account_idx` (`account_id`),
UNIQUE KEY `FK_user_idx` (`user_id`),
CONSTRAINT `FK_account` FOREIGN KEY (`account_id`) REFERENCES `account` (`id`),
CONSTRAINT `FK_user` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
While this solution has the largest footprint in the database, it has some advantages.
Putting the FK key in either the user table or the account table is something I would read as a one-to-many relation (a user has many accounts ...).
While this user2account approach is mainly used to model many-to-many relationships, the UNIQUE constraints on user_id and on account_id prevent creating anything other than a one-to-one relation.
The main advantage I see in this solution is that you can divide the work between different code layers or departments in a company:
Department A is responsible for creating users; this is possible even without write permission on the account table.
Department B is responsible for creating accounts; this is possible even without write permission on the user table.
Department C is responsible for creating the mapping; this is possible even without write permission on the user or account tables.
Once Department C has created a mapping, neither the user nor the account can be deleted by Department A or B without asking Department C to delete the mapping first.