mysql insert vs update performance - mysql

All:
I have a table to record the number of some requests on some dimensions every ten minutes. Here is my table:
CREATE TABLE IF NOT EXISTS `mydb`.`realtime_bid_traffic` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT COMMENT '',
`owner_id` BIGINT(20) NOT NULL COMMENT '',
`log_time` DATETIME NOT NULL COMMENT '',
`bid_num` BIGINT(10) NOT NULL DEFAULT 0 COMMENT '',
`v_bid_num` BIGINT(10) NOT NULL DEFAULT 0 COMMENT '',
PRIMARY KEY (`id`) COMMENT '',
UNIQUE INDEX `dim_key` USING BTREE (`owner_id` ASC, `log_time` ASC) COMMENT '')
ENGINE = InnoDB;
As you can see, id is an auto-increment big integer without any particular meaning. owner_id and log_time are the dimension keys, while bid_num and v_bid_num are the values to be updated. Limited by the business logic, it's impossible for me to collect all the data before inserting into the database, i.e. I may have to write a row for owner_id=10 and log_time='2015-11-11 11:00:00' more than once. Since the table may be quite large (millions of rows) and needs to be updated constantly, I have two options:
1. Insert or update on duplicate key. In this way there will be only one row per dimension, but every write involves an update, and to make the duplicate check fast I have built a unique key on (owner_id, log_time). (A sketch of this follows the list.)
2. Just insert. In this case I'll remove the unique key on owner_id and log_time and simply insert. Since id is the primary key it will never duplicate, but the number of rows may grow significantly.
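A minimal sketch of option 1 using MySQL's INSERT ... ON DUPLICATE KEY UPDATE against the table above; the increment values and the choice to sum the counts are assumptions, adjust to your own merge rule:
INSERT INTO realtime_bid_traffic (owner_id, log_time, bid_num, v_bid_num)
VALUES (10, '2015-11-11 11:00:00', 5, 2)   -- example values
ON DUPLICATE KEY UPDATE
  bid_num   = bid_num + VALUES(bid_num),
  v_bid_num = v_bid_num + VALUES(v_bid_num);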
I have no idea which may be better from the view of performance.

This is a bit long for a comment.
If you only care about inserting into the table, then the second option is generally faster. Under most circumstances, inserting a new row is faster than a check-for-duplicates-then-insert-or-update approach, and this remains true even as the table grows really big, as long as the indexes fit into memory.
However, data often has other uses than merely being put into a table. For many querying purposes, not having duplicates might significantly help queries. If you are querying by owner_id/log_time (as suggested by the unique index), then handling the duplicates on the querying side should be trivial -- two rows versus one row has minimal impact, and ORDER BY id DESC LIMIT 1 takes very few resources on two rows.
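For illustration, a hedged sketch of such a query, assuming the newest row (highest id) is the one that counts:
SELECT bid_num, v_bid_num
FROM realtime_bid_traffic
WHERE owner_id = 10
  AND log_time = '2015-11-11 11:00:00'
ORDER BY id DESC
LIMIT 1;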
(Hmmm, I suppose there is an edge case where inserting into a table with billions of rows with an index would be slower than inserting into a table with 10 rows while checking for duplicates, because the index update would be slower than the check-for-duplicates query. However, your use-case is sufficiently far from this situation because you are only talking about 2 duplicates per row.)

Plan A
PRIMARY KEY(id),
UNIQUE(owner_id, log_time)
Every insert must check both keys for dups; this slows down inserts.
Plan B
PRIMARY KEY(id),
INDEX(owner_id, log_time)
This requires that your SELECT code do some type of GROUP BY and aggregation.
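For illustration, a sketch of what Plan B's SELECT might look like, assuming duplicate rows hold partial counts that should be summed (if each row were instead a full snapshot, you would pick the row with the highest id per group):
SELECT owner_id, log_time,
       SUM(bid_num)   AS bid_num,
       SUM(v_bid_num) AS v_bid_num
FROM realtime_bid_traffic
WHERE owner_id = 10
GROUP BY owner_id, log_time;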
Plan C
PRIMARY KEY(owner_id, log_time)
and no id. Why do you have id, anyway? While Plans A and B are always inserting into the data at the "end" of the table (because of AUTO_INCREMENT), Plan C will have multiple "hot spots", one per owner_id. This is OK.
Plan D
INDEX(id),
PRIMARY KEY(owner_id, log_time)
If Plan C is not acceptable, Plan D lets you keep id. No, an AUTO_INCREMENT does not have to be the PRIMARY KEY. IODKU is needed.
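A sketch of Plan D's DDL for this table (MySQL only requires that an AUTO_INCREMENT column be the first column of some index, which INDEX(id) satisfies):
CREATE TABLE realtime_bid_traffic (
  id        BIGINT NOT NULL AUTO_INCREMENT,
  owner_id  BIGINT NOT NULL,
  log_time  DATETIME NOT NULL,
  bid_num   BIGINT NOT NULL DEFAULT 0,
  v_bid_num BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (owner_id, log_time),
  INDEX (id)
) ENGINE = InnoDB;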
Which?
All but Plan B need IODKU (INSERT ... ON DUPLICATE KEY UPDATE). But I don't see this as a serious drawback.
Plans C and D probably improve performance of SELECTs, especially if you select by one owner_id.
I prefer the Plans in this order: C, D, B, A. You pick, based on the constraints you can/cannot live with.

Related

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT
COUNT(`id`) AS AGGREGATE
FROM
`accounts`
WHERE
`city_id` = 1
AND
`country_id` = 7
AND
`age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output:
           id: 1
  select_type: SIMPLE
        table: accounts
   partitions: NULL
         type: index_merge
possible_keys: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
          key: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
      key_len: 9,2,9
          ref: NULL
         rows: 15542
     filtered: 100.00
        Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns that will make it impossible to use a composite index as it will exceed the 16 column limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age)
Indexes are there to help your querying. So, as suggested by Marko and agreed by others, having an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you really going to filter on 16+ criteria at once? I doubt it. And of the queries you would be running, even if you have multiple composite indexes to optimize them, how many columns might you need at any single time? 4, 5, 6? Beyond that, how granular do you plan on getting with your data: Country, State/Province, City, Town, Village, Neighborhood, Street, House? By the time you are that low in the data, you would be down to a handful of rows anyhow, wouldn't you?
So, your filter of country_id = 7 already chops off a ton of rows. Then narrowing to a given city within that country? Great, now you are at a manageable level.
If you are going to be running queries against large data that require aggregations, and the historical data is rather fixed, pre-aggregated tables keyed by some common elements might help long term (see the sketch below).
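For example, a minimal sketch of such a pre-aggregated table; the name accounts_by_city_age and the refresh strategy are hypothetical:
CREATE TABLE accounts_by_city_age (
  city_id    BIGINT UNSIGNED NOT NULL,
  country_id BIGINT UNSIGNED NOT NULL,
  age        TINYINT UNSIGNED NOT NULL,
  cnt        INT UNSIGNED NOT NULL,
  PRIMARY KEY (city_id, country_id, age)
) ENGINE=InnoDB;

-- Periodic refresh; could also be maintained incrementally on each insert.
INSERT INTO accounts_by_city_age (city_id, country_id, age, cnt)
SELECT city_id, country_id, age, COUNT(*)
FROM accounts
WHERE city_id IS NOT NULL AND country_id IS NOT NULL AND age IS NOT NULL
GROUP BY city_id, country_id, age
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);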
FEEDBACK
Query performance is not necessarily where you will be hit; the hit comes on inserts, updates, and deletes, since every change has to update all the indexes on the table, single or composite. If you are getting more than 5 columns in an index, ask yourself: really? How granular does the index need to be? Querying the data should be very fast with proper indexes. Updating indexes is also quick, but if you are dealing with millions of inserts per month, quarter, or year, each user's write may see only a slight delay (a quarter second?), yet those delays add up. Again, consider over what period of time the inserts/updates/deletes will happen anyhow.
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You will create indexes designed to support specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries (see the sketch below).
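As a hedged illustration (the second query shape is hypothetical):
-- Supports: WHERE city_id = ? AND country_id = ? AND age = ?
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);

-- A different query, e.g. WHERE school_id = ? AND age = ?, gets its own index:
CREATE INDEX idx_school_age ON accounts (school_id, age);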
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
I won't know every possible query combination ahead of time.
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

Should I create index on a SQL table column used frequently in WHERE select clause?

I wonder whether I should add a non-clustered index to a non-unique column in a SQL Server 2008 R2 table.
Simplified Example:
SELECT Id, FirstName, LastName, City
FROM Customers
WHERE City = 'MyCity'
My understanding is that the primary key [Id] should be the clustered index.
Can a non-clustered index be added to the non-unique column [City]?
Is this going to improve performance, or should I not bother at all?
Thanks.
I was thinking to create the clustered index as:
CREATE UNIQUE CLUSTERED INDEX IDX_Customers_City
ON Customers (City, Id);
or non-clustered, assuming there is already a clustered index on that table:
CREATE NONCLUSTERED INDEX IX_Customers_City
ON Customers (City, Id);
In reality I am dealing with a table of millions of records. The SELECT statement returns 0.1% to 5% of the records.
Generally yes - you would usually make the clustered index on the primary key.
The exception to this is when you never make lookups based on the primary key, in which case putting the clustered index on another column might be more pertinent.
You should generally add non-clustered indexes to columns that are used as foreign keys, provided there's a reasonable amount of diversity on that column, which I'll explain with an example.
The same applies to columns used in WHERE clauses, ORDER BY, etc.
Example
CREATE TABLE Gender (
GenderId INT NOT NULL PRIMARY KEY CLUSTERED,
Value NVARCHAR(50) NOT NULL)
INSERT Gender(GenderId, Value) VALUES (1, 'Male'), (2, 'Female')
CREATE TABLE Person (
PersonId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Name NVARCHAR(50) NOT NULL,
GenderId INT NOT NULL FOREIGN KEY REFERENCES Gender(GenderId)
)
CREATE TABLE [Order] (
OrderId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
OrderTotal DECIMAL(14,2) NOT NULL,
OrderedByPersonId INT NOT NULL FOREIGN KEY REFERENCES Person(PersonId)
)
In this simple set of tables it would be a good idea to put an index on the OrderedByPersonId column of the Order table, as you are very likely to want to retrieve all the orders for a given person, and it is likely to have a high amount of diversity.
By a high amount of diversity (or selectivity) I mean that if you have, say, 1000 customers, each customer is only likely to have 1 or 2 orders, so looking up all the rows in the order table with a given OrderedByPersonId will return only a very small proportion of the total records in that table.
By contrast there's not much point in putting an index on the GenderId column in the Person table, as it will have very low diversity. The query optimiser would not use such an index, and INSERT/UPDATE statements would be marginally slower because of the extra need to maintain it.
Now to go back to your example - the answer would have to be "it depends". If you have hundreds of cities in your database then yes, it might be a good idea to index that column.
If however you only have 3 or 4 cities, then no - don't bother. As a guideline I might say that if the selectivity of the column is 0.9 or higher (i.e. a WHERE clause selecting a single value in the column would result in only 10% or less of the rows being returned) an index might help, but this is by no means a hard and fast figure!
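A rough, hedged way to estimate that figure in T-SQL (it assumes values are roughly evenly distributed across cities):
SELECT COUNT(*)                                     AS total_rows,
       COUNT(DISTINCT City)                         AS distinct_cities,
       1.0 - 1.0 / NULLIF(COUNT(DISTINCT City), 0)  AS approx_selectivity
FROM Customers;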
Even if the column is very selective/diverse you might not bother indexing it if queries are only made very infrequently on it.
One of the easiest things to do though is try your queries with the execution plan displayed in SQL management studio. It will suggest indexes for you if the query optimiser thinks that they'll make a positive impact.
Hope that helps!
If you use the query frequently, or if you sort by city regularly in online applications, especially if your table is dense or has a large row size, it makes sense to add an index. Too many indexes slow down your inserts and updates. An evaluation of the actual value can only be made once you have significant data in the table.

MySQL Reducing Storage Space

I have a table definition:
CREATE TABLE `k_timestamps` (
`id` bigint(20) NOT NULL,
`k_timestamp` datetime NULL DEFAULT NULL,
`data1` smallint(6) NOT NULL,
KEY `k_timestamp_key` (`k_timestamp`,`id`) USING BTREE,
CONSTRAINT `k_time_fk` FOREIGN KEY (`id`) REFERENCES `data` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Basically, I have a whole lot of id and data1 key-value pairs, and every few hours I either add new key-value pairs not seen before, or the value of a previous id changes. I want to track what all the values were for every id over time. Thus, the id column can contain duplicate ids and is not the primary key.
Side note, k_time_fk points to another, much smaller table that has common information for a particular id regardless of what the current time is or value it currently holds.
(id, k_timestamp) should be thought of as the (composite) primary key of the table.
For example,
id          k_timestamp          data1
1597071247  2012-11-15 12:25:47  4
1597355222  2012-11-15 12:25:47  4
1597201376  2012-11-15 12:25:47  4
1597071243  2012-11-15 13:25:47  4
1597071247  2012-11-15 13:25:47  3
1597071249  2012-11-15 13:25:47  3
Anyways, I ran this query:
SELECT concat(table_schema,'.',table_name),
concat(round(table_rows/1000000,2),'M') rows,
concat(round(data_length/(1024*1024*1024),2),'G') DATA,
concat(round(index_length/(1024*1024*1024),2),'G') idx,
concat(round((data_length+index_length)/(1024*1024*1024),2),'G') total_size,
round(index_length/data_length,2) idxfrac
FROM information_schema.TABLES ORDER BY data_length+index_length DESC LIMIT 20;
To pull space info on my table:
rows    DATA   idx    total_size  idxfrac
11.25M  0.50G  0.87G  1.36G       1.76
I'm not really sure I understand this: how can the index be taking up so much space? Is there something obvious I did wrong here, or is this normal? I'm looking to reduce the footprint of this table if possible. I'm not even really sure what that k_timestamp_key really buys me; can it be safely deleted?
The index is bigger because InnoDB assigns a hidden 6-byte primary key when the table has no primary key and no unique NOT NULL index it can use instead. All secondary indexes in the table also contain the primary key columns... see 14.2.3.12.2 Clustered and Secondary Indexes in the manual.
Firstly, yes, this is pretty normal behaviour, as innvo writes.
Secondly, you can optimize the table and its index using OPTIMIZE TABLE. As your primary key is likely to be "fragmented" - i.e. it's not safe to assume that an inserted row is physically next to the previous row - there may be some gains there.
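For reference, the statement is just the line below; note that for InnoDB it rebuilds the table and its indexes, which can take a while and lock the table on millions of rows:
OPTIMIZE TABLE k_timestamps;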
Finally, you may not need a primary key on the table, but you almost certainly need an index if you're querying across millions of rows...
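If (id, k_timestamp) really is unique, as the question suggests, one hedged option is to promote it to the real primary key, so InnoDB clusters on it instead of a hidden row ID. A sketch, assuming no NULL timestamps and genuine uniqueness:
ALTER TABLE k_timestamps
  MODIFY k_timestamp DATETIME NOT NULL,  -- PK columns must be NOT NULL
  ADD PRIMARY KEY (id, k_timestamp);
-- Keep k_timestamp_key if you still query by time range;
-- drop it only if every lookup starts with id.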

A simple INSERT query on InnoDB taking too long

I have this simple query:
INSERT IGNORE INTO beststat (bestid,period,rawView) VALUES ( 4510724 , 201205 , 1 )
On the table:
CREATE TABLE `beststat` (
`bestid` int(11) unsigned NOT NULL,
`period` mediumint(8) unsigned NOT NULL,
`view` mediumint(8) unsigned NOT NULL DEFAULT '0',
`rawView` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`bestid`,`period`)
) ENGINE=InnoDB AUTO_INCREMENT=2020577 DEFAULT CHARSET=utf8
And it takes 1 sec to complete.
Side note: it doesn't always take 1 sec; sometimes it's done in as little as 0.05 sec, but often it takes 1 sec.
This table (beststat) currently has ~500,000 records and its size is 40MB. I have 4GB RAM and innodb_buffer_pool_size = 104,857,600 (100MB), on MySQL 5.1.49-3.
This is the only InnoDB table in my database (others are MyISAM)
ANALYZE TABLE beststat shows: OK
Maybe there is something wrong with InnoDB settings?
I ran some simulations about 3 years ago as part of an evaluation project for a customer. They had a requirement to be able to search a table where data is constantly being added, and they wanted it to be up to date to within a minute.
InnoDB showed much better results in the beginning, but deteriorated quickly (well before 1 million records), until I removed all indexes (including the primary key). At that point InnoDB became superior to MyISAM for inserts/updates. (I had much worse hardware than you, executing tests only on my laptop.)
Conclusion: inserts will always suffer if you have indexes, and especially unique ones.
I would suggest the following optimizations:
1. Remove all indexes from your beststat table and use it as a simple dump.
2. If you really need these unique indexes, consider some programmable solution (like remembering the max bestid at all times and insisting that each new record is above that number, immediately increasing it). But do you really need so many unique fields? They all sound to me just like indexes.
3. Have a background thread move new records from the InnoDB table to another table (which can be MyISAM) where they would be indexed (see the sketch after this list).
4. Consider dropping indexes temporarily before a bulk update and re-indexing the table afterwards, possibly switching two tables so that querying is never interrupted.
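A minimal sketch of item 3; beststat_indexed, rowid, and @last_moved are hypothetical names, and a real job would need locking and error handling:
-- Assumes the dump table keeps a monotonically increasing rowid column.
SET @hi = (SELECT COALESCE(MAX(rowid), 0) FROM beststat);
INSERT INTO beststat_indexed (bestid, period, `view`, rawView)
SELECT bestid, period, `view`, rawView
FROM beststat
WHERE rowid > @last_moved AND rowid <= @hi;
SET @last_moved = @hi;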
These are theoretical solutions, I admit, but it's the best I can say given your question.
Oh, and if your table is planned to grow to many millions, consider a NoSQL solution.
So you have two unique indexes on the table. Your primary key is an autonumber. Since it is not really part of the data, it is what you call an artificial (surrogate) primary key. You also have a unique index on (bestid, period). If bestid and period are supposed to be unique, that would be a good candidate for the primary key.
InnoDB always stores a table as a clustered index (a B-tree); if you don't define a primary key, it clusters on the first suitable unique index, or on a hidden internal row ID. So in your case the tree is ordered on disk by the autonumber key. When you create the second index, InnoDB actually creates a second tree on disk with the bestid and period values in it. A secondary index does not contain the other columns of the table, only bestid, period, and your primary key value.
OK, so now you insert data. The first thing MySQL does is ensure the unique index stays unique, so it reads the index to see if you are trying to insert a duplicate value. This is where the slowdown comes into play: it first has to check uniqueness, then write the row, then also insert the bestid, period, and primary key values into the unique index. So the total operation is: 1 index read to check the value, 1 row insert into the table, 1 insert of bestid and period into the index - three operations. If you removed the autonumber and used the unique index as the primary key, it would be: 1 read of the table to check uniqueness, 1 insert into the table - two operations instead of three. So you do 33% less work by removing the redundant autonumber.
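A hedged sketch of that change, assuming the table still has the auto-increment id column this answer describes (adjust names to your actual schema):
ALTER TABLE beststat
  DROP PRIMARY KEY,                    -- drop the artificial autonumber PK
  DROP COLUMN id,                      -- hypothetical autonumber column
  ADD PRIMARY KEY (bestid, period);    -- promote the natural unique key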
I hope this is clear as I am typing from my Android and autocorrect keeps on changing innodb to inborn. Wish I was at a computer.

Query optimization

SELECT nar.name, nar.reg, stat.lvl
FROM members AS nar
JOIN stats AS stat
ON stat.id = nar.id
WHERE nar.ref = 9
I have indexes on id in both tables, and an index on ref as well. But still, it checks all rows in the stats table (I used EXPLAIN to get this information), while in the members table it checks only one row, as it is supposed to. What's wrong with the stats table? Thank you very much.
CREATE TABLE `members` (
`id` int(11) NOT NULL,
`ref` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `stats` (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=37 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
id  select_type  table  type    possible_keys  key      key_len  ref                rows  Extra
1   SIMPLE       stat   ALL     PRIMARY        NULL     NULL     NULL               22
1   SIMPLE       nar    eq_ref  PRIMARY        PRIMARY  4        table_nme.stat.id  1     Using where
Your tables are ridiculously small - just 23 rows is tiny.
MySQL chooses different query plans depending on how many rows there are in the table and based on how many it estimates will be selected (from the statistics). You should performance test your queries with realistic data - both the amount of data and the distribution of values in the data should be as realistic as possible. Otherwise the query plan MySQL chooses in testing might not be the same the actual query plan for your live system.
Your tables are so small that using an index could be slower than just scanning the table directly. Remember that checking data that is already in memory is fast, but disk reads are slow. Accessing an index can require an extra read: first the index has to be fetched and read to find which rows to select; then, if your index isn't a covering index, the relevant rows in the table have to be fetched and read to get the values that aren't in the index. MySQL is perfectly entitled not to use an index, even if one is available, if it believes that doing so will result in a slower plan.
Put some more rows in your table (thousands) and try running EXPLAIN again. You will probably find that when you have more rows, the PRIMARY KEY index will be used for the join. One way to generate test rows is sketched below.
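A hedged sketch for bulk-generating throwaway test rows; it doubles the stats table on each run, lvl is taken from the query above, and any other columns are assumed to have defaults:
-- Run repeatedly until the table holds a few thousand rows.
INSERT INTO stats (lvl)
SELECT lvl FROM stats;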
MySQL can use only one index at a time per table; thus it finds the members row using the index and then performs a sequential search for the id.
You have to create a multi-column index on the members table:
CREATE INDEX idref ON members(id,ref);
If it doesn't get better, please try the reverse order as well (first: DROP INDEX idref ON members):
CREATE INDEX idref ON members(ref,id);
(I cannot try it myself now)