I am almost done building a website that I expect to grow to 10,000 users. It's free, so I'd like to keep the cost as low as possible.
All but two tables have fewer than 100,000 rows (read-only). Of the remaining two, one table will have about 5,200 rows per user in total, no fewer. For the other, I estimate about 1.5 million rows per user over two years, assuming users keep using the site that long.
The latter table is as follows; the former is the same except for col3:
CREATE TABLE `my_table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(11) NOT NULL,
`col2` mediumint(8) unsigned NOT NULL,
`col3` smallint(5) unsigned NOT NULL,
`col4` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `fk1_ix` (`col1`),
KEY `fk2_ix` (`col2`),
KEY `fk3_ix` (`col3`),
CONSTRAINT `fk1` FOREIGN KEY (`col1`) REFERENCES `pktbl1` (`id`) ON UPDATE CASCADE,
CONSTRAINT `fk2` FOREIGN KEY (`col2`) REFERENCES `pktbl2` (`id`) ON UPDATE CASCADE,
CONSTRAINT `fk3` FOREIGN KEY (`col3`) REFERENCES `pktbl3` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
Both tables will, on average, be written to about 10-20 times a day and read about 4-5 times a day for every active user.
I'd like to estimate my running costs. I have two primary questions and would appreciate any other input.
1) Will MySQL reasonably be able to handle my load?
2) How much CPU/RAM do you reckon I'd need to handle my load with a response time (lag) of up to one second?
My website is built with the PHP Yii2 framework, so I can switch databases if required. The queries are simple inserts and indexed SELECT statements.
Roughly 15-20 billion rows (10,000 users x 1.5 million rows each), mostly narrow ones? That sounds like 1 terabyte of disk space. Plan for that.
200K queries (write or read) per day? That's only a few per second. No problem on any server. This assumes, however, that the queries are not too complex.
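A back-of-envelope version of the two estimates above, as a minimal Python sketch. The user count, rows per user, and daily reads/writes come from the question (using the upper end: 20 writes + 5 reads per user per day). BYTES_PER_ROW (about 17 bytes of column data plus assumed InnoDB row and secondary-index overhead) and PEAK_FACTOR are assumptions, not measurements; substitute your own numbers.

```python
# Back-of-envelope sizing for the question above.
# Row counts and daily operations come from the question; BYTES_PER_ROW and
# PEAK_FACTOR are assumptions, not measurements.

SECONDS_PER_DAY = 86_400
BYTES_PER_ROW = 70   # assumed: ~17 bytes of column data plus InnoDB row and index overhead
PEAK_FACTOR = 10     # assumed: busiest second vs. the daily average


def total_rows(users: int, big_rows_per_user: int = 1_500_000,
               small_rows_per_user: int = 5_200) -> int:
    """Rows in the two large tables once every user has two years of history."""
    return users * (big_rows_per_user + small_rows_per_user)


def disk_bytes(rows: int, bytes_per_row: int = BYTES_PER_ROW) -> int:
    """Total on-disk size under the assumed bytes-per-row figure."""
    return rows * bytes_per_row


def average_qps(users: int, ops_per_user_per_day: int) -> float:
    """Average queries per second if the daily operations were spread evenly."""
    return users * ops_per_user_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rows = total_rows(10_000)
    print(f"rows: {rows:,}")                              # rows: 15,052,000,000
    print(f"disk: {disk_bytes(rows) / 1e12:.2f} TB")      # disk: 1.05 TB
    qps = average_qps(10_000, 20 + 5)                     # 20 writes + 5 reads per user per day
    print(f"average: {qps:.1f} queries/s, peak ~{qps * PEAK_FACTOR:.0f} queries/s")
```

Under these assumptions the large tables come to about 15 billion rows and roughly 1 TB, and the average load stays under 3 queries per second, which is in line with the estimates above.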
Will MySQL handle it? Look around this forum (better yet, dba.stackexchange.com) and you will see much bigger systems being discussed.
CPU -- Usually the least of the problems.
RAM -- Depends on the queries. These days, I would not start with anything less than 4GB.
Cloud -- It's a viable option. You pay extra, but have fewer hassles, especially if you need to upgrade.
Get 100 users on the system, then take stock of what you have. See what the numbers look like.
If you ever run into performance problems, first look at the queries and indexes to see whether the slowest query can be improved.
There isn't really a way to estimate this; your requirements are very low. If you are able to do some load testing, that will help.
However, if you can tolerate a few minutes of downtime, you can scale your Cloud SQL instance up or down. This should give you confidence that starting on an f1-micro or g1-small does not prevent you from upgrading if performance doesn't meet your needs.
Related
I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT
COUNT(`id`) AS AGGREGATE
FROM
`accounts`
WHERE
`city_id` = 1
AND
`country_id` = 7
AND
`age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output:
id: 1
select_type: SIMPLE
table: accounts
partitions: NULL
type: index_merge
possible_keys: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
key: accounts_city_id_foreign,accounts_country_id_foreign,accounts_age_index
key_len: 9,2,9
ref: NULL
rows: 15542
filtered: 100.00
Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns that will make it impossible to use a composite index as it will exceed the 16 column limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
Indexes are there to help your queries. As suggested by Marko and agreed by others, an index on (city_id, country_id, age) should help significantly. Yes, you will add other columns to the table, but are you really going to filter on 16+ criteria? I doubt it. Even if you create multiple composite indexes to optimize different queries, how many columns will any single query need at once? 4, 5, 6? Beyond that, how granular do you plan to get with your data: Country, State/Province, City, Town, Village, Neighborhood, Street, House? By the time you are that deep in the data, you are at page-level data anyway, aren't you?
So your query for Country = 7 already eliminates a ton of rows. Then narrowing to a given city within that country? Great, now you are down to a small, finite set.
If you are going to be running queries against large data that require aggregations, and the data is fairly fixed from a historical perspective, pre-aggregated tables grouped by some common elements might help in the long term.
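A minimal sketch of the pre-aggregation idea, using the accounts COUNT query from the question. A Python Counter stands in for a hypothetical summary table keyed by (city_id, country_id, age); in MySQL it would be its own table, rebuilt with an INSERT ... SELECT ... GROUP BY. As the answer says, this pays off only when the historical data rarely changes.

```python
from collections import Counter
from typing import Iterable, Tuple

# (city_id, country_id, age) for one row of `accounts`
AccountKey = Tuple[int, int, int]


def build_summary(rows: Iterable[AccountKey]) -> Counter:
    """One pass over the detail rows, like filling a hypothetical summary table with
    SELECT city_id, country_id, age, COUNT(*) FROM accounts GROUP BY city_id, country_id, age.
    """
    return Counter(rows)


def count_accounts(summary: Counter, city_id: int, country_id: int, age: int) -> int:
    """Answer the COUNT query with a single lookup instead of scanning accounts."""
    return summary[(city_id, country_id, age)]  # missing keys count as 0


if __name__ == "__main__":
    summary = build_summary([(1, 7, 3), (1, 7, 3), (2, 7, 3), (1, 7, 4)])
    print(count_accounts(summary, 1, 7, 3))  # 2
```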
FEEDBACK
Querying is not necessarily where performance will hit you; it is inserts, updates, and deletes, because whatever changes has to update every index on the table, single-column or composite. If you are putting more than 5 columns in an index, ask yourself whether you really need that level of granularity for the index to be effective. With proper indexes, querying the data should be very fast. Updating indexes is also quick, but what if you are dealing with millions of inserts per month, quarter, or year? Each user may see only a slight delay (1/4 second?), but a million of those adds up. Then again, consider over what period of time those inserts/updates/deletes would actually happen.
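The arithmetic in the paragraph above, as a small Python sketch. The 1/4-second delay is the answer's own guess; the count of 5 secondary indexes is an assumption chosen for illustration.

```python
SECONDS_PER_HOUR = 3600


def btree_changes(rows_inserted: int, secondary_indexes: int) -> int:
    """Each INSERT must eventually modify the clustered (PRIMARY KEY) index and every
    secondary index. InnoDB can buffer some secondary-index changes, but the work
    still has to be done."""
    return rows_inserted * (1 + secondary_indexes)


def cumulative_delay_hours(operations: int, seconds_each: float) -> float:
    """Total time across all operations (spread over many users, not one long wait)."""
    return operations * seconds_each / SECONDS_PER_HOUR


if __name__ == "__main__":
    print(btree_changes(1_000_000, 5))                        # 6000000
    print(round(cumulative_delay_hours(1_000_000, 0.25), 1))  # 69.4
```

Each extra index adds one more B-tree change per row, which is why pruning unneeded indexes matters more for write-heavy tables than for read-heavy ones.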
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
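A toy model of the difference, assuming made-up value distributions (50 cities, 10 countries, 20 ages over 100,000 rows). It only counts index entries read; it ignores B-tree depth, caching, and the optimizer's cost model. The index-merge path must read every entry matching each condition before intersecting, while the composite index reads only the entries that match all three.

```python
import random
from collections import defaultdict


def build_indexes(rows):
    """Single-column indexes (value -> set of row ids) plus one composite index."""
    singles = [defaultdict(set) for _ in range(3)]
    composite = defaultdict(list)
    for row_id, key in enumerate(rows):
        for col, value in enumerate(key):
            singles[col][value].add(row_id)
        composite[key].append(row_id)
    return singles, composite


def index_merge_lookup(singles, key):
    """Read all matching entries from each single-column index, then intersect them."""
    postings = [singles[col][value] for col, value in enumerate(key)]
    entries_read = sum(len(p) for p in postings)
    return set.intersection(*postings), entries_read


def composite_lookup(composite, key):
    """One seek into the (city_id, country_id, age) index reads only matching entries."""
    matches = composite.get(key, [])
    return set(matches), len(matches)


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [(rng.randint(1, 50), rng.randint(1, 10), rng.randint(1, 20)) for _ in range(100_000)]
    singles, composite = build_indexes(rows)
    merged, read_merge = index_merge_lookup(singles, (1, 7, 3))
    direct, read_comp = composite_lookup(composite, (1, 7, 3))
    assert merged == direct  # same answer either way
    print(f"index merge read {read_merge} entries, composite read {read_comp}")
```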
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You will create indexes designed to support specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries.
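One of those rules, sketched in Python under simplifying assumptions: for a single query, put every `col = constant` column first, then at most one range column. This is a common rule of thumb, not the full method from the presentation mentioned below, and it ignores GROUP BY, ORDER BY, and covering indexes. The helper names are hypothetical.

```python
from typing import List, Optional, Sequence

MAX_INDEX_COLUMNS = 16  # MySQL's per-index column limit mentioned in the question


def suggest_composite_index(equality_cols: Sequence[str],
                            range_col: Optional[str] = None) -> List[str]:
    """Equality columns first (any order works for this one query), then at most ONE
    range column (BETWEEN, <, >, LIKE 'x%'): columns after a range column cannot
    narrow the index seek any further, so they are not added."""
    cols = list(dict.fromkeys(equality_cols))  # de-duplicate, keep order
    if range_col is not None and range_col not in cols:
        cols.append(range_col)
    if len(cols) > MAX_INDEX_COLUMNS:
        raise ValueError(f"an index can hold at most {MAX_INDEX_COLUMNS} columns")
    return cols


def create_index_sql(table: str, name: str, cols: Sequence[str]) -> str:
    return f"CREATE INDEX {name} ON {table} ({', '.join(cols)});"


if __name__ == "__main__":
    # WHERE city_id = 1 AND country_id = 7 AND age = 3
    print(create_index_sql("accounts", "idx_city_country_age",
                           suggest_composite_index(["city_id", "country_id", "age"])))
```

A query with a range condition, such as a hypothetical `WHERE country_id = 7 AND age BETWEEN 18 AND 30`, would get (country_id, age), which is why different queries often need different composite indexes rather than one index with every column.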
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
I won't know every possible query combination ahead of time.
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.
I have multiple big tables of business data, the smallest having 38 million rows (24G data, 26G index). I have indexes set up to speed up lookups, and the buffer pool is set to 80% of total RAM (116G). Even with these settings, we have started observing performance issues over time. I am constrained by disk size (1T), and sharding is not an option currently. Data growth has increased to 0.5M rows per day, which leads to frequent optimisation and master-switch exercises. Table schemas and indexes have already been optimised, so I have started looking at partitioning the tables to improve performance. My primary use case for partitioning is to delete data monthly by dropping partitions, so that optimisations are no longer required and read/write latencies improve. Following is the structure of one of the big tables (column names have been changed for legal reasons; assume that the indexed columns have lookup use cases):
CREATE TABLE `table_name` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data_1` int(11) NOT NULL,
`data_2` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
`data_3` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`data_4` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_data1` (`data_1`),
KEY `index_data2` (`data_2`)
) ENGINE=InnoDB AUTO_INCREMENT=100572 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I am planning to partition on the created_at column. However, the partitioning column has to be part of every unique key. I could add created_at to the primary key, but that would increase the index size, which has its own side effects. Is there a workaround or a better solution?
Apart from this problem, there are a few more questions whose answers I couldn't find in any documentation or articles:
1. Why does MySQL require the partitioning column to be part of every unique key?
2. The queries from the ORM don't include a created_at clause, which means partition pruning is not possible for reads. We were okay with that, provided inserts are always pruned. However, that doesn't seem to be the case. Why does MySQL open all the partitions for inserts?
Mysql Version - 5.6.33-79.0-log Percona Server (GPL), Release 79.0, Revision 2084bdb
PRIMARY KEY(id, created_at) will take only a tiny bit more space than PRIMARY KEY(id). I estimate it at much less than 1% for your data. I can't tell about the index space; can you show us the non-primary index(es)?
Explanation: The leaf nodes of the data (which is a BTree organized by the PK), will not change in size. The non-leaf nodes will have created_at added to each 'row'. As a rule of thumb in InnoDB, non-leaf nodes take up about 1% of the space for the BTree.
For the INDEX BTrees, the leaf nodes need an extra 4 or 5 bytes/row for created_at (4 for a TIMESTAMP, 5 for a DATETIME in MySQL 5.6.4 and later) unless created_at is already in the index.
Let's say you currently have INDEX(foo) where foo is INT and id is also INT. That's a total of 8 bytes (plus overhead). Adding created_at (say, a 4-byte TIMESTAMP) expands each leaf 'row' to 12 + overhead, so that index may grow by as much as 50%.
A guess: Your 24G+26G might grow to 25G+33G.
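The leaf-size arithmetic above as a short Python sketch. The 5-byte per-entry overhead is an assumption for illustration; the real overhead depends on row format and page fill, so treat the ratios as rough bounds.

```python
def leaf_entry_bytes(key_bytes: int, pk_bytes: int, overhead: int = 0) -> int:
    """An InnoDB secondary-index leaf entry holds the indexed column(s) plus all PRIMARY KEY columns."""
    return key_bytes + pk_bytes + overhead


def growth_ratio(key_bytes: int, old_pk: int, new_pk: int, overhead: int = 0) -> float:
    """New leaf-entry size divided by old, when the PRIMARY KEY widens from old_pk to new_pk bytes."""
    return leaf_entry_bytes(key_bytes, new_pk, overhead) / leaf_entry_bytes(key_bytes, old_pk, overhead)


if __name__ == "__main__":
    # INDEX(foo), foo INT (4 bytes); PK id INT (4) -> PK (id, created_at) with a 4-byte TIMESTAMP
    print(growth_ratio(4, 4, 8))               # 1.5 (upper bound, ignoring overhead)
    print(round(growth_ratio(4, 4, 8, 5), 2))  # 1.31 (with an assumed 5-byte overhead)
```

With a 5-byte DATETIME instead, the no-overhead bound becomes 13/8, about 1.6; the per-entry overhead pulls the real growth well below these bounds, which is consistent with the 26G to 33G guess.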
It sounds like you have several indexes. You do understand that INDEX(a) is not useful if you also have INDEX(a,b)? And that INDEX(x,y) is a lot better than INDEX(x), INDEX(y) in some situations? Let's discuss your indexes.
The main benefit for PARTITIONing is your use case -- DROP PARTITION is a lot faster than DELETE. My blog on such.
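A minimal sketch of the monthly DROP PARTITION scheme, written as a Python helper that prints the DDL. It assumes the PRIMARY KEY has already been changed to (id, created_at) and that created_at has been made NOT NULL (PRIMARY KEY columns cannot be NULL). The partition names (p202401, pfuture) and helper functions are hypothetical.

```python
from datetime import date
from typing import List


def month_starts(first: date, months: int) -> List[date]:
    """First day of each month from `first` (which should be the 1st), plus one extra bound."""
    out, y, m = [], first.year, first.month
    for _ in range(months + 1):
        out.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def partition_ddl(table: str, column: str, first: date, months: int) -> str:
    """RANGE partitioning on TO_DAYS(column): one partition per month plus a catch-all."""
    bounds = month_starts(first, months)
    parts = [
        f"  PARTITION p{start:%Y%m} VALUES LESS THAN (TO_DAYS('{end:%Y-%m-%d}'))"
        for start, end in zip(bounds, bounds[1:])
    ]
    parts.append("  PARTITION pfuture VALUES LESS THAN MAXVALUE")
    return (f"ALTER TABLE {table}\n"
            f"PARTITION BY RANGE (TO_DAYS({column})) (\n" + ",\n".join(parts) + "\n);")


def drop_oldest(table: str, oldest_month: date) -> str:
    """The monthly purge: far cheaper than DELETE ... WHERE created_at < ..."""
    return f"ALTER TABLE {table} DROP PARTITION p{oldest_month:%Y%m};"


if __name__ == "__main__":
    print(partition_ddl("table_name", "created_at", date(2024, 1, 1), 3))
    print(drop_oldest("table_name", date(2024, 1, 1)))
```

Each month you would also split the next month out of pfuture (ALTER TABLE ... REORGANIZE PARTITION pfuture INTO (...)); keeping the total count modest follows the advice below about not having too many partitions.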
Don't be lulled by partitioning. You are hoping for "read/write latencies are improved"; such is not likely to happen. If you would like further explanation please provide a SELECT where you think it might happen.
How many "months" will you partition on? I recommend not more than 50. PARTITIONing has some inefficiencies when there are lots of partitions.
Because of the need for the partition key to be in UNIQUE keys, the uniqueness constraint is almost totally useless. Having it on the end of an AUTO_INCREMENT id is not an issue.
Consider whether something other than id can be the PK.
Question 1: When INSERTing a row, all UNIQUE keys are immediately checked for "dup key". Without the partition key being part of the unique key, this would mean probing every partition. This is too costly to contemplate; so it was not done. (In the future, a 'global-to-the-table' UNIQUE key may be implemented. Version 8.0 has some hooks for such.)
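A toy Python model of Question 1, not of how InnoDB actually stores data: it counts how many partitions a duplicate-key check must search. With a unique key on id alone, every existing partition must be probed; with (id, created_at), only the row's own partition. The last part of the usage shows the trade-off mentioned earlier: the same id in a different month is no longer caught.

```python
from typing import Dict, Set, Tuple


class MonthlyPartitionedTable:
    """Toy table partitioned by month, with one unique key checked on every INSERT."""

    def __init__(self, key_includes_partition_col: bool) -> None:
        self.key_includes_partition_col = key_includes_partition_col
        self.partitions: Dict[str, Set[Tuple]] = {}
        self.probes = 0  # partitions searched for a duplicate, summed over all inserts

    def insert(self, row_id: int, created_at: str) -> None:
        month = created_at[:7]  # e.g. "2024-03" selects the partition
        if self.key_includes_partition_col:
            key: Tuple = (row_id, created_at)
            # A duplicate (id, created_at) could only be in this row's own partition.
            to_search = [self.partitions.get(month, set())]
        else:
            key = (row_id,)
            # A duplicate id could be in ANY partition, so every one must be searched.
            to_search = list(self.partitions.values())
        for part in to_search:
            self.probes += 1
            if key in part:
                raise ValueError(f"duplicate key {key}")
        self.partitions.setdefault(month, set()).add(key)


if __name__ == "__main__":
    for local in (False, True):
        t = MonthlyPartitionedTable(key_includes_partition_col=local)
        for m in range(1, 13):
            t.insert(m, f"2024-{m:02d}-15 10:00:00")
        print(local, t.probes)  # False 66, then True 12
```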
Question 2a: Yes, if the SELECT's WHERE does not adequately specify the partition key, all partitions will be opened and looked at. This is another reason to minimize the number of partitions. Hmmm... If you do a SELECT on the 31st of the month and do the same SELECT the next day, you could get fewer rows (even without any deletes, just the DROP PARTITION); this seems "wrong".
Question 2b: "Why does mysql open all the partitions for inserts?" -- What makes you think it does? There is an odd case where the "first" partition is 'unnecessarily' opened -- the partition key is DATETIME.
I am building a website (LAMP stack) with an Amazon RDS MySQL instance as the back end (type db.m3.medium).
I am happy with database integrity, and it works perfectly with regards to SELECT/JOIN/ETC queries (everything is normalized, indexed, and foreign keyed, all tables have id primary keys and relevant secondary keys / unique keys).
I have a table 'df_products' with approx half a million products in it. The products need to be updated nightly. The process involves a PHP script reading over a large products data-file and inserting data into several tables (products table, product_colours table, brands table, etc), calling either INSERT or UPDATE depending on whether or not a row already exists. This is done as one giant transaction.
What I am seeing is the UPDATE commands are sufficiently fast (50/sec, not exactly lightning but it should do), however the INSERT commands are super slow (1/sec) and appear to be consuming 100% of the CPU. On a dual core instance we see 50% CPU use (i.e. one full core).
I assume this is because the indexes (1x PRIMARY + 5x INDEX + 1x UNIQUE + 1x FULLTEXT) are being rebuilt after every INSERT. However, I thought that putting the entire process into one transaction would stop the indexes from being rebuilt until the transaction is committed.
I have tried setting the following params via PHP but there is negligible performance improvement:
$this->db->query('SET unique_checks=0');
$this->db->query('SET foreign_key_checks=0;');
The process will take weeks to complete at this rate so we must improve performance. Google appears to suggest using LOAD DATA. However:
I would have to generate five files in order to populate five tables
The process would have to use UPDATE commands as opposed to INSERT since the tables already exist
I would still need to loop over the products and scan the database for what values already do and don't exist
The database is entirely InnoDB and I don't plan to move to MyISAM (I want transactions, foreign keys, etc). This means that I cannot disable indexes. Even if I did it would probably be a big performance drain as we need to check if a row already exists before we insert it, and without an index this will be super slow.
I have provided the products table definition below for information. Can you please advise on what process we should use to achieve faster INSERT/UPDATE on multiple large related tables, or what optimisations we can make to our existing process?
Thank you,
CREATE TABLE `df_products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_brand` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`id_gender` int(11) NOT NULL,
`id_colourSet` int(11) DEFAULT NULL,
`id_category` int(11) DEFAULT NULL,
`desc` varchar(500) DEFAULT NULL,
`seoAlias` varchar(255) CHARACTER SET ascii NOT NULL,
`runTimestamp` timestamp NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `seoAlias_UNIQUE` (`seoAlias`),
KEY `idx_brand` (`id_brand`),
KEY `idx_category` (`id_category`),
KEY `idx_seoAlias` (`seoAlias`),
KEY `idx_colourSetId` (`id_colourSet`),
KEY `idx_timestamp` (`runTimestamp`),
KEY `idx_gender` (`id_gender`),
FULLTEXT KEY `fulltext_title` (`title`),
CONSTRAINT `fk_id_colourSet` FOREIGN KEY (`id_colourSet`) REFERENCES `df_productcolours` (`id_colourSet`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_id_gender` FOREIGN KEY (`id_gender`) REFERENCES `df_lu_genders` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=285743 DEFAULT CHARSET=utf8
How many "genders" are there? If the usual 2, don't normalize it, don't index it, and don't use a 4-byte INT to store it; use a CHAR(1) CHARACTER SET ascii (only 1 byte) or an ENUM (1 byte).
Each unnecessary index is a performance drain on the load, regardless of how it is done.
For INSERT vs UPDATE, look into using INSERT ... ON DUPLICATE KEY UPDATE.
Load the nightly data into a separate table (this could be MyISAM with no indexes). Then run one query to update existing rows and one to insert new rows. (Each needs a JOIN.) See http://mysql.rjweb.org/doc.php/staging_table, especially the 2 SQLs used for "normalizing". They can be adapted to your situation.
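A minimal sketch of the staging-table approach: one set-based UPDATE for existing rows, then one INSERT ... SELECT with an anti-join for new rows. SQLite (via Python's built-in sqlite3) stands in for MySQL so the sketch runs anywhere; the products and staging tables are cut-down, hypothetical versions of df_products, and staging is assumed to hold one row per seoAlias. The MySQL form of the UPDATE is shown in a comment.

```python
import sqlite3


def merge_staging(conn: sqlite3.Connection) -> None:
    """Apply the nightly staging table to products with two set-based statements."""
    with conn:  # one transaction
        # 1) Update rows that already exist (matched on the unique seoAlias).
        #    MySQL form: UPDATE products p JOIN staging s ON p.seoAlias = s.seoAlias SET p.title = s.title
        conn.execute("""
            UPDATE products
               SET title = (SELECT s.title FROM staging AS s WHERE s.seoAlias = products.seoAlias)
             WHERE seoAlias IN (SELECT seoAlias FROM staging)
        """)
        # 2) Insert rows that are new (anti-join: no matching product yet).
        conn.execute("""
            INSERT INTO products (seoAlias, title)
            SELECT s.seoAlias, s.title
              FROM staging AS s
              LEFT JOIN products AS p ON p.seoAlias = s.seoAlias
             WHERE p.id IS NULL
        """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, seoAlias TEXT UNIQUE NOT NULL, title TEXT)")
    conn.execute("CREATE TABLE staging (seoAlias TEXT, title TEXT)")
    conn.execute("INSERT INTO products (seoAlias, title) VALUES ('red-shoe', 'Red shoe')")
    conn.executemany("INSERT INTO staging VALUES (?, ?)",
                     [("red-shoe", "Red shoe v2"), ("blue-hat", "Blue hat")])
    merge_staging(conn)
    print(conn.execute("SELECT seoAlias, title FROM products ORDER BY id").fetchall())
    # [('red-shoe', 'Red shoe v2'), ('blue-hat', 'Blue hat')]
```

Running the UPDATE first avoids needlessly re-updating the rows the INSERT has just added.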
Any kind of multi-row query runs noticeably faster than 1-row at a time. (A 100-row INSERT runs 10 times as fast as 100 1-row inserts.)
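A sketch of batching combined with INSERT ... ON DUPLICATE KEY UPDATE, written in Python rather than the question's PHP. It builds one statement per batch with %s placeholders (the paramstyle used by Python MySQL drivers such as PyMySQL and mysqlclient; PHP's PDO would use ? instead). The batch size of 100 follows the answer's example. The VALUES(col) form works on MySQL 5.x; MySQL 8.0.20+ deprecates it in favor of a row alias.

```python
from typing import List, Sequence


def build_upsert(table: str, columns: Sequence[str], update_columns: Sequence[str],
                 n_rows: int) -> str:
    """One multi-row INSERT ... ON DUPLICATE KEY UPDATE with %s placeholders."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    one_row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([one_row] * n_rows)
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in update_columns)
    return f"INSERT INTO `{table}` ({col_list}) VALUES {values} ON DUPLICATE KEY UPDATE {updates}"


def batches(rows: Sequence, size: int = 100) -> List[Sequence]:
    """Split rows into chunks of at most `size`."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


if __name__ == "__main__":
    rows = [("red-shoe", "Red shoe"), ("blue-hat", "Blue hat")]
    for batch in batches(rows):
        sql = build_upsert("df_products", ["seoAlias", "title"], ["title"], len(batch))
        params = [value for row in batch for value in row]  # flattened for cursor.execute(sql, params)
        print(sql)
        print(params)
```

The UNIQUE key on seoAlias is what makes MySQL treat a repeated alias as a duplicate and run the UPDATE part instead of inserting.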
innodb_flush_log_at_trx_commit = 2 will let individual write statements run much faster. (If you batch them as I suggest, this setting won't add much.)
I have a client who has asked me to tune his MySQL database in order to implement some new features and to improve the performance of an already existing web app.
The biggest table (~90 GB) has over 200M rows and keeps growing (one row per visit to any of the websites he owns). With continuous INSERTs, each SELECT query performed from the backend page takes a while to complete, as the indexes are regenerated each time.
I've run a simulation on my own server, switching from BTREE indexes to HASH indexes. Neither SELECTs nor INSERTs run any faster. The table uses MyISAM as its storage engine. There are only INSERTs and SELECTs, no UPDATEs or DELETEs.
I've come up with the idea of creating an auxiliary table, updated together with each INSERT, to speed up every SELECT query coming from the backend. I know this is bad practice, but I'm sure the performance of the statistics page will improve.
I'm not a database performance expert, as you may have noticed... Is there a better approach for this?
By the way, in phpMyAdmin I've seen that most indexes on the table have a cardinality of 0. This didn't happen in my simulation, and I'm not sure why it is happening.
Thanks a lot.
1st update: I've just learned that hash index isn't available for MyISAM engine.
2nd update: OK. Here's the table schema.
CREATE TABLE `visits` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime` int(8) NOT NULL,
`webmaster_id` char(18) NOT NULL,
`country` char(2) NOT NULL,
`connection` varchar(15) NOT NULL,
`device` varchar(15) NOT NULL,
`provider` varchar(100) NOT NULL,
`ip_address` varchar(15) NOT NULL,
`url` varchar(300) NOT NULL,
`user_agent` varchar(300) NOT NULL,
PRIMARY KEY (`id`),
KEY `datetime` (`datetime`),
KEY `webmaster_id` (`webmaster_id`),
KEY `country` (`country`),
KEY `connection` (`connection`),
KEY `device` (`device`),
KEY `provider` (`provider`)
) ENGINE=InnoDB;
So, instead of running queries like select count(*) from visits where datetime=20140715 and device="ios", wouldn't it be better to fetch this with select count from visits_stats where datetime=20140715 and device="ios"?
INSERTs are, as said, much more frequent than SELECTs, but my client wants to improve the performance of the backend used to retrieve aggregated data. Using my approach, each visit would imply one INSERT and one INSERT/UPDATE (or REPLACE) which would increment one or more counters (I haven't decided the schema for the visits_stats table yet, the above query was just an example).
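A minimal sketch of the visits_stats idea as incrementally maintained counters. An in-memory dict stands in for a hypothetical visits_stats table with one row per (datetime, device); the MySQL upsert in the comment assumes a UNIQUE or PRIMARY KEY on those two columns. Note that a plain REPLACE ... VALUES would overwrite the counter rather than increment it, because REPLACE deletes and re-inserts the row.

```python
from collections import defaultdict
from typing import Dict, Tuple

StatsKey = Tuple[int, str]  # (datetime as YYYYMMDD, device)


class VisitStats:
    """In-memory stand-in for a hypothetical visits_stats table: one counter per (day, device)."""

    def __init__(self) -> None:
        self.counts: Dict[StatsKey, int] = defaultdict(int)

    def record_visit(self, day: int, device: str) -> None:
        # In MySQL this would be one extra statement per visit, e.g.
        #   INSERT INTO visits_stats (datetime, device, cnt) VALUES (?, ?, 1)
        #   ON DUPLICATE KEY UPDATE cnt = cnt + 1
        self.counts[(day, device)] += 1

    def count(self, day: int, device: str) -> int:
        """The backend's lookup: a single keyed read instead of COUNT(*) over visits."""
        return self.counts.get((day, device), 0)


if __name__ == "__main__":
    stats = VisitStats()
    for device in ("ios", "ios", "android"):
        stats.record_visit(20140715, device)
    print(stats.count(20140715, "ios"))  # 2
```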
Apart from this, I've decided to replace some of the fields with IDs referencing a lookup table. So far, data is stored as strings like connection=cable, device=android, and so on. I'm not sure how this would affect performance.
Thanks again.
Edit: I said before not to use partitions, but Bill is right that the approach he described would work. Your only concern would be selecting across all 101 partitions; then the whole thing would come to a standstill. If you don't intend to do that, partitioning would solve the problem. Fix your indexes first, though.
Your primary problem is that MyISAM is not the best engine for this, and neither is InnoDB. TokuDB would be your best bet, but you'd have to install it on the server.
Now, you need to prune your indexes; this is the major reason for the slowness. Remove the index on every column that isn't part of common SELECT statements, and add a multi-column index on exactly what is requested in the WHERE clause of your SELECT statements.
So (in addition to your primary key) you want a single multi-column index on (datetime, device), according to your posted SELECT statement.
If you change to TokuDB, the inserts will be much faster. If you stick with MyISAM, you could speed the whole thing up by using INSERT DELAYED instead of INSERT. The only issue is that the inserts will not be live; they will be applied whenever MySQL decides there is not too much load. (INSERT DELAYED works only with MyISAM and a few other engines; it is deprecated as of MySQL 5.6 and ignored from 5.7 on.)
Alternatively, if the above still does not help, your final option would be to use two tables: one that you SELECT from, and another that you INSERT into. Once a day or so, you would copy the insert table into the select table. This means the data in your select table could be up to 24 hours old.
Other than that, you would have to either completely change the table structure (I can't tell you how, because it depends on exactly what you are using it for) or use something other than MySQL. However, my optimizations above should work.
I would suggest looking into partitioning. You have to add datetime to the primary key to make that work, because of a limitation of MySQL. The primary or unique keys must include the column by which you partition the table.
Also make the index on datetime into a compound index on (datetime, device). This will be a covering index for the query you showed, so the query can get its answer from the index alone, without having to touch table rows.
CREATE TABLE `visits` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`datetime` int(8) NOT NULL,
`webmaster_id` char(18) NOT NULL,
`country` char(2) NOT NULL,
`connection` varchar(15) NOT NULL,
`device` varchar(15) NOT NULL,
`provider` varchar(100) NOT NULL,
`ip_address` varchar(15) NOT NULL,
`url` varchar(300) NOT NULL,
`user_agent` varchar(300) NOT NULL,
PRIMARY KEY (`id`, `datetime`), -- compound primary key is necessary in this case
KEY `datetime` (`datetime`,`device`), -- compound index for the SELECT
KEY `webmaster_id` (`webmaster_id`),
KEY `country` (`country`),
KEY `connection` (`connection`),
KEY `device` (`device`),
KEY `provider` (`provider`)
) ENGINE=InnoDB
PARTITION BY HASH(datetime) PARTITIONS 101;
So when you query for select count(*) from visits where datetime=20140715 and device='ios', your query is only scanning one partition, with about 1% of the rows in the table. Then within that partition, it narrows down even further using the index.
Inserts should also improve, because they are updating much smaller indexes.
I use a prime number when doing hash partitioning, to help the partitions remain more evenly filled in case the dates inserted follow a regular pattern.
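A quick Python check of how the YYYYMMDD integers from the example query spread across hash partitions. Non-LINEAR PARTITION BY HASH(expr) PARTITIONS n places a row in partition MOD(expr, n), which Python's % matches for non-negative integers. It assumes `datetime` always holds values like 20140715.

```python
from datetime import date, timedelta
from typing import Iterator


def hash_partition(value: int, partitions: int) -> int:
    """PARTITION BY HASH(expr) PARTITIONS n puts a row in partition MOD(expr, n)."""
    return value % partitions


def yyyymmdd_values(year: int) -> Iterator[int]:
    """Every day of `year` encoded the way the visits table stores it, e.g. 20140715."""
    d = date(year, 1, 1)
    while d.year == year:
        yield d.year * 10000 + d.month * 100 + d.day
        d += timedelta(days=1)


def partitions_used(year: int, partitions: int) -> int:
    """How many distinct partitions one calendar year of dates lands in."""
    return len({hash_partition(v, partitions) for v in yyyymmdd_values(year)})


if __name__ == "__main__":
    print(hash_partition(20140715, 101))                          # 2
    print(partitions_used(2014, 100), partitions_used(2014, 101))  # 31 42
```

With 100 partitions only 31 would ever receive rows, because a YYYYMMDD value mod 100 is just the day of the month. With the prime 101, one year of dates lands in 42 partitions, and that set shifts by one each year, so the spread is much better though not perfectly even. Either way, a query for a single day still touches only one partition.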
Converting a 90GB table to partitioning is going to take a long time. You can use pt-online-schema-change to avoid blocking your application.
You can even make more partitions if you want: in theory up to 1024 in MySQL 5.5 and 8192 in MySQL 5.6. With thousands of partitions, though, you may run into other bottlenecks, such as the number of open files.
P.S.: HASH indexes are not supported by either MyISAM or InnoDB. HASH indexes are only supported by the MEMORY and NDB storage engines.
You are facing what is nowadays called big data querying / big data handling. There are many solutions for handling big data; unfortunately, none of them is easy to implement, and you usually need a team to structure big data to fit your needs. Some of the solutions are outlined below.
1. Bigtable
Google uses this technique to build very large tables with thousands of columns (to minimize the number of rows). For this, you have to analyze your data, partition it on the basis of similarity, and tag each group with an appropriate name. Then you have to write queries that are first analyzed by some algorithm to determine which column space has to be queried. Not simple.
2. Distribute the Database Across Multiple Machines
Hadoop is an open-source Apache project created specifically to solve the problem of storing and querying big data. In the early days, space was the issue, and systems were capable of processing the small amounts of data involved; now space is not an issue, and even small organizations have terabytes of data stored locally. But terabytes of data cannot be processed in one go on one machine; even a giant machine could take days to run an aggregate operation. That is why Hadoop exists.
If you are an individual, you are definitely in trouble: you will need resources to carry out this painful task. But you can use the essence of these techniques without adopting these technologies.
Feel free to try these techniques; just study articles about handling big data. Plain relational database queries are not going to work in your case.
I would like advice on a MySQL table design for an event logger.
Our needs:
- track a lot of actions
- 10,000 actions / second
- 1 billion rows at this time
Our hardware:
- 2 x Xeon (seen as 32 CPUs by the system)
- 128 GB RAM
- 6 x 600 GB SSD in RAID 10
Our table design:
CREATE TABLE IF NOT EXISTS `log_event` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_event` smallint(6) NOT NULL,
`id_user` bigint(20) NOT NULL,
`date` int(11) NOT NULL,
`data` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_event_2` (`id_event`,`data`),
KEY `id_inscri` (`id_user`),
KEY `date` (`date`),
KEY `id_event_4` (`id_event`,`date`,`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE `log_event`
ADD CONSTRAINT `log_event_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `inscription` (`id_inscri`) ON DELETE CASCADE ON UPDATE CASCADE;
Our problems:
- We have an auto-increment primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it; how would we identify a row?
- We would like to do partitioning, but with the foreign key it seems to be impossible?
- We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data every 5 minutes?
- Do you have any ideas for optimizing this? Do you have best practices for this kind of system?
Thanks !
François
Primary keys of relational tables (relations) can be of two types:
Natural: exists in the subject area and fully determines each row of the relational table.
Natural primary keys can be simple (consisting of one column) or composite (consisting of more than one column). It is not recommended to use a large string column as a natural primary key.
Artificial: a special column, added by the database designer/developer to boost table performance when the natural key is composite and has to be used in a related table (as a foreign key), when it is simple but large and would add overhead when copied into a related table as a foreign key, or when it is slow to search (for example, CRUD operations on VARCHAR IDs may be slower than on INT IDs). There may be other reasons. TL;DR: an artificial key is a single special column that fully determines each row of the relational table and boosts its performance for CRUD operations.
We have an auto-increment as primary, but it is not really used. Is it a problem to remove it ? We will no have primary key if we remove it => How to identify a line ?
If no other tables need to reference this table, you can probably remove the artificial key without any consequences. Still, I recommend setting some other PRIMARY KEY on this table to avoid data duplication, and for clarity.
Your table itself (if properly normalized) will have a natural key as one of its candidate keys. It might be composite (consisting of a few columns); that is normal. But don't make long strings the primary key: the PRIMARY KEY is always indexed, which adds data overhead (and in InnoDB every secondary index also stores a copy of the primary key). A combination of INT or "small" VARCHAR columns is fine.
Consider as an option: id_event + id_user + date.
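Before adopting (id_event, id_user, date) as the primary key, it is worth checking that it really is unique. This Python sketch assumes `date` is a Unix timestamp in seconds (the schema only says int(11)); at 10,000 actions per second, the same user could plausibly trigger the same event twice within one second, which would break the key. The SQL equivalent is in the docstring.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[int, int, int]  # (id_event, id_user, date), date assumed to be seconds


def duplicate_keys(events: Iterable[Event]) -> List[Event]:
    """Candidate-key values that occur more than once. SQL equivalent:
    SELECT id_event, id_user, `date`, COUNT(*) FROM log_event
    GROUP BY id_event, id_user, `date` HAVING COUNT(*) > 1
    If this returns anything, the three columns cannot be the PRIMARY KEY as-is.
    """
    counts = Counter(events)
    return [key for key, n in counts.items() if n > 1]


if __name__ == "__main__":
    sample = [(1, 10, 1700000000), (1, 10, 1700000000), (2, 10, 1700000000)]
    print(duplicate_keys(sample))  # [(1, 10, 1700000000)]
```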
We don't do bulk insert. Is it a good idea to insert in a Memory table without index and copy data every 5 minutes ?
It is not a bad idea, but it isn't a good idea until it has been properly tested. Run a load test before using it for real.
If other tables don't reference the MEMORY table, you can still join it with any InnoDB table. But you will lose InnoDB functionality (referential integrity). If losing the ON DELETE CASCADE ON UPDATE CASCADE link to the parent table is not a concern, it can be done. As for me, I don't think InnoDB is slow enough to justify switching table engines in your case.