I made an app in which polls are sent to users via push notifications, and they have a short time to answer. We now have a deal with a news agency, and chances are that up to 100,000 people will answer the polls sent by this agency within a short period (5 minutes, for example).
I have a MySQL database stored on Amazon RDS. Polls are stored in an InnoDB table:
CREATE TABLE `polls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`categoryId` int(11) NOT NULL,
`question` text CHARACTER SET utf8 NOT NULL,
`expiresAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sentAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` int(11) NOT NULL,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
`text1` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`text2` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`special` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
When people start voting, we increment the value of opt1 or opt2 by 1. For example, if someone voted for option 1:
UPDATE polls SET opt1 = opt1 + 1 WHERE id = 4644;
How can I configure MySQL to ensure it can support this load of traffic? I tried to go through the official docs, but I cannot find a clear overview of the steps I should take. Obviously I can buy a bigger database instance on AWS, but I want to be sure I am not making a scalability mistake here.
By the way, all SELECT queries (when people just read the polls) are sent to a read replica on AWS.
Many thanks for your help, please ask for more information if I forgot something.
I'd create a separate table for the poll results in order to keep the rows the UPDATE statement works with as short as possible.
CREATE TABLE `pollResults` (
`pollId` int(11) NOT NULL AUTO_INCREMENT,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
PRIMARY KEY (`pollId`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
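With that split, each vote touches only a narrow three-column row. A minimal sketch of the write path, assuming a pollResults row is created together with its poll:

INSERT INTO pollResults (pollId, opt1, opt2) VALUES (4644, 0, 0);

-- when someone votes for option 1:
UPDATE pollResults SET opt1 = opt1 + 1 WHERE pollId = 4644;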
In your polls table, I would put all the text columns at the end of the table, but this might not be a big deal.
I'm experiencing a strange issue where existing table rows (RDS MySQL) are being overwritten. I'm running a SPA (Vuetify). When a user POSTs data, it overwrites an existing table row rather than creating a new row.
The weird thing is that it happens only sometimes, seemingly at random. Sometimes it functions correctly; other times it overwrites existing data. I cannot link anything in the logs to these events, nor connect it to a specific error.
We also have two DATETIME fields that sometimes get incorrect timestamps; other times the timestamp comes in blank (0000-00-00 00:00:00).
The issue seems to have come out of nowhere. Has anyone experienced anything like this?
CREATE TABLE media (
id int(11) NOT NULL AUTO_INCREMENT,
content_id int(11) DEFAULT NULL,
type enum('image','video','pdf','link') COLLATE utf8_unicode_ci DEFAULT NULL,
title varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
url varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
created_at datetime NOT NULL,
updated_at datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=132 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I have found myself looking after an old TestLink installation; all the people responsible have left, and it has been years since I did any serious SQL work.
The underlying database is MySQL version 5.5.24-0ubuntu0.12.04.1.
I do not have all the passwords, but I have enough rights to do a backup without locks:
mysqldump --all-databases --single-transaction -u testlink -p --result-file=dump2.sql
I really do not want to have to attempt to restore the data!
We need to increase the length of the name field in TestLink; various pages lead me to increasing the length of a field in the nodes_hierarchy table.
The backup yielded this:
CREATE TABLE `nodes_hierarchy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`node_type_id` int(10) unsigned NOT NULL DEFAULT '1',
`node_order` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pid_m_nodeorder` (`parent_id`,`node_order`)
) ENGINE=MyISAM AUTO_INCREMENT=184284 DEFAULT CHARSET=utf8;
I really have only one chance to get this right and cannot lose any data. Does this look exactly right?
ALTER TABLE nodes_hierarchy MODIFY name VARCHAR(150) DEFAULT NULL;
That is the correct syntax.
Backup
You should back up the database regardless of how safe this operation is. It seems like you are already planning on it. It is unlikely you will have problems; the backup is just an insurance policy against unlikely occurrences.
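If you want an extra copy of just this table before the change, something like this should work (a sketch: I am assuming the schema is named testlink; since the table is MyISAM, --single-transaction gives no consistency guarantee here, and the default table locking is what protects the dump):

mysqldump -u testlink -p --result-file=nodes_hierarchy_backup.sql testlink nodes_hierarchy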
Test table
You seem to have ~200K records. I'd recommend you make a copy of this table by just doing:
CREATE TABLE `test_nodes_hierarchy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`node_type_id` int(10) unsigned NOT NULL DEFAULT '1',
`node_order` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `test_pid_m_nodeorder` (`parent_id`,`node_order`)
) ENGINE=MyISAM AUTO_INCREMENT=184284 DEFAULT CHARSET=utf8;
Populate test table
Populate the test table with:
insert into test_nodes_hierarchy
select *
from nodes_hierarchy;
Run the ALTER statement on the test table
Find out how long the ALTER statement will take on the test table.
ALTER TABLE test_nodes_hierarchy
MODIFY name VARCHAR(150) DEFAULT NULL;
Rename test table
Practice renaming the test table using:
RENAME TABLE test_nodes_hierarchy TO test2_nodes_hierarchy;
Once you know the time it takes, you know what to expect on the main table. If something goes awry, you can drop the nodes_hierarchy table and rename the test_nodes_hierarchy table into its place, as sketched below.
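A minimal sketch of that fallback swap, assuming nodes_hierarchy_old as a holding name (RENAME TABLE performs both renames in one atomic statement):

RENAME TABLE nodes_hierarchy TO nodes_hierarchy_old,
             test_nodes_hierarchy TO nodes_hierarchy;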
That'll just build confidence around the operation.
I have a MySQL query
SELECT * FROM table WHERE INET_ATON("10.0.0.1") BETWEEN INET_ATON(s_ip) AND INET_ATON(e_ip);
Here "10.0.0.1" comes dynamically when a user visits the website and s_ip is the starting ip address column which would probably have "10.0.0.0" as starting ip address range and e_ip is the ending IP address.
Now, the problem is I have almost ~350K records which do only one thing when this query is executed and that is to get me the country code of the visitor.
When this query is executed MySQL peaks CPU consumption at 1100% and multiply that by 1000 requests/minute and my server just cannot handle it.
My server is running CentOS 7 with 100 GB of RAM and 24 Cores clocked at 3.0 GHz but still the performance is becoming a nightmare for me to handle.
I was thinking of outsourcing this functionality to a third-party service, but I first want to make sure that nothing can be done on my side to fix this issue.
(From Comments)
CREATE TABLE `ip` (
ip_ip varbinary(16) NOT NULL,
ip_last_request_time timestamp(3) NULL DEFAULT NULL,
ip_min_timeSpan_get smallint(5) unsigned NOT NULL,
ip_min_timeSpan_post smallint(5) unsigned NOT NULL,
ip_violationsCount_get smallint(5) unsigned NOT NULL,
ip_violationsCount_post smallint(5) unsigned NOT NULL,
ip_maxViolations_get smallint(5) unsigned NOT NULL,
ip_maxViolations_post smallint(5) unsigned NOT NULL,
ip_bannedAt timestamp(3) NULL DEFAULT NULL,
ip_banSeconds mediumint(8) unsigned NOT NULL DEFAULT '300',
ip_isCapatchaResolved tinyint(1) NOT NULL DEFAULT '0',
ip_isManualBanned tinyint(1) NOT NULL DEFAULT '0',
ip_city varchar(45) DEFAULT '',
ip_region varchar(45) DEFAULT '',
ip_regionCode varchar(5) DEFAULT '',
ip_regionName varchar(45) DEFAULT '',
ip_countryCode varchar(3) DEFAULT '',
ip_countryName varchar(45) DEFAULT '',
ip_continentCode varchar(3) DEFAULT '',
ip_continentName varchar(45) DEFAULT '',
ip_timezone varchar(45) DEFAULT '',
ip_currencyCode varchar(4) DEFAULT '',
ip_currencySymbol_UTF8 varchar(5) DEFAULT '',
PRIMARY KEY (ip_ip),
KEY countryCode_index (ip_countryCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `country` (
co_id char(2) COLLATE utf8mb4_unicode_ci NOT NULL,
co_re_id smallint(6) DEFAULT NULL,
co_flag_id char(4) COLLATE utf8mb4_unicode_ci NOT NULL,
co_english_name varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (co_id),
KEY fk_country_region1_idx (co_re_id),
CONSTRAINT fk_country_region1 FOREIGN KEY (co_re_id)
REFERENCES region (re_id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Currently you're doing a full table scan for every query. There are a couple of things you can try.
Store INET_ATON(s_ip) in the table so it's not computed during the query. Same for e_ip.
Add an index that has these two new columns, and the country code.
Change the query to select only the country code, and use the two new columns.
Use EXPLAIN to make sure the DB uses the index for the query.
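Putting those steps together, a minimal sketch (geo, s_ip_num, e_ip_num, and country_code are hypothetical names; the point is that the numeric values are computed once, at load time, instead of on every request):

ALTER TABLE geo
  ADD COLUMN s_ip_num INT UNSIGNED,
  ADD COLUMN e_ip_num INT UNSIGNED;

UPDATE geo SET s_ip_num = INET_ATON(s_ip), e_ip_num = INET_ATON(e_ip);

ALTER TABLE geo ADD INDEX idx_range (s_ip_num, e_ip_num, country_code);

SELECT country_code
FROM geo
WHERE INET_ATON('10.0.0.1') BETWEEN s_ip_num AND e_ip_num;

Note that a BETWEEN over two different columns can still examine many rows even with the index, which is what the next answer addresses.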
The optimizer does not know that you have a set of non-overlapping ranges that it could exploit, so you have to work harder to optimize the queries.
Instead of doing table scans, the code described here will do typical queries 'instantly'.
To put it bluntly, you cannot optimize the query without restructuring the data. I'm speaking also to all who have provided Answers and Comments.
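The core of that restructuring is to store one row per non-overlapping range, keyed by its numeric start, and fetch the single candidate row instead of scanning. A minimal sketch (ip_ranges, ip_start, ip_end, and country_code are hypothetical names; ip_start must be the primary key or otherwise indexed):

SELECT country_code, ip_end
FROM ip_ranges
WHERE ip_start <= INET_ATON('10.0.0.1')
ORDER BY ip_start DESC
LIMIT 1;

Because the ranges do not overlap, the row with the largest ip_start not exceeding the address is the only possible match; checking the returned ip_end against the address catches gaps between ranges. With the index on ip_start this reads one row per lookup.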
(critique of schema)
ip is awfully bulky. Suggest moving city and all the fields after it to another table in order to 'normalize' that data.
It is 'wrong' to have both a ..code and ..name in the same table (except for the normalization table).
Several fields can (and should) be ascii, not utf8mb4. Example: countryCode.
On another topic... How will you handle AOL IP addresses? As I understand it, these are shared among its customers. That is, a "violator" will move around, tainting all of the AOL IPs.
10.x, 172.16.x through 172.31.x, and 192.168.x addresses all come from behind a NAT and cannot be associated with a given country, nor with a given computer.
I have a legacy Access front end connected to a MySQL database. The legacy app has a lot of dangerous macros assigned to onclose triggers. I also have a web application under development running on the same database. A couple of modules in the web app are already in production use. My testing is being done on a separate development machine with a separate, dedicated development version of the database.
A new module I'm installing into my web app comes with its own set of tables. It will happily exist in the same database but wants its own copy of the data in its own tables. I hesitate to extensively modify the new tables or the code base for that module.
There are a total of 6 tables that hold similar data for different objects in the legacy database. I am only working on the 2 most important of those tables now. The below represents only a very small subset of the columns in these 2 tables.
CREATE TABLE IF NOT EXISTS `agent` (
`age_id` int(11) NOT NULL AUTO_INCREMENT,
`age_agent_email_address` varchar(255) DEFAULT NULL,
`age_welcome_email_sent_y_or_n` varchar(255) DEFAULT 'No',
`age_status` varchar(255) DEFAULT 'Active',
PRIMARY KEY (`age_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=1854 ;
And
CREATE TABLE IF NOT EXISTS `prospecting_contacts` (
`psp_prospect_id` varchar(255) NOT NULL DEFAULT '',
`psp_prospecting_status` varchar(255) DEFAULT 'Active',
`psp_prospect_email_address` varchar(255) DEFAULT NULL,
`psp_remove_from_email_marketing` varchar(255) DEFAULT 'No',
`psp_id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`psp_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=2050793 ;
There are several related tables that came with the new module. I believe only one of them needs to be updated.
CREATE TABLE IF NOT EXISTS `phplist_user_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) CHARACTER SET latin1 NOT NULL,
`confirmed` tinyint(4) DEFAULT '0',
`blacklisted` tinyint(4) DEFAULT '0',
`bouncecount` int(11) DEFAULT '0',
`entered` datetime DEFAULT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`uniqid` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`htmlemail` tinyint(4) DEFAULT '0',
`subscribepage` int(11) DEFAULT NULL,
`rssfrequency` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`password` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`passwordchanged` date DEFAULT NULL,
`disabled` tinyint(4) DEFAULT '0',
`extradata` text CHARACTER SET latin1,
`foreignkey` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`optedin` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `foreignkey` (`foreignkey`),
KEY `idx_phplist_user_user_uniqid` (`uniqid`),
KEY `emailidx` (`email`),
KEY `enteredindex` (`entered`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
The phplist_user_user table would include data that is the result of this query:
SELECT `age_agent_email_address` AS `email` FROM `agent`
WHERE `age_status` = 'Active'
UNION DISTINCT
SELECT `psp_prospect_email_address` FROM `prospecting_contacts`
WHERE `psp_prospecting_status` = 'Active'
The legacy Access application updates the agent and prospecting_contacts tables. The new module updates the phplist_user_user table. I believe I can copy the information back and forth using TRIGGERs, but I'm looking for a way that doesn't duplicate data.
I had thought of CREATE VIEW, but the MySQL manual says that unions and joins break a view's updatability: http://dev.mysql.com/doc/refman/5.1/en/view-updatability.html
So, is there a way to update these 3 tables without duplicating data? Or should I just duplicate the email addresses and use TRIGGERs on INSERT and UPDATE?
You might be able to do something clever with foreign keys though they are more attuned to keeping tables consistent rather than preventing duplicates. http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
It may seem counter-intuitive, but another solution would be to maintain a lookup table that indicates where a specific value can be found. You could join it with all three of the (sub)tables to prevent duplicates.
A trigger would work too.
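For the trigger route, a minimal sketch of the INSERT side for the agent table (the trigger name is hypothetical; matching UPDATE triggers and a companion trigger on prospecting_contacts would follow the same pattern):

DELIMITER //
CREATE TRIGGER agent_after_insert
AFTER INSERT ON agent
FOR EACH ROW
BEGIN
  -- mirror newly added active agents into the phpList subscriber table
  IF NEW.age_status = 'Active' AND NEW.age_agent_email_address IS NOT NULL THEN
    INSERT INTO phplist_user_user (email, entered)
    VALUES (NEW.age_agent_email_address, NOW())
    ON DUPLICATE KEY UPDATE modified = CURRENT_TIMESTAMP;
  END IF;
END//
DELIMITER ;

The UNIQUE KEY on phplist_user_user.email is what lets ON DUPLICATE KEY UPDATE absorb addresses that already exist in the other table.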
The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can do to make the query go faster? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
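A quick way to test that before touching the server configuration (the 4 MB figure is an arbitrary example; sort_buffer_size is allocated per session, so keep it modest):

SET SESSION sort_buffer_size = 4 * 1024 * 1024;
-- re-run the query, then check whether the sort still spilled to disk:
SHOW SESSION STATUS LIKE 'Sort_merge_passes';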
Third strategy: if your application runs this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table. Perhaps use an EVENT to remake the summary table, running it once every few minutes. If you're going to follow that strategy, start by creating a view with this query in it and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view, as sketched below. That way your data model work and your application design work can proceed independently of each other.
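A minimal sketch of that view-first approach (lib_summary is a hypothetical name; the view simply wraps the query above):

CREATE VIEW lib_summary AS
SELECT lib_name, description, COUNT(seq_id) AS seq_count, FLOOR(AVG(size)) AS avg_size
FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
WHERE s.is_contig = 0 AND foreign_seqs = 0
GROUP BY lib_name;

-- later, materialize it: build the table, then swap it in under the same name
CREATE TABLE lib_summary_new AS SELECT * FROM lib_summary;
DROP VIEW lib_summary;
RENAME TABLE lib_summary_new TO lib_summary;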
Final suggestion: if this is truly slowly-changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.