UNION 2 tables, INNER JOIN on a third, with 2 legacy applications - MySQL

I have a legacy Access front end connected to a MySQL database. The legacy app has a lot of dangerous macros assigned to OnClose triggers. I also have a web application under development running against the same database. A couple of modules in the web app are already in production use. My testing is done on a separate development machine with its own dedicated development copy of the database.
A new module I'm installing into my web app comes with its own set of tables. It will happily exist in the same database but wants its own copy of the data in its own tables. I hesitate to extensively modify the new tables or that module's code base.
There are a total of 6 tables that hold similar data for different objects in the legacy database. I am only working on the 2 most important of those tables now. The below represents only a very small subset of the columns in these 2 tables.
CREATE TABLE IF NOT EXISTS `agent` (
`age_id` int(11) NOT NULL AUTO_INCREMENT,
`age_agent_email_address` varchar(255) DEFAULT NULL,
`age_welcome_email_sent_y_or_n` varchar(255) DEFAULT 'No',
`age_status` varchar(255) DEFAULT 'Active',
PRIMARY KEY (`age_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=1854 ;
And
CREATE TABLE IF NOT EXISTS `prospecting_contacts` (
`psp_prospect_id` varchar(255) NOT NULL DEFAULT '',
`psp_prospecting_status` varchar(255) DEFAULT 'Active',
`psp_prospect_email_address` varchar(255) DEFAULT NULL,
`psp_remove_from_email_marketing` varchar(255) DEFAULT 'No',
`psp_id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`psp_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC AUTO_INCREMENT=2050793 ;
There are several related tables that came with the new module. I believe only one of them needs to be updated.
CREATE TABLE IF NOT EXISTS `phplist_user_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) CHARACTER SET latin1 NOT NULL,
`confirmed` tinyint(4) DEFAULT '0',
`blacklisted` tinyint(4) DEFAULT '0',
`bouncecount` int(11) DEFAULT '0',
`entered` datetime DEFAULT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`uniqid` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`htmlemail` tinyint(4) DEFAULT '0',
`subscribepage` int(11) DEFAULT NULL,
`rssfrequency` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`password` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`passwordchanged` date DEFAULT NULL,
`disabled` tinyint(4) DEFAULT '0',
`extradata` text CHARACTER SET latin1,
`foreignkey` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
`optedin` tinyint(4) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`),
KEY `foreignkey` (`foreignkey`),
KEY `idx_phplist_user_user_uniqid` (`uniqid`),
KEY `emailidx` (`email`),
KEY `enteredindex` (`entered`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=11 ;
The phplist_user_user table should contain the data produced by this query:
SELECT `age_agent_email_address` AS `email` FROM `agent`
WHERE `age_status` = 'Active'
UNION DISTINCT
SELECT `psp_prospect_email_address` FROM `prospecting_contacts`
WHERE `psp_prospecting_status` = 'Active'
The legacy Access application updates the agent and prospecting_contacts tables. The new module updates the phplist_user_user table. I believe I can copy the information back and forth using TRIGGERs, but I'm looking for a way that doesn't duplicate data.
I had thought of CREATE VIEW, but the MySQL manual says that unions and joins break a view's updatability. http://dev.mysql.com/doc/refman/5.1/en/view-updatability.html
So, is there a way to update these 3 tables without duplicating data? Or should I just duplicate the email addresses and use TRIGGERs on INSERT and UPDATE?

You might be able to do something clever with foreign keys, though they are more attuned to keeping tables consistent than to preventing duplicates. http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
It may seem counter-intuitive, but another solution would be to maintain a lookup table that indicates where a specific value can be found. You could join it with all three of the (sub)tables to prevent duplicates, as in the sketch below.
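A minimal sketch of that idea (the table and column names here are made up, not part of your schemas): every application inserts an address into the directory first, and its PRIMARY KEY is what actually rejects a second copy, no matter which of the three tables the address lives in.
-- Hypothetical directory table; names are illustrative only.
CREATE TABLE email_directory (
  email varchar(255) NOT NULL,
  source_table enum('agent','prospecting_contacts','phplist_user_user') NOT NULL,
  source_id int(11) NOT NULL,
  PRIMARY KEY (email)
) ENGINE=InnoDB;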
A trigger would work too.
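For the trigger route, a minimal sketch for one direction, mirroring newly inserted active agents into the phpList table (the trigger name is arbitrary, and the prospecting_contacts side would need a matching twin):
DELIMITER //
CREATE TRIGGER agent_after_insert AFTER INSERT ON agent
FOR EACH ROW
BEGIN
  -- INSERT IGNORE plus the UNIQUE KEY on phplist_user_user.email
  -- keeps the mirrored copy free of duplicate addresses.
  IF NEW.age_status = 'Active' AND NEW.age_agent_email_address IS NOT NULL THEN
    INSERT IGNORE INTO phplist_user_user (email, entered)
    VALUES (NEW.age_agent_email_address, NOW());
  END IF;
END//
DELIMITER ;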

Related

MySQL performance issues while using INET_ATON

I have a MySQL query
SELECT * FROM table WHERE INET_ATON("10.0.0.1") BETWEEN INET_ATON(s_ip) AND INET_ATON(e_ip);
Here "10.0.0.1" comes dynamically when a user visits the website and s_ip is the starting ip address column which would probably have "10.0.0.0" as starting ip address range and e_ip is the ending IP address.
Now, the problem is I have almost ~350K records which do only one thing when this query is executed and that is to get me the country code of the visitor.
When this query is executed MySQL peaks CPU consumption at 1100% and multiply that by 1000 requests/minute and my server just cannot handle it.
My server is running CentOS 7 with 100 GB of RAM and 24 Cores clocked at 3.0 GHz but still the performance is becoming a nightmare for me to handle.
I was thinking of outsourcing this functionality to third party service but I just want to make sure that nothing can be done from my side to fix this issue.
(From Comments)
CREATE TABLE `ip` (
ip_ip varbinary(16) NOT NULL,
ip_last_request_time timestamp(3) NULL DEFAULT NULL,
ip_min_timeSpan_get smallint(5) unsigned NOT NULL,
ip_min_timeSpan_post smallint(5) unsigned NOT NULL,
ip_violationsCount_get smallint(5) unsigned NOT NULL,
ip_violationsCount_post smallint(5) unsigned NOT NULL,
ip_maxViolations_get smallint(5) unsigned NOT NULL,
ip_maxViolations_post smallint(5) unsigned NOT NULL,
ip_bannedAt timestamp(3) NULL DEFAULT NULL,
ip_banSeconds mediumint(8) unsigned NOT NULL DEFAULT '300',
ip_isCapatchaResolved tinyint(1) NOT NULL DEFAULT '0',
ip_isManualBanned tinyint(1) NOT NULL DEFAULT '0',
ip_city varchar(45) DEFAULT '',
ip_region varchar(45) DEFAULT '',
ip_regionCode varchar(5) DEFAULT '',
ip_regionName varchar(45) DEFAULT '',
ip_countryCode varchar(3) DEFAULT '',
ip_countryName varchar(45) DEFAULT '',
ip_continentCode varchar(3) DEFAULT '',
ip_continentName varchar(45) DEFAULT '',
ip_timezone varchar(45) DEFAULT '',
ip_currencyCode varchar(4) DEFAULT '',
ip_currencySymbol_UTF8 varchar(5) DEFAULT '',
PRIMARY KEY (ip_ip),
KEY countryCode_index (ip_countryCode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `country` (
co_id char(2) COLLATE utf8mb4_unicode_ci NOT NULL,
co_re_id smallint(6) DEFAULT NULL,
co_flag_id char(4) COLLATE utf8mb4_unicode_ci NOT NULL,
co_english_name varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (co_id),
KEY fk_country_region1_idx (co_re_id),
CONSTRAINT fk_country_region1 FOREIGN KEY (co_re_id)
REFERENCES region (re_id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Currently you're doing a full table scan for every query. There are a couple of things you can try.
Store INET_ATON(s_ip) in the table so it's not computed during the query. Same for e_ip.
Add an index that has these two new columns, and the country code.
Change the query to select only the country code, and use the two new columns.
Use EXPLAIN to make sure the DB uses the index for the query.
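A sketch of those steps. The range table itself is not shown in the question, so ip_ranges and its country_code column are assumed names:
-- Precompute the numeric forms once, instead of on every query:
ALTER TABLE ip_ranges
  ADD COLUMN s_ip_num INT UNSIGNED,
  ADD COLUMN e_ip_num INT UNSIGNED;

UPDATE ip_ranges
SET s_ip_num = INET_ATON(s_ip),
    e_ip_num = INET_ATON(e_ip);

-- Covering index so the lookup never touches the full rows:
ALTER TABLE ip_ranges
  ADD INDEX idx_range (s_ip_num, e_ip_num, country_code);

-- Select only what you need:
SELECT country_code
FROM ip_ranges
WHERE INET_ATON('10.0.0.1') BETWEEN s_ip_num AND e_ip_num;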
The optimizer does not know that you have a set of non-overlapping ranges it could exploit, so you have to work harder to optimize the queries.
Instead of doing table scans, the code described here will do typical queries 'instantly'.
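The essence of that technique, sketched with the same assumed columns as above: because the ranges do not overlap, you can fetch the single row whose start is closest below the target address and then verify its end, turning the scan into one index probe.
SELECT country_code
FROM (
  -- One index dive down (s_ip_num, ...) finds the only candidate range.
  SELECT country_code, e_ip_num
  FROM ip_ranges
  WHERE s_ip_num <= INET_ATON('10.0.0.1')
  ORDER BY s_ip_num DESC
  LIMIT 1
) AS candidate
WHERE e_ip_num >= INET_ATON('10.0.0.1');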
To put it bluntly, you cannot optimize the query without restructuring the data. I'm speaking also to all who have provided Answers and Comments.
(critique of schema)
ip is awfully bulky. Suggest moving city and all the fields after it to another table in order to 'normalize' that data.
It is 'wrong' to have both a ..code and ..name in the same table (except for the normalization table).
Several fields can (and should) be ascii, not utf8mb4. Example: countryCode.
On another topic... How will you handle AOL IP addresses? As I understand it, these are shared among its customers. That is, a "violator" will move around, tainting all of the AOL IPs.
10.x, 172.16.x, and 192.168.x addresses all come from behind a NAT, and cannot be associated with a given country, nor a given computer.

MySQL Workbench won't make a constraint when the parent table has generated virtual columns

SETUP
MySQL Workbench (ver 6.3.9)
MySQL 5.7.21
My setup is simple. I have 2 tables:
CREATE TABLE `UserDevices` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`UserID` int(11) DEFAULT NULL,
`UUID` binary(16) DEFAULT NULL,
`DeviceName` varchar(45) DEFAULT NULL,
`DeviceType` tinyint(3) NOT NULL DEFAULT '1',
`CreatedDate` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`TimeStamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `UserInfo` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`UUID` binary(16) DEFAULT NULL,
`UUIDText` varchar(40) GENERATED ALWAYS AS (insert(insert(insert(insert(hex(`UUID`),9,0,'-'),14,0,'-'),19,0,'-'),24,0,'-')) VIRTUAL,
`FirstName` varchar(45) DEFAULT NULL,
`LastName` varchar(45) DEFAULT NULL,
`FullName` varchar(90) GENERATED ALWAYS AS (concat(`FirstName`,' ',`LastName`)) VIRTUAL,
`Email` varchar(120) DEFAULT NULL,
`Status` tinyint(3) DEFAULT '0',
`AccountType` tinyint(3) DEFAULT '1',
`CreatedDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`TimeStamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
PROBLEM
Working inside Workbench, I'm simply trying to make a foreign key constraint in table "UserDevices" on column "UserID", pointed at table "UserInfo" column "ID". When selecting "UserInfo" as the referenced table, I cannot put a check next to UserID. Also, no columns show up in the drop-down list under Referenced Column.
QUESTION
I understand there are a number of reasons this scenario could happen, but I'm not seeing any data type mismatch or anything similar that would explain it. What is making it so I can't select UserInfo.ID as the referenced column?
P.S. Setting up another table named "DeviceMeasurements" with a column "DeviceID", I was completely successful at setting up the constraint exactly as expected.
UPDATE
On a hunch, since this is my first time playing with generated virtual columns, I went into the table and removed the columns "UUIDText" and "FullName". NOW I can build my constraints as desired. But my question stands: why can't I build the constraint with the tables built as above!?
UPDATE 2
This has been confirmed as a bug in Workbench. Manually adding the constraint via SQL is a valid workaround for now. Please see the accepted answer.
Can confirm, this is a bug in WB. Have raised it with MySQL dev team.
Bug link
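For reference, the manual workaround is an ordinary ALTER TABLE against the two schemas above (the constraint name is arbitrary):
ALTER TABLE UserDevices
  ADD CONSTRAINT fk_UserDevices_UserInfo
  FOREIGN KEY (UserID) REFERENCES UserInfo (ID);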

MySQL: updating and reading the same row frequently

I made an app in which polls are sent to users via push notifications, and they have a short time to answer. We now have a deal with a news agency, and chances are that up to 100,000 people will answer the polls sent by this company in a short period of time (5 minutes, for example).
I have a MySQL database stored on Amazon RDS. Polls are stored in an innodb table:
CREATE TABLE `polls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`categoryId` int(11) NOT NULL,
`question` text CHARACTER SET utf8 NOT NULL,
`expiresAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sentAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` int(11) NOT NULL,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
`text1` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`text2` varchar(30) CHARACTER SET utf8 DEFAULT NULL,
`special` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
When people start voting, we increment the value of opt1 or opt2 by 1. For example, if someone votes for option 1:
UPDATE polls SET opt1 = opt1 + 1 WHERE id = 4644;
How can I configure MySQL to ensure it can support this load of traffic? I tried to go through the official docs but I cannot find a clear overview of the steps I should take. Obviously I can buy a better database on AWS, but I want to be sure I am not making a mistake on scalability here.
By the way, all select queries (when people just read the polls) are sent to a replicated database on AWS.
Many thanks for your help, please ask for more information if I forgot something.
I'd create a separate table for the poll results in order to have rows as short as possible for the update statement to work with.
CREATE TABLE `pollResults` (
`pollId` int(11) NOT NULL AUTO_INCREMENT,
`opt1` int(11) DEFAULT '0',
`opt2` int(11) DEFAULT '0',
PRIMARY KEY (`pollId`)
) ENGINE=InnoDB AUTO_INCREMENT=3284 DEFAULT CHARSET=latin1;
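The hot-path vote update then touches only this narrow row:
UPDATE pollResults SET opt1 = opt1 + 1 WHERE pollId = 4644;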
In your polls table, I would put all the text columns at the end of the table, but this might not be a big deal.

How do fields not selected in a MySQL query affect query speed for the fields I am selecting?

This is a theoretical question based on an application I have. I am wondering if there is some technical insight to be gained beyond just speed tests on my system.
I have the following two tables:
CREATE TABLE `files` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL DEFAULT '',
`processed` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_processed` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
and...
CREATE TABLE `file_metas` (
`file_id` int(10) unsigned NOT NULL,
`title` varchar(255) NOT NULL DEFAULT '',
`description` varchar(1000) NOT NULL DEFAULT '',
`keywords` varchar(1000) NOT NULL DEFAULT '',
PRIMARY KEY (`file_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The file_metas data consists of long text strings about each file in the files table. Each file has only one entry in file_metas, so these two tables could be combined.
I'm wondering what effect adding the long text fields to the files table would have on the performance of SELECT statements against the files table when I'm not selecting title, description, or keywords. I'm curious about the technical details: does simply having the text fields in the table slow queries that don't involve those fields? How does this work in general with MySQL MyISAM tables? Is there any good reason to keep the file_metas data in a separate table?

Is there a better index to speed up this query?

The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id  select_type  table  type    possible_keys    key      key_len  ref       rows   Extra
1   SIMPLE       s      ref     libseq,contigs   contigs  4        const     28447  Using temporary; Using filesort
1   SIMPLE       l      eq_ref  PRIMARY          PRIMARY  4        s.lib_id  1      Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can make to speed up this query? If not, when (for a web application) is it worth putting the results of a query like this into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs; I don't know if that will help, but it's worth a try.
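That is a one-line experiment (the index name is arbitrary):
ALTER TABLE libraries ADD INDEX idx_foreign_seqs (foreign_seqs);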
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
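For example (the 4 MB value is only illustrative; the buffer is allocated per connection that sorts, so increase it cautiously):
SET GLOBAL sort_buffer_size = 4 * 1024 * 1024;  -- applies to new connections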
Third strategy: if your application runs this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table. Perhaps use an EVENT to remake the summary table every few minutes. If you're going to follow that strategy, start by creating a view with this query in it and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
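A sketch of the summary-table-plus-EVENT approach (the table and event names are made up, and the event scheduler must be enabled):
SET GLOBAL event_scheduler = ON;

-- Materialize the query's result once:
CREATE TABLE lib_summary AS
SELECT lib_name, description, COUNT(seq_id) AS seq_count, FLOOR(AVG(size)) AS avg_size
FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
WHERE s.is_contig = 0 AND foreign_seqs = 0
GROUP BY lib_name;

DELIMITER //
CREATE EVENT refresh_lib_summary
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  -- Rebuild from scratch each run; fine for a small summary table.
  TRUNCATE lib_summary;
  INSERT INTO lib_summary
  SELECT lib_name, description, COUNT(seq_id), FLOOR(AVG(size))
  FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
  WHERE s.is_contig = 0 AND foreign_seqs = 0
  GROUP BY lib_name;
END//
DELIMITER ;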
Final suggestion: if this is truly slowly-changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.