Proper index (or removal) to optimize a large data set table - MySQL

We have a 'visitor' tracking schema that, when pushed, seems to put some strain on the DB server.
The VISITORS table identifies unique users by a HASH (currently 310,000 records). A search is performed on the hash, and if it is not found, a row is added. The ID is needed for the following two tables.
CREATE TABLE visitors (
id int(10) UNSIGNED NOT NULL auto_increment,
ip varchar(25) NOT NULL,
hash varchar(64) NOT NULL,
first_visit varchar(32) NOT NULL,
created_at datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE visitors ADD UNIQUE INDEX (hash);
ALTER TABLE visitors ADD INDEX (created_at);
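The search-then-insert step described for VISITORS can also be collapsed into one statement. The following is only a sketch: it assumes MySQL's documented LAST_INSERT_ID(expr) idiom and the unique index on hash, and the literal values are placeholders rather than real data.

```sql
-- Insert the visitor if the hash is new; otherwise leave the existing row as is.
-- The UNIQUE index on visitors.hash is what triggers the ON DUPLICATE KEY branch.
INSERT INTO visitors (ip, hash, first_visit, created_at)
VALUES ('203.0.113.7', 'placeholder-hash', 'placeholder', NOW())
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- Returns the id of the new row, or of the existing row with that hash.
SELECT LAST_INSERT_ID();
```

This saves one round trip per request. On InnoDB it can leave gaps in the auto-increment sequence, which is usually harmless for a surrogate key.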
The VISITOR_VISITS table records when a user visited, but only when we can identify a referral source (current count 142,000). A search is performed on visitor_id, type and visit_date. If nothing is found, a row is added. The ID is used in the following table.
CREATE TABLE visitor_visits (
id int(10) UNSIGNED NOT NULL auto_increment,
visitor_id int(10) UNSIGNED NOT NULL,
source varchar(64) NULL DEFAULT NULL,
medium varchar(64) NULL DEFAULT NULL,
campaign varchar(256) NULL DEFAULT NULL,
page varchar(32) NULL DEFAULT NULL,
landing varchar(32) NULL DEFAULT NULL,
type enum('fundraiser_view') NULL DEFAULT NULL,
visit_date date NOT NULL default '0000-00-00',
created_at datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE visitor_visits ADD UNIQUE INDEX (visitor_id,type,visit_date);
ALTER TABLE visitor_visits ADD CONSTRAINT FK_visits_visitor_id FOREIGN KEY (visitor_id) REFERENCES visitors(id);
The PAGE_VIEWS table logs individual page views (not all pages, just the pages we are tracking). A row can be linked to a visitor and can reference a visitor_visit (current count 2.4 million; it is higher because we started micro-visitor logging after we started logging individual pages). An INSERT ... ON DUPLICATE KEY UPDATE query adds the record based on the view_date for the identified user. Since the ID is not needed, a separate lookup query isn't required.
CREATE TABLE page_views (
id int(10) UNSIGNED NOT NULL auto_increment,
page_id int(10) UNSIGNED NOT NULL,
current_donations decimal(10,2) NOT NULL DEFAULT 0,
ip varchar(25) NOT NULL,
hash varchar(32) NOT NULL,
visitor_id int(10) UNSIGNED NULL DEFAULT NULL,
visitor_visit_id int(10) UNSIGNED NULL DEFAULT NULL,
page_views int(10) UNSIGNED NOT NULL DEFAULT 0,
widget_views int(10) UNSIGNED NOT NULL DEFAULT 0,
view_date date NOT NULL,
viewed_at datetime NOT NULL default '0000-00-00 00:00:00',
created_at datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE page_views ADD UNIQUE INDEX (page_id,view_date,visitor_id,hash);
ALTER TABLE page_views ADD INDEX (visitor_id);
ALTER TABLE page_views ADD INDEX (visitor_visit_id);
ALTER TABLE page_views ADD CONSTRAINT FK_page_views_page_id FOREIGN KEY (page_id) REFERENCES pages(id);
ALTER TABLE page_views ADD CONSTRAINT FK_page_views_visitor_id FOREIGN KEY (visitor_id) REFERENCES visitors(id);
ALTER TABLE page_views ADD CONSTRAINT FK_page_views_visit_id FOREIGN KEY (visitor_visit_id) REFERENCES visitor_visits(id);
Last week, our site got an influx of people due to a news article, and this visitor identification really bottlenecked performance. I am wondering if there is an obvious optimization in there. Could it be the foreign key constraints? Over-indexing? A need for better indexing?

Try this:
1) An index on a varchar column doesn't improve performance much.
2) Try partitioning the table on a date range.

You didn't tell us what is bottlenecking your database, so I'm just guessing it's InnoDB concurrent writes. If that isn't the case and the problem is only with SELECTs (which I doubt), you should show us the exact queries. You could try to reduce the write performance hit by creating a staging table and then bulk-moving rows from it into the main table:
CREATE TABLE page_views_tmp (
id int(10) UNSIGNED NOT NULL auto_increment,
page_id int(10) UNSIGNED NOT NULL,
current_donations decimal(10,2) NOT NULL DEFAULT 0,
ip varchar(25) NOT NULL,
hash varchar(32) NOT NULL,
visitor_id int(10) UNSIGNED NULL DEFAULT NULL,
visitor_visit_id int(10) UNSIGNED NULL DEFAULT NULL,
page_views int(10) UNSIGNED NOT NULL DEFAULT 0,
widget_views int(10) UNSIGNED NOT NULL DEFAULT 0,
view_date date NOT NULL,
viewed_at datetime NOT NULL default '0000-00-00 00:00:00',
created_at datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (id)
) ENGINE=MEMORY DEFAULT CHARSET=utf8;
Then, every couple of seconds, or once this table has accumulated a considerable number of rows:
START TRANSACTION;
INSERT INTO page_views SELECT * FROM page_views_tmp;
DELETE FROM page_views_tmp;
COMMIT;
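A slightly more defensive variant of the move is sketched below. It is not the answerer's statement: it assumes that both tables keep their own auto_increment id (so id is left out of the copy), that colliding rows should have their counters added together, and that rows may keep arriving in the staging table while the move runs.

```sql
-- Remember the highest staged id so rows that arrive during the move are kept
-- for the next batch (MEMORY tables do not take part in transactions).
SET @max_id = (SELECT MAX(id) FROM page_views_tmp);

START TRANSACTION;
INSERT INTO page_views
  (page_id, current_donations, ip, hash, visitor_id, visitor_visit_id,
   page_views, widget_views, view_date, viewed_at, created_at)
SELECT page_id, current_donations, ip, hash, visitor_id, visitor_visit_id,
       page_views, widget_views, view_date, viewed_at, created_at
FROM page_views_tmp
WHERE id <= @max_id
-- Assumption: on a unique-key collision the counters are summed.
ON DUPLICATE KEY UPDATE
  page_views   = page_views.page_views + VALUES(page_views),
  widget_views = page_views.widget_views + VALUES(widget_views),
  viewed_at    = VALUES(viewed_at);
COMMIT;

-- Remove only the rows that were copied, and only after the commit succeeded.
DELETE FROM page_views_tmp WHERE id <= @max_id;
```

If the staging table is empty, @max_id is NULL and both statements affect no rows.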

Related

Partitioning large table by dates

I have implemented custom url shortener in my app and I have one table for that. table structure looks like this:
CREATE TABLE `urls` (
`id` int(11) NOT NULL,
`url_id` varchar(10) DEFAULT NULL,
`long_url` varchar(255) DEFAULT NULL,
`clicked` mediumint(5) NOT NULL DEFAULT 0,
`user_id` varchar(7) DEFAULT NULL,
`type` varchar(15) DEFAULT NULL,
`ad_id` int(11) DEFAULT NULL,
`campaign` int(11) DEFAULT NULL,
`increment` tinyint(1) NOT NULL DEFAULT 0,
`date` date DEFAULT NULL,
`del` enum('1','0') NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
ALTER TABLE `urls`
ADD PRIMARY KEY (`id`),
ADD KEY `url_id` (`url_id`),
ADD KEY `type` (`type`),
ADD KEY `campaign` (`campaign`),
ADD KEY `ad_id` (`ad_id`),
ADD KEY `date` (`date`),
ADD KEY `user_id` (`user_id`);
The table now has 20,000,000 records and is currently growing by 300k-400k records per day.
The url_id column is a unique varchar(10), and a URL looks like this: http://example.com/asdfghjklu
I have now partitioned this table into 10 partitions by HASH(id):
PARTITION BY HASH (`id`)
PARTITIONS 10;
When I try to generate reports and join this table to others, the query gets really slow, so slow that I can't even get a one-week report.
When I run big queries on this table, I filter almost every query by date, so I think it would be much better to partition this table by the date column.
Is that a good idea?
As I understand it, if I want to partition this table by date, I need to include date in a composite primary key: PRIMARY KEY(id, date)
What do you think about this? How can I improve my query performance?
I would recommend hash partitioning on the date, month, or year:
CREATE TABLE `urls` (
`id` int(11) NOT NULL,
`url_id` varchar(10) DEFAULT NULL,
`long_url` varchar(255) DEFAULT NULL,
`clicked` mediumint(5) NOT NULL DEFAULT 0,
`user_id` varchar(7) DEFAULT NULL,
`type` varchar(15) DEFAULT NULL,
`ad_id` int(11) DEFAULT NULL,
`campaign` int(11) DEFAULT NULL,
`increment` tinyint(1) NOT NULL DEFAULT 0,
`date` date DEFAULT NULL,
`del` enum('1','0') NOT NULL DEFAULT '0',
PartitionsID int(4) unsigned NOT NULL,
KEY PartitionsID (PartitionsID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH (PartitionsID)
PARTITIONS 366;
In PartitionsID you just need to insert TO_DAYS(date), so there is only one value for an entire day.
This makes it easy to partition per day; you can also partition per month, depending on your data size.
For a SELECT, you can use a query like the following as an example:
SELECT *
FROM TT ACT
WHERE ACT.CustomerID = vCustomerID
AND ACT.TransactionTime BETWEEN vInvoiceEndDate AND vPaymentDueDate
AND ACT.TrxnInfoTypeID IN (19, 23)
AND ACT.PaymentType = '1'
AND ACT.PartitionsID BETWEEN TO_DAYS(vInvoiceEndDate) AND TO_DAYS(vPaymentDueDate);
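The date-based alternative raised in the question, with date added to a composite primary key, could look roughly like the sketch below. The table name urls_by_date, the trimmed column list, and the partition names and boundary dates are all made up for illustration; date has to be NOT NULL here because it is part of the primary key.

```sql
-- Hypothetical, trimmed copy of the urls table partitioned by month on `date`.
CREATE TABLE urls_by_date (
  `id` int(11) NOT NULL,
  `url_id` varchar(10) DEFAULT NULL,
  `long_url` varchar(255) DEFAULT NULL,
  `date` date NOT NULL,
  -- Every unique key of a partitioned table must contain the partitioning column.
  PRIMARY KEY (`id`, `date`),
  KEY `url_id` (`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE COLUMNS (`date`) (
  PARTITION p_2016_01 VALUES LESS THAN ('2016-02-01'),
  PARTITION p_2016_02 VALUES LESS THAN ('2016-03-01'),
  PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);

-- A date-filtered query should then touch only the matching partitions;
-- on MySQL 5.7 and later the partitions column of EXPLAIN shows which ones.
EXPLAIN SELECT COUNT(*)
FROM urls_by_date
WHERE `date` BETWEEN '2016-01-10' AND '2016-01-16';
```

Pruning only helps when the WHERE clause filters on the partitioning column directly, so reports that filter by date are the ones that stand to benefit.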

Partition on existing table with millions of records

I have a table named builds that looks like this:
CREATE TABLE `builds` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`testplan_id` int(10) unsigned NOT NULL DEFAULT '0',
`name` varchar(100) NOT NULL DEFAULT 'undefined',
`creation_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`testplan_id`,`name`),
KEY `testplan_id` (`testplan_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2074288 DEFAULT CHARSET=utf8;
I want to create partitions on builds based on creation_ts. I am trying:
Alter Table builds PARTITION BY RANGE (TO_DAYS(creation_ts))
( PARTITION p1 values less than (TO_DAYS('2015-05-07'))
in phpMyAdmin, but it shows the error "unrecognised alter operation".
I am using MySQL Server version 5.7.11.

MySql - Create view to read from Multiple Tables

I have archived some old invoice line items that are no longer current, but I still need to reference them. I think I need to create a VIEW, but I don't really understand how. Can someone help me write a query that pulls the invoice and the total of all its assigned line items (no matter which table the items are in)?
CREATE TABLE `Invoice` (
`Invoice_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`Invoice_CreatedDateTime` DATETIME DEFAULT NULL,
`Invoice_Status` ENUM('Paid','Sent','Unsent','Hold') DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`Invoice_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Invoice_LineItem` (
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MYISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `Invoice_LineItem_Archived` (
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MYISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
Typically I would just run the following query to get the amount due on the invoices:
SELECT
Invoice_ID,
Invoice_CreatedDateTime,
Invoice_Status,
(SELECT SUM(LineItem_Amount) AS totAmt FROM Invoice_LineItem WHERE LineItem_InvoiceID=Invoice_ID) AS Invoice_Total
FROM
Invoice
WHERE
Invoice_Status='Sent'
Also, how can I select all the line items from both tables in one query?
SELECT
LineItem_ID,
LineItem_ChargeType,
LineItem_Amount
FROM
Invoice_LineItem
WHERE
LineItem_InvoiceID='1234'
You can use the MERGE Storage Engine to create a virtual table that's the union of two real tables:
CREATE TABLE Invoice_LineItem_All
(
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MERGE UNION=(Invoice_LineItem_Archived, Invoice_LineItem);
You can use UNION:
SELECT a.* FROM a
UNION
SELECT b.* FROM b;
You just need the same number and types of columns in each query.
As far as I remember, you can add conditions in the sub-queries, but I'm not sure you can order the global result.
http://dev.mysql.com/doc/refman/4.1/en/union.html
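Applied to the tables in the question, the UNION approach can be wrapped in a view. This is a sketch, and the view name Invoice_LineItem_Combined is made up. It uses UNION ALL rather than UNION, on the assumption that two line items with identical values in both tables are distinct charges and must both count toward the total.

```sql
-- Hypothetical view exposing current and archived line items as one table.
CREATE OR REPLACE VIEW Invoice_LineItem_Combined AS
SELECT LineItem_ID, LineItem_ChargeType, LineItem_InvoiceID, LineItem_Amount
FROM Invoice_LineItem
UNION ALL
SELECT LineItem_ID, LineItem_ChargeType, LineItem_InvoiceID, LineItem_Amount
FROM Invoice_LineItem_Archived;

-- Invoice totals across both tables, in the same shape as the original query.
SELECT
  i.Invoice_ID,
  i.Invoice_CreatedDateTime,
  i.Invoice_Status,
  (SELECT SUM(li.LineItem_Amount)
   FROM Invoice_LineItem_Combined AS li
   WHERE li.LineItem_InvoiceID = i.Invoice_ID) AS Invoice_Total
FROM Invoice AS i
WHERE i.Invoice_Status = 'Sent';

-- All line items of one invoice, regardless of which table holds them.
SELECT LineItem_ID, LineItem_ChargeType, LineItem_Amount
FROM Invoice_LineItem_Combined
WHERE LineItem_InvoiceID = 1234;
```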

MySQL - convert MyISAM into InnoDB getting error 1075

I am trying to convert a table from MyISAM to InnoDB. This is the definition, and I am getting error #1075 - Incorrect table definition; there can be only one auto column and it must be defined as a key
The table has an AUTO_INCREMENT column, the field is indexed, and it works with MyISAM. I am new to InnoDB, so this might be a dumb question.
CREATE TABLE `cart_item` (
`cart_id` int(10) unsigned NOT NULL DEFAULT '0',
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`design_number` int(10) unsigned NOT NULL,
`logo_position_id` smallint(5) unsigned NOT NULL,
`subst_style_id` varchar(10) DEFAULT NULL,
`style_id` varchar(10) NOT NULL DEFAULT '',
`subst_color_id` smallint(5) unsigned DEFAULT NULL,
`color_id` smallint(5) unsigned NOT NULL,
`size_id` smallint(5) unsigned NOT NULL,
`qty` mediumint(8) unsigned NOT NULL,
`active` enum('y','n') NOT NULL DEFAULT 'y',
`date_last_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`last_modified_by_id` mediumint(5) unsigned NOT NULL,
`date_last_locked` datetime DEFAULT NULL,
`last_locked_by_id` smallint(5) unsigned NOT NULL,
`date_added` datetime NOT NULL,
`subsite_logo_group_id` int(11) NOT NULL,
`bundle` varchar(32) NOT NULL,
`color_stop_1` varchar(4) DEFAULT NULL,
PRIMARY KEY (`cart_id`,`id`),
KEY `color_id` (`color_id`),
KEY `style_id` (`style_id`),
KEY `size_id` (`size_id`),
KEY `design_number` (`design_number`),
KEY `subsite_logo_group_id` (`subsite_logo_group_id`),
KEY `date_added` (`date_added`),
KEY `bundle` (`bundle`)
) ENGINE=InnoDB
What you were doing on the MyISAM table cannot be done with InnoDB. See my answer on a (similar) problem: creating primary key based on date
MySQL docs, in the Using AUTO_INCREMENT section, explain it:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
You may get similar behaviour in InnoDB but not with AUTO_INCREMENT. You'll have to use either some fancy trigger or a stored procedure for your Inserts that will take care of the (per cart_id) auto-increment.
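One way to get a per-cart_id counter without a trigger is a small counter table updated in the same transaction as the insert. This is a sketch under assumptions, not the answerer's code: cart_item_seq and cart_item_demo are made-up names, cart_item_demo is a trimmed stand-in for cart_item, and 42 is an example cart id.

```sql
-- Hypothetical counter table: one row per cart holding the last id handed out.
CREATE TABLE cart_item_seq (
  cart_id int(10) unsigned NOT NULL,
  last_id mediumint(8) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (cart_id)
) ENGINE=InnoDB;

-- Trimmed stand-in for cart_item: same key, but id is no longer AUTO_INCREMENT.
CREATE TABLE cart_item_demo (
  cart_id int(10) unsigned NOT NULL,
  id mediumint(8) unsigned NOT NULL,
  qty mediumint(8) unsigned NOT NULL,
  PRIMARY KEY (cart_id, id)
) ENGINE=InnoDB;

START TRANSACTION;
-- Make sure the cart has a counter row (does nothing if it already exists).
INSERT IGNORE INTO cart_item_seq (cart_id) VALUES (42);
-- Bump the counter; LAST_INSERT_ID(expr) stores the new value for this connection.
-- The row lock taken here serializes concurrent inserts for the same cart.
UPDATE cart_item_seq SET last_id = LAST_INSERT_ID(last_id + 1) WHERE cart_id = 42;
-- Use the value just generated as the per-cart id.
INSERT INTO cart_item_demo (cart_id, id, qty) VALUES (42, LAST_INSERT_ID(), 1);
COMMIT;
```

The same three statements could be placed inside a stored procedure so that callers cannot skip the counter update.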
You have a composite PRIMARY KEY defined on (cart_id, id), but InnoDB requires the AUTO_INCREMENT column to be the first column of an index. You can add a KEY for it (not a primary key, just a plain index):
KEY `idx_id` (`id`)
I question the use of the composite PK on (cart_id, id) though, since id is alone a unique value by definition. Perhaps you should make id the PK, and create a separate index across the combination.
PRIMARY KEY (`id`),
KEY (`cart_id`, `id`)
It doesn't even need to be specified as UNIQUE because the AUTO_INCREMENT can't be repeated anyway. There is no way to violate uniqueness on the combination (cart_id, id).
AUTO_INCREMENT columns should be defined as a key, as the error implies.
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
and set UNIQUE on the two columns instead of making them the primary key
UNIQUE (`cart_id`,`id`),

SQL: Refactoring a multi-join query

I have a query that should be quite simple and yet it causes me a lot of headaches.
I have a simple ads system that requires filtering ads according to a few variables.
I need to limit the number of views/clicks per day and the total number of views/clicks for a given ad. Also each ad is linked to one or more slots in which the ad can appear. I have a table that saves the statistics that I need about each ad. Note that the statistics table changes very frequently.
These are the tables that I'm using:
CREATE TABLE `t_ads` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
`content` text NOT NULL,
`is_active` tinyint(1) unsigned NOT NULL,
`start_date` date NOT NULL,
`end_date` date NOT NULL,
`max_views` int(10) unsigned NOT NULL,
`type` tinyint(3) unsigned NOT NULL default '0',
`refresh` smallint(5) unsigned NOT NULL default '0',
`max_clicks` int(10) unsigned NOT NULL,
`max_daily_clicks` int(10) unsigned NOT NULL,
`max_daily_views` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ad_slots` (
`id` int(10) unsigned NOT NULL auto_increment ,
`name` varchar(255) NOT NULL,
`width` int(10) unsigned NOT NULL,
`height` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `t_ads_to_slots` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`value` int(10) unsigned NOT NULL,
PRIMARY KEY (`ad_id`,`slot_id`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ads_to_slots`
ADD CONSTRAINT `t_ads_to_slots_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ads_to_slots_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
CREATE TABLE `t_ad_stats` (
`ad_id` int(10) unsigned NOT NULL,
`slot_id` int(10) unsigned NOT NULL,
`date` date NOT NULL,
`views` int(10) unsigned NOT NULL,
`unique_views` int(10) unsigned NOT NULL,
`clicks` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`ad_id`,`slot_id`,`date`),
KEY `slot_id` (`slot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `t_ad_stats`
ADD CONSTRAINT `t_ad_stats_ibfk_1` FOREIGN KEY (`ad_id`) REFERENCES `t_ads` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
ADD CONSTRAINT `t_ad_stats_ibfk_2` FOREIGN KEY (`slot_id`) REFERENCES `t_ad_slots` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;
This is the query that I use to get ads for a given slot. (Note that in this example I hard-coded 20 as the slot id and 0,1,2 as the ad types; I get this data from a PHP script which invokes this query.)
SELECT `ads`.`content`, `slots`.`value`, `ads`.`id`, `ads`.`refresh`, `ads`.`type`,
SUM(`total_stats`.`views`) AS "total_views",
SUM(`total_stats`.`clicks`) AS "total_clicks"
FROM (`t_ads` AS `ads`,
`t_ads_to_slots` AS `slots`)
LEFT JOIN `t_ad_stats` AS `total_stats`
ON `total_stats`.`ad_id` = `ads`.`id`
LEFT JOIN `t_ad_stats` AS `daily_stats`
ON (`daily_stats`.`ad_id` = `ads`.`id`) AND
(`daily_stats`.`date` = CURDATE())
WHERE (`ads`.`id` = `slots`.`ad_id`) AND
(`ads`.`type` IN(0,1,2)) AND
(`slots`.`slot_id` = 20) AND
(`ads`.`is_active` = 1) AND
(`ads`.`end_date` >= NOW()) AND
(`ads`.`start_date` <= NOW()) AND
((`ads`.`max_views` = 0) OR
(`ads`.`max_views` > "total_views")) AND
((`ads`.`max_clicks` = 0) OR
(`ads`.`max_clicks` > "total_clicks")) AND
((`ads`.`max_daily_clicks` = 0) OR
(`ads`.`max_daily_clicks` > IFNULL(`daily_stats`.`clicks`,0))) AND
((`ads`.`max_daily_views` = 0) OR
(`ads`.`max_daily_views` > IFNULL(`daily_stats`.`views`,0)))
GROUP BY (`ads`.`id`)
I believe that this query is self-explanatory, even though it's quite long. Note that the MySQL version that I'm using is 5.0.51a-community. It seems to me that the big issue here is the double join to the stats table (I did that so that I could get the data both from a specific record and from multiple records (sum)).
How would you implement this query in order to get better results? (Note that I can't change from InnoDB).
Hopefully everything is clear about my question, but if that is not the case, please ask and I will clarify.
Thanks in advance,
Kfir
Add indexes to the following columns:
t_ads.is_active
t_ads.start_date
t_ads.end_date
Change the order of the primary key on t_ad_stats to:
(`ad_id`,`date`,`slot_id`)
or add a covering index to t_ad_stats
(`ad_id`, `date`)
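As statements, the index suggestions above might look like the following sketch. The index names are made up, and the EXPLAIN uses an example ad id; whether the optimizer actually picks these indexes depends on the data.

```sql
-- Single-column indexes on the t_ads filter columns.
ALTER TABLE t_ads
  ADD INDEX idx_is_active (is_active),
  ADD INDEX idx_start_date (start_date),
  ADD INDEX idx_end_date (end_date);

-- Extra index on t_ad_stats for lookups by ad and day.
ALTER TABLE t_ad_stats
  ADD INDEX idx_ad_date (ad_id, `date`);

-- Check which index the daily lookup uses (7 is an example ad id).
EXPLAIN SELECT views, clicks
FROM t_ad_stats
WHERE ad_id = 7 AND `date` = CURDATE();
```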
Change from 0 meaning "no limit" to 2147483647 meaning no limit, so you can change things like:
((`ads`.`max_views` = 0) OR (`ads`.`max_views` > "total_views"))
to
(`ads`.`max_views` > "total_views")
You could greatly improve this if you kept running totals instead of having to calculate them each time.
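A minimal sketch of such running totals, assuming a separate table (hypothetically named t_ad_totals) that is bumped whenever a view or click is recorded; 7 is an example ad id:

```sql
-- Hypothetical totals table: one row per ad, maintained alongside t_ad_stats.
CREATE TABLE t_ad_totals (
  ad_id int(10) unsigned NOT NULL,
  total_views int(10) unsigned NOT NULL DEFAULT 0,
  total_clicks int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (ad_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- Record one view of ad 7: create the row on first use, increment afterwards.
INSERT INTO t_ad_totals (ad_id, total_views, total_clicks)
VALUES (7, 1, 0)
ON DUPLICATE KEY UPDATE total_views = total_views + 1;

-- Record one click of ad 7.
INSERT INTO t_ad_totals (ad_id, total_views, total_clicks)
VALUES (7, 0, 1)
ON DUPLICATE KEY UPDATE total_clicks = total_clicks + 1;
```

The ad-selection query can then join t_ad_totals on ad_id and compare max_views and max_clicks against stored columns instead of summing t_ad_stats on every request.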
Expanding on a comment above, I believe that the following columns should be indexed:
ads.id
ads.type
ads.start_date
ads.end_date
daily_stats.date
As well as these:
slots.slot_id
ads.is_active
And these as well:
ads.max_views
ads.max_clicks
ads.max_daily_clicks
ads.max_daily_views
daily_stats.clicks
daily_stats.views
Do note that applying indexes on these columns will speed up your SELECTs but slow down your INSERTs, since the indexes will need updating as well. But you don't have to apply all of this at once. You can do it incrementally and see how the performance shakes out for selects as well as inserts. If you cannot find a good middle ground, then I would suggest denormalization.
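Separately from indexing, the double join that the question worries about can be avoided by aggregating the stats in derived tables before joining. The sketch below is one possible rewrite, not a tested replacement: it assumes the limits apply per ad across all slots, and it compares against the aggregated columns directly, because under the default SQL mode the double-quoted "total_views" in the original WHERE clause is read as a string literal rather than as the SUM alias.

```sql
SELECT ads.content, slots.value, ads.id, ads.refresh, ads.type,
       IFNULL(total_stats.views, 0)  AS total_views,
       IFNULL(total_stats.clicks, 0) AS total_clicks
FROM t_ads AS ads
JOIN t_ads_to_slots AS slots
  ON slots.ad_id = ads.id
-- All-time totals per ad, computed once instead of per joined row.
LEFT JOIN (
    SELECT ad_id, SUM(views) AS views, SUM(clicks) AS clicks
    FROM t_ad_stats
    GROUP BY ad_id
) AS total_stats
  ON total_stats.ad_id = ads.id
-- Today's totals per ad (assumption: summed over all slots).
LEFT JOIN (
    SELECT ad_id, SUM(views) AS views, SUM(clicks) AS clicks
    FROM t_ad_stats
    WHERE `date` = CURDATE()
    GROUP BY ad_id
) AS daily_stats
  ON daily_stats.ad_id = ads.id
WHERE ads.type IN (0, 1, 2)
  AND slots.slot_id = 20
  AND ads.is_active = 1
  AND ads.end_date >= NOW()
  AND ads.start_date <= NOW()
  AND (ads.max_views = 0 OR ads.max_views > IFNULL(total_stats.views, 0))
  AND (ads.max_clicks = 0 OR ads.max_clicks > IFNULL(total_stats.clicks, 0))
  AND (ads.max_daily_clicks = 0 OR ads.max_daily_clicks > IFNULL(daily_stats.clicks, 0))
  AND (ads.max_daily_views = 0 OR ads.max_daily_views > IFNULL(daily_stats.views, 0));
```

Because each derived table returns at most one row per ad, the outer GROUP BY is no longer needed and the sums are not multiplied by the second join. The all-time derived table still scans t_ad_stats, so on a large stats table a stored running total is likely to be cheaper.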