Partitioning a large table by date - MySQL

I have implemented a custom URL shortener in my app, and it uses a single table. The table structure looks like this:
CREATE TABLE `urls` (
`id` int(11) NOT NULL,
`url_id` varchar(10) DEFAULT NULL,
`long_url` varchar(255) DEFAULT NULL,
`clicked` mediumint(5) NOT NULL DEFAULT 0,
`user_id` varchar(7) DEFAULT NULL,
`type` varchar(15) DEFAULT NULL,
`ad_id` int(11) DEFAULT NULL,
`campaign` int(11) DEFAULT NULL,
`increment` tinyint(1) NOT NULL DEFAULT 0,
`date` date DEFAULT NULL,
`del` enum('1','0') NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
ALTER TABLE `urls`
ADD PRIMARY KEY (`id`),
ADD KEY `url_id` (`url_id`),
ADD KEY `type` (`type`),
ADD KEY `campaign` (`campaign`),
ADD KEY `ad_id` (`ad_id`),
ADD KEY `date` (`date`),
ADD KEY `user_id` (`user_id`);
The table now has 20,000,000 records and is growing by 300k-400k records per day.
The url_id column is a unique varchar(10), and a short URL looks like this: http://example.com/asdfghjklu
Currently I have partitioned this table into 10 partitions by HASH(id):
PARTITION BY HASH (`id`)
PARTITIONS 10;
When I generate reports that join this table to others, the queries get really slow, so slow that I can't even get a one-week report.
Almost every big query on this table filters by date, so I think it would be much better to partition this table by the date column.
Is it a good idea?
From what I've read, if I want to partition this table by date, I need to add date to a composite primary key: PRIMARY KEY(id, date)
What do you think about this? How do I improve my query performance?
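A minimal sketch of what the question describes, assuming MySQL 5.5 or later and that no existing row has a NULL `date` (PRIMARY KEY columns must be NOT NULL). The partition names and monthly boundaries are hypothetical placeholders:

```sql
-- Step 1: make `date` part of the primary key.
-- Assumes no row currently has a NULL `date`; PRIMARY KEY columns must be NOT NULL.
ALTER TABLE `urls`
  MODIFY `date` date NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`id`, `date`);

-- Step 2: re-partition by date (replaces the existing HASH(id) partitioning).
-- Hypothetical monthly boundaries; new months have to be split off pmax over time,
-- e.g. with ALTER TABLE ... REORGANIZE PARTITION pmax INTO (...).
ALTER TABLE `urls`
  PARTITION BY RANGE (TO_DAYS(`date`)) (
    PARTITION p2016_01 VALUES LESS THAN (TO_DAYS('2016-02-01')),
    PARTITION p2016_02 VALUES LESS THAN (TO_DAYS('2016-03-01')),
    PARTITION p2016_03 VALUES LESS THAN (TO_DAYS('2016-04-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  );

-- A one-week report filtered directly on `date` should touch only one or two partitions.
-- MySQL 5.7+ shows them in the `partitions` column of EXPLAIN
-- (older versions need EXPLAIN PARTITIONS).
EXPLAIN SELECT COUNT(*), SUM(`clicked`)
FROM `urls`
WHERE `date` >= '2016-03-01' AND `date` < '2016-03-08';
```

Partition pruning only applies when the WHERE clause constrains `date` itself; it does not by itself make the joins to other tables faster, and each ALTER rebuilds the whole table.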

I would recommend HASH partitioning by day, month, or year:
CREATE TABLE `urls` (
`id` int(11) NOT NULL,
`url_id` varchar(10) DEFAULT NULL,
`long_url` varchar(255) DEFAULT NULL,
`clicked` mediumint(5) NOT NULL DEFAULT 0,
`user_id` varchar(7) DEFAULT NULL,
`type` varchar(15) DEFAULT NULL,
`ad_id` int(11) DEFAULT NULL,
`campaign` int(11) DEFAULT NULL,
`increment` tinyint(1) NOT NULL DEFAULT 0,
`date` date DEFAULT NULL,
`del` enum('1','0') NOT NULL DEFAULT '0',
PartitionsID int(4) unsigned NOT NULL,
KEY PartitionsID (PartitionsID)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH (PartitionsID)
PARTITIONS 366;
In PartitionsID you just insert TO_DAYS(date), so every row from the same day gets the same value.
This makes it easy to have one partition per day, or you can partition by month instead, depending on your data size.
For selects, you can use a query like the one below as an example:
SELECT *
FROM TT ACT
WHERE ACT.CustomerID = vCustomerID
AND ACT.TransactionTime BETWEEN vInvoiceEndDate AND vPaymentDueDate
AND ACT.TrxnInfoTypeID IN (19, 23)
AND ACT.PaymentType = '1'
AND ACT.PartitionsID BETWEEN TO_DAYS(vInvoiceEndDate) AND TO_DAYS(vPaymentDueDate);
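A sketch of how that answer could look for the `urls` table itself, assuming the table was created with the PartitionsID column shown above; the row values are hypothetical. MySQL places each row in partition MOD(PartitionsID, 366), so consecutive days land in different partitions. For a short BETWEEN range on the HASH column (fewer values than there are partitions), the optimizer can enumerate the values and prune; EXPLAIN should confirm whether it does:

```sql
-- PartitionsID holds the day number of `date`, computed at insert time.
-- Hypothetical example row.
INSERT INTO `urls` (`id`, `url_id`, `long_url`, `date`, `PartitionsID`)
VALUES (1, 'asdfghjklu', 'http://example.com/a/long/target', '2016-03-01', TO_DAYS('2016-03-01'));

-- One-week report: repeat the date range on PartitionsID so the optimizer
-- can limit the scan to the 7 matching HASH partitions.
SELECT COUNT(*) AS urls_created, SUM(`clicked`) AS total_clicks
FROM `urls`
WHERE `date` BETWEEN '2016-03-01' AND '2016-03-07'
  AND `PartitionsID` BETWEEN TO_DAYS('2016-03-01') AND TO_DAYS('2016-03-07');
```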

Related

MySQL INNER JOIN is slow with index

This is my simple inner join:
SELECT
SUM(ASSNZ.assenzeDidattiche) AS TotaleAssenze,
SUM(ASSNZ.ore) AS totale_parziale,
FLOOR(((SUM(ASSNZ.assenzeDidattiche) / SUM(ASSNZ.ore)) * 100)) AS andamento,
MAX(ASSNZ.dataLezione) AS ultima_lezione,
ASSNZ.idServizio,
ASSNZ.idUtente
FROM
ciac_corsi_assenze AS ASSNZ
INNER JOIN
ciac_serviziAcquistati_ITA AS ACQ
ON ACQ.idContatto = ASSNZ.idUtente
AND ACQ.idServizio = ASSNZ.idServizio
AND ACQ.stato_allievo <> 'ritirato'
GROUP BY
ASSNZ.idServizio,
ASSNZ.idUtente
Table ASSNZ has 213,886 rows, with indexes on idUtente and idServizio.
Table ACQ has 8,950 rows, with indexes on idContatto and idServizio.
ASSNZ table:
CREATE TABLE `ciac_corsi_assenze` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idUtente` int(11) DEFAULT NULL,
`idServizio` int(11) DEFAULT NULL,
`idCorso` int(11) DEFAULT NULL,
`idCalendario` int(11) DEFAULT NULL,
`modalita` varchar(255) DEFAULT NULL,
`ore` int(11) DEFAULT NULL,
`assenzeDidattiche` float DEFAULT NULL,
`assenzeAmministrative` float DEFAULT NULL,
`dataLezione` date DEFAULT NULL,
`ora_inizio` varchar(8) DEFAULT NULL,
`ora_fine` varchar(8) DEFAULT NULL,
`dataFineStage` date DEFAULT NULL,
`giustificata` varchar(128) DEFAULT NULL,
`motivazione` longtext,
`grouped` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idUtente` (`idUtente`) USING BTREE,
KEY `idServizio` (`idServizio`) USING BTREE,
KEY `dataLezione` (`dataLezione`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=574582 DEFAULT CHARSET=utf8;
ACQ table:
CREATE TABLE `ciac_serviziacquistati_ita` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idServizio` int(11) NOT NULL,
`idContatto` int(11) NOT NULL,
`idAzienda` int(11) NOT NULL,
`idSede` int(11) NOT NULL,
`tipoPersona` int(11) NOT NULL,
`num_registro` int(11) NOT NULL,
`codice` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`dal` date NOT NULL,
`al` date NOT NULL,
`ore` int(11) NOT NULL,
`costoOrario` decimal(10,0) NOT NULL,
`annoFormativo` varchar(128) CHARACTER SET latin1 NOT NULL,
`stato_attuale` int(11) NOT NULL,
`datore_attuale` int(11) NOT NULL,
`stato_allievo` varchar(64) CHARACTER SET latin1 NOT NULL DEFAULT 'corsista',
`data_ritiro` date DEFAULT NULL,
`crediti_formativi` int(11) NOT NULL,
`note` longtext CHARACTER SET latin1 NOT NULL,
`valore_economico` decimal(10,2) NOT NULL,
`dataInserimento` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idServizio` (`idServizio`) USING BTREE,
KEY `idAzienda` (`idAzienda`) USING BTREE,
KEY `idContatto` (`idContatto`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=9542 DEFAULT CHARSET=utf8;
This is my EXPLAIN of the select.
The query is slow: it takes 1.5 to 2.0 seconds.
Is something wrong?
UPDATE
I added a new index (following John Bollinger's answer) to the table ciac_corsi_assenze:
PRIMARY KEY (`id`),
KEY `dataLezione` (`dataLezione`) USING BTREE,
KEY `test` (`idUtente`,`idServizio`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=574582 DEFAULT CHARSET=utf8;
I added a new index to the table ciac_serviziAcquistati_ITA:
PRIMARY KEY (`id`),
KEY `idAzienda` (`idAzienda`) USING BTREE,
KEY `test2` (`idContatto`,`idServizio`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=9542 DEFAULT CHARSET=utf8;
New EXPLAIN:
But it's still slow :(
Your tables have separate indexes on various columns of interest, but MySQL will generally use at most one index per table to perform your query. This particular query would probably be sped up by giving table ciac_corsi_assenze an index on (idUtente, idServizio); such an index would supersede the existing one on (idUtente) alone. That should allow MySQL to avoid sorting the result rows to perform the grouping, and it will help more with the join than any of the existing indexes do.
The query would probably be sped up further by giving table ciac_serviziAcquistati_ITA an index on (idContatto, idServizio), or even on (idContatto, idServizio, stato_allievo). Either of those would supersede the existing index on just (idContatto).
John is headed in the right direction. However, the order of columns in the composite index needs to change.
For the GROUP BY, this order is needed (on ASSNZ):
INDEX(idServizio, idUtente)
(and that should replace KEY(idServizio), but not KEY(idUtente))
Then ACQ needs, in this order:
INDEX(idContatto, idServizio, stato_allievo)
replacing only KEY(idContatto).
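Applied to the original schemas (before the UPDATE above), those suggestions could be written as follows. The new index names are hypothetical, and the second table is named as in its CREATE TABLE statement:

```sql
-- ASSNZ: (idServizio, idUtente) matches the GROUP BY order.
-- It replaces KEY(idServizio); KEY(idUtente) is kept.
ALTER TABLE `ciac_corsi_assenze`
  DROP INDEX `idServizio`,
  ADD INDEX `idServizio_idUtente` (`idServizio`, `idUtente`);

-- ACQ: covers both join columns plus the stato_allievo filter.
-- It replaces only KEY(idContatto).
ALTER TABLE `ciac_serviziacquistati_ita`
  DROP INDEX `idContatto`,
  ADD INDEX `idContatto_idServizio_stato` (`idContatto`, `idServizio`, `stato_allievo`);
```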

Large MySQL table slow to query column with unique index

I have a large MySQL table (36 million rows, 120 GB) that is unable to handle a simple query on a column with a UNIQUE KEY. Ex:
select * from items where item_id = 12345;
Is there some reason why the index isn't helping here or am I just way beyond what MySQL can handle in terms of table size? Any pointers?
Edit: My table create statement:
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`product_sku` int(11) DEFAULT NULL,
`item_id` varchar(19) NOT NULL DEFAULT '',
`title` tinytext NOT NULL,
`subtitle` tinytext,
`description` text,
`category_id` varchar(10) NOT NULL DEFAULT '',
`created_at` datetime NOT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `itemId` (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
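One thing worth checking, though no answer is included here: item_id is a VARCHAR, but the query compares it with the unquoted number 12345. When a string column is compared with a number, MySQL cannot use the index on that column for the lookup, because many different strings (such as '12345', ' 12345' or '12345a') convert to the same number. Quoting the literal keeps it a string comparison, and EXPLAIN shows the difference:

```sql
-- Numeric literal: each item_id has to be converted to a number,
-- so the UNIQUE KEY `itemId` cannot be used for the lookup (typically a full scan).
EXPLAIN SELECT * FROM items WHERE item_id = 12345;

-- String literal: a direct lookup on the UNIQUE KEY `itemId`.
EXPLAIN SELECT * FROM items WHERE item_id = '12345';
```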

MySQL - Create a view to read from multiple tables

I have archived some old invoice line items that are no longer current, but I still need to reference them. I think I need to create a VIEW, but I don't really understand how. Can someone help so that I can run a query that pulls the invoice and then the total of all its line items (no matter which table the items are in)?
CREATE TABLE `Invoice` (
`Invoice_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`Invoice_CreatedDateTime` DATETIME DEFAULT NULL,
`Invoice_Status` ENUM('Paid','Sent','Unsent','Hold') DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`Invoice_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`)
) ENGINE=MYISAM DEFAULT CHARSET=latin1;
CREATE TABLE `Invoice_LineItem` (
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MYISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
CREATE TABLE `Invoice_LineItem_Archived` (
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MYISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
Typically I would just run the following query to get the amount due on the invoices:
SELECT
Invoice_ID,
Invoice_CreatedDateTime,
Invoice_Status,
(SELECT SUM(LineItem_Amount) AS totAmt FROM Invoice_LineItem WHERE LineItem_InvoiceID=Invoice_ID) AS Invoice_Total
FROM
Invoice
WHERE
Invoice_Status='Sent'
Also how can I select all the line items from both tables in one query?
SELECT
LineItem_ID,
LineItem_ChargeType,
LineItem_Amount
FROM
Invoice_LineItem
WHERE
LineItem_InvoiceID='1234'
You can use the MERGE Storage Engine to create a virtual table that's the union of two real tables:
CREATE TABLE Invoice_LineItem_All
(
`LineItem_ID` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`LineItem_ChargeType` VARCHAR(64) NOT NULL DEFAULT '',
`LineItem_InvoiceID` INT(11) UNSIGNED DEFAULT NULL,
`LineItem_Amount` DECIMAL(11,4) DEFAULT NULL,
`LastUpdatedAt` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
KEY (`LineItem_ID`),
KEY `LastUpdatedAt` (`LastUpdatedAt`),
KEY `LineItem_InvoiceID` (`LineItem_InvoiceID`)
) ENGINE=MERGE UNION=(Invoice_LineItem_Archived, Invoice_LineItem);
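With that MERGE table in place, the two queries from the question could read from both tables at once. This is a sketch; it relies on the fact that MERGE tables only work over identical MyISAM tables, which these two are:

```sql
-- Invoice totals across current and archived line items.
SELECT
  Invoice_ID,
  Invoice_CreatedDateTime,
  Invoice_Status,
  (SELECT SUM(LineItem_Amount)
     FROM Invoice_LineItem_All
    WHERE LineItem_InvoiceID = Invoice_ID) AS Invoice_Total
FROM Invoice
WHERE Invoice_Status = 'Sent';

-- All line items for one invoice, whichever table they are stored in.
SELECT LineItem_ID, LineItem_ChargeType, LineItem_Amount
FROM Invoice_LineItem_All
WHERE LineItem_InvoiceID = 1234;
```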
You can use UNION:
SELECT a.* FROM a
UNION
SELECT b.* FROM b;
The SELECTs just need to return the same number of columns, with compatible types.
You can add WHERE conditions to each SELECT, and an ORDER BY placed after the last SELECT sorts the combined result.
http://dev.mysql.com/doc/refman/4.1/en/union.html
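Since the question asks about a VIEW, the UNION approach can also be wrapped in one. This sketch uses UNION ALL rather than UNION: plain UNION removes duplicate rows, which is not wanted when summing amounts and costs an extra deduplication step. The view name is hypothetical. MySQL cannot use the MERGE view algorithm for a view that contains UNION, so the view is materialized as a temporary table, which may be slow on large tables:

```sql
-- Hypothetical view combining current and archived line items.
CREATE VIEW Invoice_LineItem_View AS
  SELECT LineItem_ID, LineItem_ChargeType, LineItem_InvoiceID, LineItem_Amount
  FROM Invoice_LineItem
  UNION ALL
  SELECT LineItem_ID, LineItem_ChargeType, LineItem_InvoiceID, LineItem_Amount
  FROM Invoice_LineItem_Archived;

-- Same shape as the original invoice query, reading from the view.
SELECT
  i.Invoice_ID,
  i.Invoice_CreatedDateTime,
  i.Invoice_Status,
  (SELECT SUM(v.LineItem_Amount)
     FROM Invoice_LineItem_View AS v
    WHERE v.LineItem_InvoiceID = i.Invoice_ID) AS Invoice_Total
FROM Invoice AS i
WHERE i.Invoice_Status = 'Sent';
```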

Getting error 1503: A primary key must include all columns in the table's partitioning function

I have a table structure like this:
CREATE TABLE `cdr` (`id` bigint(20) NOT NULL AUTO_INCREMENT,
`dataPacketDownLink` bigint(20) DEFAULT NULL,
`dataPacketUpLink` bigint(20) DEFAULT NULL,
`dataPlanEndTime` datetime DEFAULT NULL,
`dataPlanStartTime` datetime DEFAULT NULL,
`dataVolumeDownLink` bigint(20) DEFAULT NULL,
`dataVolumeUpLink` bigint(20) DEFAULT NULL,
`dataplan` varchar(255) DEFAULT NULL,
`dataplanType` varchar(255) DEFAULT NULL,
`createdOn` datetime DEFAULT NULL,
`deviceName` varchar(500) DEFAULT NULL,
`duration` int(11) NOT NULL,
`effectiveDuration` int(11) NOT NULL,
`hour` int(11) DEFAULT NULL,
`eventDate` datetime DEFAULT NULL,
`msisdn` bigint(20) DEFAULT NULL,
`quarter` int(11) DEFAULT NULL,
`validDays` int(11) DEFAULT NULL,
`dataLeft` bigint(20) DEFAULT NULL,
`completedOn` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `msisdn_index` (`msisdn`),
KEY `eventdate_index` (`eventDate`)
) ENGINE=MyISAM AUTO_INCREMENT=55925171 DEFAULT CHARSET=latin1;
and when I try to create the partitions:
ALTER TABLE cdr PARTITION BY RANGE (TO_DAYS(eventdate)) (
PARTITION p01 VALUES LESS THAN (TO_DAYS('2013-09-01')),
PARTITION p02 VALUES LESS THAN (TO_DAYS('2013-09-15')),
PARTITION p03 VALUES LESS THAN (TO_DAYS('2013-09-30')),
PARTITION p04 VALUES LESS THAN (MAXVALUE));
I get this error:
error 1503: A primary key must include all columns in the table's partitioning function
I have read about this everywhere but haven't found an answer, so please let me know how to partition this table. It has 20+ million records.
Thank you.
I have already solved this problem by adding eventDate to the primary key.
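A sketch of that fix, assuming no existing row has a NULL eventDate (primary key columns must be NOT NULL, and eventDate is currently declared DEFAULT NULL):

```sql
-- Error 1503 rule: every unique key, including the PRIMARY KEY,
-- must contain all columns used in the partitioning expression.
ALTER TABLE cdr
  MODIFY eventDate datetime NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, eventDate);

-- The original partitioning statement should then succeed.
ALTER TABLE cdr PARTITION BY RANGE (TO_DAYS(eventDate)) (
  PARTITION p01 VALUES LESS THAN (TO_DAYS('2013-09-01')),
  PARTITION p02 VALUES LESS THAN (TO_DAYS('2013-09-15')),
  PARTITION p03 VALUES LESS THAN (TO_DAYS('2013-09-30')),
  PARTITION p04 VALUES LESS THAN (MAXVALUE));
```

With 20+ million rows, each of these statements rebuilds the table, so expect them to take a while.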
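Possible solutions:
change eventdate to eventDate in 'ALTER TABLE cdr PARTITION BY RANGE (TO_DAYS(eventdate))' (harmless, but MySQL column names are not case-sensitive, so this alone will not clear the error)
keep eventDate as DATETIME: TO_DAYS() works on DATETIME columns, whereas a TIMESTAMP column can only be range-partitioned with UNIX_TIMESTAMP()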

Why do I HAVE to optimize tables?

I have a pretty big table that contains about 3 million records.
When running a very simple query, joining this table on a few others (all with indexes and/or primary keys), the query will take about 25 seconds to complete!
The value of "Handler_read_next" is about 7 million!
According to the MySQL manual, Handler_read_next is "the number of requests to read the next row in key order, incremented if you are querying an index column with a range constraint or if you are doing an index scan."
This problem only started once the table grew big.
Now if I run OPTIMIZE TABLE on this table, the query runs in about 0.02 seconds and Handler_read_next drops to about 1500.
How can the difference be so extreme, and do I really have to set up a scheduled job that optimizes this table once a week or so? Even so, I would like to understand why MySQL behaves like this. Sure, rows are deleted and updated a lot in this table, but should it get so badly fragmented in only one week that the query goes from 0.02 sec to 25 sec?
Edit: As requested, here is the query in question:
SELECT *
FROM budget_expenses
JOIN budget_categories
ON budget_categories.BudgetAreaId = budget_expenses.BudgetAreaId
AND budget_categories.BudgetCategoryId = budget_expenses.BudgetCategoryId
LEFT JOIN budget_types
ON budget_types.BudgetAreaId = budget_expenses.BudgetAreaId
AND budget_types.BudgetCategoryId = budget_expenses.BudgetCategoryId
AND budget_types.BudgetTypeId = budget_expenses.BudgetTypeId
WHERE budget_expenses.BudgetId = 1
AND budget_expenses.ExpenseDate >= '2012-11-25'
AND budget_expenses.ExpenseDate <= '2012-12-24'
AND budget_expenses.BudgetAreaId = 2
ORDER BY budget_expenses.ExpenseDate DESC,
budget_expenses.ExpenseTime IS NULL ASC,
budget_expenses.ExpenseTime DESC
(BudgetAreaId, BudgetCategoryId) is the primary key in budget_categories and (BudgetAreaId, BudgetCategoryId, BudgetTypeId) is the primary key in budget_types. In budget_expenses, each of these 3 columns has its own index, and ExpenseDate has an index too. This query returns about 20 rows.
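For anyone reproducing the Handler_read_next numbers, the counter can be measured per query roughly like this. The SELECT is a simplified, hypothetical version of the query above, reduced to budget_expenses alone:

```sql
-- Reset this session's status counters.
FLUSH STATUS;

-- Simplified version of the slow query (budget_expenses only).
SELECT COUNT(*)
FROM budget_expenses
WHERE BudgetId = 1
  AND ExpenseDate >= '2012-11-25'
  AND ExpenseDate <= '2012-12-24'
  AND BudgetAreaId = 2;

-- Handler_read_next and related counters for the statement above.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- For InnoDB this is carried out as a table rebuild plus ANALYZE;
-- repeat the three statements above afterwards to compare.
OPTIMIZE TABLE budget_expenses;
```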
Show create table:
CREATE TABLE `budget_areas` (
`BudgetAreaId` int(11) NOT NULL,
`Name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `budget_categories` (
`BudgetAreaId` int(11) NOT NULL,
`BudgetCategoryId` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(255) DEFAULT NULL,
`SortOrder` int(11) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`,`BudgetCategoryId`),
KEY `BudgetAreaId` (`BudgetAreaId`,`BudgetCategoryId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE `budget_types` (
`BudgetAreaId` int(11) NOT NULL,
`BudgetCategoryId` int(11) NOT NULL,
`BudgetTypeId` int(11) NOT NULL,
`Name` varchar(255) DEFAULT NULL,
`SortId` int(11) DEFAULT NULL,
PRIMARY KEY (`BudgetAreaId`,`BudgetCategoryId`,`BudgetTypeId`),
KEY `BudgetAreaId` (`BudgetAreaId`,`BudgetCategoryId`,`BudgetTypeId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `budget_expenses` (
`ExpenseId` int(11) NOT NULL AUTO_INCREMENT,
`BudgetId` int(11) NOT NULL,
`TempId` int(11) DEFAULT NULL,
`BudgetAreaId` int(11) DEFAULT NULL,
`BudgetCategoryId` int(11) DEFAULT NULL,
`BudgetTypeId` int(11) DEFAULT NULL,
`Company` varchar(255) DEFAULT NULL,
`ImportCompany` varchar(255) DEFAULT NULL,
`Sum` double(50,2) DEFAULT NULL,
`ExpenseDate` date DEFAULT NULL,
`ExpenseTime` time DEFAULT NULL,
`Inserted` datetime DEFAULT NULL,
`Changed` datetime DEFAULT NULL,
`InsertType` int(1) DEFAULT NULL,
`AccountId` int(11) DEFAULT NULL,
`BankCardId` int(11) DEFAULT NULL,
PRIMARY KEY (`ExpenseId`),
KEY `BudgetId` (`BudgetId`),
KEY `AccountId` (`AccountId`),
KEY `Company` (`Company`) USING BTREE,
KEY `ExpenseDate` (`ExpenseDate`),
KEY `BudgetAreaId` (`BudgetAreaId`),
KEY `BudgetCategoryId` (`BudgetCategoryId`),
KEY `BudgetTypeId` (`BudgetTypeId`),
CONSTRAINT `budget_expenses_ibfk_1` FOREIGN KEY (`BudgetId`) REFERENCES `budgets` (`BudgetId`)
) ENGINE=InnoDB AUTO_INCREMENT=3604462 DEFAULT CHARSET=latin1
After copy-pasting this, I changed the budget_categories table from MyISAM to InnoDB.
Edit: The change from MyISAM to InnoDB didn't make any difference. The query is very slow again, just 12 hours after I optimized the budget_expenses table!
Here is the EXPLAIN for the query, which now takes about 9 seconds:
http://jsfiddle.net/dmVPY/1/
Ahhh MyISAM....
Try changing the table type (aka 'storage engine') to InnoDB instead.
If you do this, make sure innodb_buffer_pool_size in your my.cnf is a sensible value - the default is too small.
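Both points can be checked from SQL. The 1 GB value below is only a placeholder, not a recommendation for this server, and SET GLOBAL for this variable works only on MySQL 5.7.5 and later; older versions need the my.cnf change and a restart:

```sql
-- Which budget tables are still MyISAM?
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME LIKE 'budget%';

-- Convert a remaining MyISAM table (this rebuilds the table).
ALTER TABLE budget_categories ENGINE=InnoDB;

-- Current buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- MySQL 5.7.5+: resize online (placeholder value: 1 GB).
SET GLOBAL innodb_buffer_pool_size = 1073741824;
```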