I need to design tables that store all the metadata of files (i.e., file name, author, title, date created) as well as custom metadata (added to files by users, e.g. CustUseBy, CustSendBy). The number of custom metadata fields cannot be set beforehand. Indeed, the only way of determining which and how many custom tags have been added to files is to examine what exists in the tables.
To store this, I have created a base table (having all common metadata of files), an Attributes table (holding additional, optional attributes that may be set on files) and a FileAttributes table (which assigns a value to an attribute for a file).
CREATE TABLE FileBase (
id VARCHAR(32) PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL,
title VARCHAR(255),
author VARCHAR(255),
created DATETIME NOT NULL
) Engine=InnoDB;
CREATE TABLE Attributes (
id VARCHAR(32) PRIMARY KEY,
name VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL
) Engine=InnoDB;
CREATE TABLE FileAttributes (
sNo INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
fileId VARCHAR(32) NOT NULL,
attributeId VARCHAR(32) NOT NULL,
attributeValue VARCHAR(255) NOT NULL,
FOREIGN KEY (fileId) REFERENCES FileBase (id),
FOREIGN KEY (attributeId) REFERENCES Attributes (id)
) Engine=InnoDB;
Sample data:
INSERT INTO FileBase
(id, title, author, name, created)
VALUES
('F001', 'Dox', 'vinay', 'story.dox', '2009/01/02 15:04:05'),
('F002', 'Excel', 'Ajay', 'data.xls', '2009/02/03 01:02:03');
INSERT INTO Attributes
(id, name, type)
VALUES
('A001', 'CustomAttr1', 'Varchar(40)'),
('A002', 'CustomUseDate', 'Datetime');
INSERT INTO FileAttributes
(fileId, attributeId, attributeValue)
VALUES
('F001', 'A001', 'Akash'),
('F001', 'A002', '2009/03/02');
Now the problem is I want to show the data in a manner like this:
FileId, Title, Author, CustomAttri1, CustomAttr2, ...
F001 Dox vinay Akash 2009/03/02 ...
F002 Excel Ajay
What query will generate this result?
The question mentions MySQL, and this DBMS has a special function for exactly this kind of problem: GROUP_CONCAT(expr). See the section on GROUP BY (aggregate) functions in the MySQL reference manual. The function was added in MySQL version 4.1. You'll be using GROUP BY FileID in the query.
I'm not really sure about how you want the result to look. If you want every attribute listed for every item (even if not set), it will be harder. However, this is my suggestion for how to do it:
SELECT bt.FileID, Title, Author,
GROUP_CONCAT(
CONCAT_WS(':', at.AttributeName, at.AttributeType, avt.AttributeValue)
ORDER BY at.AttributeName SEPARATOR ', ')
FROM BaseTable bt JOIN AttributeValueTable avt ON avt.FileID=bt.FileID
JOIN AttributeTable at ON avt.AttributeId=at.AttributeId
GROUP BY bt.FileID;
This gives you all attributes in the same order, which could be useful. The output will be like the following:
'F001', 'Dox', 'vinay', 'CustomAttr1:varchar(40):Akash, CustomUseDate:Datetime:2009/03/02'
This way you need only a single DB query, and the output is easy to parse. If you want to store the attributes as real Datetime etc. in the DB, you'd need to use dynamic SQL, but I'd steer clear of that and store the values in varchars.
If you're looking for something more usable (and joinable) than a group-concat result, try the solution below. I've created some tables very similar to your example to illustrate it.
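To illustrate the "easy to parse" claim, here is a minimal Python sketch that splits one concatenated attribute string back into its parts. The function name parse_attributes is hypothetical, and the sketch assumes the ':' and ', ' separators used in the query above; a value that itself contains ', ' would break it, so choose a separator that cannot occur in your data.

```python
def parse_attributes(concatenated):
    """Split 'name:type:value, name:type:value' into a dict keyed by name."""
    attributes = {}
    if not concatenated:  # NULL or empty string: no custom attributes
        return attributes
    for item in concatenated.split(", "):
        # maxsplit=2 keeps any ':' that appears inside the value itself
        name, attr_type, value = item.split(":", 2)
        attributes[name] = (attr_type, value)
    return attributes


# Sample row in the shape of the output shown above.
row = ("F001", "Dox", "vinay",
       "CustomAttr1:varchar(40):Akash, CustomUseDate:Datetime:2009/03/02")
parsed = parse_attributes(row[3])
print(parsed)
```
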
This works when:
You want a pure SQL solution (no code, no loops)
You have a predictable set of attributes (e.g. not dynamic)
You are OK updating the query when new attribute types need to be added
You would prefer a result that can be JOINed to, UNIONed, or nested as a subselect
Table A (Files)
FileID, Title, Author, CreatedOn
Table B (Attributes)
AttrID, AttrName, AttrType [not sure how you use type...]
Table C (Files_Attributes)
FileID, AttrID, AttrValue
A traditional query would pull many redundant rows:
SELECT * FROM
Files F
LEFT JOIN Files_Attributes FA USING (FileID)
LEFT JOIN Attributes A USING (AttrID);
AttrID FileID Title Author CreatedOn AttrValue AttrName AttrType
50 1 TestFile Joe 2011-01-01 true ReadOnly bool
60 1 TestFile Joe 2011-01-01 xls FileFormat text
70 1 TestFile Joe 2011-01-01 false Private bool
80 1 TestFile Joe 2011-01-01 2011-10-03 LastModified date
60 2 LongNovel Mary 2011-02-01 json FileFormat text
80 2 LongNovel Mary 2011-02-01 2011-10-04 LastModified date
70 2 LongNovel Mary 2011-02-01 true Private bool
50 2 LongNovel Mary 2011-02-01 true ReadOnly bool
50 3 ShortStory Susan 2011-03-01 false ReadOnly bool
60 3 ShortStory Susan 2011-03-01 ascii FileFormat text
70 3 ShortStory Susan 2011-03-01 false Private bool
80 3 ShortStory Susan 2011-03-01 2011-10-01 LastModified date
50 4 ProfitLoss Bill 2011-04-01 false ReadOnly bool
70 4 ProfitLoss Bill 2011-04-01 true Private bool
80 4 ProfitLoss Bill 2011-04-01 2011-10-02 LastModified date
60 4 ProfitLoss Bill 2011-04-01 text FileFormat text
50 5 MonthlyBudget George 2011-05-01 false ReadOnly bool
60 5 MonthlyBudget George 2011-05-01 binary FileFormat text
70 5 MonthlyBudget George 2011-05-01 false Private bool
80 5 MonthlyBudget George 2011-05-01 2011-10-20 LastModified date
This coalescing query (approach using MAX) can merge the rows:
SELECT
F.*,
MAX( IF(A.AttrName = 'ReadOnly', FA.AttrValue, NULL) ) as 'ReadOnly',
MAX( IF(A.AttrName = 'FileFormat', FA.AttrValue, NULL) ) as 'FileFormat',
MAX( IF(A.AttrName = 'Private', FA.AttrValue, NULL) ) as 'Private',
MAX( IF(A.AttrName = 'LastModified', FA.AttrValue, NULL) ) as 'LastModified'
FROM
Files F
LEFT JOIN Files_Attributes FA USING (FileID)
LEFT JOIN Attributes A USING (AttrID)
GROUP BY
F.FileID;
FileID Title Author CreatedOn ReadOnly FileFormat Private LastModified
1 TestFile Joe 2011-01-01 true xls false 2011-10-03
2 LongNovel Mary 2011-02-01 true json true 2011-10-04
3 ShortStory Susan 2011-03-01 false ascii false 2011-10-01
4 ProfitLoss Bill 2011-04-01 false text true 2011-10-02
5 MonthlyBudget George 2011-05-01 false binary false 2011-10-20
The general form of such a query would be
SELECT file.*,
attr1.value AS 'Attribute 1 Name',
attr2.value AS 'Attribute 2 Name',
...
FROM
file
LEFT JOIN attr AS attr1
ON(file.FileId=attr1.FileId and attr1.AttributeId=1)
LEFT JOIN attr AS attr2
ON(file.FileId=attr2.FileId and attr2.AttributeId=2)
...
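So you need to build your query dynamically from the attributes you need. In PHP-ish pseudocode:
$cols = "file.*";
$joins = "";
$rows = $db->GetAll("select * from Attributes");
foreach ($rows as $idx => $row)
{
    // Alias each join so the same table can be joined once per attribute.
    $alias = "attr{$idx}";
    // Quote the column alias as an identifier, doubling any embedded backtick.
    $name = str_replace("`", "``", $row['AttributeName']);
    $cols .= ", {$alias}.value as `{$name}`";
    // The cast assumes numeric attribute ids, as in the general form above.
    $joins .= " LEFT JOIN attr as {$alias} on ".
        "(file.FileId={$alias}.FileId and ".
        "{$alias}.AttributeId=".(int)$row['AttributeId'].")";
}
$pivotsql = "select $cols from file $joins";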
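The same idea can be tried end to end with a small Python sketch. It runs against an in-memory SQLite database as a stand-in for MySQL, uses the FileBase / Attributes / FileAttributes tables from the question, and the helper names quote_identifier and build_pivot_sql are hypothetical. Attribute ids are passed as bound parameters rather than spliced into the SQL text.

```python
import sqlite3


def quote_identifier(name):
    # Standard SQL identifier quoting: double any embedded double quote.
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(conn):
    """Return (sql, params) with one LEFT JOIN per row of Attributes."""
    cols = ["f.id", "f.title", "f.author"]
    joins, params = [], []
    attrs = conn.execute("SELECT id, name FROM Attributes ORDER BY id").fetchall()
    for idx, (attr_id, attr_name) in enumerate(attrs):
        alias = f"attr{idx}"
        cols.append(f"{alias}.attributeValue AS {quote_identifier(attr_name)}")
        joins.append(
            f"LEFT JOIN FileAttributes AS {alias} "
            f"ON f.id = {alias}.fileId AND {alias}.attributeId = ?")
        params.append(attr_id)
    sql = (f"SELECT {', '.join(cols)} FROM FileBase AS f "
           f"{' '.join(joins)} ORDER BY f.id")
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FileBase (id TEXT PRIMARY KEY, name TEXT, title TEXT, author TEXT);
CREATE TABLE Attributes (id TEXT PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE FileAttributes (fileId TEXT, attributeId TEXT, attributeValue TEXT);
INSERT INTO FileBase VALUES ('F001', 'story.dox', 'Dox', 'vinay'),
                            ('F002', 'data.xls', 'Excel', 'Ajay');
INSERT INTO Attributes VALUES ('A001', 'CustomAttr1', 'Varchar(40)'),
                              ('A002', 'CustomUseDate', 'Datetime');
INSERT INTO FileAttributes VALUES ('F001', 'A001', 'Akash'),
                                  ('F001', 'A002', '2009/03/02');
""")

sql, params = build_pivot_sql(conn)
pivot_rows = conn.execute(sql, params).fetchall()
for pivot_row in pivot_rows:
    print(pivot_row)
```

Because every join is a LEFT JOIN, a file with no custom attributes (F002 here) still appears, with NULLs in the attribute columns.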
This is the standard "rows to columns" problem in SQL.
It is most easily done outside SQL.
In your application, do the following:
Define a simple class to contain the file, the system attributes, and a Collection of user attributes. A list is a good choice for this collection of customer attributes. Let's call this class FileDescription.
Execute a simple join between the file and all of the customer attributes for the file.
Write a loop to assemble FileDescriptions from the query result.
Fetch the first row, create a FileDescription and set the first customer attribute.
While there are more rows to fetch:
Fetch a row
If this row's file name does not match the FileDescription we're building: finish building a FileDescription; append this to a result Collection of File Descriptions; create a fresh, empty FileDescription with the given name and first customer attribute.
If this row's file name matches the FileDescription we're building: append another customer attribute to the current FileDescription
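The loop described above can be sketched in Python as follows. The class layout and the assemble function are assumptions made for illustration; the input is taken to be the rows of a LEFT JOIN between files and their attributes, sorted by file id.

```python
from dataclasses import dataclass, field


@dataclass
class FileDescription:
    file_id: str
    title: str
    author: str
    custom: list = field(default_factory=list)  # (name, value) pairs


def assemble(rows):
    """rows: (file_id, title, author, attr_name, attr_value), sorted by file_id."""
    result = []
    current = None
    for file_id, title, author, attr_name, attr_value in rows:
        # A new file id means the previous FileDescription is complete.
        if current is None or current.file_id != file_id:
            current = FileDescription(file_id, title, author)
            result.append(current)
        # A LEFT JOIN yields NULLs for files without custom attributes.
        if attr_name is not None:
            current.custom.append((attr_name, attr_value))
    return result


joined_rows = [
    ("F001", "Dox", "vinay", "CustomAttr1", "Akash"),
    ("F001", "Dox", "vinay", "CustomUseDate", "2009/03/02"),
    ("F002", "Excel", "Ajay", None, None),
]
descriptions = assemble(joined_rows)
for description in descriptions:
    print(description)
```
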
I have been experimenting with the different answers, and Methai's answer was the most convenient for me. My current project, although it does use Doctrine with MySQL, has quite a few loose tables.
The following is the result of my experience with Methai's solution:
create entity table
DROP TABLE IF EXISTS entity;
CREATE TABLE entity (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
author VARCHAR(255),
createdOn DATETIME NOT NULL
) Engine = InnoDB;
create attribute table
DROP TABLE IF EXISTS attribute;
CREATE TABLE attribute (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
type VARCHAR(255) NOT NULL
) Engine = InnoDB;
create attributevalue table
DROP TABLE IF EXISTS attributevalue;
CREATE TABLE attributevalue (
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
value VARCHAR(255) NOT NULL,
attribute_id INT UNSIGNED NOT NULL,
FOREIGN KEY(attribute_id) REFERENCES attribute(id)
) Engine = InnoDB;
create entity_attributevalue join table
DROP TABLE IF EXISTS entity_attributevalue;
CREATE TABLE entity_attributevalue (
entity_id INT UNSIGNED NOT NULL,
attributevalue_id INT UNSIGNED NOT NULL,
FOREIGN KEY(entity_id) REFERENCES entity(id),
FOREIGN KEY(attributevalue_id) REFERENCES attributevalue(id)
) Engine = InnoDB;
populate entity table
INSERT INTO entity
(title, author, createdOn)
VALUES
('TestFile', 'Joe', '2011-01-01'),
('LongNovel', 'Mary', '2011-02-01'),
('ShortStory', 'Susan', '2011-03-01'),
('ProfitLoss', 'Bill', '2011-04-01'),
('MonthlyBudget', 'George', '2011-05-01'),
('Paper', 'Jane', '2012-04-01'),
('Essay', 'John', '2012-03-01'),
('Article', 'Dan', '2012-12-01');
populate attribute table
INSERT INTO attribute
(name, type)
VALUES
('ReadOnly', 'bool'),
('FileFormat', 'text'),
('Private', 'bool'),
('LastModified', 'date');
populate attributevalue table
INSERT INTO attributevalue
(value, attribute_id)
VALUES
('true', '1'),
('xls', '2'),
('false', '3'),
('2011-10-03', '4'),
('true', '1'),
('json', '2'),
('true', '3'),
('2011-10-04', '4'),
('false', '1'),
('ascii', '2'),
('false', '3'),
('2011-10-01', '4'),
('false', '1'),
('text', '2'),
('true', '3'),
('2011-10-02', '4'),
('false', '1'),
('binary', '2'),
('false', '3'),
('2011-10-20', '4'),
('doc', '2'),
('false', '3'),
('2011-10-20', '4'),
('rtf', '2'),
('2011-10-20', '4');
populate entity_attributevalue table
INSERT INTO entity_attributevalue
(entity_id, attributevalue_id)
VALUES
('1', '1'),
('1', '2'),
('1', '3'),
('1', '4'),
('2', '5'),
('2', '6'),
('2', '7'),
('2', '8'),
('3', '9'),
('3', '10'),
('3', '11'),
('3', '12'),
('4', '13'),
('4', '14'),
('4', '15'),
('4', '16'),
('5', '17'),
('5', '18'),
('5', '19'),
('5', '20'),
('6', '21'),
('6', '22'),
('6', '23'),
('7', '24'),
('7', '25');
Showing all the records
SELECT *
FROM `entity` e
LEFT JOIN `entity_attributevalue` ea ON ea.entity_id = e.id
LEFT JOIN `attributevalue` av ON ea.attributevalue_id = av.id
LEFT JOIN `attribute` a ON av.attribute_id = a.id;
id title author createdOn entity_id attributevalue_id id value attribute_id id name type
1 TestFile Joe 2011-01-01 00:00:00 1 1 1 true 1 1 ReadOnly bool
1 TestFile Joe 2011-01-01 00:00:00 1 2 2 xls 2 2 FileFormat text
1 TestFile Joe 2011-01-01 00:00:00 1 3 3 false 3 3 Private bool
1 TestFile Joe 2011-01-01 00:00:00 1 4 4 2011-10-03 4 4 LastModified date
2 LongNovel Mary 2011-02-01 00:00:00 2 5 5 true 1 1 ReadOnly bool
2 LongNovel Mary 2011-02-01 00:00:00 2 6 6 json 2 2 FileFormat text
2 LongNovel Mary 2011-02-01 00:00:00 2 7 7 true 3 3 Private bool
2 LongNovel Mary 2011-02-01 00:00:00 2 8 8 2011-10-04 4 4 LastModified date
3 ShortStory Susan 2011-03-01 00:00:00 3 9 9 false 1 1 ReadOnly bool
3 ShortStory Susan 2011-03-01 00:00:00 3 10 10 ascii 2 2 FileFormat text
3 ShortStory Susan 2011-03-01 00:00:00 3 11 11 false 3 3 Private bool
3 ShortStory Susan 2011-03-01 00:00:00 3 12 12 2011-10-01 4 4 LastModified date
4 ProfitLoss Bill 2011-04-01 00:00:00 4 13 13 false 1 1 ReadOnly bool
4 ProfitLoss Bill 2011-04-01 00:00:00 4 14 14 text 2 2 FileFormat text
4 ProfitLoss Bill 2011-04-01 00:00:00 4 15 15 true 3 3 Private bool
4 ProfitLoss Bill 2011-04-01 00:00:00 4 16 16 2011-10-02 4 4 LastModified date
5 MonthlyBudget George 2011-05-01 00:00:00 5 17 17 false 1 1 ReadOnly bool
5 MonthlyBudget George 2011-05-01 00:00:00 5 18 18 binary 2 2 FileFormat text
5 MonthlyBudget George 2011-05-01 00:00:00 5 19 19 false 3 3 Private bool
5 MonthlyBudget George 2011-05-01 00:00:00 5 20 20 2011-10-20 4 4 LastModified date
6 Paper Jane 2012-04-01 00:00:00 6 21 21 doc 2 2 FileFormat text
6 Paper Jane 2012-04-01 00:00:00 6 22 22 false 3 3 Private bool
6 Paper Jane 2012-04-01 00:00:00 6 23 23 2011-10-20 4 4 LastModified date
7 Essay John 2012-03-01 00:00:00 7 24 24 rtf 2 2 FileFormat text
7 Essay John 2012-03-01 00:00:00 7 25 25 2011-10-20 4 4 LastModified date
8 Article Dan 2012-12-01 00:00:00 NULL NULL NULL NULL NULL NULL NULL NULL
pivot table
SELECT e.*,
MAX( IF(a.name = 'ReadOnly', av.value, NULL) ) as 'ReadOnly',
MAX( IF(a.name = 'FileFormat', av.value, NULL) ) as 'FileFormat',
MAX( IF(a.name = 'Private', av.value, NULL) ) as 'Private',
MAX( IF(a.name = 'LastModified', av.value, NULL) ) as 'LastModified'
FROM `entity` e
LEFT JOIN `entity_attributevalue` ea ON ea.entity_id = e.id
LEFT JOIN `attributevalue` av ON ea.attributevalue_id = av.id
LEFT JOIN `attribute` a ON av.attribute_id = a.id
GROUP BY e.id;
id title author createdOn ReadOnly FileFormat Private LastModified
1 TestFile Joe 2011-01-01 00:00:00 true xls false 2011-10-03
2 LongNovel Mary 2011-02-01 00:00:00 true json true 2011-10-04
3 ShortStory Susan 2011-03-01 00:00:00 false ascii false 2011-10-01
4 ProfitLoss Bill 2011-04-01 00:00:00 false text true 2011-10-02
5 MonthlyBudget George 2011-05-01 00:00:00 false binary false 2011-10-20
6 Paper Jane 2012-04-01 00:00:00 NULL doc false 2011-10-20
7 Essay John 2012-03-01 00:00:00 NULL rtf NULL 2011-10-20
8 Article Dan 2012-12-01 00:00:00 NULL NULL NULL NULL
However, there are ways to turn rows into columns, i.e. to transpose the data.
Doing it in pure SQL involves query tricks, or else you have to rely on features available only in certain databases, such as pivot tables (cross tabs).
For example, you can see how to do this here in Oracle (11g).
The programmatic version will be simpler to write and maintain, and moreover will work with any database.
Partial answer since I do not know MySQL (well). In MSSQL I would look at Pivot tables or would create a temporary table in a stored procedure. It may well be a hard time ...
I want to get the greatest (or lowest) value in a field for a specific value of a different field but I am a bit lost. I am already aware of answered questions on the topic, but I already have a join in my query and I can't apply the terrific answers I found on my specific problem.
I have two tables, namely register and records. Records has all (weather) stations listed once for each month (each stationid represented 12 times, if complete data exists, a stationid can thus not be presented more than 12 times), and register has all stations listed with some of their characteristics. For the sake of the example, the two tables look pretty much like this:
CREATE TABLE IF NOT EXISTS `records` (
`stationid` varchar(30),
`month` int(11),
`tmin` decimal(3,1),
`tmax` decimal(3,1),
`commentsmax` text,
`commentsmin` text,
UNIQUE KEY `webcode` (`stationid`,`month`)
);
INSERT INTO `records` (`stationid`, `month`, `tmin`, `tmax`, `commentsmin`, `commentsmax`) VALUES
('station1', 7, '10.0', '46.0', 'Extremely low temperature.', 'Very high temperature.'),
('station2', 7, '15.0', '48.0', 'Very low temperature.', 'Extremely low temperature.'),
('station1', 1, '-10', '15', 'Extremely low temperature.', 'Somewhat high temperature.');
CREATE TABLE IF NOT EXISTS `register` (
`stationid` varchar(30),
`stationname` varchar(40),
`stationowner` varchar(10),
`georegion` varchar(40),
`altitude` int(4),
KEY `stationid` (`stationid`)
);
INSERT INTO `register` (`stationid`, `stationname`, `stationowner`, `georegion`, `altitude`) VALUES
('station1', 'Halifax', 'Maria', 'the North', 16),
('station2', 'Leeds', 'Peter', 'the South', 240);
The desired output is:
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
| stationname | month | tmin | stationowner | georegion | altitude | commentsmin |
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
| Leeds | 7 | 15.0 | Peter | the South | 240 | Very low temperature |
| Halifax | 1 | -10.0 | Maria | the North | 16 | Extremely low temperature |
+-------------+-------+-------+---------------+-----------+----------+-----------------------------+
where each station appears only once, with its lowest temperature from the table 'records' and some station properties from the table 'register'. I am using the following code:
SELECT register.stationname, records.month, min(records.tmin), register.stationowner, register.georegion, register.altitude, records.commentsmin FROM records INNER JOIN register ON records.stationid=register.stationid GROUP BY records.stationid ORDER BY min(tmin) ASC
but it doesn't give the correct bits of the records table corresponding to the lowest tmin values BY stationid when there are many records in the tables.
I have seen solutions like this one here: MySQL Greatest N Results with Join Tables, but I just can't get my head around applying it on my two tables. I would be grateful for any ideas!
SELECT stuff
FROM some_table x
JOIN some_other_table y
ON y.something = x.something
JOIN
( SELECT something
, MIN(something_other_thing) min_a
FROM that_other_table
GROUP
BY something
) z
ON z.something = y.something
AND z.min_a = y.another_thing;
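Applied to the records and register tables from the question, that pattern might look like the sketch below. It is run through Python's sqlite3 module as a stand-in for MySQL (the SQL uses only a derived table and plain joins, which MySQL also accepts); the column types are simplified and the aliases r, g and z are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (stationid TEXT, month INTEGER, tmin REAL, tmax REAL,
                      commentsmin TEXT, commentsmax TEXT,
                      UNIQUE (stationid, month));
INSERT INTO records VALUES
 ('station1', 7, 10.0, 46.0, 'Extremely low temperature.', 'Very high temperature.'),
 ('station2', 7, 15.0, 48.0, 'Very low temperature.', 'Extremely low temperature.'),
 ('station1', 1, -10.0, 15.0, 'Extremely low temperature.', 'Somewhat high temperature.');
CREATE TABLE register (stationid TEXT, stationname TEXT, stationowner TEXT,
                       georegion TEXT, altitude INTEGER);
INSERT INTO register VALUES
 ('station1', 'Halifax', 'Maria', 'the North', 16),
 ('station2', 'Leeds', 'Peter', 'the South', 240);
""")

# z holds the lowest tmin per station; joining back on (stationid, tmin)
# selects the complete records row that carries that minimum.
query = """
SELECT g.stationname, r.month, r.tmin, g.stationowner,
       g.georegion, g.altitude, r.commentsmin
FROM records r
JOIN register g ON g.stationid = r.stationid
JOIN (SELECT stationid, MIN(tmin) AS min_tmin
      FROM records
      GROUP BY stationid) z
  ON z.stationid = r.stationid AND z.min_tmin = r.tmin
ORDER BY r.tmin ASC
"""
lowest = conn.execute(query).fetchall()
for lowest_row in lowest:
    print(lowest_row)
```

If a station reaches the same minimum in two different months, both rows are returned, so add a tie-breaker if exactly one row per station is required.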
I have 2 tables
account
id | desc
Prices
id | desc
Table account stores user info, while table prices stores several prices depending on the type of service.
Now, I need to assign the prices that apply to each account.
I would like to display a result containing the prices and an extra column that lists the accounts to which each service applies...
I was thinking of something like this:
CREATE TABLE `account` (
`id_account` smallint(2) unsigned PRIMARY KEY AUTO_INCREMENT,
`user` VARCHAR(55) ,
`pass` VARCHAR(55) ,
`descr` VARCHAR(250)
);
INSERT INTO account VALUES
(1,'67395' , 'pass1','DrHeL'),
(2,'12316' , 'pass2','DeHrL'),
(3,'92316' , 'pass3','EfL');
CREATE TABLE `prices`(
`id_price` smallint(2) unsigned PRIMARY KEY AUTO_INCREMENT,
`service` VARCHAR(40),
`cost_1_1Kg` double ,
`cost_4_1Kg` double ,
`cost_8_1Kg` double
);
INSERT INTO prices VALUES
(1,'laundry', 1.50, 2.00,5.00),
(2,'walk.' , 2.50, 3.00,4.00);
CREATE TABLE `account_prices` (
`id_account` smallint(2) unsigned NOT NULL,
`id_price` smallint(2) unsigned NOT NULL,
`descr` VARCHAR(250)
) ;
INSERT INTO account_prices VALUES
(1,1,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2'),
(2,1,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2'),
(3,1,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2'),
(1,2,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2'),
(2,2,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2'),
(3,2,'apply SERVICE WITH ID 1 AND SERVICE WITH ID 2');
This gives me
ID_ACCOUNT ID_PRICE DESCR USER PASS SERVICE COST_1_1KG COST_4_1KG COST_8_1KG
1 1 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 67395 pass1 laundry 1.5 2 5
2 1 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 12316 pass2 laundry 1.5 2 5
3 1 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 92316 pass3 laundry 1.5 2 5
1 2 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 67395 pass1 walk. 2.5 3 4
2 2 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 12316 pass2 walk. 2.5 3 4
3 2 apply SERVICE WITH ID 1 AND SERVICE WITH ID 2 92316 pass3 walk. 2.5 3 4
However, I would like something like:
ID_PRICE SERVICE COST_1_1KG COST_4_1KG COST_8_1KG descr
1 laundry 1.5 2 5 account 1, account 2, account 3
2 walk. 2.5 3 4 account 1, account 2, account 3
How can I do it?
Please take a look at the corresponding fiddle:
http://sqlfiddle.com/#!2/16f05/3
You can get your desired output with this query:
select ap.id_price
, p.service
, p.cost_1_1Kg
, p.cost_4_1Kg
, p.cost_8_1Kg
, group_concat(
concat('account ', ap.id_account)
order by ap.id_account
separator ', '
) as descr
from account_prices ap
inner join prices p using (id_price)
group by ap.id_price
Result:
| ID_PRICE | SERVICE | COST_1_1KG | COST_4_1KG | COST_8_1KG | DESCR |
|----------|---------|------------|------------|------------|---------------------------------|
| 1 | laundry | 1.5 | 2 | 5 | account 1, account 2, account 3 |
| 2 | walk. | 2.5 | 3 | 4 | account 1, account 2, account 3 |
Check updated SQL fiddle
How this works:
You only need to join account_prices with prices to get all the data you need.
To get the account x stuff, concatenate "account " with the value of id_account using the concat() function
Finally, to get a concatenated group of values, use the group_concat() function. It works like any other aggregate function, but instead of computing a value (like sum() or count()), it concatenates the values of the column (or expression). You can define the order you want for the output and a custom separator (the default separator is a comma).
Hope this helps you.
My query takes around 2800 seconds to return its output.
We have indexes as well, but no luck.
My target is to get the output within 2 to 3 seconds.
If possible, please rewrite the query.
query:
select ttl.id, ttl.url, ttl.canonical_url_id
from t_target_url ttl
where ttl.own_domain_id=476 and ttl.type != 10
order by ttl.week_entrances desc
limit 550000;
Explain Plan:
+----+-------------+-------+------+--------------------------------+---------------------------+---------+-------+----------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------------------+---------------------------+---------+-------+----------+-----------------------------+
| 1 | SIMPLE | ttl | ref | own_domain_id_type_status,type | own_domain_id_type_status | 5 | const | 57871959 | Using where; Using filesort |
+----+-------------+-------+------+--------------------------------+---------------------------+---------+-------+----------+-----------------------------+
1 row in set (0.80 sec)
mysql> show create table t_target_url\G
*************************** 1. row ***************************
Table: t_target_url
Create Table: CREATE TABLE `t_target_url` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`own_domain_id` int(11) DEFAULT NULL,
`url` varchar(2000) NOT NULL,
`create_date` datetime DEFAULT NULL,
`friendly_name` varchar(255) DEFAULT NULL,
`section_name_id` int(11) DEFAULT NULL,
`type` int(11) DEFAULT NULL,
`status` int(11) DEFAULT NULL,
`week_entrances` int(11) DEFAULT NULL COMMENT 'last 7 days entrances',
`week_bounces` int(11) DEFAULT NULL COMMENT 'last 7 days bounce',
`canonical_url_id` int(11) DEFAULT NULL COMMENT 'the primary URL ID, NOT allow canonical of canonical',
KEY `id` (`id`),
KEY `urlindex` (`url`(255)),
KEY `own_domain_id_type_status` (`own_domain_id`,`type`,`status`),
KEY `canonical_url_id` (`canonical_url_id`),
KEY `type` (`type`,`status`)
) ENGINE=InnoDB AUTO_INCREMENT=227984392 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (`type`)
(PARTITION p0 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (1) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (2) ENGINE = InnoDB,
PARTITION pEOW VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
1 row in set (0.00 sec)
Your query itself looks fine; however, the ORDER BY clause, combined with a possible half-million records, is probably your killer. I would add an index to help optimize that portion, on
( own_domain_id, week_entrances, type )
This way, you first hit your critical key "own_domain_id" and then get everything already in order. The type filter is != 10, i.e. any other type, so it cannot narrow the index range, and it would cause more problems if type were in the second index position.
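One way to see the effect of such an index on sorting is the sketch below. It uses SQLite (via Python's sqlite3) purely as an illustration: the table is cut down to the relevant columns, the index name idx_domain_entrances_type is made up, and SQLite's planner is not MySQL's, so treat this as a demonstration of the principle rather than a prediction of the MySQL plan. On MySQL, the equivalent check is to run EXPLAIN again after adding the index and see whether "Using filesort" disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_target_url (
        id INTEGER PRIMARY KEY,
        own_domain_id INTEGER,
        url TEXT NOT NULL,
        type INTEGER,
        week_entrances INTEGER,
        canonical_url_id INTEGER
    )""")

QUERY = """
    SELECT id, url, canonical_url_id
    FROM t_target_url
    WHERE own_domain_id = 476 AND type != 10
    ORDER BY week_entrances DESC
    LIMIT 550000"""


def plan(connection):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]


plan_before = plan(conn)  # no usable index: rows must be sorted after filtering

conn.execute("""
    CREATE INDEX idx_domain_entrances_type
    ON t_target_url (own_domain_id, week_entrances, type)""")

plan_after = plan(conn)   # index order already matches the ORDER BY

print(plan_before)
print(plan_after)
```

Before the index exists, the plan contains a separate sorting step (SQLite reports it as a temporary B-tree for ORDER BY); afterwards the rows are read from the index in reverse order and no sort step is listed.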
Comment Feedback.
For simplicity, your critical key per the where clause is "ttl.own_domain_id=476". You only care about data for domain ID 476. Now, let's assume you have 15 "types" that span all different week entrances, such as
own_domain_id type week_entrances
476 1 1000
476 1 1700
476 1 850
476 2 15000
476 2 4250
476 2 12000
476 7 2500
476 7 5300
476 10 1250
476 10 4100
476 12 8000
476 12 3150
476 15 5750
476 15 27000
This obviously is not to scale of your half-million capacity, but shows sample data.
By having the type != 10, it will STILL have to blow through all the records for id=476, yet exclude only those with the type = 10. It then has to put all the data in order by the week entrances which would take more time. By having the week entrances as part of the key in the second position, THEN the type, the data WILL BE able to be optimized in the returned result set already in proper order. However, when it gets to the type of "!= 10", it will still skip over those quickly as they are encountered. Here would be the revised index data per above sample.
own_domain_id week_entrances type
476 850 1
476 1000 1
476 1250 10
476 1700 1
476 2500 7
476 3150 12
476 4100 10
476 4250 2
476 5300 7
476 5750 15
476 8000 12
476 12000 2
476 15000 2
476 27000 15
So, as you can see, the data is already pre-sorted per the index, and applying DESCENDING order is no problem for the engine, just pulls the records in reverse order and skips the 10's as they are found.
Does that help?
Additional comment feedback per Salman.
Think of this another way with a store with 10 different branch locations, each with their own sales. The transactions receipts are stored in boxes (literally). Think of how you would want to go through the boxes if you were looking for all transactions on a given date.
Box 1 = Store #1 only, and transactions sorted by date
Box 2 = Store #2 only, and transactions sorted by date
Box ...
Box 10 = Store #10 only, sorted by date.
You have to go through 10 boxes, pulling out all for a given date... Or in the original question, every transaction EXCEPT for one date, and you want them in order by dollar amount of transaction, regardless of date... What a mess that could be.
If you had the boxes pregroup sorted, regardless of store
Box 1 = Sales from $1 - $1000 (all properly sorted by amount)
Box 2 = Sales from $1001 - $2000 (properly sorted)
Box ...
Box 10... same...
You STILL have to go through all the boxes and put them in order, but at least, as you are looking through the transactions, you could just throw out the one for the date exclusion to ignore.
Indexes help pre-organize how the engine can best go through them for your criteria.