MySQL query explanation needed

I have the following MySQL query:
CREATE TABLE `sampledata`.`ORDERFACT` (
`ORDERNUMBER` int(11) DEFAULT NULL,
`PRODUCTCODE` varchar(50) DEFAULT NULL,
`QUANTITYORDERED` int(11) DEFAULT NULL,
`PRICEEACH` decimal(17,5) DEFAULT NULL,
`ORDERLINENUMBER` int(11) DEFAULT NULL,
`TOTALPRICE` decimal(17,0) DEFAULT NULL,
`ORDERDATE` datetime DEFAULT NULL,
`REQUIREDDATE` datetime DEFAULT NULL,
`SHIPPEDDATE` datetime DEFAULT NULL,
`STATUS` varchar(15) DEFAULT NULL,
`COMMENTS` longtext,
`CUSTOMERNUMBER` int(11) DEFAULT NULL,
`TIME_ID` varchar(10) DEFAULT NULL,
`QTR_ID` bigint(20) DEFAULT NULL,
`MONTH_ID` bigint(20) DEFAULT NULL,
`YEAR_ID` bigint(20) DEFAULT NULL,
KEY `idx_ORDERFACT_lookup` (`ORDERNUMBER`,`PRODUCTCODE`,`QUANTITYORDERED`,`PRICEEACH`,`ORDERLINENUMBER`,`TOTALPRICE`,`ORDERDATE`,`REQUIREDDATE`,`SHIPPEDDATE`,`STATUS`,`CUSTOMERNUMBER`,`TIME_ID`,`QTR_ID`,`MONTH_ID`,`YEAR_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Can someone explain to me what the last three lines will do:
(KEY `idx_ORDERFACT_lookup` ...)

(KEY idx_ORDERFACT_lookup ...)
will create a composite (multi-column) index — not a primary key — on all the columns in the brackets.
If you want to learn more, read the MySQL 5.0 Reference Manual, section 7.5.2, Multiple-Column Indexes.
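In short, a multiple-column index can serve queries that filter on a leftmost prefix of its columns. A minimal sketch of what that means here (the literal values are made up for illustration):
-- These can use idx_ORDERFACT_lookup, because they filter on a leftmost prefix:
SELECT * FROM ORDERFACT WHERE ORDERNUMBER = 10100;
SELECT * FROM ORDERFACT WHERE ORDERNUMBER = 10100 AND PRODUCTCODE = 'S18_1749';
-- This one cannot, because it skips the leading column ORDERNUMBER:
SELECT * FROM ORDERFACT WHERE PRODUCTCODE = 'S18_1749';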
Addendum:
Actually, if you are designing this table, I have a few suggestions for you:
It is more cross-platform compatible to use lowercase column names, since some systems/DBMSs are not case sensitive.
PRICEEACH... do you really need to store 5 decimal places for a price? And then TOTALPRICE has 0 decimal places...
Those MONTH_ID, QTR_ID, etc. columns look sort of weird, especially being BIGINT, but you probably have a requirement for that.
Just a few thoughts.

(KEY idx_ORDERFACT_lookup ...) will add an index to the table.
If you really want to dig deep, then visit:
http://dev.mysql.com/doc/refman/5.0/en/create-table.html
It is topic 12.1.10. Focus on the third grey box. You will get something like this:
{INDEX|KEY} [index_name] [index_type] (index_col_name,...) [index_type]
Everything in square brackets is optional.
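For instance, a hedged sketch of the same syntax used to add an index after the fact (the index name and the two-column choice are just for illustration; BTREE is the default index type anyway):
ALTER TABLE `sampledata`.`ORDERFACT`
ADD INDEX `idx_order_product` USING BTREE (`ORDERNUMBER`, `PRODUCTCODE`);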
Hope it helps!!!

Related

MySQL - ordering by column with many different values in big table

Probably through poor database design, the following really simple query is taking ~1.5 minutes to run.
SELECT s.title, t.name AS team_name
FROM stories AS s
JOIN teams AS t ON s.team_id = t.id
WHERE s.pubdate >= "1970-01-01 00:00"
ORDER BY s.hits /* <-- here's the problem */
LIMIT 3 OFFSET 0
The problem is that the stories table is fairly big, with ~1.5m rows, and there are a ton of unique values for hits (this column logs the hits to each story).
Take out the order clause and it resolves almost instantly.
Question: what can I do to optimise for queries like this? Presumably I shouldn't apply an index to hits, since no direct look-ups take place on that column.
[UPDATE]
SHOW CREATE TABLE for all tables concerned:
CREATE TABLE stories (
`id` varchar(11) NOT NULL,
`link` text NOT NULL,
`title` varchar(255) CHARACTER SET utf8 NOT NULL,
`description` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`pubdate` datetime NOT NULL,
`source_id` varchar(11) NOT NULL,
`team_id` varchar(11) NOT NULL,
`hits` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `Unique combo (title + date)` (`title`,`pubdate`),
KEY `team (FK)` (`team_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE teams (
`id` varchar(11) NOT NULL,
`is_live` enum('y') DEFAULT NULL,
`name` varchar(50) NOT NULL,
`short_name` varchar(12) DEFAULT NULL,
`server` varchar(11) DEFAULT NULL,
`url_token` varchar(255) NOT NULL,
`league` varchar(11) NOT NULL,
`away_game_id` varchar(255) DEFAULT NULL,
`digest_list_id` varchar(25) DEFAULT NULL,
`twitter_handle` varchar(255) DEFAULT NULL,
`no_official_news` enum('y') DEFAULT NULL,
`alt_names` varchar(255) DEFAULT NULL,
`no_use_nickname` enum('y') DEFAULT NULL,
`official_hashtag` varchar(30) DEFAULT NULL,
`merge_news_and_fans` enum('y') DEFAULT NULL,
`colour_1` varchar(6) NOT NULL,
`colour_2` varchar(6) DEFAULT NULL,
`colour_3` varchar(6) DEFAULT NULL,
`link_colour_modifier` float DEFAULT NULL,
`alt_link_colour_modifier` float DEFAULT NULL,
`title_shade` enum('dark','light') NOT NULL,
`shirt_style` enum('vert_stripes','horiz_stripes','vert_stripes_thin','horiz_stripes_thin','vert_split','horiz_split') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `URL token` (`url_token`),
KEY `league (FK)` (`league`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Consider removing the filter on pubdate if the user does not need it. It confuses the optimizer.
INDEX(hits, pubdate, title)
will probably help the query the most. It is "covering".
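A minimal sketch of adding that index, assuming the table from the question (the index name is made up; title may need a prefix length if the key exceeds the engine's limit):
ALTER TABLE stories
ADD INDEX idx_hits_pubdate_title (hits, pubdate, title);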
The reason the query is fast when you remove the ORDER BY: without it, MySQL can return any 3 rows; with it, and without a useful index, it has to sort all 1.5M rows to discover the 3 with the fewest hits.
Perhaps you wanted ORDER BY s.hits DESC? -- to get those with the most hits.

MySQL index help - which is faster?

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored to a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's because the name column lacks an index, forcing a full table scan. My understanding of indexes is that adding one to the name field will speed up reads but slow down inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on faster writes.
My question:
Assuming 100 reads and 100 writes per day, which is more likely to be the faster process for MySQL: executing the above query on the above table without the index, or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8
As @Ray points out, indexes do not have to be rebuilt on every INSERT, UPDATE, or DELETE operation. So, if you only want to improve the efficiency of this (or similar) queries, add either an index on (name, type) or on (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server, but it only has to be done once, and then all queries with conditions like yours will benefit. It may also improve the efficiency of several other kinds of queries that involve name only, or name and type:
WHERE name = 'Design' AND type = 'Ticket' -- your query
WHERE name = 'Design' -- condition on `name` only
GROUP BY name -- group by `name`
WHERE name LIKE 'Design%' -- range condition on `name` only
WHERE name = 'Design' -- equality condition on `name`
AND type LIKE 'Ticket%' -- and range condition on `type`
WHERE name = 'Design' -- equality condition on `name`
GROUP BY type -- and group by `type`
GROUP BY name -- group by `name`
, type -- and `type`
The insert cost of adding a single-column index on the name column is most likely negligible: it will probably amount to a constant-time increase of no more than a few milliseconds per write. You will eat up some extra disk space, but that's usually not a concern. Nothing like the multiple seconds you're experiencing on SELECT performance.
Add the index, enjoy the performance improvement.
BTW: indexes aren't 'rebuilt' on every insert. They're usually implemented as B-trees, and unless you're deleting frequently, they should require very little rebalancing once the tree is more than a few levels deep (and rebalancing at shallow depth is pretty cheap).
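As a sanity check once the index exists, a small sketch of verifying that the optimizer picks it up (name_type_IDX is the name used in the ALTER TABLE above):
EXPLAIN SELECT * FROM project_objects
WHERE type = 'Ticket' AND name = 'Design';
-- The `key` column of the EXPLAIN output should now show name_type_IDX
-- rather than the old single-column `type` index.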

MySQL: What are the fields required to make a PayPal payment credits system in the database?

Right now I have 3 tables:
CREATE TABLE `tbl_payment_info` (
`payment_info_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`payment_type` tinyint(2) NOT NULL, -- for users to choose PayPal or credit card
`num_credits` int(11) NOT NULL,
`price_per_credits` float NOT NULL,
`payment_time_stamp` datetime NOT NULL,
PRIMARY KEY (`payment_info_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tbl_paypal_info` (
`paypal_info_id` int(11) NOT NULL AUTO_INCREMENT,
`paypal_email` varchar(100) DEFAULT NULL,
`payment_info_id` int(11) NOT NULL,
PRIMARY KEY (`paypal_info_id`),
KEY `tbl_paypal_info_ibfk_1` (`payment_info_id`),
CONSTRAINT `tbl_paypal_info_ibfk_1` FOREIGN KEY (`payment_info_id`) REFERENCES `tbl_payment_info` (`payment_info_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
CREATE TABLE `tbl_credit_card_info` (
`credit_card_info_id` int(11) NOT NULL,
`payment_info_id` int(11) NOT NULL,
`card_fullname` varchar(128) NOT NULL,
`credit_number` varchar(20) NOT NULL,
`credit_card_type` tinyint(2) NOT NULL,
`billing_address_Line_1` varchar(100) NOT NULL,
`billing_address_line_2` varchar(100) DEFAULT NULL,
`billing_city` varchar(50) NOT NULL,
`billing_country` varchar(50) NOT NULL,
`billing_phone` varchar(50) DEFAULT NULL,
PRIMARY KEY (`credit_card_info_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Am I missing anything? Is there a better way to write this more efficiently? Any advice would be helpful. Thanks in advance.
I can't say whether these are all the missing fields, but the ones that stick out on the credit card table are:
billing_state
billing_zip
credit_exp_date
On the PayPal table, you might also want to save the transaction ID. I don't know if this is the intent of the payment_info_id column you have in that table, but if it is, you might want to change the field type from int to varchar, as PayPal returns a mixture of letters and numbers in its transaction IDs.
And one comment: credit card numbers are at most 16 digits long for the major schemes, yet you have a 20-char field for it.
My final comment, on efficiency: I wouldn't waste space using UTF-8 on the first table. It doesn't really look as if it will be storing international characters.
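A hedged sketch of those suggestions as DDL, reusing the table names above (the new column names and types are hypothetical; adjust them to your needs):
ALTER TABLE tbl_credit_card_info
ADD COLUMN billing_state varchar(50) DEFAULT NULL,
ADD COLUMN billing_zip varchar(10) DEFAULT NULL,
ADD COLUMN credit_exp_date date NOT NULL;
ALTER TABLE tbl_paypal_info
ADD COLUMN transaction_id varchar(30) DEFAULT NULL; -- PayPal transaction IDs are alphanumeric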

django fulltext search mysql

I have the default User model in django as per below:
delimiter $$
CREATE TABLE `auth_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) NOT NULL,
`first_name` varchar(30) NOT NULL,
`last_name` varchar(30) NOT NULL,
`email` varchar(75) NOT NULL,
`password` varchar(128) NOT NULL,
`is_staff` tinyint(1) NOT NULL,
`is_active` tinyint(1) NOT NULL,
`is_superuser` tinyint(1) NOT NULL,
`last_login` datetime NOT NULL,
`date_joined` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=26 DEFAULT CHARSET=latin1$$
I tried to add full text searching with ALTER TABLE auth_user ADD FULLTEXT (first_name, last_name, email), and I keep getting the error Error Code: 1214. The used table type doesn't support FULLTEXT indexes. Is there a reason why this doesn't support full text searching? I'm thinking it may be because I extended the model and added my own table?
Fulltext indexes only work on MyISAM tables (InnoDB did not gain FULLTEXT support until MySQL 5.6), and yours is InnoDB.
You may want to use Django-Sphinx for full text search or look at their implementation. https://github.com/dcramer/django-sphinx
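If switching the storage engine is acceptable for this table, a minimal sketch (note that MyISAM gives up transactions and row-level locking; the index name is made up):
ALTER TABLE auth_user ENGINE=MyISAM;
ALTER TABLE auth_user ADD FULLTEXT ft_user_names (first_name, last_name, email);
-- then query with:
SELECT * FROM auth_user
WHERE MATCH (first_name, last_name, email) AGAINST ('smith');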

Is there a better index to speed up this query?

The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can do to make the query go faster? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
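A one-line sketch of that try (the index name is made up):
ALTER TABLE libraries ADD INDEX idx_foreign_seqs (foreign_seqs);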
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
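For example, a sketch of raising it for the current session only (the 4 MB figure is an arbitrary illustration; size it to your server):
SET SESSION sort_buffer_size = 4 * 1024 * 1024; -- 4 MB, illustrative only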
Third strategy: if your application needs to do this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table, perhaps using an EVENT to remake it once every few minutes. If you're going to follow that strategy, start by creating a view with this query in it and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
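A hedged sketch of that progression, with made-up object names (the 5-minute interval, the UNIQUE key that lets REPLACE overwrite each group in place, and the event scheduler being enabled are all assumptions):
-- Step 1: a view the application reads from (same query as above)
CREATE VIEW lib_summary AS
SELECT lib_name, description, count(seq_id) AS seq_count, floor(avg(size)) AS avg_size
FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
WHERE s.is_contig = 0 AND foreign_seqs = 0
GROUP BY lib_name;
-- Step 2 (later): materialize the result and refresh it on a schedule
CREATE TABLE lib_summary_data AS SELECT * FROM lib_summary;
ALTER TABLE lib_summary_data ADD UNIQUE KEY (lib_name);
CREATE EVENT refresh_lib_summary
ON SCHEDULE EVERY 5 MINUTE
DO REPLACE INTO lib_summary_data SELECT * FROM lib_summary;
Finally, drop the view and rename lib_summary_data to the view's name, as described above.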
Final suggestion: if this is truly slowly-changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.