I need to define my table as follows:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`o_id` int(11) unsigned NOT NULL,
`m_name` varchar(45) NOT NULL,
`o_name` varchar(45) NOT NULL,
`customer_id` int(11) unsigned NOT NULL,
`client_id` tinyint(4) unsigned DEFAULT '1',
`set_id` tinyint(4) unsigned DEFAULT NULL,
`s1` tinyint(4) unsigned DEFAULT NULL,
`s2` tinyint(4) unsigned DEFAULT NULL,
`s3` tinyint(4) unsigned DEFAULT NULL,
`review` varchar(2045) DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `br_o_id_idx` (`o_id`),
KEY `br_date_idx` (`created_at`),
KEY `br_on_idx` (`o_name`),
KEY `br_mn_idx` (`m_name`)
)
but as I am looking at the Sequelize documentation, it does not appear to support TINYINT with a display width.
From the lack of ZEROFILL in your table definition, I suspect tinyint(4) probably does not do what you think it does. From the MySQL manual, 11.2.5 Numeric Type Attributes:
MySQL supports an extension for optionally specifying the display width of integer data types in parentheses following the base keyword for the type.
...
The display width does not constrain the range of values that can be stored in the column.
...
For example, a column specified as SMALLINT(3) has the usual SMALLINT range of -32768 to 32767, and values outside the range permitted by three digits are displayed in full using more than three digits.
I'm not sure if other RDBMSs treat the number in parens differently, but from perusing the sequelize source it looks like they're under the same incorrect impression you are.
That being said, the important part of your schema, namely storing those fields as TINYINTs (a single byte of storage, holding 0 to 255 when UNSIGNED), is sadly not available in the Sequelize DataTypes. I might suggest opening a PR to add it...
On the other hand, if you really are looking for the ZEROFILL functionality and need to specify a display width of 4, you could do something like Sequelize.INTEGER(4).ZEROFILL, but obviously that would be pretty wasteful of space in your DB.
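To see the distinction for yourself, here is a minimal sketch you can run in a scratch schema (the table name is made up):
CREATE TABLE width_demo (
  a TINYINT(1),           -- still stores the full -128..127 range
  b TINYINT(4) UNSIGNED,  -- still stores the full 0..255 range
  c INT(4) ZEROFILL       -- ZEROFILL implies UNSIGNED; output is zero-padded
);
INSERT INTO width_demo VALUES (100, 200, 42);
SELECT * FROM width_demo;  -- a = 100, b = 200, c = 0042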
For MySQL, the Sequelize.BOOLEAN data type maps to TINYINT(1). See
https://github.com/sequelize/sequelize/blob/3e5b8772ef75169685fc96024366bca9958fee63/lib/data-types.js#L397
and
http://docs.sequelizejs.com/en/v3/api/datatypes/
As noted by @user866762, the number in parentheses only affects how the data is displayed, not how it is stored. So TINYINT(1) vs. TINYINT(4) should have no effect on your data.
I have a query that returns results from a single table based on whether the provided ID exists in a column of one, or both, of two other tables. The DB schema for the relevant tables is provided below, along with the initial query and the version that was later recommended to me by a peer. I go into some detail below as to why this query works, but I need to optimize it further for larger datasets and pagination.
CREATE TABLE `killmails` (
`id` BIGINT(20) UNSIGNED NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`moon_id` BIGINT(20) NULL DEFAULT NULL,
`solar_system_id` BIGINT(20) UNSIGNED NOT NULL,
`war_id` BIGINT(20) NULL DEFAULT NULL,
`is_npc` TINYINT(1) NOT NULL DEFAULT '0',
`is_awox` TINYINT(1) NOT NULL DEFAULT '0',
`is_solo` TINYINT(1) NOT NULL DEFAULT '0',
`dropped_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`destroyed_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`fitted_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`total_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`killmail_time` DATETIME NOT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`, `hash`),
INDEX `total_value` (`total_value`),
INDEX `killmail_time` (`killmail_time`),
INDEX `solar_system_id` (`solar_system_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_attackers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_done` BIGINT(20) UNSIGNED NOT NULL,
`final_blow` TINYINT(1) NOT NULL DEFAULT '0',
`security_status` DECIMAL(17,15) NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`weapon_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `weapon_type_id` (`weapon_type_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_attackers_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_victim` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_taken` BIGINT(20) UNSIGNED NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NOT NULL,
`ship_value` DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
`pos_x` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_y` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_z` DECIMAL(30,10) NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_victim_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This first query is where the problem started:
SELECT
*
FROM
killmails k
LEFT JOIN killmail_attackers ka ON k.id = ka.killmail_id
LEFT JOIN killmail_victim kv ON k.id = kv.killmail_id
WHERE
ka.character_id = ?
OR kv.character_id = ?
ORDER BY k.killmail_time DESC
LIMIT ? OFFSET ?
This worked okay, but with long query times. We optimized it to this:
SELECT
killmails.*
FROM (
SELECT killmail_victim.killmail_id FROM killmail_victim
WHERE killmail_victim.corporation_id = ?
UNION
SELECT killmail_attackers.killmail_id FROM killmail_attackers
WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
I saw a huge improvement in query times when looking up killmails for characters; however, when I started querying for larger datasets like corporation and alliance killmails, the query slows down. This is because the union'd subqueries can potentially return large sets of data, and the time it takes to read all of that into memory so that the SELECTED_KMS derived table can be materialized is, I believe, what takes so much time. Most of the time, with alliances, my application's connection to the database times out. One alliance returned 900K killmail IDs from one of the union'd tables; I'm not sure what the other returned.
I can easily add LIMIT clauses to the inner queries, but that will introduce a lot of complications when I get to paginating the data, or when I introduce a feature to search for killmails by date, for example.
I am looking for suggestions on how this query can be optimized and still allow for easy pagination in the near future.
Thank You
Change INDEX(corporation_id) in both tables to INDEX(corporation_id, killmail_id) so that the inner queries will be "covering".
In general, INDEX(a) is useless when you also have INDEX(a,b). Any query that needs just a, can use either of those indexes. (This rule does not apply to b; only the "leftmost" column(s).)
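Concretely, that change would look something like this (a sketch; the index names come from your dump):
ALTER TABLE killmail_attackers
  DROP INDEX corporation_id,
  ADD INDEX corporation_id (corporation_id, killmail_id);
ALTER TABLE killmail_victim
  DROP INDEX corporation_id,
  ADD INDEX corporation_id (corporation_id, killmail_id);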
Where does killmails.id come from? It's not AUTO_INCREMENT; it is not alone in the PRIMARY KEY, so there is no specified "uniqueness" constraint. Is it unique by some other design? Is it computed somewhere else in the code? (I ask because I need a feel for its uniqueness and other characteristics.)
Add INDEX(id, killmail_time).
What version of MySQL are you using?
Perhaps UNION ALL would give the same results? It would be faster because it would not need to de-dup.
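If the two branches cannot return the same killmail_id for the given parameter (or duplicates are tolerable), the derived table would simply become:
SELECT killmail_victim.killmail_id FROM killmail_victim
WHERE killmail_victim.corporation_id = ?
UNION ALL
SELECT killmail_attackers.killmail_id FROM killmail_attackers
WHERE killmail_attackers.corporation_id = ?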
How much RAM do you have? What is the value of innodb_buffer_pool_size?
Do you really need 8-byte BIGINTs? Even if your application is using longlong (or whatever it calls it), you can probably change the schema without changing the app.
Do you need this much precision and range? DECIMAL(30,10) -- it takes 14 bytes each. DOUBLE would give you about 16 significant digits in 8 bytes, with a wider range of values (up to about 10^308). What "units" are you using? (Overkill for light-years or parsecs; inadequate for miles or km. Perhaps AUs? Then the bottom digit would be a precision of a few meters?)
The last few questions are aimed at shrinking the table and seeing if we can avoid it being as I/O-bound as it apparently is now.
Important
innodb_buffer_pool_size = 128M is terribly small, especially for a 32GB machine, and especially if your dataset is much bigger than 128MB. If there are not any other apps running on the server, bump that setting up to 20G.
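To check the current value and resize it (a sketch; the 20G figure assumes a dedicated 32GB database server, and online resizing requires MySQL 5.7.5+, otherwise set it in my.cnf and restart):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 20 * 1024 * 1024 * 1024;  -- 20G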
CREATE TABLE `features` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`feature_name` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=utf8;
For int(size)
size means the display width, which is used when the column has ZEROFILL specified.
For varchar(size)
the size means the maximum length in characters (in bytes before MySQL 4.1).
So this parameter is sometimes used as a display attribute and sometimes as a storage limit. It's really confusing for a web developer.
Why are they designed like this?
Is it for historical reasons?
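The asymmetry is easy to demonstrate (a minimal sketch; the failing INSERT assumes strict SQL mode, otherwise the value is truncated with a warning):
CREATE TABLE size_demo (
  n INT(4),     -- display width only: values wider than 4 digits still fit
  s VARCHAR(4)  -- hard length limit: at most 4 characters
);
INSERT INTO size_demo (n) VALUES (123456);   -- OK, stored as 123456
INSERT INTO size_demo (s) VALUES ('hello');  -- rejected in strict mode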
Reference
Stack Overflow: int(11) vs. int(anything else)
Blog: https://alexander.kirk.at/2007/08/24/what-does-size-in-intsize-of-mysql-mean/
I need to create the table in the format below, but it returns an error in the partitioning area: Error Code: 1659. Field 'fldassigndate' is of a not allowed type for this type of partitioning. How do I resolve this error and make the partitioning work?
CREATE TABLE tblattendancesetup (
fldattendanceid int(11) NOT NULL AUTO_INCREMENT,
flddept varchar(100) DEFAULT NULL,
fldemployee varchar(100) DEFAULT NULL,
fldintime varchar(20) DEFAULT NULL,
fldouttime varchar(20) DEFAULT NULL,
fldlateafter varchar(20) DEFAULT NULL,
fldearlybefore varchar(20) DEFAULT NULL,
fldweekoff varchar(20) DEFAULT NULL,
fldshiftname varchar(20) DEFAULT NULL,
fldassigndate varchar(20) DEFAULT NULL,
fldfromdate varchar(20) DEFAULT NULL,
fldtodate varchar(20) DEFAULT NULL,
fldrefid varchar(20) DEFAULT NULL,
UNIQUE KEY fldattendanceid (fldattendanceid),
KEY in_attendancesetup (fldemployee,fldintime,fldouttime,fldlateafter,fldearlybefore,fldfromdate,fldtodate,fldattendanceid),
KEY i_emp_tmp (fldemployee),
KEY i_emp_attendance (fldemployee)
)
PARTITION BY RANGE (fldassigndate)
(PARTITION p_Apr VALUES LESS THAN (TO_DAYS('2012-05-01')),
PARTITION p_May VALUES LESS THAN (TO_DAYS('2012-06-01')),
PARTITION p_Nov VALUES LESS THAN MAXVALUE );
From the MySQL manual (Section 18):
Data type of partitioning key. A partitioning key must be either an
integer column or an expression that resolves to an integer.
Neither date nor varchar columns can be used directly for this type of partitioning; a date column must be wrapped in an integer expression such as TO_DAYS().
Although this question is very old, I thought it'd help those searching, like I was at one point.
The OP should first fix the column types, using a proper date/time type such as DATETIME for the date fields.
Once that is done: I have not tried to set up keys in that exact manner, but I have found that I needed to use a composite primary key if I wanted to keep the structure of my table otherwise the same (MySQL requires every unique key on a partitioned table to include all columns of the partitioning key). In this case, it would be
PRIMARY KEY(`fldattendanceid`, `fldassigndate`)
Then, TO_DAYS is missing from the PARTITION BY RANGE expression. Lastly, I'm not 100 percent sure, but I don't think you can partition on a nullable column; even if you can, it would be awful practice to do so.
CREATE TABLE tblattendancesetup (
fldattendanceid INT(11) NOT NULL AUTO_INCREMENT,
flddept VARCHAR(100) DEFAULT NULL,
fldemployee VARCHAR(100) DEFAULT NULL,
fldintime DATETIME DEFAULT NULL,
fldouttime DATETIME DEFAULT NULL,
fldlateafter DATETIME DEFAULT NULL,
fldearlybefore DATETIME DEFAULT NULL,
fldweekoff VARCHAR(20) DEFAULT NULL,
fldshiftname VARCHAR(20) DEFAULT NULL,
fldassigndate DATETIME NOT NULL,
fldfromdate DATETIME DEFAULT NULL,
fldtodate DATETIME DEFAULT NULL,
fldrefid VARCHAR(20) DEFAULT NULL,
PRIMARY KEY(`fldattendanceid`, `fldassigndate`),
KEY in_attendancesetup (fldemployee,fldintime,fldouttime,fldlateafter,fldearlybefore,fldfromdate,fldtodate,fldattendanceid),
KEY i_emp_tmp (fldemployee),
KEY i_emp_attendance (fldemployee)
)
PARTITION BY RANGE ( TO_DAYS(fldassigndate))
(PARTITION p_Apr VALUES LESS THAN (TO_DAYS('2012-05-01')),
PARTITION p_May VALUES LESS THAN (TO_DAYS('2012-06-01')),
PARTITION p_Nov VALUES LESS THAN MAXVALUE );
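Once the table is created, you can check that rows land in the expected partitions with:
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'tblattendancesetup';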
I hope this helps someone!
This question expects a generic answer to the broad problem of index creation on a MySQL database.
Let's take this example table:
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) NOT NULL DEFAULT '0',
`author_id` int(11) unsigned NOT NULL,
`modificator_id` int(11) unsigned DEFAULT NULL,
`category_id` int(11) unsigned DEFAULT NULL,
`title` varchar(200) COLLATE utf8_unicode_ci NOT NULL,
`headline` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`url_alias` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`priority` mediumint(11) unsigned NOT NULL DEFAULT '50',
`publication_date` datetime NOT NULL,
`creation_date` datetime NOT NULL,
`modification_date` datetime NOT NULL,
PRIMARY KEY (`id`)
);
Over such a table there is a wide range of queries that could be performed on different criteria:
category_id
published
publication_date
e.g.:
SELECT id FROM article WHERE NOT published AND category_id = '2' ORDER BY publication_date;
On many tables you can see a wide range of state fields (like published here), date fields, or reference fields (like author_id or category_id). What strategy should be used to create the indexes?
This can be broken down into the following points:
Should one make an index on every field that can be used in a query (either as a WHERE argument or in an ORDER BY), even if this leads to a lot of indexes per table?
Should one also index fields that hold only a small set of values, like booleans or enums? That only reduces the scope of the scan by a factor of n (n being the number of distinct values, assuming every value is used homogeneously).
I've read that MySQL prior to 5.0 used only one index per query; how does the system pick it? By choosing the most restrictive one?
How is an OR condition processed?
How much is this going to slow down inserts?
Does InnoDB vs. MyISAM change anything about this problem?
I know the EXPLAIN statement can be used to check whether a query is optimized, but a bit of concrete theoretical grounding would really be more constructive than a purely empirical approach!
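As a concrete illustration for the sample query above (a sketch following the usual rule of thumb, equality columns first, then the ORDER BY column; the index name is made up and this is not a definitive strategy):
ALTER TABLE article
  ADD INDEX idx_pub_cat_date (published, category_id, publication_date);
-- Write the predicate as published = 0 rather than NOT published, so the
-- optimizer can match it against the leftmost index column.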
The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id  select_type  table  type    possible_keys   key      key_len  ref       rows   Extra
1   SIMPLE       s      ref     libseq,contigs  contigs  4        const     28447  Using temporary; Using filesort
1   SIMPLE       l      eq_ref  PRIMARY         PRIMARY  4        s.lib_id  1      Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can make to speed this query up? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
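If you want to try that, it is a one-liner (the index name is made up):
ALTER TABLE libraries ADD INDEX idx_foreign_seqs (foreign_seqs);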
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
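For example, to try a larger sort buffer for just the current session before running the query (the value is illustrative):
SET SESSION sort_buffer_size = 8 * 1024 * 1024;  -- 8MB; the default is typically around 256KB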
Third strategy: if your application needs to do this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table. Perhaps use an EVENT to remake the summary table, running it once every few minutes. If you're going to follow that strategy, start by creating a view with this table in it and have your app retrieve information from the view. Then get the summary table stuff working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
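A minimal sketch of that idea (the table and event names are made up; the event scheduler must be enabled, and REPLACE leaves stale rows behind if a library disappears):
CREATE TABLE library_summary (
  lib_name    VARCHAR(30) NOT NULL PRIMARY KEY,
  description TEXT,
  seq_count   INT UNSIGNED NOT NULL,
  avg_size    INT UNSIGNED
) ENGINE=InnoDB;

SET GLOBAL event_scheduler = ON;

CREATE EVENT refresh_library_summary
ON SCHEDULE EVERY 5 MINUTE
DO
  REPLACE INTO library_summary
    SELECT lib_name, description, COUNT(seq_id), FLOOR(AVG(size))
    FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
    WHERE s.is_contig = 0 AND foreign_seqs = 0
    GROUP BY lib_name;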
Final suggestion: if this is truly slowly-changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.