MySQL: how to handle tables with millions of rows?

I'm looking for solutions to properly archive very large tables (about 10,000 rows per day).
I currently have this situation:
Order Table:
CREATE TABLE `tbl_order` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idproduct` int(11) NOT NULL DEFAULT '0',
`iduser` int(11) NOT NULL DEFAULT '0',
`state` int(11) NOT NULL DEFAULT '0',
`progressive` int(11) NOT NULL DEFAULT '0',
`show-voucher` int(11) NOT NULL DEFAULT '0',
`voucher-custom` int(11) NOT NULL DEFAULT '0',
`check-validate` int(11) NOT NULL DEFAULT '0',
`code-order` varchar(8) NOT NULL DEFAULT '',
`code-product` char(15) NOT NULL DEFAULT '',
`product-year` int(11) NOT NULL DEFAULT '2017',
`product-area` char(3) NOT NULL DEFAULT '',
`payment-type` char(3) NOT NULL DEFAULT '',
`usr-qnt` int(11) NOT NULL DEFAULT '0',
`usr-id` char(11) NOT NULL DEFAULT '',
`usr-cid` char(8) NOT NULL DEFAULT '',
`usr-ct` char(3) NOT NULL DEFAULT '000',
`price` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-penale` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-rate` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-bonusmalus-rate` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-bonusmalus-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
`price-incasso-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
`rate-qnt` int(11) NOT NULL DEFAULT '0',
`card-qnt` int(11) NOT NULL DEFAULT '0',
`grp-user` longtext NOT NULL,
`grp-price` longtext NOT NULL,
`grp-item` longtext NOT NULL,
`grp-element` longtext NOT NULL,
`bonusmalus-description` varchar(500) NOT NULL,
`note-s` text NOT NULL ,
`note-c` text NOT NULL,
`note-incasso` text NOT NULL,
`note-interne` text NOT NULL,
`d-start` date NOT NULL DEFAULT '0000-00-00',
`d-end` date NOT NULL DEFAULT '0000-00-00',
`d-create` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-incasso` date NOT NULL DEFAULT '0000-00-00',
`d-sconf` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-cconf` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-export` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-expire` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-notify-vote` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`d-lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `iduser` (`iduser`),
KEY `code-order` (`code-order`),
KEY `code-product` (`code-product`),
KEY `idproduct` (`idproduct`),
KEY `state` (`state`),
KEY `price` (`price`),
KEY `usr-qnt` (`usr-qnt`),
KEY `d-expire` (`d-expire`),
KEY `d-export` (`d-export`),
KEY `price-bonusmalus-contanti` (`price-bonusmalus-contanti`),
KEY `price-penale` (`price-penale`),
KEY `price-bonusmalus-contanti_2` (`price-bonusmalus-contanti`),
KEY `price-rate` (`price-rate`),
KEY `price-contanti` (`price-contanti`),
KEY `show-voucher` (`show-voucher`),
KEY `voucher-custom` (`voucher-custom`),
KEY `check-validate` (`check-validate`),
KEY `progressive` (`progressive`),
KEY `d-incasso` (`d-incasso`),
KEY `price-incasso-contanti` (`price-incasso-contanti`),
KEY `d-notify-vote` (`d-notify-vote`),
KEY `product-year` (`product-year`),
KEY `product-area` (`product-area`),
KEY `d-lastupdate` (`d-lastupdate`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Orders are requests that users (iduser) make for tourist packages, similar to Booking.com.
This table grows by about 8,000 to 15,000 rows per day.
I'm afraid this table may become too big and cause problems.
My core fields are:
id or code-order
idproduct, the product code (I have about 5,000 products each year)
iduser
product-year (I have different products for each year)
product-area (I have a total of 20 areas: 000, 001, 002, 003 ... 019)
I've read several solutions on the web, but I cannot figure out which one might be best:
1) Split the main table into many sub-tables or into separate databases?
Ex.
Db-2016.tbl_order-jan
Db-2016.tbl_order-feb
.....
Db-2017.tbl_order-jan
Db-2017.tbl_order-feb
Or
Db.tbl_order-2016
Db.tbl_order-2015
Db.tbl_order-2014
Or
Db.tbl_order-master
Db.tbl_order-product-area
Or
Db.tbl_order-master
Db.tbl_order-product-year
In this case, would SELECTs have to be done with UNION?
2) Partition the table?
Which fields would perform best?
Product year? Product area (20 in total)?
The date of creation of the order (d-create)?
By id?
3) Sharding? But I do not know what it is ...
4) InnoDB or MyISAM?
Another option I could still adopt is to split the LONGTEXT fields into secondary tables to reduce the size of tbl_order:
Db.tbl_order
Db.tbl_order-grp-user (idorder, data)
Db.tbl_order-grp-price (idorder, data)
Db.tbl_order-grp-item (idorder, data)
Db.tbl_order-grp-element (idorder, data)
My doubt: if I do this I reduce the size of the tbl_order table, but I have not reduced the number of records. Would the db.tbl_order-grp-user, db.tbl_order-grp-price, db.tbl_order-grp-item and db.tbl_order-grp-element tables then also have to be partitioned, for example by range on idorder?
The SELECT to get all the data would be:
Select *,
( SELECT `u`.`data` FROM `db`.`tbl_order-grp-user` as `u` where `u`.`idorder`=`order`.`id`) as `grp-user`,
( SELECT `p`.`data` FROM `db`.`tbl_order-grp-price` as `p` where `p`.`idorder`=`order`.`id`) as `grp-price`,
( SELECT `i`.`data` FROM `db`.`tbl_order-grp-item` as `i` where `i`.`idorder`=`order`.`id`) as `grp-item`
FROM `db`.`tbl_order` as `order`
WHERE ............
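Or, written with LEFT JOINs instead of correlated subqueries (just an equivalent sketch -- the grp-element table would join the same way, and the WHERE conditions stay as above):
SELECT `o`.*,
       `u`.`data` AS `grp-user`,
       `p`.`data` AS `grp-price`,
       `i`.`data` AS `grp-item`
FROM `db`.`tbl_order` AS `o`
LEFT JOIN `db`.`tbl_order-grp-user`  AS `u` ON `u`.`idorder` = `o`.`id`
LEFT JOIN `db`.`tbl_order-grp-price` AS `p` ON `p`.`idorder` = `o`.`id`
LEFT JOIN `db`.`tbl_order-grp-item`  AS `i` ON `i`.`idorder` = `o`.`id`;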
Thanks for all the support! :-)

Novice alert...
Don't use INT (4 bytes) when MEDIUMINT UNSIGNED (3 bytes) would suffice. Look up the rest of the INT options.
Don't blindly index every column.
Do look at your SELECTs to see which composite indexes would be beneficial. See my Index Cookbook.
Don't worry much about 15K/day -- that is less than 1/sec. 100/sec is the first tipping point toward potential problems.
Don't PARTITION. It is not a panacea, and usually provides no benefits. There are very few use cases.
Don't split into multiple 'identical' tables. Ever. (Well, there are very few valid use cases.)
Don't fear a million rows; do have concern about a billion rows.
Don't use CHAR for variable length fields; use VARCHAR.
Do consider utf8mb4 instead of utf8. utf8mb4 matches the external world's view of UTF-8, and includes Emoji and all of Chinese.
Do use InnoDB. Period. Full stop. MyISAM is going away; InnoDB is as good or better in virtually all respects.
Consider changing the column names to avoid -; _ is common, and avoids errors when you forget the backticks. (Several of these points are illustrated in the sketch after this list.)
Don't Shard. (This is splitting the data across multiple servers.) This is a medium-sized table with medium-sized traffic; Sharding is needed for huge tables with huge traffic.
Do say CHARACTER SET ascii where appropriate. For example product-area. What you have now takes 9 bytes -- 3 characters * room for 3 bytes (utf8) per character.
Consider TINYINT(3) UNSIGNED ZEROFILL for product-area -- This will take 1 byte and reconstruct the leading zeros for you.
Consider whether you have a "natural" PRIMARY KEY instead of AUTO_INCREMENT.
Do tell us what the grp columns contain.
Do come back with tentative SELECT statements. I cannot finish this review without them.
Do consider whether this should be a single table. Is it really the case that one user orders exactly one product? You probably need a table for users, a table for products, a table for orders, etc.
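As a rough sketch of several of these points (MEDIUMINT/TINYINT instead of INT, VARCHAR instead of CHAR, CHARACTER SET ascii for code fields, utf8mb4 for the rest, TINYINT ZEROFILL for product-area, underscores instead of hyphens, InnoDB), a trimmed-down order table might start like this. The composite indexes are only placeholders -- the real ones have to wait for your SELECTs:
CREATE TABLE tbl_order_sketch (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  idproduct MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,    -- ~5000 products/year fits easily
  iduser MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
  state TINYINT UNSIGNED NOT NULL DEFAULT 0,
  code_order VARCHAR(8) CHARACTER SET ascii NOT NULL DEFAULT '',
  code_product VARCHAR(15) CHARACTER SET ascii NOT NULL DEFAULT '',
  product_year SMALLINT UNSIGNED NOT NULL DEFAULT 2017,
  product_area TINYINT(3) UNSIGNED ZEROFILL NOT NULL DEFAULT 0,  -- 000..019 in 1 byte
  price DECIMAL(10,2) NOT NULL DEFAULT 0.00,
  d_create DATETIME NOT NULL,
  PRIMARY KEY (id),
  KEY user_created (iduser, d_create),           -- placeholder composite index
  KEY year_area (product_year, product_area)     -- placeholder composite index
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;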

Related

MySQL Query Optimization that touches three tables via a union of two of them

I have a query that returns results from a single table based on the provided ID existing in a column in one of two, or both, tables. The DB schema for the relevant tables is provided below, as well as the initial query and then what was later recommended to me by a peer. I go into some detail below as to why this query works, but I need to optimize it further for larger datasets and pagination.
CREATE TABLE `killmails` (
`id` BIGINT(20) UNSIGNED NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`moon_id` BIGINT(20) NULL DEFAULT NULL,
`solar_system_id` BIGINT(20) UNSIGNED NOT NULL,
`war_id` BIGINT(20) NULL DEFAULT NULL,
`is_npc` TINYINT(1) NOT NULL DEFAULT '0',
`is_awox` TINYINT(1) NOT NULL DEFAULT '0',
`is_solo` TINYINT(1) NOT NULL DEFAULT '0',
`dropped_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`destroyed_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`fitted_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`total_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`killmail_time` DATETIME NOT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`, `hash`),
INDEX `total_value` (`total_value`),
INDEX `killmail_time` (`killmail_time`),
INDEX `solar_system_id` (`solar_system_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_attackers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_done` BIGINT(20) UNSIGNED NOT NULL,
`final_blow` TINYINT(1) NOT NULL DEFAULT '0',
`security_status` DECIMAL(17,15) NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`weapon_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `weapon_type_id` (`weapon_type_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_attackers_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_victim` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_taken` BIGINT(20) UNSIGNED NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NOT NULL,
`ship_value` DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
`pos_x` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_y` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_z` DECIMAL(30,10) NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_victim_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This first query is where the problem started:
SELECT
*
FROM
killmails k
LEFT JOIN killmail_attackers ka ON k.id = ka.killmail_id
LEFT JOIN killmail_victim kv ON k.id = kv.killmail_id
WHERE
ka.character_id = ?
OR kv.character_id = ?
ORDER BY k.killmail_time DESC
LIMIT ? OFFSET ?
This worked okay, but with long query times. We optimized it to this:
SELECT
killmails.*
FROM (
SELECT killmail_victim.killmail_id FROM killmail_victim
WHERE killmail_victim.corporation_id = ?
UNION
SELECT killmail_attackers.killmail_id FROM killmail_attackers
WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
I saw a huge improvement in query times when looking up killmails for characters; however, when I started querying for larger datasets like corporation and alliance killmails, the query slows down. This is because the queries that are UNIONed together can potentially return large sets of data, and the time it takes to read all of that into memory so that the SELECTED_KMS table can be created is, I believe, what takes so long. Most of the time, with alliances, my connection to the database times out from the application. One alliance returned 900K killmail IDs from one of the UNIONed tables; I'm not sure what the other returned.
I can easily add limit statements to the internal queries, but this will introduce a lot of complications when I get to paginating the data or when I introduce a feature to search for KMs by date for example.
I am looking for suggestions on how this query can be optimized and still allow for easy pagination in the near future.
Thank You
Change INDEX(corporation_id) in both tables to INDEX(corporation_id, killmail_id) so that the inner queries will be "covering".
In general, INDEX(a) is useless when you also have INDEX(a,b). Any query that needs just a, can use either of those indexes. (This rule does not apply to b; only the "leftmost" column(s).)
Where does killmails.id come from? It's not AUTO_INCREMENT; it is not alone in the PRIMARY KEY, so there is no specified "uniqueness" constraint. Is it unique by some other design? Is it computed somewhere else in the code? (I ask because I need a feel for its uniqueness and other characteristics.)
Add INDEX(id, killmail_time). (See the index sketch after these notes.)
What version are you using?
Perhaps UNION ALL gives the same results? It would be faster because it would not need to de-dup.
How much RAM do you have? What is the value of innodb_buffer_pool_size?
Do you really need 8-byte BIGINTs? Even if your application is using longlong (or whatever it calls it), you can probably change the schema without changing the app.
Do you need this much precision and range? DECIMAL(30,10) -- it takes 14 bytes each. DOUBLE would give you about 16 significant digits in 8 bytes, with a wider range of values (up to about 10^308). What "units" are you using? (Overkill for light-years or parsecs; inadequate for miles or km. Perhaps AUs? Then the bottom digit would be a precision of a few meters?)
The last few questions are aimed at shrinking the table and seeing if we can avoid it being as I/O-bound as it apparently is now.
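A sketch of the index changes from the covering-index and INDEX(id, killmail_time) points above (the index names are just examples):
ALTER TABLE killmail_victim
  DROP INDEX corporation_id,
  ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id);
ALTER TABLE killmail_attackers
  DROP INDEX corporation_id,
  ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id);
ALTER TABLE killmails
  ADD INDEX id_killmail_time (id, killmail_time);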
Important
innodb_buffer_pool_size = 128M is terribly small, especially for a 32GB machine, and especially if your dataset is much bigger than 128MB. If there are not any other apps running on the server, bump that setting up to 20G.
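For example, in my.cnf (assuming the 32GB machine mentioned above and no other major apps competing for that RAM):
[mysqld]
innodb_buffer_pool_size = 20G
On MySQL 5.7 and later the buffer pool can also be resized at runtime with SET GLOBAL innodb_buffer_pool_size, without restarting the server.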

Mysql: Selecting id, then * on id is much faster than selecting *. Why?

I have a MySQL database table (roughly 100K rows):
id BIGINT(indexed), external_barcode VARCHAR(indexed), other simple columns, and a LongText column.
The LongText column is a JSON data dump. I save the large JSON objects because I will need to extract more of the data in the future.
When I run this query it takes 29+ seconds:
SELECT * FROM scraper_data WHERE external_barcode = '032429257284'
EXPLAIN
#id select_type table partitions type possible_keys key key_len ref rows filtered Extra
'1' 'SIMPLE' 'scraper_data' NULL 'ALL' NULL NULL NULL NULL '119902' '0.00' 'Using where'
This more complex query takes 0.00 seconds:
SELECT * FROM scraper_data WHERE id = (
SELECT id FROM scraper_data WHERE external_barcode = '032429257284'
)
EXPLAIN
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'PRIMARY', 'scraper_data', NULL, 'const', 'PRIMARY,id_UNIQUE', 'PRIMARY', '8', 'const', '1', '100.00', NULL
'2', 'SUBQUERY', 'scraper_data', NULL, 'ALL', NULL, NULL, NULL, NULL, '119902', '0.00', 'Using where'
Less than 6 rows are returned from these queries. Why is the LONGTEXT slowing down the first query, given that it's not being referenced in the WHERE clause?
CREATE TABLE
CREATE TABLE `scraper_data` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`bzic` varchar(10) NOT NULL,
`pzic` varchar(10) DEFAULT NULL,
`internal_barcode` varchar(20) DEFAULT NULL,
`external_barcode_type` enum('upc','isbn','ean','gtin') DEFAULT NULL,
`external_barcode` varchar(15) DEFAULT NULL,
`url` varchar(255) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`category` varchar(3) DEFAULT NULL,
`description` text,
`logo_image_url` varchar(255) DEFAULT NULL,
`variant_image_urls` text,
`parent_brand` varchar(10) DEFAULT NULL,
`parent_brand_name` varchar(255) DEFAULT NULL,
`manufacturer` varchar(10) DEFAULT NULL,
`manufacturer_name` varchar(255) DEFAULT NULL,
`manufacturer_part_number` varchar(255) DEFAULT NULL,
`manufacturer_model_number` varchar(255) DEFAULT NULL,
`contributors` text,
`content_info` text,
`content_rating` text,
`release_date` timestamp NULL DEFAULT NULL,
`reviews` int(11) DEFAULT NULL,
`ratings` int(11) DEFAULT NULL,
`internal_path` varchar(255) DEFAULT NULL,
`price` int(11) DEFAULT NULL,
`adult_product` tinyint(4) DEFAULT NULL,
`height` varchar(255) DEFAULT NULL,
`length` varchar(255) DEFAULT NULL,
`width` varchar(255) DEFAULT NULL,
`weight` varchar(255) DEFAULT NULL,
`scraped` tinyint(4) NOT NULL DEFAULT '0',
`scraped_timestamp` timestamp NULL DEFAULT NULL,
`scrape_attempt_timestamp` timestamp NULL DEFAULT NULL,
`processed` tinyint(4) NOT NULL DEFAULT '0',
`processed_timestamp` timestamp NULL DEFAULT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`scrape_dump` longtext,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `url_UNIQUE` (`url`),
UNIQUE KEY `internal_barcode_UNIQUE` (`internal_barcode`),
KEY `bzic` (`bzic`),
KEY `pzic` (`pzic`),
KEY `internal_barcode` (`internal_barcode`),
KEY `external_barcode` (`external_barcode`,`external_barcode_type`) /*!80000 INVISIBLE */,
KEY `scrape_attempt` (`bzic`,`scraped`,`scrape_attempt_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=121674 DEFAULT CHARSET=latin1;
The second query could benefit from the cache that already contains the result of the first query.
In addition, the subquery in the second query uses only two columns (id, external_barcode); since these two columns are in an index, the subquery result can be obtained from an index scan alone, while the first query must scan all the table rows to retrieve the data.
To avoid the long execution time of the first query, you should add a proper index on the external_barcode column:
create index my_idx on scraper_data (external_barcode, id)
Your queries are not equivalent, and your second query will throw an error if you have more than one row with that barcode:
Error Code: 1242. Subquery returns more than 1 row
This is probably what happens here: you do not actually get a result, just an error. Since MySQL can stop the full table scan as soon as it finds a second row, you can get this error faster than a correct result, including "0.00s" if those rows are among the first rows that are scanned (for example in ids 1 and 2).
From the execution plan, you can see that both do a full table scan (which, up to current versions, includes reading the blob field), and thus should perform similarly fast (the first entry in your 2nd explain plan is negligible, since it handles only a few rows).
So with a barcode that doesn't throw an error, both of your queries, as well as the corrected 2nd query (where you use IN instead of =),
SELECT * FROM scraper_data WHERE id IN ( -- IN instead of = !!
SELECT id FROM scraper_data WHERE external_barcode = '032429257284'
)
as well as running your subquery
SELECT id FROM scraper_data WHERE external_barcode = '032429257284'
separately (which, if your assumption were correct, would have to be even faster than your 2nd query), will all have a similar (long) execution time.
As scaisEdge mentioned in his answer, an index on external_barcode will improve the performance significantly, since you then neither need to do a full table scan nor read the blob field. You actually have such an index, but you disabled it (made it invisible). You can simply re-enable it by using
ALTER TABLE scraper_data ALTER INDEX `external_barcode` VISIBLE;
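After that, re-running the EXPLAIN should show the external_barcode index being used (a quick check; the exact ref/rows values will differ on your data):
EXPLAIN SELECT * FROM scraper_data WHERE external_barcode = '032429257284';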

Find rows that have a duplicate field; the field type is blob

I have a table with many, many duplicated rows. I cannot create a unique index on the blob field, because it is too large.
How can I find and delete the duplicate rows where the blob field (answer) is duplicated?
This is the table structure :
CREATE TABLE `answers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_question` int(11) NOT NULL,
`id_user` int(11) NOT NULL,
`answer` blob NOT NULL,
`language` varchar(2) NOT NULL,
`datetime` datetime NOT NULL,
`enabled` int(11) NOT NULL DEFAULT '0',
`deleted` int(11) NOT NULL DEFAULT '0',
`spam` int(11) NOT NULL DEFAULT '0',
`correct` int(11) NOT NULL DEFAULT '0',
`notification_send` int(11) NOT NULL DEFAULT '0',
`correct_notification` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `id_question` (`id_question`),
KEY `id_user` (`id_user`),
KEY `enabled` (`enabled`)
) ENGINE=InnoDB AUTO_INCREMENT=1488 DEFAULT CHARSET=utf8mb4
You can probably use a prefix of the column, via SUBSTR() or LEFT(), and compare on that. How long a prefix you need depends on your data distribution, i.e. on the prefix uniqueness of the column data.
To check that uniqueness you can run the query below:
select count(distinct left(answer, 128))/count(*), count(distinct left(answer, 256))/count(*) from answers;
This shows the selectivity (data distribution) of your column. If, say, the 128-byte prefix gives a ratio of 1, i.e. all values are unique in their first 128 bytes, then use that amount of data from each row for the comparison. Hope it helps.
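If the check shows that, say, a 128-byte prefix is selective enough, a sketch of the delete could look like this (it keeps the row with the lowest id in each duplicate group; the prefix length is an assumption, you may also want to add a condition like a1.id_question = a2.id_question, and it is safest to test on a copy first):
DELETE a2
FROM answers AS a1
JOIN answers AS a2
  ON LEFT(a1.answer, 128) = LEFT(a2.answer, 128)
 AND a2.id > a1.id;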

optimize mysql schema, need advice

I was asked to help to check a table and improve performance.
The table is a large table with about 2,000,000 rows and is growing fast.
There are lots of users using this table with a lot of update, insert and delete queries.
Maybe you can give me some good advice to improve performance and reliability.
Here is the table definition:
CREATE TABLE `calculate` (
`GROUP_LINE_ID` BIGINT(250) NOT NULL DEFAULT '0',
`GROUP_LINE_PARENT_ID` BIGINT(250) NOT NULL DEFAULT '0',
`MOEDER_LINE_CODE` BIGINT(250) NOT NULL DEFAULT '0',
`CALC_ID` BIGINT(250) NOT NULL DEFAULT '0',
`GROUP_ID` BIGINT(250) NOT NULL DEFAULT '0',
`CODE` VARCHAR(250) DEFAULT NULL,
`DESCRIPTION` VARCHAR(250) DEFAULT NULL,
`RAW_AMOUNT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`AMOUNT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`UNIT` VARCHAR(100) DEFAULT NULL,
`MEN_HOURS` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`PRICE_PER_UNIT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`CONTRACTOR_UNIT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`POSTS_PER_UNIT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`SORT_INDEX` BIGINT(250) NOT NULL DEFAULT '0',
`FACTOR` DECIMAL(50,4) NOT NULL DEFAULT '0.0000',
`FACTOR_TYPE` INT(2) NOT NULL DEFAULT '0',
`ROUND_AT` DECIMAL(50,2) NOT NULL DEFAULT '0.00',
`MATERIAL_ID` BIGINT(250) NOT NULL DEFAULT '0',
`MINIMUM` DECIMAL(50,2) NOT NULL DEFAULT '0.00',
`LINE_TYPE` INT(1) NOT NULL DEFAULT '0',
`ONDERDRUKT` INT(5) NOT NULL DEFAULT '0',
`MARKED` INT(5) NOT NULL DEFAULT '0',
`IS_TEXT` INT(5) NOT NULL DEFAULT '0',
`BRUTO_PRICE` DECIMAL(20,2) NOT NULL DEFAULT '0.00',
`AMOUNT_DISCOUNT` DECIMAL(20,3) NOT NULL DEFAULT '0.000',
`FROM_CONSTRUCTOR` INT(5) NOT NULL DEFAULT '0',
`CHANGE_DATE` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`BEREKENING_VALUE` INT(5) NOT NULL DEFAULT '0',
`MAATVOERING_ID` BIGINT(250) NOT NULL DEFAULT '0',
`KOZIJN_CALC_ID` BIGINT(250) NOT NULL DEFAULT '0',
`IS_KOZIJN_CALC_TOTALS` INT(5) NOT NULL DEFAULT '0',
`EAN_CODE` VARCHAR(150) DEFAULT NULL,
`UURLOON_ID` BIGINT(20) NOT NULL DEFAULT '0',
`ORG_PRICE_PER_UNIT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`ORG_CONTRACTOR_UNIT` DECIMAL(50,3) NOT NULL DEFAULT '0.000',
`BTWCode` INT(5) NOT NULL DEFAULT '0',
`IS_CONTROLE_GETAL` INT(5) NOT NULL DEFAULT '0',
`AttentieRegel` INT(5) NOT NULL DEFAULT '0',
`KozijnSelectionRowId` BIGINT(250) NOT NULL DEFAULT '0',
`OfferteTekst` TEXT,
`VerliesFactor` DECIMAL(15,4) NOT NULL DEFAULT '0.0000',
PRIMARY KEY (`GROUP_LINE_ID`),
KEY `GROUP_LINE_PARENT_ID` (`GROUP_LINE_PARENT_ID`),
KEY `MOEDER_LINE_CODE` (`MOEDER_LINE_CODE`),
KEY `CALC_ID` (`CALC_ID`),
KEY `GROUP_ID` (`GROUP_ID`),
KEY `MATERIAL_ID` (`MATERIAL_ID`),
KEY `MAATVOERING_ID` (`MAATVOERING_ID`),
KEY `KOZIJN_CALC_ID` (`KOZIJN_CALC_ID`),
KEY `IS_KOZIJN_CALC_TOTALS` (`IS_KOZIJN_CALC_TOTALS`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
In my opinion:
change the table engine to InnoDB;
BIGINT(250) can be changed to INT(10) or BIGINT(10)?
Please give me some advice.
Yes, you are right.
1. Remember that the primary key is appended to every other key, so keep your primary key as small as possible. I think INT is sufficient -- it can handle up to 2 billion records -- so BIGINT is not needed. (See the sketch at the end of this answer.)
2. Change key_buffer_size (up to 25% of your memory); if your server has many MyISAM tables, or only MyISAM tables, you can increase it up to 60-70%.
Try a manual key cache:
SET GLOBAL keycache1.key_buffer_size=256*1024;
CACHE INDEX t1,t2 IN keycache1;
LOAD INDEX INTO CACHE t1, t2 IGNORE LEAVES;
(The IGNORE LEAVES modifier causes only blocks for the nonleaf nodes of the index to be reloaded)
3. As you mentioned, your table is growing fast, so it is better to partition the table, which will improve performance.
4. Perform table maintenance tasks: analyze the table frequently (which updates the index statistics), optimize it, and check and repair it if there are any errors. (Optimize the table frequently, because there are many deletes, as you said.)
5. If you don't want to change the engine, turn on the delay_key_write variable (specific to MyISAM), which delays key writes until the table is closed.
6. Run PROCEDURE ANALYSE(), which suggests the best data types for your columns.
7. Create full-text indexes to take advantage of MyISAM, if possible and only if it is useful.
8. Examine your query cache (set the query cache to on-demand) and make sure the slow query log is activated.
9. Examine (and rewrite if needed) all the queries that use the table.
Finally, if you want to change the engine:
change your table storage engine to InnoDB and increase innodb_buffer_pool_size; it may help you a little.
If the number of accesses on the table is high, it is better to shift to InnoDB, because MyISAM uses table-level locking, due to which some queries are not logged in the slow query log (the initial time required to acquire the lock is not counted as execution time in MySQL).
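A sketch of the column shrinking and the engine change together (the columns listed are only examples -- check the real maximum values first, and test on a copy, since an ALTER on a table this size rebuilds it):
ALTER TABLE `calculate`
  MODIFY `GROUP_LINE_ID` INT UNSIGNED NOT NULL DEFAULT '0',
  MODIFY `GROUP_LINE_PARENT_ID` INT UNSIGNED NOT NULL DEFAULT '0',
  MODIFY `CALC_ID` INT UNSIGNED NOT NULL DEFAULT '0',
  MODIFY `GROUP_ID` INT UNSIGNED NOT NULL DEFAULT '0',
  ENGINE = InnoDB;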
Turn on the MySQL Slow Query Log to find the slowest queries.
If they are selects then run them through EXPLAIN to find out what indexes (if any) are being used. You may be able to turn some indexes into multi-column indexes and find some improvements that way. See Multiple Indexes vs Multi-Column Indexes for a good discussion on the differences.
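For example, if the slow query log showed many queries filtering on CALC_ID together with GROUP_ID (a made-up example -- use your own real queries), the single-column CALC_ID key could be replaced by a composite one that covers both:
ALTER TABLE `calculate`
  DROP INDEX `CALC_ID`,
  ADD INDEX `calc_id_group_id` (`CALC_ID`, `GROUP_ID`);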
Inserts, Updates, and Deletes are probably slowed due to your indexes. You need to figure out which indexes are not being used and drop them. Unfortunately there's not a simple way to do this other than running through your most popular queries.
Reducing the size of columns that are oversized is a good idea.
The only reason I know of these days for using MyISAM is when doing full text search. You should, as you suggest, switch to InnoDB. (ALTER TABLE calculate ENGINE = InnoDB;)
Is this a flattened table to avoid joins? Do you have to have the OfferteTekst column in this table? Even extracting that into a related table may help, but not if you'd only end up joining against it.

How do fields not selected for in a MySQL query affect query speed for the fields I am selecting on?

This is a theoretical question based on an application I have. I am wondering if there is some technical insight to be gained beyond just speed tests on my system.
I have the following two tables:
CREATE TABLE `files` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL DEFAULT '',
`processed` tinyint(1) unsigned NOT NULL DEFAULT '0',
`last_processed` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
and...
CREATE TABLE `file_metas` (
`file_id` int(10) unsigned NOT NULL,
`title` varchar(255) NOT NULL DEFAULT '',
`description` varchar(1000) NOT NULL DEFAULT '',
`keywords` varchar(1000) NOT NULL DEFAULT '',
PRIMARY KEY (`file_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The file_metas data is long text strings about each file from the files table. Each file only has one entry in the file_metas table so these two tables could be combined.
I'm wondering what effect adding the long text fields to the files table will have on the performance of SELECT statements on the files table when I'm not selecting title, description, or keywords. I'm curious about the technical details. Does simply having the text fields in the table slow down queries that don't involve those fields? How does this work in general with MySQL MyISAM tables? Is there any good reason to keep the file_metas data in a separate table?
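For concreteness, the kinds of statements I mean look like this (processed is just an example filter): a narrow select that never touches the long text columns, and the join used only when the metadata is actually needed:
SELECT id, url
FROM files
WHERE processed = 0;

SELECT f.id, f.url, m.title, m.description
FROM files AS f
JOIN file_metas AS m ON m.file_id = f.id
WHERE f.processed = 1;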