I have a table "transactions" with a million records:
id trx secret_string (varchar(50)) secret_id (int(2))
1 80 52987624f7cb03c61d403b7c68502fb0 1
2 28 52987624f7cb03c61d403b7c68502fb0 1
3 55 8502fb052987624f61d403b7c67cb03c 2
4 61 52987624f7cb03c61d403b7c68502fb0 1
5 39 8502fb052987624f61d403b7c67cb03c 2
..
999997 27 8502fb052987624f61d403b7c67cb03c 2
999998 94 8502fb052987624f61d403b7c67cb03c 2
999999 40 52987624f7cb03c61d403b7c68502fb0 1
1000000 35 8502fb052987624f61d403b7c67cb03c 2
As you can notice, secret_string and secret_id will always match.
Let's say I need to select records where secret_string = "52987624f7cb03c61d403b7c68502fb0".
Is it faster to do:
SELECT id FROM transactions WHERE secret_id = 1
Than:
SELECT id FROM transactions WHERE secret_string = "52987624f7cb03c61d403b7c68502fb0"
Or does it not matter? And what about other operations such as SUM(trx), COUNT(trx), AVG(trx), etc.?
The secret_id column does not currently exist, but if searching by it is faster, I plan to create and populate it on row insertion.
Thank you
I hope I make sense.
Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.
This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.
There is another reason to use an int, and that is to normalise the database. Instead of storing the text '52987624f7cb03c61d403b7c68502fb0' thousands of times in the table, you should store its id and keep the secret string once in a separate table. The same advantage applies to other operations such as SUM, COUNT and AVG.
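A minimal sketch of that normalisation, assuming a new lookup table; the table, column and index names here (secrets, uq_secret_string, idx_secret_id) are my own choices, and migrating the existing data is left out:

CREATE TABLE secrets (
    secret_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    secret_string VARCHAR(50) NOT NULL,
    UNIQUE KEY uq_secret_string (secret_string)   -- one row per distinct secret
);

ALTER TABLE transactions
    ADD COLUMN secret_id INT UNSIGNED NOT NULL,
    ADD INDEX idx_secret_id (secret_id);           -- fast lookups by the int

-- lookups then resolve the string once in the small table:
SELECT t.id
FROM transactions t
JOIN secrets s ON s.secret_id = t.secret_id
WHERE s.secret_string = '52987624f7cb03c61d403b7c68502fb0';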
As the others told you: selecting by an int is definitely faster than by a string. However, if you do need to select by secret_string, note that all the given strings look like hex strings; you could therefore consider converting them with UNHEX() and storing the resulting binary value instead of the text.
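One caveat: a 32-character hex string is 128 bits, which does not fit in a BIGINT, so the closest workable variant is storing the unhexed bytes in a BINARY(16) column. A sketch, with secret_bin and idx_secret_bin as names I made up:

ALTER TABLE transactions
    ADD COLUMN secret_bin BINARY(16),
    ADD INDEX idx_secret_bin (secret_bin);

-- backfill from the existing text column
UPDATE transactions SET secret_bin = UNHEX(secret_string);

-- compare 16 bytes instead of a 32-character string
SELECT id
FROM transactions
WHERE secret_bin = UNHEX('52987624f7cb03c61d403b7c68502fb0');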
I am a little confused. I have to estimate the size of a table that will hold 2 million rows, and I have no idea how much space the primary and secondary indexes take, especially with composite primary and secondary indexes. The structure of the table is something like:
Database Engine: innodb
create table abc(
a int,
b varchar(30),
c char(10),
d bigint(8),
FOREIGN KEY(a)
REFERENCES af(a_id)
ON DELETE RESTRICT
ON UPDATE RESTRICT,
primary key(a,b,c)
) ENGINE=InnoDB;
CREATE UNIQUE INDEX idx_abc
ON abc
( a ASC, d ASC);
CREATE INDEX idx_abc2
ON abc
( d );
Please help
Sonu
You can get the size for data/indexes using mysql.innodb_index_stats.
Warning: the size is given in page units, so you must multiply it by the page size, which is usually 16 KB.
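For example, something along these lines reports the size per index in bytes (a sketch; replace 'your_db' and 'abc' with your own schema and table names):

SELECT database_name, table_name, index_name,
       stat_value * @@innodb_page_size AS size_in_bytes
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
  AND database_name = 'your_db'
  AND table_name = 'abc';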
To get an exact estimate, create a copy of the table, load about 2,000,000 rows of representative throwaway data into it, measure the table along with its indexes as shown in the other answers, and then drop the copy once you know the answer.
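A sketch of that workflow in SQL; abc_copy is just a throwaway name, and generating the 2,000,000 rows of test data is left to a script or an INSERT ... SELECT of your own:

CREATE TABLE abc_copy LIKE abc;
-- load ~2,000,000 rows of representative test data into abc_copy here
ANALYZE TABLE abc_copy;                         -- refresh the persistent statistics
SELECT index_name, stat_value * @@innodb_page_size AS size_in_bytes
FROM mysql.innodb_index_stats
WHERE table_name = 'abc_copy' AND stat_name = 'size';
DROP TABLE abc_copy;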
If precision is not that important in your case, multiply the number of bytes a record occupies by the number of records, and do the same for the indexes:
record * 2 000 000 + index * 2 000 000 ~= 50 * 2 000 000 + 60 * 2 000 000 = 110 * 2 000 000 = 220 000 000 bytes (roughly 210 MB)
I have a set of numbers, and each number in it has a few other numbers associated with it. So I store it in a table like this:
NUMBERS ASSOCIATEDNUMBERS
1 3
1 7
1 8
2 11
2 7
7 9
8 13
11 17
14 18
17 11
17 18
So a number can have many associated numbers and vice versa. Both columns are indexed (which lets me find a number and its associated numbers, and vice versa).
My create table looks like this:
CREATE TABLE `TABLE_B` (
`NUMBERS` bigint(20) unsigned NOT NULL,
`ASSOCIATEDNUMBERS` bigint(20) unsigned NOT NULL,
UNIQUE KEY `unique_number_associatednumber_constraint` (`NUMBERS`,`ASSOCIATEDNUMBERS`),
KEY `fk_AssociatedNumberConstraint` (`ASSOCIATEDNUMBERS`),
CONSTRAINT `fk_AssociatedNumberConstraint` FOREIGN KEY (`ASSOCIATEDNUMBERS`) REFERENCES `table_a` (`SRNO`),
CONSTRAINT `fk_NumberConstraint` FOREIGN KEY (`NUMBERS`) REFERENCES `table_a` (`SRNO`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here TABLE_A has a column SRNO, which is the AUTO_INCREMENT PRIMARY KEY and the first column in the table. (As per the MySQL manual, I haven't defined separate indexes on TABLE_B.NUMBERS and TABLE_B.ASSOCIATEDNUMBERS, since the foreign key constraints create them automatically.)
PROBLEM:
Whenever I need to change the ASSOCIATEDNUMBERS for a number (in NUMBERS), I just delete the existing rows for that number from the table:
DELETE FROM TABLE_B WHERE NUMBERS= ?
and then insert rows for the new set of ASSOCIATEDNUMBERS:
INSERT INTO TABLE_B (NUMBERS, ASSOCIATEDNUMBERS) VALUES ( ?, ?), (?, ?), (?, ?), ...
However, this takes a long time, especially when my multi-threaded application opens multiple connections to the database (one per thread), each running the two queries above (each with a different number).
For example, if I open 40 connections, each deleting the existing rows and inserting 250 new associated numbers, it takes up to 10 to 15 seconds. If I increase the number of connections, the time increases as well.
Other Information:
SHOW GLOBAL STATUS LIKE 'Threads_running';
shows up to 40 threads.
Innodb parameters:
innodb_adaptive_flushing, ON
innodb_adaptive_flushing_lwm, 10
innodb_adaptive_hash_index, ON
innodb_adaptive_max_sleep_delay, 150000
innodb_additional_mem_pool_size, 2097152
innodb_api_bk_commit_interval, 5
innodb_api_disable_rowlock, OFF
innodb_api_enable_binlog, OFF
innodb_api_enable_mdl, OFF
innodb_api_trx_level, 0
innodb_autoextend_increment, 64
innodb_autoinc_lock_mode, 1
innodb_buffer_pool_dump_at_shutdown, OFF
innodb_buffer_pool_dump_now, OFF
innodb_buffer_pool_filename, ib_buffer_pool
innodb_buffer_pool_instances, 8
innodb_buffer_pool_load_abort, OFF
innodb_buffer_pool_load_at_startup, OFF
innodb_buffer_pool_load_now, OFF
innodb_buffer_pool_size, 1073741824
innodb_change_buffer_max_size, 25
innodb_change_buffering, all
innodb_checksum_algorithm, crc32
innodb_checksums, ON
innodb_cmp_per_index_enabled, OFF
innodb_commit_concurrency, 0
innodb_compression_failure_threshold_pct, 5
innodb_compression_level, 6
innodb_compression_pad_pct_max, 50
innodb_concurrency_tickets, 5000
innodb_data_file_path, ibdata1:12M:autoextend
innodb_data_home_dir,
innodb_disable_sort_file_cache, OFF
innodb_doublewrite, ON
innodb_fast_shutdown, 1
innodb_file_format, Antelope
innodb_file_format_check, ON
innodb_file_format_max, Antelope
innodb_file_per_table, ON
innodb_flush_log_at_timeout, 1
innodb_flush_log_at_trx_commit, 2
innodb_flush_method, normal
innodb_flush_neighbors, 1
innodb_flushing_avg_loops, 30
innodb_force_load_corrupted, OFF
innodb_force_recovery, 0
innodb_ft_aux_table,
innodb_ft_cache_size, 8000000
innodb_ft_enable_diag_print, OFF
innodb_ft_enable_stopword, ON
innodb_ft_max_token_size, 84
innodb_ft_min_token_size, 3
innodb_ft_num_word_optimize, 2000
innodb_ft_result_cache_limit, 2000000000
innodb_ft_server_stopword_table,
innodb_ft_sort_pll_degree, 2
innodb_ft_total_cache_size, 640000000
innodb_ft_user_stopword_table,
innodb_io_capacity, 200
innodb_io_capacity_max, 2000
innodb_large_prefix, OFF
innodb_lock_wait_timeout, 50
innodb_locks_unsafe_for_binlog, OFF
innodb_log_buffer_size, 268435456
innodb_log_compressed_pages, ON
innodb_log_file_size, 262144000
innodb_log_files_in_group, 2
innodb_log_group_home_dir, .\
innodb_lru_scan_depth, 1024
innodb_max_dirty_pages_pct, 75
innodb_max_dirty_pages_pct_lwm, 0
innodb_max_purge_lag, 0
innodb_max_purge_lag_delay, 0
innodb_mirrored_log_groups, 1
innodb_monitor_disable,
innodb_monitor_enable,
innodb_monitor_reset,
innodb_monitor_reset_all,
innodb_old_blocks_pct, 37
innodb_old_blocks_time, 1000
innodb_online_alter_log_max_size, 134217728
innodb_open_files, 300
innodb_optimize_fulltext_only, OFF
innodb_page_size, 16384
innodb_print_all_deadlocks, OFF
innodb_purge_batch_size, 300
innodb_purge_threads, 1
innodb_random_read_ahead, OFF
innodb_read_ahead_threshold, 56
innodb_read_io_threads, 64
innodb_read_only, OFF
innodb_replication_delay, 0
innodb_rollback_on_timeout, OFF
innodb_rollback_segments, 128
innodb_sort_buffer_size, 1048576
innodb_spin_wait_delay, 6
innodb_stats_auto_recalc, ON
innodb_stats_method, nulls_equal
innodb_stats_on_metadata, OFF
innodb_stats_persistent, ON
innodb_stats_persistent_sample_pages, 20
innodb_stats_sample_pages, 8
innodb_stats_transient_sample_pages, 8
innodb_status_output, OFF
innodb_status_output_locks, OFF
innodb_strict_mode, OFF
innodb_support_xa, ON
innodb_sync_array_size, 1
innodb_sync_spin_loops, 30
innodb_table_locks, ON
innodb_thread_concurrency, 0
innodb_thread_sleep_delay, 10000
innodb_undo_directory, .
innodb_undo_logs, 128
innodb_undo_tablespaces, 0
innodb_use_native_aio, OFF
innodb_use_sys_malloc, ON
innodb_version, 5.6.28
innodb_write_io_threads, 16
UPDATE:
Here is "SHOW ENGINE InnoDB STATUS" output: http://pastebin.com/raw/E3rK4Pu5
UPDATE2:
The cause of this turned out to be somewhere else, not the DB. Some other function in my code was eating lots of CPU, causing MySQL (which runs on the same machine) to slow down. Thanks for all your answers and help.
It seems you are acquiring a lock on the row/table before deleting/inserting, and that is what is causing the issue.
Check
SELECT * from information_schema.GLOBAL_VARIABLES;
Also, check locks on table using
SHOW OPEN TABLES from <database name> where In_use>0
and lock type using
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS
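If you want to see which transaction is blocking which, a join over the 5.6 lock tables along these lines can help (a sketch; the column aliases are my own):

SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;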
So when you run the query, add these to watch; you can also use the tee command to store the output in a file.
There is one more thing that can cause this: although you have indexed the columns, MySQL has limitations on how indexes can be used depending on the data. Read this for the limitations.
Read this to install watch: http://osxdaily.com/2010/08/22/install-watch-command-on-os-x/. Run watch with a time delay of 1 second and run the MySQL queries inside it; if you want to store the output in a file, use the tee command to write it to a file. You can then tail the file to follow the data.
I am trying to estimate the real disk space required for each record of my table in MySQL.
The table has a structure like this:
ID INT 4 byte;
VARCHAR(34) 34 byte;
INT 4 byte;
INT(5) 4 byte;
INT 4 byte;
INT 4 byte which is also a FOREIGN KEY;
So there are 5 INT fields and a VARCHAR of a maximum of 34 chars (i.e. 34 bytes).
I have 2 questions:
1) The total should be 54 bytes per record (with a variable-length VARCHAR, of course). Am I right in saying that, or are there also some overhead bytes I should consider when estimating the disk usage?
2) I have also used INT(5) instead of CHAR(5) because I need to store exactly 5 digits in that field (I will enforce that in the application, with a regexp and a length check, because I know that INT(5) can hold more than 5 digits).
But can this be considered a disk-space optimization, since I am using an INT (4 bytes) instead of a CHAR(5), which is 5 bytes, i.e. 1 more byte per record?
Thanks for the attention!
One record itself will use
1 byte in offsets
0 bytes in NULLable bits
5 bytes in "extra bytes" header
4 bytes ID
6 bytes transaction id
7 bytes rollback pointer
0-3*34 bytes in VARCHAR(34) (one character may take up to 3 bytes because of UTF8)
4*4 bytes in other integers
Each row will also produce one record in the secondary index on the FK column. It will use
5 bytes in "extra bytes" header
4 bytes INT for FK value
4 bytes INT for Primary key
Other overhead is page level:
120 bytes per page (16k) in headers
page fill factor 15/16 - i.e. one page may contain 15k in records.
And lastly, add the space used by non-leaf pages, which should be small anyway.
So, to answer the questions: 1) yes, there is some overhead, which you can calculate using the information above.
2) CHAR(5) in UTF8 will add a byte for its length, so INT looks reasonable to use.
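If you want to sanity-check such a hand estimate, information_schema.TABLES reports what MySQL has actually allocated (a sketch; 'your_db' and 'your_table' are placeholders):

SELECT table_rows,
       avg_row_length,        -- data_length / table_rows, includes the per-row overhead above
       data_length,           -- clustered index (the data itself), in bytes
       index_length           -- all secondary indexes, in bytes
FROM information_schema.TABLES
WHERE table_schema = 'your_db'
  AND table_name   = 'your_table';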
What is the most accurate way to estimate how big a database would be with the following characteristics:
MySQL
1 Table with three columns:
id --> bigint
field1 --> varchar 32
field2 --> char 32
there is an index on field2
You can assume varchar 32 is fully populated (all 32 characters). How big would it be if each field is populated and there are:
1 Million rows
5 Million rows
1 Billion rows
5 Billion rows
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields. Making it roughly:
1 + 32 + 32 = 65 * 1 000 000 = 65 million bytes for 1 million rows
= 62 Megabyte
Therefore:
62 Mb
310 Mb
310 000 Mb = +- 302Gb
1 550 000 Mb = 1513 Gb
Is this an accurate estimation?
If you want to know the current size of a database you can try this:
SELECT table_schema "Database Name"
, SUM(data_length + index_length) / (1024 * 1024) "Database Size in MB"
FROM information_schema.TABLES
GROUP BY table_schema;
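If you need the same figures per table, with data and index sizes separated, a variant like this should work ('your_database' is a placeholder):

SELECT table_name,
       ROUND(data_length  / (1024 * 1024), 1) AS data_mb,
       ROUND(index_length / (1024 * 1024), 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_database'
ORDER BY data_length + index_length DESC;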
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields.
You're way off. Please refer to the MySQL Data Type Storage Requirements documentation. In particular:
A BIGINT is 8 bytes, not 1.
The storage required for a CHAR or VARCHAR column will depend on the character set in use by your database (!), but will be at least 32 bytes (not bits!) for CHAR(32) and 33 for VARCHAR(32).
You have not accounted at all for the size of the index. The size of this will depend on the database engine, but it's definitely not zero. See the documentation on the InnoDB row structure for more information.
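Putting those corrections together, a rough redo of the per-row arithmetic can even be run as a query; this assumes a single-byte character set and ignores InnoDB row headers and page fill factor, so treat it as a lower bound:

SELECT
    8 + 33 + 32                  AS data_bytes_per_row,   -- BIGINT + VARCHAR(32) with length byte + CHAR(32)
    32 + 8                       AS index_bytes_per_row,  -- index on field2 also carries the BIGINT primary key
    (8 + 33 + 32 + 32 + 8) * 1000000 / (1024 * 1024) AS approx_mb_per_million_rows;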
On the MySQL website you'll find quite comprehensive information about storage requirements:
http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
It also depends on whether you use utf8 or not.