I have a table "transactions" with a million records:
id trx secret_string (varchar(50)) secret_id (int(2))
1 80 52987624f7cb03c61d403b7c68502fb0 1
2 28 52987624f7cb03c61d403b7c68502fb0 1
3 55 8502fb052987624f61d403b7c67cb03c 2
4 61 52987624f7cb03c61d403b7c68502fb0 1
5 39 8502fb052987624f61d403b7c67cb03c 2
..
999997 27 8502fb052987624f61d403b7c67cb03c 2
999998 94 8502fb052987624f61d403b7c67cb03c 2
999999 40 52987624f7cb03c61d403b7c68502fb0 1
1000000 35 8502fb052987624f61d403b7c67cb03c 2
As you can notice, secret_string and secret_id will always match.
Let's say I need to select records where secret_string = "52987624f7cb03c61d403b7c68502fb0".
Is it faster to do:
SELECT id FROM transactions WHERE secret_id = 1
Than:
SELECT id FROM transactions WHERE secret_string = "52987624f7cb03c61d403b7c68502fb0"
Or does it not matter? And what about other operations such as SUM(trx), COUNT(trx), AVG(trx), etc.?
The secret_id column does not currently exist, but if searching by it is faster, I plan to create and populate it on row insertion.
Thank you
I hope I make sense.
Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.
This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.
There is another reason to use an int, and that is to normalise the database. Instead of storing the text '52987624f7cb03c61d403b7c68502fb0' thousands of times in the table, you should store its id and keep the secret string once in a separate table. The same advantage applies to other operations such as SUM, COUNT and AVG.
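A minimal sketch of that normalisation, assuming a new lookup table; the table, column and index names here (secrets, uq_secret_string, idx_secret_id) are my own choices, and migrating the existing data is left out:

CREATE TABLE secrets (
    secret_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    secret_string VARCHAR(50) NOT NULL,
    UNIQUE KEY uq_secret_string (secret_string)   -- one row per distinct secret
);

ALTER TABLE transactions
    ADD COLUMN secret_id INT UNSIGNED NOT NULL,
    ADD INDEX idx_secret_id (secret_id);           -- fast lookups by the int

-- lookups then resolve the string once in the small table:
SELECT t.id
FROM transactions t
JOIN secrets s ON s.secret_id = t.secret_id
WHERE s.secret_string = '52987624f7cb03c61d403b7c68502fb0';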
As the others told you: selecting by an int is definitely faster than by a string. However, if you do need to select by secret_string, note that all the given strings look like hex strings; you could therefore consider converting them with UNHEX() and storing the resulting binary value instead of the text.
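One caveat: a 32-character hex string is 128 bits, which does not fit in a BIGINT, so the closest workable variant is storing the unhexed bytes in a BINARY(16) column. A sketch, with secret_bin and idx_secret_bin as names I made up:

ALTER TABLE transactions
    ADD COLUMN secret_bin BINARY(16),
    ADD INDEX idx_secret_bin (secret_bin);

-- backfill from the existing text column
UPDATE transactions SET secret_bin = UNHEX(secret_string);

-- compare 16 bytes instead of a 32-character string
SELECT id
FROM transactions
WHERE secret_bin = UNHEX('52987624f7cb03c61d403b7c68502fb0');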
I am a little confused. I have to estimate the size of a table that will hold 2 million rows, and I have no idea how much space the primary and secondary indexes take, especially with composite primary and secondary indexes. The structure of the table is something like:
Database Engine: innodb
create table abc(
a int,
b varchar(30),
c char(10),
d bigint(8),
FOREIGN KEY(a)
REFERENCES af(a_id)
ON DELETE RESTRICT
ON UPDATE RESTRICT,
primary key(a,b,c)
) ENGINE=InnoDB;
CREATE UNIQUE INDEX idx_abc
ON abc
( a ASC, d ASC);
CREATE INDEX idx_abc2
ON abc
( d );
Please help
Sonu
You can get the size for data/indexes using mysql.innodb_index_stats.
Warning: the size is given in page units, so you must multiply it by the page size, which is usually 16 KB.
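For example, something along these lines reports the size per index in bytes (a sketch; replace 'your_db' and 'abc' with your own schema and table names):

SELECT database_name, table_name, index_name,
       stat_value * @@innodb_page_size AS size_in_bytes
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
  AND database_name = 'your_db'
  AND table_name = 'abc';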
To get an exact estimate, create a copy of the table, load about 2,000,000 rows of representative throwaway data into it, measure the table along with its indexes as shown in the other answers, and then drop the copy once you know the answer.
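A sketch of that workflow in SQL; abc_copy is just a throwaway name, and generating the 2,000,000 rows of test data is left to a script or an INSERT ... SELECT of your own:

CREATE TABLE abc_copy LIKE abc;
-- load ~2,000,000 rows of representative test data into abc_copy here
ANALYZE TABLE abc_copy;                         -- refresh the persistent statistics
SELECT index_name, stat_value * @@innodb_page_size AS size_in_bytes
FROM mysql.innodb_index_stats
WHERE table_name = 'abc_copy' AND stat_name = 'size';
DROP TABLE abc_copy;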
If precision is not that important in your case, multiply the number of bytes a record occupies by the number of records, and do the same for the indexes:
record * 2 000 000 + index * 2 000 000 ~= 50 * 2 000 000 + 60 * 2 000 000 = 110 * 2 000 000 = 220 000 000 bytes (roughly 210 MB)
I have a set of numbers, and each number in it has a few other numbers associated with it. So I store it in a table like this:
NUMBERS ASSOCIATEDNUMBERS
1 3
1 7
1 8
2 11
2 7
7 9
8 13
11 17
14 18
17 11
17 18
So a number can have many associated numbers and vice versa. Both columns are indexed (which lets me find a number and its associated numbers, and vice versa).
My create table looks like this:
CREATE TABLE `TABLE_B` (
`NUMBERS` bigint(20) unsigned NOT NULL,
`ASSOCIATEDNUMBERS` bigint(20) unsigned NOT NULL,
UNIQUE KEY `unique_number_associatednumber_constraint` (`NUMBERS`,`ASSOCIATEDNUMBERS`),
KEY `fk_AssociatedNumberConstraint` (`ASSOCIATEDNUMBERS`),
CONSTRAINT `fk_AssociatedNumberConstraint` FOREIGN KEY (`ASSOCIATEDNUMBERS`) REFERENCES `table_a` (`SRNO`),
CONSTRAINT `fk_NumberConstraint` FOREIGN KEY (`NUMBERS`) REFERENCES `table_a` (`SRNO`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here TABLE_A has a column SRNO, which is the AUTO_INCREMENT PRIMARY KEY and the first column in the table. (As per the MySQL manual, I haven't defined separate indexes on TABLE_B.NUMBERS and TABLE_B.ASSOCIATEDNUMBERS, since the foreign key constraints create them automatically.)
PROBLEM:
Whenever I need to change the ASSOCIATEDNUMBERS for a number (in NUMBERS), I just delete the existing rows for that number from the table:
DELETE FROM TABLE_B WHERE NUMBERS= ?
and then insert rows for the new set of ASSOCIATEDNUMBERS:
INSERT INTO TABLE_B (NUMBERS, ASSOCIATEDNUMBERS) VALUES ( ?, ?), (?, ?), (?, ?), ...
However, this takes a long time, especially when my multi-threaded application opens multiple connections to the database (one per thread), each running the two queries above (each with a different number).
For example, if I open 40 connections, each deleting the existing rows and inserting 250 new associated numbers, it takes up to 10 to 15 seconds. If I increase the number of connections, the time increases as well.
Other Information:
SHOW GLOBAL STATUS LIKE 'Threads_running';
shows up to 40 threads.
Innodb parameters:
innodb_adaptive_flushing, ON
innodb_adaptive_flushing_lwm, 10
innodb_adaptive_hash_index, ON
innodb_adaptive_max_sleep_delay, 150000
innodb_additional_mem_pool_size, 2097152
innodb_api_bk_commit_interval, 5
innodb_api_disable_rowlock, OFF
innodb_api_enable_binlog, OFF
innodb_api_enable_mdl, OFF
innodb_api_trx_level, 0
innodb_autoextend_increment, 64
innodb_autoinc_lock_mode, 1
innodb_buffer_pool_dump_at_shutdown, OFF
innodb_buffer_pool_dump_now, OFF
innodb_buffer_pool_filename, ib_buffer_pool
innodb_buffer_pool_instances, 8
innodb_buffer_pool_load_abort, OFF
innodb_buffer_pool_load_at_startup, OFF
innodb_buffer_pool_load_now, OFF
innodb_buffer_pool_size, 1073741824
innodb_change_buffer_max_size, 25
innodb_change_buffering, all
innodb_checksum_algorithm, crc32
innodb_checksums, ON
innodb_cmp_per_index_enabled, OFF
innodb_commit_concurrency, 0
innodb_compression_failure_threshold_pct, 5
innodb_compression_level, 6
innodb_compression_pad_pct_max, 50
innodb_concurrency_tickets, 5000
innodb_data_file_path, ibdata1:12M:autoextend
innodb_data_home_dir,
innodb_disable_sort_file_cache, OFF
innodb_doublewrite, ON
innodb_fast_shutdown, 1
innodb_file_format, Antelope
innodb_file_format_check, ON
innodb_file_format_max, Antelope
innodb_file_per_table, ON
innodb_flush_log_at_timeout, 1
innodb_flush_log_at_trx_commit, 2
innodb_flush_method, normal
innodb_flush_neighbors, 1
innodb_flushing_avg_loops, 30
innodb_force_load_corrupted, OFF
innodb_force_recovery, 0
innodb_ft_aux_table,
innodb_ft_cache_size, 8000000
innodb_ft_enable_diag_print, OFF
innodb_ft_enable_stopword, ON
innodb_ft_max_token_size, 84
innodb_ft_min_token_size, 3
innodb_ft_num_word_optimize, 2000
innodb_ft_result_cache_limit, 2000000000
innodb_ft_server_stopword_table,
innodb_ft_sort_pll_degree, 2
innodb_ft_total_cache_size, 640000000
innodb_ft_user_stopword_table,
innodb_io_capacity, 200
innodb_io_capacity_max, 2000
innodb_large_prefix, OFF
innodb_lock_wait_timeout, 50
innodb_locks_unsafe_for_binlog, OFF
innodb_log_buffer_size, 268435456
innodb_log_compressed_pages, ON
innodb_log_file_size, 262144000
innodb_log_files_in_group, 2
innodb_log_group_home_dir, .\
innodb_lru_scan_depth, 1024
innodb_max_dirty_pages_pct, 75
innodb_max_dirty_pages_pct_lwm, 0
innodb_max_purge_lag, 0
innodb_max_purge_lag_delay, 0
innodb_mirrored_log_groups, 1
innodb_monitor_disable,
innodb_monitor_enable,
innodb_monitor_reset,
innodb_monitor_reset_all,
innodb_old_blocks_pct, 37
innodb_old_blocks_time, 1000
innodb_online_alter_log_max_size, 134217728
innodb_open_files, 300
innodb_optimize_fulltext_only, OFF
innodb_page_size, 16384
innodb_print_all_deadlocks, OFF
innodb_purge_batch_size, 300
innodb_purge_threads, 1
innodb_random_read_ahead, OFF
innodb_read_ahead_threshold, 56
innodb_read_io_threads, 64
innodb_read_only, OFF
innodb_replication_delay, 0
innodb_rollback_on_timeout, OFF
innodb_rollback_segments, 128
innodb_sort_buffer_size, 1048576
innodb_spin_wait_delay, 6
innodb_stats_auto_recalc, ON
innodb_stats_method, nulls_equal
innodb_stats_on_metadata, OFF
innodb_stats_persistent, ON
innodb_stats_persistent_sample_pages, 20
innodb_stats_sample_pages, 8
innodb_stats_transient_sample_pages, 8
innodb_status_output, OFF
innodb_status_output_locks, OFF
innodb_strict_mode, OFF
innodb_support_xa, ON
innodb_sync_array_size, 1
innodb_sync_spin_loops, 30
innodb_table_locks, ON
innodb_thread_concurrency, 0
innodb_thread_sleep_delay, 10000
innodb_undo_directory, .
innodb_undo_logs, 128
innodb_undo_tablespaces, 0
innodb_use_native_aio, OFF
innodb_use_sys_malloc, ON
innodb_version, 5.6.28
innodb_write_io_threads, 16
UPDATE:
Here is "SHOW ENGINE InnoDB STATUS" output: http://pastebin.com/raw/E3rK4Pu5
UPDATE2:
The cause of this turned out to be somewhere else, not the DB. Some other function in my code was eating lots of CPU, causing MySQL (which runs on the same machine) to slow down. Thanks for all your answers and help.
It seems you are acquiring a lock on the row/table before deleting/inserting, and that is what is causing the issue.
Check
SELECT * from information_schema.GLOBAL_VARIABLES;
Also, check locks on table using
SHOW OPEN TABLES from <database name> where In_use>0
and lock type using
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS
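If you want to see which transaction is blocking which, a join over the 5.6 lock tables along these lines can help (a sketch; the column aliases are my own):

SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;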
So when you run the query, add these to watch; you can also use the tee command to store the output in a file.
There is one more thing that can cause this: although you have indexed the columns, MySQL has limitations on how indexes can be used depending on the data. Read this for the limitations.
Read this to install watch: http://osxdaily.com/2010/08/22/install-watch-command-on-os-x/. Run watch with a time delay of 1 second and run the MySQL queries inside it; if you want to store the output in a file, use the tee command to write it to a file. You can then tail the file to follow the data.
I am trying to estimate the real disk space required for each record of my table in MySQL.
The table has a structure like this:
ID INT 4 byte;
VARCHAR(34) 34 byte;
INT 4 byte;
INT(5) 4 byte;
INT 4 byte;
INT 4 byte which is also a FOREIGN KEY;
So there are 5 INT fields and a VARCHAR of a maximum of 34 chars (i.e. 34 bytes).
I have 2 questions:
1) The total should be 54 bytes per record (with a variable-length VARCHAR, of course). Am I right in saying that, or are there also some overhead bytes I should consider when estimating the disk usage?
2) I have also used INT(5) instead of CHAR(5) because I need to store exactly 5 digits in that field (I will enforce that in the application, with a regexp and a length check, because I know that INT(5) can hold more than 5 digits).
But can this be considered a disk-space optimization, since I am using an INT (4 bytes) instead of a CHAR(5), which is 5 bytes, i.e. 1 more byte per record?
Thanks for the attention!
One record itself will use
1 byte in offsets
0 bytes in NULLable bits
5 bytes in "extra bytes" header
4 bytes ID
6 bytes transaction id
7 bytes rollback pointer
0-3*34 bytes in VARCHAR(34) (one character may take up to 3 bytes because of UTF8)
4*4 bytes in other integers
Each row will also produce one record in the secondary index on the FK column. It will use
5 bytes in "extra bytes" header
4 bytes INT for FK value
4 bytes INT for Primary key
Other overhead is page level:
120 bytes per page (16k) in headers
page fill factor 15/16 - i.e. one page may contain 15k in records.
And lastly, add the space used by non-leaf pages, which should be small anyway.
So, to answer the questions: 1) yes, there is some overhead, which you can calculate using the information above.
2) CHAR(5) in UTF8 will add a byte for its length, so INT looks reasonable to use.
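If you want to sanity-check such a hand estimate, information_schema.TABLES reports what MySQL has actually allocated (a sketch; 'your_db' and 'your_table' are placeholders):

SELECT table_rows,
       avg_row_length,        -- data_length / table_rows, includes the per-row overhead above
       data_length,           -- clustered index (the data itself), in bytes
       index_length           -- all secondary indexes, in bytes
FROM information_schema.TABLES
WHERE table_schema = 'your_db'
  AND table_name   = 'your_table';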
What is the most accurate way to estimate how big a database would be with the following characteristics:
MySQL
1 Table with three columns:
id --> bigint
field1 --> varchar 32
field2 --> char 32
there is an index on field2
You can assume varchar 32 is fully populated (all 32 characters). How big would it be if each field is populated and there are:
1 Million rows
5 Million rows
1 Billion rows
5 Billion rows
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields. Making it roughly:
1 + 32 + 32 = 65 * 1 000 000 = 65 million bytes for 1 million rows
= 62 Megabyte
Therefore:
62 Mb
310 Mb
310 000 Mb = +- 302Gb
1 550 000 Mb = 1513 Gb
Is this an accurate estimation?
If you want to know the current size of a database you can try this:
SELECT table_schema "Database Name"
, SUM(data_length + index_length) / (1024 * 1024) "Database Size in MB"
FROM information_schema.TABLES
GROUP BY table_schema;
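If you need the same figures per table, with data and index sizes separated, a variant like this should work ('your_database' is a placeholder):

SELECT table_name,
       ROUND(data_length  / (1024 * 1024), 1) AS data_mb,
       ROUND(index_length / (1024 * 1024), 1) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_database'
ORDER BY data_length + index_length DESC;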
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields.
You're way off. Please refer to the MySQL Data Type Storage Requirements documentation. In particular:
A BIGINT is 8 bytes, not 1.
The storage required for a CHAR or VARCHAR column will depend on the character set in use by your database (!), but will be at least 32 bytes (not bits!) for CHAR(32) and 33 for VARCHAR(32).
You have not accounted at all for the size of the index. The size of this will depend on the database engine, but it's definitely not zero. See the documentation on the InnoDB row structure for more information.
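Putting those corrections together, a rough redo of the per-row arithmetic can even be run as a query; this assumes a single-byte character set and ignores InnoDB row headers and page fill factor, so treat it as a lower bound:

SELECT
    8 + 33 + 32                  AS data_bytes_per_row,   -- BIGINT + VARCHAR(32) with length byte + CHAR(32)
    32 + 8                       AS index_bytes_per_row,  -- index on field2 also carries the BIGINT primary key
    (8 + 33 + 32 + 32 + 8) * 1000000 / (1024 * 1024) AS approx_mb_per_million_rows;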
On the MySQL website you'll find quite comprehensive information about storage requirements:
http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
It also depends on whether you use utf8 or not.