I'm a newbie in MySQL. Suppose I have a table like this:
**Month table**
-----------------
id month data
1 1 0.5
2 1 0.8
3 2 0.12
4 2 0.212
5 2 1.4
6 3 5.7
7 4 6.8
How can I split it into different tables based on month and have those tables stored on disk? If that's possible, how can I refer to these tables later? For example:
**January table**
-----------------
id month data
1 1 0.5
2 1 0.8
**February table**
-----------------
id month data
1 2 0.12
2 2 0.212
3 2 1.4
**March table**
-----------------
id month data
1 3 5.7
**April table**
-----------------
id month data
1 4 6.8
etc.
Thank you for your help.
I looked into partition by range but I don't think it serves my purpose because it is not creating smaller tables. I was thinking of writing a procedure but don't know where to start.
If your intent is simply to speed up queries by month, just add an index on your month or timestamp column. Partitions can speed things up, but they can also slow things down.
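For example, a minimal sketch assuming the month_table name and the month/created_at columns used below (idx_month and idx_created_at are just placeholder index names):
-- idx_month / idx_created_at are made-up index names for illustration.
alter table month_table add index idx_month (`month`);
-- or, if you filter on a timestamp column instead:
alter table month_table add index idx_created_at (created_at);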
I looked into partition by range but I don't think it serves my purpose because it is not creating smaller tables.
This is incorrect.
Partitioning takes this notion a step further, by enabling you to distribute portions of individual tables across a file system according to rules which you can set largely as needed. In effect, different portions of a table are stored as separate tables in different locations.
You can alter your existing table into 12 partitions by month. If you already have a month column...
alter table month_table
partition by list(month) (
    partition January values in (1),
    partition February values in (2),
    -- etc --
);
And if you have a date or timestamp column, turn it into a month.
alter table month_table
partition by list(month(created_at)) (
    partition January values in (1),
    partition February values in (2),
    -- etc --
);
...how can I refer to these tables later?
Generally, you don't refer to the individual partitions; you insert into and query the main table. The point of partitioning is to be transparent. If there is an applicable WHERE clause, MySQL will read only from the relevant partition(s).
-- This will read from the February partition.
select * from month_table where `month` = 2;
-- This will read from both the February and March partitions.
select * from month_table where `month` in (2,3);
But you can query the individual partitions with a partition clause using the name of the partition.
SELECT * FROM month_table PARTITION (January);
You generally do not need to do this except to debug what is in each partition.
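If you just want to see how the rows are distributed, information_schema can show per-partition row counts (a small sketch, assuming the month_table name above; TABLE_ROWS is an estimate for InnoDB):
select partition_name, table_rows
from information_schema.partitions
where table_schema = database() and table_name = 'month_table';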
MySQL states
If you set a column to the value it currently has, MySQL notices this and does not update it.
So, for example, I have 20 columns and I am about to update them all, but 10 of them will still have the same value. Will the performance be the same as if I were updating just 10 columns? Or is it the other way around?
In your example, in the first case there will be 20 checks and 10 updates.
For 10 columns, there will be 10 checks and 10 updates.
So you "win" 10 checks of MySQL-side work by listing only the columns that actually change.
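A quick way to see the skipped writes is the mysql client's affected-rows info; a minimal sketch with assumed names (my_table, col_a, col_b, id are made up for illustration):
-- my_table, col_a, col_b and id are placeholder names.
update my_table set col_a = col_a, col_b = col_b where id = 1;
-- Rows matched: 1  Changed: 0  Warnings: 0
-- "Changed: 0" confirms MySQL detected the values were already identical
-- and skipped the write.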
I have a set of numbers, and each number in it has a few numbers associated with it. So I store them in a table like this:
NUMBERS ASSOCIATEDNUMBERS
1 3
1 7
1 8
2 11
2 7
7 9
8 13
11 17
14 18
17 11
17 18
Thus each number can have many associated numbers and vice versa. Both columns are indexed (enabling me to look up a number and its associated numbers, and vice versa).
My create table looks like this:
CREATE TABLE `TABLE_B` (
`NUMBERS` bigint(20) unsigned NOT NULL,
`ASSOCIATEDNUMBERS` bigint(20) unsigned NOT NULL,
UNIQUE KEY `unique_number_associatednumber_constraint` (`NUMBERS`,`ASSOCIATEDNUMBERS`),
KEY `fk_AssociatedNumberConstraint` (`ASSOCIATEDNUMBERS`),
CONSTRAINT `fk_AssociatedNumberConstraint` FOREIGN KEY (`ASSOCIATEDNUMBERS`) REFERENCES `table_a` (`SRNO`),
CONSTRAINT `fk_NumberConstraint` FOREIGN KEY (`NUMBERS`) REFERENCES `table_a` (`SRNO`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here TABLE_A has a column SRNO, which is an AUTO_INCREMENT PRIMARY KEY and the first column in the table. (As per the MySQL manual, I haven't defined separate indexes on TABLE_B.NUMBERS and TABLE_B.ASSOCIATEDNUMBERS, since the foreign key constraints define them automatically.)
PROBLEM:
Whenever I need to change the ASSOCIATEDNUMBERS for a number (in `NUMBERS`), I just delete the existing rows for that number from the table:
DELETE FROM TABLE_B WHERE NUMBERS= ?
and then insert rows for the new set of ASSOCIATEDNUMBERS:
INSERT INTO TABLE_B (NUMBERS, ASSOCIATEDNUMBERS) VALUES ( ?, ?), (?, ?), (?, ?), ...
However, this takes a long time, especially when my multi-threaded application opens multiple connections (one per thread) to the database, each running the above two queries (but each with a different number).
For example, if I open 40 connections, each deleting the existing rows and inserting 250 new associated numbers, it takes up to 10 to 15 seconds. If I increase the number of connections, the time increases as well.
Other Information:
SHOW GLOBAL STATUS LIKE 'Threads_running';
Shows up to 40 threads.
Innodb parameters:
innodb_adaptive_flushing, ON
innodb_adaptive_flushing_lwm, 10
innodb_adaptive_hash_index, ON
innodb_adaptive_max_sleep_delay, 150000
innodb_additional_mem_pool_size, 2097152
innodb_api_bk_commit_interval, 5
innodb_api_disable_rowlock, OFF
innodb_api_enable_binlog, OFF
innodb_api_enable_mdl, OFF
innodb_api_trx_level, 0
innodb_autoextend_increment, 64
innodb_autoinc_lock_mode, 1
innodb_buffer_pool_dump_at_shutdown, OFF
innodb_buffer_pool_dump_now, OFF
innodb_buffer_pool_filename, ib_buffer_pool
innodb_buffer_pool_instances, 8
innodb_buffer_pool_load_abort, OFF
innodb_buffer_pool_load_at_startup, OFF
innodb_buffer_pool_load_now, OFF
innodb_buffer_pool_size, 1073741824
innodb_change_buffer_max_size, 25
innodb_change_buffering, all
innodb_checksum_algorithm, crc32
innodb_checksums, ON
innodb_cmp_per_index_enabled, OFF
innodb_commit_concurrency, 0
innodb_compression_failure_threshold_pct, 5
innodb_compression_level, 6
innodb_compression_pad_pct_max, 50
innodb_concurrency_tickets, 5000
innodb_data_file_path, ibdata1:12M:autoextend
innodb_data_home_dir,
innodb_disable_sort_file_cache, OFF
innodb_doublewrite, ON
innodb_fast_shutdown, 1
innodb_file_format, Antelope
innodb_file_format_check, ON
innodb_file_format_max, Antelope
innodb_file_per_table, ON
innodb_flush_log_at_timeout, 1
innodb_flush_log_at_trx_commit, 2
innodb_flush_method, normal
innodb_flush_neighbors, 1
innodb_flushing_avg_loops, 30
innodb_force_load_corrupted, OFF
innodb_force_recovery, 0
innodb_ft_aux_table,
innodb_ft_cache_size, 8000000
innodb_ft_enable_diag_print, OFF
innodb_ft_enable_stopword, ON
innodb_ft_max_token_size, 84
innodb_ft_min_token_size, 3
innodb_ft_num_word_optimize, 2000
innodb_ft_result_cache_limit, 2000000000
innodb_ft_server_stopword_table,
innodb_ft_sort_pll_degree, 2
innodb_ft_total_cache_size, 640000000
innodb_ft_user_stopword_table,
innodb_io_capacity, 200
innodb_io_capacity_max, 2000
innodb_large_prefix, OFF
innodb_lock_wait_timeout, 50
innodb_locks_unsafe_for_binlog, OFF
innodb_log_buffer_size, 268435456
innodb_log_compressed_pages, ON
innodb_log_file_size, 262144000
innodb_log_files_in_group, 2
innodb_log_group_home_dir, .\
innodb_lru_scan_depth, 1024
innodb_max_dirty_pages_pct, 75
innodb_max_dirty_pages_pct_lwm, 0
innodb_max_purge_lag, 0
innodb_max_purge_lag_delay, 0
innodb_mirrored_log_groups, 1
innodb_monitor_disable,
innodb_monitor_enable,
innodb_monitor_reset,
innodb_monitor_reset_all,
innodb_old_blocks_pct, 37
innodb_old_blocks_time, 1000
innodb_online_alter_log_max_size, 134217728
innodb_open_files, 300
innodb_optimize_fulltext_only, OFF
innodb_page_size, 16384
innodb_print_all_deadlocks, OFF
innodb_purge_batch_size, 300
innodb_purge_threads, 1
innodb_random_read_ahead, OFF
innodb_read_ahead_threshold, 56
innodb_read_io_threads, 64
innodb_read_only, OFF
innodb_replication_delay, 0
innodb_rollback_on_timeout, OFF
innodb_rollback_segments, 128
innodb_sort_buffer_size, 1048576
innodb_spin_wait_delay, 6
innodb_stats_auto_recalc, ON
innodb_stats_method, nulls_equal
innodb_stats_on_metadata, OFF
innodb_stats_persistent, ON
innodb_stats_persistent_sample_pages, 20
innodb_stats_sample_pages, 8
innodb_stats_transient_sample_pages, 8
innodb_status_output, OFF
innodb_status_output_locks, OFF
innodb_strict_mode, OFF
innodb_support_xa, ON
innodb_sync_array_size, 1
innodb_sync_spin_loops, 30
innodb_table_locks, ON
innodb_thread_concurrency, 0
innodb_thread_sleep_delay, 10000
innodb_undo_directory, .
innodb_undo_logs, 128
innodb_undo_tablespaces, 0
innodb_use_native_aio, OFF
innodb_use_sys_malloc, ON
innodb_version, 5.6.28
innodb_write_io_threads, 16
UPDATE:
Here is "SHOW ENGINE InnoDB STATUS" output: http://pastebin.com/raw/E3rK4Pu5
UPDATE2:
The reason behind this was somewhere else, not actually the DB. Another function in my code was eating a lot of CPU, causing MySQL (which runs on the same machine) to slow down. Thanks for all your answers and help.
It seems you are acquiring a lock on the row/table before deleting/inserting, and that is what is causing the issue.
Check
SELECT * from information_schema.GLOBAL_VARIABLES;
Also, check locks on table using
SHOW OPEN TABLES from <database name> where In_use>0
and lock type using
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS
Run the queries above under watch while you run your own query; you can also use the tee command to store the output in a file.
There is one more thing that can cause this: even though you have indexed the columns, MySQL has limitations on how much an index helps as the data grows (see the MySQL documentation on index limitations).
To install watch on OS X, see http://osxdaily.com/2010/08/22/install-watch-command-on-os-x/. Run the monitoring queries under watch with a delay of 1 second, then run your MySQL queries. If you want to store the output in a file, pipe it through tee; you can then tail the file to follow the data.
Let's say that I have a table that looks like
1 | Test | Jan 10, 2017
...
10000 | Test | Jan 20, 2030
and I want to bucket the records in the table based on the date column into a fixed number of 10 buckets, regardless of the values of the dates. All I require is that each bucket covers a time range of equal length.
I understand that I could do something with
GROUP BY
YEAR(datefield),
MONTH(datefield),
DAY(datefield),
HOUR(datefield),
and subtract the smallest datefield from the largest and divide by 10 to get the length of time covered by each bucket. However, is there built-in functionality in MySQL that would already do this, since doing the manual subtraction and division might lead to edge cases? Am I on the right track by doing the subtraction and division for bucketing into a constant number of buckets?
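Concretely, the manual approach described above might look something like this (a rough sketch with assumed names: my_table for the table and a DATETIME column datefield; it also assumes the minimum and maximum dates differ):
SELECT LEAST(9, FLOOR(TIMESTAMPDIFF(SECOND, r.min_d, t.datefield)
                      / (TIMESTAMPDIFF(SECOND, r.min_d, r.max_d) / 10))) AS bucket,
       COUNT(*) AS rows_in_bucket
FROM my_table t
CROSS JOIN (SELECT MIN(datefield) AS min_d, MAX(datefield) AS max_d
            FROM my_table) r
GROUP BY bucket
ORDER BY bucket;
-- LEAST(9, ...) keeps the row holding the maximum date in bucket 9
-- instead of spilling over into an 11th bucket.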
While I was creating stress data for a table, I found that the following files were generated.
-rw-rw---- 1 mysql mysql 8858 Jul 28 06:47 card.frm
-rw-rw---- 1 mysql mysql 7951695624 Jul 29 20:48 card.MYD
-rw-rw---- 1 mysql mysql 51360768 Jul 29 20:57 card.MYI
Actually, I inserted 1985968 records into this table, but the index file size is unbelievable.
Structure of the table is
create table card(
company_id int(10),
emp_number varchar(100),
card_date varchar(10),
time_entry text,
total_ot varchar(15),
total_per varchar(15),
leave_taken double,
total_lop double,
primary key (company_id,emp_number,card_date),
index (company_id,card_date)
);
Is there any way to reduce the filesize of the MYD?
Please note that .MYI is your index, and .MYD is your data. The only way to reduce the size of your .MYD is to delete rows or alter your column sizes.
50MB for an index on 2 million rows is not large.
Let's look at the size breakdown of your table:
company_id - 4 Bytes
emp_number - 101 Bytes
card_date - 11 Bytes
total_ot - 17 Bytes
total_per - 17 Bytes
leave_taken - 9 Bytes
total_lop - 9 Bytes
time_entry - avg(length(time_entry)) + 3 Bytes
This gives us a row length of about 171 bytes plus the average time_entry length. If time_entry averages out at 100 bytes, you're looking at roughly 271 * 2,000,000 ≈ 542MB.
Of significance to me is the number of VARCHARs. Does the employee number need to be a varchar(100), or even a varchar at all? You're duplicating that data in its entirety in your index on (company_id, emp_number, card_date), since you're indexing the whole column.
You probably don't need a varchar here, and you possibly don't need it included in the primary key.
Do you really need time_entry to be a TEXT field? This is likely the biggest consumer of space in your database.
Why are you using varchar(10) for card date? If you used DATETIME you'd only use 8 Bytes instead of 11, TIMESTAMP would be 4 Bytes, and DATE would be 3 Bytes.
You're also adding 1 Byte for every column that can be NULL.
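To make those suggestions concrete, here is a hedged sketch of a slimmer definition; every type change is an assumption you would need to verify against your real data (whether emp_number is numeric, whether time_entry fits in a short VARCHAR, and whether the totals are numeric):
create table card (
  company_id  int unsigned not null,
  emp_number  int unsigned not null,      -- assumed numeric; 4 bytes instead of varchar(100)
  card_date   date not null,              -- 3 bytes instead of varchar(10)
  time_entry  varchar(255) default null,  -- only if entries really are short; otherwise keep TEXT
  total_ot    decimal(8,2) default null,  -- assumed numeric; replaces varchar(15)
  total_per   decimal(8,2) default null,
  leave_taken double default null,
  total_lop   double default null,
  primary key (company_id, emp_number, card_date),
  index (company_id, card_date)
) engine=MyISAM;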
Also try running the ANALYZE/REPAIR/OPTIMIZE TABLE commands.
A lot depends on how big that time_entry text field can be. I'm going to assume it's small, less than 100 bytes. Then you have roughly 4 + 100 + 10 + 100 + 15 + 15 + 8 + 8 ≈ 260 bytes of data per record, call it 300 with per-row overhead. You have 2 million records, so I'd expect the data file to be around 600 megabytes. In fact you are showing about 8000 megabytes in the MYD on disk, a factor of roughly 13x. Something's not right.
Your best diagnostic tool is SHOW TABLE STATUS. In particular, check Avg_row_length and Data_length; they will give you some insight into where the space is going.
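For example (column meanings as documented for the MyISAM storage engine; your numbers will differ):
show table status like 'card';
-- Data_length  ~ size of card.MYD
-- Index_length ~ size of card.MYI
-- Avg_row_length = Data_length / Rows, i.e. the real bytes per row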
If you're using MyISAM tables, you may find that myisamchk will help make the table smaller. This tool particularly helps if you inserted and then deleted a lot of rows from the database. "optimize table" can help too. MyISAM does support read-only compressed tables via myisampack. I'd treat that as a last resort, though.