Why is my data so much larger than its actual size? - mysql

I have a problem with my database. I collect dumps into another database to analyze them, and I created a table to keep track of the dates already checked. When I look at my phpMyAdmin interface, the table size shows as 16 Kio for just two dates with an id!
I looked at the documentation here: http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html
It says the DATE type takes 3 bytes... the difference is far too big, and I have the same problem with all my data!
The size of one original dump is 1.9 Mo, but in my database it takes 5 Mo!
Where does the problem come from?

There is no problem.
A database in MySQL is not only data; you also have to account for the space used by:
Table definition
Table indexes (often >= table data)
Table data
Each field type has a different storage method.
The dump file only contains the table definitions and the data as SQL INSERTs.
Checking your DB structure and sample dump:
Your database tables are stored as InnoDB with utf8, while your dumps use MyISAM with latin1. That is a big difference, because UTF-8 character sets need more space to store string/varchar data, and InnoDB adds overhead of its own because the table is stored physically inside InnoDB files.
I took a dump file and one table, then created the same table using InnoDB with utf8; see the difference in sizes:
mysql> call tools.sp_status(database());
+---------------------+--------+-------+---------+-------------------+
| Table Name          | Engine | Rows  | Size    | Collation         |
+---------------------+--------+-------+---------+-------------------+
| BDDJoueurs          | MyISAM | 33981 | 2.47 Mb | latin1_swedish_ci |
| BDDJoueurs_unicode  | InnoDB | 33981 | 6.03 Mb | utf8_unicode_ci   |
+---------------------+--------+-------+---------+-------------------+
I think you are using InnoDB for the analysis; it might be a good idea to convert your consolidated data to MyISAM while preserving your character set, as in the sketch below.
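For example (a minimal sketch; the table name is taken from the status output above, and you should first check that you do not rely on InnoDB-only features such as foreign keys or transactions):
-- Convert the analysis copy back to MyISAM, keeping the utf8 character set
ALTER TABLE BDDJoueurs_unicode ENGINE=MyISAM;
-- Verify the new size
SHOW TABLE STATUS LIKE 'BDDJoueurs_unicode';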
Note: I use a custom SHOW TABLE STATUS procedure.

Table size is not a simple multiple of the number of rows; it has more to do with the page size. 16 KB sounds about right for a single InnoDB page (see the MySQL documentation). Your two rows take far less than one page, so one page is sufficient.
There is other overhead too for indexes, metadata, etc.
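If you want to see the raw numbers behind what phpMyAdmin displays, a query like this shows the page-granular allocation (the table name is a placeholder):
-- Even with only two rows, InnoDB reports one full 16 KiB (16384-byte) data page
SELECT table_name, engine, table_rows, data_length, index_length
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name = 'your_dates_table';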
BTW What is a Mo, a Kio?

Related

How to reclaim MySql disk space

I have a table on a MySQL server that contains around 1M rows. The table is taking more disk space day by day because of just one column, whose datatype is MEDIUMBLOB. The table size is around 90 GB.
After each row is inserted I do some processing, and after that I don't really need this column any more.
So if I set this column to NULL after processing a row, does MySQL reuse that empty space for the next row insertions or not?
MySql Server details
Server version: 5.7
Engine: InnoDB
Hosting: Google Cloud Sql
EDIT 1:
I deleted 90% of the rows from the table and then ran OPTIMIZE TABLE table_name,
but it freed only 4 GB of disk space and is not reclaiming the rest.
EDIT 2
I even dropped my database and created a new DB and table, but the MySQL server still shows 80 GB of disk usage. Sizes of all databases on the MySQL server:
SELECT table_schema "database name",
sum( data_length + index_length ) / 1024 / 1024 "database size in MB",
sum( data_free )/ 1024 / 1024 "free space in MB"
FROM information_schema.TABLES
GROUP BY table_schema;
+--------------------+---------------------+------------------+
| database name      | database size in MB | free space in MB |
+--------------------+---------------------+------------------+
| information_schema |          0.15625000 |      80.00000000 |
| app_service        |         15.54687500 |       4.00000000 |
| mysql              |          6.76713467 |       2.00000000 |
| performance_schema |          0.00000000 |       0.00000000 |
| sys                |          0.01562500 |       0.00000000 |
+--------------------+---------------------+------------------+
Thanks
Edit: It turns out from the comments below that the user's binary logs are the culprit. It makes sense that the binary logs would be large after a lot of DELETEs, especially if the MySQL instance is using row-based replication (every deleted row is written to the log).
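If the binary logs are indeed what is eating the disk, you can inspect and trim them; a rough sketch for a self-managed server (Cloud SQL manages binary log retention through its own settings, so the PURGE statement may not be available there, and the date below is only an example):
-- List binary log files and their sizes
SHOW BINARY LOGS;
-- Remove logs older than a given point in time
PURGE BINARY LOGS BEFORE '2020-01-01 00:00:00';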
The answer is complex.
You can save space by using NULL instead of real values. InnoDB uses only 1 bit per column per row to indicate that the value is NULL (see my old answer at https://stackoverflow.com/a/230923/20860 for details).
But this will just make space in the page where that row was stored. Each page must store only rows from the same table. So if you set a bunch of them NULL, you make space in that page, which can be used for subsequent inserts for that table only. It won't use the gaps for rows that belong to other tables.
And it still may not be reused for any rows of your mediumblob table, because InnoDB stores rows in primary key order. The pages for a given table don't have to be consecutive, but I would guess the rows within a page may be. In other words, you might not be able to insert rows in primary key random order within a page.
I don't know this detail for certain; you'd have to read Jeremy Cole's research on InnoDB storage to know the answer. Here's an excerpt:
The actual on-disk format of user records will be described in a future post, as it is fairly complex and will require a lengthy explanation itself.
User records are added to the page body in the order they are inserted (and may take existing free space from previously deleted records), and are singly-linked in ascending order by key using the “next record” pointers in each record header.
It's still not quite clear whether rows can be inserted out of order, and reuse space on a page.
So it's possible you'll only accomplish fragmenting your pages badly, and new rows with high primary key values will be added to other pages anyway.
You can do a better effort of reclaiming the space if you use OPTIMIZE TABLE from time to time, which will effectively rewrite the whole table into new pages. This might re-pack the rows, fitting more rows into each page if you've changed values to NULL.
It would be more effective to DELETE rows you don't need, and then OPTIMIZE TABLE. This will eliminate whole pages, instead of leaving them fragmented.
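A minimal sketch of that approach (table_name is the placeholder from the question, and blob_col and id stand in for the unnamed MEDIUMBLOB column and the primary key):
-- Clear the blob once processing is done; the freed space is only reusable
-- by future rows of this same table ('id' and the literal value are assumptions)
UPDATE table_name SET blob_col = NULL WHERE id = 12345;
-- Or delete the rows you no longer need, then rebuild the table so the freed
-- pages are returned to the filesystem (needs innodb_file_per_table; otherwise
-- the space only becomes reusable inside ibdata1)
DELETE FROM table_name WHERE blob_col IS NULL;
OPTIMIZE TABLE table_name;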

Speeding up a MySql DELETE that relies on a BIT column

I’m using MySql 5.5.46 and have an InnoDB table with a Bit Column (named “ENABLED”). There is no index on this column. The table has 26 million rows, so understandably, the statement
DELETE FROM my_table WHERE ENABLED = 0;
takes a really long time. My question is, is there anything I can do (without upgrading MySQL, which is not an option at this time), to speed up the time it takes to run this query? My “innodb_buffer_pool_size” variable is set to the following:
show variables like 'innodb_buffer_pool_size';
+-------------------------+-------------+
| Variable_name           | Value       |
+-------------------------+-------------+
| innodb_buffer_pool_size | 11674845184 |
+-------------------------+-------------+
Do the DELETE in "chunks" of 1000, based on the PRIMARY KEY. See Delete Big. That article goes into details about efficient ways to chunk, and what to do about gaps in the PK.
(With that 11GB buffer_pool, I assume you have 16GB of RAM?)
In general, MySQL will do a table scan instead of using an index if the number of rows to be selected is more than about 20% of the total number of rows. Hence, almost never are "flag" fields worth indexing by themselves.
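A rough sketch of the chunked approach, using the simpler DELETE ... LIMIT variant rather than the article's PK-range walk (it assumes an auto-increment primary key named id; repeat the statement, e.g. from a small script or event, until no rows are affected):
-- Delete at most 1000 disabled rows per statement so each transaction stays small
DELETE FROM my_table
 WHERE ENABLED = 0
 ORDER BY id
 LIMIT 1000;
-- Repeat until ROW_COUNT() returns 0.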

Mysql: Duplicate key error with autoincrement primary key

I have a table 'logging' in which we log visitor history. We have 14 million pageviews a day, so we insert 14 million records into the table every day, and traffic is highest in the afternoon. For some days we have been getting duplicate key errors on 'id', which in my opinion should not happen, since id is an auto-increment field and we are not explicitly passing id in the INSERT query. Following are the details:
logging (MyISAM)
----------------------------------------
| id | int(20) |
| virtual_user_id | varchar(1000) |
| visited_page | varchar(255) |
| /* More such columns are there */ |
----------------------------------------
Please let me know what the problem is here. Is keeping the table in MyISAM a problem?
Problem 1: size of your primary key
http://dev.mysql.com/doc/refman/5.0/en/integer-types.html
The maximum value of an INT, regardless of the display width you give it, is 2147483647, or twice that if unsigned.
At 14 million inserts a day, that means you hit the limit after roughly 153 days.
To prevent that, you might want to change the datatype to an unsigned BIGINT.
Or, for even more ridiculously large volumes, even a Unix timestamp + microtime as a composite key. Or a different DB solution altogether.
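A sketch of the simpler BIGINT change (the column attributes are guesses based on the table description, and on a table receiving 14M rows a day this ALTER will rebuild and lock the table for a while, so schedule it accordingly):
ALTER TABLE logging
  MODIFY id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;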
Problem 2: the actual error
It might be concurrency, even though I don't find that very plausible.
You'll have to provide the insert IDs / errors for that. Do you use transactions?
Another possibility is a corrupt table.
Don't know your mysql version, but this might work:
CHECK TABLE tablename
See if that has any complaints.
REPAIR TABLE tablename
General advice:
Is this a sensible amount of data to be inserting into a database, and doesn't it slow everything down too much anyhow?
I wonder how your DB performs, with all the locking involved, during a DELETE or, for example, an ALTER TABLE.
The right way to do it totally depends on the goals and requirements of your system which I don't know, but here's an idea:
Write the lines to a log file and import the log files at your own pace. Don't bother your visitors with errors or delays when your DB is having trouble or when you need to run some big operation that locks everything. A minimal import sketch follows.
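Something along these lines (the file path, delimiter and column list are assumptions, not taken from the question):
-- Bulk-load a tab-separated access log written by the web tier
LOAD DATA INFILE '/var/log/app/pageviews.log'
INTO TABLE logging
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(virtual_user_id, visited_page);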

Mysql GROUP BY query is taking a long time

I have a table "Words" in mysql database. This table contains 2 fields. word(VARCHAR(256)) and p_id(INTEGER).
Create table statement for the table:
CREATE TABLE `Words` (
`word` varchar(256) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`p_id` int(11) NOT NULL DEFAULT '0',
KEY `word_i` (`word`(255))
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Sample entries in the table are:
+------+------+
| word | p_id |
+------+------+
| a    |    1 |
| a    |    2 |
| b    |    1 |
| a    |    4 |
+------+------+
This table contains 30+ million entries. I am running a GROUP BY query and it takes 90+ minutes to complete. The query I am running is:
SELECT word,group_concat(p_id) FROM Words group by word;
To work around this, I exported all the data in the table to a text file using the following query:
SELECT p_id,word FROM Words INTO OUTFILE "/tmp/word_map.txt";
After that I wrote a Perl script to read all the content of the file, parse it, and build a hash out of it. That took far less time than the GROUP BY query (<3 min). In the end the hash has 14 million keys (words) and occupies a lot of memory. So is there any way to improve the performance of the GROUP BY query so that I don't need to go through all of the above steps?
EDIT: I am adding the my.cnf file entries below.
[mysqld]
datadir=/media/data/.mysql_data/mysql
tmpdir=/media/data/.mysql_tmp_data
innodb_log_file_size=5M
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
group_concat_max_len=4M
max_allowed_packet=20M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
tmpdir=/media/data/.mysql_tmp_data/
Thanks,
Vinod
I think the index you want is:
create index words_word_pid on Words(word, p_id)
This does two things. First, the group by can be handled by an index scan rather than loading the original table and sorting the results.
Secondly, this index also eliminates the need to load the original data.
My guess is that the original data does not fit into memory. So, the processing goes through the index (efficiently), finds the word, and then needs to load the pages with the word on it. Well, eventually memory fills up and the page with the word is not in memory. The page is loaded from disk. And the next page is probably not in memory, and that page is loaded from disk. And so on.
You can fix this problem by increasing the memory size. You can also fix the problem by having an index that covers all the columns used in the query.
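Once such an index exists, you can confirm the query is resolved from the index alone; a quick check (the exact EXPLAIN output varies by version):
EXPLAIN SELECT word, GROUP_CONCAT(p_id)
  FROM Words
 GROUP BY word;
-- Look for "Using index" in the Extra column: the 30M-row table itself is
-- then never read, only the (word, p_id) index.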
The problem is that dumping the whole 30M-row table into a file is hardly a frequent use case for a database. The advantage of your approach with the Perl script is that you do not need random disk I/O. To simulate that behaviour in MySQL you would need to load everything into an index (word, p_id) (the whole word, not a prefix), which might turn out to be overkill for the database.
You can put only p_id into an index, this will speed up grouping, but will require a lot of random disk IO to fetch words for each row.
By the way, the covering index will take ~(4+4+3*256)*30M bytes, that is more than 23Gb of memory. It seems that the solution with the Perl script is the best you can do.
Another thing you should be aware of is that you will need to pull more than 20 GB of result through a MySQL connection, and that those 20 GB will first be collected into a temporary table (and sorted by the grouping column, unless you append ORDER BY NULL). If you are going to download it through a MySQL binding for a programming language, you will need to force the binding to use streaming (by default, bindings usually fetch the whole result set).
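If you do run the big GROUP BY anyway, here is a sketch of how to skip the implicit sort and write the result to a server-side file instead of pulling it through the client (the output path is an example; group_concat_max_len is already raised in the my.cnf shown above):
SELECT word, GROUP_CONCAT(p_id)
  FROM Words
 GROUP BY word
 ORDER BY NULL
  INTO OUTFILE '/tmp/word_map_grouped.txt';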
Index the table on the word column. This will accelerate the grouping substantially as the SQL engine can locate the records for grouping with minimal searching through the table.
CREATE INDEX word_idx ON Words(word);

Why is InnoDB table size much larger than expected?

I'm trying to figure out storage requirements for different storage engines. I have this table:
CREATE TABLE `mytest` (
`num1` int(10) unsigned NOT NULL,
KEY `key1` (`num1`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When I insert some values and then run SHOW TABLE STATUS\G I get the following:
*************************** 1. row ***************************
           Name: mytest
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1932473
 Avg_row_length: 35
    Data_length: 67715072
Max_data_length: 0
   Index_length: 48840704
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2010-05-26 11:30:40
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
Notice avg_row_length is 35. I am baffled that InnoDB would not make better use of space when I'm just storing a non-nullable integer.
I have run this same test with MyISAM, and by default MyISAM uses 7 bytes per row for this table. Running
ALTER TABLE mytest MAX_ROWS=50000000, AVG_ROW_LENGTH = 4;
causes MyISAM to finally use the expected 5-byte rows.
When I run the same ALTER TABLE statement for InnoDB the avg_row_length does not change.
Why would such a large avg_row_length be necessary when only storing a 4-byte unsigned int?
InnoDB tables are clustered, that means that all data are contained in a B-Tree with the PRIMARY KEY as a key and all other columns as a payload.
Since you don't define an explicit PRIMARY KEY, InnoDB uses a hidden 6-byte column to sort the records on.
This and overhead of the B-Tree organization (with extra non-leaf-level blocks) requires more space than sizeof(int) * num_rows.
Here is some more info you might find useful.
InnoDB allocates data in terms of 16KB pages, so 'SHOW TABLE STATUS' will give inflated numbers for row size if you only have a few rows and the table is < 16K total. (For example, with 4 rows the average row size comes back as 4096.)
The extra 6 bytes per row for the "invisible" primary key is a crucial point when space is a big consideration. If your table is only one column, that's the ideal column to make the primary key, assuming the values in it are unique:
CREATE TABLE `mytest2`
(`num1` int(10) unsigned NOT NULL primary key)
ENGINE=InnoDB DEFAULT CHARSET=latin1;
By using a PRIMARY KEY like this:
No INDEX or KEY clause is needed, because you don't have a secondary index. The index-organized format of InnoDB tables gives you fast lookup based on the primary key value for free.
You don't wind up with another copy of the NUM1 column data, which is what happens when that column is indexed explicitly.
You don't wind up with another copy of the 6-byte invisible primary key values. The primary key values are duplicated in each secondary index. (That's also the reason why you probably don't want 10 indexes on a table with 10 columns, and you probably don't want a primary key that combines several different columns or is a long string column.)
So overall, sticking with just a primary key means less data associated with the table + indexes. To get a sense of overall data size, I like to run with
SET GLOBAL innodb_file_per_table = 1;
and examine the size of the data/database/*table*.ibd files. Each .ibd file contains the data for an InnoDB table and all its associated indexes.
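If you cannot look at the data directory (for example on a hosted instance), a rough equivalent is to ask information_schema for the per-table totals (data plus all indexes):
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
 ORDER BY size_mb DESC;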
To quickly build up a big table for testing, I usually run a statement like so:
insert into mytest
select * from mytest;
Which doubles the amount of data each time. In the case of the single-column table using a primary key, since the values had to be unique, I used a variation to keep the values from colliding with each other:
insert into mytest2
select num1 + (select count(*) from mytest2) from mytest2;
This way, I was able to get average row size down to 25. The space overhead is based on the underlying assumption that you want to have fast lookup for individual rows using a pointer-style mechanism, and most tables will have a column whose values serve as pointers (i.e. the primary key) in addition to the columns with real data that gets summed, averaged, and displayed.
In addition to Quassnoi's very fine answer, you should probably try it out using a significant data set.
What I'd do is, load 1M rows of simulated production data in, then measure the table size and use that as a guide.
That's what I've done in the past anyway
MyISAM
MyISAM, except in really old versions, uses a 7-byte "pointer" for locating a row, and a 6-byte pointer inside indexes. These defaults lead to a huge max table size. More details: http://mysql.rjweb.org/doc.php/limits#myisam_specific_limits . The kludgy way to change those involves the ALTER .. MAX_ROWS=50000000, AVG_ROW_LENGTH = 4 that you discovered. The server multiplies those values together to compute how many bytes the data pointer needs to be. Hence, you stumbled on how to shrink the avg_row_length.
But you actually needed to declare a table whose rows are smaller than 7 bytes to run into it! The pointer size shows up in multiple places:
Free space links in the .MYD default to 7 bytes. So, when you delete a row, a link is provided to the next free spot. That link needs to be 7 bytes (by default), hence the row size was artificially extended from the 4-byte INT to make room for it! (There are more details having to do with whether the column is NULLable, etc.)
FIXED vs DYNAMIC row -- When the table is FIXED size, the "pointer" is a row number. For DYNAMIC, it is a byte offset into the .MYD.
Index entries must also point to data rows with a pointer. So your ALTER should have shrunk the .MYI file as well!
There are more details, but MyISAM is likely to go away, so this ancient history is not likely to be of concern to anyone.
InnoDB
https://stackoverflow.com/a/64417275/1766831