How to calculate MySQL table/database size

I am a little confused. I have to estimate the size of a table that will hold 2 million rows, and I have no idea how much space the primary and secondary indexes take, especially with composite primary and secondary indexes. The structure of the table is something like:
Database engine: InnoDB
CREATE TABLE abc (
    a INT,
    b VARCHAR(30),
    c CHAR(10),
    d BIGINT(8),
    FOREIGN KEY (a)
        REFERENCES af (a_id)
        ON DELETE RESTRICT
        ON UPDATE RESTRICT,
    PRIMARY KEY (a, b, c)
) ENGINE=InnoDB;
CREATE UNIQUE INDEX idx_abc
    ON abc (a ASC, d ASC);
CREATE INDEX idx_abc2
    ON abc (d);
Please help
Sonu

You can get the size of the data and indexes from mysql.innodb_index_stats.
Warning: the size is given in pages, so you must multiply it by the page size, which is usually 16 KB.
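For example, a query along these lines reports the size of each index of the table in MB (the database name is a placeholder; it assumes MySQL 5.6 or later, where mysql.innodb_index_stats and @@innodb_page_size exist):
SELECT index_name,
       stat_value AS pages,
       stat_value * @@innodb_page_size / 1024 / 1024 AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = 'your_db'
  AND table_name = 'abc'
  AND stat_name = 'size';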

For an exact estimate, create a copy of the table, generate 2 000 000 rows of throwaway but representative data in it, measure the table and its indexes as shown in the other answers, and then drop the copy.
If precision is not that important in your case, multiply the number of bytes a record occupies by the number of records, and do the same for the indexes. With illustrative per-row sizes of 50 bytes for the record and 60 bytes for its index entries:
record * 2 000 000 + index * 2 000 000 ≈ 50 * 2 000 000 + 60 * 2 000 000 = 110 * 2 000 000 = 220 000 000 bytes ≈ 210 MB

Related

Is it faster to search by an integer column or a string column in MySQL?

I have a table "transactions" with a million records:
id trx secret_string (varchar(50)) secret_id (int(2))
1 80 52987624f7cb03c61d403b7c68502fb0 1
2 28 52987624f7cb03c61d403b7c68502fb0 1
3 55 8502fb052987624f61d403b7c67cb03c 2
4 61 52987624f7cb03c61d403b7c68502fb0 1
5 39 8502fb052987624f61d403b7c67cb03c 2
..
999997 27 8502fb052987624f61d403b7c67cb03c 2
999998 94 8502fb052987624f61d403b7c67cb03c 2
999999 40 52987624f7cb03c61d403b7c68502fb0 1
1000000 35 8502fb052987624f61d403b7c67cb03c 2
As you can see, secret_string and secret_id always correspond to each other.
Let's say, I need to select records where secret_string = "52987624f7cb03c61d403b7c68502fb0".
Is it faster to do:
SELECT id FROM transactions WHERE secret_id = 1
Than:
SELECT id FROM transactions WHERE secret_string = "52987624f7cb03c61d403b7c68502fb0"
Or does it not matter? What about other operations such as SUM(trx), COUNT(trx), AVG(trx), etc.?
Column secret_id currently does not exist, but if it is faster to search records by it, I am planning to create it upon row insertions.
Thank you
I hope I make sense.
Int comparisons are faster than varchar comparisons, for the simple fact that ints take up much less space than varchars.
This holds true both for unindexed and indexed access. The fastest way to go is an indexed int column.
There is another reason to use an int, and that is to normalise the database. Instead of storing the text '52987624f7cb03c61d403b7c68502fb0' thousands of times in the table, you should store its id and keep the secret string once in a separate table. It's the same deal for other operations such as SUM, COUNT, and AVG.
As the others told you: selecting by int is definitely faster than by string. However, if you need to select by secret_string, note that all the given strings look like hex strings. A 32-character hex value is 128 bits, which is too large for a BIGINT, but you could store it in a compact binary form, for example as BINARY(16) via UNHEX('52987624f7cb03c61d403b7c68502fb0'), instead of as text.
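A minimal sketch of that normalisation (the secrets table, key names, and exact column types are illustrative, not taken from the question):
CREATE TABLE secrets (
    secret_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    secret_string CHAR(32)     NOT NULL,
    PRIMARY KEY (secret_id),
    UNIQUE KEY uk_secret_string (secret_string)
);
CREATE TABLE transactions (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    trx       INT          NOT NULL,
    secret_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    KEY idx_secret_id (secret_id),
    FOREIGN KEY (secret_id) REFERENCES secrets (secret_id)
);
-- Lookups and aggregates then filter on the small, indexed int:
SELECT COUNT(trx), SUM(trx), AVG(trx)
FROM transactions
WHERE secret_id = 1;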

Best Method for Storing Many Measurements in a Database

So I have many measurement values for different kinds of measurements.
For simplicity's sake, let's say they're height and weight values for two people.
Which do you think is the best method for storing the data, and why? In actuality we're talking about thousands of patients and a lot of data.
Method 1
TableNums
name id
height 1
weight 2
height
id value
1 140
2 130
weight
id value
1 70
2 60
In this method I have a separate table for each measurement type. This one seems good for readability and adding new measurements in the future. But it would also make for a lot of tables.
or
Method 2
TableNums
name id
height 1
weight 2
attributes
id type_id value unique_id
1 1 140 1
2 1 130 2
1 2 70 3
2 2 60 4
This method seems less readable but would require only one table for the measurements.
Which do you guys think is better practice?
Thanks,
Ben
I recommend something like this:
Table Measurement:
MeasurementId PK
MeasurementType varchar -- height, weight, etc
MeasurementUnit varchar -- kg, cm, etc
Table patientMeasurement:
PatientId -- FK to patient
MeasurementId -- FK to measurement
value float
MeasurementDateTime datetime
other fields.
The PK of patientMeasurement could be composite (PatientId, MeasurementId, MeasurementDateTime) or a separate field.
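A possible concrete version of that design (exact types, the patient table, and the composite-key choice are assumptions rather than part of the answer):
CREATE TABLE measurement (
    MeasurementId   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    MeasurementType VARCHAR(50)  NOT NULL,  -- height, weight, etc.
    MeasurementUnit VARCHAR(20)  NOT NULL,  -- kg, cm, etc.
    PRIMARY KEY (MeasurementId)
);
CREATE TABLE patientMeasurement (
    PatientId           INT UNSIGNED NOT NULL,  -- FK to the patient table
    MeasurementId       INT UNSIGNED NOT NULL,  -- FK to measurement
    value               FLOAT        NOT NULL,
    MeasurementDateTime DATETIME     NOT NULL,
    PRIMARY KEY (PatientId, MeasurementId, MeasurementDateTime),
    FOREIGN KEY (MeasurementId) REFERENCES measurement (MeasurementId)
);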

Error 126 - Incorrect key file for table - 4.8 GB table size

I want to remove duplicates from a table that contains 280,717,107 entries.
The table consists of 3 fields (no primary key): user_id, from_user_id, value.
At some point there are some repetitive entries that I want to remove.
Let's say something like this:
user_id from_user_id value
1 2 4
2 2 4
3 2 4
1 2 4 #duplicate
5 2 4
8 2 4
9 2 4
9 2 4 #duplicate
My table is 4.8 GB in size (I dumped it).
So I went to the server (not phpMyAdmin) and in MySQL I did the following:
CREATE TABLE temp_table SELECT DISTINCT * FROM my_table;
At one point I get this error message:
"Error 126 - Incorrect key file for table"
Some people say that this message might be because the memory is full.
My question is: can I bypass this crash somehow and create the new table with my distinct entries?
You could try doing it in batches by applying a filter like
where user_id <= 1000
and increasing the value each time. So the next would be
where user_id > 1000 and user_id <= 2000
Like you mentioned in your comment, limit and offset would also work.
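A sketch of the range-based batching (the batch boundaries are illustrative and assume user_id is spread reasonably evenly):
CREATE TABLE temp_table LIKE my_table;
INSERT INTO temp_table
SELECT DISTINCT * FROM my_table WHERE user_id <= 1000;
INSERT INTO temp_table
SELECT DISTINCT * FROM my_table WHERE user_id > 1000 AND user_id <= 2000;
-- ...and so on until the whole user_id range is covered. A duplicate row has
-- the same user_id as its original, so every duplicate lands in the same
-- batch and is removed by DISTINCT.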

Database size calculation?

What is the most accurate way to estimate how big a database would be with the following characteristics:
MySQL
1 Table with three columns:
id --> bigint
field1 --> varchar 32
field2 --> char 32
there is an index on field2
You can assume varchar 32 is fully populated (all 32 characters). How big would it be if each field is populated and there are:
1 Million rows
5 Million rows
1 Billion rows
5 Billion rows
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields. Making it roughly:
(1 + 32 + 32) * 1 000 000 = 65 million bytes for 1 million rows
= 62 MB
Therefore:
62 MB (1 million rows)
310 MB (5 million rows)
62 000 MB ≈ 60 GB (1 billion rows)
310 000 MB ≈ 302 GB (5 billion rows)
Is this an accurate estimation?
If you want to know the current size of a database you can try this:
SELECT table_schema "Database Name"
, SUM(data_length + index_length) / (1024 * 1024) "Database Size in MB"
FROM information_schema.TABLES
GROUP BY table_schema
My rough estimate works out to: 1 byte for id, 32 bits each for the other two fields.
You're way off. Please refer to the MySQL Data Type Storage Requirements documentation. In particular:
A BIGINT is 8 bytes, not 1.
The storage required for a CHAR or VARCHAR column will depend on the character set in use by your database (!), but will be at least 32 bytes (not bits!) for CHAR(32) and 33 for VARCHAR(32).
You have not accounted at all for the size of the index. The size of this will depend on the database engine, but it's definitely not zero. See the documentation on the InnoDB row structure for more information.
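As a rough worked figure using those numbers (and assuming a single-byte character set such as latin1):
8 (BIGINT id) + 33 (VARCHAR(32)) + 32 (CHAR(32)) = 73 bytes of column data per row
73 * 1 000 000 ≈ 70 MB of raw data for 1 million rows, before the primary key, the index on field2, and InnoDB row and page overhead, which can easily double or triple the on-disk size.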
On the MySQL website you'll find quite comprehensive information about storage requirements:
http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
It also depends on whether you use utf8 or not.

Why is the size of the MYD file so high?

While I was creating stress data for a table, I found that the following files were generated.
-rw-rw---- 1 mysql mysql 8858 Jul 28 06:47 card.frm
-rw-rw---- 1 mysql mysql 7951695624 Jul 29 20:48 card.MYD
-rw-rw---- 1 mysql mysql 51360768 Jul 29 20:57 card.MYI
I inserted 1,985,968 records into this table, but the index file size is unbelievable.
Structure of the table is
create table card(
company_id int(10),
emp_number varchar(100),
card_date varchar(10),
time_entry text,
total_ot varchar(15),
total_per varchar(15),
leave_taken double,
total_lop double,
primary key (company_id,emp_number,card_date),
index (company_id,card_date)
);
Is there any way to reduce the filesize of the MYD?
Please note that .MYI is your index, and .MYD is your data. The only way to reduce the size of your .MYD is to delete rows or alter your column sizes.
50MB for an index on 2 million rows is not large.
Let's look at the size breakdown of your table:
company_id - 4 Bytes
emp_number - 101 Bytes
card_date - 11 Bytes
total_ot - 17 Bytes
total_per - 17 Bytes
leave_taken - 9 Bytes
total_lop - 9 Bytes
time_entry - avg(length(time_entry)) + 3 Bytes
This gives us a row length of 172 + time_entry bytes. If time_entry averages out at 100 bytes, you're looking at 272 * 2 000 000 = 544 MB.
Of significance to me is the number of VARCHARs. Does the employee number need to be a varchar(100), or even a varchar at all? You're duplicating that data in its entirety in your index on (company_id, emp_number, card_date), since you're indexing the whole column.
You probably don't need a varchar here, and you possibly don't need it included in the primary key.
Do you really need time_entry to be a TEXT field? This is likely the biggest consumer of space in your database.
Why are you using varchar(10) for card date? If you used DATETIME you'd only use 8 Bytes instead of 11, TIMESTAMP would be 4 Bytes, and DATE would be 3 Bytes.
You're also adding 1 Byte for every column that can be NULL.
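If those suggestions fit your data, the changes might look something like this (the shorter emp_number length is a guess, and the existing card_date values must already parse as dates):
ALTER TABLE card
    MODIFY card_date  DATE,
    MODIFY emp_number VARCHAR(20);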
Also try running the ANALYZE, REPAIR, and OPTIMIZE TABLE commands.
A lot depends on how big that time_entry text field can be. I'm going to assume it's small, less than 100 bytes. Then you have roughly 4 + 100 + 10 + 100 + 15 + 15 + 8 + 8 ≈ 260 bytes of data per record. With 2 million records, I'd expect the data file to be a bit over 500 megabytes. In fact you are showing about 8 000 megabytes of data in the .MYD on disk, roughly 15x that. Something's not right.
Your best diagnostic tool is SHOW TABLE STATUS. In particular, check Avg_row_length and Data_length; they will give you some insight into where the space is going.
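For example:
SHOW TABLE STATUS LIKE 'card';
-- compare Avg_row_length * Rows against Data_length, and check Index_length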
If you're using MyISAM tables, you may find that myisamchk will help make the table smaller; it particularly helps if you have inserted and then deleted a lot of rows. OPTIMIZE TABLE can help too. MyISAM also supports read-only compressed tables via myisampack, but I'd treat that as a last resort.
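A sketch of those maintenance steps (the table must be MyISAM; the path is illustrative, and the server should be stopped or the table flushed and locked before running the command-line tools):
OPTIMIZE TABLE card;                      -- defragments and reclaims space from deleted rows
myisamchk -r /var/lib/mysql/your_db/card  # rebuilds the table and its indexes
myisampack /var/lib/mysql/your_db/card    # compresses to a read-only table; run myisamchk -rq afterwards to rebuild the indexes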