I want to convert a CSV database into a MySQL one. I know I will never add any new rows to the database tables, and I know the max ID of each table, for example: 9898548.
What is the proper way to compute the int size? Would CEIL(LOG2(last_id)) be sufficient for this? With my example, LOG2(9898548) = 23.2387, so int(24)? Is this correct?
When you're defining your table and you know your max values, you can look up the range of each integer type. See http://dev.mysql.com/doc/refman/5.7/en/integer-types.html for a table of numeric sizes.
IDs are usually positive, so you can use the unsigned types. In your case 9898548 is less than 16777215 (the MEDIUMINT UNSIGNED max value), so that is the most space-efficient storage option. Your calculation gives the right answer here: you need 24 bits, or 3 bytes, which is a MEDIUMINT UNSIGNED. Note that CEIL(LOG2(n)) comes out one bit short when n is an exact power of two (16777216 needs 25 bits), so FLOOR(LOG2(n)) + 1 is the safer formula.
CREATE TABLE your_table (id MEDIUMINT UNSIGNED PRIMARY KEY);
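The sizing rule can be checked with a short sketch. It is written in Python rather than MySQL, and the helper names `bits_needed` and `smallest_unsigned_type` are made up for this illustration; the byte sizes are the ones listed on the integer-types page linked above. It uses the exact bit length rather than a rounded logarithm, so exact powers of two are handled correctly.

```python
# Unsigned MySQL integer types as (name, storage bytes), smallest first.
UNSIGNED_TYPES = [
    ("TINYINT UNSIGNED", 1),
    ("SMALLINT UNSIGNED", 2),
    ("MEDIUMINT UNSIGNED", 3),
    ("INT UNSIGNED", 4),
    ("BIGINT UNSIGNED", 8),
]


def bits_needed(max_id: int) -> int:
    """Bits required to store max_id as an unsigned integer."""
    if max_id < 0:
        raise ValueError("max_id must be non-negative")
    # int.bit_length() equals floor(log2(n)) + 1 for n > 0; 0 still needs 1 bit.
    return max(1, max_id.bit_length())


def smallest_unsigned_type(max_id: int) -> str:
    """Name of the smallest unsigned type whose range includes max_id."""
    if max_id < 0:
        raise ValueError("max_id must be non-negative")
    for name, size in UNSIGNED_TYPES:
        if max_id <= 2 ** (8 * size) - 1:
            return name
    raise ValueError("value does not fit in BIGINT UNSIGNED")


print(bits_needed(9898548), smallest_unsigned_type(9898548))
```

Running the same helper over the max ID of every table would give one column type per table.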
The numbers in brackets only tell MySQL how to display the value; they have no effect on the storage size. So INT(11) and INT(24) can both store the same range of numbers. A column defined as INT(11) is simply displayed with a width of 11 characters, even if the number has fewer digits. See
http://dev.mysql.com/doc/refman/5.7/en/numeric-type-attributes.html
"This optional display width may be used by applications to display integer values having a width less than the width specified for the column by left-padding them with spaces"
Yes, in this case, you need an integer type with at least 24 bits (equal to 3 bytes). The smallest type in MySQL satisfying this is MEDIUMINT UNSIGNED, according to the documentation.
Edit: Added the UNSIGNED.
Is it possible to use the LOCATE() function on a TEXT column, or is there an alternative to it for TEXT fields?
The thing is, we have large VARCHARs (65 KB) that we use to track subscriptions: we store the subscription_ids in one long string in a VARCHAR.
This string can hold up to 5000 subscription_ids in one row. We use LOCATE to see if a user is subscribed,
i.e. whether a subscription_id is found inside the VARCHAR string.
The problem is that we plan to have more than 500,000 rows like this, and it seems this can have a big impact on performance.
So we decided to move to TEXT instead, but now there is a problem with indexing and with how to LOCATE a substring inside a TEXT column.
Billions of subscriptions? Please show an abbreviated example of a TEXT value. Have you tried FIND_IN_SET()?
Is one TEXT field showing up to 5000 subscriptions for one user? Or is it the other way -- up to 5K users for one magazine?
In any case, it would be better to have a table with 2 columns:
CREATE TABLE user_sub (
user_id INT UNSIGNED NOT NULL,
sub_id INT UNSIGNED NOT NULL,
PRIMARY KEY(user_id, sub_id),
INDEX(sub_id, user_id)
) ENGINE=InnoDB;
The two composite indexes let you very efficiently find the 5K subscriptions for a user or the 500K users for a sub.
Shrink the id that has fewer than 500K values to MEDIUMINT UNSIGNED (16M limit instead of 4 billion; 3 bytes each instead of 4).
Shrink the id that has fewer than 5K values to SMALLINT UNSIGNED (64K limit instead of 4 billion; 2 bytes each instead of 4).
If you desire, you can use GROUP_CONCAT() to reconstruct the commalist. Be sure to change group_concat_max_len to a suitably large number (default is only 1024 bytes.)
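Here is a rough sketch of how the two-column table replaces the LOCATE() search. It uses Python's built-in sqlite3 module purely as a stand-in for MySQL, so the column types are simplified and SQLite's group_concat() plays the role of GROUP_CONCAT(). The index name `sub_user`, the helper functions and the sample rows are all invented for the illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_sub (
        user_id INTEGER NOT NULL,
        sub_id  INTEGER NOT NULL,
        PRIMARY KEY (user_id, sub_id)
    );
    CREATE INDEX sub_user ON user_sub (sub_id, user_id);
    """
)
conn.executemany(
    "INSERT INTO user_sub (user_id, sub_id) VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10), (3, 30)],
)


def is_subscribed(user_id: int, sub_id: int) -> bool:
    """Primary-key lookup that replaces LOCATE() on a long string."""
    row = conn.execute(
        "SELECT 1 FROM user_sub WHERE user_id = ? AND sub_id = ?",
        (user_id, sub_id),
    ).fetchone()
    return row is not None


def subs_as_commalist(user_id: int) -> str:
    """Rebuild the comma-separated list of subscriptions for one user."""
    row = conn.execute(
        "SELECT group_concat(sub_id) FROM "
        "(SELECT sub_id FROM user_sub WHERE user_id = ? ORDER BY sub_id)",
        (user_id,),
    ).fetchone()
    # group_concat() yields NULL (None) when the user has no rows.
    return row[0] or ""


print(is_subscribed(1, 20), is_subscribed(2, 20), subs_as_commalist(1))
```

The membership test touches a single index entry instead of scanning a 65 KB string, which is the point of the redesign.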
I'm hoping that there is a simple fix for this. I have a database column in which I store a number. I knew that the numbers would get pretty big, so I set the field to 'bigint'. However it will not store a number larger than 9223372036854775808.
Why is this?
Quoting from the manual:
BIGINT[(M)] [UNSIGNED] [ZEROFILL]
A large integer. The signed range is -9223372036854775808 to 9223372036854775807. The unsigned range is 0 to 18446744073709551615.
You've hit the maximum of a signed bigint. This limit comes from the way the number is stored on the computer: it is the largest signed value you can represent with 8 bytes.
If you need to store a bigger number, consider using another method. You could use a varchar, but you will need to convert the values whenever you do math operations on them.
I think you're using a signed BIGINT:
Range of a signed bigint: -9223372036854775808 to 9223372036854775807
Range of an unsigned bigint: 0 to 18446744073709551615
Use ALTER TABLE to modify the column:
ALTER TABLE t1 MODIFY col1 BIGINT UNSIGNED;
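The limits quoted above follow directly from the 8-byte storage: a signed value spends one bit on the sign, an unsigned one does not. A short Python check of the arithmetic (the variable names are just for this example):

```python
BITS = 64  # BIGINT is stored in 8 bytes

signed_min = -(2 ** (BITS - 1))
signed_max = 2 ** (BITS - 1) - 1
unsigned_max = 2 ** BITS - 1

print(signed_min, signed_max, unsigned_max)
```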
I have two tables:
table A with columns, INT t_a, INT t_b, INT t_c, INT t_d, VARCHAR t_var
table B with columns, INT t_a, INT t_b, INT t_c, INT t_d, CHAR t_cha
If I select column t_a, will there be any performance difference between table A and table B?
In theory there should be a performance penalty when dealing with VARCHARs, as you cannot work with fixed address computation. But in practice this is no longer noticeable.
No difference between CHAR and VARCHAR when selecting.
Yes, but it is quite subtle. I presume that the character fields actually have lengths associated with them.
There is a difference in how the data is stored. A char field always stores its full declared length, padding the value with trailing spaces. A varchar() field stores only the characters actually present, plus a small length prefix.
So, if you had a table that contained US state names, then:
create table states (
stateid int,
statename char(100)
);
Would occupy something like 50*100 + 50*4 = 5,200 bytes in the database (50 rows, each with 100 bytes of char and 4 bytes of int). With a varchar(), the space usage would be much less.
In larger tables, this can increase the number of pages needed to store the data. This additional storage can slow down the query, by some amount. (It could be noticeable on a table with a large number of records and lots of such wasted space).
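As a rough model of that difference, the Python sketch below compares the bytes used by a fixed-width column and a variable-width one for a few sample names. It assumes a single-byte character set and a 1-byte length prefix for the varchar, and it ignores row overhead and engine specifics, so treat the numbers as an estimate only.

```python
CHAR_WIDTH = 100    # statename char(100) from the example above
LENGTH_PREFIX = 1   # assumed: 1-byte length prefix for a short varchar

# A few sample values; assumes a single-byte character set.
names = ["Ohio", "Texas", "California", "North Carolina"]

# Fixed width: every row pays for the full declared length.
char_bytes = CHAR_WIDTH * len(names)
# Variable width: each row pays for its own characters plus the prefix.
varchar_bytes = sum(len(name) + LENGTH_PREFIX for name in names)

print(char_bytes, varchar_bytes)
```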
See the below screenshot. Note that the insert statement lists the trade_id as 4404689907. The subsequent select lists the trade_id as 2147483647. Anyone have any idea what's going on here?
Your column is a signed INT, which holds integers up to 2147483647. Your value is clearly larger than that. Even an unsigned INT only holds values up to 4294967295. You will need to use BIGINT for that data.
See Integer Types (Exact Value)
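The stored value matches what you get when an out-of-range number is clipped to the end of the signed INT range, which is how MySQL behaves when strict SQL mode is not enabled. A small Python sketch of that clipping (the function name is hypothetical):

```python
INT_MIN, INT_MAX = -(2 ** 31), 2 ** 31 - 1  # signed INT, 4 bytes
UINT_MAX = 2 ** 32 - 1                      # INT UNSIGNED


def clip_to_signed_int(value: int) -> int:
    """Clip value to the signed INT range, as a non-strict insert would."""
    return max(INT_MIN, min(INT_MAX, value))


trade_id = 4404689907
print(clip_to_signed_int(trade_id))  # what ends up in the INT column
print(trade_id > UINT_MAX)           # too large even for INT UNSIGNED
```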
Funny thing I've found about MySQL. MySQL has a 3-byte numeric type, MEDIUMINT. Its range is from -8388608 to 8388607. It seems strange to me. I thought the sizes of numeric types were chosen for better performance, so data should be aligned to a machine word or double word. And if we need restriction rules for numeric ranges, they should be external to the datatype. For example:
CREATE TABLE ... (
id INT RANGE(0, 500) PRIMARY KEY
)
So, does anyone know why 3 bytes? Is there any reason?
The reason is so that if you have a number that falls within a 3 byte range, you don't waste space by storing it using 4 bytes.
When you have twenty billion rows, it matters.
The alignment issue you mentioned applies mostly to data in RAM. Nothing forces MySQL to use 3 bytes to store that type as it processes it.
This might have a small advantage in using disk cache more efficiently though.
We frequently use tinyint, smallint, and mediumint for very significant space savings. Keep in mind that it also makes your indexes that much smaller.
This effect is magnified when you have really small join tables, like:
id1 smallint unsigned not null,
id2 mediumint unsigned not null,
primary key (id1, id2)
And then you have hundreds of millions or billions of records.
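To put a number on it, this Python sketch compares the raw key bytes of that join table against the same table built from two plain INT columns. The row count is hypothetical, and row and index overhead are ignored, so the real savings will differ.

```python
SIZES = {"SMALLINT": 2, "MEDIUMINT": 3, "INT": 4}  # bytes per value


def key_bytes(rows: int, *column_types: str) -> int:
    """Raw bytes used by the key columns alone, ignoring row overhead."""
    return rows * sum(SIZES[t] for t in column_types)


rows = 1_000_000_000  # hypothetical row count
compact = key_bytes(rows, "SMALLINT", "MEDIUMINT")
plain = key_bytes(rows, "INT", "INT")

print(compact, plain, plain - compact)
```

Since the primary key here is the whole row, the same 3 bytes per row are saved again in every index that carries these columns.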