I have stored a value as varchar and as bigint in a MySQL DB:
userID_as_varchar varchar(50) DEFAULT NULL,
userID_as_bigint bigint(20) DEFAULT NULL,
+--------------------+---------------------------+
| userID_as_varchar | userID_as_bigint |
+--------------------+---------------------------+
| 917876131364446205 | 917876131364446200 |
+--------------------+---------------------------+
For some reason, I can't retrieve the userID_as_bigint value in full precision with SQL, but I can with R.
Behaviour SQL:
Whether I query the data directly or cast it, I always get the "rounded" value.
Tested in phpMyAdmin and directly with the mysql command in a shell.
Behaviour R:
If I query the field with R (RMySQL package), the value is complete: 917876131364446205
Can anyone explain this behaviour, or does anyone know a way to get the full value with SQL?
Best regards.
Not quite sure what you mean, here's a test:
create table test(t1 varchar(50), t2 bigint);
Query OK, 0 rows affected (0.03 sec)
mysql> desc test;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| t1 | varchar(50) | YES | | NULL | |
| t2 | bigint(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.02 sec)
mysql> insert into test values('917876131364446205', 917876131364446205);
Query OK, 1 row affected (0.01 sec)
mysql> select * from test;
+--------------------+--------------------+
| t1 | t2 |
+--------------------+--------------------+
| 917876131364446205 | 917876131364446205 |
+--------------------+--------------------+
1 row in set (0.00 sec)
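The test above shows that MySQL itself stores and returns the BIGINT exactly. One plausible explanation for the "rounded" value, and this is an assumption rather than something confirmed in the thread, is that the number passes through a 64-bit floating-point type somewhere between the server and the display. A minimal Python sketch of what such a conversion does to this particular value:

```python
import math

stored = 917876131364446205      # exact value held in the BIGINT column

# Assumption: some layer converts the integer to a 64-bit float (a "double").
as_double = float(stored)

print(int(as_double))            # 917876131364446208, the nearest representable double
print(math.ulp(as_double))       # 128.0, the gap between adjacent doubles at this magnitude
print(int(as_double) == stored)  # False: the last digits cannot survive the conversion
```

A double has only 53 bits of precision, so it cannot hold all 18 digits. Printing such a value with about 16 significant digits gives 917876131364446200, which matches the value in the question. If that is the cause, reading the column as a string or as a 64-bit integer on the client side should avoid it.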
I need to reduce the size of a MySQL database. I recoded some information, which stripped ';' and ':' from the sources column (~10% char reduction). After doing so, the size of the table is exactly the same as before. How is that possible? I'm using the MyISAM engine.
btw: Unfortunately, I cannot compress the tables with myisampack.
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (4 min 11.30 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test') ORDER BY TABLE_ROWS DESC LIMIT 10;
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| test | 41.917 | 0.857 | 1.075 |
| homologs_9606 | 41.917 | 0.887 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.01 sec)
mysql> select * from homologs_9606 limit 10;
+---------+---------+-------+--------------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+--------------------------------+
| 5635338 | 1028608 | 0.000 | 10:,1 |
| 5644385 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5652325 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5641128 | 1028612 | 1.000 | 8:1,10 |
| 5636414 | 1028616 | 0.038 | 8:0.038,104;10:,1 |
| 5636557 | 1028616 | 0.000 | 8:,4 |
| 5637419 | 1028616 | 0.011 | 5:,1;8:0.011,91;10:,1 |
| 5641196 | 1028616 | 0.080 | 5:1,1;8:0.074,94;10:,1;11:,4 |
| 5642914 | 1028616 | 0.000 | 8:,3 |
| 5643778 | 1028616 | 0.056 | 8:0.057,70;10:,1 |
+---------+---------+-------+--------------------------------+
10 rows in set (4.55 sec)
mysql> select * from test limit 10;
+---------+---------+-------+-------------------------+
| protid1 | protid2 | CS | sources |
+---------+---------+-------+-------------------------+
| 5635338 | 1028608 | 0.000 | 10,1 |
| 5644385 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5652325 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5641128 | 1028612 | 1.000 | 81,10 |
| 5636414 | 1028616 | 0.038 | 80.038,10410,1 |
| 5636557 | 1028616 | 0.000 | 8,4 |
| 5637419 | 1028616 | 0.011 | 5,180.011,9110,1 |
| 5641196 | 1028616 | 0.080 | 51,180.074,9410,111,4 |
| 5642914 | 1028616 | 0.000 | 8,3 |
| 5643778 | 1028616 | 0.056 | 80.057,7010,1 |
+---------+---------+-------+-------------------------+
10 rows in set (0.00 sec)
mysql> describe test;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | YES | PRI | NULL | |
| protid2 | int(10) unsigned | YES | PRI | NULL | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql> describe homologs_9606;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | NO | PRI | 0 | |
| protid2 | int(10) unsigned | NO | PRI | 0 | |
| CS | float(4,3) | YES | | NULL | |
| sources | varchar(100) | YES | | NULL | |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
EDIT1: Added average column length.
mysql> select AVG(LENGTH(sources)) from test;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 5.2177 |
+----------------------+
1 row in set (10.04 sec)
mysql> select AVG(LENGTH(sources)) from homologs_9606;
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
| 6.8792 |
+----------------------+
1 row in set (9.95 sec)
EDIT2: I was able to strip some more MB by setting NOT NULL to all columns.
mysql> drop table test;
Query OK, 0 rows affected (0.42 sec)
mysql> CREATE table test (protid1 INT UNSIGNED NOT NULL DEFAULT '0', protid2 INT UNSIGNED NOT NULL DEFAULT '0', CS FLOAT(4,3) NOT NULL DEFAULT '0', sources VARCHAR(100) NOT NULL DEFAULT '0', PRIMARY KEY (protid1, protid2), KEY `idx_protid2` (protid2)) ENGINE=MyISAM CHARSET=ascii;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606;
Query OK, 41917131 rows affected (2 min 7.84 sec)
Records: 41917131 Duplicates: 0 Warnings: 0
mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test');
+---------------+--------------+---------+----------+
| name | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| homologs_9606 | 41.917 | 0.887 | 1.075 |
| test | 41.917 | 0.842 | 1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.02 sec)
They are not exactly the same. Your query clearly shows that test is about 30 MB smaller than homologs_9606:
+---------------+--------------+---------+
| name | million rows | data GB |
+---------------+--------------+---------+
| test | 41.917 | 0.857 | <-- 0.857 < 0.887
| homologs_9606 | 41.917 | 0.887 |
+---------------+--------------+---------+
How much storage should we expect for your table? Let us check Data Type Storage Requirements:
INTEGER(10): 4 bytes
FLOAT(4): 4 bytes
VARCHAR(100): L+1
where L is the number of character bytes, which is usually one byte per character but sometimes more if you use a Unicode character set.
Your rows on average will need:
INTEGER + INTEGER + FLOAT + VARCHAR =
4 + 4 + 4 + (L + 1) = L + 13 bytes
We can infer your original average L as (0.887*1024^3 / 41917131) - 13 = 9.72. You say that you stripped 10% from sources, which means your new L is 9.72*0.9 = 8.75. That gives an expected new total storage requirement of ((8.75 + 13) * 41917131) / 1024^3 = 0.849 GB
I suspect that the difference (between 0.849 and 0.857) might be due to the fact that test has two NULLable columns (protid1 and protid2) that are NOT NULL in homologs_9606, but I do not know enough about the MyISAM engine to calculate this exactly. I can however guess! At a minimum you would need 1 bit per column per row to store a NULL state, which in your case means two bits per row, or 2*41917131 = 83834262 bits = 10 479 283 bytes = 0.010 GB. The total, 0.849 + 0.010 = 0.859, slightly overshoots the measured size (by about 2 MB). But I have done some rounding and your 10% figure is also an estimate, so I am sure the rest is lost in translation.
Another reason could be a Unicode character set on sources in test, in which case some characters may use more than one byte each, but since the NULLable columns seem to account for everything, I do not think this is the case for your table.
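The arithmetic above is easy to re-run. Here is a small Python sketch of it; the helper names are mine, and the 13 fixed bytes per row and the 1 bit per NULLable column are the same simplifying assumptions as in the estimate, not exact MyISAM internals:

```python
ROWS = 41_917_131
GIB = 1024 ** 3
FIXED_BYTES = 4 + 4 + 4 + 1   # two INTs, one FLOAT, one VARCHAR length byte


def avg_varchar_bytes(data_gb):
    """Infer the average VARCHAR payload (L) from the reported data size."""
    return data_gb * GIB / ROWS - FIXED_BYTES


def expected_gb(avg_l):
    """Estimate the data size for a given average VARCHAR payload."""
    return (avg_l + FIXED_BYTES) * ROWS / GIB


old_l = avg_varchar_bytes(0.887)   # original table
new_l = old_l * 0.9                # after stripping roughly 10% of the characters
null_gb = 2 * ROWS / 8 / GIB       # guess: 1 bit per row for each of 2 extra NULLable columns

print(round(old_l, 2))                          # 9.72
print(round(new_l, 2))                          # 8.75
print(round(expected_gb(new_l), 3))             # 0.849
print(round(expected_gb(new_l) + null_gb, 3))   # 0.859
```

Changing the 0.9 factor shows how sensitive the result is to the "about 10%" figure.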
Summary
Your two tables are not the same size; they differ by about 30 MB.
The size of your new table is around the expected size.
You can save some more space in your new table by making protid1 and protid2 into NOT NULL columns.
The "table" is stored in a .MYD file. This file will never shrink due to UPDATEs or DELETEs. SHOW TABLE STATUS (or the equivalent query into information_schema) may show Data_length shrinking, but Data_free will increase.
You can shrink the .MYD file by doing OPTIMIZE TABLE. But that will copy the table over, thereby needing extra disk space during the process. And this action is only very rarely worth doing.
Changing to NOT NULL may not free up space if you had a lot of nulls -- "" takes 1 or 2 bytes for a VARCHAR because of the length. (And your code may need to handle '' differently than NULL.)
The space taken for each row is actually 1 byte more than previously mentioned -- this byte handles knowing whether the row exists or is the beginning of a hole.
For large text fields, I like to do this to save space. (This applies to both MyISAM and InnoDB.) Compress the text and store it into a BLOB column (instead of TEXT). For most text, that is a 3:1 shrinkage. It takes a little extra code and CPU time in the client, but it saves a lot of I/O in the server. Often the net result is "faster". I would not use it for the varchar you have; I would only do it on columns bigger than, say, 50 characters average.
Back to the original question. It sounds like there were only about 30M colons and semicolons in the entire table. Could it be that the first 10 rows are not representative?
For some reason, the row is not being updated. Any idea why this would happen?
UPDATE hts SET assigned='1' AND Owner='ms' WHERE hid='217477'
Query OK, 0 rows affected (0.16 sec)
Rows matched: 1 Changed: 0 Warnings: 0
select assigned, Owner from hts where hid='217477';
+----------+-------+
| assigned | Owner |
+----------+-------+
| NULL | NULL |
+----------+-------+
Show columns from hts
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| hid | varchar(25) | YES | UNI | NULL | |
| assigned | int(11) | NO | | 0 | |
| Owner | varchar(10) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
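Two things you can try.
First, remove the AND from the SET clause; assignments are separated with a comma. With AND, everything after the first = is read as a single boolean expression ('1' AND Owner='ms'), and only assigned is set to its result.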
UPDATE hts SET assigned=1, Owner='ms' WHERE hid='217477'
Second, try removing the quotes from the hid value if the column is an INT and not a VARCHAR:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid=217477
I am not sure why you are storing integers as strings. When in doubt, you should ALWAYS store data in its intended datatype.
RECOMMENDATION: if the columns are varchar, change them to int. Your update would then look like this:
UPDATE hts SET assigned=1, Owner='ms' WHERE hid=217477
Both assigned and hid should be integers.
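To see the difference between the AND form and the comma form without touching the real table, here is a rough Python sketch. It uses an in-memory SQLite database as a stand-in, which is an assumption on my part: SQLite is not MySQL, and the table below is a simplified copy of hts with nullable columns, but both engines read the AND form as one boolean expression assigned to the first column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for hts; column types and constraints are assumptions.
conn.execute("CREATE TABLE hts (hid TEXT UNIQUE, assigned INTEGER, Owner TEXT)")
conn.execute("INSERT INTO hts (hid) VALUES ('217477')")

# AND form: read as assigned = ('1' AND (Owner = 'ms')); Owner is never assigned.
conn.execute("UPDATE hts SET assigned='1' AND Owner='ms' WHERE hid='217477'")
after_and = conn.execute("SELECT assigned, Owner FROM hts").fetchone()
print(after_and)    # (None, None)

# Comma form: two separate assignments.
conn.execute("UPDATE hts SET assigned=1, Owner='ms' WHERE hid='217477'")
after_comma = conn.execute("SELECT assigned, Owner FROM hts").fetchone()
print(after_comma)  # (1, 'ms')
```

Because Owner starts out as NULL, the comparison Owner='ms' is NULL, so the whole expression is NULL and the row looks untouched. That is consistent with "Rows matched: 1 Changed: 0" in the question.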
I know that a boolean in MySQL is a tinyint(1).
Today I saw a table with an integer column defined as tinyint(2), and also others like int(4), int(6) ...
What does the size mean for columns of type int and tinyint?
The (m) indicates the column display width; applications such as the MySQL client make use of this when showing the query results.
For example:
| v | a | b | c |
+-----+-----+-----+-----+
| 1 | 1 | 1 | 1 |
| 10 | 10 | 10 | 10 |
| 100 | 100 | 100 | 100 |
Here a, b and c are using TINYINT(1), TINYINT(2) and TINYINT(3) respectively. As you can see, it pads the values on the left side using the display width.
It's important to note that it does not affect the accepted range of values for that particular type, i.e. TINYINT(1) still accepts [-128 .. 127].
It means display width.
Whether you use tinyint(1) or tinyint(2), it does not make any difference.
I always use tinyint(1) and int(11), and I have used several MySQL clients (Navicat, Sequel Pro).
It does not mean anything AT ALL! I ran a test, and all of the above clients, and even the command-line client, seem to ignore it.
However, display width does matter if you use the ZEROFILL option. For example, suppose your table has the following 2 columns:
A tinyint(2) zerofill
B tinyint(4) zerofill
If both columns have the value 1, the output is 01 for column A and 0001 for column B.
mysql> CREATE TABLE tin3(id int PRIMARY KEY,val TINYINT(10) ZEROFILL);
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO tin3 VALUES(1,12),(2,7),(4,101);
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM tin3;
+----+------------+
| id | val |
+----+------------+
| 1 | 0000000012 |
| 2 | 0000000007 |
| 4 | 0000000101 |
+----+------------+
3 rows in set (0.00 sec)
mysql>
mysql> SELECT LENGTH(val) FROM tin3 WHERE id=2;
+-------------+
| LENGTH(val) |
+-------------+
| 10 |
+-------------+
1 row in set (0.01 sec)
mysql> SELECT val+1 FROM tin3 WHERE id=2;
+-------+
| val+1 |
+-------+
| 8 |
+-------+
1 row in set (0.00 sec)
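As a rough illustration of what the session above shows, ZEROFILL behaves like left-padding at display time. The Python helper below is a hypothetical name of mine, not anything MySQL provides; it only mimics the padding rule, on the understanding that values wider than the display width are shown in full rather than truncated:

```python
def zerofill(value: int, display_width: int) -> str:
    """Left-pad with zeros up to display_width; wider values are left as they are."""
    return str(value).zfill(display_width)


print(zerofill(1, 2))     # 01
print(zerofill(1, 4))     # 0001
print(zerofill(7, 10))    # 0000000007
print(zerofill(101, 2))   # 101
```

The stored number is the same in every case, which is why val+1 above returns 8 and not a padded string.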
About INT and TINYINT: these are different data types. INT is a 4-byte number, TINYINT is a 1-byte number. More information is in the MySQL manual section on INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT.
The syntax of the TINYINT data type is TINYINT(M), where M indicates the maximum display width (used only if your MySQL client supports it).
See also the manual section on Numeric Type Attributes.
Is there any way to convert the warning that MySQL is issuing about an invalid datetime into a hard error? I've tried using SET sql_mode='TRADITIONAL'; which apparently is supposed to turn (some) things that are warnings into errors, but it does not have any effect here. This is MySQL 5.1.56. Something that works on a session-level would be ideal, but I'll take what I can get.
mysql> describe test_table2;
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| value | int(11) | YES | | NULL | |
| name | varchar(16) | YES | | NULL | |
| sometime | datetime | YES | | NULL | |
+----------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> select * from test_table2;
+-------+-------+---------------------+
| value | name | sometime |
+-------+-------+---------------------+
| 1 | one | 2002-09-01 10:00:00 |
| 2 | two | 2002-09-02 11:00:00 |
| 3 | three | 2002-09-03 12:00:00 |
| 4 | four | 2002-01-04 13:00:00 |
| 5 | five | 2002-01-05 14:00:00 |
+-------+-------+---------------------+
5 rows in set (0.00 sec)
mysql> select * from test_table2 where sometime = 'foo';
Empty set, 2 warnings (0.00 sec)
Warning (Code 1292): Incorrect datetime value: 'foo' for column 'sometime' at row 1
Warning (Code 1292): Incorrect datetime value: 'foo' for column 'sometime' at row 1
With SET sql_mode='TRADITIONAL', doing an INSERT with an invalid date causes an error, but doing a SELECT with an invalid date still causes a warning. You can trigger the error by passing the (possibly invalid) date value to this query first:
CREATE TEMPORARY TABLE IF NOT EXISTS date_guard (date DATE) SELECT 'foo' AS date;
where 'foo' is the date value you want to validate.
Who is supposed to see the error?
If this is a fixed string 'foo', just try converting 'foo' to a date and see whether you get a valid result (i.e. not 0000-00-00). Do a pre-query to check the validity of the date, and continue only if it passes.
I have not been able to make MySQL give an error in this case (or even convert the invalid date to NULL; it insists on making it 0000-00-00).
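If the value comes from application code, another option is to do the validity check on the client before the query is sent. This is a different technique from the pre-query suggested above, and the function name and the expected format are assumptions of mine. A minimal Python sketch:

```python
from datetime import datetime


def is_valid_datetime(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if text parses as a real calendar datetime in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(is_valid_datetime("2002-09-01 10:00:00"))  # True
print(is_valid_datetime("foo"))                  # False
print(is_valid_datetime("2002-02-30 10:00:00"))  # False
```

The caller can then raise its own hard error instead of relying on the server's warning. Note that this only accepts the one format given, whereas MySQL is more lenient about datetime literals.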