Why Are These Tables the Same Size? - mysql

I was trying to measure the difference between TINYINT and INT when I came across something interesting. For tables with small numbers of columns, the choice of data type does not seem to affect the size of the table.
Server version: 5.1.41-3ubuntu12.10 (Ubuntu)
Example:
mysql> describe tinyint_test;
+----------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| test_int | tinyint(4) | YES | | NULL | |
+----------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> describe tinyint_id_test;
+-------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id | tinyint(4) | YES | | NULL | |
+-------+------------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> describe int_test;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| not_id | int(11) | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> select * from tinyint_test;
+------+----------+
| id | test_int |
+------+----------+
| 1 | 1 |
| 2 | 2 |
| 3 | 127 |
| 10 | 50 |
+------+----------+
4 rows in set (0.00 sec)
mysql> select * from tinyint_id_test;
+------+
| id |
+------+
| 1 |
| 2 |
| 127 |
| 50 |
+------+
4 rows in set (0.00 sec)
mysql> select * from int_test;
+--------+
| not_id |
+--------+
| 1 |
| 2 |
| 127 |
| 50 |
+--------+
4 rows in set (0.00 sec)
mysql> SELECT TABLE_NAME, DATA_LENGTH FROM INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA like '%test%';
+-----------------+-------------+
| TABLE_NAME | DATA_LENGTH |
+-----------------+-------------+
| int_test | 28 |
| tinyint_id_test | 28 |
| tinyint_test | 28 |
+-----------------+-------------+
3 rows in set (0.00 sec)
I vaguely suspect that there might be an internal column in each row, or that the minimum data size for a given row must be at least the size of a full INT, but neither of these suspicions really accounts for what's happening here. It may also be that DATA_LENGTH is the wrong tool for measuring the true size of the tables, in which case an acceptable answer would point me in the right direction for actually measuring them.
EDIT:
I can generate a table of a different size by using two INTs:
mysql> describe int_id_test;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| test_int | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
2 rows in set (0.01 sec)
mysql> select * from int_id_test;
+------+----------+
| id | test_int |
+------+----------+
| 1 | 1 |
| 2 | 2 |
| 3 | 127 |
| 10 | 50 |
+------+----------+
4 rows in set (0.00 sec)
mysql> SELECT TABLE_NAME, DATA_LENGTH FROM INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA like '%test%';
+-----------------+-------------+
| TABLE_NAME | DATA_LENGTH |
+-----------------+-------------+
| int_id_test | 36 |
| int_test | 28 |
| tinyint_id_test | 28 |
| tinyint_test | 28 |
+-----------------+-------------+
4 rows in set (0.01 sec)

The DATA_LENGTH column is how much hard drive space the operating system allocates for a table.
MySQL stores data in pages. The page size is configurable and defaults to 16KB, so the three tables' data may fit in the same number of pages, which would make their DATA_LENGTH values the same.
Edit:
The InnoDB engine's default page size is 16KB. I don't know this size for other engines.

I have found a workaround for this problem as well as something of an explanation.
After looking at the table files in a hex editor (on my Linux machines these were located at /var/lib/mysql/[DATABASE NAME]/[TABLE NAME].MYD), I found that in all cases the records were created using a minimum of 7 bytes per row, regardless of the actual data types involved. Any extra bytes that were not used by the row were zeroed out.
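As a rough cross-check, the 7-byte minimum lines up with the DATA_LENGTH values in the question. The Python sketch below is only an estimate under stated assumptions: one record-header byte per row (which is what the hex dump further down shows for these tables), 4 bytes per INT, 1 byte per TINYINT, and the 7-byte floor observed in the .MYD files. The function name is made up for illustration.

```python
# Storage sizes in bytes for the column types used in the question.
TYPE_SIZES = {"INT": 4, "TINYINT": 1}

HEADER_BYTES = 1   # assumption: one record-header byte, as seen in the hex dump
MIN_ROW_BYTES = 7  # assumption: the minimum row size observed in the .MYD files


def estimated_data_length(column_types, row_count):
    """Estimate DATA_LENGTH for a fixed-width table under the assumptions above."""
    row_bytes = HEADER_BYTES + sum(TYPE_SIZES[t] for t in column_types)
    return max(row_bytes, MIN_ROW_BYTES) * row_count


tables = {
    "tinyint_id_test": ["TINYINT"],
    "int_test": ["INT"],
    "tinyint_test": ["INT", "TINYINT"],
    "int_id_test": ["INT", "INT"],
}
for name, columns in tables.items():
    print(name, estimated_data_length(columns, row_count=4))
```

Under these assumptions the first three tables come out at 28 bytes and int_id_test at 36, which matches the INFORMATION_SCHEMA output in the question.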
Here is an example with a smaller data set to illustrate:
mysql> describe int_test_2;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
+-------+---------+------+-----+---------+-------+
1 row in set (0.00 sec)
mysql> select * from int_test_2;
+------+
| id |
+------+
| 1 |
| 2 |
+------+
2 rows in set (0.00 sec)
Looking at this guy in a hex editor, we see:
fd01 0000 0000 00fd 0200 0000 0000
Using information from Neo's link, I was able to decode these rows:
fd Record header bits.
01000000 Integer value "1" (little endian)
0000 Wasted Space!
fd Record header bits.
02000000 Integer value "2" (little endian)
0000 Wasted Space!
However, notice the following:
mysql> alter table int_test_2 MAX_ROWS=50000000, AVG_ROW_LENGTH=4;
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
Now, the MYD file looks like this:
fd01 0000 00fd 0200 0000
That is, it uses the correct sizes.
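The decoding above can be reproduced with a few lines of Python. This is a sketch that assumes the layout described in this answer (one header byte, a 4-byte little-endian INT, then any padding); decode_rows is a hypothetical helper, not part of any MySQL tooling.

```python
import struct


def decode_rows(hex_dump, row_bytes):
    """Split a dump into fixed-size rows and decode header, INT value and padding."""
    data = bytes.fromhex(hex_dump)
    rows = []
    for offset in range(0, len(data), row_bytes):
        row = data[offset:offset + row_bytes]
        header = row[0]
        (value,) = struct.unpack("<i", row[1:5])  # 4-byte little-endian signed INT
        padding = row[5:]                          # whatever is left in the row
        rows.append((header, value, padding))
    return rows


# 7-byte rows before the ALTER TABLE, 5-byte rows after it.
before = decode_rows("fd01 0000 0000 00fd 0200 0000 0000", row_bytes=7)
after = decode_rows("fd01 0000 00fd 0200 0000", row_bytes=5)
print(before)
print(after)
```

Both dumps decode to the values 1 and 2 with header byte 0xfd; the only difference is the two zero bytes of padding per row in the first dump.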

One thing to note is that the number in brackets does not affect the size of that column, i.e. an INT(4) is the same size as an INT(11) in terms of storage. All the number in brackets does is set the display width, so that the returned value can be padded with spaces to fill 11 or 4 characters.
I suspect that if you truly want to work out the size of the tables, you will need to look in the MySQL data files themselves and see how they are stored. All the data is stored in /var/lib/mysql/; ibdata and ib_logfile are the main files. Open these in a text editor (caution: these files may be HUGE depending on the sizes of your databases, and DO NOT modify them!).
All the tables and cells are stored in here, however they are not delimited, so it's very difficult to see where one column ends and the next begins; it is all based on the data size, which is what you are trying to establish. If you know the data in the table, you should be able to work out the structure.
Edit: I think some of the data in these files may be stored in hex, so if it doesn't immediately make sense, try a hex editor.

Related

How do I add Auto-Increment ID Column to a table generated from SQL Query? [duplicate]

I have a table generated from a SQL query. Now I need to add an auto-increment id column to this table.
The usual syntax to add an auto-increment id column is:
ALTER TABLE *Table_Name* ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
But I don't have a specific table name; the table is generated from a query.
You can do it with a query like this:
## Your Query to generate the Tablename
SELECT "myTable" INTO @mytab;
EXECUTE IMMEDIATE CONCAT("ALTER TABLE ",@mytab," ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST");
Sample:
MariaDB [bernd]> desc myTable;
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| insertat | date | YES | | NULL | |
| myval | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
2 rows in set (0.02 sec)
MariaDB [bernd]> select * from myTable;
+------------+-------+
| insertat | myval |
+------------+-------+
| 2021-01-01 | 44 |
| 2021-01-02 | 99 |
| 2021-01-02 | 134 |
| 2021-01-03 | 45 |
| 2021-01-04 | 2 |
| 2021-01-04 | 17 |
+------------+-------+
6 rows in set (0.06 sec)
MariaDB [bernd]> ## Your Query to generate the Tablename
MariaDB [bernd]> SELECT "myTable" INTO @mytab;
Query OK, 1 row affected (0.01 sec)
MariaDB [bernd]>
MariaDB [bernd]> EXECUTE IMMEDIATE CONCAT("ALTER TABLE ",@mytab," ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST");
Query OK, 0 rows affected (0.14 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [bernd]> desc myTable;
+----------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| insertat | date | YES | | NULL | |
| myval | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
MariaDB [bernd]> select * from myTable;
+----+------------+-------+
| id | insertat | myval |
+----+------------+-------+
| 1 | 2021-01-01 | 44 |
| 2 | 2021-01-02 | 99 |
| 3 | 2021-01-02 | 134 |
| 4 | 2021-01-03 | 45 |
| 5 | 2021-01-04 | 2 |
| 6 | 2021-01-04 | 17 |
+----+------------+-------+
6 rows in set (0.00 sec)
MariaDB [bernd]>
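Because the table name is concatenated straight into the ALTER TABLE statement, it can be worth quoting it as an identifier first. Here is a minimal client-side sketch in Python, assuming the statement is built in application code rather than with EXECUTE IMMEDIATE. The helper names are hypothetical; the only rule relied on is that MySQL/MariaDB identifiers are quoted with backticks and an embedded backtick is doubled.

```python
def quote_identifier(name):
    """Quote a table or column name with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_add_id_statement(table_name):
    """Build the ALTER TABLE statement from the answer for a dynamic table name."""
    return (
        "ALTER TABLE " + quote_identifier(table_name)
        + " ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST"
    )


print(build_add_id_statement("myTable"))
```

The resulting string would then be sent to the server by whatever client library is in use.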

Mysql replace matching and replacing invalid characters

I have a weird hyphen in one of the mysql tables created by Racktables. I am trying to replace it with a normal one, but I seem to be missing something:
mysql> update Port set reservation_comment = replace(reservation_comment,'–','-');
Query OK, 0 rows affected (0.01 sec)
Rows matched: 2358 Changed: 0 Warnings: 0
As you can see the "bad" hyphen gets matched but not replaced.
I have tried changing single quotes to double quotes and escaping the hyphens, but there is no real change.
Here goes some sample data:
| 2690 | 767 | R131226-005-23Ha | 1 | 24 | NULL | C130527-059 | |
| 2691 | 768 | R131226-005-24Ha | 1 | 24 | NULL | C130527-036 | |
| 2692 | 770 | R131226-006-01Ha | 1 | 24 | NULL | C140305�001 | |
| 2693 | 773 | R131226-006-04Ha | 1 | 24 | NULL | C140305�004 | |
| 2694 | 784 | R131226-006-15Ha | 1 | 24 | NULL | C140305�015 | |
| 2695 | 785 | R131226-006-16Ha | 1 | 24 | NULL | C140305�016 | |
| 2696 | 793 | R131226-006-24Ha | 1 | 24 | NULL | C140305�024 | |
| 2697 | 771 | R131226-006-02Ha | 1 | 24 | NULL | C140305�002 | |
| 2698 | 772 | R131226-006-03Ha | 1 | 24 | NULL | C140305-003
The hyphen I am trying to replace is encoded differently, it seems.
EDIT:
So, all the syntax is good. The problem is that I cannot match the bad hyphen. If I try to replace the "good" one all is fine:
mysql> UPDATE Port SET reservation_comment=REPLACE(reservation_comment,'-','[[good_hyphen]]');
Query OK, 367 rows affected (0.34 sec)
Rows matched: 2358 Changed: 367 Warnings: 0
but if I try to replace the bad one:
mysql> UPDATE Port SET reservation_comment=REPLACE(reservation_comment,'–','[[bad_hyphen]]');
Query OK, 0 rows affected (0.01 sec)
Rows matched: 2358 Changed: 0 Warnings: 0
I will look into the encoding of that character and into different ways to match it.
The UPDATE is matching all rows, regardless of the hyphen, because there is no WHERE clause. It is just not changing any row in which REPLACE finds no bad hyphen.
It may be that the bad hyphen is stored as an unknown character, so it will not be targeted by your REPLACE. You may have to change the charset/collation on your column to get it to import correctly before changing it, or adjust the way that Racktables outputs the data.
Ok, I solved it by doing something similar to this post once I identified the character:
mysql> select hex("–");
+------------+
| hex("–") |
+------------+
| E28093 |
+------------+
1 row in set (0.00 sec)
Then I verified that matched:
mysql> select x'E28093';
+-----------+
| x'E28093' |
+-----------+
| – |
+-----------+
1 row in set (0.00 sec)
And finally the substitution:
mysql> UPDATE Port SET reservation_comment=REPLACE(reservation_comment,x'E28093','-');
Query OK, 33 rows affected (1.55 sec)
Rows matched: 2358 Changed: 33 Warnings: 0
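For reference, E28093 is the UTF-8 encoding of U+2013 (EN DASH). The Python sketch below shows the same identify-then-replace idea at the byte level; the sample comment value is made up to resemble the data in the question.

```python
EN_DASH = "\u2013"  # the "bad" hyphen, written as an escape to avoid copy/paste issues

# Same check as SELECT HEX(...): the UTF-8 bytes of the character.
encoded = EN_DASH.encode("utf-8")
print(encoded.hex().upper())

# Same idea as REPLACE(reservation_comment, x'E28093', '-'), applied to raw bytes.
comment = "C140305" + EN_DASH + "001"  # hypothetical sample value
fixed = comment.encode("utf-8").replace(bytes.fromhex("E28093"), b"-").decode("utf-8")
print(fixed)
```

Matching on the byte sequence sidesteps whatever the terminal or client does to the character when it is typed or pasted.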

ALTER TABLE increase INDEX LENGTH

I have a mysql table named t_media_items. I have index on 3 cols (parent_id, type, weight). The index size is 2.52MB.
mysql> show indexes from t_media_items;
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_media_items | 0 | PRIMARY | 1 | id | A | 113779 | NULL | NULL | | BTREE | | |
| t_media_items | 1 | idx_ptw | 1 | parent_id | A | 16254 | NULL | NULL | | BTREE | | |
| t_media_items | 1 | idx_ptw | 2 | type | A | 16254 | NULL | NULL | | BTREE | | |
| t_media_items | 1 | idx_ptw | 3 | weight | A | 113779 | NULL | NULL | | BTREE | | |
+---------------+------------+--------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.01 sec)
mysql> SELECT table_name AS "Tables", round(((index_length) / 1024 / 1024), 2) SIB
FROM information_schema.TABLES
WHERE table_schema = "XXXX" and table_name='t_media_items'
ORDER BY (index_length ) DESC;
+---------------+------+
| Tables | SIB |
+---------------+------+
| t_media_items | 2.52 |
+---------------+------+
1 row in set (0.00 sec)
I tried to alter the length of another column, named "rand_key". The strange thing is that after the column was altered, the index size suddenly increased to 5.52MB, even though "rand_key" is not part of the index.
mysql> ALTER TABLE `t_media_items` CHANGE `rand_key` `rand_key` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL;
Query OK, 108503 rows affected (7.24 sec)
Records: 108503 Duplicates: 0 Warnings: 0
Here is INDEX_LENGTH after ALTER
mysql> SELECT table_name AS "Tables", round(((index_length) / 1024 / 1024), 2) SIB
FROM information_schema.TABLES
WHERE table_schema = "tallcat" and table_name='t_media_items'
ORDER BY (index_length ) DESC;
+---------------+------+
| Tables | SIB |
+---------------+------+
| t_media_items | 5.52 |
+---------------+------+
1 row in set (0.00 sec)
Could anyone help me explain the issue? Thank you
ALTER TABLE performs a table restructure: it makes an empty clone of the table, applies the alteration, and copies the data into it. Indexes are rebuilt as a by-product of this process.
But as it builds the index incrementally while filling the table with data, it doesn't take advantage of fast index creation. The indexes are stored less compactly.
I can't tell if this explains the more than 2x increase in size, but it's possible.
I'm assuming that you're using MySQL 5.5 or later, or 5.1 with the InnoDB plugin. Earlier versions do not support fast index creation anyway.
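To get a feel for why an incrementally built index can end up less compact, here is a toy model in Python. It is not InnoDB's actual algorithm: it assumes leaf pages that hold a fixed number of keys (PAGE_CAPACITY is an arbitrary, hypothetical value) and split in half when they overflow, and compares that with packing pre-sorted keys into full pages.

```python
import bisect
import math
import random

PAGE_CAPACITY = 100  # hypothetical number of index entries that fit in one page


def pages_after_incremental_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into sorted leaf pages, splitting a page in half on overflow."""
    pages = [[]]            # each page is a sorted list of keys
    lows = [float("-inf")]  # lower bound of each page, used to locate the target page
    for key in keys:
        index = bisect.bisect_right(lows, key) - 1
        bisect.insort(pages[index], key)
        if len(pages[index]) > capacity:
            page = pages[index]
            middle = len(page) // 2
            pages[index] = page[:middle]
            pages.insert(index + 1, page[middle:])
            lows.insert(index + 1, page[middle])
    return len(pages)


def pages_after_sorted_build(key_count, capacity=PAGE_CAPACITY):
    """Pack pre-sorted keys into completely full pages."""
    return math.ceil(key_count / capacity)


keys = list(range(20000))
random.Random(0).shuffle(keys)  # keys arrive in no particular index order
incremental = pages_after_incremental_inserts(keys)
packed = pages_after_sorted_build(len(keys))
print(incremental, packed)
```

In this model the incrementally filled pages are left partly empty after splits, so more pages are needed than with the sorted build. In the worst case every page is only half full, which would double the page count.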

str_to_date returns null in query but fine on its own

I'm trying to sort some results by time. I've gathered str_to_date is the way to go, but I appear to be using it wrong, and I can't tell for sure, but I think it's converting to NULL and then not sorting in a meaningful way:
mysql> SELECT member_id, result_result, str_to_date('result_result','%i:%s.%f') FROM results WHERE workout_id = '2' ORDER BY str_to_date('result_result','%i:%s.%f') LIMIT 5;
+-----------+---------------+-----------------------------------------+
| member_id | result_result | str_to_date('result_result','%i:%s.%f') |
+-----------+---------------+-----------------------------------------+
| 0 | 1:35.0 | NULL |
| 1 | 1:35.0 | NULL |
| 3 | 1:40 | NULL |
| 4 | 1:37.8 | NULL |
| 7 | 1:27.3 | NULL |
+-----------+---------------+-----------------------------------------+
5 rows in set, 5 warnings (0.00 sec)
but the two result types seem to convert fine if I do it manually:
mysql> select str_to_date('1:40','%i:%s.%f');
+--------------------------------+
| str_to_date('1:40','%i:%s.%f') |
+--------------------------------+
| 00:01:40 |
+--------------------------------+
1 row in set (0.00 sec)
mysql> select str_to_date('1:35.0','%i:%s.%f');
+----------------------------------+
| str_to_date('1:35.0','%i:%s.%f') |
+----------------------------------+
| 00:01:35 |
+----------------------------------+
1 row in set (0.00 sec)
Any ideas what's happening / how to fix it?
Thanks!
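You don't need the quotes around the column name inside the function. With the quotes, 'result_result' is a string literal rather than a reference to the column, so str_to_date tries to parse the text "result_result" itself and returns NULL. Try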
str_to_date( result_result, '%i:%s.%f' )
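The same mistake is easy to reproduce outside SQL. In the Python sketch below, parse_result is a hypothetical helper (it is not how str_to_date is implemented) that turns 'M:SS' or 'M:SS.f' into seconds. Passing the literal text 'result_result' instead of a row's value yields None, much like the NULLs above, while passing the actual values gives something sortable.

```python
def parse_result(text):
    """Convert 'M:SS' or 'M:SS.f' to seconds; return None for anything else."""
    try:
        minutes, seconds = text.split(":")
        return int(minutes) * 60 + float(seconds)
    except ValueError:
        return None


# Sample rows copied from the question's output.
rows = [
    {"member_id": 0, "result_result": "1:35.0"},
    {"member_id": 1, "result_result": "1:35.0"},
    {"member_id": 3, "result_result": "1:40"},
    {"member_id": 4, "result_result": "1:37.8"},
    {"member_id": 7, "result_result": "1:27.3"},
]

# Wrong: the column name as a literal string, analogous to the quoted SQL.
print(parse_result("result_result"))

# Right: the value stored in each row.
ordered = sorted(rows, key=lambda row: parse_result(row["result_result"]))
print([row["member_id"] for row in ordered])
```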

Convert lat/lng pairs using GeomFromText('POINT(1 1)') and insert in another column

In my previous question, "Search for range Latitude/Longitude coordinates", my solution was to create the table below.
mysql> select * from spatial_table where MBRContains(GeomFromText('LINESTRING(9 9, 11 11)'), my_spots);
+------+---------------------------------+
| id | my_spots | my_polygons |
+------+-------------+-------------------+
| 1 | $# $# $# $# |
+------+-------------+-------------------+
Now I need to convert and move my existing lat/lng pairs in the table below to spatial_table. How would I structure my query to achieve this? I am currently using the queries below to insert.
mysql> insert into spatial_table values (1, GeomFromText('POINT(1 1)'), GeomFromText('POLYGON((1 1, 2 2, 0 2, 1 1))'));
Query OK, 1 row affected (0.00 sec)
mysql> insert into spatial_table values (1, GeomFromText('POINT(10 10)'), GeomFromText('POLYGON((10 10, 20 20, 0 20, 10 10))') );
Query OK, 1 row affected (0.00 sec)
Existing table:
+-------------+---------+--------+------------+-----------+-------------+-------------+
| location_id | country | region | city       | latitude  | longitude   | name        |
+-------------+---------+--------+------------+-----------+-------------+-------------+
|      316625 | US      | CA     | Santa Cruz | 37.044799 | -122.102096 | Rio Theatre |
+-------------+---------+--------+------------+-----------+-------------+-------------+
Here is the secret recipe to success :)
My original table:
mysql> describe gls;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| location_id | int(255) | NO | PRI | 0 | |
| country | varchar(255) | NO | | | |
| region | varchar(255) | NO | | | |
| city | varchar(255) | NO | | | |
| latitude | float(13,10) | NO | | | |
| longitude | float(13,10) | NO | | | |
+-------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
Step 1: Add a new POINT column
mysql> alter table gls add my_point point;
Query OK, 247748 rows affected (4.77 sec)
Records: 247748 Duplicates: 0 Warnings: 0
Step 2: Update my_point with values from lat/lng fields.
UPDATE gls SET my_point = PointFromText(CONCAT('POINT(',gls.longitude,' ',gls.latitude,')'));
Step 3: Check
mysql> select aswkt(my_point) from gls where city='Santa Cruz';
+--------------------------------------+
| aswkt(my_point) |
+--------------------------------------+
| POINT(-122.1020965576 37.0447998047) |
| POINT(-66.25 -12.2833003998) |
| POINT(-2.3499999046 42.6666984558) |
+--------------------------------------+
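Note that Step 2 puts longitude first, treating it as the x coordinate of the WKT POINT(x y). Here is a small Python sketch of the text that CONCAT builds for each row. The function name is made up for illustration, and the sample values are the Rio Theatre row from the question.

```python
def point_wkt(longitude, latitude):
    """Build the WKT text for a point: x (longitude) first, then y (latitude)."""
    return "POINT({} {})".format(longitude, latitude)


wkt = point_wkt(-122.102096, 37.044799)
print(wkt)
```

Swapping the two arguments would still produce a valid point, just in the wrong place, so it is worth checking a known location as in Step 3.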
Let's say you have a table like this:
mysql> select * from spatial_table;
+------+---------------------------+-----------------------------------------------------------------------------------+---------+--------+
| id | my_spots | my_polygons | lon | lat |
+------+---------------------------+-----------------------------------------------------------------------------------+---------+--------+
| 1 | ?? ?? | ?? ?? # # # ?? ?? | -122.11 | -37.11 |
| 1 | $# $# | $# $# 4# 4# 4# $# $# | -122.11 | -37.11 |
+------+---------------------------+-----------------------------------------------------------------------------------+---------+--------+
2 rows in set (0.00 sec)
If you want to make a geometry column with the lon lat values (as points only, syntax is a little different for other kinds of geometries), you can do this:
mysql> alter table spatial_table add column (go_slugs geometry);
This is a geometry type; if the values are all single locations, you could make the column type point instead. Then just update the new column:
mysql> update spatial_table set go_slugs = point(lon, lat);
Use the aswkt function to get human-readable data to confirm this is correct:
mysql> select aswkt(go_slugs) from spatial_table;
+-----------------------------------------------+
| aswkt(go_slugs) |
+-----------------------------------------------+
| POINT(-122.11000061035156 -37.11000061035156) |
| POINT(-122.11000061035156 -37.11000061035156) |
| POINT(-123.4000015258789 37.79999923706055) |
+-----------------------------------------------+
3 rows in set (0.00 sec)