In MySQL 5.7, a table is defined as follows:
CREATE TABLE `person` (
`person_id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(64) DEFAULT NULL,
PRIMARY KEY (`person_id`),
KEY `ix_name` (`name`)
) ENGINE=InnoDB CHARSET=utf8
We then prepared two records for testing; the values of the name field (a varchar) are
123456789123456789
1
respectively.
Case 1
select * from person where name = 123456789123456789-1;
Note that we are using a number instead of a string in the where clause. The record with name 123456789123456789 is returned, and the trailing -1 seems to be ignored!
Furthermore, if we add another record with name = 123456789123456788, the same select returns two records: both 123456789123456789 and 123456789123456788.
The output looks so strange!
Case 2
select * from person where name = 123456789123456789-123456789123456788;
This returns the record with name 1, so in this case the - seems to act as a minus operator.
Why is the behavior of - so different in the two cases?
I can't immediately tell you what the type of 123456789123456789-1 is, but for the comparison operation we're almost certainly falling through most of the more "normal" data type conversion rules for MySQL and ending up at:
In all other cases, the arguments are compared as floating-point (real) numbers.
Because one of the arguments of the comparison (name) is a string type and the other is numeric, nothing else matches. So both get converted to floats, and a double-precision float carries only about 15 to 17 significant decimal digits. That is fewer than the 18 required to represent 123456789123456789 and 123456789123456788 as two different numbers.
Look here:
SELECT person_id, name, name + 0.0, 123456789123456789-1 + 0.0, name = 123456789123456789-1
FROM person
ORDER BY person_id;
It appears that, before evaluating name = 123456789123456789-1, MySQL converts both name and 123456789123456789-1 to DOUBLE, as the select above shows, so the low-order digits are lost.
Demo.
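The precision loss described in these answers can be reproduced outside MySQL. The following Python sketch is only a model: it assumes, as the answers argue, that both sides of the comparison end up as IEEE 754 doubles. The helper name compared_as_double is made up for illustration.

```python
# Sketch only: models a string-vs-number comparison carried out in
# double precision, which is what the answers above assume MySQL does.

def compared_as_double(column_text: str, number: int) -> bool:
    """Return True if the two values are equal once both are doubles."""
    return float(column_text) == float(number)

names = ["123456789123456789", "123456789123456788", "1"]

# The subtraction itself is exact integer arithmetic in both cases.
case1 = 123456789123456789 - 1                   # 123456789123456788
case2 = 123456789123456789 - 123456789123456788  # 1

matches_case1 = [n for n in names if compared_as_double(n, case1)]
matches_case2 = [n for n in names if compared_as_double(n, case2)]

print(matches_case1)  # both 18-digit names collapse to the same double
print(matches_case2)  # only "1"
```

Under this model the - is a minus operator in both cases; the difference is only that the result of Case 1 is too large to survive the conversion to double intact.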
Here are the data field definitions:
| Field Name | Field Description | Field Type (format) | Max Size | May be NULL | Key |
|---|---|---|---|---|---|
| tag | The unique identifier (name) for a tag in a specific taxonomy release. | ALPHANUMERIC | 256 | No | * |
| version | For a standard tag, an identifier for the taxonomy; otherwise the accession number where the tag was defined. | ALPHANUMERIC | 20 | No | * |
| ddate | The end date for the data value, rounded to the nearest month end. | DATE (yyyymmdd) | 8 | No | * |
| qtrs | The count of the number of quarters represented by the data value, rounded to the nearest whole number. “0” indicates it is a point-in-time value. | NUMERIC | 8 | No | * |
| uom | The unit of measure for the value. | ALPHANUMERIC | 20 | No | * |
| coreg | If specified, indicates a specific co-registrant, the parent company, or other entity (e.g., guarantor). NULL indicates the consolidated entity. | NUMERIC | 256 | Yes | * |
| value | The value. This is not scaled, it is as found in the Interactive Data file, but is limited to four digits to the right of the decimal point. | NUMERIC(28,4) | 16 | Yes | |
| footnote | The text of any superscripted footnotes on the value, as shown on the statement page, truncated to 512 characters, or if there is no footnote, then this field will be blank. | ALPHANUMERIC | 512 | Yes | |
The field definitions come from official material of the U.S. Securities and Exchange Commission (SEC):
sec official material
For coreg, the field type is NUMERIC and the max size is 256. How should I write the create statement?
CREATE TABLE `num` (
`id` INT NOT NULL AUTO_INCREMENT,
`tag` VARCHAR(256) NOT NULL,
`version` VARCHAR(20) NOT NULL,
`ddate` DATE NOT NULL,
`qtrs` DECIMAL(8) NOT NULL,
`uom` VARCHAR(20) NOT NULL,
`coreg` ?,
`value` DECIMAL(28,4),
`footnote` VARCHAR(512),
PRIMARY KEY (id)
);
Should the field definition be written as below?
`coreg` NUMERIC(256)
In MySQL the maximum number of digits for decimal (numeric) type is 65.
So, you can't technically define a column as NUMERIC(256).
11.1.3 Fixed-Point Types (Exact Value) - DECIMAL, NUMERIC
The maximum number of digits for DECIMAL is 65
It doesn't really make sense to me to have "the parent company, or other entity (e.g., guarantor)" defined as a number, even as a really long number.
Maybe there is a typo and it should really read "ALPHANUMERIC", i.e. a text value.
If this value will never be interpreted as a number and no calculations will ever be made with it (as the field description implies), then it should be stored as text (varchar(256)), maybe with an extra check that only the digits 0-9 can be stored and no other symbols.
It probably means it's just a long sequence of digits. You would typically store it as a NUMERIC but a size of 256 digits is beyond MySQL's limit for numeric types. You can store it, however, as a VARCHAR(256) and add a CHECK constraint on it.
Note: CHECK constraints are enforced only in MySQL 8.0.16 and newer; older versions parse the clause but ignore it.
For example:
create table t (
coreg varchar(256) check (coreg regexp '^[0-9]+$')
);
insert into t (coreg) values ('123');
insert into t (coreg) values ('x456'); -- fails
insert into t (coreg) values ('7y89'); -- fails
insert into t (coreg) values ('012z'); -- fails
insert into t (coreg) values ('345 '); -- fails
See running example in db<>fiddle.
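On servers that do not enforce CHECK constraints, the same rule can be applied in the application before inserting. The sketch below is one possible way to do that in Python; the function name is_valid_coreg and the constant names are hypothetical, and the 256-character limit is taken from the field table in the question.

```python
import re

# Hypothetical application-side check mirroring the CHECK constraint above:
# digits only, at most 256 characters (the documented maximum size).
DIGITS_ONLY = re.compile(r"[0-9]+")
MAX_COREG_LENGTH = 256

def is_valid_coreg(value):
    """Accept None (the column may be NULL) or 1 to 256 ASCII digits."""
    if value is None:
        return True
    return len(value) <= MAX_COREG_LENGTH and DIGITS_ONLY.fullmatch(value) is not None

# Same sample values as the INSERT statements above.
for candidate in ["123", "x456", "7y89", "012z", "345 "]:
    print(repr(candidate), is_valid_coreg(candidate))
```

Only '123' passes; the other four are rejected, matching the rows marked "fails" above.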
I've got the following table in MySQL (MySQL Server 5.7):
CREATE TABLE IF NOT EXISTS SIMCards (
SIMCardID INTEGER UNSIGNED PRIMARY KEY AUTO_INCREMENT,
ICCID VARCHAR(50) UNIQUE NOT NULL,
MSISDN BIGINT UNSIGNED UNIQUE);
INSERT INTO SIMCards (ICCID, MSISDN) VALUES
(89441000154687982548, 905511528749),
(89441000154687982549, 905511528744),
(89441000154687982547, 905511528745);
I then run the following query:
SELECT SIMCardID FROM SIMCards WHERE ICCID = 89441000154687982549;
However, rather than returning just the relevant row, it returns all of them. If I surround the ICCID in quotes, it works fine, e.g.:
SELECT SIMCardID FROM SIMCards WHERE ICCID = '89441000154687982549';
Why does the first SELECT query not work as I expected?
An INT in MySQL has a maximum (unsigned) value of 4294967295, and your IDs are substantially larger than that. When you compare the VARCHAR column ICCID with a number, MySQL does not compare the text; it converts both sides to floating-point numbers, and a double holds only about 15 to 17 significant decimal digits. Your 20-digit IDs differ only in the last digit, so after conversion they are all the same number and every row matches.
Selecting by a number when your data can't be represented exactly as a number will not work reliably; quote the value so that the comparison is done on strings.
Edit to add detail I forgot: even a bigint in MySQL is not large enough to represent your IDs. So you need to make sure and just always use strings.
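To see why every row matched, the three ICCIDs from the question can be pushed through the same conversions in Python. This is a sketch under the assumption that the string-versus-number comparison is carried out on IEEE 754 doubles; it does not run MySQL itself.

```python
# Sketch: the three ICCIDs from the question, treated as numbers.
iccids = ["89441000154687982548", "89441000154687982549", "89441000154687982547"]

BIGINT_UNSIGNED_MAX = 2**64 - 1  # 18446744073709551615

# None of the values fits into an unsigned 64-bit integer.
too_big_for_bigint = [int(s) > BIGINT_UNSIGNED_MAX for s in iccids]

# Converted to doubles, all three become the same number.
as_doubles = {float(s) for s in iccids}

# Compared as strings, only the exact value matches.
string_matches = [s for s in iccids if s == "89441000154687982549"]

print(too_big_for_bigint)
print(len(as_doubles))
print(string_matches)
```

The set of doubles has a single element, which is why the unquoted query cannot tell the rows apart, while the quoted query compares text and finds exactly one.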
I had a table with 3 columns and 3600K rows. Using MySQL as a key-value store.
The first column, id, was VARCHAR(8) and set as the primary key. The 2nd and 3rd columns were MEDIUMTEXT. Calling SELECT * FROM table WHERE id=00000 took anywhere from 54 seconds to 3 minutes.
For testing I created a table with columns VARCHAR(8)-VARCHAR(5)-VARCHAR(5), filled with data randomly generated by numpy.random.randint. SELECT takes 3 sec without a primary key. With the same random data in VARCHAR(8)-MEDIUMTEXT-MEDIUMTEXT, SELECT took 15 sec without a primary key. (Note: in the second test, the 2nd and 3rd columns actually contained very short text like '65535', but were created as MEDIUMTEXT.)
My question is: how can I achieve similar performance on my real data? (or, is it impossible?)
If you use
SELECT * FROM `table` WHERE id=00000
instead of
SELECT * FROM `table` WHERE id='00000'
you are looking for all strings that are equal to the number 0, so MySQL has to check all rows, because '0', '0000' and even ' 0' will all be cast to the number 0. So your primary key on id will not help and you will end up with a slow full table scan. Even if you don't store values that way, MySQL doesn't know that.
The best option is, as all comments and answers pointed out, to change the datatype to int:
alter table `table` modify id int;
This will only work if your ids cast to integers are unique (so you don't have e.g. '0' and '00' in your table).
If you have any foreign keys that reference id, you have to drop them first and, before recreating them, change the datatype in the other columns too.
If you store your values in a known format (e.g. no leading zeros, or padded with 0s up to a length of 8), the second-best option is to use this exact format in your query, and include the quotes so it is not cast to a number. If you e.g. always pad with 0s to 8 digits, use
SELECT * FROM `table` WHERE id='00000000';
If you never add any leading zeros, still add the quotes:
SELECT * FROM `table` WHERE id='0';
With both options, MySQL can use your primary key and you will get your result in milliseconds.
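The difference between the two kinds of lookup can be illustrated with a small Python model. Here a dict stands in for the primary-key index; the keys and row labels are made-up sample data, and the two function names are hypothetical.

```python
# Sketch: why an index on a string key cannot serve a numeric comparison.
# The dict stands in for the primary-key index; keys are hypothetical.
table = {
    "0": "row a",
    "0000": "row b",
    " 0": "row c",
    "00000000": "row d",
    "7": "row e",
}

def lookup_by_string(key):
    """Exact string match: a single index (dict) lookup."""
    return table.get(key)

def lookup_by_number(number):
    """Numeric match: every key must be converted and checked."""
    return [k for k in table if float(k) == float(number)]

print(lookup_by_string("00000000"))
print(lookup_by_number(0))
```

The string lookup touches one entry, while the numeric lookup has to visit every key because several different strings convert to the same number.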
If your id column contains only numbers, define it as int, because int will give you better performance (it is faster).
Make the key column in your table an integer and retry. First check performance by running a test within your DB (Workbench or the command line). You should get a better result.
Then, and only if needed (I doubt it though), modify your Python code to convert between integer and string when referencing the key column.
I had a very complicated problem, but I narrowed it down to this. First, let me give you some test data:
Run this:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`value` text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
INSERT INTO test (value) VALUES
(1),
('1'),
('1,2'),
('3');
Now run this query:
SELECT * FROM test WHERE value = 1;
I would expect in this case to get only the first two rows, where the value was entered either as a numeric 1 or as a '1' char, but for some reason this is what I get:
1, 1
2, 1
3, 1,2
My question is, why do I get the third row?
Note: This is my version of mysql: 5.6.28-0ubuntu0.14.04.1
Also, I already solved my original problem by using FIND_IN_SET and I am aware that it's not a very good idea to have this comma separated list type structure, ie, it should probably have been done with a join table in the first place. Unfortunately I'm working within a system that is very large and making that change is not practical at this time.
I'm just interested in why this specific behavior happens.
The reason you get the third row is implicit datatype conversion performed by MySQL. Your query has a predicate (condition) in the WHERE clause
WHERE value = 1
On the right side of the equality comparison operator (the equal sign), we have a numeric literal. On the left side, we have a column that is datatype TEXT.
It's not possible for MySQL to do a comparison of those two different datatypes.
So, MySQL converts one side or the other to a type that is compatible, so a comparison can be performed. In this case, MySQL converts the value from the column to a number, so it can be compared to the numeric literal.
As a demonstration of what that looks like, we can add a zero (forcing MySQL to do a conversion), and exhibit the results in a SELECT.
SELECT t.value, t.value + 0 FROM test t
t.value  t.value + 0
-------  -----------
1        1
1        1
1,2      1
3        3
How MySQL does the conversion is documented in the MySQL Reference Manual. At the risk of misstating what the manual says: MySQL reads the string character by character from left to right, until it encounters a character that can no longer be part of a number.
In the case of the string '1,2', that happens to be the comma character. That's where MySQL stops, so the conversion returns a numeric value of 1. You would be right to point out that other databases would throw an error attempting to convert that string to numeric. But MySQL doesn't throw an error for this SELECT; at most it records a truncation warning.
Reference: Type Conversion in Expression Evaluation http://dev.mysql.com/doc/refman/5.7/en/type-conversion.html
Basically, the predicate in your query is equivalent to specifying:
WHERE value + 0 = 1
Which forces a conversion of the contents of the column value to numeric, and then a comparison to the numeric literal.
That's why the third row is being returned.
To get a different result, consider comparing to a string literal
WHERE value = '1'
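The left-to-right rule paraphrased above can be approximated in a few lines of Python. This is a sketch of the described behaviour, not MySQL's actual parser; the function name leading_number and the regular expression are assumptions made for illustration.

```python
import re

# Leading whitespace, optional sign, digits with optional fraction,
# optional exponent. Anything after that prefix is ignored.
_NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")

def leading_number(text):
    """Convert the longest numeric prefix of text to a float, or 0.0 if none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(1)) if match else 0.0

# The four values stored in the test table.
values = ["1", "1", "1,2", "3"]

converted = [leading_number(v) for v in values]
matching_rows = [v for v in values if leading_number(v) == 1]

print(converted)
print(matching_rows)
```

With this model, '1,2' converts to 1 and therefore lands in matching_rows alongside the two '1' rows, just like the third row in the query result.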
Let's say I use coalesce() to combine two columns into one in a select, and subsequently build a view around that select.
Tables:
values_int
    id INTEGER(11) PRIMARY KEY
    value INTEGER(11)
values_varchar
    id INTEGER(11) PRIMARY KEY
    value VARCHAR(255)
vals
    id INTEGER(11) PRIMARY KEY
    value INTEGER(11) //foreign key to both values_int and values_varchar
The primary keys between values_int and values_varchar are unique and that allows me to do:
SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
FROM vals
JOIN values_int ON values_int.id = vals.value
JOIN values_varchar ON values_varchar.id = vals.value
This produces a nicely assembled view with an ID column and a combined value column that contains the actual values from the two other tables.
What type does this combined column have?
When turned into a view and then queried with a WHERE clause on this combined "value" column, how is that actually handled type-wise? E.g. WHERE value > 10
Some rambling thoughts at the end (most likely wrong):
The reason I am asking is that the alternative to this design is to have all three tables merged into one, with INT values in one column and VARCHAR values in the other. That would of course produce lots of NULL values in both columns but would save me the JOINs. For some reason I do not like that solution, because it would require additional type checking to choose the right column and deal with the NULL values, but maybe the presented design would require the same (if the resulting column is actually VARCHAR). I would hope that it actually passes the WHERE clause down through the view to the source (so that the column does NOT have a type per se), but I am likely wrong about that.
Your query should be explicit to be clear. In this case MySQL is using varchar.
I would write this query like this to be clear:
coalesce(values_int.value, cast(values_varchar.value as signed), 0)
or
coalesce(cast(values_int.value as char(20)), values_varchar.value, '0')
You should put in that last value unless you want the column to be null when both columns are null.
Returns the data type of expression with the highest data type precedence. If all expressions are nonnullable, the result is typed as nonnullable.
So in your case the type will be VARCHAR(255)
Let's say I use coalesce() to combine two columns into one
No, that's not what the COALESCE function is for. It returns its first non-NULL argument, so it is typically used to fall back to a default when a column value is NULL. In your case, if values_int.value IS NULL, it will select the value in values_varchar.value:
coalesce(values_int.value, values_varchar.value) AS value
If you want to combine the data, use the concatenation operator or the CONCAT() function instead, like
concat(values_int.value, values_varchar.value) AS value
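The difference between the two functions can be sketched in Python, with None standing in for NULL. The helper names coalesce and concat are illustrative stand-ins, not MySQL bindings; the concat model follows MySQL's CONCAT() in returning NULL when any argument is NULL.

```python
def coalesce(*args):
    """Return the first argument that is not None, else None."""
    return next((a for a in args if a is not None), None)

def concat(*args):
    """Join arguments as text; None if any argument is None."""
    if any(a is None for a in args):
        return None
    return "".join(str(a) for a in args)

print(coalesce(None, "abc"))  # falls back to the second value
print(coalesce(42, "abc"))    # first value wins, second is ignored
print(concat(42, "abc"))      # both values joined as text
print(concat(None, "abc"))    # a single NULL makes the result NULL
```

So COALESCE picks one of the values per row, while CONCAT glues them together and yields NULL as soon as one side is missing.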
Verify it yourself. An easy way to check in MySQL is to DESCRIBE a VIEW you create to capture your dynamic column:
mysql> CREATE VIEW v AS
-> SELECT vals.id, coalesce(values_int.value, values_varchar.value) AS value
-> FROM vals
-> JOIN values_int ON values_int.id = vals.value
-> JOIN values_varchar ON values_varchar.id = vals.value;
Query OK, 0 rows affected (0.01 sec)
Now DESCRIBE v will show you what's what. Note that under MySQL 5.1, I see the column as varbinary(255), but under 5.5 I see varchar(255).