MySQL TINYINT & VARCHAR performance

I have a column in a MySQL database that stores values like 10000 as TINYINT. If I changed it to VARCHAR and stored "10k" instead, which would perform better?
For example: 10000, 20000, 30000, ... versus 10k, 20k, 30k, ...

If you are only selecting the value, it doesn't matter much.
But if you use it in a WHERE condition, an integer column will perform better. For example:
SELECT * FROM your_table
WHERE your_column > 1000
only works as intended if the column is an integer. Otherwise you first have to convert every value back to a number.
Generally: if it is a number, store it as a number.
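As a side note, a TINYINT holds only -128 to 127 (0 to 255 UNSIGNED), so values like 10000 actually need at least SMALLINT (up to 32767, or 65535 UNSIGNED). The sketch below is a rough illustration of why string values break range conditions. It uses hypothetical data, and Python's character-by-character string ordering stands in for a VARCHAR comparison. Real MySQL collations differ in details, but digits still compare one character at a time:

```python
# Hypothetical values: the same quantities as VARCHAR-style labels and as integers.
as_strings = ["9k", "10k", "20k", "30k"]
as_ints = [9000, 10000, 20000, 30000]

# Strings compare character by character, so "10k" sorts before "9k" ('1' < '9').
sorted_strings = sorted(as_strings)   # ['10k', '20k', '30k', '9k']
sorted_ints = sorted(as_ints)         # [9000, 10000, 20000, 30000]

# A filter in the spirit of `WHERE your_column > ...` only behaves on integers.
ints_over = [v for v in as_ints if v > 10000]        # [20000, 30000]
strings_over = [v for v in as_strings if v > "10k"]  # ['9k', '20k', '30k']

print(sorted_strings, sorted_ints)
print(ints_over, strings_over)
```

The string filter wrongly keeps "9k" because it compares text, not quantities.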

The bigger question is: would it help your project? With TINYINT you can perform math operations on the value. You can't do that with VARCHAR, since your numbers would be strings. Sure, you could manipulate the string into a number, but that costs performance and requires extra code, which can needlessly complicate the SQL or your application code.
If a value is a number, then by all means make it an integer. There's a reason really, really smart people distinguished between strings and ints.
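To make the "extra code" point concrete, here is a minimal sketch of what arithmetic on "10k"-style labels requires. `parse_k` is a hypothetical helper. It assumes every label is either a plain integer or an integer followed by "k", and it would already fail on something like "1.5k", which shows how fragile such conversions are:

```python
def parse_k(label: str) -> int:
    """Hypothetical helper: turn a '10k'-style label back into an integer.

    Assumes the label is a whole number, optionally followed by 'k'.
    """
    label = label.strip().lower()
    if label.endswith("k"):
        return int(label[:-1]) * 1000
    return int(label)

labels = ["10k", "20k", "30k"]

# On strings, "+" concatenates instead of adding.
concatenated = labels[0] + labels[1]             # '10k20k'

# Every arithmetic use needs a conversion step first.
total = sum(parse_k(label) for label in labels)  # 60000

# With an integer column the values are already numbers.
total_from_ints = sum([10000, 20000, 30000])     # 60000

print(concatenated, total, total_from_ints)
```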

Related

Is it better to define a "year" column to be of type Integer or String?

I do realize that it is better for a column to be an Integer if one has to perform mathematical calculations on it.
I probably have to perform mathematical calculations on the "year" column but minimally. So would it be better to store it as a String or Integer?
Thanks.
Save it as an integer.
While there may be some application where you are reading and serving this data so frequently that the int->string conversion is a problem... that is going to be an edge case.
On the other hand:
- Integer types offer smaller storage options than strings (such as TINYINT).
- You avoid conversions when doing math.
- It's going to confuse/annoy/frustrate every developer who comes after you when they query a value that is naturally a number and get back a string.
If you are not expecting your YEAR variable to ever contain non-digit values then yes you should store it as a number.
I would not store it as INT, since a year will never come close to INT's limit. I would use SMALLINT UNSIGNED instead. TINYINT is too small for four-digit years: even UNSIGNED, it tops out at 255.
SMALLINT UNSIGNED gives you a maximum value of 65535, so unless you are storing years beyond 65535, this should suffice.
You could go crazy and save it as a YEAR!
This limits you to 1901-2155.
If that is too restrictive, I prefer CHAR(4) to INT, since MySQL DATETIME comparisons are done in a string-like manner.
You can then write things like
WHERE year < CURDATE()
without worries.
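The CHAR(4) preference relies on every year having exactly four digits. The quick sketch below uses hypothetical values, with Python string ordering standing in for a MySQL string comparison. It shows that fixed-width digit strings sort the same way as the numbers they spell, and that this guarantee breaks as soon as widths differ:

```python
# Hypothetical years, once as integers and once as CHAR(4)-style strings.
years_int = [2024, 1987, 2005, 1999]
years_char = [str(y) for y in years_int]

# Fixed-width digit strings sort exactly like the numbers they spell.
same_order = sorted(years_char) == [str(y) for y in sorted(years_int)]
print(same_order)      # True

# Mixed widths break this: as text, "999" is not less than "1999".
print("999" < "1999")  # False, although 999 < 1999
```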

How much does performance change with a VARCHAR or INT column - MySQL

I have many tables with millions of rows in MySQL. These tables are used to store log lines.
I have a field "country" declared as VARCHAR(50), with an index on it.
Would storing a countryId as INT instead of this country field change performance much?
Thank you!
Your question is a bit more complicated than it first seems. The simple answer is that country is a string of up to 50 characters. Replacing it with a 4-byte integer should reduce the storage required for the field. Less storage means less I/O when processing queries, and smaller indexes. There are outlier cases, of course. If country is typically NULL, the current storage might be more efficient than having an id.
It gets a little more complicated, though, when you think about keeping the field up-to-date. One difference with a reference table is that the countries are now standardized, rather than being ad-hoc names. In general, this is a good thing. On the other hand, countries do change over time, so you have to be prepared to add a "South Sudan" or "East Timor" now and then.
If your database is heavy on inserts/updates, then changing the country field requires looking in the reference table for the correct value -- and perhaps inserting a new record there.
My opinion is "gosh . . . it would have been a good idea to set the database up this way in the beginning". At this point, you need to understand the effects on the application of maintaining a country reference table for the small performance gain of making the data structure more efficient and more accurate.
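The insert-time cost mentioned above is a "get or create" lookup against the reference table. Here is a minimal in-memory sketch of that step. The dictionary stands in for a hypothetical `country` table, and the function name is illustrative. The sketch ignores concurrency. In MySQL this job would typically be handled by an auto-increment id plus a UNIQUE index on the country name:

```python
from typing import Dict, List

# Hypothetical stand-in for a `country` reference table: name -> integer id.
country_ids: Dict[str, int] = {"France": 1, "Japan": 2}

def get_or_create_country_id(name: str, table: Dict[str, int]) -> int:
    """Return the id for `name`, adding a new entry when the country is unknown."""
    existing = table.get(name)
    if existing is not None:
        return existing                         # common case: a plain lookup
    new_id = max(table.values(), default=0) + 1
    table[name] = new_id                        # rare case: e.g. a newly formed country
    return new_id

# Each incoming log line now carries a small integer instead of the country name.
incoming: List[str] = ["Japan", "South Sudan", "Japan", "France"]
stored_ids = [get_or_create_country_id(c, country_ids) for c in incoming]
print(stored_ids)  # [2, 3, 2, 1]
```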
Indexes on INT columns generally perform better than indexes on string types such as VARCHAR.
Comparing two integers is a single fixed-width operation, while comparing two strings means walking through their characters under the column's collation, and an index lookup performs many such comparisons.
In your case, an index on an INT column should perform better than one on VARCHAR.

Fast search solution for numeric type of large mysql table?

I have a large MySQL database (5 million rows) of phone numbers.
I have tried many solutions, but it's still slow. Right now I store the phone numbers as INT and search them with a LIKE query.
Ex: SELECT phonenumber FROM tbl_phone WHERE phonenumber LIKE '%4567'
to find phone numbers such as 1704567, 2494567, ...
I need a solution that makes this run faster. Please help!
You are storing the numbers as INT but querying them as CHAR (the LIKE operator implicitly converts INTs to CHARs), which is surely not optimal. If you'd like to keep the numbers as INT (probably the best idea in I/O performance terms), you'd better change your queries to use numerical comparisons:
-- instead of CHAR operators
WHERE phone_number LIKE '%4567'
WHERE phone_number LIKE '1234%'
-- use NUMERIC operators
WHERE phone_number % 10000 = 4567
WHERE phone_number BETWEEN 12340000 AND 12349999 -- assuming 8-digit numbers
Besides storing and querying the data consistently, remember to create an appropriate index:
CREATE INDEX IDX0 ON tbl_phone (phone_number);
Unfortunately, even then your query might not be optimal, because of effects similar to what @ron commented about. In that case you might have to restructure your table, breaking this column into more manageable columns (like national_code, area_code and phone_number). That would allow index-efficient queries by area code, for example.
Check the advice in "How to speed up SELECT .. LIKE queries in MySQL on multiple columns?"
Hope it helps!
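The numeric predicates in this answer match the LIKE patterns only under assumptions. The suffix test compares the last digits with modulo. The prefix test only works as a closed range, and only if every stored number has the same digit count (8 here, as in the answer's comment). The sketch below checks both in Python. The helper names are hypothetical. One caveat applies to MySQL: the `% 10000` form wraps the column in an expression, so a plain index on phone_number cannot be used for it, while the BETWEEN range can use such an index.

```python
from typing import Tuple

def ends_with(phone: int, suffix: int, width: int) -> bool:
    """Numeric form of LIKE '%<suffix>': compare the last `width` digits."""
    return phone % 10 ** width == suffix

def prefix_range(prefix: int, prefix_len: int, total_len: int = 8) -> Tuple[int, int]:
    """Inclusive (low, high) equivalent to LIKE '<prefix>%' for total_len-digit numbers."""
    scale = 10 ** (total_len - prefix_len)
    low = prefix * scale
    return low, low + scale - 1

low, high = prefix_range(1234, 4)   # (12340000, 12349999)
print(low <= 12345678 <= high)      # True
print(low <= 12350000 <= high)      # False; `>= 12340000` alone would accept it
print(ends_with(1704567, 4567, 4))  # True
print(ends_with(1704568, 4567, 4))  # False
```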
I would experiment with REGEXP rather than LIKE, for example:
SELECT phonenumber FROM tbl_phone WHERE phonenumber REGEXP '4567$';
Other than that, do create an index if the fixed part of your search pattern has a constant length.
See also the MySQL documentation on pattern matching.
That LIKE predicate is operating on a string, so you've got an implicit conversion from INT to VARCHAR happening. And that means an index on the INT column isn't going to help, even for a LIKE predicate that has leading characters. (The predicate is not sargable.)
If you are searching for the last digits of the phone number, the only way (that I know of) to get something like that to be fast would be to add another VARCHAR column, and store the reversed phone number in it, where the order of the digits is backwards.
Create an index on that VARCHAR column, and then to find phone number that end with '4567':
WHERE reverse_phone LIKE '7654%'
-or-
WHERE reverse_phone LIKE CONCAT(REVERSE('4567'),'%')
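Why does the reversed column help? An index keeps its keys sorted, so a `LIKE 'prefix%'` predicate becomes a range scan that starts at the first matching key and stops at the first non-matching one. Below is a small Python analogy, not MySQL's actual implementation: a sorted list searched with `bisect` stands in for the index on a hypothetical `reverse_phone` column. The numbers are made up and kept as strings, which is also why the answer suggests VARCHAR. A reversed number can start with 0, and an INT would drop that leading zero.

```python
import bisect

# Hypothetical phone numbers, stored as strings so no digits are lost.
phones = ["1704567", "2494567", "3331234", "5554560"]

# Stand-in for an indexed `reverse_phone` column: (reversed, original) pairs,
# kept sorted the way an index keeps its keys sorted.
reverse_index = sorted((p[::-1], p) for p in phones)
keys = [rev for rev, _ in reverse_index]

def find_by_suffix(suffix: str) -> list:
    """Analogue of WHERE reverse_phone LIKE CONCAT(REVERSE(suffix), '%')."""
    prefix = suffix[::-1]
    start = bisect.bisect_left(keys, prefix)  # jump to the first candidate key
    matches = []
    for rev, original in reverse_index[start:]:
        if not rev.startswith(prefix):
            break  # keys are sorted, so no later key can match either
        matches.append(original)
    return matches

print(find_by_suffix("4567"))  # ['1704567', '2494567']
```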

Does the performance of a MySQL VARCHAR(n) table degrade regularly with the value of n?

Or is MySQL optimized for certain values of n, e.g., powers of 2?
MySQL keeps extra information about exactly how long the string in a given VARCHAR field is, and it needs either 1 or 2 bytes to store it. If the column's maximum length is up to 255 bytes, MySQL needs only 1 extra byte for the length; above 255, it uses 2 bytes. So that is one place where a particular n matters. Other than that, shorter is much better, because MySQL needs much more memory for some operations on this column: sorting, or anything that uses in-memory temporary tables, costs a lot more with a VARCHAR(1000) than with a VARCHAR(20).
There are a good few pages on this in the Schema Optimization and Indexing chapter of High Performance MySQL from O'Reilly.
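The length-prefix rule can be written as a small helper. This is an approximation under stated assumptions. It counts only the length prefix plus the encoded data for one value and ignores row-format overhead. It also does not model the answer's separate point about memory for sorts and temporary tables. The prefix size depends on the column's maximum possible byte length (declared characters times the maximum bytes per character of the character set), not on the stored value. Python's "latin-1" and "utf-8" codecs stand in for single-byte and 4-byte (utf8mb4-style) character sets.

```python
def varchar_bytes(value: str, declared_chars: int,
                  max_bytes_per_char: int = 1, encoding: str = "latin-1") -> int:
    """Approximate stored bytes for one VARCHAR(declared_chars) value.

    Length prefix: 1 byte if the column can never exceed 255 bytes, else 2.
    """
    data = len(value.encode(encoding))
    max_column_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_column_bytes <= 255 else 2
    return prefix + data

print(varchar_bytes("France", 50))               # 7: 1-byte prefix + 6 bytes
print(varchar_bytes("France", 300))              # 8: column may exceed 255 bytes
print(varchar_bytes("France", 64, 4, "utf-8"))   # 8: 64 * 4 = 256 > 255
print(varchar_bytes("France", 63, 4, "utf-8"))   # 7: 63 * 4 = 252
```

Note how a 4-byte character set crosses the 255-byte threshold at a much smaller n.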
It isn't optimised for any kind of pattern of sizes - it's a simple case of larger varchar fields take longer to handle (oo-er). The only way to optimise it is to limit it to whatever is the longest it ever needs to be.

Facebook user_id : big_int, int or string?

Facebook's user IDs go up to 2^32, which by my count is 4294967296.
MySQL's unsigned INT range is 0 to 4294967295 (which is 1 short, or my math is wrong),
and its unsigned BIGINT range is 0 to 18446744073709551615.
INT = 4 bytes, BIGINT = 8 bytes
OR
Do I store it as a string?
VARCHAR(10) = ? bytes
How will it affect efficiency? I heard that MySQL handles numbers far better than strings (performance-wise). So what do you recommend?
Because Facebook assigns the IDs, and not you, you must use BIGINTs.
Facebook does not assign the IDs sequentially, and I suspect they have some regime for assigning numbers.
I recently fixed exactly this bug, so it is a real problem.
I would make it UNSIGNED, simply because that is what it is.
I would not use a string. That makes comparisons painful and your indexes clunkier than they need to be.
You can't use INT any more. Last night I had two user ids that maxed out INT(10).
I use a BIGINT to store the Facebook ID, because that's what it is.
But internally, for the primary and foreign keys of the tables, I use a SMALLINT, because it is smaller. Also, if the BIGINT ever has to become a string (to find users by username instead of ID), I can easily change it.
So I have a table that looks like this:
profile
- profile_key smallint primary key
- profile_name varchar
- fb_profile_id bigint
And one that looks like this:
something_else
- profile_key smallint primary key
- something_else_key smallint primary key
- something_else_name varchar
And my queries for a single page could be something like this:
select profile_key, profile_name
from profile
where fb_profile_id = ?
Now I take the profile_key and use it in the next query:
select something_else_key, something_else_name
from something_else
where profile_key = ?
The profile table gets queried for almost every request anyway, so I don't consider it an extra step.
And of course it is also quite easy to cache the first query for some extra performance.
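Here is a runnable sketch of this two-step pattern. Python's built-in `sqlite3` stands in for MySQL, and SQLite does not enforce SMALLINT/BIGINT sizes. Two details of the schema are assumptions. The two "primary key" columns of something_else are read as one composite primary key, and the UNIQUE constraint on fb_profile_id is added here. The data is hypothetical. One caveat in MySQL terms: a SMALLINT internal key caps you at 32767 profiles (65535 UNSIGNED).

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile (
    profile_key   SMALLINT PRIMARY KEY,
    profile_name  VARCHAR(100),
    fb_profile_id BIGINT UNIQUE
);
CREATE TABLE something_else (
    profile_key         SMALLINT,
    something_else_key  SMALLINT,
    something_else_name VARCHAR(100),
    PRIMARY KEY (profile_key, something_else_key)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO profile VALUES (1, 'Alice', 100001234567890)")
conn.executemany("INSERT INTO something_else VALUES (?, ?, ?)",
                 [(1, 1, 'first'), (1, 2, 'second')])

def load_page(fb_id: int):
    # Query 1: translate the external Facebook ID into the small internal key.
    row = conn.execute(
        "SELECT profile_key, profile_name FROM profile WHERE fb_profile_id = ?",
        (fb_id,),
    ).fetchone()
    if row is None:
        return None
    profile_key, profile_name = row
    # Query 2: everything else is looked up by the compact internal key.
    items = conn.execute(
        "SELECT something_else_key, something_else_name FROM something_else "
        "WHERE profile_key = ? ORDER BY something_else_key",
        (profile_key,),
    ).fetchall()
    return profile_name, items

print(load_page(100001234567890))  # ('Alice', [(1, 'first'), (2, 'second')])
```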
If you are reading this in 2015 or later: Facebook has upgraded its API to version 2.0, and its documentation now notes that user IDs will change and will be app-scoped. So there is a real possibility that they might someday change all IDs to alphanumeric.
https://developers.facebook.com/docs/apps/upgrading#upgrading_v2_0_user_ids
So I would suggest keeping the type as VARCHAR to avoid future migration pains.
Your math is a little wrong... remember that the largest unsigned number you can store in N bits is 2^N - 1, not 2^N. There are 2^N possible values, but one of them is 0, so the largest is 1 less than that.
If Facebook uses an unsigned big int, then you should use that. They probably don't assign them sequentially.
Yes, you could get away with a varchar... however it would be slower (but probably not as much as you are thinking).
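The off-by-one above, applied to MySQL's integer sizes: an UNSIGNED type of n bytes has 8n bits, so its maximum is 2^(8n) - 1. The sketch below uses the byte sizes listed in MySQL's integer-type documentation. It shows that a value of 2^32, the figure quoted in the question, is one past INT UNSIGNED and fits easily in BIGINT UNSIGNED.

```python
# Storage sizes of MySQL's integer types, in bytes.
MYSQL_INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def max_unsigned(num_bytes: int) -> int:
    """Largest value an UNSIGNED integer of num_bytes bytes (8 bits each) can hold."""
    return 2 ** (8 * num_bytes) - 1

for name, size in MYSQL_INT_BYTES.items():
    print(f"{name:<9} UNSIGNED max = {max_unsigned(size)}")

fb_id_limit = 2 ** 32  # the figure quoted in the question
print(fb_id_limit > max_unsigned(MYSQL_INT_BYTES["INT"]))      # True: one past INT UNSIGNED
print(fb_id_limit <= max_unsigned(MYSQL_INT_BYTES["BIGINT"]))  # True
```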
Store them as strings.
The Facebook Graph API returns ids as strings, so if you want comparisons to work without having to cast, you should use strings. IMO this trumps other considerations.
I would just stick with INT. It's easy, it's small, it works and you can always change the column to a larger size in the future if you need to.
FYI:
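VARCHAR(n) ==> variable, up to n + 1 bytes (n + 2 once the maximum length exceeds 255 bytes; assumes a single-byte character set)
CHAR(n) ==> fixed, n bytes (assumes a single-byte character set)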
Unless you expect more than 60% of the world's population to sign up, int should do?