I need to store some social ids (Facebook/Google/Twitter user id, Facebook place id, etc.) in my MySQL DB. I found a lot of questions about this here on Stack Overflow, but I didn't find a satisfactory answer. For example, you can't be 100% sure that a Facebook id will always be an unsigned bigint; the Facebook documentation describes the id as a "numeric string". Google ids seem to be one digit longer than a bigint can hold.
I believe that an index on a varchar column is slower than an index on a bigint column, so I thought that using bigint, when possible, would be better than varchar. But I realized that you can store a varchar as binary with the appropriate attribute.
For this reason I was thinking about using a varchar for all these social ids and (since ordering is not an issue) storing it as binary (attribute=binary), because I need a fast index on them.
What are your thoughts: is this a good and fast solution?
Thanks
I use varchar. You are right about the differences, but even more importantly, there is no guarantee that the current type will stay the same. For instance, Facebook changed the size in the past, and they mentioned somewhere that they may include characters in it.
An index is an index; if it is done right, there is no need to worry about its performance. There is no real difference between an index on numbers and one on varchar.
bigint + INDEX KEY + INNODB = Fast
Related
As my project, I want to create an online billboard system where everyone can post a topic.
I am trying to design the SQL database that stores the information for each topic, including the topic's id as the primary key.
At first I designed the id as an integer with auto-increment, as I think it's the simplest way. Then I thought about it and realized that an integer has a limit (the number may be high, but it is there), so I'm here looking for another method.
Now I am thinking of some pseudo-random algorithm, or of hashing the topic's name, but it is still not clear to me.
I also found GUIDs while researching here, but I'm not sure whether they can be used.
I would like you to suggest some ways to deal with the size limit of an integer primary key, or suggest some keywords for further research.
This answer assumes MySQL/MariaDB, because it uses the terminology "auto-increment" for such columns (as opposed to other databases that use identity or serial).
If int isn't big enough, you can use bigint.
Although I consider it unlikely that you'll exceed the thresholds for int (it works for many applications), exceeding the maximum value of bigint would require great effort on your and your computer's part for a long, long time.
This is explained in the documentation.
With int, the maximum value supported by SQL Server is 2,147,483,647.
Just for completeness, I will also add that yet another option is to change the data type of the column to bigint (maximum value 9,223,372,036,854,775,807 - this will allow you to insert a million rows per second, for almost 300,000 years in a row).
Or if you fear that you will overflow even that, you can consider using decimal(38,0) - the maximum here is a number consisting of 38 9's (which will allow you to maintain that same pace for a whopping 3,170,979,198,376,458,650,431,253 years).
http://sqlblog.com/blogs/hugo_kornelis
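The "million rows per second" figures above can be checked with plain integer arithmetic. This is only a rough sketch: it assumes a 365-day year, and the names `ROWS_PER_SECOND` and `years_until_exhausted` are made up for the illustration.

```python
# Assumption: one million inserts per second, every second of a 365-day year.
ROWS_PER_SECOND = 1_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ROWS_PER_YEAR = ROWS_PER_SECOND * SECONDS_PER_YEAR

# Largest positive value each column type can hold.
MAX_VALUES = {
    "int": 2**31 - 1,
    "bigint": 2**63 - 1,
    "decimal(38,0)": 10**38 - 1,
}


def years_until_exhausted(max_value: int) -> int:
    """Whole years before an auto-increment counter would pass max_value."""
    return max_value // ROWS_PER_YEAR


for type_name, max_value in MAX_VALUES.items():
    years = years_until_exhausted(max_value)
    print(f"{type_name}: max {max_value:,}, lasts about {years:,} years")
```

At this pace a plain int is used up in well under a year (the function returns 0 for it), while bigint lasts on the order of hundreds of thousands of years.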
I'm working with tons of phone numbers, and many are international.
I've changed my phone numbers table structure to have 5 columns:
`phonenumbers`.`phoneID`
`phonenumbers`.`countrycode`
`phonenumbers`.`areacode`
`phonenumbers`.`phonenumber`
`phonenumbers`.`ext`
At the moment the phoneID is the only column that's an INT, since it's the primary key.
Should I change the other columns to integers? I've heard indexes work best with numeric values, and I'm only storing digits in each of the columns (no dashes, parentheses, spaces, etc.).
I'm still learning how MySQL works with indexes, so I'm curious how others work with searching for numbers. In this case, I'm sure I'll be searching for numbers that start with a certain known areacode and part of a known phonenumber, or an entire phonenumber.
The part that gets me with indexing table columns like phone numbers is that I don't always know how long a phone number will be, since countries have different lengths for area codes and phone numbers.
In summary, INT vs VARCHAR indexing with numbers.
Phone numbers are not integers, so don't store them as one, it'll just cause you trouble. The obvious cases are when you have to handle phone numbers too big to fit in an int, or phone numbers starting with a 0.
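Both failure cases are easy to reproduce outside the database. The sketch below is only an illustration: `survives_int_column` is a hypothetical helper, the sample numbers are invented, and the limit used is that of a signed 32-bit INT.

```python
INT_MAX = 2**31 - 1  # upper bound of a signed 32-bit INT column


def survives_int_column(digits: str) -> bool:
    """True if the digit string fits in a signed INT and reads back unchanged."""
    value = int(digits)
    return value <= INT_MAX and str(value) == digits


# Invented sample numbers: a short one, one with a leading zero, one too long.
for sample in ["5551234", "0201234567", "4915123456789"]:
    print(sample, "->", int(sample), "ok" if survives_int_column(sample) else "damaged")
```

The leading zero disappears as soon as the value is parsed as a number, and the long international number does not fit in the column at all.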
Moreover, as you want to do prefix matches (phonenumber like '800%'), MySQL will be able to use indexes if you're using varchar columns.
You have to figure out how you're querying this data. If you're frequently doing queries like where countrycode='1' and areacode='123' and phonenumber like '2%', you'd want a compound index on (countrycode, areacode, phonenumber). If you're also often querying on only the phonenumber, you'd want an additional index on just the phonenumber column. This is something you have to work out depending on the amount of data you have and the queries you run; work with EXPLAIN to learn how your indexes are used and where they are needed.
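To see why a prefix match such as like '800%' can use an index on a string column, it helps to remember that an index keeps its keys sorted, so all values sharing a prefix sit next to each other. The following is a conceptual sketch only, using a sorted Python list and binary search as a stand-in for the index; it is not how MySQL is implemented, and `prefix_range` and the sample numbers are invented.

```python
from bisect import bisect_left
from typing import List


def prefix_range(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return every key that starts with prefix, using two binary searches."""
    if not prefix:
        return list(sorted_keys)
    start = bisect_left(sorted_keys, prefix)
    # Smallest string that sorts after every string beginning with prefix.
    upper_bound = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    end = bisect_left(sorted_keys, upper_bound)
    return sorted_keys[start:end]


# Invented sample data, sorted the way an index on a string column would be.
numbers = sorted(["8005551234", "8001112222", "2125550000", "8015550000", "0800123456"])
print(prefix_range(numbers, "800"))
```

A pattern with a leading wildcard (like '%123') has no such contiguous range, which is why it cannot be answered this way.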
Use varchar for representing phone numbers NOT integers. Otherwise you will find your design decision will come back to bite you.
Also: "I've heard indexes work best with numeric values" - well, that's not strictly accurate: yes the index will take up less space, and more rows will fit per page etc, but an index on a varchar column works perfectly well.
Worry about index size and performance when (1) you have a huge amount of data and (2) when you have measured a performance problem.
In my opinion you have a lot of attributes that you don't need. For phone numbers I usually use an auto-increment key for the id and store the phone number as a varchar. This makes validation easier in a programming language. That's my opinion...
Use a BIGINT UNSIGNED, simply because this forces you to normalize your data. Force your users to store the phone number at root level, that is, at country level. You could store the country prefix in a separate column to ease usage.
Everybody types phone-numbers in different ways and this makes it almost impossible to search the data.
E.g. %020123456% will not match 02 0123456. Are you going to search all combinations or just parse it?
This I know from experience: we had to manually fix about 1,000 phone numbers that we could not script out when installing an auto-dialer.
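Whichever column type you pick, the search problem described here is usually handled by normalizing the input once, before it is stored or compared. A minimal sketch, assuming that stripping everything except ASCII digits is acceptable for your data (it also drops a leading "+"); `normalize_phone` is a hypothetical helper name.

```python
import re


def normalize_phone(raw: str) -> str:
    """Remove every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", raw)


# Different ways of typing the same number collapse to one stored form.
typed_variants = ["02 0123456", "(02) 012-3456", "020123456"]
print({normalize_phone(variant) for variant in typed_variants})
```

Once every stored value and every search term goes through the same function, a plain equality or prefix comparison is enough.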
I use id's for almost all my tables, you never know when they come handy. But today I read this...
Be extra careful to make sure that, according to convention, your ‘id’ column (or primary key) is:
char(36) and never varchar(36)
CakePHP will work with both definitions, however you will be sacrificing about 50% of the performance of your DB (MySQL in particular). This will be most evident in more complicated SELECT’s, which might require some JOIN’s or calculations.
I wonder... why even use something text-based, when you only have to save integers? I care a great deal about using the right formats for the right content, so I wonder if char gives any performance improvements over integers?
I would strongly suggest using ints. I am doing some modelling for my thesis and I work with large datasets. I had to create a table with about 70,000,000 rows. My primary key was varchar + int. At the beginning, one cycle of creating a 5-digit number of rows took 5 minutes; soon it took 40. Dropping the primary key fixed my performance issue. I guess that is because ensuring uniqueness was becoming more and more time-consuming. I had no similar issues when my primary key was int.
This is personal experience though, so maybe someone can give a more theoretical and reliable answer.
char doesn't give any improvement over integer. But it's useful when you need to prevent users from knowing or tampering with other rows that you don't want them to.
Let's say each user has a profile picture named /img/$id.jpg (the simplest case, since you don't have to store any data in the DB for this information; of course, there are other ways). If you use an integer, someone can loop through all the profile pictures that you have. With a UUID, they can't.
When you have a lot of records, the auto increment int is better for performance. You can put the uuid in another field (secret_key, for example).
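The combination suggested here, a sequential integer for internal use plus a random UUID for anything exposed publicly, might look roughly like this. It is a sketch under assumptions: `new_profile` is a made-up helper, `secret_key` follows the field name mentioned above, and the auto-increment value is passed in by hand instead of coming from a database.

```python
import uuid


def new_profile(auto_increment_id: int) -> dict:
    """Pair a sequential internal id with a random public key."""
    return {
        "id": auto_increment_id,          # fast, compact, but guessable
        "secret_key": str(uuid.uuid4()),  # random, safe to expose in URLs
    }


first = new_profile(1)
second = new_profile(2)

# Sequential ids are trivially enumerable; the random keys are not.
print(second["id"] - first["id"])
print(f"/img/{first['secret_key']}.jpg")
print(len(first["secret_key"]))
```

The canonical text form of a UUID is always 36 characters long, which is why a fixed-width char(36) column is the usual fit when one is stored as text.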
I've been running a small web-based experiment using Facebook UIDs to verify unique users.
Recently I've discovered that UIDs can be bigger than I realised among some users, so my int-based system is now inadequate and I need to convert to bigint.
I can't risk losing the results I already have, but need to convert the table so that the index containing the uid is now bigint. Are there any particular issues changing the type of an index column, and would it be as simple as:
ALTER TABLE breadusers MODIFY userid bigint;
?
In theory this should be absolutely fine, although if the data really matters, I presume you have a recent backup anyway in case something goes awry.
That said, I'd probably be tempted to store the Facebook UID as a string (i.e.: in a VARCHAR field) and simply have a generic auto-incremented ID field. Then again, that's an answer to a different question. :-)
For the Facebook UID part, I would suggest you to go for BIGINT(64).
Here is the answer from Facebook Blog:
https://developers.facebook.com/blog/post/45/
Facebook's user ids go up to 2^32, which by my count is 4294967296.
MySQL's unsigned int range is 0 to 4294967295 (which is 1 short - or my math is wrong)
and its unsigned bigint range is 0 to 18446744073709551615
int = 4 bytes, bigint = 8 bytes
OR
Do I store it as a string?
varchar(10) = ? bytes
How will it affect efficiency? I heard that MySQL handles numbers far better than strings (performance-wise). So what do you guys recommend?
Because Facebook assigns the IDs, and not you, you must use BIGINTs.
Facebook does not assign the IDs sequentially, and I suspect they have some regime for assigning numbers.
I recently fixed exactly this bug, so it is a real problem.
I would make it UNSIGNED, simply because that is what it is.
I would not use a string. That makes comparisons painful and your indexes clunkier than they need to be.
You can't use INT any more. Last night I had two user ids that maxed out INT(10).
I use a bigint to store the Facebook id, because that's what it is.
But internally, for the primary and foreign keys of the tables, I use a smallint, because it is smaller. Also, if the bigint should ever have to become a string (to find users by username instead of id), I can easily change it.
So I have a table that looks like this:
profile
- profile_key smallint primary key
- profile_name varchar
- fb_profile_id bigint
and one that looks like this
something_else
- profile_key smallint primary key
- something_else_key smallint primary key
- something_else_name varchar
and my queries for a single page could be something like this:
select profile_key, profile_name
from profile
where fb_profile_id = ?
Now I take the profile_key and use it in the next query:
select something_else_key, something_else_name
from something_else
where profile_key = ?
The profile table almost always gets queried for almost any request anyway, so I don't consider it an extra step.
And of course it is also quite easy to cache the first query for some extra performance.
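The two-step lookup with a cached first query can be tried end to end with a small script. Note the assumptions: this uses an in-memory SQLite database as a stand-in for MySQL, SQLite's INTEGER in place of smallint/bigint, invented sample rows, and Python's `functools.lru_cache` as the cache. The table and column names follow the ones listed above; `profile_key_for` and `items_for` are made-up helper names.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite database used purely as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (
        profile_key   INTEGER PRIMARY KEY,
        profile_name  TEXT,
        fb_profile_id INTEGER UNIQUE
    );
    CREATE TABLE something_else (
        profile_key         INTEGER,
        something_else_key  INTEGER,
        something_else_name TEXT,
        PRIMARY KEY (profile_key, something_else_key)
    );
    INSERT INTO profile VALUES (1, 'alice', 100000123456789);
    INSERT INTO something_else VALUES (1, 1, 'first item');
    INSERT INTO something_else VALUES (1, 2, 'second item');
""")


@lru_cache(maxsize=1024)
def profile_key_for(fb_profile_id: int):
    """First query: map the external Facebook id to the small internal key."""
    row = conn.execute(
        "SELECT profile_key FROM profile WHERE fb_profile_id = ?",
        (fb_profile_id,),
    ).fetchone()
    # Caveat: a miss (None) is cached too, so clear the cache after inserts.
    return row[0] if row else None


def items_for(profile_key: int):
    """Second query: everything else joins on the internal key only."""
    return conn.execute(
        "SELECT something_else_key, something_else_name "
        "FROM something_else WHERE profile_key = ? "
        "ORDER BY something_else_key",
        (profile_key,),
    ).fetchall()


key = profile_key_for(100000123456789)
print(key, items_for(key))
```

Because only the profile table ever sees the external id, changing that one column to a string later would not touch any foreign key.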
If you are reading this in 2015, note that Facebook has upgraded their API to version 2.0. They have added a note in their documentation stating that their ids would change and would be app-scoped. So there is a real possibility that they might later change all the ids to alphanumeric.
https://developers.facebook.com/docs/apps/upgrading#upgrading_v2_0_user_ids
So I would suggest keeping the type as varchar to avoid any future migration pains.
Your math is a little wrong... remember that the largest number you can store in N bits is 2^N - 1, not 2^N. There are 2^N possible values, but because they start at 0, the largest one is 1 less than that.
If Facebook uses an unsigned big int, then you should use that. They probably don't assign them sequentially.
Yes, you could get away with a varchar... however it would be slower (but probably not as much as you are thinking).
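The off-by-one is quick to confirm with integer arithmetic. In this small sketch, `unsigned_max` and `fits_unsigned` are hypothetical helper names, and the byte widths are the 4-byte int and 8-byte bigint from the question.

```python
def unsigned_max(num_bytes: int) -> int:
    """Largest value of an unsigned integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    return 2**bits - 1


def fits_unsigned(value: int, num_bytes: int) -> bool:
    """True if value can be stored in an unsigned column of that width."""
    return 0 <= value <= unsigned_max(num_bytes)


print(unsigned_max(4))           # unsigned int
print(unsigned_max(8))           # unsigned bigint
print(fits_unsigned(2**32, 4))   # an id of exactly 2^32 does not fit in int
print(fits_unsigned(2**32, 8))   # but it fits easily in bigint
```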
Store them as strings.
The Facebook Graph API returns ids as strings, so if you want comparisons to work without having to cast, you should use strings. IMO this trumps other considerations.
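The casting problem shows up as soon as a string id from a JSON response meets an id stored as a number. The payload below is an invented example shaped like a typical JSON object with a string id, not a real API response, and `same_id` is a hypothetical helper.

```python
import json

# Invented payload: the id arrives as a JSON string, not as a number.
payload = json.loads('{"id": "100000123456789", "name": "Example User"}')

stored_as_int = 100000123456789
stored_as_str = "100000123456789"

# Comparing a str with an int is simply False in Python; nothing is cast.
print(payload["id"] == stored_as_int)
print(payload["id"] == stored_as_str)


def same_id(left, right) -> bool:
    """Compare two ids by their string form, whatever type they arrived as."""
    return str(left) == str(right)


print(same_id(payload["id"], stored_as_int))
```

Storing the id as a string from the start avoids needing a helper like `same_id` at every comparison.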
I would just stick with INT. It's easy, it's small, it works and you can always change the column to a larger size in the future if you need to.
FYI:
VARCHAR(n) ==> variable, up to n + 1 bytes
CHAR(n) ==> fixed, n bytes
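To put rough numbers on the storage question, the sizes can be estimated as below. This is an approximation under stated assumptions: it counts only the value plus the VARCHAR length prefix (one byte when the column can hold at most 255 bytes, two otherwise) and ignores row-format and index overhead. `varchar_storage_bytes` is a made-up helper.

```python
INT_BYTES = 4
BIGINT_BYTES = 8


def varchar_storage_bytes(value: str, max_chars: int, bytes_per_char: int = 1) -> int:
    """Approximate bytes needed to store value in a VARCHAR(max_chars) column."""
    # One length byte if the column holds at most 255 bytes, otherwise two.
    length_prefix = 1 if max_chars * bytes_per_char <= 255 else 2
    return len(value.encode("utf-8")) + length_prefix


largest_unsigned_int = "4294967295"  # ten digits
print("int:", INT_BYTES)
print("bigint:", BIGINT_BYTES)
print("varchar(10):", varchar_storage_bytes(largest_unsigned_int, 10))
```

So a ten-digit id stored as varchar(10) takes a few bytes more per row than a bigint; whether that matters depends on the table size.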
Unless you expect more than 60% of the world's population to sign up, int should do?