My question is more advisory than technical.
I'm writing a Facebook app in which I am fetching some information about the user, including facebook_id.
I was wondering if I should keep the user id as INT or VARCHAR in the MySQL database?
Facebook uses 64-bit integers (bigint) for their user ids. So you use Bigint UNSIGNED in MySQL.
"As a reminder, in the very near future, we will be rolling out 64 bit user IDs. We encourage you to test your applications by going to www.facebook.com/r.php?force_64bit and create test accounts with 64 bit UIDs."
Edit: Facebook usernames are not the same thing as the user id. The username is of course varchar but will not be returned as the id.
Although unlikely, Facebook could change the format of their IDs, so to make it future-proof, I'd use a varchar.
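To illustrate the 64-bit point, here is a minimal Python sketch that checks a value against the documented ranges of MySQL's integer column types. The helper name `fits` and the sample UID are made up for illustration; they are not from Facebook or from this answer.

```python
# Value ranges of MySQL integer column types, as (minimum, maximum).
MYSQL_INT_RANGES = {
    "INT": (-2**31, 2**31 - 1),
    "INT UNSIGNED": (0, 2**32 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
    "BIGINT UNSIGNED": (0, 2**64 - 1),
}


def fits(column_type: str, value: int) -> bool:
    """Return True if value is within the range of the given column type."""
    low, high = MYSQL_INT_RANGES[column_type]
    return low <= value <= high


# Hypothetical user ID, invented for this example (larger than 32 bits).
sample_uid = 100000123456789

for column_type in MYSQL_INT_RANGES:
    print(column_type, fits(column_type, sample_uid))
```

With a value of this size, only the two BIGINT variants report a fit.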
similar question to: Facebook user_id : big_int, int or string?
"I would not use a string. That makes comparisons painful and your indexes clunkier than they need to be."
To quote facebook's upgrade notes regarding graph API v2.0 (effective May 2015):
All IDs are strings. In v1.0, IDs were often large numbers.
Which (to me) implies that userids are not guaranteed to be numbers. In fact, facebook recommend that you use strings:
https://developers.facebook.com/docs/graph-api/reference/v2.2/user#fields
Name:id
Description: id The id of this person's user account.
This ID is unique to each app and cannot be used across different apps(...)
type: string
Although I must admit I've never seen an alphanumeric id.
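If IDs are documented as strings, one way to see the risk of a numeric column is a round-trip check: does the ID come back unchanged after being converted to an integer? A rough sketch, with a hypothetical helper name and made-up sample IDs:

```python
def survives_numeric_column(id_string: str) -> bool:
    """True if the ID is unchanged after a str -> int -> str round trip."""
    try:
        return str(int(id_string)) == id_string
    except ValueError:
        # Not parseable as an integer at all.
        return False


# Made-up IDs for illustration only.
print(survives_numeric_column("100000123456789"))  # plain digits
print(survives_numeric_column("0012345"))          # leading zeros would be lost
print(survives_numeric_column("123456_7890123"))   # compound id with underscore
print(survives_numeric_column("abc123"))           # alphanumeric
```

Only the first of these survives; the others would be altered or rejected by an integer column, while a string column stores all four as-is.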
Use BIGINT (a 64-bit integer) to store Facebook User IDs.
Here you go: https://developers.facebook.com/blog/post/45/
For new data types, as they are graph object IDs, I believe it is safe to save them as BIGINT.
However, for old "id", e.g. pic, save them as string (you can easily see that they are in the format xxxxxx_xxxxxxxx)
I'd use INT, because it's not so big. Searching in INT is faster and better for ordering.
Related
I'm taking the Meta Data Engineer Professional Certificate and I was just given this prompt in a lab:
Mr. Carl needs to have a new table to store the contact details of each customer including customer account number, customer phone number and customer email address.
You are required to choose a relevant data type for each of the columns.
Solution:
Account number: INTEGER
Phone number: INTEGER
Email: VARCHAR
Prior to reading the solution I selected VARCHAR(10) as the datatype for storing phone numbers as I thought they should be treated as string data. My reasoning is that there's no reason to perform any sort of mathematical operation on a phone number, and they're often typed with other characters like "(" or "-".
Is there any compelling reason for storing a phone number as an INT? Do you agree with the solution to this prompt? What is the best practice for storing phone numbers?
Is "Meta Data Engineer Professional Certificate" aimed at MySQL?
General Professional: If not MySQL-specific, then you need to understand that "INTEGER" is implemented in different ways by different database engines.
MySQL Professional: INTEGER, in MySQL, maps to INT SIGNED, which is limited to about 2 billion--That is only 9 digits. I don't know what the max phone number is worldwide, but I know that 10 is needed.
BIGINT gives you about 18 digits (in 8 bytes), but that seems silly. For the reasons already mentioned VARCHAR(...) is reasonable. (Perhaps a limit of 20 would be quite sufficient.) In that case, a 10-digit number would take 11 bytes (1 for length, plus 10 for the number.)
Arguably, you could say, for example DECIMAL(15) to allow up to 15 digits in a 7-byte column.
(I prefer VARCHAR, in spite of it taking the most space.)
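The size arithmetic above can be reproduced with a short Python sketch. It assumes single-byte characters and a VARCHAR column short enough to need only one length byte, and it uses MySQL's documented packing for DECIMAL (4 bytes per 9 digits, fewer bytes for leftover digits). The function names are mine.

```python
INT_SIGNED_MAX = 2**31 - 1     # 2147483647
BIGINT_SIGNED_MAX = 2**63 - 1  # 9223372036854775807

# Bytes MySQL uses for the digits left over after full groups of nine.
_LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def varchar_bytes(text: str) -> int:
    """One length byte plus one byte per (ASCII) character."""
    return 1 + len(text.encode("ascii"))


def decimal_bytes(precision: int, scale: int = 0) -> int:
    """Storage for DECIMAL(precision, scale): each part is packed separately."""
    def part(digits: int) -> int:
        return (digits // 9) * 4 + _LEFTOVER_BYTES[digits % 9]
    return part(precision - scale) + part(scale)


# A 10-digit number can exceed a signed INT, so only 9 digits are always safe.
print(9999999999 > INT_SIGNED_MAX)
# A signed BIGINT is always safe for one digit fewer than its maximum's length.
print(len(str(BIGINT_SIGNED_MAX)) - 1)
print(varchar_bytes("8005431212"))
print(decimal_bytes(15))
```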
Either way: It is a bad test question if it does not understand the two cases I present here.
Non digits: 'typed with other characters like "(" or "-"' -- That brings up a different issue. It comes under the general heading of GIGO. Cleanse the data before storing it into the database.
If you ever needed to compare two phone numbers for equality, you would wish you had removed all non-digits. (Or added them in some canonical way, such as US: "(800)543-1212".)
User input: If you ever create a UI for entering phone numbers, dates, SSNs (or other numbers with some structure), DO NOT require the user to follow some punctuation rules. DO allow a variety of typical formats. (OK, dates are tricky because there are incompatible orderings. But if I type "1-1-2021", will you spit at me for not having the leading zeroes?)
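A minimal sketch of that cleansing step in Python: accept whatever punctuation the user typed, keep only the digits for storage and comparison, and re-add punctuation in one canonical way for display. The function names are hypothetical, and the display format assumes a 10-digit US number.

```python
import re


def normalize_phone(raw: str) -> str:
    """Drop everything except ASCII digits so equal numbers compare equal."""
    return re.sub(r"[^0-9]", "", raw)


def format_us(digits: str) -> str:
    """Render a 10-digit US number in one canonical display form."""
    if len(digits) != 10:
        raise ValueError("expected exactly 10 digits")
    return f"({digits[:3]}){digits[3:6]}-{digits[6:]}"


inputs = ["(800)543-1212", "800-543-1212", "800.543.1212", "800 543 1212"]
cleaned = {normalize_phone(value) for value in inputs}
print(cleaned)  # a single element: all four inputs are the same number
print(format_us(normalize_phone("800.543.1212")))
```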
Indexing: VARCHAR, DECIMAL, INT, etc are all indexable. Any speed difference is not significant.
Extensions: Without VARCHAR, how would you represent the "extension" in "(800)543-1212x543"? Might this point be the deciding factor in favor of VARCHAR? And you should write a bug report against that 'Certification' test?
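As a rough illustration of the extension case, the sketch below splits the input at an "x", cleans both halves, and produces a string that a VARCHAR column could hold but an integer column could not. The helper names and the "x" storage convention are assumptions of mine, not a standard.

```python
import re
from typing import Tuple


def _digits(text: str) -> str:
    """Keep only ASCII digits."""
    return re.sub(r"[^0-9]", "", text)


def split_extension(raw: str) -> Tuple[str, str]:
    """Return (number digits, extension digits); extension may be empty."""
    main, _, ext = raw.lower().partition("x")
    return _digits(main), _digits(ext)


def to_storage(raw: str) -> str:
    """Canonical text form: digits, then 'x' and the extension if present."""
    number, ext = split_extension(raw)
    return f"{number}x{ext}" if ext else number


print(split_extension("(800)543-1212x543"))
print(to_storage("(800)543-1212x543"))
print(to_storage("(800)543-1212"))
```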
Duplicate?: Which is best data type for phone number in MySQL and what should Java type mapping for it be? covers most of what I have said, and hints that [perhaps] VARCHAR(20) is sufficient. (The quoted 15, excludes the international prefix.)
In my opinion, there is no absolute best choice here. Both have pros and cons. Personally, I'm in favor of using varchar. Special characters like hyphens can cause dupes when mishandled (a rare case, and the user is to blame, as the input should be verified before submitting), but they have the merit of formatting the phone number, which improves readability, e.g. area_code-tel: xxx-xxxxxxxx (without the hyphen it's nearly impossible to separate the area code from the phone number, as both can vary in length).
About indexing: though numerics do have advantages over strings, I'm not sure a phone number would be used as an index. There are more worthy candidates such as ID or date, but what would a phone number do? Usually we look up the phone based on an indexed column such as ID, but how often do we fetch something based on a phone number? Unless we want to list all phones from a particular area, we don't really need it to be indexed. And in that case it would actually be more fitting to use special characters like hyphens to help determine the area part.
P.S Like Ken White kindly suggested, there are cases when phone numbers should be indexed, especially when they are more suitable to be an identifier.
Storing phone numbers as strings can be a disaster. The first things that come to my mind are:
You can get dupes easily: maybe someone types the number with "(" and/or "-" and another user types the same number without those characters. Long story short, you end up with a duplicate.
Normalizing the phone number to an integer makes more sense in terms of normalization and avoiding duplication.
Also think about a search in the scenario above: what would you use? A LIKE? A numeric operator? Casts spread all over? Messy...
Now comes the important thing, and it is related to indexing: the int will be faster. The longer the varchar, the slower it gets, although you are limiting its length.
The validation can be done in the UI with a field mask, or with a regex in the logic, whichever makes more sense for you.
Hope I helped a little bit :)
I need to store some social ids (facebook/google/twitter user id, facebook place id, etc.) in my MySQL DB. I found a lot of questions about this here on stackoverflow, but I didn't find a satisfactory answer. For example, you can't be 100% sure that a facebook id will always be an unsigned bigint; in the facebook documentation the facebook id is described as a "numeric string". A Google id seems to be one digit longer than a bigint can hold.
I believe that an index on a varchar column is slower than an index on a bigint column, and for this reason I thought that using bigint, when possible, would be better than varchar. But I realized that you can store a varchar as binary with the appropriate attribute.
For this reason I was thinking about using a varchar for all these social ids and (since the ordering is not an issue) storing it as binary (attribute=binary), because I need a fast index on them.
What are your thoughts: is this a good and fast solution?
Thanks
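For reference, whether a given "numeric string" can live in a BIGINT UNSIGNED column can be checked with plain integer arithmetic. This is only a sketch: the function name is mine and the sample ids are invented, not real Facebook or Google ids.

```python
BIGINT_UNSIGNED_MAX = 2**64 - 1  # 18446744073709551615, which has 20 digits


def fits_bigint_unsigned(id_string: str) -> bool:
    """True if the id is plain decimal digits, round-trips, and is in range."""
    if not (id_string.isascii() and id_string.isdigit()):
        return False
    value = int(id_string)
    # The round-trip check rejects ids with leading zeros, which an
    # integer column would silently drop.
    return str(value) == id_string and value <= BIGINT_UNSIGNED_MAX


print(len(str(BIGINT_UNSIGNED_MAX)))                  # digit count of the maximum
print(fits_bigint_unsigned("100000123456789"))        # made-up 15-digit id
print(fits_bigint_unsigned("100000000000000000001"))  # made-up 21-digit id
print(fits_bigint_unsigned("abc-123"))                # not numeric at all
```

Any id with 21 or more digits necessarily fails this check, so a column shared by all providers would have to be a string type.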
I use varchar. You are right about the differences, but even more importantly, there is no guarantee that the current type will stay the same. For instance, Facebook changed the size in the past, and they mentioned somewhere that they may include chars in it.
An index is an index, if done right, there is no need to worry about its performance. No real difference between an index on numbers or varchar.
bigint + INDEX KEY + INNODB = Fast
I'm using Google App Engine and storing users. Before, I was using MySQL with an auto-incrementing int for my userId field, but now GAE auto generates a key for each new entity I store, such as g5kZXZ-aGVsbG93b3JsZHIKCxIEVXNlchgNDA, but they also automatically generate an auto-incrementing ID int too.
Which one should I use in my client code to store as the userId? Are there any advantages to using the long key that GAE generates, or is using the small int ID the same thing in terms of performance and look ups? Are there any advantages to one over the other, or is there practically no difference?
Edit: Sorry my question was not clear enough. Here's what I'm asking:
It's not about length, but does having the lookup key put me a step ahead of not having it? Because if I wanted to look up a user, I'd have to look him up by email, but now that I have the key for that row in the "table", does this give me any sort of advantage?
Either one is fine, there's no performance difference between using a long string of letters or a short one to identify users.
Remember that the generated entity ID is not guaranteed to be a monotonically increasing value.
I've been running a small web-based experiment using Facebook UIDs to verify unique users.
Recently I've discovered that UIDs can be bigger than I realised among some users, so my int-based system is now inadequate and I need to convert to bigint.
I can't risk losing the results I already have, but need to convert the table so that the index containing the uid is now bigint. Are there any particular issues changing the type of an index column, and would it be as simple as:
ALTER TABLE breadusers MODIFY userid bigint;
?
In theory this should be absolutely fine, although if the data really matters, I presume you have a recent backup anyway in case something goes awry.
That said, I'd probably be tempted to store the Facebook UID as a string (i.e.: in a VARCHAR field) and simply have a generic auto-incremented ID field. Then again, that's an answer to a different question. :-)
For the Facebook UID part, I would suggest you go for BIGINT (a 64-bit integer).
Here is the answer from Facebook Blog:
https://developers.facebook.com/blog/post/45/
I'm a beginning programmer, building a non-commercial web-site.
I need a user ID, and I thought it would be logical to use a simple INTEGER field with an auto-increment for that. Does that make sense? The user ID will not be directly used by users (they'll have to select a user-name); should I care about where the IDs start (presumably 1)?
Any other best practices I should incorporate in building my 'Users' table?
Thanks!
JDelage
Your design is correct. Your internal PK should be a meaningless number, not seen by users of the system and maintained automatically. It doesn't matter if it starts at 1 and it doesn't matter if it's sequential or not, or if you have "holes" in the sequence. (For cases in which you do expose the number to end users, it is sometimes important that the numbers be neither sequential nor fully-populated so that they are not guessable).
Users should identify themselves to the system with another, meaningful piece of the information (such as an email address). That piece of information should either be guaranteed unique (using a UNIQUE index) or else your front end must provide an interface for disambiguation.
Among the benefits of this design are:
The meaningful identifier for the account can be changed by updating one value in one record of one table, rather than requiring updates all around the database.
Your PK value, which will appear many, many times in the database, is a small and efficiently indexed integer while your user-facing identifier can be of any type you want including a longish text string.
You can use a non-unique identifier with disambiguation if the application calls for it.
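The design described above can be tried out quickly. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the table and column names are invented, and MySQL syntax differs slightly (for example AUTO_INCREMENT instead of AUTOINCREMENT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Meaningless auto-generated PK plus a UNIQUE, user-facing identifier.
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        email TEXT NOT NULL UNIQUE
    )
""")
# Other tables refer to the small integer PK, never to the email.
conn.execute("""
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL REFERENCES users(id),
        title   TEXT NOT NULL
    )
""")

user_id = conn.execute(
    "INSERT INTO users (email) VALUES (?)", ("old@example.com",)
).lastrowid
conn.execute(
    "INSERT INTO posts (user_id, title) VALUES (?, ?)", (user_id, "hello")
)

# Changing the meaningful identifier touches exactly one row in one table.
conn.execute(
    "UPDATE users SET email = ? WHERE id = ?", ("new@example.com", user_id)
)

# The UNIQUE index rejects a second account with the same email.
try:
    conn.execute("INSERT INTO users (email) VALUES (?)", ("new@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

owner = conn.execute(
    "SELECT u.email FROM posts p JOIN users u ON u.id = p.user_id "
    "WHERE p.title = ?",
    ("hello",),
).fetchone()[0]
print(owner, duplicate_rejected)
```

The post still resolves to the same user after the email change, because it only ever stored the integer id.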
auto_increment is okay.
But you shouldn't care about its particular number.
Quite the contrary: you should never be concerned with the identifier's particular value. Take it as an abstract identifier only.
Though I doubt it can be invisible to users. Do you have another identifier to use? Auto_increment identifiers are usually visible to users as well. For example, your ID here is 98361, and nobody is hiding it. It is very handy to use such numbers: being unique and unchanging, forever bound to a particular table row, such a number will always identify the same thing (a user, an article, etc.).
An auto-incrementing field is fine unless you need to do things like share this ID across multiple databases; then you will probably need to create the id value yourself. Also beware of exporting and importing data. If you are not careful, all the id values will get reassigned.
In general I avoid auto-incrementing fields so that I have more control over how the id values are generated. Which is not to say I care what the values are, just that they are unique. These are internal values the end user should never see.
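One common way to create the id value yourself is a random UUID generated in application code. This is my assumption about what such a scheme could look like; the answer does not say which scheme it uses, and the function name is hypothetical.

```python
import uuid


def new_user_id() -> str:
    """Generate an id without consulting any database counter."""
    return str(uuid.uuid4())


first = new_user_id()
second = new_user_id()
print(first)
print(first != second)
```

Because the value does not come from a per-database counter, it stays the same when rows are exported, imported, or merged across databases. The trade-off is a 36-character string instead of a small integer.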
Yes, that is correct. Auto-Increment starts at 1, usually. It's not usually accepted to have 0 as an ID.
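If you are storing passwords, do not store them as clear text; store a salted hash. MD5 used to be the most popular choice, but it is no longer considered adequate for passwords, so prefer a deliberately slow algorithm such as bcrypt.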
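A minimal sketch of not storing clear-text passwords, using only Python's standard library. It uses salted PBKDF2-HMAC-SHA256 rather than a bare fast hash such as MD5; that choice, the iteration count, and the function names are my assumptions, not something this answer prescribes.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))
print(verify_password("wrong guess", salt, digest))
```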
Yes, auto-incrementing is fine. You will probably be saving passwords as well, so make sure these have some kind of protection, such as a salted hash (plain md5 is no longer considered adequate for this).
Also make sure you index the columns you will use to perform lookups, such as email etc... to avoid full table scans.