Best way to avoid fake users - unique

I'm developing a web app where it's crucial that the users can't register twice (e.g., with different emails).
I'd like to use some automated way to check it's unique.
Checking Personal Identification Number (like in Passport), isn't a good idea, because it must be done manually, since gov sites use captchas.
What is the best way to achieve it?
To be more specific, what is a field that I can use with UNIQUE constraint in my users table?

I am afraid there is no single answer to your question. Tax systems use VAT code, medical systems use Social Security Numbers, Universities use Student-ID etc.
Depending on your application, you could try to see if OpenID would be suitable. It provides an easy sign-in mechanism for your site. OpenID will give you a lot of information about the user and also the advantage of access to social networking data which may be valuable to your application.

You can for example use DBMS and keep an eye on users registrations. For more info see msdn SQL Server

Related

Possible to make a one account per person website?

Is it possible to make a website with only one account per person?
Any suggestion is good.
Thanks
If you don't want people to go generating 100 accounts a minute, you'll need something like captcha, which is very easy to add on to your website.
You can do other things, like associate each account with an email address, and make the user verify that that email actually exists by sending a link out to that email address so that when they click on it, it verifies the connection.
To associate exactly one account per person, you are going to need to use some sort of official identity, and usually for smaller websites that doesn't make sense. By official identity I mean verify their credit card or government identification (social security?), but then you run into a lot of problems because people won't want to do this, and it is going to cost money to make sure that these identities are real. Also, if you really need something like this, you're going to have to beef up the security of your website.
An alternative is to require a user to put in a verification number which you send to them via SMS, and ensure the phone number they enter is unique, phones/simcards are relatively cheap these days, but most people wouldn't go through the effort of spending $5 on a new simcard to get a duplicate account on your site.
(and if they would, sell accounts for $3 and undercut the cost of a sim ;)
Ask for identifying information for example credit card data that you can verify and allow registrations only with this identifying information. Of course credit card data can be stolen, so you cannot be 100% sure about anything in internet or in world generally.
Whatever you do, people will still try to buck the system. If you use Email, then people can have multiple free emails, if you block free emails, lots of people dont have any other emails so you block them. If you use IP you block anyone with a shared IP such as ISPs who enforce proxy servers for their clients.
Unless you start asking for social security numbers or NI or whatever your country has, and you start alienating people as a rule because they would consider that irrelevant personal information.
You just have to hope people treat your site fairly, and know some wont.
One potential solution these days is to only create user accounts via federation of identity from a much stricter service provider. For example, verify your users via a facebook account using oauth. I believe facebook is pretty good at detecting shills/spammers and you can harness their resources in your service too.
I think you have 2 options, neither of which will solve the whole problem:
Log the IP address and prevent multiple logins from one IP. Not great, as this will probably backfire if college kids use your site and the dorm uses shared IPs.
Log the MAC address and prevent multiple logins from one MAC. Better, but it will prevent multiple people from the same household from using your site, and most houses have more than one computer/mobile device.
You could always combine the two options, but again with multiple mobile devices it's possible to circumnavigate that.
you could make a code that only lets 1 account be made per computer used. you have to be able to get the computers IP address and block it from making more than 1 account.

How to defend against users with Multiple Accounts?

We have a service where we literally give away free money.
Naturally said service is ripe for abuse. To defend against this we do the following:
log ip address
use unique email addresses (only 1 acct/email addy)
collect more info like st. address, phone number, etc.
use signup captcha
BHOs (I've seen poker rooms use these)
Now, let's get real here -- NONE of this will stop a determined user.
Obviously ip addresses can be changed via a proxy (which could be blacklisted via akismet) but change anyways if the user has a dynamic ip or if more than one user is behind a NAT'd network (can we say almost everyone?)
I can sign up for thousands of unique email addresses each hour -- this is no defense.
I can put in fake information taken from lists for street addresses and phone numbers.
I can buy captchas from captcha solving services (1k for $5).
bhos seem only effective for downloadable software -- this is a website
What are some other ways to prevent multiple users from abusing the service? How do all the PPC people control click fraud?
I know we could actually call the person but I don't think we are trying to do that anytime soon.
Thanks,
It's pretty difficult to generate lots of fake phone numbers that can send and receive SMS messages. SMS verification could go a long way towards cutting down on fraud. Of course, it also limits you to giving away free money to cell phone owners.
I think only way is to bind your users accounts to 'real world' information, like his/her passport number, for instance. Of course, you'll need to make sure that information is securely stored and to find some way to validate it.
Re: signing up for new email accounts...
A user doesn't even need to do that. Please feel free to send your mail to brian_s#mailinator.com, or feydr.asks.a.question#spamherelots.com, or stackoverflow#safetymail.info, or my_arbitrary_username#zippymail.info. I haven't registered any of those email addresses, but all of them will work.
Those domains are owned by ManyBrain, and they (and probably others as well) set the domain to accept any email user. ManyBrain in particular then makes the inboxes for those emails publicly accessible without any registration (stripping everything by text from the email and deleting old mail). Check it out: admin#mailinator.com's email inbox!
Others have mentioned ways to try and keep user identities unique. This is just one more reason to not trust email addresses.
First, I suppose (hope) that you don't literally give away free money but rather give it to use your service or something like that.
That matters as there is a big difference between users trying to just get free money from you they can spend on buying expensive cars vs only spending on your service which would be much more limited.
Obviously many more user will try to fool the system in the former than in the latter case.
Why it matters? Because it is all about the balance between your control vs your user annoyance. I see many answers concentrating on the control part, so let's go through annoyance, shall we?
Log IP address. What if I am the next guy on the computer in say internet shop and the guy before me already used that IP? The other guy left your hot page that I now see but I am screwed because the IP is blocked. Yes, I can go to another computer but it is annoyance and I may have other things to do.
Collecting physical Adresses. For what??? Are you going to visit me? Or start sending me spam letters? Let me guess, more often than not you get addresses with misprints at best and fake ones at worst. In fact, it is much less hassle for me to give you fake address and not dealing with whatever possible spam letters I'll have to recycle in environment-friendly way. :)
Collecting phone numbers. Again, why shall I trust your site? This is the real story. I gave my phone nr to obscure site, then later I started receiving occasional messages full of nonsense like "hit the fly". That I simply deleted. Only later and by accident to discover that I was actually charged 2 euros to receive each of those messages!!! Do I want to get those hassles? Obviously not! So no, buddy, sorry to disappoint but I will not give your site my phone number unless your company is called Facebook or Google. :)
Use signup captcha. I love that :). So what are we trying to achieve here? Will the user who is determined to abuse your service, have problems to type in a couple of captchas? I doubt it. But what about the "good user"? Are you aware how annoying captchas are for many users??? What about users with impaired vision? But even without it, most captchas are so bad that they make you feel like you have impaired vision! The best advice I can give - if you care about user experience, avoid captchas as plague! If you have any doubts, do your online research first!
See here more discussion about control vs annoyance and here some more thoughts about being user-friendly.
You have to bind their information to something that is 'real world', as Rubens says. Of course, you also need to be able to verify this information (I can just make up passport numbers all day if you don't check to make sure they're correct).
How do you deliver the money? Perhaps you can index this off the paypal account, mailing address, or whatever you're sending the money to?
Sometimes the only way to prevent people abusing a system is to not have the system in the first place.
If you're doing what you say you're doing, "giving away money to people", then surprise surprise, there will be tons of people with more time available to try to find ways to game the system than you will have to fix it.
I guess it will never be possible to have an identification system which identifies fake identities that is:
cheap to run (I think it's called "operational cost"?)
cheap to implement (ideally one time cost - how do you call that?)
has no Type-I/Type-II errors
is scalable
But I think you could prevent users from having too many (to say a quite random number: more than 50) accounts.
You might combine the following approaches:
IP address: can be bypassed with VPN
CAPTCHA: can be bypassed with human farms (see this article, for example - although they claim that their test can't be that easily passed to other humans, I doubt this is true)
Ability-based identification: can be faked when you know what is stored and how exactly the identification works by randomly (but with a given distribution) acting (example: brainauth.com)
Real-world interaction: Although this might be the best one, but I guess it is expensive and not many users will accept it. Also, for some users/countries it might not be possible. (example: Postident in Germany, where the Post wants to see your identity card. I guess this can only be faced in massive scale by the government.)
Other sites/resources: This basically transforms the problem for other sites. You can use services, where it is not allowed/uncommon/expensive to have much more than 1 account
Email
Phone number: e.g. by using SMS, see Multi-factor authentication
Bank account: PayPal; transfer not much money or ask them to transfer a random (small) amount to you (which you will send back).
Social based
When you take the social graph (vertices are people, edges are connections), you will expect some distribution. You know that you are a single human and you know some other people. So you have a "network of trust" (in quotes, because I think this might be used in other context as well). Now you might not trust people / networks how interact heavily with your service, but are either isolated (no connection) or who connect a large group with another large group ("articulation points"). You also might not trust fast growing, heavily interacting new, isolated graphs.
When a user provides content that is liked by many other users (who you trust), this might be an indicator that there is a real human creating it.
We had a similar issue recently on our website, it is really a hassle to solve this issue if you are providing a business over one time or monthly recurring free credits system.
We are using a fraud detection solution https://fraudradar.io for a while and that helped us a lot to clean out most of the spam activities. It is pretty customizable with:
IP checks
Email domain validity
Regex rules
Whitelisting options per IP, email domain etc.
Simple API to communicate through
I would suggest to check that out.

Generally a Good Idea to Always Hash Unique Identifiers in URL?

Most sites which use an auto-increment primary-key display it openly in the url.
i.e.
example.org/?id=5
This makes it very easy for anyone to spider a site and collect all the information by simply incrementing the value of id. I can understand where in some cases this is a bad thing if permissions/authentication are not setup correctly and anyone could view anything by simply guessing the id, but is it ever a good thing?
example.org/?id=e4da3b7fbbce2345d7772b0674a318d5
Is there ever a situation where hashing the id to prevent crawling is bad-practice (besides losing the time it takes to setup this functionality)? Or is this all a moot topic because by putting something on the web you accept the risk of it being stolen/mined?
Generally with web-sites you're trying to make them easy to crawl and get access to all the information so that you can get good search rankings and drive traffic to your site. Good web developers design their HTML with search engines in mind, and often also provide things like RSS feeds and site maps to make it easier to crawl content. So if you're trying to make crawling more difficult by not using sequential identifiers then (a) you aren't making it more difficult, because crawlers work by following links, not by guessing URLs, and (b) you're trying to make something more difficult that you also spend time trying to make easier, which makes no sense.
If you need security then use actual security. Use checks of the principal to authorize or deny access to resources. Obfuscating URLs is no security at all.
So I don't see any problem with using numeric identifiers, or any value in trying to obfuscate them.
Using a hash like MD5 or SHA on the ID is not a good idea:
there is always the possibility of collisions. That is, two different IDs hash to the same value.
How are you going to unhash it back to the actual ID?
A better approach if you're set on avoiding incrementing IDs would be to use a GUID, or just a random value when you create the ID.
That said, if your application security relies on people not guessing an ID, that shows some flaws elsewhere in the system. My advice: stick to the plain and easy auto-incrementing ID and apply some proper access control.
I think hashing for publicly accessible id's is not a bad thing, but showing sequential id's will in some cases be a bad thing. Even better, use GUID/UUIDs for all your IDs. You can even use sequential GUIDS in a lot of technologies, so it's faster (insert-stage) (though not as good in a distributed environment)
Hashing or randomizing identifiers or other URL components can be a good practice when you don't want your URLs to be traversable. This is not security, but it will discourage the use (or abuse) of your server resources by crawlers, and can help you to identify when it does happen.
In general, you don't want to expose application state, such as which IDs will be allocated in the future, since it may allow an attacker to use a prediction in ways that you didn't forsee. For example, BIND's sequential transaction IDs were a security flaw.
If you do want to encourage crawling or other traversal, a more rigorous way would be to provide links, rather than by providing an implementation detail which may change in the future.
Using sequential integers as IDs can make many things cheaper on your end, and might be a resonable tradeoff to make.
My opinion is that if something is on the web, and is served without requiring authorization, it was put with the intention that it should be publicly accessible. Actively trying to make it more difficult to access seems counter-intuitive.
Often, spidering a site is a Good Thing. If you want your information available as much as possible, you want sites like Google to gather data on your site, so that others can find it.
If you don't want people to read through your site, use authentication, and deny access to people who don't have access.
Random-looking URLs only give the impression of security, without giving the reality. If you put account information (hidden) in a URL, everyone will have access to that web spider's account.
My general rule is to use a GUID if I'm showing something that has to be displayed in a URL and also requires credentials to access or is unique to a particular user (like an order id). http://site.com/orders?id=e4da3b7fbbce2345d7772b0674a318d5
That way another user won't be able to "peek" at the next order by hacking the url. They may be denied access to someone else's order, but throwing a zillion letters and numbers at them is a pretty clear way to say "don't mess with this".
If I'm showing something that's public and not tied to a particular user, then I may use the integer key. For example, for displaying pictures, you might wish to allow your users to hack the url to see the next picture.
http://example.org/pictures?id=4, http://example.org/pictures?id=5, etc.
(I actually wouldn't do either as a simple GET parameter, I'd use mod_rewrite (or something) to make readable urls. Something like http://example.org/pictures/4 -> /pictures.php?picture_id=4, etc.)
Hashing an integer is a poor implementation of security by obscurity, so if that's the goal, a true GUID or even a "sequential" GUID (whether via NEWSEQUENTIALID() or COMB algorithm) is much better.
Either way, no one types URLs anymore, so I don't see much sense in worrying about the difference in length.

How do I explain APIs to a non-technical audience?

A little background: I have the opportunity to present the idea of a public API to the management of a large car sharing company in my country. Currently, the only options to book a car are a very slow web interface and a hard to reach call center. So I'm excited of the possiblity of writing my own search interface, integrating this functionality into other products and applications etc.
The problem: Due to the special nature of this company, I'll first have to get my proposal trough a comission, which is entirely made up of non-technical and rather conservative people. How do I explain the concept of an API to such an audience?
Don't explain technical details like an API. State the business problem and your solution to the business problem - and how it would impact their bottom line.
For years, sales people have based pitches on two things: Features and Benefit. Each feature should have an associated benefit (to somebody, and preferably everybody). In this case, you're apparently planning to break what's basically a monolithic application into (at least) two pieces: a front end and a back end. The obvious benefits are that 1) each works independently, so development of each is easier. 2) different people can develop the different pieces, 3) it's easier to increase capacity by simply buying more hardware.
Though you haven't said it explicitly, I'd guess one intent is to publicly document the API. This allows outside developers to take over (at least some) development of the front-end code (often for free, no less) while you retain control over the parts that are crucial to your business process. You can more easily [allow others to] add new front-end code to address new market segments while retaining security/certainty that the underlying business process won't be disturbed in the process.
HardCode's answer is correct in that you should really should concentrate on the business issues and benefits.
However, if you really feel you need to explain something you could use the medical receptionist analogue.
A medical practice has it's own patient database and appointment scheduling system used by it's admin and medical staff. This might be pretty complex internally.
However when you want to book an appointment as a patient you talk to the receptionist with a simple set of commands - 'I want an appointment', 'I want to see doctor X', 'I feel sick' and they interface to their systems based on your medical history, the symptoms presented and resource availability to give you an appointment - '4:30pm tomorrow' - in simple language.
So, roughly speaking using the receptionist is analogous to an exterior program using an API. It allows you to interact with a complex system to get the information you need without having to deal with the internal complexities.
They'll be able to understand the benefit of having a mobile phone app that can interact with the booking system, and an API is a necessary component of that. The second benefit of the API being public is that you won't necessarily have to write that app, someone else will be able to (whether or not they actually do is another question, of course).
You should explain which use cases will be improved by your project proposal. An what benefits they can expect, like customer satisfaction.

Should you worry about fake accounts/logins on a website?

I'm specifically thinking about the BugMeNot service, which provides user name and password combos to a good number of sites. Now, I realize that pay-for-content sites might be worried about this (and I would suspect that most watch for shared accounts), but how about other sites? Should administrators be on the lookout for these accounts? Should web developers do anything differently to take them into account (and perhaps prevent their use)?
I think it depends on the aim of your site. If usage analytics are all-important, then this is something you'd have to watch out for. If advertising is your only revenue stream, then does it really matter which username someone uses?
Probably the best way to discourage use of bugmenot accounts is to make it worthwhile to have an actual account. E.g.: No one would use that here, since we all want rep and a profile, or if you're sending out useful emails, people want to receive them.
Ask yourself the question "Why do we require users to register to access my site?" Once you have business reason for this requirement, then you can try to work out what the effect of having some part of that bypassed by suspect account information.
Work on the basis that at least 10 to 15 percent of account information will be rubbish - and if people using the site can't see any benefit to them personally for registering, and if the registration process is even remotely tedious or an imposition, then accept that you will be either driving more potential visitors away, or increasing your "crap to useful information" ratio.
Not make registration mandatory to read something? i.e. Ask people to register when you are providing some functionality for them that 'saves' some settings, data, etc. I would imagine site like stackoverflow gets less fake registrations (reading questions doesn't require an account) than say New York Times, where you need to have an account to read articles.
If that is not upto your control, you may consider removing dormant accounts. i.e. Removing accounts after a certain amount of inactivity.
That entirely depends.
Most sites that find themselves listed in bugmenot.com tend to be the ones that require registration for in order to access otherwise-free content.
If registration is required in order to interact with the site (ie, add comments/posts/etc), then chances are most people would rather create their own account than use one that has been made public.
So before considering whether to do things like automatically check bugmenot - think about whether their are problems with your business model.
There are a few situations where pay-to-access content sites (I'm thinking things like, ahem, 'adult' sites) end up with a few user accounts being published publically (usually because someone has brute-forced some account details), and in that case there may be a argument for putting significant effort into it.
From an administrator viewpoint absolutely. That registration is required for a reason, even if it's something just as simple as user tracking/profile maintaining. Several thousand people using that login entirely defeats the purpose. IP tracking could help mitigate this problem, but it would definitely be hard to eliminate entirely.
No need to worry about BugMeNot: http://www.bugmenot.com/report.php
With bugmenot, keep in mind that this service is not actually there to harm the sites, but rather to make using them easier. You can request to block your site if it is pay-per-view, community-based (i.e. a forum or Wiki) or the account includes sensible information (like banking). This means in virtually all situations where you would think that bugmenot is a bad thing, bugmenot does not want to be used. So maybe things are not as bad as you might think.