Should you worry about fake accounts/logins on a website? - language-agnostic

I'm specifically thinking about the BugMeNot service, which provides user name and password combos to a good number of sites. Now, I realize that pay-for-content sites might be worried about this (and I would suspect that most watch for shared accounts), but how about other sites? Should administrators be on the lookout for these accounts? Should web developers do anything differently to take them into account (and perhaps prevent their use)?

I think it depends on the aim of your site. If usage analytics are all-important, then this is something you'd have to watch out for. If advertising is your only revenue stream, then does it really matter which username someone uses?
Probably the best way to discourage use of bugmenot accounts is to make it worthwhile to have an actual account. E.g.: No one would use that here, since we all want rep and a profile, or if you're sending out useful emails, people want to receive them.

Ask yourself the question "Why do we require users to register to access our site?" Once you have a business reason for this requirement, you can try to work out what the effect will be of having some part of it bypassed by suspect account information.
Work on the basis that at least 10 to 15 percent of account information will be rubbish - and if people using the site can't see any benefit to them personally for registering, and if the registration process is even remotely tedious or an imposition, then accept that you will be either driving more potential visitors away, or increasing your "crap to useful information" ratio.

Why not make registration optional for reading? That is, ask people to register only when you are providing some functionality that 'saves' settings, data, etc. for them. I would imagine a site like Stack Overflow gets fewer fake registrations (reading questions doesn't require an account) than, say, the New York Times, where you need an account to read articles.
If that is not under your control, you may consider removing dormant accounts, i.e. removing accounts after a certain period of inactivity.
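A minimal sketch of the dormant-account idea, assuming you record a last-activity timestamp per account. The function name, the `last_seen` mapping and the 365-day default are illustrative assumptions, not something the answer specifies.

```python
from datetime import datetime, timedelta


def find_dormant_accounts(last_seen, now, max_idle_days=365):
    """Return usernames whose last activity is older than max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(user for user, seen in last_seen.items() if seen < cutoff)


# Hypothetical data: username -> time of last login or other activity.
last_seen = {
    "alice": datetime(2024, 5, 1),
    "shared_login": datetime(2022, 1, 15),
    "bob": datetime(2023, 12, 24),
}

dormant = find_dormant_accounts(last_seen, now=datetime(2024, 6, 1))
print(dormant)
```

In practice you would probably warn or disable such accounts before deleting them, so that a returning user can still recover access.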

That entirely depends.
Most sites that find themselves listed on bugmenot.com tend to be the ones that require registration in order to access otherwise-free content.
If registration is required in order to interact with the site (i.e., add comments/posts/etc.), then chances are most people would rather create their own account than use one that has been made public.
So before considering whether to do things like automatically check bugmenot, think about whether there are problems with your business model.
There are a few situations where pay-to-access content sites (I'm thinking things like, ahem, 'adult' sites) end up with a few user accounts being published publicly (usually because someone has brute-forced some account details), and in that case there may be an argument for putting significant effort into it.

From an administrator viewpoint absolutely. That registration is required for a reason, even if it's something just as simple as user tracking/profile maintaining. Several thousand people using that login entirely defeats the purpose. IP tracking could help mitigate this problem, but it would definitely be hard to eliminate entirely.
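As a rough illustration of the IP-tracking idea, the sketch below counts how many distinct IP addresses have used each login and flags the ones above a threshold. The event format, the function name and the threshold of 3 are assumptions for the example; users behind NAT or on dynamic IPs will cause false positives, so treat the result as a hint rather than proof.

```python
from collections import defaultdict


def flag_shared_accounts(login_events, max_distinct_ips=3):
    """Return {username: distinct IP count} for accounts above the threshold."""
    ips_by_user = defaultdict(set)
    for username, ip in login_events:
        ips_by_user[username].add(ip)
    return {
        user: len(ips)
        for user, ips in ips_by_user.items()
        if len(ips) > max_distinct_ips
    }


# Hypothetical login log: (username, client IP) pairs.
events = [
    ("alice", "203.0.113.5"),
    ("alice", "203.0.113.5"),
    ("guest123", "198.51.100.1"),
    ("guest123", "198.51.100.2"),
    ("guest123", "198.51.100.3"),
    ("guest123", "203.0.113.77"),
]

suspects = flag_shared_accounts(events)
print(suspects)
```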

No need to worry about BugMeNot: http://www.bugmenot.com/report.php

With bugmenot, keep in mind that this service is not actually there to harm the sites, but rather to make using them easier. You can request that your site be blocked if it is pay-per-view, community-based (i.e. a forum or wiki) or the account includes sensitive information (like banking). This means that in virtually all situations where you would think that bugmenot is a bad thing, bugmenot does not want to be used. So maybe things are not as bad as you might think.

Related

Best way to avoid fake users

I'm developing a web app where it's crucial that the users can't register twice (e.g., with different emails).
I'd like to use some automated way to check it's unique.
Checking Personal Identification Number (like in Passport), isn't a good idea, because it must be done manually, since gov sites use captchas.
What is the best way to achieve it?
To be more specific, what is a field that I can use with UNIQUE constraint in my users table?
I am afraid there is no single answer to your question. Tax systems use VAT code, medical systems use Social Security Numbers, Universities use Student-ID etc.
Depending on your application, you could try to see if OpenID would be suitable. It provides an easy sign-in mechanism for your site. OpenID will give you a lot of information about the user and also the advantage of access to social networking data which may be valuable to your application.
You can, for example, use your DBMS to keep an eye on user registrations. For more info, see the MSDN documentation for SQL Server.
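To make the UNIQUE-constraint question concrete, here is a small sketch using SQLite from Python's standard library. The `external_id` column is a hypothetical stand-in for whatever domain identifier fits your application (VAT code, student ID, and so on); the table layout and helper names are assumptions for illustration only.

```python
import sqlite3


def normalize_email(email):
    """Trim whitespace and lowercase so trivial variants collide."""
    return email.strip().lower()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,
        external_id TEXT NOT NULL UNIQUE  -- hypothetical domain identifier
    )
    """
)


def register(conn, email, external_id):
    """Insert a user; return False if either UNIQUE constraint is violated."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, external_id) VALUES (?, ?)",
                (normalize_email(email), external_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = register(conn, "a@example.com", "S-1001")
same_person_new_email = register(conn, "b@example.com", "S-1001")
same_email_new_id = register(conn, " A@Example.com ", "S-1002")
print(first, same_person_new_email, same_email_new_id)
```

The database only guarantees that the stored value is unique; whether the identifier really belongs to one person still has to be verified by other means.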

Is it possible to check if someone is trying to vote from the same computer?

To my knowledge the answer to this question is no, but I might be missing something. There are some polling sites which claim that voting from the same computer is forbidden and will result in a ban. How can they detect that? A cheater may use routable IPs, different operating systems, different browsers, proxies etc.
Cookies, plus the assumption that a multi-account cheater will sooner or later forget to clear cookies or use the wrong browser, and then they will have got him.
To sum up all that I could find: the answer is no. You can't be 100% sure that the user is trying to vote from the same computer. More advanced ways to limit this are to check the IP, perhaps use a Java applet to find out the user's computer account name, and cap the number of votes coming from any one IP anyway. Also, if you really want to be thorough, a human should screen the results, see how many of those IPs belong to institutions or college campus blocks, and decide whether to keep those votes. Gathering as much information about the voter as you can is useful (OS, browser, referrer, plug-ins etc.). Other tricks involve checking that the voter has visited several resources on your server before validating his vote, and not giving negative feedback when this check fails, to make fraud attempts less likely.
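The IP cap and the "gather as much information as you can" advice could be combined roughly as follows. This is only a sketch: the choice of signals (IP, user agent, language header), the class name and the cap of 5 are assumptions, and a determined cheater can vary all of these.

```python
import hashlib
from collections import Counter


def fingerprint(ip, user_agent, accept_language):
    """Hash a few request attributes into one opaque voter fingerprint."""
    raw = "|".join([ip, user_agent, accept_language])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class VoteFilter:
    """Reject exact repeat fingerprints and cap the votes counted per IP."""

    def __init__(self, max_votes_per_ip=5):
        self.max_votes_per_ip = max_votes_per_ip
        self.seen_fingerprints = set()
        self.votes_per_ip = Counter()

    def accept(self, ip, user_agent, accept_language):
        fp = fingerprint(ip, user_agent, accept_language)
        if fp in self.seen_fingerprints:
            return False  # same browser profile from the same address
        if self.votes_per_ip[ip] >= self.max_votes_per_ip:
            return False  # too many votes behind one address
        self.seen_fingerprints.add(fp)
        self.votes_per_ip[ip] += 1
        return True


votes = VoteFilter(max_votes_per_ip=2)
results = [
    votes.accept("198.51.100.7", "Firefox", "en"),
    votes.accept("198.51.100.7", "Firefox", "en"),  # exact repeat
    votes.accept("198.51.100.7", "Chrome", "en"),
    votes.accept("198.51.100.7", "Safari", "en"),   # over the per-IP cap
    votes.accept("203.0.113.9", "Firefox", "en"),
]
print(results)
```

In line with the advice above, the page shown to the voter can be identical whether `accept` returned `True` or `False`, so that a cheater gets no feedback about which attempts were discarded.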

How to defend against users with Multiple Accounts?

We have a service where we literally give away free money.
Naturally said service is ripe for abuse. To defend against this we do the following:
log ip address
use unique email addresses (only 1 acct/email addy)
collect more info like st. address, phone number, etc.
use signup captcha
BHOs (I've seen poker rooms use these)
Now, let's get real here -- NONE of this will stop a determined user.
Obviously ip addresses can be changed via a proxy (which could be blacklisted via akismet) but change anyways if the user has a dynamic ip or if more than one user is behind a NAT'd network (can we say almost everyone?)
I can sign up for thousands of unique email addresses each hour -- this is no defense.
I can put in fake information taken from lists for street addresses and phone numbers.
I can buy captchas from captcha solving services (1k for $5).
BHOs seem effective only for downloadable software; this is a website
What are some other ways to prevent multiple users from abusing the service? How do all the PPC people control click fraud?
I know we could actually call the person but I don't think we are trying to do that anytime soon.
Thanks,
It's pretty difficult to generate lots of fake phone numbers that can send and receive SMS messages. SMS verification could go a long way towards cutting down on fraud. Of course, it also limits you to giving away free money to cell phone owners.
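A sketch of the bookkeeping behind SMS verification, assuming one account per phone number. Actually sending the message needs an SMS gateway, which is deliberately left out here; the class and method names are hypothetical, and `start` returns the code only so the example can run without a gateway.

```python
import hmac
import secrets


def new_sms_code(digits=6):
    """Generate a random numeric code from a cryptographically secure source."""
    return "".join(secrets.choice("0123456789") for _ in range(digits))


class PhoneVerifier:
    """Track pending codes and refuse phone numbers that were already used."""

    def __init__(self):
        self.pending = {}      # phone number -> code awaiting confirmation
        self.verified = set()  # phone numbers already tied to an account

    def start(self, phone):
        if phone in self.verified:
            raise ValueError("phone number already used for another account")
        code = new_sms_code()
        self.pending[phone] = code
        return code  # a real system would send this by SMS, not return it

    def confirm(self, phone, submitted):
        expected = self.pending.get(phone)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        if not hmac.compare_digest(expected.encode(), submitted.encode()):
            return False
        del self.pending[phone]
        self.verified.add(phone)
        return True


verifier = PhoneVerifier()
code = verifier.start("+15550100")
print(verifier.confirm("+15550100", code))
```

A production version would also expire codes after a few minutes and limit the number of attempts per number.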
I think the only way is to bind your users' accounts to 'real world' information, like a passport number, for instance. Of course, you'll need to make sure that information is securely stored and find some way to validate it.
Re: signing up for new email accounts...
A user doesn't even need to do that. Please feel free to send your mail to brian_s@mailinator.com, or feydr.asks.a.question@spamherelots.com, or stackoverflow@safetymail.info, or my_arbitrary_username@zippymail.info. I haven't registered any of those email addresses, but all of them will work.
Those domains are owned by ManyBrain, and they (and probably others as well) set the domain to accept mail for any user name. ManyBrain in particular then makes the inboxes for those addresses publicly accessible without any registration (stripping everything but text from the email and deleting old mail). Check it out: admin@mailinator.com's email inbox!
Others have mentioned ways to try and keep user identities unique. This is just one more reason to not trust email addresses.
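One partial countermeasure is to reject sign-ups from known catch-all domains. The sketch below uses only the four domains named above as its blocklist; a real list would be far longer and needs constant updating, so this is an illustration rather than a defense to rely on.

```python
# Domains taken from the examples above; a real blocklist would be much larger.
DISPOSABLE_DOMAINS = frozenset(
    {"mailinator.com", "spamherelots.com", "safetymail.info", "zippymail.info"}
)


def is_disposable(email, blocklist=DISPOSABLE_DOMAINS):
    """Return True if the address belongs to a blocklisted domain or subdomain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return domain in blocklist or any(
        domain.endswith("." + blocked) for blocked in blocklist
    )


print(is_disposable("brian_s@mailinator.com"))
print(is_disposable("someone@example.com"))
```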
First, I suppose (hope) that you don't literally give away free money but rather give it to use your service or something like that.
That matters as there is a big difference between users trying to just get free money from you they can spend on buying expensive cars vs only spending on your service which would be much more limited.
Obviously many more users will try to fool the system in the former case than in the latter.
Why does it matter? Because it is all about the balance between your control and your users' annoyance. I see many answers concentrating on the control part, so let's go through annoyance, shall we?
Log IP address. What if I am the next guy on the computer in say internet shop and the guy before me already used that IP? The other guy left your hot page that I now see but I am screwed because the IP is blocked. Yes, I can go to another computer but it is annoyance and I may have other things to do.
Collecting physical addresses. For what??? Are you going to visit me? Or start sending me spam letters? Let me guess, more often than not you get addresses with misprints at best and fake ones at worst. In fact, it is much less hassle for me to give you a fake address and not deal with whatever spam letters I'd have to recycle in an environment-friendly way. :)
Collecting phone numbers. Again, why should I trust your site? This is a real story. I gave my phone number to an obscure site, then later I started receiving occasional messages full of nonsense like "hit the fly", which I simply deleted. Only later and by accident did I discover that I was actually charged 2 euros to receive each of those messages!!! Do I want to get those hassles? Obviously not! So no, buddy, sorry to disappoint but I will not give your site my phone number unless your company is called Facebook or Google. :)
Use signup captcha. I love that :). So what are we trying to achieve here? Will the user who is determined to abuse your service have problems typing in a couple of captchas? I doubt it. But what about the "good user"? Are you aware how annoying captchas are for many users??? What about users with impaired vision? But even without that, most captchas are so bad that they make you feel like you have impaired vision! The best advice I can give: if you care about user experience, avoid captchas like the plague! If you have any doubts, do your online research first!
See here more discussion about control vs annoyance and here some more thoughts about being user-friendly.
You have to bind their information to something that is 'real world', as Rubens says. Of course, you also need to be able to verify this information (I can just make up passport numbers all day if you don't check to make sure they're correct).
How do you deliver the money? Perhaps you can index this off the paypal account, mailing address, or whatever you're sending the money to?
Sometimes the only way to prevent people abusing a system is to not have the system in the first place.
If you're doing what you say you're doing, "giving away money to people", then surprise surprise, there will be tons of people with more time available to try to find ways to game the system than you will have to fix it.
I guess it will never be possible to have an identification system which identifies fake identities that is:
cheap to run (I think it's called "operational cost"?)
cheap to implement (ideally one time cost - how do you call that?)
has no Type-I/Type-II errors
is scalable
But I think you could prevent users from having too many (to say a quite random number: more than 50) accounts.
You might combine the following approaches:
IP address: can be bypassed with VPN
CAPTCHA: can be bypassed with human farms (see this article, for example - although they claim that their test can't be that easily passed to other humans, I doubt this is true)
Ability-based identification: can be faked when you know what is stored and how exactly the identification works by randomly (but with a given distribution) acting (example: brainauth.com)
Real-world interaction: This might be the best one, but I guess it is expensive and not many users will accept it. Also, for some users/countries it might not be possible. (example: Postident in Germany, where the Post wants to see your identity card. I guess this can only be faked on a massive scale by a government.)
Other sites/resources: This basically transforms the problem for other sites. You can use services, where it is not allowed/uncommon/expensive to have much more than 1 account
Email
Phone number: e.g. by using SMS, see Multi-factor authentication
Bank account: PayPal; transfer not much money or ask them to transfer a random (small) amount to you (which you will send back).
Social based
When you take the social graph (vertices are people, edges are connections), you will expect some distribution. You know that you are a single human and you know some other people. So you have a "network of trust" (in quotes, because I think this term might be used in other contexts as well). Now you might not trust people / networks who interact heavily with your service, but are either isolated (no connections) or who connect one large group with another large group ("articulation points"). You also might not trust fast-growing, heavily interacting, new, isolated graphs.
When a user provides content that is liked by many other users (who you trust), this might be an indicator that there is a real human creating it.
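The "isolated graph" part of the social approach can be sketched as a reachability check: start from a few accounts you already trust and flag everything not connected to them. The graph format (an undirected adjacency dict) and the function names are assumptions; detecting articulation points or unusual growth would need more than this.

```python
from collections import deque


def reachable_from(graph, seeds):
    """Breadth-first search: every account connected to at least one seed."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def untrusted_accounts(graph, trusted_seeds):
    """Accounts with no path to any trusted seed (isolated users or clusters)."""
    return sorted(set(graph) - reachable_from(graph, trusted_seeds))


# Hypothetical undirected social graph: account -> set of connected accounts.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
    "sock1": {"sock2"},
    "sock2": {"sock1"},
    "loner": set(),
}

flagged = untrusted_accounts(graph, ["alice"])
print(flagged)
```

Being flagged here only means "no connection to a trusted account yet"; genuine new users start out isolated too, so this is better used to delay rewards than to ban anyone.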
We had a similar issue recently on our website. It is really a hassle to solve if your business is built on a one-time or monthly recurring free-credits system.
We have been using a fraud detection solution, https://fraudradar.io, for a while, and it has helped us a lot in cleaning out most of the spam activity. It is pretty customizable, with:
IP checks
Email domain validity
Regex rules
Whitelisting options per IP, email domain etc.
Simple API to communicate through
I would suggest checking that out.

Generally a Good Idea to Always Hash Unique Identifiers in URL?

Most sites which use an auto-increment primary-key display it openly in the url.
i.e.
example.org/?id=5
This makes it very easy for anyone to spider a site and collect all the information by simply incrementing the value of id. I can understand that in some cases this is a bad thing, if permissions/authentication are not set up correctly and anyone could view anything by simply guessing the id, but is it ever a good thing?
example.org/?id=e4da3b7fbbce2345d7772b0674a318d5
Is there ever a situation where hashing the id to prevent crawling is bad practice (besides losing the time it takes to set up this functionality)? Or is this all a moot topic because by putting something on the web you accept the risk of it being stolen/mined?
Generally with web-sites you're trying to make them easy to crawl and get access to all the information so that you can get good search rankings and drive traffic to your site. Good web developers design their HTML with search engines in mind, and often also provide things like RSS feeds and site maps to make it easier to crawl content. So if you're trying to make crawling more difficult by not using sequential identifiers then (a) you aren't making it more difficult, because crawlers work by following links, not by guessing URLs, and (b) you're trying to make something more difficult that you also spend time trying to make easier, which makes no sense.
If you need security then use actual security. Use checks of the principal to authorize or deny access to resources. Obfuscating URLs is no security at all.
So I don't see any problem with using numeric identifiers, or any value in trying to obfuscate them.
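"Checks of the principal" can be as simple as comparing the resource owner with the logged-in user on every request. The sketch below uses an in-memory dict and made-up names (`ORDERS`, `get_order`, `Forbidden`) purely for illustration; with a check like this in place it no longer matters whether the id in the URL is guessable.

```python
class Forbidden(Exception):
    """Raised when the current user may not see the requested resource."""


# Hypothetical data store: sequential order id -> order record.
ORDERS = {
    5: {"owner": "alice", "item": "keyboard"},
    6: {"owner": "bob", "item": "monitor"},
}


def get_order(order_id, current_user):
    """Return the order only if it exists and belongs to current_user."""
    order = ORDERS.get(order_id)
    # Same error for "missing" and "not yours", so ids cannot be probed.
    if order is None or order["owner"] != current_user:
        raise Forbidden(f"order {order_id} is not available")
    return order


print(get_order(5, "alice")["item"])
```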
Using a hash like MD5 or SHA on the ID is not a good idea:
there is always the possibility of collisions. That is, two different IDs hash to the same value.
How are you going to unhash it back to the actual ID?
A better approach if you're set on avoiding incrementing IDs would be to use a GUID, or just a random value when you create the ID.
That said, if your application security relies on people not guessing an ID, that shows some flaws elsewhere in the system. My advice: stick to the plain and easy auto-incrementing ID and apply some proper access control.
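To see why hashing a sequential ID adds little, note that the hashed id in the question appears to be the plain MD5 of 5, so anyone can rebuild the mapping by hashing small integers. The sketch below shows that, followed by the random-identifier alternative; `new_public_id` is a hypothetical helper name, and the range of 1000 is arbitrary.

```python
import hashlib
import secrets
import uuid


def md5_of_id(n):
    """Unsalted MD5 of the decimal ID."""
    return hashlib.md5(str(n).encode("ascii")).hexdigest()


# An attacker can precompute hashes of small integers and invert the "hidden" id.
lookup = {md5_of_id(n): n for n in range(1, 1001)}
recovered = lookup["e4da3b7fbbce2345d7772b0674a318d5"]
print(recovered)


def new_public_id():
    """Random, unguessable identifier generated once and stored with the row."""
    return secrets.token_urlsafe(16)  # 16 random bytes, URL-safe text


print(new_public_id())
print(uuid.uuid4())  # a random (version 4) GUID/UUID works as well
```

Because a random value cannot be derived from the row's integer key, it has to be stored in its own indexed column and looked up directly, which also answers the "how do you unhash it" objection.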
I think hashing publicly accessible IDs is not a bad thing, but showing sequential IDs will in some cases be a bad thing. Even better, use GUIDs/UUIDs for all your IDs. You can even use sequential GUIDs in a lot of technologies, so inserts are faster (though this is not as good in a distributed environment).
Hashing or randomizing identifiers or other URL components can be a good practice when you don't want your URLs to be traversable. This is not security, but it will discourage the use (or abuse) of your server resources by crawlers, and can help you to identify when it does happen.
In general, you don't want to expose application state, such as which IDs will be allocated in the future, since it may allow an attacker to use a prediction in ways that you didn't foresee. For example, BIND's sequential transaction IDs were a security flaw.
If you do want to encourage crawling or other traversal, a more rigorous way would be to provide links, rather than by providing an implementation detail which may change in the future.
Using sequential integers as IDs can make many things cheaper on your end, and might be a reasonable tradeoff to make.
My opinion is that if something is on the web, and is served without requiring authorization, it was put with the intention that it should be publicly accessible. Actively trying to make it more difficult to access seems counter-intuitive.
Often, spidering a site is a Good Thing. If you want your information available as much as possible, you want sites like Google to gather data on your site, so that others can find it.
If you don't want people to read through your site, use authentication, and deny access to people who don't have access.
Random-looking URLs only give the impression of security, without giving the reality. If you put account information (hidden) in a URL, everyone will have access to that web spider's account.
My general rule is to use a GUID if I'm showing something that has to be displayed in a URL and also requires credentials to access or is unique to a particular user (like an order id). http://site.com/orders?id=e4da3b7fbbce2345d7772b0674a318d5
That way another user won't be able to "peek" at the next order by hacking the url. They may be denied access to someone else's order, but throwing a zillion letters and numbers at them is a pretty clear way to say "don't mess with this".
If I'm showing something that's public and not tied to a particular user, then I may use the integer key. For example, for displaying pictures, you might wish to allow your users to hack the url to see the next picture.
http://example.org/pictures?id=4, http://example.org/pictures?id=5, etc.
(I actually wouldn't do either as a simple GET parameter, I'd use mod_rewrite (or something) to make readable urls. Something like http://example.org/pictures/4 -> /pictures.php?picture_id=4, etc.)
Hashing an integer is a poor implementation of security by obscurity, so if that's the goal, a true GUID or even a "sequential" GUID (whether via NEWSEQUENTIALID() or COMB algorithm) is much better.
Either way, no one types URLs anymore, so I don't see much sense in worrying about the difference in length.

Best practices for developers in dealing with clients

Personally, I've found that when good developers deal with clients, they often get sucked into the after-sales support process and this process has been difficult to reverse, so was just interested to hear the various strategies that developers employ in maintaining a healthy, useful relationship that keeps clients using the right person at the right time.
So do you and, if so, how do you deal with clients?
Just a tip: Write down every single thing a client says to you.
Most of the projects I work on are done on time-and-materials contracts, which means: we give the customer an initial estimate of how long the project will take but bill for actual hours worked, whether over or under the estimate (I don't know why a client would agree to this, but they do). Once the project is "complete" and in production, we set up a service extension to the time-and-materials contract, creating a block of billable hours to cover after-sales support. When a client is aware that they're being billed for all contact with us, they tend to keep that contact to a minimum.
One other point: I've found that it's best to communicate with clients via email where possible. It's a much more efficient way to transfer information (assuming everyone involved can write), and it leaves a permanent record of what the client told you to do.
I'd go the opposite way from what has been said.
The client is your number one information source
Avoid intermediaries (human and technical)
Keep records (not to use them against the customer, even if that can happen, but because he pays to get what he wants)
Communicate, on your own initiative, at short regular intervals but only for a small amount of time.
Any doubt can be cleared up by asking good questions. The guy doesn't want that? Get rid of it (even if you like it better). The guy wants that? Why not; add time and money to the contract.
You must train your communication skills
Most of what has been said here before is essentially related to the fact that programmers usually have poor communication skills. So they fall into the typical traps:
customers give them bad info
they waste time
they get stressed
At the end, nobody is happy.
But with trained communication skills you will learn to direct when, how long and about what your chats will be, and so :
Make any deal quick and nice
Give confidence to the client
Understand what the client wants (not what he says he wants)
Ensure he is satisfied with the answer (even if it's nonsense to you)
Everybody will be happier: the customer will feel good and let you work in peace, while you will have the information you need to keep working. Eventually, the resulting software will be better.
Think talking to customers is boring? They think so too. And paperwork is boring as well, but you must do it, so do it well instead of looking for excuses.
This is a pain we feel as well. Once you help out a customer it is too easy for the customer to directly contact the developer later on and request support. And since we usually aim to please, and probably feel sort of responsible when the application we built for them has a problem, we too often give the customer a quick helping hand.
I think that the developers should be separated from the customers, but this requires that the company has a support/consultancy department which can fix the problem instead. They in turn should be free to contact the developer, unless it's a huge company with a mainstream application where there is less risk that the problem can be traced back to the source code.
But let me tell you, I understand how difficult this is. I've been working in our consultancy shop for many years, starting in support, and now I'm mostly managing the other consultants and developing. There are a lot of customers (like hundreds) who feel they have a personal relationship with me, and assume that they can call me directly even after years and years.
My tip is to make sure you have a good network of consultants and support workers who can help the customer for you, and have them contact you instead if they can't figure it out.
I just finished my education and am working at my first job, but here is what we do:
I communicate through a third party from the same company with a "higher rank". The third party is someone knowledgeable about the requirements the software should have, but not about software engineering. When I ask about specifications or send the customer proposals, he distills the essence of their answers and sends it to me.
I think this way of working with stuff limits the amount of bullying a customer can get away with when it comes to changing specs, expanding specs etc.
For me it's especially useful since I'm only 21 years old, and people might have trouble believing I can get things done.
best practices:
Remember the client is the one who signs the checks.
Users work for the client.
Refer any user requests to the client for approval.
Always deal with the client because they understand that everything you do will cost them money.
If the client wants after the sale support and is willing to pay for it then give it to him cheerfully.
Oh and what MusiGenesis said!
The best way is to never ever ever give your direct line to a customer. Have them go through Tech support (if it exists) first. We employ this method and it works well. The software developers are the last resort - for things that support simply can't do/don't know how to fix -- such as a DBA not knowing that the servers are instanced. But it will cut down on the "it's not connecting to the internets" type of phone calls.
You could also force all support requests to go through email/secretary. At that point, you can discern which ones are critical, and which ones can be solved with a simple 'tutorial' on how to fix the problem.
And as stated above - record EVERYTHING in an exchange with a customer. Doing so prevents the 'well he said she said' deal that customers can fall into.
Then again -- if you're getting a ton of customer support issues, you should be looking at the cause of it - whether it's a training issue, or whether the software is legitimately buggy.
In our company, every developer is also a salesman. Once I step through a customer's door, I'm in a good position to win more business.
They know me, and I have credibility because I've already delivered to them.
I have knowledge about their business.
I use my knowledge to ask questions about other parts of their business.
I plant hooks when I talk to them, in their best interests of course.
I make clear that we are not a "hit and run" company, but there to really support their business.
Maybe this is not how every company works, but I think you should use the people you have who already have a foot inside the customer's company to really work with them, win more business and tie the customer more tightly to you.
I personally think developers should never interact with clients. This is why you have the Q/A team. They get requirements, hand them to developers, discuss any issues, and schedule development progress meetings. If developers have questions, they go to the Q/A personnel responsible for the requirements and documentation. Developers are engineers, not salesmen or negotiators. They should be given an environment in which to develop stable, working code without getting distracted by customer phone calls. This is how many companies deal with customers regardless of company size. In the end, your chances of completing a project on time are higher than when your customer calls up and decides to change requirements or request a feature, which would probably mean you have to go back a couple of iterations and change something that may break everything completed past that point.
Lots and lots of communication. Communication can be as simple as checking in with your customers by stopping by at their desks (if you are co-located) or keeping in touch over the phone. The more personal the communication is (in-person beats phone call, phone call beats email, etc.), the stronger your relationship will be.
Another good conflict resolution practice I've used is keeping as much information as possible in a single, shared place. I've used a bug/feature database (JIRA), a wiki, and even a network share drive for this purpose, but the point is that neither party has exclusive lock/write access. Updates can be made together with your customers, and there is a clear, public record of the change history of your system.