What is a Geo Cookie? How do I implement one? - html

I've been tasked with using a 'Geo Cookie' to identify where in the world users are accessing the site from, and to redirect them to a site configured for their region.
I understand what a cookie is in the context of the web, but I'm not sure about a geo cookie, partly because I've never heard of one before.
Could somebody please tell me if this a real thing, or if it could be a term that was made up by people who didn't know the proper name for what they're really talking about? (this happens quite regularly, it seems). Is there any way for me to identify where the user is located?

I'd never heard of the term before, and doing a little googling confirmed my suspicion that this is not a common term in usage.
My thoughts, reading off of the first search result from that query, is that you're being asked to implement a cookie that stores the users region. This can be done with a normal cookie, and you can give it the name "GEO" - viola! A Geo Cookie.
If I were you, I'd go back to the stakeholder and find out what this requirement means.
For determining where a user is located (without asking the directly) you can use the ip of the user, and a geo-location service. See Know a good IP address Geolocation Service for more info on that.

A geo cookie is just a cookie which you store the geo data in.
This allows you to:
Avoid having to hit the geo/ip database with each request
Allow users to override the location (e.g. I know it looks like I'm in Spain, and as it happens, I am (yay holidays), but I want data for England, that is where I live, and that is where I want my results localised to).

Related

Are google's search results influenced by our data?

I have always wondered that.
For example, If I search for the term "composer" or "what is composer", it shows the php package manager. Why does it show programmer-related results? Obviously, it makes sense that it does that, since the results I get are much more relevant to me.
What if an aspiring composer googles that? What results will they get?
Another example is, if I enter the word "spring" to the search engine, it shows the spring framework, instead of, let's say, the season.
So, my question(s):
Does google actually use the data it collects to show relevant search results? (I am not talking about ads, but search results)
If yes, why doesn't incognito mode work?
How can I avoid google using other parameters, besides the very term I typed in, to affect the search results?
Yes. This is the very core of Google's business model. The same data that influences search results is also applied to ad placement (see their real-time bidding system); when you do searches, it's likely you will see ads about the same subjects fairly soon afterwards.
Incognito mode is a very limited form of anonymisation; it's really not very anonymous at all. If you visit a page in a browser that has some google-controlled element (e.g. Google Analytics, a CDN JS library, or a font), then shortly afterwards perform a google search, there will be very many points in common that allow google to match you as very likely the same person (e.g. your IP, time of day, recent similar requests, user agent string, window size, fonts available) even if it blocks cookies that would identify you explicitly. This form of fingerprinting is quite hard to avoid, though Safari is a lot better at it than Chrome. Tor provides much more robust anonymisation by normalising many fingerprintable elements, as well as hiding your IP.
That's difficult because making use of all this information will indeed lead to generally more relevant search results, so it's in Google's interests to use whatever it can (within technical and mostly legal limits). Tor will disconnect the search results from you, but it may instead provide you with results linked to whoever else might have been using the same Tor exit node as you recently, which might not be pleasant! The same would apply to using VPN services.

is it possible to check if someone is trying to vote from the same computer

From my knowledge the answer to this question is no, but i might be missing something. There are some polling sites which claim that voting from the same computer is forbidden and will result in a ban. How can they detect that? A cheater may use routable IPs, different operating systems, different browsers, proxies etc.
Cookies, and assumption that the multi-account cheater will sooner or later forget to clear cookies or use wrong browser and they will got him.
To sum up all that I could find: the answer is no. You can't be 100% sure that the user is trying to vote from the same computer. Advanced solutions to prevent this is to check the IP and, using maybe a java applet, find out the user's computer account name, and maybe set a limit to the number of votes coming from that IP anyway. Also, a human should be screening the results anyway and see how many of those IPs belong to institutions or college campus blocks, and make decisions to keep those votes or not, if you really want to be thorough. Gathering as much information about the voter as you can is useful (OS, browser, referrer, plug-ins etc.) Other tricks involve checking if the voter has visited several resources from your server first before validating his vote and not giving negative feedback when this fails to make fraud attempts less likely.

How to defend against users with Multiple Accounts?

We have a service where we literally give away free money.
Naturally said service is ripe for abuse. To defend against this we do the following:
log ip address
use unique email addresses (only 1 acct/email addy)
collect more info like st. address, phone number, etc.
use signup captcha
BHOs (I've seen poker rooms use these)
Now, let's get real here -- NONE of this will stop a determined user.
Obviously ip addresses can be changed via a proxy (which could be blacklisted via akismet) but change anyways if the user has a dynamic ip or if more than one user is behind a NAT'd network (can we say almost everyone?)
I can sign up for thousands of unique email addresses each hour -- this is no defense.
I can put in fake information taken from lists for street addresses and phone numbers.
I can buy captchas from captcha solving services (1k for $5).
bhos seem only effective for downloadable software -- this is a website
What are some other ways to prevent multiple users from abusing the service? How do all the PPC people control click fraud?
I know we could actually call the person but I don't think we are trying to do that anytime soon.
Thanks,
It's pretty difficult to generate lots of fake phone numbers that can send and receive SMS messages. SMS verification could go a long way towards cutting down on fraud. Of course, it also limits you to giving away free money to cell phone owners.
I think only way is to bind your users accounts to 'real world' information, like his/her passport number, for instance. Of course, you'll need to make sure that information is securely stored and to find some way to validate it.
Re: signing up for new email accounts...
A user doesn't even need to do that. Please feel free to send your mail to brian_s#mailinator.com, or feydr.asks.a.question#spamherelots.com, or stackoverflow#safetymail.info, or my_arbitrary_username#zippymail.info. I haven't registered any of those email addresses, but all of them will work.
Those domains are owned by ManyBrain, and they (and probably others as well) set the domain to accept any email user. ManyBrain in particular then makes the inboxes for those emails publicly accessible without any registration (stripping everything by text from the email and deleting old mail). Check it out: admin#mailinator.com's email inbox!
Others have mentioned ways to try and keep user identities unique. This is just one more reason to not trust email addresses.
First, I suppose (hope) that you don't literally give away free money but rather give it to use your service or something like that.
That matters as there is a big difference between users trying to just get free money from you they can spend on buying expensive cars vs only spending on your service which would be much more limited.
Obviously many more user will try to fool the system in the former than in the latter case.
Why it matters? Because it is all about the balance between your control vs your user annoyance. I see many answers concentrating on the control part, so let's go through annoyance, shall we?
Log IP address. What if I am the next guy on the computer in say internet shop and the guy before me already used that IP? The other guy left your hot page that I now see but I am screwed because the IP is blocked. Yes, I can go to another computer but it is annoyance and I may have other things to do.
Collecting physical Adresses. For what??? Are you going to visit me? Or start sending me spam letters? Let me guess, more often than not you get addresses with misprints at best and fake ones at worst. In fact, it is much less hassle for me to give you fake address and not dealing with whatever possible spam letters I'll have to recycle in environment-friendly way. :)
Collecting phone numbers. Again, why shall I trust your site? This is the real story. I gave my phone nr to obscure site, then later I started receiving occasional messages full of nonsense like "hit the fly". That I simply deleted. Only later and by accident to discover that I was actually charged 2 euros to receive each of those messages!!! Do I want to get those hassles? Obviously not! So no, buddy, sorry to disappoint but I will not give your site my phone number unless your company is called Facebook or Google. :)
Use signup captcha. I love that :). So what are we trying to achieve here? Will the user who is determined to abuse your service, have problems to type in a couple of captchas? I doubt it. But what about the "good user"? Are you aware how annoying captchas are for many users??? What about users with impaired vision? But even without it, most captchas are so bad that they make you feel like you have impaired vision! The best advice I can give - if you care about user experience, avoid captchas as plague! If you have any doubts, do your online research first!
See here more discussion about control vs annoyance and here some more thoughts about being user-friendly.
You have to bind their information to something that is 'real world', as Rubens says. Of course, you also need to be able to verify this information (I can just make up passport numbers all day if you don't check to make sure they're correct).
How do you deliver the money? Perhaps you can index this off the paypal account, mailing address, or whatever you're sending the money to?
Sometimes the only way to prevent people abusing a system is to not have the system in the first place.
If you're doing what you say you're doing, "giving away money to people", then surprise surprise, there will be tons of people with more time available to try to find ways to game the system than you will have to fix it.
I guess it will never be possible to have an identification system which identifies fake identities that is:
cheap to run (I think it's called "operational cost"?)
cheap to implement (ideally one time cost - how do you call that?)
has no Type-I/Type-II errors
is scalable
But I think you could prevent users from having too many (to say a quite random number: more than 50) accounts.
You might combine the following approaches:
IP address: can be bypassed with VPN
CAPTCHA: can be bypassed with human farms (see this article, for example - although they claim that their test can't be that easily passed to other humans, I doubt this is true)
Ability-based identification: can be faked when you know what is stored and how exactly the identification works by randomly (but with a given distribution) acting (example: brainauth.com)
Real-world interaction: Although this might be the best one, but I guess it is expensive and not many users will accept it. Also, for some users/countries it might not be possible. (example: Postident in Germany, where the Post wants to see your identity card. I guess this can only be faced in massive scale by the government.)
Other sites/resources: This basically transforms the problem for other sites. You can use services, where it is not allowed/uncommon/expensive to have much more than 1 account
Email
Phone number: e.g. by using SMS, see Multi-factor authentication
Bank account: PayPal; transfer not much money or ask them to transfer a random (small) amount to you (which you will send back).
Social based
When you take the social graph (vertices are people, edges are connections), you will expect some distribution. You know that you are a single human and you know some other people. So you have a "network of trust" (in quotes, because I think this might be used in other context as well). Now you might not trust people / networks how interact heavily with your service, but are either isolated (no connection) or who connect a large group with another large group ("articulation points"). You also might not trust fast growing, heavily interacting new, isolated graphs.
When a user provides content that is liked by many other users (who you trust), this might be an indicator that there is a real human creating it.
We had a similar issue recently on our website, it is really a hassle to solve this issue if you are providing a business over one time or monthly recurring free credits system.
We are using a fraud detection solution https://fraudradar.io for a while and that helped us a lot to clean out most of the spam activities. It is pretty customizable with:
IP checks
Email domain validity
Regex rules
Whitelisting options per IP, email domain etc.
Simple API to communicate through
I would suggest to check that out.

Building an index of URLs , what features to include?

I am working towards building an index of URLs. The objective is to build and store a data structure which will have key as a domain URL (eg. www.nytimes.com) and the value will be a set of features associated with that URL. I am looking for your suggestions for this set of features. For example I would like to store www.nytimes.com as following:
[www.nytimes.com: [lang:en, alexa_rank:96, content_type:news, spam_probability: 0.0001, etc..]
Why I am building this? Well the ultimate goal is to do some interesting things with this index, for example I may do clustering on this index and find interesting groups etc. I have with me a whole lot of text which was generated by whole lot URLs over a period of whole lot of time :) So data is not a problem.
Any kind of suggestions are very welcome.
Make it work first with what you've already suggested. Then start adding features suggested by everybody else.
ideas are worth nothing unless
executed.
-- http://www.codinghorror.com/blog/2010/01/cultivate-teams-not-ideas.html
I would maybe start here:
Google white papers on IR
Then also search for white papers on IR on Google maybe?
Also a few things to add to your index:
Subdomains associated with the domain
IP addresses associated with the domain
Average page speed
Links pointing at the domain in Yahoo -
e.g link:nytimes.com or search on yahoo
Number of pages on the domain - site:nytimes.com on Google
traffic nos on compete.com or google trends
whois info e.g. age of domain, length of time registered for etc.
Some other places to research - http://www.majesticseo.com/, http://www.opensearch.org/Home and http://www.seomoz.org they all have their own indexes
I'm sure theres plenty more but hopefully the IR stuff will get the cogs whirring :)

How should web sites deal with localization settings? (from “What are common UI misconceptions and annoyances?”)

I’ve chosen to take this as a question in its own right since it was generating so much debate in the comments of the original post.
It’s interesting to see that a lot of people on SO (who are developer's) just don't get localization. Here’s my take on how it should work:
In all browsers that I've looked at (and for the .NET developers out there too) when you look at a user's culture preferences it is in the following format:
language-Culture.
So we have:
en-GB - English language - UK culture
en-US - English language - US culture
en - English language - Invariant culture.
fr-FR – French language – French culture
fr-CH – French language – Swiss culture
de-CH – German language – Swiss culture
de-DE – German language – German culture
See MSDN for a complete list that the .NET framework supports.
When I go to a website it knows that I want the English language from the en part and it knows I’m interested in it being slanted to the UK (number formatting, date formatting). So when I go to google.com and it takes me to google.de (because of my IP address) that’s completely fine if google.de displays everything to me in English but completely wrong since google.de is in German. I have little control over my IP address but complete control over my language and culture settings. If you’re interested Microsoft’s new search engine (bing.com) handles things properly. Let's hope Microsoft can learn how to do search as well as Google or Google can learn to localize as well as Microsoft ;)
MSDN has another good article here for more information
So what are your recommendations for how sites should deal with localizations?
The solution here is so simple, it's annoying that dev's do anything else.
Respect the browser setting. If it says English then by god it's English.
If you absolutely must, then simply add a button at the top to pick something else. Then, and ONLY then, do you override the browser.
If you think your way is better. Stop, have someone slap you. It's not. Repeat as necessary.
Get rid of those web splash pages that ask for someone's country. Just show your normal page, based off of the browser defaults, and see item 2 above. I have yet to run into a site where it actually matters. update: a few years later and there is now a reason to do this. In 2013 the UK instituted policies surrounding cookies that website operators need to respect for sites based in that country that are serving pages to visitors from that country. So pay attention to the laws in the countries you are hosted in.
IF you happen to have a site that really is served by multiple servers across multiple countries, then you can probably detect which one of your servers is really closer to serve from. If you can't, just stop the redirecting madness and then don't try and make a determination for them.
If localization settings are available - including, but not limited to, the HTTP Accept-Language header - then websites absolutely should respect them.
The common argument against this is that "average users" aren't smart enough to find the language settings and configure them to match their own preferences, so these settings are, more often than not, incorrect (unless the user happens to be within the US).
That is the wrong solution.
If a substantial segment of the user population can't find (or can't be bothered to find) their browser's language settings, then the correct response is to make them easier to find, not for sites to ignore what they've been set to. Perhaps make language settings directly accessible from the program's top level menu instead of burying it inside an over-complicated "Preferences" dialog. Perhaps ask for language preferences the first time the program is run. Perhaps use the operating system's localization settings. Or maybe something completely different, if that's what it takes to make it near-certain that the browser will be sending correct information about the user's preferences. But don't just throw up your hands, say "it's useless and can't be fixed!", and ignore it.
Other answers have talked about letting the user choose a language or locale in their profile on the site, which is also important and absolutely should be standard, but that's just to provide a site-specific override to the user's normal settings. If the user has not overriden this on the site, though, the correct action is to default to the most-preferred available language/locale as specified in their browser settings, not to base it on geolocation of their IP address.
At one point in my career, I maintained parts of TCP/IP stack. That puts me in the somewhat rare position of knowing very well that IP addresses should not be used as anything other than Network-layer addresses. Any association between an IP address and a location is all but coincidental - it's an artifact of the way addresses are distributed, not any fundamental part of what an IP address means.
(They're also not useful as the unique identifier of a computer, but that's a different story)
I suggest leaving geolocation out of it. The HTTP standard includes a way for a browser or other user agent to include the users culture preferences with each request (and remember, it's a list of weighted preferences, not necessarily just one culture). Since the browser is closer to the user than you are, you should honor this request, at least as the default.
It's ok to then permit the user to change their preference for your site, either temporarily or permanently. It's even ok to allow the user to choose to view different content with different culture settings. A wild example would be a site that includes both political news and technical information. It's quite reasonable that someone would want the news in their "natural" language, but the technical information in English.
Finally, it's ok to have a fallback pattern. If, for instance, you have a site that services users based on their region (resellers, for instance), then it's possible that Japanese content only exists on your Asian regional sub-site. A Japanese-speaking user visiting your EMEA site might just be stuck seeing English content, which might very well be his last choice.
On the sites I create I usually follow this pattern:
Each page has a unique URL with the language in it somewhere, usually like /en/page or a different (sub)domain
If the user opens a URL with an unspecified language like /page I start to guess:
Is a cookie from a previous session is available?
If not, is Accept-Language available and can I map it to a language available on the site?
If not, if it's a possibility, can I guess by IP?
If not, default to the site's default language.
I set a cookie with the guessed language and redirect the user to a site with the appropriate URL
I put a language switch on every page, so /en/page can easily be switched to /xx/page
Cookie gets updated if the user switches to a different page
Ideally I only have to guess once and from then on use the user's cookie, or the user visits the desired page directly.
I agree, give the user the chance to override them with user preferences in your app. This is especially handy for things like timezone localization issues which you can't derive from browser settings.
I risk being considered impolite, but I think my post on this topic will have more informative answers, mostly because my post is really a question. I am sorry though that I did not find that post before.
There's a difference between smart defaults and the ability of users to override them. In big apps I've worked on, I've assumed the user's locale from browser settings, geolocation, etc. -- but always given users a way to easily switch.
I don't know how else one would do that. Not giving users a chance to correct your assumptions is deeply problematic, because you're going to get it wrong some of the time.
ADDITION:
I think your problem here is that while you can edit your locale settings, if they look basically identical to the default, there's no way for an application developer to tell if you left it as-is intentionally, or because you don't know how or why to change it.
I suggest honoring users' localization settings, except if the setting is the overwhelming default, which users may not change. For example, I believe the great majority (90+%) of users with an en-us setting geolocated in Vietnam would almost always be better served by seeing Vietnamese content, rather than US English content, as long as there's a trivial way to switch locales. On the flip side, if a user geolocated in the US has a Vietnamese setting, by all means give him or her Vietnamese content.
Is this irritating for US-English users in Vietnam? Sure. But it's also the greatest good for the greatest number, and helps ensure that average non-technical users get the best real-world experience. Until we can hold a gun to users' heads and force them to honestly declare their language/culture preferences before turning on a computer, we're going to need heuristics like this.
I have seen enough forceful bug reports from customers that when investigated turn out to be that one of there users had the browser's culture setting wrong, that we now let the customer override the browsers with a config setting. The browser's culture setting is wrong often enough that is it not very useful, it is also too hard for most end users to find or change it.