How to prevent crawlers from following links? - html

I'm building a site that will allow sellers to:
list their products on my site
have each product link back to the seller's site
be charged for each link clicked
What I need to do now is make sure that I only log actual human users following the links to the sellers' sites. If a bot is crawling the site, I shouldn't charge the sellers for that.
Is there a way for me to tell bots not to follow a certain link? I don't think it's nofollow, as that is not intended to block access to content.

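The way to tell a bot not to follow a link is precisely to add rel="nofollow" to your <a> tag. Bear in mind that it is only a hint: well-behaved crawlers honor it, but nothing forces a bot to.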
Assuming you are also logging locally before forwarding to the external URL, you could also check the user agent string.
In fact, if you are going to ask people to pay based on the number of referrals, it might be an idea to log the IP address and user agent against each paid-for click in case your stats are ever questioned.
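As a rough sketch of that idea, assuming a Python backend: the pattern list, the ClickRecord type, and the function names below are all hypothetical, and the user agent string can be spoofed, so treat this as a first filter rather than proof that a click came from a human.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical, deliberately short token list; extend it from a maintained source.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|curl|wget", re.IGNORECASE)


@dataclass
class ClickRecord:
    product_id: int
    ip: str
    user_agent: str
    clicked_at: datetime
    billable: bool


def looks_like_bot(user_agent: str) -> bool:
    """Return True for an empty user agent or one containing a known bot token."""
    return not user_agent or BOT_PATTERN.search(user_agent) is not None


def record_click(product_id: int, ip: str, user_agent: str) -> ClickRecord:
    """Build an audit record for every click; only human-looking ones are billable."""
    return ClickRecord(
        product_id=product_id,
        ip=ip,
        user_agent=user_agent,
        clicked_at=datetime.now(timezone.utc),
        billable=not looks_like_bot(user_agent),
    )
```

Keeping the non-billable clicks in the log as well means you still have the full picture if a seller ever disputes the numbers.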

You just add a [robots.txt] file, e.g. like this one.
You can find more info about [robots.txt] files on the net, e.g. in Wikipedia.
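For illustration only, here is a minimal sketch that checks what such a file would tell a compliant crawler, using Python's standard urllib.robotparser. The /out/ path for outbound seller links and the example.com host are assumptions, not something from the question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: every outbound seller link goes through /out/<id>.
ROBOTS_TXT = """\
User-agent: *
Disallow: /out/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler may fetch product pages but not the click-through URLs.
may_fetch_product = parser.can_fetch("Googlebot", "https://example.com/product/2/")
may_fetch_outbound = parser.can_fetch("Googlebot", "https://example.com/out/42")
```

Like nofollow, robots.txt only affects crawlers that choose to obey it, so it complements user agent filtering rather than replacing it.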

Typically you can identify them by the user agent string. You can find a list here; I can't say it's perfect, but it's a good base to extend: PHP/MySQL - an array filter for bots
Robots.txt is another way, more about it here

Related

How can act-on create a subdomain on our website

At my company, they have a Wordpress site. Disclaimer: I'm a new hire here.
They also use a third party service/website called "act-on". Within act-on, we can manage our campaigns, generate webforms that submit data back to act-on and generate anchor tags that link to resources that act-on hosts.
I want to be clear, we upload documents to act-on. Then, act-on gives us links that we can place on our website to these documents. When a user clicks the link on our website, they are taken to a subdomain of our website that they did not create, to view the resource.
When I talk about "act-on", I'm referring to this service:
https://www.act-on.com/
Example:
We live at websitename.com.
The anchor tag that act-on creates, links to solutions.websitename.com/acton/resourcename
We didn't create a page or subdomain "solutions.websitename.com" and don't have any pages that reflect that.
I need to know how this works because their Google Analytics doesn't seem to track page visits to this subdomain.
How has act-on created a subdomain on our website? I don't understand that process. How can act-on link to files that they host while the URL is a subdomain of our website?
Thanks,
It is very similar to another company called ReachLocal. They basically proxy all your web content, and in a lot of cases they even put up proxy phone numbers, record the calls, and transcribe them. All this is in addition to marketing services such as analytics, PPC, and the like.
A business essentially gives them this right when signing up and is told about it upfront.
It is all for the sake of keeping an orderly record of everything taking place with the company's web presence, presented in a friendly interface with graphs. It also allows employees to listen to recorded calls to "see how the employee does."
More than likely, in my experience, they were given the keys to the entire web presence, including the website, analytics, social sites, and so on, by the owner or project manager.
Unfortunately, by proxying all the websites they in turn get a lot of the Google ranking, but it can be a valuable service for some.
Bottom line: someone at your job signed up, gave them the go-ahead to perform tasks such as proxying domain names, and is in fact paying them.

Post to Node.js Server from Within HTML e-mail

I am writing a simple mailing application, however I am not yet aware of the full capabilities of HTML editing within the mailing world.
I would like to give the website administrator the choice to accept or to refuse a reservation by sending him an overview of the reservation. Below in the mail I had 2 buttons in mind, accept & refuse.
I tried using a form within the HTML e-mail but almost every mailing client blocks this out.
Is there another way to send an HTTP POST to, let's say, myserver.com/accept or myserver.com/refuse from within an e-mail without having to open an additional webpage?
If not, what is the best way to achieve such things?
This is a pretty relevant article: https://www.sitepoint.com/forms-in-email/
Basically, he concludes that support is not reliable, so you should not use forms in emails, which I agree with.
Since you say you want to give this choice to a website administrator I think you probably want some sort of authentication. So I could see it working something like this...
Send the admin an email containing two links mysite.com/reservations/:reservation_id/accept and mysite.com/reservations/:reservation_id/refuse.
Admin clicks on one of the links
The link opens in the browser, and your site (controller -> ReservationService) accepts or refuses based on the id and action in the URL.
You will have a few things to consider, such as authentication (I assume you already have this, since you have the notion of a website admin?), authorization (can this admin accept or deny the reservation?), whether the reservation exists, whether the admin has already accepted or denied it, etc.
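One way to sketch the link generation for the steps above, in Python rather than Node.js: the secret key, the https scheme, and the token query parameter are assumptions added here, not part of the approach described. The signature only makes the links hard to guess or tamper with; it does not replace the authentication and authorization checks just mentioned.

```python
import hashlib
import hmac

# Hypothetical secret; load it from configuration in a real deployment.
SECRET_KEY = b"replace-me"
# Host taken from the example links above; the https scheme is an assumption.
BASE_URL = "https://mysite.com"


def sign(reservation_id: int, action: str) -> str:
    """Return a hex HMAC-SHA256 signature over the reservation id and action."""
    message = f"{reservation_id}:{action}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def build_link(reservation_id: int, action: str) -> str:
    """Build the accept or refuse link that goes into the e-mail."""
    if action not in ("accept", "refuse"):
        raise ValueError(f"unknown action: {action}")
    token = sign(reservation_id, action)
    return f"{BASE_URL}/reservations/{reservation_id}/{action}?token={token}"


def verify(reservation_id: int, action: str, token: str) -> bool:
    """Check the token from the clicked link using a constant-time comparison."""
    return hmac.compare_digest(sign(reservation_id, action), token)
```

The server would call verify with the id, action, and token taken from the clicked URL before handing the request to the reservation logic.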

Creating an anonymous link

I was wondering how I would be able to create an anonymous link (blanking the referrer) for redirection (so they are not 100% aware of where the client came from).
So, for example, a user visits mydomain.com/product/2/ and wants to be redirected to the cheapest offer out there, othersite.com/product/aiwdkaDOW. What matters here is that 'othersite' has to see this request as a manual input (so it looks as if the client typed the URL into the address bar).
Actually I just like to create the same effect Linkonym has
Thanks in advance.
Anonymizing a link seems a little more complex to me (because you don't want the target site to know that the traffic came from you), but as I expected, there are APIs and even this GitHub project that might interest you.

Adding Tab to Page

I am trying to add the tab to a page I am admin of.
I use the url to do that -
http://www.facebook.com/dialog/pagetab?app_id=&next=.
Facebook shows a list of all the pages I am admin of. And that drop down has no specific sorting order.
Now my problem is that I have multiple pages with the same page name. They of course have different URLs. I tried changing the names of the pages, but due to the high number of likes I can't change them.
The only option I am left with is trial and error, and I have to do it for more than 30 apps.
So you understand my pain point.
Please advise on any alternative.
Thanks
Pankaj
I would recommend writing down the page ids and making some sort of system for yourself to remember (perhaps only the last few digits) which page is which.
In any case, there is a way for you to add a tab application directly to a page without ever seeing that "Add Page Tab" dialog. You can do it all through the API. This means you'll need your page's access token, so head on over to the Graph API Explorer, make sure you click the "get access token" button, and mark the manage_pages permission.
You need to query /me/accounts to get a list of all the pages you administer.
You'll see a list with the page id, name, category... I hope you will be able to identify your page more easily here. Once you have, you'll need to get the access_token for that page. Keep a record of it - we'll need it in a few minutes. You'll also need the page id.
Modify the following URL to include the parameters we got previously -
https://graph.facebook.com/PAGE_ID/tabs?app_id=TAB_APP_ID&method=post&access_token=PAGE_ACCESS_TOKEN
Navigate to that URL and if all goes well, you'll get a simple true message indicating that the action was successful.
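Since this has to be repeated for more than 30 apps, a small helper that only assembles the URL shown above may save some typing. This is a sketch in Python; the function name and the sample ids are made up, and it does not call the Graph API itself.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def build_tab_url(page_id: str, tab_app_id: str, page_access_token: str) -> str:
    """Assemble the add-tab URL from the page id, tab app id, and page access token."""
    query = urlencode(
        {
            "app_id": tab_app_id,
            "method": "post",
            "access_token": page_access_token,
        }
    )
    return f"{GRAPH_BASE}/{page_id}/tabs?{query}"


# Hypothetical ids; replace them with the values collected from /me/accounts.
urls = [build_tab_url("1111", app_id, "PAGE_ACCESS_TOKEN") for app_id in ("2222", "3333")]
```

Each resulting URL can then be opened in the browser exactly as described above.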

How do I get the text in the address field in the browser to change when the user surfs on and outside of the page?

This is somewhat of a newbie question I'm sure and I hope the community will excuse me for not knowing this (or not knowing the appropriate search terms to resolve my question).
So, this is the deal: I'm running a small webpage with a small amount of visitors. I've written the whole page in HTML and CSS myself and I host it in my private DropBox (http://dl.dropbox.com/u/3394117/Hemsida/Psykofil/Index.html).
I've bought the domain name "www.psykofil.org" from Loopia (www.loopia.se) and I've directed this domain to the index.html file referenced above.
Now, this is what I want to happen: I have three different places you can go to on the page (you choose where to go through a menu on the left). When one of these links is clicked, it takes the user to another .html file. What I would like to happen here is that this is shown in the address field, so when he or she clicks on "x", it should say www.psykofil.org/x at the top. Also, when he or she navigates away from the webpage through a hyperlink, I would like the address field to update to show the new location. Right now, no matter what the user does, it always says www.psykofil.org in the address field.
I should probably mention that my options (freely translated from Swedish) when I go to the configuration of my domain name at Loopia are the following:
DNS
Parking
Forwarding (the one I'm currently using)
Send to an external URL
(Unavailable because I don't have a web hotel with Loopia) Point to another domain in the account.
(Unavailable because I don't have a web hotel with Loopia) Own homefolder for webpage.
That's because your page is inside a <frameset>, so the address bar will never update.
You say "I've directed this domain to the index.html file referenced to above." It sounds like you've set up 'domain forwarding.' Framesets are often the 'trick' hosts use to keep the same URL - embedding the pages you're 'forwarding' to in a frameset. It's called "domain masking." See http://www.hostingmultipledomainnames.com/domainforwarding.htm for a description of how it works.
If you upload your actual html files to your site root, that should do the trick. If you're not sure how to do that and you're a new webmaster, you may want to be in touch with your web host's support. Otherwise, if you want to have that domain, but keep your files in your dropbox account, your options I believe get complicated (things like reverse proxies).
UPDATED:
Typically, when people create a website, they do three things: register a domain, buy a web hosting account, and then associate their domain with their hosting account. You've done the first part, and have found a clever way of managing the second part, but you haven't done the third part.
The process is like this:
You register your domain. I.e., you pay $10-30 a year for the exclusive right to a given domain name. Registering the domain means that when people type 'http://mysite.com' into their browser, your domain will come up. However, it's just a placeholder - there isn't any real content there. All your files and images need to be uploaded to a server in order for people to see them.
You purchase a web hosting account. Or in your case, you upload your files to a publicly-accessible server, which has the advantage of being free. You then upload all your content.
This is the part you're missing. You now need to associate your domain name with your hosting account. This typically happens without your intervention when you purchase both your domain name and your web hosting account through one company.
However, if you acquire them separately, you need to do two things:
a. Log in to your domain registrar and point the domain name to your server for your web hosting account. This is a signal to the Internet - hey, when you type in the domain name 'http://ssss.com', go to this server.
b. Log in to your web hosting account and "park" the domain at your account. This may be hard to understand at first, but basically, just telling the Internet to go to this or that server when typing in your domain name isn't very useful.
If that's all we needed to do, I could just register http://my-amazon.com and point my domain to Amazon.com. Then people could surf Amazon.com as http://my-amazon.com and I could get rich from selling this now incredibly popular domain.
But that doesn't work. In order for me to actually browse the web hosting account through my domain name, I need to "add" the domain name to my hosting account. Dropbox doesn't let you do that. It's a file-sharing system, which you've cleverly used as a web host. However, you'll never be able to log into Dropbox and park your domain there, because that's not what they do.
Summary: You can think of this process like a pass in basketball. You can throw the ball by sending the user to a server, but the server has to catch it. In order to catch the ball, the server needs to know it's coming.
Your domain registrar is 'faking' this process by adding one page to its own server, which links to "http://dl.dropbox.com/yourpage/etc/etc/Index.html". This way, your domain registrar doesn't have to worry about hosting all your content and the headaches of technical support and server space.
The downside is, you don't have a webhost that allows you to park a domain at the moment. The upside is you're saving about $60-100 per year (it might be more or less in Sweden), which is what a basic "shared" hosting account would cost.
You can decide if having distinct webpages ("http://psykofil.org/contact.html", etc.) is worth it for you, or whether you're fine for now with the very low-cost solution that isn't perfect but at least allows people to access your site. What you've come up with is actually pretty cool, but it does have some limitations.
Finally: If you do want to go ahead and buy server space so you can host your site, it will be less of a headache to buy it through Loopia, if the price and service are good. Typically, you are given the option when making the purchase of linking your account to your already-registered domain name. Then all you need to do is use an FTP program like Filezilla to upload your content to your account, and you're done.
It seems your host is "masking" the URL, meaning the actual index.html page located at "www.psykofil.org" is in fact loading your Dropbox index page into an iframe; hence your main URL does not change to reflect where the user navigates.
Solution: Upload your files to your main host and replace the default index file (the one containing the iframe) with your Dropbox index file.
I believe it's because you're using frames. Were you to simply link to the other HTML page (e.g. the About page), the address bar would update.