How can I stop Google crawling webservice URLs? - html

I am finding that GoogleBot is crawling webservice URLs which are referenced in JavaScript/AJAX code. The URL is already in robots.txt as an exclusion, but Google no longer seems to obey robots.txt when determining what to crawl - it only seems to use it to know what not to index.
Thankfully these service URLs only return data rather than performing actions, but the crawling is messing up the statistics we collect, which is highly undesirable. I cannot personally see how Google is even finding the URL of the web service unless it crawls arbitrary strings in JavaScript code (which seems unlikely?).
For some URLs this also results in me getting LOTS of Elmah error messages from the website which say:
System.InvalidOperationException: Request format is unrecognized for URL unexpectedly ending in '/GetShortlists'.
This happens because Google tries to GET the URL, while the service only supports POST.
The code it's finding the URLs in is as follows:
function GetShortlistsForUser() {
    $.ajax({
        type: "POST",
        url: "/WebService/WebService.asmx/GetShortlists",
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        success: function (data) { /*--CUT--*/ }
    });
}
So should I obfuscate the URL somehow by perhaps replacing the slashes, or is there a better way to stop these getting crawled?

(1) Try breaking up the URL in your JavaScript code, e.g.:
var breaker = "x/G";
......
url: "/WebServic" + "e/WebService." + "asm" + breaker + "etShortlists",
since Google may use a regex to determine which part of a string is a URL. (I am not sure whether this actually stops crawlers, but if it works you don't need to break it up to this extent, as it also hurts code readability.)
(2) On your server: Google's crawler identifies itself with its own user-agent string (containing "Googlebot"), so you can deny or ignore requests that carry it.
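The server-side idea in (2) can be sketched as a small request classifier. This is JavaScript for illustration only (the asker's service is ASP.NET), and the function name, crawler pattern and status codes are assumptions, not part of the original answer. Besides denying crawler user agents, it answers non-POST requests to a POST-only endpoint with 405 instead of letting the service throw the InvalidOperationException the asker reports.

```javascript
// Hypothetical guard for a POST-only endpoint such as .../GetShortlists.
// Assumption: matching on the User-Agent header is good enough here. It can be
// spoofed, so treat this as a filter for statistics, not as security.
const CRAWLER_PATTERN = /googlebot|bingbot/i;

function classifyRequest(method, userAgent = "") {
  if (CRAWLER_PATTERN.test(userAgent)) {
    // Deny (or simply skip logging for) known crawlers.
    return { status: 403, reason: "crawler" };
  }
  if (method.toUpperCase() !== "POST") {
    // Reject GET and other methods cleanly instead of throwing server-side.
    return { status: 405, reason: "method not allowed", allow: "POST" };
  }
  return { status: 200, reason: "ok" };
}

// Example user agent containing the "Googlebot" token.
const googlebotUA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

console.log(classifyRequest("GET", googlebotUA).status);    // 403
console.log(classifyRequest("GET", "Mozilla/5.0").status);  // 405
console.log(classifyRequest("POST", "Mozilla/5.0").status); // 200
```

Returning 405 with an Allow header is the conventional HTTP answer for a wrong method, so well-behaved crawlers should stop retrying rather than generating error logs.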

Related

How to read MediaWiki API JSON response

I am trying to search images on Wikimedia Commons, using MediaWiki API. Here is my requested URL with search params:
https://commons.wikimedia.org/w/api.php?action=query&list=allimages&format=json&aifrom=Dada
I succeeded in getting a response in JSON format, but I could not read it programmatically because of this error:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
Any advice?
UPDATE:
I have added one more parameter to the URL, callback=JSON_CALLBACK, which turns the response into JSONP format. Now it is also possible to use Angular's $http.jsonp() method.
Use jsonp (1) as the dataType to prevent the "No 'Access-Control-Allow-Origin' header is present on the requested resource." error. Then it works:
$.ajax({
    dataType: 'jsonp',
    url: 'https://commons.wikimedia.org/w/api.php?action=query&list=allimages&format=json&aifrom=Dada',
    success: function (json) {
        json.query.allimages.forEach(function (item) {
            $('<img/>', { src: item.url }).appendTo('#images');
        });
    }
});
Here I add each image to a #images <div> just for demonstration purposes.
demo -> http://jsfiddle.net/52g2Lazw/
1) JSONP stands for “JSON with Padding” and is a workaround for loading data from a different domain. It injects a script element into the head of the DOM, so you can access the data as if it had been loaded from your own domain, bypassing the cross-domain restriction.
If you are accessing the Wikimedia Commons API from a Wikimedia wiki, you can pass the API's origin parameter, which makes the API send CORS headers. E.g. on en.wikipedia.org, you could access the Commons API this way:
$.get('https://commons.wikimedia.org/w/api.php?' +
    $.param({
        format: 'json',
        action: 'query',
        list: 'allimages',
        aifrom: 'Dada',
        origin: location.protocol + '//' + location.host
    })
).done(function () { /*...*/ });
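For comparison, the same query can be built without jQuery. This is a sketch: the helper names are hypothetical, and it assumes, as in the answer, that origin is the requesting wiki's protocol plus host. The extraction step mirrors the json.query.allimages loop from the JSONP answer.

```javascript
// Build a Commons API query URL that includes the `origin` parameter.
// Assumption: `origin` is something like location.protocol + '//' + location.host.
function commonsAllImagesUrl(aifrom, origin) {
  const params = new URLSearchParams({
    format: "json",
    action: "query",
    list: "allimages",
    aifrom: aifrom,
    origin: origin,
  });
  return "https://commons.wikimedia.org/w/api.php?" + params.toString();
}

// Pull the image URLs out of a response shaped like json.query.allimages.
function imageUrls(json) {
  return json.query.allimages.map((item) => item.url);
}

const url = commonsAllImagesUrl("Dada", "https://en.wikipedia.org");
console.log(url);

// Hypothetical usage in a browser on a Wikimedia wiki:
// fetch(url).then((r) => r.json()).then(imageUrls).then(console.log);
```

URLSearchParams takes care of percent-encoding the origin value, which is what $.param does in the jQuery version.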
It is generally safer to use JSON (a pure data format) than to load and execute a JavaScript (JSONP) file that could, in theory, do evil to your visitors. Instead of using JSONP, I would probably set up a proxy server for this purpose; a simple web search for "set up a proxy json" should turn up plenty of useful results.

How To Tell When A JSON Request Is Received In WordPress?

I am using a JSON API plugin for WordPress to work with the site's content in a PhoneGap application I'm building.
However, due to the complexity of some of the content on the site (caused by shortcodes outputting graphs, sliders, etc.), that content isn't suitable for display in the mobile app. I need to remove the shortcodes from the JSON output.
I have found that I can hook into the the_content filter in WordPress and use remove_shortcode to take out the necessary shortcodes. The problem is that I can only do this when I access the JSON URL via my browser.
For example, I may use http://example.com?json=1 to return recent posts. If I type this in my URL bar, I can parse the URL, determine that json=1 is there and strip the shortcodes.
However, when I make an Ajax (JSONP) request from my mobile application, the server doesn't appear to be able to check the URL for the json parameter, so my shortcodes are not stripped. I believe I can't pass any custom headers either, because of the nature of JSONP requests.
Has anyone got any ideas as to how I can figure out when a JSON request from my mobile application is received, so that I can then remove the shortcodes?
Something like
if(is_json()){
//remove shortcodes
}
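The check behind the hypothetical is_json() amounts to looking for the json query parameter. Below is a JavaScript sketch of that test (in WordPress it would be done in PHP on the incoming request); the function name and the example callback and timestamp values are made up. It also illustrates that a jQuery JSONP request keeps the original query string and only appends parameters such as callback and, by default, a cache-busting "_", so the json parameter should still be visible to the server.

```javascript
// Does the request URL carry a `json` query parameter?
function isJsonRequest(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  return params.has("json");
}

// URL typed into the browser vs. a JSONP request URL of the kind jQuery builds.
// The callback name and "_" timestamp below are made-up example values.
const browserUrl = "http://www.example.com/?json=1";
const jsonpUrl =
  "http://www.example.com/?json=1&callback=jQuery1234_5678&_=1400000000000";

console.log(isJsonRequest(browserUrl));                 // true
console.log(isJsonRequest(jsonpUrl));                   // true
console.log(isJsonRequest("http://www.example.com/")); // false
```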
And before it's brought up: I have asked this on the WordPress Stack Exchange, but to no avail.
Update:
Here is the code I use for the ajax request from the mobile app
$.ajax({
    url: "http://www.example.com/?json=1",
    dataType: "jsonp",
    async: true,
    success: function (result) {
        app.populate(result);
    },
    error: function (request, error) {
        alert('Network error has occurred please try again!');
    }
});
Prompted by one of the comments, I found what I needed in the JSON-API plugin files.
If you look in json-api/models/post.php, there's a function set_content_value() which shows where the plugin pulls in the content. You can modify it there as needed; in my case I used it to remove certain shortcodes with the WordPress remove_shortcode() function.
Can't you just use the remove_shortcode function anytime your plugin serves content to a client?
Could you also give us the name/URL of your plugin?
Maybe a bit of code wouldn't hurt either. Would you mind giving us your PhoneGap application's API request code snippet?
Thanks.

Calling .Net web service from jQuery+Ajax

I'm trying to call a homemade VB.NET web service using jQuery+Ajax and I'm struggling with the specifics.
Here's a small function exposed as a web method:
<WebMethod()> <ScriptMethod(ResponseFormat:=ResponseFormat.Xml, UseHttpGet:=True)> _
Public Function GetAllVotes() As XmlDocument
    Dim theVotes = getVotes()
    Dim strResult As String = theVotes.XMLSerialize
    Dim doc As XmlDocument = New XmlDocument()
    doc.LoadXml(strResult)
    Return doc
End Function
After searching the web I added the ScriptMethod attribute since I was returning XML, but feel free to tell me if I don't need it.
Then, on the client side, this is the code:
function getVotes() {
    $.support.cors = true;
    $.ajax({
        type: "GET",
        contentType: "application/json",
        url: "http://nhrd635:8008/votingmanager.asmx/GetAllVotes",
        data: {},
        dataType: "xml text jsonp",
        success: function (msg) {
            // Hide the fake progress indicator graphic.
            // Insert the returned HTML into the <div>.
            $('#myPlaceHolder').html(msg);
        },
        error: function (msg) {
            $('#myPlaceHolder').html(msg);
            // alert(msg);
        }
    });
}
I've tried many, many variations of this code: using POST or GET, changing the content type, with or without charset=utf-8, and with and without double quotes on data: {}.
I use Firebug to trace my request's output. Only when I set dataType to jsonp do I ever get a result, but in every case the code ends up in the error function, even when the status is 200 OK. And I know that setting it to jsonp is wrong, since that gets my XML treated as actual JavaScript...
I've read very useful blog entries by a guy at Encosia
(sample: http://encosia.com/3-mistakes-to-avoid-when-using-jquery-with-aspnet-ajax/),
but even following his examples I am unable to get a proper return.
Am I doing something obviously wrong? Is it the fact that I am returning an XML string rather than a JSON-serialized string?
With more perusing of Stack Overflow and the help of Dave Ward from Encosia, I've managed to solve my problem. I've thought I should post my final solution here, in case that helps someone in the future.
First of all, web services were a bad way of doing it; I went with the HttpHandler solution, as suggested by Dave Ward in reply to my original question.
Returning XML was also a poor choice, which I wasn't really aware of. I added a reference to Json.NET to my project and used it to transform my object into a JSON string.
I really wanted to stick to ".NET only" for the JSON conversion, as suggested in Dave's blog post, but I struggled to figure out how to make .NET serialize to JSON automatically as in Dave's example, so I took the easy way out with Json.NET to "get it working".
Then, in my HttpHandler, I had the response string follow the instructions on this post from StackOverflow:
https://stackoverflow.com/a/3703221/1060133
In my case, it was:
context.Response.Write(String.Format("{0}({1});", context.Request("callback"), jsonVotes))
The jQuery call also followed the instructions in the above post.
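The handler line above simply wraps the JSON in callback(...);. Here is a JavaScript sketch of the same response logic, serving JSONP when a callback is supplied and plain JSON otherwise. The function name and the callback-name check are assumptions added for illustration: echoing the raw callback parameter, as the String.Format call does, would let a caller inject arbitrary script into the response.

```javascript
// Only allow callback names that look like JavaScript identifiers, optionally
// dotted (e.g. "cb", "jQuery123_456", "app.handle").
const CALLBACK_NAME = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

function buildResponse(callback, data) {
  const json = JSON.stringify(data);
  if (callback === undefined || callback === null || callback === "") {
    // No callback: plain JSON for ordinary (same-origin) clients.
    return { contentType: "application/json", body: json };
  }
  if (!CALLBACK_NAME.test(callback)) {
    throw new Error("Invalid JSONP callback name: " + callback);
  }
  // Same shape as String.Format("{0}({1});", callback, json).
  return { contentType: "application/javascript", body: callback + "(" + json + ");" };
}

console.log(buildResponse("jQuery123_456", { votes: 3 }).body); // jQuery123_456({"votes":3});
console.log(buildResponse(undefined, { votes: 3 }).body);       // {"votes":3}
```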
Interesting note: in my setup, even a parameter-less call only worked when I sent empty data, like so:
$.getJSON('http://url/httpHandler.ashx?callback=?', {},
function(data) {
alert(data);
}
);
Best of luck...
I think most of your trouble here probably stems from the cross-origin request (even making a request across different ports on the same machine counts). That's why you were able to get a glimmer of it working when you switched to JSONP. Unfortunately, ASMX "ScriptServices" don't support JSONP, so the data your WebMethod returned wouldn't be a valid parameter to the JSONP callback function that jQuery injects.
The best solution, if at all possible, is to get the service running on the same domain as the page that's calling it. There are various solutions to the cross-origin problem, but none of them are as widely compatible/reliable as a simple XHR request to the same domain that the page making the request resides on.
If you can't do that, consider enabling CORS support for the site serving up votingmanager.asmx. That doesn't work in most versions of IE, but will allow cross-origin requests in other browsers. More info on how to do that here: http://encosia.com/using-cors-to-access-asp-net-services-across-domains/
Tangentially, I'd avoid the extra XML serialization layer if possible. If getVotes() returns something like a List, use that as your return type and let ASP.NET automatically serialize the collection as JSON and then jQuery will automatically convert that to a JavaScript array in your success handler. More info about that here: http://encosia.com/asp-net-web-services-mistake-manual-json-serialization/
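For the same-origin setup recommended above, a call to an ASMX ScriptService from plain JavaScript might be prepared as follows. This is a sketch: the endpoint path in the usage comment is hypothetical, and it assumes the method is called with POST (the ScriptService default unless UseHttpGet is set) and a JSON content type, which ASP.NET needs in order to answer with JSON instead of XML. With ASP.NET 3.5 and later the serialized result arrives wrapped in a "d" property, which unwrapAsmx handles.

```javascript
// Request options for a ScriptService method such as GetAllVotes.
// A parameter-less method still gets an empty JSON object ("{}") as its body.
function asmxRequestOptions(args = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json; charset=utf-8" },
    body: JSON.stringify(args),
  };
}

// ASP.NET 3.5+ wraps the result as {"d": ...}; older versions return it bare.
function unwrapAsmx(json) {
  return json !== null && typeof json === "object" && "d" in json ? json.d : json;
}

// Hypothetical usage from a page served by the same site:
// fetch("/votingmanager.asmx/GetAllVotes", asmxRequestOptions())
//   .then((r) => r.json())
//   .then(unwrapAsmx)
//   .then((votes) => console.log(votes)); // a JavaScript array if the method returns a List
```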

Disallowed Key Characters ajax response

I am creating a JSON object dynamically and when I send it via an ajax POST I get Disallowed Key Characters as the response. I know that my object is ok because I can create the SAME EXACT object manually and it sends fine. I tried escape() on all of my strings before adding them to the obj but that did not work either.
Am I missing something?
This is my post
$.ajax({
    type: 'POST',
    url: 'http://localhost/test',
    data: obj,
    dataType: 'JSON',
    success: function () {
        console.log('nice');
    }
});
I am using the same obj as in this post: Add to JSON without knowing its structure
Your page encoding probably doesn't match, which means the response can come back with some invalid characters, for example:
ÿ¬{"Result":"A"}
You need to ensure that the encoding you are posting matches the encoding on the other side.
I just realized that my keys have spaces in them.
Yeah... the site you are connecting to is probably running CodeIgniter.
CI has some dumb, broken input “cleaning” functionality that will deliberately refuse all form parameters with spaces in their names (or anything other than alphanumerics and .-/:).
It turns out that this error was caused by CI's Input library. On line 215 you will find the _clean_input_keys function, which uses preg_match() to disallow certain characters in your keys. So when you send JSON around and PHP receives it as an array, it can throw this error.
To fix this you can either extend the library or edit the CI core.
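Since the asker's keys contained spaces, a third option (not mentioned in the answers) is to rename the keys on the client before posting. A minimal sketch, assuming the allowed characters are the ones listed above (alphanumerics and .-/:) and that replacing every other character with "-" is acceptable to the server code; the function name is hypothetical.

```javascript
// Characters outside the allowed set. Assumption: letters, digits and . - / :
// as described above; adjust if your CodeIgniter version permits others.
const DISALLOWED = /[^A-Za-z0-9.\-\/:]/g;

// Recursively rewrite object keys, leaving values untouched.
function sanitizeKeys(value, replacement = "-") {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeKeys(item, replacement));
  }
  if (value !== null && typeof value === "object") {
    const result = {};
    for (const [key, inner] of Object.entries(value)) {
      result[key.replace(DISALLOWED, replacement)] = sanitizeKeys(inner, replacement);
    }
    return result;
  }
  return value; // strings, numbers, booleans and null pass through unchanged
}

console.log(JSON.stringify(sanitizeKeys({ "first name": "Ann", tags: [{ "tag id": 7 }] })));
// {"first-name":"Ann","tags":[{"tag-id":7}]}
```

The result would then be passed as data: sanitizeKeys(obj) in the $.ajax call. Note that two keys which differ only in disallowed characters collapse into one, and the later value wins.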

Sinatra, JavaScript Cross-Domain Requests JSON

I run a REST API built on top of Sinatra.
Now I want to write a jQuery Script that fetches data from the API.
Sinatra is told to respond with JSON:
before do
content_type :json
end
A simple route looks like:
get '/posts' do
Post.find.to_json
end
My jQuery script is a simple Ajax call:
$.ajax({
    type: 'get',
    url: 'http://api.com/posts',
    dataType: 'json',
    success: function (data) {
        // do something
    }
});
Everything works fine as long as both the API and the requesting JS run on the same IP.
I already tried playing around with JSONP for Rack, without any positive results, though. I probably just need a hint on how to proceed.
Use JSONP (JSON with padding). There is a JSONP extension for Rack.
Basically, you'll call:
$.ajax({
    type: 'get',
    url: 'http://api.com/posts',
    dataType: 'jsonp',
    success: function (data) {
        // do something
    }
});
which translates to a request like:
http://api.com/posts?callback=someJSFunc
and your server will respond, e.g.:
someJSFunc({"json":"obj"});
Of course, clients can make JSONP requests without jQuery. The trick with JSONP is that you serve scripts, which can be loaded cross-domain, rather than pure JSON, which cannot.
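A JSONP request without jQuery can be sketched as follows: register a uniquely named global function, then add a script element whose URL carries that name in the callback parameter. The function and callback names are hypothetical, and there is no error or timeout handling. The doc and win parameters default to the browser globals and exist only so the function can be exercised with stand-ins outside a browser.

```javascript
let jsonpCounter = 0;

function jsonp(url, onData, doc = globalThis.document, win = globalThis) {
  // Unique global function name that the server's response will call.
  const name = "jsonpCallback" + Date.now() + "_" + jsonpCounter++;
  const script = doc.createElement("script");

  win[name] = function (data) {
    delete win[name]; // remove the temporary global
    script.remove();  // remove the <script> element from the page
    onData(data);
  };

  // e.g. http://api.com/posts?callback=jsonpCallback..., which the server
  // answers with jsonpCallback...({"json":"obj"});
  const separator = url.includes("?") ? "&" : "?";
  script.src = url + separator + "callback=" + encodeURIComponent(name);
  doc.head.appendChild(script); // the browser loads and executes the script
  return name;
}

// Browser usage:
// jsonp("http://api.com/posts", (data) => console.log(data));
```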
Thanks for the answers so far.
You were right: JSONP solves the problem, and the JavaScript snippets work fine.
Setting up Sinatra is easy, as it is built on top of Rack.
Therefore, simply install the rack-contrib gem:
gem install rack-contrib
(or put it in your Gemfile) and add
require 'rack/contrib/jsonp'
use Rack::JSONP
to your application.
This middleware provides regular JSON to non-JSONP clients and JSONP to jQuery & co.
You might find http://github.com/shtirlic/sinatra-jsonp interesting; this extension adds the missing JSONP functionality to Sinatra.
It is also available as a gem: gem install sinatra-jsonp
Try calling:
$.getJSON("http://example.com/?callback=?",function(data) { alert(data); });
The key part of this sample is the "callback=?" construction: jQuery replaces the ? with a generated function name, so your server-side script needs to read this parameter and return valid JSONP, like this:
function({ "foo" : "bar" });
where "function" stands for the callback name that jQuery generates automatically. Read more here about jQuery and cross-domain JSONP.