I have a website with some large images. They are resized by default, but when you click on them, they open in a lightbox and become larger. I'd like to let search engines know about the original (bigger) images instead of the smaller resized images included in the source code. Is there a way to let them index the bigger images?
You can't decide what Google chooses to index (though you can make it easier for Google using thepiyush13's answer), but you can tell it what NOT to index.
Put this in your robots.txt file:
User-agent: Googlebot-Image
Disallow: /images/myImage.jpg   # list individual images here, or disallow the whole folder
Source: https://support.google.com/webmasters/answer/35308?hl=en (to be adapted for other search engines)
An image sitemap is the way to go for Google.
You can use the Google image extensions for sitemaps to give Google more information about the images available on your pages. Image sitemap information helps Google discover images that it might not otherwise find (such as images your site reaches with JavaScript code), and it allows you to indicate which images on your site you want Google to crawl and index.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
</urlset>
Source: https://support.google.com/webmasters/answer/178636?hl=en
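If you have many gallery pages, a short script can generate this sitemap for you. Below is a minimal sketch in Python, assuming a hypothetical mapping of pages to their full-size (lightbox) image URLs; all URLs and the output filename are placeholders for your own site:

    # Sketch: build an image sitemap pointing Google at the full-size lightbox
    # images. All URLs below are placeholders.
    from xml.sax.saxutils import escape

    page_images = {
        "http://example.com/gallery.html": [
            "http://example.com/images/full/photo1.jpg",
            "http://example.com/images/full/photo2.jpg",
        ],
    }

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"',
        '        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">',
    ]
    for page, images in page_images.items():
        lines.append("  <url>")
        lines.append("    <loc>%s</loc>" % escape(page))
        for img in images:
            lines.append("    <image:image>")
            lines.append("      <image:loc>%s</image:loc>" % escape(img))
            lines.append("    </image:image>")
        lines.append("  </url>")
    lines.append("</urlset>")

    with open("image-sitemap.xml", "w") as f:
        f.write("\n".join(lines))

You can then submit the generated file in Search Console or reference it from robots.txt with a Sitemap: line.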
Will this method of placing a transparency over an image prevent Google from copying it?
Or will it just find the image from the unencrypted code above (image.png)?
Am I wasting my time using this method of masking the image to fool Google?
For quick removal
Use the Remove URLs tool. You should see results fairly quickly.
For non-emergency image removal
To prevent images from your site from appearing in Google's search results, add a robots.txt file to the root of the server that blocks the image. This takes longer to remove an image from search results than the Remove URLs tool, but it is an Internet standard that applies to all search engines, and it gives you more flexible control through the use of wildcards or subpath blocking.
For example, if you want Google to exclude the dogs.jpg image that appears on your site at www.yoursite.com/images/dogs.jpg, add the following to your robots.txt file:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
The next time Google crawls your site, it will see this directive and drop your image from its search results.
To remove all the images on your site from Google's index, place the following robots.txt file in your server root:
User-agent: Googlebot-Image
Disallow: /
Additionally, Google has introduced increased flexibility to the robots.txt file standard through the use of asterisks. Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name. To remove all files of a specific file type (for example, to keep .jpg images but exclude .gif images), you'd use the following robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
By specifying Googlebot-Image as the user agent, the images will be excluded from Google Image Search. It will also prevent cropping of the image for display within Mobile Image Search, as the image will be completely removed from Google's image index. If you would like to exclude the images from all Google searches (including Google web search and Google Images), specify the user agent Googlebot.
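If you want to double-check a rule like the dogs.jpg example before deploying it, Python's standard library ships a small robots.txt parser. This is just an illustrative sketch, not part of Google's tooling, and the yoursite.com URLs are the placeholders from the example above:

    # Sanity-check sketch: Python's built-in robots.txt parser confirms that the
    # simple rule blocks the image for Googlebot-Image. Note that
    # urllib.robotparser does not understand Google's "*" and "$" wildcard
    # extensions, so only test plain path rules with it.
    from urllib.robotparser import RobotFileParser

    rules = [
        "User-agent: Googlebot-Image",
        "Disallow: /images/dogs.jpg",
    ]

    rp = RobotFileParser()
    rp.parse(rules)
    print(rp.can_fetch("Googlebot-Image", "http://www.yoursite.com/images/dogs.jpg"))  # expected: False
    print(rp.can_fetch("Googlebot-Image", "http://www.yoursite.com/images/cats.jpg"))  # expected: True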
I would like to retrieve metadata beyond the technical metadata associated with an image (so not the EXIF data, size, versions, etc.), by which I mean the "non-technical" metadata, e.g. image title (not filename), description, artist/creator (not uploader), creation date (of the work, not the file), etc.
For example, the items listed under "Summary" on this File page.
Is that information accessible via the API, or is it considered human-created page content?
This information is available on wikis that use the CommonsMetadata extension, provided users have taken care to fill it in. Commons should have this information for most images; see the extension page for API examples.
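For example, the data exposed by CommonsMetadata can be queried through the standard MediaWiki API with prop=imageinfo and iiprop=extmetadata. A minimal Python sketch (File:Example.jpg is just a placeholder title; swap in the file page you care about):

    # Sketch: fetch the CommonsMetadata fields through the MediaWiki API.
    import requests

    API = "https://commons.wikimedia.org/w/api.php"
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": "File:Example.jpg",   # placeholder title
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    for page in data["query"]["pages"].values():
        meta = page.get("imageinfo", [{}])[0].get("extmetadata", {})
        # These keys come from the templates on the file description page,
        # e.g. ObjectName = title, Artist = creator, DateTimeOriginal = creation date.
        for key in ("ObjectName", "ImageDescription", "Artist", "DateTimeOriginal"):
            if key in meta:
                print(key, "=", meta[key]["value"])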
I have a webpage that cannot be accessed through my website.
Say my website is www.google.com and the webpage that I cannot reach through the website is something like www.google.com/iamaskingthis/asdasd. This webpage appears in the Google results when I search for its content, but there is nothing on my website that links to that page.
I've already tried analyzing the page source to find its parent location, but I can't seem to find it. I want to delete that page, but since I cannot find it, I can't destroy it either.
Thank you
You can use a robots.txt file to prevent search engine bots from visiting a page, and thus prevent it from showing up in search results.
For example, you can create a robots.txt file in the root of your website and add the following content to it:
User-agent: *
Disallow: /mysecretpage.html
More details at: http://www.robotstxt.org/robotstxt.html
There is no such concept as a 'parent page'. If you mean the link by which Google found the page, please keep in mind that it need not be under your control: if I put a link to www.google.com/iamaskingthis/asdasd on a page on my website and the Googlebot crawls it, Google will know about the page.
To make it short: there is no reliable way of hiding a page on a website. Use authentication if you want to restrict access.
Google will crawl the page even if the button is gone, as it already has the page stored in its records. The only way to stop Google from crawling it is either robots.txt or simply deleting it off the server (via FTP or your hosting's control panel).
I simply want to embed a PDF file in a web site.
The best solution I've found is Google Docs Viewer (http://docs.google.com/viewer), but it does not work for IE and obviously that is not going to work for me.
Anyone have a clean, easy solution to this?
Update: I should add that one of the benefits of embedding the PDF file the Google viewer way is that as the PDF file I link to gets updated (and it could, without notice to me), my site would automatically be showing the current version of that PDF file (provided the full pathname doesn't change, which it does not). For this reason, converting the file to an image is not preferred.
Well, since you obviously don't want to force someone to download the bloated, insecure PDF plugin, why not let them use the bloated, insecure Flash player?
http://flexpaper.devaldi.com/
But really it is just as simple as:
<iframe src="path/to/pdf" width="500" height="700"></iframe>
If you do stick with the embedded PDF option, Byron is right, although embedded PDF files don't look so great on a webpage. Anyway, be sure to be strict about the coding. Hence:
<iframe src="path/to/pdf" width=500 height = 700></iframe>
should be
<iframe src="path/to/pdf" width="500" height="700"></iframe>
Small alteration.
Updated answer for HTML5:
<object data="filename.pdf" type="application/pdf">
    Your browser does not support PDFs. <a href="filename.pdf">Click here to download the file.</a>
</object>
You can read about it here:
http://www.w3schools.com/tags/tag_object.asp
How long is the PDF file? Can't you convert it to a very long image and display that in a div with a scrollbar?
Probably the best approach is to use the PDF.JS library. It's a pure HTML5/JavaScript renderer for PDF documents without any third-party plugins.
Online demo: http://mozilla.github.com/pdf.js/web/viewer.html
GitHub: https://github.com/mozilla/pdf.js
You can also use the Google PDF viewer for this purpose. As far as I know it's not an official Google feature (am I wrong on this?), but it works very nicely and smoothly for me. You need to upload your PDF somewhere beforehand and just use its URL:
<iframe src="http://docs.google.com/gview?url=http://example.com/mypdf.pdf&embedded=true" style="width:718px; height:700px;" frameborder="0"></iframe>
I just FTP mine; I do not use Google or any other software.
You must have some need beyond a PDF file sitting in a directory; what is it?
Also, why would you convert it to an image (and reduce the PDF's resolution and clarity)?
Response to Comments
That page is one of the most unreadable things I have seen. You are not seriously thinking of posting that much information on a single page, are you? No one can read it, and I tried every magnification.
For that amount of info, you need to take an architectural approach.
Put a few controls on the front page, and feed the user a small amount of manageable info, about the area that they chose. Only.
Get the info from the source website/database and feed it into your website/database. Only needs to change when the source data changes. The whole linkage can be automated.
Then you just create nice clean pages, with a reasonable quantity of info, in a readable form, on each page.
This is a 20th-century timetable, not a 21st-century one (look at Berlin or München for one of those). You really can't just scan a dense document and serve it up as an embedded PDF.
Note that you do not need the elaborate controls of CityRail; you can have just a few to allow selection of the line and timetable.
Then produce a page that is a simple form of the CityRail page.
Or (the absolute minimum) one fully viewable, full-size PDF per web page.
Like this simple viewable PDF. That example could have been served as one PDF for page 1 and four separate PDFs for the rest, but PDFs already have basic navigation, so I have used that feature and produced one five-pager instead. Make sure you find and use the blue glass buttons and follow the navigation hints at the top left and bottom of each page.
Besides demonstrating the PDFs and navigation, look at the folder: the files are all PDFs.
Back to the original question. Now you can embed PDFs, but if you do, please do not mess with them. All the controls you have on the linked page are redundant; any browser facilitates that even now, and will do so better in the future. E.g. in the simple viewable PDF, use your browser controls to increase/decrease magnification, move around the doc, etc.
Let's assume you finish your Google Maps page; that's the first or index page. Draw all the train lines in; when the user clicks on a train line, it takes them to either (a) a clean page produced from your db as per (1), which will look like (2), or (b) a single clean PDF in readable form as per (3). You could do the whole project just by manipulating files in directories.
A lot less work. No Google Docs; no intermediate software to constrict you or to work around. You can forget about IE and its multiple incarnations and strangulations, and any other browser and its limitations. Concentrate on the data and on getting it out there in presentable form, not on the pitiful software and its fits and starts.
Cheers
From HTML5:
<embed src="url" type="media-type" height="" width=""/>
For the media type, refer to http://www.iana.org/assignments/media-types/media-types.xhtml
Google Docs offers an undocumented feature that lets you embed PDF files and PowerPoint presentations in a web page. The files don't have to be uploaded to Google Docs, but they need to be available online.
Here's the code I used to embed the PDF file: an iframe pointing at the Google document viewer (docs.google.com/viewer) with the PDF's address in the url parameter and embedded=true, much like the gview iframe shown in the earlier answer. You should replace that URL with your own address. As I mentioned, the document viewer works for PDF and PPT files.
http://googlesystem.blogspot.com/2009/09/embeddable-google-document-viewer.html
Is there any way at all to get the meta information about a picture from a link without downloading the picture itself?
Say I have this URL to a picture, http://www.abc.com/picture.jpeg, and I want to get the meta information, such as the dimensions of the picture, without actually downloading the picture itself.
Of course I want to do this by writing a program, because there is a large number of pictures to go through.
I doubt you can get information about an image without downloading it. For example, when you visit a website and it has an image on it, the browser only knows the dimensions of the image after it has downloaded it. This is especially true if you want more advanced metadata such as the time the picture was taken, ISO, exposure, etc. The URL itself carries no information, except perhaps what you can glean from its parameters, e.g. http://www.abc.com/picture.jpeg?x=100.
Sorry :/.
You might want to look into downloading a thumbnail of the image, or perhaps there is a way to download only the image header and EXIF metadata rather than the pixel data, which would cut down on download time/costs but still get you the metadata you want. I have no expertise in that subject, though.
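One way to approximate that, if the server honours HTTP Range requests, is to download only the first chunk of the file and try to parse the header from it. A rough Python sketch using the requests and Pillow libraries; the 64 KB limit is an arbitrary guess, and servers that ignore Range will simply send the whole file:

    # Sketch: fetch only the first bytes of the image and try to read its
    # dimensions from the header. Not guaranteed for every image or server.
    import io
    import requests
    from PIL import Image  # Pillow

    def probe_image_size(url, byte_limit=64 * 1024):
        resp = requests.get(url, headers={"Range": "bytes=0-%d" % (byte_limit - 1)}, timeout=10)
        resp.raise_for_status()
        try:
            return Image.open(io.BytesIO(resp.content)).size  # (width, height)
        except Exception:
            return None  # the fetched range did not contain a readable header

    print(probe_image_size("http://www.abc.com/picture.jpeg"))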
If all you have is a URL, then all you have is a URL. There's no magic incantation that will extract more data from it than there is. In other words: no, you'll have to download the image.
If you have control over the server serving the image, you could make an HTTP HEAD request and have the server evaluate the image and output meta information about it in the HTTP headers, essentially creating a custom protocol for this purpose. That's a lot of ifs, though, and it really depends on what you want to do.
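To illustrate that custom-protocol idea from the client side, a HEAD request only needs to read a couple of extra response headers. A small sketch; the X-Image-Width / X-Image-Height header names are hypothetical and would have to be produced by a server you control:

    # Sketch: read standard and (hypothetical) custom headers from a HEAD request.
    import requests

    resp = requests.head("http://www.abc.com/picture.jpeg", timeout=10)
    print(resp.headers.get("Content-Type"))     # standard header, e.g. image/jpeg
    print(resp.headers.get("Content-Length"))   # standard header: file size in bytes
    print(resp.headers.get("X-Image-Width"))    # hypothetical custom header
    print(resp.headers.get("X-Image-Height"))   # hypothetical custom header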