How do I provide info to Google about interesting/important pages on my website?

For an example of what I mean, search on Google for "Last.fm". The first result will be www.last.fm, and 8 additional links are listed: "Listen", "Log in", "Music", "Download", "Charts", "Sign up", "Jazz music", and "Users". I looked around in their HTML but couldn't figure out where this information was supplied to Google.
Any help? Thanks :)

You can try Google Webmaster Tools, and provide Google with a sitemap of your site's structure.

Write semantic markup.
Google works out the important links (the "sitelinks") from that; sites aren't told explicitly.
Google's documentation explains the process.

In your XML sitemap you can specify a priority for each page.
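A minimal sitemap entry with a priority value might look like this (the URLs are illustrative; priority ranges from 0.0 to 1.0 and is only relative to your own pages):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <priority>1.0</priority>
      </url>
      <url>
        <loc>https://www.example.com/charts</loc>
        <priority>0.8</priority>
      </url>
    </urlset>

Note that priority is only a hint; as the answer above says, Google decides for itself which links to surface as sitelinks.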

The above answers are all good.
You might also try nofollowing (rel="nofollow") unimportant links on your homepage or other pages. Google will then give more weight to the followed links.
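For example, on a homepage you might do something like this (the link targets are illustrative):

    <!-- followed: links you want to carry weight -->
    <a href="/charts">Charts</a>
    <a href="/download">Download</a>
    <!-- nofollowed: boilerplate links that shouldn't compete -->
    <a href="/terms" rel="nofollow">Terms of service</a>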

It used to be that you needed to be PageRank 4 or higher to get the sitelinks to show up if you were the top result (and then you could edit them via Webmaster Tools), but it seems like Google is currently changing things around. Apparently the sitelinks were not clicked enough to warrant taking up valuable space on the results page.

Use XML sitemaps. However, be warned that sitemaps must not be misused. There is a big debate on whether sitemaps are good or not.
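If you do use one, you can point crawlers at it from your robots.txt (the sitemap URL is illustrative):

    # robots.txt at the site root
    Sitemap: https://www.example.com/sitemap.xml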

I ran into this before.
What I did was submit new, accurate site pages to Google.
I also took a close look at the content, as well as the meta tags, to see whether they were accurate and descriptive. In my case I reorganized the whole content.
Most importantly, I got back on track with SEO and refreshed the content frequently. Shame on me, I had not refreshed the content for a long time.
I do not know which one did the trick, but things work pretty well now. I hope this is worthwhile for you as a reference.

Related

How to make a link URL go through another page when clicked

I'm sorry I do not know how to word that title better. I have tried searching google but my terminology isn't helping my results.
Let me explain the context. When you're on a news website or blog, on their homepage like www.homepage.co.uk/, and you click an article, it goes somewhere like this: www.homepage.co.uk/2017/article/. How do they make the 2017 appear? And why, if you remove the /article/ from the URL, does it take you to an archive of all the links in that year? I don't understand, is there a process to this?
When I click a link in my website it goes to: www.website.co.uk/link
I want to be able to have that 2017/link/ in the url so they can find the archive of that year just like on their websites?
How do I do this?
I am sorry if I am not explaining this very well.
I understand changing my filenames to "2017/article.html" might work, but I do not believe that is the correct way of doing it?
Thanks a lot for your time and suggestions!
You're asking about a couple of things. One is the taxonomy of the site. Taxonomy, if you don't know, is the "shape" of your site, or how it is organized. News sites, for instance, are usually organized by date and perhaps topic (Health and Leisure, Politics, Entertainment, etc.).

The other aspect of your question is what you might call RESTful "hacking" of URLs. One of the tenets of REST is that URLs (URIs, to be accurate) are supposed to be hackable. A news site might have /2017/10/10 to display all articles for Oct 10; remove the last "10" and you get all the articles for October so far.

If you are not using a site platform that does this for you, you will have to maintain that taxonomy yourself and manually write all the links (see the rewrite-rule sketch after the reading links below). Systems such as Drupal and Joomla, among others, will translate your taxonomy into automatically-maintained links. In editing a page on one of these platforms, you typically only refer to the system's internal name of the page (which could be a shortened version of the article's title in the above example), and the underlying engine takes care of reconstructing the URL for you (in case the page moves, or its tags/taxonomy changes).
This is a big topic, and I encourage you to do some further reading:
http://searchcontentmanagement.techtarget.com/feature/Building-a-website-taxonomy-in-eight-steps
https://www.drupal.org/docs/7/organizing-content-with-taxonomies/organizing-content-with-taxonomies
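As a rough illustration of the URL pattern discussed above, here is a minimal Apache mod_rewrite sketch; article.php and archive.php are hypothetical scripts, so adjust to whatever your stack actually uses:

    # .htaccess sketch: most specific rule first
    RewriteEngine On
    # /2017/some-article/ -> article.php?year=2017&slug=some-article
    RewriteRule ^([0-9]{4})/([a-z0-9-]+)/?$ article.php?year=$1&slug=$2 [L,QSA]
    # /2017/ -> archive.php?year=2017 (the year archive you reach by "hacking" the URL)
    RewriteRule ^([0-9]{4})/?$ archive.php?year=$1 [L]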

Always avoid using <iframe>?

Some days ago, some friends of mine told me to avoid using <iframe> for virtually anything, which of course includes Google Maps. That made me do some research and, among other things, find this thread in Quora (http://www.quora.com/Google-Maps/What-are-best-practices-and-recommendations-to-implement-Google-maps-within-an-iframe-on-a-webpage), which I think isn't conclusive, at least in my case. I've made a simple site which includes displaying a Google Map. I used an <iframe> because it is very simple and, as pointed out before, it is the option that Google offers within every map, so I guessed it was the optimal one.
My question is: is using an <iframe> always a bad solution, or is it recommended in a simple case like mine (only displaying a location map)?
Thank you all, please let me hear your thoughts on this,
João
Using an iframe is like having another page loaded in your browser, which takes resources. I think this is what the suggestion to avoid it is based on. But naturally, the solution is to avoid those who suggest that you should always avoid something. Just use it when it makes sense, and know where to stop.
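For reference, the embed Google offers is just a plain iframe along these lines; EMBED_PARAMS is a placeholder for the value the "Share or embed map" dialog generates for your location:

    <iframe src="https://www.google.com/maps/embed?pb=EMBED_PARAMS"
            width="600" height="450" style="border:0"
            loading="lazy" title="Location map"></iframe>

For a single static location map, that is usually all you need.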

Finding number of pages of a website

I want to find the number of pages of a website. Usually what I look for is a sitemap, but I just encountered a site which does not have one, so I am out of ideas for finding its total number of pages. I tried to Google the URL but that did not help much. Is there any other way to find out how many pages a website has?
Thanks in advance.
Ask Google "site:yourdomain.com"
This gives you all indexed pages.
Or use the free tool "Xenu". It crawls the whole site, but it won't find pages which have no internal links pointing to them. You can also export a sitemap with it.
I was about to suggest the same thing :) If this is a website you own, you can also add it to Google Webmaster Tools. It will show you lots of things about your site, including the number of links, pages, search terms, etc. It's very useful and free of charge.
I have found a better solution myself: go to Google Advanced Search and restrict the search results to your domain name, leaving everything else empty. It will give you the list of all pages cached by Google.
You could also try A1 Website Analyzer. But as with all link-checker software, you will have to make sure you configure it correctly to obey or not obey (whatever your needs are) robots.txt, noindex, and nofollow instructions. (A common source of confusion, in my experience.)

How to prevent search engine indexing of common utility words without JS

Google thinks my site is about "ago", "cancel", and "edit". This is because those words appear on every comment. Why does mine seem to be the only site suffering from this problem?
In fact, if you search Google for the words ago, cancel, and edit (without quotes), my site comes up in the #2 spot (right below Stack Overflow in spot #1). Even if Google is not smart enough to filter out these words, shouldn't sites like reddit, which also use these same text buttons, show up before mine?
http://www.google.com/#hl=en&q=ago+cancel+edit
Sites like reddit do not use JavaScript to cloak keywords, so they have found a non-JS solution. What gives?
Thank you.
EDIT: The website is popstrip.com and you can see the comment code at the bottom of any comic page.
This is from the Google support forum, so it looks like Google doesn't think it's possible without JavaScript or an iframe.
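If you wanted to try the iframe route mentioned there, the sketch would be something like this: move the repeated control words into a separate page that carries a noindex robots meta tag, then embed it (the file name and markup are purely illustrative):

    <!-- comment-controls.html: a page Google is told not to index -->
    <!DOCTYPE html>
    <html>
    <head><meta name="robots" content="noindex"></head>
    <body>
      <span>2 hours ago</span> <a href="#">edit</a> <a href="#">cancel</a>
    </body>
    </html>

    <!-- on the main page, pull the controls in via an iframe -->
    <iframe src="/comment-controls.html" title="comment controls"></iframe>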

Why does the Google homepage use deprecated HTML (ie. is not valid HTML5)?

I was looking at www.google.com in Firebug and noticed something odd: the Google logo is centered using a center tag.
So I went and checked the page with the W3C validator and it found 48 errors. Now, I know there are times when you can't make a page valid, especially when we're talking about something like www.google.com and you want it to be as small as possible, but can someone please explain why they use the center tag?
I attended a panel at SXSW a few years ago called "F*ck Standards" which was all about breaking from standards when it makes sense. There was a Google engineer on the panel who talked about the Google home page failing validation, using deprecated tags, etc. He said it was all about performance. He specifically mentioned layout rendering with tables beating divs and CSS in this case. As long as the page worked for their users, they favored performance over standards.
This is a very simple page with high traffic, so it makes sense. I imagine that if you're building a complex app, this approach might not scale well.
From the horse's mouth.
Because it's just the easiest, most concise way to get the job done. <center> is deprecated, for sure, but as long as it's still supported, you're likely to keep seeing them use it.
It's shorter than margin:0 auto, quicker to parse, valid HTML 4.01 Transitional, and has no external dependencies, so fewer HTTP requests.
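For comparison (the logo markup is illustrative):

    <!-- what Google does: -->
    <center><img src="logo.gif" alt="Google"></center>

    <!-- a standards-based equivalent, a few bytes longer: -->
    <img src="logo.gif" alt="Google" style="display:block;margin:0 auto">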
Usability is NOT validity.
Google Search's biggest achievement has been to build a site which is easy to use, and can be widely used. Now, if Google achieved this with a page which does not validate, well, there's a lesson there to learn.
I think a better question to ask would be "why would Google make it validate if it works fine?" It makes no difference to the user.
There has been speculation and discussion about whether this is intentional; the basic test carried out in the first link does result in a smaller page, and even gzipped, through millions of page views it theoretically stacks up. I doubt that's the reason though: it was created, tested on many browsers at the time, it worked, and continues to work.
Google breaks validation in many ways on their home page. The very likely real reason: they are all about speed and bandwidth costs. Look at the size of the home page HTML, particularly after gzip is applied at the packet level. They are clearly trying to avoid packet fragmentation (which would mean more bandwidth) and are willing to do whatever it takes to get there (identifier shortening, quote removal, deprecated tags, whitespace removal, etc.).
If you look at this just as a validity question, fine, but they break the rules on purpose; if you don't assume that, you may of course jump to a negative conclusion. BTW, you can further optimize their pages in both positive and negative ways, but once the page fits inside the typical packet size it is somewhat pointless.
They also use other deprecated presentational tags like font and u. My guess is it makes the page quicker to load than using an external stylesheet, and allows it to work on more platforms.
It's deprecated, sure, but I think simplicity is the answer to your question.