Google not listing site, but no errors and content appears rich

We have inherited a website of around 30,000 pages (actual content), each with a unique title and rich content. Whatever we try, Google seems unwilling to index the new site, and visits have dropped by 80% (vs. the old site & domain).
The website was redeveloped recently and changed domain at that point too, which hasn't helped in working out what is happening, and this marked the drop in visitors. The old domain was registered in Oct 2005, the new one in 2009, so both have some age to them. In Webmaster Tools we recently submitted a notice that the site address had changed (7th Dec), possibly too soon to see any effect yet.
The older CMS was hard to redirect from, so we have a very large .htaccess file (1MB). Is there a limit to how large this file should sensibly be for redirects? I could perhaps code something in PHP to handle the 30,000 redirects programmatically, but the URLs on the old site were pretty strange, using comma separation and other symbols. I have used header checkers and the correct 301s are being returned.
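Roughly what I have in mind for the PHP route is something like the sketch below (redirect-map.php, its format and the domain are all placeholders):

    <?php
    // Hypothetical front controller / ErrorDocument handler.
    // Looks up the requested legacy URL in an old => new map and issues a 301;
    // anything unmapped falls through to a normal 404.
    $map = include __DIR__ . '/redirect-map.php'; // returns array('/old,page,url' => '/new/page/', ...)

    $requested = $_SERVER['REQUEST_URI'];

    if (isset($map[$requested])) {
        header('Location: http://www.example.com' . $map[$requested], true, 301);
        exit;
    }

    header('HTTP/1.1 404 Not Found');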
We also submitted a sitemap with 25,000 pages via Webmaster Tools, of which it indexed 11! There were no errors and, as I say, the page content is rich with descriptive titles.
Google can see 68,000 pages in Webmaster Tools, but only 175 are actually indexed, so the problem seems quite significant and the rest remain 'unselected'. The curve of the 'unselected' count seems to mirror the effort we have put into getting the site listed, yet the pages are still not indexed.
Site: http://bit.ly/VKYClf
(The older site was the same name but hyphenated)
I have researched a lot, but all steps so far have been fruitless and the number of pages listed hovers around the 170 mark.
Can you think of any specific steps worth taking to identify the factors preventing the site from being indexed?
Thanks in advance and happy to provide more information on anything.
EDIT: In case it helps anyone else, the website is built on WordPress but uses custom query vars to generate lots of pages on the fly. Since WP 2.9 a canonical tag is added automatically, but it wasn't playing well with those vars and was pointing at whatever post WP could find with that ID. We have now removed it and hopefully things are moving forward.
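For anyone hitting the same thing, disabling the core canonical output is a one-liner in the theme's functions.php (a sketch; adjust to your own setup):

    <?php
    // functions.php: stop WordPress core from printing its own rel="canonical"
    // tag, since it pointed at whatever post matched the query vars.
    remove_action('wp_head', 'rel_canonical');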

It's going to take a while to get fully indexed if you just submitted 25,000 URLs; it may take months.
I would recommend you log into Google Webmaster Tools and go to Health (Fetch as Google), where you can submit individual URLs to crawl; Google should respond within 24 hours. If a fetched page still does not show up in the index, you have major problems that I would not be able to assist with. You get 500 fetches a week. If you want your site indexed right away, it may be your only shot.

Related

Facebook webpage HTML code update frequency

I have recently become aware of a major change in the HTML of the Facebook web version. Is there anywhere I can find out how often the HTML code is updated?
I've tried to ask on the Meta Developer Community Forum, but I can't post it due to an 'error performing query'.
Many thanks!
Not every company has a set update frequency for their pages, and sometimes only small portions will be changed when they do update. In order to avoid and discourage scrapers, most companies won't publicly reveal it, either.
I suppose you could do your own testing and find out, but you might get a lot further by posting a question asking more specifically about what it is you're trying to achieve.
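If you do want to test it yourself, one rough approach is to fetch the page on a schedule and compare a hash of the markup between runs. A sketch only, assuming you run it from cron; dynamic tokens in the HTML will trigger false positives, so in practice you'd want to diff specific sections rather than the whole page:

    <?php
    // Fetch a page and compare its hash with the previous run to spot changes.
    // The state-file name is arbitrary.
    $url   = 'https://www.facebook.com/';
    $state = __DIR__ . '/last-hash.txt';

    $html = file_get_contents($url);
    if ($html === false) {
        exit("Fetch failed\n");
    }

    $hash = sha1($html);
    $last = is_file($state) ? trim(file_get_contents($state)) : '';

    if ($hash !== $last) {
        echo date('c') . " markup changed\n"; // log it, email yourself, etc.
        file_put_contents($state, $hash);
    }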

Disaster Recovery: Deleted Entire Knowledge Base on Web (Zendesk)

Running Chrome Version 96.0.4664.110 (Official Build) (64-bit) under Win10.
Recently, through user error, we deleted approx. 100 articles from our customer knowledge base, which is hosted by Zendesk. Not one at a time, but by deleting the entire "brand" containing the info for our product.
It's uncertain, at best, whether Zendesk will recover these valuable articles. So I've been trying to find cached copies per the link below. I don't know enough about caching to understand whether this "data" is stored on my hard drive, in the cloud, or both.
https://www.webnots.com/how-to-view-content-of-cached-page-when-it-is-not-accessible/
I've tried the above solutions (fortunately, I found some URLs for the deleted articles) but to no avail; in fact, options 6 and 7 did not work at all. And it would be nearly impossible to determine the article IDs (which are part of the article URLs) for 100 pages.
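For the article-ID problem, one possible angle is asking the Internet Archive's CDX API to list every URL it has captured under the knowledge-base path. A rough sketch (the subdomain is a placeholder, and /hc/en-us/articles/ is the typical Zendesk Guide path, so adjust as needed):

    <?php
    // List every article URL the Wayback Machine has captured under the
    // knowledge-base path, so the IDs don't have to be guessed one by one.
    $api = 'http://web.archive.org/cdx/search/cdx'
         . '?url=ourcompany.zendesk.com/hc/en-us/articles/'
         . '&matchType=prefix&output=json&fl=original,timestamp'
         . '&filter=statuscode:200&collapse=urlkey';

    $rows = json_decode(file_get_contents($api), true);
    if (!is_array($rows) || count($rows) < 2) {
        exit("No captures found\n");
    }
    array_shift($rows); // first row holds the field names

    foreach ($rows as $row) {
        list($original, $timestamp) = $row;
        // Each snapshot can then be opened at:
        echo "https://web.archive.org/web/$timestamp/$original\n";
    }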
At this point, this process feels like a scavenger hunt: now and then I'll find a promising page or URL, but I've yet to find a complete article.
Perhaps I need to approach this problem from another angle? Can anyone offer some additional suggestions? And if I'm lucky enough to find a usable copy of an article, how do I move it from the cache into a live web page?
Many thanks,
Steve

Publishing will be delayed Broad host permissions error Chrome web store

We're trying to work out how to resolve this issue. Our extension works on an unlimited number of websites: one of its features is a time & screenshot monitor so an employer can track a freelancer's work, and another is the ability to highlight text on any site and run an Amazon search for the highlighted text, for example.
We update constantly, and any issues used to get fixed within 24-48 hours; now, with this happening, we have to wait 1-3 weeks for a review every time, according to the Google devs.
If we change the manifest to list specific websites only, then 99% of the time our extension isn't going to work on all the other sites not listed. Can we do something else?
If anyone has a solution for what we can change to make this work, I'm all ears!

Determine if list of bulk URLs are dead, live, or parked

There's a list of hundreds of URLs that need to be checked to determine whether the sites are live (someone has put up their own content, even if just a landing page), unreachable, or parked.
Unreachable is self-explanatory, but distinguishing between actual user content and a parked domain is trickier. What I mean is telling apart someone who hosts a domain through GoDaddy and uses their default landing page from a hosted site with unique content on its landing page.
Using HTTP status codes (2xx, 3xx, 4xx, etc.) isn't reliable. Does anyone know of a solution? It doesn't need to be 100% accurate in all instances, just accurate when it says it's accurate, in order to minimise manual checking.
The best solution I can come up with is seeing who the site is registered with and comparing its markup against other sites also registered there, flagging matches > 0.9 or something to that effect. This is clunky.
Are there any ready-made solutions for this problem? If not, is there a more efficient methodology?
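For reference, a crude first pass might look like the sketch below; the parked-page keyword list is only a guess and would need tuning against real examples:

    <?php
    // First-pass classifier: unreachable / parked / live.
    $urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    $parkedHints = array(
        'this domain is for sale',
        'buy this domain',
        'domain is parked',
        'parked free, courtesy of godaddy',
        'sedoparking',
    );

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => 15,
            CURLOPT_USERAGENT      => 'Mozilla/5.0 (bulk URL checker)',
        ));
        $body = curl_exec($ch);
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($body === false || $code === 0) {
            echo "$url\tunreachable\n";
            continue;
        }

        $haystack = strtolower($body);
        foreach ($parkedHints as $hint) {
            if (strpos($haystack, $hint) !== false) {
                echo "$url\tparked?\n";
                continue 2;
            }
        }

        echo "$url\tlive ($code)\n";
    }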

How can web pages be coded so that search engines assign a higher page rank to the latest version?

I frequently use Google to search for .NET documentation, and invariably, the highest ranked pages are for old versions of the .NET framework.
For example, I just did a Google search for "c# extern".
The first result was for Visual Studio 2005.
The second result was for Visual Studio .NET 2003.
I went through several pages and never did come across the Visual Studio 2010 page.
Interestingly, I tried the same search on Bing, Microsoft's own search engine, and Visual Studio 2005 was still the first hit. However, the second hit was the one I was looking for (Visual Studio 2010).
I realize that many documentation pages on MSDN have a menu at the top that allows you to switch versions, but I don't think it should be necessary to do this. There should be an HTML way to convince search engines that two pages are very similar, and one is newer/more relevant than the other.
Is there anything that can be done in HTML to force a documentation page for a more recent version to get a higher page rank than an essentially equivalent page for an older version?
You can't tell Google what page is preferred
(That's basically the answer to your question)
If someone googles c# extern, they will get the most relevant pages as calculated by Google's algorithms. The results differ from user to user and by location, but are determined above all by how links all over the internet point to the pages. You cannot change this with on-page optimization.
The canonical addresses mentioned by Wander Nauta are not supposed to be used in this manner. We use canonical addresses when we wish to tell Google or any other bot that two or more pages are the same. This is not what you were asking for: it would remove the older versions from the index entirely in favor of the page given as the canonical address.
Quoted from http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.
...
The rel="canonical" attribute should be used only to specify the preferred version of many pages with identical content...
To guide visitors correctly I would, as you already described, use a good web interface on the page so that they can easily find what they are looking for.
Google also offers sitelinks in your search results, which may or may not appear. I would say this is where you come closest to being able to direct visitors to the most relevant page, by your own standards, from the results page.
Quoted from https://support.google.com/webmasters/bin/answer.py?hl=en&answer=47334
...sitelinks, are meant to help users navigate your site. Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they're looking for.
In Google's Webmaster Tools you have an option to optimize these links, at least somewhat.
Quoted from Google's Webmaster Tools
Sitelinks are automatically generated links that may appear under your site's search results. ...
If you don't want a page to appear as a sitelink, you can demote it.
Update
You could theoretically specify which version something on your page is using "microdata" or similar. By doing this you have at least told the bots that there are two items on the site with the same name but different versions. I don't think this will have any effect on the order in which your pages are listed in the search results, though. But we never know what the future holds, right?
If you check out schema.org you'll see that CreativeWork has a property named "version" and SoftwareApplication has one named "softwareVersion".
Google uses microdata to create rich snippets. I haven't heard that Google uses it for anything else, but that of course does not mean it isn't so.
Google allows you to specify a canonical address for a specific resource, i.e. the version of a given page you want Google to prioritize. It's pretty easy to use.
However, hints like these are always suggestions. That is, the search engine is free to ignore them, if they support them at all.
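As a sketch (the URLs are made up), the tag just goes in the head of each older page, pointing at the version you would prefer to rank:

    <?php
    // Sketch of a page template for an older docs version; the URL is a
    // made-up placeholder for the newest/preferred version of the same page.
    $canonical = 'https://msdn.example.com/library/vs2010/extern-keyword.aspx';
    ?>
    <head>
      <link rel="canonical" href="<?php echo htmlspecialchars($canonical); ?>">
    </head>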
For that you would need to know the actual algorithms. I'm guessing that most search engines compare how well the page matches the search but then also take into account the number of hits a site gets. So say you have a 98% match with 1,000 hits and a 96% match with 5,000 hits: the second page may still be ranked higher.
As for what you can do: search engines are "blind", so use CSS and avoid tables for layout purposes. This will allow the engine to get a better match with your content. As a workaround for old versions, you could redirect incoming traffic to the new version and then have a link to the old version on that page, essentially setting it up so that only following that link takes you to the old page.
Since at its core Google's search is based on links (the PageRank algorithm), it would certainly help if each page of the old version linked to its respective page on the new version. This might not solve the problem completely, but it would certainly help.