Author error inside Google Structured Data Test

I receive this error when analyzing a review on my website: "Missing required hCard "author""
http://www.google.com/webmasters/tools/richsnippets?q=http%3A%2F%2Fwww.gamempire.it%2Fcastlestorm-ps-vita%2Frecensione%2F131419
Why? I have set class="author" in the HTML.
This is the code of the page: https://gist.github.com/anonymous/7675765. The live page is here: http://www.gamempire.it/castlestorm-ps-vita/recensione/131419

The author vcard has to be inside the hentry element (http://microformats.org/wiki/hentry), or you have to use the include pattern (http://microformats.org/wiki/include-pattern), but I am not sure whether Google supports it.
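As a rough illustration of that nesting rule, here is a minimal sketch that builds an hAtom entry with the author hCard placed inside the hentry element. The function name renderEntry and its field names are hypothetical and are not taken from the page in question; only the class names (hentry, entry-title, author, vcard, fn, updated, entry-content) come from the microformats vocabulary.

```javascript
// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: renders one review as an hAtom entry.
// The "author vcard" element sits INSIDE the "hentry" element.
function renderEntry({ title, authorName, updated, body }) {
  return [
    '<div class="hentry">',
    `  <h1 class="entry-title">${escapeHtml(title)}</h1>`,
    '  <span class="author vcard">',
    `    <span class="fn">${escapeHtml(authorName)}</span>`,
    "  </span>",
    `  <abbr class="updated" title="${escapeHtml(updated)}">${escapeHtml(updated)}</abbr>`,
    `  <div class="entry-content">${escapeHtml(body)}</div>`,
    "</div>",
  ].join("\n");
}
```

Whether a given validator accepts the result still has to be checked against the real page; the sketch only shows where the author markup belongs relative to hentry.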

You are declaring the author several times in that document. Are you trying to use microformats, schema.org, or both?
You have markup for both, and one of the author classes is nested inside the other, so at the very least that is not correct. Fix your nested authors first. If that doesn't resolve validation, come back and share the result. The solutions are relatively easy, but it's impossible to give you the correct one without knowing more.

The solution is to create a new user with the Author role in the admin area, then open a page, go to Screen Options and enable Author. Then find the author setting on the page and update it to the new author user you created.


Algorithm to develop an article extractor

I have undertaken a project that extracts the main content from any webpage. For example, if I input the URL of a news article, it should return only the article part. The first step is getting the source code of the given URL, and there are many ways to do that. After getting the HTML of the page, I keep only the part inside the <body> tag, because the article will obviously be somewhere inside the body.
After this, I select each div element and check how much text it contains. At the end I pick the div with the most text inside it.
The other way I am considering is, for each <p> element, to check its parent. At the end, I pick the div that has the most <p> elements as direct children. To understand it better, see this tree: Tree of an HTML
I know these methods are basic, which is why I am asking this question. I would like the community's suggestions. What approaches do you all use?
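The two heuristics described above can be sketched roughly as follows. This is only an illustration under assumptions: the node shape ({ tag, id, children } for elements, { text } for text nodes) is hypothetical, and a real HTML parser would hand you its own tree type.

```javascript
// Hypothetical node shape: elements are { tag, id, children: [...] },
// text nodes are { text: "..." }.

// Total length of all text found under a node.
function textLength(node) {
  if (typeof node.text === "string") return node.text.trim().length;
  return (node.children || []).reduce((sum, child) => sum + textLength(child), 0);
}

// Collect every element with the given tag name, depth-first.
function findAll(node, tag, found = []) {
  if (node.tag === tag) found.push(node);
  for (const child of node.children || []) findAll(child, tag, found);
  return found;
}

// Return the item with the highest score (the first one wins on ties).
function maxBy(items, score) {
  let best = null;
  let bestScore = -Infinity;
  for (const item of items) {
    const s = score(item);
    if (s > bestScore) {
      best = item;
      bestScore = s;
    }
  }
  return best;
}

// Heuristic 1: the div with the most text inside it.
function divWithMostText(root) {
  return maxBy(findAll(root, "div"), textLength);
}

// Heuristic 2: the div with the most <p> elements as direct children.
function divWithMostParagraphs(root) {
  const directParagraphs = (div) =>
    (div.children || []).filter((child) => child.tag === "p").length;
  return maxBy(findAll(root, "div"), directParagraphs);
}

const samplePage = {
  tag: "body",
  children: [
    { tag: "div", id: "wrapper", children: [
      { tag: "div", id: "nav", children: [{ text: "Home News Sport" }] },
      { tag: "div", id: "story", children: [
        { tag: "p", children: [{ text: "First paragraph of the article." }] },
        { tag: "p", children: [{ text: "Second paragraph of the article." }] },
      ] },
    ] },
  ],
};

console.log(divWithMostText(samplePage).id);       // "wrapper"
console.log(divWithMostParagraphs(samplePage).id); // "story"
```

The sample shows one weakness of the first heuristic: an outer wrapper div contains all the text of the divs nested in it, so it always scores at least as high as the article block itself. Counting direct <p> children does not have that problem.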
I like the idea of implementing your own news crawler.
A few suggestions:
Check the source (right click > Inspect in Chrome) of some popular sites (e.g. The New York Times) and look for the common element names, ids or classes they use to identify the different blocks in the HTML, for instance divs with 'story' or 'story-body' ids.
I would go with the word count, but also use a dictionary of common phrases that are likely to appear in a news article.
I would search for the block between 'header' and 'footer', excluding the comments section and advertisements (again, by checking the id or class names).
Start crawling from the main page, which will probably link to the sub pages or articles. Once you have a reference (e.g. a headline or article name), it will help you navigate within the sub page itself.
In any case, I suggest working with the Java jsoup library. It will make your life easier; use it with its jQuery-like selectors.
Good luck.
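As a hedged sketch of how the word count and the id/class hints from the suggestions above might be combined: the hint lists, the weights and the block shape ({ id, className, text }) below are all assumptions made for illustration, not values taken from any real site.

```javascript
// Hypothetical hint lists; in practice they would come from inspecting real sites.
const LIKELY_ARTICLE = ["story", "story-body"];
const LIKELY_NOISE = ["comment", "advert", "footer", "header"];

// Split a block's id and class attribute into lowercase names.
function blockNames(block) {
  return `${block.id || ""} ${block.className || ""}`
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean);
}

// Score a block: zero for likely noise, otherwise its word count,
// doubled when the id or class exactly matches an article hint.
function scoreBlock(block) {
  const names = blockNames(block);
  const isNoise = names.some((name) => LIKELY_NOISE.some((hint) => name.includes(hint)));
  if (isNoise) return 0;
  const wordCount = (block.text || "").split(/\s+/).filter(Boolean).length;
  const bonus = names.some((name) => LIKELY_ARTICLE.includes(name)) ? 2 : 1;
  return wordCount * bonus;
}
```

The block with the highest score would then be the candidate for the article body. The doubling is arbitrary and would need tuning against real pages.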

Replace iframe.src attribute with javascript comment holding value

I am looking for a way to replace the content of an iframe's src attribute with a dummy variant that contains the original src value but does not actually fetch anything. I am loading the HTML via Ajax, so I can change the src attribute before the code is injected into the DOM, and I don't need help with that part. What I would appreciate feedback on is what to put in the src attribute. There is a related post here discussing what can go in the src attribute, but in contrast to that post, I want to store data (namely the original src value) so that I can extract it later. It seems the alternatives are:
src="javascript:/*http://originalsrcvalue.com*/"
src="about:blank/*http://originalsrcvalue.com*/"
src="#http://originalsrcvalue.com"
I am leaning towards the last variant, which uses bookmarks. I'm looking for feedback on potential problems or cross-browser issues that might arise, or suggestions for alternative solutions.
Edit: One way of addressing the problem is to use custom attributes, and this is probably what I'll end up using in this specific case. However, I would also like feedback on ways to store data in src attributes in the fashion shown above.
You could store the actual URL in a data-your-data-name attribute and fetch it with JavaScript when you need it, either with element.getAttribute('data-your-data-name') or, if you don't care much about older versions of IE, with element.dataset.yourDataName
References:
https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Using_data_attributes
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset
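A minimal sketch of the data attribute approach, assuming you already have the iframe element in hand before it is inserted into the document. The attribute name data-original-src and the function names are hypothetical; only getAttribute, setAttribute and removeAttribute are standard DOM calls.

```javascript
// Hypothetical attribute name; any data-* name would do.
const PARKED_ATTR = "data-original-src";

// Move the real URL out of src so the iframe has nothing to fetch.
function parkSrc(iframe) {
  const src = iframe.getAttribute("src");
  if (src === null) return; // nothing to park
  iframe.setAttribute(PARKED_ATTR, src);
  iframe.removeAttribute("src");
}

// Put the stored URL back into src when the content is wanted.
function restoreSrc(iframe) {
  const src = iframe.getAttribute(PARKED_ATTR);
  if (src === null) return; // nothing was parked
  iframe.setAttribute("src", src);
  iframe.removeAttribute(PARKED_ATTR);
}
```

Keeping the URL in its own attribute means src never has to carry a fake value, so there is no need to parse it back out of a comment or fragment later.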

Relative Anchor Tags from the same page

I have a page at http://mydomain/articles/20131114 and I'd like to add an anchor to its comments section. Obviously I could use the full URL, http://mydomain/articles/20131114#Comments, but I'd also find it really useful to be able to add a relative anchor from inside the document (e.g. something like #Comments). I can't find much documentation on this, so I'm not sure whether it is possible.
Can you help?
Thanks
You are looking for a "named anchor".
From:
<a href="#comments">Go to Comments</a>
Target:
<a name="comments">Here is the comments section</a>
Please note that the name attribute on a elements is obsolete in HTML5; use an id instead. See:
HTML Anchors with 'name' or 'id'?
That means you can keep the link as it is and point it at any element with a matching id, which has to be unique within the page:
<h1 id="comments">Comments section</h1>
Yes, this is possible and always has been. This will do what you want:
<a href="#Comments">Go to comments</a>
This will take you:
either to an a element with the name Comments
or to any element with the id Comments
I agree that it's a little hard to find a specification for this. The best one is probably the specification for URIs (RFC 3986, Section 4.4, page 27):
When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI (Section 5.1), that reference is called a "same-document" reference. The most frequent examples of same-document references are relative references that are empty or include only the number sign ("#") separator followed by a fragment identifier.
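To see that resolution rule in action, here is a small sketch using the standard URL constructor, which resolves a reference against a base URL. The helper names resolveReference and isSameDocument are hypothetical; the page URL is the one from the question.

```javascript
// Resolve a (possibly relative) reference against a base URL.
function resolveReference(reference, baseUrl) {
  return new URL(reference, baseUrl).href;
}

// True when the reference points at the base document itself,
// ignoring the fragment on both sides.
function isSameDocument(reference, baseUrl) {
  const target = new URL(reference, baseUrl);
  const current = new URL(baseUrl);
  target.hash = "";
  current.hash = "";
  return target.href === current.href;
}

const page = "http://mydomain/articles/20131114";

console.log(resolveReference("#Comments", page));
// "http://mydomain/articles/20131114#Comments"
console.log(isSameDocument("#Comments", page));         // true
console.log(isSameDocument("20131115#Comments", page)); // false
```

So a link whose href is just #Comments resolves to the full URL of the current page plus the fragment, which is why the browser scrolls within the page instead of loading a new one.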

Why can't I add an HTML custom attribute to a visualforce page?

I have an <apex:inputField> control, to which I want to add a custom HTML attribute called previousValue.
The Salesforce Developer's Guide assures me that I can do this by prefixing the attribute name with html-.
So I have an element that appears thus:. I also have the docType="html-5.0" attribute in my page control.
However, in Eclipse I get an 'unsupported attribute' error. I have upgraded to the latest Force.com IDE. Can anyone tell me why this isn't working? What else do I need to do?
Thanks.
After much experimentation, the answer seems to be that the Salesforce Developer's Guide is inaccurate and the 'html-' prefix is not supported by the <apex:inputField> component. I can add it without a problem to an <apex:outputPanel> component. I don't understand why this should be so, since the whole point of using these attributes is to keep data in a relevant place and avoid complex jQuery selectors to find the data relative to the location where it is needed.

HTML rel="up" attribute?

I'm using mobile template HTML files on a phpBB forum.
I tested the HTML for errors at http://validator.w3.org/
The test results showed the following error:
Line 24, Column 66: {navlinks.FORUM_NAME}
Bad value up for attribute rel on element a: The string up is not a registered keyword or absolute URL.
Not having heard back from the author and not finding much through Google, I'm trying to understand what rel="up" does, if anything constructive.
I can't find any mention of it as an official HTML attribute value:
http://www.w3schools.com/tags/att_link_rel.asp
I'm wondering whether it's safe to just remove rel="up".
The Internet Assigned Numbers Authority (IANA) keeps a list of link relationships. The latest version is from March 21, 2013.
up: Refers to a parent document in a hierarchy of documents.
Unfortunately, even though this registry was long established, it was decided that HTML5 would not use it and would instead list the conforming link types on a wiki page.
up is listed there in a rather insane section marked "dropped without prejudice", which nobody seems to know what to do with, or how to get link types out of.
It's safe to drop it, but some browsers and browser plugins make use of it. For example, I use a Firefox plugin called "Link Widgets" to make use of the link type.
From: http://www.w3.org/MarkUp/html3/dochead.html
REL=Up
When the document forms part of a hierarchy, this link references the immediate parent of the current document.
If this is causing any specific problems or unexpected results, please post your code. Thanks.