Correct Microdata syntax for breadcrumbs NOT in a list? - html

Trying to determine the correct syntax for using Microdata inside my breadcrumbs implementation. Everything I have read seems to assume that the breadcrumbs are structured inside an ordered or unordered list. Mine is not.
<body itemscope="" itemtype="http://schema.org/WebPage">
...
<div class="breadcrumbs" itemprop="breadcrumb">
Home
<span class="delimiter"> > </span>
Parent Item
<span class="delimiter"> > </span>
<span>Child</span>
</div>
...
</body>
If I run it inside Google's tool it seems correct, but compared to their example it is missing a lot of elements and doesn't have the structure of their example BreadcrumbList.
I'm also a little confused about the correct properties for the links. Should they all have title and url properties?
I was looking at the examples at the bottom of the page here: http://schema.org/WebPage

The breadcrumb property expects one of two values:
Text
BreadcrumbList
If you provide a Text value (like you do in the example), you can’t provide data about each link. If you are fine with that, the Microdata in your example is correct (but it also contains RDFa, which doesn’t seem to make sense, at least not without further context; so if you didn’t add them intentionally, you might want to remove the property attributes).
If you want to provide data about each link, you have to provide a BreadcrumbList value.
For the Microdata, it doesn’t matter whether or not you use a list. If the example uses ol→li→a→span, you could as well use something like div→span→a→span. You just have to make sure to use the correct element type.
If you can’t add parent elements to the a elements, it’s still possible to use BreadcrumbList. But then you would have to duplicate the URL with a link element inside the a element.
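As a rough sketch of the div→span→a→span variant described above, the question's breadcrumb could carry a BreadcrumbList value like this. The example.com URLs are placeholders (the question's links are not shown), and leaving the last item without a link is an assumption about the current page, not something the question states.

```html
<body itemscope itemtype="http://schema.org/WebPage">
  <!-- The breadcrumb property now holds a BreadcrumbList item instead of plain text. -->
  <div class="breadcrumbs" itemprop="breadcrumb" itemscope itemtype="http://schema.org/BreadcrumbList">
    <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
      <!-- Placeholder URL: replace with the real link target. -->
      <a itemprop="item" href="https://example.com/">
        <span itemprop="name">Home</span>
      </a>
      <meta itemprop="position" content="1">
    </span>
    <span class="delimiter"> &gt; </span>
    <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
      <a itemprop="item" href="https://example.com/parent/">
        <span itemprop="name">Parent Item</span>
      </a>
      <meta itemprop="position" content="2">
    </span>
    <span class="delimiter"> &gt; </span>
    <!-- Assumed to be the current page, so it carries a name and position but no link. -->
    <span itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
      <span itemprop="name">Child</span>
      <meta itemprop="position" content="3">
    </span>
  </div>
</body>
```

The delimiter spans sit outside the ListItem elements, so they are not part of the structured data.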

Related

How to select <div class="ok">.....<a href="soft://an.id/">...</div> nodes?

A document has several <div class="ok"> tags. I am able to select all of them with
"//*[@class="ok"]" (I don't have to specify div, because only div tags have this class). I get a list of 6 nodes matching this.
Now, I need
either to test each node to see whether it includes the tag <a href="soft://an.id/">. This inclusion is not direct: the <div> includes a <table> with many <tr>, <td> and <span> elements, and the <a> (only one, or none) sits somewhere before the </div>;
or to directly select only the (div) nodes of class="ok" that include this <a> tag.
I have tried many things, and all of them fail, including escaping the "/" in the href test (is that required?).
I am quite familiar with regular expressions, but I must confess that I find XPath syntax even harder to understand. And the W3C reference documents are hard to follow without examples.
Any hints are welcome.
To select only the <div class="ok"> elements that contain an <a href="soft://an.id/"> descendant, you can use the following XPath locator:
"//div[@class='ok' and .//a[@href='soft://an.id/']]"
If I understand you correctly, you have an <a> nested somewhere under the div with class "ok", right?
In XPath, a single / selects only direct children of the current node. If you are looking for an <a> anywhere under the found div, you need to use //:
//div[@class="ok"]//a[@href="soft://an.id/"]
Then you need to check if it exists or not by using some kind of an assertion.
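To make the difference between the two expressions concrete, here is a small hypothetical fragment. The id attributes and the cell contents are made up so that the comments can point at specific nodes. The expressions are written with the standard @ attribute syntax.

```html
<!-- Hypothetical input: two div.ok blocks, only the first one contains the link. -->
<div class="ok" id="first">
  <table>
    <tr><td><span><a href="soft://an.id/">link</a></span></td></tr>
  </table>
</div>
<div class="ok" id="second">
  <table>
    <tr><td><span>no link here</span></td></tr>
  </table>
</div>
<!--
  //div[@class='ok' and .//a[@href='soft://an.id/']]
      selects the div with id="first" only (the div itself is the result).
  //div[@class='ok']//a[@href='soft://an.id/']
      selects the a element inside that div (the link is the result).
-->
```

No escaping of the "/" characters inside the quoted href value is needed, since they are part of a string literal.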

Terminology - The types of elements in HTML

A while ago there was a term that I remembered that described two categories of elements. I forgot the term and I want to know what that term was. The information I can remember is that the first category of elements get their values from within HTML like <p> or <a> or <ul> but there is another category of elements which get their values from "outside" of HTML like <img> or <input type="textbox">. I want to know the terminology for these types.
Edit - I've gone through Zomry, Difster and BoltClock's answers and didn't find the term. So I remembered an extra piece of information and decided to add it. The two categories are lazy opposites of each other. For example, if one is called xyz, then the other is called non-xyz.
Probably you mean replaced elements (and non-replaced, respectively)?
However, the distinction between them is not entirely unambiguous. For example, form controls were traditionally considered replaced elements, but the HTML spec currently explicitly lists them as non-replaced (introducing the "widget" term instead).
The HTML specification mentions for tags like <img> and <input> the following: Tag omission in text/html: No end tag.
Tags with an end tag are defined as: Tag omission in text/html: Neither tag is omissible.
So as far as I can find, the HTML spec does not define a technical name for this, apart from void versus normal elements, so what Watilin pointed out in the comments should be fine: standalone vs containers.
As an added side-note: HTML has a lot more HTML content categories. You can find a complete overview at the HTML spec here: https://html.spec.whatwg.org/multipage/indices.html#element-content-categories
Also interesting to read to visualize that a bit better: https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/Content_categories
Elements whose contents are defined by text and/or other elements between their start and end tags don't have a special category. Even the HTML spec just calls them normal elements for the most part in section 8.1.2.
Elements whose primary values are defined by attributes and that cannot have content between their tags are called void elements. img and input are indeed two examples of void elements. Note that void elements are not to be confused with empty elements; see the following questions for more details on that:
Are void elements and empty elements the same?
HTML "void elements" so called because without content?
<input type="text" id="someField" name="someField">
With an input selector, you can get a value from it like so (with jQuery):
$("#someField").val();
Whereas with a paragraph or a div, you don't get a value, you get the text or html.
<div id="someDiv">Blah, blah, blah</div>
You can get that with jQuery as follows:
$("#someDiv").html();
Do you see the difference?

Parsing awful HTML: How do I recognize boundaries with xpath?

This is almost going to sound like a joke, but I promise you this is real life. There is a site on the internet, one which you have all used, that does not believe in css classes. Everything is defined directly in the style tag on an element. It's horrifying.
My problem though is that it also makes the html extraordinarily difficult to parse. The structure that I've got to go on looks something like this:
<td>
<a name="<random_string>"></a>
<div style="generic-style, used by other elements">
<div style="similarly generic style">{some_stuff}</div>
</div>
<a name="<random_string>"></a>
...
</td>
Basically, I've got these a tags that form the boundaries of the reviews, and their only defining information is the random string that is their name. I don't actually care about the anchor tags, but I would like to grab the reviews between them using xpath.
I've looked into sibling queries, but they don't seem to be well suited for alternating boundaries. I also looked into the Kayessian method of xpath queries, which (aside from having an awesome name) only seems well suited to grab a particular div, rather than all divs between the anchor tags.
Any thoughts on how I could grab the divs here?
If //td/div[../a[@name]] works for you, then the following should also work:
//td[a/@name]/div
This way you don't need to go down and then back up. For a more specific selector, you may want to try the following:
//td/div[preceding-sibling::*[1][self::a/@name]][following-sibling::*[1][self::a/@name]]
This XPath selects div elements that have all of the following properties:
td/div : is a child of a <td> element
[preceding-sibling::*[1][self::a/@name]] : is directly preceded by an <a> element that has a name attribute
[following-sibling::*[1][self::a/@name]] : is directly followed by an <a> element that has a name attribute
I figured it out! It turns out that xpath will allow for relative attribute assertions. I am not sure if this behavior is desired, but it happens to work in this case! Here's the xpath:
//td/div[../a[@name]]
Nice and clean, the ../a[@name] basically just says:
Go up a level, and make sure on that level of the hierarchy there's an a element with a name attribute
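For comparison, here is how the expressions from both answers behave on a small hypothetical cell. The name values, the id attributes and the trailing <p> are invented for illustration and are not taken from the real site.

```html
<table>
  <tr>
    <td>
      <a name="r1"></a>
      <div id="review-1"><div>first review</div></div>
      <a name="r2"></a>
      <div id="review-2"><div>second review</div></div>
      <p>footer text</p>
    </td>
  </tr>
</table>
<!--
  //td/div[../a[@name]]  and  //td[a/@name]/div
      both select review-1 and review-2: every div child of a td
      that has at least one a child with a name attribute.
  //td/div[preceding-sibling::*[1][self::a/@name]][following-sibling::*[1][self::a/@name]]
      selects review-1 only: review-2 is directly followed by a p, not by a named a.
-->
```

So the stricter expression only fits if every review, including the last one, is closed by an anchor. If the last review in a cell has no trailing anchor, dropping the following-sibling predicate is one possible adjustment.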

Fragment link not working

Total newbie question, but I can't figure out what I'm doing wrong. I want to make a link that jumps down the page to a header. I believe these are called fragment links. Here is my code that's not working:
My Link
<div id="cont">
<p>Lots of content here, abbreviated in this example to save space</p>
<h2 id="Frag">Header I want to jump to</h2>
</div>
Pretty sure you need to specify the name attribute for an anchor to work, for example:
Skip to content
<div name="content" id="content"></div>
Okay, so 'pretty sure' was a euphemism for 'guess' and I thought I'd look it up, so, from the HTML 4.01 Specification we get this from section 12.2.3 Anchors with the id attribute:
The id attribute may be used to create an anchor at the start tag of
any element (including the A element). This example illustrates the use of the id attribute to position an anchor in an H2 element. The anchor is linked to via the A element.
You may read more about this in <A href="#section2">Section Two</A>.
...later in the document
<H2 id="section2">Section Two</H2>
...later in the document
<P>Please refer to <A href="#section2">Section Two</A> above for more details.
To carry on the convention of guesswork, perhaps your page isn't long enough to allow jumping to that content (that is, your page might have nowhere to jump and the content to jump to is already visible.)
Other than that, and from the same section of the spec previously linked, here is some general info on when to use what as the anchor identifier (in terms of the link itself) that could be otherwise valuable:
Use id or name? Authors should consider the following issues when
deciding whether to use id or name for an anchor name:
The id attribute can act as more than just an anchor name (e.g., style sheet selector, processing identifier, etc.).
Some older user agents don't support anchors created with the id attribute.
The name attribute allows richer anchor names (with entities).
Your code works fine in Firefox. In any case, you can also use name instead of id:
http://www.w3schools.com/tags/att_a_name.asp
If you want nicer scrolling, you can use jQuery scroll: http://api.jquery.com/scroll/
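For reference, a minimal sketch of a working fragment link built on the question's markup. The link element in the question is not shown, so href="#Frag" is an assumption about what was intended.

```html
<!-- The href is "#" followed by the target's id; the match is case-sensitive. -->
<a href="#Frag">My Link</a>
<div id="cont">
  <p>Lots of content here, abbreviated in this example to save space</p>
  <!-- Any element with a matching id can be the jump target. -->
  <h2 id="Frag">Header I want to jump to</h2>
</div>
```

With so little content the header is already on screen, so the browser has nowhere to scroll and the link may appear to do nothing.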

When using HTML5 Microdata, should the 'itemscope' and 'itemtype' always be used on the same element?

I'm trying to understand the reason behind the existence of two attributes instead of just making the element holding the 'itemtype' the one that wraps the scope for the item.
Is it valid to have the 'itemtype' attribute on one element and the 'itemscope' attribute on another? Like this:
<section itemtype="http://data-vocabulary.org/Person">
<div itemscope>
<span itemprop="name">Alonso Torres</span>
</div>
</section>
If this case is not valid, then why does the 'itemscope' attribute exist at all? Why didn't the spec make the element holding the 'itemtype' attribute the one that sets the scope? That would have made sense to me.
You're right, the itemscope attribute seems redundant. Someone else pointed this out on the W3C's HTML mailing list: http://lists.w3.org/Archives/Public/public-html-bugzilla/2011Jan/0517.html
The answer ( http://lists.w3.org/Archives/Public/public-html-bugzilla/2011Jan/0523.html ) was that:
The HTML spec editor did user-testing
of the feature earlier, and if I
recall correctly, several of the test
subjects found it much easier if there
was an explicit indicator of the
container, rather than it being
implicit due to the type.
In other words, it's better for attributes to have a single clear definition than multiple implied definitions. Not sure I agree but that's the official view.
itemscope is mandatory if itemtype is used on the same element
The example you show is invalid. The spec has been updated to include this:
The itemtype attribute must not be specified on elements that do not have an itemscope attribute specified.
Here, "must not" is to be interpreted as in RFC2119: "the definition is an absolute prohibition of the specification".
I don't believe that it is useful to place an itemtype attribute anywhere but on the same element as the itemscope attribute. The spec says:
The type for an item is given as the
value of an itemtype attribute on the
same element as the itemscope
attribute.
The reason why two attributes are needed isn't clear to me either. Semantically they serve different purposes, so for clarity of usage it may have seemed more sensible. For simple use, it's possible to create an item using itemscope without giving it a type. That means that itemscope is a boolean attribute, whereas itemtype takes a string value. It's not possible in HTML for an attribute to behave as boolean when used without a value, and as a string when used with one, so separate attributes make sense.
I know that Google did a usability study on the Microdata mark-up before it was announced, so it is likely that such questions were addressed there and that separate attributes were the preferred outcome. (Although that study also resulted in a preference for itemref being an element, not an attribute, something that was subsequently changed.)
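Putting the quoted rules together, a valid version of the question's example might look like the sketch below. It reuses the vocabulary URL and name from the question. The second block, an untyped item, is an added illustration of "itemscope without a type".

```html
<!-- Typed item: itemtype sits on the same element as itemscope. -->
<section itemscope itemtype="http://data-vocabulary.org/Person">
  <div>
    <span itemprop="name">Alonso Torres</span>
  </div>
</section>

<!-- Untyped item: itemscope alone creates an item, with no vocabulary attached. -->
<div itemscope>
  <span itemprop="name">Alonso Torres</span>
</div>
```

In the first block the intermediate <div> has no Microdata attributes, so the name property belongs to the item created on the <section>.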