What's the difference between `<seg>` and `<span>` - html

What's the difference between a <seg> in XML and <span> in HTML? Here are two passages from Bibles. The first is from the English Bible in Christodouloupoulos and Steedman's massively parallel Bible corpus:
<?xml version="1.0" ?>
<cesDoc version="4">
…
<text>
<body id="Bible" lang="en">
<div id="b.GEN" type="book">
<div id="b.GEN.1" type="chapter">
<seg id="b.GEN.1.1" type="verse">
In the beginning God created the heaven and the earth.
</seg>
<seg id="b.GEN.1.2" type="verse">
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
</seg>
…
and the other from the NIV English Bible at Bible Gateway, which is where they got most of their texts from:
<p class="chapter-1">
<span id="en-NIV-27932" class="text Rom-1-1">
<span class="chapternum">1 </span>
Paul, a servant of Christ Jesus, called to be an apostle and set apart for the gospel of God—
</span>
<span id="en-NIV-27933" class="text Rom-1-2">
<sup class="versenum">2 </sup>the gospel he promised beforehand through his prophets in the Holy Scriptures
</span>
…
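In the HTML, it seems a <span> can replace a <seg>, except that the HTML adds chapter and verse numbers as extra inline elements (<span class="chapternum">, <sup class="versenum">). Oh, and the chapters are <div>s in the XML but a <p> in the HTML. So it's not one-to-one.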
Of course, I realize that HTML and XML are different, and this is only one juxtaposition; I'm sure there are others out there. But I'm going to need to be able to display XML as HTML, and I don't want to anger the doctype gods. So, conceptually, how is <seg> different from <span> in purpose, meaning and usage?
Update: @jim-garrison says I'm going to need to read the schema to understand the XML, but I'm a neophyte at that, too. I did find some official-looking TEI documentation for <seg> that makes me think its use is a little more than arbitrary, but I have no idea how to interpret it. Should it give us a more specific answer than what Jim has already written?

The difference between XML and HTML generally is that the list of tags that can be present in XML is defined by a DTD or XML Schema, and tags represent document semantics and not presentation. So tags can be named anything. In HTML the set of tags is generally predefined, as if there was a pre-existing HTML DTD or schema, but HTML is not XML and doesn't follow all the rules of XML. While HTML was in some sense derived from the same parent as XML (SGML), and the two are superficially very similar, they are most definitely NOT the same thing.
The answer to your specific question is that the writers of the XML chose to use a tag named <seg> ("segment"?) to represent generalized strings of text, with attributes providing additional semantic information. For more details you'll need to find the DTD or XML schema that governs the content of the XML and read the documentation that goes with it.
But I'm going to need to be able to display XML as HTML, and I don't want to anger the doctype gods. So, conceptually, how is <seg> different from <span> in purpose, meaning and usage?
This is where you will use XSLT to transform the input XML into valid HTML. To figure out how to do that transformation you will need to know the full semantics of all the tags that can appear (again, go to the documentation for the DTD/Schema) and decide on a visual representation for the data. There's no one answer to "how should a <seg> be transformed?" That's up to your requirements regarding presentation. One possible transformation converts <seg> tags to <span>, but that may depend on the value of certain attributes (type="verse" vs. some other type). It might even differ depending on the output medium (desktop vs. tablet vs. phone vs. watch vs. ...?).
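The answer recommends XSLT for this. Python's standard library has no XSLT processor, so the sketch below substitutes xml.etree.ElementTree to show one possible mapping. The mapping is an assumption, not something the schema dictates. Chapter <div>s become <p>, and verse <seg>s become <span class="verse">, loosely following the Bible Gateway markup in the question. Any other seg type is skipped rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Sample input trimmed from the cesDoc excerpt in the question (header omitted).
SAMPLE = """<cesDoc version="4"><text><body id="Bible" lang="en">
<div id="b.GEN" type="book"><div id="b.GEN.1" type="chapter">
<seg id="b.GEN.1.1" type="verse">
In the beginning God created the heaven and the earth.
</seg>
<seg id="b.GEN.1.2" type="verse">
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
</seg>
</div></div></body></text></cesDoc>"""


def cesdoc_to_html(xml_text: str) -> str:
    """Map chapter <div>s to <p> and verse <seg>s to <span> (one possible presentation)."""
    root = ET.fromstring(xml_text)
    article = ET.Element("article")
    for div in root.iter("div"):
        if div.get("type") != "chapter":
            continue  # book-level divs only act as containers here
        p = ET.SubElement(article, "p", {"id": div.get("id", ""), "class": "chapter"})
        for seg in div.iter("seg"):
            if seg.get("type") != "verse":
                continue  # other segment types would need their own presentation rule
            span = ET.SubElement(p, "span", {"id": seg.get("id", ""), "class": "verse"})
            span.text = " ".join((seg.text or "").split())  # collapse source line breaks
            span.tail = " "
    return ET.tostring(article, encoding="unicode", method="html")


if __name__ == "__main__":
    print(cesdoc_to_html(SAMPLE))
```

The decision about what a <seg> becomes lives entirely in the `type` checks. A different attribute value or a different output medium would mean a different branch there, which matches the answer's point that the mapping follows from presentation requirements.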
Once you convert from XML to HTML you have left the realm of the Doctype gods and they have no interest in what you do :-) There's a whole different set of deities such as CSS-Cthulhu, Javascript-Janai'ngo (look it up), et al who will take great pleasure making your life miserable.

Related

Style guide for documentation in HTML requires spaces inside <code>...</code>

I have to maintain the bulky HTML documentation of an existing system for a client. Its style guide says that text inside a code tag should be padded with spaces, like this:
..., the element<code> STATE </code>matches datatype ...
In most cases the whole text is enclosed in <p> tags:
<p>..., the element<code> STATE </code>matches datatype ...</p>
Does anyone have an idea why I should write <code> STATE </code>, with the spaces inside the tag and none before or after it?
One explanation could be that rendering the HTML this way gives "better" spacing between normal text and code (i.e. a consistent, wider gap), presumably because a space inside the code tag is set in the code font and seems "bigger". Is that approach meaningful? Or are there arguments against this rule that I could use to convince the program director to drop it?
This sounds like a way of enforcing a style without, for whatever reason, using CSS.
There's no reason to do this other than to conform to somebody's preference (your boss or a client, presumably, in this case).
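If the rule is dropped, existing pages don't have to be fixed by hand. Below is a sketch of a hypothetical clean-up helper, not part of any tool mentioned here. It uses a regular expression to move the padding spaces from inside <code> to just outside it. It assumes <code> tags carry no attributes and are never nested, so a real migration should check that first.

```python
import re

# Matches <code>, optional leading spaces, the content, optional trailing spaces, </code>.
# Assumption: no attributes on <code> and no nested <code> elements.
CODE_RE = re.compile(r"<code>( *)(.*?)( *)</code>", re.DOTALL)


def unpad_code(html: str) -> str:
    """Move padding spaces from inside <code>...</code> to just outside it."""
    def move(m: re.Match) -> str:
        before = " " if m.group(1) else ""
        after = " " if m.group(3) else ""
        return f"{before}<code>{m.group(2)}</code>{after}"
    return CODE_RE.sub(move, html)


if __name__ == "__main__":
    print(unpad_code("<p>..., the element<code> STATE </code>matches datatype ...</p>"))
```

If the text already had a space outside the tag, the result contains two spaces. Under default white-space handling they render as one.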
To back this up, the HTML specification itself uses examples of <code> elements wrapped within <p> elements which do not follow this format:
Example 104
The following example shows how the code element can be used in a paragraph to mark up element names and computer code, including punctuation.
<p>The <code>code</code> element represents a fragment of computer code.</p>
— Example 104 within the HTML5.1 specification

<pre> or <p> with styling for code and formal language

First of all, I'm not a native English speaker, so please excuse my bad English.
In HTML, since I realized that the style="white-space:pre;" attribute makes an element behave like a <pre> element, I tend to use that attribute (optionally adding "font-family:monospace;" when I need it) instead of <pre>. To me, <pre> feels much less semantic. Using more semantic tags with proper styling seems more logical.
Currently, my rule is like this:
preformatted plain text, bunch of characters -> use <pre>.
preformatted paragraph-like things -> use <p style="white-space:pre;"> (optionally add "font-family:monospace;").
Following that rule, I've been using the <pre> tag only for ASCII art, because I don't think ASCII art is paragraph-like. However, I think a stanza ≒ a paragraph, so I use <p style="white-space:pre;"> when I mark up verse (poems, poetry, lyrics) in HTML.
My problem is that I cannot decide what to use for code. In other words, I'm unsure whether a block of code is a paragraph or not. Both <pre> and <p> with styling seem to have a point and seem logical.
Anyway.. Wikipedia says:
A paragraph is a self-contained unit of a discourse in writing dealing with a particular point or idea.
Collins Cobuild Advanced Learner's English Dictionary says:
[NOUN] A paragraph is a section of a piece of writing.
I think those definitions can partly apply to code. These examples are especially confusing:
Let's add a and b, and divide the result by 2.
result = a + b
result = result / 2
The only difference between them is that one is natural language and the other is code. The natural-language one is obviously a normal paragraph, and that makes me feel the code is partly a paragraph too, because both express the same discourse. The talk about "code is documentation" or "self-documenting code" also makes using <p> feel more right.
However, <pre> also feels logical to me. This is especially true when the code is less human-readable and more machine-like, raw and primal, like pure machine code (01010101100..) or Morse code. I would tag those with <pre>, and though I cannot say exactly why, it looks more right. Yet using <pre> for some code and <p> for high-level code feels inconsistent.
I don't think it's just a simple matter of preference ("both are fine, just pick one and be consistent"). I believe a logical answer exists, and that's what I need.
Any ideas?
Thanks in advance.

Can I slice a word with </span> for the sake of structured data?

I have this line inside a ProfessionalService itemscope:
Az <span itemprop="makesOffer">ágyi poloska irtását</span> permetezéses módszerrel végezzük.
This is in Hungarian, and the problem comes from the language itself. For search engines I would like to communicate that the offer is "ágyi poloska irtás", without the suffix "át", so it would look like this:
Az <span itemprop="makesOffer">ágyi poloska irtás</span>át permetezéses módszerrel végezzük.
Is this legal? Can I break a word with a </span> closing tag?
Sorry, I can't come up with an English example. The sentence describes how the company exterminates bed bugs; in English it would read: The <span itemprop="makesOffer">bed bug extermination</span> is done by the spraying method. But in English the problem doesn't arise.
Yes, it is valid and it can make sense to do this.
Any conforming Microdata parser will get the value "ágyi poloska irtás" for the property makesOffer.
Following the HTML5 specification, consumers would have no reason to break the word (e.g., by adding whitespace or a line break) if it contains a span element (… which does not necessarily mean that you won’t find consumers that do this nonetheless).
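For most elements, including span, the Microdata property value is the element's text content. Text after </span> is therefore simply not part of it. The sketch below illustrates this with Python's html.parser. It is deliberately simplified and is not a conforming Microdata parser. It ignores itemscope nesting and itemref. It also ignores elements such as meta, a, time or data, whose value comes from an attribute rather than from the text.

```python
from html.parser import HTMLParser


class ItempropText(HTMLParser):
    """Collect the text content of every element that has an itemprop attribute.

    Simplified on purpose: ignores itemscope/itemtype nesting, itemref, and
    elements (meta, a, img, time, data, ...) whose value comes from an attribute.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, itemprop or None)
        self.values = {}  # itemprop name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        self.stack.append((tag, prop))
        if prop:
            self.values.setdefault(prop, [])

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag (tolerates void elements).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text counts toward every open itemprop ancestor, like textContent.
        for _, prop in self.stack:
            if prop:
                self.values[prop].append(data)


def itemprop_values(html):
    parser = ItempropText()
    parser.feed(html)
    parser.close()
    return {prop: "".join(chunks) for prop, chunks in parser.values.items()}


if __name__ == "__main__":
    print(itemprop_values(
        'Az <span itemprop="makesOffer">ágyi poloska irtás</span>át permetezéses módszerrel végezzük.'
    ))
```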

How To (Semantically) Mark Up A (Theatre) Script / Play in HTML5

For obvious reasons, it's hard to search for "play" and "script" without a search engine thinking you mean "play a sound" and "JavaScript".
How can I mark up a script (as in the document one would give to actors in a play) such that it is semantically correct, and easy to style?
For example, let's take the start of Hamlet:
Hamlet
ACT I
SCENE I Elsinore. A platform before the castle.
[FRANCISCO at his post. Enter to him BERNARDO]
BERNARDO Who's there?
FRANCISCO Nay, answer me: stand, and unfold yourself.
Fairly obviously, I think, one should start with:
<h1 id="title">Hamlet</h1>
<h2 id="act-1">Act 1</h2>
<h3 id="scene-1">Scene 1</h3>
But, then I get stuck.
I've tried looking at MicroData, but Schema.org's CreativeWork[0] really doesn't contain much that would be useful in the case of a work of fiction.
Is it enough just to say:
<p class="stage-direction">FRANCISCO at his post. Enter to him BERNARDO</p>
<p id="1"><span class="character bernardo">BERNARDO</span>Who's there?</p>
<p id="2"><span class="character francisco">FRANCISCO</span>Nay, answer me: stand, and unfold yourself.</p>
Or is there a better / more sensible way of doing things?
[0] http://schema.org/CreativeWork
It seems that the idea of precisely specifying markup for dialogue has been abandoned, and the W3C now simply offers some guidelines which pretty much equate to your idea of using paragraphs and spans.
Note that the dl element, which older sources - including the spec - had formerly recommended, should now definitely not be used: "The dl element is inappropriate for marking up dialogue".
But of course all this might change next week, or month, or year…
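If you follow the paragraphs-and-spans guidance, generating the markup from data keeps every speech and stage direction consistent and gives CSS the same hooks each time. The sketch below is only an illustration. The tuple data model, the `line-N` ids and the class names are assumptions modeled on the markup in the question. The ids start with a letter so they are easier to target from CSS than the question's purely numeric ids.

```python
from html import escape

# Hypothetical data model: ("direction", text) or ("speech", speaker, text).
OPENING = [
    ("direction", "FRANCISCO at his post. Enter to him BERNARDO"),
    ("speech", "BERNARDO", "Who's there?"),
    ("speech", "FRANCISCO", "Nay, answer me: stand, and unfold yourself."),
]


def render_scene(entries) -> str:
    """Render stage directions and speeches as <p> elements with styling hooks."""
    out = []
    line_no = 0
    for entry in entries:
        if entry[0] == "direction":
            out.append(f'<p class="stage-direction">{escape(entry[1], quote=False)}</p>')
            continue
        _, speaker, text = entry
        line_no += 1  # only speeches are numbered
        cls = escape(f"character {speaker.lower()}")
        out.append(
            f'<p id="line-{line_no}"><span class="{cls}">{escape(speaker, quote=False)}</span> '
            f"{escape(text, quote=False)}</p>"
        )
    return "\n".join(out)


if __name__ == "__main__":
    print(render_scene(OPENING))
```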
Does this provide any inspiration? caesar in xml

Which HTML tags are more appropriate for money?

If you had to properly choose one HTML tag to represent a price, a money amount or an account balance, (e.g. 3/9/2012 - Income: 1.200,00 € or item #314159 - price: $ 31,99) then
which tag would you choose for the amount and why?
should the currency also be wrapped in its own tag or not?
I'd really like to avoid a generic inline element like <span class="income">1.200,00 €</span> or <span class="price">$ 31,99</span> but so far I've found no references about it.
The HTML spec for var states:
The var element represents a variable. This could be an actual variable in a mathematical expression or programming context, an identifier representing a constant, a function parameter, or just be a term used as a placeholder in prose.
For me this means that <var> is not suitable for the prices in your examples. It depends on what you are trying to accomplish, but it seems your options are:
Use microdata, for example Schema.org’s offer vocabulary for a product’s price
Use <b> if you’d like to draw attention to the price without indicating it’s more important
Use <strong> if the price is important, such as the total price of an itemised receipt
Use <span> with a class if you need an element to style the price differently, but <b> and <strong> are not appropriate
If nothing above is suitable and you don’t want to style the price, don’t do anything
From the examples you’ve given there doesn’t seem to be any need to mark up prices. If the examples are from a table to display financial information, make sure they’re in a column headed by <th scope="col">Income</th> or <th scope="col">Price</th> respectively for accessibility.
Hope that helps!
Looking at the HTML5 specs, it's rather clear that a price is not considered to be a semantic entity. And I agree. Think about it this way:
If there were semantic elements, this would be the result
<p>
I have 4 apples, 2 oranges and <money>5 <currency>dollars</currency></money>.
</p>
What is it that makes 5 dollars different from 2 oranges? Should we add a <fruit> tag too?
which tag would you choose for the amount and why?
A span with a class, if you want to add some CSS.
Because nobody really cares too much about semantics. Nice to have, but in reality all that matters is styling.
The currency should be also wrapped in its own tag or not?
Definitely not.
I'd really like to avoid a generic inline element
Why?
You may decide to use <i> if you want to express the "special nature of money".
The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, ...
http://dev.w3.org/html5/spec/the-i-element.html
What about <data>?
<p>The price is <data class="money" value="100.00">$100</data>.</p>
According to the HTML5 spec:
The data element represents its contents, along with a machine-readable form of those contents in the value attribute.
When combined with microformats or microdata, the element serves to provide both a machine-readable value for the purposes of data processors, and a human-readable value for the purposes of rendering in a Web browser. In this case, the format to be used in the value attribute is determined by the microformats or microdata vocabulary in use.
In this case you could also use microdata to add additional information about the kind of currency, etc.
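Below is a minimal sketch of generating that markup. The human-readable amount goes in the element content, and a plain decimal goes in `value`. The helper names, the `money` class and the choice of `Decimal` with two decimal places are assumptions. As the quoted spec says, the format the `value` attribute should actually use depends on the microformat or microdata vocabulary in play.

```python
from decimal import Decimal, ROUND_HALF_UP
from html import escape

CENT = Decimal("0.01")


def format_amount(amount: Decimal, thousands: str, decimal_sep: str) -> str:
    """Human-readable amount, e.g. 1200 -> "1.200,00" with thousands=".", decimal_sep=","."""
    s = f"{amount.quantize(CENT, rounding=ROUND_HALF_UP):,.2f}"  # e.g. "1,200.00"
    # Swap separators via a placeholder so "," and "." don't clobber each other.
    return s.replace(",", "\0").replace(".", decimal_sep).replace("\0", thousands)


def price_html(amount: Decimal, display: str) -> str:
    """<data> element: display text for people, plain decimal in value for machines."""
    value = str(amount.quantize(CENT, rounding=ROUND_HALF_UP))
    return f'<data class="money" value="{value}">{escape(display, quote=False)}</data>'


if __name__ == "__main__":
    amount = Decimal("1200")
    print(price_html(amount, format_amount(amount, ".", ",") + " €"))
    # <data class="money" value="1200.00">1.200,00 €</data>
```

The currency stays in the visible text here. Exposing it to machines as well would be the job of the microdata mentioned above.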
I would use a definition list here.
The HTML <dl> element (or HTML Description List Element) encloses a list of pairs of terms and descriptions. Common uses for this element are to implement a glossary or to display metadata (a list of key-value pairs).
<dl>
<dt>Income:</dt>
<dd>1.200,00 €</dd>
<dt>Price:</dt>
<dd>$31,99</dd>
</dl>
I can't see anything more semantic than var either:
<var>1.200,00 <abbr title="EUR">€</abbr></var>
Use the var tag. It is described as "Variable or user defined text".
<var> </var>