How To (Semantically) Mark Up A (Theatre) Script / Play in HTML5

For obvious reasons, it's hard to search for "play" and "script" without a search engine thinking you mean "play a sound" and "JavaScript".
How can I mark up a script (as in the document one would give to actors in a play) such that it is semantically correct, and easy to style?
For example, let's take the start of Hamlet
Hamlet
ACT I
SCENE I Elsinore. A platform before the castle.
[FRANCISCO at his post. Enter to him BERNARDO]
BERNARDO Who's there?
FRANCISCO Nay, answer me: stand, and unfold yourself.
Fairly obviously, I think, one should start with
<h1 id="title">Hamlet</h1>
<h2 id="act-1">Act 1</h2>
<h3 id="scene-1">Scene 1</h3>
But, then I get stuck.
I've tried looking at Microdata, but Schema.org's CreativeWork [0] really doesn't contain much that would be useful for a work of fiction.
Is it enough just to say
<p class="stage-direction">FRANCISCO at his post. Enter to him BERNARDO</p>
<p id="1"><span class="character bernardo">BERNARDO</span>Who's there?</p>
<p id="2"><span class="character francisco">FRANCISCO</span>Nay, answer me: stand, and unfold yourself.</p>
Or is there a better / more sensible way of doing things?
[0] http://schema.org/CreativeWork

It seems that the idea of precisely specifying markup for dialogue has been abandoned, and the W3C now simply offers some guidelines which pretty much equate to your idea of using paragraphs and spans.
Note that the dl element, which older sources - including the spec - had formerly recommended, should now definitely not be used: "The dl element is inappropriate for marking up dialogue".
But of course all this might change next week, or month, or year…
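
As a rough illustration of the paragraphs-and-spans approach, here is a minimal Python sketch that generates such markup from (speaker, speech) pairs. The function names `render_direction` and `render_line` and the `line-N` id scheme are my own assumptions, not anything from the spec; the class names are taken from the question.

```python
from html import escape

def render_direction(text: str) -> str:
    """Wrap a stage direction in the paragraph class used in the question."""
    return f'<p class="stage-direction">{escape(text, quote=False)}</p>'

def render_line(number: int, speaker: str, speech: str) -> str:
    """Render one speech as a paragraph with the speaker's name in a span."""
    # Hypothetical convention: a lowercase, hyphenated class per character.
    slug = speaker.lower().replace(" ", "-")
    return (
        f'<p id="line-{number}">'
        f'<span class="character {escape(slug)}">{escape(speaker, quote=False)}</span> '
        f'{escape(speech, quote=False)}</p>'
    )

script = [
    ("BERNARDO", "Who's there?"),
    ("FRANCISCO", "Nay, answer me: stand, and unfold yourself."),
]

html_lines = [render_direction("FRANCISCO at his post. Enter to him BERNARDO")]
html_lines += [render_line(i, who, what) for i, (who, what) in enumerate(script, start=1)]
print("\n".join(html_lines))
```

Keeping the speaker in its own span makes it easy to style (bold, small caps, hanging indent) without touching the speech text.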

Does this provide any inspiration? caesar in xml

Related

<pre> or <p> with styling for code and formal language

First of all, I'm not a native English speaker, so please excuse my English.
In HTML, since I realized that style="white-space:pre;" makes an element behave like the <pre> tag, I tend to use that style (optionally with "font-family:monospace;" when I need it) instead of <pre>. To me, <pre> feels much less semantic. Using more semantic tags with proper styling seems more logical.
Currently, my rule is like this:
preformatted plain text, bunch of characters -> use <pre>.
preformatted paragraph-like things -> use <p style="white-space:pre;"> (optionally add "font-family:monospace;").
Following that rule, I've been using the <pre> tag only for ASCII art, because I don't think ASCII art is paragraph-like. However, I think stanza ≒ paragraph, so I use <p style="white-space:pre;"> when I mark up verse (poems, poetry, lyrics) in HTML.
My problem is that I cannot decide what I should use for code. In other words, I'm unsure whether a block of code is a paragraph or not. Both <pre> and <p> with styling seem to have a point and to be logical.
Anyway.. Wikipedia says:
A paragraph is a self-contained unit of a discourse in writing dealing
with a particular point or idea.
Collins Cobuild Advanced Learner's English Dictionary says:
[NOUN] A paragraph is a section of a piece of writing.
I think those can be partially valid for code. In particular, this example is more confusing:
Let's add a and b, and divide the result by 2.
result = a + b
result = result / 2
The only difference between them is that one is natural language and the other is code. The natural-language one is obviously a normal paragraph, and that makes me feel the code is partially a paragraph too, because they even express the same discourse. The talk about 'code is documentation' or 'self-documenting code' also makes using <p> feel more right.
However, I do feel <pre> is logical as well, especially when the content is less human-readable and more machine-like, raw and primal, like pure machine code (01010101100..) or Morse code. I would tag those with <pre>, and though I cannot say exactly why, it looks more right. However, using <pre> for some code and <p> for high-level code feels inconsistent.
I don't think it's just a simple matter of preference, like 'both are fine, just pick either of them and be consistent'; I think a logical answer exists, and that is what I need.
Any ideas?
Thanks in advance.

What's the difference between `<seg>` and `<span>`

What's the difference between a <seg> in XML and <span> in HTML? Here are two passages from Bibles, one from the English Bible in Christodouloupoulos' and Steedman's massively parallel Bible corpus,
<?xml version="1.0" ?>
<cesDoc version="4">
…
<text>
<body id="Bible" lang="en">
<div id="b.GEN" type="book">
<div id="b.GEN.1" type="chapter">
<seg id="b.GEN.1.1" type="verse">
In the beginning God created the heaven and the earth.
</seg>
<seg id="b.GEN.1.2" type="verse">
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
</seg>
…
and the other from the NIV English Bible at Bible Gateway, which is where they got most of their texts from:
<p class="chapter-1">
<span id="en-NIV-27932" class="text Rom-1-1">
<span class="chapternum">1 </span>
Paul, a servant of Christ Jesus, called to be an apostle and set apart for the gospel of God—
</span>
<span id="en-NIV-27933" class="text Rom-1-2">
<sup class="versenum">2 </sup>the gospel he promised beforehand through his prophets in the Holy Scriptures
</span>
…
In the HTML, it seems a <span> can replace a <seg>, except that the HTML has added verse numbers in a nested <span>. Oh, and the chapters are in <div> in the XML but in <p> in the HTML. So it's not one-to-one.
Of course, I realize that HTML and XML are different, and this is only one juxtaposition; I'm sure there are others out there. But I'm going to need to be able to display XML as HTML, and I don't want to anger the doctype gods. So, conceptually, how is <seg> different from <span> in purpose, meaning and usage?
Update: @jim-garrison says I'm going to need to read the schema to understand the XML, but I'm a neophyte at that, too. In particular, I did find some official-looking documentation for <seg> by TEI that makes me think its use is a little more than arbitrary, but I have no idea how to interpret this documentation. Should it give us a more specific answer than what Jim has already written?

The difference between XML and HTML generally is that the list of tags that can be present in XML is defined by a DTD or XML Schema, and tags represent document semantics and not presentation. So tags can be named anything. In HTML the set of tags is generally predefined, as if there was a pre-existing HTML DTD or schema, but HTML is not XML and doesn't follow all the rules of XML. While HTML was in some sense derived from the same parent as XML (SGML), and the two are superficially very similar, they are most definitely NOT the same thing.
The answer to your specific question is that the writers of the XML chose to use a tag named <seg> ("segment"?) to represent generalized strings of text, with attributes providing additional semantic information. For more details you'll need to find the DTD or XML schema that governs the content of the XML and read the documentation that goes with it.
But I'm going to need to be able to display XML as HTML, and I don't want to anger the doctype gods. So, conceptually, how is <seg> different from <span> in purpose, meaning and usage?
This is where you will use XSLT to transform the input XML into valid HTML. To figure out how to do that transformation you will need to know the full semantics of all the tags that can appear (again, go to the documentation for the DTD/Schema) and decide on a visual representation for the data. There's no single answer to "how should a <seg> be transformed?". That's up to your presentation requirements. One possible transformation converts <seg> tags to <span>, but that may depend on the value of certain attributes (type="verse" vs. some other type). It might even differ depending on the output medium (desktop vs. tablet vs. phone vs. watch vs. ...?).
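
To make the idea concrete, here is a minimal sketch of one such mapping. It is not XSLT: it performs the same kind of transformation with Python's standard `xml.etree.ElementTree`, purely for illustration. The choice to turn the `type` attribute into an HTML `class` and the helper names `seg_to_span` and `chapter_to_html` are my assumptions; your schema's documentation may call for something different.

```python
import xml.etree.ElementTree as ET

SOURCE = """<div id="b.GEN.1" type="chapter">
<seg id="b.GEN.1.1" type="verse">
In the beginning God created the heaven and the earth.
</seg>
<seg id="b.GEN.1.2" type="verse">
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
</seg>
</div>"""

def seg_to_span(seg: ET.Element) -> ET.Element:
    """Map one <seg> to a <span>, carrying its type over as a class (an assumption)."""
    span = ET.Element("span", {"id": seg.get("id", ""), "class": seg.get("type", "seg")})
    span.text = (seg.text or "").strip()
    return span

def chapter_to_html(xml_text: str) -> str:
    """Convert a chapter <div> containing <seg> elements into an HTML <div> of <span>s."""
    chapter = ET.fromstring(xml_text)
    out = ET.Element("div", {"id": chapter.get("id", ""), "class": chapter.get("type", "")})
    for seg in chapter.iter("seg"):
        out.append(seg_to_span(seg))
    return ET.tostring(out, encoding="unicode", method="html")

html = chapter_to_html(SOURCE)
print(html)
```

A different `type` value could just as well map to a different element or class; that decision lives in the transformation, not in the XML.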
Once you convert from XML to HTML you have left the realm of the Doctype gods and they have no interest in what you do :-) There's a whole different set of deities such as CSS-Cthulhu, Javascript-Janai'ngo (look it up), et al who will take great pleasure making your life miserable.

<abbr />-Element: aria-label or title attribute

While it is recommended to use the title attribute on the <abbr /> element, this has no effect on screen readers (at least not on ChromeVox).
<abbr title="as soon as possible">ASAP</abbr>
What does work is, of course, aria-label, e.g.:
<abbr aria-label="as soon as possible">ASAP</abbr>
So in order to be both semantically correct and screen reader compatible I need to set both:
<abbr aria-label="as soon as possible" title="as soon as possible">ASAP</abbr>
which seems like a bit of a hack. Why doesn't ChromeVox just read the title attribute instead?

In short: despite one of the WCAG recommendations, abbr is not a perfect solution for explaining the meaning of an abbreviation to everyone; aria-label should be used when you want to announce the pronunciation of the abbreviation.
Screen readers are not supposed to read the title attribute, as it is not intended to replace aria-label. See also the W3C warning:
http://www.w3.org/TR/html/dom.html#attr-title
Relying on the title attribute is currently discouraged as many user agents do not expose the attribute in an accessible manner as required by this specification (e.g. requiring a pointing device such as a mouse to cause a tooltip to appear, which excludes keyboard-only users and touch-only users, such as anyone with a modern phone or tablet).
I never encourage the use of the abbr tag for two reasons:
It's not a focusable element, so you can't navigate to it using the keyboard to see the meaning of the abbreviation. If you intend to provide a pronounceable alternative, then aria-label is definitely what you need.
For instance, when an abbreviation is part of the language, you do not need to explain it, but you can give a speech alternative:
Director: <span aria-label="Mister">Mr</span> Smith
Blind people do understand abbreviations just like most of us do.
For instance, the following sentence is something blind people can understand perfectly:
John Smith of the NATO was arrested by the FBI.
And the following one is far less understandable
John Smith of the North Atlantic Treaty Organization was arrested by the Federal Bureau of Investigation.
As abbr is used for acronyms and abbreviations you should use the CSS property speak:spell-out to announce that an element must be spelled-out. You can use abbr tag to semantically indicate that it's an abbreviation or an acronym, but it won't have any effect on the global accessibility.
If you do consider that the abbreviation needs an explanation (intended for everyone and not only for blind people) then you should give this explanation in full words without requiring the user to mouseover the abbreviation to see a small tooltip.
Bad example, when abbreviation doesn't help the readability:
<abbr title="Doctor">Dr.</abbr> Smith is located on Lincoln <abbr title="Drive">Dr.</abbr>
Good example (simple is better):
Doctor Smith is located on Lincoln Drive
WCAG promotes several methods other than the abbr tag:
Providing the expansion or explanation of an abbreviation
Providing the first use of an abbreviation immediately before or after the expanded form
Linking to definitions

I'm sorry to add another totally different answer but I think both answers should not be merged:
As ChromeVox is open source, I have a second, more technical answer to your question.
http://src.chromium.org/svn/trunk/src/chrome/browser/resources/chromeos/chromevox/common/dom_util.js
For any element (and there is no exception for abbreviations), ChromeVox falls back to the title attribute if there is no text content in the node (node.textContent.length==0).
Here is the order of precedence defined in the code:
Text content if it's a text node.
aria-labelledby
aria-label
alt (for an image)
title (only if there is no text content)
label (for a control)
placeholder (for an input element)
recursive calls to children
Now, it's kind of a buggy situation
This example, in my opinion, correctly reads "BBC":
<abbr title="British Broadcasting Corporation">BBC</abbr>
This one announces "British Broadcasting Corporation", which is a correct fallback for invalid markup:
<abbr title="British Broadcasting Corporation"></abbr>
But this one doesn't read anything, because the node's text length is not zero:
<abbr title="British Broadcasting Corporation"> </abbr>
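
The precedence order and the three examples above can be modelled with a small toy in Python. To be clear, this is not ChromeVox's code: `Node` and `spoken_name` are hypothetical names, and the sketch leaves out aria-labelledby, label, placeholder and the recursion into children. It only shows why a single space is enough to suppress the title fallback.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    """A toy stand-in for a DOM element: its text content plus its attributes."""
    text_content: str = ""
    attrs: Dict[str, str] = field(default_factory=dict)

def spoken_name(node: Node) -> str:
    """Pick the text to announce, following a simplified version of the order above."""
    # aria-labelledby is omitted: resolving it needs a whole document.
    for attr in ("aria-label", "alt"):
        if node.attrs.get(attr):
            return node.attrs[attr]
    # title is only a fallback when the text content has length 0.
    if len(node.text_content) == 0 and node.attrs.get("title"):
        return node.attrs["title"]
    # Otherwise the text is used; whitespace-only text announces nothing.
    return node.text_content.strip()

title = {"title": "British Broadcasting Corporation"}
print(repr(spoken_name(Node("BBC", title))))  # text content wins over title
print(repr(spoken_name(Node("", title))))     # empty element falls back to title
print(repr(spoken_name(Node(" ", title))))    # a single space blocks the fallback
```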
Leaving aside the last bug, it is not a perfect but a fairly consistent implementation of the Text Alternative Computation:
[F. Otherwise, look in the subtree]
G. Otherwise, if the current node is a Text node, return its textual contents.
H. Otherwise, if the current node has a tooltip attribute, return its value.
Note that according to the document referenced in one comment above, the title attribute, if present, should now be used by accessibility APIs instead of the text content: (http://rawgit.com/w3c/aria/master/html-aam/html-aam.html#text-level-elements-not-listed-elsewhere). I'm not sure it's a good thing, as the title attribute was previously, and still is, defined as follows by the W3C:
The title attribute represents advisory information for the element, such as would be appropriate for a tooltip

Can I slice a word with </span> for the sake of structured data?

I have this line inside a ProfessionalService itemscope:
Az <span itemprop="makesOffer">ágyi poloska irtását</span> permetezéses módszerrel végezzük.
This is in Hungarian and the problem comes from my language too. For search engines I would like to communicate the offer is "ágyi poloska irtás" without the addendum "át" so it would look like this:
Az <span itemprop="makesOffer">ágyi poloska irtás</span>át permetezéses módszerrel végezzük.
Is this legal? Can I break a word with a </span> closing tag?
Sorry, I can't come up with an English example. The example sentence is about how the company exterminates bed bugs; in English it would read like this: The <span itemprop="makesOffer">bed bug extermination</span> done by spraying method. But in English the problem does not arise.

Yes, it is valid and it can make sense to do this.
Any conforming Microdata parser will get the value "ágyi poloska irtás" for the property makesOffer.
Following the HTML5 specification, consumers would have no reason to break the word (e.g., by adding whitespace or a line break) if it contains a span element (… which does not necessarily mean that you won’t find consumers that do this nonetheless).
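
To see what a parser would extract, here is a minimal sketch using Python's standard `html.parser`. It is not a conforming Microdata parser: it ignores itemscope and itemtype, and void elements such as <br> would confuse its stack. The class name `ItempropText` is hypothetical. It only shows that the text content of the span stops where the closing tag is, mid-word or not.

```python
from html.parser import HTMLParser

class ItempropText(HTMLParser):
    """Collect the text content of every element carrying an itemprop attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.values = {}  # property name -> text content
        self._open = []   # stack of (tag, property name or None)

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        self._open.append((tag, prop))
        if prop is not None:
            self.values[prop] = ""

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that declares a property.
        for _tag, prop in self._open:
            if prop is not None:
                self.values[prop] += data

parser = ItempropText()
parser.feed('Az <span itemprop="makesOffer">ágyi poloska irtás</span>át permetezéses módszerrel végezzük.')
parser.close()
print(parser.values)
```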

Parsing HTML into JSON

I've been tasked with getting all the SMS updates from this page and putting them into a JSON feed using Yahoo Pipes. I'm not entirely sure how I would get each update, as they are not individual elements, but just a collection of title, etc. Any shared wisdom would be much appreciated!

<h1 id="blogtitle">SMS Update</h1>
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div>
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div>
<p class="blogpostmessage">
RACE DAY! We took the extra day off to pimp the rick some more, including a huge Australian flag. Quiet night at a pub with 6 other teams. Time for brekkie and then we're off to the rickshaw grounds for 8:30 for 10am start.
</p>
That seems a fairly easy job for a DOM/XML parser.
Since the blocks are not enclosed in XML tags you could look for elements that are present in each block, for example the <h1 id="blogtitle">SMS Update</h1> defines the start of a new block.
Use your DOM parser to look for all the elements with id blogtitle. At this point you can use a DOM function to reference the nextSibling of the blogtitle element. All you need is the 3 siblings after the blogtitle element.
With a little work you can easily use this logic to build your JSON object.
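
Here is a minimal sketch of that grouping logic in Python. It uses the event-based `html.parser` from the standard library rather than a DOM with nextSibling, and it has nothing to do with Yahoo Pipes itself. The class name `SmsUpdates` and the JSON key names (`time`, `source`, `message`) are my own assumptions. It also assumes the fields contain plain text only, since any closing tag ends the current field.

```python
import json
from html.parser import HTMLParser

PAGE = """<h1 id="blogtitle">SMS Update</h1>
<div class="blogposttime blogdetail">Left at 2nd January 2010 at 01:12</div>
<div class="blogcategories blogdetail">Recieved by SMS (Location: Pokhara - Nepal)</div>
<p class="blogpostmessage">
RACE DAY! We took the extra day off to pimp the rick some more, including a huge Australian flag. Quiet night at a pub with 6 other teams. Time for brekkie and then we're off to the rickshaw grounds for 8:30 for 10am start.
</p>"""

class SmsUpdates(HTMLParser):
    """Group flat sibling elements into one record per <h1 id="blogtitle">."""

    # CSS class on the page -> key in the output record (key names are assumptions).
    FIELDS = {"blogposttime": "time", "blogcategories": "source", "blogpostmessage": "message"}

    def __init__(self) -> None:
        super().__init__()
        self.updates = []
        self._field = None  # key currently being filled, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("id") == "blogtitle":
            self.updates.append({"title": ""})  # a new block starts here
            self._field = "title"
            return
        for cls in (attrs.get("class") or "").split():
            if cls in self.FIELDS and self.updates:
                self._field = self.FIELDS[cls]
                self.updates[-1][self._field] = ""

    def handle_endtag(self, tag):
        self._field = None  # simplification: any closing tag ends the field

    def handle_data(self, data):
        if self._field is not None:
            self.updates[-1][self._field] += data

parser = SmsUpdates()
parser.feed(PAGE)
parser.close()
feed = [{key: value.strip() for key, value in update.items()} for update in parser.updates]
print(json.dumps(feed, indent=2))
```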
