Render bibliography using `pandoc-citeproc` in HTML


Is it possible to use pandoc (or pandoc-citeproc directly) to render a bibliography and references in an HTML document?
For a minimal example, assume I have some sort of bibliography:
@article{SomePerson2014, ...}
And an HTML file with Pandoc-style citations:
...
As mentioned in @SomePerson2014,
...
I want this to render to the following HTML code:
... As mentioned in <span class="citation">Person (2014)</span>, ...
<div class="references">
<p>Person, Some. 2014. “The Merits of Existence.” <em>The People’s Journal</em>.</p>
</div>

In this case, I resorted to exploiting Pandoc markdown's embedded HTML, basically rendering an HTML fragment, embedded in "markdown", to HTML using the pandoc-citeproc filter.
It works, though the generated HTML isn't perfect (a lot of invalid <p> tags are inserted).


What are these HTML <c- g> tags? Undefined custom element?

When looking at the source code of the HTML standard there were some tags that I didn't recognise.
For example in this snippet:
<pre><code class='idl'>[<c- g>Exposed</c->=<c- n>Window</c->]
<c- b>interface</c-> <dfn id='htmlparagraphelement' data-dfn-type='interface'><c- g>HTMLParagraphElement</c-></dfn> : <a id='the-p-element:htmlelement' href='dom.html#htmlelement'><c- n>HTMLElement</c-></a> {
[<a id='the-p-element:htmlconstructor' href='dom.html#htmlconstructor'><c- g>HTMLConstructor</c-></a>] <c- g>constructor</c->();
// <a href='obsolete.html#HTMLParagraphElement-partial'>also has obsolete members</a>
};</code></pre>
From https://html.spec.whatwg.org/multipage/grouping-content.html
I thought these may be custom elements, but it doesn't look like they are defined via the custom element registry. This is the result of interrogating the customElements object:
>>> customElements.get('c')
undefined
>>> customElements.get('c-')
undefined
Is this allowed? (I'd guess so since it's from the HTML standard, but it's still surprising to me). How would the browser know how these elements are supposed to be displayed? For example display: block vs. display: inline.

These are custom elements (and valid HTML), generated by Bikeshed's syntax highlighter.
There is no need to register them with customElements because they don't bring any particular behavior; all they do is... save bandwidth.
Here is the commit excerpt:
🚨 TERRIBLE-HACK-ALERT 🚨 Switch to using <c- kt> instead of <span class='kt'> to cut the weight of highlighting in half. Still valid HTML!
So apparently by switching from <span class="kt"> to <c- kt> (and span.kt { to c-[kt]{) they saved half of the weight induced by their highlighting.
Though as they say, it's a "terrible-hack", which still can make sense when building a tool that generates the majority of Web Standards pages, which can get very lengthy.
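The "half" figure is easy to check for the markup alone. The sketch below is an illustration, not Bikeshed code: it counts the characters wrapped around one highlighted token in each of the two styles quoted in the commit message (everything is ASCII, so characters equal bytes). The token text itself is left out because it is identical in both styles.

```javascript
// Markup wrapped around each highlighted token in the two styles from the commit message.
const spanStyle = { open: "<span class='kt'>", close: "</span>" };
const customStyle = { open: "<c- kt>", close: "</c->" };

// Characters added per highlighted token by the opening and closing tags.
function overhead({ open, close }) {
  return open.length + close.length;
}

console.log(overhead(spanStyle));   // 24
console.log(overhead(customStyle)); // 12
```

The custom-element form needs exactly half the characters per token, which matches the commit message.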
Regarding the default display of such custom elements, I'll quote Alohci's comment, which puts it nicely:
All elements take the initial, or inherited for inherited properties, value of each CSS property until specified otherwise. So they would be display:inline
And regarding your expectation to see only best practices in the spec sources, it's better not to assume so. Read the content of these pages; don't look at how they're built.
Most spec editors don't write the published HTML by hand; they write the specs in a pseudo-HTML language full of templates, which tools then turn into the final pages.
Or as it's put in the source:
<!-- Note: This file is NOT HTML, it's a proprietary language that is then post-processed into HTML. -->

Escaped HTML markup being rendered in dangerouslySetInnerHTML?

I have a Gatsby + WP API blog setup (with Markdown enabled) and it's working great, except when I'm trying to display HTML markup as code snippets. I'm using escape characters (see below), but for some reason the HTML inside the <code>/<pre> tags is rendering as actual HTML instead of displaying as an HTML code snippet.
I understand that's what dangerouslySetInnerHTML is there to do, but I didn't think it would do that if I'm using the escape sequence &lt;?
Here's the markup inside the WP blog post..
<pre class="language-markup"><code>
&lt;div&gt;
&lt;p&gt;Lorem ipsum...&lt;/p&gt;
&lt;/div&gt;
</code></pre>
And this is how I'm displaying the entire post content in the react component...
<section className="article-body" itemProp="articleBody"
dangerouslySetInnerHTML={{ __html: this.props.html }}
/>
The <div> and <p> tags are rendered as HTML instead of being displayed as a code snippet.
Is there some other way I should be doing this? For the record I also tried this using a 'non-dangerously' method (react-render-html) with the same results.
-- UPDATE: --
I was able to display the HTML as a code snippet by replacing the <code> tag with <xmp>. I know this tag is no longer officially supported, and it's far from elegant, so I think I may try to separate code snippets from the rest of the content as suggested below.

I tried it in CodeSandbox, too, and it works as expected. If you're sure the data received from the WP API is escaped, I'm afraid it's a Gatsby issue: there must be a place where it gets modified (unescaped).
If the data turns out to be OK and you don't want to do a deep investigation, there is a workaround. Split the article body and treat the sections separately: text and code snippets. Wrap the code snippets as a literal with something like this:
const CodeBlock = (props) => {
  // React renders {props.html} as text, so any markup inside it is escaped, not parsed
  return (
    <section className="article-code">
      <pre className="language"><code>{props.html}</code></pre>
    </section>
  );
};
Of course, remove the now-unneeded first and last lines (the <pre><code> wrapper) of the original snippet block.
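If you keep injecting a single HTML string with dangerouslySetInnerHTML instead, the code inside <pre><code> has to reach the browser escaped. A minimal sketch of that escaping step, assuming you control the raw snippet text before it is placed into the HTML string; escapeHtml is a hypothetical helper, not part of React or Gatsby:

```javascript
// Hypothetical helper: escape the characters the HTML parser would treat as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first, or the entities added below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Raw code the reader should see literally inside the <pre><code> block.
const snippet = "<div>\n<p>Lorem ipsum...</p>\n</div>";
const html = `<pre class="language-markup"><code>${escapeHtml(snippet)}</code></pre>`;
console.log(html);
```

If the content already contains &lt; entities, as in the question, do not escape it a second time: the reader would then see the entities themselves. Apply the helper only to raw code text.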

jekyll not linking to internal posts

Just started jekyll, and I want to display a link to one of my posts on the index.html page. I looked through the documentation and the following code appears to be what I'm suppose to do.
The following is in index.html
<p>......</p>
[Hello World]({% post_url 2015-01-19-soccer %})
<p>........ </p>
but it simply displays
.....
[Hello World]({% post_url 2015-01-19-soccer %})
.......
what am I doing wrong?

You used a mix of Markdown and HTML, which causes the Markdown processor to ignore anything in between the HTML blocks.
Markdown is also sometimes not processed when you have HTML right above the Markdown. (This is the case for you, since in your example the Markdown line comes directly after a closing </p> tag.)
There are a few ways around this.
Make sure there is a blank line between any HTML and Markdown. It will not show up as a <br> or a <p> in the final output; it just ensures that the processor converts the Markdown correctly.
So you should have something like this:
<p>......</p>

[Hello World]({% post_url 2015-01-19-soccer %})

<p>........ </p>
Notice the extra blank lines between the <p></p> elements and the Markdown.
Use only HTML (this is as answered by user @topleft)
Use only Markdown, since Markdown generates the <p> tags for you.
Try the markdown=1 HTML attribute.
Markdown processors like Kramdown let you add an explicit attribute that tells the processor to go through an HTML block and process any Markdown inside it. Kramdown has been Jekyll's default Markdown processor since Jekyll 2.0, so this should work unless you have switched to another processor such as Redcarpet. You can try this:
<div id="someDiv" markdown=1>
[This is a Markdown link that will be parsed](http://www.example.com)
</div>

You are using Markdown syntax here, which won't be converted in an .html file. Use an HTML link instead:
<a href="{{ site.baseurl }}{% post_url 2015-01-19-soccer %}">Hello World</a>
site.baseurl is empty by default;
you can change it in _config.yml to suit your needs,
for instance:
baseurl: "/me/blog"

Is there a way to create your own html tag in HTML5?

I want to create something like
<menu>
<lunch>
<dish>aaa</dish>
<dish>bbb</dish>
</lunch>
<dinner>
<dish>ccc</dish>
</dinner>
</menu>
Can it be done in HTML5?
I know I can do it with
<ul id="menu">
<li>
<ul id="lunch">
<li class="dish">aaa</li>
<li class="dish">bbb</li>
</ul>
</li>
<li>
<ul id="dinner">
<li class="dish">ccc</li>
</ul>
</li>
</ul>
but it is so much less readable :(

You can use custom tags in browsers, although they won’t be HTML5 (see Are custom elements valid HTML5? and the HTML5 spec).
Let's assume you want to use a custom tag element called <stack>. Here's what you should do...
STEP 1
Normalize its attributes in your CSS Stylesheet (think css reset) -
Example:
stack{display:block;margin:0;padding:0;border:0; ... }
STEP 2
To get it to work in old versions of Internet Explorer, you need to append this script to the head (Important if you need it to work in older versions of IE!):
<!--[if lt IE 9]>
<script> document.createElement("stack"); </script>
<![endif]-->
Then you can use your custom tag freely.
<stack>Overflow</stack>
Feel free to set attributes as well...
<stack id="st2" class="nice"> hello </stack>

I'm not so sure about these answers. As I've just read:
"CUSTOM TAGS HAVE ALWAYS BEEN ALLOWED IN HTML."
http://www.crockford.com/html/
The point here being, that HTML was based on SGML. Unlike XML with its doctypes and schemas, HTML does not become invalid if a browser doesn't know a tag or two. Think of <marquee>. This has not been in the official standard. So while using it made your HTML page "officially unapproved", it didn't break the page either.
Then there is <keygen>, which was Netscape-specific, forgotten in HTML4 and rediscovered and now specified in HTML5.
And also we have custom tag attributes now, like data-XyZzz="..." allowed on all HTML5 tags.
So, while you shouldn't invent a whole custom, unspecified markup salad of your own, it's not exactly forbidden to have custom tags in HTML. That is, however, unless you want to send it with an +xml Content-Type or embed other XML namespaces, like SVG or MathML. This applies only to SGML-confined HTML.

I just want to add to the previous answers that there is a good reason to use only two-word, hyphenated tag names for custom elements:
they will never be standardised.
For example, say you want to use the tag <icon> because you don't like <img>, and you don't like <i> either...
Well, keep in mind that you're not the only one. Maybe in the future, the W3C and/or browsers will specify/implement this tag.
At that point, browsers will probably implement a native style for this tag, and your website's design may break.
So I suggest using (following this example) <img-icon>.
As a matter of fact, the tag <menu> is already defined: not widely used, but defined. In the current HTML Standard it contains <li> items, like <ul>; the <menuitem> element it was once meant to hold has since been removed from the spec.
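The hyphen rule above is codified in the HTML Standard: a name usable with customElements.define() must start with a lowercase ASCII letter, contain a hyphen, and avoid a short list of reserved names. A simplified sketch of that check follows; isValidCustomElementName is a hypothetical helper, and unlike the real grammar it only accepts ASCII characters, so it rejects some names the spec would allow.

```javascript
// Names the HTML Standard reserves even though they contain a hyphen.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Simplified, ASCII-only version of the "valid custom element name" check.
function isValidCustomElementName(name) {
  // Lowercase ASCII letter first, then letters, digits, "-", "." or "_",
  // with at least one hyphen somewhere after the first character.
  const asciiPattern = /^[a-z][-._0-9a-z]*-[-._0-9a-z]*$/;
  return asciiPattern.test(name) && !RESERVED_NAMES.has(name);
}

for (const name of ["img-icon", "icon", "c-", "Img-Icon", "font-face"]) {
  console.log(name, isValidCustomElementName(name));
}
```

Note that the function checks the exact string you would pass to customElements.define(); in markup, the HTML parser lowercases tag names, so <Img-Icon> in a document still becomes img-icon.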

As Michael suggested in the comments, what you want to do is quite possible, but your nomenclature is wrong. You aren't "adding tags to HTML 5," you are creating a new XML document type with your own tags.
I did this for some projects at my last job. Some practical advice:
When you say you want to "add these to HTML 5," I assume what you really mean is that you want the pages to display correctly in a modern browser, without having to do a lot of work on the server side. This can be accomplished by inserting a "stylesheet processing instruction" at the top of the xml file, like <?xml-stylesheet type="text/xsl" href="menu.xsl"?>. Replace "menu.xsl" with the path to the XSL stylesheet that you create to convert your custom tags into HTML.
Caveats: Your file must be a well-formed XML document, complete with the XML declaration <?xml version="1.0"?>. XML is pickier than HTML about things like mismatched tags. Also, unlike HTML, tags are case-sensitive. You must also make sure that the web server sends the files with the appropriate MIME type, "application/xml". Often the web server will be configured to do this automatically if the file extension is ".xml", but check.
Big Caveat: Finally, using the browsers' automatic XSL transformation, as I've described, is really best only for debugging and for limited applications where you have a lot of control. I used it successfully in setting up a simple intranet at my last employer, that was accessed only by a few dozen people at most. Not all browsers support XSL, and those that do don't have completely compatible implementations. So if your pages are to be released into the "wild," it's best to transform them all into HTML on the server side, which can be done with a command line tool, or with a button in many XML editors.

Creating your own tag names in HTML is not possible / not valid. That's what XML, SGML and other general markup languages are for.
What you probably want is
<div id="menu">
<div id="lunch">
<span class="dish">aaa</span>
<span class="dish">bbb</span>
</div>
<div id="dinner">
<span class="dish">ccc</span>
</div>
</div>
Or instead of <div/> and <span/> something like <ul/> and <li/>.
In order to make it look and function right, just hook up some CSS and Javascript.

Custom tags can be used in Safari, Chrome, Opera, and Firefox, at least as far as using them in place of "class=..." goes.
green {color: green} in css works for
<green>This is some text.</green>

<head>
<style type="text/css">
lunch{
color:blue;
font-size:32px;
}
</style>
</head>
<body>
<lunch>
This is how you create custom tags like the ones asked for. It's very simple: no JS or convoluted workarounds needed, and it lets you do exactly what the question describes.
</lunch>
</body>

For embedding metadata, you could try using HTML microdata, but it's even more verbose than using class names.
<div itemscope>
<p>My name is <span itemprop="name">Elizabeth</span>.</p>
</div>
<div itemscope>
<p>My name is <span itemprop="name">Daniel</span>.</p>
</div>

Besides writing an XSL stylesheet, as I described earlier, there is another approach, at least if you are certain that Firefox or another full-fledged XML browser will be used (i.e., NOT Internet Explorer). Skip the XSL transform, and write a complete CSS stylesheet that tells the browser how to format the XML directly. The upside here is that you wouldn't have to learn XSL, which many people find to be a difficult and counterintuitive language. The downside is that your CSS will have to specify the styling very completely, including what are block nodes, what are inlines, etc. Usually, when writing CSS, you can assume that the browser "knows" that <em>, for instance, is an inline node, but it won't have any idea what to do with <dish>.
Finally, it's been a few years since I tried this, but my recollection is that IE (at least a few versions back) refused to apply CSS stylesheets directly to XML documents.

The point of HTML is that the tags included in the language have an agreed meaning, that everyone in the world can use and base decisions on - like default styling, or making links clickable, or submitting a form when you click on an <input type="submit">.
Made-up tags like yours are great for humans (because we can learn English and thus know, or at least guess, what your tags mean), but not so good for machines.

Polymer or X-tags let you build your own HTML tags. They are built on the browser's Web Components APIs (custom elements and shadow DOM).

In some circumstances, it may look like creating your own tag names just works fine.
However, this is just your browser's error handling routines at work. And the problem is, different browsers have different error handling routines!
See this example.
The first line contains two made-up elements, what and ever, and they get treated differently by different browsers. The text comes out red in IE11 and Edge, but black in other browsers.
For comparison, the second line is similar, except it contains only valid HTML elements, and it will therefore look the same in all browsers.
body {color:black; background:white;} /* reset */
what, ever:nth-of-type(2) {color:red}
code, span:nth-of-type(2) {color:red}
<p><what></what> <ever>test</ever></p>
<p><code></code> <span>test</span></p>
Another problem with made-up elements is that you can't know what the future holds. If you had created a website a couple of years ago with tag names like picture, dialog, details, slot, template etc., expecting them to behave like spans, you would be in trouble now!

This is not an option in any HTML specification :)
You can probably do what you want with <div> elements and classes, from the question I'm not sure exactly what you're after, but no, creating your own tags is not an option.

As Nick said, custom tags are not supported by any version of HTML.
But browsers won't raise any error if you use such markup in your HTML.
It seems like you want to create a list. You can use an unordered list <ul> to create the root elements, and use the <li> tag for the items underneath.
If that's not what you want to achieve, please specify exactly what you want. We can come up with an answer then.

You can add custom attributes through HTML5 data-* attributes.
For example: Message
That is valid for HTML 5. See http://ejohn.org/blog/html-5-data-attributes/ to get details.
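Scripts read these attributes through element.dataset, where the part after data- is converted to camelCase. The conversion can be sketched as a plain function; dataAttrToDatasetKey is a hypothetical name used only to show the rule (in a browser you would simply read el.dataset).

```javascript
// Sketch of how a data-* attribute name maps to its element.dataset key.
function dataAttrToDatasetKey(attrName) {
  // Only data-* names qualify; the spec also skips names containing
  // uppercase ASCII letters (the HTML parser lowercases them anyway).
  if (!attrName.startsWith("data-") || /[A-Z]/.test(attrName)) {
    return null;
  }
  // Drop the prefix, then turn each "-x" (x a lowercase ASCII letter) into "X".
  return attrName
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(dataAttrToDatasetKey("data-message")); // message
console.log(dataAttrToDatasetKey("data-user-id")); // userId
console.log(dataAttrToDatasetKey("title"));        // null
```

Because the HTML parser lowercases attribute names, an attribute written as data-XyZzz in markup ends up as data-xyzzz, and therefore as dataset.xyzzz.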

You can just do some custom css styling, this will create a tag that will make the background color red:
redback {background-color:red;}
<redback>This is red</redback>

You can use this:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>MyExample</title>
<!-- Only old IE (before 9) needs this: creating the element once lets it style <bloodred> -->
<script>
document.createElement("bloodred");
</script>
<style>
bloodred {color: red;}
</style>
</head>
<body>
<bloodred>
this is BLOODRED (not to scare you)
</bloodred>
</body>
</html>

I found this article on creating custom HTML tags and instantiating them. It simplifies the process and breaks it down into terms anyone can understand and utilize immediately -- but I'm not entirely sure the code samples it contains are valid in all browsers, so caveat emptor and test thoroughly. Nevertheless, it's a great introduction to the subject to get started.
Custom Elements : Defining new elements in HTML

How can I parse and normalize HTML from different HTML generators?

This is an extension of this question. I'm trying to parse HTML snippets embedded in an XML backup of a Blogger blog and retag them with InDesign tags.
Blogger doesn't standardize the HTML for any of its posts, and the posts can be written in Word, Windows Live Writer, the native Blogger interface, or text editors, resulting in tons of different forms of HTML. Some posts don't mark paragraphs and only use double <br>s in between paragraphs—others use actual <p> tags.
What's the best way to parse this unstandard conglomeration of tags?
Additionally, each post is not a complete HTML file--just a snippet that gets inserted into a template—which means that there is no overall HTML structure to parse (<html><body></body></html>, etc.) Does that have any effect on XML/HTML parsing?
Here are some potential examples, mostly standard HTML but missing paragraphs:
This is a section of a blog post. It has links and lists and stuff. Weee....
<br>
<br>
Here's a list
<br/>
<br />
<ul><li>Item 1</li><li>Item 2</li></ul>
And another paragraph here...
<br>
<br/>
Etc.
The Word HTML looks like this - http://www.timeatlas.com/mos/images/stories/word_html_tags.png

HTML::Parser?

The HTML generated by Word is relatively easy to deal with. I would just get rid of all the tag attributes (unless you care about styles). That would leave you with fairly plain HTML, which you can then style.
HTML::TokeParser::Simple can help make that relatively painless.
As for the other stuff, that will take some trial and error. I am going to think more about that and post later if I can think of something clever.
Later Update:
Well, here is something that makes me cringe a little but it seems to work:
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
use Text::Markdown qw( markdown );
my $html = read_file \*DATA;
# Turn every <br>, <br/> or <br /> into a blank line so Markdown sees paragraph breaks
$html =~ s{<br ?/?>}{\n\n}g;
print markdown( $html );
__DATA__
This is a section of a blog post. It has links and lists and stuff. Weee....
<br>
<br>
Here's a list
<br/>
<br />
<ul><li>Item 1</li><li>Item 2</li></ul>
And another paragraph here...
<br>
<br/>
Output:
<p>This is a section of a blog post. It has links and lists and
stuff. Weee....</p>
<p>Here's a list</p>
<ul><li>Item 1</li><li>Item 2</li></ul>
<p>And another paragraph here...</p>
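The approach above leans on a Markdown processor. Without one, the same idea (treat runs of <br> as paragraph breaks and leave lists alone) can be sketched in JavaScript. This is a rough illustration under stated assumptions, not a general HTML normalizer: normalizeSnippet is a hypothetical name, paragraphs are assumed to be separated by two or more <br> variants, and lists are assumed not to be nested. For messy Word output, a real HTML parser is the safer choice.

```javascript
// Hypothetical helper: rough normalizer for Blogger-style snippets.
function normalizeSnippet(snippet) {
  const withBreaks = snippet
    // Two or more consecutive <br>, <br/>, <br /> (any case) become a blank line.
    .replace(/(?:\s*<br\s*\/?>\s*){2,}/gi, "\n\n")
    // Surround each <ul>/<ol> with blank lines so it becomes its own chunk
    // (non-greedy match: stops at the first closing tag, so no nested lists).
    .replace(/<(ul|ol)\b[\s\S]*?<\/\1>/gi, (list) => `\n\n${list}\n\n`);

  return withBreaks
    .split(/\n\s*\n/) // blank lines delimit chunks
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    // Lists stay as they are; every other chunk is wrapped in a paragraph.
    .map((chunk) => (/^<(ul|ol)\b/i.test(chunk) ? chunk : `<p>${chunk}</p>`))
    .join("\n");
}

const sample = [
  "This is a section of a blog post.",
  "<br>",
  "<br>",
  "Here's a list",
  "<br/>",
  "<br />",
  "<ul><li>Item 1</li><li>Item 2</li></ul>",
  "And another paragraph here...",
  "<br>",
  "<br/>",
].join("\n");

console.log(normalizeSnippet(sample));
```

A single <br> inside a paragraph is left untouched, since only runs of two or more count as paragraph breaks.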

As I said in the other question, I like XML::Twig. It can handle both XML and HTML.

FWIW, I tend to use XML::LibXML for all my XML and HTML needs. Here is a one-liner that will convert a line of "bad" HTML into a well-formed XHTML document:
perl -MXML::LibXML -ne 'my $p = XML::LibXML->new->parse_html_string($_); print $p->toString'
In your case, you probably want to use the DOM to emit a new document that has the correct tags. This is straightforward; XML::LibXML uses the same W3C DOM that JavaScript does.
As an example, this input:
<p>Foo<p>Bar<br>Baz!
Gets translated into:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Foo</p><p>Bar<br/>Baz!
</p></body></html>
This is probably what you want, and remember, use the DOM to translate... don't worry about this printed representation.
