How to get links output in a printer-friendly fashion, e.g. as numbered-citation style including a list of links at the end? - html

pandoc turns input such as
See [that site](http://my.link)
into
See that site
which means the link information will get lost in printing. I would like to get some printer-friendly version, i.e. the links numbered
See [1]
(code: See [[1]](http://my.link "that site"))
and at the end (or optionally as a footnote when using xelatex to get a pdf) a summary of all links, i.e.
[1] that site: http://my.link
(whether the original link title shall be in this list or not is optional).
How can this be achieved? Via a filter or is there already some switch for that?

You can use footnotes:
Here is a footnote reference[^1]
[^1]: Here is the [footnote](http://my.link)
which will also result in footnotes in HTML output. For more flexibility, see this answer. For example, the following works only for LaTeX/PDF output:
pandoc -o myfile.pdf -V links-as-notes=true myfile.md
Edit: this works only if you have the following in your template (from the default-template):
$if(links-as-notes)$
% Make links footnotes instead of hotlinks:
\renewcommand{\href}[2]{#2\footnote{\url{#1}}}
$endif$
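For HTML output, where links-as-notes does not apply, the numbered-citation style from the question can be approximated by preprocessing the Markdown before it reaches pandoc. The following is only a minimal sketch, not a pandoc filter: it uses a regular expression, so it handles simple inline links of the form [text](url) and nothing else (no reference-style links, no URLs containing parentheses, no link text containing double quotes). The function name number_links is made up for this example.

```python
import re

# Matches simple inline links such as [that site](http://my.link),
# but not images (which start with "!").
INLINE_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def number_links(markdown: str) -> str:
    """Replace inline links by numbered citations and append a link list."""
    urls = []      # URLs in order of first appearance
    titles = {}    # URL -> link text of its first appearance

    def replace(match):
        text, url = match.group(1), match.group(2)
        if url not in titles:
            urls.append(url)
            titles[url] = text
        number = urls.index(url) + 1
        # Keep the link clickable and store the original text as its title.
        return f'[[{number}]]({url} "{text}")'

    body = INLINE_LINK.sub(replace, markdown)
    if not urls:
        return body
    # One paragraph per entry so Markdown does not merge the lines.
    summary = "\n\n".join(
        f"[{i}] {titles[url]}: {url}" for i, url in enumerate(urls, start=1)
    )
    return body + "\n\n" + summary


print(number_links("See [that site](http://my.link)"))
```

The result can then be piped into pandoc as usual. A repeated URL reuses the number it received the first time.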

Related

Convert markdown table of contents to HTML

I am trying to convert a markdown document to HTML, using pandoc. I cannot get the HTML output to create the table of contents correctly.
Issue:
I have added a table of contents to the markdown doc, where clicking on each header takes the reader to the relevant section. I am using the format below, where clicking on 'Header Title' will send the reader to the section 'header' in the document:
[Header Title](#header)
I tried to convert this to HTML using the pandoc command
pandoc -i input.md -f markdown -t html -o input.html
This creates a valid HTML file I can open in Firefox, and the items in the table of contents show up as links - but when I click them, nothing happens (I am expecting it to jump to the relevant section)
This happens when I use either markdown or markdown_github as the input format (-f in pandoc)
Question:
How can I get the table of contents to show the expected behavior in HTML?
Or is the concept of 'table of contents' a wrong approach to HTML, and I should change my markdown code?
Apologies if I am going about this the wrong way, I have no experience with HTML / web documents.
I found a couple of similar questions but they seemed to be specific to other programming languages / tools, so any help how I can achieve this with markdown / pandoc is much appreciated.
I am using pandoc 1.19.2.4 on Ubuntu.
Example markdown:
- [Chapter 1](#chapter-1)
- [1. Reading a text file](#1-reading-a-text-file)
## Chapter 1
This post focuses on standard text processing tasks such as reading files and processing text.
### 1. Reading a text file
Reading a file.
Looking at your markdown file, you have used #1-reading-a-text-file as the id for the 1st subheading.
While converting it to HTML, the following line is generated for the subheading:
<h3 id="reading-a-text-file">1. Reading a text file</h3>
The problem is the mismatch of "#1" which is present in the table of contents, but not in the heading.
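This is expected: when pandoc generates heading identifiers automatically, it removes everything up to the first letter, so an identifier never begins with a number or punctuation mark.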
Changing the table of contents to the following should work:
- [Chapter 1](#chapter-1)
- [1. Reading a text file](#reading-a-text-file)
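To predict which id pandoc will assign to a heading, the rules of its automatic identifier generation can be approximated in a few lines. This is a rough sketch of the documented rules, not pandoc's actual implementation: it assumes the heading is plain text (no inline formatting or footnotes) and ignores the numeric suffixes pandoc adds to duplicate identifiers. The function name pandoc_style_id is made up for this example.

```python
import re


def pandoc_style_id(heading: str) -> str:
    """Approximate the id pandoc generates for a plain-text heading."""
    # Drop everything except alphanumerics, underscores, hyphens,
    # periods and whitespace.
    kept = "".join(
        ch for ch in heading if ch.isalnum() or ch in "_-." or ch.isspace()
    )
    # Spaces and newlines become hyphens, letters become lowercase.
    candidate = re.sub(r"\s+", "-", kept.strip()).lower()
    # Remove everything up to the first letter.
    for index, ch in enumerate(candidate):
        if ch.isalpha():
            return candidate[index:]
    # Nothing left: pandoc falls back to the identifier "section".
    return "section"


for title in ["Chapter 1", "1. Reading a text file"]:
    print(f"{title!r} -> #{pandoc_style_id(title)}")
```

The safest check is still to look at the id attribute in the generated HTML, as done above.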

Markdown TOC with Special Characters?

I am trying to create a TOC for my Markdown blog.
The methods I found here (Markdown to create pages and table of contents?) do not work for me because I name all of my headers # _</>_ The Setup: I use CSS to style the "_</>_" part, giving each header a nice colored icon next to it. If I simply use `# The Setup` it works great.
This causes issues whenever I try to use [The Setup](#The-Setup).
I tried a few things like [The Setup](#_</>_-The-Setup) and other things, but I can not get it to work.
If someone can point me in the right direction I would greatly appreciate it. Also, if anyone has a better way of adding custom icons next to headers, I think that would be the better way to go about it.
As always, thanks in advance.
The general solution is to examine the rendered HTML output to see what the tool is converting the special characters to, in the HTML's element ID. Every tool could handle the conversion differently (it could convert special characters to -, _, or just remove special characters). Some examples:
<h1 id="_____the-setup">The Setup</h1>
<h1 id="-the-setup">The Setup</h1>
<h1 id="the-setup">The Setup</h1>
Once you have identified the exact id that the tool is using, then you use that value as the heading link in the markdown's table of contents. For example:
[The Setup](#_____the-setup)
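When the tool can export its HTML, the heading ids do not have to be read off by eye. As a small sketch using only Python's standard library (the class and function names are made up for this example), the following collects the id and text of every h1 to h6 element in an HTML string:

```python
from html.parser import HTMLParser


class HeadingIdCollector(HTMLParser):
    """Collects (id, text) pairs for all h1..h6 elements."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.found = []
        self._in_heading = False
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self.found.append((self._current_id, "".join(self._text).strip()))
            self._in_heading = False


def heading_ids(html: str):
    collector = HeadingIdCollector()
    collector.feed(html)
    collector.close()
    return collector.found


sample = '<h1 id="_____the-setup">The Setup</h1><h2 id="the-end">The End</h2>'
for element_id, text in heading_ids(sample):
    print(f"[{text}](#{element_id})")
```

Each printed line is a ready-made table of contents entry for that particular tool's output.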
Now, the tricky part is that not all Markdown tools will export the rendered HTML, including VS Code. The workaround for VS Code is:
1. Open the markdown preview mode (which renders to html internally).
2. Open the VS Code Developer Tools (Help > Toggle Developer Tools).
3. Use DevTools to inspect the element (in this case, the heading element for "The Setup").
I see that VS Code named the id as the-setup, so in the markdown's table of contents, I write [The Setup](#the-setup). Now the table of content hyperlink works in VS Code. Caveat: it might not work in other Markdown tools if they render a different HTML element ID!
Another shortcut, available in VS Code since version 1.70 (July 2022), is that markdown can autocomplete the header ID: type # inside the link target and it lists the valid IDs.

Lua filter for pandoc to append html

I'm currently compiling markdown to html using pandoc:
pandoc in.md -o out.html
and would like to include the same piece of html code in each of the output files, without having to write it into my markdown file.
I was hoping that a lua filter would do the job. However, the docs seem to indicate the filters will only respond to a sequence of characters within my markdown file, rather than appending something to each file.
I've played around with CSS (I've never used it before), but it doesn't look like I can just add arbitrary html code like this (correct me if I'm wrong).
To summarize, I'd like to find a way to add html code to my output.
A Lua filter is likely to be overkill here. Pandoc has an option --include-after-body (or --include-before-body) which will do what you need:
-A FILE, --include-after-body=FILE|URL
Include contents of FILE, verbatim, at the end of the document body (before the </body> tag in HTML, or the \end{document} command in LaTeX). This option can be used repeatedly to include multiple files. They will be included in the order specified. Implies --standalone.
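If several files are converted from a script, the option can be added to each pandoc call. The sketch below only builds and prints the command line; the file name footer.html and the helper name pandoc_command are assumptions made for this example, and actually running the command (for instance with subprocess.run) requires pandoc to be installed.

```python
import shlex


def pandoc_command(source: str, target: str, footer: str = "footer.html"):
    """Build a pandoc call that appends `footer` to the end of the body."""
    return ["pandoc", source, "-o", target, f"--include-after-body={footer}"]


# Print the command as it would be typed in a shell.
print(shlex.join(pandoc_command("in.md", "out.html")))
```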

Jekyll/Octopress text modification. Filter, generator...converter?

I am building an octopress blog. In that blog, a number of entries have footnotes. The markdown files currently denote a footnote like so:
"This is the main text <footnote>and this is the footnote</footnote> where
we speak of main-text things"
What I want to do is extract the footnotes from the body text and then have access to both the main text AND the footnotes as variables in the layout.
I've made some progress with this by creating a filter but it doesn't work very well because filters always output directly on return and I need to format the footnotes.
Would a generator be more appropriate? A converter? Should I not be using liquid tags at all in this case?
Filters make the most sense to me. Is there a way to get the return value of a filter without it printing to the screen? I currently use this:
{{ content | footnotes }}
But that just dumps the array as one big, unformatted array. If it isn't blindingly obvious already, I'm just getting started with Liquid and I'm a little confused.
Depending on your markdown parser you could just write the footnotes normally in the markdown. This is what I'm using on my blog. This is my config in the _config.yml file:
markdown: rdiscount
rdiscount:
  extensions:
    - autolink
    - footnotes
    - smart
Then I just write [^1] in the text to reference the footnote and
[^1]: My footnote
to define it; it is shown at the bottom of the page.
Or are you trying to show footnotes at some other part of the screen and not at the bottom of the post?
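If the existing posts already use the <footnote>...</footnote> markup from the question, they could be converted once to the [^n] syntax instead of being handled by a Liquid filter. The following is only a sketch of such a one-off conversion, written in Python rather than as a Jekyll plugin; the function name is made up, and it assumes footnote tags are never nested.

```python
import re

# A footnote tag, together with any whitespace directly before it.
FOOTNOTE_TAG = re.compile(r"\s*<footnote>(.*?)</footnote>", re.DOTALL)


def tags_to_markdown_footnotes(text: str) -> str:
    """Rewrite <footnote>...</footnote> tags as [^n] footnotes."""
    notes = []

    def replace(match):
        # Collapse line breaks inside the note into single spaces.
        notes.append(" ".join(match.group(1).split()))
        return f"[^{len(notes)}]"

    body = FOOTNOTE_TAG.sub(replace, text)
    if not notes:
        return body
    definitions = "\n".join(
        f"[^{i}]: {note}" for i, note in enumerate(notes, start=1)
    )
    return body + "\n\n" + definitions


post = ("This is the main text <footnote>and this is the footnote</footnote> "
        "where we speak of main-text things")
print(tags_to_markdown_footnotes(post))
```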

Full urls of images of a given page on Wikipedia (only those I see on the page)

I want to extract the full URLs of all images on the "Google" page on Wikipedia.
I have tried with:
http://en.wikipedia.org/w/api.php?action=query&titles=Google&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
but this way I also get images that are not related to Google, such as:
http://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg
http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg
http://upload.wikimedia.org/wikipedia/commons/f/fe/Crystal_Clear_app_browser.png
How can I extract only the images that I see on the Google page?
1. Retrieve page source code, https://en.wikipedia.org/w/index.php?title=Google&action=raw
2. Scan it for substrings like [[File:Google web search.png|thumb|left|On February 14, 2012, Google updated its homepage with a minor twist. There are no red lines above the options in the black bar, and there is a tab space before the "+You". The sign-in button has also changed, it is no longer in the black bar, instead under it as a button.]]
3. Ask API for all pictures on page, http://en.wikipedia.org/w/api.php?action=query&titles=Google&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json
4. Discard all URLs except those which match picture names found in step 2.
Steps 2 and 4 need more explanation.
#2. Regexp /\b(File|Image):[^]|\n\r]+/ should be enough. In Ruby's regexps, \b denotes word boundary which might be unsupported in language of your choice. Regexp I proposed will match all cases which come to my mind: [[File:something.jpg]], gallery tags: <gallery>\nFile:one.jpg\nFile:two.jpg\n</gallery>, templates: {{Infobox|pic = File:something.jpg}}. However, it won't match filenames which contain ]. I'm not sure if they're legal, but if they are, they must be very uncommon and it should not be a big deal.
If you want to match only constructs like this: [[File:something.jpg|thumb|description]], following regexp will work better: /\[\[(File|Image):[^]|]+/
#4. I'd remove all characters from names which match /[^A-Za-z0-9]/. It's easier than escaping them and, in most cases, enough.
Icons are most often attached in templates, whereas pictures related to the article subject are most often attached directly ([[File:…]]). There are exceptions though: in some articles pictures are attached with the {{Gallery}} template, and there is also the <gallery> tag, which introduces special syntax for galleries. You will have to tune my solution to your needs, and even then it won't be perfect, but it should be good enough.
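Steps 2 and 4 can be sketched as follows, here in Python instead of Ruby. This is an illustration under assumptions, not a complete client: it makes no network requests, it expects the API result to have been reduced to (title, url) pairs beforehand, and the example.org URL in the sample data is a placeholder. The function names are made up for this example.

```python
import re

# Step 2: file names after "File:" or "Image:", up to "]", "|" or a line end.
FILE_LINK = re.compile(r"\b(?:File|Image):([^\]|\n\r]+)")


def normalize(name: str) -> str:
    """Step 4: strip every character that is not a letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def names_in_wikitext(wikitext: str) -> set:
    return {normalize(m.group(1)) for m in FILE_LINK.finditer(wikitext)}


def filter_images(wikitext: str, api_images):
    """Keep only URLs whose title also occurs in the page source."""
    wanted = names_in_wikitext(wikitext)
    kept = []
    for title, url in api_images:
        bare_name = title.split(":", 1)[-1]  # drop the "File:" prefix
        if normalize(bare_name) in wanted:
            kept.append(url)
    return kept


wikitext = "[[File:Google web search.png|thumb|left|Some caption]]"
api_images = [
    # Placeholder URL, assumed for the example.
    ("File:Google web search.png", "https://example.org/Google_web_search.png"),
    ("File:Commons-logo.svg",
     "http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg"),
]
print(filter_images(wikitext, api_images))
```

Because this regexp matches names inside templates as well, icons that are passed to a template as File:… would still be kept; using the stricter [[File:…]] pattern from above avoids that.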