Markdown/Jekyll: Auto-surround bare URLs with angle brackets?

Markdown/Jekyll: Auto-surround bare URLs with angle brackets? - html

I notice that some Markdown parsers and GitHub will auto-convert bare URLs to links, but others (like Kramdown) don't. The standard Markdown syntax requires that URLs be wrapped in angle brackets, e.g. <https://www.google.com/>.
I have a number of documents with bare URLs that appear as desired, i.e. as hyperlinks, in my Markdown editor but are not getting rendered as links when I push them in Jekyll to GitHub Pages.
How can I write a script to surround bare URLs with angle brackets?
Preferably via shell scripting, standard command line tools (sed, awk) or Python. Or perhaps there's already a Jekyll plugin for this?
I know that matching URLs is highly nontrivial, so wanted to ask here on SO before getting too deep into this.
Further difficulty: The solution should only change bare URLs, and leave alone URLs that have already been wrapped/encoded via standards-compliant Markdown or HTML.
(I expected this to be a common question, and it is in various GitHub-Issues posts for various packages, with no solutions... But tried searching for this question here and couldn't find it already asked, nor any premade Jekyll solutions. I found many questions about matching when the angle brackets are already there, but not ones to add the angle brackets. Yet I'm imagining the solution has been implemented many, many times -- in the very tools we use, such as GitHub and MathOverflow -- so, not sure why the means to do this isn't widely posted.)

You may try the below regex:
(?!<)^(https?:\/\/(?:www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()#:%_\+.~#?&\/\/=]*))(?!>)$
Explanation of the above regex:
(?!<) - Represents negative look-ahead not matching the string if it starts with a <.
^, $ - Represents start and end of line respectively.
(https?:\/\/(?:www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()#:%_\+.~#?&\/\/=]*)) - This part matches all the possible valid urls efficiently.
(?!>) - Represents negative look-ahead not matching if the url ends in >.
You can find the demo of the above regex in here.
NOTE: I also prefer using perl command if it comes to implementation in bash. But if it is your necessary requirement to use sed then you can try the below command. However; please be noted that sed misses many amazing features of regex namely; look-arounds, non-captured groups, etc.
sed -E 's#^[^<]?(https?://(www\.)?[-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()#:%_\+.~#?&/=]*))[^>]?$#<\1>#gm'
You can find sample run of the perl and sed implementation in here.

use pandoc to convert between different markdown flavors
example:
pandoc -f gfm+hard_line_breaks -t markdown_strict in.md -o out.md
gfm means github-flavored markdown
demo:
pandoc -f gfm+hard_line_breaks -t markdown_strict <<<$'
https://asdf.com
<https://asdf.com>
[asdf](https://asdf.com)
' | perl -pe 's/\n/¶\n/'
<https://asdf.com> ¶
<https://asdf.com> ¶
[asdf](https://asdf.com)¶
my example also converts from hard linebreaks (\n renders as linebreak) to soft linebreaks (\n renders as space). i added pilcrows (¶) to clarify the output
live demo

Related

Showing LAtex formulas in GitHub markdown.md [duplicate]

Is there any way to render LaTex in README.md in a GitHub repository? I've googled it and searched on stack overflow but none of the related answers seems feasible.

For short expresions and not so fancy math you could use the inline HTML to get your latex rendered math on codecogs and then embed the resulting image. Here an example:
- <img src="https://latex.codecogs.com/gif.latex?O_t=\text { Onset event at time bin } t " />
- <img src="https://latex.codecogs.com/gif.latex?s=\text { sensor reading } " />
- <img src="https://latex.codecogs.com/gif.latex?P(s | O_t )=\text { Probability of a sensor reading value when sleep onset is observed at a time bin } t " />
Which should result in something like the next
Update: This works great in eclipse but not in github unfortunately. The only work around is the next:
Take your latex equation and go to http://www.codecogs.com/latex/eqneditor.php, at the bottom of the area where your equation appears displayed there is a tiny dropdown menu, pick URL encoded and then paste that in your github markdown in the next way:
![equation](http://latex.codecogs.com/gif.latex?O_t%3D%5Ctext%20%7B%20Onset%20event%20at%20time%20bin%20%7D%20t)
![equation](http://latex.codecogs.com/gif.latex?s%3D%5Ctext%20%7B%20sensor%20reading%20%7D)
![equation](http://latex.codecogs.com/gif.latex?P%28s%20%7C%20O_t%20%29%3D%5Ctext%20%7B%20Probability%20of%20a%20sensor%20reading%20value%20when%20sleep%20onset%20is%20observed%20at%20a%20time%20bin%20%7D%20t)

I upload repositories with equations to Gitlab because it has native support for LaTeX in .md files:
```math
SE = \frac{\sigma}{\sqrt{n}}
```
The syntax for inline latex is $`\sqrt{2}`$.
Gitlab renders equations with JavaScript in the browser instead of showing images, which improves the quality of equations.
More info here.
Let's hope Github will implement this as well in the future.

My trick is to use the Jupyter Notebook.
GitHub has built-in support for rendering .ipynb files. You can write inline and display LaTeX code in the notebook and GitHub will render it for you.
Here's a sample notebook file: https://gist.github.com/cyhsutw/d5983d166fb70ff651f027b2aa56ee4e

Readme2Tex
I've been working on a script that automates most of the cruft out of getting LaTeX typeset nicely into Github-flavored markdown: https://github.com/leegao/readme2tex
There are a few challenges with rendering LaTeX for Github. First, Github-flavored markdown strips most tags and most attributes. This means no Javascript based libraries (like Mathjax) nor any CSS styling.
The natural solution then seems to be to embed images of precompiled equations. However, you'll soon realize that LaTeX does more than just turning dollar-sign enclosed formulas into images.
Simply embedding images from online compilers gives this really unnatural look to your document. In fact, I would argue that it's even more readable in your everyday x^2 mathematical slang than jumpy .
I believe that making sure that your documents are typeset in a natural and readable way is important. This is why I wrote a script that, beyond compiling formulas into images, also ensures that the resulting image is properly fitted and aligned to the rest of the text.
For example, here is an excerpt from a .md file regarding some enumerative properties of regular expressions typeset using readme2tex:
As you might expect, the set of equations at the top is specified by just starting the corresponding align* environment
**Theorem**: The translation $[\![e]\!]$ given by
\begin{align*}
...
\end{align*}
...
Notice that while inline equations ($...$) run with the text, display equations (those that are delimited by \begin{ENV}...\end{ENV} or $$...$$) are centered. This makes it easy for people who are already accustomed to LaTeX to keep being productive.
If this sounds like something that could help, make sure to check it out. https://github.com/leegao/readme2tex

Since May 2022, this has been officially supported:
Inline:
Where $x = 0$, evaluate $x + 1$
Blocks:
Where
$$x = 0$$
Evaluate
$$x + 1$$

One can also use this online editor: https://www.codecogs.com/latex/eqneditor.php which generates SVG files on the fly. You can put a link in your document like this:
![](https://latex.codecogs.com/svg.latex?y%3Dx%5E2) which results in:
.

I test some solution proposed by others and I would like to recommend TeXify created and proposed in comment by agurodriguez and further described by Tom Hale - I would like develop his answer and give some reason why this is very good solution:
TeXify is wrapper of Readme2Tex (mention in Lee answer). To use Readme2Tex you must install a lot of software in your local machine (python, latex, ...) - but TeXify is github plugin so you don't need to install anything in your local machine - you only need to online installation that plugin in you github account by pressing one button and choose repositories for which TeXify will have read/write access to parse your tex formulas and generate pictures.
When in your repository you create or update *.tex.md file, the TeXify will detect changes and generate *.md file where latex formulas will be exchanged by its pictures saved in tex directory in your repo. So if you create README.tex.md file then TeXify will generate README.md with pictures instead tex formulas. So parsing tex formulas and generate documentation is done automagically on each commit&push :)
Because all your formulas are changed into pictures in tex directory and README.md file use links to that pictures, you can even uninstall TeXify and all your old documentation will still works :). The tex directory and *.tex.md files will stay on repository so you have access to your original latex formulas and pictures (you can also safely store in tex directory your other documentation pictures "made by hand" - TeXify will not touch them).
You can use equations latex syntax directly in README.tex.md file (without loosing .md markdown syntax) which is very handy. Julii in his answer proposed to use special links (with formulas) to external service e.g . http://latex.codecogs.com/gif.latex?s%3D%5Ctext%20%7B%20sensor%20reading%20%7D which is good however has some drawbacks: the formulas in links are not easy (handy) to read and update, and if there will be some problem with that third-party service your old documentation will stop work... In TeXify your old documentation will works always even if you uninstall that plugin (because all your pictures generated from latex formulas are stay in repo in tex directory).
The Yuchao Jiang in his answer, proposed to use Jupyter Notebook which is also nice however have som drawbacks: you cannot use formulas directly in README.md file, you need to make link there to other file *.ipynb in your repo which contains latex (MathJax) formulas. The file *.ipynb format is JSON which is not handy to maintain (e.g. Gist don't show detailed error with line number in *.ipynb file when you forgot to put comma in proper place...).
Here is link to some of my repo where I use TeXify for which documentation was generated from README.tex.md file.
Update
Today 2020.12.13 I realised that TeXify plugin stop working - even after reinstallation :(

For automatic conversion upon push to GitHub, take a look at the TeXify app:
GitHub App that looks in your pushes for files with extension *.tex.md and renders it's TeX expressions as SVG images
How it works (from the source repository):
Whenever you push TeXify will run and seach for *.tex.md files in your last commit. For each one of those it'll run readme2tex which will take LaTeX expressions enclosed between dollar signs, convert it to plain SVG images, and then save the output into a .md extension file (That means that a file named README.tex.md will be processed and the output will be saved as README.md). After that, the output file and the new SVG images are then commited and pushed back to your repo.

I just published a new version of xhub, a browser extension that renders LaTeX (and other things) in GitHub pages.
Cons:
You have to install the extension once.
Pros:
No need to set up anything.
Just write Markdown with math
Display math:
```math
e^{i\pi} + 1 = 0
```
and line math $`a^2 + b^2 = c^2`$.
(Syntax like on GitLab.)
Works on light and dark background. (Math has text-color)
You can copy-and-paste the math just like text
As an example, check out this GitHub README:

You can get a continuous integration service (e.g. Travis CI) to render LaTeX and commit results to github. CI will deploy a "cloud" worker after each new commit. The worker compiles your document into pdf and either cuses ImageMagick to convert it to an image or uses PanDoc to attempt LaTeX->HTML conversion where success may vary depending on your document. Worker then commits image or html to your repository from where it can be shown in your readme.
Sample TravisCi config that builds a PDF, converts it to a PNG and commits it to a static location in your repo is pasted below. You would need to add a line that fetches pdfconverts PDF to an image
sudo: required
dist: trusty
os: linux
language: generic
services: docker
env:
global:
- GIT_NAME: Travis CI
- GIT_EMAIL: builds#travis-ci.org
- TRAVIS_REPO_SLUG: your-github-username/your-repo
- GIT_BRANCH: master
# I recommend storing your GitHub Access token as a secret key in a Travis CI environment variable, for example $GH_TOKEN.
- secure: ${GH_TOKEN}
script:
- wget https://raw.githubusercontent.com/blang/latex-docker/master/latexdockercmd.sh
- chmod +x latexdockercmd.sh
- "./latexdockercmd.sh latexmk -cd -f -interaction=batchmode -pdf yourdocument.tex -outdir=$TRAVIS_BUILD_DIR/"
- cd $TRAVIS_BUILD_DIR
- convert -density 300 -quality 90 yourdocument.pdf yourdocument.png
- git checkout --orphan $TRAVIS_BRANCH-pdf
- git rm -rf .
- git add -f yourdoc*.png
- git -c user.name='travis' -c user.email='travis' commit -m "updated PDF"
# note we are again using GitHub access key stored in the CI environment variable
- git push -q -f https://your-github-username:$GH_TOKEN#github.com/$TRAVIS_REPO_SLUG $TRAVIS_BRANCH-pdf
notifications:
email: false
This Travis Ci configuration launches a Ubuntu worker downloads a latex docker image, compiles your document to pdf and commits it to a branch called branchanme-pdf.
For more examples see this github repo and its accompanying sx discussion, PanDoc example,
https://dfm.io/posts/travis-latex/, and this post on Medium.

I have been looking around and found that this answer in another question works best for me. i.e. use githubcontent math renderer, e.g. to display:
Use this link
Beware of the latex needs to be url encoded, but otherwise work quite well for me.

If you are having issues with https://www.codecogs.com/latex/eqneditor.php, I found that https://alexanderrodin.com/github-latex-markdown/ worked for me. It generates the Markdown code you need, so you just cut and paste it into your README.md document.

You may also take a look on my tool latexMarkdown2Markdown which convert LaTeX to SVG and generate a table of content with chapter numbering.

Good news!
According to this blogpost, now GitHub supports Mathjax in readme files.
You can use in-line LaTeX inspired syntax using $ delimiters, or in-blocks using $$ delimiters.

Writing inline expressions:
This sentence uses $ delimiters to show math inline:
$\sqrt{3x-1}+(1+x)^2$
Writing expressions as blocks:
The Cauchy-Schwarz Inequality
$$\left( \sum_{k=1}^n a_k b_k \right)^2 \leq \left( \sum_{k=1}^n a_k^2
\right) \left( \sum_{k=1}^n b_k^2 \right)$$
Source: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/writing-mathematical-expressions

You can use markdowns, e.g.
![equ](https://latex.codecogs.com/gif.latex?log(y)=\beta_0&space;&plus;&space;\beta_1&space;x&space;&plus;&space;u)
Code can be typed here: https://www.codecogs.com/latex/eqneditor.php.

Edit: As germanium pointed out, it does not work for README.md but other git pages though no explanation is available.
My quick solution is this
step 1. Add latex to your .md file
$$x=\sqrt{2}$$
Note: math eqns must be in $$...$$ or \\(... \\).
step 2. Add the following to your scripts.html or theme file (append this code at the end)
<script type="text/javascript" async
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
Done!. See your eq. by loading the page.

How to convince VS code to accept # as comment in JSON files?

We have special files that contain JSON data mixed with # comments.
I figured I need to enhance Code's json.settings file with:
"files.associations": {
"*.ourextension": "jsonc"
}
but then I discovered that jsconc is about JSON data with // comments.
Is there a convenient way to get VS code to accept # comments in JSON data?
Edit: VS code recognizes the jsconc language, it gives me this error message:
And it also accepts // comments:
adding the // got me a green first line, and now the second line gets the first error (because starting with #).

If you use the Change Language mode command (or click on the language indicator on the status bar) you can select "jsonc JSON with Comments".
I think this is only auto-detected when the extension is .jsonc.
NB. JSON with comments uses JavaScript style single line comments: from a \\, outside a string literal, to the end of line.
To support some different comment indicator would require a new language mode (and an extension to add it).

A distinct non-answer: it might be possible to add a such a new language definition, but it would require quite a bit of work. I also had a quick look if I could simply change the corresponding config json file for jsonc that ships with VS code, but that file is rather complex, and would probably be overridden with the next VS code update.
Thus a straight forward workaround. Two scripts to replace one command style with the other:
#!/bin/sh
# a helper script that turns all # into //
# with the syntax that works for sed on MAC OS
for file in "$#"
do
sed -i '' -e 's,#,//,g' $file
done
Not exactly convenient, but fast and robust, given our specific requirements.

How to get absolute URLs in Bash

I want to get all URLs from specific page in Bash.
This problem is already solved here: Easiest way to extract the urls from an html page using sed or awk only
The trick, however, is to parse relative links into absolute ones. So if http://example.com/ contains links like:
About us
<script type="text/javascript" src="media/blah.js"></a>
I want the results to have following form:
http://example.com/about.html
http://example.com/media/blah.js
How can I do so with as little dependencies as possible?

Simply put, there is no simple solution. Having little dependencies leads to unsightly code, and vice versa: code robustness leads to higher dependency requirements.
Having this in mind, below I describe a few solutions and sum them up by providing pros and cons of each one.
Approach 1
You can use wget's -k option together with some regular expressions (read more about parsing HTML that way).
From Linux manual:
-k
--convert-links
After the download is complete, convert the links in the document to
make them suitable for local viewing.
(...)
The links to files that have not been downloaded by Wget will be
changed to include host name and absolute path of the location they
point to.
Example: if the downloaded file /foo/doc.html links to /bar/img.gif
(or to ../bar/img.gif), then the link in doc.html will be modified to
point to http://hostname/bar/img.gif.
An example script:
#wget needs a file in order for -k to work
tmpfil=$(mktemp);
#-k - convert links
#-q - suppress output
#-O - redirect output to given file
wget http://example.com -k -q -O "$tmpfil";
#-o - print only matching parts
#you could use any other popular regex here
grep -o "http://[^'\"<>]*" "$tmpfil"
#remove unnecessary file
rm "$tmpfil"
Pros:
Works out of the box on most systems, assuming you have wget installed.
In most cases, this will be sufficient solution.
Cons:
Features regular expressions, which are bound to break on some exotic pages due to HTML hierarchical model standing below regular expressions in Chomsky hierarchy.
You cannot pass a location in your local file system; you must pass working URL.
Approach 2
You can use Python together with BeautifulSoup. An example script:
#!/usr/bin/python
import sys
import urllib
import urlparse
import BeautifulSoup
if len(sys.argv) <= 1:
print >>sys.stderr, 'Missing URL argument'
sys.exit(1)
content = urllib.urlopen(sys.argv[1]).read()
soup = BeautifulSoup.BeautifulSoup(content)
for anchor in soup.findAll('a', href=True):
print urlparse.urljoin(sys.argv[1], anchor.get('href'))
And then:
dummy:~$ ./test.py http://example.com
Pros:
It's the correct way to handle HTML, since it's properly using fully-fledged parser.
Exotic output is very likely going to be handled well.
With small modifications, this approach works for files, not URLs only.
With small modifications, you might even be able to give your own base URL.
Cons:
It needs Python.
It needs Python with custom package.
You need to manually handle tags and attributes like <img src>, <link src>, <script src> etc (which isn't presented in the script above).
Approach 3
You can use some features of lynx. (This one was mentioned in the answer you provided in your question.) Example:
lynx http://example.com/ -dump -listonly -nonumbers
Pros:
Very concise usage.
Works well with all kind of HTML.
Cons:
You need Lynx.
Although you can extract links from files as well, you cannot control the base URL and you end up with file://localhost/ links. You can fix this using ugly hacks like manual inserting <base href=""> tag into HTML.

Another option is my Xidel (XQuery/Webscraper):
For all normal links:
xidel http://example.com/ -e '//a/resolve-uri(#href)'
For all links and srcs:
xidel http://example.com/ -e '(//#href, //#src)/resolve-uri(.)'
With rr-'s format:
Pros :
Very concise usage.
Works well with all kind of HTML.
It's the correct way to handle HTML, since it's properly using fully-fledged parser.
Works for files and urls
You can give your own base URL. (with resolve-uri(#href, "baseurl"))
No dependancies except Xidel (except openssl, if you also have https urls)
Cons:
You need Xidel, which is not contained in any standard repository

Why not simply this ?
re='(src|href)='
baseurl='example.com'
wget -O- "http://$baseurl" | awk -F'(src|href)=' -F\" "/$re/{print $baseurl\$2}"
you just need wget and awk.
Feel free to improve the snippet a bit if you have both relative & absolute urls at the same time.

Convert MediaWiki wikitext format to HTML using command line

I tend to write a good amount of documentation so the MediaWiki format to me is easy for me to understand plus it saves me a lot of time than having to write traditional HTML. I, however, also write a blog and find that switching from keyboard to mouse all the time to input the correct tags for HTML adds a lot of time. I'd like to be able to write my articles in Mediawiki syntax and then convert it to HTML for use on my blog.
I've tried Google-ing but must need better nomenclature as surprisingly I haven't been able to find anything.
I use Linux and would prefer to do this from the command line.
Any one have any thoughts or ideas?

The best would be to use MediaWiki parser. The good news is that MediaWiki 1.19 will provide a command line tool just for that!
Disclaimer: I wrote that tool.
The script is maintenance/parse.php some usage examples straight from the source code:
Entering text yourself, ending it with Control + D:
$ php maintenance/parse.php --title foo
''[[foo]]''^D
<p><i><strong class="selflink">foo</strong></i>
</p>
$
The usual file input method:
$ echo "'''bold'''" > /tmp/foo.txt
$ php maintenance/parse.php /tmp/foo.txt
<p><b>bold</b>
</p>$
And of course piping to stdin:
$ cat /tmp/foo | php maintenance/parse.php
<p><b>bold</b>
</p>$
as of today you can get the script from http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/maintenance/parse.php and place it in your maintenance directory. It should work with MediaWiki 1.18
The script will be made available with MediaWiki 1.19.0.

Looked into this a bit and think that a good route to take here would be to learn to a general markup language like restucturedtext or markdown and then be able to convert from there. Discovered a program called pandoc that can convert either of these to HTML and Mediawiki. Appreciate the help.
Example:
pandoc -f mediawiki -s myfile.mediawiki -o myfile.html -s

This page lists tons of MediaWiki parsers that you could try.

Want to convert all <tt> tags to <code> in a large hierarchy of HTML files

I have nearly 100 HTML files that uses the <tt> tag to markup inline code which I'd like to change to the more meaningful <code> tag. I was thinking of doing something on the order of a massive sed -i 's/<tt>/<code>/g' command but I'm curious if there's a more appropriate industrial mechanism for changing tag types on a large HTML tree.

The nicest thing you may do is to use
xmlstartlet:
xml ed -r //b -v code
It is freaky powerful. See http://xmlstar.sourceforge.net/, http://www.ibm.com/developerworks/library/x-starlet.html

If your are on a linux environment then sed is very easy, short, and fast way to do it.
Corrected command :
SAVEIFS=$IFS
IFS="\n"
for f in `find . -name "*.htm"` do sed -i 's/tt>/code>/g' "$f" ;done
IFS=$SAVEIFS
Some text editors or IDE also allow you to do a search and replace in directories with a filter on filename.

For one time performance of such tasks I use UltraEdit on Windows. UE has a find and replace in files function that works great for this. I point it at the top of the directory tree containing the files I want to change, tell it to process sub-directories, give it the extension of the files I want to change, tell it what to change and what to change it to and go.
If you have to script this in linux, then I think the sed solution or a perl / php script will work great.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Markdown/Jekyll: Auto-surround bare URLs with angle brackets? - html

Related

Showing LAtex formulas in GitHub markdown.md [duplicate]

How to convince VS code to accept # as comment in JSON files?

How to get absolute URLs in Bash

Convert MediaWiki wikitext format to HTML using command line

Want to convert all <tt> tags to <code> in a large hierarchy of HTML files

Categories

Resources