PanDoc: How to assign level-one Atx-style header (markdown) to the contents of html title tag - html

I am using PanDoc to convert a large number of markdown (.md) files to html. I'm using the following Windows command-line:
for %%i in (*.md) do pandoc -f markdown -s %%~ni.md > html/%%~ni.html
On a test run, the html looks OK, except for the title tag - it's empty. Here is an example of the beginning of the .md file:
#Topic Title
- [Anchor 1](#anchor1)
- [Anchor 2](#anchor2)
<a name="anchor1"></a>
## Anchor 1
Is there a way I can tell PanDoc to parse the
#Topic Title
so that, in the html output files, I will get:
<title>Topic Title</title>
?
There are other .md tags I'd like to parse, and I think solving this will help me solve the rest of it.

I don't believe Pandoc supports this out-of-the-box. The relevant part of the Pandoc documentation states:
Templates may contain variables. Variable names are sequences of alphanumerics, -, and _, starting with a letter. A variable name surrounded by $ signs will be replaced by its value. For example, the string $title$ in
<title>$title$</title>
will be replaced by the document title.
It then continues:
Some variables are set automatically by pandoc. These vary somewhat depending on the output format, but include metadata fields (such as title, author, and date) as well as the following:
And proceeds to list a bunch of variables (none of which are relevant to your question). However, the above quote indicates that the title variable is a metadata field. The metadata field can be defined in a pandoc_title_block, a yaml_metadata_block, or passed in as a command line option.
The docs note that:
... you may also keep the metadata in a separate YAML file and pass it to pandoc as an argument, along with your markdown files ...
So you have a couple options:
Edit each document to add metadata defining the title for each document (this could possibly be scripted).
Write your script to extract the title (perhaps a regex which looks for #header in the first line) and passes that in to Pandoc as a command line option.
If you intend to start including the metadata in new documents you create going forward, then the first option is probably the way to go. Run a script once to batch edit your documents and then your done. However, if you have no intention of adding metadata to any documents, I would consider the second option. You are already running a loop, so just get the title before calling Pandoc within your loop (although I'm not sure how to do that in a windows script).

Related

How can I change specific recurring text on a very large HTML file?

I have a very big HTML file (talking about 20MB) and I need to remove from the file a large amount of nodes of the form:
<tr><td>SPECIFIC-STRING</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
The file I need to work on is basically made of thousands of these strings, and I only need to remove those that have a specific first string, for instance, all those with the first string being "banana":
<tr><td>banana</td><td>RANDOM-STRING</td><td>RANDOM-STRING</td></tr><tr><td style="padding-top:0" colspan="3">RANDOM-STRING</td></tr>
I tried achieving this opening the file in Geany and using the replace feature with this regex:
<tr><td>banana<\/td><td>(.*)<\/td><td>(.*)<\/td><\/tr><tr><td(.*)<\/td><\/tr>
but the console output was that it removed X amount of occurrences, when I know there are way more occurrences than that in the file.
Firefox, Chrome and Brackets fail even to view the html code of the file due to it's size. I can't think of another way to do this due to my large unexperience with HTML.
You could be using a stream editor which as the name suggest streams the file content, thus never loads the whole file into the main memory.
A popular editor is sed. It does support RegEx.
Your command would have the following structure.
sed -i -E 's/SEARCH_REGEX/REPLACEMENT/g' INPUTFILE
-E for support of extended RegEx
-i for in-place editing mode
s denotes that you want to replace values
g is for global. By default sed would only replace the first occurrence so to replace all occrrences you must provide g
SEARCH_REGEX is the RegEx you need to find the substrings you want to replace
REPLACEMENT is the value you want to replace all matches with
INPUTFILE is the file sed is gonna read line-by line and do the replacement for you.
While regex may not be the best tool to do this kinda job, try this adjustment to your pattern:
<tr><td>banana<\/td><td>(.*?)<\/td><td>(.*?)<\/td><\/tr><tr><td(.*?)<\/td><\/tr>
That's making your .* matches lazy. I am wondering if those patterns are consuming too much.

Sublime Text - find all instances of an html class name project-wide

I want to find all instances of a class named "validation" in all of my html files project wide. It's a very large project and a search for the word "validation" gives me hundreds of irrelevant results (js functions, css, js/css minified, other classes, functions and html page content containing the word validation, etc). It can sometimes be the second, third, or fourth class declared so searching for "class='validation" doesn't work.
Is there a way to specify that I only want results where validation is a class declared on an html block?
Yes. In the sublime menu go to Find --> Find in Files...
Then match what is in the following image.
The first thing you will want to do is consider other possibilities with how you can solve this problem. Currently, it sounds like you are only using sublime text. Have you considered trying to use a command-line tool like grep?
Here is an example of how it could be used.
I have a project called enfold-child with a bunch of frontend assets for a wordpress project. Let's say, I want to find all of my scss files with the class "home" listed in them somewhere, but I do NOT want to pull in built css files, or anything in my node_modules folder. The way i would do that is as follows:
Folder structure:
..
|build
|scss_files
|node_modules
|css_files
|style.css
grep -rnw build --exclude=*{.css} --exclude-dir=node_modules -e home
grep = handy search utility.
-r = recursive search.
-n = provide line numbers for each match
-w = Select only those lines containing matches that form whole words.
-e = match against a regular expression.
home = the expression I want to search for.
In general, the command line has most anything one could want/need to do most of the nifty operations offered by most text-editors -- such as Sublime. Becoming familiar with the command line will save you a bunch of time and headaches in the future.
In SublimeText, right-click on the folder you want to start the search from and click on Find in Folder. Make sure regex search is enabled (the .* button in the search panel) and use this regex as the search string:
class="([^"]+ )?validation[ "]
That regex will handle cases where "validation" is the only classname as well as cases where its one of several classnames (in which case it can be anywhere in the list).
If you didn't stick to double quotes, this version will work with single or double quotes:
class=['"]([^'"]+ )?validation[ '"]
If you want to use these regexes from the command line with grep, you'll need to include a -E argument for "extended regular expressions".

How to add html attributes and values for all lines quickly with vim and plugins?

My os:debian8.
uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1+deb8u2 (2017-03-07) x86_64 GNU/Linux
Here is my base file.
home
help
variables
compatibility
modelines
searching
selection
markers
indenting
reformatting
folding
tags
makefiles
mapping
registers
spelling
plugins
etc
I want to create a html file as bellow.
home
help
variables
compatibility
modelines
searching
selection
markers
indenting
reformatting
folding
tags
makefiles
mapping
registers
spelling
plugins
etc
Every line was added href and id attributes,whose values are line content pasted .html and line content itself correspondingly.
How to add html attributes and values for all lines quickly with vim and plugins?
sed,awk,sublime text 3 are all welcomed to solve the problem.
$ sed 's:.*:&:' file
home
help
variables
compatibility
modelines
searching
selection
markers
indenting
reformatting
folding
tags
makefiles
mapping
registers
spelling
plugins
etc
if you want to do this in vi itself, no plug-in neccessary
Open the file, type : and insert this line as the command
%s:.*:&
it will make all the substitutions in the file.
sed is the best solution (simple and pretty fast here) if your are sure of the content, if not it need a bit of complexity that is better treated by awk:
awk '
{
# change special char for HTML constraint
Org = URL = HTML = $0
# sample of modification
gsub( / /, "%20", URL)
gsub( /</, "%3C", HTML)
printf( "%s\n", URL, Org, HTML)
}
' YourFile
To complete this easily in Sublime Text, without any plugins added:
Open the base file in Sublime Text
Type Ctrl+Shift+P and in the fuzzy search input type syn html to set the file syntax to HTML.
In the View menu, make sure Word Wrap is toggled off.
Ctrl+A to select all.
Ctrl+Shift+L to break selection into multi-line edit.
Ctrl+C to copy selection into clipboard as multiple lines.
Alt+Shift+W to wrap each line with a tag-- then tap a to convert the default <p> tag into an <a> tag (hit esc to quit out of any context menus that might pop up)
Type a space then href=" -- you should see this being added to every line as they all have cursors. Also you should note that Sublime has automatically closed your quotes for you, so you have href="" with the cursor between the quotes.
ctrl+v -- this is where the magic happens-- your clipboard contains every lines worth of contents, so it will paste each appropriate value into the quotes where the cursor is lying. Then you simply type .html to add the extension.
Use the right arrow to move the cursors outside of the quotes for the href attribute and follow the two previous steps to similarly add an id attribute with the intended ids pasted in.
Voila! You're done.
Multi-line editing is very powerful as you learn how to combine it with other keyboard shortcuts. It has been a huge improvement in my workflow. If you have any questions please feel free to comment and I'll adjust as needed.
With bash one-liner:
while read v; do printf '%s\n' "$v" "$v" "$v"; done < file
(OR)
while read v; do echo "$v"; done < file
Try this -
awk '{print a$1b$1c$1d}' a='' d='' file
home
help
variables
compatibility
modelines
searching
selection
markers
indenting
reformatting
folding
tags
makefiles
mapping
registers
spelling
plugins
etc
Here I have created 4 variable a,b,c & d which you can edit as per your choice.
OR
while read -r i;do echo ""$i";done < f
home
help
variables
compatibility
To execute it directly in vim:
!sed 's:.*:&:' %
In awk, no regex, no nothing, just print strings around $1s, escaping "s:
$ awk '{print "" $1 ""}' file
home
help
If you happen to have empty lines in there just add /./ before the {:
/./{print ...
list=$(cat basefile.txt)
for val in $list
do
echo ""$val"" >> newfile.html
done
Using bash, you can always make a script or type this into the command line.
This vim replacement pattern handles your base file:
s#^\s*\(.\{-}\)\s*$#\1#
^\s* matches any leading spaces, then
.\{-} captures everything after that, non-greedily — allowing
\s$ to match any trailing spaces.
This avoids giving you stuff like home .
You can also process several base files with vim at once:
vim -c 'bufdo %s#^\s*\(.\{-}\)\s*$#\1# | saveas! %:p:r.html' some.txt more.txt`
bufdo %s#^\s*\(.\{-}\)\s*$#\1# runs the replacement on each buffer loaded into vim,
saveas! %:p:r.html saves each buffer with an html extension, overwriting if necessary,
vim will open and show you the saved more.html, which you can correct as needed, and
you can use :n and :prev to visit some.html.
Something like sed’s probably best for big jobs, but this lets you tweak the conversions in vim right after it’s made them, use :u to undo, etc. Enjoy!

In Stata, how do I add variable labels from a separate csv file?

I have a set of csv files that are very simple to load into Stata using the -insheet- command. But they have very uninformative variable names. For each of these files, I also have a file of metadata consisting of two columns: the original (uninformative) variable names, and a description of what the variables actually mean. I'd like to use these metadata files to create variable labels, preferably without going through and typing up all the separate label commands or turning the metadata file into a dictionary for each file. It seems like there must be a quick way of loading the metadata file into Stata and looping through it to generate the label commands, but I don't know what it is. Any thoughts?
Ideally each line of the metadata is something like
varname1 "more interesting description"
in which case you can prefix each line with
label var
and then run the file as if it were a do-file using do. See the help for label. That is easy in a decent text editor, as for example searching for the start of each line and replacing it with label var (note the need for the space).
What could bite here includes:
You don't have double quotes " " as delimiters, in which case you need to insert them.
The extra information does not qualify as a variable label because it is more than 80 characters long. See help limits.
There are other ways to do this with Stata. You could write a program to read in the metadata and write out a do-file using file, but if this were my problem I would reach first for my text editor. (Most experienced Stata programmers use something else as well as doedit.)

How to embed HTML string syntax in CoffeeScript using VIM?

I have looked at how to embed HTML syntax in JavaScript string from HTML syntax highlighting in javascript strings in vim.
However, when I use CoffeeScript I cannot get the same thing working by editing coffee.vim syntax file in a similar way. I got recursive errors which said including html.vim make it too nested.
I have some HTML template in CoffeeScript like the following::
angular.module('m', [])
.directive(
'myDirective'
[
->
template: """
<div>
<div>This is <b>bold</b> text</div>
<div><i>This should be italic.</i></div>
</div>
"""
]
)
How do I get the template HTML syntax in CoffeeScript string properly highlighted in VIM?
I would proceed as follows:
Find out the syntax groups that should be highlighted as pure html would be. Add html syntax highlighting to these groups.
To find the valid syntax group under the cursor you can follow the instructions here.
In your example the syntax group of interest is coffeeHereDoc.
To add html highlighting to this group execute the following commands
unlet b:current_syntax
syntax include #HTML syntax/html.vim
syn region HtmlEmbeddedInCoffeeScript start="" end=""
\ contains=#HTML containedin=coffeeHereDoc
Since vim complains about recursion if you add these lines to coffee.vim i would go with an autocommand:
function! Coffee_syntax()
if !empty(b:current_syntax)
unlet b:current_syntax
endif
syn include #HTML syntax/html.vim
syn region HtmlEmbeddedInCoffeeScript start="" end="" contains=#HTML
\ containedin=coffeeHereDoc
endfunction
autocmd BufEnter *.coffee call Coffee_syntax()
I was also running into various issues while trying to get this to work. After some experimentation, here's what I came up with. Just create .vim/after/syntax/coffee.vim with the following contents:
unlet b:current_syntax
syntax include #HTML $VIMRUNTIME/syntax/html.vim
syntax region coffeeHtmlString matchgroup=coffeeHeredoc
\ start=+'''\\(\\_\\s*<\\w\\)\\#=+ end=+\\(\\w>\\_\\s*\\)\\#<='''+
\ contains=#HTML
syn sync minlines=300
The unlet b:current_syntax line disables the current syntax matching and lets the HTML syntax definition take over for matching regions.
Using an absolute path for the html.vim inclusion avoids the recursion problem (described more below).
The region definition matches heredoc strings that look like they contain HTML. Specifically, the start pattern looks for three single quotes followed by something that looks like the beginning of an HTML tag (there can be whitespace between the two), and the end pattern looks for the end of an HTML tag followed by three single quotes. Heredoc strings that don't look like they contain HTML are still matched using the coffeeHeredoc pattern. This works because this syntax file is being loaded after the syntax definitions from the coffeescript plugin, so we get a chance to make the more specific match (a heredoc containing HTML) before the more general match (the coffeeHeredoc region) happens.
The syn sync minlines=300 widens the matching region. My embedded HTML strings sometimes stretched over 50 lines, and Vim's syntax highlighter would get confused about how the string should be highlighted. For complete surety you could use syn sync fromstart, but for large files this could theoretically be slow (I didn't try it).
The recursion problem originally experienced by #heartbreaker was caused by the html.vim script that comes with the vim-coffeescript plugin (I'm assuming that was being used). That plugin's html.vim file includes the its coffee.vim syntax file to add coffeescript highlighting to HTML files. Using a relative syntax include, a la
syntax include #HTML syntax/html.vim
you get all the syntax/html.vim files in VIM's runtime path, including the one from the coffeescript plugin (which includes coffee.vim, hence the recursion). Using an absolute path will restrict you to only getting the particular syntax file you specify, but this seems like a reasonable tradeoff since the HTML one would embed in a coffeescript string is likely fairly simple.