Recursive directory parsing with Pandoc on Mac - html

I found this question which had an answer to the question of performing batch conversions with Pandoc, but it doesn't answer the question of how to make it recursive. I stipulate up front that I'm not a programmer, so I'm seeking some help on this here.
The Pandoc documentation is slim on details regarding passing batches of files to the executable, and based on the script it looks like Pandoc itself is not capable of parsing more than a single file at a time. The script below works just fine in Mac OS X, but only processes the files in the local directory and outputs the results in the same place.
find . -name \*.md -type f -exec pandoc -o {}.txt {} \;
I used the following code to get something of the result I was hoping for:
find . -name \*.html -type f -exec pandoc -o {}.markdown {} \;
This simple script, run using Pandoc installed on Mac OS X 10.7.4, converts all matching files in the directory I run it in to markdown and saves them in the same directory. For example, if I had a file named apps.html, it would convert that file to apps.html.markdown in the same directory as the source file.
While I'm pleased that it makes the conversion, and it's fast, I need it to process all files located in one directory and put the markdown versions in a set of mirrored directories for editing. Ultimately, these directories are in Github repositories. One branch is for editing while another branch is for production/publishing. In addition, this simple script is retaining the original extension and appending the new extension to it. If I convert back again, it will add the HTML extension after the markdown extension, and the file size would just grow and grow.
Technically, all I need to do is parse one branch's directory and sync it with the production one; then, once all changed, removed, and new content is verified correct, I can run commits to publish the changes. It looks like the find command can handle all of this, but I have no clue how to configure it properly, even after reading the Mac OS X and Ubuntu man pages.
Any kind words of wisdom would be deeply appreciated.
TC

Create the following Makefile:
TXTDIR=sources
HTMLS=$(wildcard *.html)
MDS=$(patsubst %.html,$(TXTDIR)/%.markdown, $(HTMLS))
.PHONY : all
all : $(MDS)
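$(TXTDIR) :
	mkdir -p $(TXTDIR)
# "| $(TXTDIR)" is an order-only prerequisite: the directory must exist, but its timestamp never forces a rebuild
$(TXTDIR)/%.markdown : %.html | $(TXTDIR)
	pandoc -f html -t markdown -s $< -o $@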
(Note: The indented lines must begin with a TAB -- this may not come through in the above, since markdown usually strips out tabs.)
Then you just need to type 'make', and it will run pandoc on every file with a .html extension in the working directory, producing a markdown version in 'sources'. An advantage of this method over using 'find' is that it will only run pandoc on a file that has changed since it was last run.

Just for the record: here is how I achieved the conversion of a bunch of HTML files to their Markdown equivalents:
for file in *.html; do pandoc -f html -t markdown "$file" -o "${file%html}md"; done
If you look at the -o argument, you'll see it uses shell string manipulation to strip the existing html ending and put md in its place, so apps.html becomes apps.md.
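Neither snippet above descends into subdirectories or writes into a mirrored tree, which is what the question asks for. Here is a minimal sketch of one way to do that in plain sh. The function name convert_tree and the directory names in the usage line are hypothetical, and it assumes no file name contains a newline:

```shell
# convert_tree SRC DST
# Convert every .html file under SRC to Markdown under DST,
# keeping the same relative directory layout.
convert_tree() {
    src=${1%/}
    dst=${2%/}
    find "$src" -type f -name '*.html' | while IFS= read -r file; do
        rel=${file#"$src"/}           # path relative to SRC
        out=$dst/${rel%.html}.md      # replace .html with .md, don't append
        mkdir -p "$(dirname "$out")"  # create the mirrored subdirectory
        # </dev/null keeps pandoc from consuming the list of file names
        pandoc -f html -t markdown "$file" -o "$out" </dev/null
    done
}
```

Calling convert_tree production editing would then turn production/docs/apps.html into editing/docs/apps.md. Because the extension is replaced instead of appended, converting back and forth does not pile up extensions.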

Related

Pandoc fails to embed metadata from the supplied YAML file

I need to convert some .xhtml files to regular .html (html5) with pandoc, and during the conversion I would like to embed some metadata (supplied via a YAML file) in the final files.
The conversion runs smoothly, but any attempt to embed the metadata invariably fails.
I tried many variations of this command, but it should be something like:
pandoc -s -H assets/header -c css/style.css -B assets/prefix -A assets/suffix --metadata-file=metadata.yaml input_file -o output_file --to=html5
The error I get is:
pandoc: unrecognized option `--metadata-file=metadata.yaml'
Try pandoc --help for more information.
I really don't get what's wrong with this, since I found this option in the pandoc manual.
Any ideas?
Your pandoc version is too old: the --metadata-file option is not available before pandoc 2.3. Update to pandoc 2.3 or later.
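If you want a script to check this up front, something along these lines could work. It is only a sketch: version_at_least is a hypothetical helper, it compares just the major and minor numbers, and it assumes the first line of pandoc --version looks like "pandoc 2.3":

```shell
# version_at_least HAVE WANT
# Succeed if dotted version HAVE is >= WANT (major.minor only).
version_at_least() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, h, "."); split($2, w, ".")
        for (i = 1; i <= 2; i++) {
            if (h[i] + 0 > w[i] + 0) exit 0
            if (h[i] + 0 < w[i] + 0) exit 1
        }
        exit 0
    }'
}

# Only run the check when pandoc is actually installed.
if command -v pandoc >/dev/null 2>&1; then
    installed=$(pandoc --version | awk 'NR == 1 { print $2 }')
    if version_at_least "$installed" 2.3; then
        echo "pandoc $installed supports --metadata-file"
    else
        echo "pandoc $installed is too old for --metadata-file"
    fi
fi
```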

Opening an html file in windows 7 using git bash

I'm trying to repeat the steps from this video.
I find that Git Bash on my Windows 7 (x64) machine doesn't accept the atom command for opening an html file. I created the html file using the touch command.
I have looked at many sites explaining how to open a file in a text editor from Git Bash, but nothing works.
Here are some things to try:
Add the path to atom.exe to your PATH environment variable.
Make atom the editor for all git operations by running: git config --global core.editor "atom --wait" (whenever git needs to open an editor, this tells it to use atom).
Add an alias in git: git config --global alias.edit "! atom". You can then edit any file by calling: git edit [filename]
Here is another SO post with something very similar (I think) to your question: Open Atom editor from git shell.
However, it's difficult to know what your problem is without more detail.
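For the first suggestion, a rough sketch of what that could look like inside Git Bash (for example in ~/.bashrc). The directory below is a hypothetical location, so substitute wherever atom actually lives on your machine; add_to_path is just an illustrative helper name:

```shell
# add_to_path DIR
# Prepend DIR to PATH unless it is already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

# Hypothetical install location; adjust to your own system.
ATOM_BIN="/c/Users/you/AppData/Local/atom/bin"
add_to_path "$ATOM_BIN"

# Report whether the shell can now find the command.
command -v atom >/dev/null 2>&1 || echo "atom still not found; check the path above"
```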

Linux shell script command - gzip

I have a shell script on Linux that generates its output in .csv format.
At the end of the script I compress the .csv to .gz format to save space on my machine.
The generated file is named like this: Output_04-07-2015.csv
The command I wrote to compress it is: gzip Output_*.csv
The issue I am facing is that if the compressed file already exists, the script should create a new file with the current time stamp in its name.
Can anyone help me with this?
If all you want is to overwrite the file when it already exists, gzip has a -f flag for that.
gzip -f Output_*.csv
The -f flag forces gzip to create the compressed file and overwrite any existing .gz file of the same name.
Have a look at the man page by typing man gzip, or at this link, for the many other options.
If instead you want to keep the existing file, you could have the script test whether the .gz file is already there and pick a different name. How you write that depends on which shell you use: bash, csh, etc.
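A sketch of that second approach for sh or bash, assuming the time of day (HHMMSS) is an acceptable stamp. gzip_keep_existing is a hypothetical function name, not something gzip provides:

```shell
# gzip_keep_existing FILE
# Compress FILE to FILE.gz. If FILE.gz already exists, write to a
# name carrying the current time instead of overwriting it.
gzip_keep_existing() {
    file=$1
    target=$file.gz
    if [ -e "$target" ]; then
        stamp=$(date +%H%M%S)
        target=${file%.csv}_$stamp.csv.gz
    fi
    # -c writes to stdout, so the output name is ours to choose;
    # remove the original only if compression succeeded.
    gzip -c "$file" > "$target" && rm -- "$file"
}
```

In the script you would then loop over the files, e.g. for f in Output_*.csv; do gzip_keep_existing "$f"; done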

firebreath variables in custom file

I have a firebreath plugin with installer.cmake script for Mac. Instead of creating a dmg file it creates a package based on pmdoc folder.
COMMAND ${CMD_CP} -r ${CMAKE_CURRENT_SOURCE_DIR}/Mac/MyPlugin.pmdoc ${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_CFG_INTDIR}/MyPlugin.pmdoc
COMMAND /Applications/PackageMaker.app/Contents/MacOS/PackageMaker --doc ${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_CFG_INTDIR}/MyPlugin.pmdoc --version ${FBSTRING_PLUGIN_VERSION} --out ${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_CFG_INTDIR}/MyPlugin.pkg
The problem is that I want to use FB variables in one of the pmdoc files, for example to set the title to ${FBSTRING_PluginName} ${FBSTRING_PLUGIN_VERSION}. Obviously, the copy command just copies the file, so how can I replace the variables with their values?
Use cmake's configure_file. This will take an input file and an output file; the output file will have all variables replaced. Lots of examples of this in the firebreath codebase.

how to combine "-" and "--" options when starting octave?

I noticed that I can't combine the --traditional option with the other options, such as the one-letter -i.
For example, when I have this as the first line in my octave .m file
#!/usr/bin/octave --traditional
Then it works. Octave starts fine and runs the script.
But when I try
#!/usr/bin/octave --traditional --silent --norc --interactive
It does not work. Octave gives an error saying it does not understand the options.
When I try
#!/usr/bin/octave --traditional -qfi
I also get an error. But this
#!/usr/bin/octave -qfi
works.
The problem is that --traditional does not have a one-letter shortcut like most of the other options. These are the options I see:
Options:
  --debug, -d             Enter parser debugging mode.
  --doc-cache-file FILE   Use doc cache file FILE.
  --echo-commands, -x     Echo commands as they are executed.
  --eval CODE             Evaluate CODE. Exit when done unless --persist.
  --exec-path PATH        Set path for executing subprograms.
  --help, -h, -?          Print short help message and exit.
  --image-path PATH       Add PATH to head of image search path.
  --info-file FILE        Use top-level info file FILE.
  --info-program PROGRAM  Use PROGRAM for reading info files.
  --interactive, -i       Force interactive behavior.
  --line-editing          Force readline use for command-line editing.
  --no-history, -H        Don't save commands to the history list
  --no-init-file          Don't read the ~/.octaverc or .octaverc files.
  --no-init-path          Don't initialize function search path.
  --no-line-editing       Don't use readline for command-line editing.
  --no-site-file          Don't read the site-wide octaverc file.
  --no-window-system      Disable window system, including graphics.
  --norc, -f              Don't read any initialization files.
  --path PATH, -p PATH    Add PATH to head of function search path.
  --persist               Go interactive after --eval or reading from FILE.
  --silent, -q            Don't print message at startup.
  --traditional           Set variables for closer MATLAB compatibility.
  --verbose, -V           Enable verbose output in some cases.
  --version, -v           Print version number and exit.
I am mainly interested in running octave code that is compatible with Matlab, so I'd like to use this --traditional option to make sure I keep the code compatible with Matlab in case I need to run the same code inside Matlab as well.
Or maybe I can "turn on" this compatibility mode with a different command once octave has started?
I am using GNU Octave, version 3.2.4 on Linux.
thanks
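I don't think this is really an octave problem, per se. The Unix shebang notation is quite limited. On Linux, everything after the interpreter path is handed to the interpreter as one single argument, so octave receives "--traditional --silent --norc --interactive" (or "--traditional -qfi") as one string, not as separate options. That is why a lone --traditional or a lone -qfi works while any combination separated by spaces fails. Other systems have their own limits, so more than one option on the shebang line is never portable.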
Using a wrapper script is probably the canonical way to get around such problems.
To address your question of combining short and long options: a long option can't be bundled into a cluster of short options such as -qfi; by Unix convention it always has to be its own argument. You could consider patching octave to add a short option for --traditional, if this is feasible for you. Alternatively, I'd imagine there's a way to specify the traditional behavior in the user or system-wide Octave configuration file, but this might not be that helpful if you need the script to work on systems you don't control.
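As a rough illustration of the wrapper-script idea, the following creates a small launcher that passes each option as its own argument. The name octave-trad is hypothetical, and the flags are taken from the option list in the question:

```shell
# Write a wrapper that starts octave with the options passed separately.
cat > octave-trad <<'EOF'
#!/bin/sh
# "$@" forwards the script name and any further arguments unchanged.
exec octave --traditional --silent --norc "$@"
EOF
chmod +x octave-trad
```

You would then run a script with ./octave-trad myscript.m (or put octave-trad somewhere on your PATH) instead of relying on the shebang line to carry several options.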