Tag <code>: how to "correct" publish it? - html

i'm not sure to explain what i'm looking for.
What's the name of the "source code parser" for publish code, in HTML ?
For example, when i write some source code here in stack overflow, system auto-detect the sintax and write "correct" source code in html.
I've noticed that exists the HTML <"code"> tag, but it simply write source code in "courier" font.
So i'm asking you if exists some "external" component that, given a text, parse it out correctly in a HTML page.
Thank you!

SO uses prettify to syntax highlight the <code> snippets.
Source: Which tools and technologies were used to build the Trilogy?
It is a JavaScript tool that scans a page for code snippets, and colours them on the fly. The downside of this solution is that it doesn't work with JavaScript turned off. Seeing as syntax colouring is not really an essential task, it is arguably a small downside.

The system is called Markdown and here is an explanation of the code blocks it uses.
For the syntax highlighting that you mentioned, a different system is used called prettify.

There are two components to this:
The CSS/HTML structure for syntax highlighting (e.g. styles for printing keywords, #s, strings, comments etc... in certain colors). This can be generic or per-language.
The code parser (grammar parser), which breaks the code up into tokens and labels the tokens with the appropriate classes. This can be implemented on either back-end via whatever language your back-end is in; or on front-end via JavaScript (the example of the latter is Google's Code Pretty which is used by StackOverflow).
It can be coupled with some heuristic logic to decide what language the code belongs to (and thus which grammar/parser to use).

Related

Writing a book for both print and HTML which can include code samples

I want to write a book on programming. I need to target both print and HTML.
In order not to get burned with the code examples, I need to be able to include parts of source code which have been marked up with start and end points to ensure the code is up to date and compiles. Extract the code from external files if you will.
I would like some simple format such as Txt2tags rather than latex since I then can use word's fine spelling capabilities.
Any experiences you want to share?
It is important to note that by starting with Txt2Tags you will be able to export your documents into LaTex. To my knowledge this is a one-way street, so by starting with Txt2tags you can still have the flexibility of LaTex, but by going with LaTex you don't get the benefits of Txt2tags.
Firstly, don't dismiss LaTeX too rapidly. Although it can be a bit of a pain to spellcheck, it's still quite doable with tools like aspell.
That being said, I would highly recommend using emacs' org-mode. It will provide you with a nice foldable overview of your book's structure, and is much more readable in plain text than LaTeX. Additionally, since it uses emacs' native syntax highlighting when you export (to HTML, LaTeX, PDF, etc) you'll be able to write the code inline (between #+begin_src tags) and get a much more precise WYSIWYG view of the code snippets you include.
Since emacs will work with aspell out-of-the-box, you'll still be able to check spelling as you work. Also, it uses LaTeX as an export format, which means you can obtain the same professional/technical look that LaTeX affords.
I see it has been reported as a missing feature on the text2tag homepage...

Syntax highlighting HTML with links

Is there a tool available which would convert the sources given into HTML with links?
By links I mean that every type, class, and method used would point via href to its definition.
I haven't managed to make highlight, syntax-highlight, nor pygments work this way. Even if it supports input from ctags, it only adds the title attribute, but not links.
Highlight can easily be modified to support things such as adding links to function / class definitions, as well as manual entries.
I was able to hook on to the class and function detection, and have each instance linked to the PHP Manual in my testing. I don't know what you'd want yours to link to, so it's your choice (per language, of course.)
Depending upon the language of your source code you might want to use doxygen. It supports a variety of source languages and can export the comments to HTML and LaTeX.
Many modern languages, like Java or C# support XML-comments to document the source code. You can then extract these comments into a single XML file by compiling them with special options. From this XML you can then easily produce HTML by adding an appropriate CSS sheet. MSDN documentation, for example, is largely based upon these HTML files generated in automated mode.

using html/css, i would like to automatically generate a bibliography at the bottom of my website akin to latex's \bibliography command

I'll ask my question first, then give some background for those who are interested:
I would like to know if there is a command in html that will automatically generate a bibliography from a .bib file? This means that throughout the text, i would add something like <cite name="Jones2010">, and then at the bottom of the html (or css) file, I would write something like <makebib file="biblist.bib", format="APA">, and a bibliography would be generated using my .bib file, and formated according to the APA style. The functionality would be quite similar to footnotes, except that each footnote is populated according to some script that extracts the information from (essentially) an xml file and outputs the content in the desired format. It is not difficult to imagine somebody creating a tool to do just that, however, my google search skills have not enabled me to find such a tool. It is easy to find tools that convert bib files to html or xml, but that is not sufficient for my needs. I do not desire to publish my entire bib file online. Rather, for each document that I generate, I want several of the entries in the bib file to be included as footnotes. Any pointers will be greatly appreciated.
Now, the reason behind the question:
I have recently begun switching from writing all my manuscripts using latex to writing them using html/css. The advantages of this approach are fast: only 1 file for versioning (instead of .dvi, .ps, .aux, .blg, etc.), it is much smaller to share, other people can edit the html file and compile it much more easily, it is more configurable to my tastes, easier to read on screen, etc. The disadvantage for me, however, is that while I've been writing in latex for years, I've only just begin using html and css for scientific document creating. The main impetus for the switch was MathJaX, which enables me to to embed latex equations in my html files, and therefore, allows me to combine the advantages of latex with the advantages of css. I imagine that nearly all my colleagues will switch away from latex to this simpler format, assuming a few remaining issues get resolved, like ease of creating bibliographies.
Many thanks.
What you're asking isn't possible, unless when you specify html/css you really mean html/css/php or html/css/python or some other combination that includes an actual programming language, rather than just a markup language.
I understand your motivation, I'd love to switch to html instead of latex! However, I suspect an html-based solution would involve so much extra processing added on top to sort out bibliographies etc that the complexity would start approaching that of LaTeX by the time you got it all worked out.
I'd be pleased to be proven wrong on this!
I've done this, in the past, using XSLT and BibTeX. In outline, the steps are
Mark up your document using some convention or other: I used <span class='citation'>Smith99</span>
Write an XSLT script to transform that file into a .aux file with \citation commands in it
Use BibTeX along with a .bst file which spits out HTML rather than LaTeX
Use another XSLT script (or the same one, in a different mode) to pull the bibliography in
It's not quite as fiddly as it sounds, but you can look at how I did it on google code. In particular, see structure.xslt and plainhtml.bst.
If there's a more direct way, I'd be quite interested to hear about it.
Both answers so far are somewhat correct, although not quite what you were asking for. Part of the problem is that the question as it's phrased doesn't necessarily makes sense.
HTML is just markup; you need something to process the markup, be it python, php, ruby, etc.
And you probably want to write in XML (or XHTML), not HTML.
XSLT may work for you (once it's in XML), but remember, an XSLT document that defines a set of rules. You would get an XSLT engine to apply your XSLT rules against your XML document.
You can create an html bibliography from a .bib file using bibtex2html. This package takes a series of command line arguments and extracts the info from the BibTeX source and outputs a file with html markup.
As far as I know you cannot get it to read and parse the html document like the LaTeX \cite command but there are several ways to indicate the references you want. I find that the easiest way is to just maintain a text file of the BibTeX keys I use in my manuscript and then call this using the --citefile option. There is also a tool called bib2bib included that will take search commands.
It is a very flexible package and there are a lot of options so it works in a lot of situations. For example you can get it to omit the <html> headers from the output file so that you can directly paste into an existing html document.
The documentation is useful but make sure you look at the pdf documentation file and the man pages.

Javadocs without HTML

Robert C. Martin's book Clean Code contains the following:
HTML in source code comments is an abomination [...] If comments are going to be extracted by some tool (like Javadoc) to appear in a Web page, then it should be responsibility of that tool, and not the programmer, to adorn the comments with appropriate HTML.
I kind of agree - source code surely would look cleaner without the HTML tags - but how do you make decent-looking Javadoc pages then? There's no way to even separate paragraphs without using a HTML tag. Javadoc manual says it clearly:
A doc comment is written in HTML.
Are there some preprocessor tools that could help here? Markdown syntax might be appropriate.
I agree. (This is also the reason why I am -strongly- opposed to C#-style "XML comment blocks"; the Javadoc DSL at least provides some escape for top-level entities!). To this end I simply do not try to make the javadoc look pretty...
...anyway, you may be interested in Doxygen. Here is a very quick post Doxygen versus Javadoc. It also brings up the issues that you do :-)
HTML is nothing I'd like to see in "normal" comments. But for Tools like JavaDoc, HTML adds the possibility to add formatting information, bullet points etc...
I would distinguish these two things:
non-javadoc code comments are for the programmer who maintains or enhances the code i question. he has to dig through existing sources, and any HTML in coments just doesn't make things easier. So, ban it in normal comments.
javadoc-comments are used to generate documentation. Use HTML where it helps. But a very limited subset of HTML should suffice.

Should HTML co-exist with code?

In a web application, is it acceptable to use HTML in your code (non-scripted languages, Java, .NET)?
There are two major sub questions:
Should you use code to print HTML, or otherwise directly create HTML that is displayed?
Should you mix code within your HTML pages?
Generally, it's better to keep presentation (HTML) separate from logic ("back-end" code). Your code is decoupled and easier to maintain this way.
As long as your HTML-writing code is separate from your application logic, and the HTML is guaranteed to be well-formed somehow, you should be okay.
The only code that should be mixed in markup-based pages (i.e, those that contain literal HTML) is the code used for formatting the HTML (e.g., a loop for writing out a list).
There are trade-offs whether you put the code in with the HTML or you use pure code to write the HTML out using quoted string literals.
No, if you want to build good and maintainable software, and to achieve loose coupling.
If I understand the question right, you're asking whether it's a good practice to mix markup with back-end code. No. While this is commonly done, it's still a bad idea.
You should read up on the MVC paradigm, as well as on existing questions on the matter, such as What is the best way to migrate an existing messy webapp to elegant MVC? and Best practices for refactoring classic ASP?
The point is to keep the display logic separate from the rest of the code. In any complex site you'll have code mixed in with your HTML, but the code should be for display purposes only. It shouldn't be doing any complex calculations.
For example, templates will contain loops and conditionals. Plus you'll probably have a library of HTML-specific routines, like printing out an <option> list based on a list object.
Imagine you were writing an application that has two output modes: HTML and something else. How would you write it, to avoid duplicating code? That will probably point you in the right direction.
The HTML that makes up the view has to get sent to the browser in some way. In .net, each server control emits its own HTML markup as part of the page lifecycle. So yes it is OK to use HTML in server side code.
Perhaps you should try following the ASP.net pattern. Create a bunch of controls that represent UI elements and make them responsible for emitting their own HTML based on their state.
Its fugly, and not type safe. But people do it without consequence. I'd prefer using a DOM or, at a minimum, classes designed to write HTML using type safe semantics. Also, its not all that good to mix UI with logic...
If I need methods that generate HTML I usually isolate them in an HtmlHelpers class. That way you keep some level of separation. The ASP.NET MVC Framework does this quite successfully.
If you mean printing out HTML in your code, then no. Unless you have a good reason not to, you should use templates
Even if you think you don't need this now, there's always a good chance you'll need it later. Maybe you want to output in a different format than HTML, or you want different presentation for the same data. You usually have the need for these things further down the road, so it's best to use one from the start.
I hate when developers print() a bunch of html. It's completely unnecessary and looks ugly in any text editor that shows print/echo strings in red.
I agree with everyone else that you should try as hard as you can to separate the HTML/XHTML markup from the application logic. However, sometimes you do need to generate HTML/XHTML in the application logic for various reasons.
In these cases what I have been trying to do is to ensure the bare minimum amount of presentation code is in mixed in with the application logic and try to migrate everything else over to the presentation code. It is worth nothing that is some cases you have situations where you could have everything moved over to the presentation layer, but it might be a bit easier to generate the markup as part of the application logic. In those cases, your best bet is likely to be to go the route that makes the most sense in terms of time.
I don't think there's any excuse for generating HTML inside your business logic. Don't even do it when it's just a "quick fix" or when you'll "go back and fix it later", because that never happens.
To reiterate my position from other questions, using some control logic (conditionals, loops) within HTML to construct it is OK. Do NOT do any data massaging or business logic in the HTML. You have to be disciplined, but it's worth it. Maintenance is much easier if your concerns (like logic and display) are separated.
Ideally you are aiming for a separation of concerns between your presentation (UI) code and your domain (business logic) code.
The reason why you should avoid coupling these two concerns (in either direction) is simple...
You will only have one reason to change a piece of code. whether this is from structural/styling changes in your html design, or from your business rules changing, you should only have to make the change in one place.
To a lesser extent, although many purists would disagree, by sprinkling HTML code through your domain code or vice versa you are creating noise for the next developer who comes along to read/maintain it.
I try to avoid using code to print HTML "directly". It is difficult to maintain, edit, add styles and etc. Some cases like generating an HTML email in the code, I create a text file or HTML file with markers like, [name], [verification code] and etc. I load this from the code and replace those markers. This way, you can edit the style of the email without re-compiling your code. Separating "presentation" and "logic" is a good practice in my opinion.
Mixing code within HTML is generally not a good practice in similar reasons as said in #1. However, I do use code in HTML for things like simple dynamic strings that are displayed multiple times on a page or pages. I think this is better than creating multiple server controls for same exact values to set. Since this is not code "logic" mixed in the HTML, I think this is ok.