Adding metadata to markdown text - html

I'm working on software creating annotations and would like my main data structure to be based around markdown.
I was thinking of working with an existing markdown editor, but hacking it so that certain tags, i.e. [annotation-id-001]Sample text.[/annotation-id-001] did not show up as rendered HTML; the above would output Sample text. in an HTML preview and link to a separate annotation with the ID 001.
My question is, is this the most efficient way to represent this kind of metadata inside of a markdown document? Also, if a user wants to legitimately use something like "[annotation-id-001]" as text inside of their document, I assume that I would have to make that string syntax illegal, correct?

I don't know what Markdown parser you use but you can abord your problem with different points of view:
first you can "hack" an existing parser to exclude your annotation tags from "classic" parsing and include them only in a certain mode
you can also use the internal "meta-data" information proposed by certain parsers (like MultiMarkdown or MarkdownExtended) and only write your annotations like meta-data with a reference to their final place in content
or, as mentionned by mb21, you can use simple links notation like [Sample text.](#annotation-id-001) or use footnotes like [Sample text.](^annotation-id-001) and put your annotations as footnotes.

Related

Scraping pseudo-elements from a website with XPath

I want to extract data from a website, but it seems that the elements that I want to extract are not "accessible".I also discovered they seem to be pseudo-elements. I can se that their tags are marked with a # before in my web-inspector.
Moreover, while using XPath I can't extract the text I want to access. Their is a point in the CSS "cascade tree" when I can't extract the content of a tag, you can see it below.
Here I can extract information up to the tag 'content fond'. But when I ask for the tag "fos_comment_thread" which is the tag just below, the return is empty. And it is especially this tag which is a pseudo-element, and the following behind. However the text I want to access is even more deeper in this part of the CSS tree...
Input
reponse.xpath=('//div[class#='row']/div[#class='span9 forum']/div[class#='content fond'].extract()
Output
['<div id="foc_comment_thread"<div>']
Input
reponse.xpath=('//div[class#='row']/div[#class='span9 forum']/div[class#='content fond']/div[id#='fos_comment_thread'].extract()
Output
[]
I don't understand why I can't extract, I think it is due to the fact that the rest of my tags are pseudo-elements,but I haven't found a solution to solve the problem...
The first thing you need to do is to not using your web-inspector tool and look at the raw HTML of the website.
Web inspectors take into account the transformations made by Javascript and may show you an update HTML after Javascript execution, that scrapy obviously can't see.

Github Jekyll how to make real page format same as preview

My github jekyll structure looks like next:
after I enter _posts and create .md file, it looks like:
the corresponding code is:
Generics were introduced to the Java language to provide tighter type checks at compile time and to support generic programming.
The generics looks like:
```Java
List<String> list = new ArrayList<>();
//add item to list
String s = list.get(0);
```
or
```Java
public class Box<T> {
// T stands for "Type"
private T t;
//other code
}
```
The most commonly used type parameter names are:
```
E - Element (used extensively by the Java Collections Framework)
K - Key
```
We can see that the format is nice, such as it has syntax highlight.
I called it preview page
However, when I enter into my page by typing my github page url to see, it likes:
I called it real page
We can see that real page looks bad, e.g. there is no syntax highlight, there are multiple borders for quoting code syntax etc.
Thus, how to make the real page format is the same as preview page?
I suppose that you're relatively new to Jekyll so I have to do some clarification to you.
The result that you call "preview" is the result of your markdown parsed by GitHub. All markdowns have something in common so it's very likely that even if your parser is different almost all the things are parsed similarly. You could see a difference at the beginning of your "preview": the yaml content is displayed as a table.
Let's come back to Jekyll. If you're using the default settings, the parser of your code is kramdown (you could change it in the _config.yml file). When you execute Jekyll, it builds your website. That means that it parses your markdown and convert it to HTML. How it converts to HTML depends on a lot of things based on your configuration and plugin installed.
By default, you have no highlight. If you want to change it, take a look at the jekyll documentation. By default, it uses Rogue but you can also use Pigments or some other highlighter of your choice.
I don't think that this answer covers all your doubts and certainly not all your problems but it's to let you understand that your question, as it was posted, have not so much sense since your "preview page" and your "real page" are two completely distinct things. So google a bit, find what you want to achieve and ask a new question (you will surely have one in the near future).
Happy coding!

How to incorporate HTML tags for styling certain parts of localized strings (Polymer)

The web application I am working on uses resource strings for localization. The issue I am having is with styling certain parts of these strings. Let's say I want to display this string:
user1234 created a new document.
So in the resource file it would be localized like so:
{username} created a new document.
The issue is I also need <b></b> tags around {username}. I can't put these tags in the html file because I need it to apply just to the username, not to the whole localized string. So unless I split up the string into two localized strings (which I should definitely not do, because other languages do not necessarily have the same sentence structure), I have to put these html tags in the localized string itself:
<b>{username}</b> created a new document.
Even if we disregard best practices for a moment (of which I have read briefly) and go with this, this solution isn't working for me. I believe this is because the application is using Polymer (this seems to work with Angular). So if we stick by the following two requirements:
Use Polymer
Have the whole string together as one resource string
then there doesn't seem to be a way to style certain parts of the string. Does anyone know a solution?
I got it to work by setting the resource string to the inner HTML of the element which contains the string. So let's say the div containing the text has id="textElem", in the Javascript I set the inner HTML like so:
this.$.textElem.innerHTML = this.localize('user_created_document', 'username', this.username)
I suppose I should have specified in the question that my previous attempts of setting the string were just (a) simply binding the string to the property of an object and referencing that in the HTML, and (b) localizing the string directly in the HTML, neither of which worked.

Pulling out some text from a giant HTML file using Nokogiri/xpath

I am scraping a website and am trying to pull out certain elements from the HTML. In the sites I am scraping, there are script tags with a bunch of info in them however, there is one part inside these tags that I am interested in. The line basically looks like:
'image':'http://ut5.example.com/t/231/3_b_643435.jpg',
With some stuff above and below it. Now, this is different for each page source except for obviously the domain and some of the subfolders that store the images.
How would I go about looking through the source for this specific line, and cutting out just the URL? I would need to use regular expressions I feel as the URLs are dynamic.
The "gsub" method does something similar to what I want to search for, with its ability to use /regex/. But, I am not wanting to replace anything, I just want to find that URL in the source code using a /regex/ and copy it.
According to you comments, this is what you're looking for I guess
var regex = /http.+/;
Example http://jsfiddle.net/Km9ZB/

How to view xsd:documentation that is in HTML markup?

I am generating WSDL/XSD for SOAP services from a UML model using IBM Rational Software Architect (RSA). RSA allows you to document the classes and attributes in the model using rich-formatting.
For example, I have the following documentation on a Trailer class:
A wheeled Vehicle that is designed for towing by another
Vehicle. Known subtypes include:
Caravan
BoxTrailer
BoatTrailer
When the UML model is transformed to WSDL/XSD (using the out-of-the-box UML to WSDL transform), the formatting is preserved as HTML markup inside the xsd:documentation element:
<xsd:complexType name="Trailer">
<xsd:annotation>
<xsd:documentation><p>
A&nbsp;wheeled <strong>Vehicle</strong> that is designed for&nbsp;towing by another <strong>Vehicle.</strong> Known
subtypes include:&nbsp;
</p>
<ul>
<li>
<strong>Caravan</strong>
</li>
<li>
<strong>BoxTrailer</strong>
</li>
<li>
<strong>BoatTrailer</strong>
</li>
</ul></xsd:documentation>
</xsd:annotation>
</xsd:complexType>
Unfortunately, this is really hard to read and I've been searching (with no luck) for a program that can view WSDL/XSD with documentation in HTML markup.
XmlSpy 2008 can't do it, RSA can't do it (which is a bit surprising, as it generated the XSD in the first place), neither can any web browser I've tried.
I did write a JET template that extracted the documentation from the model and outputted to HTML, and I could probably write some XSLT to do something similar from the XSD, but I was hoping there's a program out there (ideally free) that could view the documentation as HTML.
Essentially, I'd like to be able to tell the consumers of our web service that they can view the WSDL in X program if they want to read the documentation - does anybody know the best solution to this?
Edit:
Thanks for the suggestions, but I think I have a solution! I didn't realise that RSA can export a WSDL to HTML (right-click on WSDL, export, HTML). The generated HTML has a graphical view of each schema element, the documentation for each element, as well as the original source, and everything is hyperlinked together.
Most importantly, the documentation is richly-formatted again! One small caveat is that the ;nbsp's appear in the HTML output. This seems to be because the ampersand is escaped in the HTML:
&nbsp;
Instead it should be
I will update my model-to-model transform to ensure that the ;nbsp's are replaced with real spaces (I don't believe I'll need non-breaking spaces in the documentation), so the generated WSDL/XSD won't ever have them.
I highly doubt if the standard xml/xsd editors can interpret the html tags and generate appropriate documentation. Oxygen XML Editor does a decent job of understanding and converting the XML entities (liket < etc) but HTML tags and entities are left as is. Below is the screen shot in design view.
The type of <xs:documentation> is <xs:any> so you should actually be able to include your documentation without escaping the markup, provided that it is a well formed XHTML fragment instead of HTML. I guess some XML Schema tools would be capable to interpret the embedded XHTML and show it as formatted text.
Do note that if the markup is not escaped it absolutely must be a well formatted XML fragment or the documentation element will cause your schema to be malformed. This applies also to HTML entities! If the documentation contains an (unescaped) entity reference (other than the 5 pre-defined XML entities), then your schema either must contain an external DTD reference or have an embedded DTD that defines what is the replacement text of that entity. In your case the documentation contains an entity reference. Probably easiest will be to replace such entities with the corresponding Unicode character/text or with character references (use   for )
If you have a chance, try to include the documentation without escaping the markup and make sure that it will be well formed. Otherwise you probably need to process the documentation twice: 1) parse the schema and extract documentation 2) parse the documentation text again (possibly as HTML, not XML).
I've tried this with the latest build of QTAssistant and it shows like this in the Schema Help Panel only; I've put a feature request for the grid view, as well as the documentation generator to work the same. Is this what you're expecting?
The help panel shows the annotation of the schema object that is selected in the Graph/Diagram view. To display the help panel press F1.
This issue is fixed in RSA 8.0.4 - which now supports exporting to WSDL/XSD with plain text (as well as an option to sort the schema by type, then name alphabetically!).
To view the the documentation in a WSDL/XSD generated from a UML model in prior versions of RSA, the easiest solution is to export the WSDL/XSD as HTML using RSA. You can do this by right-clicking on the WSDL/XSD, selecting export, then selecting HTML.
The generated HTML has a graphical view of each schema element, the documentation for each element, as well as the original source, and everything is hyperlinked together.
Most importantly, the documentation (that's virtually unreadable in the WSDL/XSD) is richly-formatted again! One small caveat is that the ;nbsp's that RSA's documentation editor inserts also appear in the HTML output. This seems to be because the ampersand is not only escaped in the WSDL/XSD (which is good), but also in the HTML (bad!):
&nbsp;
Instead it should be
A simple workaround to this is to replace all &nbsp;'s in the WSDL/XSD with real spaces before generating the HTML.