FO List Block change style - html

I am using an XSL-FO list block to show bullet points. Is it possible to change the bullet style to a square (or another shape)? In HTML, this would be <ul style="list-style-type:square;">.
Code:
<fo:list-block>
<fo:list-item>
<fo:list-item-label>
<fo:block>*</fo:block>
</fo:list-item-label>
<fo:list-item-body>
<fo:block>Volvo</fo:block>
</fo:list-item-body>
</fo:list-item>
<fo:list-item>
<fo:list-item-label>
<fo:block>*</fo:block>
</fo:list-item-label>
<fo:list-item-body>
<fo:block>Saab</fo:block>
</fo:list-item-body>
</fo:list-item>
</fo:list-block>

Put the character that you want in place of the *:
<fo:list-item-label>
<fo:block color="blue" font-weight="bold" font-size="1.3em">✪</fo:block>
</fo:list-item-label>
This looks like a lot of work compared to <ul style="list-style-type:square;">, but:
You typically only need to do this once, since you are generating the XSL-FO using XSLT
You have complete control over the content, size, weight, colour, alignment (see relative-align: https://www.w3.org/TR/xsl11/#relative-align), and position of the list item label (and, as above, you typically only need to set that up once)
If you want to, you could change, e.g., the colour of the bullet for each list item by using position() in your XSLT
When you look at numbering list items, you'll see that xsl:number makes it easy to generate hierarchical numbers to use in list item labels. (If you were using AH Formatter, you'd also be able to use a bunch of predefined counter styles: https://www.antennahouse.com/product/ahf66/ahf-ext.html#axf.counter-style.)
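
The "set it up once" point can also be illustrated outside XSLT. The following is only a minimal Python sketch, not the XSLT the answer has in mind: it builds the same fo:list-block structure from a list of strings, with the label character chosen in one place. The names build_list_block, bullet and colors are hypothetical, and the alternating colour stands in for what position() would do in XSLT.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_list_block(items, bullet="\u25AA", colors=("blue", "red")):
    """Build an fo:list-block; the label is defined once and reused per item."""
    list_block = ET.Element(fo("list-block"))
    for position, text in enumerate(items, start=1):
        item = ET.SubElement(list_block, fo("list-item"))
        label = ET.SubElement(item, fo("list-item-label"))
        # Alternate the label colour by item position (illustrative only).
        color = colors[(position - 1) % len(colors)]
        label_block = ET.SubElement(
            label, fo("block"), {"color": color, "font-weight": "bold"}
        )
        label_block.text = bullet  # U+25AA, a small black square
        body = ET.SubElement(item, fo("list-item-body"))
        ET.SubElement(body, fo("block")).text = text
    return list_block


block = build_list_block(["Volvo", "Saab"])
print(ET.tostring(block, encoding="unicode"))
```

Changing the bullet for the whole list is then a one-argument change, e.g. build_list_block(items, bullet="\u272A").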

Related

How to select all elements with a specific name under every li node with the same structure?

I have a certain bunch of XPath locators that hold the elements I want to extract, and they have a similar structure:
/div/ul/li[1]/div/div[2]/a
/div/ul/li[2]/div/div[2]/a
/div/ul/li[3]/div/div[2]/a
...
They are actually simplified from Pixiv user page. Each /div/div[2]/a element has a title string, so they are actually artwork titles.
I want to use a single expression to fetch all of the above a elements in a WebExtension called PageProbe. I've tried a number of approaches, but none of them returns the result I want.
However, the following expression does return all the a elements, including the ones I don't need.
/div/
The following expression returns the a element under only the first li item.
/div/ul/li/div/div[2]/a
Sorry for not providing enough info earlier. Hope someone can help me out. Thanks.
According to the information you gave here, you can simply use this XPath:
/div/ul/li/div/div[2]/a
However, I'm fairly sure there is a better locator based on other attributes, such as class names.
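
Whether one result or all results come back often depends on how the expression is evaluated rather than on the expression itself. As a rough illustration, here is a Python sketch using the standard library's ElementTree on a made-up document that mimics the structure in the question (the markup and titles are assumptions). ElementTree only accepts paths relative to an element, so the leading /div becomes "." on the root div.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the simplified page structure.
page = ET.fromstring("""
<div>
  <ul>
    <li><div><div>thumb</div><div><a>Title one</a></div></div></li>
    <li><div><div>thumb</div><div><a>Title two</a></div></div></li>
    <li><div><div>thumb</div><div><a>Title three</a></div></div></li>
  </ul>
  <a>unrelated link</a>
</div>
""")

path = "./ul/li/div/div[2]/a"

# findall() evaluates the path against every li and returns all matches.
all_titles = [a.text for a in page.findall(path)]

# find() stops at the first match, which looks like "only the first li".
first_title = page.find(path).text

print(all_titles)
print(first_title)
```

If a tool shows a single node for this path, it may be using a "first match" call of this kind.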

Using XPath to get the first child for every child of a node

I'm trying to parse some HTML with the following structure, how can I extract the first <a> element of every <li> element using xpath?
<ul>
<li>
<a>
<span>
<a>
</li>
<li>
<a>
<span>
<a>
</li>
...
</ul>
@Mathias: You are correct, I apologize. //li/a[1] did not work because it wasn't a direct child (there is an article tag in between, which I omitted for simplicity).
Then let me post this as a solution with some more explanation.
If, as you have described, //li/a[1] does not return anything while (//li//a)[1] does, then the HTML sample you show is not representative of your actual document. In that case, a is a descendant of li, but not a direct child of it.
A correct XPath expression in this case is
//li//a[1]
but only use it if the level of nesting varies, i.e. if there could be other elements nested between li and a:
<li>
<article>
<other>
<a/>
If the nesting is consistent, but it is not always the article element which is in between li and a then use
//li/*/a[1]
This avoids //, which searches all descendants and is typically more expensive to evaluate than /.
Finally, if you know that the a elements you are interested in are always grandchildren of li elements and if it is always the article element in between them, use
//li/article/a[1]
When I correct the expression to be //li/article/a[1], I get the first a for the first li.
//li/article/a[1] returns one result per li, i.e. several results if there are several li elements whose article child contains an a. If it only returns a single result, then either
you invoke this XPath expression in a context where only a single result is expected, e.g. if you use an XPath library in a programming language, or
the structure of your input document is even more intricate.
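
The variants above can be compared side by side. This is a small Python sketch with the standard library's ElementTree, on an invented sample where article sits between li and a (the element text is made up). ElementTree needs a leading "." for descendant searches, so //li becomes .//li.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<ul>
  <li><article><a>first A</a><span>s</span><a>second A</a></article></li>
  <li><article><a>first B</a><span>s</span><a>second B</a></article></li>
</ul>
""")


def texts(path):
    """Return the text of every element matched by the path."""
    return [a.text for a in doc.findall(path)]


direct_child = texts(".//li/a[1]")          # a is not a child of li here
via_article = texts(".//li/article/a[1]")   # fixed intermediate element
via_any = texts(".//li/*/a[1]")             # any single intermediate element
any_depth = texts(".//li//a[1]")            # any nesting depth

# A "first match" call returns a single node even though the path
# matches one a per li.
single = doc.find(".//li/article/a[1]").text

print(direct_child, via_article, via_any, any_depth, single)
```

On this sample the last three paths select the same two elements; they only differ in how much of the structure they assume.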
I think that the XPath to accomplish that would be .//ul/li/a[position()=1] .
Explanation:
The reason I spell it all out as .//ul/li/a is that, if there is an error when you use the XPath, your stack trace will reveal exactly what the locator pointed at, which is less vague. But you can obviously shorten it if you don't care: .//a .
Using the position clause, you can do =1 or >1, or whatever. I would choose [position()=1] over [1] because XPath doesn't use 0-based arrays, which might confuse others looking at your locator. I mean position=0, by logic, means null, right?
I start my locator with a . because I sometimes like to chain my locators together. You don't really need to start with the dot, but since I use the // wildcard in this case, it's effectively the same as starting without a dot, with the additional ability to be chained.
Answer tested on http://the-internet.herokuapp.com/

Parsing inconsistent HTML with XPath

I'm trying to gather information from a web page that has inconsistent HTML, for example:
<ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul>
and that's alright, I use the XPath expression
//div[@id="content"]/ul/li/text()
and it does the job (except that it doesn't gather the information from sub Item #1).
The HTML also varies, and sometimes looks like this instead:
<dl><dd><ul><li>Item #1</li></ul></dd></dl><dl><dd><ul><li>sub Item #1</li></ul></dd></dl>
Well, I'm trying to gather Item #1 and sub Item #1. But with this inconsistent HTML I'm not able to find an XPath expression that will allow me to gather the information in any case, could you help me with this?
There will always be a list, the Item #1 and sub Item #1 always will be inside a <ul><li>
You could try using the descendant axis (//) to select ul/li/text() no matter how deeply it is nested within a consistent ancestor. For example, assuming that the ancestor of ul/li is always a div whose id equals "content":
//div[@id="content"]//ul/li/text()
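
To check this against both shapes of markup, here is a short Python sketch using the standard library's ElementTree. The wrapping body and div elements are assumptions added to make the fragments well-formed, and since ElementTree has no text() step, the text is read from the matched li elements instead.

```python
import xml.etree.ElementTree as ET

flat = ET.fromstring(
    '<body><div id="content">'
    "<ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul>"
    "</div></body>"
)
nested = ET.fromstring(
    '<body><div id="content">'
    "<dl><dd><ul><li>Item #1</li></ul></dd></dl>"
    "<dl><dd><ul><li>sub Item #1</li></ul></dd></dl>"
    "</div></body>"
)

child_path = ".//div[@id='content']/ul/li"        # ul must be a direct child
descendant_path = ".//div[@id='content']//ul/li"  # ul at any depth


def items(tree, path):
    """Return the text of every li matched by the path."""
    return [li.text for li in tree.findall(path)]


print(items(flat, descendant_path))
print(items(nested, descendant_path))
print(items(nested, child_path))  # misses the lists wrapped in dl/dd
```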

Nokogiri: How do you get the parent node of a DOM element when all you have is the string index of the element you want in the dom?

Here is what I have:
DOM stored as text
I have the string index of the area I want to get the parent node of. The index may or may not be the beginning of a tag (it will never be in the middle of a tag, as it is a user selection)
I also have the htmltext at the index (obviously)
This is as far as I've gotten:
doc = Nokogiri::HTML(content.body)
I know nokogiri can do xpath things, but I don't know if xpath can do standard text searches? the selection text could span multiple nodes, and I think that breaks xpath searching o.o
I'm using Ruby 1.8.7, and rails 2.3.8
There is no correlation between the index in a particular serialization of the XML Document and an element. The closest you could do:
Recursively, at each level of the DOM, serialize the element and see if its length (added to what you have so far) has reached your index.
Unfortunately this is not guaranteed to work, since:
Many different (non-canonical) serializations are possible that describe the same XML document (e.g. foo="You said, &quot;Hi!&quot;" vs. foo='You said, "Hi!"').
Depending on whether you consider blank whitespace nodes as significant, two different XML documents might be treated the same (e.g. <foo><bar> vs. <foo>\n\t<bar>)
In HTML, additional non-significant whitespace might be stripped (e.g. <p>a b</p> vs. <p>a    b</p>).
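
If the index refers to the original text rather than to a re-serialized DOM, one way around these problems is to scan that text directly and record which tags are open at the index. The sketch below is a different technique from the recursive serialization described above, and it is written in Python with the standard library's html.parser rather than Nokogiri. The class name, the helper, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML, so they are not pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class EnclosingElementFinder(HTMLParser):
    """Records which elements are open at a character index of the source."""

    def __init__(self, text, index):
        super().__init__()
        self.index = index
        # Absolute offset of each line start, to convert getpos() results.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)
        self.open_tags = []   # stack of currently open tag names
        self.result = None    # open tags at `index`, once known
        self.feed(text)
        self.close()
        if self.result is None:
            self.result = list(self.open_tags)

    def _offset(self):
        line, column = self.getpos()
        return self.line_starts[line - 1] + column

    def _snapshot_if_reached(self):
        # Runs at the "<" of each tag, before the stack is changed.
        if self.result is None and self._offset() >= self.index:
            self.result = list(self.open_tags)

    def handle_starttag(self, tag, attrs):
        self._snapshot_if_reached()
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        self._snapshot_if_reached()
        if tag in self.open_tags:
            # Pop back to the matching open tag (tolerates unclosed children).
            while self.open_tags.pop() != tag:
                pass


def enclosing_path(text, index):
    """Return the open tag names, outermost first, at the given index."""
    return EnclosingElementFinder(text, index).result


html = "<div><p>Hello <b>bold</b> world</p>\n<ul><li>one</li></ul></div>"
print(enclosing_path(html, html.index("bold")))
```

The last entry of the returned list names the element that contains the index. Mapping that back to a node in a parsed DOM is a further step that this sketch leaves out.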

how to apply font properties on <span> while passing html to pdf using itextsharp

I am converting HTML to PDF using iTextSharp and I want to set the font size for <span> tags. How can I do this?
Currently I am using:
StyleSheet styles = new StyleSheet();
styles.LoadTagStyle(HtmlTags.SPAN, HtmlTags.FONTSIZE, "9f");
string contents = File.ReadAllText(Server.MapPath("~/PDF TEMPLATES/DeliveryNote.html"));
List<IElement> parsedHtmlElements = HTMLWorker.ParseToList(new StringReader(contents), styles);
But it didn't work.
The constants listed in HtmlTags are actually a hodgepodge of HTML tags and HTML and CSS properties and values and it can be a little tricky sometimes figuring out what to use.
In your case try HtmlTags.SIZE instead of HtmlTags.FONTSIZE and you should get what you want.
EDIT
I've never really seen a good tutorial on what properties do what, so I usually just go directly to the source code. For instance, in the ElementFactory class there's a method called GetFont() that shows how font information is parsed. Specifically, on line 130 (of revision 229) you'll see where HtmlTags.SIZE is used. However, the actual value for the size is parsed in ChainedProperties in a method called AdjustFontSize(). If you look at it, you'll see that it first looks for a value that ends with pt, such as 12pt. If it finds that, it drops the pt and parses the number literally. If the value doesn't end with pt, it jumps over to HtmlUtilities, to a method called GetIndexedFontSize(). This method expects either values like +1 and -1 for relative sizes or plain integers like 2 for indexed sizes. Per the HTML spec, user agents are supposed to accept values 1 through 7 for the font size and map those to a progressively increasing font size list. What this means is that your value of 9f is not a valid value to pass here; you should probably be passing 9pt instead.
Anyway, you kind of have to jump around in the source to figure out what's being parsed where.
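
To make the order of checks easier to follow, here is a rough Python sketch of the decision logic as described above. It is not the iTextSharp code: the function name and the returned labels are hypothetical, and treating anything else as "unrecognized" is an assumption about how an invalid value such as 9f ends up being ignored.

```python
def classify_font_size(value):
    """Classify a size string using the order of checks described above."""
    value = value.strip()
    if value.endswith("pt"):
        # "12pt": drop the suffix and read the number literally.
        try:
            return ("points", float(value[:-2]))
        except ValueError:
            return ("unrecognized", None)
    try:
        number = int(value)
    except ValueError:
        # Neither "<number>pt" nor an integer, e.g. "9f".
        return ("unrecognized", None)
    if value.startswith(("+", "-")):
        # "+1" / "-1": relative to the current size.
        return ("relative", number)
    if 1 <= number <= 7:
        # HTML indexed sizes 1 through 7.
        return ("indexed", number)
    return ("unrecognized", None)


for sample in ["9pt", "9f", "+1", "2"]:
    print(sample, classify_font_size(sample))
```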