Create nested list and not code block in markdown - html

I am trying to create a nested list after an equation in a markdown document in MarkdownPad, but I get a code block instead. I am unsure how to escape it in order to get a nested (2nd order) list:
Here is the code:
1st order list:
2nd order list:
Some other text here which should be followed by a 2nd order nested list:
- 4 spaces followed by a "-" gives a code block instead of a second order list

Short version: you can't.
Since you have inserted a new paragraph (Some other text here which should be followed by a 2nd order nested list:), you have closed the list block. You can't jump straight to a sub-list[^1] without first having an enclosing list[^2].
If, however, the "some other text" is supposed to be an aside regarding the first 2nd order item (so that the following 2nd order item is actually the second 2nd order item of the list), then you can achieve it by not breaking the outer 1st order list:
- 1st item
    - 2nd item

        other text

    - also 2nd item
[^1]: i.e. a nested list.
[^2]: This may not be true for all markdown engines, but is the case for the engine used by MarkdownPad. As a side point, the base markdown spec doesn't define a syntax for nested lists.


What is the Correct XPath to Identify Element with Text Occurring Minimum Number of Times?

I'm trying to identify an element that has certain text but I only want to identify the element if the desired text occurs a specific number of times.
For example, imagine we have the following two HTML snippets on the same page:
Snippet 1:
<span id="price">
$36.46
<span>
($0.38 / Count)
</span>
</span>
Snippet 2:
<span id="price">$38.38</span>
I could identify both elements using the XPath .//span[contains(text(),'$')]. However, I only want to identify the element if it (or any descendant of the span element) contains at least two instances of the character $.
In above example, it would only identify the first snippet because the second snippet only contains one instance of $, not two.
What is the correct XPath syntax to use?
You can use the XPath //span[count(.//text()[contains(., "$")]) >= 2]
This is a moderately complicated XPath, so here is an explanation that builds it up from the inside out:
.//text()[contains(., "$")]
Select all text nodes descending from the current node that contain "$".
count(.//text()[contains(., "$")])
Count the text nodes descending from the current node that contain "$".
//span[count(.//text()[contains(., "$")]) >= 2]
Select all span elements with two or more descendant text nodes that contain "$".
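To make the counting logic concrete, here is a rough Python sketch that mirrors the predicate by hand. It is not an XPath evaluation: the standard library's ElementTree does not support count() or contains(), so the helper names (dollar_text_nodes, spans_with_two_dollar_nodes) and the wrapping div are assumptions made for illustration, and itertext() is used as an approximation of "descendant text nodes".

```python
import xml.etree.ElementTree as ET

# The two snippets from the question, written without whitespace for brevity.
SNIPPET_1 = '<span id="price">$36.46<span>($0.38 / Count)</span></span>'
SNIPPET_2 = '<span id="price">$38.38</span>'


def dollar_text_nodes(elem):
    """Count the descendant text nodes of elem that contain '$'."""
    return sum(1 for text in elem.itertext() if "$" in text)


def spans_with_two_dollar_nodes(root):
    """Hand-written mirror of //span[count(.//text()[contains(., "$")]) >= 2]."""
    return [span for span in root.iter("span") if dollar_text_nodes(span) >= 2]


# Hypothetical page: both snippets placed under one wrapper element.
page = ET.fromstring("<div>" + SNIPPET_1 + SNIPPET_2 + "</div>")
matches = spans_with_two_dollar_nodes(page)
```

Only the outer span of the first snippet ends up in matches, because it is the only span with two separate "$"-bearing text nodes beneath it.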
As a caveat, this only works if the dollar signs are in two different text nodes. If you want to include the span in this example:
<span>
$$
<span>
foo
</span>
</span>
...then you'll need a different approach:
//span[string-length(.) - string-length(translate(., "$", "")) >= 2]
This predicate compares the string length of the span to the string length of the same span with all "$" characters removed.
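The same length arithmetic can be sketched in Python, which may help to see why it also catches two "$" characters in a single text node. This is an illustration under assumptions, not an XPath engine: dollar_count is a hypothetical helper, and the string value of an element is approximated by joining itertext().

```python
import xml.etree.ElementTree as ET


def dollar_count(elem):
    """Count '$' characters in the string value of elem."""
    value = "".join(elem.itertext())
    # Same arithmetic as string-length(.) - string-length(translate(., "$", ""))
    return len(value) - len(value.replace("$", ""))


# Hypothetical document: the "$$" example plus a span with a single "$".
doc = ET.fromstring("<div><span>$$<span>foo</span></span><span>$1</span></div>")
matches = [span for span in doc.iter("span") if dollar_count(span) >= 2]
```

Here matches holds just the outer span whose own text is "$$"; the inner "foo" span and the "$1" span fall below the threshold.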
One usable XPath-1.0 expression is
string-length(/span[@id='price'])-string-length(translate(/span[@id='price'],'$',''))
In a predicate this could look like
//span[string-length(.)-string-length(translate(.,'$',''))>=2]
This expression selects only the elements with a count of $ >= 2

XPath - all elements except those in header

I'm trying to figure out an XPath which matches all elements except the header and anything inside it. Let's assume that the header can be detected by three conditions:
outer tag is header eg. <header><div.....></header>
outer tag has id which contains string "header"
outer tag has class which contains string "header"
My XPath: //*[not(ancestor::header)] and //*[not(ancestor::*[contains(@id,"header")])] and //*[not(ancestor::*[contains(@class,"header")])]
is not correct.
EDIT:
This should match all links which are inside header:
//*[ancestor::*[contains(@id,"header") or contains(@class,"header") or header]]
Now I want to get all elements except these.
Do you know how to make it work?
Each of the expressions in your original XPath was being evaluated separately, testing whether there is an element in the XML document that satisfies those conditions, and returning a boolean().
Now that you have combined the predicates in order to select the particular element(s) that you don't want, you just need to negate the test:
//*[not(ancestor-or-self::header) and
not(ancestor::*[contains(@id,"header") or contains(@class,"header")])
]
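For readers who want to check which elements the negated predicate keeps, here is a small Python sketch of the same test written as a tree walk. ElementTree has no ancestor axis, so the walk carries an "inside a header region" flag downwards instead; outside_header and the sample markup are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def outside_header(root):
    """Return elements that are not a <header>, not inside one, and not inside
    an element whose id or class contains "header"."""
    kept = []

    def walk(elem, inside):
        # `inside` is True when some ancestor already marks a header region.
        is_header_tag = elem.tag == "header"
        if not inside and not is_header_tag:
            kept.append(elem)
        marks_region = (
            is_header_tag
            or "header" in elem.get("id", "")
            or "header" in elem.get("class", "")
        )
        for child in elem:
            walk(child, inside or marks_region)

    walk(root, False)
    return kept


# Hypothetical page with one <header> tag and one id-based header.
doc = ET.fromstring(
    "<body>"
    "<header><a>home</a></header>"
    '<div id="page-header"><a>login</a></div>'
    '<div class="main"><p>text</p></div>'
    "</body>"
)
kept_tags = [elem.tag for elem in outside_header(doc)]
```

Note that, like the XPath, this keeps the div with id "page-header" itself and drops only its descendants, because the id and class tests are applied to ancestors. If that element should be excluded as well, the XPath would need ancestor-or-self in the second test too.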

XPath: Way to match text inside an arbitrary number of nested elements?

Is it possible for one XPath expression to match all the following <a> elements using the text in the element, in this case "Link"?
Examples:
<a>Link</a>
<a><span>Link</span></a>
<a><div>Link</div></a>
<a><div><span>Link</span></div></a>
This simple XPath expression,
//a[contains(., 'Link')]
will select the a elements of all of your examples because . represents the current node (a), and contains() will check the string value of a to see if it contains 'Link'. The string value of a already conveniently abstracts away from any descendant elements.
This even simpler XPath expression,
//a[. = 'Link']
will also select the a elements in all of your examples. It's appropriate to use if the string value of a will exactly equal, rather than just contain, "Link".
Note: The above expressions will also select <a>Li<br/>nk</a>, which may or may not be desirable.
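The "string value" idea is easy to see outside XPath as well. The sketch below approximates it in Python by concatenating all descendant text, assuming each example from the question sits inside an a element; string_value is a hypothetical helper, not a library function.

```python
import xml.etree.ElementTree as ET

# The question's examples, assumed to be wrapped in <a>, plus the <br/> case.
EXAMPLES = [
    "<a>Link</a>",
    "<a><span>Link</span></a>",
    "<a><div>Link</div></a>",
    "<a><div><span>Link</span></div></a>",
    "<a>Li<br/>nk</a>",
]


def string_value(elem):
    """Approximate the XPath string value: all descendant text, concatenated."""
    return "".join(elem.itertext())


values = [string_value(ET.fromstring(markup)) for markup in EXAMPLES]
```

Every entry in values comes out as "Link", including the one split by br, which is why both expressions above match it.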
You could use the following:
//a[(.//*|.)[contains(text(), "Link")]]
This will select a elements that contain the text "Link" or a elements that have a descendant element that contains the text "Link".
//a - Select all a elements
( - Open OR grouping
.//* - Select all the descendant elements
| - Or..
. - Select the current node
) - Close OR grouping
[contains(text(), "Link")] - If they contain the text "Link"
Alternatively, you could also use:
//a[(.//*|.)[.="Link"]]

Parsing inconsistent HTML with XPath

I'm trying to gather information from a web page that has inconsistent HTML, for example:
<ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul>
and that's alright, I use the XPath expression
//div[@id="content"]/ul/li/text()
and it does the job (except that it doesn't gather the information from sub Item #1).
The HTML also varies, and this is the other form:
<dl><dd><ul><li>Item #1</li></ul></dd></dl><dl><dd><ul><li>sub Item #1</li></ul></dd></dl>
I'm trying to gather Item #1 and sub Item #1, but with this inconsistent HTML I'm not able to find an XPath expression that gathers the information in every case. Could you help me with this?
There will always be a list, and Item #1 and sub Item #1 will always be inside a <ul><li>
You could try using the descendant-or-self abbreviation (//) to select ul/li/text() no matter how deeply it is nested within a consistent ancestor. For example, assuming that the ancestor of ul/li is always a div whose id equals "content":
//div[@id="content"]//ul/li/text()
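As a quick check, the same path shape can be tried with Python's ElementTree, whose limited XPath subset does support // and attribute predicates (but not text(), so the text is read from each li instead). The wrapping html and div elements and the item_texts helper are assumptions, since the question does not show the surrounding page.

```python
import xml.etree.ElementTree as ET

# Both HTML variants from the question, placed inside the assumed content div.
VARIANT_1 = (
    '<div id="content">'
    "<ul><li>Item #1</li></ul><ul><li>sub Item #1</li></ul>"
    "</div>"
)
VARIANT_2 = (
    '<div id="content">'
    "<dl><dd><ul><li>Item #1</li></ul></dd></dl>"
    "<dl><dd><ul><li>sub Item #1</li></ul></dd></dl>"
    "</div>"
)


def item_texts(markup):
    """Collect the text of every ul/li found at any depth under the content div."""
    # Wrap in a root element so that ".//" can reach the div as a descendant.
    root = ET.fromstring("<html>" + markup + "</html>")
    return [li.text for li in root.findall(".//div[@id='content']//ul/li")]


texts_1 = item_texts(VARIANT_1)
texts_2 = item_texts(VARIANT_2)
```

Both variants yield the same two items, since the // step skips over however many dl/dd wrappers sit between the div and the ul.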

using Xpath to scrape inconsistent DOM

I want to scrape the post name, which in the first pattern is located within a span,
but the forum thread can also look like this (line 7)
because the thread is a poll.
So in my case I can't target the span (line 8 in the first picture). I tried descendant-or-self but couldn't get it right. What's wrong here?
$postTitle = $xpath->query("//tr/td[@class='row1'][3]/div/div[1]//descendant-or-self::text()");
With this expression you will select the first <a> in the <div> where the text you wish to extract is located:
//tr/td[@class='row1'][3]/div/div[1]/a[1]
I'm assuming you intend to select one element (and not a node-set). For that you can get the string-value of this expression (which will return all the text in the descendant nodes) using string() or normalize-space() (which trims and removes extra spaces):
normalize-space(//tr/td[@class='row1'][3]/div/div[1]/a[1])
This will extract Salary vs age or /ktards are you... depending on the node found.
If there is more than one match it will return a collection, which you should iterate over and get the string value of each one individually. Using those functions on a node-set will give you the text in the first element, discarding the others.
If you only have to deal with two cases: 1) text inside a/span, 2) text inside a, you can select the text nodes directly using a union (|) operator:
//tr/td[@class='row1'][3]/div/div[1]/a[1]/text() | //tr/td[@class='row1'][3]/div/div[1]/a[1]/span/text()
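To illustrate why taking the string value of the a element covers both layouts, here is a rough Python approximation of normalize-space(). The markup is hypothetical (the real page is only shown in the screenshots), and str.split() treats a few more characters as whitespace than XPath does, so this is a sketch rather than an exact equivalent.

```python
import xml.etree.ElementTree as ET


def normalize_space(elem):
    """Approximate XPath normalize-space(): join all descendant text,
    trim both ends, and collapse internal whitespace runs to one space."""
    return " ".join("".join(elem.itertext()).split())


# Hypothetical markup for the two layouts: text directly in <a>, or in a/span.
plain = ET.fromstring("<a>\n  Salary vs age\n</a>")
nested = ET.fromstring("<a><span>  Salary   vs age </span></a>")
titles = [normalize_space(plain), normalize_space(nested)]
```

Both entries in titles come out as "Salary vs age", so the caller does not need to know whether the span is present.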