I'm trying to write a Regex that searches the source code of a series of webpages in a CSV file. I'm using the following to do the match:
$linkContent = $web.DownloadString($linkToBeConverted)
$object = [regex]::Matches($linkContent, $regex)
I'm trying to search in a list with class="menu" to see if it has links somewhere in it. Unfortunately, I seem to be matching way more than I need. I want a way to stop the match when I hit a certain string, specifically div class="test", as in the example below.
This is my regular expression now:
(?sm)<ul class="menu">.*?(<a href="h).*?(<\/ul>)
The following is the source code I'm trying to search in. This SHOULD NOT be a match if my regular expression were correct. However, because there is a link somewhere between <ul class="menu"> and the closing </ul> of the second list (which is not defined as class="menu"), I get a match. Is there any way I can write this regular expression so that it stops when div class="test" is found? As a result of the template, div class="test" should always be in the code right after the menu list.
<ul class="menu">
<li>
<p>Yes there are paragraph tags and random stuff in these lists...</p>
</li>
<li>
<div><span>Example</span>
</div>It's pretty random
</li>
<li>Nothing here!</li>
</ul>
<div class="test">
<p><a href="http://match.html"></p>
<ul>
<li>Unfortunately this will cause a match since there's another list</li>
</ul>
Thank you so so much for your help in advance! I've been working on this all morning and I'm completely lost. If there's a way to do this in PowerShell I'm open to that as well.
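One possible approach, sketched here in Python rather than PowerShell, is a "tempered" pattern: instead of .*?, each character is consumed only if the stop marker <div class="test"> does not begin at that position. The pattern relies only on a negative lookahead, which .NET regular expressions also support, so it should carry over to [regex]::Matches with (?s) in front. The names MENU_LINK, menu_has_link, no_link and with_link are made up for this sketch, and the sample strings are shortened versions of the markup above.

```python
import re

# After <ul class="menu">, advance one character at a time, but never
# past the start of the stop marker <div class="test">.
MENU_LINK = re.compile(
    r'<ul class="menu">(?:(?!<div class="test">).)*?<a href="h',
    re.DOTALL,  # let "." match newlines, like (?s)
)

# Menu without links, followed by a link inside div class="test".
no_link = '''<ul class="menu">
<li>Nothing here!</li>
</ul>
<div class="test">
<p><a href="http://match.html"></p>
<ul>
<li>another list</li>
</ul>'''

# Menu that really does contain a link (hypothetical URL).
with_link = '''<ul class="menu">
<li><a href="http://example.com/">A link</a></li>
</ul>
<div class="test">'''


def menu_has_link(html: str) -> bool:
    """Return True only if a link appears before the stop marker."""
    return MENU_LINK.search(html) is not None


print(menu_has_link(no_link))
print(menu_has_link(with_link))
```

The lookahead is checked before every character, so this is slower than a plain .*? on very large pages, but it keeps the match from running past the menu.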
Our project usually has a professional web developer that handles this sort of thing. But ours left, and we're still looking, so this weirdness falls to me. We have a situation where a list of lines under a node has to affect the parent node's look.
<span class="feedback">
<ul>
<li class="info">Just an info message</li>
<li class="error">This is an error</li>
</ul>
</span>
In the above, if none of the li's is an error, we want one look, but if there's even ONE error, we want a different look. Is that something CSS can do? If so, how? Is there something I can do on feedback that makes it show one way when there are no errors, and a different way when there are?
Is adding a has-errors class to span.feedback on the back-end not an option? If you have control over what comes from the back-end, I'd try something like this instead (it's simpler):
<span class="feedback has-errors">
<ul>
<li class="info">Just an info message</li>
<li class="error">This is an error</li>
</ul>
</span>
Otherwise, as far as I know, it's not possible to accomplish precisely what you're asking with CSS alone. You could make it happen with JS, but IMO that is far from ideal.
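For the back-end route, here is a minimal sketch of what adding the class could look like, assuming a Python back-end and messages held as (level, text) pairs. The function name render_feedback and that data shape are hypothetical; the question does not say what the back-end actually is.

```python
from html import escape


def render_feedback(messages):
    """Render the feedback block from (level, text) pairs.

    Adds the has-errors class to the outer span when any message
    has the level "error".
    """
    classes = ["feedback"]
    if any(level == "error" for level, _ in messages):
        classes.append("has-errors")
    class_attr = " ".join(classes)

    items = "\n".join(
        '<li class="{}">{}</li>'.format(escape(level, quote=True), escape(text))
        for level, text in messages
    )
    return '<span class="{}">\n<ul>\n{}\n</ul>\n</span>'.format(class_attr, items)


print(render_feedback([
    ("info", "Just an info message"),
    ("error", "This is an error"),
]))
```

The stylesheet can then target span.feedback and span.feedback.has-errors separately.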
What would the most relevant HTML tags be for a numbered progress bar such as:
(1) Step 1: do something
(2) Step 2: do something else
(3) Step 3: complete
I considered using an <ol>, but in this case the '1', '2', '3' will need styling, and I also need to indicate which step the user is currently on.
As you've already decided, an ordered list is the most semantic way to show a progress bar. Unfortunately there isn't a specific HTML element or attribute for "current step" or anything similar (I wish I knew why!). So you'll need to make something yourself. The simplest way I've used before is to use an image of the step number with an alt attribute, e.g.
<li><img src='step1.png' alt='Step 1: Current step'>Do something</li>
But you could use visually-hidden text instead. The image would need to be visually distinct from the other steps so that colour-blind users are informed as well.
Probably nothing more semantic currently than:
<ol>
<li class='currentStep'>
<span>1</span>
<span>do something</span>
</li>
<li>
<span>2</span>
<span>do something else</span>
</li>
</ol>
Semantically I agree with the use of ol, but I believe you can achieve the desired effect with a bit of CSS and data attributes.
The markup would look something like this:
<ul>
<li data-step="1">Do something</li>
</ul>
Here's a sample of the implementation: http://jsbin.com/gifisireku/edit?html,css,output
The benefit of using this is you can have clean and accessible markup which is great for SEO as well ;)
I'm using Xpath in Google docs to get the text inside <div>.
I want to save the text inside <div id="job_description"> in one cell of Google doc spreadsheet, but it shows each <div> in separate cell.
<div id="job_description">
<div>
<strong>
Basic Purpose:
</strong>
<br></br>
</div>
<div>
Work closely with developers, product owners and Q…
<br></br>
</div>
<div>
The Test Analyst is accountable for the developmen…
<br></br>
</div>
<div>
<strong>
Duties and Responsibilities:
</strong>
</div>
<ul>
<li></li>
<li></li>
</ul>
<div>
<strong>
Requirements:
</strong>
<br></br>
</div>
<ul>
<li></li>
<li></li>
</ul>
</div>
Image:
http://i.stack.imgur.com/K0mAY.png
and this is the code I wrote:
=IMPORTXML(E4,"//div[@id='job_description']")
Could you help me put all of the text (including <div> <ul> ...) inside the <div id="job_description"> in only one cell?
Using JOIN is a good start, but you can make it a single operation.
You did not show the URL to the page you're importing, so I can only give you an example with another page. For instance, if you are importing www.w3.org and looking for a div where @class='event closed expand_block', use
=JOIN(CHAR(10),IMPORTXML("http://www.w3.org/","//div[@class='event closed expand_block']//text()"))
Notice that I also modified the XPath expression: //text() makes sure only descendant text nodes are retrieved, that is, all the text.
EDIT: Responding to your comment:
May I know what is CHAR(10) referring to?
Yes, of course. CHAR takes a number as input and returns the corresponding character. In the case of CHAR(10), a newline character is returned.
In the formula, CHAR(10) is used as the first argument of JOIN, which is the delimiter of the objects that are to be joined.
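To show the same idea outside of Sheets, here is a rough Python sketch: collect every descendant text node of the div (what //text() selects) and join the pieces with a newline (what JOIN(CHAR(10), ...) does). The page string is a made-up, well-formed stand-in for the real page, and dropping whitespace-only pieces is an extra step that the formula itself does not perform.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the real page was not shown in the question.
page = """<html><body>
<div id="job_description">
<div><strong>Basic Purpose:</strong><br/></div>
<div>Work closely with developers.<br/></div>
<ul><li>First duty</li><li>Second duty</li></ul>
</div>
</body></html>"""

root = ET.fromstring(page)

# ElementTree supports a small XPath subset, including [@attr='value'].
block = root.find(".//div[@id='job_description']")

# itertext() yields all descendant text in document order, like //text().
pieces = [text.strip() for text in block.itertext() if text.strip()]

# Join with a newline, the character that CHAR(10) returns.
cell = "\n".join(pieces)
print(cell)
```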
For now I have found a solution. I'll put it here so that others can see my answer, but if there is any other solution, please let us know.
I used JOIN to put the separate cells (L3:X3) into one single cell:
=Trim(JOIN(" ",L3:X3))
You can also use REGEXREPLACE to remove the line breaks, with
=REGEXREPLACE(IMPORTXML(E4,"//div[@id='job_description']"),"\n","")
This should wrap it all into one cell for you.
I have inherited a website in which I have to update about 3500 files (product pages) whose content is roughly 95% the same.
In order to make some changes, I am using Regex (in Dreamweaver) to do some bulk editing.
I've been able to get everything done OK, but I am running into a problem with content within a <ul> tag.
I need to be able to grab all the content within that tag and save it for when I replace the other content on the page (this is one of the few things whose content is different from page to page).
Here is an example:
<ul>
<li style="padding-top:10px; text-align:right;">Single Item - $99.99 <img src="../../images/buy-now-button.gif" alt="Buy Now" width="50" height="20" border="0"> </li>
<li style="padding-top:10px; text-align:right;"><strong>Set of 6 Items - $299.99</strong> <img src="../../images/buy-now-button.gif" alt="Buy Now" width="50" height="20" border="0"> </li>
<li style="padding-top:10px"><img src="../../images/free_shipping.jpg" alt="Free Upgrade." width="227" height="107"> </li>
</ul>
I would go more individually and get the content in the individual <li> tags, but the problem is that some pages have only one <li> within the <ul>, and others have up to 6, depending on the number of product variations on that page.
So my overall question is this: how do I grab all the content (including new lines, other tags, etc.) within a given tag and save it for when the rest of the content needs to be replaced? I know how to use parentheses around the content and then $# in the Replace section.
The websites I've worked on thus far have been much smaller, and I've not had much need for Regex because it was typically easier to make changes manually or just using literal text in Find/Replace.
How complex are these web pages? If <ul> elements are never nested inside other <ul> elements, and you don't have to deal with bogus tags inside (for example) SGML comments or CDATA sections, this is probably all you need:
<ul>[\s\S]*?</ul>
[\s\S] is how you match any character including newlines in JavaScript regexes (which is what Dreamweaver uses, or so I've read).
*? tells it to match zero or more, reluctantly--meaning it quits matching as soon as it becomes possible for the next part of the regex (</ul>) to match.
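As a rough illustration of the capture-and-reuse step, here is the same pattern with parentheses around the inner part, run in Python instead of Dreamweaver. The names UL_BLOCK, old_page and new_template are invented for the sketch, and the sample page is a trimmed version of the markup in the question.

```python
import re

# Group 1 captures everything between <ul> and the first </ul>,
# newlines included.
UL_BLOCK = re.compile(r"<ul>([\s\S]*?)</ul>")

old_page = """<h1>Old layout</h1>
<ul>
<li>Single Item - $99.99</li>
<li>Set of 6 Items - $299.99</li>
</ul>
<p>Old footer</p>"""

# Hypothetical replacement layout with a slot for the saved list items.
new_template = "<h1>New layout</h1>\n<ul>{items}</ul>\n<p>New footer</p>"

match = UL_BLOCK.search(old_page)
saved = match.group(1) if match else ""

# Everything else on the page is replaced; the list content is kept.
new_page = new_template.format(items=saved)
print(new_page)
```

In a Find/Replace dialog, the captured group plays the role of the $# reference mentioned in the question.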
I'm currently trying to come up with a good and accessible way to format a status indicator which should be rendered within a set of wizard-like pages on a website. The website should provide a multipage form with a status indicator on top of it as demonstrated in the wireframe below:
Given the new progress-tag in HTML my first thought was to do something like this:
<progress value="2" max="3">
<ul>
<li>Beginning</li>
<li class="now">Right now</li>
<li>End</li>
</ul>
</progress>
... but since <progress> only accepts phrasing content, using a list is not really an option. So right now I would probably go with something like this, integrating the ARIA progressbar role:
<ul role="progressbar" aria-valuenow="2" aria-valuemin="1" aria-valuemax="3" aria-describedby="state2" aria-valuetext="Right now">
<li id="state1">Beginning</li>
<li id="state2" class="now">Right now</li>
<li id="state3">End</li>
</ul>
But again, I'm not really sure if the progressbar role can be applied in such a way to a list.
Another problem is that <progress> is rendered as a progress bar in Opera, for instance, so <progress> itself is probably not really a viable solution altogether :-(
Can anyone perhaps recommend an accessible status bar that does not only rely on using a single image?
Current solution
For now I will go with following markup:
<section class="progress">
<h1 class="supportive">Your current progress</h1>
<ol>
<li><span class="supportive">Completed step:</span> Login</li>
<li class="now"><span class="supportive">Current step:</span> Right now</li>
<li><span class="supportive">Future step:</span> End</li>
</ol>
</section>
All elements of the class "supportive" will be positioned off-screen. IMO this way we should have a nice compromise of semantic markup (the state succession is in my opinion really an ordered list ;-)) and accessibility thanks to the additional header and status text for each step.
According to the WHATWG spec, you're not supposed to assign the progressbar role to <ul> elements.
I'd just ditch <ul> and describe progress using (surprise) phrasing content:
<section role="status">
<h2>Task Progress</h2>
<p>You're now at <progress value=2 max=3>"Right now" step</progress>.
</section>
Update: You're right, progress doesn't suit here, it's more like an interactive form widget. I should've checked first, before taking it from your first example. But anyway, the point is there's no need to use a list (even more so, unordered list), when you can just describe what's going on in plain text. In the case that the list of past and future steps is necessary, I'd just add two more paragraphs, one before the status (‘You've completed the "Beginning" step’), and one after (‘Next step will be the "End" step’).
However, I admit that this isn't a complete answer to your question.
Also, I'd say some ARIA attributes look redundant to me. For example, aria-valuetext would perhaps make more sense in the context of an interactive widget, when there's no other human-friendly description of its state. Though I may be wrong here.