Finding a regex to match HTML

I'm trying to find a regex pattern that matches only the numeric values after /p/ in this example:
/store/shop/en/model/brand/Heisman-On/p/15890735"target="">Heisman-on
/store/shop/en/model/brand/Heisman/p/03518616"target="">Heisman
/store/shop/en/model/brand/2tee-Cove3/p/67834675"target="">2tee-Cove3
The values I want are:
15890735
03518616
67834675

This would do the trick:
\/p\/(\d+)
Use the first capture group.
Here's a way you could get the numbers:
var input = '/store/shop/en/model/brand/Heisman-On/p/15890735"target="">Heisman-on';
var regex = /\/p\/(\d+)/;
var numbers = regex.exec(input)[1];
console.log(numbers); // Output: 15890735

Try the following:
/(?:\/p\/)(\d+)/i
http://regexr.com/3blr5
The value you want is in capture group 1 ($1).
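To pull every number out of a longer block of text in one pass, something like the following might work. It is only a sketch: extractProductIds is a made-up helper name, and it assumes an environment that supports String.prototype.matchAll.

```javascript
// Sample lines copied from the question.
const lines = [
  '/store/shop/en/model/brand/Heisman-On/p/15890735"target="">Heisman-on',
  '/store/shop/en/model/brand/Heisman/p/03518616"target="">Heisman',
  '/store/shop/en/model/brand/2tee-Cove3/p/67834675"target="">2tee-Cove3',
];

// extractProductIds is a hypothetical helper name, not part of the original answers.
function extractProductIds(text) {
  // matchAll requires the g flag; m[1] is the first capture group of each match.
  return Array.from(text.matchAll(/\/p\/(\d+)/g), (m) => m[1]);
}

const ids = extractProductIds(lines.join('\n'));
console.log(ids); // [ '15890735', '03518616', '67834675' ]
```

The results stay strings, so the leading zero in 03518616 is preserved.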

Related

How to extract the hyperlink text from a <a> html tag?

Given a string containing 'blabla text blabla', I want to extract 'text' from it.
The regexp documentation suggests the expression '<(\w+).*>.*</\1>', but that matches the whole <a> ... </a> element.
Of course I can continue using strfind like this:
line = 'blabla text blabla';
atag = regexp(line,'<(\w+).*>.*</\1>','match', 'once');
from = strfind(atag, '>');
to = strfind(atag, '<');
text = atag((from(1)+1):(to(2)-1))
But can I use a different expression to extract the text in one step?
You can use the extractHTMLText function in MATLAB; see its documentation for details.
An example that gets the desired output:
line = 'blabla text blabla';
l = split(extractHTMLText(line), ' ');
l{2}
If you don't want to use a built-in function, you could use a regex, as Nick suggested.
line = 'blabla text blabla';
[atag,tok] = regexp(line,'<(\w+).*>(.*?)</\1>','match','tokens');
t = tok{1}; % tokens of the first match: {tag name, inner text}
t{2}
This gives the desired output.
You can simply use a capture group.
Your pattern, updated, would look something like this:
<(\w+).*>(.*)<\/\1>
and this one works with any tag:
<.*>(.*)<.*>
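If you are using jQuery, try this. No regex required, though it may hurt performance if the markup is large.
// Parse into a detached wrapper: $(line) alone treats a string that does not start with "<" as a selector.
var $wrapper = $("<div>").html(line);
var text = $wrapper.find("a").text();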
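For comparison, here is a rough JavaScript sketch of the capture-group idea from the answers above. The href value is a made-up placeholder, and the pattern assumes the link contains plain text only (no nested tags), so treat it as an illustration rather than a general HTML parser.

```javascript
// The href is a hypothetical placeholder; only the <a>...</a> shape matters here.
const line = 'blabla <a href="https://example.com">text</a> blabla';

// <a\b[^>]*> matches the opening tag, ([^<]*) captures everything up to the
// next tag, and <\/a> matches the closing tag.
const anchorText = /<a\b[^>]*>([^<]*)<\/a>/i;

const match = anchorText.exec(line);
const text = match ? match[1] : null;
console.log(text); // text
```

Using [^>]* and [^<]* instead of .* keeps the match from running past the first closing bracket, which is what makes the greedy versions grab too much.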

How to write regex expression for this type of text?

I'm trying to extract the price from the following HTML.
<td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span></small> *some more text here* </td>
What regular expression would extract the number 75.00?
Is it something like:
<td>$*/<span class='small font-weight-bold text-danger'>
The dollar sign is a special character in regex, so you need to escape it with a backslash. Also, you only want to capture digits, so you should use character classes.
<td>\$(\d+[.]\d\d)/<span
As the other respondent mentioned, regex syntax varies a bit between languages, so you may have to make some adjustments, but this should get you started.
I think you can go with /[0-9]+\.[0-9]+/.
[0-9] matches a single digit. In this example it matches the 7.
The + afterwards says to match one or more digits, so [0-9]+ matches 75. It stops there because the character after the 5 is a period.
So we add a period to the regex and make sure it is escaped. An unescaped period matches any character; escaped, it matches only a literal period. That gives /[0-9]+\./ so far.
Next we just add [0-9]+ so it matches the remaining digits too.
Don't add the global flag, as in /[0-9]+\.[0-9]+/g, unless you want every number/period combination rather than just the first.
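The build-up above can be checked one step at a time. This is just a sketch: the html string is the opening part of the markup from the question, and twoPrices is a made-up string used only to show what the global flag changes.

```javascript
// Opening part of the markup from the question.
const html = "<td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span>";

const singleDigit = html.match(/[0-9]/)[0];    // "7"
const digitRun = html.match(/[0-9]+/)[0];      // "75"
const withPeriod = html.match(/[0-9]+\./)[0];  // "75."
const price = html.match(/[0-9]+\.[0-9]+/)[0]; // "75.00"

// Hypothetical input with two prices, to show the effect of the g flag.
const twoPrices = '$75.00 and $12.50';
const first = twoPrices.match(/[0-9]+\.[0-9]+/)[0]; // "75.00"
const every = twoPrices.match(/[0-9]+\.[0-9]+/g);   // [ "75.00", "12.50" ]

console.log(singleDigit, digitRun, withPeriod, price, first, every);
```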
There is another regex you can use. It uses the parentheses to group the part you're looking for like this: /<td>\$(.+)<span/
It will match everything from <td>$ up to <span. From there you can filter out the group/part you're looking for. See the examples below.
// JavaScript
const text = "<td>$something<span class='small font-weight..."
const regex = /<td>\$(.+)<span/
const match = regex.exec(text) // this will return an Array
console.log( match[1] ) // prints out "something"
# Python
import re

text = "<td>$something<span class='small font-weight..."
regex = re.compile(r"<td>\$(.+)<span")
print(regex.search(text).group(1))  # prints "something"
As an alternative you could use a DOMParser.
Wrap your <td> inside a table (the HTML parser drops a bare <td>), use querySelector, for example, to get the element, then take the first node from its childNodes.
That would give you $75.00/.
To remove the $ and the trailing forward slash you could use slice or use a regex like \$(\d+\.\d+) and get the value from capture group 1.
let html = `<table><tr><td>$75.00/<span class='small font-weight-bold text-danger'>Piece</span></small> *some more text here* </td></tr></table>`;
let parser = new DOMParser();
let doc = parser.parseFromString(html, "text/html");
let result = doc.querySelector("td");
let textContent = result.childNodes.item(0).nodeValue;
console.log(textContent.slice(1, -1));
console.log(textContent.match(/\$(\d+\.\d+)/)[1]);

I want to ignore whitespaces with regular expression in jmeter

I want to ignore whitespace in a regex used for correlation in JMeter.
The expression is given below, and I need to correlate the values inside the single quotes.
There is a space and three tabs after the variable "var __SampleData".
var __SampleData = [['CONSN_4578', '787', '01/01/2010', 'Active']];
I tried using regular expressions like:
var __SampleData\s+ = [['(.+?)', '(.+?)', '01/01/2010', 'Active']];
var __SampleData\s* = [['(.+?)', '(.+?)', '01/01/2010', 'Active']];
var __SampleData = [['(.+?)', '(.+?)', '01/01/2010', 'Active']];
Thanks
Bichu
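The square brackets are regex metacharacters and must be escaped, and \s* absorbs any whitespace before the equals sign:
var __SampleData\s*= \[\['(.+?)', '(.+?)', '01\/01\/2010', 'Active'\]\];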
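The escaped pattern can be tried outside JMeter as well. The sketch below uses JavaScript, whose regex engine is not the one JMeter uses, so take it as an approximation. The response string is an assumption: it places one space and three tabs directly after the variable name, which is how I read the question.

```javascript
// Assumed response text: one space and three tabs after the variable name.
const response = "var __SampleData \t\t\t= [['CONSN_4578', '787', '01/01/2010', 'Active']];";

// Brackets are escaped so they match literally; \s* absorbs the space and tabs.
const pattern = /var __SampleData\s*= \[\['(.+?)', '(.+?)', '01\/01\/2010', 'Active'\]\];/;

const match = pattern.exec(response);
const firstValue = match ? match[1] : null;
const secondValue = match ? match[2] : null;
console.log(firstValue, secondValue); // CONSN_4578 787
```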

Losing leading 0s when string converts to array

I have a TextInput control that sends its .text value to an ArrayCollection. The ArrayCollection holds US zip codes, so I use a regular expression to ensure I only get digits from the TextInput.
private function addSingle(stringLoader:ArrayCollection):ArrayCollection {
arrayString += (txtSingle.text) + '';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
US zip codes start at 00501. Following along in the debugger, after the zip is submitted, the variable 'arrayString' is 00501. But once 'newArray' is assigned a value, the first two 0s are gone and I am left with 501. Is my regular expression doing something I'm not expecting? Could the array be changing the value? I wrote a regexp test in JavaScript.
<script type="text/javascript">
var str="00501";
var patt1=/\D/;
document.write(str.match(patt1));
</script>
and I get null, which leads me to believe the regexp I'm using is fine. In the help docs for the split method, I don't see any reference to leading 0s being a problem.
Update: I have removed the regular expression from my code completely and the same problem still happens, so the regular expression is not the source of the problem.
Running this simplified case:
var arrayString:String = '00501';
var re:RegExp = /\D/;
var newArray:Array = arrayString.split(re);
trace(newArray);
Yields '00501' as expected. There's nothing in the code you've posted that would strip leading zeros. You may want to dig around a bit more.
This smells suspiciously like Number coercion: Number('00501') yields 501. Read through the docs for implicit conversions and check if any pop up in your code.
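The difference is easy to see in JavaScript, which behaves like the Number('00501') example above for this case. This is only an illustration; AS3 may differ in other details of implicit conversion.

```javascript
const zip = '00501';

// Splitting on non-digits leaves the string intact, leading zeros included.
const parts = zip.split(/\D/);
console.log(parts); // [ '00501' ]

// Converting to a number is what drops the leading zeros.
const asNumber = Number(zip);
console.log(asNumber); // 501

// If a numeric value must be shown as a zip code again, pad it back to five digits.
const restored = String(asNumber).padStart(5, '0');
console.log(restored); // 00501
```

Keeping zip codes as strings end to end avoids the problem entirely.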
What about this?
/^\d+$/
You can also require exactly 5 digits like this:
/^\d{5}$/
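As a quick check of the five-digit pattern, here is a small JavaScript sketch. The candidate strings are made up for illustration.

```javascript
const fiveDigits = /^\d{5}$/;

// Hypothetical inputs: one valid zip, one too short, one with a letter, one too long.
const candidates = ['00501', '501', '9021a', '123456'];

// test() returns true only when the whole string is exactly five digits.
const valid = candidates.filter((c) => fiveDigits.test(c));
console.log(valid); // [ '00501' ]
```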
I recommend just getting the zip codes instead of splitting on non-digits (especially if 'arrayString' might have multiple zip codes):
var newArray:Array = [];
var pattern:RegExp = /(\d+)/g;
var zipObject:Object;
while ((zipObject = pattern.exec(arrayString)) != null)
{
newArray.push(zipObject[1]);
}
for (var i:int = 0; i < newArray.length; i++)
{
trace("zip code " + i + " is: " + newArray[i]);
}

Regular Expression Help AS3?

I am working on a regular expression and I need to extract two parts of an expression that is being imported through a flashvars.
// Sample data similar to what comes in from the flashvars. Note: there are no real spaces after the ampersand; they appear here only because the HTML would otherwise strip the entity.
var sampleText:String = "height1=60& amp;height2=80& amp;height3=95& amp;height4=75& amp;"
var heightRegExp:RegExp = /height\d/g; //separates out the variables
var matches:Array = sampleText.match(heightRegExp);
Now I need help isolating the values of each variable and putting them in an array...For instance, 60, 80, etc. I know I should be able to write this regular expression, but I just can't get the exec expression right. Any help would be really appreciated!
Sorry for not answering the question with a regex directly. I would do this:
var keyvalues:Array = sampleText.split("& amp;");
var firstkey:String = keyvalues[0].split("=")[0];
var firstvalue:String = keyvalues[0].split("=")[1];
Would that help, even though it does not use a regex?
None of =, & and ; is a special character in a regex, so I think you can use
=|&
in a split call; the values will then be at the odd indices and the height2-style names at the even indices.
You can use URLUtil.stringToObject() from mx.utils.
Something like this should work:
import mx.utils.URLUtil;
var s:String = "name=Alex&age=21";
var o:Object = URLUtil.stringToObject(s, "&", true);
However, if you're just getting the flashvars, you should pull them from the loaderInfo of the root.
this.root.loaderInfo.parameters;
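If you do want the regex route the question asks about, a capture group with a repeated exec call collects the values. The sketch below is in JavaScript rather than AS3, so it only shows the shape of the loop, and it writes the separator as a plain "&amp;" without the spaces.

```javascript
// Sample data from the question, with the separator written as "&amp;".
const sampleText = 'height1=60&amp;height2=80&amp;height3=95&amp;height4=75&amp;';

// The capture group isolates the value after each "heightN=".
const pattern = /height\d+=(\d+)/g;

const values = [];
let m;
// With the g flag, each exec call resumes where the previous match ended.
while ((m = pattern.exec(sampleText)) !== null) {
  values.push(m[1]);
}
console.log(values); // [ '60', '80', '95', '75' ]
```

The values come back as strings; convert them with Number() if you need to do arithmetic on them.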