How to write a regular expression to find the href of a tags? [duplicate]

This question already has answers here:
How to get one "a href" out of many in one html class with jSoup
(2 answers)
Closed 7 years ago.
I need to find the href of the a tags in a string such as this:
<li>باغ بلور<span class="ur">bipardeh94.blogfa.com</span><span class="ds">فرهنگی-خبری-علمی</span></li>
<li>هزار نکته <span class="ur">avaejam.blogfa.com</span><span class="ds"> يك نكته از هزار نكته باشد تا بعد </span></li>
<li>روابط عمومی دانشگاه آزاداسلامی کنگاور<span class="ur">prkangavar.blogfa.com</span><span class="ds">اخبار دانشگاه</span></li>
I use this code:
string regex = "href=\"(.*)\"";
Match match = Regex.Match(codeHtml, regex);
if (match.Success)
{
    textBox1.Text += match.Value + "\n";
}
This code finds the first href, but the match then runs on and returns all the markup after it.

Does this regex work?
string regex = "href=\"([^\"]*)\"";
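[^\"]* matches any run of characters except a double quote, so the match stops at the href's closing quote instead of running on to the last quote in the string, as the greedy .* does.
To match every tag rather than just the first one, use Regex.Matches instead of Regex.Match.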
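
The difference between the two patterns can be checked quickly. The following is a minimal sketch in Python rather than C#, using re.findall as a rough counterpart of Regex.Matches. The sample markup and the example.com URLs are assumptions, since the a tags in the question's sample were lost.

```python
import re

# Hypothetical sample input: two list items on one line, each with a link.
html = (
    '<li><a href="http://a.example.com">A</a><span class="ur">a.example.com</span></li>'
    '<li><a href="http://b.example.com">B</a><span class="ur">b.example.com</span></li>'
)

# Greedy: .* runs to the LAST double quote on the line, swallowing both links.
greedy = re.findall(r'href="(.*)"', html)

# Negated class: [^"]* cannot cross a double quote, so each href is matched separately.
negated = re.findall(r'href="([^"]*)"', html)

print(len(greedy))  # a single oversized match
print(negated)      # one entry per link
```

With the negated class, each captured group holds only the URL, which is probably what the question was after.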

Related

Is there a way to fix quotes that are inside of each other without them clashing? [duplicate]

This question already has answers here:
How to escape double quotes in a title attribute
(7 answers)
How do I properly escape quotes inside HTML attributes?
(6 answers)
Closed 2 days ago.
I'm making a list of links that have bookmarklets inside. The problem is that the quotes inside the bookmarklet clash with the quotes around the href attribute. Is there a way to fix this, or otherwise is there a different way to do it?
Code:
<a href='javascript:(function() { var l = document.querySelector("link[rel*='icon']") || document.createElement('link'); l.type = 'image/x-icon'; l.rel = 'shortcut icon'; l.href = 'https://google.com/favicon.ico'; document.getElementsByTagName('head')[0].appendChild(l); document.title = 'Google';})();'>Code</a>
I tried changing the quote type, but that doesn't work. I want the javascript to be inside the link.

BeautifulSoup Extract String in div tag [duplicate]

This question already has answers here:
how to get text from within a tag, but ignore other child tags
(2 answers)
Closed 2 years ago.
I have the following HTML:
<div class="interesting"><span>a</span> <span>b</span> c</div><div>d</div>
I am trying to use beautifulsoup to extract the string c.
However, soup.div.string is None. I could call get_text() to get a b c and then I parse the text again. But I feel it defeats the purpose of using beautifulsoup.
Any suggestion?
=====================
Update:
I added to my example string above because I noticed that it causes soup.div.find(text=True, recursive=False) to fail to return the text in the div. So this question isn't a duplicate anymore.
soup = BeautifulSoup('<div class="interesting"><span>a</span> <span>b</span> c</div><div>d</div>', 'html.parser')
div = soup.find('div', class_='interesting')
print(div.find_all_next(text=True)[-1])
The above code prints d.

This should help you:
div = soup.find('div',class_ = "interesting")
print(div.find_all(text=True)[-1].strip()) #Prints the last text present within the div tag
Output:
c
Here is the full code:
from bs4 import BeautifulSoup
html = '<div class="interesting"><span>a</span> <span>b</span> c</div><div>d</div>'
soup = BeautifulSoup(html,'html5lib')
div = soup.find('div',class_ = "interesting")
print(div.find_all(text=True)[-1].strip())
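
Taking the last text node works for this markup, but it would pick up nested text if the div ended with a child element. A possible alternative, sketched below under the assumption that only the div's own text is wanted, is to collect just the direct text children. It assumes bs4 is installed and uses the built-in html.parser, as in the question. It also suggests why find(text=True, recursive=False) seemed to fail: the first direct text node is the single space between the two spans.

```python
from bs4 import BeautifulSoup

html = '<div class="interesting"><span>a</span> <span>b</span> c</div><div>d</div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="interesting")

# recursive=False restricts the search to direct children of the div,
# so text inside the <span> elements and inside the second <div> is ignored.
direct_text = div.find_all(string=True, recursive=False)

# The direct text nodes here are " " and " c"; join them and trim whitespace.
result = "".join(direct_text).strip()
print(result)
```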

html5 - can't format `\n` as new line in rendered string [duplicate]

This question already has answers here:
Why does the browser renders a newline as space?
(6 answers)
Closed 3 years ago.
I have the following tag, but the '\n' inside item.value is not rendered as a line break.
<td ng-if="flag">{{item.value}}</td>

HTML needs a <br/> tag. Use this regex on your value.
item.value = item.value.replace(/(?:\r\n|\r|\n)/g, '<br>');
let item = {};
item.value= "Hi I am some text with a \n line break";
item.value = item.value.replace(/(?:\r\n|\r|\n)/g, '<br>');
document.write(item.value);
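
The same replacement can be sketched in Python for cases where the value is prepared on the server. The helper name nl2br is hypothetical, and escaping the text before inserting the <br> tags is an added assumption (so that any markup in the value is not interpreted as HTML), not part of the answer above.

```python
import html
import re

def nl2br(text: str) -> str:
    """Escape text for HTML, then turn each line break into a <br> tag."""
    escaped = html.escape(text)
    # \r\n is listed first so a Windows line ending yields one <br>, not two.
    return re.sub(r"\r\n|\r|\n", "<br>", escaped)

value = "Hi I am some text with a \n line break"
print(nl2br(value))
```

The result only renders as a line break where the string is inserted as HTML rather than as plain text.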

RegEx for capturing an attribute value in a HTML element [duplicate]

This question already has answers here:
Extract Title from html link
(2 answers)
Closed 3 years ago.
I am having trouble extracting text from an HTML tag using regex.
I want to extract the text from the following html code.
Google
The result:
TEXTDATA
I want to extract only the text TEXTDATA
I have tried but I have not succeeded.

Here we want to consume the string up to a left boundary (title="), then capture our desired data, then continue consuming to the end of the string, if we like:
<.+title="(.+?)"(.*)
const regex = /<.+title="(.+?)"(.*)/gm;
const str = `Google`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
RegEx
If this expression wasn't desired, it can be modified or changed in regex101.com.
RegEx Circuit
jex.im also helps to visualize the expressions.
PHP
$re = '/<.+title="(.+?)"(.*)/m';
$str = 'Google';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;

Use this regex:
title=\"([^\"]*)\"
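
As a rough check of this pattern, here is a minimal Python sketch. The input link is an assumption, because the original markup in the question did not survive; only the value TEXTDATA comes from the question.

```python
import re

# Hypothetical input: an anchor whose title attribute holds the wanted text.
html = '<a href="https://www.google.com" title="TEXTDATA">Google</a>'

# Group 1 captures everything between the quotes of the title attribute.
match = re.search(r'title="([^"]*)"', html)

# Guard against inputs that have no title attribute at all.
title = match.group(1) if match else None
print(title)
```

Reading group 1 instead of the whole match avoids having to strip the title=" prefix afterwards.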

Google
Remove the title and try.

Extract plain text from an html file in C [duplicate]

This question already exists:
Using Regex in C [closed]
Closed 9 years ago.
I am really desperate. I need to remove all HTML elements, including the tags, and retain just the plain text. I am required to do this in C, and I have been discouraged from using regex. If I use string functions, they just remove the delimiters, not the string inside them. I need to create a program which extracts plain text from an HTML file. Any guidance on how to do so would be appreciated. Thanks!

Here's a starting point for you:
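#include <stdio.h>

/* Remove everything between '<' and '>' from str, in place. */
void remove_html(char *str) {
    char *html_str = str;              /* read position */
    while (*html_str) {
        if (*html_str == '<') {
            /* skip the whole tag, including the closing '>' */
            while (*html_str && *html_str++ != '>')
                ;
        } else {
            *str++ = *html_str++;      /* copy ordinary text */
        }
    }
    *str = '\0';                       /* terminate the shortened string */
}

int main(void) {
    char foo[] = "hello <p>friends<b>!</b></p>";
    remove_html(foo);
    puts(foo);                         /* prints: hello friends! */
    return 0;
}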
It only strips the angle-bracket tags; it doesn't do any parsing. Also, it doesn't convert character entities such as &amp;.

If you open up an HTML file in Notepad, you'll find it is plain text (no images or anything).
All tags start with < and end with >, everything else is text. In this way, you can read through the file only once, excluding the characters that appear between < > symbols.
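In C, reading from standard input and writing to standard output:
#include <stdbool.h>
#include <stdio.h>
int main(void) {
    bool intag = false;   /* true while between '<' and '>' */
    int c;
    while ((c = getchar()) != EOF) {
        if (c == '<') intag = true;
        if (!intag) putchar(c);
        if (c == '>') intag = false;
    }
    return 0;
}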
This logic should work for most cases, though you may have to do some more work to deal with indented text and possibly any javascript on the page.

We Keep Coding
