Regular expression remove some links - html

I need a regular expression to strip the HTML tags from some links.
example
link
fasafiso
should be converted to
link
fasafiso

Depending on your programming language, you could come up with something like:
~<a href="sample\.com" [^>]*>(.*?)</a>~
# delimiter ~
# look for <a, everything that is not > and >
# capture everything lazily in a group
# look for a closing tag
# delimiter ~
In your example, group 1 would hold fasafiso and could be inserted into the replacement via $1.
See a demo for this approach on regex101.com.
Hint:
This is just a quick-and-dirty solution (e.g. for text editors). If this is getting more complicated, consider using a parser instead.
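A minimal Python sketch of the same pattern, for illustration only. The sample markup (the other.com link and the class attribute) is an assumption, since the question's original HTML is not shown; Python writes the group reference as \1 rather than $1.

```python
import re

# Hypothetical sample input; the attribute values are assumptions.
html = (
    '<a href="other.com" class="x">link</a>\n'
    '<a href="sample.com" class="x">fasafiso</a>'
)

# Match an anchor whose href is sample.com and capture its text lazily.
pattern = re.compile(r'<a href="sample\.com" [^>]*>(.*?)</a>')

# Replace the whole anchor with the captured group.
stripped = pattern.sub(r'\1', html)
print(stripped)
```

Only the sample.com anchor is unwrapped; the other link is left as it was.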

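I'll assume you want to replace every link whose target is sample.com with its content:
match <a[^>]*href="sample\.com"[^>]*>([^<]*)</a>
replace with \1
For example, with sed (-E enables extended regular expressions so the parentheses form a group, and ~ is used as the delimiter because the pattern contains a /):
sed -E 's~<a[^>]*href="sample\.com"[^>]*>([^<]*)</a>~\1~g'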
Please also keep in mind that if your requirements are complex enough you should instead be using an HTML parser.
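For the parser route, here is a rough sketch using Python's standard html.parser module. The class name LinkStripper and the helper strip_links are made up for this example, and it only re-emits start tags, end tags, text and entities (comments and doctypes are dropped), so treat it as a starting point rather than a complete solution.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Re-emit markup, dropping <a> tags whose href equals `target`."""

    def __init__(self, target):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.target = target
        self.out = []
        self.dropped = []  # one flag per open <a>: True if its tags are dropped

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            drop = dict(attrs).get("href") == self.target
            self.dropped.append(drop)
            if drop:
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.dropped and self.dropped.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_links(html, target="sample.com"):
    parser = LinkStripper(target)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_links('<a href="other.com">link</a> <a class="x" href="sample.com">fasafiso</a>'))
```

Unlike the regex, this does not care about attribute order or extra whitespace inside the tag.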

Related

HTML input pattern: all except URL

Is it possible to set an input pattern that accepts everything as usual, with one exception: URLs are not acceptable? For example, any input is fine, but:
ftp://example.com
http://example.com
https://example.com
must not be accepted.
Is this possible without JavaScript?
With JavaScript and using the regex found here: What is the best regular expression to check if a string is a valid URL?, you could do something like this:
function isValid(inputVal){
return !/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)/.test(inputVal);
}
isValid(document.getElementById("inputID").value);
EDIT
Without JavaScript, you can do it like this:
<input pattern="^(?!((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+#)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+#)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%#.\w_]*)#?(?:[\w]*))?)).*$" >
^ # start of the string
(?! # start negative look-ahead
.* # zero or more characters of any kind (except line terminators)
foobar # foobar
)
Choose a URL validation regex from the internet (or write your own :) ).
Put it in a negative look-ahead (?!...).
Add .* to match everything else.
Use the new regex in the pattern attribute of the inputs.
For example if the URL validation regex is ^(((https?)|(ftp)):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$ the inputs will be like
<input type="text" pattern="^(?!(((https?)|(ftp)):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?).*$" />
Note: not every regex will work inside a negative look-ahead; in that case, just use JavaScript and invert the result of the original regex. Also, your input must be inside a form to trigger the pattern validation (on form submit).
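To see how the look-ahead version behaves, here is a small Python sketch that uses the example URL regex from this answer. Python's fullmatch stands in for the way the pattern attribute must match the whole value; the names URL and NOT_URL are just for illustration.

```python
import re

# The example URL regex from above, without its ^ and $ anchors.
URL = r'(((https?)|(ftp)):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?'

# Wrap it in a negative look-ahead and let .* consume everything else.
NOT_URL = re.compile(r'^(?!' + URL + r').*$')

for value in ["hello world", "ftp://example.com", "http://example.com"]:
    # fullmatch mimics the pattern attribute, which must match the whole value.
    print(value, "->", NOT_URL.fullmatch(value) is not None)
```

Note that the look-ahead is only tested at the start of the value, so this rejects input that begins with a URL, not input that contains one further along.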
The question indicates you already know the regex and just want to know whether you should be using JavaScript (or HTML) for this. The answer would be: probably not.
If you are filtering input for, say, a forum, relying on JavaScript would be a bad idea because it runs on the client, so the user can easily bypass the check. Use a server-side language (most probably PHP) to do the check.
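A server-side check could be as small as the following sketch, written here in Python rather than PHP. The pattern URL_RE and the function is_acceptable are assumptions made for the example: the regex only flags the three schemes listed in the question and is not a full URL validator.

```python
import re

# Assumption: a deliberately simple pattern covering http, https and ftp.
URL_RE = re.compile(r'\b(?:https?|ftp)://\S+', re.IGNORECASE)


def is_acceptable(value: str) -> bool:
    """Return True when the submitted value contains no URL."""
    # Search for a URL anywhere in the value and invert the result.
    return URL_RE.search(value) is None


print(is_acceptable("just some text"))
print(is_acceptable("visit http://example.com today"))
```

Because the regex is searched rather than anchored, a URL anywhere in the value is enough to reject it.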

Regular expression to match CSS rules

Example original CSS:
.sk-hybrid{}
.sk-hybrid header > .row,
.sk-hybrid > #content.row,
.sk-hybrid footer > .row{}
.sk-border-radius-sm{}
.sk-gradient-gray-sm{}
I want to preg_replace all instances of .sk-whatever{ with .sk-whatever-i{, and all instances of the same terms followed by a space, such as ".sk-whatever ", with ".sk-whatever-i ".
Basically, I'm trying to write some PHP code that parses my CSS file and adds "-i" to every instance of my .sk-someword classes, so that I can then append the !important declaration to the ruleset, but that part is easy enough.
I need the regex only to add the "-i". Please note that .sk-whatever might have special characters in between:
.sk-some-class-term(space)
or
.sk-some-class-term{
I'm such a slob when it comes to regex. I'm pretty sure others can write this easily. I can't. Help please? :(
The result of my example CSS should be:
.sk-hybrid-i{}
.sk-hybrid-i header > .row,
.sk-hybrid-i > #content.row,
.sk-hybrid-i footer > .row{}
.sk-border-radius-sm-i{}
.sk-gradient-gray-sm-i{}
This regex should work for your purposes:
^(\.sk-\w+(?:[^\{\s,>]+))
Explanation
^ Matches the beginning of the string
( Begin capture group
\.sk-\w+ Matches .sk- followed by one or more word characters (letters, digits, or underscores)
(?: Begin non-capturing group
[^\{\s,>]+ Matches one or more characters that are not whitespace, {, a comma, or >
) End non-capturing group
) End capture group
Match
(\.sk(-\w+)+)
and replace with $1-i: http://regex101.com/r/nC8kU1/1
However, much better would be to use a dedicated tool, that, unlike regexes, "understands" the underlying language. For php, there's https://github.com/sabberworm/PHP-CSS-Parser, pay attention to the Prepend id to selectors example - it's almost what you're looking for.
Here is PHP that does what you are asking, where $cssString holds your CSS:
$cssString = preg_replace('/(\.sk-\w+(?:[^\{\s,>]+))/', '$1-i', $cssString);
Though if you are running on Linux, you can just use sed:
sed -E 's/\.sk-[^{[:space:],>]+/&-i/g' example.css

Emmet - Wrap with Abbreviation - Token that represents the wrapped text i.e. {original text}

I'm attempting to convert a list of URLs into HTML links as lazily as possible:
www.annaandsally.com.au
www.babylush.com.au
www.babysgotstyle.com.au
... etc
Using Wrap with Abbreviation, I'd like to do something like: a[href="http://${1}/"]*
The expanded abbreviation would result in:
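<a href="http://www.annaandsally.com.au/">www.annaandsally.com.au</a>
<a href="http://www.babylush.com.au/">www.babylush.com.au</a>
<a href="http://www.babysgotstyle.com.au/">www.babysgotstyle.com.au</a>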
... etc
The missing piece of the puzzle is an abbreviation token that represents the text being wrapped.
Any idea if this can be done?
If they are already on their own lines (which in the question, they look like they are), a simple Find and Replace with RegEx turned on will work. The Params are as follows:
Find What:
(.+)
Replace With:
<a href="http://$1/">$1</a>
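The same line-by-line find and replace can be sketched in Python. This assumes each URL sits on its own line and that the goal is an anchor whose href is the URL prefixed with http:// and followed by a slash, as in the question's abbreviation.

```python
import re

urls = """www.annaandsally.com.au
www.babylush.com.au
www.babysgotstyle.com.au"""

# (?m) makes ^ and $ match at every line, so (.+) captures one URL per line.
links = re.sub(r'(?m)^(.+)$', r'<a href="http://\1/">\1</a>', urls)
print(links)
```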
Sergey from Emmet was kind enough to point me in the right direction. The $# token contains the original content:
a[href="http://$#/"]*>{$#}
By specifying $# as the href attribute, the original content is no longer 'wrapped' and must be reinserted via {$#}.
http://docs.emmet.io/actions/wrap-with-abbreviation/#controlling-output-position

How to replace a specific line of HTML code with Regular Expression In Dreamweaver?

I want to replace <whatever>Some Title</whatever> with <something>Some Title</something> using the Find and Replace tool inside of Dreamweaver. How do I do this?
Not a Dreamweaver user, but this simple approach works in my editor (Emacs):
Replace:
<whatever>\(.*\)</whatever>
With:
<something>\1</something>
This is a pretty straightforward approach but it may fall short of your needs. Do some or all of your <whatever> element pairs occupy more than one line of text? Or do you have more than one <whatever> pair on a single line?
I guess what you want is to change all your <whatever> tags to <something> tags without changing your text, right?
If so, use find and replace with regular expressions. Find (in source code) <whatever>(.*)</whatever> and replace it with <something>$1</something>. The $1 acts as a variable holding whatever the (.*) part matched in each instance.
For example, if you want to comment out all instances of a call like
document.NAMEOFANYFORMONTHEPAGE.WHATEVERNAME.focus();
in a JavaScript file, you would use find:
document\.(.*)\.focus\(\);
and replace it with:
// document.$1.focus();
Don't forget to escape special characters and, please, try a few instances before using Replace All
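The concern raised earlier about several <whatever> pairs on one line can be illustrated with a short Python sketch (Python uses \1 where Dreamweaver uses $1). The sample string is made up; the point is the difference between the greedy (.*) and the lazy (.*?).

```python
import re

# Hypothetical input with two element pairs on the same line.
source = "<whatever>Some Title</whatever> and <whatever>Another</whatever>"

# Greedy: (.*) runs to the LAST closing tag, so the two pairs are merged.
greedy = re.sub(r'<whatever>(.*)</whatever>', r'<something>\1</something>', source)

# Lazy: (.*?) stops at the first closing tag, so each pair is handled separately.
# re.DOTALL additionally lets a pair span more than one line.
lazy = re.sub(r'<whatever>(.*?)</whatever>', r'<something>\1</something>',
              source, flags=re.DOTALL)

print(greedy)
print(lazy)
```

If your editor's regex flavor supports lazy quantifiers, (.*?) is usually the safer choice for this kind of replacement.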

How can I retrieve a collection of values from nested HTML-like elements using RegExp?

I have a problem creating a regular expression for the following task:
Suppose we have HTML-like text of the kind:
<x>...<y>a</y>...<y>b</y>...</x>
I want to get a collection of values inside <y></y> tags located inside a given <x> tag, so the result of the above example would be a collection of two elements ["a","b"].
Additionally, we know that:
<y> tags cannot be enclosed in other <y> tags
... can include any text or other tags.
How can I achieve this with RegExp?
This is a job for an HTML/XML parser. You could do it with regular expressions, but it would be very messy. There are examples in the page I linked to.
I'm taking your word on this:
"y" tags cannot be enclosed in other "y" tags
input looks like: <x>...<y>a</y>...<y>b</y>...</x>
and the fact that everything else is also not nested and correctly formatted. (Disclaimer: If it is not, it's not my fault.)
First, find the contents of any X tags with a loop over the matches of this:
<x[^>]*>(.*?)</x>
Then (in the loop body) find any Y tags within match group 1 of the "outer" match from above:
<y[^>]*>(.*?)</y>
Pseudo-code:
input = "<x>...<y>a</y>...<y>b</y>...</x>"
x_re = "<x[^>]*>(.*?)</x>"
y_re = "<y[^>]*>(.*?)</y>"
for each x_match in input.match_all(x_re)
    for each y_match in x_match.group(1).value.match_all(y_re)
        print y_match.group(1).value
    next y_match
next x_match
Pseudo-output:
a
b
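The pseudo-code above translates fairly directly into Python. This is a sketch under the same assumptions (no nested <x> or <y> elements, well-formed input); re.DOTALL is added so that an element may span several lines.

```python
import re

text = "<x>...<y>a</y>...<y>b</y>...</x>"
x_re = re.compile(r'<x[^>]*>(.*?)</x>', re.DOTALL)
y_re = re.compile(r'<y[^>]*>(.*?)</y>', re.DOTALL)

# Outer loop: contents of each <x>; inner loop: each <y> inside that content.
values = [
    y_match.group(1)
    for x_match in x_re.finditer(text)
    for y_match in y_re.finditer(x_match.group(1))
]
print(values)  # ['a', 'b']
```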
Further clarification in the comments revealed that there is an arbitrary amount of Y elements within any X element. This means there can be no single regex that matches them and extracts their contents.
Short and simple: Use XPath :)
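As an illustration of the XPath route, here is a sketch using Python's xml.etree.ElementTree, which supports a limited XPath subset. It assumes the input is well-formed XML; the <root> wrapper and the extra <y> elements outside <x> are added only to show that they are not selected.

```python
import xml.etree.ElementTree as ET

# Assumption: well-formed XML; <root> is a wrapper added for this example.
doc = "<root><y>c</y><x>...<y>a</y>...<y>b</y>...</x><y>d</y></root>"
root = ET.fromstring(doc)

# .//x//y selects every <y> that is a descendant of some <x>.
values = [y.text for y in root.findall(".//x//y")]
print(values)  # ['a', 'b']
```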
It would help if we knew what language or tool you're using; there's a great deal of variation in syntax, semantics, and capabilities. Here's one way to do it in Java:
String str = "<y>c</y>...<x>...<y>a</y>...<y>b</y>...</x>...<y>d</y>";
String regex = "<y[^>]*+>(?=(?:[^<]++|<(?!/?+x\\b))*+</x>)(.*?)</y>";
Matcher m = Pattern.compile(regex).matcher(str);
while (m.find())
{
    System.out.println(m.group(1));
}
Once I've matched a <y>, I use a lookahead to affirm that there's a </x> somewhere up ahead, but there's no <x> between the current position and it. Assuming the pseudo-HTML is reasonably well-formed, that means the current match position is inside an "x" element.
I used possessive quantifiers heavily because they make things like this so much easier, but as you can see, the regex is still a bit of a monster. Aside from Java, the only regex flavors I know of that support possessive quantifiers are PHP and the JGS tools (RegexBuddy/PowerGrep/EditPad Pro). On the other hand, many languages provide a way to get all of the matches at once, but in Java I had to code my own loop for that.
So it is possible to do this job with one regex, but a very complicated one, and both the regex and the enclosing code have to be tailored to the language you're working in.
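For comparison, here is roughly the same idea in Python. Python's re module only supports possessive quantifiers from version 3.11 on, so this sketch swaps them for a plain single-character alternation inside the lookahead; that is a substitution on my part, not the Java regex itself.

```python
import re

text = "<y>c</y>...<x>...<y>a</y>...<y>b</y>...</x>...<y>d</y>"

# After each <y>, the lookahead requires a </x> further on with no <x> or </x>
# tag in between, which means the current position is inside an x element.
inside_x = re.compile(
    r'<y[^>]*>(?=(?:[^<]|<(?!/?x\b))*</x>)(.*?)</y>',
    re.DOTALL,
)

# findall returns the contents of group 1 for every match.
values = inside_x.findall(text)
print(values)  # ['a', 'b']
```

The <y> elements before and after the <x> element (c and d) are skipped because no </x> can be reached from them without crossing an <x> tag.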