HTML input pattern: all except URL

Is it possible to set an input pattern that accepts everything as usual, with one exception: URLs are not acceptable? For example, any input should be fine, but values such as:
ftp://example.com
http://example.com
https://example.com
should be rejected.
Is it possible to do this without using JavaScript?

With JavaScript and using the regex found here: What is the best regular expression to check if a string is a valid URL?, you could do something like this:
// Returns true when the input contains nothing that looks like a URL.
function isValid(inputVal) {
  return !/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/.test(inputVal);
}
isValid(document.getElementById("inputID").value);
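EDIT
Without JavaScript, you can put the same regex inside a negative look-ahead in the pattern attribute. The trailing .* is needed because the pattern has to match the whole value. Literal hyphens inside character classes are escaped because browsers compile the attribute in Unicode mode, which is stricter about character classes (\w-_, for instance, is a syntax error there):
<input pattern="^(?!((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9.\-]+|(?:www.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9.\-]+)((?:\/[\+~%\/.\w\-]*)?\??(?:[\-\+=&;%@.\w]*)#?(?:[\w]*))?)).*">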
^ # start of the string
(?! # start negative look-ahead
.* # zero or more characters of any kind (except line terminators)
foobar # foobar
)

Choose a URL validation regex from the internet (or write your own :) ).
Put it in a negative look-ahead (?!...).
Add .* to match everything else.
Use the new regex in the pattern attribute of the inputs.
For example, if the URL validation regex is ^(((https?)|(ftp)):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$, the input will look like this:
<input type="text" pattern="^(?!(((https?)|(ftp)):\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?).*$" />
Note: not every regex still works once it is placed inside a negative look-ahead; in that case, use JavaScript and invert the result of the original regex. Also, the input must be inside a form for the pattern validation to be triggered (on form submit).
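The steps above can be sketched in JavaScript as follows. This is a minimal sketch, not a definitive implementation: SIMPLE_URL is a deliberately simplified stand-in for a real URL regex, compilePattern and isAllowed are hypothetical helper names, and the ^(?: ... )$ wrapping is an assumption that mimics how browsers anchor the pattern attribute to the whole value.

```javascript
// Assumption: a deliberately simplified URL matcher (scheme, "://", non-space text).
// Swap in whichever URL regex you actually trust.
const SIMPLE_URL = "(?:https?|ftp):\\/\\/\\S+";

// Steps 2 and 3: wrap the URL regex in a negative look-ahead, then add .*
// so that the rest of the value can still match.
const NOT_URL_PATTERN = "(?!" + SIMPLE_URL + ").*";

// Hypothetical helper: emulates the whole-value anchoring applied to the
// pattern attribute by wrapping the source in ^(?: ... )$.
function compilePattern(pattern) {
  return new RegExp("^(?:" + pattern + ")$", "u");
}

const notUrl = compilePattern(NOT_URL_PATTERN);

// Returns true when the value would pass the pattern check.
function isAllowed(value) {
  return notUrl.test(value);
}

console.log(isAllowed("plain text"));        // true
console.log(isAllowed("ftp://example.com")); // false
```

Because the look-ahead is checked only at the start of the value, input such as "see http://example.com" still passes. Writing the look-ahead as (?!.*URL) instead rejects a URL anywhere in the value.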

The question indicates you already know the regex and just want to know whether you should be using JavaScript (or HTML) for this. The answer would be: probably not.
If you are filtering input for, say, a forum, relying on JavaScript would be a bad idea because it runs on the client, so the user can easily bypass the check. Use a server-side language (most probably PHP) to do the check.

Related

How to set RegEx global flag in HTML pattern attribute of the input element?

I want to validate a form input (username + password) on the client before sending it to the server (PHP).
Therefore I applied the pattern attribute in the input tag.
I came up with a RegEx expression that does the job on the server side:
(preg_match_all('/^[a-zA-Z0-9. _äöüßÄÖÜ#-]{1,50}$/', $_POST['username']) == 0)
thereby the global flag is set using preg_match_all (instead of preg_match).
Now I wanted to implement the same RegEx in my pattern attribute in the HTML form.
The HTML standard defines that the regex in the pattern attribute follows JavaScript regex syntax, which divides the expression into "pattern, flags" separated by a comma. I would translate that into HTML like this:
pattern="^[a-zA-Z0-9. _äöüßÄÖÜ#-]{1,50}$,g"
That doesn't work.
All JavaScript RegEx validators I have found enclose the pattern into slashes:
/^[a-zA-Z0-9. _äöüßÄÖÜ#-]{1,50}$/
and say, that the global flag would be behind the last slash:
pattern="/^[a-zA-Z0-9. _äöüßÄÖÜ#-]{1,50}$/g"
That doesn't work either.
Mozilla also states in their developer guide (I also read it elsewhere):
No forward slashes should be specified around the pattern text.
So, how can I get the global flag into the pattern attribute of the input element?
There are a couple of facts you should be aware of when using a pattern attribute regex:
There is no need for the g flag: the whole string must match the regex, the check is performed only once, and a single match is enough.
There is no need to wrap the pattern in regex delimiters. If you add slashes at the start and end, they are treated as literal slashes that are part of the pattern, and in 99.9% of cases that would ruin the regex.
You do not even need ^ and $ anchors as the pattern regex must match the entire string input. In fact, the pattern is automatically enclosed with ^(?: and )$, so if you use pattern="^\d+$" (just a quick example), the final regex (in Chrome, e.g.) will look like /^(?:^\d+$)$/u, which looks rather redundant.
So, all you need is
pattern="^[a-zA-Z0-9. _äöüßÄÖÜ#-]{1,50}"
// Or even
pattern="^[\w. äöüßÄÖÜ#-]{1,50}"
Note that [A-Za-z0-9_] = \w in JavaScript regex.
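A rough way to see these points in action is to emulate the compilation step in JavaScript. This is a sketch under the assumption, taken from the answer above, that the browser wraps the attribute value in ^(?: and )$ and adds the u flag; matchesPattern is a hypothetical helper, not a browser API.

```javascript
// Hypothetical helper emulating how the pattern attribute is compiled:
// wrapped in ^(?: ... )$ with the u flag, as described above.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

// No anchors and no g flag needed: the whole value has to match anyway.
console.log(matchesPattern("[\\w. äöüßÄÖÜ#-]{1,50}", "Jörg Müller")); // true
console.log(matchesPattern("[\\w. äöüßÄÖÜ#-]{1,50}", "semi;colon"));  // false

// Explicit anchors are redundant but harmless.
console.log(matchesPattern("^\\d+$", "123")); // true

// Slash delimiters become literal characters, so ordinary input never matches.
console.log(matchesPattern("/^\\d+$/g", "123")); // false
```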

data-val-regex-pattern is not working to negate some specific characters

I am developing a view using html5, I want to validate a VIN field with some particular regex pattern,
So I used data-val-regex-pattern to achieve this.
My validation should not allow the user to enter i, o, q, I, O or Q; anything else in a-zA-Z0-9 is allowed.
So I wrote the regex ^[a-zA-Z0-9&&[^iIoOqQ]]$, but it is not working.
By "not working" I mean that whenever I enter ghtygfrt9090, it is reported as invalid.
Below is the code:
<input type="text" maxlength="17" data-val-regex-pattern="^[a-zA-Z0-9&&[^iIoOqQ]]$" data-val-regex="VIN is not valid">
Please help !!
The pattern you tried, ^[a-zA-Z0-9&&[^iIoOqQ]]$, has no quantifier on the character class, so even where the syntax is supported it matches only a single character from the class.
Repeating it would look like ^[a-zA-Z0-9&&[^iIoOqQ]]+$
In some regex engines (Java, for example), character class intersection is written with &&
If it is not supported, you could make use of a negative lookahead:
^(?!.*[iIoOqQ])[a-zA-Z0-9]+$
Another option is to write out the ranges so that they exclude those characters (keeping 0-9, since digits are allowed):
^[a-hj-npr-zA-HJ-NPR-Z0-9]+$
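To compare the two working approaches, here is a small JavaScript sketch. It assumes the intended rule is "letters and digits only, excluding i, o and q in either case". The explicit-range version here includes 0-9 so that both patterns accept the same input, and isValidVinChars is a hypothetical name.

```javascript
// Option 1: negative look-ahead rejects the value if a forbidden letter appears anywhere.
const byLookahead = /^(?!.*[iIoOqQ])[a-zA-Z0-9]+$/;

// Option 2: ranges written out so that i, o and q are simply never listed.
const byRanges = /^[a-hj-npr-zA-HJ-NPR-Z0-9]+$/;

// Hypothetical helper: runs both patterns and checks that they agree.
function isValidVinChars(value) {
  const a = byLookahead.test(value);
  const b = byRanges.test(value);
  if (a !== b) {
    throw new Error("patterns disagree for: " + value);
  }
  return a;
}

console.log(isValidVinChars("ghtygfrt9090")); // true
console.log(isValidVinChars("1HGCMI2633A"));  // false (contains I)
```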

Using Regex to find "<img .../>" and "<script ...> </script>" in HTML string

I am trying to use Regular Expressions for the first time to search for images and scripts in webpages in Scala. The expressions I've come up with are
Images:
/(<img\S+\s+\/>)+/
Scripts:
/(<script\s+\S+><\/script>)+/
I don't really know anything about HTML code or using Regex so I'm not sure what I need in order to specify that it should match <img .../> where the ... could be any amount of characters or whitespace. This is just a small part of a programming assignment I'm writing in Scala and we have to use Regex.
A regex like <img[^>]*> would match <img..........>.
A regex like <script.*?</script> would match a single <script...>...</script> instance. The ? is necessary to prevent it from matching everything from the first <script...> tag to the last </script> tag.
(Feel free to add back in the capturing ( )'s, the \ escapes, and surround with the regex delimiting / / tokens. I removed them to focus on the regular expressions themselves, without the leaning toothpick syndrome and other noise.)
While these are better than the ones you proposed, they will still break in many circumstances. RegEx is not designed to parse HTML.
<script>
<!-- This "</script>" doesn't end the script, but fools the RegEx -->
</script>
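The two suggested patterns can be tried out quickly. The sketch below uses JavaScript rather than Scala purely for illustration (the patterns themselves carry over). findTags and the sample markup are made up for the example, and the s flag is an addition so that . also matches newlines inside a script body.

```javascript
// <img ...> : anything up to the next ">".
const IMG = /<img[^>]*>/g;

// <script ...>...</script> : lazy .*? stops at the first closing tag.
const SCRIPT = /<script.*?<\/script>/gs;

// Hypothetical helper: collects every match of both patterns.
function findTags(html) {
  return {
    images: html.match(IMG) || [],
    scripts: html.match(SCRIPT) || [],
  };
}

const sample =
  '<p><img src="a.png" /><script src="x.js"></script>\n<img\n alt="b" src="b.png"></p>';

const found = findTags(sample);
console.log(found.images.length); // 2
console.log(found.scripts[0]);    // <script src="x.js"></script>
```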

Regex pattern not working properly

I'm working on a simple check for my input fields. I got 3 places where I'm validating user-input: javascript regex, html pattern and php regex. The Javascript and PHP part work fine, but my HTML pattern somehow returns an error for every input except blank. I tested it on regexpal.com (regex tester) and it works perfectly fine there, so I reckon I must be doing something wrong.
Here's my regex:
/^[a-zA-Z0-9\!\?\,\.\s]{0,50}$/
I'm trying to allow users to input the following:
Alphabetic characters, including capitals
Numeric characters
Punctuation: exclamation(!), question(?), comma(,) and dot(.)
Spaces
Here's how I implement it:
<input type="text" id="name" name="name" aria-required="true" pattern="/^[a-zA-Z0-9\!\?\,\.\s]{0,50}$/" value="loaded value from db">
Please note: I'm allowing 0 characters to be entered because I will check it with PHP, and if the input field(s) is/are empty, a pre-set value will be written to the database.
Basically it should allow users to enter general words or sentences, but somehow it doesn't allow anything. The only way I don't get an "error" is when I leave the inputfield blank. What am I doing wrong? Is my regex wrong? Am I not implementing it correctly? I can provide more code if necessary.
Help is much appreciated!
Thanks in advance.
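Try removing the forward slashes (/) from the input's pattern attribute. The backslashes before !, ?, the comma and the dot are not needed inside a character class either, and browsers that compile the pattern in Unicode mode reject \! and \, as invalid escapes, so pattern="[a-zA-Z0-9!?,.\s]{0,50}" is the safer form.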
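If a pattern is shared between script and markup, the delimiters can be stripped mechanically. A minimal sketch: literalToPattern and matchesPattern are hypothetical helpers, and the ^(?: ... )$ wrapping with the u flag is an assumption about how browsers compile the attribute. The sample also drops the backslashes before ! ? , . because \! and \, are invalid escapes under the u flag.

```javascript
// Strips the leading "/" and the trailing "/flags" from a regex literal
// written as text; anything else is returned unchanged.
function literalToPattern(literal) {
  const m = /^\/(.*)\/[a-z]*$/s.exec(literal);
  return m ? m[1] : literal;
}

// Assumed emulation of the pattern attribute: whole-value match, u flag.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$", "u").test(value);
}

const withSlashes = "/^[a-zA-Z0-9!?,.\\s]{0,50}$/";
const stripped = literalToPattern(withSlashes);

console.log(stripped);                                   // ^[a-zA-Z0-9!?,.\s]{0,50}$
console.log(matchesPattern(withSlashes, "Hello, you!")); // false
console.log(matchesPattern(stripped, "Hello, you!"));    // true
```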

regex to ignore duplicate matches

I'm using an application to search this website that I don't have control of right this moment and was wondering if there is a way to ignore duplicate matches using only regex.
Right now I use this pattern to retrieve the image srcs from the page's source code:
<span> <img id="imgProduct.*? src="/(.*?)" alt="
from this
<span> <img id="imgProduct_1" class="SmPrdImg selected"
onclick="(some javascript);" src="the_src_I_want1.jpg" alt="woohee"> </span>
<span> <img id="imgProduct_2" class="SmPrdImg selected"
onclick="(some javascript);" src="the_src_I_want2.jpg" alt="woohee"> </span>
<span> <img id="imgProduct_3" class="SmPrdImg selected"
onclick="(some javascript);" src="the_src_I_want3.jpg" alt="woohee"> </span>
the only problem is that the exact same code listed above is duplicated way lower in the source. Is there a way to ignore or delete the duplicates using only regex?
Your pattern's not very good; it's way too specific to your exact source code as it currently exists. As @Truth commented, if that changes, you'll break your pattern. I'd recommend something more like this:
<img[^>]*src=['"]([^'"]*)['"]
That will match the contents of any src attribute inside any <img> tag, no matter how much your source code changes.
To prevent duplicates with regex, you'll need lookahead, and this is likely to be very slow. I do not recommend using regex for this. This is just to show that you could, if you had to. The pattern you would need is something like this (I tested this using Notepad++'s regex search, which is based on PCRE and more robust than JavaScript's, but I'm reasonably sure that JavaScript's regex parser can handle this).
<img[^>]*src=['"]([^'"]*)['"](?!(?:.|\s)*<img[^>]*src=['"]\1['"])
You'll then get a match for the last instance of every src.
The Breakdown
For illustration, here's how the pattern works:
<img[^>]*src=['"]([^'"]*)['"]
This makes sure that we are inside a <img> tag when src comes up, and then makes sure we match only what is inside the quotes (which can be either single or double quotes; since neither is a legal character in a filename anyway we don't have to worry about mixing quote types or escaped quotes).
(?!
(?:
.
|
\s
)*
<img[^>]*src=['"]\1['"]
)
The (?! starts a negative lookahead: we are requiring that the following pattern cannot be matched after this point.
Then (?:.|\s)* matches any character or any whitespace. This is because JavaScript's . will not match a newline, while \s will. Mostly, I was lazy and didn't want to write out a pattern for any possible line ending, so I just used \s. The *, of course, means we can have any number of these. That means that the following (still part of the negative lookahead) cannot be found anywhere in the rest of the file. The (?: instead of ( means that this parenthetical isn't going to be remembered for backreferences.
That bit is <img[^>]*src=['"]\1['"]. This is very similar to the initial pattern, but instead of capturing the src with ([^'"]*), we're referencing the previously-captured src with \1.
Thus the pattern is saying "match any src in an img that does not have any img with the same src anywhere in the rest of the file," which means you only get the last instance of each src and no duplicates.
If you want to remove all instances of any img whose src appears more than once, I think you're out of luck, by the way. JavaScript did not support lookbehind when this was written (it arrived with ES2018), and the overwhelming majority of regex engines that do support it wouldn't allow such a complicated lookbehind anyway.
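A hedged sketch of the look-ahead approach in JavaScript follows. It swaps (?:.|\s)* for [\s\S]*, which matches the same text but avoids the overlapping alternation that can backtrack badly on larger inputs. lastSrcs and the sample markup are made up for illustration.

```javascript
// Matches an img src only when no later img in the text has the same src.
// [\s\S]* stands in for (?:.|\s)* from the explanation above.
const LAST_ONLY =
  /<img[^>]*src=['"]([^'"]*)['"](?![\s\S]*<img[^>]*src=['"]\1['"])/g;

// Hypothetical helper: returns each distinct src once, in the order of
// its last occurrence.
function lastSrcs(html) {
  return Array.from(html.matchAll(LAST_ONLY), (m) => m[1]);
}

const page =
  '<img id="a" src="one.jpg" alt="x"> <img id="b" src="two.jpg" alt="x">\n' +
  '<img id="c" src="one.jpg" alt="x">';

console.log(lastSrcs(page)); // [ 'two.jpg', 'one.jpg' ]
```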
I wouldn't work too hard to make them unique, just do that in the PHP following the preg match with array_unique:
$pattern = '~<span> <img id="imgProduct.*? src="/(.*?)" alt="~is';
$match = preg_match_all($pattern, $html, $matches);
if ($match)
{
    $matches = array_unique($matches[1]);
}
If you are using JavaScript, then you'd need to use another function instead of array_unique, check PHPJS:
http://phpjs.org/functions/array_unique:346
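In JavaScript, one way to get the same effect as array_unique is to collect the captures into a Set. This is only a sketch: uniqueSrcs is a hypothetical name, and it uses the more general img pattern from the other answer rather than the original site-specific one.

```javascript
// Collects every img src and drops repeats, keeping first-occurrence order.
function uniqueSrcs(html) {
  const re = /<img[^>]*src=['"]([^'"]*)['"]/g;
  const all = Array.from(html.matchAll(re), (m) => m[1]);
  return [...new Set(all)];
}

const html =
  '<img id="a" src="one.jpg" alt="x"> <img id="b" src="two.jpg" alt="x">\n' +
  '<img id="c" src="one.jpg" alt="x">';

console.log(uniqueSrcs(html)); // [ 'one.jpg', 'two.jpg' ]
```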