Use of Parentheses in HTML with "href=tel:" - html

Surprised I can't find a definitive answer about this anywhere online: I am setting up a translated HTML page in French with a different contact number that begins with "+33 (0)." Since I can't personally test it on this number -- a canonical question: can I get away with an anchor tag that begins <a href="tel:+33(0)..." i.e. has a number contained in parentheses with the remaining numbers following and have the link work?

Good question. Finding a clear, authorative answer regarding href="tel:" specifically is difficult.
RFC 3986 (Section 2.2) defines parenthesis as "reserved sub-delims". This means that they may have special meaning when used in certain parts of the URL. The RFC says:
URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.
(Emphasis mine)
Basically, you can use any character in the US-ASCII character set in a URL. But, in some situations, parentheses are reserved for specific uses, and in those cases, they should be percent-encoded. Otherwise they can be left as is.
So, yes, you can use parentheses in href="tel:" links and they should work across all browsers. But as with any web standard in the real world, performance relies on each browser correctly implementing that standard.
However, regarding your example (<a href="tel:+33(0)...), I would steer clear of the format you have given, that is:
[country code]([substituted leading 0 for domestic callers])[area code][phone number]
While I was unable to find a definitive guide to how browsers handle such cases, I think you will find, as #DigitalJedi has pointed out, that some (perhaps all?) browsers will strip the parentheses and leave the number contained therein, ultimately resulting in an incorrect number, e.g.
+33 (0) 123 456 7890
...which may result in a call to +3301234567890.
Will this still work? Maybe? We're getting into phone number routing territory now.
Some browsers/devices may be smart enought to figure out what is intended and adapt accordingly, but I would play it safe and instead simply use:
[country code][area code][phone number], e.g.
+33 123 456 7890
or
(0) 123 456 7890
There is no downside (that I know of) to having your local users dialing the international country code - it will result in the same thing as if they had omitted it and substituted the leading zero.
As a side note, according to the ITU's (International Telecommunications Union) E.123 document, section 7.2,
The ( ) should not be used in an international number.
This recommendation concerns how phone numbers are written, but is of some relevance in terms of the text that should be used when creating an href="tel:" link, and is the reason for the two alternative examples I have provided above.
(Credit to #NiKiZe for this semi-related info).
Finally, here is some semi-related, useful information regarding browser treatment of telephone links: https://css-tricks.com/the-current-state-of-telephone-links/

The () in the href attribute should not be a problem: (href-wise)
http://example.com/test(1).html
HOWEVER, if I do a href="tel:+000 (1) 000 000", after clicking the link on phone, the dial on my phone will show +0001000000 This is tested and confirmed on a Android device
This means that the parentheses are removed, as well as the spaces.
But this could still vary because of different OS on the phone.
P.S.:
If you think the + in your number is an issue... I did test this too, and the + does not have any unexpected behavior.
MORE: href="tel:" and mobile numbers

According to ITU-T E.123 Section 7.2 Use of parentheses
The ( ) should not be used in an international number.
So (0) should not be included at all.

I read and remember the correct format should be:
+33.1 23 45 67 89
But I can't find where from.

Related

MySQL regex to validate email not working - curly brace quantifier ignored

I'm trying to use the following regular expression to validate emails in a MySQL database:
^[^#]+#[^#]+\.[^#]{2,}$
in a condition like this:
...and email REGEXP '^[^#]+#[^#]+\.[^#]{2,}$'
For the most part, the expression is working. But it's allowing single-character top level domains. For example, both the following emails pass validation:
something#hotmail.com and something#hotmail.c
The second case is clearly a typo. The {2,} of the regex should allow any string of characters other than the # symbol, of length 2 or more, after the dot.
I've run the regex itself through multiple testers running different protocols, (Perl, TCL, etc.) and it works as expected every time, rejecting the single-character TLD version of the email address. It's only when I use this regex in a MySQL context that it fails.
I've checked, and there are no additional characters after the ".c" in the erroneous email address. Is there something inherent to MySQL or this version that could be preventing this from working?
Running MySQL version 5.5.61-cll
You may try using the following regex pattern:
^[^#]+#[^.]+[.][^.]{2,}([.][^.]{2,})*$
The rightmost portion of the pattern means:
[.] match a literal dot
[^.]{2,} followed by a domain component (any 2 or more characters but dot)
([.][^.]{2,})* followed by dot and another component, zero or more times
Demo
So this would match:
jon.skeet#google.com
jon.skeet#google.co.uk
But would not match:
gordonlinoff#blah
By golly we are hackers and we validate stuff! What's more, we understand concepts like ACID compliance, data integrity, and single points of authority. So obviously, we should ensure that no invalid emails enter the DB. What's more? We have such a wonderful tool with which to do it: a proper schema with check constraints!
Unfortunately, emails are one of those details that are notoriously difficult to validate with simple regular expressions. It is possible, sure. No, actually, it's not. None of those links offer 100% compliance.
A much better approach is simply to test the address by sending it an activation email. David Gilbertson explains it far better than I'm going to in a concise SO answer, but the highlights:
Don't even try validate.
Just test the address with an actual email.
For my projects, both personal and professional, this is the regex I use for email address sanity checking prior to sending an activation/confirmation email:
\S+#\S+
This is extremely simple (and yep, still excludes some technically valid email addresses), extremely simple to debug, and works for any legitimate traffic to our sites. (I have yet to see an email address even close to something like #!$%&’*+-/=?^_{}|~#example.com in our logs.)

Regex getting the tags from an <a href= ...> </a> and the likes

I've tried the answers I've found in SOF, but none supported here : https://regexr.com
I essentially have an .OPML file with a large number of podcasts and descriptions.
in the following format:
<outline text="Software Engineering Daily" type="rss" xmlUrl="http://softwareengineeringdaily.com/feed/podcast/" htmlUrl="http://softwareengineeringdaily.com" />
What regex I can use to so I can just get the title and the link:
Software Engineering Daily
http://softwareengineeringdaily.com/feed/podcast/
Brief
There are many ways to go about this. The best way is likely using an XML parser. I would definitely read this post that discusses use of regex, especially with XML.
As you can see there are many answers to your question. It also depends on which language you are using since regex engines differ. Some accept backreferences, whilst others do not. I'll post multiple methods below that work in different circumstances/for different regex flavours. You can probably piece together from the multiple regex methods below which parts work best for you.
Code
Method 1
This method works in almost any regex flavour (at least the normal ones).
This method only checks against the attribute value opening and closing marks of " and doesn't include the possibility for whitespace before or after the = symbol. This is the simplest solution to get the values you want.
See regex in use here
\b(text|xmlUrl)="[^"]*"
Similarly, the following methods add more value to the above expression
\b(text|xmlUrl)\s*=\s*"[^"]*" Allows whitespace around =
\b(text|xmlUrl)=(?:"[^"]*"|'[^']*') Allows for ' to be used as attribute value delimiter
As another alternative (following the comments below my answer), if you wanted to grab every attribute except specific ones, you can use the following. Note that I use \w, which should cover most attributes, but you can just replace this with whatever valid characters you want. \S can be used to specify any non-whitespace characters or a set such as [\w-] may be used to specify any word or hyphen character. The negation of the specific attributes occurs with (?!text|xmlUrl), which says don't match those characters. Also, note that the word boundary \b at the beginning ensures that we're matching the full attribute name of text and not the possibility of other attributes with the same termination such as subtext.
\b((?!text|xmlUrl)\w+)="[^"]*"
Method 2
This method only works with regex flavours that allow backreferences. Apparently JGsoft applications, Delphi, Perl, Python, Ruby, PHP, R, Boost, and Tcl support single-digit backreferences. Double-digit backreferences are supported by JGsoft applications, Delphi, Python, and Boost. Information according this article about numbered backreferences from Regular-Expressions.info
See regex in use here
This method uses a backreference to ensure the same closing mark is used at the start and end of the attribute's value and also includes the possibility of whitespace surrounding the = symbol. This doesn't allow the possibility for attributes with no delimiter specified (using xmlUrl=http://softwareengineeringdaily.com/feed/podcast/ may also be valid).
See regex in use here
\b(text|xmlUrl)\s*=\s*(["'])(.*?)\2
Method 3
This method is the same as Method 2 but also allows attributes with no delimiters (note that delimiters are now considered to be space characters, thus, it will only match until the next space).
See regex in use here
\b(text|xmlUrl)\s*=\s*(?:(["'])(.*?)\2|(\S*))
Method 4
While Method 3 works, some people might complain that the attribute values might either of 2 groups. This can be fixed by either of the following methods.
Method 4.A
Branch reset groups are only possible in a few languages, notably JGsoft V2, PCRE 7.2+, PHP, Delphi, R (with PCRE enabled), Boost 1.42+ according to Regular-Expressions.info
This also shows the method you would use if backreferences aren't possible and you wanted to match multiple delimiters ("([^"])"|'([^']*))
See regex in use here
\b(text|xmlUrl)\s*=\s*(?|"([^"]*)"|'([^']*)'|(\S*))
Method 4.B
Duplicate subpatterns are not often supported. See this Regular-Expresions.info article for more information
This method uses the J regex flag, which allows duplicate subpattern names ((?<v>) is in there twice)
See regex in use here
\b(text|xmlUrl)\s*=\s*(?:(["'])(?<v>.*?)\2|(?<v>\S*))
Results
Input
<outline text="Software Engineering Daily" type="rss" xmlUrl="http://softwareengineeringdaily.com/feed/podcast/" htmlUrl="http://softwareengineeringdaily.com" />
Output
Each line below represents a different group. New matches are separated by two lines.
text
Software Engineering Daily
xmlUrl
http://softwareengineeringdaily.com/feed/podcast/
Explanation
I'll explain different parts of the regexes used in the Code section that way you understand the usage of each of these parts. This is more of a reference to the methods above.
"[^"]*" This is the fastest method possible (to the best of my knowledge) to grabbing anything between two " symbols. Note that it does not check for escaped backslashes, it will match any non-" character between two ". Whilst "(.*?)" can also be used, it's slightly slower
(["'])(.*?)\2 is basically shorthand for "(.*?)"|'(.*?)'. You can use any of the following methods to get the same result:
(?:"(.*?)"|'(.*?)')
(?:"([^"])"|'([^']*)') <-- slightly faster than line above
(?|) This is a branch reset group. When you place groups inside it like (?|(x)|(y)) it returns the same group index for both matches. This means that if x is captured, it'll get group index of 1, and if y is captured, it'll also get a group index of 1.
For simple HTML strings you might get along with
Url=(['"])(.+?)\1
Here, take group $2, see a demo on regex101.com.
Obligatory: consider using a parser instead (see here).

What data format is this?

I was checking one share trading site's AJAX response and below is what it showed up in Firebug Response tab of XHR section. Can anyone explain me what format is this and how is it parsed ?
<ST=tat>
<SI=0>
<TB=txtSearch>
<560v=Tata Motors Ltdv=TATMOT>
<566v=Tata Steel Ltdv=TATSTE>
<3199v=Ashram Online.com Ltdv=ASHONL>
<4866v=Kreon Finnancial Services Ltdv=KREFIN>
<552v=Tata Chemicals Ltdv=TATCHE>
<554v=Tata Power Company Ltdv=TATPOW>
<2986v=Tata Metaliks Ltdv=TATMET>
<300v=Tata Sponge Iron Ltdv=TATSPO>
<121v=Tata Coffee Ltdv=TATCOF>
<2295v=Tata Communications Ltdv=TATCOM>
<0v=Time In Milli-Secondsv=0>
I think what we are dealing with here is some proprietary format, likely an Eldricht SGML Horror of some sort.
Banking in general has all sorts of Eldricht horrors running about.
On a related note, this is very much not XML.
Edit:
A quick analysis* indicates that this is a format consisting of a series of statements bracketed by <>; with the parts of the statements separated by = or v=. = seems to indicate a parameter to a control statement, indicated by a two-letter code. (<ST=tat>), while v= seems to indicate an assignment or coupling of some kind (short for "value"?), or perhaps just a field separator.
<ST appears to be short for "search term"; <TB appears to be short for "(source) table". The meaning of <SI eludes me. It is possible that <TB terminates the metadata section, but it's equally possible that the metadata section has a fixed number of terms.
As nothing refers to the number of fields in each statement in the data section, and they are all of the same length (3 fields), it is likely that the number of fields is fixed, but it might derive from the value of <TB, or even <SI, in some way.
What is abundantly clear, however, is that this data is not intended for consumption by other applications than the one that supplies it.
*Caveat: Without a much larger sample it's impossible to tell if this analysis is valid.
It is not a commonly used "web format".
It is probably a proprietary format used by that site and will be parsed by their custom JavaScript.

PDF Open Parameters: comment=commentID doesn't work

According to Adobe's Manual on PDF Open Parameters PDF files can be opened with certain parameters from command line or from a link in HTML.
These open Parameters include page=pagenum, zoom=scale, comment=commentID and others (the first parameter should be preceded with a # and the next should be preceded with a &
The official PDF Open Parameters from adobe gives this example:
#page=1&comment=452fde0e-fd22-457c-84aa-2cf5bed5a349
but the comment part doesn't work for me!
page=pagenum and zoom=scale work for me well. But comment=commentID does not work. I tried on Adobe reader 6.0.0 and Adobe Pro Extended 9.0.0: I can't get to the specified comment.
Also, I get the comment ID by exporting the comments in XFDF format and in the resulting file, there is a name attribute for every comment that I hope corresponds to the ID (well, the appearance looks like the example in the manual).
I thought maybe there is a setting that I should first enable (or maybe disable in adobe) or maybe I am getting the comment IDs wrong, or maybe something else?!
Any help would be extremely appreciated
According to the docs, you must include a page=X along with your comment=foo. Your copied sample has it, but it's copied from the docs, not something you did yourself.
Are you missing a page= when setting comment?
BASTARDS!
From the last page of the manual you linked:
URL Limitations
●Only one digit following a decimal point is retained for float values.
●Individual parameters, together with their values (separated by & or #), can be no greater then 32 characters in length.
Emphasis added.
The comment ID is a 16-byte value expressed as hex, with four hyphens thrown in to break up the monotony. That's 36 characters right there... starting with "comment=" adds another 8 characters. 44 characters total.
According to that, a comment ID can NEVER WORK, including the samples they show in their docs.
Are you just trying it on the command line, or have you tried via a web browser too? I wonder if that makes a difference. If not, we're looking at a feature that CANNOT WORK. EVER... and probably never has.

How to seperate an address string mashed together in MySQL

I have an address string in MySQL that has been mashed together from the source. I think it is possible to use a regular expression or some other method to seperate the string into usable parts in MySQL, but I am not aware of how this could be acheived.
Basically each string looks something like these examples (I have added a marker to the top to show what each bit is):
<-------------><-------><-><-->
123 Fake StreetRESERVOIRVIC3001
<-----------------><--------------------><------><-><-->
Brooks Nursing Home123 Little Fake StreetSMITHTONNSW2001
<-------------------><-------------------><--- ><><-->
Grange Police StationShop 1 Fairytale LaneGRANGEWA8001
The address supposed to be broken up into optionally two lines of address information, suburb, state and post code. I'm in Australia so the state will be either NSW,VIC,QLD,WA,SA,NT or ACT and the postcode will always be a 4 digit number at the very end.
The possible ways to break it up are that the suburb will always be capitalised, the state and postcode will be predicatable within the last 6 or 7 characters (depending on state) and the first two lines of address information will be broken up by a change in case with no space character in between.
I have some 100,000 records like this, so to go through and do it by hand would be very time consuming. Any help on a way of doing this programatically would be much appreciated.
With no spaces? Most gross...
MySQL doesn't have the tools to deal with that, so you'll have to access the database with an external program. I tend to use Perl for manipulations like this.
Start from the end and work backwards... we know the last four should be digits, and the letters preceding that one of 7 options. Use that knowledge and you'll be down 2 fields and 6-7 characters.
It looks like your example now has a town in all capital letters at the end... Parse out that, and it should match to the state and area code. I'm certain you can find a database of zip codes within some minutes online.
With the name and street address remaining, that will have some variability to it, and I wish you a bit of luck there. You may have a head-start with being able to concentrate on the lack of a space between a lowercase and capital, or a letter and number as a breaking point.
Challenge accepted. I'll even throw in some basic punctuation to allow for "101 St. Mark's St." and the like.
/^(([\w\'\.](?=[a-z \'\.])| )+[a-z\'\.])?(([\w\'\.](?=[a-z \d\'\.])| )+[a-z\.\'])([A-Z]+)(NSW|VIC|QLD|WA|SA|NT|ACT)(\d{4})/
Could probably use a little more clean-up, but it should work in any language which supports basic regex with lookahead (some implementations, like JavaScript's and (I think) Ruby's, support lookahead, but not lookbehind). (That, and this puzzle kept me up well past my bed time.) At the very least, it worked on the three examples you provided.
By the way, 2problems.com is a great site for quickly testing regular expressions. It's what I used to work this puzzle out. The guy who built it must have been a real genius. (koff koff)
Rubular is another good option, though since it works by making Ajax calls to a Ruby script behind-the-scenes, it's a bit slower. It does have the nice feature of being able to link to entered patterns and haystacks, though; here's this pattern on Rubular. The 2problems guy really should get around to implementing something like that some day.