How to validate email message-id in google workspace (Gmail) - smtp

I noticed that sometimes, when I send mail to a Gmail account, Gmail rejects the message-id and regenerates it.
You can identify this by looking at the original message and finding the string "SMTPIN_ADDED_BROKEN" in it.
For example:
<60f17262.1c69fb81.3e091.249dSMTPIN_ADDED_BROKEN@mx.google.com>
My original message-id is then moved into the "X-Google-Original-Message-ID" header.
This makes it harder for me to track those messages at a later stage.
I want to make sure in advance that all my message-ids are valid.
I guess that Google validates the message-id against one of the RFC standards (probably https://datatracker.ietf.org/doc/html/rfc5322#section-3.6.4). Standards tend to be very broad and sometimes vague. Can someone provide more specific rules on how Google Workspace validates message-ids?
Here are examples of message-ids that should be considered valid by the standard but are rejected by Gmail:
<CAFDdQNjt907n4M"kpM2zaXHA82ZCSppZOc+bYoeKuWkatrSbmw@mail.gmail.com>
=?UTF-8?Q?<762"51b6a8a859d4c2bd7ecd70a3aff811991d6b7@localhost.eu>?=
I guess Gmail doesn't like the " character.
See also:
regex to validate a message-ID as per RFC2822
gmail is modifying header(Message-ID) of incoming mails
Update
Following the answer from Rafa below.
In rfc2822 you can find the following definition.
msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left = dot-atom-text / no-fold-quote / obs-id-left
id-right = dot-atom-text / no-fold-literal / obs-id-right
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
Looking at DQUOTE for example if I follow the syntax correctly
DQUOTE - DQUOTE( ) finds the first double quote mark in the string and returns all characters from that point, until a second double quote mark is found. If the string does not contain at least two double quote marks, a null string is returned.
So the following should be a valid message id in RFC-2822
<test.a"rfc2822"c.123@message.com>
But it's rejected

Answer:
Gmail uses RFC 2822 rather than RFC 5322 for the message-id in a message header. Make sure to use this specification when generating message-ids.
More Information:
From the documentation on creating messages:
The Gmail API requires MIME email messages compliant with RFC 2822 and encoded as base64url strings.
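As a rough aid for checking ids before sending, here is a minimal Python sketch of the RFC 2822 msg-id grammar quoted in the question. It is only an illustration under several assumptions: it uses the standard '@' separator between id-left and id-right, covers printable ASCII only, and ignores CFWS and the obsolete obs-id-left / obs-id-right forms. It does not claim to reproduce Gmail's actual checks, which the thread does not document. Note that in ABNF, DQUOTE is simply the double-quote character, and id-left must be either entirely dot-atom-text or entirely a no-fold-quote, so an id with quotes embedded in the middle of a dot-atom does not match this grammar. The function names are hypothetical.

```python
import re

# Building blocks from the RFC 2822 grammar quoted above (printable ASCII only).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"   # printable, excludes double quote and backslash
DTEXT = r"[\x21-\x5a\x5e-\x7e]"       # printable, excludes brackets and backslash
QUOTED_PAIR = r"\\[\x20-\x7e]"        # a backslash followed by one printable character
NO_FOLD_QUOTE = rf'"(?:{QTEXT}|{QUOTED_PAIR})*"'
NO_FOLD_LITERAL = rf"\[(?:{DTEXT}|{QUOTED_PAIR})*\]"

# id-left is wholly a dot-atom or wholly a quoted string; same idea for id-right.
MSG_ID = re.compile(
    rf"<(?:{DOT_ATOM_TEXT}|{NO_FOLD_QUOTE})@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>"
)
# Stricter subset (an assumption, not a documented Gmail rule): dot-atoms only.
CONSERVATIVE_MSG_ID = re.compile(rf"<{DOT_ATOM_TEXT}@{DOT_ATOM_TEXT}>")


def matches_rfc2822_msg_id(value: str) -> bool:
    """True if value matches the non-obsolete msg-id grammar sketched above."""
    return MSG_ID.fullmatch(value) is not None


def is_conservative_msg_id(value: str) -> bool:
    """True if both sides of the '@' are plain dot-atoms."""
    return CONSERVATIVE_MSG_ID.fullmatch(value) is not None
```

Generating ids that pass the conservative check (plain dot-atoms on both sides) is the cautious choice if the goal is simply to avoid having the id rewritten.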

Related

XSS validation: Is it safe to check only if value contains <> and %3C %3E

I've been doing server side XSS validation. Here is what I found to use:
List of forbidden attributes: javascript:,mocha:,eval(,alert(,vbscript:,livescript:,expression(,url(,&{,&#,/*,*/,onclick,oncontextmenu,ondblclick,onmousedown,onmouseenter,onmouseleave,onmousemove,onmouseover,onmouseout,onmouseup,onkeydown,onkeypress,onkeyup,onblur,onchange,onfocus,onfocusin,onfocusout,oninput,oninvalid,onreset,onsearch,onselect,onsubmit,ondrag,ondragend,ondragenter,ondragleave,ondragover,ondragstart,ondrop,oncopy,oncut,onpaste,ontouchcancel,ontouchend,ontouchmove,ontouchstart,onwheel
However, the list seems too strict, since some legitimate values such as "™" are rejected when I use it.
Would it be safe, for preventing XSS attacks, to only check that the value doesn't contain any of '<', '>', "%3E", "%3C"?
You do not need to block any of those characters. That is a lesson learned the hard way: blocking them potentially cuts out support for other languages, and you can work with these characters safely.
Imagine a chat box, since a textarea is where these characters show up most often. A user can send <html></html>, and if the message is not handled properly, every user receiving it will have that markup rendered as HTML inside the chat (scary).
Fortunately this gets handled on the client side, by escaping the message so that it is displayed as text only.
When dealing with a login, you can regex-check the username; if that passes, look the username up in SQL, and if a match is found, hash the password just given and compare it with the stored hash. If it doesn't match, reject the login.
I've never needed to block characters or treat anything specially, given proper encoding/decoding (escaping practices and whatnot). Maybe this will help your search.
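To illustrate the "encode instead of block" idea, here is a minimal Python sketch using the standard library's html.escape. The function name render_chat_message is hypothetical, and the sketch only covers text placed inside HTML element content or quoted attribute values; other contexts (URLs, inline scripts, CSS) need their own encoding.

```python
import html


def render_chat_message(text: str) -> str:
    """Escape a user-supplied message so a browser shows it as plain text."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return html.escape(text, quote=True)
```

With this approach a message such as <html></html> is displayed literally, and characters like "™" pass through untouched, so no blacklist is needed.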

MySQL regex to validate email not working - curly brace quantifier ignored

I'm trying to use the following regular expression to validate emails in a MySQL database:
^[^@]+@[^@]+\.[^@]{2,}$
in a condition like this:
...and email REGEXP '^[^@]+@[^@]+\.[^@]{2,}$'
For the most part, the expression is working. But it's allowing single-character top level domains. For example, both the following emails pass validation:
something@hotmail.com and something@hotmail.c
The second case is clearly a typo. The {2,} of the regex should allow any string of characters other than the @ symbol, of length 2 or more, after the dot.
I've run the regex itself through multiple testers using different engines (Perl, TCL, etc.), and it works as expected every time, rejecting the single-character TLD version of the email address. It's only when I use this regex in a MySQL context that it fails.
I've checked, and there are no additional characters after the ".c" in the erroneous email address. Is there something inherent to MySQL or this version that could be preventing this from working?
Running MySQL version 5.5.61-cll
You may try using the following regex pattern:
^[^@]+@[^.]+[.][^.]{2,}([.][^.]{2,})*$
The rightmost portion of the pattern means:
[.] match a literal dot
[^.]{2,} followed by a domain component (any 2 or more characters but dot)
([.][^.]{2,})* followed by dot and another component, zero or more times
So this would match:
jon.skeet@google.com
jon.skeet@google.co.uk
But would not match:
gordonlinoff@blah
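A likely explanation for the original behaviour, which the thread does not state explicitly: by default MySQL processes backslash escapes inside string literals, so '\.' reaches the regex engine as a bare '.', which matches any character (the MySQL manual advises doubling backslashes in REGEXP strings for this reason). Writing the dot as [.] sidesteps the problem. The Python sketch below only illustrates that effect; Python's re module stands in for MySQL's engine, the addresses use the standard '@' separator, and the constant and function names are hypothetical.

```python
import re

INTENDED = r"^[^@]+@[^@]+\.[^@]{2,}$"        # what the question meant to run
AFTER_LITERAL = r"^[^@]+@[^@]+.[^@]{2,}$"     # the same pattern once '\.' has become '.'
BRACKETED = r"^[^@]+@[^.]+[.][^.]{2,}([.][^.]{2,})*$"  # the pattern from the answer


def passes(pattern: str, email: str) -> bool:
    """True if the email matches the given pattern."""
    return re.search(pattern, email) is not None


for name, pattern in [("intended", INTENDED),
                      ("after literal processing", AFTER_LITERAL),
                      ("bracketed dot", BRACKETED)]:
    print(name, passes(pattern, "something@hotmail.c"))
```

In the degraded pattern, the bare '.' consumes the "l" of "hotmail" and ".c" satisfies the {2,} quantifier, which is why the single-character TLD slips through.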
By golly we are hackers and we validate stuff! What's more, we understand concepts like ACID compliance, data integrity, and single points of authority. So obviously, we should ensure that no invalid emails enter the DB. What's more? We have such a wonderful tool with which to do it: a proper schema with check constraints!
Unfortunately, emails are one of those details that are notoriously difficult to validate with simple regular expressions. It is possible, sure. No, actually, it's not. None of those links offer 100% compliance.
A much better approach is simply to test the address by sending it an activation email. David Gilbertson explains it far better than I'm going to in a concise SO answer, but the highlights:
Don't even try to validate.
Just test the address with an actual email.
For my projects, both personal and professional, this is the regex I use for email address sanity checking prior to sending an activation/confirmation email:
\S+@\S+
This is extremely simple (and yep, still excludes some technically valid email addresses), easy to debug, and works for any legitimate traffic to our sites. (I have yet to see an email address even close to something like #!$%&’*+-/=?^_{}|~@example.com in our logs.)

How does STR-Transform work?

How does the STR-Transform transformation algorithm work for XML Signature, when working with WS Security? I need to sign the SecurityTokenReference used for the signature on a SOAP message, and this is the required transformation for the security token. I am using an x509 certificate to do the signature, so the security token is this certificate. However, in the message I only need the reference to the certificate thumbprint.
Here is the signature structure that I need to replicate, and the only bit that I am missing is the signature reference to the SecurityTokenReference:
<dsig:Signature xmlns:dsig="http://www.w3.org/2000/09/xmldsig#">
  <dsig:SignedInfo>
    <dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
    <dsig:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
    <dsig:Reference URI="#Timestamp_C1Ih1AB1vpPT5uG2">
      <dsig:Transforms>
        <dsig:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
      </dsig:Transforms>
      <dsig:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
      <dsig:DigestValue>fVSyToUO8yS131cV8oT1h6fa69Jvtt+pKFeP4BFf1P4=</dsig:DigestValue>
    </dsig:Reference>
    <!-- Other signature references -->
    <dsig:Reference URI="#str_U1sjQ5j8JtKnObLk">
      <dsig:Transforms>
        <dsig:Transform Algorithm="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#STR-Transform">
          <wsse:TransformationParameters>
            <dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
          </wsse:TransformationParameters>
        </dsig:Transform>
      </dsig:Transforms>
      <dsig:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/>
      <dsig:DigestValue>gRa3zakGn13XISoKpekB3zl0iDqb/LmNy7+aMDtzKIY=</dsig:DigestValue>
    </dsig:Reference>
  </dsig:SignedInfo>
  <dsig:SignatureValue>ptO...E9Q==</dsig:SignatureValue>
  <dsig:KeyInfo>
    <wsse:SecurityTokenReference
        xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"
        xmlns:wsse11="http://docs.oasis-open.org/wss/oasis-wss-wssecurity-secext-1.1.xsd"
        xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"
        wsse11:TokenType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-x509-token-profile-1.0#X509v3"
        wsu:Id="str_U1sjQ5j8JtKnObLk">
      <wsse:KeyIdentifier
          EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary"
          ValueType="http://docs.oasis-open.org/wss/oasis-wss-soap-message-security-1.1#ThumbprintSHA1">h5...ow=</wsse:KeyIdentifier>
    </wsse:SecurityTokenReference>
  </dsig:KeyInfo>
</dsig:Signature>
Can someone explain to me how to compute the signature for such a token? A step-by-step description of the algorithm, or an example using any language/library, would be good.
The transformation is described in this document, from page 38, but I am unable to understand how to apply it in practice.
OK, after checking the debug and verbose log files of Oracle's WebLogic server, with a working service example and the flags -Dweblogic.xml.crypto.dsig.debug=true -Dweblogic.xml.crypto.dsig.verbose=true -Dweblogic.xml.crypto.keyinfo.debug=true -Dweblogic.xml.crypto.keyinfo.verbose=true -Dweblogic.wsee.verbose=* -Dweblogic.wsee.debug=* set, thank God there was pretty good insight into how the security token was dereferenced. Basically, a SecurityTokenReference for an x509 certificate, with a KeyIdentifier, is dereferenced as a BinarySecurityToken in this way:
<wsse:BinarySecurityToken xmlns="" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" ValueType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-x509-token-profile-1.0#X509v3">CertificateBase64String</wsse:BinarySecurityToken>
Some important things to notice are:
The ValueType, and thus the content of the BinarySecurityToken, is defined by the TokenType of the SecurityTokenReference. In this case, the text of the BinarySecurityToken is the x509 certificate that is referenced by the KeyIdentifier element, encoded as a base64 string.
According to the specification, the BinarySecurityToken only includes the ValueType attribute. So it should include neither the EncodingType attribute nor the Id attribute that the SecurityTokenReference has.
The same namespace prefix of the SecurityTokenReference is used. Also, the namespace for this prefix is included in the tag.
The default namespace attribute is set as empty: xmlns=""
So basically the whole SecurityTokenReference element is replaced by the new BinarySecurityToken, and this is the one to be canonicalized and hashed (to get its digest value). Beware that it is canonicalized and digested as is, so the operation may provide a wrong result if the XML is simplified by removing the empty xmlns namespace or the prefix namespace, or by changing the namespace prefix.
The example BinarySecurityToken is already canonicalized using the algorithm "http://www.w3.org/2001/10/xml-exc-c14n#", so in .NET, to get the DigestValue using the digest algorithm "http://www.w3.org/2001/04/xmlenc#sha256", it is enough to do this:
// Requires: using System; using System.Security.Cryptography; using System.Text;
// Hash the already-canonicalized token text with SHA-256, then base64-encode the digest.
SHA256 sha = SHA256.Create();
byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes("<wsse:BinarySecurityToken xmlns=\"\" xmlns:wsse=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd\" ValueType=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-x509-token-profile-1.0#X509v3\">MIIF...2A8=</wsse:BinarySecurityToken>"));
string digestValue = Convert.ToBase64String(hash);
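For readers outside .NET, the same two steps (build the dereferenced token text, then hash it) can be sketched in Python. This is only an illustration of the layout described in this answer, which was observed from WebLogic's logs. The function names are hypothetical, cert_b64 is a placeholder for the real base64-encoded certificate, and the string is assumed to already be in exclusive-canonical form, so no canonicalization library is involved.

```python
import base64
import hashlib

WSSE_NS = ("http://docs.oasis-open.org/wss/2004/01/"
           "oasis-200401-wss-wssecurity-secext-1.0.xsd")
X509V3 = ("http://docs.oasis-open.org/wss/2004/01/"
          "oasis-200401-wss-x509-token-profile-1.0#X509v3")


def dereferenced_token(cert_b64: str) -> str:
    """Build the BinarySecurityToken text that replaces the SecurityTokenReference."""
    # Keep the empty default namespace and the wsse prefix exactly as shown above.
    return (f'<wsse:BinarySecurityToken xmlns="" xmlns:wsse="{WSSE_NS}" '
            f'ValueType="{X509V3}">{cert_b64}</wsse:BinarySecurityToken>')


def str_transform_digest(cert_b64: str) -> str:
    """SHA-256 digest of the token text, base64-encoded for the DigestValue element."""
    token_bytes = dereferenced_token(cert_b64).encode("utf-8")
    return base64.b64encode(hashlib.sha256(token_bytes).digest()).decode("ascii")
```

Calling str_transform_digest with the certificate's base64 text yields the value to place in the DigestValue of the STR-Transform reference, provided the receiving side dereferences the token the same way.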

Actionscript 3.0 Reg Exp Find first URL, Ignore Email

I've got a bit of code where I have a loop with a string.search() to parse a string of HTML. The purpose is to seek out any valid URLs and surround each one with HREF tags appropriately while ignoring anything else, like email addresses. The problem is that no matter how I modify the regular expression, it either matches and highlights the part of the email address after the @ sign, or it highlights or ignores everything.
An example string would be:
"</span><span class='blue'>Weaselgrease:</span><span class='magenta'>weaselgrease@weasel.grs vs weaselgrease.weasel.grs</span><span class='blue'> [12:41:33 AM]</span>"
Where 'weaselgrease.weasel.grs' would be identified as a proper URL and 'weaselgrease@weasel.grs' would be ignored.
The code I have currently is /([fh]t{1,2}ps?:\/\/)?[\w-]+\.\w{2,4}/
I know it's rather simple, but it doesn't need to be complex yet.
I've tried a conditional and gotten nowhere. I may just be missing something, but my searching and even playing legos with http://regex101.com/ has gotten me no closer.
Ultimately I'm going to have it do the following:
Identify a valid URL's index in the string
Ignore if it's an email
Ignore if it's just an IP address (no prepending http:// and no trailing slash)
But I'd be happy with just an inkling of help on what I need to do to get it to ignore email addresses.
A URL without a proper protocol (i.e. http, https, ftp) cannot be validated as such, because almost everything that has a . (dot) in it would then be a valid URL.
So there is no way to properly tell a URL from an e-mail address if you don't require the protocol. Examples:
end of sentence.New sentence -> sentence.New is a valid URL in your case
weaselgrease@weasel.grs -> everything before @ is ignored and weasel.grs is a valid URL
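If skipping email addresses is all that is needed for now, one partial workaround (not part of this answer, and shown in Python rather than ActionScript) is to refuse matches that start right after an '@' or in the middle of a token, and matches that are immediately followed by '@'. The sketch below extends the pattern from the question under those assumptions; the names are hypothetical, and the limitation described above still applies, since "sentence.New" is still reported as a URL.

```python
import re

URL_NOT_EMAIL = re.compile(
    r"(?<![\w@.-])"            # do not start mid-token or right after '@'
    r"(?:[fh]t{1,2}ps?://)?"   # optional protocol, as in the question's pattern
    r"[\w-]+(?:\.[\w-]+)*"     # one or more dot-separated labels
    r"\.\w{2,4}\b"             # final label of 2 to 4 word characters
    r"(?!@)"                   # not the local part of an email address
)


def find_urls(text: str) -> list:
    """Return URL-like tokens in text, skipping anything attached to an '@'."""
    return [match.group(0) for match in URL_NOT_EMAIL.finditer(text)]
```

Use match.start() instead of match.group(0) if the index of each URL in the string is what the surrounding loop needs.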

Why is this %2B string being urldecoded?

[This may not be precisely a programming question, but it's a puzzle that may best be answered by programmers. I tried it first on the Pro Webmasters site, to overwhelming silence]
We have an email address verification process on our website. The site first generates an appropriate key as a string
mykey
It then encodes that key as a bunch of bytes
&$dac~ʌ����!
It then base64 encodes that bunch of bytes
JiRkYWN+yoyIhIQ==
Since this key is going to be given as a querystring value of a URL that is to be placed in an HTML email, we need to first URLEncode it then HTMLEncode the result, giving us (there's no effect of HTMLEncoding in the example case, but I can't be bothered to rework the example)
JiRkYWN%2ByoyIhIQ%3D%3D
This is then embedded in HTML that is sent as part of an email, something like:
click here.
Or paste <b>http://myapp/verify?key=JiRkYWN%2ByoyIhIQ%3D%3D</b> into your browser.
When the receiving user clicks on the link, the site receives the request, extracts the value of the querystring 'key' parameter, base64 decodes it, decrypts it, and does the appropriate thing in terms of the site logic.
However on occasion we have users who report that their clicking is ineffective. One such user forwarded us the email he had been sent, and on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into your browser.
That is, the %2B string - but none of the other percentage encoded strings - had been converted into a plus. (It's definitely leaving us with the right values - I've looked at the appropriate SMTP logs).
key=JiRkYWN%2ByoyIhIQ%3D%3D
key=JiRkYWN+yoyIhIQ%3D%3D
So I think that there are a couple of possibilities:
1. There's something I'm doing that's stupid, that I can't see, or
2. Some mail clients convert %2b strings to plus signs, perhaps to try to cope with the problem of people mistakenly URLEncoding plus signs
In case of 1 - what is it? In case of 2 - is there a standard, known way of dealing with this kind of scenario?
Many thanks for any help
The problem lies at this step
on inspection the HTML had been transformed into (to put it in terms of the example above)
click here
Or paste <b>http://myapp/verify?key=JiRkYWN+yoyIhIQ%3D%3D</b> into
your browser.
That is, the %2B string - but none of the other percentage encoded
strings - had been converted into a plus
Your application at "the other end" must be missing an unescaping step. Regardless of whether there is a %2B or a +, a function like Perl's uri_unescape returns consistent answers:
DB<9> use URI::Escape;
DB<10> x uri_unescape("JiRkYWN+yoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
DB<11> x uri_unescape("JiRkYWN%2ByoyIhIQ%3D%3D")
0 'JiRkYWN+yoyIhIQ=='
Here is what should be happening. All I'm showing are the steps. I'm using perl in a debugger. Step 54 encodes the string to base64. Step 55 shows how the base64 encoded string could be made into a uri escaped parameter. Steps 56 and 57 are what the client end should be doing to decode.
One possible work around is to ensure that your base64 "key" does not contain any plus signs!
DB<53> $key="AB~"
DB<54> x encode_base64($key)
0 'QUJ+
'
DB<55> x uri_escape('QUJ+')
0 'QUJ%2B'
DB<56> x uri_unescape('QUJ%2B')
0 'QUJ+'
DB<57> $result=decode_base64('QUJ+')
DB<58> x $result
0 'AB~'
What may be happening here is that the URLDecode is turning the %2b into a +, which is being interpreted as a space character in the URL. I was able to overcome a similar problem by first urldecoding the string, then using a replace function to replace spaces in the decoded string with + characters, and then decrypting the "fixed" string.
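The plus-versus-space effect and the two workarounds mentioned in these answers (repairing spaces after decoding, or avoiding '+' in the key altogether) can be sketched in Python. This is an illustration only: the helper names are hypothetical, urllib.parse stands in for whatever query-string parser the site actually uses, and the key bytes are the "AB~" example from the Perl session above.

```python
import base64
from urllib.parse import parse_qs, quote


def make_query(token: str) -> str:
    """Percent-encode the token for use as the 'key' query-string value."""
    return "key=" + quote(token, safe="")


def read_key(query: str, repair_spaces: bool = False) -> str:
    """Decode the 'key' value; optionally turn spaces back into '+'."""
    value = parse_qs(query)["key"][0]  # form decoding treats a literal '+' as a space
    return value.replace(" ", "+") if repair_spaces else value


def encode_key(raw: bytes) -> str:
    """URL-safe base64 uses '-' and '_' instead of '+' and '/'."""
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_key(token: str) -> bytes:
    """Inverse of encode_key."""
    return base64.urlsafe_b64decode(token)
```

The space repair is safe here only because standard base64 output never contains a space; switching to the URL-safe alphabet removes the '+' from the link entirely, so a mail client has nothing to rewrite.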