How do I stop Gmail from stripping the values out of URLs? - html

I recently learned that webmail clients like Gmail will do alterations on HTML emails, for example adding target="_blank" to <a> tags.
I've also discovered that other alterations happen as well. When I send an HTML email to Gmail (and possibly other web mail clients) from my PHP script, variable values included in the URL of any links are being stripped out. So, for example, this is the value I'm setting in my PHP code:
$mailContent = '<p><a target="_blank" href="https://example.com/confirmation.html?verification=x1x1x1x1x1x1x1x&email=yyyy@email.com">click here to go to the web site and activate your account!</a></p>';
But when the email is received in Gmail, the HTML code comes out like this:
<p><a target="_blank" href="https://example.com/confirmation.html?verification=&email=">click here to go to the web site and activate your account!</a></p>
The values x1x1x1x1x1x1x1x and yyyy@email.com have been stripped out from within the <a> tag.
How do I protect the values of the variables that I want to pass to the URL so that Gmail won't remove them?

Click View original/source on the message in Gmail to see whether the URLs look the way they should there. If they do, you know the problem is how Gmail formats the message for viewing. If they are mutilated even in the source, check whether anything in your web page, PHP, or CMS (do you use one?) changes the code.
You should try URL-encoding as @Crisp said. Here's the W3 reference.
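As a rough illustration of the URL-encoding suggestion, here is a minimal JavaScript sketch (the question uses PHP, where urlencode() plays the same role). The function name, token, and address are placeholders made up for the example:

```javascript
// Hypothetical helper: builds the confirmation link with every query value percent-encoded.
function buildConfirmationLink(baseUrl, token, email) {
  // URLSearchParams encodes reserved characters such as "@" (-> %40) for us.
  const query = new URLSearchParams({ verification: token, email: email });
  return baseUrl + "?" + query.toString();
}

const link = buildConfirmationLink(
  "https://example.com/confirmation.html",
  "x1x1x1x1x1x1x1x",
  "user@example.com" // placeholder address
);
console.log(link);
// https://example.com/confirmation.html?verification=x1x1x1x1x1x1x1x&email=user%40example.com
```

The receiving page then gets the decoded value back from its normal query-string handling.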

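HTML email is often sent with the quoted-printable content transfer encoding. In that encoding "=" is the escape character, so a literal "=" in your $mailContent must be represented as =3D. If it isn't, the characters right after each "=" can be misread as an escape sequence and lost.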
Try adding this:
$mailContent = quoted_printable_encode($mailContent);
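To show what that encoding does to the link, here is a simplified JavaScript sketch. It is not a replacement for PHP's quoted_printable_encode(): it assumes single-byte (ASCII/Latin-1) input, does not encode trailing whitespace, and skips the soft line breaks that real quoted-printable needs for long lines. The function name is made up for the example:

```javascript
// Simplified quoted-printable encoder, for illustration only.
// Assumptions: single-byte characters, no soft line breaks, no trailing-whitespace handling.
function quotedPrintableEncodeSketch(text) {
  let out = "";
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    const isSafePrintable = code >= 33 && code <= 126 && ch !== "=";
    const isPlainWhitespace = ch === " " || ch === "\t" || ch === "\r" || ch === "\n";
    if (isSafePrintable || isPlainWhitespace) {
      out += ch; // passes through unchanged
    } else {
      // "=" and everything else becomes "=" followed by two uppercase hex digits
      out += "=" + code.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(quotedPrintableEncodeSketch('href="https://example.com/confirmation.html?verification=abc"'));
// href=3D"https://example.com/confirmation.html?verification=3Dabc"
```

A client that decodes the body as quoted-printable turns each =3D back into "=", so the query string arrives intact.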

This may not be the perfect answer, but if your application allows for it, you could use a URL shortener. I have done so a number of times.
http://goo.gl/ is my preferred one because the API is super easy to implement and Google is very fast. I have a function in a class; I just run my URL through it and send the returned short URL wherever I need it.

Another non-perfect answer here: my problem was that I was including an http URL in the HTML body, which apparently is not valid, so I changed the links to https. This was on a dev environment, so it was not a problem in production.
Here is more info about this:
Any URL's in the body of the mail which lead to insecure sites may also need to be removed. Use https://transparencyreport.google.com/safe-browsing/search to validate these links.. All links should be correctly prefixed with "https". https://en.wikipedia.org/wiki/HTTPS Google seem to be rejecting "http". Sometimes, but not always, removing links from any signature can help.

Related

What is the difference between these URL syntax?

I was sent a hyperlink to a Tableau Public link by a client. When I tried opening it, I got a 404 exception. I wrote back to the client but was told by the same that the link was working fine. I visited his profile page and was able to open the presentation there, but the URL that ended up working was slightly different than the one behind the original, non-functioning link.
Here's the anonymized URL behind the original link
https://public.tableau.com/profile/[client_name]%23!/vizhome/Project-AirportDelay/FlightPerformancesinUSA?publish=yes
And here's the URL via the profile page:
https://public.tableau.com/profile/[client_name]#!/vizhome/Project-AirportDelay/FlightPerformancesinUSA
The only differences I see are ?publish=yes and %23!. I tried appending the former, ?publish=yes, to the working URL, and it was still functional. So I suspect that it has to do with the other difference %23! vs. #!. Could the first work because he is opening it from his computer where he is likely logged onto Tableau Public? What's the difference between these syntax? Any ideas about why the original hyperlink might not be functional?
For obvious privacy reasons, I can't provide the whole URL.
?publish=yes looks like the basic URL pattern for passing filters, and %23 is the URL-encoded representation of #.
The first # after the authority component starts the fragment component. If the # should be part of the path component or the query component, it has to be percent-encoded as %23.
As # is a reserved character, these URIs aren’t equivalent:
http://example.com/foo#bar
http://example.com/foo%23bar
There are countless ways a URI reference can become erroneous. The culprit is often a piece of software, like a word processor, where someone pastes the correct URI and the software incorrectly percent-encodes it (maybe assuming that the user didn't paste the real/correct URI).
Copy-pasting the URI from the browser address bar into a plain text document should always work correctly.
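The difference between # and %23 can be checked with the standard URL parser available in browsers and Node.js. This is only a small sketch using the two example URIs from above:

```javascript
// "#" starts the fragment; "%23" stays inside the path as an encoded character.
const withFragment = new URL("http://example.com/foo#bar");
const withEncodedHash = new URL("http://example.com/foo%23bar");

console.log(withFragment.pathname, withFragment.hash);       // /foo #bar
console.log(withEncodedHash.pathname, withEncodedHash.hash); // /foo%23bar (hash is empty)

// Decoding the percent-escape shows which character was meant.
console.log(decodeURIComponent("%23")); // #
```

In the second case the server is asked for a path literally containing "#bar", which would explain a 404 when no such resource exists.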

Hide Email Address from Bots - Keep mailto:

tl;dr
Hide email address from bots without using scripts and maintain mailto: functionality. Method must also support screen-readers.
Summary
Email obfuscation without using scripts or contact forms
Email address needs to be completely visible to human viewers and maintain mailto: functionality
Email Address must not be in image form.
Email address must be "completely" hidden from spam-crawlers and spam-bots and any other harvester type
Desired Effect:
No scripts, please. There are no scripts used in the project and I'd like to keep it that way.
Email address is either displayed on the page or can be easily displayed after some sort of user interaction, like opening a modal.
The user can click on the email address, which in turn triggers the mailto: functionality.
Clicking the email will open the user's email application.
In other words, mailto: functionality must work.
The email address is not visible, or not identifiable as an email address, to bots (this includes the page source)
I don't have an inbox that's full of spam
What does NOT Work
Adding a contact form - or anything similar - instead of the email address
I hate contact forms. I rarely fill up a contact form. If there's no email address, I look for a phone number, and if that's not there, I start looking for an alternative service. I would only fill up a contact form if I absolutely have to.
Replacing the address with an image of the address
This creates a HUGE disadvantage to someone using a screenreader (please remember the visually impaired in your future projects)
It also removes the mailto: functionality unless you make the image clickable and then add the mailto: functionality as the href for the link, but that defeats the purpose and now the email is visible to bots.
What might work:
Clever usage of pseudo-elements in CSS
Solutions that make use of base64 encoding
Breaking up the email address and spreading the parts across the document then putting them back together in a modal when the user clicks a button (This will probably involve multiple CSS classes and the usage of anchor tags)
Altering HTML attributes via CSS
@MortezaAsadi graciously brought up this possibility in the comments below. This is the link to the full article, which is from 2012:
What if We Could Use CSS to Alter HTML Attributes?
Other creative solutions that are beyond my scope of knowledge.
Similar Questions / Fixes
JavaScript: Protect your email address by Joe Maller
(This is a great fix suggested by Joe Maller. It works well, but it's script-based.) Here's what it looks like:
<SCRIPT TYPE="text/javascript">
emailE = 'example.com'
emailE = ('yourname' + '@' + emailE)
document.write('<A href="mailto:' + emailE + '">' + emailE + '</a>')
</script>
<NOSCRIPT>
Email address protected by JavaScript
</NOSCRIPT>
Looking for a PHP only email address obfuscator function
(A Clever solution using both PHP and CSS to first reverse the email using PHP then reverse it back with CSS) A very promising solution that Works great! But it's too easy to solve.
Is it worth obfuscating email addresses on the web these days?
(JavaScript fix)
Best way to obfuscate an e-mail address on a website?
The selected answer works. It actually works really well. It involves encoding the email as html entities. Can it be improved?
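Here's what it looks like:
<A HREF="mailto:
&#121;&#111;&#117;&#114;&#110;&#97;&#109;&#101;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;">
&#121;&#111;&#117;&#114;&#110;&#97;&#109;&#101;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;
</A>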
Does e-mail address obfuscation actually work?
The selected answer to this SuperUser question is great, and it presents a study of the amount of spam received when using different obfuscation methods.
It seems that manipulating the email address with CSS to make it rtl does work. This is the same method used in the first question I linked to in this section.
I am uncertain what effects adding mailto: functionality to the fix would have on the results.
There are also many other questions on SO which all have similar answers. I have not found anything that fits my desired effect
The Question:
Would it be possible to increase the efficiency (ie as little spam as possible) of the email obfuscation methods above by combining two or more of the fixes (or even adding new fixes) while:
A- Maintaining mailto: functionality; and
B- Supporting screen-readers
Many of the answers and comments below pose a very good question while indicating the impossibility of doing this without some sort of js
The question that's asked/implied is:
Why not use js?
The answer is that I am allergic to js
Joking aside though,
The three main reasons I asked this question are:
Contact forms are becoming more and more accepted as a replacement for providing an email address - which they should not.
If it can be done without scripting then it should be done without scripting.
Curiosity: (as I am in fact using one of the js fixes currently) I wanted to see if discussing the matter would lead to a better way of doing it.
The issue with your request is specifically the "Supporting screen-readers", as by definition screen readers are a "bot" of some sort. If a screen-reader needs to be able to interpret the email address, then a page-crawler would be able to interpret it as well.
Also, the point of the mailto attribute is to be the standard of how to do email addresses on the web. Asking if there is a second way to do that is sort of asking if there is a second standard.
Doing it through scripts will still have the same issue as once the page is loaded, the script would have been run and the email address rendered in the DOM (unless you populate the email address on click or something). Either way, screen readers will still have issues with this since it's not already loaded.
Honestly, just get an email service with a half decent spam filter and specify a default subject line that is easy for you to sort in your inbox.
Email me
What you're asking for is if the standard has two ways to do something, one for bots and the other for non-bots. The answer is it doesn't, and you have to just fight the bots as best you can.
Defeating email bots is a tough one. You may want to check out the Email Address Harvesting countermeasures section on Wikipedia.
My back-story is that I've written a search bot. It crawled 105,000+ URLs during its initial run many years ago. What I learned from doing that is that web crawling bots literally see EVERYTHING that is text on a web page. Bots read everything except images.
Spam can't be easily stopped via code for these reasons:
CSS & JS are irrelevant when using a mailto: link. Bots specifically search HTML pages for the "mailto:" keyword. Everything from that colon to the next single quote or double quote (whichever comes first) is treated as an email address. HTML-entity email addresses, like the example above, can be quickly translated using a reverse ASCII method/function. Running the JavaScript code snippet above quickly turns the string which starts with your... into yourname@example.com. (My search bot threw away hrefs with mailto: email addresses, as I wanted URLs for web pages & not email addresses.)
If a page crashes a bot, the bot author will tune the bot to fix the crash with that page in mind, so that the bot won't crash at that page again in the future. Thus making their bot smarter.
Bot authors can write bots, which generate all known variations of email addresses... without crawling pages & never using any starter email addresses. While it may not be feasible to do that, it's not inconceivable with today's high-core count CPUs (which are hyper-threaded & run at 4+ GHz), plus the availability of using distributed cloud-based computing & even super computers. It's conceivable that someone can now create a bot-farm to spam everyone, without knowing anyone's email address. 20 years ago, that would have been incomprehensible.
Free email providers have had a history of selling their free user accounts to their advertisers. In the past, simply signing up for a free email account automatically guaranteed them a green light to start delivering spam to that email address... without ever using that email address online. I've seen that happen multiple times, with famous company names. (I won't mention any names.)
The mailto: keyword is part of this IETF RFC, where browsers are built to automatically launch the default email clients, from links with that keyword in them. JavaScript has to be used to interrupt that application launching process, when it happens.
I don't think it's possible to stop 100% of spam while using traditional email servers, without using filters on the email server and possibly using images.
There is one alternative... You can also build a chat-like email client, which runs internally on a website. It would be like Facebook's chat client. It's "kind of like email", but not really email. It's simply 1-to-1 instant messaging with an archiving feature... that auto-loads upon login. Since it has document attachment + link features, it works kind of like email... but without the spam. As long as you don't build an externally accessible API, then it's a closed system where people can't send spam into it.
If you're planning to stick with strictly traditional email, then your best bet may be to run something like Apache's SpamAssassin on a company's email server.
You can also try combining multiple strategies as you've listed above, to make it harder for email harvesters to glean email addresses from your web pages. They won't stop 100% of the spam, 100% of the time... while also allowing 100% of the screen readers to work for blind visitors.
You've created a really good starting look at what's wrong with traditional email! Kudos to you for that!
A good screen reader is JAWS from Freedom Scientific. I've used it before to listen to how my webpages are read to blind users. (If you hear a male voice reading both actions [like clicking on a link] & text, try changing 1 voice to female so that 1 voice reads actions & another reads text. That makes it easier to hear how the web page is read for the visually impaired.)
Good luck with your Email Address Harvesting countermeasure endeavours!
Here is an approach that does make use of JavaScript, but with a rather small footprint. It's also very "ghetto", and generally I would not recommend an approach with inline JS in the HTML unless you have an extreme reluctance to use JS at all.
<a
href="#"
data-contact="bGUtZW1haWxAdGhlLWRvbWFpbi5jb20="
data-subj="QW4gQW1hemluZyBTdWJqZWN0"
onfocus="this.href = 'mailto:' + atob(this.dataset.contact) + '?subject=' + atob(this.dataset.subj || '')"
>
Send an email
</a>
data-contact is the base64 encoded email address. And, data-subj is an optional base64 encoded subject.
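For reference, the two attribute values can be produced with btoa, the counterpart of the atob call in the onfocus handler. This is a minimal sketch; the helper names are made up, and percent-encoding the subject with encodeURIComponent is my own addition (the handler above inserts it as-is):

```javascript
// Hypothetical helpers; btoa/atob are the built-ins available in browsers and recent Node.js.
function encodeContactAttributes(email, subject) {
  return { contact: btoa(email), subj: btoa(subject) };
}

// Mirrors the onfocus handler, but also percent-encodes the subject (an assumption, not in the original).
function buildMailtoHref(contact, subj) {
  return "mailto:" + atob(contact) + "?subject=" + encodeURIComponent(atob(subj || ""));
}

const attrs = encodeContactAttributes("le-email@the-domain.com", "An Amazing Subject");
console.log(attrs.contact); // bGUtZW1haWxAdGhlLWRvbWFpbi5jb20=
console.log(attrs.subj);    // QW4gQW1hemluZyBTdWJqZWN0
console.log(buildMailtoHref(attrs.contact, attrs.subj));
// mailto:le-email@the-domain.com?subject=An%20Amazing%20Subject
```

Note that btoa only accepts Latin-1 input, so an address or subject with other characters would need a different encoding step.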
The main challenge with doing this without JS is that CSS can't alter HTML attributes. (The article you linked is a "pie-in-the-sky" musing and does not have any bearing on what is possible today or in the near future.)
The HTML entities approach you mentioned, or some variation of it, is likely the simplest option that will have some efficacy. Additionally, the iframe approach is clever and the server redirect approach is pretty awesome. But, all three are vulnerable to bots:
The HTML entities just need to be converted (and detecting that is simple)
The document referenced by the iframe might simply be followed
The server redirect might simply be followed, as well
With the approach outlined above, the use of a base64 encoded email address in a data-contact attribute is very "one-off" – as long as the scraper is not specifically designed for your site, it should work.
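To make the point about HTML entities concrete, here is roughly how little work it takes to undo them. This is a sketch of what a generic harvester might do, not code from any real bot; the function names and the sample page are invented:

```javascript
// Turns numeric entities such as &#64; or &#x40; back into characters.
function decodeNumericEntities(text) {
  return text
    .replace(/&#x([0-9a-f]+);/gi, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
    .replace(/&#(\d+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
}

// Collects everything between "mailto:" and the next quote, "?" or whitespace.
function harvestMailtoAddresses(html) {
  const decoded = decodeNumericEntities(html);
  const found = [];
  const pattern = /mailto:([^'"?\s]+)/gi;
  let match;
  while ((match = pattern.exec(decoded)) !== null) {
    found.push(match[1]);
  }
  return found;
}

const page = '<a href="mailto:&#121;&#111;&#117;&#64;example.com">contact</a>';
console.log(harvestMailtoAddresses(page)); // [ 'you@example.com' ]
```

A base64 value in a custom data attribute does not match a generic pattern like this, which is why it holds up until someone targets the site specifically.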
Simple + Lot of @ + Editable without tools
<a href="mailto:user@domain@@com"
onmouseover="this.href=this.href.replace('@@','.')">
Send email
</a>
Have you considered using google's recaptcha mailhide?
https://www.google.com/recaptcha/admin#mailhide
The idea is that when a user clicks the checkbox (see nocaptcha below), the full e-mail address is displayed.
While reCAPTCHA has traditionally been hard not only for screen readers but for humans as well, Google has rolled out its nocaptcha reCAPTCHA, which you can read about here as it relates to accessibility tests. It appears to show promise with screen readers, as it renders as a traditional checkbox from their point of view.
Example #1 - Not secure but for easy illustration of the idea
Here is some code as an example without using mailhide but implementing something using recaptcha yourself: https://jsfiddle.net/43fad8pf/36/
<div class="container">
<div id="recaptcha"></div>
</div>
<div id="email">
Verify captcha to get e-mail
</div>
function createRecaptcha() {
    grecaptcha.render("recaptcha", {sitekey: "6LcgSAMTAAAAACc2C7rc6HB9ZmEX4SyB0bbAJvTG", theme: "light", callback: showEmail});
}
createRecaptcha();
function showEmail() {
    // ideally you would do server side verification of the captcha and then the server would return the e-mail
    document.getElementById("email").innerHTML = "email@example.com";
}
Note: In my example I have the e-mail in a JavaScript function. Ideally you would have the recaptcha validated on the server end, and return the e-mail, otherwise the bot can simply get it in the code.
Example #2 - Server side validation and returning of e-mail
If we use an example more like this, we get additional security: https://designracy.com/recaptcha-using-ajax-php-and-jquery/
function showEmail() {
    /* Check if the captcha is complete */
    if ($("#g-recaptcha-response").val()) {
        $.ajax({
            type: 'POST',
            url: "verify.php", // The file we're making the request to
            dataType: 'html',
            async: true,
            data: {
                captchaResponse: $("#g-recaptcha-response").val() // The generated response from the widget sent as a POST parameter
            },
            success: function (data) {
                alert("everything looks ok. Here is where we would take 'data' which contains the e-mail and put it somewhere in the document");
            },
            error: function (XMLHttpRequest, textStatus, errorThrown) {
                alert("You're a bot");
            }
        });
    } else {
        alert("Please fill the captcha!");
    }
}
Where verify.php is:
$captcha = filter_input(INPUT_POST, 'captchaResponse'); // get the captchaResponse parameter sent from our ajax
/* Check if captcha is filled */
if (!$captcha) {
    http_response_code(401); // Return error code if there is no captcha
    exit;
}
$response = file_get_contents("https://www.google.com/recaptcha/api/siteverify?secret=YOUR-SECRET-KEY-HERE&response=" . urlencode($captcha));
$result = json_decode($response, true); // siteverify answers with JSON containing a boolean "success" field
if (empty($result['success'])) {
    http_response_code(401); // It's SPAM! RETURN SOME KIND OF ERROR
    echo 'SPAM';
} else {
    // Everything is ok, should output this in json or something better, but this is an example
    echo 'email@example.com';
}
People who write scrapers want to make their scrapers as efficient as possible. Therefore, they won't download styles, scripts, and other external resources. There's no method that I know of to set a mailto link using CSS. In addition, you specifically said you didn't want to set the link using Javascript.
If you think about what other types of resources there are, there's also external documents (i.e. HTML documents using iframes). Almost no scrapers would bother downloading the contents of iframes. Therefore, you can simply do:
index.html:
<iframe src="frame.html" style="height: 1em; width: 100%; border: 0;"></iframe>
frame.html:
My email is me@example.com
To human users, the iframe looks just like normal text. Iframes are inline and transparent by default, so we just need to set its border and dimensions. You can't make the size of the iframe match its content's size without using Javascript, so the best we can do is give it predefined dimensions.
First, I don't think doing anything with CSS will work. All bots (except Google's crawler) simply ignore all styling on websites. Any solution has to work with JS or server-side.
A server-side solution could be making an <a> that links to a new tab, which simply redirects to the desired mailto:
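A rough sketch of that redirect idea as a Node.js request handler follows. The /contact route and the address are placeholders I made up, and I'm assuming the visitor's browser hands a redirect to a mailto: target over to the mail client, which is worth testing in the browsers you care about:

```javascript
// Hypothetical route and address, chosen only for illustration.
const CONTACT_PATH = "/contact";
const CONTACT_ADDRESS = "me@example.com";

// Can be plugged into Node's http.createServer((req, res) => handleRequest(req, res)).
function handleRequest(req, res) {
  if (req.url === CONTACT_PATH) {
    // The page only ever links to /contact; the address appears in this response header alone.
    res.writeHead(302, { Location: "mailto:" + CONTACT_ADDRESS });
    res.end();
  } else {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
  }
}

// Exercise the handler with a stand-in response object instead of a live server.
const recorded = {};
const fakeResponse = {
  writeHead(status, headers) { recorded.status = status; recorded.headers = headers; },
  end() {},
};
handleRequest({ url: "/contact" }, fakeResponse);
console.log(recorded.status, recorded.headers.Location); // 302 mailto:me@example.com
```

The link in the page would then be something like <a href="/contact" target="_blank">Email me</a>, with no address in the HTML at all.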
That's all my ideas for now. Hope it helps.
Short answer to fulfill all your requirements is that it's impossible
Some of the script-based options answered here may work for certain bots, but you wanted no-script, so, no, you can't.
Based on MaanooAk's code, here is my version:
<a href="mailto: Mike Myers"
onclick="this.href=this.href.replace(' Mike ','MikeMy'); this.href=this.href.replace('Myers','ers@vwx.yz')">✉ Send Email</a>
The difference from MaanooAk's version is that on hover you don't see mailto: followed by a broken email address, but mailto: followed by the contact's name. When you click the link, the name is replaced by the email address.
In the code the email address is split into two parts, so the complete address is not visible anywhere in the source.
Here is my newer solution. I first build the email address string by concatenating small pieces and then also use this string as the title:
var address = 'mailt' + 'o:MikeM' + 'yers@v' + 'wx.yz';
document.getElementsByClassName('Email')[0].title = address;
function mail(){window.location.href = address;}
<a class='Email' onclick='mail()'>✉ Send Email</a>
I use this in the footer of a website that has many pages, all sharing the same footer.
PHP solution
function printEmail($email){
    // Build the complete mailto link, then emit it one character at a time
    $email = '<a href="mailto:'.$email.'">'.$email.'</a>';
    $a = str_split($email);
    return "<script>document.write('".implode("'+'",$a)."');</script>";
}
Use
echo printEmail('test@example.com');
Result
<script>document.write('<'+'a'+' '+'h'+'r'+'e'+'f'+'='+'"'+'m'+'a'+'i'+'l'+'t'+'o'+':'+'t'+'e'+'s'+'t'+'@'+'e'+'x'+'a'+'m'+'p'+'l'+'e'+'.'+'c'+'o'+'m'+'"'+'>'+'t'+'e'+'s'+'t'+'@'+'e'+'x'+'a'+'m'+'p'+'l'+'e'+'.'+'c'+'o'+'m'+'<'+'/'+'a'+'>');</script>
P.S. Requirement: user must have JavaScript enabled
The one method I found effective is combining it with CSS, like below:
<a href="mailto:myemail@ignore-example.com">myemail@<span style="display:none;">ignore-</span>example.com</a>
Then write some JavaScript that removes the word ignore- from the href="mailto:..." attribute with a regex. This hides the email from bots, because the ignore- word sits in front of the real domain, and it still works with screen readers. When the user clicks the link, a custom JS function removes ignore- from the href attribute so that it opens the real email address.
This method has been working very effectively for me to date. You can read more on this - http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfuscate-e-mail-addresses-compared/
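The clean-up step described above could look something like this. It is only a sketch: the function name is made up, and the commented-out wiring assumes a page where the links carry the decoy word:

```javascript
// Removes the decoy word from a mailto: href; everything else is left untouched.
function stripDecoy(href) {
  return href.replace(/ignore-/, "");
}

console.log(stripDecoy("mailto:myemail@ignore-example.com")); // mailto:myemail@example.com

// In the page this would run when the user clicks the link, for example:
// document.querySelectorAll('a[href^="mailto:"]').forEach(function (link) {
//   link.addEventListener("click", function () { link.href = stripDecoy(link.href); });
// });
```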

Email clients changing my links & to &amp;

I am sending out email alerts with a static google map on them. Certain email clients are changing the content of the link so that it no longer works. Specifically, "&" is getting changed to & a m p ; (no spaces)
Google will not accept the link with the substitutions. I have also tried sending %26 in place of &, but google will not accept that. This is only happening on certain systems like AOL and maybe Hotmail.
Here's an example link:
http://maps.google.com/maps/api/staticmap?sensor=false&center=36.124023600000,-115.170356400000&size=500x300&maptype=roadmap&markers=label:S%7Ccolor:red%7C36.114646000000,-115.172816000000&markers=label:H%7Ccolor:green%7C36.124023600000,-115.170356400000
Is there anyway to tell the email client NOT to change the link, or is there anyway to change the original link to work with Google and bypass the email client substitution?
TIA,
David
Your email program is probably telling the receiving programs what type of message it is. There is a header "Content-Type:" which is usually either "text/plain" or "text/html", with it defaulting to text/plain.
Probably what is happening is that your message is going out as text/html, and the receiving program is rendering as HTML, but first fixing things up so that ampersands get displayed as ampersands instead of as HTML directives. Otherwise, if I send a message that says "John & Carol are having a surprise party", the ampersand will screw things up or not get shown. (I forget right now which.)
It is also possible that your email program is doing the conversion before the message gets sent out and most of the receiving email programs are recognizing the issue and fixing it.
In order to tell exactly what is going on, send yourself one of these alerts and view the source. How you view the source varies depending upon which email program you use to read the message. With GMail, open the message, and in the upper right hand of the message, there is a drop-down "More"; choose "show original". Look for what the content type is and whether the ampersands in the URL are & or &amp;.
Now, as to how to fix it, probably what you are doing right now is just typing the URL into the message. Instead, try making it a link. How you do this will vary depending upon which program you use to compose, but on GMail for example, there is a little symbol like a chain. Press that button and enter in the URL. That will generate HTML something like
<a href="http://mydomain.com/blah/blah?foo=1&bar=2&baz=3">http://mydomain.com/blah/blah?foo=1&bar=2&baz=3</a>
Thus, even if the ampersands get displayed as &amp;, when the user clicks on the link, they will get the URL with proper ampersands.
I can't be certain this will work, as I don't know what email program you are using and thus how it does things. However, I think it is likely to work.
If it doesn't, you might be able to get it to work by sending as text/plain instead of text/html.
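If you generate the alert HTML yourself, one thing worth trying is to write the ampersands as &amp; inside the href on purpose. That is the correct way to spell "&" in an HTML attribute, and a client that renders the message as HTML decodes it back to "&" before following the link. A small sketch, with made-up helper names:

```javascript
// Escapes the characters that have special meaning inside an HTML attribute or text node.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds an anchor whose href and label are both safely escaped.
function buildLink(url, label) {
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(label) + "</a>";
}

console.log(buildLink("http://mydomain.com/blah/blah?foo=1&bar=2&baz=3", "Open map"));
// <a href="http://mydomain.com/blah/blah?foo=1&amp;bar=2&amp;baz=3">Open map</a>
```

This only applies to a text/html body; in a text/plain message the URL should keep its bare "&" characters.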

URLENCODE Variable in Salesforce Vertical Response Email

This is a rather simple question, but I cannot find documentation about it from Salesforce.
I am setting up an HTML Newsletter from Salesforce Vertical Response, and I need to put a link in the body of the email that goes to another site which takes the user's email address as a query string. I am doing this so that when the user clicks the link from the HTML email, they will automatically be signed up for a different blog mailing list.
The link will look like this: www.mywebsite.com/blog/subscribe?email=your_email@email.com.
I can easily accomplish this by using the {EMAIL_ADDRESS} variable, such that the link looks like this:
<a href="http://www.mywebsite.com/blog/subscribe?email={EMAIL_ADDRESS}">Subscribe</a>
This works, but when the user gets the email and clicks the link, the '@' symbol gets stripped from the URL. Now I'm trying to figure out how to get around this. I saw some documentation on the URLENCODE() function for SalesForce, but when I try to use it in the HTML email editor in SalesForce, like URLENCODE({EMAIL_ADDRESS}), it doesn't execute it and instead interprets it literally as text. Can anyone help me? Is it even possible to use functions from within the SalesForce HTML email editor?
Thanks
I haven't used VerticalResponse, but if it leans on Salesforce communication templates then you can always create an email template as a Visualforce page. Then you can apply encode functions to merge fields.
I'm glad you were able to find a workaround. If you ever go back to dealing with the URL, it's a good idea to disable our click-tracking when working with merge fields. This can be accomplished by adding nr_ before the http. Example: <a href="nr_http://www.mywebsite.com/blog/subscribe?email={EMAIL_ADDRESS}">Subscribe</a> - If you ever try that and it doesn't work, or if you have any other questions, please let us know via one of our Support channels:
support@verticalresponse.com
866-683-7842 x1
We also have live chat available: http://help.verticalresponse.com/
Regards,
Keith Gluck
VerticalResponse Customer Support

HTML Signature - Embedding a website

I'm trying to use an html email signature that pulls the html from another site. So, imagine I have the html hosted at blahblah.com/blah.html, and blah.html is:
<html>
<body>
Jon Jones
jon@blahblah.com
</body>
</html>
And then my html signature would be something like <embed src="blahblah.com/blah.html"/> so that I can manipulate the signature without having to constantly change the actual signature in Outlook (which I use to check my email).
I can't figure out any html that will do what I'm trying to do. The embed tag that I posted above doesn't do the trick. What simple line of html can I use to say "display what you find at blahblah.com/blah.html"
I would venture a guess and say this isn't the best way to do this.
From a security standpoint, I wouldn't want to be viewing any email sent by you that also brings in somesite.com/signature.htm. Even if it did, it would invoke a "click to view linked elements in this email" banner, and hide it until I did so (but chances are I'm not clicking).
From a recipient stand point, some spam filters block emails with externally-linked content (your intended recipient may not even get your email, or (best-case) see it with [spam] in the subject line.)
If you want easy upkeep, you could place the signature in your My Documents folder (or some other folder) and link to it via Outlook's settings. That's about the least intensive method, and it doesn't cause concerns or issues for anyone viewing your email.
It looks like instructions for what you want are here: http://www.emailaddressmanager.com/tips/html-email.html
Under "How to add HTML links in Outlook HTML emails," point to blahblah.com/blah.html
On the other hand, HTML in emails is generally not a great thing because it often isn't very secure (you could send me a page with HTML that would load a virus), so many clients won't be able to receive it or will flag it as spam.