I'm developing a Google Workspace extension for Gmail that makes requests to a backend API and fetches images from different CDN servers. Therefore it is not possible to include every possible URL in the urlFetchWhitelist property of the manifest. Is it possible to add only the server's hostname and use some kind of wildcard to include all of its possible suffixes?
Wildcards are supported in urlFetchWhitelist, but they are limited to subdomains.
See the documentation linked below:
https://developers.google.com/apps-script/add-ons/concepts/workspace-manifests#allowlist_urls
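To illustrate what "limited to subdomains" means, here is a rough Python model of the matching. It assumes, per the linked page, that each entry is an HTTPS URL prefix and that * may only stand in for the leftmost subdomain label. The function name matches_allowlist_entry is made up for this sketch; Apps Script performs the real check itself, so treat this as an approximation, not its actual algorithm.

```python
from urllib.parse import urlsplit


def matches_allowlist_entry(entry: str, url: str) -> bool:
    """Approximate check of one allowlist entry against a fetched URL.

    Assumptions: entries are HTTPS URL prefixes, and a leading "*." in the
    entry's host matches any subdomain (this sketch excludes the bare domain).
    """
    e, u = urlsplit(entry), urlsplit(url)
    if e.scheme != "https" or u.scheme != "https":
        return False
    ehost, uhost = e.hostname or "", u.hostname or ""
    if ehost.startswith("*."):
        # Wildcard: the URL host must end with ".example.com" (for example).
        if not uhost.endswith(ehost[1:]):
            return False
    elif ehost != uhost:
        return False
    # Prefix semantics: any path under the entry's path is covered.
    return (u.path or "/").startswith(e.path or "/")


print(matches_allowlist_entry("https://*.example.com/", "https://cdn1.example.com/img/a.png"))
```

Under these assumptions, an entry like https://*.example.com/ would cover every path on every subdomain of example.com, but CDNs on unrelated domains would still each need an entry of their own.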
When you're writing the manifest.json file, you have to specify matches for your content scripts. The http and https schemes work fine, but if I try to include chrome://*/* or any variant of it, I get an error saying that I'm attempting to use an invalid scheme for my matches.
Is it not allowed?
By default, extensions cannot run on chrome:// URLs.
However, there is an option in chrome://flags/#extensions-on-chrome-urls:
Extensions on chrome:// URLs (Mac, Windows, Linux, Chrome OS, Android)
Enables running extensions on chrome:// URLs, where extensions explicitly request this permission.
You still have to specify the pages that your extension can run on, and wildcards are not accepted, so you have to specify the full URL, e.g. chrome://extensions/
The authorized schemes for matches are http, https, file, and ftp.
Therefore, chrome is not a valid scheme.
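The reasoning above boils down to a scheme check. This Python sketch uses the scheme list from the answer, plus "*", which Chrome match patterns treat as "http or https". It only inspects the scheme and is not Chrome's full match-pattern validation; the name has_permitted_scheme is hypothetical, and the flag mentioned earlier is not modeled.

```python
# Schemes listed in the answer above, plus "*" (meaning http or https).
PERMITTED_SCHEMES = {"http", "https", "file", "ftp", "*"}


def has_permitted_scheme(pattern: str) -> bool:
    """Rough scheme check for a content-script match pattern."""
    if pattern == "<all_urls>":
        return True  # special pattern covering every permitted scheme
    scheme, sep, _rest = pattern.partition("://")
    return sep == "://" and scheme in PERMITTED_SCHEMES


for p in ["https://*/*", "file:///*", "<all_urls>", "chrome://*/*"]:
    print(p, has_permitted_scheme(p))  # only chrome://*/* prints False
```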
Yes, it is not allowed. You can't link to them from hrefs on a webpage either.
I have a system where, whenever a user uploads an image, an email is sent to the registered user's Gmail address. But in the email, the thumbnail is not viewable.
I inspected the element and found that its src links to this URL:
https://ci5.googleusercontent.com/proxy/VI2cPXWhfKZEIarh-iyKNz1j9q7Ymh8ty4Yz19lXh82RjSlACBzS0aRajfIj913uXAsX2ylcLEDs5FBsj4cR9TcU75Pw5djdHx4htxdCAQxs_ue1Q1wi5TV43uLLBpigpjH1xN747mUHSRdTBJmXQWFyykInJCRXicM1KhNk=s0-d-e1-ft#https://www.somedomain.com/files/1658/thumbnail_71JtDozxS1L._SY450_.jpg
Obviously it is being cached by the Google proxy.
But I can view the image without googleusercontent by accessing https://www.somedomain.com/files/1658/thumbnail_71JtDozxS1L._SY450_.jpg (I masked the domain, so the image might not be available to you).
I tried clearing the browser cache, but the problem still persists. How can I bypass the googleusercontent proxy, or at least make the thumbnail display?
I checked the question "Images not displayed for Gmail", but I'm not using localhost, and the image itself is accessible outside of my local network.
How does the Google Image Proxy work?
The Google Image Proxy is a caching proxy server. Every time an email with an image link is opened, the request goes to the Google Image Proxy first. If the image has already been cached, the proxy serves it; otherwise, the proxy fetches it from your server and caches it for later requests.
The solution for most issues
The Google Image Proxy server will fetch your images if they:
have only extensions like .png, .jpg/.jpeg, or .gif (maybe .webp too, but not .svg);
do not use any kind of query string in the image URL, like ?id=123;
have a URL that maps directly to the image;
do not have a long name.
Requirements for the image server:
The response from the image server/proxy server must include the correct header, like Content-Type: image/jpeg.
The file extension and the Content-Type header must match.
The status code in the server response must be 200, not 403, 500, etc.
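The checklist above can be turned into a quick self-check. This is a rough Python sketch, not Google's actual logic: the function name proxy_problems is hypothetical, the extension list mirrors the answer (leaving out .webp, which the answer only says "may be" supported), and the "long name" rule is not modeled because no length limit is given. You supply the status code and Content-Type you observe from your server.

```python
import posixpath
from urllib.parse import urlsplit

# Extension -> Content-Type the answer says the server should send.
EXPECTED_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
}


def proxy_problems(url: str, status: int, content_type: str) -> list:
    """List the checklist items that an image URL/response fails."""
    problems = []
    parts = urlsplit(url)
    ext = posixpath.splitext(parts.path)[1].lower()
    if parts.query:
        problems.append("URL has a query string")
    expected = EXPECTED_TYPES.get(ext)
    if expected is None:
        problems.append(f"unsupported extension {ext or '(none)'}")
    elif content_type.split(";")[0].strip().lower() != expected:
        problems.append(f"Content-Type {content_type!r} does not match {ext}")
    if status != 200:
        problems.append(f"status {status}, expected 200")
    return problems


print(proxy_problems("https://www.somedomain.com/files/1658/thumb.jpg", 200, "image/jpeg"))
```

An empty list means the URL and response pass every check in this sketch; it does not guarantee the proxy will fetch the image.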
What else could help?
Google support answer:
Set up an image URL proxy whitelist
When your users open email messages, Gmail uses Google’s secure proxy
servers to serve images that might be included in these messages. This
protects your users and domain against image-based security
vulnerabilities.
Because of the image proxy, links to images that are dependent on
internal IPs and sometimes cookies are broken. The Image URL proxy
whitelist setting lets you avoid broken links to images by creating
and maintaining a whitelist of internal URLs that'll bypass proxy
protection.
When you configure the Image URL proxy whitelist, you can specify a
set of domains and a path prefix that can be used to specify large
groups of URLs. See the guidelines below for examples.
Configure the Image URL proxy whitelist setting:
Sign in to your Google Admin console. Sign in using your administrator account (does not end in @gmail.com).
From the Admin console Home page, go to Apps > G Suite > Gmail > Advanced settings. Tip: To see Advanced settings,
scroll to the bottom of the Gmail page.
On the left, select your top-level organization.
Scroll to the Image URL proxy whitelist section.
Enter image URL proxy whitelist patterns. Matching URLs will bypass image proxy protection. See the guidelines below for more details and
instructions.
At the bottom, click Save.
It can take up to an hour for changes to propagate to user accounts.
You can track prior changes under Admin console audit log.
Guidelines for applying the Image URL proxy whitelist setting
Security considerations
Consult with your security team before configuring the Image URL proxy
whitelist setting. The decision to bypass image proxy whitelist
protection can expose your users and domain to security risks if not
used with care.
In general, if you have a domain that needs authentication via cookie,
and if that domain is controlled by an administrator within your
organization and is completely trusted, then whitelisting that URL
should not expose your domain to image-based attacks.
Important: Disabling the image proxy is not recommended. This option is available to provide flexibility for administrators, but
disabling the image proxy can leave your users vulnerable to malicious
attacks.
Entering Image URL patterns
To maintain a whitelist of internal URLs that'll bypass proxy
protection, enter the image URL patterns in the Image URL proxy
whitelist setting. Matching URLs will bypass the image proxy.
A pattern can contain the scheme, the domain, and a path. The pattern
must always have a forward slash (/) present between the domain and
path. If the URL pattern specifies a scheme, then the scheme and the
domain must fully match. Otherwise, the domain can partially match the
URL suffix. For example, the pattern google.com matches
www.google.com, but not gle.com. The URL pattern can specify a
path that's matched against the path prefix.
Important: Enter your actual domain name as you enter the image URL pattern. Always include a trailing forward slash (/) after the
domain name.
Examples of Image URL patterns
The following patterns are examples only. The following patterns:
http://rule_fixed_scheme_domain.com/
rule_flex_scheme_domain.com/
rule_fixed_subpath.com/cgi-bin/
... will match the following URLs:
http://rule_fixed_scheme_domain.com/
http://rule_fixed_scheme_domain.com/test.jpg?foo=bar#frag
http://rule_fixed_scheme_domain.com
rule_flex_scheme_domain.com/
t.rule_flex_scheme_domain.com/test.jpg
http://t.rule_flex_scheme_domain.com/test.jpg
https://t.rule_flex_scheme_domain.com/test.jpg
http://rule_fixed_subpath.com/cgi-bin/
http://rule_fixed_subpath.com/cgi-bin/people
Note: The URL scheme (http://) is optional. If the scheme is omitted, the pattern can match any scheme, and allows partial matches
on the domain suffix.
Previewing the image URL patterns
Click Preview to see if the URLs match the image URL patterns
you've set. If the image URL matches a pattern, you'll see a
confirmation message. If the image URL does not match, an error
message appears.
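The pattern rules quoted above can be modeled in a few lines, which is handy for reasoning about a pattern before typing it into the Admin console (the console's Preview button remains the authoritative check). This Python sketch follows the quoted rules: with a scheme, scheme and domain must match fully; without one, the domain may match a suffix of the URL's domain; the path is a prefix match. The function names are hypothetical, and matching domain suffixes on a dot boundary is my assumption, consistent with the google.com/gle.com example.

```python
from typing import Optional, Tuple


def _split(text: str) -> Tuple[Optional[str], str, str]:
    """Split 'scheme://host/path' (scheme optional) into (scheme, host, path)."""
    scheme = None
    if "://" in text:
        scheme, text = text.split("://", 1)
        scheme = scheme.lower()
    # Query strings and fragments are ignored; only host and path matter here.
    text = text.split("#", 1)[0].split("?", 1)[0]
    host, slash, path = text.partition("/")
    return scheme, host.lower(), (slash + path) or "/"


def pattern_matches(pattern: str, url: str) -> bool:
    p_scheme, p_host, p_path = _split(pattern)
    u_scheme, u_host, u_path = _split(url)
    if p_scheme is not None:
        # Scheme given: scheme and domain must both match exactly.
        if p_scheme != u_scheme or p_host != u_host:
            return False
    elif not (u_host == p_host or u_host.endswith("." + p_host)):
        # No scheme: the pattern may match a suffix of the URL's domain
        # (assumed to be on a dot boundary).
        return False
    # The pattern's path is matched as a prefix of the URL's path.
    return u_path.startswith(p_path)


print(pattern_matches("rule_fixed_subpath.com/cgi-bin/", "http://rule_fixed_subpath.com/cgi-bin/people"))
```

Every example URL listed in the quoted guidelines matches its pattern under this model.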
Bharata has a great and detailed answer on this, but I just wanted to add one more thing that I identified with a similar issue.
We had an X-WebKit-CSP content security header that turned out to be the culprit.
After removing it, everything worked through the image proxy.
Google's response was that X-WebKit-CSP is deprecated and that the Content-Security-Policy header should be used instead.
However, it seems like a bug that an unsupported header causes a fatal error rather than simply being ignored.
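If you want to check your image server's responses for this kind of header, a tiny Python sketch like the one below can flag prefixed CSP headers that predate the standard Content-Security-Policy header. It assumes you already have the response headers as a dict (from whatever HTTP client you use); the function name is hypothetical, and X-Content-Security-Policy is included because it is the other legacy prefixed CSP header.

```python
# Legacy, prefixed CSP headers; header names are case-insensitive.
LEGACY_CSP_HEADERS = {"x-webkit-csp", "x-content-security-policy"}


def legacy_csp_headers(headers: dict) -> list:
    """Return the names of legacy CSP headers present in a response."""
    return sorted(name for name in headers if name.lower() in LEGACY_CSP_HEADERS)


print(legacy_csp_headers({"X-WebKit-CSP": "default-src 'self'", "Content-Type": "image/png"}))
```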
TL;DR: Make sure your server isn't blocking external connections (through AWS or .htaccess or some other firewall)!
I was having this problem too. I ran through every solution I could think of and every one I found online. Nothing fixed it.
Finally, I inspected the image in Gmail so that I could get the full CDN address Google creates for it. I tried to open that in a new tab and it failed. So I realized that Google wasn't able to grab the image.
In the end, I'd forgotten that I had the server locked down from all traffic except my own (just a basic .htaccess IP deny). It's a simple security layer I use during development. Keep in mind that you might have it locked down within AWS or something more rugged like that.
I opened up all IPs for a minute, tested it, and sure enough, it worked as expected. The old emails that were previously failing also worked, so it seems that Google tries to work their magic anytime the email is opened and they don't have the image saved. Once I restricted the IP addresses again, the image continued to load through Google. I'm guessing that once they write it to their CDN, it remains there indefinitely.
So if you're certain that you've done everything else correctly, I would suggest making sure that the server is indeed open to the outside world so Google can process the image.
I had the same problem, and I solved it by specifying the "https://" protocol in the img's "src" URL; otherwise, "http" is prepended by default.
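If the email HTML is generated by code, a small helper can make sure every img src is an absolute https:// URL before it goes out. This Python sketch is hypothetical (absolute_https_src is not from any library) and assumes your image server serves the same paths over HTTPS.

```python
def absolute_https_src(src: str) -> str:
    """Turn an img src into an absolute https:// URL (illustrative helper)."""
    if src.startswith("https://"):
        return src
    if src.startswith("http://"):
        return "https://" + src[len("http://"):]
    if src.startswith("//"):
        return "https:" + src  # protocol-relative URL
    return "https://" + src  # bare host/path such as www.example.com/a.jpg


print(absolute_https_src("www.example.com/files/thumb.jpg"))
```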
I am using a paid geolocation script to direct users to specific sites based on country.
However, I am getting charged a lot because robots keep crawling every page of my large site.
If I were to disallow Google in robots.txt and provide a sitemap in robots.txt, would Google still index my pages without crawling them?
Example
User-agent: *
Disallow: /
Sitemap: sitemap.xml
Google indexes only with crawling.
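You can see what the robots.txt from the question tells crawlers using Python's standard urllib.robotparser. This sketch parses those rules, with www.example.com as a placeholder domain and an absolute sitemap URL (the sitemaps.org protocol expects a full URL on the Sitemap line), and shows that Googlebot is disallowed from every path. It only shows what the rules permit; it does not model how Google indexes.

```python
from urllib import robotparser

# The rules from the question, with a placeholder domain for the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
Sitemap: https://www.example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "https://www.example.com/some/page"))  # False
print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
```

The sitemap is still announced, but every URL it lists is off-limits to crawling under these rules.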
The best thing for you to do is to disable the geolocation script when you detect Googlebot (or another robot).
You can recognize robots in various ways: by HTTP_USER_AGENT, by HTTP_FROM, or by IP address.
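HTTP_USER_AGENT and HTTP_FROM are the server-variable names for the User-Agent and From request headers. Here is a minimal Python sketch of the header-based check, run before calling the paid geolocation service. The function name and the token list are hypothetical and not exhaustive. Both headers can be spoofed; Google documents verifying Googlebot with a reverse DNS lookup, which this sketch does not do.

```python
import re

# Hypothetical, non-exhaustive crawler tokens.
CRAWLER_RE = re.compile(r"googlebot|bingbot|crawler|spider|\bbot\b", re.IGNORECASE)


def looks_like_crawler(headers: dict) -> bool:
    """Guess from User-Agent / From headers whether the client is a robot."""
    lowered = {k.lower(): v for k, v in headers.items()}
    text = lowered.get("user-agent", "") + " " + lowered.get("from", "")
    return bool(CRAWLER_RE.search(text))


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
if looks_like_crawler({"User-Agent": ua}):
    print("skip geolocation")  # serve the page without the paid lookup
```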
I thought Chrome apps were sandboxed, but I see that you can allow permissions for file:// urls, and there's a special matching pattern:
<all_urls> Matches any URL that uses a permitted scheme. (See the beginning of this section for the list of permitted schemes.)
Are the file URLs restricted to within the sandbox, or can they really go anywhere?
Yes, you can access the entire filesystem (subject to the permissions of the user account running Chrome). However, the user must explicitly activate access to file: URLs by checking the "Allow access to file URLs" box by your extension's listing in chrome://extensions. Your extension must satisfy both requirements:
include the permission in the manifest (either a particular file://... match pattern or <all_urls>), and
have the "Allow access to file URLs" box checked by the user.
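The two requirements above can be expressed as a simple check. This Python sketch is illustrative only: Chrome enforces these rules itself, the function name is hypothetical, and the user's checkbox is passed in as a boolean because the extension cannot set it. It looks at the manifest keys permissions, host_permissions, and content_scripts[].matches. Note that a pattern like *://*/* does not grant file access, because the * scheme means only http or https.

```python
def can_access_file_urls(manifest: dict, user_allowed_file_access: bool) -> bool:
    """True only if the manifest requests file access AND the user allowed it."""
    patterns = list(manifest.get("permissions", []))
    patterns += manifest.get("host_permissions", [])
    for script in manifest.get("content_scripts", []):
        patterns += script.get("matches", [])
    requests_files = any(p == "<all_urls>" or p.startswith("file://") for p in patterns)
    return requests_files and user_allowed_file_access


print(can_access_file_urls({"permissions": ["<all_urls>"]}, False))  # False
```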
See also How to allow a Google Chrome plugin to access files as webpages.