I would like to show a div near the top of the site suggesting to visitors who do not have JavaScript enabled that they should enable it. I thought I had found a good method using the noscript tag.
Unfortunately I found that this solution was less than ideal because of services like Google's indexer and Facebook's link sharing functionality. These services scrape the page and read the text in the noscript div as the summary of the page. This happens because these services are not utilising javascript (obviously).
So, my question to the masses is: what techniques do you prefer for keeping your "please enable JavaScript" messages out of Google's results etc.?
Ideally, I'm hoping to discover the best practice for solving this issue, but I am interested in hearing any techniques you have used successfully, or unsuccessfully, in the past.
Thanks!
In a pure HTML scenario (as tagged), consider placing your message at the bottom of the page and using CSS to position it visibly at the top. This should push your warning far enough down the page to avoid it showing up in typical search results.
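As a rough sketch of that idea (the class name, copy and exact offsets are placeholders, not a tested recipe), the warning sits at the very end of the markup but is pinned to the top of the viewport by CSS:

<!-- last thing inside <body>, so scrapers reach the real content first -->
<noscript>
  <div class="js-warning">Please enable JavaScript to get the most out of this site.</div>
</noscript>

/* in the stylesheet: pull the late-in-the-markup warning up to the top */
.js-warning {
  position: fixed;
  top: 0;
  left: 0;
  width: 100%;
}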
If your HTML is generated by server script, then you may be able to conditionally present the element based on the client UserAgent. A good search engine user agent list would be convenient in this case.
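For instance, a hedged PHP sketch of that approach - the user-agent substrings below are purely illustrative, not a proper crawler list:

<?php
// Skip the warning when the request looks like a known crawler.
$bots = array('Googlebot', 'bingbot', 'facebookexternalhit', 'Slurp');
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = false;
foreach ($bots as $bot) {
    if (stripos($ua, $bot) !== false) { $isBot = true; break; }
}
if (!$isBot) {
    echo '<noscript><div class="js-warning">Please enable JavaScript.</div></noscript>';
}
?>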
I know there are several tools available to find unused CSS on a static web page. But in most real-world scenarios I encounter, a lot of the CSS is only used after some interaction on the page, maybe a new modal opening up or an options popup etc.
In such scenarios, what would you suggest? How do I keep tabs on my ever-growing render-blocking CSS?
The only way I guess one could do that is by running regular unused-CSS-detector type tools in conjunction with Selenium - test known interactions and see what's left unused. But a big assumption here is that I'd need to know all the interactions on my page which could use new CSS. Is there a way to achieve my goal without making this assumption?
In an ideal world, I'd be able to post-back all CSS used by a visitor's browser on my page to my server. Then I'd collect data over a month, aggregate, and get a pretty accurate idea about actual unused CSS.
Any good ideas?
I am the author of a tool that aims to do what you are describing. Everywhere I have worked, the CSS is this "append-only" thing that is too risky and too time-consuming to clean up. And even when you try, the ROI is so low that it's not worth it.
So I am working on a tool that is very similar to what you are describing. The goal is to bring confidence on what can be removed, and to actually do it automatically by submitting pull requests.
A snippet of JavaScript runs in the browser and sends reports of what is being used to a server. Once enough data is accumulated to build some "confidence score", it can create pull requests automatically to remove selectors that are actually not used.
It is still very early stage, but you are welcome to try it and give some feedback about it.
https://www.bleachcss.com/
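To give a rough idea of how such a reporting snippet might work (this is a sketch of the general approach only, not the actual tool's code; the /css-usage endpoint is made up):

// Walk same-origin stylesheets, record selectors that currently match
// something in the DOM, and beacon the list back to a collection endpoint.
function reportUsedSelectors(endpoint) {
  var used = [];
  for (var i = 0; i < document.styleSheets.length; i++) {
    var rules;
    try { rules = document.styleSheets[i].cssRules; } catch (e) { continue; } // cross-origin sheets throw
    if (!rules) continue;
    for (var j = 0; j < rules.length; j++) {
      var sel = rules[j].selectorText;
      try {
        if (sel && document.querySelector(sel)) used.push(sel);
      } catch (e) { /* some pseudo-element selectors throw in querySelector */ }
    }
  }
  navigator.sendBeacon(endpoint, JSON.stringify(used));
}

// Sample after load, and again after interactions that may reveal new UI.
window.addEventListener('load', function () { reportUsedSelectors('/css-usage'); });
document.addEventListener('click', function () {
  setTimeout(function () { reportUsedSelectors('/css-usage'); }, 500);
});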
There is a website called file2hd.com which can download any type of content from your website, including audio, movies, links, applications, objects and style sheets. Of course this doesn't work for high-profile websites such as Google, but is there a method I can use to cloak content on my website and prevent this?
E.g. using some HTML code, or an .htaccess method?
Answers are appreciated. :)
If you hide something from the software, you also hide it from regular users. Unless you have a password-protected part of your website. But even then, those users with passwords will be able to fetch all loaded content - HTML is transparent. And since you didn't say what kind of content you are trying to hide, it's hard to give you a more accurate answer.
One thing you can do, but it works only for certain file types, is to serve just small portions of a file. For example, you have a video on your page and you're fetching 5-second bits of the video from the server every 5 seconds. That way, in order for someone to download the whole thing, they'd have to get all the bits (by watching the whole thing) and then find a way to join the parts... and it's usually just not worth it. Think of Google Maps... and Google uses this or a similar technique on a few other products as well.
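A very rough PHP sketch of that kind of chunked serving, just to make the idea concrete - the file name, chunk size and ?start parameter are all made up, and a real player would use HTTP Range requests or a streaming format such as HLS instead:

<?php
// Serve only a small slice of the file per request.
$file  = 'video.mp4';                 // hypothetical file
$chunk = 1024 * 1024;                 // 1 MB per request
$start = isset($_GET['start']) ? max(0, (int) $_GET['start']) : 0;
$size  = filesize($file);

if ($start >= $size) {
    header('HTTP/1.1 416 Requested Range Not Satisfiable');
    exit;
}

header('Content-Type: video/mp4');
$fh = fopen($file, 'rb');
fseek($fh, $start);
echo fread($fh, min($chunk, $size - $start));
fclose($fh);
?>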
I have affiliated with Expedia and I am using their API system. One of their requirements for launching the site is adding their terms and agreements to my page, and they give us this page: http://travel.ian.com/index.jsp?pageName=userAgreement&locale=en_US&cid=xxx. I do not want to send visitors to a different site, and I cannot copy and paste the information because of updates. I also prefer not to use an iframe. Does anyone have any ideas on how to do this? Here is a webpage using this on their site with their domain: http://www.helloweekends.com/terms.htm. Does anyone know how they did this? Any help would be greatly appreciated!
Since it originates from another domain, it wouldn't be possible to use JavaScript, due to the same-origin policy. Also, relying on JavaScript for the update would be trouble for users who have JavaScript disabled, as they wouldn't see the terms. Since you don't want to use an iframe, or copy the content, I guess your best shot would be to scrape their page with a server-side language of your choice, and then display it on your page.
Scraping can be a bit tricky though, if you rely on their markup. If they change their markup, there is a chance that your script will break and stop updating the terms.
There are various tutorials available on how to scrape sites. Here are a few PHP examples:
Web scrape with PHP
PHP Screen Scraping Tutorial
Note: Make sure that they allow you to scrape the page prior to implementing it, so that you don't violate their rules.
Do you know if their API serves anything as JSON? A JSONP call could get the values to you, but it would make your page rely on JavaScript for users to see the updated terms.
Another option is to use PHP or any other server-side language to get the contents of the URL, process it and return the block you require.
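A minimal PHP sketch of that server-side option, assuming their terms sit in an element with id "userAgreement" - that id is a guess, you'd have to inspect their real markup:

<?php
// Fetch the remote terms page and pull out a single element by id.
// Brittle by design: if their markup changes, this needs updating.
$url  = 'http://travel.ian.com/index.jsp?pageName=userAgreement&locale=en_US&cid=xxx';
$html = file_get_contents($url);

$doc = new DOMDocument();
libxml_use_internal_errors(true);      // their HTML may not be well-formed
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//*[@id="userAgreement"]');    // assumed id
if ($nodes->length > 0) {
    echo $doc->saveHTML($nodes->item(0));               // saveHTML($node) needs PHP 5.3.6+
}
?>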
I would suggest the load() function offered by jQuery. It makes a simple AJAX call to retrieve a file, and you can even use a selector to grab only part of the page. For example, load the contents of an HTML page into a div:
$('#div_id').load('my_file.html');
Or just load a part of the page:
$('#div_id').load('my_file.html #main_text_id');
I have a site that's running WordPress.
The main page has an embedded Flash player and an embedded iframe, and for some reason all the configuration info from the Flash player is showing up on Google for my site, and nothing else.
How can I have the main site information show up on Google, without having that Flash player config info show up?
And can I customize what shows up at all?
If there's some way to tag the info I don't want to show up, or tag the info I want to show up, I can probably do most of the edits myself, I just don't know where to start...
EDIT: I tried most of the suggestions below, and I didn't get anywhere...
Any other ideas?
Thanks a lot!
If you don't want Google, or other crawlers, to access certain parts of your website, you should use a robots.txt file. Inside it you specify which parts are accessible and which aren't; when the crawlers get to your website they will always look for this file for instructions.
You can check some documentation on how to do it here and here
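As a small illustration (the paths are hypothetical; you'd list whatever URLs actually serve the player config), a robots.txt at the site root could look like this:

User-agent: *
Disallow: /player-config/
Disallow: /private/

Keep in mind that robots.txt only keeps well-behaved crawlers from fetching those URLs; it doesn't pull already-indexed pages out of the results.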
In order to influence what text is used in the Google search result, try putting this within your head tags:
<meta name="description" content="WHATEVER YOU WANT DISPLAYED ON GOOGLE">
Source: http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.com/en/us/webmasters/docs/search-engine-optimization-starter-guide.pdf
Some more information from Google on controlling parts of a page. Apparently there are googleon/googleoff tags.
http://perishablepress.com/press/2009/08/23/tell-google-to-not-index-certain-parts-of-your-page/
Hope this helps.
If you want Google to index only part of your pages, you can't follow normal SEO routines. You should provide a mechanism to understand whether the current client (requester) is a robot or not. If yes, then don't render that part. This is the only way. Otherwise, a robot either gets the whole rendered content, or doesn't have access based on robots.txt file (Robot Exclusion Protocol).
Another way (which is not really smart, and can't be guaranteed to work) is to dynamically inject your content into the page via JavaScript, because, AFAIK, robots don't run JavaScript.
As search spiders won't render JavaScript-generated markup (JS is not run, as it is client-side in the browser), a quick fix would be to not output any of the Flash markup in the initial HTML document, and then use JS to add the Flash stuff on load.
Note: as far as I'm aware, Google is currently testing a JS reading spider so this may not work long term.
Google is returning this data because it simply can't find any content where it normally would. Search engines require content - they're not advanced enough to process your multimedia to determine what it's all about.
Google will IGNORE your meta description if it doesn't feel that it reflects your page content (which in your case is only iframes and JS)
Use SWFObject to provide alternate content for users without Flash (including search engines) - ensure it's not some dinky text like "download flash here", but a lengthy, descriptive piece of content about your site or media that users would normally experience if they could run Flash.
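A hedged sketch of that SWFObject setup (file names, element id and dimensions are placeholders): the div's fallback content is what crawlers and Flash-less visitors see, and swfobject swaps it for the player when Flash is available.

<div id="player">
  <p>A proper written description of the media on this page - the show, the episode,
  whatever a visitor without Flash should still be able to read.</p>
</div>
<script type="text/javascript" src="swfobject.js"></script>
<script type="text/javascript">
  // swfobject.embedSWF(swfUrl, idOfElementToReplace, width, height, requiredFlashVersion)
  swfobject.embedSWF("player.swf", "player", "640", "360", "9.0.0");
</script>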
Use robots.txt or <meta name="robots" content="noindex,follow"> for the iframe content to prevent it from being indexed.
For the love of all things holy, please look at reducing the number of JS files and the amount of inline JS on your site (I'd recommend WP-minify since it's so obvious that you love plugins)
Is there a way to embed only a section of a website in another HTML page?
Example: I see an answer I want to blog about, so I grab the HTML content, and splat it in somewhere, and show only that, styled like it is on stackoverflow. Basically, I want to blockquote the section of the page with original styling, if that makes sense. Is that something the site itself has to provide, or can I use an iframe and tell it to show only a certain element or something crazy? Open to all options, but I want it to show up as HTML, not as an image (that's really a last resort).
If this is even possible, are there security concerns I need to aware of?
I don't think an image should really be a last resort. You have no control over the HTML/CSS of the source page, so even if you craft a solution (probably by using JavaScript to parse out the desired snippet), there is no guarantee that the site won't decide to change its layout tomorrow.
Even Jeff, who has control over the layout of stackoverflow.com, still prefers to screen-capture the site, rather than pull in the contents live.
Now if your goal was to have the contents auto-update, that would be a different story. But still, unless you use some agreed-upon method of sharing content, such as RSS, your solution would be very fragile.
The concept you are describing is roughly what is called a "purple include" or "transclusion". There is a library out there for it, but it's not exactly actively developed. Here are a couple of Ajaxian articles on it.
I'd recommend using a server-side solution with Python: use urllib2 to request the page, then use BeautifulSoup to parse out the bit that you need. BeautifulSoup has a very flexible selection API with which you can craft heuristics for the section you are interested in.
To illustrate:
from bs4 import BeautifulSoup  # older installs: from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)  # html is the page fetched with urllib2
text = soup.find(text="Some text on the page that is unlikely to change")
print(text.parent.prettify())  # print the element containing that text
That way if the webmaster later changes the markup on the page, your scraping script should still work.
On the client side, an <iframe> is the only practical option. It is possible to scroll/offset it so only the part you want shows, but that might not keep working in the long term, because it's technically close to a clickjacking attack.
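A rough sketch of that clipping trick - the offsets are made up and would need tuning to the target page, and they'll break whenever its layout changes:

<!-- the wrapper is a fixed-size window onto one region of the framed page -->
<div style="width: 500px; height: 300px; overflow: hidden; position: relative;">
  <iframe src="http://example.com/some-page" scrolling="no"
          style="position: absolute; top: -250px; left: -180px; width: 1200px; height: 900px; border: 0;">
  </iframe>
</div>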
There's also cross-site XHR, but it requires opt-in from the destination site, and today it works only in a few of the latest browsers.
Getting the HTML on the server side is easy (every decent web framework has the ability to download a page and parse the HTML, and you can use XPath/XSLT or the DOM to extract the bit you want).
Getting the styles, however, is going to be tricky - CSS rules may not work with an HTML fragment taken out of context. You'd have to parse the CSS and extract and transform the rules, or use a browser and read the currentStyle of every node.
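A sketch of that browser-based variant, using the standard getComputedStyle (currentStyle is the old IE name for the same idea); this inlines every computed property, so the output is heavy, but the extracted fragment then carries its styles with it:

// Copy each node's computed style onto a style attribute, then serialize.
function inlineComputedStyles(root) {
  var nodes = root.querySelectorAll('*');
  for (var i = 0; i < nodes.length; i++) {
    var el = nodes[i], cs = window.getComputedStyle(el), buf = '';
    for (var j = 0; j < cs.length; j++) {
      buf += cs[j] + ':' + cs.getPropertyValue(cs[j]) + ';';
    }
    el.setAttribute('style', buf);
  }
  return root.outerHTML;
}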
Obviously you have to heavily filter the HTML you extract to avoid XSS. It's harder than it seems.
If you don't need to automate this, a good HTML+CSS WYSIWYG editor might be able to extract the content fragment with its styles.
That sounds like something that IE8's Web Slices would be perfect for. However, it's only available in IE8, and the site of origin would have to implement it for you to be able to take advantage of it.