How to identify ads on the website - html

I'd like to programmatically analyze the content of a website and find possible spots where ads might be placed (or the ads themselves). Different websites may carry ads from different vendors in many different formats, and I'd like my solution to pick up as many of them as possible.
How would you solve this problem programmatically? So far I have found only one solution, but I'm not very happy with it (for the reason below).
The obvious solution would be to do a series of regex searches on the source code, looking for ad-engine-specific JS and/or HTML. I believe this is similar to what AdBlock uses to strip ads from websites in the browser. However, since there are so many ad engines, this would be neither effective nor easy to maintain (even if we consider using AdBlock blacklists to feed the search engine).
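For reference, the regex approach described above could look roughly like the sketch below. The two patterns are only illustrative examples of ad-serving domains, and the names AD_PATTERNS and find_ad_signatures are made up for this example; a real list would be much longer and would presumably be generated from a filter list rather than written by hand.

```python
import re

# Illustrative, far from exhaustive: one compiled pattern per ad engine.
# In practice this table would be fed from a maintained blacklist.
AD_PATTERNS = {
    "doubleclick": re.compile(r"doubleclick\.net", re.IGNORECASE),
    "googlesyndication": re.compile(r"googlesyndication\.com", re.IGNORECASE),
}


def find_ad_signatures(source):
    """Return the sorted names of all ad engines whose pattern occurs in the page source."""
    return sorted(
        name for name, pattern in AD_PATTERNS.items() if pattern.search(source)
    )
```

The maintenance cost mentioned above shows up directly here: every new vendor or URL change means another entry in the pattern table.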
I'd like to find a more generic solution to this problem and I'm not necessarily looking for a final solution. Different views on the problem will be helpful.

I don't think maintaining a list of ad vendors is that difficult, especially given that there are only a few major players who serve up 90%+ of all ads.
If you're not looking for a catch-all solution, detecting 90%+ would be an acceptable hit rate, I'd say.
Doing it 'heuristically', you could simply flag any Flash or similar media objects served from a domain different from the one hosting the page.
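A minimal sketch of that heuristic, using only the Python standard library. The choice of tags (object, embed, iframe) and the names ThirdPartyEmbedFinder and find_third_party_embeds are assumptions for illustration. The host comparison is exact, so a sibling subdomain such as static.example.com would also be flagged for a page on www.example.com.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed set of "media-like" tags and the attribute holding their source URL.
EMBED_TAGS = {"object": "data", "embed": "src", "iframe": "src"}


class ThirdPartyEmbedFinder(HTMLParser):
    """Collects (tag, host) pairs for embeds served from another host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname or ""
        self.hits = []

    def handle_starttag(self, tag, attrs):
        attr_name = EMBED_TAGS.get(tag)
        if attr_name is None:
            return
        src = dict(attrs).get(attr_name)
        if not src:
            return
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(self.page_url, src)).hostname
        if host and host != self.page_host:
            self.hits.append((tag, host))


def find_third_party_embeds(html, page_url):
    """Return a list of (tag, host) for embeds not served from the page's own host."""
    finder = ThirdPartyEmbedFinder(page_url)
    finder.feed(html)
    return finder.hits
```

This only inspects static markup; anything injected by JavaScript after the page loads would need a rendered DOM instead of the raw source.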

Related

Always avoid using <iframe>?

Some days ago, some friends of mine told me to avoid using <iframe> for virtually anything, which of course includes Google Maps. That made me do some research and, among other things, find this thread on Quora (http://www.quora.com/Google-Maps/What-are-best-practices-and-recommendations-to-implement-Google-maps-within-an-iframe-on-a-webpage), which I think isn't conclusive, at least in my case. I've made a simple site which includes displaying a Google Map. I used an <iframe> because it is very simple and, as pointed out before, it is the option that Google offers within every map, so I guessed it was the optimal one.
My question is: is using an <iframe> always a bad solution, or is it acceptable in a simple case like mine (only displaying a location map)?
Thank you all, please let me hear your thoughts on this,
João
Using an iframe is like having another page loaded in your browser, which takes resources. I think this is what the suggestion to avoid it is based on. But naturally, the solution is to avoid those who suggest that you should always avoid something. Just use it when it makes sense and know where to stop.

cloud based "knowledge base" approach with links, snippets and excerpts for google chrome

I'm working as a web developer and have lots (hundreds) of links with hacks, tutorials and code snippets that I don't want to memorize. I am currently using evernote to save the content of my links as snippets and have them searchable and always available (even if the source site is down).
I spend a lot of time on tagging, sorting, evaluating and saving stuff to evernote and I'm not quite happy with the outcome. I ended up with a multitude of tags and keep reordering and renaming tags while retagging saved articles.
My Requirements
web based
saving web content as snippets with rich styling (code sections, etc.)
interlinked entries possible
chrome plugin for access to content
chrome plugin for content generation
web app or desktop client for faster sorting / tagging / batch processing
good and flexible search mechanism
(bonus) google search integration (search results from KnowledgeBase within google search results)
I had a look at kippt but that doesn't seem to be a solution for me. If I don't find a better solution, I'm willing to stay with evernote as it meets nearly all my needs but I need a good plan to sort through my links/snippets once and get them in order.
Which solutions do you use and how do you manage your knowledge base?
I'm a big Evernote fan but a stern critic of all my tools. I've stuck with Evernote because I'm happy enough with its fundamental information structures. I am, however, currently working on some apps to provide visualisations and hopefully better ways to navigate complex sets of notes.
A few tips, based on years of using Evernote and wikis for collaboration and software project management:
you can't get away from the need to curate things, regardless of your tool
don't overthink tags; tags in combination with words are a great way to search (you do know you can say tag:blah in a search to combine that with word searches?)
build index pages for different purposes - I'm using a lot more of the internal note links to treat Evernote like a wiki
refactor into smaller notebooks if you use mobile clients a lot, allowing you to choose to have different collections of content with you at different times

Tree view navigation, good idea?

I'm thinking of using a tree view for page navigation in my web application, similar to Windows Explorer. There are a lot of things for administrators to configure in the application so I figured listing all links in a single page in tree form would keep things organized. Related page links are grouped in a "folder", and all folders will show closed initially.
Obviously, this page is for administrators only, so they'd be provided with some training. That being said, is this a good design from a user's point of view? Do you see any usability or potential implementation issues?
The best answer involves empirical evidence. A yes or no answer could really vary based on the specific task and your intended audience. Try doing a simple 5-minute usability test with your users. Draw out your page layouts on paper and have a couple of users pretend to use the site (see Paper Prototyping). Give them a few simple tasks to complete using your interface and observe what they do.
If they get confused or have trouble with the concept, then it's probably best to find another way to provide navigation.
It totally depends on how your users are using your site. If they're often jumping from one part of the site to a completely different, unrelated place in the site, a tree may be the best way to let them quickly find that "other page" they were looking for.
However, for the vast majority of websites I've ever seen or used, I'd prefer to find what I'm looking for either via Search functionality, or by links on the page I'm looking at that lead me to related data.

Why do Google and Twitter use table layout? [duplicate]

Closed 13 years ago.
Possible Duplicate:
Does it make sense to use the <table> tag on a “modern” website?
Everywhere I go I see "don't use table layout, it's evil"; even Google says that. So why do two of the most visited websites, Google and Twitter, use it for their homepages?
I don't think it is any kind of mistake, or any other stupid problem.
The only reason I can think of is that they want the page to look similar even if the client doesn't support CSS, as with Lynx.
So why does everyone say it is so bad, if the biggest websites on the internet use it?
In my opinion, there are some cases, like those mentioned above, where it is vitally important to have the same look everywhere, and then it is OK to use tables.
edit: The same question goes for HTML elements like <center>, or formatting text with the align attribute and other "html attributes and elements used to substitute CSS functionality"
Page load time is king for these guys, and bandwidth usage is extreme.
I'd have to say they use tables for raw data speed, since they are serving up so much bandwidth every day.
Also, notice that they use inline styles in the page header to reduce the number of HTTP requests to help speed up page load time.
Table layout gets more grief than it deserves. It's easy for developers to use, it's consistent across nearly every web browser, and it allows you to easily add rows/columns with little to no effort.
The only downside is that it goes against the mantra that your document should only contain content, and your design should be contained separately (in a CSS file).
Google doesn't have to care about ranking high in search engines... ;)
Yep, agreed. Sometimes, just sometimes tables are just fine. Not everyone is writing websites that are targeted at every possible browser, that needs to support text to speech etc. In general, try to learn and grow your skills and use CSS positioning etc, but nothing bad will happen if someone uses a table to position things on a webpage.
Maybe this question should be community wiki, though?
It's evil, it's a pain, but rendering is pretty much guaranteed to be consistent across different browsers.
Table layout works in most browsers. Google and others want to reach all users, not just some or only those with modern browsers. Having different layouts or layout technologies is hard to maintain and makes delivering content costly.
Table layout is not tricky. It's straightforward. You don't have to look for CSS hacks, browser incompatibilities, or the like.
Table layouts are bad because they mix layout and content.
Twitter works pretty well from phones in web mode. Some web browsers are truly gruesome, so I assume Twitter does what it has to.
Given how poorly so many web sites work on phones, I'm more concerned about mobile compatibility than with the concerns of CSS evangelicals.
Three main reasons:
Tables are mainly bad for search engine reasons (there's also the issue of them messing up the DOM a bit, but that's not too bad). People don't search for Google on search engines, and people don't generally search for Twitter posts on search engines either.
Tables render consistently on nearly every browser, including smartphones (which is a big concern for Twitter especially).
Tables consume less bandwidth. Both sites have immense data loads and need every bit of speed they can get.
Browser Support - These guys need to have their websites render perfectly on ALL web browsers (new, old and obscure). No matter who's using their websites and what OS/browser they're using, these websites need to work.
Each web browser has its own implementation of CSS, and this causes an issue similar to that of JavaScript DOM support across browsers.
Page Load Time - Also their pages are optimized for Page load times. If it takes a user too long to load the page they'll just go somewhere else. There are still plenty of users without broadband, even a lot of mobile devices don't have very fast connections depending on where you are.

In MediaWiki is there a way to force a group of pages to have a particular skin?

The reason I am keen to do this is that we have a wiki which works great, but I would like to store help pages for an internal application in the wiki and link to those pages direct from the app. Although we wouldn't have concerns with people seeing the non-article stuff (i.e. the help pages) when viewing the pages from the rest of the wiki, for it to be streamlined when viewed from the application I thought it would be ideal if I gave it a simplified skin which I would design.
I have already found out that URLs can have the useskin= added (e.g. as is done in the Preview Skin page within the User Preferences pages), but following the links will revert you to your normal chosen skin.
Is there perhaps some way to adjust the skin, so that all the links contain useskin=? (I think this might have issues, since you appear to need the full pagename for useskin to work, e.g. ..../w/index.php?title=blah....&useskin=cologneblue as opposed to the short URLs.)
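For the links generated by the application itself, appending the parameter is straightforward. Here is a rough sketch, assuming the full index.php?title=... URL form mentioned above; the function name with_skin and the wiki.example host are made up for illustration. It does not solve the problem of links inside the wiki pages reverting to the normal skin.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_skin(url, skin):
    """Return url with its useskin query parameter set to the given skin."""
    parts = urlsplit(url)
    # Drop any existing useskin value so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "useskin"
    ]
    query.append(("useskin", skin))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example: build the help link the application would open.
help_url = with_skin("http://wiki.example/w/index.php?title=Help:Foo", "cologneblue")
```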
If this isn't a smart way to go, I could consider different approaches (I run the box the wiki is on and could create a distinct wiki perhaps, although there might be disadvantages to this, such as needing to combine the user tables and maybe this would still pick up the user's preferred skin unless I re-coded things).
Any sensible suggestions gratefully received! Let me know if there's any more info you might need or if I need to clarify any points about my objective.
[I did submit this on the MediaWiki.org Support Desk page, but it got no response... I hope my question isn't that bad!!]
You could put all your content in its own namespace, then set the skin for that namespace using this extension (I've used it, it works well enough):
http://www.mediawiki.org/wiki/Extension:SkinPerNamespace
If you don't want to lock them all into a single namespace, you can also use the SkinPerPage extension to mark the pages individually.
Why not change the default skin to the skin you want?