When a tech document states that something is IP-Aware, what does that specifically mean? - terminology

I need a classroom definition of IP-aware for a research assignment, i.e. something that would be used in, let's say, a dictionary. Does anyone know of a definition, or can anyone provide a link to one? I am having no luck with searches. Are IP-aware devices only devices that can read, find, or make decisions based on IP addresses? For example, would a router be one, but a network printer not?

I've never come across the phrase "IP aware" specifically (though I've worked in networking for over ten years); I'm not sure that it has a generally-accepted meaning.
It's possible that some people/companies have given it a meaning for their products - can you provide the context in which you found the phrase?
(And are you sure it's to do with networking? For instance, "IP" often stands for "Intellectual Property".)

Related

If I have a collection of random websites, how do I get specific information from each?

Say I have a collection of websites for accountants, like this:
http://www.johnvanderlyn.com
http://www.rubinassociatespa.com
http://www.taxestaxestaxes.com
http://janus-curran.com
http://ricksarassociates.com
http://www.condoaudits.com
http://www.krco-cpa.com
http://ci.boca-raton.fl.us
What I want to do is crawl each and get the names & emails of the partners. How should I approach this problem, at a high-level?
Assume I know how to actually crawl each site (and all subpages) & parse the HTML elements -- I am using Oga.
What I am struggling with is how to make sense of data that is presented in a wide variety of ways. For instance, the email address for the firm (and/or partner) can be found in one of these ways:
On the About Us page, under the name of the partner.
On the About Us page, as a generic catch-all email.
On the Team page, under the name of the partner.
On the Contact Us page, as a generic catch-all email.
On a Partner's page, under the name of the partner.
Or it could be any other way.
One way I was thinking about approaching the email is just to search for all mailto a tags and filter from there.
The obvious downside for this is that there is no guarantee that the email will be for the partner and not some other employee.
Another issue that is more obvious is detecting the partner(s) names just from the markup. I was initially thinking I could just pull all the header tags and text in them, but I have stumbled across a few sites that have the partner names in span tags.
I know SO is usually for specific programming questions, but I am not sure how to approach this and where to ask this. Is there another StackExchange site that this question is more appropriate for?
Any advice on specific direction you can give me would be great.
I looked at the http://ricksarassociates.com/ website and I can't find any partners at all, so in my opinion you had better be sure you stand to gain from this; if not, you had better look for some other invention.
I have done similar data scraping from time to time, and in Norway we have laws - or should I say "laws" - that say you are not allowed to email individuals, but you are allowed to email the company - so in a way it is the same problem from another angle.
I wish I knew maths and algorithms by heart, because I am sure there is a fascinating solution hidden in AI and machine learning, but in my mind the only solution I can see is building a rule set that over time probably gets quite complex. Maybe you could apply some Bayesian filtering - it works very well for email.
But, to be a little more productive here: one thing I know is important is that you could start by creating the crawler environment and building the dataset. Keep the URLs in a database so you can add more at any time, and start crawling what you already have, so that you do your testing by querying your own data with a 100% copy. This will save you an enormous amount of time compared with live scraping while tweaking.
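As a rough illustration of the "crawl once, test against your own copy" idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `pages` and the helper names `open_store`, `save_page`, and `load_page` are made up for this example; the fetching step itself is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local page store. Use a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, "
        "html TEXT, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_page(conn, url, html):
    """Store the raw HTML for a URL, replacing any earlier copy."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", (url, html)
    )
    conn.commit()


def load_page(conn, url):
    """Return the stored HTML for a URL, or None if it was never crawled."""
    row = conn.execute(
        "SELECT html FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None
```

With something like this in place, the parsing rules can be tweaked and re-run against `load_page` as often as needed without hitting the live sites again.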
I built my own search engine some years ago, scraping all NO domains, though I needed only the index file that time. It took over a week just to scrape it down, and I think it was 8GB of data just for that single file, and I had to use several proxy servers as well to make it work due to problems with too much DNS traffic. Lots of problems needed to be taken care of. I guess I am only saying: if you are crawling on a large scale, you might as well start getting the data down if you want to work efficiently with the parsing later.
Good luck, and do post if you get a solution. I do not think it is possible without an algorithm or AI though - people design websites the way they like and they pull templates out of their arse, so there are no rules to follow. You will end up with bad data.
Do you have funding for this? If so, it's simpler. Then you could just crawl each site and make a profile for each site. You could employ someone cheap to manually go through the parsed data and remove all the errors. This is probably how most people do it, unless someone has already done it and the database is for sale or available from a web service so it can be scraped.
The links you provide are mainly US sites, so I guess you are focusing on English names. In that case, instead of parsing HTML tags, I would just search the whole webpage for names. (There are free databases of first names and last names.) This may also work if you are doing this for some other European companies, but it would be a problem for companies from some countries. Take Chinese as an example: while there is a fixed set of last names, one may use basically any combination of Chinese characters as a first name, so this solution won't work for Chinese sites.
It is easy to find an email address in a webpage, as there is a fixed format of (username)@(domain name) with no space in between. Again, I wouldn't treat it as an HTML tag but just as a normal string, so that the email can be found whether it is in a mailto tag or in plain text. Then, to determine what kind of email it is:
Only one email in page?
  Yes -> catch-all email.
  No -> Is a name found in that page as well?
    No -> catch-all email (there can be more than one catch-all email, maybe for different purposes like info + employment)
    Yes -> The email should be attached to the name found right before it. It is normal that the name should appear before the email.
Then, it should be safe to assume that the name that appears first belongs to a more important member, e.g. the chairman or a partner.
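The decision tree above could be sketched roughly like this in Python. The regular expression is a deliberately simple approximation of an email address, not a full validator, and `classify_emails` and its `known_names` argument (names already found on the page by some other means) are assumptions made for the example.

```python
import re

# Simplified email pattern: good enough for scanning page text, not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def classify_emails(text, known_names):
    """Return a list of (email, name) pairs; name is None for catch-all emails."""
    emails = [(m.start(), m.group()) for m in EMAIL_RE.finditer(text)]

    # Record where each known name occurs in the page text.
    names = []
    for name in known_names:
        for m in re.finditer(re.escape(name), text):
            names.append((m.start(), name))
    names.sort()

    # Only one email on the page, or no names at all: treat as catch-all.
    if len(emails) == 1 or not names:
        return [(email, None) for _, email in emails]

    # Otherwise attach each email to the closest name appearing before it.
    results = []
    for pos, email in emails:
        preceding = [name for name_pos, name in names if name_pos < pos]
        results.append((email, preceding[-1] if preceding else None))
    return results
```

Because the text is scanned as a plain string, this picks up addresses in both mailto links and visible text. Two emails following the same name will both be attached to that name, so the output still needs a manual sanity check.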
I have done similar scraping for these types of pages, and it varies wildly from site to site. If you are trying to make one crawler to sort of auto find the information, it will be difficult. However, the high level looks something like this.
For each site you check, look for element patterns. Divs will often have labels, IDs, and classes which will easily let you grab information. Perhaps you find that many divs will have a particular class name. Check for this first.
It is often better to grab too much data from a particular page, and boil it down on your side afterwards. You could, perhaps, look for information which comes up on a screen by utilizing type (is link) or regex (is email) to look for formatted text. Names and occupation will be harder to find by this method, but might be related positionally on many pages to other well formatted items.
Names will often be affixed with honorifics (Mrs., Mr., Dr., JD, MD, etc.) You could come up with a bank of those, and check against them for any page you end up on.
Finally, if you really wanted to make this process general purpose, you could do some heuristics to improve your methods based off of expected information; names, for example, are most often within a particular list. If it was worth your time, you could check certain text for whether it matches a list of more common names.
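To make the honorific idea concrete, here is a small Python sketch. It only uses the titles and suffixes listed above (Mr., Mrs., Dr., JD, MD); the pattern for what counts as a name (two or more capitalised words) and the function name `find_names` are assumptions for illustration and will miss plenty of real names.

```python
import re

HONORIFICS = ("Mr.", "Mrs.", "Dr.")  # titles that come before a name
SUFFIXES = ("JD", "MD")              # credentials that come after a name

_prefix = "|".join(re.escape(h) for h in HONORIFICS)
_suffix = "|".join(re.escape(s) for s in SUFFIXES)
_name = r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+"  # naive: two or more capitalised words

NAME_RE = re.compile(
    rf"(?:{_prefix})\s+({_name})"        # e.g. "Dr. Alice Brown"
    rf"|({_name}),\s*(?:{_suffix})\b"    # e.g. "Robert Green, JD"
)


def find_names(text):
    """Return names that appear next to a known honorific or suffix."""
    return [m.group(1) or m.group(2) for m in NAME_RE.finditer(text)]
```

Checking candidates against a list of common first and last names, as suggested above, would be a natural second filter on top of this.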
From what you mentioned in your initial question, it seems you would benefit a lot from a general-purpose regular-expression crawler, and you could improve it as you learn more about the sites you interact with.
There are excellent posts on this topic with a lot of useful links throughout these webpages:
https://www.quora.com/What-is-a-good-web-scraper-for-pulling-emails-names-etc-even-if-the-contact-info-is-another-page-deep-a-browser-add-on-is-a-plus
http://www.hongkiat.com/blog/web-scraping-tools/
http://www.garethjames.net/a-guide-to-web-scraping-tools/
http://www.butleranalytics.com/15-web-scraping-tools/
Some of the applications examined work on macOS.

SOLID principles, and hard code configuration inside a class

I have noticed in a lot of code lately that people put hard coded configuration (like port numbers, etc.) values deep inside of classes/methods, making it difficult to find, and also not configurable.
Is this a violation of the SOLID principles? If not, is there another "principle" that I can cite to my team members about why it's not a good idea? I don't want to just say "it's bad because I don't like it" but I am having trouble thinking of a good argument.
A good argument against hardcoding a TCP port number in a class would be 'Context independence' violation. From GOOS, with my emphasis:
Context Independence
... the "context independence" rule helps us decide whether an object hides too much or hides the wrong information. A system is easier to change if its objects are context-independent; that is, if each object has no built-in knowledge about the system in which it executes. This allows us to take units of behavior (objects) and apply them in new situations. To be context-independent, whatever an object needs to know about the larger environment it’s running in must be passed in.
In this specific case of Context Independence I would call it 'Environment Independence'. In other words, a class with a hardcoded port number has an inappropriate dependency on the runtime OS environment, essentially stating 'I know that port 7778 will always be available', which is clearly wrong.
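A minimal sketch of what "must be passed in" can look like, in Python. The class name `StatusListener`, the helper `listener_from_env`, and the variable names `LISTEN_HOST` and `LISTEN_PORT` are all hypothetical; the point is only that the class itself never mentions a concrete port.

```python
import os


class StatusListener:
    """Hypothetical service object: host and port are passed in, never hardcoded."""

    def __init__(self, host, port):
        if not 0 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        self.host = host
        self.port = port

    def address(self):
        return (self.host, self.port)


def listener_from_env(env=os.environ):
    """Build the listener from configuration; the one place defaults live."""
    host = env.get("LISTEN_HOST", "127.0.0.1")
    port = int(env.get("LISTEN_PORT", "7778"))
    return StatusListener(host, port)
```

The default of 7778 still exists, but it sits in one visible place at the edge of the program and can be overridden without touching the class.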
The SOLID principles cover class design.
I suspect the idea that you should store configuration in configuration files isn't normally regarded as controversial enough to warrant inventing a special principle to persuade people! :)
Most people just figure it out from experience, the first time they try to get the software running anywhere other than their own development workstation.
While not strictly SOLID, another principle of OOD is the Common Closure Principle, which states that classes that change together are packaged together. While configuration is not exactly a class, you could stretch this idea to configuration information. Since, e.g., port numbers change based on different criteria than the surrounding code, hardcoding them seems to violate this.
The Single Responsibility Principle (the S in SOLID) states that a class should only have one reason to change. This article gives an example of a Modem interface, and discusses how the details of how to connect and hang up are a separate responsibility from the communication of data, and will probably change for different reasons. You could use this to make a similar case for why port numbers are an extra "reason for change", separate from the class's main responsibility.

Write programs that do one thing and do it well

I can grasp the part "do one thing" via encapsulation, Dependency Injection, Principle of Least Knowledge, and You Ain't Gonna Need It; but how do I understand the second part "do it well?"
An example given was the notion of completeness, given in the same YAGNI article:
for example, among features which allow adding items, deleting items, or modifying items, completeness could be used to also recommend "renaming items".
However, I found reasoning like that could easily be abused into feature creep, thus violating the "do one thing" part.
So, what is a litmus test for seeing whether a feature belongs to the "do it well" category (hence, include it into the function/class/program) or to the other "do one thing" category (hence, exclude it)?
The first part, "do one thing," is best understood via UNIX's ls command as a counterexample for its inclusion of excessive number of flags for formatting its output, which should have been completely delegated to another external program. But I don't have a good example to see the second part "do it well."
What is a good example where removing any further feature would make it not "do it well?"
I see "Do It Well" as being as much about the quality of implementation of a function as about the completeness of a set of functions (in your example, having rename as well as create and delete).
Do It Well manifests in many ways, some ways of thinking:
Behaviour in response to "special" inputs. Example, calculating the mean of some integers:
int mean(int[] values) { ... }
What does this do if the array has zero elements? If the items total more than MAX_INT?
Performance Characteristics. Has sufficient attention been given to behaviour as the data volumes increase?
Dependency Failures. If our implementation depends upon other modules or infrastructure what happens when these fail. Example: File System Full, Database Down?
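To illustrate the "special inputs" point for the mean example, here is one possible Python version. Note the language difference: Python integers do not overflow, so the MAX_INT concern does not arise here; in a fixed-width language you would need a wider accumulator or a running mean. Raising `ValueError` on empty input is just one reasonable choice, not the only one.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers.

    Raises ValueError for an empty input instead of dividing by zero.
    """
    values = list(values)
    if not values:
        raise ValueError("mean() of an empty sequence is undefined")
    # Python ints are arbitrary precision, so the sum itself cannot overflow.
    return sum(values) / len(values)
```

The important part is that the behaviour for the awkward inputs is decided deliberately and documented, rather than left to whatever the arithmetic happens to do.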
Concerning feature creep itself, I think you're correct to identify a tension here. One thing you might consider: you don't need to implement every feature, provided that it's pretty obvious that a feature can be added easily without a complete rewrite.
The whole purpose of this advice is to make you favor quality over quantity.
The concept of one thing is subjective and depends on granularity. Would you say that a spreadsheet application does more than one thing if it can also print, or is that part of that one thing?
The point is that you should make sure that any feature, and the application itself, is done and will delight customers before you scramble to add new features.
I think your question points out the fundamentally organic nature of feature creep, and in understanding that nature, you will be empowered to meditate on the larger question.
Think of it like a garden: If you plant one thing and plant it well, say, a chrysanthemum, you aren't done at simply planting the seed. In fact you'll need to ensure that the soil is well tended, that the area is sufficiently protected, that the season is right, etc.
As your chrysanthemum (your one thing) grows, so too will other competitive plants - some that need to be weeded out and others that may actually complement the original one thing. In fact, these other organisms may in some cases prove vital for the survival of your one thing.
Like those features that YAGN, a bit of vigilance is required to determine which weeds represent feature creep and which represent vital and complementary functions.
Regardless, having done it well means simply that your chrysanthemum is hearty, healthy, and on-time. :-)
I would say an email program without the ability to add attachments would be a good example.
This may sound like an odd example, but I'd say Dropbox is a good, albeit complex, example.
It's managed to beat off a swathe of similar competing apps through a dedication to simplification and a lack of feature creep that, as you mentioned, would violate the 'do one thing' principle. The app lets you store documents in a folder that you can access anywhere, and that's about the limit of it. They drilled down to the core problem, and solved it in a way that works perfectly well in 90+% of cases.
It's hard to put a hard and fast rule to it, but I'd say that catering to around the 90% majority of use cases and ignoring 'fringe requirements' is the best way to stick to this rule.
I'd guess 90+% of ls use is with no arguments or maybe two or three of the most popular. The 'do it well' principle should focus on what the majority of users need, instead of catering for power users or fringe cases, as ls does with its plethora of options.
This is what Dropbox does successfully and why it is pretty well agreed upon as an example of good application design.

How should web sites deal with localization settings? (from “What are common UI misconceptions and annoyances?”)

I’ve chosen to take this as a question in its own right since it was generating so much debate in the comments of the original post.
It’s interesting to see that a lot of people on SO (who are developers) just don't get localization. Here’s my take on how it should work:
In all browsers that I've looked at (and for the .NET developers out there too) when you look at a user's culture preferences it is in the following format:
language-Culture.
So we have:
en-GB - English language - UK culture
en-US - English language - US culture
en - English language - Invariant culture.
fr-FR – French language – French culture
fr-CH – French language – Swiss culture
de-CH – German language – Swiss culture
de-DE – German language – German culture
See MSDN for a complete list that the .NET framework supports.
When I go to a website it knows that I want the English language from the en part and it knows I’m interested in it being slanted to the UK (number formatting, date formatting). So when I go to google.com and it takes me to google.de (because of my IP address), that would be completely fine if google.de displayed everything to me in English, but it is completely wrong because google.de is in German. I have little control over my IP address but complete control over my language and culture settings. If you’re interested, Microsoft’s new search engine (bing.com) handles things properly. Let's hope Microsoft can learn how to do search as well as Google or Google can learn to localize as well as Microsoft ;)
MSDN has another good article here for more information
So what are your recommendations for how sites should deal with localizations?
The solution here is so simple, it's annoying that devs do anything else.
1. Respect the browser setting. If it says English then by god it's English.
2. If you absolutely must, then simply add a button at the top to pick something else. Then, and ONLY then, do you override the browser.
3. If you think your way is better: stop, and have someone slap you. It's not. Repeat as necessary.
4. Get rid of those web splash pages that ask for someone's country. Just show your normal page, based off of the browser defaults, and see item 2 above. I have yet to run into a site where it actually matters. Update: a few years later and there is now a reason to do this. In 2013 the UK instituted policies surrounding cookies that website operators need to respect for sites based in that country that are serving pages to visitors from that country. So pay attention to the laws in the countries you are hosted in.
5. If you happen to have a site that really is served by multiple servers across multiple countries, then you can probably detect which one of your servers is really closer to serve from. If you can't, just stop the redirecting madness and don't try to make a determination for them.
If localization settings are available - including, but not limited to, the HTTP Accept-Language header - then websites absolutely should respect them.
The common argument against this is that "average users" aren't smart enough to find the language settings and configure them to match their own preferences, so these settings are, more often than not, incorrect (unless the user happens to be within the US).
That is the wrong solution.
If a substantial segment of the user population can't find (or can't be bothered to find) their browser's language settings, then the correct response is to make them easier to find, not for sites to ignore what they've been set to. Perhaps make language settings directly accessible from the program's top level menu instead of burying it inside an over-complicated "Preferences" dialog. Perhaps ask for language preferences the first time the program is run. Perhaps use the operating system's localization settings. Or maybe something completely different, if that's what it takes to make it near-certain that the browser will be sending correct information about the user's preferences. But don't just throw up your hands, say "it's useless and can't be fixed!", and ignore it.
Other answers have talked about letting the user choose a language or locale in their profile on the site, which is also important and absolutely should be standard, but that's just to provide a site-specific override to the user's normal settings. If the user has not overridden this on the site, though, the correct action is to default to the most-preferred available language/locale as specified in their browser settings, not to base it on geolocation of their IP address.
At one point in my career, I maintained parts of a TCP/IP stack. That puts me in the somewhat rare position of knowing very well that IP addresses should not be used as anything other than Network-layer addresses. Any association between an IP address and a location is all but coincidental - it's an artifact of the way addresses are distributed, not any fundamental part of what an IP address means.
(They're also not useful as the unique identifier of a computer, but that's a different story)
I suggest leaving geolocation out of it. The HTTP standard includes a way for a browser or other user agent to include the user's culture preferences with each request (and remember, it's a list of weighted preferences, not necessarily just one culture). Since the browser is closer to the user than you are, you should honor this request, at least as the default.
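For anyone who hasn't looked at the header itself: the weighted list arrives as `Accept-Language`, for example `en-GB,en;q=0.8,de;q=0.5`, where a missing `q` means a weight of 1.0. A rough Python sketch of turning it into an ordered list follows; the function name `parse_accept_language` is made up, and this skips edge cases (such as the `*` wildcard) that a real framework's parser would handle.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, weight) pairs, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        weight = 1.0  # a tag without an explicit q value has weight 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag.strip(), weight))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)
```

The site can then walk this list from the top and pick the first language it actually has content for.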
It's ok to then permit the user to change their preference for your site, either temporarily or permanently. It's even ok to allow the user to choose to view different content with different culture settings. A wild example would be a site that includes both political news and technical information. It's quite reasonable that someone would want the news in their "natural" language, but the technical information in English.
Finally, it's ok to have a fallback pattern. If, for instance, you have a site that services users based on their region (resellers, for instance), then it's possible that Japanese content only exists on your Asian regional sub-site. A Japanese-speaking user visiting your EMEA site might just be stuck seeing English content, which might very well be his last choice.
On the sites I create I usually follow this pattern:
Each page has a unique URL with the language in it somewhere, usually like /en/page or a different (sub)domain
If the user opens a URL with an unspecified language like /page I start to guess:
Is a cookie from a previous session available?
If not, is Accept-Language available and can I map it to a language available on the site?
If not, if it's a possibility, can I guess by IP?
If not, default to the site's default language.
I set a cookie with the guessed language and redirect the user to a site with the appropriate URL
I put a language switch on every page, so /en/page can easily be switched to /xx/page
Cookie gets updated if the user switches to a different language
Ideally I only have to guess once and from then on use the user's cookie, or the user visits the desired page directly.
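The guessing order described above (cookie, then Accept-Language, then IP, then the site default) could be sketched like this in Python. The function and parameter names are invented for the example, `accepted` is assumed to be the browser's language tags already sorted by preference, and the IP-based guess is assumed to come from some lookup that is not shown.

```python
def resolve_language(available, default, cookie_lang=None, accepted=(), ip_lang=None):
    """Pick a site language for a URL that does not specify one.

    available   -- set of language codes the site has content for, e.g. {"en", "de"}
    default     -- the site's default language
    cookie_lang -- language stored in a cookie from a previous visit, if any
    accepted    -- browser language tags in order of preference, e.g. ["en-GB", "de"]
    ip_lang     -- language guessed from the IP address, if such a guess exists
    """
    # 1. A cookie from a previous session wins.
    if cookie_lang in available:
        return cookie_lang
    # 2. Otherwise map the browser preferences onto the site's languages.
    for tag in accepted:
        primary = tag.split("-")[0].lower()  # "en-GB" -> "en"
        if primary in available:
            return primary
    # 3. Otherwise fall back to the IP-based guess, if it is usable.
    if ip_lang in available:
        return ip_lang
    # 4. Last resort: the site's default language.
    return default
```

The result would then be stored in the cookie and used to build the redirect URL, so the guess only has to happen once per visitor.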
I agree, give the user the chance to override them with user preferences in your app. This is especially handy for things like timezone localization issues which you can't derive from browser settings.
I risk being considered impolite, but I think my post on this topic will have more informative answers, mostly because my post is really a question. I am sorry though that I did not find that post before.
There's a difference between smart defaults and the ability of users to override them. In big apps I've worked on, I've assumed the user's locale from browser settings, geolocation, etc. -- but always given users a way to easily switch.
I don't know how else one would do that. Not giving users a chance to correct your assumptions is deeply problematic, because you're going to get it wrong some of the time.
ADDITION:
I think your problem here is that while you can edit your locale settings, if they look basically identical to the default, there's no way for an application developer to tell if you left it as-is intentionally, or because you don't know how or why to change it.
I suggest honoring users' localization settings, except if the setting is the overwhelming default, which users may not change. For example, I believe the great majority (90+%) of users with an en-us setting geolocated in Vietnam would almost always be better served by seeing Vietnamese content, rather than US English content, as long as there's a trivial way to switch locales. On the flip side, if a user geolocated in the US has a Vietnamese setting, by all means give him or her Vietnamese content.
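Expressed as code, the heuristic I'm describing might look roughly like the Python sketch below. Treating `en-US` as "the overwhelming default" is the assumption from my example, the function name `choose_locale` is invented, and locale tags are assumed to be normalised already (same casing everywhere).

```python
def choose_locale(browser_locale, geo_locale, available, common_default="en-US"):
    """Honor the browser locale unless it is just the untouched common default."""
    # A browser still on the overwhelming default, geolocated somewhere else,
    # probably never had its setting changed: prefer the local content.
    if (
        browser_locale == common_default
        and geo_locale != browser_locale
        and geo_locale in available
    ):
        return geo_locale
    # Any other browser setting was likely chosen on purpose, so respect it.
    if browser_locale in available:
        return browser_locale
    if geo_locale in available:
        return geo_locale
    return common_default
```

Whatever this returns is only the starting point; the trivial way to switch locales still has to be there.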
Is this irritating for US-English users in Vietnam? Sure. But it's also the greatest good for the greatest number, and helps ensure that average non-technical users get the best real-world experience. Until we can hold a gun to users' heads and force them to honestly declare their language/culture preferences before turning on a computer, we're going to need heuristics like this.
I have seen enough forceful bug reports from customers that, when investigated, turned out to be caused by one of their users having the browser's culture setting wrong that we now let the customer override the browser's setting with a config setting. The browser's culture setting is wrong often enough that it is not very useful; it is also too hard for most end users to find or change.

Creative Terminology

I seem to use bland words such as node, property, children (etc) too often, and I fear that someone else would have difficulty understanding my code simply because the parts' names are vague, common words.
How do you find creative names for classes and components to make them more memorable?
I am particularly having trouble with generic tools which have no real description except their rather generic functional purpose. I would like to know if others have found creative ways to name things rather than simply naming them by their utility, such as AnonymousFunctionWrapperCallerExecutorFactory.
It's hard to answer. I find them just because they seem to 'fit'.
What I do know, however, is that I find it basically impossible to move on writing code unless something is named correctly, and it 'feels' good. If it isn't named right, I find it hard to use, and the code is generally confusing.
I'm not too concerned about something being 'memorable', only 'accurate'.
I have been known to sit around thinking out loud about what to name something. Take your time, and make sure you are really happy with the name. Don't be afraid of using common/simple words.
I don't really have an answer, but three things for you to think about.
1. The late Phil Karlton famously said: "There are only two hard problems in computer science. Cache Invalidation and Naming Things." So, the fact that you are having trouble coming up with good names is entirely normal and even expected.
2. OTOH, having trouble naming things can also be a sign of bad design. (And yes, I am perfectly aware, that #1 and #2 contradict each other. Or maybe one should think of it more like balancing each other.) E.g., if a thing has too many responsibilities, it is pretty much impossible to come up with a good name. (Witness all the "Service", "Util", "Model" and "Manager" classes in bad OO designs. Here's an example Google Code Search for "ManagerFactoryFactory".)
3. Also, your names should map to the domain jargon used by subject matter experts. If you can't find a subject matter expert, that's a sign that you are currently worrying about code that you're not supposed to worry about. (Basically, code that implements your core business domain should be implemented and designed well, code in ancillary domains should be implemented and designed so-so, and all other code should not be implemented or designed at all, but bought from a vendor, where what you are buying is their core business domain. [Please interpret "buy" and "vendor" liberally. Community-developed Free Software is just fine.])
Regarding #3 above, you mentioned in another comment that you are currently working on implementing a tree data structure. Unless your company is in the business of selling tree data structures, that is not a part of your core domain. And the reason that you have trouble finding good names could be that you are working outside your core domain. Now, "selling tree data structures" may sound stupid, but there are actually companies that do that. For example, the BCL team inside Microsoft's developer division: they actually sell (well, for certain definitions of "sell", anyway) the .NET framework's Base Class Libraries, which include, among others, tree data structures. But note that for example Microsoft's C++ compiler team actually (literally) buys their STL from a third-party vendor – they figure that their core domain is writing compilers, and they leave the writing of libraries to a company who considers writing STLs their core domain. (And indeed, AFAIK, that company does nothing but write and sell STL implementations. That's their sole product.)
If, however, selling tree data structures is your core domain, then the names you listed are just fine. They are the names that subject matter experts (programmers, in this case) use when talking about the domain of tree data structures.
Using 'metaphors' is a common theme in agile (and pattern) literature.
'Children' (in your question) is an example of a metaphor that is extensively used and for good reasons.
So, I'd encourage the use of metaphors, provided they are applicable and not a stretch of the imagination.
Metaphors are everywhere in computing. From files to bugs to pointers to streams... you can't avoid them.
I believe that for the purpose of standardization and communication, it's good to use a common vocab, like in the same case for design patterns. I have a problem with a programmer who keeps 'inventing' his own terms and I have trouble understanding him. (He kept using the term 'events orchestrating' instead of 'scripting' or 'FCFS process'. Kudos for creativity though!)
That common vocabulary describes things we are used to. A node is a point, somewhere in a graph, in a tree, or what-not. One way is to be specific to the domain. If we are doing a mapping problem, instead of 'node', we can use 'location'. That helps in a sense, at least for me. So I find there is a need to balance being able to communicate with other programmers, and at the same time keeping the descriptor specific enough to help me remember what it does.
I think node, children, and property are great names. I can already guess the following about your classes, just by their "bland" names:
Node - this class is part of a graph of objects
children - this variable holds a list of nodes belonging to the containing node.
I don't think "node" is either vague or common, and if you're coding a generic data structure, it's probably ok to have generic names! (With that being said, if you are coding up a tree, you could use something like TreeNode to emphasize that the node is part of a tree.) One way you can make the life of developers who will use your API easier is to follow the naming conventions of your platform's built in libraries. If everyone calls a node a node, and an iterator an iterator, it makes life easy.
Names that reflect the purpose of the class, method or property are more memorable than creative ones. Modern IDEs make it easier to use longer names so feel free to be descriptive. Getting creative won't help as much as getting accurate.
I recommend picking nouns from a specific application domain. E.g. if you are putting cars in a tree, call the node class Car - the fact that it is also a node should be apparent from the API. Also, don't try to be too generic in your implementation - don't put all attributes of the car into a hashtable named properties, but create separate attributes for make, color, etc.
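As a small illustration of the car example, here is how that might look in Python. The `add_child` method and the specific fields are just choices made for this sketch; the point is that the class is named for the domain and carries real attributes instead of a generic properties table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Car:
    """A domain object that also happens to be a tree node."""

    make: str
    color: str
    # The tree structure is visible from the API, not from the class name.
    children: List["Car"] = field(default_factory=list)

    def add_child(self, car):
        """Attach another Car beneath this one and return it."""
        self.children.append(car)
        return car
```

Code using it reads in domain terms (`car.make`, `car.color`) rather than `node.properties["make"]`.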
A lot of languages and coding styles like to use all sorts of descriptive prefixes. In PHP there are no clear types, so this may help greatly. Instead of doing
$isAvailable = true;
try
$bool_isAvailable = true;
It is admittedly a pain, but usually well worth the time.
I also like to use long names to describe things. It may seem strange, but is usually easier to remember, especially when I go back to refactor my code
$leftNode->properties < $leftTreeNode->arrayOfNodeProperties;
And if all else fails, why not fall back on a solid Star Wars themed program?
$luke->lightsaber($darth[$ewoks]);
And lastly, in college I named my classes after my professor, and then my class methods all the things I wanted to do to that jerk.
$Kube->canEat($myShorts, $withKetchup);