I'm considering using MediaWiki as my company's internal knowledge base and am trying to understand how to build out effective team sections. Unfortunately, I'm not finding much information on this.
Ideally we'd have separate knowledge base sections for devs, product, design and HR, all in the same system with the ability to cross-link. Each of these sections would have its own landing page, and we could search for content specifically within that section.
It looks like using categories might work, but at first glance this feels clunky and I'm not sure it provides the level of hierarchy I'm looking for. I would love to get your ideas and any links to examples of sites that have done this well.
Thank you!
If by segregation you mean limited visibility (i.e. team members generally shouldn't be able to see other teams' content), then MediaWiki is probably not the right choice for you, as it does not have granular read access control.
If you are simply looking for content organization, namespaces provide an ugly but easy way of partitioning (almost everything supports filtering by namespace). Categories are more elegant but less well integrated: you can filter search results by category, but you can't do the same for most other things, like recent changes or user contributions.
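For example, both mechanisms can scope the built-in search; assuming a custom "Dev" namespace and a "Design" category have been set up (and the CirrusSearch extension for the category filter), queries would look something like:

```
Dev:deployment checklist          # search only pages in the Dev namespace
incategory:"Design" onboarding    # CirrusSearch filter for pages in Category:Design
```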
We are developing a SaaS application and are currently facing the situation that different customers ask for different customizations. I have already googled and read a lot, specifically about multitenancy. I am also familiar with the Strategy pattern.
But this still leaves me a bit confused about a good concept for an Angular 2+ application. Business logic is not going to be the problem, as I can use Angular's dependency injection to load and use customized services for different customers. Theming itself is also not a problem, as we use Angular Material, which has a nice theming engine built in. What does give me a headache is the templates themselves. Of course I can use *ngIf and *ngSwitch within the HTML templates, but this is exactly the kind of code I want to avoid, because it will become horrifying once we reach 50+ customer versions.
Let's take a real-life example. On a search page, all customers can search for objects and export single objects as a file download. One specific customer asks us to implement a mass export in a proprietary file format, which needs a new button on the page that all the other customers obviously should not see.
The three options I can think of for this scenario (and none of which I really like) are the following:
as mentioned above, working in the template itself with *ngIf and/or *ngSwitch
using the theming capabilities of Angular Material and working with CSS only (display: none;)
maintaining multiple versions of the component (depending on the needs using component inheritance) and loading the correct version of the component depending on the user
All of them have obvious cons; just to name a few:
Nightmare to maintain once customer numbers grow and customizations become more frequent (think of a bigger component with six points of variation and 50 customers ...)
for now actually my favorite, but the functionality is not really disabled, just hidden (of course the back end checks for permissions, but still more information is transferred to the users than necessary)
works well for the code part of the components, but would mean maintaining massive amounts of duplicate template code
I am sure we are not the first to tackle this issue. Am I overlooking any solution with fewer disadvantages? Are there any code patterns I could apply here?
edit: after more discussion in our company, we realized that there is another important point to this: some customers are hosted on their own servers, but most of them are served from one central server. This means that the optional features have to be detected and added at run-time, which implies some kind of awkwardness.
So our approach is to extend our existing licensing database to also contain the customer-specific functionalities, which then obviously only that customer has a license for. Now the easy solution is to have a license endpoint and fetch all the licenses the customer has acquired; then every optional function can just sit behind a simple single *ngIf. I appreciate that this is a simple and clean solution, but it makes it possible to find out business facts about other customers of our company (by deobfuscating the code, finding additional endpoints, and so on). So probably combining this with server-side rendering would be the best solution I can think of right now.
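For illustration, the "simple single *ngIf" could even be wrapped up in one structural directive; a minimal sketch, assuming a hypothetical LicenseService fed from the license endpoint (none of these names are real APIs):

```typescript
// Minimal sketch of license-gated rendering. LicenseService, its "owned"
// set, and the feature keys are assumptions, not an existing API.
import { Directive, Injectable, Input, TemplateRef, ViewContainerRef } from '@angular/core';

@Injectable({ providedIn: 'root' })
export class LicenseService {
  private owned = new Set<string>(); // hypothetically filled from the license endpoint at startup

  has(feature: string): boolean {
    return this.owned.has(feature);
  }
}

@Directive({ selector: '[appHasLicense]' })
export class HasLicenseDirective {
  constructor(
    private templateRef: TemplateRef<unknown>,
    private viewContainer: ViewContainerRef,
    private licenses: LicenseService,
  ) {}

  @Input() set appHasLicense(feature: string) {
    this.viewContainer.clear();
    if (this.licenses.has(feature)) {
      // Render the wrapped template only when this customer holds the license.
      this.viewContainer.createEmbeddedView(this.templateRef);
    }
  }
}
```

A feature would then be gated declaratively, e.g. `<button *appHasLicense="'mass-export'">Mass export</button>`, instead of scattering raw *ngIf conditions across templates.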
Of course I don't have a clear-cut solution that would totally fit your scenario, but here is an idea.
Divide your page into components that act as container regions.
For each customer, create a customer configuration that says which atomic components go in each region.
Create atomic components, where each component is a single function isolated from the rest of the other components. Rely more on services to communicate between them. An example of such an atomic component is the button that triggers the mass export in your example.
Create your page dynamically using ComponentFactory.
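Something like this rough sketch (the CustomerConfig shape is made up; note that newer Angular versions let ViewContainerRef.createComponent take the component type directly, replacing the older ComponentFactoryResolver):

```typescript
// Rough sketch of the region/config idea. CustomerConfig is a hypothetical
// shape; only the components listed for this customer are ever instantiated.
import { Component, Input, OnInit, Type, ViewChild, ViewContainerRef } from '@angular/core';

// Which atomic components go in each named region, per customer.
export interface CustomerConfig {
  regions: Record<string, Type<unknown>[]>;
}

@Component({
  selector: 'app-region',
  template: '<ng-container #outlet></ng-container>',
})
export class RegionComponent implements OnInit {
  @Input() name!: string;
  @Input() config!: CustomerConfig;

  @ViewChild('outlet', { read: ViewContainerRef, static: true })
  private outlet!: ViewContainerRef;

  ngOnInit(): void {
    // A feature another customer licensed never reaches this page, because
    // it simply is not listed in this customer's region configuration.
    for (const componentType of this.config.regions[this.name] ?? []) {
      this.outlet.createComponent(componentType);
    }
  }
}
```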
I have used the same approach before to customize a design toolbox based on a slide template (like PowerPoint slide templates).
As for the options you mentioned, here are my 2 cents:
*ngIf and *ngSwitch: you can eliminate these if you create your components dynamically and use granular or atomic components.
I don't think this would be a good approach in terms of architecture and design. You are just manipulating the view CSS.
If you use transclusion, this can minimize your code base if you can group the components efficiently.
I hope this helps.
Say I have a collection of websites for accountants, like this:
http://www.johnvanderlyn.com
http://www.rubinassociatespa.com
http://www.taxestaxestaxes.com
http://janus-curran.com
http://ricksarassociates.com
http://www.condoaudits.com
http://www.krco-cpa.com
http://ci.boca-raton.fl.us
What I want to do is crawl each and get the names & emails of the partners. How should I approach this problem, at a high-level?
Assume I know how to actually crawl each site (and all subpages) & parse the HTML elements -- I am using Oga.
What I am struggling with is how to make sense of data that is presented in a wide variety of ways. For instance, the email address for the firm (and/or partner) can be found in one of these ways:
On the About Us page, under the name of the partner.
On the About Us page, as a generic catch-all email.
On the Team page, under the name of the partner.
On the Contact Us page, as a generic catch-all email.
On a Partner's page, under the name of the partner.
Or it could be any other way.
One way I was thinking about approaching the email problem is just to search for all mailto <a> tags and filter from there.
The obvious downside for this is that there is no guarantee that the email will be for the partner and not some other employee.
Another issue that is more obvious is detecting the partner(s) names just from the markup. I was initially thinking I could just pull all the header tags and text in them, but I have stumbled across a few sites that have the partner names in span tags.
I know SO is usually for specific programming questions, but I am not sure how to approach this and where to ask this. Is there another StackExchange site that this question is more appropriate for?
Any advice on specific direction you can give me would be great.
I looked at the http://ricksarassociates.com/ website and I can't find any partners at all, so in my opinion you'd better make sure you actually stand to gain from this - if not, you'd better look for some other venture.
I have done similar data scraping from time to time, and in Norway we have laws - or should I say "laws" - saying that you are not allowed to email people, though you are allowed to email the company - so in a way it's the same problem from another angle.
I wish I knew maths and algorithms by heart, because I am sure there is a fascinating solution hidden in AI and machine learning, but in my mind the only solution I can see is building a rule set that over time probably gets quite complex. Maybe you could apply some Bayesian filtering - it works very well for email.
But - to be a little more productive here - one thing I know is important: you could start by creating the crawler environment and building the dataset. Have a database for the URLs so you can add more at any time, and start the crawling on what you already have, so that you do your testing by querying your own 100% local copy of the data. This will save you enormous time compared to live scraping while tweaking.
I did my own search engine some years ago, scraping all NO domains, though I needed only the index file of each site at the time. It took over a week just to scrape it all down, and I think it was 8 GB of data even for just that single file per site, and I had to use several proxy servers as well to make it work, due to problems with too much DNS traffic. Lots of problems needed taking care of. I guess I am only saying: if you are crawling at a large scale, you might as well start getting the data down now if you want to work efficiently on the parsing later.
Good luck, and do post if you find a solution. I do not think it is possible without an algorithm or AI though - people design websites the way they like and they pull templates out of their arse, so there are no rules to follow. You will end up with bad data.
Do you have funding for this? If so, it's simpler. Then you could just crawl each site and make a profile for each one. You could employ someone cheap to manually go through the parsed data and remove all the errors. This is probably how most people do it, unless someone has already done it and the database is for sale / available via a web service so it can be scraped.
The links you provide are mainly US sites, so I guess you are focusing on English names. In that case, instead of parsing HTML tags, I would just search the whole webpage for names (there are free databases of first names and last names). This may also work if you are doing this for other European companies, but it would be a problem for companies from some countries. Take Chinese as an example: while there is a fixed set of last names, one may use basically any combination of Chinese characters as a first name, so this solution won't work for Chinese sites.
It is easy to find an email address in a webpage, as there is a fixed format of (username)@(domain name) with no spaces in between. Again, I would not treat it as an HTML tag but just as a normal string, so that the email can be found whether it is in a mailto tag or in plain text. Then, to determine which kind of email it is:
Only one email on the page?
Yes -> catch-all email.
No -> Is a name found on that page as well?
No -> catch-all emails (there can be more than one catch-all email, perhaps for different purposes like info + employment).
Yes -> each email should be attached to the name found right before it; it is normal for the name to appear before the email.
Then it should be safe to assume that the name appearing first belongs to a more important member, e.g. the chairman or a partner.
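A rough sketch of that heuristic (the window size is arbitrary, and name matching is stubbed with a pre-extracted list of candidate names):

```typescript
// Plain-string email extraction plus a naive catch-all vs. personal
// classification, assuming the page has already been reduced to text.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

interface FoundEmail {
  address: string;
  kind: 'catch-all' | 'personal';
  owner?: string;
}

function classifyEmails(pageText: string, knownNames: string[]): FoundEmail[] {
  const addresses = pageText.match(EMAIL_RE) ?? [];
  return addresses.map((address) => {
    // Per the heuristic, a person's name usually appears shortly before
    // their address, so scan a window of text preceding the match.
    const at = pageText.indexOf(address);
    const preceding = pageText.slice(Math.max(0, at - 300), at);
    const owner = knownNames.find((name) => preceding.includes(name));
    return owner
      ? { address, kind: 'personal', owner }
      : { address, kind: 'catch-all' };
  });
}
```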
I have done similar scraping for these types of pages, and it varies wildly from site to site. If you are trying to make one crawler that automatically finds the information, it will be difficult. However, at a high level it looks something like this.
For each site you check, look for element patterns. Divs will often have labels, IDs, and classes which will easily let you grab information. Perhaps you find that many divs share a particular class name. Check for this first.
It is often better to grab too much data from a particular page and boil it down on your side afterwards. You could, perhaps, look for information that comes up on screen by checking type (is it a link?) or regex (is it an email?) to find formatted text. Names and occupations will be harder to find by this method, but on many pages they might be related positionally to other well-formatted items.
Names will often be affixed with honorifics (Mrs., Mr., Dr., JD, MD, etc.). You could come up with a bank of those and check any page you end up on against it.
Finally, if you really wanted to make this process general purpose, you could use some heuristics to improve your methods based on expected information; names, for example, most often appear within a particular list. If it was worth your time, you could check certain text against a list of more common names.
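For example, a naive detector along those lines might look like this (the honorific, suffix, and first-name banks are tiny illustrative stand-ins for real lists):

```typescript
// Hypothetical "looks like a name" check combining honorific prefixes,
// professional suffixes, and a first-name bank.
const HONORIFICS = ['Mr.', 'Mrs.', 'Ms.', 'Dr.'];
const SUFFIXES = ['CPA', 'JD', 'MD'];
const FIRST_NAMES = new Set(['John', 'Mary', 'Richard', 'Susan']);

function looksLikeName(text: string): boolean {
  const t = text.trim();
  if (HONORIFICS.some((h) => t.startsWith(h))) return true;
  if (SUFFIXES.some((s) => t.endsWith(s))) return true;
  // Fall back to matching the first word against the first-name bank.
  const firstWord = t.split(/\s+/)[0];
  return FIRST_NAMES.has(firstWord);
}
```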
It seems from your initial question that you would benefit a lot from a general-purpose regular-expression crawler, and you could improve it as you learn more about the sites you interact with.
There are excellent posts on this topic, with a lot of useful links, on these webpages:
https://www.quora.com/What-is-a-good-web-scraper-for-pulling-emails-names-etc-even-if-the-contact-info-is-another-page-deep-a-browser-add-on-is-a-plus
http://www.hongkiat.com/blog/web-scraping-tools/
http://www.garethjames.net/a-guide-to-web-scraping-tools/
http://www.butleranalytics.com/15-web-scraping-tools/
Some of the applications examined there also work on macOS.
A bunch of work goes into static pages that will not contain dynamic data (contact, about us, home, etc.) and that can be updated fairly easily if the designer/developer has access to the site. Why is it considered better practice to keep that information in a database from which the pages must be constructed over and over?
If one thinks in terms of templates and a website administrator who is a layperson, then the database format of a Content Management System makes more sense, because all that person has to do, for example to change contact details on the Contact page or post some updates on the homepage, is to go into the CMS. It will be set up as a form-style interface that only needs filling in and submitting. The initial cost of a small static website with a CMS setup will be higher, of course. However, if your homepage needs regular updates, it might be worth having a CMS. If there are very few changes throughout the year, one may opt to hire designer/developer services instead.
Rather than best practices, I would see it as cost and demand.
Because putting the content in the database and dynamically generating the pages for each request solves a lot of other issues. Cache invalidation is a hard problem, and a static site where every page is built from different pieces of content from multiple sources (in Drupal: blocks, users, nodes, taxonomy terms, etc.) is like a gigantic cache.
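To make the invalidation problem concrete, here is a toy sketch (all names hypothetical) of the bookkeeping a pre-rendered site would need to do:

```typescript
// Toy illustration: a rendered-page cache that must be invalidated whenever
// any piece of content a page was built from changes.
const renderedPages = new Map<string, string>();
const pagesUsingContent = new Map<string, Set<string>>(); // contentId -> page URLs

function cachePage(url: string, html: string, contentIds: string[]): void {
  renderedPages.set(url, html);
  for (const id of contentIds) {
    if (!pagesUsingContent.has(id)) pagesUsingContent.set(id, new Set());
    pagesUsingContent.get(id)!.add(url);
  }
}

function invalidateContent(contentId: string): void {
  // One edited block/node can invalidate many "static" pages at once,
  // which is exactly the bookkeeping a dynamic CMS avoids.
  for (const url of pagesUsingContent.get(contentId) ?? []) {
    renderedPages.delete(url);
  }
}
```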
But if you don't need the flexibility and features of a CMS (like letting non-technical users edit pieces of content), then don't go with a CMS.
"A bunch of work is done for static pages that will not contain dynamic data, such as, contact, about us, home, etc. that can be updated fairly easily if the designer/developer has access to a site. Why is it a better practice to keep that information in a database that must construct the data on the regular?"
There are four key points to consider here.
First, why do people use WordPress? In many cases it's because they aren't web developers themselves and they don't want to have to hire one every time their board of directors (say) changes. I have several clients in this category. By putting the content in the database and rendering it in a template, non-technical users can manage their own pages as well.
Second, consider how easy it is for someone like you to create several static pages and simply use WordPress to power the blog. There is no shortage of startups and small businesses (especially in the tech space) that do that. Use something like Wappalyzer to see what folks are using.
Third, it's not a best practice to keep static information in the database. That's why page caches exist (along with static asset and opcode caches... no one recommends recomputing things unnecessarily as a performance best practice).
Fourth, consider the implication of removing the ability to edit pages from users. Or requiring them to know HTML. And if we start dealing with revision management as well, they might also need to learn Git. I for one hate having to help clients solve problems that they really should be able to solve themselves. I'm much happier if they can manage the simple things on their own. They tend to be happier with their websites, and I tend to get more interesting projects... good all the way around.
In sum, it's not a best practice for some people, which is why some people don't do it that way. It's not a best practice to return to the database for static assets either, which is why folks who are reasonably aware of caching don't do it that way. But it is a good practice, at least, to let folks have greater control over their websites. It comes at a cost, and it's not right for everyone. But it certainly is right for a lot of people... and maybe a lot more than simply "right." I think you could go so far as to say it's empowering, and one of the best things about WordPress in general.
I'm implementing a help system for a desktop app (Win32) and am looking for how to go about designing it.
What kind of structure should a help system have, what's actually helpful for the user?
E.g. should the help system be a big list of FAQs (Office 2010's help seems to be like this), or should it be a feature list documenting and describing what everything does? (The latter is probably only helpful when the user is unsure how a feature they already know about works.)
What kind of knowledge should I expect the end user to have? It's probably slightly demeaning to write in the help file that File -> Open Project closes the current project (if present) and opens an existing project.
What I'm looking for here is some guidance: a set of features any good help system should have, and a method of organising the topics in a way that users can find them.
"Open project" can also be a good place to put a reference to the definition of a project, and other more general descriptions and procedures relating to opening of projects.
In general, CHM help is accessed either via context-sensitive help (which is typical for the File -> Open case) or via the general table of contents, full-text search and indexes. Most recent apps seem to create only one help page per container (a dialog or pulldown menu), where they list all the items on that screen (e.g. with an annotated screenshot) rather than a lemma for each item (checkbox, menu entry) in the GUI. Less clutter and navigation, and many points only need a fairly short description.
Besides context sensitive help, one can also browse the help via the help system.
A CHM is pretty much like an e-book, with a table of contents (TOC), an index and, optionally, support for full-text search.
The index and full-text search are both ways for the user to find content. The main difference is that the index is more under your control, while the full-text search is largely automatic once enabled.
The TOC is a tree view of nodes that acts like the TOC in a book, and it establishes the general structure of the "e-book". FAQs are typically an appendix in this TOC.
Besides this, there is a default "entry" page, which is like the homepage of a website. It should navigate users to the most commonly searched topics.
Be careful when comparing to Microsoft products. They sometimes use systems that are not available to end users/developers yet.
Good help uses all these elements.
There are broadly two styles of help: reference-based (i.e. what does this checkbox mean?) and task-based (how do I achieve XYZ?).
You're probably best off creating a task-based tutorial first, backed up by a FAQ if necessary.
What practical benefits can my client get if I use microformats on his site for every possible thing?
How can I explain these benefits to a non-technical client?
Sometimes it seems like the practical benefits are hard to quantify.
Search engines already pick up and parse microformats (see e.g. https://support.google.com/webmasters/answer/99170). I believe hCard and hCalendar are fairly well supported, and plenty of sites are using them, including places like MySpace.
It's the idea that adding CSS classes and specified IDs makes your existing content easier to parse in a machine-readable manner.
hReview is starting to make some inroads, and hResume looks like it will take off too.
I heavily use rel="nofollow" on uncontrolled links (3rd party sources) which is actually a microformat.
Check the microformats wiki for a decent starting point.
It just means your viewers can share a few generic "formats". You can generalize stylesheets and parsing mechanisms. Rather than having a webpage consist of one "HTML document", you have a webpage that consists of "10 formatted micro-documents".
If you need a real-world analog: think of it like attaching a formatted invoice, a receipt, and a business card, rather than writing it all down on notebook paper with your left hand.
Overall the site becomes easier to digest for the rest of the internet. The data can be reused, combined, cross-referenced, and saved.
A simple example would be having a latitude and a longitude anywhere on the site (geo). With microformats, anybody who searches for that latitude and longitude can easily be referred to the site, increasing traffic and awareness of that person/company, and allowing users to easily save that information. (Although I've encountered little of this personally; it is more "the future" of things than the present. But it's always good to stay up to date.)
A second example would be a business card (hCard) that a browser can easily save and transfer to an address book, so that after just one visit to the site the visitor has the information saved locally. Especially useful if you're getting hits from a cell phone.
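For instance, a minimal hCard (with an embedded geo) might look like this; the person, firm, and coordinates are made up:

```html
<!-- Illustrative only: a minimal hCard with an embedded geo -->
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example &amp; Partners CPA</span>,
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="geo">
    <span class="latitude">26.368</span>,
    <span class="longitude">-80.100</span>
  </span>
</div>
```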
I wouldn't recommend using microformats for "every possible thing". Use them for things where you get some benefit, in exchange for the effort of using them.
The main practical benefit I'm aware of is customised search engine results:
https://support.google.com/webmasters/answer/99170
Technically, Google now prefers this to be implemented using microdata (i.e. itemprop attributes) rather than microformats, but it's the same idea.
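For comparison, the same idea expressed as schema.org microdata might look like this (illustrative only, same made-up person as above):

```html
<!-- The hCard idea rewritten with microdata itemprop attributes -->
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="email" href="mailto:jane@example.com">jane@example.com</a>
</div>
```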
Having a microformat can be better than no format, since it lets you represent every possible thing in the application.
A microformat for every possible thing can be better than a standard format only because it's quicker to create, so it costs less, and it takes less space than some standard formats, like XML.
But all this depends on the context of the application and so you must explain it to the client in that context.
Microformatting your content extends its reach in every way possible. Using your site's structure as its "API", the possibilities are limited only by where you set your limits.