I am asking a very basic question here.
My question is this:
I am using Apache Sling, Apache Jackrabbit, and Apache Felix in my project, as directed by my instructor. I am trying to understand why this software was developed by Apache. I searched a lot on the internet, but I didn't find any blog, WordPress post, or useful YouTube video that explains all these projects. Can you explain these projects to me?
Why were these projects developed?
What do they do?
And other questions like this.
I previously had the same doubt about Apache Hadoop, but the material I found on the net was sufficient to get a feel for that project. This time I am struggling with Sling, Felix, and Jackrabbit.
I will be very thankful to you. Waiting for your kind response.
The combination of Apache Jackrabbit, Apache Sling, and Apache Felix allows you to build web applications.
Apache Jackrabbit is the reference implementation of the JCR API. The JCR API is used to manage content repositories; for example, to manage web content. A content repository is a mix between a file system and a database.
The JCR API is specially made to deal with web content. Why use the JCR API, and why not use a relational database API? URLs are hierarchical, as in a file system. Relational databases don't easily support hierarchical access. Why not use a file system API? Because the JCR supports transactions, versioning, and a lot of other features that file system APIs don't support.
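As a rough illustration, here is a minimal sketch of what talking to a content repository through the JCR API looks like in Java, assuming an embedded Jackrabbit TransientRepository and made-up node names:
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.core.TransientRepository;

public class JcrExample {
    public static void main(String[] args) throws Exception {
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Content is a hierarchy of nodes with properties, like folders
            // and files, but with versioning and transaction-like saves.
            Node root = session.getRootNode();
            Node page = root.addNode("content").addNode("hello");
            page.setProperty("title", "Hello, JCR");
            session.save();

            System.out.println(session.getNode("/content/hello")
                    .getProperty("title").getString());
        } finally {
            session.logout();
        }
    }
}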
Apache Sling is a web framework based on the JCR API that takes advantage of the features the JCR API provides (see the 15-minute introduction).
Apache Felix is an OSGi container. It lets you seamlessly start, stop, and replace components of a web application (JAR files, in a sense) while the web server is running. That means it allows you to change the application without having to restart the server.
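For a concrete flavour of such a component, here is a minimal sketch of an OSGi bundle activator of the kind Felix can start, stop, and replace at runtime (the class name and log messages are illustrative):
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

public class MyComponentActivator implements BundleActivator {

    // Called by the OSGi container (e.g. Felix) when the bundle is started.
    public void start(BundleContext context) {
        System.out.println("Component started - services can be registered here");
    }

    // Called when the bundle is stopped or replaced, without restarting the server.
    public void stop(BundleContext context) {
        System.out.println("Component stopped - clean up resources here");
    }
}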
In very simple terms, Sling could be described as a REST API for JCR: you can use HTTP requests to manage content inside the repository.
Additionally, Sling provides a mechanism to render that content in different ways for web consumption. You can use scripts (JSP, for example) and Java code (servlets, POJOs, etc.) in the Felix container to process requests and deliver responses.
When a request is made for a particular node, Sling looks up a property called sling:resourceType; this is a lookup key for rendering scripts. The appropriate script is then executed using the node as input.
You could write different kinds of renderers and then use them to display your content in different ways.
For example, you could write two scripts, full.json.jsp and short.json.jsp, and then use them to render the same node in two different ways:
/content/app/node.full.json
OR
/content/app/node.short.json
Sling basically matches tokens in the request URL to select an appropriate script.
They have a really nice cheat sheet that explains how request resolution and rendering work.
It is a bit more complex than this, since everything is organized into resources and components; you'll want to check their site for more info.
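To make that resolution concrete, here is a minimal sketch of a Sling servlet registered for a made-up resource type and the short selector, so it would answer a request like /content/app/node.short.json (the resource type myapp/page and the property name in the output are assumptions):
import java.io.IOException;
import javax.servlet.Servlet;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.ValueMap;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

@Component(service = Servlet.class, property = {
        "sling.servlet.resourceTypes=myapp/page",   // matched against sling:resourceType
        "sling.servlet.selectors=short",            // the ".short" token in the URL
        "sling.servlet.extensions=json"
})
public class ShortJsonServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request,
                         SlingHttpServletResponse response) throws IOException {
        // Render a trimmed-down JSON view of the requested node.
        ValueMap properties = request.getResource().getValueMap();
        response.setContentType("application/json");
        response.getWriter().write(
                "{\"title\": \"" + properties.get("title", "") + "\"}");
    }
}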
I had the same doubts. The best answer I was able to find is on the official Sling page (https://sling.apache.org/):
(What is) Apache Sling, in a hundred words:
Apache Sling is a web framework that uses a Java Content Repository, such as Apache Jackrabbit, to store and manage content.
Sling applications use either scripts or Java servlets, selected based on simple name conventions, to process HTTP requests in a RESTful way.
The embedded Apache Felix OSGi framework and console provide a dynamic runtime environment, where code and content bundles can be loaded, unloaded and reconfigured at runtime.
So, to summarize:
Sling is a web framework --> using Jackrabbit --> based on / backed by the JCR API.
You can think of Apache Felix as a container with its own manager.
Note that Sling started as an internal project at Day Software. That is why some bundles/libraries have names like com.day, but in the end they are two names for the same thing.
Also, if you want to be clear about Jackrabbit and the JCR API, you can visit Jackrabbit's official page: http://jackrabbit.apache.org/jcr/jackrabbit-architecture.html
Time and again I am faced with the issue of having multiple environments that must be configured individually for an application that runs in all of them (e.g. QA, regional production environments, dev, staging, etc.), and I am wondering: what would be the best way to organize the different configurations?
Would it be in the database? Different configuration files per environment? Or maybe the same file with different sections/xml tags? How would these be then deployed? Embedded within the app? Or put manually in after installation to be modified in-place?
This question is not technology-specific - I've worked with .net and Java, web-apps and desktop apps and this issue comes up time and again. I'm looking to learn different approaches to maybe adapt a hybrid to address this.
EDIT: There's one caveat that I must point out: when configuration is part of the deployed solution, it is generally installed under the root user on the host. In large organizations developers usually don't have root access to production hosts, so any change to the configuration requires a new build and deployment. Needless to say, this isn't the nicest approach, especially at organizations that have a very strict release process involving multiple teams and approval levels... (sigh, I know!)
Borrowed from Jez Humble and David Farley's book "Continuous Delivery" (page 41), configuration can be injected at several points:
Your build scripts can pull configuration in and incorporate it into your binaries at build time.
Your packaging software can inject configuration at packaging time, such as when creating assemblies, ears, or gems.
Your deployment scripts or installers can fetch the necessary information or ask the user for it and pass it to your application at deployment time as part of the installation process.
Your application itself can fetch configuration at startup time or run time.
They consider it bad practice to inject configuration files at build and compile time, because you should be able to deploy the same binary file to every environment.
My experience has been that you can bake the configuration files for every environment (except sensitive information) into your deployment artifact (war, jar, zip, etc.), and design your application to take an extra parameter at startup so it picks up the right set of configuration files (from your extracted deployment artifact, from a local/remote file system if they are sensitive, or from a database) during the application's startup.
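A minimal sketch of that startup selection in Java, assuming the environment name is passed as a system property named env and that the per-environment files live under /config on the classpath (both names are made up for illustration):
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class AppConfig {

    // Loads config/<env>.properties from the classpath, where <env> is
    // passed at startup, e.g. java -Denv=prod -jar app.jar
    public static Properties load() throws IOException {
        String env = System.getProperty("env", "dev");
        Properties props = new Properties();
        try (InputStream in = AppConfig.class
                .getResourceAsStream("/config/" + env + ".properties")) {
            if (in == null) {
                throw new IOException("No configuration found for environment: " + env);
            }
            props.load(in);
        }
        return props;
    }
}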
The question is difficult to answer because it's somewhat vague. There is no technology-agnostic approach to configuration as far as I know. Exactly how configuration is set up will depend on the language/technology in question.
I'm not familiar with .NET, but with Java a popular approach is to set up a Maven build with different profiles. Each profile is specific to an environment. You can then define different properties files that hold environment-specific values; an example from the above link is:
environment.properties - This is the default configuration and will be packaged in the artifact by default.
environment.test.properties - This is the variant for the test environment.
environment.prod.properties - This is basically the same as the test variant and will be used in the production environment.
You can then build your project as follows:
mvn -Pprod package
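At runtime the application then just reads whatever variant the active profile packaged; a minimal sketch, assuming the chosen file ends up on the classpath as environment.properties:
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class EnvironmentConfig {

    // Reads the environment.properties file that the active Maven profile
    // packaged into the artifact at build time.
    public static Properties load() throws IOException {
        Properties props = new Properties();
        try (InputStream in = EnvironmentConfig.class
                .getResourceAsStream("/environment.properties")) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }
}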
I have good news and bad news.
The good news is that Config4* (of which I am the maintainer) neatly addresses this issue with its support for adaptive configuration. Basically, this is the ability for a configuration file to adapt itself to the environment (including hostname, username, environment variables, and command-line options) in which it is running. Read Chapter 2 of the "Getting Started" manual for details. Don't worry: it is a short chapter.
The bad news is that, currently, Config4* implementations exist only for C++ and Java, so your .Net applications are out of luck. And even with C++ and Java applications, it won't make pragmatic sense to retrofit Config4* into an existing application. Because of this, I'd advise trying to use Config4* only in new applications.
Despite the bad news, I think it is worth your while to read the above-mentioned chapter of the Config4* documentation, because doing so may provide you with ideas that you can adapt to fit your needs.
I often find quotes like "InstallUtil.exe is an ugly pattern" or "Don't use InstallUtil.exe", along with advice that I should use native WiX or installation package patterns instead, and I still don't understand why.
I stepped away from using InstallUtil to install a .NET service once I finally learned that writing registry keys for such an action should be an uninstallable action, and I've come to accept this as correct.
As I've been working through my WIX installer for a relatively complex product, I have found myself in need of creating or updating SQL Server databases, creating or updating IIS Applications and finally updating or creating configuration files.
Each of my components (features) is optional, but they all share the same configuration file. As my product uses Unity, it's important to note that this library has strong support for reading/updating/removing components from the Unity configuration block, so it seems fairly smart to take advantage of these blocks via installation components (i.e. InstallUtil) to create or update my configuration file at installation time.
Just to be clear: my installer does not natively contain a configuration file for my application; at installation time, the installer has no idea of its shape, as that depends on the features selected. Surely I should be embedding this knowledge into each of the modules that are to be deployed, and not into the remit of the installer, which is now a completely independent project? Wouldn't this break O-O principles, even if we are talking about installation?
I'd really appreciate some guidance as to whether this is good practice or not. Am I reading that InstallUtil is bad for installing services, or that using InstallUtil is bad full stop? If so, what are my options for smart updating of configuration files?
The main reason for avoiding InstallUtil is that it runs outside of the installation transaction, so Windows Installer cannot keep track of what it's done.
I have used InstallUtil on a few occasions, when I just couldn't get Wix to do what I needed and didn't have time to write a custom action. In this case I called the InstallUtilLib version as I feel this is a cleaner approach.
I used this blog as a guide on how to achieve this.
Many programs include an auto-updater, where the program occasionally looks online for updates, and then downloads and applies any updates that are found. Program bugs are fixed, supporting files are modified, and things are (usually) made better.
Unfortunately no matter how hard I look, I can't find information on this process anywhere. It seems like the auto-updaters that have been implemented have either been proprietary or not considered important.
It seems fairly easy to implement the part of the system that looks for updates on a network and downloads them if they are available. That part of the auto-updater will change significantly from implementation to implementation. The question is: what are the different approaches to applying patches? Just downloading files and replacing old ones with new ones, running a migration script that was downloaded, monkey-patching parts of the system, etc.? Concepts are preferred, but examples in Java, C, Python, Ruby, Lisp, etc. would be appreciated.
I think that "language agnostic" is going to be a limiting factor here. Applications come in so many shapes and sizes that there is no one-size-fits-all answer. I have implemented several auto-updaters in several languages, and no two were similar.
The most general philosophy is that the application checks with some home location (web address, web query, corporate network location, etc.) to either ask whether its version is current, or ask what the most current version is. If the answer calls for an update, that process will be different for each situation.
A popular alternative is to invite the home location to run a script when the application is initiated. The script can check the version, download updates if necessary, and ask for usage feedback, for example.
We can probably help better if you narrow the parameters.
UPDATE: The approach to "patching" also depends on the nature of the application, and there's a very wide diversity here. If you have a single executable file, for instance, then it's probably most practical to replace the executable. If your application has many files, you should look for ways to minimize the number of files replaced. If your application is highly customized or parameterized, you should strive to minimize the re-tailoring effort. If your application employs interpreted code (such as an Excel VBA application or MS Access MDB application), then you may be able to replace parts of the code. In a Java application you may only need to replace a JAR file, or even a subset of the JAR contents. You'll also need to have a way to recognize the current client version, and update it appropriately. I could go on and on, but I hope you see my point about diversity. This is one of those many times when the best answer usually starts with "Well, it depends ...!" That's why so many answers include "Please narrow the parameters."
Be sure to also consider the security implications of sucking down information about the update, as well as the update binaries themselves.
Do you trust the source of the download? You may be phoning home to get your update, but what if there is a man in the middle who redirects you to a malicious server? An HTTPS or similar secure connection will help, but double-checking the bits that you eventually download with a digital signature check is recommended.
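As a rough sketch of that signature check in Java, assuming the vendor's RSA public key ships with the application and the update comes with a detached signature file (all file names and the key format are illustrative):
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

public class UpdateVerifier {

    // Returns true if the downloaded update matches its signature under the bundled public key.
    static boolean verify(String updateFile, String sigFile, String publicKeyFile) throws Exception {
        byte[] keyBytes = Files.readAllBytes(Paths.get(publicKeyFile));
        PublicKey publicKey = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(keyBytes));

        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(publicKey);
        sig.update(Files.readAllBytes(Paths.get(updateFile)));
        return sig.verify(Files.readAllBytes(Paths.get(sigFile)));
    }
}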
First you need a file on your application home web site with the latest version.
The best way, I think, is to have a special SQL table for this task and populate it automatically after publishing a new version / completing a nightly build.
Your application creates a new thread which requests a built-in HTTP link containing the version and compares it with the current one. In .NET you can use code like this:
Version GetLatestVersion() {
    // version.txt on the server contains a single plain-text line such as 1.0.1.0
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(new Uri("http://example.net"), "version.txt"));
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    if (request.HaveResponse)
    {
        StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.Default);
        return new Version(stream.ReadLine());
    }
    else
    {
        return null;
    }
}
Version latest = GetLatestVersion();
Version current = new Version(Application.ProductVersion);
if (current < latest)
{
// you need an update
}
else
{
// you are up-to-date
}
In this example, version.txt contains only one plain string like 1.0.1.0.
Another tip I can give is how to download an update.
I very much like the following idea: in the resources of your application there is a string of CLR code which you compile on the fly (using CodeDOM) into a temporary folder; the main application calls it and then closes. The updater reads arguments, settings or the registry, downloads the new modules, and then calls the main application, which deletes all temporary files. Done!
(But everything here is about .NET)
The simplest solution (used by many programs) is to run the uninstaller for the previous version and then run the installer for the new one (optionally skipping questions the user has already answered, like the EULA). The only catch is that the new version must be able to read the configuration options from the old version.
Also, on Windows you can't delete an executable file which is in use, so you will probably want to drop a small executable into the Temp folder which runs the whole process, and then delete it at the end from the new version's instance once it has launched (or just register it to be deleted at the next reboot).
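A minimal sketch of that hand-off in Java, assuming a small helper updater.jar is shipped next to the application (the file names and arguments are made up for illustration):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UpdateLauncher {

    // Copies the bundled helper into a temp directory, starts it, and exits
    // so the helper can overwrite files that the main program has locked.
    static void launchUpdater() throws Exception {
        Path temp = Files.createTempDirectory("updater");
        Path helper = temp.resolve("updater.jar");
        Files.copy(Paths.get("updater.jar"), helper, StandardCopyOption.REPLACE_EXISTING);

        new ProcessBuilder("java", "-jar", helper.toString(),
                "--install-dir", Paths.get(".").toAbsolutePath().toString())
                .start();

        System.exit(0); // release our own files so the helper can replace them
    }
}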
Because auto updating is a common scenario, most languages have at least one package available to support this. (Below I list some of the available packages)
One of the really nice ideas is ClickOnce distribution for .NET. It's an installer which sandboxes your application and installs it in the user context, so no administrator rights are required. You can configure ClickOnce in your publish settings to check for updates on each application start.
Java has Java Web Start, which offers the same kind of functionality for Java applications.
Delphi has numerous articles about auto-updating; Torry has a list of WebUpdate components. For instance, GoUpdater seems to have a very wide range of functionality.
They all use a website/network share to check for a new version and then retrieve either a patch or a complete install file and run it. So you should try to find a nice package for your application, to save yourself the hassle of developing and maintaining your own solution.
The simplest approach would be to have your program query a server (website) to see if there is an update. If there is an update you could display a message to the user that prompts them to download a newer version and provides a link.
An alternative and more complex solution would be to create a small Windows service (or Unix daemon) that periodically checks for updates; this service can download the update and launch the installer.
The general architecture is that you have a central server that you control, which knows the latest version and where to get it. Then the programs query the server. I am not going to include sample code because it is highly dependent on the server and the format you choose. It is not terribly difficult, though.
This is not so much a complete answer, but rather one example of an auto-updating mechanism I implemented recently. The situation is a little different from the traditional Firefox type of user application, since it was an internal tool used at work.
Basically, it's a little script that manages a queue of Subversion branches to be built and packaged in an installer. It reads a little file, where the names of the branches are written, takes the first one, re-writes it at the end of the file, and launches the build process, which involves calling a bunch of scripts. The configuration for each branch to build is written in a .INI file, stored in a Subversion repository along with the tool itself.
Because this tool runs on several computers, I wanted a way to update it automatically on all machines as soon as I made a change either to the tool itself, or to the configuration scripts.
The way I implemented it was simple: when I launch the tool, it becomes an "outer shell". This outer shell does 2 very simple things:
svn update on itself and on the configuration files
launch itself again, this time as the "inner shell", the one that actually handles one configuration (and then exits again).
This very simple update-myself-in-a-loop system has served us very well for a few months now. It's very elegant, because it is self-contained: the auto-updater is the program itself. Because the "outer shell" (the auto-updater part) is so simple, it doesn't matter that it does not benefit from the updates the way the "inner shell" does (which gets executed from the updated source file every time).
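As a rough Java illustration of that outer-shell loop (the original was a script; the command names, classpath, and InnerShell class are assumptions):
public class OuterShell {

    public static void main(String[] args) throws Exception {
        while (true) {
            // 1. Update the tool and its configuration from version control.
            run("svn", "update");

            // 2. Run the updated code as the "inner shell" for one work item,
            //    then loop and update again before the next one.
            run("java", "-cp", "build", "InnerShell");
        }
    }

    private static void run(String... command) throws Exception {
        new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}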
One thing that hasn't really been mentioned is that you should seriously consider that the user running your program might not actually have sufficient privileges to upgrade it. This should be pretty common at least for business users, probably less so for home users.
I'm always working with a (self-imposed) limited account for security reasons and it always pisses me off that most auto-updaters simply assume that I'm running as admin and then after downloading just fail and offer no other way of performing the update other than actually closing the program and running it again in an administrative context. Most do not even cache the downloaded update and have to do it all over again.
It'd be much better if the auto-updater would simply prompt for admin credentials when needed and get on with it.
I'm going to assume the answer is for Windows.
This way seems to work well.
In the installer do:
1. Create a manual-start service that runs as LocalSystem and that, when started, performs the update and then stops.
2. Change the service permissions so all users can start the service (if all users should be able to update w/o admin rights).
3. Change the main program to check for updates when started using a simple mechanism. If it detects an update, prompt if the user wants to apply it.
4. If user accepts the update, start the service.
If the architecture allows for it, create a way to monitor the update as it is running.
In a Java Web Start setting, you start a JNLP file which then triggers the download of the JAR files needed to run the application. Each time, Web Start checks whether there are newer versions of the JARs and downloads them, replacing the locally cached ones. With a tool named jardiff you can create only the diffs towards the newer JARs and distribute those via the server (i.e. clients only get the update).
Pros:
always up to date
Cons:
you need an application server (tomcat, JBoss) in order to distribute the files
you need an internet connection in order to get the application
Reading Carl Seleborg's answer gave me some ideas about how a generic code repository could be useful.
svn comes with a tool called svnsync, which sort of behaves like an svn export but keeps track of the actual revision your export is at.
Someone could utilize this system in order to fetch only the files that have changed since the user's actual revision.
In actuality, you would have a repository with the compiled binaries, and running svnsync would only fetch the binaries that have been modified. It might also be able to merge local changes to text-based configuration files with new configuration options.
Installing a patch to a program is basically one of the core functions of an installer. Installer software is documented in numerous places, but usually on a per-installer basis: there is the Microsoft Installer (with InstallShield extensions), Ruby gems, Java .jar files, the various Linux package management systems (RPM, apt-get), and others.
These are all complex systems which solve the problem of patching a program in general, but for slightly different environments. To decide what is best for you, consider which of these systems your application most resembles. Rolling your own is fine, but looking at these systems is a good place to start.
You can write an internal module of your application to do updates. You can write an external mini application to do updates.
Also look at .NET on-the-fly compilation technology; it makes it possible to create such a mini application on the fly, on demand. For example, http://fly.sf.net/
You can use my solution (part of the Target Eye project).
http://www.codeproject.com/Articles/310530/Target-Eye-Revealed-part-Target-Eyes-Unique-Auto
If your software is open source and targets Linux or developers, it is interesting to install your software as a git repo and have it pull the stable branch occasionally, or every time it is launched.
This is particularly easy when your application is managed via npm, sbt, Maven, stack, elm-package, or the like.
After hours of searching for a working solution to this problem, I finally implemented an auto-update mechanism for a Python script that works on Linux and Windows.
In short: before doing its actual work, the script checks for an update on S3 and, if one is available, downloads it, unzips it, creates or updates the symlink (or junction on Windows), and re-runs the script, now as the new version, with the original arguments.
The full source code and the explanation can be found here.
If you are searching for a cross-platform software update solution, take a look at www.updatenode.com
Some highlights:
free for Open Source projects
cross-platform & Open Source update client tool
localized already for the most important languages
easy to integrate and easy to handle
cloud based management platform to define and manage updates
additionally provides support for displaying messages (to inform about new events, products, etc.)
web interface is open (you can create your own client using the service)
many usage statistics, such as operating systems used, geo location, version usage, etc.
Android API for mobile App updates
Just try it.
BTW, I am part of the dev team for the open source client. :)
I need to find a way to crawl one of our company's web applications and create a static site from it that can be burned to a cd and used by traveling sales people to demo the web site. The back end data store is spread across many, many systems so simply running the site on a VM on the sale person's laptop won't work. And they won't have access to the internet while at some clients (no internet, cell phone....primitive, I know).
Does anyone have any good recommendations for crawlers that can handle things like link cleanup, flash, a little ajax, css, etc? I know odds are slim, but I figured I'd throw the question out here before I jump into writing my own tool.
By using a WebCrawler, e.g. one of these:
DataparkSearch is a crawler and search engine released under the GNU General Public License.
GNU Wget is a command-line operated crawler written in C and released under the GPL. It is typically used to mirror web and FTP sites.
HTTrack uses a Web crawler to create a mirror of a web site for off-line viewing. It is written in C and released under the GPL.
ICDL Crawler is a cross-platform web crawler written in C++ and intended to crawl websites based on Website Parse Templates using computer's free CPU resources only.
JSpider is a highly configurable and customizable web spider engine released under the GPL.
Larbin by Sebastien Ailleret
Webtools4larbin by Andreas Beder
Methabot is a speed-optimized web crawler and command line utility written in C and released under a 2-clause BSD License. It features a wide configuration system, a module system and has support for targeted crawling through local filesystem, HTTP or FTP.
Jaeksoft WebSearch is a web crawler and indexer built on top of Apache Lucene. It is released under the GPL v3 license.
Nutch is a crawler written in Java and released under an Apache License. It can be used in conjunction with the Lucene text indexing package.
Pavuk is a command-line web mirror tool with an optional X11 GUI crawler, released under the GPL. It has a bunch of advanced features compared to wget and HTTrack, e.g. regular-expression-based filtering and file creation rules.
WebVac is a crawler used by the Stanford WebBase Project.
WebSPHINX (Miller and Bharat, 1998) is composed of a Java class library that implements multi-threaded web page retrieval and HTML parsing, and a graphical user interface to set the starting URLs, to extract the downloaded data and to implement a basic text-based search engine.
WIRE - Web Information Retrieval Environment [15] is a web crawler written in C++ and released under the GPL, including several policies for scheduling the page downloads and a module for generating reports and statistics on the downloaded pages so it has been used for web characterization.
LWP::RobotUA (Langheinrich , 2004) is a Perl class for implementing well-behaved parallel web robots distributed under Perl 5's license.
Web Crawler Open source web crawler class for .NET (written in C#).
Sherlock Holmes Sherlock Holmes gathers and indexes textual data (text files, web pages, ...), both locally and over the network. Holmes is sponsored and commercially used by the Czech web portal Centrum. It is also used by Onet.pl.
YaCy, a free distributed search engine, built on principles of peer-to-peer networks (licensed under GPL).
Ruya Ruya is an Open Source, high performance breadth-first, level-based web crawler. It is used to crawl English and Japanese websites in a well-behaved manner. It is released under the GPL and is written entirely in the Python language. A SingleDomainDelayCrawler implementation obeys robots.txt with a crawl delay.
Universal Information Crawler is a fast-developing web crawler. It crawls, saves, and analyzes the data.
Agent Kernel is a Java framework for schedule, thread, and storage management when crawling.
Spider News: information regarding building a spider in Perl.
Arachnode.NET is an open source promiscuous web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and web pages. Arachnode.net is written in C# using SQL Server 2005 and is released under the GPL.
dine is a multithreaded Java HTTP client/crawler that can be programmed in JavaScript released under the LGPL.
Crawljax is an Ajax crawler based on a method which dynamically builds a `state-flow graph' modeling the various navigation paths and states within an Ajax application. Crawljax is written in Java and released under the BSD License.
Just because nobody copy pasted a working command ... I am trying ... ten years later. :D
wget --mirror --convert-links --adjust-extension --page-requisites \
--no-parent http://example.org
It worked like a charm for me.
wget or curl can both recursively follow links and mirror an entire site, so that might be a good bet. You won't be able to use truly interactive parts of the site, like search engines, or anything that modifies the data, though.
Is it possible at all to create dummy backend services that can run from the sales folks' laptops, that the app can interface with?
You're not going to be able to handle things like AJAX requests without burning a webserver to the CD, which I understand you have already said is impossible.
wget will download the site for you (use the -r parameter for "recursive"), but any dynamic content like reports and so on will of course not work properly; you'll just get a single snapshot.
If you do end up having to run it off of a webserver, you might want to take a look at:
ServerToGo
It lets you run a WAMPP stack off of a CD, complete with MySQL/PHP/Apache support. The databases are copied to the current user's temp directory on launch, and it can be run entirely without the user installing anything!