What is the purpose behind the open-source Apache Sling, Felix, and Jackrabbit projects?

I am asking a very basic question here.
I am using Apache Sling, Apache Jackrabbit, and Apache Felix in my project, as directed by my instructor. I am trying to understand why these projects were developed by Apache. I searched a lot on the internet, but I didn't find any blog post or useful YouTube video that explains all of these projects. Can you explain them to me?
Why were these projects developed?
What do they do?
...and more questions like this.
I previously had the same doubt about Apache Hadoop, but the material I found on the net was sufficient for me to get a feel for that project. This time I am struggling with Sling, Felix, and Jackrabbit.
I will be very thankful to you. Waiting for your kind response.

The combination of Apache Jackrabbit, Apache Sling, and Apache Felix allows you to build web applications.
Apache Jackrabbit is the reference implementation of the JCR API. The JCR API is used to manage content repositories; to manage, for example, web content. A content repository is a mix between a file system and a database.
The JCR API is specifically designed to deal with web content. Why use the JCR API, and not a relational database API? URLs are hierarchical, as in a file system. Relational databases don't easily support hierarchical access. Why not use a file system API? Because the JCR supports transactions, versioning, and a lot of other features that file system APIs don't support.
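To give a feel for the API, here is a minimal sketch using Jackrabbit's TransientRepository; the node and property names are made up for illustration, and the default admin credentials are assumed:

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.core.TransientRepository;

/**
 * Minimal JCR sketch: store and read a piece of hierarchical content.
 * Node and property names are hypothetical.
 */
public class JcrExample {
    public static void main(String[] args) throws Exception {
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // Content lives in a tree, much like files and folders.
            Node root = session.getRootNode();
            Node page = root.addNode("content").addNode("welcome");
            page.setProperty("title", "Hello JCR");
            session.save(); // persist the changes

            Node read = session.getNode("/content/welcome");
            System.out.println(read.getProperty("title").getString());
        } finally {
            session.logout();
        }
    }
}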
Apache Sling is a web framework based on the JCR API that takes advantage of the features the JCR API provides (see the 15 Minute introduction).
Apache Felix is an OSGi container. It allows you to seamlessly start, stop, and replace components of a web application (jar files, in a sense) while the web server is running. That means it allows you to change the application without having to restart the server.
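To make "component" concrete: an OSGi bundle is a jar with some metadata and, optionally, an activator that the container calls when the bundle starts and stops. A minimal sketch:

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

/**
 * Minimal OSGi bundle activator. Felix calls start() when the bundle
 * is installed/started and stop() when it is stopped or replaced,
 * all while the rest of the application keeps running.
 */
public class Activator implements BundleActivator {
    @Override
    public void start(BundleContext context) {
        System.out.println("Bundle started: "
                + context.getBundle().getSymbolicName());
    }

    @Override
    public void stop(BundleContext context) {
        System.out.println("Bundle stopped");
    }
}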

Sling, in very simple terms, could be described as a REST API for JCR: you can use HTTP requests to manage content inside the repository.
Additionally, Sling provides a mechanism to render that content in different ways for web consumption. You can use scripts (JSP, for example) and Java code (servlets, POJOs, etc.) in the Felix container to process requests and deliver responses.
When a request is made for a particular node, Sling looks up a property called sling:resourceType, which is a lookup key for rendering scripts. The appropriate script is then executed with the node as its input.
You can write different kinds of renderers and then use them to display your content in different ways.
For example, you could write two scripts full.json.jsp and short.json.jsp and then use them to render the same node in two different ways:
/content/app/node.full.json
OR
/content/app/node.short.json.
Sling basically matches tokens in the request URL to select an appropriate script.
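Renderers don't have to be scripts; they can also be Java servlets registered in Felix for a given resource type. A minimal sketch (the resource type and property values are hypothetical):

import java.io.IOException;
import javax.servlet.ServletException;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;

/**
 * Sketch of a Sling servlet bound to a (hypothetical) resource type.
 * Registered as an OSGi service with properties such as
 * sling.servlet.resourceTypes=myapp/components/page and
 * sling.servlet.extensions=json, it would handle GET requests for
 * nodes whose sling:resourceType is myapp/components/page.
 */
public class PageJsonServlet extends SlingSafeMethodsServlet {
    @Override
    protected void doGet(SlingHttpServletRequest request,
                         SlingHttpServletResponse response)
            throws ServletException, IOException {
        // Sling has already resolved the current node/resource from the URL.
        String path = request.getResource().getPath();
        response.setContentType("application/json");
        response.getWriter().write("{\"path\":\"" + path + "\"}");
    }
}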
They have a really nice cheat sheet that explains how request resolution and rendering work.
It is a bit more complex than this, since everything is organized into resources and components; you'll want to check their site for more info.

I had the same doubts. The best response I was able to find is on the official Sling page (https://sling.apache.org/):
(What is) Apache Sling, in a hundred words:
Apache Sling is a web framework that uses a Java Content Repository, such as Apache Jackrabbit, to store and manage content.
Sling applications use either scripts or Java servlets, selected based on simple name conventions, to process HTTP requests in a RESTful way.
The embedded Apache Felix OSGi framework and console provide a dynamic runtime environment, where code and content bundles can be loaded, unloaded and reconfigured at runtime.
So, summing it up:
Sling is a web framework --> using Jackrabbit --> based on/backed by the JCR API.
You can think of Apache Felix as a container together with its manager.
Note that Sling started as an internal project at Day Software. That's why some bundles/libraries have names like com.day, but in the end they are two names for the same thing.
Also, if you want to be clear about Jackrabbit and the JCR API, you can visit Jackrabbit's official page: http://jackrabbit.apache.org/jcr/jackrabbit-architecture.html

Related

Can I use Teacup to manage custom offline Tcl code

I'm trying to figure out a good way for my company to have a local repository/package manager (something a little more user friendly than git). I like Teacup, and we are using ActiveState Tcl anyway (Tcl 8.5; we have legacy systems using this version).
Can I use Teacup to make my own offline package manager repo? Sort of like how you can do that with Anaconda in Python. It has to be totally offline but I want to be able to upload packages to it when I make them and let Teacup handle the installation of them for other users in my company.
I've read through this page a little bit but it is missing some content.
You are looking for the server-side component of the "teaparty": teapot is the server counterpart to the teacup client:
https://wiki.tcl-lang.org/page/Teapot
While there is a dedicated teapot (server) implementation available from ActiveState, the client/server protocol is straightforward: it is about generating markup (HTML) resources delivered via HTTP (containing table DOM structures) and processed by the teacup client. As always, these resources can be generated statically or dynamically, or anything in between.
Watch the examples at:
http://teapot.rkeene.org/index.html
Better:
view-source:http://teapot.rkeene.org/index.html
Assuming your Tcl projects are hosted in some SCM repo, you could provide a repository (CI/CD pipeline) action to produce a static resource structure served by an HTTP server of your choice. The original teacup client can then be used against this resource collection.

Can I make calls to APIs such as youtube-dl and ffmpeg from a chrome-app?

First of all, I haven't started the implementation of the system I'm about to describe, as I didn't want to commit to implementing something without knowing whether it was possible.
So, what I'm trying to achieve is to build a chrome-app to download the audio from certain websites (e.g. YouTube and SoundCloud) using youtube-dl, post-process it using ffmpeg, and then upload it to a cloud service via some API. The reason I want to do it via a chrome-app is that I could do all the work on the client side (no need for servers) and I'd have the ability to insert JavaScript into the pages using content scripts, which would make the app pretty simple to use (I could create buttons such as 'download song' and stuff like that).
Although I have already read the documentation explaining the NaCl Technical Overview and some of the Application Structure, I am still not sure whether I would be able to make these calls via some C/C++ module or if I would get denied due to security reasons.
To summarize: assuming the user has the needed dependencies on their system (youtube-dl, python, ffmpeg, etc.), is it possible to make calls to third-party APIs such as the ones described above from a chrome-app using NaCl?
Thank you all in advance,
Chrome apps are normally sandboxed.
Less so than extensions - they can reach many more system resources via app APIs.
But still, what you mention is executing libraries/utilities outside the browser, and that is not normally allowed.
(P)NaCl is tightly sandboxed in this regard. See this old question; it still applies: you can only use 3rd-party code that is compiled into NaCl along with your app, not just link against a library. There are some library ports to NaCl, but it's not automatic.
A few years back you would normally use a mechanism like NPAPI to reach out and use a library outside the browser. It's deprecated and won't work anymore. In its place, Chrome offers a pipe-like (through stdio) connection to an external program, called Native Messaging. You could use it to perform operations with system-level libraries and tools, but the downside is that you can't bundle the native host with your app; you'll need a separate installer.
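To illustrate Native Messaging's framing: the host is a plain executable that Chrome talks to over stdin/stdout, each message being a 32-bit length prefix (native byte order; little-endian assumed below) followed by UTF-8 JSON. A minimal host sketch, written in Java purely for illustration; a real host would parse the JSON and spawn youtube-dl/ffmpeg:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of a Chrome Native Messaging host. Chrome sends each message
 * as a 32-bit length prefix followed by UTF-8 JSON on stdin; replies
 * use the same framing on stdout. Little-endian is assumed here.
 */
public class NativeHost {
    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(System.in);
        OutputStream out = System.out;
        byte[] lenBuf = new byte[4];
        while (true) {
            try {
                in.readFully(lenBuf);
            } catch (EOFException e) {
                break; // Chrome closed the pipe
            }
            int len = ByteBuffer.wrap(lenBuf)
                    .order(ByteOrder.LITTLE_ENDIAN).getInt();
            byte[] msg = new byte[len];
            in.readFully(msg);
            String request = new String(msg, StandardCharsets.UTF_8);
            // Echo a trivial JSON reply; real work (running ffmpeg etc.) goes here.
            byte[] reply = ("{\"received\":" + len + "}")
                    .getBytes(StandardCharsets.UTF_8);
            out.write(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                    .putInt(reply.length).array());
            out.write(reply);
            out.flush();
        }
    }
}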

Node.js SOA with JSON web-services - configuration

I am starting research on how to implement Node.js SOA (service oriented architecture) with JSON web-services.
As a small sub-question, I need an approach/framework/system for a universal configuration center for all of the company's web services, so that we don't configure every application with the exact address of every other application, but instead have each one ask a central server for that information.
(This should be a very well worked-out topic for XML-based services, so some terminology/approaches/etc. could/should be borrowed.)
Related to
RESTful JSON based SOA Registry
Service Oriented Architecture suggestions
UPDATE: This question is about web-services configuration & orchestration.
Go for an active framework (one with recent activity) with a lean architecture. There's one called Geddy and another called Restify. If in doubt, Express can also be used for building web services with JSON.
With any of these, you can have each application's codebase read the centrally stored config at startup.
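The register-and-lookup pattern itself is language-agnostic. Here is a minimal sketch of the lookup side against a hypothetical central registry endpoint (written in Java for illustration; it maps one-to-one to an http.get or fetch call in Node):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/**
 * Sketch of the lookup side of a central configuration/registry server.
 * The registry URL, endpoint, and response shape are all hypothetical.
 */
public class RegistryClient {
    private static final String REGISTRY = "http://config.example.internal:8500";

    public static String lookup(String serviceName) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create(REGISTRY + "/services/" + serviceName)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // Expected (hypothetical) body: {"host":"10.0.0.5","port":3000}
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup("billing"));
    }
}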

ASP.NET web api: documenting/specifying a service

I've been looking at ASP.NET Web API, and I like the simplicity of implementing a practical web service.
However, how can I document/specify the interface of a service implemented like that? For example, is there any spec I can pass on to, or generate for, a Java guy with no .NET background that will let him easily call and consume the service? What can I give to the JavaScript guy?
Ideally, I'd like the benefits of SOAP/XSD or something like it (easy to deserialize with nicely typed objects) for the Java guy, while retaining a service that's callable from a web browser too (i.e. supports non-crufty JSON).
Update
It's worth noting that since I originally posted this question, I discovered ServiceStack, which deals with this more naturally, supporting JSON, SOAP, and WSDL out of the box for the same service, as the client chooses. If you really want SOAP+JSON, it may be a better framework than ASP.NET Web API.
Update March 2016
It has been a while since this was answered, and the tooling for documenting any REST API has come a long way. We are currently evaluating Swagger 2.0, now spawning out into the Open API Initiative, RAML, and API Blueprint.
For Web API projects there is a tool, Swashbuckle, that auto-creates Swagger (Open API) format documentation.
Format for documenting a REST service:
There are some attempts at structuring and standardising the description of REST services:
Web Application Description Language (WADL)
Web Services Description Language 2.0 (WSDL 2.0)
I think it is fair to say neither of the two approaches above has very wide adoption, but WADL does look like a nice, concise format - a quick XSLT over the top and it could be a nice human-readable format. There are lots of examples of WADL for some famous APIs at the apigee github site here.
When trying to find a documentation format that is appropriate, I tend to look for "inspiration" from others... Apigee do a lot of research in this area and have this as documentation for one of their APIs here, or take a look at Facebook's social graph API here.
The examples are largely in line with the advice here.
How to auto document:
Using .NET: There is a good example of auto-generating a WebApi "help" page here. A logical extension of this example would be to get it outputting a WADL-formatted version as well...
Using Java: Jersey is a tool used in the Java community to generate WADL automatically.
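For example, with JAX-RS annotations like the ones below, Jersey can generate a WADL description of the service automatically (by default at /application.wadl); the resource itself is hypothetical:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

/**
 * Minimal JAX-RS resource (names made up). Deployed under Jersey,
 * its WADL description is generated automatically from these
 * annotations, with no extra documentation code.
 */
@Path("orders")
public class OrderResource {

    // GET /orders/{id} -> JSON representation of a single order
    @GET
    @Path("{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getOrder(@PathParam("id") String id) {
        // A real implementation would look the order up; this is a stub.
        return "{\"id\":\"" + id + "\",\"status\":\"shipped\"}";
    }
}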
What to share with the other developers:
Your JavaScript guy will most likely want a manual like the Facebook and apigee ones, giving the dev examples of the resources, URLs, response codes, etc. The most important thing here will be supporting JSON as the primary content type; this will be the easiest for him/her to consume and work with by far.
Your Java guy would also want the manual, but in theory they could also be given example XSDs for any XML representations of the resources you send/consume (assuming they make the request with "Content-Type: application/xml"). This may help them build proxy classes, etc. JSON-to-Java and JSON-to-.NET converters are available online, and given the example resources in your manual, they should simply be able to use one of these types of services to quickly create proxies. Generate Java class from JSON?.
If you absolutely must have auto discovery, auto proxy generation, etc., then you may need to offer a choice of both REST and SOAP (with WSDL) endpoints - relevant question here: ReST Proxy Object Generator.
You can use the IApiExplorer interface and ApiExplorer class to create a help page for your Web API service. This help page will describe the REST methods exposed by your service, so any developer who understands how REST works will be able to use it (regardless of language). Please see the links below for details and samples:
ASP.NET Web API: Introducing IApiExplorer/ApiExplorer
ASP.NET Web API: Generating a Web API help page using ApiExplorer
Documenting your ASP.Net Web API’s

How do you turn a dynamic site into a static site that can be demo'd from a CD?

I need to find a way to crawl one of our company's web applications and create a static site from it that can be burned to a CD and used by traveling salespeople to demo the web site. The back-end data store is spread across many, many systems, so simply running the site on a VM on the salesperson's laptop won't work. And they won't have access to the internet while at some clients (no internet, cell phone....primitive, I know).
Does anyone have any good recommendations for crawlers that can handle things like link cleanup, flash, a little ajax, css, etc? I know odds are slim, but I figured I'd throw the question out here before I jump into writing my own tool.
By using a web crawler, e.g. one of these:
DataparkSearch is a crawler and search engine released under the GNU General Public License.
GNU Wget is a command-line operated crawler written in C and released under the GPL. It is typically used to mirror web and FTP sites.
HTTrack uses a Web crawler to create a mirror of a web site for off-line viewing. It is written in C and released under the GPL.
ICDL Crawler is a cross-platform web crawler written in C++ and intended to crawl websites based on Website Parse Templates, using the computer's free CPU resources only.
JSpider is a highly configurable and customizable web spider engine released under the GPL.
Larbin by Sebastien Ailleret
Webtools4larbin by Andreas Beder
Methabot is a speed-optimized web crawler and command line utility written in C and released under a 2-clause BSD License. It features a wide configuration system, a module system and has support for targeted crawling through local filesystem, HTTP or FTP.
Jaeksoft WebSearch is a web crawler and indexer built over Apache Lucene. It is released under the GPL v3 license.
Nutch is a crawler written in Java and released under an Apache License. It can be used in conjunction with the Lucene text indexing package.
Pavuk is a command-line web mirror tool with an optional X11 GUI crawler, released under the GPL. It has a bunch of advanced features compared to wget and httrack, e.g. regular-expression-based filtering and file creation rules.
WebVac is a crawler used by the Stanford WebBase Project.
WebSPHINX (Miller and Bharat, 1998) is composed of a Java class library that implements multi-threaded web page retrieval and HTML parsing, and a graphical user interface to set the starting URLs, to extract the downloaded data and to implement a basic text-based search engine.
WIRE - Web Information Retrieval Environment is a web crawler written in C++ and released under the GPL, including several policies for scheduling the page downloads and a module for generating reports and statistics on the downloaded pages, so it has been used for web characterization.
LWP::RobotUA (Langheinrich, 2004) is a Perl class for implementing well-behaved parallel web robots, distributed under Perl 5's license.
Web Crawler Open source web crawler class for .NET (written in C#).
Sherlock Holmes Sherlock Holmes gathers and indexes textual data (text files, web pages, ...), both locally and over the network. Holmes is sponsored and commercially used by the Czech web portal Centrum. It is also used by Onet.pl.
YaCy, a free distributed search engine, built on principles of peer-to-peer networks (licensed under GPL).
Ruya Ruya is an Open Source, high performance breadth-first, level-based web crawler. It is used to crawl English and Japanese websites in a well-behaved manner. It is released under the GPL and is written entirely in the Python language. A SingleDomainDelayCrawler implementation obeys robots.txt with a crawl delay.
Universal Information Crawler is a fast-developing web crawler. It crawls, saves, and analyzes the data.
Agent Kernel is a Java framework for schedule, thread, and storage management when crawling.
Spider News: information regarding building a spider in Perl.
Arachnode.NET is an open-source promiscuous web crawler for downloading, indexing, and storing Internet content including e-mail addresses, files, hyperlinks, images, and web pages. Arachnode.net is written in C# using SQL Server 2005 and is released under the GPL.
dine is a multithreaded Java HTTP client/crawler that can be programmed in JavaScript, released under the LGPL.
Crawljax is an Ajax crawler based on a method which dynamically builds a `state-flow graph' modeling the various navigation paths and states within an Ajax application. Crawljax is written in Java and released under the BSD License.
Just because nobody copy-pasted a working command... I am trying... ten years later. :D
wget --mirror --convert-links --adjust-extension --page-requisites \
--no-parent http://example.org
It worked like a charm for me.
wget or curl can both recursively follow links and mirror an entire site, so that might be a good bet. You won't be able to use truly interactive parts of the site, like search engines, or anything that modifies the data, though.
Is it possible at all to create dummy backend services that can run from the sales folks' laptops, that the app can interface with?
You're not going to be able to handle things like AJAX requests without burning a webserver to the CD, which I understand you have already said is impossible.
wget will download the site for you (use the -r parameter for "recursive"), but any dynamic content like reports and so on of course will not work properly; you'll just get a single snapshot.
If you do end up having to run it off of a webserver, you might want to take a look at:
ServerToGo
It lets you run a WAMPP stack off of a CD, complete with MySQL/PHP/Apache support. The DBs are copied to the current user's temp directory on launch, and it can be run entirely without the user installing anything!