I have a dream to improve the world of distributed programming :)
In particular, I feel there's a lack of the necessary tools for debugging, monitoring, understanding and visualizing the behavior of distributed systems (heck, I had to write my own logger and visualizers to satisfy my requirements), and I'm writing a couple of such tools in my free time.
Community, what tools do you lack in this regard? Please describe one per answer, with a rough idea of what the tool would be supposed to do. Others can point out the existence of such tools, or someone might get inspired and write them.
OK, let me start.
A distributed logger with a high-precision global time axis - one that registers events from different machines in a distributed system with high precision, independently of clock offset and drift, and with sufficient scalability to handle the load of several hundred machines and several thousand logging processes. Such a logger makes it possible to find transport-level latency bottlenecks in a distributed system by seeing, for example, how many milliseconds it actually takes for a message to travel from the publisher to the subscriber through a message queue, etc.
Syslog is not ok because it's not scalable enough - 50000 logging events per second will be too much for it, and timestamp precision will suffer greatly under such load.
Facebook's Scribe is not ok because it doesn't provide a global time axis.
Actually, both syslog and Scribe register events under arrival timestamps, not under occurrence timestamps.
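For intuition, here is a minimal sketch (C++, made-up names - an illustration of the general idea, not Greg's actual implementation) of the NTP-style round-trip probe such a logger can use to estimate each machine's clock offset and map local event timestamps onto one global time axis:

    #include <cstdint>

    // One probe: the client stamps t0/t3 on its own clock; the server stamps
    // t1/t2 on its clock and echoes them back.
    struct Probe {
        std::int64_t t0;  // client time when the request was sent
        std::int64_t t1;  // server time when the request arrived
        std::int64_t t2;  // server time when the reply was sent
        std::int64_t t3;  // client time when the reply arrived
    };

    // Estimated offset of the server clock relative to the client clock,
    // assuming roughly symmetric network delay in both directions.
    std::int64_t estimate_offset(const Probe& p) {
        return ((p.t1 - p.t0) + (p.t2 - p.t3)) / 2;
    }

    // Map a raw client-side event timestamp onto the global (server) axis.
    std::int64_t to_global(std::int64_t local_ts, std::int64_t offset) {
        return local_ts + offset;
    }

Repeating such probes and filtering the estimates (e.g. keeping the ones with the smallest round-trip time) is what lets the global axis stay accurate despite clock drift.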
Honestly, I don't lack such a tool - I've written one for myself, I'm greatly pleased with it and I'm going to open-source it. But others might.
P.S. I've open-sourced it: http://code.google.com/p/greg
Dear Santa, I would like visualizations of the interactions between components in the distributed system.
I would like a visual representation showing:
The interactions among components, either as a UML collaboration diagram or sequence diagram.
Component shutdown and startup times as self-interactions.
On which hosts components are currently running.
Location of those hosts, if available, within a building or geographically.
Host shutdown and startup times.
I would like to be able to:
Filter the components and/or interactions displayed to show only those of interest.
Record interactions.
Display a desired range of time in a static diagram.
Play back the interactions in an animation, with typical video controls for playing, pausing, rewinding, fast-forwarding.
I've been a good developer all year, and would really like this.
Then again, see this question - How to visualize the behavior of many concurrent multi-stage processes?.
(I'm shamelessly referring to my own stuff, but that's because the problems solved by this stuff were important for me, and the current question is precisely about problems that are important for someone).
You could have a look at some of the tools that come with erlang/OTP. It doesn't have all the features other people suggested, but some of them are quite handy, and built with a lot of experience. Some of these are, for instance:
Debugger that can debug concurrent processes, including remote ones (AFAIR)
Introspection tools for mnesia/ets tables as well as process heaps
Message tracing
Load monitoring on local and remote nodes
Distributed logging and error-report system
Profiler that works in distributed scenarios
Process/task/application manager for distributed systems
These come of course in addition to the base features the platform provides, like node discovery, IPC protocol, RPC protocols & services, transparent distribution, distributed built-in database storage, global and node-local registry for process names, and all the other underlying stuff that makes the platform tick.
I think this is a great question and here's my 0.02 on a tool I would find really useful.
One of the challenges I find with distributed programming is in the deployment of code to multiple machines. Quite often these machines have slightly varying configurations or, worse, different application settings.
The tool I have in mind would be one that could, on demand, reach out to all the machines on which the application is deployed and provide system information. If one specifies a settings file or a resource like the registry, it would report those settings for all the machines. It could also look at the access privileges of the users running the application.
A refinement would be to provide an indication when settings do not match a master list provided by the developer. It could also flag servers that have differing configurations and provide diff functionality.
This would be really useful for .NET applications since there are so many configurations (machine.config, application.config, IIS Settings, user permissions, etc) that the chances of varying configurations are high.
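To make the core of the idea concrete, here is a minimal sketch of the "compare against a master list" step (the names and sample settings are invented, and collecting the per-machine data is assumed to have happened already):

    #include <iostream>
    #include <map>
    #include <string>

    using Settings = std::map<std::string, std::string>;

    // Report every setting that is missing or differs from the master list.
    void report_drift(const std::string& host,
                      const Settings& master, const Settings& actual) {
        for (const auto& [key, expected] : master) {
            auto it = actual.find(key);
            if (it == actual.end())
                std::cout << host << ": missing setting " << key << "\n";
            else if (it->second != expected)
                std::cout << host << ": " << key << " = " << it->second
                          << " (expected " << expected << ")\n";
        }
    }

    int main() {
        Settings master  = {{"maxConnections", "100"}, {"logLevel", "warn"}};
        Settings serverA = {{"maxConnections", "100"}, {"logLevel", "debug"}};
        report_drift("serverA", master, serverA);  // flags the logLevel drift
    }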
In my opinion, what is missing is a distributed programming platform...a platform that makes application programming over distributed systems as transparent as non-distributed programming is now.
Isn't it a bit early to work on tools when we don't even agree on a platform? We have several flavors of actor models, virtual shared memory, UMA, NUMA, synchronous dataflow, tagged-token dataflow, multi-hierarchical memory vector processors, clusters, message-passing mesh or network-on-a-chip, PGAS, DGAS, etc.
Feel free to add more.
To contribute:
I find myself writing a lot of distributed programs by constructing a DAG, which gets transformed into platform-specific code. Every platform optimization is a different set of transformation rules on this DAG. You can see the same happening in Microsoft's Accelerator and Dryad, Intel's Concurrent Collections, MIT's StreamIt, etc.
A language-agnostic library that collects all these DAG transformations would save re-inventing the wheel every time.
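As a rough illustration of what one entry in such a library might look like (the node representation and the "fuse adjacent maps" rule below are invented for the example, not taken from any of the systems named above):

    #include <memory>
    #include <string>
    #include <vector>

    // A DAG node: an operation plus its upstream inputs.
    struct Node {
        std::string op;                             // e.g. "Map", "Reduce"
        std::vector<std::shared_ptr<Node>> inputs;  // upstream nodes
    };
    using NodePtr = std::shared_ptr<Node>;

    // One platform-independent rewrite rule: Map over Map becomes one fused
    // Map. A real library would keep a registry of such rules and let each
    // backend pick the ones that pay off on its platform.
    NodePtr fuse_maps(const NodePtr& n) {
        if (n->op == "Map" && n->inputs.size() == 1 &&
            n->inputs[0]->op == "Map") {
            auto fused = std::make_shared<Node>();
            fused->op = "FusedMap";
            fused->inputs = n->inputs[0]->inputs;   // skip the inner Map
            return fused;
        }
        return n;                                   // rule does not apply
    }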
You can also take a look at Akka:
http://akka.io
Let me notify those who've favourited this question by pointing to the Greg logger - http://code.google.com/p/greg . It is the distributed logger with a high-precision global time axis that I've talked about in the other answer in this thread.
Apart from the mentioned tool for "visualizing the behavior of many concurrent multi-stage processes" (splot), I've also written "tplot" which is appropriate for displaying quantitative patterns in logs.
A large presentation about both tools, with lots of pretty pictures here.
What reasons are there against (or for) using the features of an Enterprise Service Bus when building an overall service adhering to a microservice architecture (http://martinfowler.com/articles/microservices.html)? Why should we use dumb pipes and smart endpoints, as opposed to using smarter pipes and being able to develop simpler services?
This is a huge question and probably can't be answered effectively in SO's Q&A format.
It depends what you are doing with it.
If you are building a single product which consists of lots of small pieces of function that can be thought of as being independent, then microservices may be the way to go.
If you are a large enterprise organisation where IT is not seen by the board of directors as a competitive advantage, where you work in a heavily regulated industry in which new standards have to be applied across globally distributed projects with their own IT departments (some from new acquisitions), and where you can't centrally control all the endpoints and applications within your organisation, then maybe you need an ESB.
I don't want to be accused of trying to list ALL the advantages of both approaches here as they wouldn't be complete and may be out of date quickly.
Having said that, in an effort to be useful to the OP:
If you look up how Spotify and Netflix do microservices you can find many things they like about the approach, including but not limited to: ease of blue/green deployment of individual services, decoupled team structures, and isolation of failures.
ESBs allow you to centrally administer and enforce policies, such as legal requirements, audit everything in one place rather than hoping each team got the memo about logging everything, and provide global statistics about load and uptime, among many other things. ESBs grew out of large enterprises where the driver was not customer response time on a website or speed of innovation, but Service Level Agreements, cost effectiveness and regulations (amongst other things).
There is a lot of value in both approaches. Microservices are being written about a lot at the moment, just as ESBs were 10-15 years ago. Maybe that's a progression, maybe it's just a change, maybe it's just that consumer product companies need to market themselves and large enterprises like to keep details private. We may find out in another 10 years. For now, it depends heavily on what you are doing. As with most things in programming, I'd start out simple and only move to the more complex solution if you need to.
The term ESB has gotten overloaded, primarily in the Java world, to mean a big and complex piece of infrastructure that ends up hosting a bunch of poorly implemented logic in a central place.
Lighter-weight technologies like Apache Camel or NServiceBus don't encourage this kind of approach and indeed follow the "dumb pipes / smart endpoints" approach that has served as the backbone of the internet from the beginning.
NServiceBus specifically focuses on providing a higher level framework than most messaging libraries to make it easier to build smart endpoints that are more reliable through its deeper support for once-and-only-once message processing.
Full disclosure - I'm the founder of NServiceBus.
Because services are isolated and pipes are reused.
The core idea of microservices is isolation - any part of the system can be replaced without affecting other services. Smart pipes means they have configuration, they have state, they have complex (which often means hard-to-predict) behavior. Thus, smart pipes are less likely to retain their exact behavior over time.
But a pipe change will affect every service attached to it, while a service change affects only the services that use it.
The problem with how an ESB is used is that it creates coupling between the ESB and the services by having some business logic built into the ESB. This makes it more difficult to deploy a single service independently, and it progressively makes the ESB more complex and difficult to maintain.
I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?
Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of that, from consuming too much memory or CPU, or from doing anything else malicious.
I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.
codepad.org has something based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about .
I've previously used Systrace, another utility for restricting system calls.
If the policy is set up properly, the untrusted program would be prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there might be no need to put programs in separate chroots and create and delete them for each run. Although that would provide another layer of protection, which probably wouldn't hurt.
Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:
Isolation and access control granularity
Performance and ease of installation/configuration
I eventually decided on a multi-tiered architecture, based on Linux:
Level 0 - Virtualization:
By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
Clear separation of sensitive from non-sensitive data.
At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
Straight-forward network filtering: By having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connection only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.
Level 1 - Basic operating-system constraints:
On a Unix OS that would contain the traditional access and resource control mechanisms:
Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
Strict user permissions, possibly with ACLs.
ulimit resource limits on processor time and memory usage.
Execution under nice to reduce priority over more critical processes. On Linux you could also use ionice and cpulimit - I am not sure what equivalents exist on other systems.
Disk quotas.
Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user: more memory and CPU time, access to compiler tools and header files, etc.
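As a minimal sketch of how several of these Level 1 constraints can be combined (the paths and user IDs are placeholders, there is no error recovery, and this is deliberately not production-hardened), a small launcher might look like this:

    #include <sys/resource.h>
    #include <unistd.h>

    int main() {
        // ulimit-style limits: CPU seconds and address space.
        rlimit cpu{10, 10};                    // 10 s of CPU time
        rlimit mem{256UL << 20, 256UL << 20};  // 256 MB of address space
        setrlimit(RLIMIT_CPU, &cpu);
        setrlimit(RLIMIT_AS, &mem);

        // Confine the process to an (assumed empty) directory, then drop root.
        if (chroot("/var/sandbox/run1") != 0 || chdir("/") != 0) return 1;
        if (setgid(65534) != 0 || setuid(65534) != 0) return 1;  // nobody

        nice(10);  // reduce priority relative to more critical processes

        // Run the untrusted binary; stdin/stdout stay connected to the caller,
        // which is how the submission system communicates with it over pipes.
        execl("/program", "program", static_cast<char*>(nullptr));
        return 1;  // exec failed
    }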
Level 2 - Advanced operating-system constraints:
On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux to limit access to specific files and/or system calls. Some Linux distributions offer some sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.
Level 3 - User-space sandboxing solutions:
I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.
Level 4 - Preemptive strikes:
Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
Static code analysis; disallow assembly code, "weird" string literals (e.g. shell-code) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
Levels 0-4 - Monitoring and logging:
You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
calling whatever security officials are in charge of such issues.
finding that persistent little hacker of yours and offering them a job.
The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.
I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.
Some additional comments on @thkala's answer:
it is fair to classify libsandbox as a user-land tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quotas);
restricting access to C/C++ headers, or static analysis of users' code, does NOT prevent system functions like connect() from being called. This is because user code can (1) declare function prototypes by itself without including system headers, or (2) invoke the underlying, kernel-land system calls without touching the wrapper functions in libc;
compile-time protection also deserves attention, because malicious C/C++ code can exhaust your CPU with infinite template recursion or pre-processing macro expansion.
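To make that last point concrete, this is the kind of deliberately pathological input it warns about: the compiler recurses until it hits its template-depth limit, or burns large amounts of CPU and memory if that limit has been raised, which is why the compiler itself should run under time and memory limits. (Hypothetical example, obviously not meant to be built unguarded.)

    // Compile-time resource bomb: each instantiation of Bomb<N> requires
    // Bomb<N + 1>, so the chain never terminates on its own.
    template <unsigned long long N>
    struct Bomb {
        static const unsigned long long value = Bomb<N + 1>::value;
    };

    // Merely naming Bomb<0>::value forces the whole instantiation chain;
    // nothing ever needs to run for the damage to be done at compile time.
    int main() {
        return static_cast<int>(Bomb<0>::value & 1);
    }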
Are there benefits to developing an application on two or more different platforms? Does using a different compiler on even the same platform have benefits?
Yes, especially if you plan to distribute your code for multiple platforms.
But even if you don't, cross-platform development is a form of future-proofing; if it runs on multiple (diverse) platforms today, it's more likely to run on future platforms than something that was tuned, tweaked, and specialized to work on a version 7.8.3 clean install of vendor X's Q-series boxes (patch level 1452) and nothing else.
There seems to be a benefit in finding and simply preventing bugs with a different compiler and a different OS. Different CPUs can pin down endian issues early. There is the pain at the GUI level if you want to stay native at that level.
Short answer: Yes.
Short of cloning a disk, it is almost impossible to make two systems exactly alike, so you are going to end up running on "different platforms" whether you meant to or not. By specifically confronting and solving the "what if system A doesn't do things like B?" problem head on you are much more likely to find those key assumptions your code makes.
That said, I would say you should get a good chunk of your base code working on system A, and then take a day (or a week or ...) and get it running on system B. It can be very educational.
My education came back in the 80's when I ported a source level C debugger to over 100 flavors of U*NX. Gack!
Are there benefits to developing an application on two or more different platforms?
If this is production software, the obvious reason is the lure of a larger client base. Your product's appeal is magnified the moment the client hears that you support multiple platforms. Remember, most enterprises do not use a single OS or even a single version of the OS. It is fairly typical to find one section using Windows, another using Mac, and a smaller one using some flavor of Linux.
It is also often seen that customizing a product for a single platform is far more tedious than having it run on multiple platforms. The law of diminishing returns kicks in even before you know it.
Of course, all of this makes little sense, if you are doing customization work for an existing product for the client's proprietary hardware. But even then, keep an eye out for the entire range of hardware your client has in his repertoire -- you never know when he might ask for it.
Does using a different compiler on even the same platform have benefits?
Yes, again. Different compilers implement different extensions. See to it that you are not dependent on a particular version of a particular compiler.
Further, there may be a bug or two in the compiler itself. Using multiple compilers helps sort these out.
I have also seen parts of a (cross-platform) product built with two different compilers -- one was used in those modules where floating-point manipulation required a very high level of accuracy. (It's been a while since I've heard of anyone else doing that, but ...)
I've ported a large C++ program, originally Win32, to Linux. It wasn't very difficult. Mostly it was a matter of dealing with compiler incompatibilities, because the MS C++ compiler at the time was non-compliant in various ways. I expect that problem has mostly gone away now (until C++0x features start gradually appearing). I also wrote a simple platform abstraction library to centralize the platform-specific code in one place. It depends to what extent you are dependent on services from the OS that would be hard to mimic on a new platform.
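To illustrate, such a platform abstraction layer typically boils down to something like this (the function names are invented for the example, not taken from the project described above): one neutral API, with the #ifdef blocks kept in a single place instead of scattered through the code base.

    #if defined(_WIN32)
      #include <windows.h>
    #else
      #include <time.h>
    #endif

    namespace platform {

    // Sleep for the given number of milliseconds, whatever the OS calls it.
    inline void sleep_ms(unsigned ms) {
    #if defined(_WIN32)
        Sleep(ms);
    #else
        timespec ts;
        ts.tv_sec  = ms / 1000;
        ts.tv_nsec = static_cast<long>(ms % 1000) * 1000000L;
        nanosleep(&ts, nullptr);
    #endif
    }

    // The native path separator, so file-handling code stays portable.
    inline char path_separator() {
    #if defined(_WIN32)
        return '\\';
    #else
        return '/';
    #endif
    }

    }  // namespace platform

The rest of the code calls only platform::sleep_ms and friends, so a new OS means touching one file rather than the whole program.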
You don't have to build portability in from the ground up. That's why "porting" is often described as an activity you can perform in one shot after an initial release on your most important platform. You don't have to do it continuously from the very start. Purely for economic reasons, if you can avoid doing work that may never pay off, obviously you should. The cost of porting later on, when really necessary, turns out to be not that bad.
Mostly, there is an existing platform that the application (custom software) is written for. But you address more developers (on both platforms) if you decide to use a platform-independent language.
Also, products (standard software) for SMEs can be sold better if they run on different platforms! You can gain access to both markets, Windows & Linux (and Mac OS X and so on...).
Big companies mostly buy only hardware which is supported/certified by the product vendor in order to deploy the specified product.
If you develop on multiple platforms at the same time you get the advantage of being able to use different tools. For example, I once had a memory overwrite (I still swear I didn't need the +1 for the null byte!) that caused "free" to crash. I brought the code up to speed on Windows and found the overwrite in about 1 minute with Rational Purify... it had taken me a week of chasing it under Linux (valgrind might have found it... but I didn't know about it at the time).
Different compilers on the same or different platforms is, to me, a must as each compiler will report different things, and sometimes the report from one compiler about an error will be gibberish but the other compiler makes it very clear.
Using things like multiple databases while developing means you are much less likely to tie yourself to a particular database, which means you can swap out the database if there is a reason to do so. If you want to integrate something that uses Oracle into an existing infrastructure that uses SQL Server, for example, it can really suck - much better if the Oracle or SQL Server pieces can be moved to the other system (I know of some places that have 3 different databases for their financial systems... ick).
In general, always developing for two or three things means that the odds of you finding mistakes are better, and the odds of the system being more flexible are better.
On the other hand all of that can take time and effort that, at the immediate time, is seen as an unneeded expense.
Some platforms have really dreadful development tools. I once worked in an IB where, rather than use Sun's ghastly toolset, people developed code in VC++ and then ported it to Solaris.
I have been doing a little reading on Flow-Based Programming over the last few days. There is a wiki which provides further detail, and Wikipedia has a good overview of it too. My first thought was, "Great, another proponent of lego-land pretend programming" - a concept harking back to the late 80's. But as I read more, I must admit I have become intrigued.
Have you used FBP for a real project?
What is your opinion of FBP?
Does FBP have a future?
In some senses, it seems like the holy grail of reuse that our industry has pursued since the advent of procedural languages.
1. Have you used FBP for a real project?
We've designed and implemented a DF server for our automation project (dispatcher, component interface, a bunch of components, DF language, DF compiler, UI). It is written in bare C++, and runs on several Unix-like systems (Linux x86, MIPS, avr32 etc., Mac OS X). It lacks several features, e.g. sophisticated flow control and complex thread control (there is only a not-too-advanced component for that), so it is just a prototype, even though it works. We're now working on a full-featured server. We've learnt a lot during implementing and using the prototype.
Also, we'll make a visual editor some day.
2. What is your opinion of FBP?
2.1. First of all, dataflow programming is ultimate fun
When I met dataflow programming, I felt like I did 20 years ago, when I first met programming. Although DF programming differs from procedural/OOP programming, it's still just a kind of programming. There are lots of things to discover, even very simple ones! It's very funny when, as an experienced programmer, you meet a DF problem that is a very, very basic thing, yet it was completely unknown to you before. So if you jump into DF programming, you will feel like a rookie programmer who has just met the "loop" or the "condition".
2.2. It can be used only for specific architectures
It's just a hammer, which is for hammering nails. DF is not suitable for UIs, web servers and so on.
2.3. Dataflow architecture is optimal for some problems
A dataflow framework can do magic things. It can parallelize procedures that were not originally designed for parallelization. Components are single-threaded, but when they're organized into a DF graph, they become multi-threaded.
Example: did you know that make is a DF system? Try make -j (see the man page for what -j does). If you have a multi-core machine, compile your project with and without -j, and compare the times.
2.4. Optimal split of the problem
If you're writing a program, you often split up the problem into smaller sub-problems. There are usual split points for well-known sub-problems, which you don't need to implement - just use the existing solutions, like SQL for the DB, or OpenGL for graphics/animation, etc.
A DF architecture splits your problem in a very interesting way:
the dataflow framework, which provides the architecture (just use an existing one),
the components: the programmer creates components; the components are simple, well-separated units - it's easy to make components;
the configuration: a.k.a. dataflow programming: the configurator puts the dataflow graph (program) together using components provided by the programmer.
If your component set is well-designed, the configurator can build systems which the programmer has never even dreamed about. The configurator can implement new features without disturbing the programmer. Customers are happy, because they have a personalised solution. The software manufacturer is also happy, because he/she doesn't need to maintain several customer-specific branches of the software, just customer-specific configurations.
2.5. Speed
If the system is built on native components, the DF program is fast. The only time loss is the message dispatching between components, and compared to a simple OOP program it is minimal.
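To make the "single-threaded components become a multi-threaded graph" point concrete, here is a minimal sketch (plain C++11, not any particular DF framework; the names are invented) of two components connected by a bounded buffer - the sender blocks while the buffer is full, the receiver while it is empty, and neither component knows anything about threads:

    #include <condition_variable>
    #include <cstddef>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    template <typename T>
    class BoundedBuffer {
    public:
        explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

        void send(T value) {                       // blocks while full
            std::unique_lock<std::mutex> lock(m_);
            not_full_.wait(lock, [&] { return q_.size() < capacity_; });
            q_.push(std::move(value));
            not_empty_.notify_one();
        }

        T receive() {                              // blocks while empty
            std::unique_lock<std::mutex> lock(m_);
            not_empty_.wait(lock, [&] { return !q_.empty(); });
            T value = std::move(q_.front());
            q_.pop();
            not_full_.notify_one();
            return value;
        }

    private:
        std::size_t capacity_;
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable not_full_, not_empty_;
    };

    int main() {
        BoundedBuffer<int> wire(4);                // the "line" in the DF graph

        std::thread producer([&] {                 // component 1: generate packets
            for (int i = 1; i <= 10; ++i) wire.send(i);
            wire.send(-1);                         // end-of-stream marker
        });
        std::thread consumer([&] {                 // component 2: transform/print
            for (int v = wire.receive(); v != -1; v = wire.receive())
                std::cout << v * v << "\n";
        });

        producer.join();
        consumer.join();
    }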
3. Does FBP have a future?
Yes, sure.
The main reason is that it can solve massive multiprocessing issues without introducing brand new strange software architectures or weird languages. Dataflow programming is easy, and I mean both parts: component programming and dataflow configuration building. (Even dataflow framework writing is not rocket science.)
Also, it's very economical. If you have a good set of components, you only need to put the Lego bricks together. A DF program is easy to maintain. Building the DF configuration requires no experienced programmer, just a system integrator.
I would be happy if native DF systems spread, with the door open for creating custom components. Also, there should be a standard DF language, so that it could be used with platform-independent visual editors and several DF servers.
Interesting discussion! It occurred to me yesterday that part of the confusion may be due to the fact that many different notations use directed arcs, but use them to mean different things. In FBP, the lines represent bounded buffers, across which travel streams of data packets. Since the components are typically long-running processes, streams may comprise huge numbers of packets, and FBP applications can run for very long periods - perhaps even "perpetually" (see a 2007 paper on a project called Eon, mostly by folks at UMass Amherst). Since a send to a bounded buffer suspends when the buffer is (temporarily) full, and a receive suspends when it is (temporarily) empty, indefinite amounts of data can be processed using finite resources.
By comparison, the E in Grafcet comes from Etapes, meaning "steps", which is a rather different concept. In this kind of model (and there are a number of these out there), the data flowing between steps is either limited to what can be held in high-speed memory at one time, or has to be held on disk. FBP also supports loops in the network, which is hard to do in step-based systems - see for example http://www.jpaulmorrison.com/cgi-bin/wiki.pl?BrokerageApplication - notice that this application used both MQSeries and CORBA in a natural way. Furthermore, FBP is natively parallel, so it lends itself to programming of grid networks, multicore machines, and a number of the directions of modern computing. One last comment: in the literature I have found many related projects, but few of them have all the characteristics of FBP. A list that I have amassed over the years (a number of them closer than Grafcet) can be found in http://www.jpaulmorrison.com/cgi-bin/wiki.pl?FlowLikeProjects .
I do have to disagree with the comment about FBP being just a means of implementing FSMs: I think FSMs are neat, and I believe they have a definite role in building applications, but the core concept of FBP is of multiple component processes running asynchronously, communicating by means of streams of data chunks which run across what are now called bounded buffers. Yes, definitely FSMs are one way of building component processes, and in fact there is a whole chapter in my book on FBP devoted to this idea, and the related one of PDAs (1) - http://www.jpaulmorrison.com/fbp/compil.htm - but in my opinion an FSM implementing a non-trivial FBP network would be impossibly complex. As an example the diagram shown in
is about 1/3 of a single batch job running on a mainframe. Every one of those blocks is running asynchronously with all the others. By the way, I would be very interested in hearing more answers to the questions in the first post!
1: http://en.wikipedia.org/wiki/Pushdown_automaton Push-down automata
Whenever I hear the term flow-based programming I think of LabView, conceptually. I.e. component processes whose scheduling is driven primarily by a change to their input data. This really IS lego programming in the sense that the LabView platform was used for the latest crop of Mindstorms products. However, I disagree that this makes it a less useful programming model.
For industrial systems, which typically involve data collection, control, and automation, it fits very well. What is any control system if not data in transformed into data out? I.e. what component in your control scheme would you not prefer to represent as a black box in a bigger picture, if you could do so? To achieve that level of architectural clarity using other methodologies you might have to draw a data-domain class diagram, then a problem-domain run-time class relationship diagram, then on top of that a use-case diagram, and flip back and forth between them. With flow-driven systems you have the luxury of being able to collapse a lot of this information together accurately enough that you can realistically design a system visually once the components are built and defined.
One question I never had to ask when looking at an application written in LabView is "What piece of code set this value?", as it was inherent and easy to trace backwards from the data, and mistakes like multiple unintended writers were impossible to create by accident.
If only that was true of code written in a more typically procedural fashion!
1) I built a small FBP framework for an anomaly detection project, and it turned out to have been a great idea.
You can also have a look at some of the KNIME videos, which give a good idea of what a flow-based framework feels like when the framework is put together by a great team. Admittedly, it is batch based and not created for continuous operation.
By far the best example of flow-based programming, however, is UNIX pipes, one of the oldest and most overlooked FBP frameworks. I don't think I have to elaborate on the power of *nix pipes... (there is a minimal sketch after this answer).
2) FBP is a very powerful tool for a large set of problems. The intrinsic parallelism is a great advantage, and any FBP framework can be made completely network transparent by using adapter modules. Smart frameworks are also absurdly fault tolerant, and able to dynamically reload crashed modules when necessary. The conceptual simplicity also allows cleaner communication with everybody involved in a project, and much cleaner code.
3) Absolutely! Pipes are here to stay, and are one of the most powerful features of Unix. The powers inherent in an FBP framework compared to a static program are many, and they trivialise change, to the point where some frameworks can be reconfigured while running with no special measures.
FBP FTW! ;-)
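As mentioned in point 1, the pipe idea fits in a handful of lines. This sketch (an arbitrary ls | wc -l pipeline, POSIX-only) shows two black-box components running concurrently, connected by a kernel-managed bounded buffer:

    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
        int fd[2];
        if (pipe(fd) != 0) return 1;

        if (fork() == 0) {                 // upstream component: ls
            dup2(fd[1], STDOUT_FILENO);    // its stdout feeds the pipe
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", static_cast<char*>(nullptr));
            _exit(127);
        }
        if (fork() == 0) {                 // downstream component: wc -l
            dup2(fd[0], STDIN_FILENO);     // its stdin drains the pipe
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", static_cast<char*>(nullptr));
            _exit(127);
        }
        close(fd[0]); close(fd[1]);
        while (wait(nullptr) > 0) {}       // reap both components
    }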
In automotive development, there is a language-agnostic messaging protocol which is part of the MOST specification (Media Oriented Systems Transport); it was designed for communication between components over a network or within the same device. Systems usually have both a real and a virtualized message bus - therefore you effectively have a form of flow-based programming.
That was what made the light bulb go on for me several years ago and brought me here. It really is a fantastic way to work, and so much more fun than conventional programming. The message catalog forms the central specification and point of reference. It works well for both developers and management, i.e. management are able to browse the message catalog instead of looking at source.
With integrated logging also referencing the catalog to produce intelligible analysis, things can get really productive. I have real-world experience of developing commercial products in this way. I am interested in taking things further, particularly with regard to tools and IDEs. Unfortunately I think many people within the automotive sector have missed the point about how great this is and have failed to build on it. They are now distracted by other fads and have failed to realize that there was far more to MOST development than the physical bus.
I've used Spring Web Flow extensively in Java web applications to model (typically) application processes, which tend to be complex wizard-like affairs with lots of conditional logic as to which pages to display. It's incredibly powerful. A new product was added and I managed to recut the existing pieces into a completely new application process in an hour or two (adding a couple of new views/states).
I also looked into using OS Workflow to model business processes but that project got canned for various reasons.
In the Microsoft world you have Windows Workflow Foundation ("WWF"), which is becoming more popular, particularly in conjunction with Sharepoint.
FBP is just a means of implementing a finite state machine. It's nothing new.
I realize that it is not exactly the same thing, but this model has been used for years in PLC programming. ISO calls it Sequential Function Chart, but many people call it Grafcet after a popular implementation. It offers parallel processing and defines transitions between states.
It's being used in the Business Intelligence world these days to mash up and process data. Data processing steps like ETL, querying, joining, and producing reports can be done by the end user. I'm a developer on an open system - ComposableAnalytics.com. In CA, the flow-based apps can be shared and executed via the browser.
This is what MQ Series, MSMQ and JMS are for.
This is cornerstone of Web Services and Enterprise Service Bus implementations.
Products like TIBCO and Sun's JCAPS are basically flow-based without using this particular buzz-word.
Most of the work of the application is done with small modules that pass messages through a processing network.
I know that there is no way to fully protect our code.
I also know that if a user wants to crack our app, then he or she is not a user that would buy our app.
I also know that it is better to improve our app... instead of being afraid of anti-cracking techniques.
I also know that there is no commercial tool that can protect our app....
I also know that....
Ok. Enough. I've heard everything.
I really think that adding a little protection won't hurt.
So.... have you ever used Code Virtualizer from Oreans or VMProtect?
I've heard that they are sometimes detected as a virus by some antivirus programs.
Any experiences that I should be aware of before buying it?
I know it creates some virtual machines and obfuscates the code a little to make it harder to find the weaknesses of our registration routines.
Are there any warnings I should know about?
Thanks.
Any advice would be appreciated.
Jag
In my humble opinion, you should be lucky or even eager to be pirated, because that means your product is successful and popular.
That's plain incorrect. My software that I worked many months on was cracked the moment it was released. There are organised cracking groups that feed off download.com's RSS channel etc and crack each app that appears. It's a piece of cake to extract the keygen code of any app, so my response was to:
a) resort to digital certificate key files, which are impossible to forge as they are signed with a private key and validated by a public key embedded in the app (see: aquaticmac.com - I use the STL C++ implementation, which is cross-platform), along with
b) the excellent Code Virtualizer™. I will say that the moment I started using Code Virtualizer™ I was getting some complaints from one or two users about app crashes. When I removed it from their build the crashes ceased. Still, I'm not sure whether it was a problem with CV per se, as it could have been an obscure bug in my code, but I have since reshuffled my code and heard no complaints.
After the above, no more cracks. Some people look at being cracked as a positive thing, as it's a free publicity channel, but those people usually haven't spent months/years on an idea only to find they're being ripped off. It's quite hard to take.
Unfortunately, VM-protected software is more likely to get hit by false positives than software using conventional packers. The reason is that since VM protection is so complicated, AV software is often unable to analyze the protected code, and may either rely on pattern libraries or issue generic warnings for any file protected by a system it can't analyze. If your priority is to eliminate false positives, I suggest picking a widely-used protection solution, e.g. AsProtect (although Oreans' products are becoming quite popular as well).
Software VM protection is quite popular today, especially as it's now available at an accessible price for small companies and independent software developers. It also takes a considerable amount of effort to crack in comparison to non-VM techniques - the wrappers usually have the standard anti-debugging tricks that other protections have, as well as the VM protection. Since the virtual machine is generated randomly on each build, the crackers will need to analyze the VM instruction set and reverse engineer the protected code back to machine code.
The main disadvantage of VM protection is that if it's overused (used to protect excessive parts of the code), it can slow down your application considerably - so you'll need to protect just the critical parts (registration checks, etc). It also doesn't apply to certain application types - it likely won't work on DLLs that are used for injection, as well as device drivers.
I've also heard that StrongBit EXECryptor is a decent protection package at a decent price. (I'm not affiliated with said company nor do I guarantee any quality whatsoever - it's just word of mouth and worth checking out, IMO.)