How to increase efficiency of Unit testing? - junit

We are creating unit test cases for our existing code base, while progressing through the creation of the test cases the test files are getting bigger in size and are taking very long time in execution.
I know the limitations of unit testing and I did some research also to increase efficiency. While research I found one useful idea to tighten up the provided data set.
Still I am looking for some more ideas on how I can increase efficiency of running/creating the unit test cases? We can keep the option to increase the server resources outside of this scope.

As your question was general I'll cover a few of the common choices. But most of the speed-up techniques have downsides.
If you have dependencies on external components (web services, file systems, etc.) you can get a speed-up by mocking them. This is usually desirable for unit testing anyway. You still need to have integration/functional tests that test with the real component.
If testing databases, you can get quite a speed-up by using an in-memory database (sqlite works well with PHP's PDO; with java maybe H2?). This can have downsides, unless database portability is already a design goal. (I'm about to move to running one set of unit tests against both MySQL and sqlite.) Mocking the database away completely (see above) may be better.
PHPUnit allows you to specify #group on each test. You could go through and mark your slower tests with #group slow, and then use the --exclude-group commandline flag to exclude them on most of your tests runs, and just include them in the overnight build. (You can also specify groups to include/exclude in your phpunit.xml.dist file.
(I don't think jUnit has this option, but TestNG does; for C#, NUnit offers categories for this.)
Creating fixtures once, and then sharing them between tests is quicker than creating the fixture before each test. The XUnit Test Patterns devotes whole chapters to the pros and cons (mostly cons) of this approach.
I know throwing hardware at it was explicitly forbidden in your question, but look again at #group, and consider how it can allow you to split your tests across multiple machines. Or splitting tests by directory, and processing one directory on each of multiple machines on your LAN. (PHPUnit is single-threaded, so you could run multiple instances on the same machine, each doing its own directory: be aware of how fixtures need to be independent (including unique names for databases you create, mocking the filesystem, etc.) if you go down this route.)

Related

Where to create KnowledgeBase in a Drools unit test?

Brief
I'm looking to write unit tests in JUnit to check individual drools rules. A unit test should be simple to write and fast to run. If the unit tests are slow then developers will avoid running them and the build will become excessively slow. With that in mind I'm trying to figure out the best (fastest to execute and easiest to write) method of writing these unit tests.
First attempt
The first option I tried was to create the KnowledgeBase as a static class attribute and initialized on one .drl file. Each test then creates a new session in the #Before method. This was based on the code examples in the Drools JBoss rules developer guide.
I've seen a second option that tidies this up a bit by creating some annotations to abstract the initialization code but it's basically the same approach.
I noticed that this basic unit test on one .drl file was taking a couple of seconds to run. This isn't too bad with one unit test but once it's scaled up I can see it being a problem. I did some reading and found that the KnowledgeBase is expensive to create, whereas the session is cheap. This is why the examples have the KnowledgeBase as static so it's created only once, however, with multiple unit test classes it will potentially be created many times.
Alternative
The alternative I tried is to create a singleton KnowledgeBase that will load all .drl files. This will be done once globally for the test suite and then autowired into each test class. I used a spring #Configuration class and defined the KnowledgeBase as an #Bean. I now find that the KnowledgeBase takes 2 seconds (but runs only once), the session creation takes about 0.2 seconds and the test itself takes no time at all.
It seems as if the spring approach may scale better but I'm not sure if I will get other problems with testing a single rule but using a KnowledgeBase initialized on all files? I'm using an AgendaFilter to target the specific rule I want to test. Also I've searched online quite a bit and haven't found anybody else doing it this way.
Summary
What will be the most scalable way of writing these tests? Thanks
This is a very nice collection of experiences. Let me contribute some ideas.
If you have a scenario where a large number of rules is essential to test the outcome because rules vie with each other, you should create the KieBase containing all and serialize it, once. For individual tests, either derive a session from it for each test, inserting facts and firing all rules, or, alternatively, derive the session up front, and run tests, clearing the session (!), inserting facts and firing all rules.
For testing a single rule or a small (<10) set of rules, compiling the DRL from scratch for each set of tests may be tolerable, especially when you adopt the strategy to reuse the session (!) by clearing Working Memory between individual tests.
Some consideration should also be given to rule design. Excessive iterative algorithms should not be implemented in DRL by hook or by crook; using a DRL function or some (static) Java method may be far superior. And testing such as these is much easier.
Also, following established rule design patterns helps a lot. Google "Design Patterns in Production Systems".

How about an Application Centralized Configuration Management System?

We have a build pipeline to manage the artifacts' life cycle. The pipline is consist of four stages:
1.commit(runing unit/ingetration tests)
2.at(deploy artifact to at environment and runn automated acceptance tests)
3.uat(deploy artifact to uat environment and run manual acceptance tests)
4.pt(deploy to pt environment and run performance tests)
5.//TODO we're trying to support the production environment.
The pipeline supports environment varialbles so we can deploy artifacts with different configurations by triggerting with options. The problem is sometimes there are too many configuration items making the deploy script contains too many replacement tasks.
I have an idea of building a centralized configuration managment system (ccm for short name), so we can maintain the configuration items over there and leave only a url(connect to the ccm) replacement task (handling different stages) in the deploy script. Therefore, the artifact doesnt hold the configuration values, it asks the ccm for them.
Is this feasible or a bad idea of the first place?
My concern is that the potential mismatch between the configuration key (defined in the artifact) and value (set in the ccm) is not solved by this solution and may even worse.
Configuration files should remain with the project or set as configuration variables where they are run. The reasoning behind this is that you're adding a new point of failure in your architecture, you have to take into account that your configuration server could go down thus breaking everything that depends on it.
I would advice against putting yourself in this situation.
There is no problem in having a long list of environment variables defined for a project, besides that could even mean you're doing things properly.
If for some reason you find yourself changing configuration files a lot (for ex. database connection strings, api ednpoints, etc...) then the problem might be this need to change a lot configurations, which should stay almost always the same.

Drools testing with junit

What is the best practice to test drools rules with junit?
Until now we used junit with dbunit to test rules. We had sample data that was put to hsqldb. We had couple of rule packages and by the end of the project it is very hard to make a good test input to test certain rules and not fire others.
So the exact question is that how can I limit tests in junit to one or more certain rule(s) for testing?
Personally I use unit tests to test isolated rules. I don't think there is anything too wrong with it, as long as you don't fall into a false sense of security that your knowledge base is working because isolated rules are working. Testing the entire knowledge base is more important.
You can write the isolating tests with AgendaFilter and StatelessSession
StatelessSession session = ruleBase.newStatelessSesssion();
session.setAgendaFilter( new RuleNameMatches("<regexp to your rule name here>") );
List data = new ArrayList();
... // create your test data here (probably built from some external file)
StatelessSessionResult result == session.executeWithResults( data );
// check your results here.
Code source: http://blog.athico.com/2007/07/my-rules-dont-work-as-expected-what-can.html
Do not attempt to limit rule execution to a single rule for a test. Unlike OO classes, single rules are not independent of other rules, so it does not make sense to test a rule in isolation in the same way that you would test a single class using a unit test. In other words, to test a single rule, test that it has the right effect in combination with the other rules.
Instead, run tests with a small amount of data on all of your rules, i.e. with a minimal number of facts in the rule session, and test the results and perhaps that a particular rule was fired. The result is not actually that much different from what you have in mind, because a minimal set of test data might only activate one or two rules.
As for the sample data, I prefer to use static data and define minimal test data for each test. There are various ways of doing this, but programmatically creating fact objects in Java might be good enough.
I created simple library that helps to write unit tests for Drools. One of the features is exactly what you need: declare particular drl files you want to use for your unit test:
#RunWith(DroolsJUnitRunner.class)
#DroolsFiles(value = "helloworld.drl", location = "/drl/")
public class AppTest {
#DroolsSession
StatefulSession session;
#Test
public void should_set_discount() {
Purchase purchase = new Purchase(new Customer(17));
session.insert(purchase);
session.fireAllRules();
assertTrue(purchase.getTicket().hasDiscount());
}
}
For more details have a look on blog post: https://web.archive.org/web/20140612080518/http://maciejwalkowiak.pl/blog/2013/11/24/jboss-drools-unit-testing-with-junit-drools/
A unit test with DBUnit doesn't really work. An integration test with DBUnit does. Here's why:
- A unit test should be fast.
-- A DBUnit database restore is slow. Takes 30 seconds easily.
-- A real-world application has many not null columns. So data, isolated for a single feature, still easily uses half the tables of the database.
- A unit test should be isolated.
-- Restoring the dbunit database for every test to keep them isolated has drawbacks:
--- Running all tests takes hours (especially as the application grows), so no one runs them, so they constantly break, so they are disabled, so there is no testing, so you application is full of bugs.
--- Creating half a database for every unit test is a lot of creation work, a lot of maintenance work, can easily become invalid (with regards to validation which database schema's don't support, see Hibernate Validator) and ussually does a bad job of representing reality.
Instead, write integration tests with DBunit:
- One DBunit, the same for all tests. Load it only once (even if you run 500 tests).
-- Wrap each test in a transaction and rollback the database after every test. Most methods use propagation required anyway. Set the testdata dirty only (to reset it in the next test if there is a next test) only when propagation is requires_new.
- Fill that database with corner cases. Don't add more common cases than are strictly needed to test your business rules, so ussually only 2 common cases (to be able to test "one to many").
- Write future-proof tests:
-- Don't test the number of activated rules or the number of inserted facts.
-- Instead, test if a certain inserted fact is present in the result. Filter the result on a certain property set to X (different from the common value of that property) and test the number of inserted facts with that property set to X.
Unit test is about taking minimum piece of code and test all possible usecases defining specification. With integration tests your goal is not all possible usecases but integration of several units that work together. Do the same with rules. Segregate rules by business meaning and purpose. Simplest 'unit under the test' could be file with single or high cohension set of rules and what is required for it to work (if any), like common dsl definition file and decision table. For integration test you could take meaningful subset or all rules of the system.
With this approach you'll have many isolated unit tests and few integration tests with limited amount of common input data to reproduce and test common scenarios. Adding new rules will not impact most of unit tests but few integration tests and will reflect how new rules impact common data flow.
Consider JUnit testing library that could be suitable for this approach

How to do integration testing?

There is so much written about unit testing but I have hardly found any books/blogs about integration testing? Could you please suggest me something to read on this topic?
What tests to write when doing integration testing?
what makes a good integration test?
etc etc
Thanks
Anything written by Kent Beck, father of both JUnit and SUnit, is a great place to start (for unit tests / test writing in general). I'm assuming that you don't mean "continuous integration," which is a process-based build approach (very cool, when you get it working).
In my own experience, integration tests look very similar to regular unit tests, simply at a higher level. More mock objects. More state initialization.
I believe that integration tests are like onions. They have layers.
Some people prefer to "integrate" all of their components and test the "whole" product as an the "integration" test. You can certainly do this, but I prefer a more incremental approach. If you start low-level and then keep testing at higher composition layers, then you will achieve integration testing.
Maybe it is generally harder to find information on integration testing because it is much more specific to the actual application and its business use. Nevertheless, here's my take on it.
What applies to unit-tests also applies to integration tests: modules should have an easy way to mock their externals inputs (files, DB, time...), so that they can be tested together with the other unit-tests.
But what I've found extremely useful, at least for data-oriented applications, is to be able to create a "console" version of the application that takes input files that fully determine its state (no dependencies on databases, network resources...), and outputs the result as another file. One can then maintain pairs of inputs / expected results files, and test for regressions as part of nightly builds, for example. Having this console version allows for easier scripting, and makes debugging incredibly easier as one can rely on a very stable environment, where it is easy to reproduce bugs and to run the debugger.
J.B. Rainsberger has written about them. Here's a link to an InfoQ article with more info.
http://www.infoq.com/news/2009/04/jbrains-integration-test-scam

Structuring RSpec file structure and code for tests with very large coverage?

I've just started looking at a project that has >20k unit tests written in Rspec (the project itself isn't written in Ruby; just the test cases). The current number of test cases is expected to grow dramatically in the future, as more functionality is added. What's already happened (over an extended period) is that RSpec started out being a particularly good solution for testing this project, but as the project has grown, the fairly ad-hoc structure of their RSpec test cases has bitten them badly. One of the biggest problems they've got is with taxonomy in their test code - the structure (or lack of) in naming their test cases, fixtures, helper code etc. in their test cases.
As you could imagine, with >20k unit tests, there's a lot of methods with very similar names, using helper methods that are all loaded from a global namespace.
To highlight just one area where the problem is biting, there's ~10 databases within this application. Checking the structure of tables/columns/views/constraints/stored procs/... for all of these databases is something that (quite reasonably) is covered within the existing RSpec unit tests. However, the total number of DDL entities within this collection of databases that needs to be checked is probably >10000; covering the whole range of DB structural checks and being able to selectively test only subsets of DB structure requires either:
10000 separate methods (and I'm ruling out that option straight away!)
a fairly complex naming convention within the test cases (i.e. something encompassing the DB name + table name + column name + ...),
OR passing e.g. DB name, table name, column name, ... to a generic method
OR separation of concerns via namespaces (and I'm not aware of an elegantly scalable way of doing this within RSpec),
OR some clever metaprogramming (that I suspect could ultimately make an already messy structure even more difficult to follow).
What exists now is a bit of all of the above, as far as I can tell, with not a lot of obvious planning...
Do you have any tips or references I could look at to try to straighten out this mess, and give them some sort of scalable structure to their RSpec testing? In particular, suggestions on how to structure the various RSpec files for very large projects would be very welcome.
The RSpec Book is an excellent resource for RSpec tips and tricks, and for BDD methodology in general (although your focus appears to be more about testing). There are several ways to simplify and DRY up specs to make them easier to manage, including shared examples (Chapter 12) and macros (Chapter 17).
I also recommend David Chelimsky's blog.
Still it looks like your project could be a real challenge. Of the approaches you mentioned, I think using macros with DB, table, and column as parameters is the most promising.