How would you test something with complex output? (JUnit)

I wrote a parser for a non-trivial language this weekend. Some of the output can be complex, even for seemingly simple input. Let's say the input to the parser is a mathematical expression, and the output is a list of tuples that describe the input.
So the output could be 20 lines long.
How would you write the junit test? Would you run the parser, hand-check the result, and if it seems correct, drop the result into the unit test as the Right Answer?
Or is this just insane, and I need to do something differently?

Ideally, a "unit" test tests a small unit of functionality. If the output is so complex that it's difficult to verify, that implies you're testing too large a unit of functionality.
Remember that in addition to verifying that your code works, your unit tests can also act as an example of how your code should be used. A single test that just matches a result against a large predefined result probably won't do that.
Try to break the inner workings into smaller methods and test each one. Try to test building up a result from smaller results (e.g. if input A results in output Y, and input B results in output Z, then write a test for whether input AB results in output YZ, or whatever the appropriate result would be).
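For example, a pair of tests in that compositional style might look like the sketch below; the Parser and Token types are hypothetical stand-ins for whatever your parser actually returns (Token is assumed to be a value object with equals()):

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

public class ExpressionParserTest {

    // Hypothetical parser that turns an expression into a list of tuples/tokens.
    private final Parser parser = new Parser();

    @Test
    public void aSingleNumberParsesToASingleTuple() {
        assertEquals(Arrays.asList(new Token("NUM", "2")), parser.parse("2"));
    }

    @Test
    public void anAdditionContainsTheResultsOfItsTwoOperands() {
        List<Token> left = parser.parse("2");
        List<Token> right = parser.parse("3");
        List<Token> sum = parser.parse("2+3");

        // The composite result is built from the two smaller results.
        assertTrue(sum.containsAll(left));
        assertTrue(sum.containsAll(right));
    }
}

Each test is small enough to double as an example of how the parser is meant to be called.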

This is a perfectly valid test, but not necessarily a unit test. This is tending towards being an integration test or regression test:
How would you write the junit test? Would you run the parser, hand-check the result, and if it seems correct, drop the result into the unit test as the Right Answer?
It's perfectly valid to use JUnit to do your integration tests and/or regression tests. I use the approach you've described a lot of the time, but you need to be aware that this has limitations.
Unless you're careful, your tests end up being quite brittle. For instance, your output could contain unexpected characters (spaces and CR/LF line endings; encoding is a particular problem if you're mixing Unix and Windows machines). This makes the testing slightly more complex because you have to "clean" the output of your parser.
It's a pain to have 20 lines of text in your JUnit Java class along with the input, so you're faced with the choice of keeping the text in the Java or putting it into a separate file. Most of the time, I find separate files easier to manage: the test methods become a single line that takes a file, processes it, and compares the result against a reference file (see the sketch at the end of this answer).
Because you're doing integration tests, it's harder to identify the cause when you have a failing test.
As JacobM says, it's probably a good idea to split your tests into smaller pieces, but you can keep these broader tests as well, because they're useful too.
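As a concrete illustration, a file-based regression test might look something like this; the Parser class, its parseToString method, and the resource layout are all assumptions, not part of the original question:

import static org.junit.Assert.assertEquals;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.junit.Test;

public class ParserRegressionTest {

    @Test
    public void complexExpression() throws Exception {
        assertMatchesReference("complex-expression");
    }

    // Reads <name>.input, runs the parser, and compares against <name>.expected.
    private void assertMatchesReference(String name) throws Exception {
        String input = read("src/test/resources/" + name + ".input");
        String expected = read("src/test/resources/" + name + ".expected");
        String actual = new Parser().parseToString(input);

        // Normalise line endings so Unix/Windows checkouts agree.
        assertEquals(normalise(expected), normalise(actual));
    }

    private String read(String path) throws Exception {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    private String normalise(String s) {
        return s.replace("\r\n", "\n").trim();
    }
}

Each new input/expected pair is then just one extra one-line test method.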

Related

Use of templates to switch between input versions of data and analyze outputs in palantir-foundry

We are looking to build a single pipeline within a code repository that cleans, harmonizes, and transforms data to features of interest. We would like to apply that single pipeline code on different inputs and then test how the outputs look.
For example, we would like to test the pipeline on synthetic data, version 1 of 'real' data that includes only retrospective data, and version 2 of 'real' data that includes retrospective and prospective data. The comparison of the outputs could be what percent of patients had diabetes in version 1 compared to version 2.
I saw that you could template code repositories in foundry. Is this a viable option? Could you template your code repository and apply to the three scenarios I have provided? Is there a better option?
If your data scale is reasonably small, I would recommend going down the test-driven path of development here instead of trying to compare and contrast results across a wide variety of datasets. You'll probably find the iteration time, and the difficulty of comparing results exactly, quite high otherwise.
For this, you should follow the method I lay out here and create representative datasets for each input you expect as a .csv file in your repo, then you can incorporate these schemas as a unique input to your core code and inspect the outputs with ease.
This will let you 'tighten' your code much more easily and quickly, after which you can run the same logic on real full-scale data and generate your outputs as you wish.
Templating code is possible but should be incorporated with great care. If what you're truly solving for is comparing and contrasting the execution of your code on arbitrary schemas, then you should use test-driven in-repo development. If what you're after is running a core set of logic across a wide variety of outputs after the code is working, then generated transforms is going to work great. If what you're really after is rolling out a large codebase of transformations across differently-permissioned projects where each needs to be completely independent / configured separately of the other, then maybe you should consider templates. I would stick to test-driven development and generated transforms until you prove otherwise.

Prevent JUnit from failing if an assumption is never satisfied

When running a series of test cases using JUnit, it is possible to skip the test for some cases using assumptions. However, if there are no cases in which the assumption is satisfied, the test is marked as failed. Is there a way to prevent that?
For example, if the property testTopic is passed as vertical_move, all of the tests for vertical_move should run and none of the tests for horizontal_move. Currently I am using assume as shown below to skip these tests.
assumeTrue(TestCons.get("testTopic").contains("horizontal_move"));
The problem is that if the assumption fails in all cases, the test is marked as failed. I want to prevent that. In other words, if the assumption fails, just skip the test without failing it, even if the assumption is never satisfied. Is there a way to do that? Thanks.
You can use @Ignore to mark tests that should be completely excluded from execution, but I am not aware of how you could determine that from within a test case.
And honestly: I think you shouldn't even try that.
Unit tests should be straightforward. They exist to help you quickly identify a bug in your production code. But any piece of "extra logic" that a reader needs to digest to understand the reason for a failing test case makes that harder.
Thus: avoid putting any such extra logic in your test cases. Write testcases that create a clear setup, and then assert for the result that you expect for exactly that setup, and nothing else.
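For reference, a minimal @Ignore sketch (the class and test names here are made up):

import org.junit.Ignore;
import org.junit.Test;

public class HorizontalMoveTest {

    // JUnit reports this test as ignored rather than failed,
    // and no assumption logic is needed inside the test body.
    @Ignore("horizontal_move scenarios are out of scope for this run")
    @Test
    public void movesHorizontally() {
        // clear setup and a single assertion for exactly that setup
    }
}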

Detecting JUnit "tests" that never assert anything

We used to have a technical director who liked to contribute code and was also very enthusiastic about adding unit tests. Unfortunately his preferred style of test was to produce some output to screen and visually check the result.
Given that we have a large bank of tests, are there any tools or techniques I could use to identify the tests that never assert anything?
Since that's a one-time operation, I would:
scan all test methods (easy: get the JUnit report XML)
use an IDE or other tool to search for references to Assert.*, and export the result as a list of methods
awk/perl/excel the results to find mismatches
Edit: another option is to just look for references to System.out or whatever his preferred way to output stuff was, most tests won't have that.
Not sure of a tool, but the thought that comes to mind is twofold.
Create a TestRule class that keeps track of the number of asserts per test (use static counter, clear counter at beginning of test, assert that it is not 0 at end of test).
Wrap the Assert class in your own proxy that increments the TestRule's counter each time it is called.
If your Assert class is called Assert, then you would only need to update the imports and add the Rule to the tests. The mechanism described above is not thread-safe, so if you have multiple tests running concurrently you will get incorrect results.
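A rough sketch of that idea, assuming JUnit 4 (the class names and the simplistic static counter are my own, and as noted it is not thread-safe):

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

// Fails any test that finished without recording a single assertion.
public class AssertCountRule implements TestRule {

    static int assertCount = 0;   // bumped by the wrapper below; not thread-safe

    @Override
    public Statement apply(final Statement base, final Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                assertCount = 0;      // clear counter at the beginning of the test
                base.evaluate();      // run the actual test body
                if (assertCount == 0) {
                    throw new AssertionError(
                            description.getDisplayName() + " made no assertions");
                }
            }
        };
    }
}

// Thin proxy over org.junit.Assert that counts calls; delegate the other assert* methods the same way.
class CountingAssert {
    static void assertEquals(Object expected, Object actual) {
        AssertCountRule.assertCount++;
        org.junit.Assert.assertEquals(expected, actual);
    }
}

Each test class then declares the rule with @Rule public AssertCountRule asserts = new AssertCountRule(); and imports CountingAssert instead of org.junit.Assert.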
If those tests are the only ones that produce output, an automated bulk replacement of System.out.println( with org.junit.Assert.fail("Fix test: " + would highlight exactly those tests that aren't pulling their weight. This technique would make it easy to inspect those tests in an IDE after a run and decide whether to fix or delete them; it also gives a clear indication of progress.

How strict should I be in the "do the simplest thing that could possibly work" while doing TDD

For TDD you have to:
Create a test that fails
Do the simplest thing that could possibly work to pass the test
Add more variants of the test and repeat
Refactor when a pattern emerges
With this approach you're supposed to cover all the cases (that come to mind, at least), but I wonder if I'm being too strict here and whether it is possible to "think ahead" to some scenarios instead of simply discovering them.
For instance, I'm processing a file, and if it doesn't conform to a certain format I am supposed to throw an InvalidFormatException.
So my first test was:
@Test
void testFormat() {
    // empty doesn't do anything nor throw anything
    processor.validate("empty.txt");
    try {
        processor.validate("invalid.txt");
        assert false : "Should have thrown InvalidFormatException";
    } catch (InvalidFormatException ife) {
        assert "Invalid format".equals(ife.getMessage());
    }
}
I run it and it fails because it doesn't throw an exception.
So the next thing that comes to my mind is "Do the simplest thing that could possibly work", so I write:
public void validate(String fileName) throws InvalidFormatException {
    if (fileName.equals("invalid.txt")) {
        throw new InvalidFormatException("Invalid format");
    }
}
Doh!! (Although the real code is a bit more complicated, I found myself doing something like this several times.)
I know that I will eventually have to add another file name and another test that would make this approach impractical and force me to refactor into something that makes sense (which, if I understood correctly, is the point of TDD: to discover the patterns that usage unveils), but:
Q: Am I taking the "Do the simplest thing..." advice too literally?
I think your approach is fine, if you're comfortable with it. You didn't waste time writing a silly case and solving it in a silly way - you wrote a serious test for real desired functionality and made it pass in - as you say - the simplest way that could possibly work. Now - and into the future, as you add more and more real functionality - you're ensuring that your code has the desired behavior of throwing the correct exception on one particular badly-formatted file. What comes next is to make that behavior real - and you can drive that by writing more tests. When it becomes simpler to write the correct code than to fake it again, that's when you'll write the correct code. That assessment varies among programmers - and of course some would decide that time is when the first failing test is written.
You're using very small steps, and that's the most comfortable approach for me and some other TDDers. If you're more comfortable with larger steps, that's fine, too - but know you can always fall back on a finer-grained process on those occasions when the big steps trip you up.
Of course your interpretation of the rule is too literal.
It should probably sound like "Do the simplest potentially useful thing..."
Also, I think that when writing implementation you should forget the body of the test which you are trying to satisfy. You should remember only the name of the test (which should tell you about what it tests). In this way you will be forced to write the code generic enough to be useful.
I too am a TDD newbie struggling with this question. While researching, I found this blog post by Roy Osherove that was the first and only concrete and tangible definition of "the simplest thing that could possibly work" that I have found (and even Roy admitted it was just a start).
In a nutshell, Roy says:
Look at the code you just wrote in your production code and ask yourself the following:
“Can I implement the same solution in a way that is ..”
“.. More hard-coded ..”
“.. Closer to the beginning of the method I wrote it in.. “
“.. Less indented (in as less “scopes” as possible like ifs, loops, try-catch) ..”
“.. shorter (literally less characters to write) yet still readable ..”
“… and still make all the tests pass?”
If the answer to one of these is “yes” then do that, and see all the tests still passing.
Lots of comments:
If validation of "empty.txt" throws an exception, you don't catch it.
Don't Repeat Yourself. You should have a single test function that decides whether validation does or does not throw the exception, and then call that function twice with two different expected results (see the sketch after these comments).
I don't see any signs of a unit-testing framework. Maybe I'm missing them? But just using assert won't scale to larger systems. When you get a result from validation, you should have a way to announce to a testing framework that a given test, with a given name, succeeded or failed.
I'm alarmed at the idea that checking a file name (as opposed to contents) constitutes "validation". In my mind, that's a little too simple.
Regarding your basic question, I think you would benefit from a broader idea of what the simplest thing is. I'm also not a fundamentalist TDDer, and I'd be fine with allowing you to "think ahead" a little bit. This means thinking ahead to this afternoon or tomorrow morning, not thinking ahead to next week.
You missed point #0 in your list: know what to do. You say you are processing a file for validation purposes. Once you have specified what "validation" means (hint: do this before writing any code), you might have a better idea of how to a) write tests that, well, test the specification as implemented, and b) write the simplest thing.
If, e.g., validation means "must be XML", your test case is just some non-XML-conformant string, and your implementation uses an XML library and (if necessary) transforms its exceptions into those specified for your "validation" feature.
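A minimal sketch of that single shared check, reusing the processor and exception from the question (the class and method names are my own):

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class ValidationTest {

    // 'processor' and InvalidFormatException are the ones from the question.
    private final Processor processor = new Processor();

    // One place decides whether validation should throw; each test states one expectation.
    private void checkValidation(String fileName, boolean expectException) {
        try {
            processor.validate(fileName);
            assertFalse("Expected InvalidFormatException for " + fileName, expectException);
        } catch (InvalidFormatException ife) {
            assertTrue("Did not expect an exception for " + fileName, expectException);
        }
    }

    @Test
    public void emptyFileIsAccepted() {
        checkValidation("empty.txt", false);
    }

    @Test
    public void invalidFileIsRejected() {
        checkValidation("invalid.txt", true);
    }
}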
One thing of note to future TDD learners - the TDD mantra doesn't actually include "Do the simplest thing that could possibly work." Kent Beck's TDD Book has only 3 steps:
Red— Write a little test that doesn't work, and perhaps doesn't even compile at first.
Green— Make the test work quickly, committing whatever sins necessary in the process.
Refactor— Eliminate all of the duplication created in merely getting the test to work.
Although the phrase "Do the simplest thing..." is often attributed to Ward Cunningham, he actually asked the question "What's the simplest thing that could possibly work?", but that question was later turned into a command, which Ward believes may confuse rather than help.
Edit: I can't recommend reading Beck's TDD Book strongly enough - it's like having a pairing session with the master himself, giving you his insights and thoughts on the Test Driven Development process.
Just as a method should do one thing only, one test should test one thing (one behavior) only. To address the example given, I'd write two tests, for instance test_no_exception_for_empty_file and test_exception_for_invalid_file. The second could indeed be several tests, one per sort of invalidity.
The third step of the TDD process should be interpreted as "add a new variant of the test", not "add a new variant to the test". Indeed, a unit test should be atomic (test one thing only) and generally follows the triple-A pattern: Arrange, Act, Assert. And it's very important to verify that the test fails first, to ensure it is really testing something.
I would also separate the responsibility of reading the file from validating its content. That way, a test can pass a buffer to the validate() function, and the tests do not have to read files. Unit tests usually do not access the filesystem because that slows them down.
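One way that separation might look, treating the Processor class and its method names here as assumptions rather than the poster's actual code:

import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Processor {

    // Production entry point: opens the file, then delegates to the content-level check.
    public void validateFile(String fileName) throws IOException, InvalidFormatException {
        try (Reader reader = Files.newBufferedReader(Paths.get(fileName))) {
            validate(reader);
        }
    }

    // Unit tests call this directly, e.g. validate(new StringReader("...")),
    // so no test ever touches the filesystem.
    public void validate(Reader content) throws IOException, InvalidFormatException {
        // ... actual format checks go here ...
    }
}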

Test cases, "when", "what", and "why"?

Being new to test-based development, this question has been bugging me. How much is too much? What should be tested, how should it be tested, and why should it be tested? The examples given are in C# with NUnit, but I assume the question itself is language-agnostic.
Here are two current examples of my own, tests on a generic list object (being tested with strings, the initialisation function adds three items {"Foo", "Bar", "Baz"}):
[Test]
public void CountChanging()
{
    Assert.That(_list.Count, Is.EqualTo(3));
    _list.Add("Qux");
    Assert.That(_list.Count, Is.EqualTo(4));
    _list[7] = "Quuuux";
    Assert.That(_list.Count, Is.EqualTo(8));
    _list.Remove("Quuuux");
    Assert.That(_list.Count, Is.EqualTo(7));
}

[Test]
public void ContainsItem()
{
    Assert.That(_list.Contains("Qux"), Is.EqualTo(false));
    _list.Add("Qux");
    Assert.That(_list.Contains("Qux"), Is.EqualTo(true));
    _list.Remove("Qux");
    Assert.That(_list.Contains("Qux"), Is.EqualTo(false));
}
The code is fairly self-commenting, so I won't go into what's happening, but is this sort of thing taking it too far? Add() and Remove() are tested separately, of course, so what level should I go to with these sorts of tests? Should I even have these sorts of tests?
I would say that what you're actually testing are equivalence classes. In my view, there is no difference between adding to a list that has 3 items or 7 items. However, there is a difference between 0 items, 1 item, and more than 1 item. I would probably have 3 tests each for the Add/Remove methods for these cases initially.
Once bugs start coming in from QA/users, I would add each such bug report as a test case; see the bug reproduced by getting a red bar; fix the bug by getting a green bar. Each such 'bug-detecting' test is there to stay: it is my safety net (read: regression test), so that even if I make this mistake again, I will have instant feedback.
Think of your tests as a specification. If your system can break (or have material bugs) without your tests failing, then you don't have enough test coverage. If one single point of failure causes many tests to break, you probably have too much (or are too tightly coupled).
This is really hard to define in an objective way. I suppose I'd say err on the side of testing too much. Then when tests start to annoy you, those are the particular tests to refactor/repurpose (because they are too brittle, or test the wrong thing, and their failures aren't useful).
A few tips:
Each testcase should only test one thing. That means that the structure of the testcase should be "setup", "execute", "assert". In your examples, you mix these phases. Try splitting your test-methods up. That makes it easier to see exactly what you are testing.
Try giving your test methods a name that describes what is being tested. I.e. the three test cases contained in your ContainsItem() become: containsReportsFalseIfTheItemHasNotBeenAdded(), containsReportsTrueIfTheItemHasBeenAdded(), containsReportsFalseIfTheItemHasBeenAddedThenRemoved(). I find that forcing myself to come up with a descriptive name like that helps me conceptualize what I have to test before I code the actual test (there's a JUnit sketch of this split after these tips).
If you do TDD, you should write your tests first and only add code to your implementation when you have a failing test. Even if you don't actually do this, it will give you an idea of how many tests are enough. Alternatively, use a coverage tool. For a simple class like a container, you should aim for 100% coverage.
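The question's examples are NUnit, but in JUnit terms the split mentioned above might look roughly like the sketch below; the plain ArrayList is only a stand-in for the custom list under test:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;
import java.util.ArrayList;
import java.util.List;
import org.junit.Before;
import org.junit.Test;

public class ContainsTest {

    private List<String> list;

    @Before
    public void setUp() {
        list = new ArrayList<>();   // stand-in for the custom list under test
        list.add("Foo");
        list.add("Bar");
        list.add("Baz");
    }

    @Test
    public void containsReportsFalseIfTheItemHasNotBeenAdded() {
        assertFalse(list.contains("Qux"));
    }

    @Test
    public void containsReportsTrueIfTheItemHasBeenAdded() {
        list.add("Qux");
        assertTrue(list.contains("Qux"));
    }

    @Test
    public void containsReportsFalseIfTheItemHasBeenAddedThenRemoved() {
        list.add("Qux");
        list.remove("Qux");
        assertFalse(list.contains("Qux"));
    }
}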
Is _list an instance of a class you wrote? If so, I'd say testing it is reasonable. Though in that case, why are you building a custom List class?
If it's not code you wrote, don't test it unless you suspect it's in some way buggy.
I try to test code that's independent and modular. If there's some sort of God function in code I have to maintain, I strip out as much of it as possible into sub-functions and test them independently. Then the God function can be written to be "obviously correct": no branches, no logic, just passing results from one well-tested sub-function to another.