Converting a mock to JSON in Spock - json

One of the objects I've mocked must be converted into JSON, but Spock does not seem to support mocking conversions. How can I choose which JSON will be returned?
Example of what I would like to achieve:
def "convert as JSON"()
{
when:
def product = Mock(Product)
println(product as JSON)
then:
1* (product as JSON) << (["message": "message"] as JSON)
}
This does not work however.
EDIT: Mocking the way the object is converted into JSON is useful because what I want to achieve is to test a method of another class that takes a product as an argument and uses it, calling "as JSON" on the product during its execution. Since the products can be complex and have lots of dependencies and fields, I prefer to mock them. Spock then gives control over the output of the mocked products' methods, but it gets trickier when conversion is needed...

In your test, you're trying to reduce the complexity of an object (Product) to make your tests simpler. This is dangerous for two reasons:
Complicated tests are a code smell. They tell you "something is wrong". Trying to apply lots of deodorant on the smell will make things worse.
You're testing scenarios which can't happen in production.
The clean/better solution would be to refactor Product until it can be created easily and you don't need to mock it anymore. From what I know about your specific case, Product is a data object (like Integer, Long, BigDecimal). It just encodes state without much functionality of its own.
If that's true, it should be simple to create test cases without mocking. If you need mocking for data objects, then something is wrong with your code. Mocking is only needed for things like services - code which acts upon data objects and which has external dependencies which you need to cut for a test.
The second argument is that you're writing tests that pass but which don't tell a story. It's a complex form of having 10'000 tests that only contain assertTrue(true);. While it's a nice thing to have in terms of test count, it doesn't give you a single advantage over not having them at all.
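A minimal sketch of the "build the real data object instead of mocking it" idea, written in Python rather than Groovy purely for illustration. The `Product` fields and the `describe_product` function are hypothetical stand-ins and are not taken from the question's code base.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Product:
    # Hypothetical data object: plain state, no external dependencies.
    name: str
    price: float


def describe_product(product: Product) -> str:
    # Hypothetical method under test: it serializes the product it is given.
    return json.dumps(asdict(product), sort_keys=True)


# The test builds a real Product instead of mocking its JSON conversion.
product = Product(name="widget", price=9.99)
result = describe_product(product)
```

Because the object is cheap to construct, the test exercises the same conversion path that production code would use.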

Related

Composition in REST and consistence of the inserted data

How do I properly design a REST API when there is composition? I have a TestResult entity, which has TestCaseResults entities. Both support the full set of REST methods. The important fact about this (which I believe differs from many examples I found on the web) is that a TestResult is not consistent if it doesn't have all of its TestCaseResults. How do I properly design this in REST?
Let's say I create them as separate but dependent resources: api\testresults\ and api\testresults\1\testcaseresults. When the client wants to create a test result, it needs to POST to api\testresults, then retrieve the URL api\testresults\1\testcaseresults via a link from the response, and POST all of the test case results to it. This means that at some point in time the test result is not consistent until the user finishes the operation. Basically, there is no concept of a transaction here.
Let's say I create only api\testresults resource, and embed an array of test case results inside, like this:
{
"Name": "Test A",
"Results": [
{
"Measured": "BB",
...
},
...
]
...
}
Then it is easier to insert, but it is still hard to work with. A simple GET to api\testresults\1\ will retrieve the test result with a large number of test case results. A GET to api\testresults\ will retrieve much more! The structure of this becomes complex. Furthermore, in the real world I have a few entities like TestCaseResults that belong to TestResults, so there will be a few arrays, and each could have 100-200 elements.
I could try to combine the approaches: embed the array, but also provide links to api\testresults\1\testcaseresults and support operations there as well. Maybe on GET api\testresults\1\ I could provide the TestResult without its TestCaseResults but only with a link pointing to a resource, while on POST I could accept an embedded array of TestCaseResults (though I'm not sure it is allowed to have different representations for POST and GET in REST). But now there are two approaches for inserting information, which is confusing, and I'm still not sure it solves anything.
Your approach with api\testresults\1 and api\testresults\1\testcaseresults seems promising.
As JSON does not have a fixed structure, you can add query parameters to your URL to control if results are inserted or not.
api\testresults\1?with_results=true would mean that your caller wants to see the test cases in addition to the test results.
api\testresults\1\testcaseresults would still return the test case results for your test 1.
If you fear that the number of test case results is too large, you can add pagination parameters, which would be reused in the testcaseresults call.
api\testresults\1?with_results=true&per_page=10 would include only the first 10 results. To get more, use api\testresults\1\testcaseresults?per_page=10&page=2 and so on, as it is the dedicated endpoint.
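The query parameters above could be handled roughly as follows. This is only a sketch in Python with no web framework; the function names, the `links` field, and the dictionary shapes are assumptions made for illustration, and forward slashes are used in the generated link.

```python
def paginate(items, per_page=10, page=1):
    # Pages are 1-based: page 1 covers items[0:per_page].
    start = (page - 1) * per_page
    return items[start:start + per_page]


def build_test_result(test_result, case_results, with_results=False, per_page=10):
    """Return a JSON-ready body for GET api/testresults/<id>."""
    base = "api/testresults/{}".format(test_result["id"])
    body = {
        "Name": test_result["Name"],
        # Always link to the dedicated endpoint for the full, paged list.
        "links": {"testcaseresults": base + "/testcaseresults"},
    }
    if with_results:
        # Embed only the first page; later pages come from the linked endpoint.
        body["Results"] = paginate(case_results, per_page=per_page, page=1)
    return body
```

The dedicated `testcaseresults` endpoint would call `paginate` with the caller's `page` value, so both endpoints share one pagination rule.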
Cheers
Note: if you want a flexible API that still returns JSON data, you can take a look at GraphQL, the trendy approach.

How to best validate JSON on the server-side

When handling POST, PUT, and PATCH requests on the server-side, we often need to process some JSON to perform the requests.
It is obvious that we need to validate these JSONs (e.g. structure, permitted/expected keys, and value types) in some way, and I can see at least two ways:
Upon receiving the JSON, validate the JSON upfront as it is, before doing anything with it to complete the request.
Take the JSON as it is, start processing it (e.g. access its various key-values) and try to validate it on-the-go while performing business logic, and possibly use some exception handling to handle invalid data.
The 1st approach seems more robust compared to the 2nd, but probably more expensive (in time cost) because every request will be validated (and hopefully most of them are valid so the validation is sort of redundant).
The 2nd approach may save the compulsory validation on valid requests, but mixing the checks within business logic might be buggy or even risky.
Which of the two above is better? Or, is there yet a better way?
What you are describing with POST, PUT, and PATCH sounds like you are implementing a REST API. Depending on your back-end platform, you can use libraries that will map JSON to objects, which is very powerful and performs that validation for you. In Java, you can use Jersey, Spring, or Jackson. If you are using .NET, you can use Json.NET.
If efficiency is your goal and you want to validate every single request, it would be ideal if you could validate on the front end; if you are using JavaScript, you can use json2.js.
In regards to comparing your methods, here is a Pro / Cons list.
Method #1: Upon Request
Pros
The business logic integrity is maintained. As you mentioned, validating while processing business logic could cause valid input to be rejected and invalid input to be accepted, or the validation could inadvertently affect the business logic.
As Norbert mentioned, catching the errors beforehand will improve efficiency. The logical question this poses is: why spend the time processing if there are errors in the first place?
The code will be cleaner and easier to read. Having validation and business logic separated will result in cleaner, easier to read and maintain code.
Cons
It could result in redundant processing meaning longer computing time.
Method #2: Validation on the Go
Pros
It's efficient theoretically by saving process and compute time doing them at the same time.
Cons
In reality, the process time that is saved is likely negligible (as mentioned by Norbert). You are still doing the validation check either way. In addition, processing time is wasted if an error was found.
The data integrity can be compromised. It could be possible that the JSON becomes corrupt when processing it this way.
The code is not as clear. When reading the business logic, it may not be as apparent what is happening because validation logic is mixed in.
What it really boils down to is accuracy vs. speed. They generally have an inverse relationship. As you become more accurate and validate your JSON, you may have to compromise somewhat on speed. This is really only noticeable with large data sets, as computers are really fast these days. It is up to you to decide which is more important, given how accurate you think your data may be when you receive it and whether that extra second or so is crucial. In some cases it does matter (e.g. in stock market and healthcare applications, milliseconds matter) and both are highly important. In those cases, as you increase one, for example accuracy, you may have to regain speed by getting a higher-performance machine.
Hope this helps.
The first approach is more robust, but does not have to be noticeably more expensive. It even becomes less expensive when you are able to abort the parsing process due to errors: your business logic usually takes >90% of the resources in a process, so with an error rate of 10% you are already resource neutral. If you optimize the validation process so that the validations from the business process are performed upfront, the error rate needed to stay resource neutral might be much lower (like 1 in 20 to 1 in 100).
For an example on an implementation assuming upfront data validation, look at GSON (https://code.google.com/p/google-gson/):
GSON works as follows: Every part of the JSON can be cast into an object. This object is typed or contains typed data:
Sample object (JAVA used as example language):
public class someInnerDataFromJSON {
String name;
String address;
int housenumber;
String buildingType;
// Getters and setters
public String getName() { return name; }
public void setName(String name) { this.name=name; }
//etc.
}
Data parsed by GSON against the provided model is already type-checked.
This is the first point where your code can abort.
After this exit point, assuming the data conformed to the model, you can validate whether the data is within certain limits. You can also write that into the model.
Assume for this buildingType is a list:
Single family house
Multi family house
Apartment
You can check data during parsing by creating a setter which checks the data, or you can check it after parsing as a first step of applying your business rules. The benefit of checking the data first is that your later code will need less exception handling, so there is less code and it is easier to understand.
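The "check in the model while parsing" option can be sketched in Python as below. This is an analogue of the idea and not GSON itself; the class name, the `building_type` key, and `parse_inner_data` are hypothetical names chosen for the sketch.

```python
import json
from dataclasses import dataclass

ALLOWED_BUILDING_TYPES = {"Single family house", "Multi family house", "Apartment"}


@dataclass
class InnerData:
    name: str
    address: str
    housenumber: int
    building_type: str

    def __post_init__(self):
        # Dataclasses do not enforce annotations, so check the type by hand.
        if not isinstance(self.housenumber, int):
            raise TypeError("housenumber must be an integer")
        # Limit check written into the model, as described above.
        if self.building_type not in ALLOWED_BUILDING_TYPES:
            raise ValueError("unknown building type: {!r}".format(self.building_type))


def parse_inner_data(text):
    # Missing or unexpected keys make the constructor raise TypeError.
    return InnerData(**json.loads(text))
```

Any object that `parse_inner_data` returns has already passed both checks, so later business code does not need to repeat them.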
I would definitely go for validation before processing.
Let's say you receive some json data with 10 variables of which you expect:
the first 5 variables to be of type string
6 and 7 are supposed to be integers
8, 9 and 10 are supposed to be arrays
You can do a quick variable type validation before you start processing any of this data and return a validation error response if one of the ten fails.
foreach($data as $varName => $varValue){
$varType = gettype($varValue);
if(!$this->isTypeValid($varName, $varType)){
// return validation error
}
}
// continue processing
Think of the scenario where you are directly processing the data and then the 10th value turns out to be of invalid type. The processing of the previous 9 variables was a waste of resources since you end up returning some validation error response anyway. On top of that you have to rollback any changes already persisted to your storage.
I only use variable type in my example but I would suggest full validation (length, max/min values, etc) of all variables before processing any of them.
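As a rough illustration of full upfront validation (type, length, and range) before any processing starts, here is a Python sketch. The field names and limits in `RULES` are invented for the example and would come from your own API contract.

```python
# Hypothetical rules: expected type plus optional length and range limits.
RULES = {
    "name": {"type": str, "max_len": 50},
    "age": {"type": int, "min": 0, "max": 150},
    "tags": {"type": list},
}


def validate(data):
    """Return a list of error strings; an empty list means the input is valid."""
    errors = []
    for field, rule in RULES.items():
        if field not in data:
            errors.append(field + ": missing")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            errors.append(field + ": expected " + rule["type"].__name__)
            continue
        if "max_len" in rule and len(value) > rule["max_len"]:
            errors.append(field + ": too long")
        if "min" in rule and value < rule["min"]:
            errors.append(field + ": too small")
        if "max" in rule and value > rule["max"]:
            errors.append(field + ": too large")
    return errors
```

A request handler would call `validate` first and return the error list as a validation response, so nothing is persisted for a request that fails.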
In general, the first option would be the way to go. The only reason why you might need to think of the second option is if you were dealing with JSON data which was tens of MBs large or more.
In other words, only if you are trying to stream JSON and process it on the fly, you will need to think about second option.
Assuming that you are dealing with few hundred KB at most per JSON, you can just go for option one.
Here are some steps you could follow:
1. Go for a JSON parser like GSON that would just convert your entire JSON input into the corresponding Java domain model object. (If GSON doesn't throw an exception, you can be sure that the JSON is perfectly valid.)
2. Of course, the objects which were constructed using GSON in step 1 may not be in a functionally valid state. For example, functional checks like mandatory fields and limit checks would have to be done. For this, you could define a validateState method which recursively validates the state of the object itself and its child objects.
Here is an example of a validateState method:
public void validateState() {
    // Assume this validateState is part of the Customer class.
    if (age < 12 || age > 150)
        throw new IllegalArgumentException("Age should be in the range 12 to 150");
    if (age < 18 && (guardianId == null || guardianId.trim().equals("")))
        throw new IllegalArgumentException("Guardian id is mandatory for minors");
    for (Account a : getAccounts()) {
        a.validateState(); // Throws appropriate exceptions if any inconsistency in state
    }
}
The answer depends entirely on your use case.
If you expect all calls to originate from trusted clients, then the upfront schema validation should be implemented so that it is activated only when you set a debug flag.
However, if your server delivers public api services then you should validate the calls upfront. This isn't just a performance issue - your server will likely be scrutinized for security vulnerabilities by your customers, hackers, rivals, etc.
If your server delivers private api services to non-trusted clients (e.g., in a closed network setup where it has to integrate with systems from 3rd party developers), then you should at least run upfront those checks that will save you from getting blamed for someone else's goofs.
It really depends on your requirements. But in general I'd always go for #1.
Few considerations:
For consistency I'd use method #1, for performance #2. However, when using #2 you have to take into account that rolling back in case of invalid input may become complicated in the future, as the logic changes.
JSON validation should not take that long. In Python you can use ujson for parsing JSON strings, which is an ultrafast C implementation of the json Python module.
For validation, I use the jsonschema python module which makes json validation easy.
Another approach:
If you use jsonschema, you can validate the JSON request in steps. I'd perform an initial validation of the most common/important parts of the JSON structure, and validate the remaining parts along the business logic path. This would allow you to write simpler, and therefore more lightweight, JSON schemas.
The final decision:
If (and only if) this decision is critical, I'd implement both solutions, time-profile them under correct and wrong input conditions, and weigh the results depending on the wrong input frequency. Therefore:
1c = average time spent with method 1 on correct input
1w = average time spent with method 1 on wrong input
2c = average time spent with method 2 on correct input
2w = average time spent with method 2 on wrong input
CR = correct input rate (or frequency)
WR = wrong input rate (or frequency)
if (1c * CR) + (1w * WR) <= (2c * CR) + (2w * WR):
    choose method 1
else:
    choose method 2
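The decision rule above can be written as runnable Python. This sketch assumes that WR = 1 - CR (every input is either correct or wrong); the timings in the example are made-up numbers, not measurements.

```python
def expected_cost(time_correct, time_wrong, correct_rate):
    # Weighted average time per request; assumes WR = 1 - CR.
    wrong_rate = 1.0 - correct_rate
    return time_correct * correct_rate + time_wrong * wrong_rate


def choose_method(method1_times, method2_times, correct_rate):
    """Each *_times argument is a (time_on_correct, time_on_wrong) pair."""
    cost1 = expected_cost(method1_times[0], method1_times[1], correct_rate)
    cost2 = expected_cost(method2_times[0], method2_times[1], correct_rate)
    return 1 if cost1 <= cost2 else 2


# Hypothetical profile: method 1 is slightly slower on correct input
# but rejects wrong input much sooner.
method1 = (10.5, 0.5)
method2 = (10.0, 6.0)
many_errors = choose_method(method1, method2, correct_rate=0.90)
few_errors = choose_method(method1, method2, correct_rate=0.99)
```

With these invented timings the choice flips as the share of wrong input drops, which is why the profiling has to use a realistic error rate.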

Is it OK to have multiple assertions in a unit test when testing complex behavior?

Here is my specific scenario.
I have a class QueryQueue that wraps the QueryTask class within the ArcGIS API for Flex. This enables me to easily queue up multiple query tasks for execution. Calling QueryQueue.execute() iterates through all the tasks in my queue and calls their execute method.
When all the results have been received and processed QueryQueue will dispatch the completed event. The interface to my class is very simple.
public interface IQueryQueue
{
function get inProgress():Boolean;
function get count():int;
function get completed():ISignal;
function get canceled():ISignal;
function add(query:Query, url:String, token:Object = null):void;
function cancel():void;
function execute():void;
}
For the QueryQueue.execute method to be considered successful several things must occur.
task.execute must be called on each query task once and only once
inProgress = true while the results are pending
inProgress = false when the results have been processed
completed is dispatched when the results have been processed
canceled is never called
The processing done within the queue correctly processes and packages the query results
What I am struggling with is breaking these tests into readable, logical, and maintainable tests.
Logically I am testing one state, that is the successful execution state. This would suggest one unit test that would assert #1 through #6 above are true.
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful():void
However, the name of the test is not informative as it does not describe all the things that must be true in order to be considered a passing test.
Reading up online (including here and at programmers.stackexchange.com) there is a sizable camp that asserts that unit tests should only have one assertion (as a guideline). As a result, when a test fails you know exactly what failed (i.e. inProgress not set to true, completed dispatched multiple times, etc.). You wind up with potentially a lot more (but in theory simpler and clearer) tests like so:
[Test] public mustInvokeExecuteForEachQueryTaskWhenQueueIsNotEmpty():void
[Test] public mustBeInProgressWhenResultsArePending():void
[Test] public mustNotInProgressWhenResultsAreProcessedAndSent():void
[Test] public mustDispatchTheCompletedEventWhenAllResultsProcessed():void
[Test] public mustNeverDispatchTheCanceledEventWhenNotCanceled():void
[Test] public mustReturnQueryQueueEventArgsWithResultsAndNoErrorsWhenAllQueriesAreSuccessful():void
// ... and so on
This could wind up with a lot of repeated code in the tests, but that could be minimized with appropriate setup and teardown methods.
While this question is similar to other questions I am looking for an answer for this specific scenario as I think it is a good representation of a complex unit testing scenario exhibiting multiple states and behaviors that need to be verified. Many of the other questions have, unfortunately, no examples or the examples do not demonstrate complex state and behavior.
In my opinion, and there will probably be many, there are a couple of things here:
If you must test so many things for one method, then it could mean your code might be doing too much in one single method (Single Responsibility Principle)
If you disagree with the above, then the next thing I would say is that what you are describing is more of an integration/acceptance test. Which allows for multiple asserts, and you have no problems there. But, keep in mind that this might need to be relegated to a separate section of tests if you are doing automated tests (safe versus unsafe tests)
And/Or, yes, the preferred method is to test each piece separately, as that is what a unit test is. The closest thing I can suggest (and this depends on your tolerance for writing code just to have perfect tests) is to check an object against an object, so that one assert essentially tests everything at once. However, the argument against this is that, yes, it passes the one-assert-per-test test, but you still lose expressiveness.
Ultimately, your goal should be to strive towards the ideal (one assert per unit test) by focusing on the SOLID principles, but ultimately you do need to get things done or else there is no real point in writing software (my opinion at least :)).
Let's focus on the tests you have identified first. All except the last one (mustReturnQueryQueueEventArgs...) are good ones, and I could immediately tell what's being tested there (and that's a very good sign, indicating they're descriptive and most likely simple).
The only problem is your last test. Note that extensive use of the words "and", "with", or "or" in a test name usually rings alarm bells. It's not very clear what it's supposed to do. Returning correct results comes to mind first, but one might argue that is a vague term. That's true, it is vague. However, you'll often find that this is indeed a pretty common requirement, described in detail by the method/operation contract.
In your particular case, I'd simplify the last test to verify whether correct results are returned, and that would be all. You have already tested the states, events and other things that lead to building the results, so there is no need to do that again.
Now, the advice in the links you provided is actually quite good, and generally I suggest sticking to it (a single assertion per test). The question is, what does a single assertion really stand for? One line of code at the end of the test? Let's consider this simple example then:
// a method which updates two fields of our custom entity, MyEntity
public void Update(MyEntity entity)
{
entity.Name = "some name";
entity.Value = "some value";
}
This method's contract is to perform those 2 operations. By success, we understand the entity to be correctly updated. If one of them fails for some reason, the method as a unit is considered to have failed. You can see where this is going; you'll either have two assertions or write a custom comparer purely for testing purposes.
Don't be tricked by "single assertion"; it's not about lines of code or the number of asserts (although in the majority of tests you'll write this will indeed map 1:1), but about asserting a single unit (in the example above, update is considered to be a unit). And a unit might in reality be multiple things that don't make any sense at all without each other.
And this is exactly what one of questions you linked quotes (by Roy Osherove):
My guideline is usually that you test one logical CONCEPT per test. you can have multiple asserts on the same object. they will usually be the same concept being tested.
It's all about concept/responsibility; not the number of asserts.
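To make the "one concept, several asserts" point concrete, here is a sketch of the Update example as a Python unittest. It is a translation for illustration only; `MyEntity`, `update`, and the test name mirror the example above and are not from any real code base.

```python
import unittest
from dataclasses import dataclass


@dataclass
class MyEntity:
    name: str = ""
    value: str = ""


def update(entity):
    # The unit under test: its contract is to set both fields.
    entity.name = "some name"
    entity.value = "some value"


class UpdateTest(unittest.TestCase):
    def test_update_sets_both_fields(self):
        entity = MyEntity()
        update(entity)
        # Two asserts, one concept: "the entity was correctly updated".
        self.assertEqual(entity.name, "some name")
        self.assertEqual(entity.value, "some value")
```

Since `MyEntity` is a dataclass, the two asserts could also be collapsed into a single `assertEqual(entity, MyEntity("some name", "some value"))`, which is the "compare object against object" variant mentioned earlier.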
I am not familiar with Flex, but I have good experience with unit testing. Unit testing is a philosophy. To answer your question: yes, you can use multiple asserts, as long as they test the same behavior. The main point of unit testing is to keep the test code simple and maintainable; otherwise the unit test will need a unit test of its own! So my advice is: if you are new to unit testing, don't use multiple asserts; once you have good experience with it, you will know when you need them.

Is it okay to rely on automatic pass-by-reference to mutate objects?

I'm working in Python here (which is actually pass-by-name, I think), but the idea is language-agnostic as long as method parameters behave similarly:
If I have a function like this:
def changefoo(source, destination):
    destination["foo"] = source
    return destination
and call it like so,
some_dict = {"foo": "bar"}
some_var = "a"
new_dict = changefoo(some_var, some_dict)
new_dict will be a modified version of some_dict, but some_dict will also be modified.
Assuming the mutable structure like the dict in my example will almost always be similarly small, and performance is not an issue (in application, I'm taking abstract objects and changing into SOAP requests for different services, where the SOAP request will take an order of magnitude longer than reformatting the data for each service), is this okay?
The destination in these functions (there are several, it's not just a utility function like in my example) will always be mutable, but I like to be explicit: the return value of a function represents the outcome of a deterministic computation on the parameters you passed in. I don't like using out parameters but there's not really a way around this in Python when passing mutable structures to a function. A couple options I've mulled over:
Copying the parameters that will be mutated, to preserve the original
I'd have to copy the parameters in every function where I mutate them, which seems cumbersome and like I'm just duplicating a lot. Plus I don't think I'll ever actually need the original, it just seems messy to return a reference to the mutated object I already had.
Just use it as an in/out parameter
I don't like this, it's not immediately obvious what the function is doing, and I think it's ugly.
Create a decorator which will automatically copy the parameters
Seems like overkill
So is what I'm doing okay? I feel like I'm hiding something, and a future programmer might think the original object is preserved based on the way I'm calling the functions (grabbing its result rather than relying on the fact that it mutates the original). But I also feel like any of the alternatives will be messy. Is there a more preferred way? Note that it's not really an option to add a mutator-style method to the class representing the abstract data due to the way the software works (I would have to add a method to translate that data structure into the corresponding SOAP structure for every service we send that data off to; currently the translation logic is in a separate package for each service).
If you have a lot of functions like this, I think your best bet is to write a little class that wraps the dict and modifies it in-place:
class DictMunger(object):
    def __init__(self, original_dict):
        self.original_dict = original_dict
    def changefoo(self, source):
        # Mutates the wrapped dict in place.
        self.original_dict['foo'] = source
some_dict = {"foo": "bar"}
some_var = "a"
munger = DictMunger(some_dict)
munger.changefoo(some_var)
# ...
new_dict = munger.original_dict
Objects modifying themselves is generally expected and reads well.
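For contrast, the "copy the parameter" option from the question might look like the sketch below. It is not what this answer recommends; it only shows what preserving the original would cost, using the standard library's `copy.deepcopy`. The name `changefoo_copy` is made up for the example.

```python
import copy


def changefoo_copy(source, destination):
    # Work on a deep copy so the caller's dict is left untouched.
    result = copy.deepcopy(destination)
    result["foo"] = source
    return result


some_dict = {"foo": "bar"}
new_dict = changefoo_copy("a", some_dict)
```

Here the return value really is the only outcome of the call, at the price of one copy per call.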

OOP Design Question - Linking to precursors

I'm writing a program to do a search and export the output.
I have three primary objects:
Request
SearchResults
ExportOutput
Each of these objects links to its precursor.
Ie: ExportOutput -> SearchResults -> Request
Is this ok? Should they somehow be more loosely coupled?
Clarification:
Processes later on do use properties and methods on the precursor objects.
Ie:
SendEmail(output.SearchResults.Request.UserEmail, BODY, SUBJECT);
This has a smell even to me. The only way I can think to fix it is have hiding properties in each one, that way I'm only accessing one level
MailAddress UserEmail
{
get { return SearchResults.UserEmail; }
}
which would yield
SendEmail(output.UserEmail, BODY, SUBJECT);
But again, that's just hiding the problem.
I could copy everything out of the precursor objects into their successors, but that would make ExportOutput really ugly. Is there a better way to factor these objects?
Note: SearchResults implements IDisposable because it links to unmanaged resources (temp files), so I really don't want to just duplicate that in ExportOutput.
If A uses B directly, you cannot:
Reuse A without also reusing B
Test A in isolation from B
Change B without risking breaking A
If instead you designed/programmed to interfaces, you could:
Reuse A without also reusing B - you just need to provide something that implements the same interface as B
Test A in isolation from B - you just need to substitute a Mock Object.
Change B without risking breaking A - because A depends on an interface - not on B
So, at a minimum, I recommend extracting interfaces. Also, this might be a good read for you: the Dependency Inversion Principle (PDF file).
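A small sketch of "program to an interface", in Python for illustration. `ResultSource`, `Exporter`, and `FakeResults` are hypothetical names; the real SearchResults and ExportOutput classes from the question are not shown here.

```python
from typing import Iterable, Protocol


class ResultSource(Protocol):
    # The interface the exporter depends on, instead of a concrete class.
    def rows(self) -> Iterable[str]: ...


class Exporter:
    def __init__(self, source: ResultSource) -> None:
        self._source = source

    def export(self) -> str:
        return "\n".join(self._source.rows())


class FakeResults:
    # Test double: satisfies ResultSource without temp files or a search.
    def rows(self):
        return ["first", "second"]


output = Exporter(FakeResults()).export()
```

The exporter can be tested and reused with any object that offers `rows()`, and the real results class can change internally without touching `Exporter`.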
Without knowing your specifics, I would think that results in whatever form would simply be returned from a Request's method (might be more than one such method from a configured Request, like find_first_instance vs. find_all_instances). Then, an Exporter's output method(s) would take results as input. So, I am not envisioning the need to link the objects at all.