We have a suite of applications developed in C# and C++ and using SQL Server as the back end. Integration tests are developed with NUnit, and they take more than two minutes to run. To speed up integration tests, we are using the following:
Tests run on the same workstation, so no network delays
Test databases are created on DataRam RAM Disk, which is fast
Test fixtures run in parallel, currently up to four at a time
Most test data is bulk loaded using table-valued parameters (roughly as sketched below)
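For context, the bulk load looks roughly like this (table type, procedure, and connection string are simplified placeholders, not the real names):

```csharp
using System.Data;
using System.Data.SqlClient;

// Simplified sketch of how a fixture bulk loads rows through a table-valued parameter.
// Assumes a user-defined table type dbo.TestRowList and a proc dbo.LoadTestRows exist.
static void LoadTestRows(string connectionString, DataTable rows)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.LoadTestRows", connection))
    {
        command.CommandType = CommandType.StoredProcedure;

        var parameter = command.Parameters.AddWithValue("@Rows", rows);
        parameter.SqlDbType = SqlDbType.Structured;   // marks the parameter as a TVP
        parameter.TypeName = "dbo.TestRowList";

        connection.Open();
        command.ExecuteNonQuery();
    }
}
```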
What else can be done to speed up automated integration tests?
I know this question is very, very old, but I'll post my answer anyway.
It may sound stupid, but: write fewer integration tests and more unit tests. Integration-test only at your application's boundaries (as in "when you pass control to code you do not own").
My opinion on this is inspired by J. B. Rainsberger. If you want, you can listen to a talk he gave on this topic; he is much better at explaining it than I am. Here is a link to the video:
http://vimeo.com/80533536
I do not like this answer ("write fewer integration tests"), because I think it is wrong. Our application is data heavy; most of our code is logic around the data. So without integration tests we would have only trivial unit tests (which I think should still be written).
Our integration tests run for an hour. There are thousands of them, and they have brought us tremendous value.
I think you should analyse the slow tests and work out why they are slow. Check whether multiple tests can reuse the same data without dropping and recreating it from scratch.
Divide tests into areas so you do not always need to run every test.
Use an existing database snapshot instead of recreating the database.
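On SQL Server that can be as simple as creating a snapshot once after seeding and reverting to it between fixtures; roughly like this (the logical file name and path are placeholders):

```sql
-- Create a snapshot once, right after the test database has been seeded.
-- TestDb_Data must be the logical name of TestDb's data file; the path is a placeholder.
CREATE DATABASE TestDb_Snapshot
ON (NAME = TestDb_Data, FILENAME = 'R:\Snapshots\TestDb_Snapshot.ss')
AS SNAPSHOT OF TestDb;

-- Between fixtures, revert instead of dropping and recreating from scratch.
-- (Reverting requires exclusive access and that this is the only snapshot of TestDb.)
RESTORE DATABASE TestDb FROM DATABASE_SNAPSHOT = 'TestDb_Snapshot';
```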
In our team the question was recently raised whether using H2 for integration tests is a bad practice and should be avoided if the production environment relies on a different database engine, in our case MySQL 8.
I'm not sure I agree with that, considering we are using Spring Boot/Hibernate for our backends.
I did some reading and came across this article https://phauer.com/2017/dont-use-in-memory-databases-tests-h2/, which states basically the following (and more):
TL;DR
Using in-memory databases for tests reduces the reliability and scope of your tests. Your application's SQL may fail in production against the real database, although the H2-based tests are green. They do not provide the same features as the real database. Possible consequences are:
You change the application's SQL code just to make it run in both the real and the in-memory database. This may result in less effective, elegant, accurate or maintainable implementations. Or you can't do certain things at all.
You skip the tests for some features completely.
As far as I can tell, for a simple CRUD application with some business logic, none of those points concern me (there are some more in the article), because Hibernate abstracts away all the SQL and there is no native SQL in the code.
Are there any points that I am overlooking or have not considered that speak against H2? Is there a "best practice" regarding the use of an in-memory database for integration tests with Spring Boot/Hibernate?
I'd avoid using H2 if possible. H2 is good when you can't run your own instance, for example if your company uses something like Oracle and won't let you run your own database wherever you want (local machine, your own dev server...).
The problems with H2 are the following:
Migration scripts may differ between H2 and your real database. You'll probably have to maintain separate tweaks for the H2 scripts and the MySQL scripts.
H2 usually doesn't provide the same features as a real RDBMS; you degrade the database to plain SQL usage, and you won't be able to test stored procedures, triggers and all the fancy stuff that may come in handy.
H2 and other RDBMSs are different. Tests won't be testing the same thing, and you may get errors in production that never appear in your tests.
Speaking of your simple CRUD application - it may not stay like that forever.
But go ahead with whichever approach you like; it is best to gain the experience yourself. I got burned by H2 too often to like it.
I would say it depends on the scope of your tests and what you can afford for your integration tests. I would prefer testing against an environment as close as possible to my production environment. That's the ideal case; in reality it might not be possible for various reasons. Also, expecting Hibernate to abstract away low-level details perfectly is an ideal case too; in reality the abstraction may be giving you a false sense of security.
If the scope of your tests is just to cover CRUD operations, an in-memory database should be fine; it will perform quite adequately within that scope. It might even be beneficial, reducing the running time of your tests as well as some of the complexity. It won't detect any platform/version/vendor-specific issues, but that wasn't the scope of the tests anyway; you can test those things in a staging environment before going to production.
In my opinion, it's now easier than ever to create a test environment that is as close as possible to your production environment using things like Docker; CI/CD tools and platforms also support spinning up services for that purpose. If this isn't available or is too complicated for your use case, then the fallback is acceptable.
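As a sketch of that route (assuming Testcontainers' MySQL module and JUnit 5 are on the test classpath; class and property names here are illustrative, not from your project):

```java
// Illustrative test class: run the repository tests against a real MySQL 8
// started in Docker by Testcontainers, instead of an in-memory H2.
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.DynamicPropertyRegistry;
import org.springframework.test.context.DynamicPropertySource;
import org.testcontainers.containers.MySQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

@SpringBootTest
@Testcontainers
class CustomerRepositoryIT {

    @Container
    static final MySQLContainer<?> MYSQL = new MySQLContainer<>("mysql:8.0");

    @DynamicPropertySource
    static void mysqlProperties(DynamicPropertyRegistry registry) {
        // Point the Spring datasource at the container started for this test run.
        registry.add("spring.datasource.url", MYSQL::getJdbcUrl);
        registry.add("spring.datasource.username", MYSQL::getUsername);
        registry.add("spring.datasource.password", MYSQL::getPassword);
    }

    // ...the usual repository/service tests go here, now hitting real MySQL
}
```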
From experience, I have faced failures related to platform/version/vendor-specific issues when deploying to production even though all my tests against the in-memory database were green. It's always better to detect these issues early; it saves a lot of recurring development time and, most importantly, your good night's sleep.
We have an application running on Symfony 2.8 with a package named "liip/functional-test-bundle". We plan on using PHPUnit to run functional tests on our application, which uses MySQL for its database.
The 'functional test bundle' package allows us to use the entities as a schema builder for an in-memory SQLite database, which is very handy because:
It requires zero configuration to run
It's extremely fast to run tests
Our tests can be run independently of each other and of the development data
Unfortunately, some of our entities use enums, which are not supported by SQLite, and our technical lead has opted to keep the existing enums whilst refraining from using them in new code.
Ideally we need this in the project sooner rather than later, so the team can start writing new tests in the future to help maintain the stability of the application.
I have 3 options at this point, but I need help choosing the correct one and performing it correctly:
Convince the technical lead that enums are a bad idea and lookup tables could be used instead (which may cost time when the workload is already high)
Switch to using MySQL for the testing database. (This will require additional configuration for our tests to run, and may be slower)
Have Doctrine detect when enums are used on the SQLite driver, and switch them out for strings. (I have no idea how to do this, but it is, in my opinion, the ideal solution; a rough sketch of what I mean is below.)
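To illustrate what I mean by option 3 (this is only a guess at an approach, not something we have working; the function and its wiring into the test bootstrap are illustrative):

```php
<?php
// Rough idea: when the test connection runs on the SQLite driver, tell Doctrine
// to treat unknown ENUM column types as plain strings.
use Doctrine\DBAL\Connection;
use Doctrine\DBAL\Platforms\SqlitePlatform;

function mapEnumsToStringsForSqlite(Connection $connection)
{
    $platform = $connection->getDatabasePlatform();

    if ($platform instanceof SqlitePlatform) {
        // ENUM columns will now be handled as the built-in string type.
        $platform->registerDoctrineTypeMapping('enum', 'string');
    }
}

// Note: if the entities hard-code columnDefinition="ENUM(...)", the schema tool may
// still emit the raw definition, so the mappings themselves might also need changes.
```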
Which action is the best, and how should I carry it out?
I have an exe configured under Windows Scheduler to perform scheduled operations on a set of data.
The exe calls stored procedures to retrieve data, performs some calculations, and updates the data back to a different database.
I would like to know the pros and cons of using an SSIS package over a scheduled exe.
Do you mean what the pros and cons are of using SQL Server Agent Jobs for scheduling SSIS packages and command shell executions? I don't really know the pros of Windows Scheduler, so I'll stick to listing the pros of SQL Server Agent Jobs.
If you are already using SQL Server Agent Jobs on your server, then running SSIS packages from the agent consolidates the places that you need to monitor to one location.
SQL Server Agent Jobs have built in logging and notification features. I don't know how Windows Scheduler performs in this area.
SQL Server Agent Jobs can run more than just SSIS packages. So you may want to run a T-SQL command as step 1, retry if it fails, eventually move to step 2 if step 1 succeeds, or stop the job and send an error if the step 1 condition is never met. This is really useful for ETL processes where you are trying to monitor another server for some condition before running your ETL.
SQL Server Agent Jobs are easy to report on since their data is stored in the msdb database. We have regularly scheduled subscriptions for SSRS reports that provide us with data about our jobs. This means I can get an email each morning before I come into the office that tells me if everything is going well or if there are any problems that need to be tackled ASAP.
SQL Server Agent Jobs are used by SSRS subscriptions for scheduling purposes. I commonly need to start SSRS reports by calling their job schedules, so I already have to work with SQL Server Agent Jobs.
SQL Server Agent Jobs can be chained together. A common scenario for my ETL is to have several jobs run on a schedule in the morning. Once all the jobs succeed, another job is called that triggers several SQL Server Agent Jobs. Some jobs run in parallel and some run serially.
SQL Server Agent Jobs are easy to script out and load into our source control system. This allows us to roll back to earlier versions of jobs if necessary. We've done this on a few occasions, particularly when someone deleted a job by accident.
On one occasion we found a situation where Windows Scheduler was able to do something we couldn't do with a SQL Server Agent Job. During the early days after a SAN migration we had some scripts for snapshotting and cloning drives that didn't work in a SQL Server Agent Job. So we used a Windows Scheduler task to run the code for a while. After about a month, we figured out what we were missing and were able to move the step back to the SQL Server Agent Job.
Regarding SSIS versus an exe making stored procedure calls:
If all you are doing is running stored procedures, then SSIS may not add much for you. Both approaches work, so it really comes down to the differences between what you get from a .exe approach and SSIS, as well as how many stored procedures are being called.
I prefer SSIS because we do so much on my team where we have to download data from other servers, import/export files, or do some crazy HTTPS posts. If we only had to run one set of processes and they were all stored procedure calls, then SSIS might have been overkill. For my environment, SSIS is the best tool for moving data because we move all kinds of data to and from the server. If you ever expect to move beyond running stored procedures, then it may make sense to adopt SSIS now.
If you are just running a few stored procedures, then you could get away with doing this from the SQL Server Agent Job without SSIS. You can even parallelize jobs by making a master job start several jobs via msdb.dbo.sp_start_job 'Job Name'.
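For example, the master job's step can just be a few calls like this (the job names are placeholders); sp_start_job returns as soon as each job is requested, so the started jobs run side by side:

```sql
-- Illustrative master-job step: kick off several agent jobs in parallel.
EXEC msdb.dbo.sp_start_job @job_name = N'ETL - Load Customers';
EXEC msdb.dbo.sp_start_job @job_name = N'ETL - Load Orders';
EXEC msdb.dbo.sp_start_job @job_name = N'ETL - Load Inventory';
```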
If you want to parallelize a lot of stored procedure calls, then SSIS will probably beat out chaining SQL Server Agent Job calls. Although chaining is possible in code, there's no visual surface and it is harder to understand complex chaining scenarios that are easy to implement in SSIS with sequence containers and precedence constraints.
From a code maintainability perspective, SSIS beats out any exe solution for my team since everyone on my team can understand SSIS and few of us can actually code outside of SSIS. If you are planning to transfer this to someone down the line, then you need to determine what is more maintainable for your environment. If you are building in an environment where your future replacement will be a .NET programmer and not a SQL DBA or Business Intelligence specialist, then SSIS may not be the appropriate code-base to pass on to a future programmer.
SSIS gives you out of the box logging. Although you can certainly implement logging in code, you probably need to wrap everything in try-catch blocks and figure out some strategy for centralizing logging between executables. With SSIS, you can centralize logging to a SQL Server table, log files in some centralized folder, or use another log provider. Personally, I always log to the database and I have SSRS reports setup to help make sense of the data. We usually troubleshoot individual job failures based on the SQL Server Agent Job history step details. Logging from SSIS is more about understanding long-term failure patterns or monitoring warnings that don't result in failures like removing data flow columns that are unused (early indicator for us of changes in the underlying source data structure) or performance metrics (although stored procedures also have a separate form of logging in our systems).
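If you log to the SQL Server provider, making sense of the data can start with a query as simple as this (the table is dbo.sysssislog on SQL Server 2008+, sysdtslog90 on 2005; the event filter is just an example):

```sql
-- Recent SSIS warnings and errors from the SQL Server log provider's table.
SELECT TOP (50) starttime, source, event, message
FROM dbo.sysssislog
WHERE event IN ('OnError', 'OnWarning')
ORDER BY starttime DESC;
```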
SSIS gives you a visual design surface. I mentioned this before briefly, but it is a point worth expanding upon on its own. BIDS is a decent design surface for understanding what's running in what order. You won't get this from writing do-while loops in code. Maybe you have some form of a visualizer that I've never used, but my experience with coding stored procedure calls always happened in a text editor, not in a visual design layer. SSIS makes it relatively easy to understand precedence and order of operations in the control flow, which is where you would be working if you are using Execute SQL tasks.
The deployment story for SSIS is pretty decent. We use BIDS Helper (a free add-in for BIDS), so deploying changes to packages is a right click away on the Solution Explorer. We only have to deploy one package at a time. If you are writing a master executable that runs all the ETL, then you probably have to compile the code and deploy it when none of the ETL is running. SSIS packages are modular code containers, so if you have 50 packages on your server and you make a change in one package, then you only have to deploy the one changed package. If you setup your executable to run code from configuration files and don't have to recompile the whole application, then this may not be a major win.
Testing changes to an individual package is probably generally easier than testing changes in an application. Meaning, if you change one ETL process in one part of your code, you may have to regression test (or unit test) your entire application. If you change one SSIS package, you can generally test it by running it in BIDS and then deploying it when you are comfortable with the changes.
If you have to deploy all your changes through a release process and there are pre-release testing processes that you must pass, then an executable approach may be easier. I've never found an effective way to automatically unit test a SSIS package. I know there are frameworks and test harnesses for doing this, but I don't have any experience with them so I can't speak for the efficacy or ease of use. In all of my work with SSIS, I've always pushed the changes to our production server within minutes or seconds of writing the changes.
Let me know if you need me to elaborate on any points. Good luck!
If you have a dependency on Windows features like logging, eventing, or access to Windows resources, go the Windows Scheduler/Windows services route. If it is just database-to-database movement, or if you need some kind of heavy database function usage, go the SSIS route.
I am very new to stress testing and am just trying to learn the ropes. So my questions are:
If I have a development server which in terms of software is identical but in terms of hardware has a much lower spec than the production server, is it worth stress testing the development server to identify obvious software defects?
How is it best to stress test a live production server without potentially jeopardising the experience of your users? Or should stress testing a live production server be avoided altogether?
Here are various tips/suggestions:
If your application is new, so you don't know if it can handle the load it will have in production, then you need to do "capacity" testing. You should do your capacity testing on your production hardware, which, since it hasn't gone "live" yet, won't affect users.
If your application is an existing application that is already deployed in production then what you should be doing is "performance regression" testing.
A performance regression test consists of doing a stress test of all the individual "features" (whatever that means for your application) on your development server to measure its performance. You keep a record of the results as your "baseline".
As you make changes to your application, re-run your performance regression tests to see if any results have changed significantly from the baseline (and record the new numbers as your new baseline).
If the performance regression results on your development server didn't change much from the baseline then you should be safe to deploy to production without your server utilization changing (i.e. getting overloaded).
I think you should avoid any work on production machines, including stress testing, unless you know you have a problem that you can't reproduce in your test environment. That said, maybe you know your users don't use the system during the night? If the tests are non-intrusive/read-only, then I'd say it's an additional option.
As to analyzing performance on a weaker machine, it's not so bad: most bottlenecks are caused by bad architecture in your system and should be visible on different hardware configurations, just under different load scenarios. It may even be easier to notice the problems on a weaker machine, so I'd say stress test and optimize on your development system, and you'll know that, at least theoretically, your production system should be even better.
I use Hudson to automate the testing of a very large, important product. I want my testing hosts to be able to run as many concurrent builds as they will theoretically support, with the exception of the Excel tests, of which only one may run per machine at any time. Any number of non-Excel tests can run concurrently, but at most one Excel test at a time may run per machine.
Background:
Most of my tests are normal unit tests, the sort of thing that I can easily run in parallel. Unfortunately a substantial and time-consuming part of my unit-testing plan consists of tests which have been implemented in Excel.
You might think it crazy to implement a test in Excel, but actually there's an important reason: most of our users access our system via Excel. Excel has its own quirky ways of handling data, so the only way to guarantee that our stuff works for Excel users is to literally implement our regression tests of our application in Excel.
I've written a test-runner tool which allows me to easily fire off a group of Excel tests: each test is a single .xls file, and each group is a folder full of Excel files. I've got about 30 groups which need to be run for an end-to-end test. My tool converts the result of each of the tests into JUnit-style XML which Hudson is able to understand. The tests use the pywin32 library (win32com) to automate Excel. When run on their own they are reliable.
I've got a group of computers which are dedicated to running tests. Each machine is quad-core and can theoretically run quite a lot of stuff at once. Unfortunately I've found that COM cannot be used to safely control more than one Excel instance per machine at a time.
That is to say, if a second build starts and tries to talk to Excel via COM, it might interfere with the one which is already running and cause both tests to fail.
I can run as many other non-Excel processes as the machine will allow, but I need to find a way so that Hudson does not attempt to launch more than one process that requires Excel on any one machine concurrently.
Sounds like the Locks and Latches plugin might help you.
http://hudson.gotdns.com/wiki/display/HUDSON/Locks+and+Latches+plugin
Isn't Hudson Java?
Since you've tagged this post python, I'll point out that Buildbot has slave locks to limit individual steps on individual slaves (or you can use them as coarser locks if you'd like).
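A rough sketch of what that looks like in a Buildbot master config (this assumes the 0.8-era API; the step names and runner scripts are made up):

```python
# Illustrative master.cfg fragment: at most one Excel-driving step per slave machine.
from buildbot.process.factory import BuildFactory
from buildbot.steps.shell import ShellCommand
from buildbot.locks import SlaveLock

# maxCount=1 means only one build step may hold this lock on a given slave at a time.
excel_lock = SlaveLock("excel", maxCount=1)

factory = BuildFactory()
factory.addStep(ShellCommand(name="unit-tests",
                             command=["python", "run_unit_tests.py"]))
factory.addStep(ShellCommand(name="excel-tests",
                             command=["python", "run_excel_tests.py"],
                             locks=[excel_lock.access('exclusive')]))
```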