How to test SQL queries/reports? - mysql

We are developing a Rails app that has quite a few pages with data reports. A typical reporting page is based on a relatively big SQL query, usually involving 5–8 table joins.
The cornerstone problem we've stumbled upon is writing integration tests for report pages. A common integration test of ours looks like this:
create a bunch of records in the DB via factory_girl in the test setup,
fire up a Capybara scenario, where a user logs in, advances to the page with the report, and sees the right data in it.
As the app grows and we create more such report pages, we've started to run into the following problem: the setup for each individual test ends up being too big, complex and generally hard to read and maintain.
Creating such a test significantly raises the bar for a developer delivering a reporting-related feature, as it is very time-consuming and not optimized for happiness. However, we still need to make sure our reports are correct.
Therefore, my questions are:
should we test pages with reports or not?
if we should test the reports, then what would be the least painful way to do that?
where are we going wrong?

1. should we test the reports page or not?
You should definitely test your reports page.
2. if we should test the reports, then what would be the least painful way to do that?
Given the size of the reports, you're probably going to have the following problems:
your tests will become super slow;
your tests will become really hard to read and maintain due to the huge setup;
your tests will stop being updated when reports change.
With this, you'll probably stop maintaining your specs properly.
So, first, you should differentiate between:
testing that UI shows proper results (acceptance) vs
testing that reports are generated correctly (unit and integrated).
The tests for the first scenario, the UI, which use Capybara, should test the UI and not the reports themselves. They'd cover that the report data is shown as it was generated by the respective classes, which means you don't need to test the millions of report lines, but rather that the table has the correct columns and headers, pagination is working etc. You'd test that the first, second and maybe last report lines are shown properly.
On the other hand, the tests for the second scenario, report generation, should test that the reports are generated correctly. That has nothing to do with the UI, as you could be serving those reports as JSON, HTML, Cap'n Proto or any other visualization means. As an imagination exercise, picture testing reports via JSON responses, then all over again via HTML, then all over again via some other method. It'd become evident that the report generation testing is repeated all over.
This means that report generation is the core and should be tested on its own, which means you should cover it mainly with unit tests. Tons of them if you need. Huge arrays.
With this setup, you'd have blazingly fast unit tests covering your reports and their edge cases, a few integrated tests making sure report generation pieces are connected properly and a few acceptance tests covering your UI (Capybara).
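To make the split concrete, here is a minimal sketch in Python rather than Rails, purely for illustration. The names build_sales_report, render_csv and render_json and the sample orders are hypothetical. The report is just a list of rows, the renderers are thin wrappers, and the unit tests target the list directly without any UI.

```python
import csv
import io
import json


def build_sales_report(orders):
    """Aggregate raw order rows into report rows (plain lists)."""
    totals = {}
    for order in orders:
        totals[order["customer"]] = totals.get(order["customer"], 0) + order["amount"]
    # One row per customer, sorted so the order is stable and easy to assert on.
    return [[customer, total] for customer, total in sorted(totals.items())]


def render_csv(rows):
    """Presentation only: the same rows serialized as CSV."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def render_json(rows):
    """Presentation only: the same rows serialized as JSON."""
    return json.dumps(rows)


# Hypothetical sample data standing in for records loaded from the DB.
orders = [
    {"customer": "acme", "amount": 10},
    {"customer": "zeta", "amount": 5},
    {"customer": "acme", "amount": 7},
]
report = build_sales_report(orders)
```

Edge cases (empty input, duplicate customers, and so on) can then be covered by cheap tests against build_sales_report, while each renderer only needs a test or two.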
Remember the Test Pyramid?
3. where are we going wrong?
I don't have all the details about your setup, but it seems the main misconception is thinking that reports are the pages themselves. Remember that you could generate reports as CSV or XML and they'd still be the same report internally. In software, a report will probably end up being an array with values.
So, next time, think about separating concepts. You have reports generation and you have the UI. Test them separately and then add some tests in between to make sure they're both integrated well.
In the future, say you move to a Single Page JS App™ and iOS app, you'd not have to get rid of your report generation tests, but UI tests would go into the clients. That's proof that UI is different from reports generation.

I will post ideas we've had so far.
Do not do an integration test at all
Instead of writing an integration test, write a lower-level functional test, treating the DB interaction as an interaction with a 3rd-party API.
This is how it would be:
stub the object that sends a query to DB with a mock of DB result we're expecting,
have any test that needs the data rely on that mocked result,
execute the SQL query against an empty dataset, verifying only that:
the SQL raised no syntax error,
the result object returned correct columns,
the column types are as what we are expecting them to be.
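The "real query against an empty dataset" check could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the schema, REPORT_SQL and report_columns are hypothetical. Note that sqlite3 leaves the type fields of cursor.description empty, so this sketch checks column names only; a MySQL driver would typically let you check types as well.

```python
import sqlite3

# Hypothetical report query; in the real app this would be the 5-8 join query.
REPORT_SQL = """
SELECT c.name AS customer, COUNT(o.id) AS order_count, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
"""


def report_columns(connection, sql):
    """Run the query and return (column names, rows) of the result set."""
    cursor = connection.execute(sql)  # raises sqlite3.OperationalError on bad SQL
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# Empty schema only: no factories, no records.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    """
)
columns, rows = report_columns(connection, REPORT_SQL)
```

A test would then assert on columns (and on the empty rows), which catches renamed or dropped columns without any data setup.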
The advantages of this approach are:
the tests setup is no longer a cognitive barrier,
the tests are significantly faster than they were back in the factory_girl epoch,
if something changes in the DB result (column names set or column types), we still catch that by performing a "real" DB request.
Pre-load data fixtures
Instead of setting up a complex tree of records for each and every report test, pre-populate a whole "field" of DB records prior to the entire test suite.
This is how it would be:
before a suite (or the part of it which contains tests for reports), populate the DB with a lot of test data,
test each and every report without any further setup needed.
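As a rough illustration of the "populate once, test many times" idea, here is a sketch using Python's unittest and an in-memory SQLite database instead of Rails fixtures. The table, the data and the test names are all hypothetical.

```python
import sqlite3
import unittest


class ReportTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Populate the shared data once for every test in this class.
        cls.db = sqlite3.connect(":memory:")
        cls.db.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        cls.db.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [("acme", 10), ("acme", 7), ("zeta", 5)],
        )

    @classmethod
    def tearDownClass(cls):
        cls.db.close()

    def test_totals_per_customer(self):
        rows = self.db.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
        self.assertEqual(rows, [("acme", 17), ("zeta", 5)])

    def test_order_count(self):
        (count,) = self.db.execute("SELECT COUNT(*) FROM orders").fetchone()
        self.assertEqual(count, 3)


# Run the suite programmatically so the sketch is self-contained.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReportTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The individual tests contain no setup at all, which is the point, but they are all coupled to the same shared data set.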
The advantages of this approach are:
the tests setup is no longer a cognitive barrier – the data is just always there for you,
the tests are much faster than they were back in the factory_girl epoch.
The disadvantage of this approach is:
occasionally, as new reports come up, one will have to go and add more data to the "fixtures" setup, which is very likely to break existing tests. Fixing existing tests may lead to strange, unreadable pull request changesets for the new features.

Related

How to maintain updates to a split database in Access

I've been researching this question over the last few days as I prepare to deliver the first of 3 phases with my first system using a split database. I would like your advice as I haven't found enough info to make a full decision yet.
At the moment I'm working in dev on an unsplit database. When I split it in live I'll take a copy of both parts but what do I do with them for phase 2?
I'm thinking that I'll now make them my dev version after relinking the tables (as I've effectively moved the back end) which would then mean that I no longer work with the unsplit database. Is that the right approach?
When it comes to putting phase 2 live I don't think I have any choice other than manually applying table updates to the live back end (once backed up). For the live front end, do I just replace it with my dev front end and then relink the tables or do I export the changes into the live front end? I guess I could do either depending on the number of objects that are changing/new. Is that right? I'll then take copies and make them my dev versions for phase 3.
Finally in dev I have form and report templates and test forms, reports and queries which are not needed in live so do I remove them for each deployment and then add them all back in to the new dev front end or just put them live? Normally I would take them out but there are a lot of them and I don't know of any quick way to add them back in so what do you do?
Primarily my questions are asked from a risk point of view - what steps best reduce the risk of messing things up in live.
Update:
For those of you who are looking for answers on this, in addition to the fine responses below I have since found the following that might also help:
How to Continuously Develop and Deploy an Access 2010 Database Application
At the moment I'm working in dev on an unsplit database. W
Don't; that is a VERY bad idea. How does this work?
Well, for sure at the start, you are building LOTS of new tables, changing relationships, and building tables at a high rate of development.
So, at this point in time, you can develop un-split.
You will then find after some time the rate (and need) to create new tables, and change the so called "database schema" calms down to a dull roar.
At this point you're still developing away - not yet deployed.
So, somewhere around this point? You want to split. You REALLY want to do this.
There is a boatload of reasons for this, but several are:
while a split vs non split is "very similar", they are not the same!
Thus, you don't want to develop code that is NOT really tested as to HOW the code will run in the real world.
So, many issues can change or crop up during development that are DIFFERENT when run split. Some commands (such as seek()) don't work, and a few other issues can crop up. You don't want to develop for a whole week, then split and find 20 hard-to-fix bugs in your code. By developing as split as soon as possible, ANY and ALL issues that come up will be seen as you develop along, and thus you can see, fix, and deal with such issues at THAT POINT in time. Much worse is to write a bunch of code, get ready to deploy, and then find new bugs.
Next up:
Having a split system is great, since say a customer might let you remote into their system. You can pull down a copy of their data, re-link your tables to point from your "test data" to real live production data.
Or, say you're developing on site. You might want to test some dangerous delete code, or code that modifies the data. You can't risk working on production data, so now you re-link and point to your test back end. So, this setup allows you to test code, but MORE important, to test on a copy of the database with great ease.
And it also allows you to develop off site. You can take the latest front end for their system, maybe get their latest data file, maybe not, but you can now with ease simply change the database that your application runs with.
The other big issue? Say you're working on site, and have a test database of theirs in a network folder. You write some code and test a new report. You find it runs SLOW AS A TURTLE. You check your code, maybe add a few indexes, and boom! - your report now runs great.
If you test un-split, then a boatload of performance issues can crop up, but NOT be SEEN during the development cycle. Once again, you don't want to develop for weeks or whatever, split, and NOW find that a whole bunch of forms and code run REALLY slow.
So, the goal, the idea here?
You want to get split as SOON as possible.
How soon?
Well, this is one of those things that only you can know!!!
As I stated, at start of development, sure, start out un-split.
Once the table designs are quite solid, then you can split. You then ALWAYS develop as split (and the above list of reasons why is the VERY short list - there are many more reasons).
Now, the problem of course with split? Say you want to add a new column to a table?
Well, it is MORE work, since now ANY AND ALL changes to the data schema are done in the back end. So, you have to close down the front end (FE), open the back end (BE), and now use the table designer to add that one column. (Or maybe change or set up a new relationship between some tables, or maybe add a new table.)
This is a "bit more" work, since now you close down the BE, open the FE, and now you MUST re-link tables. And if you added new tables to the BE, then you have to add those new table links.
Because this "dance" is extra work, that is why you wait as long as possible to split. As I stated, you "just know" when the time has arrived to split (when table and schema changes settle down to a low rate of change). Since the rate of change is now low in regards to table changes, it is not much work nor pain to have to do the above every time you want to change the table structures.
In fact, think of any program you buy. It has an application part, and then a data file part. In effect, once you split, you have the same two parts, and in fact in some applications I have written, users are allowed to use "different" back ends - not unlike any other application in which you launch the application, and THEN choose the data file to work with.
So, what about developing off site? Well, that can be REALLY difficult, since you have your own copy of the FE and the BE.
If you HAVE TO make changes to the BE?
I open up a Word document. And if I add a new column to, say, tblCustomers?
Then I enter this:
Add new column TaxRate, Currency, to table tblCustomers
So, you build up a "log" of changes. Now, when you travel to the customer site, and want to roll out and deploy the new FE? Well, you have to FIRST open up their BE, and make the above changes to their production BE database.
Now in some cases, it was not possible for me to be on site. (In fact, I had an automatic update system to automatically roll out a new version of my software, and it would automatically download from the internet.) In this case, I had to write VBA code in the FE that would run on startup and MAKE the changes to the data tables. This can be REALLY hard to do, but it is possible. I just recommend the plain Jane Word document, in which you keep track of your changes.
So, the above is pretty much how this development process works.
Since you will have to re-link the tables, nearly everyone has googled for a VBA table re-link routine. You want that, since having such code is MUCH easier than, say, using the linked table manager each time. And we even often have a table in the FE that saves the BE server location, and on startup the code will check, and if the location of the files doesn't match, we launch the re-link code.
That way, you can deploy the application to different sites, and have it automatically re-link. Another way is to have a simple text file in the same location as the FE on each computer, and on startup read the text file with the BE location - and re-link if required.
So, here is the typical process to roll out a new FE (which is placed on each work station - do NOT break this rule!!!).
So, I point/relink my front end to the production BE. I then compile down to an accDE, and then deploy that new compiled FE to all the work stations. In fact, I have some VBA code at start up that compares a version number, and if the version number is lower, then the VBA code will copy down the next FE sitting in a folder.
This might not be a big deal if you have 2-4 users. But, if you have two sites, and each has 35 users, then you want to figure out an automated approach.
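The answer does this version check in VBA inside the front end; the same logic can be sketched as a small external launcher. The Python below is only an illustration of that logic, and the file names frontend.accde and version.txt and the folder layout are assumptions, not something the answer specifies.

```python
import shutil
import tempfile
from pathlib import Path


def read_version(folder):
    """Read an integer version from version.txt; a missing file counts as 0."""
    version_file = Path(folder) / "version.txt"
    return int(version_file.read_text().strip()) if version_file.exists() else 0


def update_front_end(local_dir, server_dir, fe_name="frontend.accde"):
    """Copy the server front end down when the local copy is older."""
    if read_version(local_dir) >= read_version(server_dir):
        return False
    for name in (fe_name, "version.txt"):
        shutil.copy2(Path(server_dir) / name, Path(local_dir) / name)
    return True


# Demonstration with temporary folders standing in for a workstation and a share.
with tempfile.TemporaryDirectory() as root:
    local, server = Path(root, "local"), Path(root, "server")
    local.mkdir()
    server.mkdir()
    (server / "frontend.accde").write_bytes(b"new build")
    (server / "version.txt").write_text("2")
    (local / "frontend.accde").write_bytes(b"old build")
    (local / "version.txt").write_text("1")
    first = update_front_end(local, server)   # local is older, so it is replaced
    second = update_front_end(local, server)  # versions now match, nothing to do
    local_contents = (local / "frontend.accde").read_bytes()
```

Run at startup on each workstation, a check like this keeps every user on the latest compiled front end without anyone copying files by hand.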
However, do not put off jumping over to the split development cycle, as for all practical purposes you really MUST develop in a split environment. So, for the first part, you can develop un-split. But, once you split - that's it, and from that point onwards, you are to develop as split. There are boatloads of benefits, and it is also pretty much the standard approach from a developer's point of view.
So, you have to master the linked table manager right quick, and then VERY much consider adding some re-link code, since you want to be able to point to a different back end with great ease - including at deployment time.
So, as a general rule, you should think of your FE like a .exe program: for a new version roll out, yes, you copy over (overwrite) the existing FE on each work station. And as noted, in most cases, it should be a compiled accDE, and not an un-compiled accDB for the FE.
For reducing risk:
You should have your development version, a test version for live and the live version.
You develop on development,
the customer tests the changes on test (with test data),
after that you move to live.
For the move from develop to test I create an update/migration script.
This script includes all the alterations that need to be made to the back end.
I use the script to create the test version and with this I can check if it is working properly.
In case there are database changes that I can't reflect in my script (either insufficient skill or restrictions from the db), I add them to my checklist.
I am using version control to see changes during development and to import modules, queries etc. to the new version.
Updating the front end is done via an import of the latest version (without the forms/reports that are not needed).
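An update/migration script of this kind is essentially an ordered change log that is applied exactly once per back end. Below is a minimal sketch of that idea, using Python and SQLite as a stand-in for an Access back end. The schema_version table, the apply_migrations helper and the tblInvoices table are hypothetical; the TaxRate column is borrowed from the change-log example earlier in the thread.

```python
import sqlite3

# Ordered change log; each entry is applied exactly once per back end.
MIGRATIONS = [
    (1, "ALTER TABLE tblCustomers ADD COLUMN TaxRate NUMERIC"),
    # tblInvoices is a hypothetical table, used only for illustration.
    (2, "CREATE TABLE tblInvoices (ID INTEGER PRIMARY KEY, CustomerID INTEGER)"),
]


def apply_migrations(connection, migrations):
    """Apply every migration newer than the recorded schema version."""
    connection.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = connection.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = []
    for version, statement in migrations:
        if version > current:
            connection.execute(statement)
            connection.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    connection.commit()
    return applied


# Stand-in for a copy of the live back end.
backend = sqlite3.connect(":memory:")
backend.execute("CREATE TABLE tblCustomers (ID INTEGER PRIMARY KEY, Name TEXT)")
first_run = apply_migrations(backend, MIGRATIONS)
second_run = apply_migrations(backend, MIGRATIONS)  # nothing left to apply
customer_columns = [row[1] for row in backend.execute("PRAGMA table_info(tblCustomers)")]
```

Running the same script first against the test copy and then against live means the live back end receives exactly the changes that were already verified on test.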

How to Get Rid of UNUSED Queries in MS ACCESS

I have reviewed the previous questions and haven't found the answer to the following question:
Is there a database tool available in MS Access to run and identify the queries that are NOT being used as part of my database? We have lots of queries that are no longer used, and I need to clean up the database and get rid of these queries.
Access does have a built in “dependency” feature. The result is a VERY nice tree-view of those dependencies, and you can even launch such objects using that treeview of your application to “navigate” the application so to speak.
The option is found under database tools and is appropriately called Object Dependencies.
The result looks like this:
While you normally don't want to use Name AutoCorrect, this feature will force track name AutoCorrect on. If this is a large application, then on first run a significant delay will occur. After that, the results can be viewed instantly. Most developers still turn off track name AutoCorrect (often referred to as track auto destroy). However, track name AutoCorrect is required for this feature.
And, unfortunately, you have to go query by query, but at least it will display the dependencies for each query (forms or reports). However, VBA code that creates SQL on the fly and uses such queries? Well, it will not catch that case. So, at the end of the day, a query you delete may well still be used in code, and if that code creates SQL on the fly (as a LOT of VBA code does), then you can never really be sure that the query is not used some place in the application.
So, the dependency checker can easily determine if another query, another form/sub form, or a report uses that query. So the dependency checker does a rather nice job.
However, VBA code is a different matter, and how VBA code runs and does things cannot be determined until such time as the code is actually run. In effect, a dependency checker would have to actually run the VBA code, and even then, sometimes code will make several choices as to which query to run or use - and that is determined by code. I suppose that you could do a quick "search", since a search is global for VBA (all code in modules, reports and forms can be searched). This would find most uses of the query, but not in all cases, since as noted VBA code often can and does create SQL on the fly.
I have a vague recollection that Access Analyzer from FMS Inc has this functionality built in.
Failing that, I can see 2 options that may work.
Firstly, you could use the inbuilt Database Documenter. This creates a report that you can export to Excel. You would then need to import this into the database, and write some code that loops through the queries to see if they appear in this table;
Alternatively, you could use the undocumented "SaveAsText" feature to loop through all Forms/Reports/Macros/Modules in your database, as well as looping through the Querydefs and saving their SQL into text files. You would then write some VBA to loop through the queries, open each of the text files and check for the existence of the query.
Either way, rather than just deleting any unused queries, rename them to something like "old_Query", and leave them in the database for a month or so, just in case!!
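The answer suggests doing the search in VBA; the search step itself can be sketched in a few lines of Python once the objects have been exported to text. The helper find_unused_queries and the sample object names below are hypothetical. The last sample also shows the caveat raised above: a query whose name is assembled at run time is wrongly reported as unused.

```python
import re


def find_unused_queries(query_names, exported_sources):
    """Return query names that appear in none of the exported object texts.

    exported_sources maps an object name to its exported text. A query's own
    definition is skipped so it does not count as a use of itself.
    """
    unused = []
    for query in query_names:
        # Match the name as a whole word, ignoring case.
        pattern = re.compile(r"(?<!\w)" + re.escape(query) + r"(?!\w)", re.IGNORECASE)
        used = any(
            pattern.search(text)
            for owner, text in exported_sources.items()
            if owner != query
        )
        if not used:
            unused.append(query)
    return unused


# Hypothetical exported objects, keyed by object name.
exported = {
    "frmOrders": 'RecordSource = "qryOpenOrders"',
    "qryOpenOrders": "SELECT * FROM tblOrders WHERE Closed = False",
    "qryOldTotals": "SELECT Sum(Amount) FROM tblOrders",
    # The name qryMonthly is built at run time, so a text search cannot see it.
    "modReports": 'DoCmd.OpenQuery "qry" & reportName',
}
candidates = find_unused_queries(["qryOpenOrders", "qryOldTotals", "qryMonthly"], exported)
```

Treat the result as a list of candidates to rename and watch, not as a list that is safe to delete.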

Dynamically changing Report's Shared Data Source at Runtime

I'm looking to use SSRS for multi-tenant reporting and I'd like the ability to have runtime-chosen Shared Data Sources for my reports. What do I mean by this? Well, I could be flexible but I think the two most likely possibilities are (however, I'm also open to other possibilities):
The Shared Data Source is dictated by the client's authentication. In my case, the "client" is a .NET application and not the user, so if this is a viable path then I'd like to somehow have the MainDB (that's what I'm calling it) Shared Data Source selected by the Service Account that the client logs in as.
Pass the name of the Shared Data Source as a parameter and let that dictate which one to use. Given that all of my clients are "trusted players", I am comfortable with this approach. While each client will have its own representative Service Account, it's just for good measure and should not be important. So instead of just calling the data source MainDB, we could instead have Client1DB and Client2DB, etc. It's okay if a new data source means a new deployment but I need this to scale easily enough as well to ~50 different data sources over time.
Why? Because we have multiple/duplicate copies of our production application for multiple customers but we don't want to duplicate everything, just the web apps and databases. We're fine with some common "back-end" things. And for SSRS, because of how expensive licenses are (and how rarely reports are run by our users), we really want to have just a single back-end for all of our customers (I actually have a second one on standby for manual disaster recovery situations - we don't need to be too fancy here as reports are the least important DR concern we have).
I have seen this question which points to this post but I was really hoping there was a better way than this. Because of all of those additional steps/efforts/limitations/etc, I'd rather just use PowerShell to script duplicate deployments of the reports with tweaked hardcoded data sources instead of standardizing on the steps in that post. That solution feels WAY too hacky to me and doesn't seem to scale very well at all.
I've done this a bunch of terrible ways (usually hardcoded in a dynamic script), and then I discovered it's actually quite simple.
Instead of using Shared Connection, use the Embedded Connection and create your Connection string based on params (or any string manipulation code)....
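In SSRS itself this would be written as an expression on the embedded data source; the sketch below only illustrates the string-building step, in Python, with an allow-list so that a report parameter cannot point the report at an arbitrary database. The function build_connection_string and the server name reports-sql01 are hypothetical; Client1DB and Client2DB are the names from the question.

```python
# Allow-list of tenant databases the reports may connect to.
ALLOWED_DATABASES = {"Client1DB", "Client2DB"}


def build_connection_string(server, database):
    """Build a SQL Server style connection string for one tenant database."""
    if database not in ALLOWED_DATABASES:
        raise ValueError(f"unknown data source: {database!r}")
    return f"Data Source={server};Initial Catalog={database}"


# reports-sql01 is a hypothetical server name.
client1 = build_connection_string("reports-sql01", "Client1DB")
```

Adding a tenant then means adding one name to the allow-list rather than deploying another copy of every report.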

What can I do to trace what a program does, not having the source code and the support from the program supplier

I now have to deal with a program called FDT, which the company I work for still uses but no longer has vendor support for. I need to insert new orders into the program from our site, which I can get as XML, CSV or some other format from Magento. I am trying to automate this process. All work in the office is done on the basis of this software, FDT, such as checking for out-of-stock items, printing bills and others.
I am now thinking of using Profiler to trace events. I would like to know what processing the program does when we place an order in it. I am not an experienced user of Profiler, so I would like some suggestions on whether it is possible to know which tables it affects and which columns it updates or writes to.
On top of that, there is a new order number that the program generates, which is a unique integer ID. I have not been able to work out the pattern. I do have a test server where I can make changes, and trial and error is no problem.
Some suggestions on how I should proceed, or at least get started, would be appreciated.
I think the most important thing would be to trace the T-SQL, but again, which events and what filter should I use?
I am sorry if it is a stupid question, I am trying to learn. Source code and support are not an option.
This question has too many parts: how to do a trace, how to deal with an application post-support-contract, how to reverse engineer an app, and even whether that is a good idea (and sometimes it's the only idea available). I'd re-ask this as a series of narrow technical questions or ask it on Programmers (after reading their FAQ; they only like certain questions).
Yup, been there, done that. In large organizations, these tasks normally fall to techies who don't wield the awesome power of the budget and can't personally go negotiate a new contract with the original vendor. I assume you have food bills to pay and can't tell your supervisor, "well, I ain't doing nothing until we get a support contract".
Step 0 Diagram the tables - work out the entity relationships and assemble a data dictionary (one that explains the motivation of each table and column, not just the name and data type)
Step 1 Attach the profiler to an active instance of SQL 2008. If you have a specific question about SQL Profiler, open a new question. One hint-- if you are attached to a multi-user instance, filter down to just your own user (the one in the connection string)
http://blog.sqlauthority.com/2009/08/03/sql-server-introduction-to-sql-server-2008-profiler-2/
Step 2
Do an action in the application and watch what SQL is emitted. If it is plain SQL, you can copy and paste it into Management Studio so you can diagram the query and run your own test executions. If it is a stored proc, go read the source code of the stored procedure. If the stored procedure is encrypted, it may or may not be possible to decrypt it. Decrypting the code is fairly defensible when you aren't redistributing it and the supporting company isn't there.
Step 3
Once you understand the app, you can write reports, or more likely, you want to record either new transactions or old transactions differently.
If the app is written in .net or java, you can decompile it and read the code. Creating a custom build from that source isn't going to be fun. A more likely thing to happen is you will create an application that targets the same tables or possibly export all the data out of the original app and into a new bespoke one.

Testing without affecting certain tables

At the moment I'm stuck with the need to debug several functions in our system to determine if they're working or not.
The situation is basically that I'm left with someone else's CakePHP structure, which I don't know inside out. This is due to lack of time and lack of documentation.
I need to run tests on this system, however it will cause incorrect data on our reports page when I create new orders etc. This is not allowed, and basically there are a lot of models which save data to the reports by simply creating other rows.
The easiest solution here would be to make no report rows get created if I'm logged in as a certain user. Then I'd simply add a condition to determine whether I should insert the report row in the database or not. (if ($bool_tester) return FALSE; else /* Insert data */)
This would however require fetching the Session data within the Model, which I've read is a bad solution. I can't simply add an extra parameter to the function, since the function is called in so many places in so many files.
So my question is basically: should I include Session data within the Model regardless, or is there any other nifty solution that lets me avoid inserting these rows when I'm testing?
Defining a session value through the controllers isn't a smooth solution either here.
Do the testing in your development environment, not on the live site.
Do you use unit testing for the tests? CakePHP does support that. If you do, you could stub or mock the data within your setup for the test. Cake also supports that.
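CakePHP has its own test and mocking facilities; the sketch below only shows the general idea in Python with unittest.mock. The classes ReportLog and OrderService are hypothetical stand-ins for the model that writes report rows and the code under test. The test replaces the report writer with a mock, so creating an order never touches the real reports table.

```python
from unittest import mock


class ReportLog:
    """Hypothetical stand-in for the model that inserts report rows."""

    def add_row(self, order_id):
        raise RuntimeError("would write to the real reports table")


class OrderService:
    """Hypothetical code under test; it records a report row per order."""

    def __init__(self, report_log):
        self.report_log = report_log

    def create_order(self, order_id):
        self.report_log.add_row(order_id)
        return {"id": order_id, "status": "created"}


# In the test, the report writer is replaced by a mock with the same interface.
fake_log = mock.Mock(spec=ReportLog)
service = OrderService(fake_log)
order = service.create_order(42)
```

The test can still assert that a report row would have been written, without any session check inside the model.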