Testing without affecting certain tables - mysql

At the moment I'm stuck with the need to debug several functions in our system to determine whether they're working or not.
The situation is basically that I've been left with someone else's CakePHP structure, and because of a lack of time and a lack of documentation I don't know the code inside and out.
I need to run tests on this system, but creating new orders and so on will produce incorrect data on our reports page, which is not allowed. There are a lot of models that save data to the reports simply by creating extra rows.
The easiest solution would be to have no report rows created while I'm logged in as a certain test user. I'd then just add a condition to decide whether or not to insert the report row into the database (if ($bool_tester) return false; else /* insert data */).
That would, however, require fetching session data within the Model, which I've read is a bad idea. I also can't simply pass an extra parameter to the function, since it is called in so many places across so many files.
So my question is basically: should I read session data within the Model regardless, or is there another nifty solution that stops these rows from being inserted while I'm testing?
Defining a session value through the controllers isn't a smooth solution here either.

Do the testing in your development environment, not on the live site.

Do you use unit tests for this? CakePHP supports that. If you do, you can stub or mock the data in the setup for the test; Cake supports that as well.
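For example, here is a minimal sketch of such a test, assuming CakePHP 2.x with a configured test datasource; the Order and Report model names, the app.order / app.report fixtures and the saved fields are placeholders for whatever your models actually look like:

<?php
// Rough sketch only. Fixtures run against the test database, so nothing
// touches the live reports table; mocking Report lets us assert that no
// report row would be written at all.
App::uses('ClassRegistry', 'Utility');

class OrderTest extends CakeTestCase {

    // Hypothetical fixtures; CakePHP loads these into the test datasource.
    public $fixtures = array('app.order', 'app.report');

    public function testCreatingAnOrderDoesNotWriteAReportRow() {
        // Replace the Report model in the ClassRegistry with a mock, so any
        // model that tries to write a report row hits the mock instead.
        $report = $this->getMockForModel('Report', array('save'));
        $report->expects($this->never())->method('save');

        $order = ClassRegistry::init('Order');
        $order->save(array('Order' => array('total' => 100)));
    }
}

That keeps the session check out of the Model entirely, because the test database (and the mock) absorbs the writes instead.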

Related

How to Get Rid of UNUSED Queries in MS ACCESS

I have reviewed the previous questions and haven't found the answer to the following one:
Is there a database tool available in MS Access to run and identify the queries that are NOT being used as part of my database? We have lots of queries that are no longer used, and I need to clean up the database and get rid of them.
Access does have a built-in “dependency” feature. The result is a VERY nice tree view of those dependencies, and you can even launch objects from that tree view to “navigate” the application, so to speak.
The option is found under Database Tools and is appropriately called Object Dependencies.
While you may not want to use Name AutoCorrect itself, this feature does require Track name AutoCorrect info to be turned on. If this is a large application, a significant delay will occur on the first run; after that, the results can be viewed instantly. Most developers still turn off Track name AutoCorrect (often referred to as track name auto-destroy), but it is required for this feature.
And, unfortunately, you have to go query by query, but at least it will display the dependencies (forms or reports) for each one. However, VBA code that builds SQL on the fly and uses such queries? It will not catch that case. So, at the end of the day, a query you delete may well still be used in code, and if that code builds SQL on the fly (as a LOT of VBA code does), you can never really be sure that the query is not used somewhere in the application.
So, the dependency checker can easily determine whether another query, another form/subform, or a report uses that query. The dependency checker does a rather nice job there.
However, VBA code is a different matter; how it runs and what it does cannot be determined until the code is actually run. In effect, a dependency checker would have to execute the VBA code, and even then the code will sometimes choose at runtime which query to run or use. I suppose you could do a quick search, since search is global for VBA (all code in modules, reports and forms can be searched). That would find most uses of the query, but not all of them, since as noted VBA code can and often does build SQL on the fly.
I have a vague recollection that part of Access Analyzer from FMS Inc. has this functionality built in.
Failing that, I can see 2 options that may work.
Firstly, you could use the built-in Database Documenter. This creates a report that you can export to Excel. You would then need to import that into the database and write some code that loops through the queries to see whether they appear in this table.
Alternatively, you could use the undocumented "SaveAsText" feature to loop over all Forms/Reports/Macros/Modules in your database, as well as looping over the QueryDefs and saving their SQL into text files. You would then write some VBA to loop through the queries, open each of the text files and check for the existence of the query.
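For instance, a rough VBA sketch of that export loop; SaveAsText is undocumented, the export folder is assumed to already exist, and the file-name prefixes are just a naming choice:

Public Sub ExportObjectsAndQueries()
    ' Dump forms, reports, macros and modules as text so their record
    ' sources, SQL and code can be searched for query names.
    Dim obj As AccessObject
    Dim qdf As DAO.QueryDef
    Dim strFolder As String
    Dim intFile As Integer

    strFolder = CurrentProject.Path & "\export\"   ' assumed to exist

    For Each obj In CurrentProject.AllForms
        Application.SaveAsText acForm, obj.Name, strFolder & "frm_" & obj.Name & ".txt"
    Next obj
    For Each obj In CurrentProject.AllReports
        Application.SaveAsText acReport, obj.Name, strFolder & "rpt_" & obj.Name & ".txt"
    Next obj
    For Each obj In CurrentProject.AllMacros
        Application.SaveAsText acMacro, obj.Name, strFolder & "mcr_" & obj.Name & ".txt"
    Next obj
    For Each obj In CurrentProject.AllModules
        Application.SaveAsText acModule, obj.Name, strFolder & "mod_" & obj.Name & ".txt"
    Next obj

    ' Save each saved query's SQL as well, skipping hidden system queries.
    For Each qdf In CurrentDb.QueryDefs
        If Left(qdf.Name, 1) <> "~" Then
            intFile = FreeFile
            Open strFolder & "qry_" & qdf.Name & ".txt" For Output As #intFile
            Print #intFile, qdf.SQL
            Close #intFile
        End If
    Next qdf
End Sub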
Either way, rather than just deleting any unused queries, rename them to something like "old_Query" and leave them in the database for a month or so, just in case!!

Data migration for Couchbase documents (i.e. changing an existing field type)?

I am coming from an object-relational database background. I understand Couchbase is schema-less, but data migration will still happen as the application develops.
In SQL we have management tools to alter tables, or I can write a migration script in SQL to migrate from the version 1 table to the version 2 table.
But with documents, say we have a JSON document UserProfile:
UserProfile
{
"Owner": "Rich guy!",
"Car": "Cool car"
}
We might want to add a last-visit field and allow a user to have multiple cars, so the new updated document becomes the following:
UserProfile
{
"Owner": "Rich guy!",
"Car": ["Cool car", "Another car"],
"LastVisit": "2015-09-29"
}
But for easier maintenance, I want all other UserProfile documents to follow the same format, with the "Car" field as an array.
From my experience in SQL, I could write a migration script that supports migrating between different versions of a table: from the version 1 table to version 2 ... N.
So how should I write such migration code? Will I really just have to write an app (executable) using the Couchbase SDK to migrate all the documents each time?
What would be a good way to do a migration like this?
Essentially, your problem breaks down into two parts:
Finding all the documents that need to be updated.
Retrieving and updating said documents.
You can do this in one of two ways: using a view that gives you the document ids, or using a DCP stream to get all the documents from the bucket. The view only gives you the ids of the documents, so you basically iterate over all the ids, and then retrieve, update and store each one using regular key-value methods. The DCP protocol, on the other hand, gives you the actual documents.
The advantage of using a view is that it's very simple to implement, works with any language SDK, and it lets you write your own logic around the process to make it more robust and safe. The disadvantage is having to build a view just for this, and also that if the data keeps changing, you must retrieve the ENTIRE view result at once, because if you try to page over the view with offsets, the ordering of results can change, thus giving you an inconsistent snapshot of the data.
The advantage of using DCP to stream all documents is that you're guaranteed to get a consistent snapshot of your data even if it's constantly changing, and also that you get the whole document directly as part of the stream, so you don't need to retrieve it separately - just update and store back to the database. The disadvantage is that it's currently only implemented in the Java SDK and is considered an experimental feature. See this blog for a simple implementation.
The third - and most convenient for an SQL user - way to do this is through the N1QL query language introduced in Couchbase 4. It has the same data-manipulation commands as you would expect in SQL, so you could basically issue a command along the lines of UPDATE myBucket SET prop = {'field': 'value'} WHERE condition = 'something'. The advantage of this is pretty clear: it both finds and updates the documents all at once, without writing a single line of program code. The disadvantages are that the DML commands are considered "beta" in the 4.0 release of Couchbase, and that if the data set is too large, it might not actually work due to timing out at some point. And, of course, the fact that you need Couchbase 4.0 in the first place.
I don't know of any official tool currently to help with data model migrations, but there are some helpful code snippets depending on the SDK you use (see e.g. bulk updates in java).
For now you will have to write your own script. The basic process is as follows:
Make sure all your documents have a model_version attribute that you increment after each migration.
Before a migration, update your application code so it can handle both the old and the new model_version, and so that new documents are written in the new model.
Write a script that iterates through all the old-model documents in your bucket (you need a view that emits the document key; see the sketch below), makes the update you want, increments model_version and saves the document back.
In a high-concurrency environment it's important to have good error handling and monitoring; you could, for example, have a view that counts how many documents are in each model_version.
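A minimal sketch of the view from step 3, assuming each document carries a "type" attribute and the migration is moving documents to model_version 2 (both are conventions of this example, not anything Couchbase requires); the map function only emits the keys of documents that still need migrating:

function (doc, meta) {
  // Emit only the key of old-model documents; the migration script then
  // fetches, updates and re-saves each one through the key-value API.
  if (doc.type === "UserProfile" && (doc.model_version || 0) < 2) {
    emit(meta.id, null);
  }
}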
You can use Couchmove, a Java migration tool that works like Flyway DB.
You can execute N1QL queries with this tool to migrate your documents and keep track of your changes.
If I understood correctly, the crux here is getting and then updating every Couchbase document. This can be done with a view, provided that you understand that views are only 'eventually consistent' (unlike key-value reads/writes, which are strongly consistent).
If (at migration time) no new documents are added to your bucket, then your view will be up to date and should return the entire set of documents to be migrated. Easy.
On the other hand, if new documents continue to be written into your bucket and these documents need to be migrated, then you will have to run your migration code continually to catch all the new docs (since the view won't return them until it is updated, a few seconds later).
In this second scenario, while the migration is happening, your bucket will contain a heterogeneous collection of docs: some that have already been migrated, some that are about to be migrated, and some that your view has not 'seen' yet (because they were added recently) and that will only be migrated once you re-run the migration code.
To make the migration process efficient, you'll need to find a way to differentiate between already-migrated items and yet-to-be-migrated items. You can add a field to each doc with its 'version number' and update it during the migration. Your view should be defined to only select documents with older 'version number' and ignore already-migrated items.
I suggest you read more about couchbase views - here and on their site.
Regarding your migration: There are two aspects here: (1) getting the list of document ids that need to be updated and (2) the actual update.
The actual update is simple: you retrieve the doc and save it again in the new format. There's no explicit schema. Where in SQL you would once have added a column and populated it, you now just add a field to the JSON doc (in all the docs). All migrated docs should have this field. Side note: things get a little more complicated if (while you're migrating) a document can be updated by another process. That requires special handling (read about CAS if that's the case).
Getting all the relevant doc keys requires that you define a view and query it. That's beyond the scope of this answer (and is very well documented). Once you have all the keys, you simply iterate over them one by one and update the documents.
With N1QL, Couchbase provides the same schema-migration capabilities as you have in an RDBMS or object-relational database. For the example in your question, you can place the following query in a migration script:
UPDATE UserProfile
SET Car = TO_ARRAY(Car),
LastVisit = NOW_STR();
This will migrate all the documents in your bucket to your new schema. Note that update statements in Couchbase provide document-level atomicity, not statement-level atomicity. But since this update is idempotent (repeatable), you can run it multiple times if you run into errors. Note: similar to the last paragraph of David's answer above.

How to test SQL queries/reports?

We are developing a Rails app, that has quite a few pages with data reports. A typical reporting page is based on a relatively big SQL query, usually involving 5–8 table joins.
The cornerstone question we've stumbled upon is writing integration tests for report pages. A common integration test of ours looks like this:
creating a bunch of records in the DB via factory_girl in the test setup,
fire up a Capybara scenario, where a user logs in, goes to the report page, and sees the right data on it.
As the app grows and we get to create more of such reports pages, we've started to run into the following problem - the setup for each individual test ends up being too big, complex and generally hard to read and maintain.
Creating such a test significantly raises the bar for a developer in delivering a feature, related to reporting, as it is very time-consuming and not optimized for happiness. However, we still need to make sure our reports are correct.
Therefore, my questions are:
should we or should we not test pages with reports?
if we should test the reports, then what would be the least painful way to do that?
where are we going wrong?
1. should we or should we not test report pages?
You should definitely test your reports page.
2. if we should test the reports, then what would be the least painful way to do that?
Given the size of the reports, you're probably going to have the following problems:
your tests will become super slow;
your tests will become really hard to read and maintain due to the huge setup;
your tests will stop being updated when reports change.
With this, you'll probably stop maintaining your specs properly.
So, first, you should differentiate between:
testing that UI shows proper results (acceptance) vs
testing that reports are generated correctly (unit and integrated).
The tests for the first scenario (UI), which use Capybara, should test the UI and not the reports themselves. They'd cover that the report data shows up as generated by the respective classes, which means you don't need to test millions of report lines, but rather that the table has the correct columns and headers, that pagination works, etc. You'd test that the first, second and maybe the last report line show up properly.
On the other hand, the tests for the second scenario, report generation, should test that reports are generated correctly. That has nothing to do with the UI, as you could be serving those reports as JSON, HTML, Cap'n Proto or any other visualization means. As an imagination exercise, picture testing reports via JSON responses, then all over again via HTML, then all over again via some other method. It'd become evident that report generation is what repeats all over.
This means that report generation is the core and should be tested on its own. Which means you should cover it mainly by unit tests. Tons of them if you need. Huge arrays.
With this setup, you'd have blazingly fast unit tests covering your reports and their edge cases, a few integrated tests making sure report generation pieces are connected properly and a few acceptance tests covering your UI (Capybara).
Remember the Test Pyramid?
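To make that unit-test layer concrete, here is a minimal sketch; RevenueReport and its interface are made up for illustration, the point being that the report class can be exercised with plain in-memory data and no database or UI at all:

require "spec_helper"

# Hypothetical report class: takes rows in, returns aggregated values.
RSpec.describe RevenueReport do
  it "sums totals per customer" do
    rows = [
      { customer: "Acme",    total: 100 },
      { customer: "Acme",    total: 50 },
      { customer: "Initech", total: 25 }
    ]

    report = RevenueReport.new(rows)

    # No DB, no Capybara: just the aggregation logic and its edge cases.
    expect(report.totals).to eq({ "Acme" => 150, "Initech" => 25 })
  end
end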
3. where are we going wrong?
I don't have all the details about your setup, but it seems the main misconception is thinking that reports are the pages themselves. Remember that you could generate reports as CSV or XML and they'd still be the same report internally. In software, a report will probably end up being an array with values.
So, next time, think about separating concepts. You have reports generation and you have the UI. Test them separately and then add some tests in between to make sure they're both integrated well.
In the future, say you move to a Single Page JS App™ and an iOS app: you wouldn't have to get rid of your report-generation tests, but the UI tests would move into the clients. That's proof that the UI is different from report generation.
I will post ideas we've had so far.
Do not do an integration test at all
Instead of writing an integration test, write a lower-level functional test, treating the DB interaction as an interaction with a third-party API.
This is how it would be:
stub the object that sends the query to the DB with a mock of the DB result we're expecting,
make any test that needs the data rely on that result mock,
execute the real SQL query expecting an empty dataset (see the sketch below), while verifying that:
the SQL raised no syntax error,
the result object returned correct columns,
the column types are as what we are expecting them to be.
The advantages of this approach are:
the tests setup is no longer a cognitive barrier,
the tests are significantly faster than they were back in the factory_girl epoch,
if something changes in the DB result (the set of column names or the column types), we still catch that by performing a "real" DB request.
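A rough sketch of such a test, assuming the report SQL lives in a query object like OrdersReportQuery with a #sql method (the class name, interface and column list are made up for illustration):

require "rails_helper"

RSpec.describe OrdersReportQuery do
  it "produces valid SQL with the expected result shape" do
    # Run the real query against an empty database: no factories, no fixtures.
    result = ActiveRecord::Base.connection.exec_query(OrdersReportQuery.new.sql)

    # A syntax error would already have raised; now check the result shape.
    expect(result.rows).to be_empty
    expect(result.columns).to eq(%w[customer_name order_count revenue])
  end
end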
Pre-load data fixtures
Instead of setting up complex tree of records for each and every report test, pre-populate a whole "field" of DB records prior to the entire test suite.
This is how it would be:
before the suite (or the part of it that contains tests for reports), populate the DB with a lot of test data,
test each and every report without any further setup needed.
The advantages of this approach are:
the tests setup is no longer a cognitive barrier – the data is just always there for you,
the tests are much faster than they were back in the factory_girl epoch.
The disadvantage of this approach is:
occasionally, as new reports come up, one will have to go and add more data to the "fixtures" setup, which is very likely to break existing tests. Fixing the existing tests may lead to strange / hard-to-read pull-request changesets for the new features.

Django code or MySQL triggers

I'm making a web service with Django that uses a MySQL database. Clients interface with our database through URLs, handled by Django. Right now I'm trying to create a behavior that automatically does some checking/logging whenever a certain table is modified, which naturally suggests MySQL triggers. However, I can also do this in Django, in the request handler that does the table modification. I don't think Django has trigger support yet, so I'm not sure which is better: doing it through Django code or through a MySQL trigger.
Anybody with knowledge on the performance of these options care to shed some light? Thanks in advance!
There are a lot of ways to solve the problem you've described:
Application Logic
View-specific logic -- If the behavior is specific to a single view, then put the changes in the view.
Model-specific logic -- If the behavior is specific to a single model, then override the save() method for the model.
Middleware Logic -- If the behavior relates to multiple models OR needs to be wrapped around an existing application, you can use Django's pre-save/post-save signals to add additional behaviors without changing the application itself.
Database Stored Procedures -- Normally a possibility, but Django's ORM doesn't use them. Not portable across databases.
Database Triggers -- Not portable from one database to another (or even one version of a database to the next), but allow you to control shared behavior across multiple (possibly non-Django) applications.
Personally, I prefer either overriding the save() method or using a Django signal. View-specific logic can catch you out in large applications with multiple views of the same model(s).
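As a minimal sketch of those two options (Order and AuditLog are placeholder models, and you would normally pick one mechanism rather than wiring up both):

from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver


class AuditLog(models.Model):
    action = models.CharField(max_length=50)
    details = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)


class Order(models.Model):
    status = models.CharField(max_length=20)

    # Option 1: model-specific logic. Runs for every save() through the ORM,
    # but not for raw SQL or queryset.update(), which bypass save().
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        AuditLog.objects.create(action="order_saved", details=self.status)


# Option 2: signal-based logic, which keeps the logging out of the model.
@receiver(post_save, sender=Order)
def log_order_change(sender, instance, created, **kwargs):
    AuditLog.objects.create(
        action="order_created" if created else "order_updated",
        details=instance.status,
    )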
What you're describing sounds like "change data capture" to me.
I think the trade-offs might go like this:
Django pros: Middle-tier code can be shared by multiple apps; portable if the database changes
Django cons: Logically not part of the business transaction
MySQL pros: Natural to do it in a database
MySQL cons: Triggers are very database-specific; if you change vendors you have to rewrite
This might be helpful.
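For comparison, a rough sketch of the trigger approach with made-up table names; note that the trigger fires for every write to the table, including writes that never go through Django:

DELIMITER //

CREATE TRIGGER orders_after_update
AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    -- Log the change; this runs inside the same transaction as the UPDATE.
    INSERT INTO audit_log (order_id, old_status, new_status, changed_at)
    VALUES (OLD.id, OLD.status, NEW.status, NOW());
END //

DELIMITER ;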

How to handle added input data columns without having to maintain multiple versions of SSIS packages?

I’m writing to solicit ideas for a solution to an upcoming problem.
The product that provides data to our ETL process currently has multiple versions. Our clients are all using some version of the product, but not all use the same version and they will not all be upgraded at the same time.
As new versions of the product are rolled out, the most common change is to add new data columns. Columns being dropped or renamed may happen occasionally, but our main focus right now is how to handle new columns being added.
The problem that we want to address is how to handle the data for clients who use an older version of the product. If we don’t account for the new columns in our SSIS packages, then the data in those columns for clients using an older product version will not be processed.
What we want to avoid is having to maintain a separate version of the SSIS packages for each version of the product. Has anyone successfully implemented a solution for this situation?
Well, I had to do something similar where I got different files in different formats from different vendors that all had to go to the same place. What I did was create a For Each Loop Container that runs through the files, where the first step of the loop determines which path it goes down. Then I wrote a separate data flow for each path.
You could do this with a table that lists the expected columns per version and then send the file down the path for the version it matches.
Alternatively, if you know the version each customer has, you could have a table storing that and determine the path from the customer id.
Or you could write a new package for each version (include the version in the name) to avoid having 20 different paths in one SSIS package, and then have a For Each Loop in a calling SSIS package that sends the file to the correct version. Or simply set up different jobs for each client, knowing what package version they are on; then you could just change the SSIS package their job calls when they upgrade to the new version.
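As a rough sketch of those metadata tables (names and columns are made up for illustration): one table maps each product version to the columns its files contain, another records which version each client is on, and an Execute SQL Task inside the loop can look up the version for the current client into a package variable that precedence constraints then use to pick the right data flow:

CREATE TABLE dbo.ProductVersionColumn (
    ProductVersion  varchar(20) NOT NULL,
    ColumnName      sysname     NOT NULL,
    OrdinalPosition int         NOT NULL,
    CONSTRAINT PK_ProductVersionColumn PRIMARY KEY (ProductVersion, ColumnName)
);

CREATE TABLE dbo.ClientProductVersion (
    ClientId       int         NOT NULL PRIMARY KEY,
    ProductVersion varchar(20) NOT NULL
);

-- Executed from an Execute SQL Task inside the For Each Loop;
-- the ? parameter is mapped to the loop's client id variable.
SELECT ProductVersion
FROM dbo.ClientProductVersion
WHERE ClientId = ?;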
It sounds like you are trying to avoid having to maintain metadata for all the different possible versions. @HLGEM's solution is good, but it still requires you to maintain metadata for all possible combinations of versions.
I had a similar situation where we regularly push out separate client versions, newer versions tend to have extra columns, and we can't force users to upgrade to the latest version. For sources of data where the raw data is from a database table, we always take every column regardless of the user's version. For flat files that we import where the schema is different between versions, we've used three separate solutions:
Conditional Splits: The most obvious solution that works well when there are few variations and a simple way to detect the difference based on a few properties of the row. This doesn't scale well for managing complex changes because the expressions become too difficult to write and maintain.
Script Transformations: If you read in each row as a single string, you can use script tasks to determine whether the additional columns need to be written out. This works well when there are many, many different field combinations and the rules for determining which path to use are highly complex.
Table-driven Metadata: For one corner case where I was importing XML files, I built a control table with version numbers. I basically loaded the XML into an XML data type in a table and then processed the XML in a stored procedure. The package then iterated over each version number, dynamically generated the SQL it needed from the table to extract the correct nodes from the XML, and then flagged the original row as processed. This was a good solution for my process, but the major challenge with the approach was knowing when to add new rows to the control table. I basically had to give the development group a check box on their SDLC forms that required them to get my sign-off confirming I had received the new schema changes for major version changes.
I'm not sure if any of these help you, but I hope you can extract something useful from it. Good luck.