Proper and improper use of exception handling? [duplicate] - exception

This question already has answers here:
Why not use exceptions as regular flow of control?
(24 answers)
Closed 9 years ago.
Hello I am trying to get a clearer understanding of when to use exceptions and when to not use them. I will give a few case scenarios. Can you let me know which cases I should use exception, and explain why I should or should not? (note: this is not a homework problem).
Scenario 1: I design a computer game where each unit can move to a square on a board. However, some square can be blocked. Should I throw a SquareIsBlockedException to prevent the movement of the unit?
Scenario 2: I insert a record to the database, however it fails because there the unique ID is already present. It throws a DuplicateIDException.
Why should I use exceptions for scenario 2, but not for scenario 1?

1) No. A square being blocked is not an exceptional thing - one can assume it is quite common in your game. Exceptions should be fired when something happens in your program that shouldn't happen.
2) Possibly. Inserting a duplicate record into the database is something that should not happen normally. It might also hint at a bug.
If you fire the exception, you stop the execution flow. This is good - after your system finds out it's inserting a duplicate row, what should it do? It's very likely that you haven't prepared your system to behave correctly in such a scenario. Plus, you can (in your debugger, in your logs, etc.) see what went wrong, which makes fixing your code easier.

It all depends on the use case and in your two examples it could be meaningfull to use exceptions or not.
Exceptions should be used for ... Exceptions. There are some "bad code" examples in the web where Exceptions are used for flow control. Do not do this.
So Ex1 sounds for me for a flow control, use if's or language flow control.
In Ex2 explicit control of duplicates can lead to more code when it is necessary first to hit the db to look up if the ID is already in. Is the ID an automatic id, use Exceptions because the normal case is that autom. ID are all diferent. Is the id a user editable data field (e.g. name), lookup first if the Id already exists.

Related

Ignore a feature on pre condition

My feature has 4-5 scenarios, all are dependent on fact that certain element is present on page or not. For example username, password field.
How can I skip/ignore entire feature based on this condition.Some like below that I have written for skipping a scenario.
public void skipScenario(String message, Scenario scenario){
if (username==null){
Assume.assumeTrue(false);
}
}
Just like Scenario interface , we have Feature class with several implementations, I don't know which one can has the function to skip it.
From the Cucumber docs: "Each scenario should test one thing and fail for one particular reason. This means there should be no reason to skip steps.
If there does seem to be a reason you’d want to skip steps conditionally, you probably have an anti-pattern. For instance, you might be trying to test multiple things in one scenario or you might not be in control of the state of your test environment or test data.
The best thing to do here is to fix the root cause."

Is there a programming term for selecting or inserting similar to "upsert"?

Is there a programming term for a widget/service that has the ability to either select an existing item or create/insert a new item? Maybe like how "upsert" is a combination of updating existing or inserting new?
Edit
Sorry, maybe my use of "upsert" as an example term is causing confusion. I'm looking for a term that describes (selecting or creating), not (creating or updating). For example, a widget that can query existing contacts to assign to a company, but also allows ad-hoc creation of contacts as well. I'm hoping to use the term in the name of such widgets to succinctly describe their function.
In web services based on representational state transfer (REST), the PUT method has semantics which can mean create or modify. Based on this, I would propose put as a possibility.
This also happens to match Java usage of the word: consider the put method on Hashtable. It maps a key to a value and returns the old value corresponding to the key, if it existed. This implies it can either exist or not already, and the action will succeed. Mozilla uses the same semantics for put on IDBObjectStore accessed in JavaScript.
It seems like where put is used for updating a collection, it typically has the meaning of create or update.
Based on the clarification - something like provide, specify, define or identify might be acceptable options for communicating the contents of a collection created by giving existing instances and/or creating new ones on the spot. That is, I can provide, specify, define or identify a real number by making reference to some well-known ones (pi, sqrt(2), etc.) or by creating a new one (e.g., by listing some digits or describing how it is computed).

Table name changing to avoid SQL injection attack

I understand the basic process of SQL injection attack. My question is related to SQL injection prevention. I was told that one way to prevent such an attack is by frequently changing the table name! Is that possible?
If so, can someone provide me a link to read about it more because I couldn't find an explanation about it on the web.
No. That makes no sense. You'd either have to change every line of code that references the table or you'd have to leave in place something like a view with the old table name that acts exactly like the old table. No reasonable person would do that. Plus, it's not like there are a ton of reasonable names for tables so you'd be doing crazy things like saying table A stores customer data and AA stores employer data and AAA was the intersection between customers and employers.
SQL injection is almost comically simple to prevent. Use prepared statements with bind variables. Don't dynamically build SQL statements. Done. Of course, in reality, making sure that the new developer doesn't violate this dictum either because they don't know any better or because they can hack something out in a bit less time if they just do a bit of string concatenation makes it a bit more complex. But the basic approach is very simple.
Pffft. What? Frequently changing a table name?
That's bogus advice, as far as "preventing SQL Injection".
The only prevention for SQL Injection vulnerabilities is to write code that isn't vulnerable. And in the vast majority of cases, that is very easy to do.
Changing table names doesn't do anything to close a SQL Injection vulnerability. It might make a successful attack vector less repeatable, requiring an attacker to make some adjustments. But it does nothing prevent SQL Injection.
As a starting point for research on SQL Injection, I recommend OWASP (Open Web Application Security Project)
Start here: https://www.owasp.org/index.php/SQL_Injection
If you run across "changing a table name" as a mitigation, let me know. I've never run across that as a prevention or mitigation for SQL Injection vulnerability.
Here's things you can do to prevent SQL injection:
Use an ORM that encapsulates your SQL calls and provides a friendly layer to your database records. Most of these are very good at writing high quality queries and protecting you from injection bugs simply because of how you use them.
Use prepared statements with placeholder values whenever possible. Write queries like this:
INSERT INTO table_name (name, age) VALUES (:name, :age)
Be very careful to properly escape any and all values that are inserted into SQL though any other method. This is always a risky thing to do, so any code you do write like this should have any escaping you do made blindingly obvious so that a quick code review can verify it's working properly. Never hide escaping behind abstractions or methods with cute names like scrub or clean. Those methods might be subtly broken and you'd never notice.
Be absolutely certain any table name parameters, if dynamic, are tested versus a white list of known-good values. For example, if you can create records of more than one type, or put data into more than one table ensure that the parameter supplied is valid.
Trust nothing supplied by the user. Presume every single bit of data is tainted and hostile unless you've taken the trouble to clean it up. This goes doubly for anything that's in your database if you got your database from some other source, like inheriting a historical project. Paranoia is not unfounded, it's expected.
Write your code such that deleting a line does not introduce a security problem. That means never doing this:
$value = $db->escaped(value);
$db->query("INSERT INTO table (value) VALUES ('$value')");
You're one line away from failure here. If you must do this, write it like so:
$value_escaped = $db->escaped(value);
$db->query("INSERT INTO table (value) VALUES ('$value_escaped')");
That way deleting the line that does the escaping does not immediately cause an injection bug. The default here is to fail safely.
Make every effort to block direct access to your database server by aggressively firewalling it and restricting access to those that actually need access. In practice this means blocking port 3306 and using SSH for any external connections. If you can, eliminate SSH and use a secured VPN to connect to it.
Never generate errors which spew out stack traces that often contain information highly useful to attackers. For example, an error that includes a table name, a script path, or a server identifier is providing way too much information. Have these for development, and ensure these messages are suppressed on production servers.
Randomly changing table names is utterly pointless and will make your code a total nightmare. It will be very hard to keep all your code in sync with whatever random name the table is assuming at any particular moment. It will also make backing up and restoring your data almost impossible without some kind of decoder utility.
Anyone who recommends doing this is proposing a pointless and naïve solution to a an already solved problem.
Suggesting that randomly changing the table names fixes anything demonstrates a profound lack of understanding of the form SQL injection bugs take. Knowing the table name is a nice thing to have, it makes your life easier as an attacker, but many attacks need no knowledge of this. A common attack is to force a login as an administrator by injecting additional clauses in the WHERE condition, the table name is irrelevant.

What is differential execution?

I stumbled upon a Stack Overflow question, How does differential execution work?, which has a VERY long and detailed answer. All of it made sense... but when I was done I still had no idea what the heck differential execution actually is. What is it really?
REVISED. This is my Nth attempt to explain it.
Suppose you have a simple deterministic procedure that executes repeatedly, always following the same sequence of statement executions or procedure calls.
The procedure calls themselves write anything they want sequentially to a FIFO, and they read the same number of bytes from the other end of the FIFO, like this:**
The procedures being called are using the FIFO as memory, because what they read is the same as what they wrote on the prior execution.
So if their arguments happen to be different this time from last time, they can see that, and do anything they want with that information.
To get it started, there has to be an initial execution in which only writing happens, no reading.
Symmetrically, there should be a final execution in which only reading happens, no writing.
So there is a "global" mode register containing two bits, one that enables reading and one that enables writing, like this:
The initial execution is done in mode 01, so only writing is done.
The procedure calls can see the mode, so they know there is no prior history.
If they want to create objects they can, and put the identifying information in the FIFO (no need to store in variables).
The intermediate executions are done in mode 11, so both reading and writing happen, and the procedure calls can detect data changes.
If there are objects to be kept up to date,
their identifying information is read from and written to the FIFO,
so they can be accessed and, if necessary, modified.
The final execution is done in mode 10, so only reading happens.
In that mode, the procedure calls know they are just cleaning up.
If there were any objects being maintained, their identifiers are read from the FIFO, and they can be deleted.
But real procedures do not always follow the same sequence.
They contain IF statements (and other ways of varying what they do).
How can that be handled?
The answer is a special kind of IF statement (and its terminating ENDIF statement).
Here's how it works.
It writes the boolean value of its test expression, and it reads the value that the test expression had last time.
That way, it can tell if the test expression has changed, and take action.
The action it takes is to temporarily alter the mode register.
Specifically, x is the prior value of the test expression, read from the FIFO (if reading is enabled, else 0), and y is the current value of the test expression, written to the FIFO (if writing is enabled).
(Actually, if writing is not enabled, the test expression is not even evaluated, and y is 0.)
Then x,y simply MASKs the mode register r,w.
So if the test expression has changed from True to False, the body is executed in read-only mode. Conversely if it has changed from False to True, the body is executed in write-only mode.
If the result is 00, the code inside the IF..ENDIF statement is skipped.
(You might want to think a bit about whether this covers all cases - it does.)
It may not be obvious, but these IF..ENDIF statements can be arbitrarily nested, and they can be extended to all other kinds of conditional statements like ELSE, SWITCH, WHILE, FOR, and even calling pointer-based functions. It is also the case that the procedure can be divided into sub-procedures to any extent desired, including recursive, as long as the mode is obeyed.
(There is a rule that must be followed, called the erase-mode rule, which is that in mode 10 no computation of any consequence, such as following a pointer or indexing an array, should be done. Conceptually, the reason is that mode 10 exists only for the purpose of getting rid of stuff.)
So it is an interesting control structure that can be exploited to detect changes, typically data changes, and take action on those changes.
Its use in graphical user interfaces is to keep some set of controls or other objects in agreement with program state information. For that use, the three modes are called SHOW(01), UPDATE(11), and ERASE(10).
The procedure is initially executed in SHOW mode, in which controls are created, and information relevant to them populates the FIFO.
Then any number of executions are made in UPDATE mode, where the controls are modified as necessary to stay up to date with program state.
Finally, there is an execution in ERASE mode, in which the controls are removed from the UI, and the FIFO is emptied.
The benefit of doing this is that, once you've written the procedure to create all the controls, as a function of the program's state, you don't have to write anything else to keep it updated or clean up afterward.
Anything you don't have to write means less opportunity to make mistakes.
(There is a straightforward way to handle user input events without having to write event handlers and create names for them. This is explained in one of the videos linked below.)
In terms of memory management, you don't have to make up variable names or data structure to hold the controls. It only uses enough storage at any one time for the currently visible controls, while the potentially visible controls can be unlimited. Also, there is never any concern about garbage collection of previously used controls - the FIFO acts as an automatic garbage collector.
In terms of performance, when it is creating, deleting, or modifying controls, it is spending time that needs to be spent anyway.
When it is simply updating controls, and there is no change, the cycles needed to do the reading, writing, and comparison are microscopic compared to altering controls.
Another performance and correctness consideration, relative to systems that update displays in response to events, is that such a system requires that every event be responded to, and none twice, otherwise the display will be incorrect, even though some event sequences may be self-canceling. Under differential execution, update passes may be performed as often or as seldom as desired, and the display is always correct at the end of a pass.
Here is an extremely abbreviated example where there are 4 buttons, of which buttons 2 and 3 are conditional on a boolean variable.
In the first pass, in Show mode, the boolean is false, so only buttons 1 and 4 appear.
Then the boolean is set to true and pass 2 is performed in Update mode, in which buttons 2 and 3 are instantiated and button 4 is moved, giving the same result as if the boolean had been true on the first pass.
Then the boolean is set false and pass 3 is performed in Update mode, causing buttons 2 and 3 to be removed and button 4 to move back up to where it was before.
Finally pass 4 is done in Erase mode, causing everything to disappear.
(In this example, the changes are undone in the reverse order as they were done, but that is not necessary. Changes can be made and unmade in any order.)
Note that, at all times, the FIFO, consisting of Old and New concatenated together, contains exactly the parameters of the visible buttons plus the boolean value.
The point of this is to show how a single "paint" procedure can also be used, without change, for arbitrary automatic incremental updating and erasing.
I hope it is clear that it works for arbitrary depth of sub-procedure calls, and arbitrary nesting of conditionals, including switch, while and for loops, calling pointer-based functions, etc.
If I have to explain that, then I'm open to potshots for making the explanation too complicated.
Finally, there are couple crude but short videos posted here.
** Technically, they have to read the same number of bytes they wrote last time. So, for example, they might have written a string preceded by a character count, and that's OK.
ADDED: It took me a long time to be sure this would always work.
I finally proved it.
It is based on a Sync property, roughly meaning that at any point in the program the number of bytes written on the prior pass equals the number read on the subsequent pass.
The idea behind the proof is to do it by induction on program length.
The toughest case to prove is the case of a section of program consisting of s1 followed by an IF(test) s2 ENDIF, where s1 and s2 are subsections of the program, each satisfying the Sync property.
To do it in text-only is eye-glazing, but here I've tried to diagram it:
It defines the Sync property, and shows the number of bytes written and read at each point in the code, and shows that they are equal.
The key points are that 1) the value of the test expression (0 or 1) read on the current pass must equal the value written on the prior pass, and 2) the condition of Sync(s2) is satisfied.
This satisfies the Sync property for the combined program.
I read all the stuff I can find and watched the video and will take a shot at a first-principles description.
Overview
This is a DSL-based design pattern for implementing user interfaces and perhaps other state-oriented subsystems in a clean, efficient manner. It focuses on the problem of changing the GUI configuration to match current program state, where that state includes the condition of GUI widgets themselves, e.g. the user selects tabs, radio buttons, and menu items, and widgets appear/disappear in arbitrarily complex ways.
Description
The pattern assumes:
A global collection C of objects that needs periodic updates.
A family of types for those objects, where instances have parameters.
A set of operations on C:
Add A P - Put a new object A into C with parameters P.
Modify A P - Change the parameters of object A in C to P.
Delete A - Remove object A from C.
An update of C consists of a sequence of such operations to transform C to a given target collection, say C'.
Given current collection C and target C', the goal is to find the update with minimum cost. Each operation has unit cost.
The set of possible collections is described in a domain-specific language (DSL) that has the following commands:
Create A H - Instantiate some object A, using optional hints H, and add it to the global state. (Note no parameters here.)
If B Then T Else F - Conditionally execute command sequence T or F based on Boolean function B, which can depend on anything in the running program.
In all the examples,
The global state is a GUI screen or window.
The objects are UI widgets. Types are button, dropdown box, text field, ...
Parameters control widget appearance and behavior.
Each update consists of adding, deleting, and modifying (e.g. relocating) any number of widgets in the GUI.
The Create commands are making widgets: buttons, dropdown boxes, ...
The Boolean functions depend on the underlying program state including the condition of GUI controls themselves. So changing a control can affect the screen.
Missing links
The inventor never explicitly states it, but a key idea is that we run the DSL interpreter over the program that represents all possible target collections (screens) every time we expect any combination of the Boolean function values B has changed. The interpreter handles the dirty work of making the collection (screen) consistent with the new B values by emitting a sequence of Add, Delete, and Modify operations.
There is a final hidden assumption: The DSL interpreter includes some algorithm that can provide the parameters for the Add and Modify operations based on the history of Creates executed so far during its current run. In the GUI context, this is the layout algorithm, and the Create hints are layout hints.
Punch line
The power of the technique lies in the way complexity is encapsulated in the DSL interpreter. A stupid interpreter would start by Deleting all the objects (widgets) in the collection (screen), then Add a new one for each Create command as it sees them while stepping through the DSL program. Modify would never occur.
Differential execution is just a smarter strategy for the interpreter. It amounts to keeping a serialized recording of the interpreter's last execution. This makes sense because the recording captures what's currently on the screen. During the current run, the interpreter consults the recording to make decisions about how to bring about the target collection (widget configuration) with operations having least cost. This boils out to never Deleting an object (widget) only to Add it again later for a cost of 2. DE will always Modify instead, which has a cost of 1. If we happen to run the interpreter in some case where the B values have not changed, the DE algorithm will generate no operations at all: the recorded stream already represents the target.
As the interpreter executes commands, it is also setting up the recording for its next run.
An analogous algorithm
The algorithm has the same flavor as minimum edit distance (MED). However DE is a simpler problem than MED because there are no "repeated characters" in the DE serialized execution strings as there are in MED. This means we can find an optimal solution with a straightforward on-line greedy algorithm rather than dynamic programming. That's what the inventor's algorithm does.
Strengths
My take is that this is a good pattern for implementing systems with many complex forms where you want total control over placement of widgets with your own layout algorithm and/or the "if else" logic of what's visible is deeply nested. If there are K nests of "if elses" N deep in the form logic, then there are K*2^N different layouts to get right. Traditional form design systems (at least the ones I've used) don't support larger K, N values very well at all. You tend to end up with large numbers of similar layouts and ad hoc logic to select them that's ugly and hard to maintain. This DSL pattern seems a way to avoid all that. In systems with enough forms to offset the DSL interpreter's cost, it would even be cheaper during initial implementation. Separation of concerns is also a strength. The DSL programs abstract the content of forms while the interpreter is the layout strategy, acting on hints from the DSL. Getting the DSL and layout hint design right seems like a significant and cool problem in itself.
Questionable...
I'm not sure that avoiding Delete/Add pairs in favor of Modify is worth all the trouble in modern systems. The inventor seems most proud of this optimization, but the more important idea is a concise DSL with conditionals to represent forms, with layout complexity isolated in the DSL interpreter.
Recap
The inventor's has so far has focused on deep details of how the interpreter makes its decisions. This is confusing because it's directed at trees while the forest is of greater interest. This is a description of the forest.

Alternatives to LINQ To SQL on high loaded pages

To begin with, I LOVE LINQ TO SQL. It's so much easier to use than direct querying.
But, there's one great problem: it doesn't work well on high loaded requests. I have some actions in my ASP.NET MVC project, that are called hundreds times every minute.
I used to have LINQ to SQL there, but since the amount of requests is gigantic, LINQ TO SQL almost always returned "Row not found or changed" or "X of X updates failed". And it's understandable. For instance, I have to increase some value by one with every request.
var stat = DB.Stats.First();
stat.Visits++;
// ....
DB.SubmitChanges();
But while ASP.NET was working on those //... instructions, the stats.Visits value stored in the table got changed.
I found a solution, I created a stored procedure
UPDATE Stats SET Visits=Visits+1
It works well.
Unfortunately now I'm getting more and more moments like that. And it sucks to create stored procedures for all cases.
So my question is, how to solve this problem? Are there any alternatives that can work here?
I hear that Stackoverflow works with LINQ to SQL. And it's more loaded than my site.
This isn't exactly a problem with Linq to SQL, per se, it's an expected result with optimistic concurrency, which Linq to SQL uses by default.
Optimistic concurrency means that when you update a record, you check the current version in the database against the copy that was originally retrieved before making any offline updates; if they don't match, report a concurrency violation ("row not found or changed").
There's a more detailed explanation of this here. There's also a fairly sizable guide on handling concurrency errors. Typically the solution involves simply catching ChangeConflictException and picking a resolution, such as:
try
{
// Make changes
db.SubmitChanges();
}
catch (ChangeConflictException)
{
foreach (var conflict in db.ChangeConflicts)
{
conflict.Resolve(RefreshMode.KeepCurrentValues);
}
}
The above version will overwrite whatever is in the database with the current values, regardless of what other changes were made. For other possibilities, see the RefreshMode enumeration.
Your other option is to disable optimistic concurrency entirely for fields that you expect might be updated. You do this by setting the UpdateCheck option to UpdateCheck.Never. This has to be done at the field level; you can't do it at the entity level or globally at the context level.
Maybe I should also mention that you haven't picked a very good design for the specific problem you're trying to solve. Incrementing a "counter" by repeatedly updating a single column of a single row is not a very good/appropriate use of a relational database. What you should be doing is actually maintaining a history table - such as Visits - and if you really need to denormalize the count, implement that with a trigger in the database itself. Trying to implement a site counter at the application level without any data to back it up is just asking for trouble.
Use your application to put actual data in your database, and let the database handle aggregates - that's one of the things databases are good at.
Use a producer/consumer or message queue model for updates that don't absolutely have to happen immediately, particularly status updates. Instead of trying to update the database immediately keep a queue of updates that the asp.net threads can push to and then have a writer process/thread that writes the queue to the database. Since only one thread is writing, there will be much less contention on the relevant tables/roles.
For reads, use caching. For high volume sites even caching data for a few seconds can make a difference.
Firstly, you could call DB.SubmitChanges() right after stats.Visits++, and that would greatly reduce the problem.
However, that still is not going to save you from the concurrency violation (that is, simultaneously modifying a piece of data by two concurrent processes). To fight that, you may use the standard mechanism of transactions. With LINQ-to-SQL, you use transactions by instantiating a TransactionScope class, thusly:
using( TransactionScope t = new TransactionScope() )
{
var stats = DB.Stats.First();
stats.Visits++;
DB.SubmitChanges();
}
Update: as Aaronaught correctly pointed out, TransactionScope is not going to help here, actually. Sorry. But read on.
Be careful, though, not to make the body of a transaction too long, as it will block other concurrent processes, and thus, significantly reduce your overall performance.
And that brings me to the next point: your very design is probably flawed.
The core principle in dealing with highly shared data is to design your application in such way that the operations on that data are quick, simple, and semantically clear, and they must be performed one after another, not simultaneously.
The one operation that you're describing - counting visits - is pretty clear and simple, so it should be no problem, once you add the transaction. I must add, however, that while this will be clear, type-safe and otherwise "good", the solution with stored procedure is actually a much preferred one. This is actually exactly the way database applications were being designed in ye olden days. Think about it: why would you need to fetch the counter all the way from the database to your application (potentially over the network!) if there is no business logic involved in processing it. The database server may increment it just as well, without even sending anything back to the application.
Now, as for other operations, that are hidden behind // ..., it seems (by your description) that they're somewhat heavy and long. I can't tell for sure, because I don't see what's there, but if that's the case, you probably want to separate them into smaller and quicker ones, or otherwise rethink your design. I really can't tell anything else with this little information.