DRY and similar queries - mysql

Working on a particular application, I keep writing very similar queries, again and again. They're not exactly the same, but are of very similar form, and embedded in almost identical chunks of code, e.g.,
$Mysqli = new mysqli;
if ($Stmt = $Mysqli->prepare("SELECT foo
                              FROM tblFoo
                              WHERE something = ?")) {
    $Stmt->bind_param('s', $this->_something);
    $Stmt->execute();
    if (0 != $Stmt->errno)
        throw new Exception("blah, blah, blah");
    $Stmt->bind_result($foo);
    while ($Stmt->fetch()) {
        $this->_foos[] = new Foo($foo);
    }
    $Stmt->close();
} else {
    throw new Exception("blah, blah, blah");
}
and later, somewhere else ...
$Mysqli = new mysqli;
if ($Stmt = $Mysqli->prepare("SELECT bar, baz
                              FROM tblBar
                              WHERE somethingElse = ?")) {
    $Stmt->bind_param('s', $this->_somethingElse);
    $Stmt->execute();
    if (0 != $Stmt->errno)
        throw new Exception("blah, blah, blah");
    $Stmt->bind_result($bar, $baz);
    while ($Stmt->fetch()) {
        // do something else with $bar and $baz
    }
    $Stmt->close();
} else {
    throw new Exception("blah, blah, blah");
}
... and then another, and elsewhere another ... etc.
Is this a real violation of DRY? It doesn't seem to make sense to write a class for performing this kind of query (with constructor params or setters for table, column, bound variables, etc.) and then reuse it throughout my app. But at the same time, I can't shake this nagging feeling that I'm repeating myself.
Maybe it's just that there are only so many ways to write a simple query and that a certain amount of repetition like this is to be expected.
Thoughts?

A lot of people do exactly this. Create a class for each table, all inheriting from the same base class. The base class can handle loading and saving the data, so if you want to load the data, you just call the load method, and you can set and access the field values through the properties of the objects.
There are also libraries like Hibernate that handle a lot of the dirty work for you.
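A rough PHP sketch of that idea, just to illustrate; the class names here are made up, and get_result() requires the mysqlnd driver:
abstract class TableGateway {
    protected $table;   // each subclass names its own table
    protected $Mysqli;

    public function __construct(mysqli $Mysqli) {
        $this->Mysqli = $Mysqli;
    }

    // Load all rows where one column equals a value, as associative arrays.
    // Note: $column is interpolated into the SQL, so it must come from code,
    // never from user input.
    public function loadBy($column, $value) {
        $Stmt = $this->Mysqli->prepare(
            "SELECT * FROM {$this->table} WHERE {$column} = ?");
        if (!$Stmt) {
            throw new Exception($this->Mysqli->error);
        }
        $Stmt->bind_param('s', $value);
        $Stmt->execute();
        $rows = $Stmt->get_result()->fetch_all(MYSQLI_ASSOC); // requires mysqlnd
        $Stmt->close();
        return $rows;
    }
}

class FooTable extends TableGateway {
    protected $table = 'tblFoo';
}
A subclass then only needs to know its table name, and the prepare/bind/fetch boilerplate lives in one place.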

I would say it was a violation. From a glance at your code (unless I'm missing something), those two statements are identical except for the strings "foo" and "bar" (repeated a few times) and your actual business logic.
At a minimum you could extract that. More to the point, all the SQL strings should probably be extracted; they've always seemed more like data to me than code. If you had an array of SQL strings, tables, and so on, would that make refactoring more obvious?
(Extracting strings and other data from your code is a great way to begin a refactor)
One possibility: if you had an object that held the table name and a method containing your business logic, you could pass it into a "Processor" containing all the boilerplate (which is all the rest of the code you have there).
I think that might make it DRY.
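For example, the boilerplate could live in one small runner that takes the SQL string, the bound values, and a callback holding the business logic. A rough PHP sketch (runQuery is a made-up name; get_result() requires the mysqlnd driver, argument unpacking requires PHP 5.6+, and every parameter is bound as a string for brevity):
function runQuery(mysqli $Mysqli, $sql, array $params, callable $onRow) {
    $Stmt = $Mysqli->prepare($sql);
    if (!$Stmt) {
        throw new Exception($Mysqli->error);
    }
    if ($params) {
        $Stmt->bind_param(str_repeat('s', count($params)), ...$params);
    }
    $Stmt->execute();
    if (0 != $Stmt->errno) {
        throw new Exception($Stmt->error);
    }
    $result = $Stmt->get_result();    // requires mysqlnd
    while ($row = $result->fetch_assoc()) {
        $onRow($row);                 // the per-query business logic
    }
    $Stmt->close();
}
The first example in the question then collapses to a single call:
runQuery($Mysqli, "SELECT foo FROM tblFoo WHERE something = ?",
    array($this->_something),
    function ($row) { $this->_foos[] = new Foo($row['foo']); });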
(PS. Always prefer composition over inheritance. There is no advantage to using inheritance over a class that can "Run" your objects)

You're encountering one of the first desires to modularise. :-)
There are two general solutions. The first is to abstract the call for all (or most) of the similar queries. And then again for another set of queries. And then for another. And so on. Unfortunately, this results in numerous blocks that do the same thing: call a query, check for problems, assemble a result set, pass it back. It's a DAL, but only of sorts. It also doesn't scale.
The second solution is to abstract the process of making a query and returning the result, and to then look at abstracting the data access on top of that (which basically means you build something that assembles SQL). This is more likely to become a proper ORM rather than a simple DAL. It is much easier to scale this.

Related

Nesting Asynchronous Promises in ActionScript

I have a situation where I need to perform dependent asynchronous operations. For example, check the database for data, if there is data, perform a database write (insert/update), if not continue without doing anything. I have written myself a promise based database API using promise-as3. Any database operation returns a promise that is resolved with the data of a read query, or with the Result object(s) of a write query. I do the following to nest promises and create one point of resolution or rejection for the entire 'initialize' operation.
public function initializeTable():Promise
{
    var dfd:Deferred = new Deferred();
    select("SELECT * FROM table").then(tableDefaults).then(resolveDeferred(dfd)).otherwise(errorHandler(dfd));
    return dfd.promise;
}

public function tableDefaults(data:Array):Promise
{
    if (!data || !data.length)
    {
        // defaultParams is an Object of table default fields/values.
        return insert("table", defaultParams);
    } else
    {
        var resolved:Deferred = new Deferred();
        resolved.resolve(null);
        return resolved.promise;
    }
}

public function resolveDeferred(deferred:Deferred):Function
{
    return function resolver(value:*=null):void
    {
        deferred.resolve(value);
    }
}

public function rejectDeferred(deferred:Deferred):Function
{
    return function rejector(reason:*=null):void
    {
        deferred.reject(reason);
    }
}
My main questions:
Are there any performance issues that will arise from this? Memory leaks etc.? I've read that function variables perform poorly, but I don't see another way to nest operations so logically.
Would it be better to have say a global resolved instance that is created and resolved only once, but returned whenever we need an 'empty' promise?
EDIT:
I'm removing question 3 (Is there a better way to do this??), as it seems to be leading to opinions on the nature of promises in asynchronous programming. I meant better in the scope of promises, not asynchronicity in general. Assume you have to use this promise based API for the sake of the question.
I usually don't write this kind of opinion-based answer, but here it's pretty important. Promises in AS3 = THE ROOT OF ALL EVIL :) And I'll explain why.
First, as BotMaster said, it's weakly typed. What this means is that you aren't using AS3 properly, and the only reason that's even possible is backwards compatibility. The truth is that Adobe has put an enormous amount of effort into turning AS3 into a strongly typed OOP language. Don't stray away from that.
The second point is that promises were created in the first place so that poor developers could actually get some work done in JavaScript. This is not a new design pattern or anything; in fact it has no real benefit if you know how to structure your code properly. The thing promises help with most is avoiding the so-called wall of hell (deeply nested callbacks). But there are other ways to fix that in a natural manner (the very basic one being not to write functions within functions, but on the same level, and simply check the passed result).
The most important thing here is the nature of promises. Very few people know what they actually do behind the scenes. Because of the nature of JavaScript (and ECMAScript in general), there is no real way to tell whether a function completed properly or not. If you return false / null / undefined, they are all regular return values. The only way to actually say "this operation failed" is by throwing an error. So every promisified method can potentially throw an error, and each error must be handled, otherwise your code can stop working properly. What this means is that every single action inside a promise is wrapped in a try-catch block! Every time you do absolutely basic stuff, you wrap it in try-catch. Even this block of yours:
else
{
    var resolved:Deferred = new Deferred();
    resolved.resolve(null);
    return resolved.promise;
}
In a "regular" way, you would simply use else { return null }. But now, you create tons of objects, resolvers, rejectors, and finally - you try-catch this block.
I cannot stress more on this, but I think you are getting the point. Try-catch is extremely slow! I understand that this is not a big problem in such a simple case like the one I just mentioned, but imagine you are doing it more and on more heavy methods. You are just doing extremely slow operations, for what? Because you can write lame code and just enjoy it..
The last thing to say - there are plenty of ways to use asynchronous operations and make them work one after another. Just by googling as3 function queue I found a few. Not to say that the event-based system is so flexible, and there are even alternatives to it (using callbacks). You've got it all in your hands, and you turn to something that is created because lacking proper ways to do it otherwise.
So my sincere advise as a person worked with Flash for a decade, doing casino games in big teams, would be - don't ever try using promises in AS3. Good luck!
var dfd:Deferred = new Deferred();
select("SELECT * FROM table").then(tableDefaults).then(resolveDeferred(dfd)).otherwise(errorHandler(dfd));
return dfd.promise;
This is the Forgotten Promise antipattern. It can instead be written as:
return select("SELECT * FROM table").then(tableDefaults);
This removes the need for the resolveDeferred and rejectDeferred functions.
var resolved:Deferred = new Deferred();
resolved.resolve(null);
return resolved.promise;
I would either extract this to another function, or use Promise.when(null). A global instance wouldn't work because it would mean that the result handlers from one call could be called for a different one.

LINQ-SQL reuse - CompiledQuery.Compile

I have been playing about with LINQ-SQL, trying to get re-usable chunks of expressions that I can hot plug into other queries. So, I started with something like this:
Func<TaskFile, double> TimeSpent = (t =>
    t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below (LINQPad example):
TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent(t),
})
This produces the expected output, except that a query per row is generated for the plugged-in expression. This is visible within LINQPad. Not good.
Anyway, I noticed the CompiledQuery.Compile method. Although this takes a DataContext as a parameter, I thought I would ignore it and try the same Func. So I ended up with the following:
static Func<UserQuery, TaskFile, double> TimeSpent =
    CompiledQuery.Compile<UserQuery, TaskFile, double>(
        (UserQuery db, TaskFile t) =>
            t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Notice here that I am not using the db parameter. However, now when we use this updated Func, only one SQL query is generated. The expression is successfully translated to SQL and included within the original query.
So my ultimate question is, what makes CompiledQuery.Compile so special? It seems that the DataContext parameter isn't needed at all, and at this point I am thinking it is more a convenience parameter for generating full queries.
Would it be considered a good idea to use the CompiledQuery.Compile method like this? It seems like a big hack, but it seems like the only viable route for LINQ re-use.
UPDATE
Using the first Func within a Where statement, we see the following exception:
NotSupportedException: Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.
Like the following:
.Where(t => TimeSpent(t) > 2)
However, when we use the Func generated by CompiledQuery.Compile, the query is successfully executed and the correct SQL is generated.
I know this is not the ideal way to re-use Where statements, but it shows a little how the Expression Tree is generated.
Exec Summary:
Expression.Compile generates a CLR method, whereas CompiledQuery.Compile generates a delegate that is a placeholder for SQL.
One of the reasons you did not get a correct answer until now is that some things in your sample code are incorrect. And without the database, or a generic sample someone else can play with, the chances are further reduced (I know it's difficult to provide that, but it's usually worth it).
On to the facts:
Expression<Func<TaskFile, double>> TimeSpent = (t =>
    t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below:
TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent(t),
})
(Note: maybe you used a Func<> type for TimeSpent. That yields the same situation as the scenario outlined in the paragraph below. Make sure to read and understand it, though.)
No, this won't compile. Expressions can't be invoked (TimeSpent is an expression). They need to be compiled into a delegate first. What happens under the hood when you invoke Expression.Compile() is that the expression tree is compiled down to IL, which is injected into a DynamicMethod, for which you then get a delegate.
The following would work:
var q = TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent.Compile().DynamicInvoke(t)
});
This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Why does that happen? Well, Linq To Sql will need to fetch all TaskFiles, hydrate TaskFile instances and then run your selector against them in memory. You likely get a query per TaskFile because it contains one or more 1:m mappings.
While LTS allows projecting in memory for selects, it does not do so for Wheres (citation needed, this is to the best of my knowledge). When you think about it, this makes perfect sense: you would likely transfer a lot more data by filtering the whole database in memory than by transforming a subset of it in memory. (Though it creates query performance issues as you see, something to be aware of when using an ORM.)
CompiledQuery.Compile() does something different. It compiles the query to SQL and the delegate it returns is only a placeholder Linq to SQL will use internally. You can't "invoke" this method in the CLR, it can only be used as a node in another expression tree.
So why does LTS generate an efficient query with the CompiledQuery.Compile'd expression then? Because it knows what this expression node does, because it knows the SQL behind it. In the Expression.Compile case, it's just an InvokeExpression that invokes the DynamicMethod, as I explained previously.
Why does it require a DataContext Parameter? Yes, it's more convenient for creating full queries, but it's also because the Expression Tree compiler needs to know the Mapping to use for generating the SQL. Without this parameter, it would be a pain to find this mapping, so it's a very sensible requirement.
I'm surprised you've got no answers on this so far. CompiledQuery.Compile compiles and caches the query. That is why you see only one query being generated.
Not only is this NOT a hack, it is the recommended way!
Check out these MSDN articles for detailed info and example:
Compiled Queries (LINQ to Entities)
How to: Store and Reuse Queries (LINQ to SQL)
Update: (exceeded the limit for comments)
I did some digging in reflector & I do see DataContext being used. In your example, you're simply not using it.
Having said that, the main difference between the two is that the former creates a delegate (for the expression tree) and the latter creates the SQL that gets cached and actually returns a function (sort of). The first two expressions produce the query when you call Invoke on them; this is why you see multiple queries.
If your query doesn't change, but only the DataContext and Parameters, and if you plan to use it repeatedly, CompiledQuery.Compile will help. It is expensive to Compile, so for one off queries, there is no benefit.
TaskFiles.Select(t => new {
    t.TaskId,
    TimeSpent = TimeSpent(t),
})
This isn't a LinqToSql query, as there is no DataContext instance. Most likely you are querying some EntitySet, which does not implement IQueryable.
Please post complete statements, not statement fragments. (I see invalid comma, no semicolon, no assignment).
Also, try this:
var query = myDataContext.TaskFiles
    .Where(tf => tf.Parent.Key == myParent.Key)
    .Select(t => new {
        t.TaskId,
        TimeSpent = TimeSpent(t)
    });
// where myParent is the source of the EntitySet and Parent is a relational property.
// and Key is the primary key property of Parent.

Group functions of similar functionality

Sometimes I come across this problem where you have a set of functions that obviously belong to the same group. Those functions are needed at several places, and often together.
To give a specific example: consider the filemtime, fileatime and filectime functions. They all provide a similar functionality. If you are building something like a filemanager, you'll probably need to call them one after another to get the info you need. This is the moment that you get thinking about a wrapper. PHP already provides stat, but suppose we don't have that function.
I looked at the php sourcecode to find out how they solved this particular problem, but I can't really find out what's going on.
Obviously, a naive implementation of such a grouping function, say filetimes, would look like this:
function filetimes($file) {
    return array(
        'filectime' => filectime($file)
        ,'fileatime' => fileatime($file)
        ,'filemtime' => filemtime($file)
    );
}
This would work, but incurs overhead since you would have to open a file pointer for each function call. (I don't know if it's necessary to open a file pointer, but let's assume that for the sake of the example).
Another approach would be to duplicate the code of the fileXtime functions and let them share a file pointer, but this obviously introduces code duplication, which is probably worse than the overhead introduced in the first example.
The third, and probably best, solution I came up with is to add an optional second parameter to the fileXtime functions to supply a filepointer.
The filetimes functions would then look like this:
function filetimes($file) {
    $fp = fopen($file, 'r');
    return array(
        'filectime' => filectime($file, $fp)
        ,'fileatime' => fileatime($file, $fp)
        ,'filemtime' => filemtime($file, $fp)
    );
}
Somehow this still feels 'wrong'. There's this extra parameter that is only used in some very specific conditions.
So basically the question is: what is best practice in situations like these?
Edit:
I'm aware that this is a typical situation where OOP comes into play. But first off: not everything needs to be a class. I always use an object oriented approach, but I also always have some functions in the global space.
Let's say we're talking about a legacy system here (with these 'non-oop' parts) and there are lots of dependencies on the fileXtime functions.
tdammer's answer is good for the specific example I gave, but does it extend to the broader problem set? Can a solution be defined such that it is applicable to most other problems in this domain?
Use classes, Luke.
I'd rewrite the fileXtime functions to accept either a filename or a file handle as their only parameter. Languages that can overload functions (like C++, C# etc) can use this feature; in PHP, you'd have to check for the type of the argument at run time.
Passing both a filename and a file handle would be redundant, and ambiguous calls could be made:
$fp = fopen('foo', 'r');
$times = file_times('bar', $fp);
Of course, if you want to go OOP, you'd just wrap them all in a FileInfo class, and store a (lazy-loaded?) private file handle there.
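A minimal PHP sketch of the run-time type check mentioned above (file_times is a made-up wrapper, not a built-in):
function file_times($file)
{
    // Accept either a path (string) or an already-opened handle (resource);
    // with a handle, recover the path from the stream metadata.
    $path = is_resource($file) ? stream_get_meta_data($file)['uri'] : $file;

    return array(
        'filectime' => filectime($path),
        'fileatime' => fileatime($path),
        'filemtime' => filemtime($path),
    );
}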

How do you return two values from a single method?

When you're in a situation where you need to return two things in a single method, what is the best approach?
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
The options I have come up with:
Use global variables to hold returns. I personally try and avoid globals where I can.
Pass in two empty variables as parameters then assign the variables inside the method, which now is a void. I don't like the idea of methods that have side effects.
Return a collection that contains two variables. This can lead to confusing code.
Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
This is not entirely language-agnostic: in Lisp, you can actually return any number of values from a function, including (but not limited to) none, one, two, ...
(defun returns-two-values ()
  (values 1 2))
The same thing holds for Scheme and Dylan. In Python, I would actually use a tuple containing 2 values like
def returns_two_values():
    return (1, 2)
As others have pointed out, you can return multiple values using the out parameters in C#. In C++, you would use references.
void
returns_two_values(int& v1, int& v2)
{
    v1 = 1; v2 = 2;
}
In C, your method would take pointers to locations, where your function should store the result values.
void
returns_two_values(int* v1, int* v2)
{
    *v1 = 1; *v2 = 2;
}
For Java, I usually use either a dedicated class or a pretty generic little helper (currently there are two in my private "commons" library: Pair<F,S> and Triple<F,S,T>, both nothing more than simple immutable containers for 2 and 3 values respectively).
I would create data transfer objects. If it is a group of information (first and last name) I would make a Name class and return that. #4 is the way to go. It seems like more work up front (which it is), but it makes up for it in clarity later.
If it is a list of records (rows in a database) I would return a Collection of some sort.
I would never use globals unless the app is trivial.
Not my own thoughts (Uncle Bob's):
If there's cohesion between those two variables - I've heard him say, you're missing a class where those two are fields. (He said the same thing about functions with long parameter lists.)
On the other hand, if there is no cohesion, then the function does more than one thing.
I think the most preferred approach is to build a container (be it a class or a struct; if you don't want to create a separate class for this, a struct is the way to go) that will hold all the values to be returned.
In the C/C++ world it would actually be quite common to pass two variables by reference (an example of your no. 2).
I think it all depends on the scenario.
Thinking from a C# mentality:
1: I would avoid globals as a solution to this problem, as it is accepted as bad practice.
4: If the two return values are uniquely tied together in some way or form that it could exist as its own object, then you can return a single object that holds the two values. If this object is only being designed and used for this method's return type, then it likely isn't the best solution.
3: A collection is a great option if the returned values are the same type and can be thought of as a collection. However, if the specific example needs 2 items, and each item is its own thing -> maybe one represents the beginning of something and the other represents the end, and the returned items are not being used interchangeably, then this may not be the best option.
2: I like this option the best if 4 and 3 do not make sense for your scenario. As stated in 3, if you wanted to get two objects that represent the beginning and end items of something, then I would use parameters by reference (or out parameters, again, depending on how it's all being used). This way your parameters can explicitly define their purpose: MethodCall(ref object StartObject, ref object EndObject)
Personally I try to use languages that allow functions to return something more than a simple integer value.
First, you should distinguish what you want: an arbitrary-length return or fixed-length return.
If you want your method to return an arbitrary number of values, you should stick to collection returns, because collections, whatever your language is, are specifically meant to fulfill such a task.
But sometimes you just need to return two values. How does returning two values, when you're sure it's always two values, differ from returning one value? No way it differs, I say! And modern languages, including Perl, Ruby, C++, Python, OCaml, etc. allow functions to return tuples, either built in or as third-party syntactic sugar (yes, I'm talking about boost::tuple). It looks like this:
tuple<int, int, double> add_multiply_divide(int a, int b) {
    return make_tuple(a+b, a*b, double(a)/double(b));
}
Specifying an "out parameter", in my opinion, is overused due to the limitations of older languages and paradigms learned those days. But there still are many cases when it's usable (if your method needs to modify an object passed as parameter, that object being not the class that contains a method).
The conclusion is that there's no generic answer--each situation has its own solution. But one common thing there is: it's not violation of any paradigm that function returns several items. That's a language limitation later somehow transferred to human mind.
Python (like Lisp) also allows you to return any number of values from a function, including (but not limited to) none, one, two
def quadcube(x):
    return x**2, x**3
a, b = quadcube(3)
Some languages make doing #3 native and easy. Example: Perl. "return ($a, $b);". Ditto Lisp.
Barring that, check if your language has a collection suited to the task, ala pair/tuple in C++
Barring that, create a pair/tuple class and/or collection and re-use it, especially if your language supports templating.
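In a language without templating (PHP, say), such a reusable pair might be no more than this illustrative sketch:
class Pair {
    private $first;
    private $second;

    public function __construct($first, $second) {
        $this->first = $first;
        $this->second = $second;
    }

    public function first()  { return $this->first; }
    public function second() { return $this->second; }
}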
If your function has return value(s), it's presumably returning it/them for assignment to either a variable or an implied variable (to perform operations on, for instance.) Anything you can usefully express as a variable (or a testable value) should be fair game, and should dictate what you return.
Your example mentions a row or a set of rows from a SQL query. Then you reasonably should be ready to deal with those as objects or arrays, which suggests an appropriate answer to your question.
When you're in a situation where you need to return two things in a single method, what is the best approach?
It depends on WHY you are returning two things.
Basically, as everyone here seems to agree, #2 and #4 are the two best answers...
I understand the philosophy that a method should do one thing only, but say you have a method that runs a database select and you need to pull two columns. I'm assuming you only want to traverse through the database result set once, but you want to return two columns worth of data.
If the two pieces of data from the database are related, such as a customer's First Name and Last Name, I would indeed still consider this to be doing "one thing."
On the other hand, suppose you have come up with a strange SELECT statement that returns your company's gross sales total for a given date, and also reads the name of the customer that placed the first sale for today's date. Here you're doing two unrelated things!
If it's really true that performance of this strange SELECT statement is much better than doing two SELECT statements for the two different pieces of data, and both pieces of data really are needed on a frequent basis (so that the entire application would be slower if you didn't do it that way), then using this strange SELECT might be a good idea - but you better be prepared to demonstrate why your way really makes a difference in perceived response time.
The options I have come up with:
1 Use global variables to hold returns. I personally try and avoid globals where I can.
There are some situations where creating a global is the right thing to do. But "returning two things from a function" is not one of those situations. Doing it for this purpose is just a Bad Idea.
2 Pass in two empty variables as parameters then assign the variables inside the method, which now is a void.
Yes, that's usually the best idea. This is exactly why "by reference" (or "output", depending on which language you're using) parameters exist.
I don't like the idea of methods that have side effects.
Good theory, but you can take it too far. What would be the point of calling SaveCustomer() if that method didn't have a side-effect of saving the customer's data?
By Reference parameters are understood to be parameters that contain returned data.
3 Return a collection that contains two variables. This can lead to confusing code.
True. It wouldn't make sense, for instance, to return an array where element 0 was the first name and element 1 was the last name. This would be a Bad Idea.
4 Build a container class to hold the double return. This is more self-documenting than a collection containing other collections, but it seems like it might be confusing to create a class just for the purpose of a return.
Yes and no. As you say, I wouldn't want to create an object called FirstAndLastNames just to be used by one method. But if there was already an object which had basically this information, then it would make perfect sense to use it here.
If I was returning two of the exact same thing, a collection might be appropriate, but in general I would usually build a specialized class to hold exactly what I needed.
And if you are returning two things today from those two columns, tomorrow you might want a third. Maintaining a custom object is going to be a lot easier than any of the other options.
Use var/out parameters or pass variables by reference, not by value. In Delphi:
function ReturnTwoValues(out Param1: Integer):Integer;
begin
  Param1 := 10;
  Result := 20;
end;
If you use var instead of out, you can pre-initialize the parameter.
With databases, you could have an out parameter per column and the result of the function would be a boolean indicating if the record is retrieved correctly or not. (Although I would use a single record class to hold the column values.)
As much as it pains me to do it, I find the most readable way to return multiple values in PHP (which is what I work with, mostly) is using a (multi-dimensional) array, like this:
function doStuff($someThing)
{
    // do stuff
    $status = 1;
    $message = 'it worked, good job';
    return array('status' => $status, 'message' => $message);
}
Not pretty, but it works and it's not terribly difficult to figure out what's going on.
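On the calling side it might be unpacked along these lines (just an illustrative usage of the function above):
$result = doStuff($someThing);
if ($result['status'] === 1) {
    echo $result['message'];
}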
I generally use tuples. I mainly work in C# and it's very easy to design generic tuple constructs. I assume it would be very similar for most languages which have generics. As an aside, 1 is a terrible idea, and 3 only works when you are getting two returns that are the same type, unless you work in a language where everything derives from the same basic type (i.e. object). 2 and 4 are also good choices. 2 doesn't introduce any side effects a priori, it's just unwieldy.
Use std::vector, QList, or some managed library container to hold however many X you want to return:
QList<X> getMultipleItems()
{
    QList<X> returnValue;
    for (int i = 0; i < countOfItems; ++i)
    {
        returnValue.push_back(<your data here>);
    }
    return returnValue;
}
For the situation you described, pulling two fields from a single table, the appropriate answer is #4 given that two properties (fields) of the same entity (table) will exhibit strong cohesion.
Your concern that "it might be confusing to create a class just for the purpose of a return" is probably not that realistic. If your application is non-trivial you are likely going to need to re-use that class/object elsewhere anyway.
You should also consider whether the design of your method is primarily returning a single value, and you are getting another value for reference along with it, or if you really have a single returnable thing like first name - last name.
For instance, you might have an inventory module that queries the number of widgets you have in inventory. The return value you want to give is the actual number of widgets. However, you may also want to record how often someone is querying inventory and return the number of queries so far. In that case it can be tempting to return both values together. However, remember that you have class vars available for storing data, so you can store an internal query count, and not return it every time, then use a second method call to retrieve the related value. Only group the two values together if they are truly related. If they are not, use separate methods to retrieve them separately.
Haskell also allows multiple return values using built in tuples:
sumAndDifference :: Int -> Int -> (Int, Int)
sumAndDifference x y = (x + y, x - y)
> let (s, d) = sumAndDifference 3 5 in s * d
-16
Being a pure language, options 1 and 2 are not allowed.
Even using a state monad, the return value contains (at least conceptually) a bag of all relevant state, including any changes the function just made. It's just a fancy convention for passing that state through a sequence of operations.
I will usually opt for approach #4, as I prefer the clarity of knowing that what the function produces or calculates is its return value (rather than by-ref parameters). Also, it lends itself to a rather "functional" style of program flow.
The disadvantage of option #4 with generic tuple classes is that it isn't much better than returning a collection (the only gain is type safety).
public IList CalculateStuffCollection(int arg1, int arg2)
public Tuple<int, int> CalculateStuffTuple(int arg1, int arg2)

var resultCollection = CalculateStuffCollection(1, 2);
var resultTuple = CalculateStuffTuple(1, 2);

resultCollection[0] // Was it index 0 or 1 I wanted?
resultTuple.A       // Was it A or B I wanted?
I would like a language that allowed me to return an immutable tuple of named variables (similar to a dictionary, but immutable, typesafe and statically checked). But, sadly, such an option isn't available to me in the world of VB.NET; it may be elsewhere.
I dislike option #2 because it breaks that "functional" style and forces you back into a procedural world (when often I don't want to do that just to call a simple method like TryParse).
I have sometimes used continuation-passing style to work around this: passing a function value as an argument, and calling that function with the multiple values instead of returning them.
Objects can take the place of function values in languages without first-class functions.
My choice is #4. Define a reference parameter in your function; that parameter references a Value Object.
In PHP:
class TwoValuesVO {
    public $expectedOne;
    public $expectedTwo;
}

/* parameter $vo references a TwoValuesVO instance */
function twoValues(&$vo) {
    $vo->expectedOne = 1;
    $vo->expectedTwo = 2;
}
In Java:
class TwoValuesVO {
    public int expectedOne;
    public int expectedTwo;
}

class TwoValuesTest {
    void twoValues( TwoValuesVO vo ) {
        vo.expectedOne = 1;
        vo.expectedTwo = 2;
    }
}
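A quick usage sketch for the PHP version above:
$vo = new TwoValuesVO();
twoValues($vo);
echo $vo->expectedOne + $vo->expectedTwo; // prints 3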

Is it bad to perform two different tasks in the same loop? [closed]

I'm working on a highly-specialized search engine for my database. When the user submits a search request, the engine splits the search terms into an array and loops through. Inside the loop, each search term is examined against several possible scenarios to determine what it could mean. When a search term matches a scenario, a WHERE condition is added to the SQL query. Some terms can have multiple meanings, and in those cases the engine builds a list of suggestions to help the user to narrow the results.
Aside: In case anyone is interested to know, ambiguous terms are refined by prefixing them with a keyword. For example, 1954 could be a year or a serial number. The engine will suggest both of these scenarios to the user and modify the search term to either year:1954 or serial:1954.
Building the SQL query and the refine suggestions in the same loop feels somehow wrong to me, but to separate them would add more overhead because I would have to loop through the same array twice and test all the same scenarios twice. What is the better course of action?
I'd probably factor out the two actions into their own functions. Then you'd have
foreach (term in terms) {
    doThing1();
    doThing2();
}
which is nice and clean.
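In the search-engine case that might look roughly like this in PHP (buildCondition and buildSuggestions are hypothetical helpers, one per concern):
$whereConditions = array();
$suggestions = array();
foreach ($terms as $term) {
    // doThing1(): decide what the term means and produce a WHERE condition.
    $whereConditions[] = buildCondition($term);
    // doThing2(): collect refinement suggestions for ambiguous terms.
    $suggestions = array_merge($suggestions, buildSuggestions($term));
}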
No. It's not bad. I would think looping twice would be more confusing.
Arguably some of the tasks might be put into functions if the tasks are decoupled enough from each other, however.
I don't think it makes sense to add multiple loops for the sake of theoretical purity, especially given that if you're going to add a loop against multiple scenarios you're going from an O(n) -> O(n*#scenarios). Another way to break this out without falling into the "God Method" trap would be to have a method that runs a single loop and returns an array of matches, and another that runs the search for each element in the match array.
Using the same loop seems like a valid optimization to me; try to keep the code of the two tasks independent so this optimization can be changed if necessary.
Your scenario fits the builder pattern and if each operation is fairly complex then it would serve you well to break things up a bit. This is waaaaaay over engineering if all your logic fits in 50 lines of code, but if you have dependencies to manage and complex logic, then you should be using a proven design pattern to achieve separation of concerns. It might look like this:
var relatedTermsBuilder = new RelatedTermsBuilder();
var whereClauseBuilder = new WhereClauseBuilder();
var compositeBuilder = new CompositeBuilder()
    .Add(relatedTermsBuilder)
    .Add(whereClauseBuilder);
var parser = new SearchTermParser(compositeBuilder);
parser.Execute("the search phrase");
string[] related = relatedTermsBuilder.Result;
string whereClause = whereClauseBuilder.Result;
The supporting objects would look like:
public interface ISearchTermBuilder {
    void Build(string term);
}

public class SearchTermParser {
    private readonly ISearchTermBuilder builder;

    public SearchTermParser(ISearchTermBuilder builder) {
        this.builder = builder;
    }

    public void Execute(string phrase) {
        foreach (var term in Parse(phrase)) {
            builder.Build(term);
        }
    }

    private static IEnumerable<string> Parse(string phrase) {
        throw new NotImplementedException();
    }
}
I'd call it a code smell, but not a very bad one. I would separate out the functionality inside the loop, putting one of the things first, and then after a blank line and/or comment the other one.
I would look at it as if it were an instance of the observer pattern: each time you loop you raise an event, and as many observers as you want can subscribe to it. Of course it would be overkill to implement it as the actual pattern, but the similarity tells me that it is just fine to execute two or three or however many actions you want.
I don't think it's wrong to make two actions in one loop. I'd even suggest to make two methods that are called from inside the loop, like:
for (...) {
    refineSuggestions(..);
    buildQuery();
}
On the other hand, O(n) = O(2n)
So don't worry too much - it isn't such a performance sin.
You could certainly run two loops.
If a lot of this is business logic, you could also create some kind of data structure in the first loop, and then use that to generate the SQL, something like
search_objects = []
loop through term in terms
    search_object = {}
    search_object.string = term
    // suggestion & rules code
    search_object.suggestion = suggestion
    search_object.rule = { 'contains', 'term' }
    search_objects.push(search_object)
loop through search_object in search_objects
    // generate SQL based on search_object.rule
This at least saves you from having to do if/then/elses in both loops, and I think it is a bit cleaner to move SQL code creation outside of the first loop.
If the things you're doing in the loop are related, then fine. It probably makes sense to code "the stuff for each iteration" and then wrap it in a loop, since that's probably how you think of it in your head.
Add a comment and if it gets too long, look at splitting it or using simple utility methods.
I think one could argue that this may not exactly be language-agnostic; it's also highly dependent on what you're trying to accomplish. If you're putting multiple tasks in a loop in such a way that they cannot be easily parallelized by the compiler for a parallel environment, then it is definitely a code smell.