When writing a program, my understanding as a hobbyist programmer is that there are three ways to accomplish most things:
- create loops
- create and use functions
- create and use objects
I am using JavaScript here to ask my question, since I started learning it about 2-3 weeks ago. It is somewhat strange compared to what I was used to in Python or MATLAB at university, but that's not the point. I often wonder which of the three is the right choice in a particular application, so I wanted to hear your suggestions.
I wanted to create a list of arrays for subsequent use in plotting. The program is supposed to take the coefficients of the equation, an increment step, and the boundary for x-values. Below is the code (sorry if I broke something when changing it to fit SO, but it was working moments ago!):
function array_creator(input_coeff, inc, boundary) {
    var bound = boundary || [0, 1];          // default x-range [0, 1]
    var eqn_deg = input_coeff.length - 1;    // degree of the polynomial
    var increment = inc;
    var x_init = bound[0];
    var y_val = 0;
    var graph_array = [];
    while (x_init <= bound[1]) {
        // evaluate the polynomial at x_init, highest-degree term first
        for (var i = 0; i < input_coeff.length; i++) {
            y_val = y_val + input_coeff[i] * Math.pow(x_init, eqn_deg);
            eqn_deg--;
        }
        var new_arr = [x_init, y_val];       // was an implicit global before
        eqn_deg = input_coeff.length - 1;    // reset for the next x value
        y_val = 0;
        graph_array.push(new_arr);
        x_init = x_init + increment;
    }
    return graph_array;
}
In the above code I have one loop nested inside the while, but I am used to writing code that goes 3-4 levels deep in nesting, and I cannot dig through my own program a week later. So my question is: how do I know when it is time to implement a separate function rather than nesting, or when it is time to create an object? What are the gains and losses of breaking one big looped function into several functions, in terms of clarity and efficiency? At what point does creating an object become essential, or is that just when I have to re-use the same code?
When the only tool you have is a hammer, everything looks like a nail. When I started learning Python after MATLAB, I was so impressed by the OO approach that I created classes in every situation, whether needed or not. I think many SO newbies would be glad to find a systematic approach to these programming fundamentals.
Personally, as far as loops go, I have a hard cutoff of three. If I ever hit three nested loops (not counting if/else or try/catch constructs), I know it's time to break the code up into separate functions.
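Applied to the code in the question, a minimal sketch of such a split might look like this (evaluatePolynomial is a name I'm introducing for illustration; it isn't in the original):

function evaluatePolynomial(coefficients, x) {
    // highest-degree coefficient first, matching the question's convention
    var degree = coefficients.length - 1;
    var y = 0;
    for (var i = 0; i < coefficients.length; i++) {
        y += coefficients[i] * Math.pow(x, degree - i);
    }
    return y;
}

function array_creator(input_coeff, inc, boundary) {
    var bound = boundary || [0, 1];
    var graph_array = [];
    for (var x = bound[0]; x <= bound[1]; x += inc) {
        graph_array.push([x, evaluatePolynomial(input_coeff, x)]);
    }
    return graph_array;
}

Each function now does one job, the bookkeeping (resetting eqn_deg and y_val) disappears, and the nesting never exceeds one level per function.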
As for tradeoffs: as long as the function is called many times in quick succession (as in the lower tiers of a loop), there shouldn't really be any performance loss. There is always a slight overhead to making a function call, but luckily computers are really smart: they have caches, areas of extremely fast memory (read: SRAM) that exploit temporal locality, so a function called repeatedly stays cached. Since accessing things already in the cache is effectively free (read times of a few ns), you won't really pay for those extra function calls.
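If you want to convince yourself, here is a rough micro-benchmark sketch (timings vary a lot by engine and should be taken as indicative only):

// Compare an inlined computation against the same work behind a function
// call. A modern JIT will usually inline the hot call, so the two loops
// should take roughly the same time.
function square(x) { return x * x; }

var n = 1e7;
var sum = 0;

console.time("inline");
for (var i = 0; i < n; i++) { sum += i * i; }
console.timeEnd("inline");

console.time("function call");
for (var j = 0; j < n; j++) { sum += square(j); }
console.timeEnd("function call");

console.log("checksum:", sum);  // keep sum live so nothing is optimized away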
The usage of classes is very language-dependent, though. In JavaScript everything is already an object, so you really shouldn't worry about wrapping functions in classes, though again there will be a slight overhead. For a language like Java, however, you should endeavour to make a large number of small classes. The JVM is extremely optimized for calls between multiple classes, and the JIT compiler shouldn't load up any of the extra "goo" involved in the classes unless you really need it.
In general, though, performance is not what you should base most of your decisions on (performance is very 80/20, and all you usually need is the 80 of not doing anything overtly silly). You should really try to follow a pattern that makes your code as readable as possible to other developers. It's pretty hard to define a hard-and-fast rule, as there are many camps on the subject. In general, though, my advice to a starting programmer would be to look at a LOT of code and try to understand what's happening. Try to rewrite portions of code in a more readable form if you can. There's enough open-source code on GitHub that it should be pretty easy to do.
Also, good programming practices have always been opinions; it's just that people sometimes agree remarkably well.
As I understand it, HTML parsing is difficult to parallelize because of its strong dependencies.
Is there any parallel HTML parser existing or in design, so that a single HTML document can be parsed in parallel and a single DOM tree would be produced finally?
It could be either for earlier HTML versions, or the latest HTML5.
The "strong dependencies" in HTML aren't much different from a parsing point of view, than strong dependencies in any other language you might parse. The real issue is that parsing one part of the file, usually depends on the left context. The problem for a parallel parser is how to get left context?
There's general theory about how to build parallel parsers, by breaking the text into chunks, parsing them separately, and stitching the parts together. McKeeman's paper (referenced) claimed .85N speedup for N processors.
I seem to remember a paper that proposed to parse a file from both ends, meeting in the middle. The right-going parser generated left context; the left-going parser generated right context. You can do the bi-directional scanning relatively easily by reversing the grammar, and feed the forward and backward grammars to parser generators. Gluing it together likely requires the kinds of techniques sketched in referenced paper.
Our DMS Software Reengineering Toolkit has a GLR parser that uses pipelining to separate the lexing stages from parsing, and has a full HTML4 parser available. (DMS is built on parallel foundations; it is relatively easy to configure it to parse individual files in parallel, too.) That HTML4 parser is likely extendable to HTML5 using DMS's support for language dialects.
As a general rule, if you are only parsing one program (or HTML) file, this kind of parallelism doesn't matter much, as it won't noticeably affect your overall performance. Most parsers are pretty fast, and their time is dominated by the effort of processing individual characters. You'd probably get much of the available speedup by breaking the file into chunks and lexing the chunks individually, especially since much of an HTML file is whitespace.
If you had to process lots of HTML files, you'd probably be better off with one thread per file being parsed. Then you can use pretty conventional parser technology in each thread.
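In Node.js, for example, a one-worker-per-file sketch could look like the following (worker_threads is the real Node module; parseHTML is a hypothetical stand-in for whatever parser you use):

// main.js - parse many HTML files in parallel, one worker per file
const { Worker } = require("worker_threads");

function parseInWorker(file) {
    return new Promise((resolve, reject) => {
        const worker = new Worker("./parse-worker.js", { workerData: file });
        worker.on("message", resolve);   // worker posts back its parse result
        worker.on("error", reject);
    });
}

Promise.all(["a.html", "b.html", "c.html"].map(parseInWorker))
    .then(results => console.log("parsed", results.length, "documents"));

// parse-worker.js - each worker runs a conventional, sequential parser
const { parentPort, workerData } = require("worker_threads");
const fs = require("fs");
// parseHTML is hypothetical; substitute any real HTML parser here
parentPort.postMessage(parseHTML(fs.readFileSync(workerData, "utf8")));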
What database would you suggest for a startup that might possibly grow very fast?
To be more specific:
- We are using JSON to interchange data with mobile clients, so ideally the data should be stored in this format
- The data model is relatively simple: users, categories, history of actions...
- The users interact in "real time" (a 5 second propagation delay is still OK)
- The queries are known beforehand (we can cache results or use mapreduce)
- The system would have up to 10000 concurrent users (just guessing...)
- Transactions are a plus, but we can live without them, I think
- Spatially enabled is a plus
- Data replication between nodes should be easy to administer
- Open source
- Hosting services available (we'd like to outsource the sysadmin part)
We now have a functional private prototype on standard relational PostgreSQL/PostGIS. But scalability questions aside, I have to convert relational data to JSON and vice versa, which seems like an overhead under high load.
I did a little research but I lack experience with all the new NoSQL stuff.
So far, I think of these solutions:
- Couchbase: master-master replication, a native JSON document store, a spatial extension, and couchapps; although I don't know the iriscouch hosting, they seem like good techs. The downsides I see so far are JavaScript debugging and disk usage.
- MongoDB: has only one master, but safe failover. Uses binary JSON (BSON).
- MySQL Cluster: the evergreen of the web (one master, I think)
- PostgreSQL & Slony: because I just love Postgres :-)
But there are plenty of others: Cassandra, Membase...
Do you guys have some real-life experience? The bad kind counts too!
Thanks in advance,
Karel
Unless you are already having problems with scaling, you can't really have a good idea of what you will actually need in the future. You should base your design decisions on what you need now, not on your best estimate of future customers. Remember, you have to impress your first few customers with how well your product solves their problems before you can worry about impressing your 10,000th.
That said, I've found that it's almost always necessary to have basically everything:
- A smart/powerful database for the important data and queries that are part of the current application. For this there is nothing I'd choose ahead of PostgreSQL/PostGIS.
- A document database (sometimes called NoSQL) to record, forever, everything that has passed through your system. What was an invalid or useless request a year ago may be exactly what a new application can use, and when the vendor finally gives you the API spec you need to parse it, I hope you've kept the data around in a form you can work with. At my current organization we are using CouchDB for this, and it has proven a great choice so far.
I have to convert relational data to JSON and vice versa which seems like an overhead in high load.
Not really; the expensive stuff is IO and poorly written queries. The marshalling/unmarshalling is pure CPU, which is about the cheapest thing in the world to scale. Don't worry about it.
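To put that in perspective, the conversion is just a tight in-memory loop; a sketch (the row shape is invented for illustration):

// Turning relational rows into JSON is a plain CPU-bound map, orders of
// magnitude cheaper than the disk and network IO of the query itself.
var rows = [
    { id: 1, name: "Alice", lat: 50.08, lon: 14.43 },
    { id: 2, name: "Bob",   lat: 48.85, lon: 2.35 }
];

var payload = JSON.stringify(rows.map(function (r) {
    return { id: r.id, name: r.name, location: [r.lat, r.lon] };
}));
// payload is now ready to hand to the mobile client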
When writing a mathematical proof, one goal is to continue compressing the proof. The proof gets more elegant but not necessarily more readable. Compression translates to better understanding, as you weed out unnecessary characters and verbosity.
I often hear developers say you should make your code footprint as small as possible, which can very quickly yield unreadable code. In mathematics this isn't such an issue, since the exercise is purely academic. However, in production code, where time is money, having people puzzle out what some very concise code is doing doesn't make much sense. For slightly more verbose code, you get readability and savings.
At what point do you stop compressing software code?
I try to reach a level of verbosity where my program statements read like a sentence any programmer could understand. This does mean heavily refactoring my code such that it's all short pieces of a story, so each action would be described in a separate method (an even further level might be to another class).
Meaning I would not reduce my number of characters just because it can be expressed in fewer. That's what code-golf competitions are for.
My rule is say what you mean. One common way I see people go wrong is "strength reduction." Basically, they replace the concept they are thinking with something that seems to skip steps. Unfortunately, they are leaving concepts out of their code, making it harder to read.
For example, changing
for (int i = 0; i < n; i++)
foo[i] = ...
to
int *p = foo, *q = foo + n;
while (p < q)
    *p++ = ...;
is an example of a strength reduction that seems to save steps, but it leaves out the fact that foo is an array, making it harder to read.
Another common one is using bool instead of an enum.
enum {
    MouseDown,
    MouseUp
};
Having this be
bool IsMouseDown;
leaves out the fact that this is a state machine, making the code harder to maintain.
So my rule of thumb would be, in your implementation, don't dig down to a lower level than the concepts you are trying to express.
You can make code smaller by seeing redundancy and eliminating it, or by being clever. Do the former and not the latter.
Here's a good article by Steve McConnell - Best Practices http://www.stevemcconnell.com/ieeesoftware/bp06.htm
I think short/concise are two results of well-written code. There are many aspects that make code good, and many results that come from well-written code; realize the two are different. You don't plan for a small footprint; you plan for a function that is concise and does a single thing extremely well. This SHOULD lead to a small footprint (but may not). Here's a short list of what I would focus on when writing code:
- single focused functions - a function should do only one thing, a simple delivery; multi-featured functions are buggy and not easily reusable
- loosely coupled - don't reach out from inside one function to global data, and don't rely heavily on other functions
- precise naming - use meaningful, precise variable names; cryptic names are just that
- keep the code simple, not complex - don't overuse language-specific technical wows: good for impressing others, difficult to understand and maintain. If you do add something 'special', comment it so at least people can appreciate it before cursing you out
- comment evenly - too many comments will be ignored and go out of date; too few have no meaning
- formatting - take pride in how the code looks; properly indented code helps
- work with the mind of a code maintainer - think about what it would be like to maintain the code you're writing
- don't be afraid or too lazy to refactor - nothing is perfect the first time; clean up your own mess
One way to find a balance is to aim for readability rather than conciseness. Programmers are constantly scanning code visually to see what is being done, so the code should flow as nicely as possible.
If the programmer is scanning code and hits a section that is hard to understand, or takes some effort to visually parse and understand, it is a bad thing. Using common well understood constructs is important, stay away from the vague and infrequently used unless necessary.
Humans are not compilers. Compilers can eat the stuff and keep moving on. Obscure code is not mentally consumed by humans as quickly as clearly understood code.
At times it is very hard to produce readable code in a complicated algorithm, but for the most part, human readability is what we should look for, and not cleverness. I don't think length of code is really a measure of clearness either, because sometimes a more verbose method is more readable than a concise method, and sometimes a concise method is more readable than a long one.
Also, comments should only supplement, and should not describe your code, your code should describe itself. If you have to comment a line because it isn't obvious what is done, that is bad. It takes longer for most experienced programmers to read an English explanation than it does to read the code itself. I think the book Code Complete hammers this one home.
As far as object names go, the thinking on this has gone through an evolution with the introduction of new programming languages.
If you take the "curly brace" languages, starting with C, brevity was considered the soul of wit. So, you would have a variable to hold a loan value named "lv", for instance. The idea was that you were typing a lot of code, so keep the keystrokes to a minimum.
Then along came the Microsoft-sanctioned "Hungarian notation", where the first letters of a variable name were meant to indicate its underlying type. One might use "fLV", or some such, to indicate that the loan value was represented by a float variable.
With Java, and then C#, the paradigm became one of clarity. A good name for a loan-value variable would be "loanValue". I believe part of the reason for this is the command-completion feature in most modern editors. Since it's not necessary to type an entire name anymore, you might as well use as many characters as needed to be descriptive.
This is a good trend. Code needs to be intelligible. Comments are often added as an afterthought, if at all. They are also not updated as code is updated, so they become out of date. Descriptive, well-chosen, variable names are the first, best and easiest way to let others know what you were coding about.
I had a computer science professor who said "As engineers, we are constantly creating types of things that never existed before. The names that we give them will stick, so we should be careful to name things meaningfully."
There needs to be a balance between short, sweet source code and performance. If the source is nice and runs the fastest, good; but if, for the sake of nice source, it runs like a dog, that's bad.
Strive to refactor until the code itself reads well. You'll discover your own mistakes in the process, the code will be easier to grok for the "next guy", and you won't be burdened by maintaining (and later forgetting to change) in comments what you're already expressed in code.
When that fails... sure, leave me a comment.
And don't tell me "what" in the comment (that's what the code is for), tell me "why".
As opposed to long/rambling? Sure!
But if it gets to the point where it's so short and so concise that it's hard to understand, then you've gone too far.
Yes. Always.
DRY: Don't Repeat Yourself. That will give you a code that is both concise and secure. Writing the same code several times is a good way to make it hard to maintain.
Now, that does not mean you should make a function out of any blocks of code that look remotely alike.
A very common error (horror?), for instance, is factorizing code that does nearly the same thing and handling the differences between occurrences by adding a flag to the function's API. This may look innocuous at first, but it generates code flow that is hard to understand and bug-prone, and even harder to refactor.
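A sketch of that anti-pattern and its usual fix (all names are invented for illustration):

// Anti-pattern: two nearly-identical code paths merged behind a flag
// (validate, store and sendEmail are hypothetical helpers).
function saveUser(user, isNewUser) {
    validate(user);
    store(user);
    if (isNewUser) {
        sendEmail(user, "welcome");  // the flag smuggles in a second job
    }
}

// Usually clearer: two functions that each say what they mean.
function updateUser(user) {
    validate(user);
    store(user);
}
function registerNewUser(user) {
    updateUser(user);
    sendEmail(user, "welcome");
}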
If you follow common refactoring rules (watching for code smells), your code will become more and more concise as a side effect, since many code smells are about detecting redundancy.
On the other hand, if you try to make the code as short as possible without following any meaningful guidelines, at some point you will have to stop, because you just won't see any more ways to reduce the code.
Just imagine if the first step were removing all useless whitespace... after that step, code in most programming languages becomes so hard to read that you won't have much chance of finding any other possible enhancement.
The example above is quite a caricature, but not so far from what you get when trying to optimize for size without following any sensible guideline.
There's no exact line that can be drawn to distinguish between code that is glib and code that is flowery. Use your best judgment. Have others look at your code and see how easily they can understand it. But remember, correctness is the number 1 goal.
The need for a small code footprint is a throwback to the days of assembly language and the first slightly-higher-level languages... there, small code footprints were a real and pressing need. These days, though, it's not so much of a necessity.
That said, I hate verbose code. Where I work, we write code that reads as much as possible like a natural language, without any extra grammar or words. And we don't abbreviate anything unless it's a very common abbreviation.
Company.get_by_name("ABC")
makeHeaderTable()
is about as terse as we go.
In general, I make things obvious and easy to work with. If concision/shortness serves me in that end, all the better. Often short answers are the clearest, so shortness is a byproduct of obvious.
There are a couple of points, to my mind, that determine when to stop optimizing:
- Whether it is worth spending the time. If you have people spending weeks and not finding anything, are there better uses of those resources?
- What the order of optimization priority is. There are a few different factors one could care about in code: execution time, execution space (both while running and in the compiled code), scalability, stability, how many features are implemented, etc. Part of this is the trade-off of time and space, but it can also be a question of where code belongs, e.g. can middleware execute ad hoc SQL commands, or should those be routed through stored procedures to improve performance?
I think the main point is that there is a moderation that most good solutions will have.
Code optimization has little to do with coding style. The fact that the file contains x fewer spaces or newlines than at the beginning does not make it better or faster, at least at the execution stage: you format the code with whitespace characters that are usually ignored by the compiler. It even makes the code worse, because it becomes unreadable for other programmers and for yourself.
It is much more important for the code to be short and clean in its logical structure, such as testing conditions, control flow, assumptions, error handling or the overall programming interface. Of course, I would also include here smart and useful comments + the documentation.
There is not necessarily a correlation between concise code and performance. This is a myth. In mature languages like C/C++, compilers are capable of optimizing the code very effectively, so there is little cause in such languages to assume that more concise code performs better. Newer, less performance-optimized languages like Ruby lack the compiler optimization features of C/C++ compilers, but there is still little reason to believe that concise code performs better. The reality is that we never know how well code will perform in production until it gets into production and is profiled. Simple, innocuous functions can be huge performance bottlenecks if they are called from enough places in the code. In highly concurrent systems, the biggest bottlenecks are generally caused by poor concurrency algorithms or excessive locking. These issues are rarely solved by writing "concise" code.
The bottom line is this: Code that performs poorly can always be refactored once profiling determines it is the bottleneck. Code can only be effectively refactored if it is easy to understand. Code that is written to be "concise" or "clever" is often more difficult to refactor and maintain.
Write your code for human readability then refactor for performance when necessary.
My two cents...
Code should be short, concrete, and concentrated. You can always explain your ideas with many words in the comments.
You can make your code as short or compact as you like, as long as you comment it. That way your code can be optimized but still make sense. I tend to stay somewhere in the middle, with descriptive variables and methods, and sparse comments where things are still unclear.
We all know to keep it simple, right?
I've seen complexity being measured as the number of interactions between systems, and I guess that's a very good place to start. Aside from gut feel though, what other (preferably more objective) methods can be used to determine the level of complexity of a particular design or piece of software?
What are YOUR favorite rules or heuristics?
Here are mine:
1) How hard is it to explain to someone who understands the problem but hasn't thought about the solution? If I can explain the solution to someone in the hall (who probably already understands the problem, if they're in the hall), then it's not too complicated. If it takes over an hour, chances are good the solution is overengineered.
2) How deep into nested objects do you have to reach? If I have an object that requires a property held by an object held by another object, chances are good that what I'm trying to do is too far removed from the object itself. Those situations become problematic when trying to make objects thread-safe, because there'd be many objects of varying depths from your current position to lock (see the sketch after this list).
3) Are you trying to solve problems that have already been solved before? Not every problem is new (and some would argue that none really are). Is there an existing pattern or group of patterns you can use? If you can't, why not? It's all good to make your own new solutions, and I'm all for it, but sometimes people have already answered the problem. I'm not going to rewrite STL (though I tried, at one point), because the solution already exists and it's a good one.
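Point 2 in code form — a hedged sketch of the kind of reaching that signals trouble (all names invented):

// Deep reaching: the caller must know three levels of structure, and
// making this thread-safe means locking every object along the chain.
var amount = customer.account.ledger.entries[0].amount;

// Shallower: ask the nearest object to do the work ("tell, don't ask";
// firstEntryAmount is a hypothetical method on customer).
var amount2 = customer.firstEntryAmount();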
Complexity can be estimated by the coupling between your objects and by how cohesive they are. If something has too much coupling or is not cohesive enough, the design will start to become more complex.
When I attended the Complex Systems Modeling workshop at the New England Complex Systems Institute (http://necsi.org/), one of the measures that they used was the number of system states.
For example, if you have two interacting nodes, A and B, and each of these can be 0 or 1, your possible states are:
A B
0 0
1 0
0 1
1 1
Thus a system with only 1 interaction between binary components can actually result in 4 different states; more generally, n binary components yield 2^n states. The point is that the complexity of a system does not necessarily increase linearly as the number of interactions increases.
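You can watch that growth directly; a small sketch:

// n binary components have 2^n possible states, while the number of
// pairwise interactions grows only quadratically (n*(n-1)/2).
for (var n = 1; n <= 10; n++) {
    console.log(n + " components: " + Math.pow(2, n) + " states, "
        + (n * (n - 1) / 2) + " pairwise interactions");
}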
Good measures can also be the number of files, the number of places where configuration is stored, or the order of compilation in some languages.
Examples:
- properties files, database configuration, XML files holding related information
- tens of thousands of classes with interfaces and database mappings
- an extremely long and complicated build file (build.xml, Makefile, others...)
If your app is built, you can measure it in terms of time (how long a particular task would take to execute) or computations (how much code is executed each time the task is run).
If you just have designs, then you can look at how many components of your design are needed to run a given task, or to run an average task. For example, if you use MVC as your design pattern, then you have at least 3 components touched for the majority of tasks, but depending on your implementation of the design, you may end up with dozens of components (a cache in addition to the 3 layers, for example).
Finally something LOC can actually help measure? :)
I think complexity is best seen as the number of things that need to interact.
A complex design would have n tiers whereas a simple design would have only two.
Complexity is needed to work around issues that simplicity cannot overcome, so it is not always going to be a problem.
There is a problem in defining complexity in general as complexity usually has a task associated with it.
Something may be complex to understand, but simple to look at (very terse code for example)
The number of interactions getting this web page from the server to your computer is very complex, but the abstraction of the http protocol is very simple.
So having a task in mind (e.g. maintenance) before selecting a measure may make it more useful. (i.e. adding a config file and logging to an app increases its objective complexity [yeah, only a little bit sure], but simplifies maintenance).
There are formal metrics. Read up on Cyclomatic Complexity, for example.
Edit.
Also, look at Function Points. They give you a non-gut-feel quantitative measurement of system complexity.
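As an illustration of the cyclomatic measure, counted with the standard "decision points + 1" rule (the function itself is invented):

// Decision points: the if (1), the || inside it (2), the for
// condition (3), and the inner if (4), so V(G) = 4 + 1 = 5.
function classify(values) {
    if (!values || values.length === 0) {
        return "empty";
    }
    for (var i = 0; i < values.length; i++) {
        if (values[i] < 0) {
            return "has negatives";
        }
    }
    return "all non-negative";
}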