What is the time complexity of iterator increments and decrements for std::map? [duplicate]

What's the complexity of the iterator++ operation for the STL RB-tree containers (set or map)?
I always thought they would use some form of index, so the answer should be O(1), but I recently read the VC10 implementation and was shocked to find that they don't.
To find the next element in an ordered RB-tree, you have to search for the smallest element in the right subtree, or, if the node has no right child, walk up to the first ancestor that is reached from a left child. This introduces a recursive walk, and I believe the ++ operator takes O(lg n) time.
Am I right? And is this the case for all STL implementations, or just Visual C++?
Is it really difficult to maintain indices for an RB-tree? As far as I can see, by holding two extra pointers in the node structure we could maintain a doubly linked list alongside the RB-tree. Why don't they do that?

The amortized complexity when incrementing the iterator over the whole container is O(1) per increment, which is all the standard requires. You're right that a single increment is O(log n) in the worst case, since the depth of the tree is in that complexity class.
It seems likely to me that other RB-tree implementations of map are similar. As you've said, the worst-case complexity of operator++ could be improved with extra pointers, but the cost isn't trivial.
It's quite possible that the total time to iterate the whole container would be improved by the linked list, but it's not certain, since bigger node structures tend to result in more cache misses.
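For reference, the successor walk that operator++ typically performs looks roughly like the sketch below. The Node layout and the successor function are purely illustrative names, not the actual VC10 internals.

    // Hypothetical node layout for illustration; real implementations differ
    // in naming and in how they represent the header/end node.
    struct Node {
        Node* parent;
        Node* left;
        Node* right;
        int   key;
    };

    // In-order successor: O(log n) in the worst case, but O(1) amortized over
    // a full traversal, because each edge of the tree is crossed at most twice.
    Node* successor(Node* x) {
        if (x->right != nullptr) {
            Node* y = x->right;                     // smallest element in the right subtree
            while (y->left != nullptr)
                y = y->left;
            return y;
        }
        Node* y = x->parent;                        // no right subtree: climb until we
        while (y != nullptr && x == y->right) {     // arrive from a left child
            x = y;
            y = y->parent;
        }
        return y;                                   // nullptr means x was the maximum
    }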

Related

Is it good to keep multi-valued attributes in a table in a 24/7 running service? [closed]

I am working on a project where I am using a table with a multi-valued attribute that has 5-10 values. Is it good to keep multi-valued attributes, or should I normalize the table into normal forms?
My concern is that normalizing unnecessarily increases the number of rows: if an attribute has 10 values, each row or tuple is replaced by 10 new rows, which might increase query running time.
Can anyone give suggestions on this?
The first normal form requires that each attribute be atomic.
I would say that the answer to this question hinges on the word “atomic”: it is too narrow to define it as “indivisible”, because then no string would be atomic, since it can be split into letters.
I prefer to define it as “a single unit as far as the database is concerned”. So if this array (or whatever it is) is stored and retrieved in its entirety by the application, and its elements are never accessed inside the database, it is atomic in this sense, and there is nothing wrong with the design.
If, however, you plan to use elements of that attribute in WHERE conditions, if you want to modify individual elements with UPDATE statements or (worst of all) if you want the elements to satisfy constraints or refer to other tables, your design is almost certainly wrong. Experience shows that normalization leads to simpler and faster queries in that case.
Don't try to get away with a few large table rows. Databases are optimized for dealing with many small rows.

Is there an algorithm for weighted reservoir sampling? [closed]

Is there an algorithm for how to perform reservoir sampling when the points in the data stream have associated weights?
The algorithm by Pavlos Efraimidis and Paul Spirakis solves exactly this problem. The original paper with complete proofs is published with the title "Weighted random sampling with a reservoir" in Information Processing Letters 2006, but you can find a simple summary here.
The algorithm works as follows. First observe that another way to solve unweighted reservoir sampling is to assign each element a random key R between 0 and 1 and incrementally (say, with a heap) keep track of the k largest keys. Now let's look at the weighted version, and say the i-th element has weight w_i. Then we modify the algorithm by choosing the key of the i-th element to be R^(1/w_i), where R is again uniformly distributed in (0, 1).
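A minimal C++ sketch of that idea (names like weighted_reservoir are just illustrative, not from the paper):

    #include <cmath>
    #include <functional>
    #include <queue>
    #include <random>
    #include <utility>
    #include <vector>

    // Keep the k items with the largest keys R^(1/w), where R ~ Uniform(0,1).
    std::vector<int> weighted_reservoir(const std::vector<std::pair<int, double>>& stream,
                                        std::size_t k, std::mt19937& rng) {
        std::uniform_real_distribution<double> uni(0.0, 1.0);
        using Entry = std::pair<double, int>;               // (key, item)
        // Min-heap on the key, so the smallest of the current top k is on top.
        std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

        for (const auto& [item, weight] : stream) {
            double key = std::pow(uni(rng), 1.0 / weight);
            if (heap.size() < k) {
                heap.push({key, item});
            } else if (key > heap.top().first) {
                heap.pop();                                 // evict the smallest key
                heap.push({key, item});
            }
        }

        std::vector<int> sample;
        while (!heap.empty()) {
            sample.push_back(heap.top().second);
            heap.pop();
        }
        return sample;
    }

Each element is inspected once and heap operations cost O(log k), so a stream of n weighted items is sampled in O(n log k).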
Another article talking about this algorithm is this one by the Cloudera folks.
You can try the A-ES algorithm from this paper by S. Efraimidis. It's quite simple to code and very efficient.
Hope this helps,
Benoit

Why do some languages index from 0 and not 1? Efficiency? [closed]

So in the lift in my flat the buttons aren't (this being the UK) labelled: G, 1, 2, 3, etc. Nor are they in the American fashion of: 1, 2, 3, 4, etc.
They're labelled: 0, 1, 2, 3, i.e. they're indexed from 0.
I thought to myself: 'Clearly, if you were to write a goToFloor-like function to represent moving between floors, you could do so by the index of the element. Easy!'
And then I realised not all languages start their arrays from 0, some start from 1.
How is this decision made? Is it a matter of efficiency (I doubt it!)? Ease of use for new programmers (arguably, anyone who makes the mistake once won't make it again)?
I can't see any reason a programming language would deviate from a standard, whether it be 0, 1 or any other number. With that in mind, perhaps it would help to know the first language that had the ability to index and then the first language to break whatever convention was set?
I hope this isn't too 'wishy-washy' a question for SO, I'm very eager to hear the history behind indexing.
When the first programming languages were designed, indexing started at 0 because an array maps to memory positions: the array name maps to a starting address, and the number is used as an offset to reach the adjacent values. Seen this way, the number is the distance from the start, not the position in the order of the array.
From a mathematical point of view it makes sense, because it helps to implement algorithms more naturally.
However, 0 is not appealing to humans, because we start counting at 1. It's counter-intuitive, and this is why some languages decided to "fake" it and start arrays at 1. (Note that some of them, like VB, allow you to choose between 0- and 1-based arrays.)
Interesting information on this topic can be found in this famous Dijkstra article:
Why numbering should start at zero
The first "language" would have been assembler. There an array is simply the memory address of the first element. To access one element in the array, an offset is added. So if the array is at position t0, then t0+0 is the first element, t0+1 is the second element etc. This leads to indexes starting at 0. Later, higher level languages added a better nicer syntax, but the indexes stayed the same way.
Sometimes, however, exceptions were made. In Pascal, for example, a String is an array of bytes, but the first byte of the array/string stores the length of the string, so the first letter is stored at index 1. Index 0 still exists and can be used to get that length.
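A tiny C++ illustration of the offset view described above:

    #include <cassert>

    int main() {
        int a[4] = {10, 20, 30, 40};
        // The array name decays to the address of the first element, and the
        // index is just the distance from that address: a[i] == *(a + i).
        assert(a[0] == *(a + 0));   // distance 0 from the start
        assert(a[3] == *(a + 3));   // distance 3 from the start
        return 0;
    }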

Machine learning of word structure [closed]

I am working on a system that can create made-up fantasy words based on a variety of user input, such as syllable templates or a modified Backus-Naur Form. One new mode, though, is planned to be machine learning: the user does not explicitly define any rules, but pastes some text, and the system learns the structure of the given words and creates similar ones.
My current naïve approach would be to create a table of letter-neighborhood probabilities (including a special end-of-word "letter") and fill it by scanning the input in letter pairs (using whitespace and punctuation as word boundaries). Creating a word would then mean looking up the probabilities of every letter to follow the current letter, randomly choosing one according to those probabilities, appending it, and repeating until end-of-word is encountered.
But I am looking for more sophisticated approaches that (probably?) provide better results. I do not know much about machine learning, so pointers to topics, techniques or algorithms are appreciated.
I think that for independent words (and especially names), a simple Markov chain system (which you seem to be describing when talking about using letter pairs) can perform really well. Feed it a lexicon and throw it a seed to generate a new name based on what it learned. You may want to tweak the prefix length of the Markov chain to get nice-sounding results (as pointed out in a comment on your question, 2 letters work much better than one).
I once tried it with elvish and orcish names dictionaries and got very satisfying results.
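For what it's worth, here's a rough C++ sketch of such an order-2 letter Markov chain; all names (NameGenerator, the '^'/'$' padding characters) are made up for the example.

    #include <map>
    #include <random>
    #include <string>
    #include <vector>

    // Order-2 Markov chain over characters; '$' marks end-of-word, '^' pads the start.
    class NameGenerator {
    public:
        void learn(const std::vector<std::string>& words) {
            for (const auto& w : words) {
                std::string s = "^^" + w + "$";
                for (std::size_t i = 2; i < s.size(); ++i)
                    table_[s.substr(i - 2, 2)].push_back(s[i]);
            }
        }

        std::string generate(std::mt19937& rng) const {
            std::string out, state = "^^";
            for (;;) {
                auto it = table_.find(state);
                if (it == table_.end()) break;
                const auto& next = it->second;
                std::uniform_int_distribution<std::size_t> pick(0, next.size() - 1);
                char c = next[pick(rng)];
                if (c == '$') break;             // end-of-word reached
                out += c;
                state = state.substr(1) + c;     // slide the two-letter window
            }
            return out;
        }

    private:
        // Maps a two-letter prefix to all letters seen after it; duplicates act
        // as weights, so uniform sampling follows the learned frequencies.
        std::map<std::string, std::vector<char>> table_;
    };

Feed learn() a lexicon (e.g. a list of elvish names) and call generate() repeatedly; the prefix length (2 here) is the knob mentioned above.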

What do you call those little annoying cases you have to check all the time? [closed]

What do you call those little annoying cases that have to be checked, like "it's the first time someone entered a record", "deleting the last record in a linked list (in a C implementation)", ...?
The only term I know translates, not very nicely, to "end cases". Is there a better name?
Edge cases.
Corner cases
Every prof I have ever had has referred to them as boundary cases or special cases.
I use the term special cases
I call it work ;-).
Because they pay me for it.
But edge cases (as mentioned before) is probably a more correct name.
I call them "nigglies". But, to be honest, I don't care about the linked list one any more.
Because memory is cheap, I always implement lists so that an empty list contains two special nodes, first and last.
When searching, I iterate from first->next to last->prev inclusively (so I'm not looking at the sentinel first/last nodes).
When I insert, I use this same limit to find the insertion point - that guarantees that I'm never inserting before first or after last, so I only ever have to use the "insert-in-the-middle" case.
When I delete, it's similar. Because you can't delete the first or last node, the deletion code only ever has to handle the "delete-from-the-middle" case as well.
Granted, that's just me being lazy. In addition, I don't do a lot of C work any more anyway, and I have a huge code library to draw on, so my days of implementing new linked lists are long gone.
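For illustration, a minimal sketch of that sentinel approach might look like this (the names are mine, not a real library):

    // Doubly linked list with permanent 'first' and 'last' sentinel nodes, so
    // every real node has non-null neighbours and insertion/removal never
    // needs a special case for the ends.
    struct Node {
        int   value;
        Node* prev;
        Node* next;
    };

    struct List {
        Node first{0, nullptr, nullptr};
        Node last{0, nullptr, nullptr};

        List() {
            first.next = &last;
            last.prev  = &first;
        }

        // Insert a new node just before 'pos'; 'pos' is never &first, so the
        // "insert in the middle" logic is the only case needed.
        Node* insert_before(Node* pos, int value) {
            Node* n = new Node{value, pos->prev, pos};
            pos->prev->next = n;
            pos->prev = n;
            return n;
        }

        // Remove a real node (never a sentinel): again only the "middle" case.
        void erase(Node* n) {
            n->prev->next = n->next;
            n->next->prev = n->prev;
            delete n;
        }
    };

Iteration then runs from first.next up to (but not including) &last, exactly as described above.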