Co-spanning tree - language-agnostic

Does anyone know what a co-spanning tree is? If there are good answers, it would be great to have an example as well.

From The Structurally Optimal Dual Graph Pyramid and Its Application in Image Partitioning
In other words, it is basic graph theory - You can't expect to understand what it is about without at least trying to study it firsthand.
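Since the question asks for an example: here is a hedged illustration with networkx, assuming the standard graph-theory definition (the co-spanning tree, or cotree, of a spanning tree T of a connected graph G is the set of edges of G that are not in T). The graph below is just a toy.

```python
# Toy co-spanning tree example, assuming the standard definition:
# cotree = edges of G that are not in a chosen spanning tree T.
import networkx as nx

G = nx.cycle_graph(5)                  # 5 vertices joined in a single cycle: 5 edges
T = nx.minimum_spanning_tree(G)        # any spanning tree of G has 4 edges

edge = lambda e: tuple(sorted(e))      # normalise edge orientation for set arithmetic
cotree = {edge(e) for e in G.edges()} - {edge(e) for e in T.edges()}

print(sorted(T.edges()))               # the spanning tree's edges
print(cotree)                          # the single leftover edge that closes the cycle
```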

Related

How to extract relation between entities for stock prediction

I am trying to extract relations between two entities (entity1 - relation - entity2) from news articles for stock prediction. I have used NER for entity extraction. It would be great if anyone could help me with relationship extraction.
Relation extraction is a difficult task in NLP, and most of the time there's no one-size-fits-all solution. Depending on the task you're trying to solve, I would suggest reading some literature about it on Google Scholar and seeing if there's something similar to what you're trying to do.
Sometimes, authors are kind enough to publish the code of their solution, which is mainly PyTorch/TensorFlow models (hopefully) trained on a specific dataset. One example is this paper.
If you want to stick with spaCy, there are some guides that might help you, but I'm not sure how well it would scale with the task that you need to solve.
Another more basic approach could be to just extract the shortest path between two entities in the semantic graph of a sentence. This might be quite limited, but can be fairly easy to implement.
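For what it's worth, here is a rough sketch of that shortest-path idea, assuming spaCy (with the en_core_web_sm model) and networkx; the sentence and the entity choices are made up, and spaCy's dependency parse stands in for the "semantic graph".

```python
# Shortest dependency path between two named entities (sketch, not a full solution).
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")
doc = nlp("Microsoft agreed to buy LinkedIn for $26.2 billion.")

# Treat the dependency parse as an undirected graph over token indices.
edges = [(token.i, child.i) for token in doc for child in token.children]
graph = nx.Graph(edges)

if len(doc.ents) >= 2:
    ent1, ent2 = doc.ents[0].root.i, doc.ents[1].root.i
    path = nx.shortest_path(graph, source=ent1, target=ent2)
    # The tokens along the path ("agreed", "buy", ...) hint at the relation.
    print([doc[i].text for i in path])
```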
One final idea that comes to mind is to use encoders and compute the similarity between sentences. If you're doing multi-class classification, this could help solve your problem.
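If you go the encoder route, a minimal sketch might look like the following; the sentence-transformers package and the all-MiniLM-L6-v2 model are my assumptions rather than a requirement, and the sentences are invented.

```python
# Sentence-embedding similarity sketch (assumed library: sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Company A announced it will acquire Company B.",   # made-up examples
    "Company A completed the purchase of Company B.",
    "Company A reported quarterly earnings.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarity; high scores suggest two sentences express the
# same relation, which you can then map onto your relation classes.
print(util.cos_sim(embeddings, embeddings))
```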
Hope you find something useful among these.

Modifying a random forest such that all trees have some (predetermined) features in common

I am currently looking at a regression problem that I am trying to solve using random forest. The data set has approx 200 features.
Two of the 200 features are important (known from the business use case), and I would like to make sure they are used in every tree in the random forest.
Question: does this make sense from a theoretical point of view?
And if yes, has anyone done it before?
I would appreciate any thoughts, references, etc.
Thanks.
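In case it helps frame the question: as far as I know, scikit-learn's RandomForestRegressor has no built-in option for this, but the idea can be prototyped by hand with a bagged ensemble whose per-tree feature subset always contains the two must-have columns. A rough sketch follows (feature indices and sizes are made up; X and y are assumed to be NumPy arrays); note this only guarantees the two features are available to every tree, not that every tree actually splits on them.

```python
# Hand-rolled bagged ensemble where every tree sees a forced pair of features.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, forced=(0, 1), n_trees=100, n_random_feats=12, seed=0):
    rng = np.random.default_rng(seed)
    other = [j for j in range(X.shape[1]) if j not in forced]
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))              # bootstrap sample
        cols = list(forced) + list(rng.choice(other, n_random_feats, replace=False))
        tree = DecisionTreeRegressor(random_state=int(rng.integers(1 << 31)))
        tree.fit(X[rows][:, cols], y[rows])
        trees.append((tree, cols))
    return trees

def predict_forest(trees, X):
    # Average the per-tree predictions, each made on that tree's column subset.
    return np.mean([tree.predict(X[:, cols]) for tree, cols in trees], axis=0)
```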

What's a good explanation of statistical machine translation?

I'm trying to find a good high-level explanation of how statistical machine translation works. That is, supposing I have a corpus of non-aligned English, French and German texts, how could I use that to translate any sentence from one language to another? It's not that I'm looking to build a Google Translate myself, but I'd like to understand how it works in more detail.
I've searched Google but come across nothing good: it either quickly requires advanced mathematics knowledge to understand or is way too generalized. Wikipedia's article on SMT seems to be both, so it doesn't really help much. I'm skeptical that this is such a complex area that it's simply not possible to understand without all the mathematics.
Can anyone give, or point to, a general step-by-step explanation of how such a system works, targeted towards programmers (so code examples are fine) but without needing a mathematics degree to understand? A book like this would be great too.
Edit: A perfect example of what I'm looking for would be an SMT equivalent to Peter Norvig's great article on spelling correction. That gives a good idea of what it's involved in writing a spell checker, without going into detailed maths on Levenshtein/soundex/smoothing algorithms etc...
Here is a nice video lecture (in 2 parts):
http://videolectures.net/aerfaiss08_koehn_pbfs/
For in-depth details, I highly advise this book:
http://www.amazon.com/Statistical-Machine-Translation-Philipp-Koehn/dp/0521874157
Both are from the guy who created the most widely used MT system in research. The book covers all the fundamental stuff, is very well explained, and is accurate. It is probably one of the de facto standard books that any researcher beginning in this field should read.
The Atlantic Online had a very straightforward nontechnical description of statistical machine translation back in December 1998:
Lost in Translation by Stephen Budiansky
I've read nontechnical stuff on statistical MT before, but always wondered "yeah, but how does the statistical stuff know which words map to which when word orders vary and supposedly no dictionary and no grammar are used?" Well, this article actually does answer that, and the answer is simple and straightforward; I was quite surprised.
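For a concrete toy version of that answer: word-to-word translation probabilities can be learned from sentence-aligned pairs alone with expectation-maximization (IBM Model 1 is the classic formulation). A hedged sketch with an invented three-sentence corpus, just to show the word mappings emerging with no dictionary and no grammar:

```python
# IBM-Model-1-style EM on a tiny invented parallel corpus (German -> English).
from collections import defaultdict

corpus = [("das haus", "the house"),
          ("das buch", "the book"),
          ("ein buch", "a book")]
pairs = [(f.split(), e.split()) for f, e in corpus]

t = defaultdict(lambda: 0.25)        # start with uniform t(f|e)
for _ in range(10):                  # a few EM iterations suffice for this toy
    count, total = defaultdict(float), defaultdict(float)
    for f_sent, e_sent in pairs:
        for f in f_sent:
            norm = sum(t[(f, e)] for e in e_sent)
            for e in e_sent:
                count[(f, e)] += t[(f, e)] / norm   # expected alignment counts
                total[e] += t[(f, e)] / norm
    t = defaultdict(float, {fe: count[fe] / total[fe[1]] for fe in count})

for e in ("the", "house", "book", "a"):
    print(f"t(haus | {e}) = {t[('haus', e)]:.2f}")  # 'house' ends up highest
```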
A Peter Norvig talk from Google Developer Day 2007, Theorizing from Data: Avoiding the Capital Mistake, contains some accessible high-level explanation of the principles of statistical machine translation (starting from about 21:20).

So was that Data Structures & Algorithms course really useful after all?

I remember when I was taking DSA I was like, wtf O(n), and wondering where I would use it other than in grad school, or if you're not a PhD like Bloch. Somehow uses for it do pop up in business analysis, so I was wondering when you guys have had to call up your Big O skills to see how to write an algorithm, which data structure you used to fit the problem, or whether you had to actually create a new DS (like your own implementation of a splay tree or trie).
Understanding Data Structures has been fundamental to many of the projects I've worked on, and that goes beyond the ten minute song 'n dance one does when asked such a question in an interview situation.
Granted that modern environments with all sorts of collection classes can make light work of storing and accessing large amounts of data, but having an understanding that a particular problem is best solved with a particular data structure can be a great timesaver. And by "timesaver" I mean "the difference between something working and not working".
Honestly, being able to answer that stuff is my biggest criterion for taking interviewees seriously in an interview. Knowing how basic data structures work, basic O(n) analysis, and some light theory is really crucial to being able to write large applications successfully.
It's important in the interview because it's important in the job. I've worked with techs in the past that were self taught, without taking the data structures course or reading a data structures book, and their code is occasionally bad in ways they should have seen coming.
If you don't know that n² is going to run slowly compared to n log n, you've got more to learn.
As far as the latter half of the data structures course goes, it isn't generally applicable to most tech jobs, but if you ever do wind up needing it, you'll wish you had paid more attention.
Big-O notation is one of the basic notations used when describing algorithms implemented by a particular library. For example, all documentation on STL that I've seen describes various operations in terms of big-O, so naturally you have to e.g. understand the difference between O(1), O(log n) and O(n) to understand the implications of your choice of STL containers and algorithms. MSDN also does that for .NET classes, and IIRC Java documentation does that for standard Java classes. So, I'd say that knowing the notation is pretty much a requirement for understanding documentation of most popular frameworks out there.
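To make that concrete in Python terms (the container sizes and probe value below are arbitrary), the documented complexities translate directly into measurable differences:

```python
# Membership is O(n) on a list, O(log n) on a sorted list via bisect,
# and O(1) on average for a set.
import bisect
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
probe = n - 1  # worst case for the linear scan

print("list  'in':", timeit.timeit(lambda: probe in data_list, number=200))
print("bisect    :", timeit.timeit(
    lambda: data_list[bisect.bisect_left(data_list, probe)] == probe, number=200))
print("set   'in':", timeit.timeit(lambda: probe in data_set, number=200))
```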
Sure (even though I'm a humble MS in EE -- no PhD, no CS, unlike my colleague Joshua Bloch), I write a lot of stuff that needs to be highly scalable (or components that may need to be reused in highly scalable apps), so big-O considerations are almost always at work in my design (and it's not hard to take them into account). The data structures I use are almost always from Python's simple but rich supply (which I did lend a hand developing;-); rarely is a totally custom one needed (rather than building on top of list, dict, etc.); but when it does happen (e.g. the bitvectors in my open source project gmpy), it's no big deal.
I was able to use B-Trees right when I learned about them in algorithms class (that was about 15 years ago, when there were far fewer open source implementations available). And even later, the knowledge about the differences between, e.g., container classes came in handy...
Absolutely: even though stacks, queues, etc. are pretty straightforward, it helps to have been introduced to them in a disciplined fashion.
B-Trees and more advanced sorting are a bit more difficult, so learning them early was a big benefit, and I have indeed had to implement each of them at various points.
Finally, I created an algorithm for single-connected components a few years back that was significantly better than the one our signal-processing team was using, but I couldn't convince them that it was better until I could show that it was O(n) complexity rather than O(n log n).
...just to name a few examples.
Of course, if you are content to remain a CRUD-system hacker with no real desire to do more than collect a paycheck, then it may not be necessary...
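For reference, a linear-time connected-components pass of the general kind mentioned a couple of paragraphs up might look like the sketch below (breadth-first search over an adjacency list); the actual algorithm from that story isn't given here, and this graph is invented.

```python
# Connected components via BFS, O(V + E) overall.
from collections import deque

adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}

def connected_components(adj):
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(comp)
    return components

print(connected_components(adj))  # [[0, 1, 2], [3, 4], [5]]
```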
I found my knowledge of data structures very useful when I needed to implement a customizable event-driven system about ten years ago. That's the biggie, but I use that sort of knowledge fairly frequently in lesser ways.
For me, knowing the exact algorithms has been... nice as background knowledge. However, the thing that's been the most useful is the more general background of having to pay attention to how different pieces of an algorithm interact. For instance, there can be places in code where moving one piece of code (i.e., outside a loop) can make a huge difference in both time and space.
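A tiny made-up illustration of that "move it outside the loop" point:

```python
# The slow version recomputes a loop-invariant value on every iteration.
def normalize_slow(values):
    return [v / max(values) for v in values]   # max() runs once per element: O(n^2)

def normalize_fast(values):
    peak = max(values)                          # hoisted: computed once, O(n) total
    return [v / peak for v in values]
```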
It's less about the specific knowledge the course taught and more that it acted like several years of experience: the course took all the variations of something that might take years of pure "real world experience" to encounter (and have drilled into you) and condensed them.
The title of your question asks about data structures and algorithms, but the body of your question focuses on complexity analysis, so I'll focus on that too:
There are lots of programming jobs where being able to do complexity analysis is at least occasionally useful. See What career can I hope for if I like algorithms? for some examples of these.
I can think of several instances in my career where either I or a co-worker have discovered a piece of code where the (usually time, sometimes space) complexity was higher than it should have been, e.g. something that was quadratic or cubic when it could have been linear or n log(n). Such code would work fine when given small inputs, but on larger inputs would quickly become really slow or consume all available memory. Knowing alternative algorithms and data structures, their complexities, and also how to analyze the complexity when building new algorithms, is vital to being able to correct these problems (or avoid them in the first place).
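A made-up before/after of that accidental-quadratic pattern:

```python
# Checking membership in a list inside a loop is O(n^2) overall;
# a set lookup makes the same pass O(n).
def dedupe_quadratic(items):
    seen, out = [], []
    for x in items:
        if x not in seen:      # O(n) scan of a list, inside an O(n) loop
            seen.append(x)
            out.append(x)
    return out

def dedupe_linear(items):
    seen, out = set(), []
    for x in items:
        if x not in seen:      # O(1) average-case hash lookup
            seen.add(x)
            out.append(x)
    return out
```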
Networking is the only place I've used it: in an implementation of the traveling salesman problem.
Unfortunately I do a lot of "line of business" and "forms over data" apps, so most problems I work on can be solved by hammering together arrays, linked lists, and hash tables. However, I've had the chance to work my data structures magic here and there:
Due to weird complex business rules, I worked on an application which used a custom thread pool implemented as a leftist-heap.
My dev team struggled to write a complex multithreaded app. It was plagued with race conditions, deadlocks, and lousy performance due to very fine-grained locking. We re-worked how the code shared state between threads, opting to write a very lightweight wrapper to facilitate message passing. Once we converted our linked lists and hash tables to immutable stacks and immutable red-black trees, we had no more problems with thread safety or performance. The resulting code was immaculate and surprisingly readable.
Frequently, a business rules engine requires you to roll your own state machine, which is very naturally modelled as a graph where vertices are states and edges are transitions between states (a bare-bones sketch follows this list).
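The bare-bones sketch mentioned above, with invented states and events:

```python
# State machine as a graph: states are vertices, (state, event) pairs are
# labelled edges pointing at the next state.
transitions = {
    ("draft",          "submit"):  "pending_review",
    ("pending_review", "approve"): "approved",
    ("pending_review", "reject"):  "draft",
    ("approved",       "publish"): "published",
}

def step(state, event):
    try:
        return transitions[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}")

state = "draft"
for event in ("submit", "approve", "publish"):
    state = step(state, event)
print(state)  # published
```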
If for no other reason, I'm glad I took the time to read about data structures and algorithms simply to be able to picture novel problems a little differently, especially combinatorial problems and graph problems. Graph theory is no longer a synonym for "scary".

Territory Map Generation

Is there a trivial, or at least moderately straight-forward way to generate territory maps (e.g. Risk)?
I have looked in the past and the best I could find were vague references to Voronoi diagrams. An example of a Voronoi diagram is this:
(image: example Voronoi diagram)
These hold promise, but I haven't seen any straightforward ways of rendering them, let alone holding them in some form of data structure to treat each territory as an object.
Another approach that holds promise is flood fill, but again I'm unsure on the best way to start with this approach.
Any advice would be much appreciated.
The best reference I've seen on them is Computational Geometry: Algorithms and Applications, which covers Voronoi diagrams, Delaunay triangulations (similar to Voronoi diagrams and each can be converted into the other), and other similar data structures.
They talk about all the data structures you need, but they don't give you the code necessary to implement them (which may be a good exercise). In terms of code, an Amazon search shows the book Computational Geometry in C, which presumably comes with the code (although since you're stuck with C, you might as well get the other one and implement it in whatever language you want). I don't have any experience with that book, only the first.
Sorry to have only books to recommend! The only decent online resources I've seen on them are the two Wikipedia articles, which don't really tell you implementation details. This link may be helpful though.
Why not use a map of primitives (triangles, squares), distribute the starting points for the countries (the "capitals"), and then randomly expand the countries by repeatedly adding a random adjacent primitive to each country?
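A quick sketch of that grow-from-capitals idea on a square grid (grid size, country count, and seed are arbitrary):

```python
# Grow territories from random capitals by claiming random adjacent cells.
import random

W, H, N_COUNTRIES = 30, 20, 6
random.seed(42)

owner = {}        # (x, y) -> country id
frontiers = []    # per-country list of cells that may still be able to grow
capitals = random.sample([(x, y) for x in range(W) for y in range(H)], N_COUNTRIES)
for c, cell in enumerate(capitals):
    owner[cell] = c
    frontiers.append([cell])

def neighbors(x, y):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < W and 0 <= y + dy < H:
            yield x + dx, y + dy

# Repeatedly let a random country claim an unowned cell adjacent to it.
while len(owner) < W * H:
    c = random.randrange(N_COUNTRIES)
    if not frontiers[c]:
        continue
    cell = random.choice(frontiers[c])
    free = [n for n in neighbors(*cell) if n not in owner]
    if not free:
        frontiers[c].remove(cell)      # this cell can no longer grow
        continue
    new = random.choice(free)
    owner[new] = c
    frontiers[c].append(new)
```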
CGAL is a C++ library that has data structures and algorithms used in Computational Geometry.
I'm actually dealing with exactly this kind of stuff for my company's video game. The most useful info I've found are at these two links:
Paul Bourke's page at UWA, with his 1989 paper on Delaunay and a series of implementation links.
A great explanation of the pseudocode, and a visual of doing Delaunay, at codeGuru.com.
In terms of rendering these - most of the implementations I've found will need massaging to get what you'd want, but since using this for a game map would lead to a number of points plus lines between them, it could be a very simple matter to draw this out to screen.
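As one hedged way to get from random points to something drawable, SciPy's Voronoi class exposes the regions and vertices directly; the point count and plotting choices below are mine.

```python
# Voronoi territories from random seed points, with a quick rendering.
import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt

points = np.random.rand(20, 2)            # 20 random "capitals" in the unit square
vor = Voronoi(points)

# Each input point maps to a region (a list of vertex indices); -1 marks a
# vertex at infinity, so unbounded regions are skipped here.
for i, point in enumerate(points):
    region = vor.regions[vor.point_region[i]]
    if -1 not in region:
        polygon = vor.vertices[region]    # (k, 2) array of corner coordinates
        # ...hand this polygon to your own territory object / renderer

voronoi_plot_2d(vor)                      # quick-and-dirty rendering
plt.show()
```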