Encoding paths in a graph as strings? - mysql

Suppose I have a DAG and, rather than using a graph db, the paths are encoded as {id:"node3", path:"node0|node1|node2"} to represent that node3 can reach node0 via node2 then node1. Would it be a good idea to encode the path in a string if reads are not frequent? The paths generally don't contain more than 50 nodes each.
Thanks

I think you will find that your approach won't work as well as you want. One of the properties that makes graphs interesting is the combinatorial explosion that can arise from their structure. Storing every path for every node is going to get big very fast, and I think it would stop scaling and no longer do what you expect.
Consider the following blog posts:
http://thinkaurelius.com/2012/04/21/loopy-lattices/
http://thinkaurelius.com/2013/06/12/loopy-lattices-redux/
The posts explore path counting on a 20x20 directed lattice. They find that a graph "with only 441 vertices and 840 edges, has over 137 billion unique directed paths."
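The quoted figure can be checked with a short dynamic-programming sketch. It assumes the lattice is a 21x21 grid of vertices whose edges point only right or down, and that paths are counted from one corner to the opposite corner; the function name is made up for illustration.

```python
from math import comb

def count_lattice_paths(n):
    """Count corner-to-corner directed paths on an n x n lattice.

    The lattice has (n + 1) * (n + 1) vertices; every edge points right or down.
    """
    size = n + 1
    # paths[c] is the number of paths reaching column c of the current row.
    paths = [1] * size              # first row: exactly one way to reach each vertex
    for _ in range(1, size):        # remaining rows
        for c in range(1, size):
            paths[c] += paths[c - 1]    # from above (old value) plus from the left
    return paths[-1]

n = 20
vertices = (n + 1) ** 2             # 441
edges = 2 * n * (n + 1)             # 840
total = count_lattice_paths(n)
print(vertices, edges, total)
```

The count equals the binomial coefficient C(40, 20), which is why even a small DAG can make "one string per path" impractical if every path has to be stored.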

Related

Is it possible to use topic modeling for a single document

Is it rational to use topic modelling for a single document? To be more precise, is it mathematically okay to use the LDA Gibbs method on a single document? If so, what should the values of k and seed be?
Also, what is the role of k and seed for a single document as well as for a large set of documents?
k and seed are arguments of the function LDA (in RStudio).
Also let me know if I am wrong anywhere in this question.
About my project: I am trying to find the main topics that can be used to represent the content of a single document.
I have already tried k=4, 7 and 10. Part of my question is also which value of k would be better.
It really depends on the document. A document could be a 700 page book or a single sentence. Your k (I think you mean the number of topics?) is also going to depend on the document. If your document is the entire Wikipedia corpus, 1500 topics might be appropriate; if your document is a list of comments about movies, then 20 topics might be appropriate. Optimizing that number can be done using the elbow method, check out 17.
The seed can be pretty much any number. It's just a lever so your results can be replicated, and the model still runs if you leave it blank. I would say try it, check your coherence and eyeball your topics, and if it looks right then sure, you can train an LDA on one document. A single document should process pretty fast.
Here is an example in Python of using the seed parameter. My data set has 1,048,575 rows; note that the seed is much higher:
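import gensim  # LdaMallet lives in gensim.models.wrappers in gensim releases before 4.0

# mallet_path, bow_corpus and dictionary are assumed to be defined already.
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=bow_corpus,
    num_topics=20, alpha=0.1, id2word=dictionary, iterations=1000,
    random_seed=569356958)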

Machine Learning for gesture recognition with Myo Armband

I'm trying to develop a model to recognize new gestures with the Myo Armband. (It's an armband that possesses 8 electrical sensors and can recognize 5 hand gestures). I'd like to record the sensors' raw data for a new gesture and feed it to a model so it can recognize it.
I'm new to machine/deep learning and I'm using CNTK. I'm wondering what would be the best way to do it.
I'm struggling to understand how to create the trainer. The input data is a reading of 8 sensor values (each between -127 and 127), and I'm thinking about using 20 sets of these 8 values. So one label is the output of 20 sets of values.
I don't really know how to do that. I've seen tutorials where images are linked with their labels, but it's not the same idea. And even after the training is done, how can I keep the model from recognizing this one gesture no matter what I do, since it's the only one it has been trained on?
An easy way to get you started would be to create 161 columns (8 columns for each of the 20 time steps + the designated label). You would rearrange the columns like
emg1_t01, emg2_t01, emg3_t01, ..., emg8_t20, gesture_id
This will give you the right 2D format to use different algorithms in sklearn as well as a feed forward neural network in CNTK. You would use the first 160 columns to predict the 161st one.
Once you have that working you can model your data to better represent the natural time series order it contains. You would move away from a 2D shape and instead create a 3D array to represent your data.
The first axis shows the number of samples
The second axis shows the number of time steps (20)
The third axis shows the number of sensors (8)
With this shape you're all set to use a 1D convolutional model (CNN) in CNTK that traverses the time axis to learn local patterns from one step to the next.
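As a rough sketch of the two layouts described above, the following plain-Python snippet builds the 161 column names and regroups a flat row of 160 readings into 20 time steps of 8 sensor values. The helper names and the sample data are invented for illustration; in practice a NumPy reshape would do the same job.

```python
N_STEPS = 20     # time steps per gesture sample
N_SENSORS = 8    # EMG sensors on the armband

def column_names():
    """Names of the 160 feature columns followed by the label column."""
    names = [f"emg{s}_t{t:02d}"
             for t in range(1, N_STEPS + 1)
             for s in range(1, N_SENSORS + 1)]
    return names + ["gesture_id"]

def to_time_major(flat_row):
    """Split one flat row of 160 readings into 20 steps of 8 sensor values."""
    if len(flat_row) != N_STEPS * N_SENSORS:
        raise ValueError("expected 160 values")
    return [flat_row[t * N_SENSORS:(t + 1) * N_SENSORS] for t in range(N_STEPS)]

# Two made-up samples; real rows would hold readings between -127 and 127.
flat_samples = [list(range(160)), list(range(160, 320))]
samples_3d = [to_time_major(row) for row in flat_samples]
shape = (len(samples_3d), len(samples_3d[0]), len(samples_3d[0][0]))
print(column_names()[:3], shape)
```

The resulting nesting is (samples, time steps, sensors), which matches the three axes listed above.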
You might also want to look into RNNs which are often used to work with time series data. However, RNNs are sometimes hard to train and a recent paper suggests that CNNs should be the natural starting point to work with sequence data.

Mapping Nonlinear Functions By Using Artificial Neural Network

I am dealing with a hard assignment on which I have not been able to make any progress. What is the way to solve the following problem? Any help would be appreciated.
f(x)=1/x and x is between 0.1 and 1
The problem asks to train the network using the back propagation algorithm with one hidden layer.
The training set will have 200 input/output patterns, the test set will have 100 and the validation set will have 50 patterns.
How can I solve this? Regards.
That sounds much more complicated than it actually is. The network does not know anything about what you want the input and output patterns to represent, so do not worry about that. All you need to do is set up such a network (I assume you know how to do that; otherwise look around, there are a couple of libraries, and for testing purposes you can even set it up quickly in Excel).
Then just run the training data against the network in a loop. Once the network is reasonably stable, store it and start testing.
I assume the representation of the patterns has been defined already? It is one of the most important points and it determines the quality. The closer two x/y pairs are semantically, the closer their representation patterns have to be, where closeness here means the delta between x/y pairs. This matters in particular for the pairs with small x and large y!
Otherwise the network will not "understand" that, and you can teach it forever, since the representation does not reflect the similarity correctly (in this case, the delta x and delta y).
For example, the value 7 in binary format (0111) is not close at all to the value 8 (1000). So if the network has not "learned" that, because it has never seen the 8, it will not work well.
So the closer the values, the more similar their representations should be for the network. That's the key.
Tweaking the parameters will then fine-tune your model.
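For a concrete starting point, here is a minimal pure-Python sketch of a one-hidden-layer network trained with back propagation on f(x)=1/x. Several things are my own assumptions rather than part of the assignment: x is fed in as a plain real number, the target is scaled to 0.1/x so it stays in [0.1, 1], the hidden layer has 8 tanh units, and the learning rate and epoch count are arbitrary. The validation set is only used here to monitor the error.

```python
import math
import random

def make_patterns(count, rng):
    """Sample x uniformly in [0.1, 1] and pair it with the scaled target 0.1 / x."""
    # 1/x lies in [1, 10]; scaling by 0.1 keeps targets in [0.1, 1].
    return [(x, 0.1 / x) for x in (rng.uniform(0.1, 1.0) for _ in range(count))]

class OneHiddenLayerNet:
    def __init__(self, hidden, rng):
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sum(w * v for w, v in zip(self.w2, h)) + self.b2       # linear output
        return h, y

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        err = y - target                    # dLoss/dy for loss = err**2 / 2
        for j, hj in enumerate(h):
            # Propagate the error back through the tanh unit (uses the old w2[j]).
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.w1[j] -= lr * grad_h * x
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * err

def mse(net, patterns):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in patterns) / len(patterns)

rng = random.Random(0)
train, test, validation = (make_patterns(n, rng) for n in (200, 100, 50))
net = OneHiddenLayerNet(hidden=8, rng=rng)
mse_before = mse(net, validation)
for epoch in range(300):
    rng.shuffle(train)
    for x, t in train:
        net.train_step(x, t, lr=0.05)
mse_after = mse(net, validation)
print(mse_before, mse_after, mse(net, test))
```

Multiplying a prediction by 10 recovers the estimate of 1/x. The fit is usually worst near x = 0.1, where the function is steepest, which is the "small x, large y" region mentioned above.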

Bulk export of binary waveform data from oscilloscope to data points (csv preferred)

I'm working with some binary waveform files from various early to mid-90s HP scopes. I am trying to do a bulk conversion of the files (we have over 5000) to CSVs and then upload them into a database. I've tried hexdump, xxd, od, strings, etc. and none of them seem to work. I did hunt down a programmer's manual, but it's not making a whole lot of sense.
The files have a preamble line as ASCII text, but then the data points are in binary and for some reason nothing I try can decode them. The preamble gives the data necessary to take the binary values and calculate the correct values. It also states that the data is in WORD format.
:WAV:PRE 2,1,32768,1,+4.000000E-08,-4.9722700001108E-06,0,+2.460630E-04,+2.500000E+00,16384;:WAV:DATA #800065536^W�^W�^W�^
I'm pretty confused.
Have a look at
http://www.naic.edu/~phil/hardware/oscilloscopes/9000A_Programmer_Reference.pdf
specifically page 1-21. After ":WAV:DATA", I think the rest of the chunk above will have 65536 8-bit data bytes (the start of which is represented above by �). The ^W is probably a delimiter, so you would have to parse that out. Just a thought.
UPDATE: I'm new to oscilloscope data collection and am trying to figure the whole thing out from scratch. So, on further digging, it looks like the data you have provided shows this:
PREamble:
- WORD format (16-bit signed integers split into 2 8-bit bytes)
- If there is a WAV:BYT section, that would specify byte order for each pair
- RAW data
- 32768 data points
- COUNT = 1 (I'm not clear on the meaning of this)
- Next 3 should be X increment, origin, reference
- Next 3 should be Y increment, origin, reference, although the manual that I pointed you at above has many more fields than just these, so you might want to consult your specific scope manual.
DATA:
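- On closer examination, I don't think the ^W is a delimiter, I think it is the first byte of the pair (00010111, i.e. 0x17). The � character is apparently a standard "I don't know how to represent this character" web representation. You would need to look at that character as 8 bits also.
- 65536 bytes of data, i.e. 32768 byte pairs. In the "#800065536" header, the 8 gives the number of length digits that follow and 00065536 is the byte count, which matches 32768 points of 2 bytes each.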
I'm not finding a utility that will do this for you. I think you're going to have to write or acquire some code (Perl, C, Java, Python, VB, etc.) to get this done.
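A minimal Python sketch of such a converter is below. It rests on several assumptions that should be checked against the manual for the specific scope: the preamble fields are in the order listed above, the samples are signed 16-bit words with the most significant byte first, and each value is scaled as (value - reference) * increment + origin. The function name and the tiny synthetic record are made up for illustration.

```python
import struct

def parse_waveform(raw, byte_order=">"):
    """Turn a ':WAV:PRE ...;:WAV:DATA #...' record into (seconds, volts) pairs.

    byte_order is an assumption: '>' means most significant byte first.
    """
    pre_tag, data_tag = b":WAV:PRE ", b";:WAV:DATA "
    pre_start = raw.index(pre_tag) + len(pre_tag)
    pre_end = raw.index(data_tag, pre_start)
    fields = raw[pre_start:pre_end].decode("ascii").split(",")
    points = int(fields[2])
    x_inc, x_org, x_ref = (float(f) for f in fields[4:7])
    y_inc, y_org, y_ref = (float(f) for f in fields[7:10])

    # Block header: '#', one digit giving the number of length digits, the length.
    pos = pre_end + len(data_tag)
    if raw[pos:pos + 1] != b"#":
        raise ValueError("missing block header")
    n_digits = int(raw[pos + 1:pos + 2].decode("ascii"))
    start = pos + 2 + n_digits
    n_bytes = int(raw[pos + 2:start].decode("ascii"))
    payload = raw[start:start + n_bytes]
    if len(payload) != n_bytes or n_bytes != 2 * points:
        raise ValueError("unexpected payload size")

    words = struct.unpack(f"{byte_order}{points}h", payload)   # signed 16-bit
    return [((i - x_ref) * x_inc + x_org, (w - y_ref) * y_inc + y_org)
            for i, w in enumerate(words)]

# Synthetic three-point record reusing the scaling values from the question.
demo = (b":WAV:PRE 2,1,3,1,+4.000000E-08,-4.9722700001108E-06,0,"
        b"+2.460630E-04,+2.500000E+00,16384;:WAV:DATA #800000006"
        + struct.pack(">3h", 16384, 16385, 0))
samples = parse_waveform(demo)
print(samples)
```

For the bulk conversion, each file could be read with open(path, "rb").read(), passed to this function, and the resulting pairs written out with the csv module. If the decoded trace looks like noise, trying byte_order="<" is the first thing I would check.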

AS3 repeat same code in a vector which has 2500 objects

This is my problem: I'm making a path finding program using the jump point search algorithm, and I need to reset every node (object) in the vector. It is a 40 by 40 vector, so 2500 nodes, so I need to do the following
// some type of loop over every node; "nodes" is an assumed name for the vector
for each (var node:Object in nodes)
{
    node.is_been_on = false;
}
But my path finding may run 5 times every second for a few objects, so that is a lot of looping.
What is the CPU friendly way to do this, or is there another solution which means I don't need to do it at all?
One of my friends says that I should make a 40 by 40 boolean array and keep the is_been_on variable in it, so I would refer to that and not to the node. Would that be better?
Thanks for reading, and I hope you can help.
The simplest idea is to reset only the nodes that you've changed: store them in a separate array and iterate only over it. JPS should modify only a small part of the given nodes.
Your friend's idea is not better, since you will still iterate over all nodes and modify each value. The values on the nodes are also booleans (or at least I hope so), so you gain nothing by having a second array (vector) of values.
Either way, I don't find it that bad to modify bool values, but if you really need to optimize (which I find great), go with "reset what's changed". I can't imagine a better one.
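The "reset what's changed" idea can be sketched as follows. The sketch is in Python rather than AS3, and the Grid class, the visit method and the touched list are names invented for illustration; only is_been_on comes from the question.

```python
class Node:
    def __init__(self):
        self.is_been_on = False

class Grid:
    def __init__(self, width, height):
        self.nodes = [[Node() for _ in range(width)] for _ in range(height)]
        self.touched = []               # nodes marked during the current search

    def visit(self, x, y):
        """Mark a node as visited and remember it for the next reset."""
        node = self.nodes[y][x]
        if not node.is_been_on:
            node.is_been_on = True
            self.touched.append(node)

    def reset(self):
        """Clear only the nodes the last search marked; return how many."""
        count = len(self.touched)
        for node in self.touched:
            node.is_been_on = False
        self.touched.clear()
        return count

grid = Grid(40, 40)
for x, y in [(0, 0), (1, 1), (2, 2), (1, 1)]:   # stand-in for a real search
    grid.visit(x, y)
cleared = grid.reset()
print(cleared)
```

The cost of a reset is then proportional to the number of nodes the search actually touched rather than to the size of the whole grid.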
But why do you recalculate the path 5 times every second? You have a graph of size 40x40, and with the help of A* or another algorithm you will be able to find the correct path. Once you have calculated a path, you don't need to recalculate it unless you have dynamic obstacles in the game.
If you don't know how to implement a pathfinding algorithm in an AS3 project, there are several ready-made solutions.