Why does the output of nn.Embedding(vocab_size, dim) change on re-running the code for the same input string? - deep-learning

I am trying to understand how word embeddings are generated. I've been reading that a one-hot encoded vector is used and that it serves as a lookup table, but I want to print that and see it. How can I do that? When I do the following:
self.embeddings = nn.Embedding(vocab_size, dim)
print('embed',self.embeddings.weight)
I get different results when I re-run the code for the same input string/vocab_size and the same dim.
I want to know why this happens and how those weight values are calculated. What equations/functions are used to get these values? Is a function like softmax used to get the weights?

Related

d3.js How to not graph values outside of range?

I have a multi-bar graph with 7 different bar listings. Dates are on the x axis and decimal values are on the y axis. Some of these listings have empty strings ("") for their decimal values and they are graphed as 0.000. I don't want these to show up at all. I tried using chart.yDomain([0, 3]); and setting the empty values to -1; they no longer show up on the graph, but the spacing between the bars is the same as if they were graphed.
I also tried not putting empty value pairs into the graph datum array, but that messed up the date sorting since not every listing has a value for each date.
Here's an example of the JSON data I am using for the graphing:
"x_data":["08\/15\/13","11\/11\/13","11\/13\/13","11\/14\/13","11\/18\/13","11\/19\/13","11\/20\/13","11\/25\/13","12\/05\/13","12\/09\/13","12\/11\/13","12\/12\/13"],
"y_data":[[["","","","","","","",0.875,"",0.41,"",""]],[["","","","","","","","",0.285,"",0.92,""]],[["",0.203,0.17,0.223,0.193,0.303,0.263,"","","","",""]],[["",0.433,0.333,0.665,0.353,0.413,0.458,"","","","",""]],[["",0.355,0.3,0.263,0.258,0.355,0.215,"","","","",""]],[["",0.195,0.43,0.243,0.28,0.44,0.4,"","","","",""]],[[1.218,"","","","","","","","","","",""]]]}
Here is a screen shot of how it looks without setting the domain:
http://i.imgur.com/TO3wwWF.png?1
Here is a screen shot of what it looks like when I do set the domain:
http://i.imgur.com/NEwgkJf.png?1
Since you haven't provided a fiddle or equivalent, it's not possible to give a copy-and-paste answer, but a general approach would be to remove the empty values from the data before creating the chart.
Since the data in your example isn't formatted exactly as D3.js expects, I'll assume you're not simply fetching it using D3's built-in request function (e.g. d3.json('url/to/data.json')) but, rather, have the data in a local variable. Assuming you also want to preserve the structure above, you could do something like the following. (It's deliberately unoptimized, to keep the logic as clear as possible.)
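// Assumes data.y_data is a flat array aligned index-for-index with data.x_data.
var cleandata = {
  x_data: [],
  y_data: []
};
data.y_data.forEach(function (y_value, idx) {
  // Keep only entries that hold a value; a plain truthiness test would also drop 0.
  if (y_value !== "" && y_value != null) {
    cleandata.x_data.push(data.x_data[idx]);
    cleandata.y_data.push(y_value);
  }
});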
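The y_data in the question nests one array per listing, while the snippet above walks y_data as a flat list. A minimal sketch of the same filtering idea applied per listing follows, assuming every listing is wrapped as [[...]] exactly as in the sample JSON. The helper name toSeriesPoints and the {x, y} point shape are illustrative choices, not something the charting library is known to require.

```javascript
// Hypothetical helper: converts the question's nested y_data into one array of
// {x, y} points per listing, skipping entries that are empty strings or null.
function toSeriesPoints(data) {
  return data.y_data.map(function (series) {
    var values = series[0]; // each listing is wrapped in one extra array
    var points = [];
    values.forEach(function (y, idx) {
      if (y !== "" && y != null) {
        // Keep the date with the value so the pairing survives the filtering.
        points.push({ x: data.x_data[idx], y: y });
      }
    });
    return points;
  });
}

// Shortened sample in the same shape as the question's JSON.
var sample = {
  x_data: ["11/25/13", "12/05/13", "12/09/13"],
  y_data: [[[0.875, "", 0.41]], [["", 0.285, ""]]]
};
var cleaned = toSeriesPoints(sample);
console.log(JSON.stringify(cleaned));
```

Because each point carries its own date, the listings no longer need to share one x_data array. Whether the chart then lays out bars for listings with different date sets the way you want is something to check against your charting code.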

Complex Gremlin queries to output nodes/edges

I am trying to implement a query and graph visualisation framework that allows a user to enter a Gremlin query, returning a D3 graph of results. The D3 graph is built from JSON, which is created from separate vertex and edge outputs of the Gremlin query. For simple queries such as:
g.V.filter{it.attr_a == "foo"}
this works fine. However, when I try to perform a more complicated query such as the following:
g.E.filter{it.attr_a == 'foo'}.groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k,e->e.size()<=3}
This query is meant to:
- Find all instances of *value*
- Grouped by unique *attr_b*
- Where *attr_a* = foo
- And *attr_b* is paired with no more than 2 other instances of *value*
The output is not a list of vertices and edges, though; it has the following form:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
I would like to know if there is a way for Gremlin to output the results as a list of nodes and edges so I can display the results as a graph. I am aware that I could edit my D3 code to take in this new output, but there are currently no restrictions on the type/complexity of the query, so the key/value pairs will not necessarily be the same every time.
Thanks.
You've hit what I consider one of the key problems with visualizing Gremlin results. They can be anything. Gremlin results might not just be a list of vertices and edges. There is no way to really control this that I can think of. At the end of the day, you can really only visualize results that match a pattern that D3 expects. I'd start by trying to detect that pattern and visualize only in those cases (simply display non-recognized patterns as JSON perhaps).
Taking your specific example, which produces results like this:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
What would you want D3 to visualize there? The vertices/edges that were traversed over to get that result? If so, you might be stuck. Gremlin doesn't give you a way to introspect the pipeline to see what's passing through it. In other words, unless the user explicitly gathers vertices and edges within the pipeline that were touched you won't have access to them. It would be nice to be able to "spy" on a pipeline in that way, but at the moment it doesn't do that. There's been internal discussion within TinkerPop to create a new kind of pipeline implementation that would help with that, but at the moment, it doesn't exist.
So, without the "spying" capability, I think your only workarounds would be to:
- Detect a vertex/edge list on your client side and only render those with D3. This would force users to always write Gremlin that returns data in such a format if they want visualization. Put it in the users' hands.
- Perhaps supply server-side bindings for a list of vertices/edges that users could explicitly side-effect their vertices/edges into if their results did not conform to those expected by your visualization engine. Again, this would force users to write their Gremlin appropriately for your needs if they want visualization.
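The first workaround, detecting a vertex/edge list on the client, could look roughly like the sketch below. It assumes each returned element carries a `_type` field of "vertex" or "edge", with `_id`, `_outV` and `_inV` fields alongside. Those field names are an assumption about the server's JSON, so adjust them to whatever your endpoint actually returns. The function name toGraph is hypothetical as well.

```javascript
// Returns {nodes, links} when the result is a plain list of vertices and edges,
// or null when it is anything else (for example a map produced by groupBy).
// Assumed element shapes: {_type: "vertex", _id: ...} and
// {_type: "edge", _id: ..., _outV: ..., _inV: ...}.
function toGraph(results) {
  if (!Array.isArray(results)) {
    return null;
  }
  var graph = { nodes: [], links: [] };
  for (var i = 0; i < results.length; i++) {
    var el = results[i];
    if (el && el._type === "vertex") {
      graph.nodes.push({ id: el._id });
    } else if (el && el._type === "edge") {
      graph.links.push({ source: el._outV, target: el._inV });
    } else {
      return null; // unrecognized element: caller can show the raw JSON instead
    }
  }
  return graph;
}

var renderable = toGraph([
  { _type: "vertex", _id: 1 },
  { _type: "vertex", _id: 2 },
  { _type: "edge", _id: 7, _outV: 1, _inV: 2 }
]);
var notRenderable = toGraph({ attr_b1: ["value1", "value2", "value3"] });
console.log(renderable, notRenderable);
```

A null result is the signal to fall back to displaying the response as plain JSON rather than attempting to draw it.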

F# Read File, Split string list, summarize data, Nonfloat decimal numbers

I'm new to F# and got this assignment to create a very simple bank representation.
I do not want any code answers directly related to the problem, but preferably links or tips on where to find solutions or how to work them out.
The issues are the following:
Reading lines of a file (a line looks like this: "126,145001,1500.00" and it's sequence_number, account_number, amount)
Split the line to use the data from the line
summarize the data (to return the bank account balance)
Not using floating point numbers representing the amount, due to rounding errors(?)
Doing all of these in one function.
I know how to read a file, in a function.
I also know how to split a string.
I know how to recursively add values from a list.
I do not know how to add values that are decimal without floating-point variables.
I do not know how to retrieve the string from a list in a function and split it.
I do not know how to do all of these things in one function taking in file name, account number, and account currency.
The function should return the balance after the transactions in the file have been processed.
My idea to solve this is to create a datatype that has the three variables sequence_number, account_number and amount, and then do the following:
Read the file,
Split the data and create an object of my custom type for each line in the file
Add and remove the values from the types and return the final balance.
If anyone could point me in the right direction for each or any problem I would be really thankful!
.NET contains a type called System.Decimal that is indeed more appropriate for storing financial figures than the typical floating point types. In F#, you can use the decimal function to convert a value of a different type (say a string) to a System.Decimal (which F# abbreviates as a type also named decimal): let d = decimal "1.23" You can also create these values directly by using the M suffix: let d' = 1.23M, but in your case that doesn't seem relevant.
Regarding your other questions, if you use System.IO.File.ReadLines, then you can get the individual lines of your file as a sequence. Then you can string together a bunch of operations on that sequence to achieve your desired result. For instance, you can take the sequence and use Seq.map <your splitting code here> to split each line (and convert to instances of your specific data type, if desired), and then use Seq.groupBy to group the transactions by account number, and then Seq.map again to apply your summarization logic to each group. Ask follow-up questions if any of this is unclear.

how do i decode/encode the url parameters for the new google maps?

I'm trying to figure out how to extract the lat/long of the start/end in a Google Maps directions link that looks like this:
https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0
I'm guessing the "!" is a separator between variables, each followed by a prefix XY where X is a number and Y is a lower-case letter, but I cannot quite figure out how to reliably extract the coordinates, as the number/order of variables changes as well as their XY prefixes.
ideas?
thanks
Well, this is old, but hey. I've been working on this a bit myself, so here's what I've figured out:
The data is an encoded javascript array, so the trick when trying to generate your own data string is to ensure that your formatting keeps the structure of the array intact. To do this, let's look at what each step represents.
As you've correctly figured out, each exclamation point defines the start of a value definition. The first character, an int value, is an inner count, and (I believe) acts as an identifier, although I'm not 100% certain on this. It seems to be pretty flexible in terms of what you can have here, as long as it's an int. The second character, however, is much more important. It defines the data type of the value. I don't know if I've found all the data types yet, but the ones I have figured out are:
m: matrix
f: float
d: double
i: integer
b: boolean
e: enum (as integer)
s: string
u: unsigned int
x: hexadecimal value?
The remaining characters actually hold the value itself, so a string will just hold the string, a boolean will be '1' or '0', and so on. However, there's an important gotcha: the matrix data type.
The value of the matrix will be an integer. This is the length of the matrix, measured in the number of values. That is, for a matrix !1mx, the next x value definitions will belong to the matrix. This includes nested matrix definitions, so a matrix of form [[1,2]] would look like !1m3!1m2!1i1!2i2 (the outer matrix spans three value definitions, the inner matrix two). This also means that, in order to remove a value from the list, you must also check it for matrix ancestors and, if they exist, update their values to reflect the now missing member.
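To make the matrix rule concrete, here is a minimal sketch of a parser that rebuilds the nesting from a data string. It only relies on the behaviour described above (a matrix count covers the next N value definitions, nested ones included) and assumes that "!" never appears inside a value. The function name parseMapsData and the node shape are illustrative, not anything Google documents.

```javascript
// Rebuilds the nested structure of a "!"-separated data string.
// Each node is {id, type, value} or, for matrices, {id, type: "m", children}.
function parseMapsData(data) {
  var tokens = data.split("!").filter(function (t) { return t !== ""; });
  var pos = 0;

  // Parses value definitions until the token index reaches `end`.
  function parseUntil(end) {
    var nodes = [];
    while (pos < end) {
      var m = /^(\d+)([a-z])(.*)$/.exec(tokens[pos]);
      if (!m) {
        throw new Error("Unrecognized value definition: " + tokens[pos]);
      }
      pos += 1;
      var node = { id: Number(m[1]), type: m[2] };
      if (node.type === "m") {
        // The matrix owns the next N definitions, however deeply they nest.
        node.children = parseUntil(Math.min(pos + Number(m[3]), tokens.length));
      } else {
        node.value = m[3];
      }
      nodes.push(node);
    }
    return nodes;
  }

  return parseUntil(tokens.length);
}

var tree = parseMapsData("!1m3!1m2!1i1!2i2");
console.log(JSON.stringify(tree, null, 2));
```

For the [[1,2]] example this yields one outer matrix holding one inner matrix, which in turn holds the two integers.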
The x data type is another anomaly. I'm going to guess it's hexadecimal encoded for most purposes, but in my particular situation (making a call for attribution info), they appear to also use the x data type to store lat/long information, and this is NOT encoded in hex, but is an unsigned long with the value set as
value = coordinate<0 ? (430+coordinate)*1e7 : coordinate*1e7
An example (pulled directly from google maps) of the x data type being used in this way:
https://www.google.com/maps/vt?pb=!1m8!4m7!2u7!5m2!1x405712614!2x3250870890!6m2!1x485303036!2x3461808386!2m1!1e0!2m20!1e2!2spsm!4m2!1sgid!2sznfCVopRY49wPV6IT72Cvw!4m2!1ssp!2s1!8m11!13m9!2sa!15b1!18m5!2b1!3b0!4b1!5b0!6b0!19b1!19u12!3m1!5e1105!4e5!18m1!1b1
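As a sketch only, the formula above can be turned into an encode/decode pair and tried on the first two x values from the example URL. The decoder assumes that any result above 180 degrees must be a shifted negative, since valid coordinates never exceed 180. The constant 430 comes from the formula as stated. It is close to 2^32 / 1e7 (about 429.497), so the real encoding may be a 32-bit wraparound, but that is a guess and the code sticks to the formula as given.

```javascript
var SCALE = 1e7;           // degrees are stored multiplied by 1e7
var NEGATIVE_OFFSET = 430; // offset applied to negative coordinates, per the formula above

// coordinate (degrees) -> unsigned integer value
function encodeCoordinate(coordinate) {
  var shifted = coordinate < 0 ? NEGATIVE_OFFSET + coordinate : coordinate;
  return Math.round(shifted * SCALE);
}

// unsigned integer value -> coordinate (degrees)
function decodeCoordinate(value) {
  var degrees = value / SCALE;
  // Anything above 180 cannot be a real coordinate, so undo the offset.
  return degrees > 180 ? degrees - NEGATIVE_OFFSET : degrees;
}

console.log(decodeCoordinate(405712614));  // from !1x405712614
console.log(decodeCoordinate(3250870890)); // from !2x3250870890
```

Under these assumptions the two values decode to a positive number near 40.57 and a negative one near -104.91.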
For the context of the question asked, it's important to note that there are no reliable identifiers in the structure. Google reads the values in a specific order, so always keep in mind when building your own encoded data that order matters; you'll need to do some research/testing to determine that order. As for reading, your best hope is to rebuild the matrix structure, then scan it for something that looks like lat/long values (i.e. a matrix containing exactly two children of type double (or x?))
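One way to do that scan without rebuilding the whole tree is sketched below: a matrix of length 2 followed directly by two double values can only contain those two values, so looking at three consecutive definitions is enough. Treating the first double as latitude and the second as longitude matches the URL in the question, but it is an assumption rather than a documented rule, and findLatLngPairs is a hypothetical name.

```javascript
// Finds every "matrix of 2 containing two doubles" in a data string and
// returns the pairs as {lat, lng}. The ordering is assumed from the sample URL.
function findLatLngPairs(data) {
  var tokens = data.split("!");
  var isDouble = /^\d+d(-?\d+(?:\.\d+)?)$/;
  var pairs = [];
  for (var i = 0; i + 2 < tokens.length; i++) {
    if (!/^\d+m2$/.test(tokens[i])) {
      continue; // not a matrix with exactly two value definitions
    }
    var first = isDouble.exec(tokens[i + 1]);
    var second = isDouble.exec(tokens[i + 2]);
    if (first && second) {
      pairs.push({ lat: Number(first[1]), lng: Number(second[1]) });
    }
  }
  return pairs;
}

// Excerpt of the data string from the question's URL.
var excerpt = "!1m4!3m2!3d36.0748342!4d-95.8040972!6e2" +
  "!1m5!1sA!2sB!3m2!3d36.1424613!4d-95.9736986!3m2!1i1366!2i705";
var found = findLatLngPairs(excerpt);
console.log(found);
```

The trailing !3m2!1i1366!2i705 group is skipped because its two children are integers, not doubles.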
Looks like the developer tools from current browsers (I am using Chrome for that) can give you a lot of info.
Try the following:
Go to Google Maps with Chrome (or adapt the instructions for another browser);
Open Developer Tools (Ctrl + Shift + I);
Go to Network tab. Clear the current displayed values;
Drag the map until some url with encoded data appears;
Click on that url, and then go to the Preview sub-tab;
Try this.
// Returns the first "!3d<lat>!4d<lng>" pair found in a Google Maps URL,
// or null if the URL contains no such pair.
function URLtoLatLng(url) {
  var match = /!3d(-?\d+(?:\.\d+)?)!4d(-?\d+(?:\.\d+)?)/.exec(url);
  if (!match) {
    return null;
  }
  return { lat: Number(match[1]), lng: Number(match[2]) };
}
var url = URLtoLatLng('https://www.google.com/maps/preview#!data=!1m4!1m3!1d189334!2d-96.03687!3d36.1250439!4m21!3m20!1m4!3m2!3d36.0748342!4d-95.8040972!6e2!1m5!1s1331-1399+E+14th+St%2C+Tulsa%2C+OK+74120!2s0x87b6ec9a1679f9e5%3A0x6e70df70feebbb5e!3m2!3d36.1424613!4d-95.9736986!3m8!1m3!1d189334!2d-96.03687!3d36.1250439!3m2!1i1366!2i705!4f13.1&fid=0');
console.log(url.lat + ' ' + url.lng);

Where do I get "junk" data to help test my code?

For my C class I've written a simple statistics program -- it calculates max, min, mean, etc. Anyway, I've gotten the program successfully compiled, so all I need to do now is actually test it; the only problem is that I don't have anything to test with.
In my case, I need a list of doubles -- my program needs to accept between 2 and 1,000,000; Is there some resource online that can produce lists of otherwise meaningless data? I know Lorem Ipsum gets used for typesetting, and I'm wondering if there's something similar for various types of numerical data.
Or am I out of luck, and I'll have to just create my own junk data?
The problem with testing software is not the source of the data, but the test set. I mean, can you test an int sum(int a, int b) method by just inputting random numbers to it? No, you need to know what to expect. This is a test set: inputs and expected outputs.
What do you say when you discover that 548888876+99814465=643503341? How can you tell this is the real result?
More than finding random numbers to give your program, you must somehow know the results of your computation in advance in order to compare it.
There are a few ways to do it: what I suggest is to pick a random number generator (amphetamachine +1) and run the same data through both your code and a program you already know is good, e.g. Matlab for your purposes. After computing your statistics with both, compare the results and see whether your code is correct or needs some debugging.
By the way, I deliberately altered the result of the above sum...
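A test set in this sense can be very small. The sketch below pairs a few inputs with expected outputs worked out by hand and checks a stats function against them. The stats function here is only a stand-in for the program under test (the real one in the question is written in C), and all names are illustrative.

```javascript
// Stand-in for the program under test.
function stats(values) {
  var min = Infinity, max = -Infinity, sum = 0;
  values.forEach(function (v) {
    if (v < min) { min = v; }
    if (v > max) { max = v; }
    sum += v;
  });
  return { min: min, max: max, mean: sum / values.length };
}

// Test set: inputs together with expected outputs computed by hand.
var testSet = [
  { input: [1, 2, 3, 4], expected: { min: 1, max: 4, mean: 2.5 } },
  { input: [-2.5, 2.5], expected: { min: -2.5, max: 2.5, mean: 0 } },
  { input: [10, 10, 10], expected: { min: 10, max: 10, mean: 10 } }
];

// Compare with a small tolerance, since floating-point results are rarely exact.
function closeEnough(a, b) {
  return Math.abs(a - b) <= 1e-9;
}

var failures = testSet.filter(function (t) {
  var got = stats(t.input);
  return !(closeEnough(got.min, t.expected.min) &&
           closeEnough(got.max, t.expected.max) &&
           closeEnough(got.mean, t.expected.mean));
});
console.log(failures.length === 0 ? "all cases pass" : failures.length + " case(s) failed");
```

For large random inputs the expected column would come from the trusted reference program rather than from hand calculation, but the comparison step stays the same.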
What about just generating a random double?
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
    double number = r.NextDouble();
    // do something with the value
}
Since the data you need will depend on the program, there is no source of generic data that I know of.
If you are able to write that program, you should be able to write a script to generate dummy data for yourself.
Just use a loop to print out random numbers within the range your program can accept.
Generate a file with random bytes:
$ dd \
of=random-bytes \
if=/dev/urandom \
bs=1024 \
count=1024
http://www.generatedata.com/#generator
I've used that data generator before with some success. To be fair, it will usually involve copy/pasting the data it generates into some other format that you'll be able to read in.
You can generate your own data for this specific case quite easily, though. Loop a random number of times, up to a maximum of 1,000,000, generating random doubles within the range you expect. Feed that in and away you go.
Generating your own test data in this case is probably the best option.
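A minimal sketch of that loop, assuming the program reads one number per line and that any range is acceptable. The function names, the [-1000, 1000) range, and the small sample size used for printing are all illustrative choices.

```javascript
// Picks how many values to generate, inclusive of both bounds.
function randomCount(minCount, maxCount) {
  return minCount + Math.floor(Math.random() * (maxCount - minCount + 1));
}

// Generates `count` random doubles in the range [min, max).
function randomDoubles(count, min, max) {
  var out = [];
  for (var i = 0; i < count; i++) {
    out.push(min + Math.random() * (max - min));
  }
  return out;
}

// The question allows between 2 and 1,000,000 values.
var count = randomCount(2, 1000000);
var values = randomDoubles(count, -1000, 1000);

// Print only the first few here; write them all to a file to feed the program.
console.log(count + " values generated, starting with:");
console.log(values.slice(0, 5).join("\n"));
```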
You could take the first million digits of pi and chop them up into however many doubles you want.
The first few could be 3.14159, 2.65358, 9.79323, 8.46264, 3.38327, 9.50288, 4.19716, and 9.39937, for example.
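The chopping itself is only a few lines. This sketch takes a string of digits, cuts it into groups of six, and puts a decimal point after the first digit of each group, which reproduces the values listed above. The function name and the choice to drop a trailing incomplete group are assumptions.

```javascript
// Cuts a digit string into fixed-size groups and turns each group into a
// double with one digit before the decimal point.
function chopDigits(digits, groupSize) {
  var out = [];
  for (var i = 0; i + groupSize <= digits.length; i += groupSize) {
    var group = digits.slice(i, i + groupSize);
    out.push(Number(group[0] + "." + group.slice(1)));
  }
  return out; // a trailing incomplete group is dropped
}

// First 48 digits of pi, without the decimal point.
var piDigits = "314159265358979323846264338327950288419716939937";
var doubles = chopDigits(piDigits, 6);
console.log(doubles.join(", "));
```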