Regression model using two collinear time-related variables? - regression

I am building a regression model to assess how a certain outcome, tracked from 2015-2018, changed in the year 2018 specifically relative to 2015-2017. However, the outcome underwent a natural year-by-year decline that I would also like to capture in the regression model. As a result, I am currently using a variable X as my independent variable (X=0 for 2015-2017 vs. X=1 for 2018), and a variable Y as a confounder (modeled continuously) to adjust for changes across the entire study period (X=0 for 2015, X=1 for 2016, X=2 for 2017, X=3 for 2018).
However, as you can see in the table below, there is a great deal of collinearity between these two variables. One solution would be removing confounder Y from the model but I believe it is important to capture year-by-year change from 2015-2017. Is there an alternative way I can set up this model or an alternative methodology I can use (ex. time series) to perform this analysis? Thank you very much!

Related

LSTM Evolution Forecast

I have a confusion about the way the LSTM networks work when forecasting with an horizon that is not finite but I'm rather searching for a prediction in whatever time in future. In physical terms I would call it the evolution of the system.
Suppose I have a time series $y(t)$ (output) I want to forecast, and some external inputs $u_1(t), u_2(t),\cdots u_N(t)$ on which the series $y(t)$ depends.
It's common to use the lagged value of the output $y(t)$ as input for the network, such that I schematically have something like (let's consider for simplicity just lag 1 for the output and no lag for the external input):
[y(t-1), u_1(t), u_2(t),\cdots u_N(t)] \to y(t)
In this way of thinking the network, when one wants to do recursive forecast it is forced to use the predicted value at the previous step as input for the next step. In this way we have an effect of propagation of error that makes the long term forecast badly behaving.
Now, my confusion is, I'm thinking as a RNN as a kind of an (simple version) implementation of a state space model where I have the inputs, my output and one or more state variable responsible for the memory of the system. These variables are hidden and not observed.
So now the question, if there is this kind of variable taking already into account previous states of the system why would I need to use the lagged output value as input of my network/model ?
Getting rid of this does my long term forecast would be better, since I'm not expecting anymore the propagation of the error of the forecasted output. (I guess there will be anyway an error in the internal state propagating)
Thanks !
Please see DeepAR - a LSTM forecaster more than one step into the future.
The main contributions of the paper are twofold: (1) we propose an RNN
architecture for probabilistic forecasting, incorporating a negative
Binomial likelihood for count data as well as special treatment for
the case when the magnitudes of the time series vary widely; (2) we
demonstrate empirically on several real-world data sets that this
model produces accurate probabilistic forecasts across a range of
input characteristics, thus showing that modern deep learning-based
approaches can effective address the probabilistic forecasting
problem, which is in contrast to common belief in the field and the
mixed results
In this paper, they forecast multiple steps into the future, to negate exactly what you state here which is the error propagation.
Skipping several steps allows to get more accurate predictions, further into the future.
One more thing done in this paper is predicting percentiles, and interpolating, rather than predicting the value directly. This adds stability, and an error assessment.
Disclaimer - I read an older version of this paper.

QAT output nodes for Quantized Model got the same min max range

Recently, I have worked on quantization aware training on tf1.x to push the model to Coral Dev Board. However, when I finished training the model, why is my min max of my 2 outputs fake quantization is the same?
Should it be different when one's maximum target is 95 and one is 2pi?
I have figured out the problem. It is the problem when that part of the model is not really trained QAT. This happens for the output node that somehow forgets to QAT when training. The -6 and 6 values come from the default source of the quantization of tf1.x as mention here
To overcome the problem, we should provide some op to trigger the QAT for the output nodes. In my regression case, I add a dummy op: tf.maximum(output,0) in the model to make the node QAT. If your output is strictly between 0-1, applying "sigmoid" activation at output instead of relu can also solve the problems.

Machine Learning for gesture recognition with Myo Armband

I'm trying to develop a model to recognize new gestures with the Myo Armband. (It's an armband that possesses 8 electrical sensors and can recognize 5 hand gestures). I'd like to record the sensors' raw data for a new gesture and feed it to a model so it can recognize it.
I'm new to machine/deep learning and I'm using CNTK. I'm wondering what would be the best way to do it.
I'm struggling to understand how to create the trainer. The input data looks like something like that I'm thinking about using 20 sets of these 8 values (they're between -127 and 127). So one label is the output of 20 sets of values.
I don't really know how to do that, I've seen tutorials where images are linked with their label but it's not the same idea. And even after the training is done, how can I avoid the model to recognize this one gesture whatever I do since it's the only one it's been trained for.
An easy way to get you started would be to create 161 columns (8 columns for each of the 20 time steps + the designated label). You would rearrange the columns like
emg1_t01, emg2_t01, emg3_t01, ..., emg8_t20, gesture_id
This will give you the right 2D format to use different algorithms in sklearn as well as a feed forward neural network in CNTK. You would use the first 160 columns to predict the 161th one.
Once you have that working you can model your data to better represent the natural time series order it contains. You would move away from a 2D shape and instead create a 3D array to represent your data.
The first axis shows the number of samples
The second axis shows the number of time steps (20)
The thirst axis shows the number of sensors (8)
With this shape you're all set to use a 1D convolutional model (CNN) in CNTK that traverses the time axis to learn local patterns from one step to the next.
You might also want to look into RNNs which are often used to work with time series data. However, RNNs are sometimes hard to train and a recent paper suggests that CNNs should be the natural starting point to work with sequence data.

Determining edge weights given a list of walks in a graph

These questions regard a set of data with lists of tasks performed in succession and the total time required to complete them. I've been wondering whether it would be possible to determine useful things about the tasks' lengths, either as they are or with some initial guesstimation based on appropriate domain knowledge. I've come to think graph theory would be the way to approach this problem in the abstract, and have a decent basic grasp of the stuff, but I'm unable to know for certain whether I'm on the right track. Furthermore, I think it's a pretty interesting question to crack. So here we go:
Is it possible to determine the weights of edges in a directed weighted graph, given a list of walks in that graph with the lengths (summed weights) of said walks? I recognize the amount and quality of permutations on the routes taken by the walks will dictate the quality of any possible answer, but let's assume all possible walks and their lengths are given. If a definite answer isn't possible, what kind of things can be concluded about the graph? How would you arrive at those conclusions?
What if there were several similar walks with possibly differing lengths given? Can you calculate a decent average (or other illustrative measure) for each edge, given enough permutations on different routes to take? How will discounting some permutations from the available data set affect the calculation's accuracy?
Finally, what if you had a set of initial guesses as to the weights and had to refine those using the walks given? Would that improve upon your guesstimation ability, and how could you apply the extra information?
EDIT: Clarification on the difficulties of a plain linear algebraic approach. Consider the following set of walks:
a = 5
b = 4
b + c = 5
a + b + c = 8
A matrix equation with these values is unsolvable, but we'd still like to estimate the terms. There might be some helpful initial data available, such as in scenario 3, and in any case we can apply knowledge of the real world - such as that the length of a task can't be negative. I'd like to know if you have ideas on how to ensure we get reasonable estimations and that we also know what we don't know - eg. when there's not enough data to tell a from b.
Seems like an application of linear algebra.
You have a set of linear equations which you need to solve. The variables being the lengths of the tasks (or edge weights).
For instance if the tasks lengths were t1, t2, t3 for 3 tasks.
And you are given
t1 + t2 = 2 (task 1 and 2 take 2 hours)
t1 + t2 + t3 = 7 (all 3 tasks take 7 hours)
t2 + t3 = 6 (tasks 2 and 3 take 6 hours)
Solving gives t1 = 1, t2 = 1, t3 = 5.
You can use any linear algebra techniques (for eg: http://en.wikipedia.org/wiki/Gaussian_elimination) to solve these, which will tell you if there is a unique solution, no solution or an infinite number of solutions (no other possibilities are possible).
If you find that the linear equations do not have a solution, you can try adding a very small random number to some of the task weights/coefficients of the matrix and try solving it again. (I believe falls under Perturbation Theory). Matrices are notorious for radically changing behavior with small changes in the values, so this will likely give you an approximate answer reasonably quickly.
Or maybe you can try introducing some 'slack' task in each walk (i.e add more variables) and try to pick the solution to the new equations where the slack tasks satisfy some linear constraints (like 0 < s_i < 0.0001 and minimize sum of s_i), using Linear Programming Techniques.
Assume you have an unlimited number of arbitrary characters to represent each edge. (a,b,c,d etc)
w is a list of all the walks, in the form of 0,a,b,c,d,e etc. (the 0 will be explained later.)
i = 1
if #w[i] ~= 1 then
replace w[2] with the LENGTH of w[i], minus all other values in w.
repeat forever.
Example:
0,a,b,c,d,e 50
0,a,c,b,e 20
0,c,e 10
So:
a is the first. Replace all instances of "a" with 50, -b,-c,-d,-e.
New data:
50, 50
50,-b,-d, 20
0,c,e 10
And, repeat until one value is left, and you finish! Alternatively, the first number can simply be subtracted from the length of each walk.
I'd forget about graphs and treat lists of tasks as vectors - every task represented as a component with value equal to it's cost (time to complete in this case.
In tasks are in different orderes initially, that's where to use domain knowledge to bring them to a cannonical form and assign multipliers if domain knowledge tells you that the ratio of costs will be synstantially influenced by ordering / timing. Timing is implicit initial ordering but you may have to make a function of time just for adjustment factors (say drivingat lunch time vs driving at midnight). Function might be tabular/discrete. In general it's always much easier to evaluate ratios and relative biases (hardnes of doing something). You may need a functional language to do repeated rewrites of your vectors till there's nothing more that romain knowledge and rules can change.
With cannonical vectors consider just presence and absence of task (just 0|1 for this iteratioon) and look for minimal diffs - single task diffs first - that will provide estimates which small number of variables. Keep doing this recursively, be ready to back track and have a heuristing rule for goodness or quality of estimates so far. Keep track of good "rounds" that you backtraced from.
When you reach minimal irreducible state - dan't many any more diffs - all vectors have the same remaining tasks then you can do some basic statistics like variance, mean, median and look for big outliers and ways to improve initial domain knowledge based estimates that lead to cannonical form. If you finsd a lot of them and can infer new rules, take them in and start the whole process from start.
Yes, this can cost a lot :-)

Creating a logic gate simulator

I need to make an application for creating logic circuits and seeing the results. This is primarily for use in A-Level (UK, 16-18 year olds generally) computing courses.
Ive never made any applications like this, so am not sure on the best design for storing the circuit and evaluating the results (at a resomable speed, say 100Hz on a 1.6Ghz single core computer).
Rather than have the circuit built from the basic gates (and, or, nand, etc) I want to allow these gates to be used to make "chips" which can then be used within other circuits (eg you might want to make a 8bit register chip, or a 16bit adder).
The problem is that the number of gates increases massively with such circuits, such that if the simulation worked on each individual gate it would have 1000's of gates to simulate, so I need to simplify these components that can be placed in a circuit so they can be simulated quickly.
I thought about generating a truth table for each component, then simulation could use a lookup table to find the outputs for a given input. The problem occurred to me though that the size of such tables increase massively with inputs. If a chip had 32 inputs, then the truth table needs 2^32 rows. This uses a massive amount of memory in many cases more than there is to use so isn't practical for non-trivial components, it also wont work with chips that can store their state (eg registers) since they cant be represented as a simply table of inputs and outputs.
I know I could just hardcode things like register chips, however since this is for educational purposes I want it so that people can make their own components as well as view and edit the implementations for standard ones. I considered allowing such components to be created and edited using code (eg dlls or a scripting language), so that an adder for example could be represented as "output = inputA + inputB" however that assumes that the students have done enough programming in the given language to be able to understand and write such plugins to mimic the results of their circuit which is likly to not be the case...
Is there some other way to take a boolean logic circuit and simplify it automatically so that the simulation can determine the outputs of a component quickly?
As for storing the components I was thinking of storing some kind of tree structure, such that each component is evaluated once all components that link to its inputs are evaluated.
eg consider: A.B + C
The simulator would first evaluate the AND gate, and then evaluate the OR gate using the output of the AND gate and C.
However it just occurred to me that in cases where the outputs link back round to the inputs, will cause a deadlock because there inputs will never all be evaluated...How can I overcome this, since the program can only evaluate one gate at a time?
Have you looked at Richard Bowles's simulator?
You're not the first person to want to build their own circuit simulator ;-).
My suggestion is to settle on a minimal set of primitives. When I began mine (which I plan to resume one of these days...) I had two primitives:
Source: zero inputs, one output that's always 1.
Transistor: two inputs A and B, one output that's A and not B.
Obviously I'm misusing the terminology a bit, not to mention neglecting the niceties of electronics. On the second point I recommend abstracting to wires that carry 1s and 0s like I did. I had a lot of fun drawing diagrams of gates and adders from these. When you can assemble them into circuits and draw a box round the set (with inputs and outputs) you can start building bigger things like multipliers.
If you want anything with loops you need to incorporate some kind of delay -- so each component needs to store the state of its outputs. On every cycle you update all the new states from the current states of the upstream components.
Edit Regarding your concerns on scalability, how about defaulting to the first principles method of simulating each component in terms of its state and upstream neighbours, but provide ways of optimising subcircuits:
If you have a subcircuit S with inputs A[m] with m < 8 (say, giving a maximum of 256 rows) and outputs B[n] and no loops, generate the truth table for S and use that. This could be done automatically for identified subcircuits (and reused if the subcircuit appears more than once) or by choice.
If you have a subcircuit with loops, you may still be able to generate a truth table. There are fixed-point finding methods which can help here.
If your subcircuit has delays (and they are significant to the enclosing circuit) the truth table can incorporate state columns. E.g. if the subcircuit has input A, inner state B, and output C, where C <- A and B, B <- A, the truth table could be:
A B | B C
0 0 | 0 0
0 1 | 0 0
1 0 | 1 0
1 1 | 1 1
If you have a subcircuit that the user asserts implements a particular known pattern such as "adder", provide an option for using a hard-coded implementation for updating that subcircuit instead of by simulating its inner parts.
When I made a circuit emulator (sadly, also incomplete and also unreleased), here's how I handled loops:
Each circuit element stores its boolean value
When an element "E0" changes its value, it notifies (via the observer pattern) all who depend on it
Each observing element evaluates its new value and does likewise
When the E0 change occurs, a level-1 list is kept of all elements affected. If an element already appears on this list, it gets remembered in a new level-2 list but doesn't continue to notify its observers. When the sequence which E0 began has stopped notifying new elements, the next queue level is handled. Ie: the sequence is followed and completed for the first element added to level-2, then the next added to level-2, etc. until all of level-x is complete, then you move to level-(x+1)
This is in no way complete. If you ever have multiple oscillators doing infinite loops, then no matter what order you take them in, one could prevent the other from ever getting its turn. My next goal was to alleviate this by limiting steps with clock-based sync'ing instead of cascading combinatorials, but I never got this far in my project.
You might want to take a look at the From Nand To Tetris in 12 steps course software. There is a video talking about it on youtube.
The course page is at: http://www1.idc.ac.il/tecs/
If you can disallow loops (outputs linking back to inputs), then you can significantly simplify the problem. In that case, for every input there will be exactly one definite output. Cycles however can make the output undecideable (or rather, constantly changing).
Evaluating a circuit without loops should be easy - just use the BFS algorithm with "junctions" (connections between logic gates) as the items in the list. Start off with all the inputs to all the gates in an "undefined" state. As soon as a gate has all inputs "defined" (either 1 or 0), calculate its output and add its output junctions to the BFS list. This way you only have to evaluate each gate and each junction once.
If there are loops, the same algorithm can be used, but the circuit can be built in such a way that it never comes to a "rest" and some junctions are always changing between 1 and 0.
OOps, actually, this algorithm can't be used in this case because the looped gates (and gates depending on them) would forever stay as "undefined".
You could introduce them to the concept of Karnaugh maps, which would help them simplify truth values for themselves.
You could hard code all the common ones. Then allow them to build their own out of the hard coded ones (which would include low level gates), which would be evaluated by evaluating each sub-component. Finally, if one of their "chips" has less than X inputs/outputs, you could "optimize" it into a lookup table. Maybe detect how common it is and only do this for the most used Y chips? This way you have a good speed/space tradeoff.
You could always JIT compile the circuits...
As I haven't really thought about it, I'm not really sure what approach I'd take.. but it would possibly be a hybrid method and I'd definitely hard code popular "chips" in too.
When I was playing around making a "digital circuit" simulation environment, I had each defined circuit (a basic gate, a mux, a demux and a couple of other primitives) associated with a transfer function (that is, a function that computes all outputs, based on the present inputs), an "agenda" structure (basically a linked list of "when to activate a specific transfer function), virtual wires and a global clock.
I arbitrarily set the wires to hard-modify the inputs whenever the output changed and the act of changing an input on any circuit to schedule a transfer function to be called after the gate delay. With this at hand, I could accommodate both clocked and unclocked circuit elements (a clocked element is set to have its transfer function run at "next clock transition, plus gate delay", any unclocked element just depends on the gate delay).
Never really got around to build a GUI for it, so I've never released the code.