Regression / "Learning" function - regression

I have tabulated data with columns x, a, b, c, ...; x has a finite range.
I would like to find/learn a function f(a,b,c, ...) such that, if we plot all points (x, y = f(a,b,c, ...)), the line fitting those points has positive slope. That is, as x increases, f(a,b,c, ...) also increases. What is the best and simplest way to find such an f? In other words, for any two rows in the table (x1, a1, b1, c1, ...) and (x2, a2, b2, c2, ...), if x1 < x2 then f(a1, b1, c1, ...) < f(a2, b2, c2, ...).
One can see this as a "relaxed" version of predicting x in supervised learning.
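The pairwise condition stated in the question can be written down as a small check. This is only a sketch of the requirement itself, not a method for finding f; the name `is_order_preserving` and the row layout (x first, then the features) are assumptions made for illustration.

```python
from itertools import groupby
from typing import Callable, Sequence


def is_order_preserving(rows: Sequence[tuple], f: Callable[..., float]) -> bool:
    """Return True if x1 < x2 implies f(features1) < f(features2) for all row pairs.

    Each row is (x, a, b, c, ...); f receives the features that follow x.
    (Hypothetical helper; the row layout is an assumption.)
    """
    # Sort by x and bucket rows that share the same x (ties are unconstrained).
    ordered = sorted(rows, key=lambda r: r[0])
    prev_max = None
    for _, group in groupby(ordered, key=lambda r: r[0]):
        scores = [f(*r[1:]) for r in group]
        # Every score in this bucket must exceed every score in the previous one.
        if prev_max is not None and min(scores) <= prev_max:
            return False
        prev_max = max(scores)
    return True
```

Comparing only neighbouring x-buckets is enough, because strict inequality between consecutive buckets carries over to every earlier bucket.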

Related

MySQL GROUP BY if Multiple Numbered Columns are Close to Each Other (+/- 1)

I have a MySQL table with a large list of coordinates (x, y, z). I want to find the most common spots, but when the same place is logged, the values aren't identical. For example, x could be 496.0481 or 496.3904, but that is actually the same place.
When I run the following query I get only the exact matches, and those are few and far between:
SELECT x, y, z, COUNT(*) AS coords
FROM coordinates
GROUP BY x, y, z
ORDER BY coords DESC
LIMIT 10;
How can I adjust this so that rows are grouped when each of x, y, and z is within +/- 1, to catch a larger area? I've tried a mix of IF and BETWEEN statements but can't seem to get anything to work.
If I do GROUP BY round(x), round(y), round(z), that covers a larger range, but it still separates values that round to 496 and 497 even when they are only slightly different.
Thanks in advance for the help.
Very naive way:
select t1.x as x1, t1.y as y1, t1.z as z1, t2.x as x2, t2.y as y2, t2.z as z2
from coordinates t1
join coordinates t2 on sqrt(power(t2.x-t1.x, 2) + power(t2.y-t1.y, 2) + power(t2.z-t1.z, 2)) <= 1
For each coordinate (t1), the query finds every coordinate (t2) that lies within a distance of 1 from it. Note that each row also matches itself, since its distance to itself is 0.
But this query performs very badly: it compares every pair of rows, which is O(n^2).
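For reference, the same naive pairwise idea can be sketched outside the database. The function name `neighbor_counts` and the default radius of 1 are assumptions for illustration; unlike the join above, this sketch skips the self-match and visits each pair only once, but it is still O(n^2).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def neighbor_counts(points: Sequence[Point], radius: float = 1.0) -> List[int]:
    """For each point, count the other points within `radius` (Euclidean distance)."""
    counts = [0] * len(points)
    for i in range(len(points)):
        # Start at i + 1 so each unordered pair is compared once and never with itself.
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts
```

Sorting the points by their counts in descending order would then give the most frequently logged spots.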

SQL Dense and Sparse Indexes

I am preparing for my Database final and I would like to understand these two questions. Could you please explain which answer is correct and why?
Suppose that you have a relation with the schema R(X, Y, Z). Every value of X is unique, but the other columns could have duplicate values. Assume that a sparse index is created for relation R on attribute X. Which of the following queries would use this index effectively?
(a) SELECT MAX(X)
FROM R
(b) SELECT MAX(Y)
FROM R
GROUP BY X
(c) SELECT *
FROM R
WHERE X <> 30
(d) SELECT MAX(Y)
FROM R
WHERE X = 23
(e) none of the above uses the index effectively
I believe (a) could be the correct answer as we have an index for X and they are all unique values.
Suppose that you have a relation with the schema R(X, Y, Z). Every value of X is unique, but the other columns could have duplicate values. Assume that a dense index is created for relation R on attributes X and Y. Which of the following queries would use this index effectively?
(a) SELECT *
FROM R
WHERE X < Y
(b) SELECT DISTINCT X, Y
FROM R
WHERE X = 23 AND Y > 39
(c) SELECT X, Y
FROM R
(d) SELECT X
FROM R
WHERE Y = 23
(e) none of the above uses the index effectively
I believe (c) could be the correct answer as we have an index on both X and Y.
MySQL does not have "dense" vs "sparse". Here are the optimal indexes:
(a) SELECT MAX(X) FROM R -- INDEX(X)
(b) SELECT MAX(Y) FROM R GROUP BY X -- INDEX(x,y); INDEX(x) is not as good
(c) SELECT * FROM R WHERE X <> 30 -- No index is _likely_ to be useful
(d) SELECT MAX(Y) FROM R WHERE X = 23 -- Does not make sense if X is unique
(e) none of the above uses the index effectively -- Some: a,b,d
and
(a) SELECT * FROM R WHERE X < Y -- no index
(b) SELECT DISTINCT X, Y FROM R WHERE X = 23 AND Y > 39 -- Dumb due to uniqueness
(c) SELECT X, Y FROM R -- no index; just read the table
(d) SELECT X FROM R WHERE Y = 23 -- INDEX(Y,X), or, not as good, INDEX(Y)
(e) none of the above uses the index effectively -- ambiguous; note that (Y,X) is not the same as (X,Y)

If xy determines z can x determine z and y determine z?

It is a functional dependency question.
I know that when x -> yz then x -> y and x -> z. But is the above dependency possible?
If xy determines z can x determine z and y determine z?
Yes, if xy -> z then it's possible that also x -> z and y -> z.
Suppose z can only have one value; then a given x, y or xy only ever appears with that one value. Or suppose x -> z and y -> z and x must equal y. Or suppose both x and y are unique; then xy is unique. (A case of that is when both x & y are candidate keys.) In fact any time that x -> z and y -> z, xy -> z.
(To show something is possible it's always worth trying some cases, especially very simple ones, in case they are examples, so you don't have to prove the general case.)
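Trying simple cases, as suggested above, is easy to automate. The following is a minimal sketch that tests whether a functional dependency holds in a small relation; the helper name `fd_holds` and the representation of rows as dictionaries are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Sequence


def fd_holds(rows: Iterable[Dict[str, Hashable]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in `rows` (hypothetical helper)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different later value violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


# z takes a single value, so x -> z, y -> z and xy -> z all hold.
constant_z = [
    {"x": 1, "y": "a", "z": 10},
    {"x": 1, "y": "b", "z": 10},
    {"x": 2, "y": "a", "z": 10},
]

# Here xy -> z holds but x -> z does not: x = 1 appears with two different z values.
only_xy = [
    {"x": 1, "y": 1, "z": 0},
    {"x": 1, "y": 2, "z": 1},
]
```

The second relation shows that xy -> z does not force x -> z; the first shows that all three can hold together.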

Polynomial-time logic puzzle using AND/OR/NOT to create <= on N inputs?

So let's say you have n boolean inputs of x1, x2, x3, ..., xn. How do you determine that <= k of your boolean inputs are True using only And/Or/Not logic gates, and doing so in polynomial time?
I'm quite honestly befuddled.
There are many ways to do it. One is to (recursively) make two nets:
one (A) determining that <= k-1 of boolean inputs x1 ... x[n-1] are True.
another (B) determining that <= k of boolean inputs x1 ... x[n-1] are True.
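Connect them as (B And Not x[n]) Or A. The recursion stops at two base cases: a net asked for fewer than 0 True inputs is constantly False, and a net over no inputs (with k >= 0) is constantly True. Reusing each sub-net wherever it is needed keeps the total at O(n*k) gates, which is polynomial.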
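The recursive construction can be checked with a short sketch. It evaluates the net for a given input rather than emitting gates, and it uses only `and`, `or` and `not` on the inputs; memoising on (n, j) plays the role of sharing sub-nets. The names `at_most_k` and `net` are assumptions for illustration.

```python
from functools import lru_cache
from typing import Sequence


def at_most_k(xs: Sequence[bool], k: int) -> bool:
    """Return True if at most k of the inputs xs are True."""

    @lru_cache(maxsize=None)
    def net(n: int, j: int) -> bool:
        # net(n, j): "at most j of the first n inputs are True".
        if j < 0:
            return False  # fewer than zero True inputs is impossible
        if n == 0:
            return True   # no inputs left, so zero of them are True
        a = net(n - 1, j - 1)  # net A: at most j-1 of the first n-1 inputs
        b = net(n - 1, j)      # net B: at most j of the first n-1 inputs
        return (b and not xs[n - 1]) or a

    return net(len(xs), k)
```

Without the cache the same sub-nets would be rebuilt along every branch of the recursion, which is what the sharing avoids.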

Simple Excel function that splits a number entered into one cell randomly and evenly over 3 others

I am looking for a simple function that will take a number entered into a single cell, say 20, and divide it evenly and randomly over three other cells. None of the values can be 0.
i.e. A1 = 20
then
B1=6
C1=8
D1=6
Thanks!!
I don't have Excel in front of me, but something like this:
B1: =ROUND(RAND()*(A1-3)+1, 0)
C1: =ROUND(RAND()*(A1-B1-2)+1, 0)
D1: =A1-B1-C1
B1 is set to a number from 1 to A1-2
C1 is set to a number from 1 to A1-B1-1
D1 is set to what's left.
I think you would have to write a macro to fill B1, C1, D1 automatically, but the way I would do it is to put the following formula into B1:
=RANDBETWEEN(1, (A1-2))
The following into C1:
=RANDBETWEEN(1, (A1-B1-1))
The following into D1:
=A1-B1-C1
If you don't have the RANDBETWEEN() function available, here is how to enable it:
From the 'Tools' menu, select 'Add-Ins'
Find the reference to 'Analysis ToolPak' and put a tick in the box
Select 'OK'
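The B1/C1/D1 scheme above can be sketched outside Excel to confirm that the three parts always sum to the original number and that none is 0. The function name `split_in_three` and the optional `rng` parameter are assumptions for illustration; `randint` is inclusive at both ends, like RANDBETWEEN.

```python
import random
from typing import Optional, Tuple


def split_in_three(total: int, rng: Optional[random.Random] = None) -> Tuple[int, int, int]:
    """Split `total` into three positive integers, mirroring the B1/C1/D1 formulas."""
    if total < 3:
        raise ValueError("total must be at least 3 so every part can be >= 1")
    rng = rng or random.Random()
    b = rng.randint(1, total - 2)      # B1: leaves at least 2 for the other two cells
    c = rng.randint(1, total - b - 1)  # C1: leaves at least 1 for the last cell
    d = total - b - c                  # D1: whatever remains
    return b, c, d
```

The three parts are not identically distributed, because the first one is drawn from the widest range.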
Without a macro, I don't see any way to get around having some temporary values shown.
Look at my illustration here which does what you are trying to achieve:
http://www.gratisupload.dk/download/43627/
A2 holds the initial value to split
Temporary values:
C2,D2,E2 are just =RAND()
Your evenly but randomly split values will appear in these cells:
C5 = A2 / (C2 + D2 + E2) * C2
D5 = A2 / (C2 + D2 + E2) * D2
E5 = A2 / (C2 + D2 + E2) * E2
Edit: Of course you could put the temporary values (C2, D2, E2) on a separate sheet. That is still only a way to avoid the evil world of macros.