Multinomial logistic regression in R from a 2x2 table

Sorry, a bit new to all this.
I want to do logistic regression in R with multiple predictor variables. However, all I have access to are the 2x2 tables shown below.
How do I plug these into a glm model?
The dataset is as follows:
            Disease
            No   Yes
Var1  No    41    21
      Yes   65    42
Var2  No    55    26
      Yes   66    46
Var3  No    99    50
      Yes   22    22
Would I have to de-aggregate (reverse-summarise) the tables, i.e. expand the counts back into one row per observation? I am not sure how to go about this.
I really appreciate any help you can provide. Thanks!
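To illustrate what "de-aggregating" means, here is a minimal sketch in Python rather than R, using only the Var1 counts from the question. The names `var1_counts`, `expand` and `odds_ratio` are made up for this illustration. Note that the three tables are separate summaries (the Var1 counts sum to 169, the Var2 and Var3 counts to 193), so expanding them does not recover how Var1, Var2 and Var3 occur together; a model with all three predictors at once would need the cross-classified counts.

```python
from itertools import chain, repeat

# Counts for Var1 from the question: (Var1, Disease) -> number of subjects
var1_counts = {
    ("No", "No"): 41,
    ("No", "Yes"): 21,
    ("Yes", "No"): 65,
    ("Yes", "Yes"): 42,
}

def expand(counts):
    """Turn a table of counts into one (exposure, outcome) row per subject."""
    return list(chain.from_iterable(repeat(cell, n) for cell, n in counts.items()))

def odds_ratio(counts):
    """Odds of disease with Var1 = Yes divided by odds of disease with Var1 = No."""
    odds_exposed = counts[("Yes", "Yes")] / counts[("Yes", "No")]
    odds_unexposed = counts[("No", "Yes")] / counts[("No", "No")]
    return odds_exposed / odds_unexposed

rows = expand(var1_counts)
print(len(rows))                          # number of de-aggregated observations
print(round(odds_ratio(var1_counts), 3))  # unadjusted odds ratio for Var1
```

For a single binary predictor, the exponentiated slope of a logistic regression fitted to the expanded rows equals this unadjusted odds ratio, so the 2x2 table already carries everything a one-variable model would use.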

Can I query 285 IDs and URLs in a single elegantly formed SQL query

I have to query 285 IDs and paths. At first glance, there are no real patterns to the IDs, and I'm not certain there's a way to achieve this with a simple and elegant query.
However, I'm by no means an expert and am still learning SQL, so I'm hoping for some guidance.
Below is an extract of the first 50 entries I'm trying to match:
ID path
261617 /About/Factfile
31 /About/Factfile/18060
761 /About/Factfile/18060/11550
762 /About/Factfile/18060/11552
763 /About/Factfile/18060/11555
35 /About/Factfile/scotlandsnapshot
63 /About/Government/background
74 /About/Government/sgprevious
1555 /About/Government/sgprevious/2007-2011
328782 /About/Government/sgprevious/2011-2016
1553 /About/Government/sgprevious/sgprevious1999-2003
1554 /About/Government/sgprevious/sgprevious2003-2007
46 /About/Information/expenditure
271169 /About/Information/expenditure/GPC
329992 /About/Information/expenditure/GPC/epc500-16-17
297247 /About/Information/expenditure/GPC/GPC
297249 /About/Information/expenditure/GPC/GPC/GPC
297243 /About/Information/expenditure/GPC/GPC500
271168 /About/Information/expenditure/over-25k
1550 /About/Information/expenditure/over-25k/background
1551 /About/Information/expenditure/over-25k/reports
22138 /About/Information/expenditure/over-25k/reports/2011-2012
291275 /About/Information/expenditure/over-25k/reports/expenditure
22137 /About/Information/expenditure/over-25k/reports/Expenditure2010
266779 /About/Information/expenditure/over-25k/reports/reports
303729 /About/Information/expenditure/over-25k/reports/Test
316271 /About/Information/expenditure/over-25k/reports/Test1
276826 /About/Information/expenditure/PSRA2010
293815 /About/Information/expenditure/PSRA2010/2011-12-PSRduties
318093 /About/Information/expenditure/PSRA2010/duties-13-14
311621 /About/Information/expenditure/PSRA2010/duties-2012-13
294347 /About/Information/expenditure/PSRA2010/historic-efficiency-reports
276831 /About/Information/expenditure/PSRA2010/historicexpenditure10-11and11-12
261611 /About/People
23769 /About/People/14944/Events-Engagements/MinisterialEngagements
148038 /About/People/14944/Events-Engagements/MinisterialEngagements/2008
148037 /About/People/14944/Events-Engagements/MinisterialEngagements/2009
148036 /About/People/14944/Events-Engagements/MinisterialEngagements/2010
148039 /About/People/14944/Events-Engagements/MinisterialEngagements/201112
268177 /About/People/14944/Events-Engagements/MinisterialEngagements/2012-13
296737 /About/People/14944/Events-Engagements/MinisterialEngagements/2013-14Engagements
304646 /About/People/14944/Events-Engagements/MinisterialEngagements/2014-15Engagements
317634 /About/People/14944/Events-Engagements/MinisterialEngagements/MinisterialEngagements
987 /About/People/14944/Special-Advisers
254426 /About/People/14944/Special-Advisers/gifts-hospitality
1048 /About/People/14944/travel
23479 /About/People/14944/travel/airtravel
23481 /About/People/14944/travel/ferrytravel
23483 /About/People/14944/travel/MinisterialCarJourneys
148029 /About/People/14944/travel/MinisterialCarJourneys/2010-11
Should this be split into 2 queries, 1 for the IDs and another for the paths?
Thank you all in advance and please forgive me if this is n00b stuff.
Kind regards,
V
SELECT id, path
FROM tbl
ORDER BY path
LIMIT 285;
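If the goal is to fetch an explicit list of IDs rather than the first 285 rows by path, one common pattern is a single `IN (...)` query with one placeholder per ID. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory database, reuses the table name `tbl` and the columns `id` and `path` from the query above, and loads just a handful of rows from the extract. The helper `fetch_by_ids` is a hypothetical name.

```python
import sqlite3

# Hypothetical in-memory copy of a few rows from the extract above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, path) VALUES (?, ?)",
    [
        (261617, "/About/Factfile"),
        (31, "/About/Factfile/18060"),
        (761, "/About/Factfile/18060/11550"),
        (63, "/About/Government/background"),
        (261611, "/About/People"),
    ],
)

def fetch_by_ids(conn, ids):
    """Fetch (id, path) rows for an explicit list of IDs in one query."""
    placeholders = ", ".join("?" for _ in ids)  # one "?" per requested ID
    sql = f"SELECT id, path FROM tbl WHERE id IN ({placeholders}) ORDER BY path"
    return conn.execute(sql, list(ids)).fetchall()

wanted = [31, 761, 261611]
for row in fetch_by_ids(conn, wanted):
    print(row)
```

Because each ID is stored alongside its path in the same row, one query returns both columns and there is no need to split the lookup in two.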

Quick search the most similar objects in the n-dimensional space

Let's assume that we have points in an n-dimensional space, so n coordinates (n columns) describe the location of each point.
We need to implement a table that supports a quick search for the most similar points, i.e. the points with the smallest distance to a given point.
E.g. points in the db:
id  c1  c2  c3  c4  c5
1    5  19  42  12  16
2    3  23  38  15  12
3   14  21  32  33   1
4   12  29  21  24   5
If we want to find the best match for the point with coords:
c1  c2  c3  c4  c5
 4  20  40  14  15
We will get the points with id 1 and 2.
We also have the mean coordinate for each dimension (column), and for each point a vector whose first element is the number of the dimension in which the point differs most from the mean coordinate, and whose last element is the number of the dimension in which it differs least. Maybe this can be used to filter out more quickly the points that are farthest from the desired point.
So how can I do something like this using MySQL?
I think a composite index and ORDER BY abs(cx - $mycx) could be a good solution, but I can't use it because I will have more than 16 columns that I would need to include in one index.
Any help will be very useful!
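As a reference point for what the search has to return, here is a minimal brute-force sketch in Python (not MySQL). It assumes Euclidean distance, which the question does not specify, and the names `points` and `nearest` are made up for the example. It scans every point, so it shows the expected result rather than a fast index.

```python
import heapq
import math

# The four example points from the question: id -> (c1, c2, c3, c4, c5)
points = {
    1: (5, 19, 42, 12, 16),
    2: (3, 23, 38, 15, 12),
    3: (14, 21, 32, 33, 1),
    4: (12, 29, 21, 24, 5),
}

def nearest(points, target, k=2):
    """Return the ids of the k points closest to target (Euclidean distance)."""
    return heapq.nsmallest(k, points, key=lambda pid: math.dist(points[pid], target))

target = (4, 20, 40, 14, 15)
print(nearest(points, target))  # ids of the two closest points
```

On the sample data the squared distances are 11, 24, 722 and 706 for ids 1 to 4, which agrees with the expected answer of ids 1 and 2.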

5 point average in SSRS

I am trying to put a 5-point average in my chart. I added a trendline, but it looks like this:
Then I created a new series to calculate the average, and it looks like this:
But I would like to show a 5-point average. How can I do this?
This answer is based on my experience with Excel, not reporting services, but it is probably the same problem.
Your chart is probably a scatter plot rather than a line chart (note: this is Excel terminology). A scatter plot does not have an intrinsic ordering in the data. A line chart does.
The solution (for a scatter plot) is simply to sort the data by the x-values. The same will probably work for you. If you are pulling the data from a database, then order by can accomplish this. Otherwise, you can sort the data in the application.
Using this post as a starting point you can see that it is possible to calculate a moving average for a chart using the SQL query that pulls the data from the database.
For example, using this table in my database called mySalesTable
myDate      sales   myDate      sales   myDate      sales
----------  ------  ----------  ------  ----------  ------
01/01/2015  456     16/01/2015  546     31/01/2015  658
02/01/2015  487     17/01/2015  12      01/02/2015  121
03/01/2015  245     18/01/2015  62      02/02/2015  654
04/01/2015  812     19/01/2015  516     03/02/2015  261
05/01/2015  333     20/01/2015  1       04/02/2015  892
06/01/2015  449     21/01/2015  65      05/02/2015  982
07/01/2015  827     22/01/2015  15      06/02/2015  218
08/01/2015  569     23/01/2015  656     07/02/2015  212
09/01/2015  538     24/01/2015  25      08/02/2015  312
10/01/2015  455     25/01/2015  549     09/02/2015  21
11/01/2015  458     26/01/2015  261
12/01/2015  542     27/01/2015  21
13/01/2015  549     28/01/2015  21
14/01/2015  432     29/01/2015  61
15/01/2015  685     30/01/2015  321
You can pull out this data and create a moving average based on the last 5 dates by using the following query for your dataset:
SELECT mst.myDate, mst.sales, AVG(mst_past.sales) AS moving_average
FROM mySalesTable AS mst
JOIN mySalesTable AS mst_past
  ON mst_past.myDate BETWEEN DATEADD(D, -4, mst.myDate) AND mst.myDate
GROUP BY mst.myDate, mst.sales
ORDER BY mst.myDate ASC
This effectively joins each row to a sub-table consisting of the previous 4 dates and the current date, averages the sales over those dates, and outputs the result as the column moving_average.
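The same windowing logic can be sketched outside the database. The Python snippet below is an illustration only: it takes the first six sales values from the table above and averages each value with up to four preceding ones. The name `moving_average` is made up for the example. One difference to keep in mind: the query windows by calendar date, while this sketch windows by row position; the two agree when there is exactly one row per consecutive day, as in the sample table.

```python
from collections import deque

# First six sales values from mySalesTable (01/01/2015 to 06/01/2015)
sales = [456, 487, 245, 812, 333, 449]

def moving_average(values, window=5):
    """Average each value with up to (window - 1) values before it."""
    recent = deque(maxlen=window)  # automatically drops values older than the window
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages

for value, average in zip(sales, moving_average(sales)):
    print(value, average)
```

As in the query, the first four rows are averaged over fewer than five values because no earlier data exists.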
You can then chart both of these fields as normal, to give the following output (with the data table so you can see the actual calculated moving average):
Hopefully this will help you. Please let me know if you require further assistance

Creating multiple new columns using one function in Pandas

I know that this is probably very simple but I have been trying to work this out for a while, and I need it for quite a few functions.
I have a DataFrame with 2 columns, both with share price data.
I would like to compute 2 new columns in a new dataframe called 'returns', with each column named the same as in the original (i.e. 'AAPL' and 'GOOG').
I use this procedure to get the original data and create the 'data' dataframe:
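import pandas as pd
import pandas_datareader.data as web  # assumption: the question does not show where `web` comes from

names = ['AAPL', 'GOOG']
def get_data(stock, start, end):
    # Keep only the adjusted closing price
    return web.get_data_yahoo(stock, start, end)['Adj Close']
data = pd.DataFrame({n: get_data(n, '1/1/2009', '6/1/2012') for n in names})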
I know that the returns could be generated using (from the pandas library):
returns = pd.DataFrame(index=data.index)
returns['*COLUMN A*'] = data['*COLUMN A*'].pct_change()
However I am guessing that I need to use some sort of loop to iterate over either 'names' or the columns but I cannot get anything to work.
Any help would be greatly appreciated. I am sorry if I have been rather vague, but this is my first question and I have searched for 30 minutes through the forum :)
You can use pct_change on the entire df
In [15]: df = DataFrame(np.random.randint(20,size=20).reshape(10,2),
                        columns=['AAPL','GOOG'],index=date_range('20130101',periods=10))+50
In [16]: df
Out[16]:
            AAPL  GOOG
2013-01-01    53    54
2013-01-02    66    64
2013-01-03    50    59
2013-01-04    53    57
2013-01-05    67    65
2013-01-06    61    55
2013-01-07    68    52
2013-01-08    64    65
2013-01-09    62    62
2013-01-10    66    50
In [17]: 100*df.pct_change()
Out[17]:
                 AAPL       GOOG
2013-01-01        NaN        NaN
2013-01-02  24.528302  18.518519
2013-01-03 -24.242424  -7.812500
2013-01-04   6.000000  -3.389831
2013-01-05  26.415094  14.035088
2013-01-06  -8.955224 -15.384615
2013-01-07  11.475410  -5.454545
2013-01-08  -5.882353  25.000000
2013-01-09  -3.125000  -4.615385
2013-01-10   6.451613 -19.354839
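To make explicit what is being computed for every column, here is a plain-Python sketch without pandas. It is only an illustration: the helper `pct_change` below is a hand-written stand-in, not the pandas method, and the prices are the first four rows of the example output above. The dictionary comprehension plays the role of the loop over 'names' that the question asks about.

```python
import math

# First four rows of the example frame above
prices = {
    "AAPL": [53, 66, 50, 53],
    "GOOG": [54, 64, 59, 57],
}

def pct_change(series):
    """Fractional change from the previous element; the first element has none."""
    changes = [math.nan]
    for previous, current in zip(series, series[1:]):
        changes.append((current - previous) / previous)
    return changes

# Apply the same calculation to every column, keeping the column names
returns = {name: pct_change(values) for name, values in prices.items()}

for name, values in returns.items():
    print(name, [round(100 * v, 6) for v in values])
```

With pandas itself, calling pct_change on the whole DataFrame does this for all columns in one step, so no explicit loop is needed.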

Converting data from matrix to vector format

I have a huge set of data in an Excel file in the following format:
School_id  percentage  year  subject
1100       90          2005  maths
1100       95          2006  maths
1100       81          2005  science
2310       45          2007  biology
I want to convert this data to this format:
School_id  year  maths  science  biology
1100       2005  90     81
1100       2006  95
2310       2007                  45
I don't have any idea how to do this conversion. Is this possible with Excel, MySQL, or any other tool? I need some suggestions.
Thanks in advance
Yes, as others have said, use a pivot table. To get your results, I had to use the CONCATENATE function. There might be a better way, but this is how I did it:
First, add a CONCATENATE column:
Then insert your pivot table and select the right options:
Then de-concatenate your School_id and year.
As I said, there might be a better way to do this, but after that you just need to organize your headers the way you want them and you should have what you are looking for. Good luck.
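For comparison, the same reshaping can be sketched in a few lines of Python. This is only an illustration with the four sample rows from the question hard-coded; the names `rows` and `pivot` are made up. The tuple (School_id, year) plays the role of the concatenated key column in the Excel approach above.

```python
# Sample rows from the question: (School_id, percentage, year, subject)
rows = [
    (1100, 90, 2005, "maths"),
    (1100, 95, 2006, "maths"),
    (1100, 81, 2005, "science"),
    (2310, 45, 2007, "biology"),
]

def pivot(rows):
    """Group by (School_id, year) and spread the subjects into columns."""
    subjects = []  # column order follows first appearance in the data
    table = {}     # (School_id, year) -> {subject: percentage}
    for school_id, percentage, year, subject in rows:
        if subject not in subjects:
            subjects.append(subject)
        table.setdefault((school_id, year), {})[subject] = percentage
    return subjects, table

subjects, table = pivot(rows)
print("School_id", "year", *subjects)
for (school_id, year), scores in sorted(table.items()):
    # Missing subjects are printed as empty cells
    print(school_id, year, *(scores.get(subject, "") for subject in subjects))
```

If a school has two rows for the same year and subject, this sketch keeps the last one; a real pivot would need a rule (average, maximum, and so on) for such duplicates.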