Cox regression in a subset of rows for many different columns? - function

I have a large dataset with many columns. Columns 56 to 77 are miRNA expression divided into tertiles (coded simply as 1, 2 or 3). Columns 33 to 54 are miRNA expression values (1.453, 3.245, etc.). I want to run a Cox regression using tertile 1 and tertile 3, ignoring tertile 2, for each miRNA.
Example:
Model 1: I want a Cox regression with miRNA 1 (column 33), time and event.
Model 2: Cox regression with miRNA 2 (column 34), time and event.
Model 3: Cox regression with miRNA 3 (column 35), time and event.
Etc.
The data in each miRNA column is the three tertiles, and I just want to use tertiles 1 and 3.
I tried lapply: first subset to obtain tertiles 1 and 3, then the Cox regression. But I get an error. Does anyone know how to do this?
Thanks! :)
The code:
miRNA_tertiles <- DB[56:77]
cox_tert <- lapply(miRNA_tertiles, function(x) {
  new.data <- subset(DB, x != 2)
  formula <- as.formula(paste('Surv(years, AD) ~', x))
  cox_fit_tert <- coxph(formula, data = new.data)
  summary(cox_fit_tert)$coefficients[, c(2, 3, 5)] %>% round(3)
})
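For what it's worth, the usual cause of this error is that lapply passes each column's values as x, so paste('Surv(years, AD)~', x) builds the formula out of data values rather than the column name; looping over names(miRNA_tertiles) and subsetting on DB[[col]] fixes that. The subset-per-column pattern itself can be sketched in Python (column names are made up, and the actual Cox fit is left as a placeholder comment):

```python
import pandas as pd

# Toy stand-in for DB: survival columns plus two tertile-coded miRNA columns.
df = pd.DataFrame({
    "years": [1.0, 2.0, 3.0, 4.0],
    "AD":    [1, 0, 1, 0],
    "miR_a": [1, 2, 3, 1],
    "miR_b": [3, 1, 2, 3],
})

subsets = {}
for col in ["miR_a", "miR_b"]:        # iterate over column *names*, not values
    subsets[col] = df[df[col] != 2]   # keep tertiles 1 and 3 only
    # here you would fit the Cox model on subsets[col] (years, AD ~ col)

print(len(subsets["miR_a"]), len(subsets["miR_b"]))  # 3 3
```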

Related

Assigning new value to column rows based on another column's value

Sorry for the vague title - I couldn't figure out how to word it.
I have a dataframe containing data from a bike-sharing company. The relevant columns for this question are start_station_name, start_lat and start_lng. The start_lat and start_lng values for each row differ slightly, so I want to assign each start_station_name a single unique value. The following code finds the average start_lat for each start station:
round(df.groupby('start_station_name')['start_lat'].mean(), 4)
2112 W Peterson Ave    41.9912
351                    41.9300
63rd St Beach          41.7810
What I would like to do now is assign 41.9912 where the start_station_name is 2112 W Peterson Ave, 41.9300 to 351, etc..
In your case, use transform:
df['new'] = df.groupby('start_station_name')['start_lat'].transform('mean').round(4)
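A quick self-contained illustration of why transform (rather than a plain groupby mean) is the right tool here: it returns a value for every row, aligned with the original index. Station names and coordinates below are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "start_station_name": ["2112 W Peterson Ave", "2112 W Peterson Ave", "351"],
    "start_lat": [41.0, 42.0, 40.0],
})

# .mean() would collapse to one row per station; .transform('mean')
# broadcasts each station's mean back onto its original rows.
df["new"] = df.groupby("start_station_name")["start_lat"].transform("mean").round(4)
print(df["new"].tolist())  # [41.5, 41.5, 40.0]
```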

PostGIS: Finding duplicate label within a radius

I have data in PostGIS with a value and a geometry. If the same value occurs within, let's say, 10 m, I want to detect or remove that row from my table. Here is a small example:
create table points (id serial primary key, val integer, label2 varchar);
select AddGeometryColumn('points', 'geom', 1, 'point', 2);
insert into points (id, val, label2, geom) values
(1, 1, 'aaa', ST_GeomFromText('POINT(1 1)', 1)),
(2, 1, 'bbb', ST_GeomFromText('POINT(1 2)', 1)),
(3, 1, 'aaa', ST_GeomFromText('POINT(10 100)', 1)),
(4, 2, 'ccc', ST_GeomFromText('POINT(10 101)', 1));
Because rows 1 and 2 have the same value and are less than 10 m apart, the result should be:
id | val | label2 | geom
---+-----+--------+------
 3 |  1  | aaa    | xxx
 4 |  2  | ccc    | xxx
Do you know how to query that in PostGIS?
First, I would consider what the real requirements are. E.g. consider three equal-valued points on a line, A, B, C, each 8 meters apart. Do you want that reduced to A and C, or to B alone? Both eliminate duplicates within 10 meters, but the results differ. What about A, B, C, D? Would you like the result to be A, C, or B, D, or A, D, or maybe B, C? Defining specific criteria is not trivial, and it is sometimes hard to implement in SQL.
Or maybe you don't care and just want to reduce point density? Then it is simpler: compute snapped = ST_SnapToGrid with an appropriate grid size, group by equal values of snapped and value, and choose an arbitrary point from each group. Note that this does not guarantee there are no close points (points with similar coordinates can snap to different grid cells), but it removes most duplicates and is very cheap computationally.
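The grid-snapping idea is easy to see in plain Python (coordinates treated as meters, grid size 10, keeping the first point per (value, cell) group). This is only a sketch of the logic, not the PostGIS query, and unlike the expected output in the question it keeps one representative of the close pair, as the answer proposes:

```python
# id, val, label2, (x, y) from the example table
points = [
    (1, 1, "aaa", (1.0, 1.0)),
    (2, 1, "bbb", (1.0, 2.0)),
    (3, 1, "aaa", (10.0, 100.0)),
    (4, 2, "ccc", (10.0, 101.0)),
]

grid = 10.0
kept = {}
for pid, val, label, (x, y) in points:
    cell = (val, round(x / grid), round(y / grid))  # snap to a grid cell, per value
    kept.setdefault(cell, (pid, val, label, (x, y)))  # arbitrary survivor: first seen

survivors = sorted(pid for pid, *_ in kept.values())
print(survivors)  # [1, 3, 4]: ids 1 and 2 collapse into one
```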

Django Mysql Many 2 Many field IN multiple array (at least one match per array)

My M2M table has these values on Django 1.11:
id, m0, m3
1, 1, 1
2, 2, 1
3, 2, 2
4, 3, 1
5, 3, 2
6, 4, 2
7, 5, 2
8, 5, 3
9, 6, 3
10, 6, 4
I need to select only products matching an M3 id in (1 OR 2) AND an M3 id in (3 OR 4) in Django. The expected result is only the product with M0 id 5, because M0 id 5 has M3 id 2 and M3 id 3.
How can I achieve this using Q()?
This query returns zero objects (I need to query M1):
main_query = Q()
main_query &= Q(m0__classifications__in=[1,2])
main_query &= Q(m0__classifications__in=[3,4])
models.M1.objects.filter(main_query)
My models:
class M0(Basic):
    classifications = models.ManyToManyField('M3')
    ...
class M1(Basic):
    m0 = models.ForeignKey(M0)
    ...
class M3(Basic):
    ...
Any suggestions?
This query returns zero objects (I need to query M1).
That is because you add both Q objects to the same filter, as a result you filter on the same related object. That means that your query basically looks like:
Give me all the M1 objects for which there is a related M0 object with classifications in [1,2] and classifications in [3,4].
We are thus talking about the same M0 object, but a classification can have only one value, and since no element appears in both lists, we indeed obtain no M1 objects.
If, on the other hand, you are looking for M1 objects for which one related classification is in [1,2] and a (possibly different) classification is in [3,4], you can query with:
models.M1.objects.filter(
    m0__classifications__in=[1,2]
).filter(
    m0__classifications__in=[3,4]
).distinct()
We thus use two .filter(..) calls here. The .distinct() is used to prevent returning the same M1 object multiple times.
We can use chaining to programmatically add an arbitrary number of sublists, like:
sublists = [[1,2], [3,4]]
qs = M1.objects.all()
for sublist in sublists:
    qs = qs.filter(m0__classifications__in=sublist)
qs = qs.distinct()
At the end, qs is a queryset equivalent to the one we constructed "manually" above.
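As a sanity check, the matching logic those chained filters implement can be replayed over the sample table in plain Python:

```python
from collections import defaultdict

# (id, m0, m3) rows from the question
rows = [(1, 1, 1), (2, 2, 1), (3, 2, 2), (4, 3, 1), (5, 3, 2),
        (6, 4, 2), (7, 5, 2), (8, 5, 3), (9, 6, 3), (10, 6, 4)]

m3_by_m0 = defaultdict(set)
for _id, m0, m3 in rows:
    m3_by_m0[m0].add(m3)

# at least one match per sublist, possibly via different related rows
matches = [m0 for m0, m3s in sorted(m3_by_m0.items())
           if m3s & {1, 2} and m3s & {3, 4}]
print(matches)  # [5]
```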

SSRS Lookup combined with SUM IIF

I have the following situation (in SQL Server 2008 R2):
. Two datasets (Dataset 1 / Dataset 2)
. Rows (A - H)
. Columns (2011 - 2012 - 2013 - P12 M)
I receive the first three columns from dataset 1, and fill the last column (Past 12 Months) with a lookup on column A:
=Lookup(Fields!A.Value, Fields!A.Value, Fields!Total.Value, "Dataset 2")
So far so good.
The challenging part:
Row B/A = row B divided by row A
I use this statement:
=SUM(IIF( Fields!A.Value = "B", Fields!Total.Value, 0)) / SUM(IIF( Fields!A.Value = "A", Fields!Total.Value, 0))
But how do I get there with a lookup? How do I get to the first question mark (answer: 2,23)?
I tried to combine the statement with a lookup, so far without result.
I searched the internet and found/tried this URL:
SSRS nested iif expression in lookup
Method 1:
Handling it the way the report is designed currently.
You would need to explicitly give the value in the source expression of Lookup (B over A, to match row B divided by row A):
=Lookup("B", Fields!A.Value, Fields!Total.Value, "Dataset 2")
/Lookup("A", Fields!A.Value, Fields!Total.Value, "Dataset 2")
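For intuition, the SUM(IIF(...)) pieces of the original expression just sum Total over the rows where the A field matches each key, and the ratio is one sum divided by the other. The same arithmetic in plain Python, with made-up numbers chosen so the ratio comes out to 2,23:

```python
# hypothetical (A, Total) rows standing in for the dataset
rows = [("A", 10.0), ("B", 22.3), ("C", 5.0)]

num = sum(total for key, total in rows if key == "B")  # SUM(IIF(A = "B", Total, 0))
den = sum(total for key, total in rows if key == "A")  # SUM(IIF(A = "A", Total, 0))
ratio = num / den
print(round(ratio, 2))  # 2.23
```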
Method 2:
I would probably redesign the datasets and combine dataset 1 and dataset 2; that removes the funky logic from the report. Use the new dataset to populate the matrix.
New Dataset:
SELECT CAST(Year AS varchar(10)) AS Year, A, Total
FROM DataSet1_Table
UNION ALL
SELECT 'L12M' AS Year, A, Total
FROM DataSet2_Table

Series Grouping SSRS

I have the following data in the dataset
Key  Assignee  Sev  InOps  InTek
1    A         1    Y      Y
2    B         2    Y      N
3    C         3    N      Y
I need to plot the chart as follows:
Sev on the X axis
Count(Key) on the Y axis
Assignees belonging to Ops (Y) as an Ops bar
Assignees belonging to Tek (Y) as a Tek bar
For each severity we will then have two bars, one for Ops and another for Tek, which will show as follows:
Sev    Ops  Tek
Sev1   1    1
Sev2   1    0
Sev3   0    1
I have the chart configuration done as follows
In Count I have dragged the Key column.
In the Category group I have the Sev column.
In the Series group, do I need to put two series, one for Ops and one for Tek?
The simplest way to do this, if possible, would be to pivot the data when generating the Dataset, i.e. unpivoting the InOps/InTek flags into a single InType column, with one row per Key and type.
From here it's trivial to create the Chart - Series based on InType, Category based on Severity and the data is a Count of Key.
If you can't pivot your data, create the Chart with two calculated series.
The expression for the first Chart Series is:
=Sum(IIf(Fields!InOps.Value = "Y", 1, 0))
and the second:
=Sum(IIf(Fields!InTek.Value = "Y", 1, 0))
It's also useful to set a custom legend text for each of the series.
Either way, you get the required result.
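The two Sum(IIf(...)) series expressions amount to counting "Y" flags per severity; replayed over the sample rows in plain Python:

```python
# (Key, Assignee, Sev, InOps, InTek) rows from the dataset
rows = [(1, "A", 1, "Y", "Y"), (2, "B", 2, "Y", "N"), (3, "C", 3, "N", "Y")]

counts = {}  # sev -> (ops, tek)
for key, assignee, sev, in_ops, in_tek in rows:
    ops, tek = counts.get(sev, (0, 0))
    counts[sev] = (ops + (in_ops == "Y"), tek + (in_tek == "Y"))

print(counts)  # {1: (1, 1), 2: (1, 0), 3: (0, 1)}
```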