How to export a minimum spanning tree created with R ape package in newick format? - minimum-spanning-tree

I have been using the R ape package to create a minimum spanning tree from a distance matrix.
Suppose my distance matrix is, for example:
distmat
        sample1 sample2 sample3 sample4 sample5
sample1       0       5       4       4       6
sample2       5       0       1       1       3
sample3       4       1       0       0       2
sample4       4       1       0       0       2
sample5       6       3       2       2       0
I'm using the mst function from the ape package to calculate a minimum spanning tree:
MST = ape::mst(distmat)
To get branch lengths, I'm using:
MSTwlength = MST
MSTwlength[MST > 0] <- distmat[MST > 0]
MSTwlength
        sample1 sample2 sample3 sample4 sample5
sample1       0       0       4       0       0
sample2       0       0       1       0       0
sample3       4       1       0       0       2
sample4       0       0       0       0       0
sample5       0       0       2       0       0
attr(,"class")
[1] "mst"
Now I want to export this minimum spanning tree to Newick format. Any ideas how to do that?
I have searched APE and TreeTools manuals and tutorials and found nothing.
Thank you,
Mor
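One possible approach, sketched here in Python rather than R so the logic is easy to follow and test: because an MST is a tree, picking any node as the root and walking outward yields a valid Newick string with labelled internal nodes. The function name mst_to_newick and the choice of sample3 as root are illustrative assumptions, and the adjacency matrix below assumes that ape::mst linked sample3 and sample4 (distance 0). Branch lengths are looked up in the distance matrix through the 0/1 adjacency matrix, because an edge of length 0 would otherwise be indistinguishable from a missing edge, which is what happens to sample4 in MSTwlength above.

```python
def mst_to_newick(adjacency, distances, names, root):
    """Serialize a tree given as a 0/1 adjacency matrix into a Newick string.

    The tree is rooted at `root`; every node keeps its label, so internal
    nodes are labelled too. Branch lengths come from `distances`.
    """
    index = {name: i for i, name in enumerate(names)}

    def subtree(node, parent):
        # Children are all neighbours except the node we arrived from.
        children = [
            j for j, linked in enumerate(adjacency[node]) if linked and j != parent
        ]
        if not children:
            return names[node]
        inner = ",".join(
            f"{subtree(child, node)}:{distances[node][child]}" for child in children
        )
        return f"({inner}){names[node]}"

    return subtree(index[root], None) + ";"


names = ["sample1", "sample2", "sample3", "sample4", "sample5"]
distances = [
    [0, 5, 4, 4, 6],
    [5, 0, 1, 1, 3],
    [4, 1, 0, 0, 2],
    [4, 1, 0, 0, 2],
    [6, 3, 2, 2, 0],
]
# Assumed MST edges: sample3 linked to every other sample.
adjacency = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

newick = mst_to_newick(adjacency, distances, names, root="sample3")
print(newick)
```

The same recursion could be written in R over the matrix returned by ape::mst. Note that the result is a rooted view of an unrooted graph, so a different root gives a different but equivalent string.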

Related

Bar chart from many variables where varx = in Stata

I have a bar chart question. For all the variables in the dataset, 1 = yes and 0 = no. I would like to plot a bar graph with the percentages (where var = 1) on the y-axis and the variables on the x-axis. Thanks in advance.
Dataset
Water  Ice  Fire  Vapor
1      1    0     1
1      0    0     1
0      1    1     1
1      1    1     1
1      1    0     1
1      1    1     0
0      1    1     1
0      1    0     1
0      1    1     1
1      0    1     1
0      1    0     0
0      1    1     0
1      0    1     0
1      0    1     0
1      1    1     1
0      1    0     1
1      0    1     1
1      0    1     0
1      1    0     1
1      0    0     1
0      1    1     1
1      1    0     1
1      0    0     1
0      1    1     1
The percent of 1s in a (0, 1) variable is just the mean multiplied by 100. As you probably want to see the percent as text on the graph, one method is to clone the variables and multiply each by 100.
You could then use graph bar directly as it defaults to showing means. I don't like its default in this case and the code instead uses statplot, which must be installed before you can use it.
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(water ice fire vapor)
1 1 0 1
1 0 0 1
0 1 1 1
1 1 1 1
1 1 0 1
1 1 1 0
0 1 1 1
0 1 0 1
0 1 1 1
1 0 1 1
0 1 0 0
0 1 1 0
1 0 1 0
1 0 1 0
1 1 1 1
0 1 0 1
1 0 1 1
1 0 1 0
1 1 0 1
1 0 0 1
0 1 1 1
1 1 0 1
1 0 0 1
0 1 1 1
end
quietly foreach v of var water-vapor {
    clonevar `v'2 = `v'
    label var `v'2 "`v'"
    replace `v'2 = 100 * `v'
}
* ssc install statplot
statplot *2 , recast(bar) ytitle(%) blabel(bar, format(%2.1f))
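As a quick cross-check of the "mean multiplied by 100" reasoning, independent of Stata, the same percentages can be computed in a few lines of Python. This is only a sketch: the data are copied from the example above, and the names rows and percent_yes are made up for illustration.

```python
# Data copied from the example above: one tuple per observation,
# in the order (water, ice, fire, vapor).
names = ("water", "ice", "fire", "vapor")
rows = [
    (1, 1, 0, 1), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1),
    (1, 1, 0, 1), (1, 1, 1, 0), (0, 1, 1, 1), (0, 1, 0, 1),
    (0, 1, 1, 1), (1, 0, 1, 1), (0, 1, 0, 0), (0, 1, 1, 0),
    (1, 0, 1, 0), (1, 0, 1, 0), (1, 1, 1, 1), (0, 1, 0, 1),
    (1, 0, 1, 1), (1, 0, 1, 0), (1, 1, 0, 1), (1, 0, 0, 1),
    (0, 1, 1, 1), (1, 1, 0, 1), (1, 0, 0, 1), (0, 1, 1, 1),
]

# zip(*rows) transposes observations into one column per variable;
# the percent of 1s is the column mean times 100.
percent_yes = {
    name: 100 * sum(column) / len(column)
    for name, column in zip(names, zip(*rows))
}

for name, pct in percent_yes.items():
    print(f"{name}: {pct:.1f}%")
```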
Try
. ssc install mylabels
checking mylabels consistency and verifying not already installed...
all files already exist and are up to date.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. mylabels 0(10)70, myscale(#/100) local(labels)
0 "0" .1 "10" .2 "20" .3 "30" .4 "40" .5 "50" .6 "60" .7 "70"
. graph bar (mean) married collgrad south union, showyvars legend(off) nolabel bargap(20) ylabel(`labels')
. table, statistic(mean married collgrad south union)
------------------------------
Married | .6420303
College graduate | .2368655
Lives in the south | .4194123
Union worker | .2454739
------------------------------
This relies on mylabels, and implements the bar gap (which I also like).

Scrapy - how to index and extract from html tables

This is the webpage I am scraping: http://laxreports.sportlogiq.com/nll/GS2200.html
Below is the code for the spider I created:
import scrapy


class MatchesSpider(scrapy.Spider):
    name = 'matches'
    allowed_domains = ['laxreports.sportlogiq.com']
    start_urls = ['http://laxreports.sportlogiq.com/nll/GS2200.html']

    def parse(self, response):
        tables = response.xpath('//table')
        print(tables)
        # Leading dot keeps the search inside the first table;
        # a bare '//tbody' would search the whole document again.
        table = tables[0].xpath('.//tbody')
I see 22 tables that have been selected for this XPath expression but my problem is that I don't fully understand how to select each individual table and extract its contents.
I am a beginner in scrapy and after searching online for a solution all I see is how to select the tables using the class or ID which in this case is not an option.
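You can do that using only pandas: read_html returns one DataFrame per table on the page, so you can pick a table by its position in the list.
Code:
import pandas as pd
dfs = pd.read_html('https://laxreports.sportlogiq.com/nll/GS2200.html')
df = dfs[10]  # the table at index 10; df.to_csv('d.csv', index=False) would save it
print(df)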
Output:
0 1 2 3 4 5 6 7 8 9 10 11 12
0 # Name G A +/- PIM S SOFF LB T CT FO TOF
1 2 W.Malcom 0 0 0 0 1 1 1 4 0 - 11:28
2 3 T.Edwards 0 0 -2 2 0 0 8 1 2 7-18 20:28
3 4 J.Sullivan 0 0 -3 2 0 0 3 0 0 - 15:29
4 11 T.Stuart 0 0 -3 0 0 0 4 1 1 - 21:09
5 14 W.Jeffrey 0 1 -1 0 0 0 9 2 1 - 19:17
6 16 R.Lee 2 1 2 0 9 4 6 6 1 - 23:13
7 17 C.Wardle 2 0 1 2 5 3 4 2 2 - 20:55
8 18 R.Hope (A) 0 0 -2 2 0 0 11 0 0 - 22:02
9 20 J.Ruest 3 2 3 0 8 1 3 2 0 - 24:16
10 23 J.Gilles 0 0 -1 0 0 0 4 0 3 - 14:44
11 27 S.Carnegie 0 0 -1 0 0 0 3 0 0 - 12:19
12 37 D.Coates (C) 0 0 0 0 1 0 1 0 0 1-1 2:31
13 51 E.McLaughlin 0 5 2 0 7 3 5 7 0 - 21:41
14 55 D.Kinnear 0 1 2 0 2 0 2 1 0 0-2 10:14
15 67 K.Killen 1 1 0 0 6 1 4 2 0 - 16:42
16 82 J.Cupido (A) 0 1 -1 0 3 0 4 1 0 - 20:52
17 86 J.Lintz 0 1 -1 0 0 0 4 0 1 - 19:26
18 30 T.Carlson 0 0 NaN 0 0 0 0 0 0 - NaN
19 45 D.Ward 0 0 NaN 0 0 0 0 1 0 - NaN
20 NaN Totals: 8 13 NaN 8 42 13 76 30 11 8-21 NaN
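The idea behind both approaches is the same: collect every table in document order, then pick one by its index. The sketch below shows that idea with only the Python standard library, so it runs without Scrapy or pandas. The class name TableCollector and the two-table HTML snippet are hypothetical stand-ins for the real page, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per table, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical miniature page with two unlabeled tables.
html = """
<table><tr><th>#</th><th>Name</th></tr><tr><td>2</td><td>W.Malcom</td></tr></table>
<table><tr><th>#</th><th>Name</th></tr><tr><td>16</td><td>R.Lee</td></tr></table>
"""

collector = TableCollector()
collector.feed(html)
second_table = collector.tables[1]  # select a table purely by position
print(second_table)
```

Because no class or id is needed, this works for pages like the one in the question where the tables carry no identifying attributes.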

Find record less than or equals 0 but no repeat record

I have 2 tables. The first looks like this:
Table state_inventary
ID_STATE_INVENTARY, DESCRIPTION
0 STORE
1 TRANSIT
2 SOLD_STORE
3 STORAGE
Table article_stock
ID_STOCK ID_ORIGIN ARTICLE UNIT_SOLD ID_STATE_INVENTARY
0 1 A 10 0
1 2 A 0 1
2 1 B 5 2
3 3 C 0 3
4 4 D 0 3
5 5 E 10 1
6 2 A 0 2
7 1 B 0 2
I need to find articles whose ID_STATE_INVENTARY is 0, 2, or 3; I already have that part working.
But I also need to find the articles whose UNIT_SOLD sums to zero, and I don't know how to do that.
I want to get something like this:
ID_STOCK ID_ORIGIN ARTICLE UNIT_SOLD ID_STATE_INVENTARY
3 3 C 0 3
4 4 D 0 3
OR
ARTICLE
C
D
My current query returns the following result:
ID_STOCK ID_ORIGIN ARTICLE UNIT_SOLD ID_STATE_INVENTARY
1 2 A 0 1
3 3 C 0 3
4 4 D 0 3
6 2 A 0 2
7 1 B 0 2
Does anyone have an idea how I can do this?
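Try grouping by article and keeping only the groups whose total is zero or less:
SELECT `ARTICLE`, SUM(`UNIT_SOLD`) AS `UNIT_SOLD` FROM `article_stock` GROUP BY `ARTICLE` HAVING SUM(`UNIT_SOLD`) <= 0;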
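To check a grouped query against the sample rows from the question, here is a small self-contained sketch that loads them into an in-memory SQLite database from Python. It assumes the state filter (0, 2, or 3) belongs in a WHERE clause applied before grouping, and it uses a HAVING clause to keep only articles whose total is zero or less; SQLite stands in for the asker's database only so the example can run anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_stock ("
    "ID_STOCK INTEGER, ID_ORIGIN INTEGER, ARTICLE TEXT, "
    "UNIT_SOLD INTEGER, ID_STATE_INVENTARY INTEGER)"
)
# Rows copied from the article_stock table in the question.
conn.executemany(
    "INSERT INTO article_stock VALUES (?, ?, ?, ?, ?)",
    [
        (0, 1, "A", 10, 0), (1, 2, "A", 0, 1), (2, 1, "B", 5, 2),
        (3, 3, "C", 0, 3), (4, 4, "D", 0, 3), (5, 5, "E", 10, 1),
        (6, 2, "A", 0, 2), (7, 1, "B", 0, 2),
    ],
)

# WHERE filters individual rows by state; HAVING filters whole groups by their sum.
query = """
    SELECT ARTICLE
    FROM article_stock
    WHERE ID_STATE_INVENTARY IN (0, 2, 3)
    GROUP BY ARTICLE
    HAVING SUM(UNIT_SOLD) <= 0
    ORDER BY ARTICLE
"""
articles = [row[0] for row in conn.execute(query)]
print(articles)
```

Articles A and B drop out because one of their rows has a non-zero UNIT_SOLD, which a row-level filter on UNIT_SOLD = 0 cannot detect.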

Is there a way to web scrape HTML table data that keeps showing up as "" when using rvest tools?

<td headers="apcl1" data-dyn="1" class="text-center">1<span class="hidden"> authorized course</span></td>
<td headers="apcl2" data-dyn="2" class="text-center">1<span class="hidden"> authorized course</span></td>
<td headers="apcl3" data-dyn="3" class="text-center">1<span class="hidden"> authorized course</span></td>
<td headers="apcl4" data-dyn="4" class="text-center">--<span class="hidden"> no authorized courses</span></td>
For the above HTML code, I am trying to scrape the data in the td tag between > and < span (i.e., 1, 1, 1, --).
I am using R and the rvest package and my code is below:
individual_temp_url <- "https://apcourseaudit.inflexion.org/ledger/school.php?a=MTQ4Mzk=&b=MA=="
read_html(individual_temp_url) %>%
  html_nodes('td') %>%
  html_text()
However, when I do this, all I get is "" for each of the td tags. How can I extract the numbers for each td tag?
The td elements are blank on the html you download. In the browser, they are populated by javascript after the page loads, from a JSON included in one of the page's script tags. You can extract this and parse the JSON to get a nice data frame:
library(rvest)
#> Loading required package: xml2
individual_temp_url <- "https://apcourseaudit.inflexion.org/ledger/school.php?a=MTQ4Mzk=&b=MA=="
df <- read_html(individual_temp_url) %>%
  html_nodes('script') %>%
  html_text() %>%
  `[`(4) %>%
  strsplit("dataSet = |\r\n|;") %>%
  unlist() %>%
  `[`(3) %>%
  jsonlite::fromJSON()
df
#> data data data data data data data data data
#> 1 2007-08 2008-09 2009-10 2010-11 2011-12 2012-13 2013-14 2014-15 2015-16
#> 2 0 0 0 0 0 1 1 1 1
#> 3 2 2 2 2 2 2 2 2 2
#> 4 3 3 3 3 3 2 2 4 3
#> 5 1 1 1 1 1 1 1 1 2
#> 6 2 3 2 2 2 2 2 2 2
#> 7 1 1 1 1 1 1 1 1 1
#> 8 0 0 0 0 0 0 0 0 0
#> 9 1 1 1 1 1 1 1 1 1
#> 10 1 1 1 1 1 1 1 1 1
#> 11 1 1 1 1 1 2 2 3 1
#> 12 0 0 2 2 2 2 2 2 1
#> 13 0 0 1 1 1 1 1 1 1
#> 14 0 0 0 0 0 1 1 1 0
#> 15 0 0 0 0 1 1 1 1 1
#> 16 0 0 0 0 0 0 0 2 2
#> 17 0 0 0 0 0 0 0 0 1
#> 18 0 0 0 0 0 2 2 0 0
#> 19 0 0 0 0 0 0 0 0 0
#> 20 1 1 1 1 1 1 2 2 2
#> 21 1 1 1 1 1 1 1 1 1
#> 22 1 1 1 1 1 1 1 1 1
#> 23 1 1 1 1 1 2 2 2 2
#> 24 1 2 2 1 1 1 1 1 1
#> 25 2 3 4 2 1 1 1 1 2
#> 26 2 3 3 2 1 2 1 1 2
#> data data data data
#> 1 2016-17 2017-18 2018-19 2019-20
#> 2 1 1 1 0
#> 3 2 2 2 1
#> 4 0 0 1 2
#> 5 0 0 0 2
#> 6 2 2 2 1
#> 7 1 1 1 1
#> 8 1 1 1 1
#> 9 1 1 1 1
#> 10 1 2 2 1
#> 11 1 1 1 1
#> 12 2 2 2 2
#> 13 1 1 1 1
#> 14 0 0 0 0
#> 15 1 1 1 1
#> 16 2 2 2 1
#> 17 0 1 1 0
#> 18 0 0 0 0
#> 19 0 0 1 1
#> 20 0 0 1 1
#> 21 1 1 1 1
#> 22 0 0 1 0
#> 23 2 2 2 2
#> 24 1 1 0 1
#> 25 2 2 3 3
#> 26 0 0 1 1
Created on 2020-03-07 by the reprex package (v0.3.0)
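The same technique, pulling a JSON literal out of a script tag instead of reading the empty td cells, can be sketched in Python with the standard library. The HTML snippet and the shape of the dataSet array below are hypothetical, since the real page's payload is not reproduced here; only the variable name dataSet is taken from the answer above.

```python
import json
import re

# Hypothetical page: the <td> is empty in the raw HTML,
# and the values live in a JavaScript variable instead.
page = """
<html><body>
<td headers="apcl1" data-dyn="1" class="text-center"></td>
<script>
var dataSet = [["2007-08", "2008-09"], ["1", "--"]];
</script>
</body></html>
"""

# Capture everything from the first '[' after 'dataSet =' up to the ']' that
# is directly followed by ';'. DOTALL lets the literal span several lines.
match = re.search(r"dataSet\s*=\s*(\[.*?\]);", page, re.DOTALL)
if match is None:
    raise ValueError("dataSet assignment not found in page")

data = json.loads(match.group(1))
print(data)
```

This only works when the embedded literal is valid JSON, and the regular expression may need adjusting if the script contains "];" inside a string.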

mysql parent and child with level

I have a table structure like this:
id | parent_id
----------
1 0
2 0
3 0
4 0
5 1
6 1
7 2
8 2
9 5
10 7
The parent-child nesting in this table can be arbitrarily deep.
I want the end result to look like this:
id | parent_id | level
---------------------------
1 0 0
2 0 0
3 0 0
4 0 0
5 1 1
6 1 1
7 2 1
8 2 1
9 5 2
10 7 2
Can anybody help with a suggestion?
Refer to the following article:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Read from the section "The Nested Set Model" through "Finding the Depth of the Nodes" and you will arrive at the complete solution :)
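As an alternative that keeps the id / parent_id layout from the question, a recursive common table expression can compute the level directly. MySQL supports WITH RECURSIVE from version 8.0; the sketch below runs the query through Python's built-in sqlite3 module, which accepts the same syntax, purely so that it is self-contained. The table name nodes is a hypothetical placeholder, and parent_id = 0 is assumed to mark a top-level row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Rows copied from the question.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1),
     (6, 1), (7, 2), (8, 2), (9, 5), (10, 7)],
)

query = """
    WITH RECURSIVE tree (id, parent_id, level) AS (
        -- Anchor: top-level rows start at level 0.
        SELECT id, parent_id, 0 FROM nodes WHERE parent_id = 0
        UNION ALL
        -- Recursive step: a child sits one level below its parent.
        SELECT n.id, n.parent_id, t.level + 1
        FROM nodes AS n
        JOIN tree AS t ON n.parent_id = t.id
    )
    SELECT id, parent_id, level FROM tree ORDER BY id
"""
levels = conn.execute(query).fetchall()
for row in levels:
    print(row)
```

Unlike the nested set model, this needs no change to how the table is stored, at the cost of requiring a database version with recursive CTE support.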