Getting the font for a specified style - officer

If I load up the default Word document in officer and look at the styles that are available:
library("officer")
my_doc <- read_docx()
styles_info(my_doc)
I can see the following styles are available:
   style_type            style_id             style_name is_custom is_default
1   paragraph              Normal                 Normal     FALSE       TRUE
2   paragraph              Titre1              heading 1     FALSE      FALSE
3   paragraph              Titre2              heading 2     FALSE      FALSE
4   paragraph              Titre3              heading 3     FALSE      FALSE
5   character      Policepardfaut Default Paragraph Font     FALSE       TRUE
6       table       TableauNormal           Normal Table     FALSE       TRUE
7   numbering         Aucuneliste                No List     FALSE       TRUE
8   character              strong                 strong      TRUE      FALSE
9   paragraph            centered               centered      TRUE      FALSE
10      table       tabletemplate         table_template      TRUE      FALSE
11      table Listeclaire-Accent2    Light List Accent 2     FALSE      FALSE
12  character           Titre1Car            Titre 1 Car      TRUE      FALSE
13  character           Titre2Car            Titre 2 Car      TRUE      FALSE
14  character           Titre3Car            Titre 3 Car      TRUE      FALSE
15  paragraph        graphictitle          graphic title      TRUE      FALSE
16  paragraph          tabletitle            table title      TRUE      FALSE
17      table       Professionnel     Table Professional     FALSE      FALSE
18  paragraph                 TM1                  toc 1     FALSE      FALSE
19  paragraph                 TM2                  toc 2     FALSE      FALSE
20  paragraph       Textedebulles           Balloon Text     FALSE      FALSE
21  character    TextedebullesCar    Texte de bulles Car      TRUE      FALSE
22  character         referenceid           reference_id      TRUE      FALSE
Is it possible to get the font for a given style? For example, how would I go about getting the font associated with the Normal style?
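One possible route (not an official officer API, just a sketch): read the styles.xml part of the unpacked document with xml2 and pull the w:rFonts attributes for the style. This assumes the rdocx object exposes the unpacked folder as my_doc$package_dir, and note that Normal may inherit its font from the document defaults rather than declare one itself:
library(officer)
library(xml2)

my_doc <- read_docx()
# read_docx() unpacks the docx (a zip) into a temporary folder;
# style definitions live in word/styles.xml inside that folder
styles_xml <- read_xml(file.path(my_doc$package_dir, "word", "styles.xml"))
# look for an explicit run-font definition on the Normal style
rfonts <- xml_find_first(styles_xml, "//w:style[@w:styleId='Normal']//w:rFonts")
xml_attrs(rfonts)  # the w:ascii / w:hAnsi attributes hold the font name
# if no rFonts node exists on the style, fall back to the document defaults:
# xml_find_first(styles_xml, "//w:docDefaults//w:rFonts")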

Append information in the th tags to td rows

I am an economist struggling with coding and data scraping.
I am scraping data from the main (and only) table on this webpage (https://www.oddsportal.com/basketball/europe/euroleague-2013-2014/results/). I can retrieve all the information in the td HTML tags with Python Selenium by referring to the class element. The same goes for the th tag, where the date and stage of the competition are stored. In my final dataset, I would like the information stored in the th tag as two columns (date and stage of the competition) next to the other fields in the table. Basically, for each match, I would like to have the date and the stage of the competition as columns rather than as the head of each group of matches.
The only solution I came up with is to index all the rows (with both th and td tags) and build a while loop that appends the information in the th tags to the td rows whose index is lower than the next th tag's index. I hope I made myself clear (if not, I will try to give a more graphical explanation). However, I am not able to code such a logic construct due to my poor coding abilities. I do not know whether I need two loops to iterate through the different tags (td and th) and, if so, how to write them. If you have an easier solution, it is more than welcome!
Thanks in advance for the precious help!
Code below:
from selenium import webdriver
import time
import pandas as pd

# Seasons to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016', '2016-2017', '2017-2018', '2018-2019']

# Define empty data
data_keys = ["Season", "Match_Time", "Home_Team", "Away_Team", "Home_Odd", "Away_Odd", "Home_Score",
             "Away_Score", "OT", "N_Bookmakers"]
data = dict()
for key in data_keys:
    data[key] = list()
del data_keys

# Define 'driver' variable and launch browser
#path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
#path office pc
path = "C:/Users/aldi/Downloads/chromedriver.exe"
driver = webdriver.Chrome(path)

# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num))) * '0' + str(page_num)
        url = "https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)

        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break

        try:
            # Teams
            for el in driver.find_elements_by_class_name('name.table-participant'):
                el = el.text.strip().split(" - ")
                data["Home_Team"].append(el[0])
                data["Away_Team"].append(el[1])
                data["Season"].append(season_filt)
            # Scores
            for el in driver.find_elements_by_class_name('center.bold.table-odds.table-score'):
                el = el.text.split(":")
                if el[1][-3:] == " OT":
                    data["OT"].append(True)
                    el[1] = el[1][:-3]
                else:
                    data["OT"].append(False)
                data["Home_Score"].append(el[0])
                data["Away_Score"].append(el[1])
            # Match times
            for el in driver.find_elements_by_class_name("table-time"):
                data["Match_Time"].append(el.text)
            # Odds
            i = 0
            for el in driver.find_elements_by_class_name("odds-nowrp"):
                i += 1
                if i % 2 == 0:
                    data["Away_Odd"].append(el.text)
                else:
                    data["Home_Odd"].append(el.text)
            # N_Bookmakers
            for el in driver.find_elements_by_class_name("center.info-value"):
                data["N_Bookmakers"].append(el.text)
            # TODO think of inserting the dates list in the dataframe even if it has a different size (19 rows and not 50)
        except:
            pass

driver.quit()
data = pd.DataFrame(data)
data.to_csv("data_odds.csv", index=False)
I would like to add this information to my dataset as two additional columns:
for el in driver.find_elements_by_class_name("first2.tl")[1:]:
    el = el.text.strip().split(" - ")
    data["date"].append(el[0])
    data["stage"].append(el[1])
A few things I would change here.
Don't overwrite variables. You store an element in your el variable, then you overwrite the element with your strings. It may work for you here, but that practice can get you into trouble later on, especially since you are iterating through those elements, and it makes debugging harder too.
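For instance, a minimal sketch of the same loop with the element and the parsed strings kept under separate names:
for el in driver.find_elements_by_class_name('name.table-participant'):
    teams = el.text.strip().split(" - ")  # el stays the WebElement; teams holds the parsed strings
    data["Home_Team"].append(teams[0])
    data["Away_Team"].append(teams[1])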
I know Selenium has ways to parse the HTML, but I personally find BeautifulSoup a tad easier to parse with and a little more intuitive if you are simply trying to pull data out of the HTML. So I went with BeautifulSoup's .find_previous() to get the header that precedes each game, which gets you your date and stage content.
Lastly, I like to construct a list of dictionaries to make up the data frame. Each item in the list is a dictionary whose key:value pairs map a column name to its data. You do sort of the opposite in creating a dictionary of lists. There is nothing wrong with that, but if the lists don't all have the same length, you'll get an error when trying to create the dataframe. Whereas with my way, if for whatever reason a value is missing, the dataframe is still created, just with a null/NaN for the missing data.
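To illustrate the difference (a minimal sketch with made-up values, not taken from the scrape):
import pandas as pd

# dict of lists: every list must be the same length or the constructor raises
# pd.DataFrame({"Home_Team": ["Real Madrid", "Barcelona"], "Away_Team": ["Maccabi Tel Aviv"]})
# -> ValueError: All arrays must be of the same length

# list of dicts: a missing key simply becomes NaN in that row
rows = [
    {"Home_Team": "Real Madrid", "Away_Team": "Maccabi Tel Aviv"},
    {"Home_Team": "Barcelona"},  # Away_Team missing -> NaN
]
print(pd.DataFrame(rows))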
There may be more work you need to do with the code to go through the pages, but this gets you the data in the form you need.
Code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import pandas as pd
from bs4 import BeautifulSoup
import re

# Seasons to filter
seasons_filt = ['2013-2014', '2014-2015', '2015-2016', '2016-2017', '2017-2018', '2018-2019']

# Define 'driver' variable and launch browser
path = "C:/Users/ALESSANDRO/Downloads/chromedriver_win32/chromedriver.exe"
driver = webdriver.Chrome(path)

rows = []
# Loop through pages based on page_num and season
for season_filt in seasons_filt:
    page_num = 0
    while True:
        page_num += 1
        # Get url and navigate it
        page_str = (1 - len(str(page_num))) * '0' + str(page_num)
        url = "https://www.oddsportal.com/basketball/europe/euroleague-" + str(season_filt) + "/results/#/page/" + page_str + "/"
        driver.get(url)
        time.sleep(3)

        # Check if page has no data
        if driver.find_elements_by_id("emptyMsg"):
            print("Season {} ended at page {}".format(season_filt, page_num))
            break

        try:
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            table = soup.find('table', {'id': 'tournamentTable'})
            trs = table.find_all('tr', {'class': re.compile('.*deactivate.*')})
            for each in trs:
                teams = each.find('td', {'class': 'name table-participant'}).text.split(' - ')
                scores = each.find('td', {'class': re.compile('.*table-score.*')}).text.split(':')
                ot = False
                for score in scores:
                    if 'OT' in score:
                        ot = True
                scores = [x.replace('\xa0OT', '') for x in scores]
                matchTime = each.find('td', {'class': re.compile('.*table-time.*')}).text

                # Odds: home and away cells alternate
                i = 0
                for each_odd in each.find_all('td', {'class': "odds-nowrp"}):
                    i += 1
                    if i % 2 == 0:
                        away_odd = each_odd.text
                    else:
                        home_odd = each_odd.text

                n_bookmakers = soup.find('td', {'class': 'center info-value'}).text

                # The header row that precedes this game holds "date - stage"
                date_stage = each.find_previous('th', {'class': 'first2 tl'}).text.split(' - ')
                date = date_stage[0]
                stage = date_stage[1]

                row = {'Season': season_filt,
                       'Home_Team': teams[0],
                       'Away_Team': teams[1],
                       'Home_Score': scores[0],
                       'Away_Score': scores[1],
                       'OT': ot,
                       'Match_Time': matchTime,
                       'Home_Odd': home_odd,
                       'Away_Odd': away_odd,
                       'N_Bookmakers': n_bookmakers,
                       'Date': date,
                       'Stage': stage}
                rows.append(row)
        except:
            pass

driver.quit()
data = pd.DataFrame(rows)
data.to_csv("data_odds.csv", index=False)
Output:
print(data.head(15).to_string())
       Season         Home_Team         Away_Team Home_Score Away_Score     OT Match_Time Home_Odd Away_Odd  N_Bookmakers         Date       Stage
0   2013-2014       Real Madrid  Maccabi Tel Aviv         86         98  False      18:00     -667     +493             7  18 May 2014  Final Four
1   2013-2014         Barcelona       CSKA Moscow         93         78  False      15:00     -135     +112             7  18 May 2014  Final Four
2   2013-2014         Barcelona       Real Madrid         62        100  False      19:00     +134     -161             7  16 May 2014  Final Four
3   2013-2014       CSKA Moscow  Maccabi Tel Aviv         67         68  False      16:00     -278     +224             7  16 May 2014  Final Four
4   2013-2014       Real Madrid        Olympiacos         83         69  False      18:45     -500     +374             7  25 Apr 2014   Play Offs
5   2013-2014       CSKA Moscow     Panathinaikos         74         44  False      16:00     -370     +295             7  25 Apr 2014   Play Offs
6   2013-2014        Olympiacos       Real Madrid         71         62  False      18:45     +127     -152             7  23 Apr 2014   Play Offs
7   2013-2014  Maccabi Tel Aviv    Olimpia Milano         86         66  False      17:45     -217     +179             7  23 Apr 2014   Play Offs
8   2013-2014     Panathinaikos       CSKA Moscow         73         72  False      16:30     -106     -112             7  23 Apr 2014   Play Offs
9   2013-2014     Panathinaikos       CSKA Moscow         65         59  False      18:45     -125     +104             7  21 Apr 2014   Play Offs
10  2013-2014  Maccabi Tel Aviv    Olimpia Milano         75         63  False      18:15     -189     +156             7  21 Apr 2014   Play Offs
11  2013-2014        Olympiacos       Real Madrid         78         76  False      17:00     +104     -125             7  21 Apr 2014   Play Offs
12  2013-2014       Galatasaray         Barcelona         75         78  False      17:00     +264     -333             7  20 Apr 2014   Play Offs
13  2013-2014    Olimpia Milano  Maccabi Tel Aviv         91         77  False      18:45     -286     +227             7  18 Apr 2014   Play Offs
14  2013-2014       CSKA Moscow     Panathinaikos         77         51  False      16:15     -303     +247             7  18 Apr 2014   Play Offs

What is the color palette used by igraph?

My reproducible example is the following:
get.vertex.attribute(g)
$name
[1] "LV" "Ve" "Ca" "Ai" "BN" "EN" "Or" "So" "SG" "Bo" "AX" "Sa" "To" "Pe" "Da" "He" "VI" "Ke" "Va" "At" "Ac" "Mi"
[23] "Cr" "Le" "Pu" "Re" "Te" "C." "N." "Y." "M." "D." "F." "L." "P." "S." "B." "J." "I." "A." "H." "R." "E." "O."
$color
[1] 1 1 1 1 1 2 3 1 1 3 1 3 3 3 1 4 3 5 3 1 1 6 2 6 1 3 3 1 1 1 1 3 1 2 3 1 5 1 2 3 3 4 3 6
In my case, the following code:
library("igraph")
vertices<-data.frame("name" = unique(unlist(relations)))
g = graph.data.frame(relations, directed=F, vertices=vertices)
vertices$group = edge.betweenness.community(g)$membership
V(g)$color <- vertices$group
plot(g,layout=layout.auto,vertex.size=6, vertex.label.cex = 0.8)
gives this graph:
where color 1 appears to be orange, 2 light blue, and so on,
yet
palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
So what is the color palette used by igraph?
I am curious because I would like to use it in another package that only takes color names as input and doesn't seem to recognize the V(g)$color vector as valid input (i.e. it outputs only black).
The short answer is categorical_pal(8).
Full Story
If you look at the help page ?igraph.plotting and search for palette, you will find:
palette
The color palette to use for vertex color. The default is
categorical_pal, which is a color-blind friendly categorical palette.
See its manual page for details and other palettes.
The help page ?categorical_pal says:
This is a color blind friendly palette from
http://jfly.iam.u-tokyo.ac.jp/color. It has 8 colors.
We can make a quick demonstration of this.
library(igraph)
x = 1:8
y = rep(1,8)
plot(x,y, pch=20, cex=10, col=categorical_pal(8), xlim=c(0.5,8.5))
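As for using the colors in another package: categorical_pal() returns hex codes rather than color names, so you can index the palette with the membership-based color attribute yourself. A small sketch, assuming V(g)$color holds small integers as in the question:
# one hex code (e.g. "#E69F00") per vertex, usable anywhere R colors are accepted
vertex_cols <- categorical_pal(8)[V(g)$color]
head(vertex_cols)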

SSRS running total on percentage, show based on only every 10th value

In an SSRS report I have a field with percentage values, from which I have calculated a cumulative running total. In a third field the user wants to see only the values closest to each multiple of ten, with everything else blanked out.
So in the example we show 8 as the cumulative value, as it is closest to 10. For the second shown value we choose 20, as it is the closest value to 20. For the third we take 32, closest to 30; then 40, 52, 62, 73, 79, 91.
%  cumulative val  showed values
3               3
5               8              8
6              14
3              17
2              19
1              20             20
4              24
3              27
5              32             32
7              39
1              40             40
2              42
2              44
3              47
5              52             52
2              54
3              57
1              58
4              62             62
3              65
1              66
7              73             73
2              75
1              76
3              79             79
4              83
2              85
3              88
1              89
2              91             91
I have tried to use the solution with a different set of records, and this is what I see:
Result Set
Whenever I find questions like this in the reporting-services tag, I think about the possibilities users would have if there were something like a NEXT() function, the counterpart of the supported PREVIOUS() function.
Okay, I vented.
I recreated the table you posted in the question and added a row number to the dataset.
Go to the Report menu / Report Properties / Code tab and put the following code in the text area.
Dim running_current As Integer = 0
Dim flag As Integer = 0

Public Function Visible(ByVal a As Integer, ByVal b As Integer) As String
    running_current = running_current + a
    Dim running_next As Integer = running_current + b
    Dim a_f As Integer = CInt(Math.Ceiling(running_current / 10.0)) * 10
    Dim b_f As Integer = CInt(Math.Ceiling(running_next / 10.0)) * 10
    If flag = 1 Then
        flag = 0
        Return "true"
    End If
    If a_f = b_f Then
        Return "false"
    Else
        If Closest(running_current, running_next) = "false" Then
            flag = 1
            Return "false"
        End If
        Return "true"
    End If
End Function

Public Function Closest(ByVal a As Integer, ByVal b As Integer) As String
    Dim target As Integer = CInt(Math.Ceiling(a / 10.0)) * 10
    If Math.Abs(a - target) >= Math.Abs(b - target) Then
        Return "false"
    Else
        Return "true"
    End If
End Function
This function compares each value with the next one, based on the row number, to determine whether it must be shown. If the value must be shown it returns the string "true"; otherwise "false" is returned.
You have to pass the function the current value and the one next to it, based on the row number. The Lookup() function plus the row number gives us behaviour similar to a NEXT() function.
=IIf(
    Code.Visible(Fields!Value.Value,
        Lookup(Fields!RowNumber.Value + 1, Fields!RowNumber.Value, Fields!Value.Value, "DataSet11")),
    ReportItems!Textbox169.Value,
    Nothing
)
For the first row it will pass 3 and 5 to determine whether 3 must be shown, then 5 and 6 to determine whether that cumulative value must be shown, and so on.
I've created another column like the one in your example and used the above expression. It says: if the Code.Visible function returns true, show Textbox169 (the cumulative value column); otherwise show Nothing.
For Running column I've used the typical running value expression:
=RunningValue(Fields!Value.Value,Sum,"DataSet11")
The result is something like this:
Let me know if this helps.

Error in twFromJSON(out) - R twitteR package

I have searched extensively online but am still unable to find a work-around for the following error whilst using the 'twitteR' package in R:
"Error in twFromJSON(out) :
Error: Malformed response from server, was not JSON.
The most likely cause of this error is Twitter returning a character which
can't be properly parsed by R. Generally the only remedy is to wait long
enough for the offending character to disappear from searches (e.g. if
using searchTwitter())."
It comes after running the following code:
# Clear the workspace
rm(list = ls())
# Set working directory
setwd("C:/Users/toshiba/Google Drive/Programming/Projects/HDS/Shiny/Twitter")
# Load libraries
library(twitteR)
library(RJSONIO)
library(dismo)
library(maps)
library(ggplot2)
library(XML)
load("twitteR_credentials")
registerTwitterOAuth(twitCred)
############################## Start App ########################################
start_date = '2014-10-10'
end_date = toString(as.Date(start_date) + 1)
# Search for tweets containing ebola - only goes back 8 days including today
ebolaTweets <- searchTwitter("ebola",
                             n = 1250,
                             since = start_date,
                             until = end_date,
                             cainfo = "cacert.pem")
tweetFrame <- twListToDF(ebolaTweets)  # convert to dataframe
Is it not possible to somehow skip the offending tweet instead of breaking the loop?
Any help is much appreciated!
Using the current version of twitteR (1.1.8) on GitHub (which handles authentication much more simply), I have no problems.
library(devtools)
install_github('twitteR', 'geoffjentry')
library(twitteR)
setup_twitter_oauth(consumer_key='blah', consumer_secret='blah',
                    access_token='blah', access_secret='blah')
# get keys and tokens from apps.twitter.com
start.date <- '2014-10-10'
end.date <- as.character(as.Date(start.date) + 1)
ebolaTweets <- searchTwitter('ebola', 1250, since=start.date, until=end.date)
ebola <- twListToDF(ebolaTweets)
head(ebola)
# text
# 1 RT #ChillVibessonly: Ebola: I'm in the US broom broom\n\nAmerica: Get out me country
# 2 RT #RubenSanchezTW: #Alucinante TVE usa imágenes de un hospital alemán para ilustrar una info sobre el Carlos III http://t.co/5f5okmsjar ht…
# 3 RT #DanLpda: Dejen de hablar del Ébola, me da miedo
# 4 Realiza Colombia estudios a 3 viajeros por temor a ébola: Pese a no presentar síntomas del virus los pasajeros... http://t.co/C5iJmgoQkI
# 5 Que es eso del ebola? Voy a llorar
# 6 RT #MicaMamonde: ebola no te tenemos miedo http://t.co/osgLmAmFnP
# favorited favoriteCount replyToSN created truncated replyToSID id replyToUID
# 1 FALSE 0 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725464223326209 <NA>
# 2 FALSE 0 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725464185581568 <NA>
# 3 FALSE 0 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725463304785921 <NA>
# 4 FALSE 0 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725463270821889 <NA>
# 5 FALSE 1 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725463120244737 <NA>
# 6 FALSE 0 <NA> 2014-10-10 23:59:59 FALSE <NA> 520725463044734976 <NA>
# statusSource screenName retweetCount
# 1 Twitter for Android JaMeseKisses 684
# 2 Twitter for Android patillagrande 2059
# 3 Twitter Web Client AriiFranciscovi 2
# 4 twitterfeed ColnRos 0
# 5 Twitter for iPhone ZoeVignieri 0
# 6 Twitter Web Client felicitasalbano 10
# isRetweet retweeted longitude latitude
# 1 TRUE FALSE <NA> <NA>
# 2 TRUE FALSE <NA> <NA>
# 3 TRUE FALSE <NA> <NA>
# 4 FALSE FALSE <NA> <NA>
# 5 FALSE FALSE <NA> <NA>
# 6 TRUE FALSE <NA> <NA>
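As for skipping the offending tweet when you cannot upgrade: searchTwitter() fails as a whole, so you cannot drop a single tweet, but you can wrap the call so a failing batch is skipped instead of aborting your script. A hedged sketch (safe_search is a made-up helper name, not part of twitteR):
safe_search <- function(query, n, ...) {
  # return an empty list instead of stopping when the response cannot be parsed
  tryCatch(
    searchTwitter(query, n = n, ...),
    error = function(e) {
      message("Skipping batch: ", conditionMessage(e))
      list()
    }
  )
}

ebolaTweets <- safe_search("ebola", 1250, since = start.date, until = end.date)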

Read_json populates with empty lists; how to remove those rows

I've got a Pandas dataframe created with pd.read_json(). When I read it in, a few cells contain just an empty list or None, and I want to detect the rows that have those [] / None values in certain columns. For example:
  feat 1 feat 2 feat 3
0     []     []      5
1      6      8      3
2   None     10    NaN
I want to remove rows 0 and 2 because they have None/NaN/empty lists. How can I do this with Pandas?
You can use applymap to flag the [] and None entries and mask them to NaN:
Note: replace works for the None but not for the []... this solution seems to be a little sensitive (hence the use of the negation ~)...
In [11]: df.applymap(lambda x: x == [] or x is None)
Out[11]:
  feat 1 feat 2 feat 3
0   True   True  False
1  False  False  False
2   True  False  False

In [12]: df.where(~df.applymap(lambda x: x == [] or x is None))
Out[12]:
  feat 1 feat 2 feat 3
0    NaN    NaN      5
1      6      8      3
2    NaN     10    NaN
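To actually remove rows 0 and 2 rather than just mask them, you can chain dropna on the result. A small sketch; this assumes any NaN in a row (including ones already present, like the NaN in feat 3) should discard that row:
In [13]: df.where(~df.applymap(lambda x: x == [] or x is None)).dropna()
Out[13]:
  feat 1 feat 2 feat 3
1      6      8      3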