importing a table from a webpage using pd.read_html - html

I am trying to use pd.read_html to import the table under "Daily Observations" from https://www.wunderground.com/history/monthly/us/mi/ann-arbor/date/2020-1
I tried this but the error of "HTTPError: HTTP Error 403: Forbidden" showed.
Jan = pd.read_html('https://www.wunderground.com/history/monthly/us/mi/ann-arbor/date/2020-1')
Alternatively, I copied the source code of the table and save it as an html file.
The html file looks like this:
https://i.stack.imgur.com/YTkF9.jpg
When I use pd.read_html to import this html file, it seems like the imported dataset is not a dataframe. The rows and columns gone messy like this:
[ Time \
0 Jan 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
1 Jan
2 1
3 2
4 3
.. ...
220 0.02
221 0.00
222 0.00
223 0.00
224 0.00
Temperature (° F) \
0 Max Avg Min 38 30.8 26 47 41.8 35 45 42.8 39 3...
1 NaN
2 NaN
3 NaN
4 NaN
.. ...
220 NaN
221 NaN
222 NaN
223 NaN
224 NaN
How can I solve it?

Related

Remove uneven blank spaces from a txt file in pig

I have a text file with uneven blank spaces and I want to store it as a csv file using pig.My file is of the format
2013 210 0 2878 -7543 4 29 20 116
2013 210 10 2875 -7538 4 32 20 116
2013 210 20 2872 -7533 4 29 20 116
2013 210 30 2870 -7527 4 29 20 115
2013 210 40 2867 -7522 4 30 20 115
2013 210 50 2864 -7516 4 29 20 115
2013 210 60 2861 -7511 4 29 20 115
If you are having uneven spaces, read the values as a single line first then squeeze the data with a regular expression then use STRSPLIT to split the single space separated data.
text_data = load 'file.txt' as line;
squeezed_data = foreach text_data generate REPLACE(line, '\\s+', ' ');

How to scrape using rvest in pages with multiple tables

I'm trying to scrape the data from every table at the hockey-reference awards page. I can scrape the first table for the Hart Memorial Trophy, but when I try the rest of them, I end up with empty vectors. I used Selector Gadget and the rvest package to produce the following code.
library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
byng_node<-html_nodes(byng, "#byng_stats .right , #byng_stats a")
byng_text<-html_text(byng_node)
However, once I run this code, I get no data in the byng variables:
> byng_node
{xml_nodeset (0)}
> byng_text
character(0)
What's happening here? Does selector gadget not work for pages with multiple tables? Does it have nothing to do with that and there's something HTMLy I don't understand? Any help is greatly appreciated!
#neilfws was right: if you look at the source code of the HTML page, you see that all but the first table are commented so rvest thinks they are comments, not part of source code itself. Let's do a dirty hack and remove these characters that are used to comment our precious tables:
library(rvest)
url="https://www.hockey-reference.com/awards/voting-2017.html"
byng<-read_html(url)
# Remove commenting sequences
byng <- gsub("<!--", "", byng)
byng <- gsub("-->", "", byng)
byng<-read_html(byng)
#Get tables as a list of dataframes
tables <- html_table(byng)
# Last table
tables[7]
[[1]]
Scoring Scoring Scoring Scoring Goalie Stats Goalie Stats
1 Place Player Age Tm Pos Votes Vote% 1st 2nd 3rd 4th 5th G A PTS +/- W L
2 1 Connor McDavid 20 EDM C 762 94.07 141 18 3 0 0 30 70 100 27
3 2 Sidney Crosby 29 PIT C 526 64.94 20 142 0 0 0 44 45 89 17
4 3 Nicklas Backstrom 29 WSH C 127 15.68 1 2 116 0 0 23 63 86 17
5 4 Mark Scheifele 23 WPG C 21 2.59 0 0 21 0 0 32 50 82 18
6 5 Auston Matthews 19 TOR C 10 1.23 0 0 10 0 0 40 29 69 2
7 6 Evgeni Malkin 30 PIT C 4 0.49 0 0 4 0 0 33 39 72 18
8 7 John Tavares 26 NYI C 2 0.25 0 0 2 0 0 28 38 66 4
9 8 Jonathan Toews 28 CHI C 1 0.12 0 0 1 0 0 21 37 58 7
10 8 Brad Marchand 28 BOS C 1 0.12 0 0 1 0 0 39 46 85 18
11 8 Ryan Kesler 32 ANA C 1 0.12 0 0 1 0 0 22 36 58 8
12 8 Ryan Getzlaf 31 ANA C 1 0.12 0 0 1 0 0 15 58 73 7

Percentage by Row Group

I have a matrix with rows grouped by Dept (Department). I am trying to get the actual hours / required hours percentage in a column for each row group, but I can only get the total %, not the % by group. Ex:
I should get this:
Total Employee Req Hrs Rep Hrs % Billable hrs % NonBill Hrs % Time Off %
Dept A Total 672 680 101 575 85 140 21 8 1
Emp1 168 170 101 150 89 50 29 0 0
Emp2 168 165 98 120 71 20 12 8 4
Emp3 168 175 104 155 92 20 12 0 0
Emp4 168 170 101 150 89 50 29 0 0
Dept B Total 420 428 102 365 87 80 19 4 .1
Emp5 168 170 101 150 89 50 29 0 0
Emp6 84 84 98 60 71 10 12 4 4
Emp7 168 175 104 155 92 20 12 0 0
G Total 1092 1108 101 940 86 190 17 12 1
But I get this:
Total Employee Req Hrs Rep Hrs % Billable hrs % NonBill Hrs % Time Off %
Dept A Total 1684 1675 101 1250 86 225 17 12 1
Emp1 168 170 101 150 89 50 29 0 0
Emp2 168 165 98 120 71 20 12 8 4
Emp3 168 175 104 155 92 20 12 0 0
Emp4 168 170 101 150 89 50 29 0 0
Dept B Total 1092 1108 101 1250 86 225 17 12 1
Emp5 168 170 101 150 89 50 29 0 0
Emp6 84 84 98 60 71 10 12 4 4
Emp7 168 175 104 155 92 20 12 0 0
G Total 1092 1108 101 940 86 190 17 12 1
The totals are correct but the % is wrong.
I have several Datasets because the report only runs the department you are in, except for the VPs who can see all departments.
I Insert the percentage columns into the matrix and have tried several expressions with no results including:
=Fields!ActHrs.Value/Fields!ReqHrs.Value
=Sum(Fields!ActHrs.Value, "Ut_Query")/Sum(Fields!ReqHrs.Value, "Ut_Query")
=Sum(Fields!ActHrs.Value, "Ut_Query","Dept")/Sum(Fields!ReqHrs.Value,
"Ut_Query","Dept")
=Sum(Fields!ActHrs.Value,"Dept", "Ut_Query")/Sum(Fields!ReqHrs.Value,
"Dept","Ut_Query")
Plus more I can't even remember.
I tried creating new groups, and even a new matrix.
There must be a simple way to get the percentage by group but I have not found an answer on any of the interned boards.
OK, I figured this out, but it doesn't make much sense. If I try:
=Textbox29/TextBox28 I get error messages about undefined variables.
If I go the the textbox properties and rename the textboxes to Act and Req and use:
=Act/Req I get the right answer.

Selecting duplicate entries from a table and getting all the fields from the duplicate rows

I have a table like this.
id day1 day2 day3
1 411 523 223
2 413 554 245
3 417 511 209
4 420 515 232
5 422 522 212
6 483 567 212
7 456 512 256
8 433 578 209
9 438 532 234
10 418 555 223
11 460 510 263
12 453 509 245
13 441 524 233
14 430 543 261
15 456 582 222
16 444 524 241
17 478 511 211
18 421 583 222
I want to select all the IDs that have duplicate values in day2.
I'm doing
select day2,count(*) from resultater group by day having count(*)>1;
Is it possible to list all the IDs within the groups?
select day2,count(*), group_concat(id)
from resultater
group by day
having count(*)>1;
should do the trick.

Code-golf: Output multiplication table to the Console

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I recently pointed a student doing work experience to an article about dumping a multiplication table to the console. It used a nested for loop and multiplied the step value of each.
This looked like a .NET 2.0 approach. I was wondering, with the use of Linq and extension methods,for example, how many lines of code it would take to achieve the same result.
Is the stackoverflow community up to the challenge?
The challenge:
In a console application, write code to generate a table like this example:
01 02 03 04 05 06 07 08 09
02 04 06 08 10 12 14 16 18
03 06 09 12 15 18 21 24 27
04 08 12 16 20 24 28 32 36
05 10 15 20 25 30 35 40 45
06 12 18 24 30 36 42 48 54
07 14 21 28 35 42 49 56 63
08 16 24 32 40 48 56 64 72
09 18 27 36 45 54 63 72 81
As this turned into a language-agnostic code-golf battle, I'll go with the communities decision about which is the best solution for the accepted answer.
There's been alot of talk about the spec and the format that the table should be in, I purposefully added the 00 format but the double new-line was originally only there because I didn't know how to format the text when creating the post!
J - 8 chars - 24 chars for proper format
*/~1+i.9
Gives:
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
This solution found by #earl:
'r(0)q( )3.'8!:2*/~1+i.9
Gives:
01 02 03 04 05 06 07 08 09
02 04 06 08 10 12 14 16 18
03 06 09 12 15 18 21 24 27
04 08 12 16 20 24 28 32 36
05 10 15 20 25 30 35 40 45
06 12 18 24 30 36 42 48 54
07 14 21 28 35 42 49 56 63
08 16 24 32 40 48 56 64 72
09 18 27 36 45 54 63 72 81
MATLAB - 10 characters
a=1:9;a'*a
... or 33 characters for stricter output format
a=1:9;disp(num2str(a'*a,'%.2d '))
Brainf**k - 185 chars
>---------[++++++++++>---------[+<[-<+>>+++++++++[->+>>---------[>-<++++++++++<]<[>]>>+<<<<]>[-<+>]<---------<]<[->+<]>>>>++++[-<++++>]<[->++>+++>+++<<<]>>>[.[-]<]<]++++++++++.[-<->]<+]
cat - 252 characters
01 02 03 04 05 06 07 08 09
02 04 06 08 10 12 14 16 18
03 06 09 12 15 18 21 24 27
04 08 12 16 20 24 28 32 36
05 10 15 20 25 30 35 40 45
06 12 18 24 30 36 42 48 54
07 14 21 28 35 42 49 56 63
08 16 24 32 40 48 56 64 72
09 18 27 36 45 54 63 72 81
Assuming that a trailing newline is wanted; otherwise, 251 chars.
* runs *
Python - 61 chars
r=range(1,10)
for y in r:print"%02d "*9%tuple(y*x for x in r)
C#
This is only 2 lines. It uses lambdas not extension methods
var nums = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
nums.ForEach(n => { nums.ForEach(n2 => Console.Write((n * n2).ToString("00 "))); Console.WriteLine(); });
and of course it could be done in one long unreadable line
new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 }.ForEach(n => { new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 }.ForEach(n2 => Console.Write((n * n2).ToString("00 "))); Console.WriteLine(); });
all of this is assuming you consider a labmda one line?
K - 12 characters
Let's take the rosetta-stoning seriously, and compare Kdb+'s K4 with the canonical J solution (*/~1+i.9):
a*/:\:a:1+!9
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
J's "table" operator (/) equals the K "each-left each-right" (/:\:) idiom. We don't have J's extremely handy "reflexive" operator (~) in K, so we have to pass a as both left and right argument.
Fortran95 - 40 chars (beating perl by 4 chars!)
This solution does print the leading zeros as per the spec.
print"(9(i3.2))",((i*j,i=1,9),j=1,9);end
Oracle SQL, 103 characters:
select n, n*2, n*3, n*4, n*5, n*6, n*7, n*8, n*9 from (select rownum n from dual CONNECT BY LEVEL < 10)
C# - 117, 113, 99, 96, 95 89 characters
updated based on NickLarsen's idea
for(int x=0,y;++x<10;)
for(y=x;y<x*10;y+=x)
Console.Write(y.ToString(y<x*9?"00 ":"00 \n"));
99, 85, 82 81 characters
... If you don't care about the leading zeros and would allow tabs for alignment.
for(int x=0,y;++x<10;)
{
var w="";
for(y=1;++y<10;)
w+=x*y+" ";
Console.WriteLine(w);
}
COBOL - 218 chars -> 216 chars
PROGRAM-ID.P.DATA DIVISION.WORKING-STORAGE SECTION.
1 I PIC 9.
1 N PIC 99.
PROCEDURE DIVISION.PERFORM 9 TIMES
ADD 1 TO I
SET N TO I
PERFORM 9 TIMES
DISPLAY N' 'NO ADVANCING
ADD I TO N
END-PERFORM
DISPLAY''
END-PERFORM.
Edit
216 chars (probably a different compiler)
PROGRAM-ID.P.DATA DIVISION.WORKING-STORAGE SECTION.
1 I PIC 9.
1 N PIC 99.
PROCEDURE DIVISION.
PERFORM B 9 TIMES
STOP RUN.
B.
ADD 1 TO I
set N to I
PERFORM C 9 TIMES
DISPLAY''.
C.
DISPLAY N" "NO ADVANCING
Add I TO N.
Not really a one-liner, but the shortest linq i can think of:
var r = Enumerable.Range(1, 9);
foreach (var z in r.Select(n => r.Select(m => n * m)).Select(a => a.Select(b => b.ToString("00 "))))
{
foreach (var q in z)
Console.Write(q);
Console.WriteLine();
}
In response to combining this and SRuly's answer
Enumberable.Range(1,9).ToList.ForEach(n => Enumberable.Range(1,9).ToList.ForEach(n2 => Console.Write((n * n2).ToString("00 "))); Console.WriteLine(); });
Ruby - 42 Chars (including one linebreak, interactive command line only)
This method is two lines of input and only works in irb (because irb gives us _), but shortens the previous method by a scant 2 charcters.
1..9
_.map{|y|puts"%02d "*9%_.map{|x|x*y}}
Ruby - 44 Chars (tied with perl)
(a=1..9).map{|y|puts"%02d "*9%a.map{|x|x*y}}
Ruby - 46 Chars
9.times{|y|puts"%02d "*9%(1..9).map{|x|x*y+x}}
Ruby - 47 Chars
And back to a double loop
(1..9).map{|y|puts"%02d "*9%(1..9).map{|x|x*y}}
Ruby - 54 chars!
Using a single loop saves a couple of chars!
(9..89).map{|n|print"%02d "%(n/9*(x=n%9+1))+"\n"*(x/9)}
Ruby - 56 chars
9.times{|x|puts (1..9).map{|y|"%.2d"%(y+x*y)}.join(" ")}
Haskell — 85 84 79 chars
r=[1..9]
s x=['0'|x<=9]++show x
main=mapM putStrLn[unwords[s$x*y|x<-r]|y<-r]
If double spacing is required (89 81 chars),
r=[1..9]
s x=['0'|x<=9]++show x
main=mapM putStrLn['\n':unwords[s$x*y|x<-r]|y<-r]
F# - 61 chars:
for y=1 to 9 do(for x=1 to 9 do printf"%02d "(x*y));printfn""
If you prefer a more applicative/LINQ-y solution, then in 72 chars:
[1..9]|>Seq.iter(fun y->[1..9]|>Seq.iter((*)y>>printf"%02d ");printfn"")
c# - 125, 123 chars (2 lines):
var r=Enumerable.Range(1,9).ToList();
r.ForEach(n=>{var s="";r.ForEach(m=>s+=(n*m).ToString("00 "));Console.WriteLine(s);});
C - 97 79 characters
#define f(i){int i=0;while(i++<9)
main()f(x)f(y)printf("%.2d ",x*y);puts("");}}
Perl, 44 chars
(No hope of coming anywhere near J, but languages with matrix ops are in a class of their own here...)
for$n(1..9){printf"%3d"x9 .$/,map$n*$_,1..9}
R (very similar to Matlab on this level): 12 characters.
> 1:9%*%t(1:9)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 3 4 5 6 7 8 9
[2,] 2 4 6 8 10 12 14 16 18
[3,] 3 6 9 12 15 18 21 24 27
[4,] 4 8 12 16 20 24 28 32 36
[5,] 5 10 15 20 25 30 35 40 45
[6,] 6 12 18 24 30 36 42 48 54
[7,] 7 14 21 28 35 42 49 56 63
[8,] 8 16 24 32 40 48 56 64 72
[9,] 9 18 27 36 45 54 63 72 81
PHP, 71 chars
for($x=0;++$x<10;print"\n"){for($y=0;++$y<10;){printf("%02d ",$x*$y);}}
Output:
$ php -r 'for($x=0;++$x<10;print"\n"){for($y=0;++$y<10;){printf("%02d ",$x*$y);}}'
01 02 03 04 05 06 07 08 09
02 04 06 08 10 12 14 16 18
03 06 09 12 15 18 21 24 27
04 08 12 16 20 24 28 32 36
05 10 15 20 25 30 35 40 45
06 12 18 24 30 36 42 48 54
07 14 21 28 35 42 49 56 63
08 16 24 32 40 48 56 64 72
09 18 27 36 45 54 63 72 81
C#, 135 chars, nice and clean:
var rg = Enumerable.Range(1, 9);
foreach (var rc in from r in rg
from c in rg
select (r * c).ToString("D2") + (c == 9 ? "\n\n" : " "))
Console.Write(rc);
PostgreSQL: 81 74 chars
select array(select generate_series(1,9)*x)from generate_series(1,9)as x;
Ruby - 56 chars :D
9.times{|a|9.times{|b|print"%02d "%((a+1)*(b+1))};puts;}
C - 66 Chars
This resolves the complaint about the second parameter of main :)
main(x){for(x=8;x++<89;)printf("%.2d%c",x/9*(x%9+1),x%9<8?32:10);}
C - 77 chars
Based on dreamlax's 97 char answer. His current answer somewhat resembles this one now :)
Compiles ok with gcc, and main(x,y) is fair game for golf i reckon
#define f(i){for(i=0;i++<9;)
main(x,y)f(x)f(y)printf("%.2d ",x*y);puts("");}}
XQuery 1.0 (96 bytes)
string-join(for$x in 1 to 9 return(for$y in 1 to 9 return concat(0[$x*$y<10],$x*$y,' '),'
'),'')
Run (with XQSharp) with:
xquery table.xq !method=text
Scala - 77 59 58 chars
print(1 to 9 map(p=>1 to 9 map(q=>"%02d "format(p*q))mkString)mkString("\n"))
Sorry, I had to do this, the Scala solution by Malax was way too readable...
[Edit] For comprehension seems to be the better choice:
for(p<-1 to 9;q<-{println;1 to 9})print("%02d "format p*q)
[Edit] A much longer solution, but without multiplication, and much more obfuscated:
val s=(1 to 9).toSeq
(s:\s){(p,q)=>println(q.map("%02d "format _)mkString)
q zip(s)map(t=>t._1+t._2)}
PHP, 62 chars
for(;$x++<9;print"\n",$y=0)while($y++<9)printf("%02d ",$x*$y);
Java - 155 137 chars
Update 1: replaced string building by direct printing. Saved 18 chars.
class M{public static void main(String[]a){for(int x,y=0,z=10;++y<z;System.out.println())for(x=0;++x<z;System.out.printf("%02d ",x*y));}}
More readable format:
class M{
public static void main(String[]a){
for(int x,y=0,z=10;++y<z;System.out.println())
for(x=0;++x<z;System.out.printf("%02d ",x*y));
}
}
Another attempt using C#/Linq with GroupJoin:
Console.Write(
String.Join(
Environment.NewLine,
Enumerable.Range(1, 9)
.GroupJoin(Enumerable.Range(1, 9), y => 0, x => 0, (y, xx) => String.Join(" ", xx.Select(x => x * y)))
.ToArray()));
Ruby — 47 chars
puts (a=1..9).map{|i|a.map{|j|"%2d"%(j*i)}*" "}
Output
1 2 3 4 5 6 7 8 9
2 4 6 8 10 12 14 16 18
3 6 9 12 15 18 21 24 27
4 8 12 16 20 24 28 32 36
5 10 15 20 25 30 35 40 45
6 12 18 24 30 36 42 48 54
7 14 21 28 35 42 49 56 63
8 16 24 32 40 48 56 64 72
9 18 27 36 45 54 63 72 81
(If we ignore spacing, it becomes 39: puts (a=1..9).map{|i|a.map{|j|j*i}*" "} And anyway, I feel like there's a bit of room for improvement with the wordy map stuff.)