Webscraping Pokemon Data - html

I am trying to find out the number of moves each Pokemon (first generation) could learn.
I found the following website that contains this information: https://pokemondb.net/pokedex/game/red-blue-yellow
There are 151 Pokemon listed here - and for each of them, their move set is listed on a template page like this: https://pokemondb.net/pokedex/bulbasaur/moves/1
Since I am using R, I tried to get the website addresses for each of these 150 Pokemon (https://docs.google.com/document/d/1fH_n_BPbIk1bZCrK1hLAJrYPH2d5RTy9IgdR5Ck_lNw/edit#):
names = c("Bulbasaur","Ivysaur","Venusaur","Charmander","Charmeleon","Charizard","Squirtle","Wartortle","Blastoise","Caterpie","Metapod","Butterfree","Weedle","Kakuna","Beedrill",
"Pidgey","Pidgeotto","Pidgeot","Rattata","Raticate","Spearow","Fearow","Ekans","Arbok","Pikachu","Raichu","Sandshrew","Sandslash","Nidoran","Nidorina","Nidoqueen","Nidorino","Nidoking",
"Clefairy","Clefable","Vulpix","Ninetales","Jigglypuff","Wigglytuff","Zubat","Golbat","Oddish","Gloom","Vileplume","Paras","Parasect","Venonat","Venomoth","Diglett","Dugtrio","Meowth","Persian",
"Psyduck","Golduck","Mankey","Primeape","Growlithe","Arcanine","Poliwag","Poliwhirl","Poliwrath","Abra","Kadabra","Alakazam","Machop","Machoke","Machamp","Bellsprout","Weepinbell","Victreebel","Tentacool",
"Tentacruel","Geodude","Graveler","Golem","Ponyta","Rapidash","Slowpoke","Slowbro","Magnemite","Magneton","Farfetch’d","Doduo","Dodrio","Seel","Dewgong","Grimer","Muk","Shellder","Cloyster","Gastly","Haunter",
"Gengar","Onix","Drowzee","Hypno","Krabby","Kingler","Voltorb","Electrode","Exeggcute","Exeggutor","Cubone","Marowak","Hitmonlee","Hitmonchan","Lickitung","Koffing","Weezing","Rhyhorn","Rhydon","Chansey","Tangela",
"Kangaskhan","Horsea","Seadra","Goldeen","Seaking","Staryu","Starmie","Mr.Mime","Scyther","Jynx","Electabuzz","Magmar","Pinsir","Tauros","Magikarp","Gyarados","Lapras","Ditto"
,"Eevee","Vaporeon","Jolteon","Flareon","Porygon","Omanyte","Omastar","Kabuto","Kabutops","Aerodactyl","Snorlax","Articuno","Zapdos","Moltres","Dratini","Dragonair","Dragonite","Mewtwo","Mew")
template_1 = rep("https://pokemondb.net/pokedex/",150)
template_2 = rep("/moves/1",150)
pokemon_websites = data.frame(template_1, names, template_2)
pokemon_websites$full_website = paste(pokemon_websites$template_1, pokemon_websites$names, pokemon_websites$template_2)
Next, I remove all spaces:
library(stringr)
pokemon_websites$full_website = str_remove_all( pokemon_websites$full_website," ")
Now, I have a column with all the website names:
head(pokemon_websites)
template_1 names template_2 full_website
1 https://pokemondb.net/pokedex/ Bulbasaur /moves/1 https://pokemondb.net/pokedex/Bulbasaur/moves/1
2 https://pokemondb.net/pokedex/ Ivysaur /moves/1 https://pokemondb.net/pokedex/Ivysaur/moves/1
3 https://pokemondb.net/pokedex/ Venusaur /moves/1 https://pokemondb.net/pokedex/Venusaur/moves/1
4 https://pokemondb.net/pokedex/ Charmander /moves/1 https://pokemondb.net/pokedex/Charmander/moves/1
5 https://pokemondb.net/pokedex/ Charmeleon /moves/1 https://pokemondb.net/pokedex/Charmeleon/moves/1
6 https://pokemondb.net/pokedex/ Charizard /moves/1 https://pokemondb.net/pokedex/Charizard/moves/1
I would like to count the number of moves each of these 150 Pokemon can learn. For example, the first Pokemon, "Bulbasaur", can learn 24 moves.
In the end, I would like to add a column to the earlier data frame that contains the number of moves each Pokemon can learn. For example, something that looks like this:
> head(pokemon_websites)
template_1 names template_2 full_website number_of_moves
1 https://pokemondb.net/pokedex/ Bulbasaur /moves/1 https://pokemondb.net/pokedex/Bulbasaur/moves/1 24
2 https://pokemondb.net/pokedex/ Ivysaur /moves/1 https://pokemondb.net/pokedex/Ivysaur/moves/1 ???
3 https://pokemondb.net/pokedex/ Venusaur /moves/1 https://pokemondb.net/pokedex/Venusaur/moves/1 ???
4 https://pokemondb.net/pokedex/ Charmander /moves/1 https://pokemondb.net/pokedex/Charmander/moves/1 ???
5 https://pokemondb.net/pokedex/ Charmeleon /moves/1 https://pokemondb.net/pokedex/Charmeleon/moves/1 ???
6 https://pokemondb.net/pokedex/ Charizard /moves/1 https://pokemondb.net/pokedex/Charizard/moves/1 ???
Is there a way to webscrape this data in R, count the number of moves for each of the 150 Pokemon, and then place this move count into a column?
Right now I am doing this by hand and it is taking a long time! Also, I have heard some websites do not allow for automated webscraping - if this website (https://pokemondb.net/pokedex/game/red-blue-yellow) does not allow webscraping, I can try to find another website that might allow it.
Thank you!

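You can scrape all the tables for each of the Pokemon using something like this:
library(rvest)  # provides read_html(), html_nodes(), html_table() and the %>% pipe

tables = lapply(pokemon_websites$full_website, function(link) {
  # Return NULL (instead of stopping) when a page fails to load or parse
  tryCatch(
    read_html(link) %>% html_nodes("table") %>% html_table(),
    error = function(e) NULL, warning = function(w) NULL
  )
})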
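However, note that the number of tables returned differs between Pokemon. For example, the first has 6 tables: the first three are for Red/Blue and the last three are for Yellow. The three entries with 0 tables (positions 29, 82 and 121) are Nidoran, Farfetch’d and Mr.Mime, most likely because the URLs built from those names do not match the site's page addresses.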
lengths(tables)
[1] 6 6 6 6 6 6 6 6 6 2 4 7 2 4 8 6 6 6 4 4 6 6 6 6 6 8 6 6 0 4 8 4 8 6 8 4 6 6 8 4 4 6 6 8 6 6 5 5 5 5 4 4 6 6 6
[56] 6 4 6 6 6 8 6 6 6 6 6 6 6 6 8 6 6 6 6 6 4 4 6 6 6 6 0 6 6 6 6 4 4 6 8 4 4 6 6 6 6 6 6 6 6 4 8 6 7 6 6 6 4 4 6
[111] 6 6 6 6 6 6 6 6 6 8 0 6 4 6 6 6 6 2 8 6 2 4 8 8 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

Since the OP wants to count only the moves in the Red/Blue tab, we can do the following. (If you need moves from both tabs, follow @langtang's answer.)
library(rvest)
# For each page, keep only the tables inside the active (Red/Blue) tab
tables1 = lapply(pokemon_websites$full_website, function(x) {
  tryCatch(
    x %>% read_html() %>% html_nodes('.active') %>% html_nodes('.resp-scroll') %>% html_table(),
    error = function(e) NULL
  )
})
# Number of rows (moves) in each table, summed per Pokemon
moves = lapply(tables1, function(x) lapply(x, function(x) dim(x)[1]))
moves = lapply(moves, unlist, use.names=FALSE)
moves = lapply(moves, sum) %>% unlist()
# Add the counts to the data frame from the question
pokemon_websites$number_of_moves = moves
[1] 24 25 27 32 33 37 32 33 37 2 3 30 2 3 26 22 23 25 24 27 21 23 24 26 29 30 27 29 0 28 43 28 44 41 42 22 23 40 41 19 22 21 23 25 23 26 22 29 20 23 24
[52] 27 31 34 31 34 23 25 25 36 37 25 34 35 29 31 32 23 24 26 28 31 30 31 33 22 26 37 46 23 26 0 23 26 25 28 22 24 27 29 20 20 32 25 33 36 25 27 24 27 24 29
[103] 32 35 26 26 37 19 21 27 42 44 24 36 22 24 25 27 33 34 0 21 34 35 30 23 27 2 34 34 1 19 32 32 28 30 22 28 22 30 24 43 25 25 22 30 32 36 45 60
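A note on the three zeros in the output (positions 29, 82 and 121): they line up with Nidoran, Farfetch’d and Mr.Mime, whose names are unlikely to appear verbatim in a page address. The following is a minimal sketch, in Python rather than R, of normalizing names into URL slugs before building the links. The `slugify` and `moves_url` helpers are hypothetical, and the slug rules used here (lowercase, apostrophe dropped, dot replaced by a hyphen) are an assumption that should be checked against the real page addresses. Nidoran has separate female and male pages, so it would still need a manual entry.

```python
import re

BASE = "https://pokemondb.net/pokedex/"


def slugify(name):
    """Turn a display name into a lowercase, hyphen-separated slug (assumed rules)."""
    slug = name.lower()
    slug = re.sub(r"['’]", "", slug)          # drop straight and curly apostrophes
    slug = re.sub(r"[^a-z0-9]+", "-", slug)   # dots, spaces, etc. become hyphens
    return slug.strip("-")


def moves_url(name):
    """Build the generation 1 moves page address for one Pokemon name."""
    return BASE + slugify(name) + "/moves/1"


for name in ["Bulbasaur", "Mr.Mime", "Farfetch’d"]:
    print(moves_url(name))
```

The same clean-up can be done in R with tolower() and gsub() before the paste() step.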

Can you explain this function and how does this work with examples?

I can't understand this function, even after checking a tutorial website for the functions I didn't know.
Here is the code:
def print_formatted(n):
    # your code goes here
    for i in range(1, n + 1):
        # column width: the number of binary digits needed for n
        pad = n.bit_length()
        dec = str(i).rjust(pad)
        octs = str(oct(i)[2:]).rjust(pad)           # [2:] strips the '0o' prefix
        hexx = str(hex(i)[2:]).rjust(pad).upper()   # [2:] strips the '0x' prefix
        bina = str(bin(i)[2:]).rjust(pad)           # [2:] strips the '0b' prefix
        print(f'{dec} {octs} {hexx} {bina}')
Thanking you in advance!
This is the output it gave when called upon with n = 17
    1     1     1     1
    2     2     2    10
    3     3     3    11
    4     4     4   100
    5     5     5   101
    6     6     6   110
    7     7     7   111
    8    10     8  1000
    9    11     9  1001
   10    12     A  1010
   11    13     B  1011
   12    14     C  1100
   13    15     D  1101
   14    16     E  1110
   15    17     F  1111
   16    20    10 10000
   17    21    11 10001
So how does the code above make sure that the columns come out right-aligned, rather than giving unaligned output like this:
1 1 1 1
2 2 2 10
3 3 3 11
4 4 4 100
5 5 5 101
6 6 6 110
7 7 7 111
8 10 8 1000
9 11 9 1001
10 12 A 1010
11 13 B 1011
12 14 C 1100
13 15 D 1101
14 16 E 1110
15 17 F 1111
16 20 10 10000
17 21 11 10001
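The alignment comes from two pieces. `n.bit_length()` gives the number of binary digits in n (5 for 17), which is the width of the widest value in the table, and `str.rjust(pad)` pads each value with spaces on the left until it is that wide. Below is a small sketch of the same logic that returns the rows instead of printing them, so the padding is easier to inspect; `formatted_rows` is a name made up for this illustration.

```python
def formatted_rows(n):
    """Return the lines print_formatted(n) would print, as a list of strings."""
    pad = n.bit_length()  # width of n written in binary, the widest column
    rows = []
    for i in range(1, n + 1):
        # decimal, octal, upper-case hex and binary, without the 0o/0x/0b prefixes
        cells = [str(i), oct(i)[2:], hex(i)[2:].upper(), bin(i)[2:]]
        # rjust(pad) left-pads every cell with spaces to the same width
        rows.append(" ".join(cell.rjust(pad) for cell in cells))
    return rows


print((17).bit_length())    # 5, because 17 is 10001 in binary
print(repr("A".rjust(5)))   # '    A'
for row in formatted_rows(17)[-2:]:
    print(repr(row))
```

Without the rjust calls each value would be printed at its natural width, which gives the ragged second listing.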

Find if a value exists in a Google sheet on a certain column, in all the rows above the current row based on 2 criteria

I have the following scenario:
columns A-Z and 100 rows
in column Z of each row, I want to find out whether the value in column A of the current row exists in column A of any row above
if it exists, I would like to check whether column B of the matching rows is filled in with a value
for all matching rows, I would like to receive the result as an array list rather than as rows, or at least be able to put in a value like "matching"/"not matching"
this should be an array formula
I've tried something like this, only for the first criterion, but somehow it checks only the current row.
=ARRAYFORMULA( IF(ROW(Z2:Z)>2, IF(MATCH(A2:A,$A$2:A&ROW(A2:A)-1),"matching","not matching"),"not matching"))
I check whether it's the first row (as it has headers); if it is, there can't be any matching data above it.
It would be great to have this as a Google Sheets formula, but if that's not possible, a Google Apps Script would also work.
Try this:
function myfunk() {
  const ss = SpreadsheetApp.getActive();
  const sh = ss.getSheetByName("Sheet0"); // data sheet
  const osh = ss.getSheetByName("Sheet1"); // output sheet
  osh.clearContents();
  const dsr = 2; // data start row (row 1 holds the headers)
  const vs = sh.getRange(dsr, 1, sh.getLastRow() - dsr + 1, sh.getLastColumn()).getDisplayValues();
  let o = [];
  vs.forEach((r, i) => {
    if (i > 0) { // the first data row has nothing above it
      let as = vs.map(r => r[0]).slice(0, i); // column A values above this row (suggested by DoubleUnary)
      let bs = vs.map(r => r[1]).slice(0, i); // column B values above this row (suggested by DoubleUnary)
      let idx = as.indexOf(r[25]); // first row above whose column A equals this row's column Z value
      if (~idx && bs[idx]) { // found, and column B of that row is not empty
        o.push(['yes', dsr + i, dsr + idx, r[25], bs[idx]])
      } else {
        o.push(['no', dsr + i, ~idx ? as[idx] : '', r[25], ~idx ? bs[idx] : '']);
      }
    }
  });
  o.unshift(['Value', 'Test Row', 'Result Row', 'Z value', 'B value'])
  Logger.log(JSON.stringify(o));
  osh.getRange(1, 1, o.length, o[0].length).setValues(o);
}
My Data:
COL1  COL2  COL3  COL4  COL5  COL6  COL7  COL8  COL9  COL10 COL11 COL12 COL13 COL14 COL15 COL16 COL17 COL18 COL19 COL20 COL21 COL22 COL23 COL24 COL25 COL26
4     4     8     18    3     15    15    6     6     18    2     10    19    14    5     16    3     6     0     13    15    14    10    13    19    7
14    5     18    12    12    3     5     5     12    0     0     4     19    17    13    14    2     6     2     0     18    15    16    1     1     15
14    8     18    19    18    19    14    11    9     2     12    4     19    8     7     17    2     5     17    12    3     18    6     15    12    17
12    15    1     11    2     14    4     12    15    4     2     7     13    12    4     10    0     2     9     2     15    12    18    7     10    6
15    8     3     11    3     11    8     2     0     12    18    12    17    3     3     10    5     18    0     6     19    12    11    2     3     5
16    16    7     14    12    3     1     9     0     1     9     4     17    11    18    2     4     16    13    4     1     3     4     13    9     8
11    18    9     9     10    17    6     16    8     10    15    10    18    1     2     9     10    18    13    0     11    4     7     2     0     18
3     5     1     5     18    17    4     8     2     4     10    13    7     10    9     6     3     7     5     7     12    12    6     0     3     7
3     3     19    4     2     5     0     9     5     14    0     2     15    9     18    6     1     15    5     5     1     12    4     7     9     3
19    19    15    16    12    18    13    0     12    4     12    4     1     8     19    2     1     1     8     14    6     10    0     16    14    14
10    8     3     15    5     13    9     13    10    6     16    2     15    3     2     16    19    2     14    1     10    1     1     5     5     10
8           3     8     17    13    15    8     9     6     4     2     14    6     4     1     6     14    8     9     11    12    3     18    5     14
9     18    2     12    17    2     17    10    0     11    7     11    2     0     11    15    6     7     13    10    18    17    6     19    12    14
15    7     12    5     0     17    15    2     2     18    6     7     13    1     10    19    9     7     13    15    13    7     18    11    13    10
8     1     10    5     17    9     9     5     14    3     3     1     19    7     13    0     5     10    2     12    17    3     12    9     0     10
9     15    6     14    18    1     3     6     4     9     19    4     9     15    11    0     3     10    19    5     18    16    10    4     4     4
1     1     6     8     10    9     8     19    4     11    18    12    14    8     4     5     11    8     17    5     7     13    13    16    14    8
7     14    7     18    9     3     11    0     1     7     19    8     6     3     4     4     2     4     11    3     7     5     5     9     16    15
7     6     4     6     7     17    8     13    10    2     9     18    0     13    12    4     13    9     4     19    4     7     10    17    1     5
5     3     7     12    3     19    19    7     1     11    9     9     9     7     5     6     8     7     0     11    19    6     17    12    1     18
(The COL2 cell in the 12th data row, sheet row 13, is empty.)
Results:
Value  Test Row  Result Row  Z value  B value
no     3                     15
no     4                     17
no     5                     6
no     6                     5
no     7                     8
no     8                     18
no     9                     7
yes    10        9           3        5
yes    11        3           14       5
no     12                    10
yes    13        3           14       5
yes    14        3           14       5
yes    15        12          10       8
yes    16        12          10       8
yes    17        2           4        4
no     18        8           8
yes    19        6           15       8
no     20                    5
no     21                    18
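Note that the script above looks up each row's column Z value in column A of the rows above, and only inspects the first match. If the goal is the one stated in the question (does this row's column A value occur in column A above, on a row where column B is filled in?), the check can be done in a single pass. Here is a minimal sketch of that logic in plain Python rather than Apps Script; `flag_rows` is a hypothetical helper, and the sample pairs are the first five column A / column B values from the data above.

```python
def flag_rows(rows):
    """rows: (a_value, b_value) pairs from top to bottom.

    Returns one flag per row: 'matching' if the row's A value already
    appeared in an earlier row whose B value was filled in.
    """
    seen_with_b = set()  # A values seen so far on rows with a non-empty B
    flags = []
    for a, b in rows:
        flags.append("matching" if a in seen_with_b else "not matching")
        if b not in ("", None):  # only rows with B filled in count as sources
            seen_with_b.add(a)
    return flags


sample = [(4, 4), (14, 5), (14, 8), (12, 15), (15, 8)]
for pair, flag in zip(sample, flag_rows(sample)):
    print(pair, flag)
```

Because the set is updated only after a row has been flagged, a row is never compared with itself, which is the "rows above" part of the requirement.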

Google script to append multiple rows - select columns

How can I append multiple rows from a source sheet to a destination sheet, selecting only certain columns such as A, C and H?
The source sheet has columns A-Z and multiple rows.
The destination sheet should receive the appended rows from the source sheet, with columns A, C and H only.
Thank you
Transfer Data from one sheet to another selecting only columns A,C and H
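function xferdata() {
  const ss = SpreadsheetApp.getActive();
  const ssh = ss.getSheetByName("Sheet0"); // source sheet
  const dsh = ss.getSheetByName("Sheet1"); // destination sheet
  // Read columns A to H of every data row (row 1 holds the headers)
  const vs = ssh.getRange(2,1,ssh.getLastRow() - 1, 8).getValues();
  // Keep only the 1st, 3rd and 8th value of each row (columns A, C and H)
  let o = vs.map(([a,,c,,,,,h]) => [a,c,h]);
  //Logger.log(JSON.stringify(o));
  // Append below the last used row of the destination sheet
  dsh.getRange(dsh.getLastRow() + 1,1,o.length,o[0].length).setValues(o);
}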
Sheet0:
COL1  COL2  COL3  COL4  COL5  COL6  COL7  COL8
1     2     3     4     5     6     7     8
2     3     4     5     6     7     8     9
3     4     5     6     7     8     9     10
4     5     6     7     8     9     10    11
5     6     7     8     9     10    11    12
6     7     8     9     10    11    12    13
7     8     9     10    11    12    13    14
8     9     10    11    12    13    14    15
9     10    11    12    13    14    15    16
10    11    12    13    14    15    16    17
11    12    13    14    15    16    17    18
12    13    14    15    16    17    18    19
13    14    15    16    17    18    19    20
14    15    16    17    18    19    20    21
15    16    17    18    19    20    21    22
16    17    18    19    20    21    22    23
17    18    19    20    21    22    23    24
18    19    20    21    22    23    24    25
19    20    21    22    23    24    25    26
20    21    22    23    24    25    26    27
Sheet1:
A     B     C
1     3     8
2     4     9
3     5     10
4     6     11
5     7     12
6     8     13
7     9     14
8     10    15
9     11    16
10    12    17
11    13    18
12    14    19
13    15    20
14    16    21
15    17    22
16    18    23
17    19    24
18    20    25
19    21    26
20    22    27
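For readers unfamiliar with the destructuring in `vs.map(([a,,c,,,,,h]) => [a,c,h])`: it keeps the 1st, 3rd and 8th element of every row and skips the rest. A rough Python equivalent of just that step, with no spreadsheet access, might look like the sketch below; `pick_columns` is a made-up name and the two sample rows are the first two data rows of Sheet0.

```python
def pick_columns(rows, indexes=(0, 2, 7)):
    """Keep only the given zero-based positions (columns A, C, H by default) of each row."""
    return [[row[i] for i in indexes] for row in rows]


sheet0 = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 3, 4, 5, 6, 7, 8, 9],
]
print(pick_columns(sheet0))  # [[1, 3, 8], [2, 4, 9]]
```

The printed rows agree with the first two rows of Sheet1 above.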

Timed out script

I'm using the following code in order to avoid using an IMPORTRANGE() formula. The data I'm getting is over 50k rows so that's why I'm using App Script.
But now I'm getting the following error:
Exception: Service Spreadsheets timed out while accessing document with id
function Live_Data_importing() {
  var raw_live = SpreadsheetApp.openByUrl("someURL").getSheetByName("Sheet1");
  var selected_columns = raw_live.getRange("A:Q").getValues().map(([a,,,,e,f,g,,i,j,,l,,n,o,]) => [a,e,f,g,i,j,l,n,o] );
  var FF_filtered = selected_columns.filter(row=>row[8]=="somedata");
  var FF_Hourly_live_data = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("somesheet");
  FF_Hourly_live_data.getRange(2, 1, FF_filtered.length, FF_filtered[0].length).setValues(FF_filtered);
}
How can I fix this error? I also made a copy of the file, but the script still throws the same error.
I tested this on a smaller set of data and it works okay. The main change is that the range now stops at the last row with data instead of covering the entire A:Q columns.
function Live_Data_importing() {
  const ss = SpreadsheetApp.getActive();
  var raw_live = ss.getSheetByName("Sheet3");
  // Read only up to the last row with data, not the whole A:Q columns
  var selected_columns = raw_live.getRange("A1:Q" + raw_live.getLastRow()).getValues().map(([a, , , , e, f, g, , i, j, , l, , n, o,]) => [a, e, f, g, i, j, l, n, o]);
  var FF_filtered = selected_columns.filter(row => row[8] == 0);//My data is all integers
  if (FF_filtered.length === 0) return; // nothing matched, so there is nothing to write
  var FF_Hourly_live_data = ss.getSheetByName("Sheet4");//output sheet
  FF_Hourly_live_data.getRange(2, 1, FF_filtered.length, FF_filtered[0].length).setValues(FF_filtered);
}
Data:
COL1  COL2  COL3  COL4  COL5  COL6  COL7  COL8  COL9  COL10 COL11 COL12 COL13 COL14 COL15 COL16 COL17 COL18 COL19 COL20
0     1     18    13    0     7     5     11    15    2     1     17    0     8     11    18    0     16    1     8
0     16    16    9     5     0     16    8     14    0     7     4     3     15    8     19    18    4     18    10
9     14    7     16    17    13    17    4     8     6     19    18    9     18    12    1     8     19    12    10
13    17    4     2     10    3     7     16    14    13    1     5     16    12    2     3     14    3     0     9
4     17    14    5     9     13    18    0     4     1     14    1     12    7     10    1     15    4     14    5
1     17    13    17    3     1     17    8     19    7     14    18    2     17    1     16    2     19    13    15
7     4     9     0     11    14    11    7     12    14    3     19    14    13    18    12    16    19    19    6
7     6     10    5     15    8     0     8     2     16    4     14    18    14    2     16    16    5     15    16
4     5     9     6     6     2     1     15    14    8     19    8     4     10    12    12    4     6     10    12
11    15    3     4     1     3     17    19    10    4     11    2     10    16    12    1     6     3     0     3
11    0     14    8     13    13    4     2     16    18    10    14    5     3     7     4     7     9     5     15
0     6     5     1     2     18    1     1     11    13    7     13    5     15    5     13    17    18    14    19
0     17    10    6     10    16    16    0     18    19    12    8     15    1     11    4     19    4     17    14
4     14    14    6     16    8     15    4     5     2     4     5     14    14    16    9     0     16    0     4
8     3     8     5     12    15    5     19    14    0     1     6     12    2     19    10    10    19    13    19
0     5     14    7     2     17    3     10    2     14    2     5     0     18    5     1     13    13    3     13
10    18    18    4     18    11    0     7     13    9     18    13    9     2     0     16    15    3     9     14
12    15    15    7     16    13    14    1     4     12    18    9     6     14    18    11    16    2     18    15
15    0     2     4     6     16    5     18    2     6     14    18    11    1     9     15    13    8     8     19
11    19    14    0     7     15    1     10    8     9     14    14    15    3     11    13    4     10    5     16

Selecting or Updating with 2 group criteria for a max value on mysql [duplicate]

I'm trying to figure out how to use GROUP BY and MAX() on joins correctly. I'm doing some parsing of a Moodle (open source school software) MySQL database. Students are allowed to retake the quizzes indefinitely for this particular program, but I need to be able to update the course completion date to reflect the last time they took the test, because a lot of other things depend on the completion fields.
The mdl_quiz_attempts table stores all attempts for all quizzes. The same userid appears in many rows, and the attempt number is not unique within the table; it is unique only in combination with the student and the quiz, so each student has multiple entries. In the mdl_course_modules table, the instance field is the key for the mdl_quiz table, and the coursemoduleid field of mdl_course_modules_completion is the key for mdl_course_modules.
So what I want to do is this:
given a student id,
UPDATE mdl_course_modules_completion.timemodified to mdl_quiz_attempts.timemodified
WHERE the row in mdl_quiz_attempts is the max attempt by userid for each quiz. (The quiz field in mdl_quiz_attempts has to be looked up through the instance field of the course modules table to get the course module id used by the completion table.)
Here are example partial tables.
mdl_quiz_attempts
id quiz userid attempt timemodified
2 1 3 6 1365408901
6 1 4 1 1369873688
7 2 4 1 1369877532
8 7 4 1 1369881431
9 7 4 2 1369882897
12 5 4 1 1505165504
13 6 4 1 1369887643
17 8 4 1 1369958105
18 1 4 2 1374557701
22 7 4 3 1374639901
23 6 4 7 1374640202
24 5 4 2 1374639901
25 8 4 2 1374639901
26 2 4 2 1374639301
27 2 6 1 1376620469
29 2 12 1 1389915486
30 1 23 1 1390978667
31 1 23 2 1391030924
32 2 23 1 1392113103
33 2 23 2 1392696602
34 2 23 3 1392767435
35 7 12 1 1398914256
36 8 43 1 1405281193
37 1 50 1 1405522411
38 5 43 1 1505165504
mdl_course_modules
id course module instance section
3 2 9 2 3
5 2 17 2 4
7 2 17 3 5
8 2 17 4 6
9 2 17 5 7
10 2 17 6 8
11 2 17 7 9
12 2 17 8 10
13 2 17 9 11
14 2 17 10 12
15 2 17 11 13
25 2 16 1 14
26 2 23 1 4
28 2 7 1 14
30 4 9 4 26
42 4 23 3 33
45 4 23 6 38
46 4 23 7 37
47 4 23 8 36
48 4 23 9 35
49 4 23 10 32
50 4 23 11 34
51 5 9 5 27
53 5 23 12 43
55 5 23 13 44
mdl_quiz
id name
10 Unit 10 Quiz
11 Unit 2 Quiz
12 Unit 3 Quiz
13 Unit 5 Quiz
14 Unit 1 Quiz
15 Unit 8 Quiz
16 Unit 9 Quiz
17 Unit 7 Quiz
18 Unit 4 Quiz
mdl_course_modules_completion
id coursemoduleid userid completionstate viewed timemodified
14 25 2 0 1 0
15 25 6 0 1 0
67 25 4 1 1 1369873688
68 28 4 1 0 1369874483
69 192 4 1 0 1369875233
70 184 4 1 1 1369877532
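Something like this?
-- The subquery finds the highest attempt number per student and quiz.
-- cm.instance can repeat across module types, so a filter on cm.module for the quiz type may also be needed.
update mdl_course_modules_completion c
join mdl_course_modules cm on cm.id = c.coursemoduleid
join mdl_quiz_attempts a on a.userid = c.userid and a.quiz = cm.instance
join (select userid, quiz, max(attempt) as max_attempt from mdl_quiz_attempts group by userid, quiz) m
  on m.userid = a.userid and m.quiz = a.quiz and m.max_attempt = a.attempt
set c.timemodified = a.timemodified
where c.userid = :<USER_ID>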
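To check the "latest attempt per student and quiz" part on its own before running an UPDATE, the grouping can be tried as a plain SELECT. The sketch below uses Python's built-in sqlite3 module with a few rows copied from the mdl_quiz_attempts sample in the question. SQLite stands in for MySQL here purely for illustration, and joining against a grouped subquery is one common way to pick the row with the maximum value per group, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mdl_quiz_attempts "
    "(id INTEGER, quiz INTEGER, userid INTEGER, attempt INTEGER, timemodified INTEGER)"
)
# A subset of the sample rows from the question
conn.executemany(
    "INSERT INTO mdl_quiz_attempts VALUES (?, ?, ?, ?, ?)",
    [
        (6, 1, 4, 1, 1369873688),
        (8, 7, 4, 1, 1369881431),
        (9, 7, 4, 2, 1369882897),
        (18, 1, 4, 2, 1374557701),
        (22, 7, 4, 3, 1374639901),
        (30, 1, 23, 1, 1390978667),
        (31, 1, 23, 2, 1391030924),
    ],
)

# For one student: the row holding the highest attempt number of each quiz
LATEST_ATTEMPTS = """
SELECT a.userid, a.quiz, a.attempt, a.timemodified
FROM mdl_quiz_attempts AS a
JOIN (
    SELECT userid, quiz, MAX(attempt) AS max_attempt
    FROM mdl_quiz_attempts
    GROUP BY userid, quiz
) AS m
  ON m.userid = a.userid AND m.quiz = a.quiz AND m.max_attempt = a.attempt
WHERE a.userid = ?
ORDER BY a.quiz
"""

latest = conn.execute(LATEST_ATTEMPTS, (4,)).fetchall()
print(latest)  # [(4, 1, 2, 1374557701), (4, 7, 3, 1374639901)]
```

Grouping by userid alone would return a single maximum per student across all quizzes, which is why the subquery groups by both userid and quiz.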