I have the following CSV structure:
'Country' '1960' '1961' '1962'
AUS 450 567 723
NZ 125 320
IND 350 375 395
SL
PAK 100 115 218
Using Python pandas, how do I convert (transpose) the above structure to the following?
'Country' 'Year' 'Value'
AUS 1960 450
AUS 1961 567
AUS 1962 723
NZ 1960
NZ 1961 125
...
My attempts at using pivot have been futile.
In [19]: df
Out[19]:
year 1960 1961 1962
Country
AUS 450 567 723
NZ NaN 125 320
IND 350 375 395
SL NaN NaN NaN
PAK 100 115 218
In [20]: df.stack().reset_index()
Out[20]:
Country year 0
0 AUS 1960 450
1 AUS 1961 567
2 AUS 1962 723
3 NZ 1961 125
4 NZ 1962 320
5 IND 1960 350
6 IND 1961 375
7 IND 1962 395
8 PAK 1960 100
9 PAK 1961 115
10 PAK 1962 218
Apparently the NaN values are dropped along the way.
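For reference, a minimal sketch of one way to keep those NaN rows: rebuild the wide frame from the question and use melt, which does not drop missing values (stack(dropna=False) is another option):
import numpy as np
import pandas as pd

# Rebuild the wide frame shown above: Country as the index, years as columns.
df = pd.DataFrame(
    {"1960": [450, np.nan, 350, np.nan, 100],
     "1961": [567, 125, 375, np.nan, 115],
     "1962": [723, 320, 395, np.nan, 218]},
    index=pd.Index(["AUS", "NZ", "IND", "SL", "PAK"], name="Country"),
)

# melt keeps the NaN cells that stack() drops by default.
long_df = (
    df.reset_index()
      .melt(id_vars="Country", var_name="Year", value_name="Value")
      .sort_values(["Country", "Year"])
      .reset_index(drop=True)
)
print(long_df)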
My table looks like the following:
id person counter
1 Ona 4946
2 Mayra 15077
3 Claire 496
4 Rita 13929
5 Demond 579
6 Winnifred 13580
7 Green 1734
8 Jacquelyn 19092
9 Aisha 5572
10 Kian 8826
11 Alexandrea 7514
12 Dalton 14151
13 Rossie 18403
14 Carson 19537
15 Mason 2022
16 Emie 2394
17 Jonatan 6655
18 June 5037
19 Jazmyn 10856
20 Mittie 18928
Here is the fiddle.
I would like to select the top 5 by counter, grouped by the first character. Here is the SQL that I tried:
SELECT SUBSTR(person,1,1) AS Alpha, person, counter
FROM myTable
GROUP BY SUBSTR(person,1,1)
ORDER BY SUBSTR(person,1,1) ASC, counter DESC;
How can I select the desired result, like the following:
alpha person counter
a Arvid 9236
a Aisha 5572
a Alf 4000
a Ahmad 3500
a Alvin 2100
b Brandon 13000
b Ben 8230
b Bonny 7131
b Bella 4120
b Bun 1200
c Connie 9320
c Calvin 8310
c Camalia 6123
c Cimon 3419
c Clay 2515
I'm using MySQL 8.0.
You can do:
select *
from (
    select *,
           row_number() over (partition by substr(person, 1, 1)
                              order by counter desc) as rn
    from myTable
) x
where rn <= 5
order by substr(person, 1, 1), rn
Result:
id person counter rn
---- ---------- -------- --
153 Alf 19758 1
283 Alycia 19706 2
260 Abe 19463 3
223 Assunta 18808 4
300 Ari 18031 5
210 Bennie 18309 1
159 Barry 18281 2
128 Beulah 18080 3
314 Benny 16795 4
474 Barry 15789 5
342 Casandra 19656 1
14 Carson 19537 2
67 Chaim 19429 3
280 Colin 18507 4
500 Corbin 18433 5
380 Daphney 19138 1
234 Dejah 18781 2
241 Derrick 18722 3
49 Dasia 18562 4
312 Darrel 17903 5
163 Evalyn 19847 1
79 Ernestine 19523 2
344 Emilie 19520 3
371 Eva 19119 4
469 Emma 18403 5
140 Fiona 19522 1
216 Flo 18314 2
356 Frieda 16082 3
254 Floy 15942 4
54 Florencio 12739 5
447 Geoffrey 19858 1
327 Geoffrey 19223 2
335 Grant 19100 3
454 Giuseppe 16175 4
83 Gardner 15235 5
373 Hilario 19507 1
35 Hanna 19276 2
200 Halle 18150 3
491 Hailee 17521 4
411 Hermann 17018 5
21 Idella 7440 1
177 Izabella 5536 2
115 Isai 4164 3
412 Izabella 2112 4
275 Imani 573 5
195 Joannie 19374 1
8 Jacquelyn 19092 2
48 Jalon 18861 3
251 Jamie 18768 4
367 Joanny 17600 5
282 Kendra 19278 1
421 Kendra 19213 2
363 Kaylin 18977 3
96 Kaylie 18423 4
310 Katrine 17754 5
146 Lonzo 19778 1
194 Leonora 18258 2
399 Laurine 16847 3
137 Leslie 16718 4
190 Luther 16318 5
87 Maegan 19112 1
20 Mittie 18928 2
271 Mariana 18149 3
317 Mary 18043 4
305 Maybelle 17666 5
281 Noelia 19203 1
176 Nickolas 19047 2
408 Nelson 15901 3
142 Nasir 13700 4
366 Nicole 10694 5
423 Ova 19759 1
487 Osborne 19539 2
438 Ozella 18911 3
375 Ora 18270 4
414 Onie 17358 5
52 Pascale 19658 1
39 Pearlie 17621 2
364 Price 14177 3
161 Precious 10337 4
294 Paula 9162 5
70 Quincy 18343 1
73 Quincy 16631 2
192 Quentin 13578 3
131 Rodger 19776 1
231 Royal 19033 2
313 Rocky 19008 3
13 Rossie 18403 4
45 Rosanna 15992 5
418 Sydnee 19810 1
470 Sadie 19189 2
123 Shanna 18862 3
485 Savanah 18664 4
302 Steve 16412 5
406 Toney 18283 1
28 Tremaine 16400 2
98 Taurean 15911 3
278 Tremaine 14391 4
311 Treva 14026 5
239 Ubaldo 11630 1
78 Valentina 17736 1
458 Vita 17527 2
170 Vergie 16971 3
158 Vance 15089 4
272 Veronica 12027 5
102 Willis 18155 1
329 Ward 14919 2
156 Westley 14867 3
136 Winnifred 14315 4
6 Winnifred 13580 5
323 Yolanda 17920 1
155 Yesenia 6164 2
402 Zachary 19129 1
37 Zaria 5398 2
See running example at DB Fiddle.
I need to replace the duplicated values in the market_offers column with each row's share, so that summing the shares over all entries in a group gives back the original value.
But the rank and country codes can differ, so:
input
country_code rank store_id category_id offers market_offers
se 1 14582 1106 410 504860
se 1 1955 1294 2 504860
se 1 9831 1158 151 504860
se 2 666 11158 536 4000
se 2 6587 25863 6586 4000
se 2 6666 158 536 4000
se 5 65853 76722 1521 302
se 5 6587 25863 6586 302
expected result
country_code rank store_id category_id offers market_offers
se 1 14582 1106 410 168286
se 1 1955 1294 2 168286
se 1 9831 1158 151 168286
se 2 666 11158 536 1333
se 2 6587 25863 6586 1333
se 2 6666 158 536 1333
se 5 65853 76722 1521 151
se 5 6587 25863 6586 151
Consider below
select * except(market_offers),
round(market_offers / count(1) over(partition by market_offers, rank), 2) as market_offers
from `project.dataset.table`
If applied to the sample data in your question, the output matches the expected result above, with market_offers rounded to two decimal places (168286.67, 1333.33, 151.0).
How can I get this: the last name (nom), first name (prenom) and age of the competitors that participated in all competitions? I am having difficulties with COUNT and JOIN.
My user table:
id  nom          prenom     login            age
1   Wehner       Einar      kleinviola       79
2   Beer         Cierra     earnestinelebsa  71
3   Gina         Lucien     cassindagmar     97
4   Maybelle     Delphine   haleypredovic    91
5   Upton        Elwyn      sstreich         63
6   Irwin        Prof.      christopframi    25
7   Ernser       Clint      cesar65          83
8   Bechtelar    Sheila     sofiasawayn      77
9   Simonis      Remington  christafahey     35
10  Parisian     Octavia    swiftsage        89
11  Predovic     Rory       bartolettisabri  78
12  Will         Sven       price66          20
13  O'Hara       Zoey       tiffanywillms    96
14  McGlynn      Julie      gkoss            74
15  Walter       Maximus    amandajenkins    63
16  Hahn         Andrew     drutherford      77
17  Kunze        Elinore    ziemanntheron    95
18  Ursula       Evelyne    collierodessa    64
19  Klein        Kirsten    darrellrunolfss  96
20  Chester      Lucien     jamey55          24
21  Darron       Antoine    justina27        60
22  Boyer        Harvey     hesseljameson    45
23  Jade         Lucien     kpagac           29
24  Eliane       Delphine   delphahessel     75
25  Lang         Shanna     sophia73         23
26  Wilderman    Fredrick   shaina75         34
27  Daniel       Emie       alene73          86
28  Daniel       Rhoda      foster22         63
29  Trantow      Tommie     boconner         40
30  Kerluke      Adolf      vstanton         74
31  Sehoubo      David      davidshbo        20
32  dfglskdsklj  dfvdvf     dfgdfg           0
My competitors table:
id_competitor  id_concours
1              1
2              1
3              1
4              1
5              1
6              1
7              1
8              1
31             1
9              2
10             2
11             2
12             2
13             2
14             2
15             2
16             2
17             2
18             2
31             2
1              3
2              3
3              3
4              3
5              3
19             3
20             3
31             3
2              4
4              4
6              4
8              4
10             4
12             4
14             4
16             4
18             4
20             4
1              5
3              5
5              5
7              5
9              5
11             5
13             5
15             5
17             5
19             5
My competitons table:
id  date_debut           date_fin             descriptif                              theme                                etat
1   2019-01-01 00:00:00  2019-03-01 00:00:00  Le premier concours de la plateforme    Les zinzins de l'espace              4
2   2018-01-01 00:00:00  2018-02-01 00:00:00  Le deuxième concours de la plateforme   Outils                               4
3   2020-04-01 00:00:00  2020-05-01 00:00:00  Le troisième concours de la plateforme  Voiture sur autoroute                2
4   2018-07-01 00:00:00  2018-08-11 00:00:00  Le quatrième concours de la plateforme  Naruto Uzumaki                       3
5   2018-10-01 00:00:00  2018-11-01 00:00:00  Le cinquième concours de la plateforme  Le grand peuple au dessus de la mer  4
This should return the name, first name and age of all users that participated in ALL competitions:
SELECT nom, prenom, age
FROM user
WHERE (SELECT count(DISTINCT id_concours)
       FROM competitors
       WHERE id_competitor = user.id) = (SELECT count(*) FROM competitons);
I am trying to parse the first table located here using BeautifulSoup in Python. It parsed my first column, but for some reason it didn't parse the entire table. Any help is appreciated!
Note: I am trying to parse the entire table and convert it into a pandas DataFrame.
My Code:
import requests
from bs4 import BeautifulSoup
WIKI_URL = requests.get("https://en.wikipedia.org/wiki/NCAA_Division_I_FBS_football_win-loss_records").text
soup = BeautifulSoup(WIKI_URL, features="lxml")
print(soup.prettify())
my_table = soup.find('table',{'class':'wikitable sortable'})
links=my_table.findAll('a')
print(links)
It only parsed one column because you did a findAll for only the items in the first column. To parse the entire table you'd have to do a findAll for the table rows <tr> and then a findAll within each row for the table data cells <td>. Right now you are just doing a findAll for the links and then printing them.
my_table = soup.find('table', {'class': 'wikitable sortable'})
for row in my_table.findAll('tr'):
    print(','.join([td.get_text(strip=True) for td in row.findAll('td')]))
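Since the note above says the goal is a pandas DataFrame, here is a minimal sketch that builds one from the scraped rows; it assumes the wikitable's header cells sit in <th> tags (typical Wikipedia markup, not verified against this exact page):
import requests
import pandas as pd
from bs4 import BeautifulSoup

html = requests.get("https://en.wikipedia.org/wiki/NCAA_Division_I_FBS_football_win-loss_records").text
soup = BeautifulSoup(html, features="lxml")
my_table = soup.find('table', {'class': 'wikitable sortable'})

# Header cells are assumed to sit in <th> tags of the first row.
headers = [th.get_text(strip=True) for th in my_table.find('tr').findAll('th')]

# Data rows: skip the header row and keep only rows that contain <td> cells.
rows = [[td.get_text(strip=True) for td in tr.findAll('td')]
        for tr in my_table.findAll('tr')[1:]]
rows = [r for r in rows if r]

df = pd.DataFrame(rows)
# Only label the columns when every row matches the header count.
if rows and headers and all(len(r) == len(headers) for r in rows):
    df.columns = headers
print(df.head())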
NOTE: Accept B.Adler's solution as it is good work and sound advice. This solution is simply so you can see some alternatives as you are learning.
Whenever I see <table> tags, I'll usually check out pandas first to see if I can find what I need from the tables that way. pd.read_html() will return a list of dataframes, and you can work/manipulate those to extract what you need.
import pandas as pd
WIKI_URL = "https://en.wikipedia.org/wiki/NCAA_Division_I_FBS_football_win-loss_records"
tables = pd.read_html(WIKI_URL)
You can also look through the dataframes to see which one has the data you want.
I just used the dataframe at index position 2 for this one, which is the first table you were looking for:
table = tables[2]
Output:
print (table)
0 1 ... 6 7
0 Team Won ... Total Games Conference
1 Michigan 953 ... 1331 Big Ten
2 Ohio State 1 911 ... 1289 Big Ten
3 Notre Dame 2 897 ... 1263 Independent
4 Boise State 448 ... 618 Mountain West
5 Alabama 3 905 ... 1277 SEC
6 Oklahoma 896 ... 1274 Big 12
7 Texas 908 ... 1311 Big 12
8 USC 4 839 ... 1239 Pac-12
9 Nebraska 897 ... 1325 Big Ten
10 Penn State 887 ... 1319 Big Ten
11 Tennessee 838 ... 1281 SEC
12 Florida State 5 544 ... 818 ACC
13 Georgia 819 ... 1296 SEC
14 LSU 797 ... 1259 SEC
15 Appalachian State 617 ... 981 Sun Belt
16 Georgia Southern 387 ... 616 Sun Belt
17 Miami (FL) 630 ... 1009 ACC
18 Auburn 759 ... 1242 SEC
19 Florida 724 ... 1182 SEC
20 Old Dominion 76 ... 121 C-USA
21 Coastal Carolina 112 ... 180 Sun Belt
22 Washington 735 ... 1234 Pac-12
23 Clemson 744 ... 1248 ACC
24 Virginia Tech 743 ... 1262 ACC
25 Arizona State 614 ... 1032 Pac-12
26 Texas A&M 741 ... 1270 SEC
27 Michigan State 701 ... 1204 Big Ten
28 West Virginia 750 ... 1292 Big 12
29 Miami (OH) 690 ... 1195 MAC
.. ... ... ... ... ...
101 Memphis 482 ... 1026 The American
102 Kansas 582 ... 1271 Big 12
103 Wyoming 526 ... 1122 Mountain West
104 Louisiana 510 ... 1098 Sun Belt
105 Colorado State 520 ... 1124 Mountain West
106 Connecticut 508 ... 1107 The American
107 SMU 489 ... 1083 The American
108 Oregon State 530 ... 1173 Pac-12
109 UTSA 38 ... 82 C-USA
110 Kansas State 526 ... 1207 Big 12
111 New Mexico 483 ... 1103 Mountain West
112 Temple 468 ... 1094 The American
113 Iowa State 524 ... 1214 Big 12
114 Tulane 520 ... 1197 The American
115 Northwestern 535 ... 1240 Big Ten
116 UAB 126 ... 284 C-USA
117 Rice 470 ... 1108 C-USA
118 Eastern Michigan 453 ... 1089 MAC
119 Louisiana-Monroe 304 ... 727 Sun Belt
120 Florida Atlantic 87 ... 205 C-USA
121 Indiana 479 ... 1195 Big Ten
122 Buffalo 370 ... 922 MAC
123 Wake Forest 450 ... 1136 ACC
124 New Mexico State 430 ... 1090 Independent
125 UTEP 390 ... 1005 C-USA
126 UNLV11 228 ... 574 Mountain West
127 Kent State 341 ... 922 MAC
128 FIU 64 ... 191 C-USA
129 Charlotte 20 ... 65 C-USA
130 Georgia State 27 ... 94 Sun Belt
[131 rows x 8 columns]
I have a table like this.
id day1 day2 day3
1 411 523 223
2 413 554 245
3 417 511 209
4 420 515 232
5 422 522 212
6 483 567 212
7 456 512 256
8 433 578 209
9 438 532 234
10 418 555 223
11 460 510 263
12 453 509 245
13 441 524 233
14 430 543 261
15 456 582 222
16 444 524 241
17 478 511 211
18 421 583 222
I want to select all the IDs that have duplicate values in day2.
I'm doing
select day2, count(*) from resultater group by day2 having count(*) > 1;
Is it possible to list all the IDs within the groups?
select day2, count(*), group_concat(id)
from resultater
group by day2
having count(*) > 1;
should do the trick.