Using BeautifulSoup to return the value shown in a table on a webpage (pandas read_html)

I want to return only the price shown on a grocery retailer's website.
I have scraped the table on the website, but I want each cell in the dataframe to hold only the delivery price. My idea is to filter each cell and return a regex match for a price within the cell's string. I'm not sure if there's a simpler way to do this, perhaps with pd.read_html?
import requests
import pandas as pd
from bs4 import BeautifulSoup
postcode = 'l4 0th'
payload = {'postcode': postcode}
putUrl = 'https://www.sainsburys.co.uk/gol-api/v1/customer/postcode'
Sains_url = 'https://www.sainsburys.co.uk/shop/PostCodeCheckSuccessView'
Sains_url2 = 'https://www.sainsburys.co.uk/shop/BookingDeliverySlotDisplayView'
client = requests.Session()
PutReq = client.put(putUrl, data=payload)
rget = client.get(Sains_url)
r2 = client.get(Sains_url2)
soup = BeautifulSoup(r2.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table), skiprows=([1]))[0]
df = df[~df.Time.str.contains("Afternoon delivery")]
df = df[~df.Time.str.contains("Evening delivery")]
My dataframe should look like this:
+-------------+----------------+-------------+-------------+
| Time | Today | Wed 26 June | Thu 27 June |
+-------------+----------------+-------------+-------------+
| 7.30-8:30am | Not Available | £3 | £5 |
+-------------+----------------+-------------+-------------+

IIUC, you could do some post-processing with a regex and applymap:
import re
pat = re.compile(r'£\S+')
# This regex matches '£' and every following character
# up to the next whitespace
df.applymap(lambda x: re.findall(pat, str(x))[0] if '£' in str(x) else x)
[out]
Time Today Wed 26 Jun Thu 27 Jun Fri 28 Jun \
0 7:30am - 8:30am Not Available Not Available £4.50 £7
1 8:00am - 9:00am Not Available £3 £5.50 £6
2 8:30am - 9:30am Not Available £3 £5.50 £6
3 9:00am - 10:00am Not Available £3 £4.50 £6
4 9:30am - 10:30am Not Available £3 £4.50 £6
5 10:00am - 11:00am Not Available £2.50 £3.50 £5
6 11:00am - 12:00pm Not Available £1.50 £2.50 £4
8 12:00pm - 1:00pm Not Available £1 £2 £3
9 1:00pm - 2:00pm Not Available £0.50 £2 £2.50
10 2:00pm - 3:00pm Not Available £0.50 £3 £2.50
11 3:00pm - 4:00pm Not Available £0.50 £3 £3.50
12 4:00pm - 5:00pm Not Available £1 £3 £4.50
13 4:30pm - 5:30pm Not Available £1 £3 £4.50
15 5:00pm - 6:00pm Not Available £1 £3.50 £4.50
16 5:30pm - 6:30pm Not Available £1 £3.50 £4.50
17 6:00pm - 7:00pm Not Available Not Available £2.50 £4
18 6:30pm - 7:30pm Not Available Not Available £2.50 £4
19 7:00pm - 8:00pm Not Available Not Available £2.50 £4
20 7:30pm - 8:30pm Not Available Not Available £2.50 £4
21 8:00pm - 9:00pm Not Available Not Available £1.50 £2
22 9:00pm - 10:00pm Not Available £1.50 £1 £1.50
23 10:00pm - 11:00pm Not Available £1 £0.50 £1.50
Sat 29 Jun Sun 30 Jun Mon 1 Jul
0 £6.50 Not Available £5.50
1 £7 £7 £5.50
2 £7 £7 £5.50
3 £7 £7 £5
4 £7 £7 £5
5 £5.50 £5.50 £4.50
6 £5.50 £5 £2.50
8 £3.50 £3.50 £2
9 £3 £3.50 £1.50
10 £3 £2.50 £3
11 £3.50 £3 £2.50
12 £3.50 £3.50 £4
13 £3.50 £3.50 £4
15 £3 £2.50 £4
16 £3 £2.50 £4
17 £3 £3 £3
18 £3 £3 £3
19 £3 £3 £3
20 £3 £3 £3
21 £2 £2 £1
22 £2 £2 £1
23 Not Available Not Available £0.50
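If lambdas aren't your thing, this is equivalent to the more explicit:
def extract_cost(value):
    string = str(value)  # cells may hold non-string values such as NaN
    if '£' in string:
        return re.findall(r'£\S+', string)[0]
    else:
        return value
df.applymap(extract_cost)
Here applymap simply applies the function extract_cost to every value in the DataFrame. (In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which is used the same way.)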
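To see what the pattern does to a single cell without hitting the website, here is a minimal standard-library sketch. The sample cell strings and the helper name extract_price are made up for illustration (the real cell text on the site may differ), and the pattern is a slightly stricter variant of '£\S+' that only accepts digits and optional pence:

```python
import re

# Stricter than '£\S+': a pound sign, digits, and optional pence.
PRICE = re.compile(r"£\d+(?:\.\d{1,2})?")

def extract_price(cell):
    """Return the first price found in the cell, or the cell unchanged."""
    match = PRICE.search(str(cell))
    return match.group(0) if match else cell

# Hypothetical cell contents, for illustration only.
cells = [
    "Book delivery for £4.50 between 7:30am and 8:30am",
    "Not Available",
    "£3",
]
prices = [extract_price(c) for c in cells]
print(prices)
```

A function like this can be passed to applymap in place of the lambda; the stricter pattern avoids picking up trailing punctuation that '£\S+' would include.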

MySQL Get data by previous quarter

I am trying to find the last entry for the previous year's quarter.
All I have access to is the year (e.g. 2021) and the quarter (e.g. 1).
Here is the data in my database:
| id | name           | start      | end        |
| -- | -------------- | ---------- | ---------- |
| 16 | April 2021     | 2021-04-01 | 2021-04-30 |
| 15 | March 2021     | 2021-03-01 | 2021-03-31 |
| 14 | February 2021  | 2021-02-01 | 2021-02-28 |
| 57 | November 2020  | 2020-11-01 | 2020-11-30 |
| 55 | October 2020   | 2020-10-01 | 2020-10-31 |
| 29 | September 2020 | 2020-09-01 | 2020-09-30 |
| 27 | July 2020      | 2020-07-01 | 2020-07-31 |
| 24 | April 2020     | 2020-04-01 | 2020-04-30 |
| 23 | March 2020     | 2020-03-01 | 2020-03-31 |
| 22 | February 2020  | 2020-02-01 | 2020-02-29 |
| 21 | January 2020   | 2020-01-01 | 2020-01-31 |
Using the MySQL quarter function I can get it to print out the quarter as an integer in another column:
SET @given_year = 2021;
SET @given_quarter = 1;
SELECT
    id, name, start, end, QUARTER(end) as "Q"
FROM
    submissions
| id | name           | start      | end        | Q |
| -- | -------------- | ---------- | ---------- | - |
| 16 | April 2021     | 2021-04-01 | 2021-04-30 | 2 |
| 15 | March 2021     | 2021-03-01 | 2021-03-31 | 1 |
| 14 | February 2021  | 2021-02-01 | 2021-02-28 | 1 |
| 57 | November 2020  | 2020-11-01 | 2020-11-30 | 4 |
| 55 | October 2020   | 2020-10-01 | 2020-10-31 | 4 |
| 29 | September 2020 | 2020-09-01 | 2020-09-30 | 3 |
| 27 | July 2020      | 2020-07-01 | 2020-07-31 | 3 |
| 24 | April 2020     | 2020-04-01 | 2020-04-30 | 2 |
| 23 | March 2020     | 2020-03-01 | 2020-03-31 | 1 |
| 22 | February 2020  | 2020-02-01 | 2020-02-29 | 1 |
| 21 | January 2020   | 2020-01-01 | 2020-01-31 | 1 |
I tried using WHERE and LIKE but it is returning 0 rows:
SELECT * FROM (
    SELECT
        id, name, start, end, QUARTER(end) as "Q"
    FROM
        submissions as s
) AS vs
WHERE
    vs.end LIKE @given_year
    AND vs.Q < @given_quarter
I also need to account for the possibility that there are no matching rows in the given year, in which case I need to fall back to the previous year.
For example, with these two rows, if I were passed year 2021 and quarter 1, I would need to return November of the previous year, which is in a different quarter.
| id | name          | start      | end        | Q |
| -- | ------------- | ---------- | ---------- | - |
| 14 | February 2021 | 2021-02-01 | 2021-02-28 | 1 |
| 57 | November 2020 | 2020-11-01 | 2020-11-30 | 4 |
If I understand correctly, you want all the rows from the quarter in the data before a given quarter. You can filter and use dense_rank():
select s.*
from (select s.*,
             dense_rank() over (order by year(start) desc, quarter(start) desc) as seqnum
      from submissions s
      where year(start) < @given_year or
            (year(start) = @given_year and quarter(start) < @given_quarter)
     ) s
where seqnum = 1;
The above returns all rows from the previous quarter (which is what I thought you wanted). If you want only one row:
select s.*
from submissions s
where year(start) < @given_year or
      (year(start) = @given_year and quarter(start) < @given_quarter)
order by start desc
limit 1;
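The filter in both queries is a lexicographic comparison on (year, quarter). As a rough illustration of that logic outside the database, here is a small Python sketch run on the two-row example from the question. The helper names quarter and last_before are hypothetical, and rows are plain tuples rather than database records:

```python
from datetime import date

def quarter(d):
    """Quarter number (1-4) of a date, like MySQL's QUARTER()."""
    return (d.month - 1) // 3 + 1

def last_before(rows, given_year, given_quarter):
    """Return the row with the latest start date whose (year, quarter)
    is strictly before the given one, or None if there is no such row."""
    earlier = [
        r for r in rows
        # Tuple comparison: year < given_year, or same year and earlier quarter.
        if (r[2].year, quarter(r[2])) < (given_year, given_quarter)
    ]
    return max(earlier, key=lambda r: r[2], default=None)

# (id, name, start) from the two-row example in the question.
rows = [
    (14, "February 2021", date(2021, 2, 1)),
    (57, "November 2020", date(2020, 11, 1)),
]
result = last_before(rows, 2021, 1)
print(result)
```

With year 2021 and quarter 1, the February 2021 row is excluded (it is in the given quarter itself) and the November 2020 row is returned, which matches the fallback to the previous year described in the question.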

MySQL String to DATE / TIME or TIMESTAMP

In my data set, the start and end time for a task is given as a string. The string contains:
'Day, Date Month YYYY HH:MM:SS GMT'
'Wed, 18 Oct 2017 10:11:03 GMT'
Previous questions on Stack Overflow do not have data in this format, and I have been struggling to convert it into a DATE/TIME or TIMESTAMP. Any advice on this would be greatly appreciated!
This post was quite relevant but still does not meet my needs, as the format of the string is different in both cases:
Converting date/time string to unix timestamp in MySQL
Ultimately, I want to compute a variable 'time_on_task', which is the difference between start_time and end_time for each person and task. Thus, for the following data:
Person TaskID Start_time End_time
Alpha 1 'Wed, 18 Oct 2017 10:10:03 GMT' 'Wed, 18 Oct 2017 10:10:36 GMT'
Alpha 2 'Wed, 18 Oct 2017 10:11:16 GMT' 'Wed, 18 Oct 2017 10:11:28 GMT'
Beta 1 'Wed, 18 Oct 2017 10:12:03 GMT' 'Wed, 18 Oct 2017 10:12:49 GMT'
Alpha 3 'Wed, 18 Oct 2017 10:12:03 GMT' 'Wed, 18 Oct 2017 10:13:13 GMT'
Gamma 1 'Fri, 27 Oct 2017 22:57:12 GMT' 'Sat, 28 Oct 2017 02:00:54 GMT'
Beta 2 'Wed, 18 Oct 2017 10:13:40 GMT' 'Wed, 18 Oct 2017 10:14:03 GMT'
The required output would be something like this:
Person TaskID Time_on_task
Alpha 1 0:00:33 #['Wed, 18 Oct 2017 10:10:36 GMT' - 'Wed, 18 Oct 2017 10:10:03 GMT']
Alpha 2 0:00:12 #['Wed, 18 Oct 2017 10:11:28 GMT' - 'Wed, 18 Oct 2017 10:11:16 GMT']
Beta 1 0:00:46 #['Wed, 18 Oct 2017 10:12:49 GMT' - 'Wed, 18 Oct 2017 10:12:03 GMT']
Alpha 3 0:01:10 #['Wed, 18 Oct 2017 10:13:13 GMT' - 'Wed, 18 Oct 2017 10:12:03 GMT']
Gamma 1 3:03:42 #['Sat, 28 Oct 2017 02:00:54 GMT' - 'Fri, 27 Oct 2017 22:57:12 GMT']
Beta 2 0:00:23 #['Wed, 18 Oct 2017 10:14:03 GMT' - 'Wed, 18 Oct 2017 10:13:40 GMT']
You need STR_TO_DATE() to convert the string to a date. Consider:
select str_to_date(
'Wed, 18 Oct 2017 10:11:03 GMT',
'%a, %d %b %Y %T GMT'
)
Yields:
2017-10-18 10:11:03
Once your strings are converted to dates, you can use timestampdiff() to compute the difference between them in seconds, and turn the result back into a time using sec_to_time():
select
person,
taskID,
sec_to_time(
timestampdiff(
second,
str_to_date(Start_time, '%a, %d %b %Y %T GMT'),
str_to_date(End_time, '%a, %d %b %Y %T GMT')
)
) time_on_task
from mytable
Demo on DB Fiddle:
| person | taskID | time_on_task |
| ------ | ------ | ------------ |
| Alpha | 1 | 00:00:33 |
| Alpha | 2 | 00:00:12 |
| Beta | 1 | 00:00:46 |
| Alpha | 3 | 00:01:10 |
| Gamma | 1 | 03:03:42 |
| Beta | 2 | 00:00:23 |
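If the conversion has to happen outside MySQL, the same parse-then-subtract idea can be sketched in Python. This is only an illustration: the helper name time_on_task is made up here, and the format string assumes an English (C) locale for the day and month abbreviations:

```python
from datetime import datetime

# Python equivalent of MySQL's '%a, %d %b %Y %T GMT'.
FMT = "%a, %d %b %Y %H:%M:%S GMT"

def time_on_task(start, end):
    """Parse both strings and return end - start as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

# The Gamma row from the question, which crosses midnight.
delta = time_on_task(
    "Fri, 27 Oct 2017 22:57:12 GMT",
    "Sat, 28 Oct 2017 02:00:54 GMT",
)
print(delta)  # 3:03:42
```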

RDLC report, generate a Pivot chart

I have data in the following table format, generated from SQL. Users can select which car models are plotted in the graph, so the number of car models differs from report to report and the graph has a variable number of lines (one line per car model series).
Jan Feb Mar Apr May Jun
Honda 12 17 24 18 30 13
Toyota 15 20 10 15 30 40
Yamaha 30 25 30 15 13 40
Suzuki 35 15 13 40 45 45
Nissan 15 35 40 40 50 50
Kia 13 21 23 15 25 30
Mazda 25 25 30 32 15 40
How can I create a graph like this with RDLC reports?
This can easily be done using series groups in RDLC charting. The chart data source should be structured like this, with one row per brand and month:
Brand Month Value
Honda Jan 10
Honda Feb 15
Honda Mar 20
Toyota Jan 20
Toyota Feb 21
Toyota Mar 22
Yamaha Jan 10
Yamaha Feb 11
Yamaha Mar 12
Add the "Value" column as chart Values.
Add the "Month" column as chart Categories.
Add the "Brand" column as chart Series Groups.
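The question's data is in wide form (one column per month), while the structure above is long form (one row per brand and month). As a rough sketch of that reshaping step, here it is in plain Python using the first two rows and first three months of the question's table. In practice this would more likely be done in the SQL that feeds the report, and the variable names are illustrative only:

```python
# Wide form: one list of monthly values per brand (from the question's table).
months = ["Jan", "Feb", "Mar"]
wide = {
    "Honda": [12, 17, 24],
    "Toyota": [15, 20, 10],
}

# Long form: one (Brand, Month, Value) row per brand and month.
long_rows = [
    (brand, month, value)
    for brand, values in wide.items()
    for month, value in zip(months, values)
]

for row in long_rows:
    print(row)
```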

Sorting table based on Year and Year+Months

I have a table with the columns YEAR and ID, where YEAR holds either a plain year or a month and a year:
YEAR ID
1988 29
1989 89
1990 22
1992 9
1994 8
1998 23
1922 20
August 1990 12
September 2009 14
August 1991 11
November 2009 33
October 1990 30
January 1990 55
March 2001 24
Is there a way to sort the table so that the plain years come first in ascending order, followed by the month-year values in date order? I am looking for a result like this:
YEAR ID
1922 20
1988 29
1989 89
1990 22
1992 9
1994 8
1998 23
January 1990 55
August 1990 12
October 1990 30
August 1991 11
March 2001 24
September 2009 14
November 2009 33
Thank you in advance
Just replace myYear with your table name...
select year,id
from
(select year,id,
case when STR_TO_DATE(year,'%Y') is not null then STR_TO_DATE(year,'%Y') else STR_TO_DATE(year,'%M %Y') end as d,
case when STR_TO_DATE(year,'%Y') is not null then 0 else 1 end as ob
from myYear
) y
order by ob asc,d asc;
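The query sorts on two things: a flag that puts plain years before month-year values, then the parsed date. The same ordering can be sketched as a Python sort key, shown here on a subset of the question's values. The function name sort_key is hypothetical, and the month names assume an English locale:

```python
from datetime import datetime

def sort_key(value):
    """Plain years sort first (group 0), 'Month YYYY' values second (group 1)."""
    try:
        return (0, datetime.strptime(value, "%Y"))
    except ValueError:
        # Not a bare year, so parse it as a full month name plus year.
        return (1, datetime.strptime(value, "%B %Y"))

years = [
    "1988", "1990", "1922",
    "August 1990", "September 2009", "January 1990", "March 2001",
]
ordered = sorted(years, key=sort_key)
print(ordered)
```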

Find first and last business day in R or Mysql

I'm looking to get a list of the first and last business days of each month.
What I have is basically a list of business days:
2009-01-03
2009-01-04
2009-01-05
...
I just want a list of the first and last days, i.e. the min and max date for each year-month combination.
Any suggestions?
Your question states that you already have a list of business days and that you need a way of finding the minimum and maximum for each year-month combination.
You can use ddply in package plyr to do this. I also make use of package lubridate because it has some convenience functions to extract the year and month from a date.
Create some data:
library(lubridate)
x <- sample(seq(as.Date("2011-01-01"), by="1 day", length.out=365), 100)
df <- data.frame(date=x, year=year(x), month=month(x))
Now extract the min and max for each month:
library(plyr)
ddply(df, .(year, month), summarize, first=min(date), last=max(date))
year month first last
1 2011 1 2011-01-03 2011-01-30
2 2011 2 2011-02-03 2011-02-19
3 2011 3 2011-03-06 2011-03-29
4 2011 4 2011-04-09 2011-04-30
5 2011 5 2011-05-01 2011-05-29
6 2011 6 2011-06-04 2011-06-28
7 2011 7 2011-07-02 2011-07-29
8 2011 8 2011-08-10 2011-08-30
9 2011 9 2011-09-01 2011-09-28
10 2011 10 2011-10-07 2011-10-31
11 2011 11 2011-11-01 2011-11-28
12 2011 12 2011-12-01 2011-12-30
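The same group-by-month, take-min-and-max idea is not specific to R. For comparison, here is a minimal Python sketch using only the standard library; the function name first_last_by_month and the sample dates are made up for illustration:

```python
from datetime import date

def first_last_by_month(dates):
    """Map (year, month) -> (earliest date, latest date) seen in that month."""
    result = {}
    for d in dates:
        key = (d.year, d.month)
        first, last = result.get(key, (d, d))
        result[key] = (min(first, d), max(last, d))
    return result

# Hypothetical business days, deliberately out of order.
days = [
    date(2009, 1, 5), date(2009, 1, 2), date(2009, 1, 30),
    date(2009, 2, 27), date(2009, 2, 2),
]
summary = first_last_by_month(days)
for (year, month), (first, last) in sorted(summary.items()):
    print(year, month, first, last)
```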