Should sequence nature of input data within a mini-batch be maintained? - deep-learning

Suppose a multivariate time series prediction problem for the following data
rows = 2000
cols = 6
data = np.arange(int(rows*cols)).reshape(-1,rows).transpose()
print(data[0:20])
print('\n {} \n'.format(data.shape))
print(data[-20:])
which prints out following data
[[ 0 2000 4000 6000 8000 10000]
[ 1 2001 4001 6001 8001 10001]
[ 2 2002 4002 6002 8002 10002]
[ 3 2003 4003 6003 8003 10003]
[ 4 2004 4004 6004 8004 10004]
[ 5 2005 4005 6005 8005 10005]
[ 6 2006 4006 6006 8006 10006]
[ 7 2007 4007 6007 8007 10007]
[ 8 2008 4008 6008 8008 10008]
[ 9 2009 4009 6009 8009 10009]
[ 10 2010 4010 6010 8010 10010]
[ 11 2011 4011 6011 8011 10011]
[ 12 2012 4012 6012 8012 10012]
[ 13 2013 4013 6013 8013 10013]
[ 14 2014 4014 6014 8014 10014]
[ 15 2015 4015 6015 8015 10015]
[ 16 2016 4016 6016 8016 10016]
[ 17 2017 4017 6017 8017 10017]
[ 18 2018 4018 6018 8018 10018]
[ 19 2019 4019 6019 8019 10019]]
(2000, 6)
[[ 1980 3980 5980 7980 9980 11980]
[ 1981 3981 5981 7981 9981 11981]
[ 1982 3982 5982 7982 9982 11982]
[ 1983 3983 5983 7983 9983 11983]
[ 1984 3984 5984 7984 9984 11984]
[ 1985 3985 5985 7985 9985 11985]
[ 1986 3986 5986 7986 9986 11986]
[ 1987 3987 5987 7987 9987 11987]
[ 1988 3988 5988 7988 9988 11988]
[ 1989 3989 5989 7989 9989 11989]
[ 1990 3990 5990 7990 9990 11990]
[ 1991 3991 5991 7991 9991 11991]
[ 1992 3992 5992 7992 9992 11992]
[ 1993 3993 5993 7993 9993 11993]
[ 1994 3994 5994 7994 9994 11994]
[ 1995 3995 5995 7995 9995 11995]
[ 1996 3996 5996 7996 9996 11996]
[ 1997 3997 5997 7997 9997 11997]
[ 1998 3998 5998 7998 9998 11998]
[ 1999 3999 5999 7999 9999 11999]]
Consider the last column is the target time series to be predicted. If the batch-size is 20, Can I randomly skip some points e.g. 1009, 1012, 1015 from the first batch during training?
If yes, this mean we can randomly choose points within a time series to be used as training and test?

Related

How can I parse this html?

I'm trying to scrape https://PickleballBrackets.com using Selenium and BeautifulSoup with this code:
browser = webdriver.Safari()
browser.get('https://pickleballbrackets.com')
soup = BeautifulSoup(browser.page_source, 'lxml')
If I look at browser.page_source after I get the html, I can see 50 instances of
<div class="browse-row-box">
but after I create a soup object, they are lost. I believe that means that I have poorly formed html. I've tried all three parsers ('lxml', 'html5lib', 'html.parser') without any luck.
Suggestions on how to proceed?
Lot easier to get the data from the source.
import pandas as pd
import requests
url = 'https://pickleballbrackets.com/Json.asmx/EventsSearch_PublicUI'
payload = {
'AgeIDs': "",
'Alpha': "All",
'ClubID': "",
'CountryID': "",
'DateFilter': "future",
'EventTypeIDs': "1",
'FormatIDs': "",
'FromDate': "",
'IncludeTestEvents': "0",
'OrderBy': "EventActivityFirstDate",
'OrderDirection': "Asc",
'PageNumber': "1",
'PageSize': 9999,
'PlayerGroupIDs': "",
'PrizeMoney': "All",
'RankIDs': "",
'ReturnType': "json",
'SearchWord': "",
'ShowOnCalendar': "0",
'SportIDs': "dc1894c6-7e85-43bc-bfa2-3993b0dd630f",
'StateIDs': "",
'ToDate': "",
'prt': ""}
jsonData = requests.post(url, json=payload).json()
df = pd.DataFrame(jsonData['d'])
Output:
print(df.head(2).to_string())
RowNumber RecordCount PageCount CurrPage EventID ClubID Title TimeZoneAbbreviation UTCOffset HasDST StartTimesPosted Logo OnlineRegistration_Active Registration_DateOpen Registration_DateClosed IsSanctioned CancelTourney LocationOfEvent_Venue LocationOfEvent_StreetAddress LocationOfEvent_City LocationOfEvent_CountryTitle LocationOfEvent_StateTitle LocationOfEvent_Zip ShowDraws IsFavorite IsPrizeMoney MaxRegistrationsForEntireEvent Sanction_PCO SanctionLevelAppovedStatus_PCO SanctionLevelID_PCO Sanction_SSIPA SanctionLevelAppovedStatus_SSIPA SanctionLevelID_SSIPA Sanction_USAPA SanctionLevelAppovedStatus_USAP SanctionLevelID_USAP Sanction_WPF SanctionLevelAppovedStatus_WPF SanctionLevelID_WPF Sanction_GPA SanctionLevelAppovedStatus_GPA SanctionLevelID_GPA EventActivityFirstDate EventActivityLastDate IsRegClosed Cost_Registration_Current Cost_FeeOnEvents RegistrationCount_InAtLeastOneLiveEvent showResultsButton SantionLevels_PCO_Title SantionLevels_PCO_LevelLogo SantionLevels_SSIPA_Title SantionLevels_SSIPA_LevelLogo SantionLevels_USAP_Title SantionLevels_USAP_LevelLogo SantionLevels_WPF_Title SantionLevels_WPF_LevelLogo SantionLevels_GPA_Title SantionLevels_GPA_LevelLogo mng
0 1 152 1 1 410d04c2-49c5-48a4-847f-0f0ac0aa92f7 91c83e9c-c8e3-460d-b124-52f5c1036336 Cincinnati Pickleball Club 2022 March Mania EST -5 True False 410d04c2-49c5-48a4-847f-0f0ac0aa92f7_Logo.png True 1/24/2022 7:30:00 AM 3/22/2022 5:00:00 PM False False Five Seasons Ohio 11790 Snider Road Cincinnati United States Ohio 45249 -1 0 False 0 False False False False False 3/25/2022 4:00:00 PM 3/27/2022 2:00:00 PM 1 50.0 225.0 238 1 0
1 2 152 1 1 9f0c5976-94e9-4d58-a273-774744bdacec e5cd380b-fe72-4ef4-89e8-5053e94587a3 Flash Fridays Slam Series - March 25th EST -5 True False 9f0c5976-94e9-4d58-a273-774744bdacec_Logo.png True 3/1/2022 5:00:00 PM 3/23/2022 11:45:00 PM False False Holbrook Park 100 Sherwood Dr Huntersville United States North Carolina 28078 1 0 False 6 False False False False False 3/25/2022 4:00:00 PM 3/25/2022 4:00:00 PM 1 25.0 0.0 0 0 0
....
[152 rows x 60 columns]

How to use macro function for multiple <p> in xml with xPath?

I need to extract address, telephone no using xPath from my html page. My
address is sometimes within one `<p>`, else within two `<p>`. I have 11
stores.
This is the html tag <p> in my xml. (Just an example)
<div class="info-block-value"> ==$0
<p>36 rue de la Verrerie 75004 PARIS</p>
<p>Tél : 0111 222 222</p>
</div>
<div class="info-block-value"> ==$0
<p>11 rue des archives</p>
<p>75004 PARIS</p>
<p>Tél : 01 11 11 11 11</p>
</div>
1st shop: P1 =address P2= tel
2nd shop P1= address P2 = tel P3 = fax
3rd shop P1=address line 1 P2 = address line 2 P3= tel
4th : P1 = address P2 = tel
5th : P1= add P2 = tel
Shops 6,7,8,9,11 : P1 = add line 1 P2 = add line 2 ( they have no
telephone)
10th shop : P1= add line 1 P2= addline 2, P3= tel, P4= space P5 = email
I tried with,
{
"name": "store Addr",
"key": "Address",
"xPath": "(//div[#class='info-block-value']/p)[1] |
(//div[#class='info-block-value']/p)[2]",
"level": 0,
"enabled": true,
"values": []
},
{
"name": "Tel No",
"key": "TelephoneNumber",
"xPath": "(//div[#class='info-block-value']/p)[2]|
(//div[#class='info-
block-value']/p)[3]",
"regex": "Tél : ((\d+\s*)+)+",
"level": 0,
"enabled": true,
"values": []
}
But I'm not getting the correct results. Can someone help me this?
Results:
id name address phone
1 a 36 rue de la Verrerie 75004 PARIS 0111 222 222
2 b 11 rue des archives 01 11 11 11 11
Expecting results
id name address phone
1 a 36 rue de la Verrerie 75004 PARIS 0111 222 222
2 b 11 rue des archives 75004 PARIS 01 11 11 11 11
You basically need all the <p> but the last one for the address, and the last one for the telephone number.
So, get the address with
//div[#class="info-block-value"]/p[position() < last()]
and similarly, the telephone number is under
//div[#class="info-block-value"]/p[last()]

converting R dataframes to json object

Say I have the following dataframes:
df1 <- data.frame(Name = c("Harry","George"), color=c("#EA0001", "#EEEEEE"))
Name color
1 Harry #EA0001
2 George #EEEEEE
df.details <- data.frame(Name = c(rep("Harry",each=3), rep("George", each=3)),
age=21:23,
total=c(14,19,24,1,9,4)
)
Name age total
1 Harry 21 14
2 Harry 22 19
3 Harry 23 24
4 George 21 1
5 George 22 9
6 George 23 4
I know how to convert each df to json like this:
library(jsonlite)
toJSON(df.details)
[{"Name":"Harry","age":21,"total":14},{"Name":"Harry","age":22,"total":19},{"Name":"Harry","age":23,"total":24},{"Name":"George","age":21,"total":1},{"Name":"George","age":22,"total":9},{"Name":"George","age":23,"total":4}]
However, I am looking to get the following structure to my JSON data:
{
"myjsondata": [
{
"Name": "Harry",
"color": "#EA0001",
"details": [
{
"age": 21,
"total": 14
},
{
"age": 22,
"total": 19
},
{
"age": 23,
"total": 24
}
]
},
{
"Name": "George",
"color": "#EEEEEE",
"details": [
{
"age": 21,
"total": 1
},
{
"age": 22,
"total": 9
},
{
"age": 23,
"total": 4
}
]
}
]
}
I think the answer may be in how I store the data in a list in R before converting, but not sure.
Try this format:
df1$details <- split(df.details[-1], df.details$Name)[df1$Name]
df1
# Name color details
#1 Harry #EA0001 21, 22, 23, 14, 19, 24
#2 George #EEEEEE 21, 22, 23, 1, 9, 4
toJSON(df1)
#[{
#"Name":"Harry",
#"color":"#EA0001",
#"details":[
# {"age":21,"total":14},
# {"age":22,"total":19},
# {"age":23,"total":24}]},
#{
#"Name":"George",
#"color":"#EEEEEE",
#"details":[
# {"age":21,"total":1},
# {"age":22,"total":9},
# {"age":23,"total":4}]}
#]

Constructing request payload in R using rjson/jsonlite

My current code as seen below attempts to construct a request payload (body), but isn't giving me the desired result.
library(df2json)
library(rjson)
y = rjson::fromJSON((df2json::df2json(dataframe)))
globalparam = ""
req = list(
Inputs = list(
input1 = y
)
,GlobalParameters = paste("{",globalparam,"}",sep="")#globalparam
)
body = enc2utf8((rjson::toJSON(req)))
body currently turns out to be
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": "{}"
}
However, I need it to look like this:
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": {}
}
So basically global parameters have to be {}, but not hardcoded. It seemed like a fairly simple problem, but I couldn't fix it. Please help!
EDIT:
This is the dataframe
X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0
2 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0
3 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0
4 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0
This is an example of another data frame
> a = data.frame("col1" = c(81, 81, 81, 81), "col2" = c(72, 69, 79, 84))
Using this sample data
dd<-read.table(text=" X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0", header=T)
You can do
globalparam = setNames(list(), character(0))
req = list(
Inputs = list(
input1 = dd
)
,GlobalParameters = globalparam
)
body = enc2utf8((rjson::toJSON(req)))
Note that globalparam looks a bit funny because we need to force it to a named list for rjson to treat it properly. We only have to do this when it's empty.

R data.frame to JSON with child nodes / hierarchical

I am trying to write a data.frame from R into a JSON file, but in a hierarchical structure with child nodes within them. I found examples and JSONIO but I wasn't able to apply it to my case.
This is the data.frame in R
> DF
Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
1 2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2 2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
3 2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
4 2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
5 2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
6 2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25
In addtion to writing the data by month, I would also like to create an aggregate child, the 'yearly' one, which holds the sum (for example) of all the months that fall in this year. This is how I would like the JSON file to look like:
[
{
"ccg":"MyTown",
"data":[
{"period":"yearly",
"scores":[
{"name":"2009/2010","refYear":"2009","OC_5a":2, "OC_5b": 35, "OC_5c": 59},
{"name":"2010/2011","refYear":"2010","OC_5a":1, "OC_5b": 39, "OC_5c": 42},
]
},
{"period":"monthly",
"scores":[
{"name":"2009/2010","refYear":"2009","month":"01","year":"2010","OC_5a":0, "OC_5b": 15, "OC_5c": 27},
{"name":"2009/2010","refYear":"2009","month":"02","year":"2010","OC_5a":1, "OC_5b": 14, "OC_5c": 22},
{"name":"2009/2010","refYear":"2009","month":"03","year":"2010","OC_5a":1, "OC_5b": 6, "OC_5c": 10},
{"name":"2009/2010","refYear":"2009","month":"04","year":"2010","OC_5a":0, "OC_5b": 10, "OC_5c": 10},
{"name":"2009/2010","refYear":"2009","month":"05","year":"2010","OC_5a":1, "OC_5b": 16, "OC_5c": 7},
{"name":"2009/2010","refYear":"2009","month":"01","year":"2010","OC_5a":0, "OC_5b": 13, "OC_5c": 25}
]
}
]
},
]
Thank you so much for your help!
Expanding on my comment:
The jsonlite package has a lot of features, but what you're describing doesn't really map to a data frame anymore so I doubt any canned routine has this functionality. Your best bet is probably to convert the data frame to a more general list (FYI data frames are stored internally as lists of columns) with a structure that matches the structure of the JSON exactly, then just use the converter to translate
This is complicated in general but in your case should be fairly simple. The list will be structured exactly like the JSON data:
list(
list(
ccg = "Town1",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town1
),
list(
period = "monthly",
scores = monthly_data_frame_town1
)
)
),
list(
ccg = "Town2",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town2
),
list(
period = "monthly",
scores = monthly_data_frame_town2
)
)
)
)
Constructing this list should be a straightforward case of looping over unique(DF$CCG) and using aggregate at each step, to construct the yearly data.
If you need performance, look to either the data.table or dplyr packages to do the looping and aggregating all at once. The former is flexible and performant but a little esoteric. The latter has relatively easy syntax and is similarly performant, but is designed specifically around building pipelines for data frames so it might take some hacking to get it to produce the right output format.
Looks like ssdecontrol has you covered... but here's my solution. Need to loop over unique CCG and Years to create the entire data set...
df <- read.table(textConnection("Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25"), stringsAsFactors=F, header=T)
library(RJSONIO)
to_list <- function(ccg, year){
df_monthly <- subset(df, CCG==ccg & Year==year)
df_yearly <- aggregate(df[,c("OC_5a", "OC_5b", "OC_5c")] ,df[,c("name", "refYear")], sum)
l <- list("ccg"=ccg,
data=list(list("period" = "yearly",
"scores" = as.list(df_yearly)
),
list("period" = "monthly",
"scores" = as.list(df[,c("name", "refYear", "OC_5a", "OC_5b", "OC_5c")])
)
)
)
return(l)
}
toJSON(to_list("MyTown", "2010"), pretty=T)
Which returns this:
{
"ccg" : "MyTown",
"data" : [
{
"period" : "yearly",
"scores" : {
"name" : [
"2009/2010",
"2010/2011"
],
"refYear" : [
2009,
2010
],
"OC_5a" : [
2,
1
],
"OC_5b" : [
35,
39
],
"OC_5c" : [
59,
42
]
}
},
{
"period" : "monthly",
"scores" : {
"name" : [
"2009/2010",
"2009/2010",
"2009/2010",
"2010/2011",
"2010/2011",
"2010/2011"
],
"refYear" : [
2009,
2009,
2009,
2010,
2010,
2010
],
"OC_5a" : [
0,
1,
1,
0,
1,
0
],
"OC_5b" : [
15,
14,
6,
10,
16,
13
],
"OC_5c" : [
27,
22,
10,
10,
7,
25
]
}
}
]
}