How can I parse this html? - html

I'm trying to scrape https://PickleballBrackets.com using Selenium and BeautifulSoup with this code:
from bs4 import BeautifulSoup
from selenium import webdriver
browser = webdriver.Safari()
browser.get('https://pickleballbrackets.com')
soup = BeautifulSoup(browser.page_source, 'lxml')
If I look at browser.page_source after I get the html, I can see 50 instances of
<div class="browse-row-box">
but after I create a soup object, they are lost. I believe that means that I have poorly formed html. I've tried all three parsers ('lxml', 'html5lib', 'html.parser') without any luck.
Suggestions on how to proceed?

It's a lot easier to get the data straight from the source, the JSON endpoint that the page itself queries:
import pandas as pd
import requests
url = 'https://pickleballbrackets.com/Json.asmx/EventsSearch_PublicUI'
payload = {
'AgeIDs': "",
'Alpha': "All",
'ClubID': "",
'CountryID': "",
'DateFilter': "future",
'EventTypeIDs': "1",
'FormatIDs': "",
'FromDate': "",
'IncludeTestEvents': "0",
'OrderBy': "EventActivityFirstDate",
'OrderDirection': "Asc",
'PageNumber': "1",
'PageSize': 9999,
'PlayerGroupIDs': "",
'PrizeMoney': "All",
'RankIDs': "",
'ReturnType': "json",
'SearchWord': "",
'ShowOnCalendar': "0",
'SportIDs': "dc1894c6-7e85-43bc-bfa2-3993b0dd630f",
'StateIDs': "",
'ToDate': "",
'prt': ""}
jsonData = requests.post(url, json=payload).json()
df = pd.DataFrame(jsonData['d'])
Output:
print(df.head(2).to_string())
RowNumber RecordCount PageCount CurrPage EventID ClubID Title TimeZoneAbbreviation UTCOffset HasDST StartTimesPosted Logo OnlineRegistration_Active Registration_DateOpen Registration_DateClosed IsSanctioned CancelTourney LocationOfEvent_Venue LocationOfEvent_StreetAddress LocationOfEvent_City LocationOfEvent_CountryTitle LocationOfEvent_StateTitle LocationOfEvent_Zip ShowDraws IsFavorite IsPrizeMoney MaxRegistrationsForEntireEvent Sanction_PCO SanctionLevelAppovedStatus_PCO SanctionLevelID_PCO Sanction_SSIPA SanctionLevelAppovedStatus_SSIPA SanctionLevelID_SSIPA Sanction_USAPA SanctionLevelAppovedStatus_USAP SanctionLevelID_USAP Sanction_WPF SanctionLevelAppovedStatus_WPF SanctionLevelID_WPF Sanction_GPA SanctionLevelAppovedStatus_GPA SanctionLevelID_GPA EventActivityFirstDate EventActivityLastDate IsRegClosed Cost_Registration_Current Cost_FeeOnEvents RegistrationCount_InAtLeastOneLiveEvent showResultsButton SantionLevels_PCO_Title SantionLevels_PCO_LevelLogo SantionLevels_SSIPA_Title SantionLevels_SSIPA_LevelLogo SantionLevels_USAP_Title SantionLevels_USAP_LevelLogo SantionLevels_WPF_Title SantionLevels_WPF_LevelLogo SantionLevels_GPA_Title SantionLevels_GPA_LevelLogo mng
0 1 152 1 1 410d04c2-49c5-48a4-847f-0f0ac0aa92f7 91c83e9c-c8e3-460d-b124-52f5c1036336 Cincinnati Pickleball Club 2022 March Mania EST -5 True False 410d04c2-49c5-48a4-847f-0f0ac0aa92f7_Logo.png True 1/24/2022 7:30:00 AM 3/22/2022 5:00:00 PM False False Five Seasons Ohio 11790 Snider Road Cincinnati United States Ohio 45249 -1 0 False 0 False False False False False 3/25/2022 4:00:00 PM 3/27/2022 2:00:00 PM 1 50.0 225.0 238 1 0
1 2 152 1 1 9f0c5976-94e9-4d58-a273-774744bdacec e5cd380b-fe72-4ef4-89e8-5053e94587a3 Flash Fridays Slam Series - March 25th EST -5 True False 9f0c5976-94e9-4d58-a273-774744bdacec_Logo.png True 3/1/2022 5:00:00 PM 3/23/2022 11:45:00 PM False False Holbrook Park 100 Sherwood Dr Huntersville United States North Carolina 28078 1 0 False 6 False False False False False 3/25/2022 4:00:00 PM 3/25/2022 4:00:00 PM 1 25.0 0.0 0 0 0
....
[152 rows x 60 columns]
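Since the response has 60 columns, it may help to keep only a few and parse the date strings. The following is a minimal sketch rather than part of the original answer: the column selection and the helper name `events_to_frame` are assumptions, and the sample dict just mimics the shape of the decoded response (a dict whose `'d'` key holds the list of events) so that it runs without a network call.

```python
import pandas as pd

# Columns picked from the output above; which ones to keep is an assumption.
KEEP_COLUMNS = [
    "Title",
    "LocationOfEvent_City",
    "LocationOfEvent_StateTitle",
    "EventActivityFirstDate",
]


def events_to_frame(json_data):
    """Hypothetical helper: trim the decoded response to a few columns."""
    df = pd.DataFrame(json_data["d"])
    df = df[KEEP_COLUMNS].copy()
    # Dates arrive as strings such as "3/25/2022 4:00:00 PM".
    df["EventActivityFirstDate"] = pd.to_datetime(
        df["EventActivityFirstDate"], format="%m/%d/%Y %I:%M:%S %p"
    )
    return df


# Stand-in for requests.post(url, json=payload).json(), using values
# copied from the two rows printed above.
sample = {
    "d": [
        {
            "EventID": "410d04c2-49c5-48a4-847f-0f0ac0aa92f7",
            "Title": "Cincinnati Pickleball Club 2022 March Mania",
            "LocationOfEvent_City": "Cincinnati",
            "LocationOfEvent_StateTitle": "Ohio",
            "EventActivityFirstDate": "3/25/2022 4:00:00 PM",
        },
        {
            "EventID": "9f0c5976-94e9-4d58-a273-774744bdacec",
            "Title": "Flash Fridays Slam Series - March 25th",
            "LocationOfEvent_City": "Huntersville",
            "LocationOfEvent_StateTitle": "North Carolina",
            "EventActivityFirstDate": "3/25/2022 4:00:00 PM",
        },
    ]
}

events = events_to_frame(sample)
print(events)
```

With the real response, `events_to_frame(jsonData)` should give the same trimmed layout, provided the column names are unchanged.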


Dynamically Flatten JSON response from API gives one Huge row

I am trying to dynamically flatten a JSON response from an API request, but I am getting only one row back that contains all the records. Kindly assist or point me in the right direction.
My json response looks like this
import requests, json
URL='https://data.calgary.ca/resource/848s-4m4z.json'
data = json.loads(requests.get(URL).text)
data
[{'sector': 'NORTH',
'community_name': 'THORNCLIFFE',
'group_category': 'Crime',
'category': 'Theft FROM Vehicle',
'count': '9',
'resident_count': '8474',
'date': '2018-03-01T12:00:00.000',
'year': '2018',
'month': 'MAR',
'id': '2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9',
'geocoded_column': {'latitude': '51.103099554741',
'longitude': '-114.068779421169',
'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
':#computed_region_4a3i_ccfj': '2',
':#computed_region_p8tp_5dkv': '4',
':#computed_region_4b54_tmc4': '2',
':#computed_region_kxmf_bzkv': '192'},
{'sector': 'SOUTH',
'community_name': 'WOODBINE',
'group_category': 'Crime',
'category': 'Theft FROM Vehicle',
'count': '3',
'resident_count': '8866',
'date': '2019-11-01T00:00:00.000',
'year': '2019',
'month': 'NOV',
'id': '2019-NOV-WOODBINE-Theft FROM Vehicle-3',
'geocoded_column': {'latitude': '50.939610852207664',
'longitude': '-114.12962865374453',
'human_address': '{"address": "", "city": "", "state": "", "zip": ""}'},
':#computed_region_4a3i_ccfj': '1',
':#computed_region_p8tp_5dkv': '6',
':#computed_region_4b54_tmc4': '5',
':#computed_region_kxmf_bzkv': '43'}
]
Here is my code
import pandas as pd

# Function for flattening JSON
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        # If the nested key-value pair is of dict type
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        # If the nested key-value pair is of list type
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# Driver code
# print(flatten_json(data))
newf = flatten_json(data)
pd.json_normalize(newf)
It returns
[enter image description here](https://i.stack.imgur.com/i6mUe.png)
while I am expecting the data in the following format:
[enter image description here](https://i.stack.imgur.com/mXNtU.png).
json_normalize gives me the data in the expected format, but I need a way to parse different JSON response formats dynamically (programmatically).
To get your DataFrame into the correct form you can use this example (it downloads the same data as in the question):
import requests
import pandas as pd
from ast import literal_eval
url = "https://data.calgary.ca/resource/848s-4m4z.json"
df = pd.DataFrame(requests.get(url).json())
df = pd.concat(
[
df,
df.pop("geocoded_column")
.apply(pd.Series)
.add_prefix("geocoded_column_"),
],
axis=1,
)
df["geocoded_column_human_address"] = df["geocoded_column_human_address"].apply(
literal_eval
)
df = pd.concat(
[
df,
df.pop("geocoded_column_human_address")
.apply(pd.Series)
.add_prefix("addr_"),
],
axis=1,
)
print(df.head().to_markdown(index=False))
Prints:
| sector | community_name | group_category | category | count | resident_count | date | year | month | id | :#computed_region_4a3i_ccfj | :#computed_region_p8tp_5dkv | :#computed_region_4b54_tmc4 | :#computed_region_kxmf_bzkv | geocoded_column_latitude | geocoded_column_longitude | addr_address | addr_city | addr_state | addr_zip |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| NORTH | THORNCLIFFE | Crime | Theft FROM Vehicle | 9 | 8474 | 2018-03-01T12:00:00.000 | 2018 | MAR | 2018-MAR-THORNCLIFFE-Theft FROM Vehicle-9 | 2 | 4 | 2 | 192 | 51.1031 | -114.069 | | | | |
| SOUTH | WOODBINE | Crime | Theft FROM Vehicle | 3 | 8866 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WOODBINE-Theft FROM Vehicle-3 | 1 | 6 | 5 | 43 | 50.9396 | -114.13 | | | | |
| SOUTH | WILLOW PARK | Crime | Theft FROM Vehicle | 4 | 5328 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WILLOW PARK-Theft FROM Vehicle-4 | 3 | 5 | 6 | 89 | 50.9566 | -114.056 | | | | |
| SOUTH | WILLOW PARK | Crime | Commercial Robbery | 1 | 5328 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-WILLOW PARK-Commercial Robbery-1 | 3 | 5 | 6 | 89 | 50.9566 | -114.056 | | | | |
| WEST | LINCOLN PARK | Crime | Commercial Break & Enter | 5 | 2617 | 2019-11-01T00:00:00.000 | 2019 | NOV | 2019-NOV-LINCOLN PARK-Commercial Break & Enter-5 | 1 | 2 | 8 | 42 | 51.0101 | -114.13 | | | | |
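For the "dynamic" part of the question, one option is to keep the recursive `flatten_json` from the question but call it once per record instead of once on the whole list. Flattening the whole list is what puts every record into a single row. The sketch below is an illustration under that assumption, not the answer's method; the two records are trimmed copies of the ones shown in the question so that it runs offline.

```python
import pandas as pd


def flatten_json(y):
    """Flatten nested dicts/lists into one dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=""):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + "_")
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + "_")
        else:
            # Drop the trailing '_' that was appended to the key path.
            out[name[:-1]] = x

    flatten(y)
    return out


# Trimmed copies of the two records from the question.
data = [
    {
        "sector": "NORTH",
        "community_name": "THORNCLIFFE",
        "count": "9",
        "geocoded_column": {
            "latitude": "51.103099554741",
            "longitude": "-114.068779421169",
        },
    },
    {
        "sector": "SOUTH",
        "community_name": "WOODBINE",
        "count": "3",
        "geocoded_column": {
            "latitude": "50.939610852207664",
            "longitude": "-114.12962865374453",
        },
    },
]

# One flattened dict per record gives one DataFrame row per record.
df = pd.DataFrame([flatten_json(record) for record in data])
print(df.columns.tolist())
print(len(df))
```

Values that are themselves JSON strings, such as `human_address` in the full response, stay as plain strings with this approach and would still need a separate decoding step.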

pandas | Read json file with list/array-like fields to Boolean columns

Here is a JSON string that contains a list of objects with each having another list embedded.
[
{
"name": "Alice",
"hobbies": [
"volleyball",
"shopping",
"movies"
]
},
{
"name": "Bob",
"hobbies": [
"fishing",
"movies"
]
}
]
Using pandas.read_json() this turns into a DataFrame like this:
name hobbies
--------------------------------------
1 Alice [volleyball, shopping, movies]
2 Bob [fishing, movies]
However, I would like to flatten the lists into Boolean columns like this:
name volleyball shopping movies fishing
----------------------------------------------------
1 Alice True True True False
2 Bob False False True True
I.e. when the list contains a value, the field in the corresponding column is filled with a Boolean True, otherwise with False.
I have also looked into pandas.io.json.json_normalize(), but that does not seem to support this idea either. Is there any built-in way (either Python 3 or pandas) to do this?
(PS. I realize that you can cook up your own code to 'normalize' the dictionary objects before loading the whole list into a DataFrame, but I might be reinventing the wheel with that and probably in a very inefficient way).
you can do the following:
In [56]: data = [
....: {
....: "name": "Alice",
....: "hobbies": [
....: "volleyball",
....: "shopping",
....: "movies"
....: ]
....: },
....: {
....: "name": "Bob",
....: "hobbies": [
....: "fishing",
....: "movies"
....: ]
....: }
....: ]
In [57]: df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
In [59]: df['count'] = 1
In [60]: df
Out[60]:
hobby name count
0 volleyball Alice 1
1 shopping Alice 1
2 movies Alice 1
3 fishing Bob 1
4 movies Bob 1
In [61]: df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
Out[61]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
Or even better:
In [88]: r = df.pivot_table(index='name', columns='hobby', values='count').fillna(0)
In [89]: r
Out[89]:
hobby fishing movies shopping volleyball
name
Alice 0.0 1.0 1.0 1.0
Bob 1.0 1.0 0.0 0.0
Let's generate the list of 'boolean' columns dynamically:
In [90]: cols_boolean = [c for c in r.columns.tolist() if c != 'name']
In [91]: r = r[cols_boolean].astype(bool)
In [92]: print(r)
hobby fishing movies shopping volleyball
name
Alice False True True True
Bob True True False False
You can use crosstab with cast to bool by astype:
df = pd.json_normalize(data, 'hobbies', ['name']).rename(columns={0:'hobby'})
print(df)
hobby name
0 volleyball Alice
1 shopping Alice
2 movies Alice
3 fishing Bob
4 movies Bob
print(pd.crosstab(df.name, df.hobby).astype(bool))
hobby fishing movies shopping volleyball
name
Alice False True True True
Bob True True False False
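A variation on the crosstab approach that skips `json_normalize`: load the records as they are, expand the list column with `DataFrame.explode` (available from pandas 0.25), and cross-tabulate. This is a sketch that assumes every record has a `name` and a `hobbies` list, as in the question.

```python
import pandas as pd

data = [
    {"name": "Alice", "hobbies": ["volleyball", "shopping", "movies"]},
    {"name": "Bob", "hobbies": ["fishing", "movies"]},
]

df = pd.DataFrame(data)

# One row per (name, hobby) pair.
exploded = df.explode("hobbies")

# Count the pairs, then turn the counts into True/False flags.
flags = pd.crosstab(exploded["name"], exploded["hobbies"]).astype(bool)
print(flags)
```

The result is indexed by `name`; `flags.reset_index()` turns the name back into an ordinary column if that is preferred.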

Parsing json files

Below is the code for a visual analysis of a set of tweets stored in a .json file. When I run it, an error is raised at the map() call. Is there any way to fix it?
import json
import pandas as pd
import matplotlib.pyplot as plt
tweets_data_path = 'import_requests.txt'
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
    try:
        tweet = json.loads(line)
        tweets_data.append(tweet)
    except ValueError:
        # Skip lines that are not valid JSON
        continue
print(len(tweets_data))
tweets = pd.DataFrame()
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
These are the lines leading up to the 'ValueError' message I am getting for the above code :
Traceback (most recent call last):
File "tweet_len.py", line 21, in <module>
tweets['text'] = map(lambda tweet: tweet['text'], tweets_data)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1887, in __setitem__
self._set_item(key, value)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1966, in _set_item
self._ensure_valid_index(value)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 1943, in _ensure_valid_index
raise ValueError('Cannot set a frame with no defined index '
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
I am using Python3.
EDIT: Below is a sample of the Twitter data collected (.json format).
{
"created_at": "Sat Mar 05 05:47:23 +0000 2016",
"id": 705993088574033920,
"id_str": "705993088574033920",
"text": "Tumi Inc. civil war: Staff manning US ceasefire hotline 'can't speak Arabic' #fakeheadlinebot #learntocode #makeatwitterbot #javascript",
"source": "\u003ca href=\"http://javascriptiseasy.com\" rel=\"nofollow\"\u003eJavaScript is Easy\u003c/a\u003e",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 4382400263,
"id_str": "4382400263",
"name": "JavaScript is Easy",
"screen_name": "javascriptisez",
"location": "Your Console",
"url": "http://javascriptiseasy.com",
"description": "Get learning!",
"protected": false,
"verified": false,
"followers_count": 167,
"friends_count": 68,
"listed_count": 212,
"favourites_count": 11,
"statuses_count": 55501,
"created_at": "Sat Dec 05 11:18:00 +0000 2015",
"utc_offset": null,
"time_zone": null,
"geo_enabled": false,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "000000",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_link_color": "FFCC4D",
"profile_sidebar_border_color": "000000",
"profile_sidebar_fill_color": "000000",
"profile_text_color": "000000",
"profile_use_background_image": false,
"profile_image_url": "http://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/673099606348070912/xNxp4zOt_normal.jpg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/4382400263/1449314370",
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": null,
"contributors": null,
"is_quote_status": false,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [{
"text": "fakeheadlinebot",
"indices": [77, 93]
}, {
"text": "learntocode",
"indices": [94, 106]
}, {
"text": "makeatwitterbot",
"indices": [107, 123]
}, {
"text": "javascript",
"indices": [124, 135]
}],
"urls": [],
"user_mentions": [],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "en",
"timestamp_ms": "1457156843690"
}
I think you can use read_json:
import pandas as pd
df = pd.read_json('file.json')
print(df.head())
contributors coordinates created_at entities \
contributors_enabled NaN NaN 2016-03-05 05:47:23 NaN
created_at NaN NaN 2016-03-05 05:47:23 NaN
default_profile NaN NaN 2016-03-05 05:47:23 NaN
default_profile_image NaN NaN 2016-03-05 05:47:23 NaN
description NaN NaN 2016-03-05 05:47:23 NaN
favorite_count favorited filter_level geo \
contributors_enabled 0 False low NaN
created_at 0 False low NaN
default_profile 0 False low NaN
default_profile_image 0 False low NaN
description 0 False low NaN
id id_str \
contributors_enabled 705993088574033920 705993088574033920
created_at 705993088574033920 705993088574033920
default_profile 705993088574033920 705993088574033920
default_profile_image 705993088574033920 705993088574033920
description 705993088574033920 705993088574033920
... is_quote_status lang \
contributors_enabled ... False en
created_at ... False en
default_profile ... False en
default_profile_image ... False en
description ... False en
place retweet_count retweeted \
contributors_enabled NaN 0 False
created_at NaN 0 False
default_profile NaN 0 False
default_profile_image NaN 0 False
description NaN 0 False
source \
contributors_enabled <a href="http://javascriptiseasy.com" rel="nof...
created_at <a href="http://javascriptiseasy.com" rel="nof...
default_profile <a href="http://javascriptiseasy.com" rel="nof...
default_profile_image <a href="http://javascriptiseasy.com" rel="nof...
description <a href="http://javascriptiseasy.com" rel="nof...
text \
contributors_enabled Tumi Inc. civil war: Staff manning US ceasefir...
created_at Tumi Inc. civil war: Staff manning US ceasefir...
default_profile Tumi Inc. civil war: Staff manning US ceasefir...
default_profile_image Tumi Inc. civil war: Staff manning US ceasefir...
description Tumi Inc. civil war: Staff manning US ceasefir...
timestamp_ms truncated \
contributors_enabled 2016-03-05 05:47:23.690 False
created_at 2016-03-05 05:47:23.690 False
default_profile 2016-03-05 05:47:23.690 False
default_profile_image 2016-03-05 05:47:23.690 False
description 2016-03-05 05:47:23.690 False
user
contributors_enabled False
created_at Sat Dec 05 11:18:00 +0000 2015
default_profile False
default_profile_image False
description Get learning!
[5 rows x 25 columns]
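A way to stay closer to the code in the question: in Python 3, `map()` returns a lazy iterator rather than a list, and the pandas version in the traceback cannot convert that iterator to a Series. Building a real list first avoids the error. The sketch below assumes the file holds one JSON object per line, as the question's loop does; the helper name `load_tweets` and the sample lines are made up for illustration.

```python
import json

import pandas as pd


def load_tweets(lines):
    """Hypothetical helper: parse one JSON object per line, skipping bad lines."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweets.append(json.loads(line))
        except json.JSONDecodeError:
            # Not valid JSON (for example a truncated line); skip it.
            continue
    return tweets


# Stand-in for iterating over the opened file.
sample_lines = [
    '{"text": "first tweet", "lang": "en"}',
    "not json",
    "",
    '{"text": "second tweet", "lang": "en"}',
]

tweets_data = load_tweets(sample_lines)

# A list comprehension gives pandas a concrete list instead of a map object.
tweets = pd.DataFrame({"text": [t["text"] for t in tweets_data if "text" in t]})
print(tweets)
```

With a real file, `load_tweets(open(tweets_data_path))` would take the place of the sample list. Wrapping the original call as `list(map(...))` has the same effect as the list comprehension.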

Constructing request payload in R using rjson/jsonlite

My current code as seen below attempts to construct a request payload (body), but isn't giving me the desired result.
library(df2json)
library(rjson)
y = rjson::fromJSON((df2json::df2json(dataframe)))
globalparam = ""
req = list(
Inputs = list(
input1 = y
)
,GlobalParameters = paste("{",globalparam,"}",sep="")#globalparam
)
body = enc2utf8((rjson::toJSON(req)))
body currently turns out to be
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": "{}"
}
However, I need it to look like this:
{
"Inputs": {
"input1": [
{
"X": 7,
"Y": 5,
"month": "mar",
"day": "fri",
"FFMC": 86.2,
"DMC": 26.2,
"DC": 94.3,
"ISI": 5.1,
"temp": 8.2,
"RH": 51,
"wind": 6.7,
"rain": 0,
"area": 0
}
]
},
"GlobalParameters": {}
}
So basically global parameters have to be {}, but not hardcoded. It seemed like a fairly simple problem, but I couldn't fix it. Please help!
EDIT:
This is the dataframe
X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0
2 7 4 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 0
3 7 4 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 0
4 8 6 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 0
This is an example of another data frame
> a = data.frame("col1" = c(81, 81, 81, 81), "col2" = c(72, 69, 79, 84))
Using this sample data
dd<-read.table(text=" X Y month day FFMC DMC DC ISI temp RH wind rain area
1 7 5 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 0", header=T)
You can do
globalparam = setNames(list(), character(0))
req = list(
Inputs = list(
input1 = dd
)
,GlobalParameters = globalparam
)
body = enc2utf8((rjson::toJSON(req)))
Note that globalparam looks a bit funny because we need to force it to a named list for rjson to treat it properly. We only have to do this when it's empty.

R data.frame to JSON with child nodes / hierarchical

I am trying to write a data.frame from R into a JSON file, but in a hierarchical structure with child nodes within them. I found examples and RJSONIO, but I wasn't able to apply them to my case.
This is the data.frame in R
> DF
Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
1 2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2 2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
3 2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
4 2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
5 2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
6 2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25
In addition to writing the data by month, I would also like to create an aggregate child, the 'yearly' one, which holds the sum (for example) of all the months that fall in this year. This is how I would like the JSON file to look:
[
  {
    "ccg":"MyTown",
    "data":[
      {"period":"yearly",
       "scores":[
         {"name":"2009/2010","refYear":"2009","OC_5a":2, "OC_5b": 35, "OC_5c": 59},
         {"name":"2010/2011","refYear":"2010","OC_5a":1, "OC_5b": 39, "OC_5c": 42}
       ]
      },
      {"period":"monthly",
       "scores":[
         {"name":"2009/2010","refYear":"2009","month":"01","year":"2010","OC_5a":0, "OC_5b": 15, "OC_5c": 27},
         {"name":"2009/2010","refYear":"2009","month":"02","year":"2010","OC_5a":1, "OC_5b": 14, "OC_5c": 22},
         {"name":"2009/2010","refYear":"2009","month":"03","year":"2010","OC_5a":1, "OC_5b": 6, "OC_5c": 10},
         {"name":"2010/2011","refYear":"2010","month":"04","year":"2010","OC_5a":0, "OC_5b": 10, "OC_5c": 10},
         {"name":"2010/2011","refYear":"2010","month":"05","year":"2010","OC_5a":1, "OC_5b": 16, "OC_5c": 7},
         {"name":"2010/2011","refYear":"2010","month":"06","year":"2010","OC_5a":0, "OC_5b": 13, "OC_5c": 25}
       ]
      }
    ]
  }
]
Thank you so much for your help!
Expanding on my comment:
The jsonlite package has a lot of features, but what you're describing doesn't really map to a data frame anymore, so I doubt any canned routine has this functionality. Your best bet is probably to convert the data frame to a more general list (FYI data frames are stored internally as lists of columns) with a structure that matches the structure of the JSON exactly, then just use the converter to translate it to JSON.
This is complicated in general but in your case should be fairly simple. The list will be structured exactly like the JSON data:
list(
list(
ccg = "Town1",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town1
),
list(
period = "monthly",
scores = monthly_data_frame_town1
)
)
),
list(
ccg = "Town2",
data = list(
list(
period = "yearly",
scores = yearly_data_frame_town2
),
list(
period = "monthly",
scores = monthly_data_frame_town2
)
)
)
)
Constructing this list should be a straightforward case of looping over unique(DF$CCG) and using aggregate at each step, to construct the yearly data.
If you need performance, look to either the data.table or dplyr packages to do the looping and aggregating all at once. The former is flexible and performant but a little esoteric. The latter has relatively easy syntax and is similarly performant, but is designed specifically around building pipelines for data frames so it might take some hacking to get it to produce the right output format.
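For comparison, the same loop-and-aggregate idea can be sketched in Python with pandas. This is not the R solution described above, only an illustration of the target structure: loop over each CCG, sum the score columns per `name`/`refYear` for the yearly block, and emit the rows as records. The frame below re-types the question's data by hand.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "CCG": ["MyTown"] * 6,
        "Year": ["2010"] * 6,
        "Month": ["01", "02", "03", "04", "05", "06"],
        "refYear": ["2009", "2009", "2009", "2010", "2010", "2010"],
        "name": ["2009/2010"] * 3 + ["2010/2011"] * 3,
        "OC_5a": [0, 1, 1, 0, 1, 0],
        "OC_5b": [15, 14, 6, 10, 16, 13],
        "OC_5c": [27, 22, 10, 10, 7, 25],
    }
)
score_cols = ["OC_5a", "OC_5b", "OC_5c"]


def to_records(frame):
    # Round-trip through to_json so every value is a plain JSON type.
    return json.loads(frame.to_json(orient="records"))


result = []
for ccg, group in df.groupby("CCG"):
    # Yearly block: sum the scores within each name/refYear pair.
    yearly = group.groupby(["name", "refYear"], as_index=False)[score_cols].sum()
    # Monthly block: the original rows, limited to the wanted columns.
    monthly = group[["name", "refYear", "Month", "Year"] + score_cols]
    result.append(
        {
            "ccg": ccg,
            "data": [
                {"period": "yearly", "scores": to_records(yearly)},
                {"period": "monthly", "scores": to_records(monthly)},
            ],
        }
    )

print(json.dumps(result, indent=2))
```

Each `scores` entry is a list of row objects, which is the layout asked for in the question.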
Looks like ssdecontrol has you covered... but here's my solution. Need to loop over unique CCG and Years to create the entire data set...
df <- read.table(textConnection("Date_by_Month CCG Year Month refYear name OC_5a OC_5b OC_5c
2010-01-01 MyTown 2010 01 2009 2009/2010 0 15 27
2010-02-01 MyTown 2010 02 2009 2009/2010 1 14 22
2010-03-01 MyTown 2010 03 2009 2009/2010 1 6 10
2010-04-01 MyTown 2010 04 2010 2010/2011 0 10 10
2010-05-01 MyTown 2010 05 2010 2010/2011 1 16 7
2010-06-01 MyTown 2010 06 2010 2010/2011 0 13 25"), stringsAsFactors=F, header=T)
library(RJSONIO)
to_list <- function(ccg, year){
  df_monthly <- subset(df, CCG==ccg & Year==year)
  df_yearly <- aggregate(df[,c("OC_5a", "OC_5b", "OC_5c")] ,df[,c("name", "refYear")], sum)
  l <- list("ccg"=ccg,
            data=list(list("period" = "yearly",
                           "scores" = as.list(df_yearly)
                      ),
                      list("period" = "monthly",
                           "scores" = as.list(df[,c("name", "refYear", "OC_5a", "OC_5b", "OC_5c")])
                      )
            )
  )
  return(l)
}
toJSON(to_list("MyTown", "2010"), pretty=T)
Which returns this:
{
"ccg" : "MyTown",
"data" : [
{
"period" : "yearly",
"scores" : {
"name" : [
"2009/2010",
"2010/2011"
],
"refYear" : [
2009,
2010
],
"OC_5a" : [
2,
1
],
"OC_5b" : [
35,
39
],
"OC_5c" : [
59,
42
]
}
},
{
"period" : "monthly",
"scores" : {
"name" : [
"2009/2010",
"2009/2010",
"2009/2010",
"2010/2011",
"2010/2011",
"2010/2011"
],
"refYear" : [
2009,
2009,
2009,
2010,
2010,
2010
],
"OC_5a" : [
0,
1,
1,
0,
1,
0
],
"OC_5b" : [
15,
14,
6,
10,
16,
13
],
"OC_5c" : [
27,
22,
10,
10,
7,
25
]
}
}
]
}