Removing unnecessary sentences in a JSON file

I am trying to remove the lines (strings) that contain [CLS] and [SEP] from the "answers" lists in the following JSON file. Is there a way to do this in Python?
"Tirukkollampudur Vilvavaneswarar -. Temple  - Shivastalam.txt": {
"context": " may be reproduced or used in any form without permission. This Shivastalam is located 5 km south of Kodavasal and Koradacheri on the Tiruvarur Thanjavur railroad. Koovilamputhur the original name became Kollampudur. Koovilam stands for Vilvam, hence Vilvavanam. This shrine is regarded as a Muktistalam. This shrine is regarded as the 113rd in the series of Tevara Stalams in the Chola Region south of the river Kaveri. Legends The Vilva trees are said to represent splashes of the celestial nectar Amritam, and this stalam is considered on par with Banares. Sundarar is believed to have floated across the river to this temple in a boatmanless raft in a river in spate singing a Patikam . This event is celebrated in a festival in the monsoon month of Libra. The Avimukteswarar temple nearby is also associated with this legend as is the Shivastalam at Kodavasal. Shiva is said to have blessed Durvasa Muni with a vision of the Cosmic Dance here. Legend also has it that Arjuna worshipped Shiva at this shrine. The Temple There are several inscriptions here, and the Cholas have made immense contributions here.to this temple which was built during the time of Kulottunga Chola I. This temple occupies an area of over 2 acres, and its second prakaram has a 5 tiered rajagopuram. The Vinayakar in this temple is also of great Festivals Six worship services are offered each day. Kartikai Deepam, Arudra Darisanam, Sivaratri, Skanda Sashti are some of the festivals celebrated here. ",
"answers": [
[
"5 km south of kodavasal and koradacheri on the tiruvarur thanjavur railroad"
],
[
"5 km south of kodavasal and koradacheri on the tiruvarur thanjavur railroad"
],
[
" "
],
[
" "
],
[
"during the time of kulottunga chola i"
],
[
"[CLS] what are the darshan hours ? [SEP] may be reproduced or used in any form without permission . this shivastalam is located 5 km south of kodavasal and koradacheri on the tiruvarur thanjavur railroad . koovilamputhur the original name became kollampudur . koovilam stands for vilvam , hence vilvavanam . this shrine is regarded as a muktistalam . this shrine is regarded as the 113rd in the series of tevara stalams in the chola region south of the river kaveri . legends the vilva trees are said to represent splashes of the celestial nectar amritam , and this stalam is considered on par with banares . sundarar is believed to have floated across the river to this temple in a boatmanless raft in a river in spate singing a patikam . this event is celebrated in a festival in the monsoon month of libra . the avimukteswarar temple nearby is also associated with this legend as is the shivastalam at kodavasal . shiva is said to have blessed durvasa muni with a vision of the cosmic dance here . legend also has it that arjuna worshipped shiva at this shrine . the temple there are several inscriptions here , and the cholas have made immense contributions here . to this temple which was built during the time of kulottunga chola i . this temple occupies an area of over 2 acres , and its second prakaram has a 5 tiered rajagopuram . the vinayakar in this temple is also of great festivals six worship services are offered each day . kartikai deepam , arudra darisanam"
],
[
"[CLS] what is the average darshan duration ? [SEP]"
],
[
" "
],
[
" "
],
[
" "
],
[
" "
],
[
" "
],
[
" "
]
]
},

You can try the following approach. Since "answers" is a list of sublists, a recursive helper can walk through it and remove every string that contains the given text.
import json

def remove_from_sublists(the_list, to_be_removed):
    # Iterate over a copy so that removing items does not skip elements.
    for each_item in list(the_list):
        if isinstance(each_item, list):
            remove_from_sublists(each_item, to_be_removed)
        elif to_be_removed in each_item:
            the_list.remove(each_item)
    return the_list

dic = {}
with open('WebTempleCorpus.json') as json_file:
    data = json.load(json_file)

for (i, v) in data.items():
    sub_dict = v
    if v.get("answers"):
        sub_dict["answers"] = remove_from_sublists(v["answers"], "CLS")
        sub_dict["answers"] = remove_from_sublists(v["answers"], "SEP")
    # Keep every entry, including those without an "answers" key.
    dic[i] = sub_dict

with open('result.json', 'w') as fp:
    json.dump(dic, fp)
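Note that the approach above leaves an empty sublist behind wherever a marked string was removed. If those should go as well, a non-mutating variant might look like the sketch below. The function name strip_marked and the assumption that the markers always appear in their bracketed form ("[CLS]", "[SEP]") are mine, not part of the original answer.

```python
def strip_marked(answers, markers=("[CLS]", "[SEP]")):
    """Return a new nested list without strings that contain any marker.

    Sublists that become empty after filtering are dropped as well.
    The marker spellings are an assumption about the data.
    """
    cleaned = []
    for item in answers:
        if isinstance(item, list):
            inner = strip_marked(item, markers)
            if inner:  # keep only sublists that still hold something
                cleaned.append(inner)
        elif not any(marker in item for marker in markers):
            cleaned.append(item)
    return cleaned


sample = [["5 km south of kodavasal"], ["[CLS] what are the darshan hours ? [SEP] ..."], [" "]]
result = strip_marked(sample)
```

Because this builds a new list instead of editing in place, the loaded data stays untouched until the result is assigned back to the "answers" key.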

Related

Concatenate values from non-adjacent objects based on multiple matching criteria

I received help on a related question previously on this forum and am wondering if there is a similarly straightforward way to resolve a more complex issue.
Given the following snippet, is there a way to merge the partial sentence (the one whose .Text does not end with a "[punctuation mark][white space]" pattern) with its remainder, based on the matching TextSize? When I tried to adjust the answer from the related question I quickly ran into issues. I am basically looking to translate a rule such as: if .Text does not end with "[punctuation mark][white space]", then append the .Text of the next object whose .TextSize matches.
{
"Text": "Was it political will that established social democratic policies in the 1930s and ",
"Path": "P",
"TextSize": 9
},
{
"Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
"Path": "Footnote",
"TextSize": 8
},
{
"Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
"Path": "P",
"TextSize": 8
},
{
"Text": "1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
"Path": "P",
"TextSize": 9
},
{
"Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
"Path": "P",
"TextSize": 9
}
The result I'm looking for would be as follows:
{
"Text": "Was it political will that established social democratic policies in the 1930s and 1940s, or that undermined them after the 1970s? Or was it abundant and cheap energy resources that enabled social democratic policies to work until the 1970s, and energy constraints that forced a restructuring of policy after the 1970s? ",
"Path": "P",
"TextSize": 9
},
{
"Text": "31 Lawrence Mishel and Jessica Schieder, Economic Policy Institute website, May 24, 2016 at (https://www.epi.org/publication/as-union-membership-has-fallen-the-top-10-percent-have-been-getting-a-larger-share-of-income/). ",
"Path": "Footnote",
"TextSize": 8
},
{
"Text": "Fig. 9.2 Higher union membership has been associated with a higher share of income to lower income brackets (the lower 90%) and a lower share of income to the top 10% of earners. ",
"Path": "P",
"TextSize": 8
},
{
"Text": "Recall that my economic modeling discussed in Chap. 6 shows that, even with no change in the assumption related to labor \u201cbargaining power,\u201d you can explain a shift from increasing to declining income equality (higher equality expressed as a higher wage share) by a corresponding shift from a period of rapidly increasing per capita resource consumption to one of constant per capita resource consumption. ",
"Path": "P",
"TextSize": 9
}
The following jq program, which assumes the input is a valid JSON array, merges every .Text with at most one successor. It can easily be modified to merge multiple .Text values together, as shown in Part 2 below.
Part 1
# input and output: an array of {Text, Path, TextSize} objects.
# Attempt to merge the .Text of the $i-th object with the .Text of a subsequent compatible object.
# If a merge is successful, the subsequent object is removed.
def attempt_to_merge_next($i):
  .[$i].TextSize as $class
  | first( (range($i+1; length) as $j | select(.[$j].TextSize == $class) | $j) // null) as $j
  | if $j then .[$i].Text += .[$j].Text | del(.[$j])
    else .
    end;

reduce range(0; length) as $i (.;
  if .[$i] == null then .
  elif .[$i].Text|test("[,.?:;]\\s*$")|not
  then attempt_to_merge_next($i)
  else .
  end)
Part 2
Using the above def:
def merge:
  def m($i):
    if $i >= length then .
    elif .[$i].Text|test("[,.?:;]\\s*$")|not
    then attempt_to_merge_next($i) as $x
      | if ($x|length) == length then m($i+1)
        else $x|m($i)
        end
    else m($i+1)
    end ;
  m(0);

merge
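For readers who would rather do this in Python, the same rule could be sketched roughly as follows. This is a translation under stated assumptions, not the answer's own code: merge_fragments is a hypothetical name, the input is assumed to be the already-parsed array of objects, and the regular expression is copied from the jq filter above.

```python
import re

# A fragment counts as complete when it ends in one of these punctuation marks
# followed by optional whitespace (same pattern as in the jq program).
COMPLETE = re.compile(r"[,.?:;]\s*$")


def merge_fragments(records):
    """Merge each incomplete Text with the next record of the same TextSize."""
    items = [dict(r) for r in records]  # shallow copies, so the input is not mutated
    i = 0
    while i < len(items):
        if COMPLETE.search(items[i]["Text"]):
            i += 1
            continue
        # Look for the next record whose TextSize matches the current one.
        j = next(
            (k for k in range(i + 1, len(items))
             if items[k]["TextSize"] == items[i]["TextSize"]),
            None,
        )
        if j is None:
            i += 1  # nothing left to merge with, leave the fragment as-is
        else:
            # Stay on index i afterwards: the merged text may still be incomplete.
            items[i]["Text"] += items.pop(j)["Text"]
    return items


records = [
    {"Text": "policies in the 1930s and ", "Path": "P", "TextSize": 9},
    {"Text": "31 A footnote. ", "Path": "Footnote", "TextSize": 8},
    {"Text": "1940s, or after the 1970s? ", "Path": "P", "TextSize": 9},
]
merged = merge_fragments(records)
```

Like Part 2 of the jq version, the loop keeps merging into the same record until its text ends with punctuation or no matching successor remains.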

Altering JSON Structure

So I am using a web scraper to pull information on sneakers from a website. The JSON data that comes back is structured like so:
[
{
"web-scraper-order": "1554084909-97",
"web-scraper-start-url": "https://www.goat.com/sneakers",
"productlink": "$200AIR JORDAN 6 RETRO 'INFRARED' 2019",
"productlink-href": "https://www.goat.com/sneakers/air-jordan-6-retro-black-infrared-384664-060",
"name": "Air Jordan 6 Retro 'Infrared' 2019",
"price": "Buy New - $200",
"description": "The 2019 edition of the Air Jordan 6 Retro ‘Infrared’ is true to the original colorway, which Michael Jordan wore when he captured his first NBA title. Dressed primarily in black nubuck with a reflective 3M layer underneath, the mid-top features Infrared accents on the midsole, heel tab and lace lock. Nike Air branding adorns the heel and sockliner, an OG detail last seen on the 2000 retro.",
"releasedate": "2019-02-16",
"colorway": "Black/Infrared 23-Black",
"brand": "Air Jordan",
"designer": "Tinker Hatfield",
"technology": "Air",
"maincolor": "Black",
"silhouette": "Air Jordan 6",
"nickname": "Infrared",
"category": "lifestyle",
"image-src": "https://image.goat.com/crop/1250/attachments/product_template_additional_pictures/images/018/675/318/original/464372_01.jpg.jpeg"
},
{
"web-scraper-order": "1554084922-147",
"web-scraper-start-url": "https://www.goat.com/sneakers",
"productlink": "$190YEEZY BOOST 350 V2 'CREAM WHITE / TRIPLE WHITE'",
"productlink-href": "https://www.goat.com/sneakers/yeezy-boost-350-v2-cream-white-cp9366",
"name": "Yeezy Boost 350 V2 'Cream White / Triple White'",
"price": "Buy New - $220",
"description": "First released on April 29, 2017, the Yeezy Boost 350 V2 ‘Cream White’ combines a cream Primeknit upper with tonal cream SPLY 350 branding, and a translucent white midsole housing full-length Boost. Released again in October 2018, this retro helped fulfill Kanye West’s oft-repeated ‘YEEZYs for everyone’ Twitter mantra, as adidas organized the biggest drop in Yeezy history by promising pre-sale to anyone who signed up on the website. Similar to the first release, the ‘Triple White’ 2018 model features a Primeknit upper, a Boost midsole and custom adidas and Yeezy co-branding on the insole.",
"releasedate": "2017-04-29",
"colorway": "Cream White/Cream White/Core White",
"brand": "adidas",
"designer": "Kanye West",
"technology": "Boost",
"maincolor": "White",
"silhouette": "Yeezy Boost 350",
"nickname": "Cream White / Triple White",
"category": "lifestyle",
"image-src": "https://image.goat.com/crop/1250/attachments/product_template_additional_pictures/images/014/822/695/original/116662_03.jpg.jpeg"
},
However, I want to change it so that the top-level node is Sneakers, the next level down is a specific sneaker brand (Jordan, Nike, Adidas), and below that is the list of sneakers that belong to that brand. So my JSON structure would look something like this:
{
  "Sneakers": {
    "Adidas": [shoe1, shoe2, ...],
    "Jordan": [shoe1, shoe2, ...]
  }
}
I am not sure what tool I could use to do that. Any help would be greatly appreciated. All I have at the moment is the JSON file and it is not in the structure that I want it to be in.
One way of doing this would be to populate a dict whose keys are brand names and whose values are lists of sneaker records. Assuming that data is your original list, here's the code:
sneakers_by_brand = {}
for record in data:
    brand = record.get("brand")
    if sneakers_by_brand.get(brand):
        # Brand already seen: add this record to its list.
        sneakers_by_brand[brand].append(record)
    else:
        # First record for this brand: start a new list.
        sneakers_by_brand[brand] = [record]
print(sneakers_by_brand)
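The dict above is keyed by brand only. To also get the top-level Sneakers node described in the question, the grouping could be wrapped as in this sketch. The function name group_by_brand and the "Unknown" fallback for records without a brand are assumptions for illustration.

```python
import json
from collections import defaultdict


def group_by_brand(records):
    """Group sneaker records by their "brand" field under a top-level "Sneakers" key."""
    grouped = defaultdict(list)
    for record in records:
        # "Unknown" is a hypothetical fallback for records that lack a brand.
        grouped[record.get("brand", "Unknown")].append(record)
    return {"Sneakers": dict(grouped)}


data = [
    {"name": "Air Jordan 6 Retro 'Infrared' 2019", "brand": "Air Jordan"},
    {"name": "Yeezy Boost 350 V2 'Cream White / Triple White'", "brand": "adidas"},
    {"name": "A third shoe", "brand": "adidas"},
]
restructured = group_by_brand(data)
as_text = json.dumps(restructured, indent=2)
```

The result can then be written to a file with json.dump. Brand names are used exactly as they appear in the scraped data, so "adidas" and "Adidas" would end up as separate keys unless they are normalized first.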

Converting JSON encoded in HTML to JSON using BeautifulSoup

I know similar questions have been asked here, but I'm still struggling to find a solution. I'm able to parse the raw HTML from the bandsintown website using BeautifulSoup, but my ultimate goal is to access the script on the page and the JSON embedded in it. Opening the page source, I can see that "eventsJsonLd" is what I need:
"jsonLdContainer":{"eventsJsonLd":[{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-25","endDate":"2019-01-25","url":"https://www.bandsintown.com/e/100451456-pop-rocks-at-hopmonk-tavern-novato?came_from=244","location":{"#type":"Place","name":"HopMonk Tavern Novato","address":"Novato, CA","geo":{"#type":"GeoCoordinates","latitude":38.1074198,"longitude":-122.5697032}},"name":"Pop Rocks","performer":{"#type":"MusicGroup","name":"Pop Rocks","image":"https://photos.bandsintown.com/thumb/8532836.jpeg","url":"https://www.bandsintown.com/a/29109-pop-rocks?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8532836.jpeg"},
Here's my code:
import json

import requests
from bs4 import BeautifulSoup

# Define the base URL and build the list of page URLs to cycle through
page = 'https://www.bandsintown.com/?came_from=257&page='
urlBucket = []
for i in range(0, 2):
    uniqueUrl = page + str(i)
    urlBucket.append(uniqueUrl)

# Dump the responses into a list
responseBucket = []
for i in urlBucket:
    uniqueResponse = requests.get(i)
    responseBucket.append(uniqueResponse)

# Make the 'soup'
soupBucket = []
for i in responseBucket:
    individualSoup = BeautifulSoup(i.text, 'html.parser')
    soupBucket.append(individualSoup)

# Build a list to hold the script tags
allScript = []
for i in soupBucket:
    script = i.find_all("script")[4]
    # json.loads expects a string, so pass the tag's text rather than the Tag itself
    eventsJSON = json.loads(script.string)
    print(script)
    allScript.append(script)
print(allScript)
Printing allScript gives me the following:
[<script type="application/ld+json">[{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100653596-e.r.n.e.s.t.o-at-the-endup?came_from=244","location":{"#type":"Place","name":"The EndUp","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.7726402,"longitude":-122.4099154}},"name":"E.R.N.E.S.T.O","performer":{"#type":"MusicGroup","name":"E.R.N.E.S.T.O","image":"https://photos.bandsintown.com/thumb/8618862.jpeg","url":"https://www.bandsintown.com/a/4693798-e.r.n.e.s.t.o?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8618862.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239291-j.j.-grey-and-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey & Mofro","performer":{"#type":"MusicGroup","name":"J.J. Grey & Mofro","image":"https://photos.bandsintown.com/thumb/219177.jpeg","url":"https://www.bandsintown.com/a/2327212-j.j.-grey-and-mofro?came_from=244"},"image":"https://photos.bandsintown.com/thumb/219177.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239613-j.j.-grey-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey","performer":{"#type":"MusicGroup","name":"J.J. 
Grey","image":"","url":"https://www.bandsintown.com/a/12437162-j.j.-grey?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239435-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"Mofro","performer":{"#type":"MusicGroup","name":"Mofro","image":"","url":"https://www.bandsintown.com/a/71714-mofro?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100542800-brooke-heinichen-at-stuffed?came_from=244","location":{"#type":"Place","name":"Stuffed","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7485824,"longitude":-122.4184108}},"name":"Brooke Heinichen","performer":{"#type":"MusicGroup","name":"Brooke Heinichen","image":"https://photos.bandsintown.com/thumb/8921909.jpeg","url":"https://www.bandsintown.com/a/14944274-brooke-heinichen?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8921909.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012486121-william-fitzsimmons-at-hopmonk-tavern?came_from=244","location":{"#type":"Place","name":"Hopmonk Tavern","address":"Novato, CA","geo":{"#type":"GeoCoordinates","latitude":38.088489,"longitude":-122.553449}},"name":"William Fitzsimmons","performer":{"#type":"MusicGroup","name":"William 
Fitzsimmons","image":"https://photos.bandsintown.com/thumb/8852940.jpeg","url":"https://www.bandsintown.com/a/2450-william-fitzsimmons?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8852940.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100581554-kevin-paris-at-acoustic-yoga-#-yoga-source-los-gatos?came_from=244","location":{"#type":"Place","name":"Acoustic Yoga # Yoga Source Los Gatos","address":"Los Gatos, CA","geo":{"#type":"GeoCoordinates","latitude":37.2358078,"longitude":-121.9623751}},"name":"Kevin Paris","performer":{"#type":"MusicGroup","name":"Kevin Paris","image":"https://photos.bandsintown.com/thumb/8419497.jpeg","url":"https://www.bandsintown.com/a/1134314-kevin-paris?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8419497.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100692435-zak-fennie-at-black-stallion-winery?came_from=244","location":{"#type":"Place","name":"Black Stallion Winery","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.35983179999999,"longitude":-122.2906388}},"name":"Zak Fennie","performer":{"#type":"MusicGroup","name":"Zak Fennie","image":"https://photos.bandsintown.com/thumb/8851546.jpeg","url":"https://www.bandsintown.com/a/11843851-zak-fennie?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8851546.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100621943-frances-ancheta-at-off-the-grid-at-alameda-south-shore-center?came_from=244","location":{"#type":"Place","name":"Off the Grid at Alameda South Shore Center ","address":"Alameda, CA","geo":{"#type":"GeoCoordinates","latitude":37.7712165,"longitude":-122.2824021}},"name":"Frances 
Ancheta","performer":{"#type":"MusicGroup","name":"Frances Ancheta","image":"https://photos.bandsintown.com/thumb/8483059.jpeg","url":"https://www.bandsintown.com/a/7762254-frances-ancheta?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8483059.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013412612-pizza!-at-audio-nightclub?came_from=244","location":{"#type":"Place","name":"Audio Nightclub","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.771362,"longitude":-122.413795}},"name":"Pizza!","performer":{"#type":"MusicGroup","name":"Pizza!","image":"https://photos.bandsintown.com/thumb/161356.jpeg","url":"https://www.bandsintown.com/a/198680-pizza!?came_from=244"},"image":"https://photos.bandsintown.com/thumb/161356.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100372855-ryan-scott-long-at-drake's-barrel-house?came_from=244","location":{"#type":"Place","name":"Drake\u2019s barrel house ","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"Ryan Scott Long","performer":{"#type":"MusicGroup","name":"Ryan Scott Long","image":"https://photos.bandsintown.com/thumb/8671372.jpeg","url":"https://www.bandsintown.com/a/3168705-ryan-scott-long?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8671372.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012999412-come-from-away-at-golden-gate-theater?came_from=244","location":{"#type":"Place","name":"Golden Gate Theater","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7825715,"longitude":-122.4110742}},"name":"Come From Away","performer":{"#type":"MusicGroup","name":"Come From 
Away","image":"","url":"https://www.bandsintown.com/a/13889714-come-from-away?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100441096-and-then-came-humans-at-drake's-brewing-company?came_from=244","location":{"#type":"Place","name":"Drake\u2019s Brewing Company","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"And Then Came Humans","performer":{"#type":"MusicGroup","name":"And Then Came Humans","image":"https://photos.bandsintown.com/thumb/8897159.jpeg","url":"https://www.bandsintown.com/a/13151463-and-then-came-humans?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8897159.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1011601412-man-go-at-el-rio?came_from=244","location":{"#type":"Place","name":"El Rio","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7467828,"longitude":-122.4193922}},"name":"Man-Go","performer":{"#type":"MusicGroup","name":"Man-Go","image":"","url":"https://www.bandsintown.com/a/3238684-man-go?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013320819-paul-mehling-at-freight-and-salvage-coffeehouse?came_from=244","location":{"#type":"Place","name":"Freight & Salvage Coffeehouse","address":"Berkeley, CA","geo":{"#type":"GeoCoordinates","latitude":37.8708715,"longitude":-122.2695117}},"name":"Paul Mehling","performer":{"#type":"MusicGroup","name":"Paul 
Mehling","image":"","url":"https://www.bandsintown.com/a/3307749-paul-mehling?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100672210-dj-spooky-at-catharine-clark-gallery?came_from=244","location":{"#type":"Place","name":"Catharine Clark Gallery","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.76639,"longitude":-122.40704}},"name":"DJ Spooky","performer":{"#type":"MusicGroup","name":"DJ Spooky","image":"https://photos.bandsintown.com/thumb/7060233.jpeg","url":"https://www.bandsintown.com/a/64476-dj-spooky?came_from=244"},"image":"https://photos.bandsintown.com/thumb/7060233.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012003162-craig-ventresco-at-atlas-cafe?came_from=244","location":{"#type":"Place","name":"Atlas Cafe","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.73189,"longitude":-122.47615}},"name":"Craig Ventresco","performer":{"#type":"MusicGroup","name":"Craig Ventresco","image":"","url":"https://www.bandsintown.com/a/139634-craig-ventresco?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100555258-rusty-jackson-music-at-kawika's-ocean-beach-deli?came_from=244","location":{"#type":"Place","name":"Kawika's Ocean Beach Deli","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.774627,"longitude":-122.509993}},"name":"Rusty Jackson Music","performer":{"#type":"MusicGroup","name":"Rusty Jackson Music","image":"https://photos.bandsintown.com/thumb/8250003.jpeg","url":"https://www.bandsintown.com/a/9978762-rusty-jackson-music?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8250003.jpeg"}]</script>, <script 
type="application/ld+json">[ ...the same event objects as in the first <script> element above... ]</script>]
But, printing eventsJSON gives me an error:
TypeError: expected string or buffer
I want to be able to build a new JSON based on specific attributes in eventsJsonLd, such as "startDate" and "name". Can anyone tell me where I'm going wrong? Thanks in advance.
You are passing the script tag itself into json.loads. That is not a string but an object of the bs4.element.Tag class.
script = i.find_all("script")[4]
print(type(script))
Output
<class 'bs4.element.Tag'>
You need to get the text from the tag and pass it to json.loads
eventsJSON = json.loads(script.text)
Note:
The URL you are currently using (https://www.bandsintown.com/?came_from=257&page=0) returns that script tag with empty contents. I was able to get output for a different URL on the same domain (https://www.bandsintown.com/a/29109-pop-rocks).
print(eventsJSON[0])
Gave an output
{u'startDate': u'2019-02-15T21:00:00', u'performer': {u'url': u'https://www.bandsintown.com/a/29109-pop-rocks?came_from=244', u'image': u'https://photos.bandsintown.com/thumb/8532836.jpeg', u'#type': u'MusicGroup', u'name': u'Pop Rocks'}, u'name': u'Pop Rocks', u'url': u'https://www.bandsintown.com/e/100544648-pop-rocks-at-the-chapel?came_from=244', u'image': u'https://photos.bandsintown.com/thumb/8532836.jpeg', u'location': {u'address': u'San Francisco, CA', u'geo': {u'latitude': 37.7485824, u'#type': u'GeoCoordinates', u'longitude': -122.4184108}, u'#type': u'Place', u'name': u'The Chapel'}, u'#context': u'http://schema.org', u'#type': u'MusicEvent', u'description': u'Pop Rocks at The Chapel 2019-02-15T21:00:00'}
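To build a new JSON from specific attributes, as the question asks, something along these lines should work once json.loads succeeds. This is only a minimal sketch: pick_fields is a hypothetical helper name, and script_text is a shortened, made-up stand-in for script.text.

```python
import json


def pick_fields(events, fields=("startDate", "name")):
    """Return one small dict per event, keeping only the requested keys."""
    # event.get() yields None when an event lacks one of the keys.
    return [{key: event.get(key) for key in fields} for event in events]


# Hypothetical stand-in for script.text (shortened from the question's data).
script_text = (
    '[{"startDate": "2019-01-26", "endDate": "2019-01-26", '
    '"name": "Kevin Paris", "location": {"address": "Los Gatos, CA"}}, '
    '{"startDate": "2019-01-26", "endDate": "2019-01-26", '
    '"name": "Zak Fennie", "location": {"address": "Napa, CA"}}]'
)

events = json.loads(script_text)   # a list of dicts, one per event
slim = pick_fields(events)         # keep only startDate and name
print(json.dumps(slim, indent=2))  # serialize the reduced list again
```

Pass a different fields tuple to keep other top-level attributes such as "url".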

Error in fromJSON in R

I am trying to convert a JSON file into a data frame in R and got this error when I ran fromJSON from the 'jsonlite' package:
Error in fromJSON(content, handler, default.size, depth, allowComments, :
Invalid JSON Node
I noticed that the variable order differs from row to row. Specifically, in the first row "categories" is the second variable, whereas in the third row it is the last one (please take a look at the data below). This is how the original JSON file is formatted, and I have several hundred thousand rows like this, so I can't change it manually.
Can anyone suggest how to fix this and successfully convert the JSON into a data frame in R?
[{"asin": "0188399313", "categories": [["Baby"]], "description": "Wee-Go Glass baby bottles by LifeFactory (Babylife) are designed to grow with your child. The included clear cover can also serve as an easy to hold cup. Twist on the solid cap (sold separately) and use your bottles for storing juice or snacks. Perfect for a lunchbox or traveling. The bright colored silicone sleeve (patent pending) helps to protect the bottle from breakage and provides a great gripping surface and tactile experience during feeding. The bottle and sleeve can be boiled or put in the dishwasher together. They can also go in the freezer, making breast milk storage simple.", "title": "Lifefactory 4oz BPA Free Glass Baby Bottles - 4-pack-raspberry and Lilac", "price": 69.99, "imUrl": "http://ecx.images-amazon.com/images/I/41SwthpdD9L._SX300_.jpg", "brand": "Lifefactory", "related": {"also_bought": ["B002SG7K7A", "B003CJSXW8", "B004PW4186", "B002O3JH9Q", "B002O3NLIO", "B004HGSU28"], "also_viewed": ["B003CJSXW8", "B0052QOL1Q", "B004PW4186", "B00EN0OLZ8", "B00EN0OOQY", "B0049YS46K", "B00E64CBLM", "B00F9YOOS6", "B00AH9RPVQ", "B00BCU2R7G", "B002O3NLIO", "B008NZ4X2K", "B005NIDFEW", "B00DKPJCH4", "B00CZNGWWK", "B00DAKJIQ4", "B005CT55IQ", "B0049YRJM0", "B0071IEWD0", "B00E64CA68", "B00IUB3SKK", "B00A7AA6XY", "B001F50FFE", "B002HU9EO4", "B007HP11SQ", "B009WPUMX4", "B002O3JH9Q", "B00F2FT3K6", "B00I5CR35A", "B00BCTY5EK", "B002SG7K7A", "B00F2FLU2U", "B0062ZK0GQ", "B002UOFR66", "B0055LKQQ2", "B00A0FGN8I", "B00HMYCG2W", "B00DHFLUO0", "B0040HMPA2", "B00I5CT9XE", "B008B5MMNO", "B00BQYVNGO", "B00925WM28", "B00BGKC3EY", "B005Q3LSDO", "B0038JDVCE", "B0045I6IA4"], "bought_together": ["B002SG7K7A", "B003CJSXW8"], "buy_after_viewing": ["B003CJSXW8", "B0052QOL1Q", "B004PW4186", "B002SG7K7A"]},},
{"asin": "0188399518", "categories": [["Baby"]], "description": "The Planet Wise Flannel Wipes are 10 super soft, color coordinated, high-quality wipes that fit perfectly into the Planet Wise Wipe PouchTM.", "title": "Planetwise Flannel Wipes", "price": 15.95, "imUrl": "http://ecx.images-amazon.com/images/I/41otjnA4OGL._SY300_.jpg", "brand": "Planet Wise", "related": {"also_bought": ["B00G96N3YY", "B003XSEV2O", "B000138GNY", "B005WWIE3G", "B005WWI0DA", "B005WWIMGA", "B00DS4WPNK", "B0039VCRPI", "B001QIN6ME", "B004GMGLN8", "B00CJ2OWUG", "B004BDNJW8", "B003N0JXSO", "B003X3R6TO", "B002T5Q01C", "B006J2U4T0", "B00JBJDEC2", "B00380LVLG", "B00GJVM2NW", "B003AJHDQW", "B0043EDGP0", "B0012IJBUE", "B001OI0YWG", "B005DL5970", "B00305GSKS", "B00CQ9UUK8", "B002Y27PQ4", "B00AH8J448", "B00A3JXVZY", "B00483GAJU", "B001J6O6B8", "B0019ID6G2", "B005DL7LGM", "B00APSYDXM", "B004D5KJJA", "B0021HR94K", "0757302661", "B001HX4DNE"], "also_viewed": ["B000138GNY", "B00G96N3YY", "B006J2U4T0", "B00GLBR3C0", "B0039VCRPI", "B0012IJBUE", "B001IA2XGK", "B001R2L6PS", "B003X3R6TO", "B00DS4WPNK", "B005WWIE3G", "B004BDNJW8", "B005WWIMGA", "B005WWI0DA", "B003XSEV2O", "B004A5VGPE", "B004A5TTGC", "B00JBJDEC2", "B004GMGLN8", "B009W6PM9M", "B004RGL9I0", "B0040IWMTK", "B00J3C0XC6", "B00HO1HBC8", "B00G96N18M", "B00A3V0XDA", "B0032AMM9M", "B001QIN6ME", "B00A3V0XEY", "B00CIUAB40", "B00J3C0Y1G", "B005IPE0KI", "B00L0W5B7Y", "B00FZGLKQW", "B00HBYOE3W", "B004A5TT4O", "B00B6KS578", "B002HOQOMA", "B004G792KW", "B002UD6BZS", "B002XVRKHK", "B002HOQOUW", "B002UD6C1Q", "B00APSYDXM", "B00H07AZ42", "B002LZX2BG", "B00BFQVZNO", "B00EZKNVYS", "B00DCLYYJW", "B00AHG8H1Q", "B002QZ64T8", "B003N0JXSO", "B003AJHDQW", "B00DN9NW32", "B00CHPJ0I4", "B005WXMJUY", "B00D4LFC3W"], "bought_together": ["B00G96N3YY", "B003XSEV2O"], "buy_after_viewing": ["B000138GNY", "B006J2U4T0", "B00G96N3YY", "B005WWIMGA"]},},
{"asin": "0188399399", "description": "The Planet Wise Wipe PouchTM features our patent pending no-leak design so your wipes will stay moist and not wick into your diaper bag. Features a unique snap down design to eliminate wicking from the zipper area. Most standard size wipes will fit into our Wipe PouchTM perfectly! Pouches are made of the same high quality fabrics as our Wet Bags, this pouch will last and last. Pouches are anti-microbial too! After you are done with the baby years, keep this pouch to use for just about anything you need to transport: make-up, toiletries, anything. Made in the USA!", "title": "Planetwise Wipe Pouch", "price": 10.95, "imUrl": "http://ecx.images-amazon.com/images/I/61x8h9u6mxL._SY300_.jpg", "related": {"also_bought": ["B005WWI0DA", "B005WWIMGA", "B006J2U4T0", "B000138GNY", "B003XSEV2O", "B005WWIE3G", "B00DS4WPNK", "B0039VCRPI", "B003GSLCOG", "B0018B15FE", "B002MN3JY2"], "also_viewed": ["B005WWIMGA", "B00G96N3YY", "B00DAI76TC", "B0067GKHVS", "B005WWIE3G", "B004RGL9I0", "B000138GNY", "B00JBJDEC2", "B00CMCJ2AS", "B005WWI0DA", "B002UD6C16", "B00E401FVU", "B004GMGLN8", "B0012IJBUE", "B00BFQVZNO", "B006J2U4T0", "B000VV21G4", "B002HOQOUW", "B003X3R6TO", "B000GZJIVQ", "B0052AIF00", "B005GQ6606", "B003TZHCA4", "B001QIN6ME", "B003XSEV2O", "B00GLBR3C0", "B00DW9R03G", "B00J3C0XC6", "0188399542", "B00DQL6CIE", "B002UD6C1G", "B00A3V0XDA", "B0039VCRPI", "B00EZKNVYS", "B006QRFDHQ", "B00DS4WPNK", "B00CQ9UUK8", "B0087A2PKS", "B00GGNICSW", "B0037NXP18", "B000YCNLSC", "B00FJG945W", "B00FZGLKQW", "B004A5VGPE", "B00D4LFC3W", "B00G96N18M", "B00439UNR4", "B004A5TT4O", "B005DL7LGM", "B006SD2JUW", "B006Z6HXYY", "B006QRFDZI", "B00I0P7VYK", "B00AZWDM0I", "B000ZKHVMU", "B007799PGC", "B00H58UNMU", "B00JKUU3TE"], "buy_after_viewing": ["B005WWIMGA", "B00G96N3YY", "B00DAI76TC", "B0067GKHVS"]},, "categories": [["Baby"]]},
{"asin": "0316967297", "description": "Hand crafted set includes 1 full quilt (76x86 inches) and 2 standard shams (20x26 inches). Face cloth and fill are 100 natural cotton. Prewashed for out of the bag comfort. Hand crafted with embroidery. Machine washable. Made in China", "title": "Annas Dream Full Quilt with 2 Shams", "price": 109.95, "imUrl": "http://ecx.images-amazon.com/images/I/51%2BZ1%2BNeukL._SY300_.jpg", "related": {"also_viewed": ["B009LTER3W", "B00575TI5Q", "B004NSYYJI"], "buy_after_viewing": ["B009LTER3W", "B001MX5EE6", "B00575TI5Q", "B0029009TG"]},, "categories": [["Baby"]]},
{"asin": "0615447279", "categories": [["Baby"]], "description": "Thumbuddy To Love- The Binky Fairy helps children give up pacifier sucking without the fuss and tears. The adorable book comes with a matching Binky Fairy puppet and just like the Tooth Fairy, the book is read the night before so children understand where thier pacifiers will go. They awake to find an adorable Binky Fairy puppet under thier pillow knowing that the Binky Fairy came! Each book comes with a success chart and stickers. Winner of the PTPA Awards (Parent Tested, Parent Approved). Recommended for ages 2-4.", "title": "Stop Pacifier Sucking without tears with Thumbuddy To Love s Binky Fairy Puppet and Adorable Book", "price": 16.95, "imUrl": "http://ecx.images-amazon.com/images/I/51RKKENlq%2BL._SY300_.jpg", "brand": "", "related": {"also_bought": ["0979670004", "1601310234", "B005G172KE", "1575422573", "1905417896", "B0044D0HA2", "B001GQ2CPI"], "also_viewed": ["0979670004", "1575422573", "1493535943", "0615273645", "1601310234", "0992616727", "0615273793", "0763623644", "1907152962", "0992616719", "B0016ZX7AI", "0698400488", "1608442713", "0451416058", "1581176848", "B0072CTECY", "B005G172KE", "B0016P8K3W", "0375822704", "0979201004", "0961478020", "160131048X", "1572245859"], "bought_together": ["B005G172KE"], "buy_after_viewing": ["0979670004", "1575422573", "B005G172KE", "1493535943"]},}]
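One observation on the sample as pasted: key order does not matter inside a JSON object, but the rows contain stray commas ("},}" and "},, "), which strict JSON parsers reject. If the real file has them too (an assumption, since this may just be a paste artifact), a pre-cleaning pass along these lines might help before handing the text to fromJSON. It is a sketch in Python rather than R; drop_stray_commas is a hypothetical helper and raw is a shortened stand-in for the file contents.

```python
import json


def drop_stray_commas(text):
    """Drop commas that are directly followed by ',', '}' or ']' outside strings."""
    out = []
    in_string = False
    escaped = False
    n = len(text)
    for i, ch in enumerate(text):
        if in_string:
            # Inside a string literal: copy everything, track escapes.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look past whitespace to the next significant character.
            j = i + 1
            while j < n and text[j].isspace():
                j += 1
            if j < n and text[j] in ",}]":
                continue  # stray comma: skip it
            out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


# Shortened stand-in shaped like two rows of the sample above.
raw = (
    '[{"asin": "0188399313", "categories": [["Baby"]], '
    '"related": {"also_bought": ["B002SG7K7A"]},}, '
    '{"asin": "0188399399", "related": {"also_bought": ["B005WWI0DA"]},, '
    '"categories": [["Baby"]]}]'
)

rows = json.loads(drop_stray_commas(raw))
print(len(rows), rows[1]["categories"])
```

The scanner tracks whether it is inside a string, so commas in descriptions and titles are left alone.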

What are language codes in Chrome's implementation of the HTML5 speech recognition API?

Chrome implemented the HTML5 speech recognition API, and many languages are supported. I want to know which languages are supported and the code for each one, as used in the HTML element's lang attribute.
For instance:
Polish (pl-PL)
Turkish (tr-TR)
Thank you!
OK, if the list is not published, we can at least try to figure it out.
Let me post this table as a starting point; we can refine it if someone has more information.
I'm assuming that the supported languages are similar to those supported by voice search, and that Google uses standard language codes consistently across its services.
I looked up the languages supported by voice search on Wikipedia.
I found the language codes here, on the Google language settings page, and here.
EDIT:
I've experimented with the backend voice recognition service. I ran a series of tests in which I passed the same English speech sample to the API but specified a different dialect each time. It looks like:
If a language is not supported, recognition falls back to en-US (it seems to recognize that the sample is in English).
If a dialect is not supported (or doesn't exist), recognition falls back to the main dialect, or to en-US in some cases.
The main dialect can be specified with just the first part of the identifier, so 'en-US' and 'en' give the same results.
Recognition for some languages, like Chinese and Japanese, gives results in English, though different from the en-US ones, which is strange. Probably the sample is so different from Chinese that the service is clever enough to figure that out.
I treat a dialect as supported if recognition gives a result different from both en-US and the main dialect of the language. Still, to verify this 100% we would need to run samples for each language.
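The fallback behaviour observed above can be modelled roughly as follows. This is only a sketch of what the tests suggest, not the service's actual logic; resolve_language and the supported set are hypothetical, and the "or en-US in some cases" variation is not modelled.

```python
def resolve_language(tag, supported, default="en-US"):
    """Mimic the observed fallback: exact tag, then main language, then default."""
    if tag in supported:
        return tag
    # 'es-ZZ' -> 'es': try the main dialect via the first part of the identifier.
    primary = tag.split("-")[0]
    if primary in supported:
        return primary
    # Unknown language: the tests suggest a fall back to en-US.
    return default


# Hypothetical set of supported codes, for illustration only.
supported = {"en", "en-US", "en-AU", "pl", "es", "es-MX"}

print(resolve_language("en-AU", supported))
print(resolve_language("es-ZZ", supported))
print(resolve_language("xx-YY", supported))
```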
Legend
+ Most likely supported, because the test gives a result different from en-US and the main dialect.
.+ Absent on Wikipedia but most likely supported, because the test gives a result different from en-US and the main dialect.
+? Most likely supported because it is listed on Wikipedia, but the test on my sample gives a result identical to the main dialect. So either this is a coincidence or the language code is wrong.
.+? Not listed on Wikipedia but appears to be supported, because the test gives a result different from en-US and the main dialect.
Languages
+ Afrikaans af
+ Basque eu
+ Bulgarian bg
+ Catalan ca
+ Arabic (Egypt) ar-EG
+? Arabic (Jordan) ar-JO
+ Arabic (Kuwait) ar-KW
+? Arabic (Lebanon) ar-LB
+ Arabic (Qatar) ar-QA
+ Arabic (UAE) ar-AE
.+ Arabic (Morocco) ar-MA
.+ Arabic (Iraq) ar-IQ
.+ Arabic (Algeria) ar-DZ
.+ Arabic (Bahrain) ar-BH
.+ Arabic (Libya) ar-LY
.+ Arabic (Oman) ar-OM
.+ Arabic (Saudi Arabia) ar-SA
.+ Arabic (Tunisia) ar-TN
.+ Arabic (Yemen) ar-YE
+ Czech cs
+ Dutch nl-NL
+ English (Australia) en-AU
+? English (Canada) en-CA
+ English (India) en-IN
+ English (New Zealand) en-NZ
+ English (South Africa) en-ZA
+ English (UK) en-GB
+ English (US) en-US
+ Finnish fi
+ French fr-FR
+ Galician gl
+ German de-DE
+ Hebrew he
+ Hungarian hu
+ Icelandic is
+ Italian it-IT
+ Indonesian id
+ Japanese ja
+ Korean ko
+ Latin la
+ Mandarin Chinese zh-CN
+ Traditional Taiwan zh-TW
+? Simplified China zh-CN ?
+ Simplified Hong Kong zh-HK
+ Yue Chinese (Traditional Hong Kong) zh-yue
+ Malaysian ms-MY
+ Norwegian no-NO
+ Polish pl
+? Pig Latin xx-piglatin
+ Portuguese pt-PT
.+ Portuguese (Brazil) pt-BR
+ Romanian ro-RO
+ Russian ru
+ Serbian sr-SP
+ Slovak sk
+ Spanish (Argentina) es-AR
+ Spanish (Bolivia) es-BO
+? Spanish (Chile) es-CL
+? Spanish (Colombia) es-CO
+? Spanish (Costa Rica) es-CR
+ Spanish (Dominican Republic) es-DO
+ Spanish (Ecuador) es-EC
+ Spanish (El Salvador) es-SV
+ Spanish (Guatemala) es-GT
+ Spanish (Honduras) es-HN
+ Spanish (Mexico) es-MX
+ Spanish (Nicaragua) es-NI
+ Spanish (Panama) es-PA
+ Spanish (Paraguay) es-PY
+ Spanish (Peru) es-PE
+ Spanish (Puerto Rico) es-PR
+ Spanish (Spain) es-ES
+ Spanish (US) es-US
+ Spanish (Uruguay) es-UY
+ Spanish (Venezuela) es-VE
+ Swedish sv-SE
+ Turkish tr
+ Zulu zu
I know this is an old post, but since this information is annoyingly hard to find I thought I'd post a list for anyone who might be looking. Please leave a note if you find any errors or omissions.
{
"Afrikaans": [
["South Africa", "af-ZA"]
],
"Arabic" : [
["Algeria","ar-DZ"],
["Bahrain","ar-BH"],
["Egypt","ar-EG"],
["Israel","ar-IL"],
["Iraq","ar-IQ"],
["Jordan","ar-JO"],
["Kuwait","ar-KW"],
["Lebanon","ar-LB"],
["Morocco","ar-MA"],
["Oman","ar-OM"],
["Palestinian Territory","ar-PS"],
["Qatar","ar-QA"],
["Saudi Arabia","ar-SA"],
["Tunisia","ar-TN"],
["UAE","ar-AE"]
],
"Basque": [
["Spain", "eu-ES"]
],
"Bulgarian": [
["Bulgaria", "bg-BG"]
],
"Catalan": [
["Spain", "ca-ES"]
],
"Chinese Mandarin": [
["China (Simp.)", "cmn-Hans-CN"],
["Hong Kong SAR (Trad.)", "cmn-Hans-HK"],
["Taiwan (Trad.)", "cmn-Hant-TW"]
],
"Chinese Cantonese": [
["Hong Kong", "yue-Hant-HK"]
],
"Croatian": [
["Croatia", "hr-HR"]
],
"Czech": [
["Czech Republic", "cs-CZ"]
],
"Danish": [
["Denmark", "da-DK"]
],
"English": [
["Australia", "en-AU"],
["Canada", "en-CA"],
["India", "en-IN"],
["Ireland", "en-IE"],
["New Zealand", "en-NZ"],
["Philippines", "en-PH"],
["South Africa", "en-ZA"],
["United Kingdom", "en-GB"],
["United States", "en-US"]
],
"Farsi": [
["Iran", "fa-IR"]
],
"French": [
["France", "fr-FR"]
],
"Filipino": [
["Philippines", "fil-PH"]
],
"Galician": [
["Spain", "gl-ES"]
],
"German": [
["Germany", "de-DE"]
],
"Greek": [
["Greece", "el-GR"]
],
"Finnish": [
["Finland", "fi-FI"]
],
"Hebrew" :[
["Israel", "he-IL"]
],
"Hindi": [
["India", "hi-IN"]
],
"Hungarian": [
["Hungary", "hu-HU"]
],
"Indonesian": [
["Indonesia", "id-ID"]
],
"Icelandic": [
["Iceland", "is-IS"]
],
"Italian": [
["Italy", "it-IT"],
["Switzerland", "it-CH"]
],
"Japanese": [
["Japan", "ja-JP"]
],
"Korean": [
["Korea", "ko-KR"]
],
"Lithuanian": [
["Lithuania", "lt-LT"]
],
"Malaysian": [
["Malaysia", "ms-MY"]
],
"Dutch": [
["Netherlands", "nl-NL"]
],
"Norwegian": [
["Norway", "nb-NO"]
],
"Polish": [
["Poland", "pl-PL"]
],
"Portuguese": [
["Brazil", "pt-BR"],
["Portugal", "pt-PT"]
],
"Romanian": [
["Romania", "ro-RO"]
],
"Russian": [
["Russia", "ru-RU"]
],
"Serbian": [
["Serbia", "sr-RS"]
],
"Slovak": [
["Slovakia", "sk-SK"]
],
"Slovenian": [
["Slovenia", "sl-SI"]
],
"Spanish": [
["Argentina", "es-AR"],
["Bolivia", "es-BO"],
["Chile", "es-CL"],
["Colombia", "es-CO"],
["Costa Rica", "es-CR"],
["Dominican Republic", "es-DO"],
["Ecuador", "es-EC"],
["El Salvador", "es-SV"],
["Guatemala", "es-GT"],
["Honduras", "es-HN"],
["México", "es-MX"],
["Nicaragua", "es-NI"],
["Panamá", "es-PA"],
["Paraguay", "es-PY"],
["Perú", "es-PE"],
["Puerto Rico", "es-PR"],
["Spain", "es-ES"],
["Uruguay", "es-UY"],
["United States", "es-US"],
["Venezuela", "es-VE"]
],
"Swedish": [
["Sweden", "sv-SE"]
],
"Thai": [
["Thailand", "th-TH"]
],
"Turkish": [
["Turkey", "tr-TR"]
],
"Ukrainian": [
["Ukraine", "uk-UA"]
],
"Vietnamese": [
["Viet Nam", "vi-VN"]
],
"Zulu": [
["South Africa", "zu-ZA"]
]
}
Edit: I also found this list, which is probably more current:
https://cloud.google.com/speech-to-text/docs/languages
Edit 2: Adding this list of sample voices as well: https://cloud.google.com/text-to-speech/docs/voices
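If you want to look a code up rather than browse the list, the structure above (language name mapped to a list of [region, code] pairs) flattens easily. A minimal sketch, assuming the JSON is available as a string; LANGS_JSON is a three-language excerpt and labels_by_code is a hypothetical helper name.

```python
import json

# Excerpt of the list above, in the same shape: language -> [[region, code], ...]
LANGS_JSON = """
{
  "Polish": [["Poland", "pl-PL"]],
  "Portuguese": [["Brazil", "pt-BR"], ["Portugal", "pt-PT"]],
  "Turkish": [["Turkey", "tr-TR"]]
}
"""


def labels_by_code(langs):
    """Map each language code to a 'Language (Region)' label."""
    return {
        code: "{} ({})".format(language, region)
        for language, regions in langs.items()
        for region, code in regions
    }


labels = labels_by_code(json.loads(LANGS_JSON))
print(labels["pt-BR"])    # Portuguese (Brazil)
print("tr-TR" in labels)  # True
```

Keying by code works because every code in the list is unique, whereas region names such as "Spain" repeat.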
Use the following code to get all available voices for the speech API in your browser:
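// SpeechSynthesisVoice has name, lang and voiceURI; there is no uri property,
// which is why the listing below shows "undefined" in the last column.
var voices = speechSynthesis.getVoices();
for (var i = 0; i < voices.length; i++) {
  console.log("Voice " + i + ' ' + voices[i].name + ' ' + voices[i].voiceURI);
}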
At this time only Chrome and Safari support the Web Speech API (although Safari only supports the Text to Speech functionalities). Curiously Firefox OS supports TTS but the browser version does not.
The list of languages depends on what browser you are on according to both the documentation and my tests (user agent dependent).
In Safari you also get lots of languages available (I believe over 40). In Chrome, at this time you get the following list:
Voice 0 Google US English undefined
Voice 1 Google UK English Male undefined
Voice 2 Google UK English Female undefined
Voice 3 Google Español undefined
Voice 4 Google Français undefined
Voice 5 Google Italiano undefined
Voice 6 Google Deutsch undefined
Voice 7 Google 日本人 undefined
Voice 8 Google 한국의 undefined
Voice 9 Google 中国的 undefined
Voice 10 native undefined
Here is @TimHayes's list in a LinkedHashMap, from which you can fetch the values. I'm using a LinkedHashMap so that the entries keep their insertion order and I can get the position of each one.
LinkedHashMap<String,String> country = new LinkedHashMap<String,String>();
country.put("South Africa", "af-ZA");
country.put("Algeria", "ar-DZ");
country.put("Bahrain", "ar-BH");
country.put("Egypt", "ar-EG");
country.put("Israel", "ar-IL");
country.put("Iraq", "ar-IQ");
country.put("Jordan", "ar-JO");
country.put("Kuwait", "ar-KW");
country.put("Lebanon", "ar-LB");
country.put("Morocco", "ar-MA");
country.put("Oman", "ar-OM");
country.put("Palestinian Territory", "ar-PS");
country.put("Qatar", "ar-QA");
country.put("Saudi Arabia", "ar-SA");
country.put("Tunisia", "ar-TN");
country.put("UAE", "ar-AE");
country.put("Spain", "eu-ES");
country.put("Bulgaria", "bg-BG");
country.put("Spain", "ca-ES");
country.put("China (Simp.)", "cmn-Hans-CN");
country.put("Hong Kong SAR (Trad.)", "cmn-Hans-HK");
country.put("Taiwan (Trad.)", "cmn-Hant-TW");
country.put("Hong Kong", "yue-Hant-HK");
country.put("Croatia", "hr-HR");
country.put("Czech Republic", "cs-CZ");
country.put("Denmark", "da-DK");
country.put("Australia", "en-AU");
country.put("Canada", "en-CA");
country.put("India", "en-IN");
country.put("Ireland", "en-IE");
country.put("New Zealand", "en-NZ");
country.put("Philippines", "en-PH");
country.put("South Africa", "en-ZA");
country.put("United Kingdom", "en-GB");
country.put("United States", "en-US");
country.put("Iran", "fa-IR");
country.put("France", "fr-FR");
country.put("Philippines", "fil-PH");
country.put("Spain", "gl-ES");
country.put("Germany", "de-DE");
country.put("Greece", "el-GR");
country.put("Finland", "fi-FI");
country.put("Israel", "he-IL");
country.put("India", "hi-IN");
country.put("Hungary", "hu-HU");
country.put("Indonesia", "id-ID");
country.put("Iceland", "is-IS");
country.put("Italy", "it-IT");
country.put("Switzerland", "it-CH");
country.put("Japan", "ja-JP");
country.put("Korea", "ko-KR");
country.put("Lithuania", "lt-LT");
country.put("Malaysia", "ms-MY");
country.put("Netherlands", "nl-NL");
country.put("Norway", "nb-NO");
country.put("Poland", "pl-PL");
country.put("Brazil", "pt-BR");
country.put("Portugal", "pt-PT");
country.put("Romania", "ro-RO");
country.put("Russia", "ru-RU");
country.put("Serbia", "sr-RS");
country.put("Slovakia", "sk-SK");
country.put("Slovenia", "sl-SI");
country.put("Argentina", "es-AR");
country.put("Bolivia", "es-BO");
country.put("Chile", "es-CL");
country.put("Colombia", "es-CO");
country.put("Costa Rica", "es-CR");
country.put("Dominican Republic", "es-DO");
country.put("Ecuador", "es-EC");
country.put("El Salvador", "es-SV");
country.put("Guatemala", "es-GT");
country.put("Honduras", "es-HN");
country.put("México", "es-MX");
country.put("Nicaragua", "es-NI");
country.put("Panamá", "es-PA");
country.put("Paraguay", "es-PY");
country.put("Perú", "es-PE");
country.put("Puerto Rico", "es-PR");
country.put("Spain", "es-ES");
country.put("Uruguay", "es-UY");
country.put("United States", "es-US");
country.put("Venezuela", "es-VE");
country.put("Sweden", "sv-SE");
country.put("Thailand", "th-TH");
country.put("Turkey", "tr-TR");
country.put("Ukraine", "uk-UA");
country.put("Viet Nam", "vi-VN");
country.put("South Africa", "zu-ZA");
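One thing to watch with the map above: the country name is the key, and several countries occur more than once (Spain, South Africa, Israel, India, Philippines, United States). In Java, put on an existing key replaces the earlier value, so only the last code per country survives. The sketch below shows the same effect with a Python dict on a few of the pairs and keys by code instead; the variable names are hypothetical.

```python
# A few of the (country, code) pairs from the map above, in the same order.
pairs = [
    ("South Africa", "af-ZA"),
    ("Spain", "eu-ES"),
    ("Spain", "ca-ES"),
    ("South Africa", "en-ZA"),
    ("Spain", "gl-ES"),
    ("Spain", "es-ES"),
    ("South Africa", "zu-ZA"),
]

# Keyed by country: a later assignment overwrites the earlier one.
by_country = {}
for country, code in pairs:
    by_country[country] = code

# Keyed by code: every code is unique, so nothing is lost.
by_code = {code: country for country, code in pairs}

print(len(pairs), len(by_country), len(by_code))  # 7 2 7
print(by_country["Spain"])                        # es-ES
```

Swapping key and value in the Java version, i.e. country.put("eu-ES", "Spain"), keeps every entry while preserving the insertion order.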