Key Information Extraction models for text to text - deep-learning

I want to produce structured text, such as XML, as output, and my input is unstructured text. I did a little searching, but all I found were key information extraction models for PDFs. Is there a model that suits my problem? Or, if I need to build a custom model, where should I start?
An example of my input and expected output:
input:
...
1111 007XXXXXL 007 BOND LLC 1429 CIERRA ST. RICHBURG SC 29729-9367
1112 321XXXXXM 321 EQUIPMENT COMPANY PO BOX 2105 GASTONIA NC 28053
1113 360XXXXXS 360BRANDS, INC. PO BOX 2478 MT. PLEASANT SC 29465
1114 3IXXXXXG 3iD MANAGEMENT 9634 BOCA GARDENS CRL N #D BOCA RATON FL 33496
1115 4XXXXXI 4IMPRINT INC 25303 NETWORK PLACE CHICAGO IL 60673-1253
1116 911XXXC 911 C&E. L.L.C. 1513 BRIARCLIFF DRIVE ASHEBORO NC 27205
...
expected output:
<?xml version="1.0" encoding="utf-8"?>
<PO_Data>
<Vendors>
<Vendor>
<Vendor_Number>1111</Vendor_Number>
<Name1>007XXXXXL</Name1>
<Address1>1429 CIERRA ST.</Address1>
<City>RICHBURG</City>
<State>SC</State>
<Zip>29729-9367</Zip>
</Vendor><Vendor>
...
</PO_Data>
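For lines as regular as the sample above, a rule-based baseline may be worth trying before any model. The following is only a minimal sketch under stated assumptions: every line is assumed to start with a vendor number and a code and to end with a two-letter state and a ZIP or ZIP+4. The function name `line_to_vendor` and the `Unparsed` tag are hypothetical. Splitting the middle part into name, street and city is deliberately left open, since that is the ambiguous step a learned model would have to handle.

```python
import re
import xml.etree.ElementTree as ET

# Assumed layout: <vendor number> <code> <free text> <2-letter state> <ZIP or ZIP+4>
LINE_RE = re.compile(
    r"^(?P<number>\d+)\s+(?P<code>\S+)\s+(?P<middle>.+?)\s+"
    r"(?P<state>[A-Z]{2})\s+(?P<zip>\d{5}(?:-\d{4})?)$"
)


def line_to_vendor(line):
    """Return a <Vendor> element for one input line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    vendor = ET.Element("Vendor")
    ET.SubElement(vendor, "Vendor_Number").text = m["number"]
    ET.SubElement(vendor, "Name1").text = m["code"]
    # Name, street and city are not separated here; "Unparsed" is a hypothetical tag.
    ET.SubElement(vendor, "Unparsed").text = m["middle"]
    ET.SubElement(vendor, "State").text = m["state"]
    ET.SubElement(vendor, "Zip").text = m["zip"]
    return vendor


sample = "1111 007XXXXXL 007 BOND LLC 1429 CIERRA ST. RICHBURG SC 29729-9367"
vendor = line_to_vendor(sample)
print(ET.tostring(vendor, encoding="unicode"))
```

Lines that do not fit the assumed layout return None, which makes it easy to count how much of the data such a baseline fails to cover.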

Related

Beautiful soup - find_all function is returning only 20 items from the page. The actual results are around 250

I am using find_all in beautiful soup library to parse the HTML text.
code
from requests import get
from bs4 import BeautifulSoup
headers = ({'User-Agent':
            'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'})
URL = "https://housing.com/in/buy/searches/M1Pmp1mc1ak4wflhbs_735yq6kvim3c7hqz_3g8uxzo18sqqdcuwU2yr9t"
response = get(URL, headers=headers)
html_soup = BeautifulSoup(response.text, 'lxml')
len(html_soup)
This is returning only 20 items even though the page shows 250 results. What am I doing wrong here?
Try this, which retrieves all 291 results:
from selenium import webdriver
import time
driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
URL = "https://housing.com/in/buy/searches/M1Pmp1mc1ak4wflhbs_735yq6kvim3c7hqz_3g8uxzo18sqqdcuwU2yr9t"
driver.get(URL)
driver.maximize_window()
PAUSE_TIME = 2
lh = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(PAUSE_TIME)
    nh = driver.execute_script("return document.body.scrollHeight")
    if nh == lh:
        break
    lh = nh
articles = driver.find_elements_by_css_selector('.css-h7k7mr')
for article in articles:
    print(article.text)
    print('-' * 80)
driver.close()
prints:
₹45.11 L
EMI starts at ₹28.13 K
3 BHK Apartment
Bachupally, Nizampet, Hyderabad
Build Up Area
1556 sq.ft
Avg. Price
₹2.90 K/sq.ft
Special Highlights
24x7 Security
Badminton Court
Cycling & Jogging Track
Gated Community
3 BHK Apartment available for sale in Bachapally,hyderabad,beside Mama Medical College, Nizampet, Hyderabad. Available amenities are: Gym, Swimming pool, Garden, Kids area, Sports facility, Lift. Apartment has 3 bedroom, 2 bathroom.
Read more
M Srikanth
Housing Prime Agent
Contact
--------------------------------------------------------------------------------
₹37.96 L - 62.05 L
EMI starts at ₹23.67 K
Bhuvanteza Evk Aura
Marketed by Sri Avani Infra Projects
Kollur, Hyderabad
Configurations
2, 3 BHK Apartments
Possession Starts
Nov, 2022
Avg. Price
₹3.65 K/sq.ft
Real estate developer Bhuvanteza Infrastructures has launched prime housing project Evk Aura in Kollur, Hyderabad. The project is offering beautiful and comfortable 2 and 3 BHK apartments for sale. Built-up area for 2 BHK apartments is in the range of 1040 to 1185 sq ft. and for 3 BHK apartments it is 1700 sq ft. Amenities which are required for a comfortable living will be available in the complex, they are car parking, club house, swimming pool, children play area, power backup and others. Developer Bhuvanteza Infrastructures can be contacted for owning an apartment in Evk Aura. Kollur is a ...
Read more
SA
Sri Avani Infra Projects
Seller
Contact
--------------------------------------------------------------------------------
and so on....
Note: this requires Selenium and geckodriver. In this code, geckodriver is loaded from c:/program/geckodriver.exe.
You're misreading the page: there are 250 results in total, but only 20 are shown, which is why you get 20 in Python.
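The scroll loop in the answer above has no upper bound, so it could spin for a long time on a page that keeps growing. A minimal sketch of the same idea with a cap on the number of rounds follows. The names `scroll_until_stable` and `max_rounds` are assumptions for illustration, and `FakeDriver` is a hypothetical stand-in so the logic can be run without a browser. A real Selenium driver could be passed in instead, since only `execute_script` is used.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


class FakeDriver:
    """Hypothetical stand-in for a WebDriver: the page grows twice, then stops."""

    def __init__(self):
        self.heights = [1000, 2000, 3000]
        self.index = 0

    def execute_script(self, script):
        if script.startswith("window.scrollTo"):
            self.index = min(self.index + 1, len(self.heights) - 1)
            return None
        return self.heights[self.index]


print(scroll_until_stable(FakeDriver(), pause=0))
```

With the fake driver the height changes on the first two rounds and stays the same on the third, so the function stops after three rounds.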

Web-scraping in IBM Watson Studio Jupyter Notebook using BeautifulSoup not working

I'm looking to scrape data in an IBM Watson Studio Jupyter Notebook from this search result page:
https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&SortBy=PublishedDesc&LastUpdated=AddedAnytime&SearchTerm=&PropertyType=Residential&PriceMin=&PriceMax=&Bathrooms=&OrMoreBathrooms=true&Bedrooms=&OrMoreBedrooms=true&HasCentralHeating=false&HasGarage=false&HasDoubleGarage=false&HasGarden=false&IsNewBuild=false&IsDevelopment=false&IsParkingAvailable=false&IsPartExchangeConsidered=false&PublicRooms=&OrMorePublicRooms=true&IsHmoLicense=false&IsAllowPets=false&IsAllowSmoking=false&IsFullyFurnished=false&IsPartFurnished=false&IsUnfurnished=false&ExcludeUnderOffer=false&IncludeClosedProperties=true&ClosedDatesSearch=14&MapSearchType=EDITED&ResultView=LIST&ResultMode=NONE&AreaZoom=13&AreaCenter[lat]=57.14955426557916&AreaCenter[lng]=-2.0927401123046785&EditedZoom=13&EditedCenter[lat]=57.14955426557916&EditedCenter[lng]=-2.0927401123046785
I've tried BeautifulSoup and attempted Selenium (full disclosure: I am a beginner) with multiple variations of the code. I've gone over dozens of questions on Stack Overflow, Medium articles, etc., and I cannot understand what I'm doing wrong.
My latest attempt is:
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)
properties_containers = html_soup.find_all('div', class_ = 'information-card property-card col ')
print(type(properties_containers))
print(len(properties_containers))
This returns 0.
<class 'bs4.element.ResultSet'>
0
Can someone please guide me in the right direction as to what I'm doing wrong or missing?
The data you see is loaded via JavaScript. BeautifulSoup cannot execute JavaScript, but you can use the requests module to load the data from the site's API.
For example:
import json
import requests
url = 'https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&SortBy=PublishedDesc&LastUpdated=AddedAnytime&SearchTerm=&PropertyType=Residential&PriceMin=&PriceMax=&Bathrooms=&OrMoreBathrooms=true&Bedrooms=&OrMoreBedrooms=true&HasCentralHeating=false&HasGarage=false&HasDoubleGarage=false&HasGarden=false&IsNewBuild=false&IsDevelopment=false&IsParkingAvailable=false&IsPartExchangeConsidered=false&PublicRooms=&OrMorePublicRooms=true&IsHmoLicense=false&IsAllowPets=false&IsAllowSmoking=false&IsFullyFurnished=false&IsPartFurnished=false&IsUnfurnished=false&ExcludeUnderOffer=false&IncludeClosedProperties=true&ClosedDatesSearch=14&MapSearchType=EDITED&ResultView=LIST&ResultMode=NONE&AreaZoom=13&AreaCenter[lat]=57.14955426557916&AreaCenter[lng]=-2.0927401123046785&EditedZoom=13&EditedCenter[lat]=57.14955426557916&EditedCenter[lng]=-2.0927401123046785'
api_url = 'https://api.aspc.co.uk/Property/GetProperties?{}&Sort=PublishedDesc&Page=1&PageSize=12'
params = url.split('?')[-1]
data = requests.get(api_url.format(params)).json()
# uncomment this to see all data received from the server:
# print(json.dumps(data, indent=4))
# print some data to screen:
for property_ in data:
    print(property_['Location']['AddressLine1'])
    print(property_['CategorisationDescription'])
    print('Bedrooms:', property_["Bedrooms"])        # <-- print number of Bedrooms
    print('Bathrooms:', property_["Bathrooms"])      # <-- print number of Bathrooms
    print('PublicRooms:', property_["PublicRooms"])  # <-- print number of PublicRooms
    # .. etc.
    print('-' * 80)
Prints:
44 Roslin Place
Fully furnished 2 Bdrm 1st flr Flat. Hall. Lounge. Dining kitch. 2 Bdrms. Bathrm (CT band - C). Deposit 1 months rent. Parking. No pets. No smokers. Rent £550 p.m Entry by arr. Viewing contact solicitors. Landlord reg: 871287/100/26061. (EPC band - B).
Bedrooms: 2
Bathrooms: 1
PublicRooms: 1
--------------------------------------------------------------------------------
Second Floor Left, 173 Victoria Road
Unfurnished 1 Bdrm 2nd flr Flat. Hall. Lounge. Dining kitch. Bdrm. Bathrm (CT Band - A). Deposit 1 months rent. No pets. No smokers. Rent £375 p.m Immed entry. Viewing contact solicitors. Landlord reg: 1261711/100/09072. (EPC band - D).
Bedrooms: 1
Bathrooms: 1
PublicRooms: 1
--------------------------------------------------------------------------------
102 Bedford Road
Fully furnished 3 Bdrm 1st flr Flat. Hall. Lounge. Kitch. 3 Bdrms. Bathrm (CT band - B). Deposit 1 months rent. Garden. HMO License. No pets. No smokers. Rent £750 p.m Entry by arr. Viewing contact solicitors. Landlord reg: 49171/100/27130. (EPC band - D).
Bedrooms: 3
Bathrooms: 1
PublicRooms: 1
--------------------------------------------------------------------------------
... and so on.
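The API URL in the answer above hardcodes Page=1 and PageSize=12. A minimal sketch of building the URL for other pages follows, assuming the endpoint accepts higher Page values, which the answer does not confirm. The helper name `build_api_url` is hypothetical, and the search URL is shortened here for readability. No request is sent; the sketch only constructs the strings.

```python
from urllib.parse import urlsplit

# Same endpoint and fixed parameters as in the answer; only Page and PageSize vary.
API_TEMPLATE = (
    "https://api.aspc.co.uk/Property/GetProperties"
    "?{query}&Sort=PublishedDesc&Page={page}&PageSize={size}"
)


def build_api_url(search_url, page=1, page_size=12):
    """Reuse the query string of the search page for the API endpoint."""
    query = urlsplit(search_url).query
    return API_TEMPLATE.format(query=query, page=page, size=page_size)


# Shortened version of the search URL from the question
search_url = "https://www.aspc.co.uk/search/?PrimaryPropertyType=Rent&PropertyType=Residential"
for page in (1, 2):
    print(build_api_url(search_url, page=page))
```

Each resulting URL could then be passed to requests.get(...).json() as in the answer, stopping once a page comes back empty.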

Converting JSON encoded in HTML to JSON using BeautifulSoup

I know similar questions have been asked here, but I'm still struggling to find a solution. I'm able to parse raw HTML from the bandsintown website using BeautifulSoup, but my ultimate goal is to access the script on the page and the JSON embedded in it. Opening the page source, I can see that "eventsJsonLd" is what I need:
"jsonLdContainer":{"eventsJsonLd":[{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-25","endDate":"2019-01-25","url":"https://www.bandsintown.com/e/100451456-pop-rocks-at-hopmonk-tavern-novato?came_from=244","location":{"#type":"Place","name":"HopMonk Tavern Novato","address":"Novato, CA","geo":{"#type":"GeoCoordinates","latitude":38.1074198,"longitude":-122.5697032}},"name":"Pop Rocks","performer":{"#type":"MusicGroup","name":"Pop Rocks","image":"https://photos.bandsintown.com/thumb/8532836.jpeg","url":"https://www.bandsintown.com/a/29109-pop-rocks?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8532836.jpeg"},
Here's my code:
import json
import requests
from bs4 import BeautifulSoup
# define url and build url array to cycle through webpages
page = 'https://www.bandsintown.com/?came_from=257&page='
urlBucket = []
for i in range(0, 2):
    uniqueUrl = page + str(i)
    urlBucket.append(uniqueUrl)
# dump response into an array
responseBucket = []
for i in urlBucket:
    uniqueResponse = requests.get(i)
    responseBucket.append(uniqueResponse)
# make the 'soup'
soupBucket = []
for i in responseBucket:
    individualSoup = BeautifulSoup(i.text, 'html.parser')
    soupBucket.append(individualSoup)
# build an array to hold script
allScript = []
for i in soupBucket:
    script = i.find_all("script")[4]
    eventsJSON = json.loads(script)
    print script
    allScript.append(script)
print allScript
Printing allScript gives me the following:
[<script type="application/ld+json">[{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100653596-e.r.n.e.s.t.o-at-the-endup?came_from=244","location":{"#type":"Place","name":"The EndUp","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.7726402,"longitude":-122.4099154}},"name":"E.R.N.E.S.T.O","performer":{"#type":"MusicGroup","name":"E.R.N.E.S.T.O","image":"https://photos.bandsintown.com/thumb/8618862.jpeg","url":"https://www.bandsintown.com/a/4693798-e.r.n.e.s.t.o?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8618862.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239291-j.j.-grey-and-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey & Mofro","performer":{"#type":"MusicGroup","name":"J.J. Grey & Mofro","image":"https://photos.bandsintown.com/thumb/219177.jpeg","url":"https://www.bandsintown.com/a/2327212-j.j.-grey-and-mofro?came_from=244"},"image":"https://photos.bandsintown.com/thumb/219177.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239613-j.j.-grey-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey","performer":{"#type":"MusicGroup","name":"J.J. 
Grey","image":"","url":"https://www.bandsintown.com/a/12437162-j.j.-grey?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239435-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"Mofro","performer":{"#type":"MusicGroup","name":"Mofro","image":"","url":"https://www.bandsintown.com/a/71714-mofro?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100542800-brooke-heinichen-at-stuffed?came_from=244","location":{"#type":"Place","name":"Stuffed","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7485824,"longitude":-122.4184108}},"name":"Brooke Heinichen","performer":{"#type":"MusicGroup","name":"Brooke Heinichen","image":"https://photos.bandsintown.com/thumb/8921909.jpeg","url":"https://www.bandsintown.com/a/14944274-brooke-heinichen?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8921909.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012486121-william-fitzsimmons-at-hopmonk-tavern?came_from=244","location":{"#type":"Place","name":"Hopmonk Tavern","address":"Novato, CA","geo":{"#type":"GeoCoordinates","latitude":38.088489,"longitude":-122.553449}},"name":"William Fitzsimmons","performer":{"#type":"MusicGroup","name":"William 
Fitzsimmons","image":"https://photos.bandsintown.com/thumb/8852940.jpeg","url":"https://www.bandsintown.com/a/2450-william-fitzsimmons?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8852940.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100581554-kevin-paris-at-acoustic-yoga-#-yoga-source-los-gatos?came_from=244","location":{"#type":"Place","name":"Acoustic Yoga # Yoga Source Los Gatos","address":"Los Gatos, CA","geo":{"#type":"GeoCoordinates","latitude":37.2358078,"longitude":-121.9623751}},"name":"Kevin Paris","performer":{"#type":"MusicGroup","name":"Kevin Paris","image":"https://photos.bandsintown.com/thumb/8419497.jpeg","url":"https://www.bandsintown.com/a/1134314-kevin-paris?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8419497.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100692435-zak-fennie-at-black-stallion-winery?came_from=244","location":{"#type":"Place","name":"Black Stallion Winery","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.35983179999999,"longitude":-122.2906388}},"name":"Zak Fennie","performer":{"#type":"MusicGroup","name":"Zak Fennie","image":"https://photos.bandsintown.com/thumb/8851546.jpeg","url":"https://www.bandsintown.com/a/11843851-zak-fennie?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8851546.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100621943-frances-ancheta-at-off-the-grid-at-alameda-south-shore-center?came_from=244","location":{"#type":"Place","name":"Off the Grid at Alameda South Shore Center ","address":"Alameda, CA","geo":{"#type":"GeoCoordinates","latitude":37.7712165,"longitude":-122.2824021}},"name":"Frances 
Ancheta","performer":{"#type":"MusicGroup","name":"Frances Ancheta","image":"https://photos.bandsintown.com/thumb/8483059.jpeg","url":"https://www.bandsintown.com/a/7762254-frances-ancheta?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8483059.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013412612-pizza!-at-audio-nightclub?came_from=244","location":{"#type":"Place","name":"Audio Nightclub","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.771362,"longitude":-122.413795}},"name":"Pizza!","performer":{"#type":"MusicGroup","name":"Pizza!","image":"https://photos.bandsintown.com/thumb/161356.jpeg","url":"https://www.bandsintown.com/a/198680-pizza!?came_from=244"},"image":"https://photos.bandsintown.com/thumb/161356.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100372855-ryan-scott-long-at-drake's-barrel-house?came_from=244","location":{"#type":"Place","name":"Drake\u2019s barrel house ","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"Ryan Scott Long","performer":{"#type":"MusicGroup","name":"Ryan Scott Long","image":"https://photos.bandsintown.com/thumb/8671372.jpeg","url":"https://www.bandsintown.com/a/3168705-ryan-scott-long?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8671372.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012999412-come-from-away-at-golden-gate-theater?came_from=244","location":{"#type":"Place","name":"Golden Gate Theater","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7825715,"longitude":-122.4110742}},"name":"Come From Away","performer":{"#type":"MusicGroup","name":"Come From 
Away","image":"","url":"https://www.bandsintown.com/a/13889714-come-from-away?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100441096-and-then-came-humans-at-drake's-brewing-company?came_from=244","location":{"#type":"Place","name":"Drake\u2019s Brewing Company","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"And Then Came Humans","performer":{"#type":"MusicGroup","name":"And Then Came Humans","image":"https://photos.bandsintown.com/thumb/8897159.jpeg","url":"https://www.bandsintown.com/a/13151463-and-then-came-humans?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8897159.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1011601412-man-go-at-el-rio?came_from=244","location":{"#type":"Place","name":"El Rio","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7467828,"longitude":-122.4193922}},"name":"Man-Go","performer":{"#type":"MusicGroup","name":"Man-Go","image":"","url":"https://www.bandsintown.com/a/3238684-man-go?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013320819-paul-mehling-at-freight-and-salvage-coffeehouse?came_from=244","location":{"#type":"Place","name":"Freight & Salvage Coffeehouse","address":"Berkeley, CA","geo":{"#type":"GeoCoordinates","latitude":37.8708715,"longitude":-122.2695117}},"name":"Paul Mehling","performer":{"#type":"MusicGroup","name":"Paul 
Mehling","image":"","url":"https://www.bandsintown.com/a/3307749-paul-mehling?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100672210-dj-spooky-at-catharine-clark-gallery?came_from=244","location":{"#type":"Place","name":"Catharine Clark Gallery","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.76639,"longitude":-122.40704}},"name":"DJ Spooky","performer":{"#type":"MusicGroup","name":"DJ Spooky","image":"https://photos.bandsintown.com/thumb/7060233.jpeg","url":"https://www.bandsintown.com/a/64476-dj-spooky?came_from=244"},"image":"https://photos.bandsintown.com/thumb/7060233.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012003162-craig-ventresco-at-atlas-cafe?came_from=244","location":{"#type":"Place","name":"Atlas Cafe","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.73189,"longitude":-122.47615}},"name":"Craig Ventresco","performer":{"#type":"MusicGroup","name":"Craig Ventresco","image":"","url":"https://www.bandsintown.com/a/139634-craig-ventresco?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100555258-rusty-jackson-music-at-kawika's-ocean-beach-deli?came_from=244","location":{"#type":"Place","name":"Kawika's Ocean Beach Deli","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.774627,"longitude":-122.509993}},"name":"Rusty Jackson Music","performer":{"#type":"MusicGroup","name":"Rusty Jackson Music","image":"https://photos.bandsintown.com/thumb/8250003.jpeg","url":"https://www.bandsintown.com/a/9978762-rusty-jackson-music?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8250003.jpeg"}]</script>, <script 
type="application/ld+json">[{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100653596-e.r.n.e.s.t.o-at-the-endup?came_from=244","location":{"#type":"Place","name":"The EndUp","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.7726402,"longitude":-122.4099154}},"name":"E.R.N.E.S.T.O","performer":{"#type":"MusicGroup","name":"E.R.N.E.S.T.O","image":"https://photos.bandsintown.com/thumb/8618862.jpeg","url":"https://www.bandsintown.com/a/4693798-e.r.n.e.s.t.o?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8618862.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239291-j.j.-grey-and-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey & Mofro","performer":{"#type":"MusicGroup","name":"J.J. Grey & Mofro","image":"https://photos.bandsintown.com/thumb/219177.jpeg","url":"https://www.bandsintown.com/a/2327212-j.j.-grey-and-mofro?came_from=244"},"image":"https://photos.bandsintown.com/thumb/219177.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239613-j.j.-grey-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"J.J. Grey","performer":{"#type":"MusicGroup","name":"J.J. 
Grey","image":"","url":"https://www.bandsintown.com/a/12437162-j.j.-grey?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012239435-mofro-at-uptown-theatre-napa?came_from=244","location":{"#type":"Place","name":"Uptown Theatre Napa","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.2963465,"longitude":-122.2873698}},"name":"Mofro","performer":{"#type":"MusicGroup","name":"Mofro","image":"","url":"https://www.bandsintown.com/a/71714-mofro?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100542800-brooke-heinichen-at-stuffed?came_from=244","location":{"#type":"Place","name":"Stuffed","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7485824,"longitude":-122.4184108}},"name":"Brooke Heinichen","performer":{"#type":"MusicGroup","name":"Brooke Heinichen","image":"https://photos.bandsintown.com/thumb/8921909.jpeg","url":"https://www.bandsintown.com/a/14944274-brooke-heinichen?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8921909.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012486121-william-fitzsimmons-at-hopmonk-tavern?came_from=244","location":{"#type":"Place","name":"Hopmonk Tavern","address":"Novato, CA","geo":{"#type":"GeoCoordinates","latitude":38.088489,"longitude":-122.553449}},"name":"William Fitzsimmons","performer":{"#type":"MusicGroup","name":"William 
Fitzsimmons","image":"https://photos.bandsintown.com/thumb/8852940.jpeg","url":"https://www.bandsintown.com/a/2450-william-fitzsimmons?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8852940.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100581554-kevin-paris-at-acoustic-yoga-#-yoga-source-los-gatos?came_from=244","location":{"#type":"Place","name":"Acoustic Yoga # Yoga Source Los Gatos","address":"Los Gatos, CA","geo":{"#type":"GeoCoordinates","latitude":37.2358078,"longitude":-121.9623751}},"name":"Kevin Paris","performer":{"#type":"MusicGroup","name":"Kevin Paris","image":"https://photos.bandsintown.com/thumb/8419497.jpeg","url":"https://www.bandsintown.com/a/1134314-kevin-paris?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8419497.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100692435-zak-fennie-at-black-stallion-winery?came_from=244","location":{"#type":"Place","name":"Black Stallion Winery","address":"Napa, CA","geo":{"#type":"GeoCoordinates","latitude":38.35983179999999,"longitude":-122.2906388}},"name":"Zak Fennie","performer":{"#type":"MusicGroup","name":"Zak Fennie","image":"https://photos.bandsintown.com/thumb/8851546.jpeg","url":"https://www.bandsintown.com/a/11843851-zak-fennie?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8851546.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100621943-frances-ancheta-at-off-the-grid-at-alameda-south-shore-center?came_from=244","location":{"#type":"Place","name":"Off the Grid at Alameda South Shore Center ","address":"Alameda, CA","geo":{"#type":"GeoCoordinates","latitude":37.7712165,"longitude":-122.2824021}},"name":"Frances 
Ancheta","performer":{"#type":"MusicGroup","name":"Frances Ancheta","image":"https://photos.bandsintown.com/thumb/8483059.jpeg","url":"https://www.bandsintown.com/a/7762254-frances-ancheta?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8483059.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013412612-pizza!-at-audio-nightclub?came_from=244","location":{"#type":"Place","name":"Audio Nightclub","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.771362,"longitude":-122.413795}},"name":"Pizza!","performer":{"#type":"MusicGroup","name":"Pizza!","image":"https://photos.bandsintown.com/thumb/161356.jpeg","url":"https://www.bandsintown.com/a/198680-pizza!?came_from=244"},"image":"https://photos.bandsintown.com/thumb/161356.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100372855-ryan-scott-long-at-drake's-barrel-house?came_from=244","location":{"#type":"Place","name":"Drake\u2019s barrel house ","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"Ryan Scott Long","performer":{"#type":"MusicGroup","name":"Ryan Scott Long","image":"https://photos.bandsintown.com/thumb/8671372.jpeg","url":"https://www.bandsintown.com/a/3168705-ryan-scott-long?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8671372.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012999412-come-from-away-at-golden-gate-theater?came_from=244","location":{"#type":"Place","name":"Golden Gate Theater","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7825715,"longitude":-122.4110742}},"name":"Come From Away","performer":{"#type":"MusicGroup","name":"Come From 
Away","image":"","url":"https://www.bandsintown.com/a/13889714-come-from-away?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100441096-and-then-came-humans-at-drake's-brewing-company?came_from=244","location":{"#type":"Place","name":"Drake\u2019s Brewing Company","address":"San Leandro, Ca","geo":{"#type":"GeoCoordinates","latitude":37.7249296,"longitude":-122.1560768}},"name":"And Then Came Humans","performer":{"#type":"MusicGroup","name":"And Then Came Humans","image":"https://photos.bandsintown.com/thumb/8897159.jpeg","url":"https://www.bandsintown.com/a/13151463-and-then-came-humans?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8897159.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1011601412-man-go-at-el-rio?came_from=244","location":{"#type":"Place","name":"El Rio","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.7467828,"longitude":-122.4193922}},"name":"Man-Go","performer":{"#type":"MusicGroup","name":"Man-Go","image":"","url":"https://www.bandsintown.com/a/3238684-man-go?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1013320819-paul-mehling-at-freight-and-salvage-coffeehouse?came_from=244","location":{"#type":"Place","name":"Freight & Salvage Coffeehouse","address":"Berkeley, CA","geo":{"#type":"GeoCoordinates","latitude":37.8708715,"longitude":-122.2695117}},"name":"Paul Mehling","performer":{"#type":"MusicGroup","name":"Paul 
Mehling","image":"","url":"https://www.bandsintown.com/a/3307749-paul-mehling?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100672210-dj-spooky-at-catharine-clark-gallery?came_from=244","location":{"#type":"Place","name":"Catharine Clark Gallery","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.76639,"longitude":-122.40704}},"name":"DJ Spooky","performer":{"#type":"MusicGroup","name":"DJ Spooky","image":"https://photos.bandsintown.com/thumb/7060233.jpeg","url":"https://www.bandsintown.com/a/64476-dj-spooky?came_from=244"},"image":"https://photos.bandsintown.com/thumb/7060233.jpeg"},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/1012003162-craig-ventresco-at-atlas-cafe?came_from=244","location":{"#type":"Place","name":"Atlas Cafe","address":"San Francisco, CA","geo":{"#type":"GeoCoordinates","latitude":37.73189,"longitude":-122.47615}},"name":"Craig Ventresco","performer":{"#type":"MusicGroup","name":"Craig Ventresco","image":"","url":"https://www.bandsintown.com/a/139634-craig-ventresco?came_from=244"},"image":""},{"#context":"http://schema.org","#type":"MusicEvent","startDate":"2019-01-26","endDate":"2019-01-26","url":"https://www.bandsintown.com/e/100555258-rusty-jackson-music-at-kawika's-ocean-beach-deli?came_from=244","location":{"#type":"Place","name":"Kawika's Ocean Beach Deli","address":"SF, CA","geo":{"#type":"GeoCoordinates","latitude":37.774627,"longitude":-122.509993}},"name":"Rusty Jackson Music","performer":{"#type":"MusicGroup","name":"Rusty Jackson Music","image":"https://photos.bandsintown.com/thumb/8250003.jpeg","url":"https://www.bandsintown.com/a/9978762-rusty-jackson-music?came_from=244"},"image":"https://photos.bandsintown.com/thumb/8250003.jpeg"}]</script>]
But, printing eventsJSON gives me an error:
TypeError: expected string or buffer
I want to build a new JSON document from specific attributes in eventsJsonLd, e.g. "startDate", "name", etc. Can anyone tell me where I'm going wrong? Thanks in advance.
You are passing the script tag itself to json.loads. That is not a string but an object of the bs4.element.Tag class.
script = i.find_all("script")[4]
print(type(script))
Output
<class 'bs4.element.Tag'>
You need to get the text from the tag and pass that to json.loads:
eventsJSON = json.loads(script.text)
Note:
The URL you are currently using (https://www.bandsintown.com/?came_from=257&page=0) returns that script tag with empty contents. I was able to get output from a different URL on the same domain (https://www.bandsintown.com/a/29109-pop-rocks).
print(eventsJSON[0])
This gave the output:
{u'startDate': u'2019-02-15T21:00:00', u'performer': {u'url': u'https://www.bandsintown.com/a/29109-pop-rocks?came_from=244', u'image': u'https://photos.bandsintown.com/thumb/8532836.jpeg', u'#type': u'MusicGroup', u'name': u'Pop Rocks'}, u'name': u'Pop Rocks', u'url': u'https://www.bandsintown.com/e/100544648-pop-rocks-at-the-chapel?came_from=244', u'image': u'https://photos.bandsintown.com/thumb/8532836.jpeg', u'location': {u'address': u'San Francisco, CA', u'geo': {u'latitude': 37.7485824, u'#type': u'GeoCoordinates', u'longitude': -122.4184108}, u'#type': u'Place', u'name': u'The Chapel'}, u'#context': u'http://schema.org', u'#type': u'MusicEvent', u'description': u'Pop Rocks at The Chapel 2019-02-15T21:00:00'}
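Once the tag's text parses, building the smaller JSON the question asks for is a matter of keeping only the wanted keys. The following is a minimal sketch, not the asker's actual code: script_text is a hand-shortened stand-in for script.text (two events trimmed from the data in the question), and pick_fields is a hypothetical helper name.

```python
import json

# Stand-in for script.text: a shortened JSON-LD array in the same shape
# as the data shown in the question (assumption: the real text is a JSON list).
script_text = '''[
  {"#type": "MusicEvent", "startDate": "2019-01-26", "name": "Man-Go",
   "location": {"#type": "Place", "name": "El Rio", "address": "San Francisco, CA"}},
  {"#type": "MusicEvent", "startDate": "2019-01-26", "name": "DJ Spooky",
   "location": {"#type": "Place", "name": "Catharine Clark Gallery", "address": "SF, CA"}}
]'''


def pick_fields(events, keys=("startDate", "name")):
    """Return one small dict per event, keeping only the requested top-level keys."""
    return [{key: event.get(key) for key in keys} for event in events]


events = json.loads(script_text)          # a str goes in, a list of dicts comes out
summary = pick_fields(events)             # keep only startDate and name
summary_json = json.dumps(summary, indent=2)
print(summary_json)
```

Nested values such as the venue name would need their own lookup, for example event["location"]["name"], since pick_fields only looks at top-level keys.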

Merging a weird html-like txt file with an Excel file

I got two files which I'm supposed to merge (most likely using statistical software such as R or SPSS), one of them being a normal Excel table with 3 variables (names at the top of the columns). The second one, however, was sent to me in a format I haven't seen before, a large txt file with input per case (identified with the ID variable, which I would also use to merge with the Excel file) which looks like this:
<organizations>
<organization id="B0101">
<type1>E</type1>
<type2>v</type2>
<name>International Association for Official Statistics</name>
<acronym>IAOS</acronym>
<country_first_address>not known</country_first_address>
<city_first_address>not known</city_first_address>
<countries_in_which_members_located>not known</countries_in_which_members_located>
<subject_headings>Government; Statistics</subject_headings>
<foundation_year>1985</foundation_year>
<history>[[History]] Founded 1985, Amsterdam (Netherlands), at 45th Session of #A2590, as a specialized section of ISI. Absorbed, 1989, #D1316, which had been set up 22 Oct 1958, Geneva (Switzerland), following recommendations of ISI, as [International Association of Municipal Statisticians -- Association internationale de statisticiens municipaux]. </history>
<history_relations>#A2590; #D1316</history_relations>
<consultative_status>none known</consultative_status>
<igo_relations>none known</igo_relations>
<ngo_relations>#E1209; #M4975; #D1976; #E2125; #E3673; #D2578; #M0084</ngo_relations>
<member_organizations>none known</member_organizations>
</organization>
<organization id="B8500">
<type1>B</type1>
<type2>y</type2>
<name>World Blind Union</name>
<acronym>WBU</acronym>
<country_first_address>Canada</country_first_address>
<city_first_address>Toronto</city_first_address>
<countries_in_which_members_located>Algeria; Angola; Benin; Burkina Faso; Burundi; Cameroon; Cape Verde; Central African Rep; Chad; Congo Brazzaville; Congo DR; Côte d'Ivoire; Djibouti; Egypt; Equatorial Guinea; Eritrea; Ethiopia; Gabon; Gambia; Ghana; Guinea; Guinea-Bissau; Kenya; Lesotho; Liberia; Libyan AJ; Madagascar; Malawi; Mali; Mauritania; Mauritius; Morocco; Mozambique; Namibia; Niger; Nigeria; Rwanda; Sao Tomé-Principe; Senegal; Seychelles; Sierra Leone; Somalia; South Africa; South Sudan; Sudan; Swaziland; Tanzania UR; Togo; Tunisia; Uganda; Zambia; Zimbabwe; Anguilla; Antigua-Barbuda; Argentina; Bahamas; Barbados; Belize; Bolivia; Brazil; Canada; Chile; Colombia; Costa Rica; Cuba; Dominica; Dominican Rep; Ecuador; El Salvador; Grenada; Guatemala; Guyana; Haiti; Honduras; Jamaica; Martinique; Mexico; Montserrat; Nicaragua; Panama; Paraguay; Peru; St Kitts-Nevis; St Lucia; St Vincent-Grenadines; Trinidad-Tobago; Turks-Caicos; Uruguay; USA; Venezuela; Virgin Is UK; Afghanistan; Bahrain; Bangladesh; Brunei Darussalam; Cambodia; China; Hong Kong; India; Indonesia; Iraq; Israel; Japan; Jordan; Kazakhstan; Korea Rep; Kuwait; Kyrgyzstan; Laos; Lebanon; Macau; Malaysia; Mongolia; Myanmar; Nepal; Pakistan; Philippines; Qatar; Singapore; Sri Lanka; Syrian AR; Taiwan; Tajikistan; Thailand; Timor-Leste; Turkmenistan; United Arab Emirates; Uzbekistan; Vietnam; Yemen; Australia; Fiji; New Zealand; Tonga; Albania; Armenia; Austria; Azerbaijan; Belarus; Belgium; Bosnia-Herzegovina; Bulgaria; Croatia; Cyprus; Czech Rep; Denmark; Estonia; Finland; France; Georgia; Germany; Greece; Hungary; Iceland; Ireland; Italy; Latvia; Lithuania; Luxembourg; Macedonia; Malta; Moldova; Montenegro; Netherlands; Norway; Poland; Portugal; Romania; Russia; Serbia; Slovakia; Slovenia; Spain; Sweden; Switzerland; Turkey; UK; Ukraine;</countries_in_which_members_located>
<subject_headings>Blind, Visually Impaired</subject_headings>
<foundation_year>1984</foundation_year>
<history>[[History]] Founded 26 Oct 1984, Riyadh (Saudi Arabia), as one united world body composed of representatives of national associations of the blind and agencies serving the blind, successor body to both #B3499, set up 20 July 1951, Paris (France), and #B2024, formed in Aug 1964, New York NY (USA). Constitution adopted 26 Oct 1984; amended at: 3rd General Assembly, 2-6 Nov 1992, Cairo (Egypt); 26-30 Aug 1996, Toronto (Canada); 20-24 Nov 2000, Melbourne (Australia); 22-26 Nov 2004, Cape Town (South Africa); 18-22 Aug 2008, Geneva (Switzerland); 12-16 Nov 2012, Bangkok (Thailand). Registered in accordance with French law, 20 Dec 1984, Paris and again 20 Dec 2004, Paris. Incorporated in Canada as not-share-capital not-for-profit corporation, 16 Mar 2007. </history>
<history_relations>#B3499; #B2024</history_relations>
<consultative_status>#E3377; #B2183; #B3548; #B0971; #F3380; #B3635</consultative_status>
<igo_relations>#E7552; #F1393; #A3375; #B3408</igo_relations>
<ngo_relations>#E0409; #E6422; #J5215; #F5821; #C1224; #D5392; #F6792; #A1945; #B2314; #D1758; #F5810; #D1612; #J0357; #D1038; #G6537; #B2221; #B0094; #B3536; #D7556</ngo_relations>
<member_organizations>#F6063; #F4959; #J1979; #C1224; #B0094; #D5392; #A1945; #D2362; #F2936; #J4730; #F3167; #D8743; #F1898; #D0043; #G0853</member_organizations>
</organization>
Any help would be appreciated: what type of file is this, and how can I transform it into a manageable table?
I think your data is XML. I copied your sample data, pasted it into a blank file, and saved it as sample.xml. I made sure to add in a line with </organizations> at the very end (line 37 in your sample), to close off that tag.
Then I followed the instructions here to read it in:
library(XML)
xmlfile <- xmlTreeParse(file = "sample.xml")
xmltop <- xmlRoot(xmlfile)
orgs <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
orgs_df <- data.frame(t(orgs),row.names=NULL)
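This returns a data frame orgs_df with 2 obs. of 15 variables, one per child element. Note that the id attribute of each <organization> is not among those 15 columns, so you would need to pull it out separately (for example with xmlGetAttr) before merging with your Excel file on ID.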
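For comparison, here is a rough Python sketch of the same idea using only the standard library. It keeps the id attribute as a column so the rows can be joined on it. The sample is cut down to three fields per organization, and excel_rows is a made-up stand-in for the Excel table (its "score" column is purely hypothetical).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question, with the closing
# </organizations> tag added so that it is well-formed XML.
sample_xml = """<organizations>
<organization id="B0101">
<name>International Association for Official Statistics</name>
<acronym>IAOS</acronym>
<foundation_year>1985</foundation_year>
</organization>
<organization id="B8500">
<name>World Blind Union</name>
<acronym>WBU</acronym>
<foundation_year>1984</foundation_year>
</organization>
</organizations>"""


def organizations_to_rows(xml_text):
    """Turn each <organization> into a flat dict: its id plus one key per child tag."""
    root = ET.fromstring(xml_text)
    rows = []
    for org in root.findall("organization"):
        row = {"id": org.get("id")}
        for child in org:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


# Hypothetical stand-in for the Excel table, keyed by the same ID.
excel_rows = {"B0101": {"score": 1}, "B8500": {"score": 2}}

rows = organizations_to_rows(sample_xml)
# Left join: keep every organization, add Excel columns where the ID matches.
merged = [{**row, **excel_rows.get(row["id"], {})} for row in rows]
for row in merged:
    print(row)
```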

Text manipulation with sed

I need a little help. In our class we've been playing around with grep and sed commands to learn how they work. More specifically, we've been using sed commands to manipulate text and add tags.
So, we were given an assignment: we have 500 lines of fake CSV data, and our job is to create a sed command that automatically tags the data and would also tag any new data added down the road (theoretically).
Here are a few lines of our fake untagged data, exactly as we received it. As you can see, each record starts with a first name and ends with an email and a web address:
FirstName,LastName,Company,Address,City,County,State,ZIP,Phone,Fax,Email,Web
"Essie","Vaill","Litronic Industries","14225 Hancock Dr","Anchorage","Anchorage","AK","99515","907-345-0962","907-345-1215","essie#vaill.com","http://www.essievaill.com"
"Cruz","Roudabush","Meridian Products","2202 S Central Ave","Phoenix","Maricopa","AZ","85004","602-252-4827","602-252-4009","cruz#roudabush.com","http://www.cruzroudabush.com"
"Billie","Tinnes","D & M Plywood Inc","28 W 27th St","New York","New York","NY","10001","212-889-5775","212-889-5764","billie#tinnes.com","http://www.billietinnes.com"
"Zackary","Mockus","Metropolitan Elevator Co","286 State St","Perth Amboy","Middlesex","NJ","08861","732-442-0638","732-442-5218","zackary#mockus.com","http://www.zackarymockus.com"
"Rosemarie","Fifield","Technology Services","3131 N Nimitz Hwy #-105","Honolulu","Honolulu","HI","96819","808-836-8966","808-836-6008","rosemarie#fifield.com","http://www.rosemariefifield.com"
"Bernard","Laboy","Century 21 Keewaydin Prop","22661 S Frontage Rd","Channahon","Will","IL","60410","815-467-0487","815-467-1244","bernard#laboy.com","http://www.bernardlaboy.com"
"Sue","Haakinson","Kim Peacock Beringhause","9617 N Metro Pky W","Phoenix","Maricopa","AZ","85051","602-953-2753","602-953-0355","sue#haakinson.com","http://www.suehaakinson.com"
"Valerie","Pou","Sea Port Record One Stop Inc","7475 Hamilton Blvd","Trexlertown","Lehigh","PA","18087","610-395-8743","610-395-6995","valerie#pou.com","http://www.valeriepou.com"
"Lashawn","Hasty","Kpff Consulting Engineers","815 S Glendora Ave","West Covina","Los Angeles","CA","91790","626-960-6738","626-960-1503","lashawn#hasty.com","http://www.lashawnhasty.com"
"Marianne","Earman","Albers Technologies Corp","6220 S Orange Blossom Trl","Orlando","Orange","FL","32809","407-857-0431","407-857-2506","marianne#earman.com","http://www.marianneearman.com"
"Justina","Dragaj","Uchner, David D Esq","2552 Poplar Ave","Memphis","Shelby","TN","38112","901-327-5336","901-327-2911","justina#dragaj.com","http://www.justinadragaj.com"
"Mandy","Mcdonnell","Southern Vermont Surveys","343 Bush St Se","Salem","Marion","OR","97302","503-371-8219","503-371-1118","mandy#mcdonnell.com","http://www.mandymcdonnell.com"
"Conrad","Lanfear","Kahler, Karen T Esq","49 Roche Way","Youngstown","Mahoning","OH","44512","330-758-0314","330-758-3536","conrad#lanfear.com","http://www.conradlanfear.com"
"Cyril","Behen","National Paper & Envelope Corp","1650 S Harbor Blvd","Anaheim","Orange","CA","92802","714-772-5050","714-772-3859","cyril#behen.com","http://www.cyrilbehen.com"
"Shelley","Groden","Norton, Robert L Esq","110 Broadway St","San Antonio","Bexar","TX","78205","210-229-3017","210-229-9757","shelley#groden.com","http://www.shelleygroden.com"
Our teacher wanted us to create sed commands that automatically indent the data, wrap each line in TR tags, and wrap each field in TD tags, like this:
<HTML>
<HEAD><Title>Lab 4b by Andrey</Title></HEAD>
<BODY>
<table border="1">
<TR><TD>FirstName</TD><TD>LastName</TD><TD>Company</TD><TD>Address</TD><TD>City</TD><TD>County</TD><TD>State</TD><TD>ZIP</TD><TD>Phone</TD><TD>Fax</TD><TD>Email</TD><TD>Web</TD></TR>
<TR><TD>Essie</TD><TD>Vaill</TD><TD>Litronic Industries</TD><TD>14225 Hancock Dr</TD><TD>Anchorage</TD><TD>Anchorage</TD><TD>AK</TD><TD>99515</TD><TD>907-345-0962</TD><TD>907-345-1215</TD><TD>essie#vaill.com</TD><TD>http://www.essievaill.com</TD><TR>
<TR><TD>Cruz</TD><TD>Roudabush</TD><TD>Meridian Products</TD><TD>2202 S Central Ave</TD><TD>Phoenix</TD><TD>Maricopa</TD><TD>AZ</TD><TD>85004</TD><TD>602-252-4827</TD><TD>602-252-4009</TD><TD>cruz#roudabush.com</TD><TD>http://www.cruzroudabush.com</TD><TR>
<TR><TD>Billie</TD><TD>Tinnes</TD><TD>D & M Plywood Inc</TD><TD>28 W 27th St</TD><TD>New York</TD><TD>New York</TD><TD>NY</TD><TD>10001</TD><TD>212-889-5775</TD><TD>212-889-5764</TD><TD>billie#tinnes.com</TD><TD>http://www.billietinnes.com</TD><TR>
<TR><TD>Zackary</TD><TD>Mockus</TD><TD>Metropolitan Elevator Co</TD><TD>286 State St</TD><TD>Perth Amboy</TD><TD>Middlesex</TD><TD>NJ</TD><TD>08861</TD><TD>732-442-0638</TD><TD>732-442-5218</TD><TD>zackary#mockus.com</TD><TD>http://www.zackarymockus.com</TD><TR>
<TR><TD>Rosemarie</TD><TD>Fifield</TD><TD>Technology Services</TD><TD>3131 N Nimitz Hwy #-105</TD><TD>Honolulu</TD><TD>Honolulu</TD><TD>HI</TD><TD>96819</TD><TD>808-836-8966</TD><TD>808-836-6008</TD><TD>rosemarie#fifield.com</TD><TD>http://www.rosemariefifield.com<$
<TR><TD>Bernard</TD><TD>Laboy</TD><TD>Century 21 Keewaydin Prop</TD><TD>22661 S Frontage Rd</TD><TD>Channahon</TD><TD>Will</TD><TD>IL</TD><TD>60410</TD><TD>815-467-0487</TD><TD>815-467-1244</TD><TD>bernard#laboy.com</TD><TD>http://www.bernardlaboy.com</TD><TR>
<TR><TD>Sue</TD><TD>Haakinson</TD><TD>Kim Peacock Beringhause</TD><TD>9617 N Metro Pky W</TD><TD>Phoenix</TD><TD>Maricopa</TD><TD>AZ</TD><TD>85051</TD><TD>602-953-2753</TD><TD>602-953-0355</TD><TD>sue#haakinson.com</TD><TD>http://www.suehaakinson.com</TD><TR>
<TR><TD>Valerie</TD><TD>Pou</TD><TD>Sea Port Record One Stop Inc</TD><TD>7475 Hamilton Blvd</TD><TD>Trexlertown</TD><TD>Lehigh</TD><TD>PA</TD><TD>18087</TD><TD>610-395-8743</TD><TD>610-395-6995</TD><TD>valerie#pou.com</TD><TD>http://www.valeriepou.com</TD><TR>
<TR><TD>Lashawn</TD><TD>Hasty</TD><TD>Kpff Consulting Engineers</TD><TD>815 S Glendora Ave</TD><TD>West Covina</TD><TD>Los Angeles</TD><TD>CA</TD><TD>91790</TD><TD>626-960-6738</TD><TD>626-960-1503</TD><TD>lashawn#hasty.com</TD><TD>http://www.lashawnhasty.com</TD><T$
<TR><TD>Marianne</TD><TD>Earman</TD><TD>Albers Technologies Corp</TD><TD>6220 S Orange Blossom Trl</TD><TD>Orlando</TD><TD>Orange</TD><TD>FL</TD><TD>32809</TD><TD>407-857-0431</TD><TD>407-857-2506</TD><TD>marianne#earman.com</TD><TD>http://www.marianneearman.com</TD$
<TR><TD>Justina</TD><TD>Dragaj</TD><TD>Uchner David D Esq</TD><TD>2552 Poplar Ave</TD><TD>Memphis</TD><TD>Shelby</TD><TD>TN</TD><TD>38112</TD><TD>901-327-5336</TD><TD>901-327-2911</TD><TD>justina#dragaj.com</TD><TD>http://www.justinadragaj.com</TD><TR>
<TR><TD>Mandy</TD><TD>Mcdonnell</TD><TD>Southern Vermont Surveys</TD><TD>343 Bush St Se</TD><TD>Salem</TD><TD>Marion</TD><TD>OR</TD><TD>97302</TD><TD>503-371-8219</TD><TD>503-371-1118</TD><TD>mandy#mcdonnell.com</TD><TD>http://www.mandymcdonnell.com</TD><TR>
<TR><TD>Conrad</TD><TD>Lanfear</TD><TD>Kahler Karen T Esq</TD><TD>49 Roche Way</TD><TD>Youngstown</TD><TD>Mahoning</TD><TD>OH</TD><TD>44512</TD><TD>330-758-0314</TD><TD>330-758-3536</TD><TD>conrad#lanfear.com</TD><TD>http://www.conradlanfear.com</TD><TR>
<TR><TD>Cyril</TD><TD>Behen</TD><TD>National Paper & Envelope Corp</TD><TD>1650 S Harbor Blvd</TD><TD>Anaheim</TD><TD>Orange</TD><TD>CA</TD><TD>92802</TD><TD>714-772-5050</TD><TD>714-772-3859</TD><TD>cyril#behen.com</TD><TD>http://www.cyrilbehen.com</TD><TR>
<TR><TD>Shelley</TD><TD>Groden</TD><TD>Norton Robert L Esq</TD><TD>110 Broadway St</TD><TD>San Antonio</TD><TD>Bexar</TD><TD>TX</TD><TD>78205</TD><TD>210-229-3017</TD><TD>210-229-9757</TD><TD>shelley#groden.com</TD><TD>http://www.shelleygroden.com</TD><TR>
</table>
</BODY>
</HTML>
So, I was messing around and I tried to create a few sed commands that would mimic the output above.
My first attempt was:
#!/bin/sh
sed -e 's=^.*$=<TR><TD>&</TD></TR>=' input.csv
Unfortunately, this only outputs something like the following, where I get TR and TD at the beginning and end of each line but no TD tags in between:
<TR><TD>"Bryan","Rovell","All N All Shop","90 Hackensack St","East Rutherford","Bergen","NJ","07073","201-939-2788","201-939-9079","bryan#rovell.com","http://www.bryanrovell.com"</TD></TR>
<TR><TD>"Joey","Bolick","Utility Trailer Sales","7700 N Council Rd","Oklahoma City","Oklahoma","OK","73132","405-728-5972","405-728-5244","joey#bolick.com","http://www.joeybolick.com"</TD></TR>
I've also attempted to create individual sed commands to tag each field, but I've only managed to tag each word, so I'm kind of stuck.
I think I'm partially on the right track, but I need help indenting and adding TD to the beginning and end of every field, along with TR to the beginning and end of each row.
This is the main part of it:
$ sed -r 's:^"?: <TR><TD>:; s:"?,"?:</TD><TD>:g; s:"?$:</TD></TR>:' file
<TR><TD>FirstName</TD><TD>LastName</TD><TD>Company</TD><TD>Address</TD><TD>City</TD><TD>County</TD><TD>State</TD><TD>ZIP</TD><TD>Phone</TD><TD>Fax</TD><TD>Email</TD><TD>Web</TD></TR>
<TR><TD>Essie</TD><TD>Vaill</TD><TD>Litronic Industries</TD><TD>14225 Hancock Dr</TD><TD>Anchorage</TD><TD>Anchorage</TD><TD>AK</TD><TD>99515</TD><TD>907-345-0962</TD><TD>907-345-1215</TD><TD>essie#vaill.com</TD><TD>http://www.essievaill.com</TD></TR>
<TR><TD>Cruz</TD><TD>Roudabush</TD><TD>Meridian Products</TD><TD>2202 S Central Ave</TD><TD>Phoenix</TD><TD>Maricopa</TD><TD>AZ</TD><TD>85004</TD><TD>602-252-4827</TD><TD>602-252-4009</TD><TD>cruz#roudabush.com</TD><TD>http://www.cruzroudabush.com</TD></TR>
<TR><TD>Billie</TD><TD>Tinnes</TD><TD>D & M Plywood Inc</TD><TD>28 W 27th St</TD><TD>New York</TD><TD>New York</TD><TD>NY</TD><TD>10001</TD><TD>212-889-5775</TD><TD>212-889-5764</TD><TD>billie#tinnes.com</TD><TD>http://www.billietinnes.com</TD></TR>
<TR><TD>Zackary</TD><TD>Mockus</TD><TD>Metropolitan Elevator Co</TD><TD>286 State St</TD><TD>Perth Amboy</TD><TD>Middlesex</TD><TD>NJ</TD><TD>08861</TD><TD>732-442-0638</TD><TD>732-442-5218</TD><TD>zackary#mockus.com</TD><TD>http://www.zackarymockus.com</TD></TR>
<TR><TD>Rosemarie</TD><TD>Fifield</TD><TD>Technology Services</TD><TD>3131 N Nimitz Hwy #-105</TD><TD>Honolulu</TD><TD>Honolulu</TD><TD>HI</TD><TD>96819</TD><TD>808-836-8966</TD><TD>808-836-6008</TD><TD>rosemarie#fifield.com</TD><TD>http://www.rosemariefifield.com</TD></TR>
<TR><TD>Bernard</TD><TD>Laboy</TD><TD>Century 21 Keewaydin Prop</TD><TD>22661 S Frontage Rd</TD><TD>Channahon</TD><TD>Will</TD><TD>IL</TD><TD>60410</TD><TD>815-467-0487</TD><TD>815-467-1244</TD><TD>bernard#laboy.com</TD><TD>http://www.bernardlaboy.com</TD></TR>
<TR><TD>Sue</TD><TD>Haakinson</TD><TD>Kim Peacock Beringhause</TD><TD>9617 N Metro Pky W</TD><TD>Phoenix</TD><TD>Maricopa</TD><TD>AZ</TD><TD>85051</TD><TD>602-953-2753</TD><TD>602-953-0355</TD><TD>sue#haakinson.com</TD><TD>http://www.suehaakinson.com</TD></TR>
<TR><TD>Valerie</TD><TD>Pou</TD><TD>Sea Port Record One Stop Inc</TD><TD>7475 Hamilton Blvd</TD><TD>Trexlertown</TD><TD>Lehigh</TD><TD>PA</TD><TD>18087</TD><TD>610-395-8743</TD><TD>610-395-6995</TD><TD>valerie#pou.com</TD><TD>http://www.valeriepou.com</TD></TR>
<TR><TD>Lashawn</TD><TD>Hasty</TD><TD>Kpff Consulting Engineers</TD><TD>815 S Glendora Ave</TD><TD>West Covina</TD><TD>Los Angeles</TD><TD>CA</TD><TD>91790</TD><TD>626-960-6738</TD><TD>626-960-1503</TD><TD>lashawn#hasty.com</TD><TD>http://www.lashawnhasty.com</TD></TR>
<TR><TD>Marianne</TD><TD>Earman</TD><TD>Albers Technologies Corp</TD><TD>6220 S Orange Blossom Trl</TD><TD>Orlando</TD><TD>Orange</TD><TD>FL</TD><TD>32809</TD><TD>407-857-0431</TD><TD>407-857-2506</TD><TD>marianne#earman.com</TD><TD>http://www.marianneearman.com</TD></TR>
<TR><TD>Justina</TD><TD>Dragaj</TD><TD>Uchner</TD><TD> David D Esq</TD><TD>2552 Poplar Ave</TD><TD>Memphis</TD><TD>Shelby</TD><TD>TN</TD><TD>38112</TD><TD>901-327-5336</TD><TD>901-327-2911</TD><TD>justina#dragaj.com</TD><TD>http://www.justinadragaj.com</TD></TR>
<TR><TD>Mandy</TD><TD>Mcdonnell</TD><TD>Southern Vermont Surveys</TD><TD>343 Bush St Se</TD><TD>Salem</TD><TD>Marion</TD><TD>OR</TD><TD>97302</TD><TD>503-371-8219</TD><TD>503-371-1118</TD><TD>mandy#mcdonnell.com</TD><TD>http://www.mandymcdonnell.com</TD></TR>
<TR><TD>Conrad</TD><TD>Lanfear</TD><TD>Kahler</TD><TD> Karen T Esq</TD><TD>49 Roche Way</TD><TD>Youngstown</TD><TD>Mahoning</TD><TD>OH</TD><TD>44512</TD><TD>330-758-0314</TD><TD>330-758-3536</TD><TD>conrad#lanfear.com</TD><TD>http://www.conradlanfear.com</TD></TR>
<TR><TD>Cyril</TD><TD>Behen</TD><TD>National Paper & Envelope Corp</TD><TD>1650 S Harbor Blvd</TD><TD>Anaheim</TD><TD>Orange</TD><TD>CA</TD><TD>92802</TD><TD>714-772-5050</TD><TD>714-772-3859</TD><TD>cyril#behen.com</TD><TD>http://www.cyrilbehen.com</TD></TR>
<TR><TD>Shelley</TD><TD>Groden</TD><TD>Norton</TD><TD> Robert L Esq</TD><TD>110 Broadway St</TD><TD>San Antonio</TD><TD>Bexar</TD><TD>TX</TD><TD>78205</TD><TD>210-229-3017</TD><TD>210-229-9757</TD><TD>shelley#groden.com</TD><TD>http://www.shelleygroden.com</TD></TR>
I expect you can figure out the rest since that's just printing the head and tail lines.
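One caveat is visible in the output above: the substitution splits on every comma, so a quoted field such as "Uchner, David D Esq" ends up in two cells. If sed is not a hard requirement, a CSV-aware parser avoids that. The sketch below is a Python alternative, not part of the assignment. sample_csv is a shortened four-column excerpt of the data, and csv_to_table_rows is a hypothetical helper name.

```python
import csv
import html
import io

# Shortened excerpt of the data: a header plus two records, one of which
# has a comma inside a quoted field.
sample_csv = '''FirstName,LastName,Company,Address
"Justina","Dragaj","Uchner, David D Esq","2552 Poplar Ave"
"Billie","Tinnes","D & M Plywood Inc","28 W 27th St"
'''


def csv_to_table_rows(text, indent="  "):
    """Return one indented <TR>...</TR> line per CSV record."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        # csv.reader strips the quotes and keeps quoted commas inside the field;
        # html.escape turns & into &amp; so the markup stays valid.
        cells = "".join(f"<TD>{html.escape(field)}</TD>" for field in fields)
        rows.append(f"{indent}<TR>{cells}</TR>")
    return rows


table_rows = csv_to_table_rows(sample_csv)
print("\n".join(['<table border="1">', *table_rows, "</table>"]))
```

To run it on the real file, the same function can be fed the contents of input.csv instead of sample_csv.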