I'm a beginner in web scraping using R. I'm trying to scrape the following webpage: https://bkmea.com/bkmea-members/#/company/2523.
I would like to get all text elements under div nodes with class="company_name", as well as text elements under td nodes. For example, I'm trying to fetch the company name ("MOMO APPARELS LTD") from the following HTML:
<div class="comapny_header">
<div class="company_name">MOMO APPARELS LTD</div>
<div class="view_all">View All</div>
</div>
So I've written the following code:
library(textreadr)
library(rvest)
companyinfo <- read_html("https://bkmea.com/bkmea-members/#/company/2523")
html_nodes(companyinfo,"div")%>%
html_text() # it works
html_nodes(companyinfo,"div.company_name")%>%
html_text() # doesn't work
html_nodes(companyinfo,"td") %>%
html_text() # doesn't work
If I understand correctly, the first one should pull the text from all div nodes.
The second one should pull the text from div nodes whose class is company_name.
The third one should pull the text from td nodes.
The first one works (though it isn't what I'm trying to get), but the second and the third ones don't. Am I doing something terribly wrong?
I'd really appreciate it if you could help me out here!!
Many thanks,
Sang
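The data you're looking for is not present in the HTML body: read_html() only downloads the static page and does not run its JavaScript, and the part of the URL after # is never sent to the server. The data is retrieved from this API: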
GET https://bkmea.com/wp-admin/admin-ajax.php?action=bkmea_get_company&id=2523
You just need to extract the id from your original URL, build the URL above, and parse the JSON result as follows:
library(httr)
originalUrl <- "https://bkmea.com/bkmea-members/#/company/2523"
id <- sub("^.+/", "", originalUrl)
userAgent <- "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
output <- content(GET("https://bkmea.com/wp-admin/admin-ajax.php", query = list(
"action" = "bkmea_get_company",
"id" = id
), add_headers('User-Agent' = userAgent)), as = "parsed", type = "application/json")
print(output$company$company_info$company_name)
Output:
[1] "MOMO APPARELS LTD"
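For readers who want to see the URL handling on its own, here is a minimal Python sketch of the same idea (it is not part of the R answer above): take the id from the fragment of the original URL and build the admin-ajax.php query. The function name build_company_api_url is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode, urlsplit

API_ENDPOINT = "https://bkmea.com/wp-admin/admin-ajax.php"

def build_company_api_url(original_url):
    """Build the API URL for the company id found in the URL fragment."""
    # The fragment is everything after '#', e.g. "/company/2523".
    fragment = urlsplit(original_url).fragment
    # The id is the last path-like segment of the fragment.
    company_id = fragment.rstrip("/").rsplit("/", 1)[-1]
    query = urlencode({"action": "bkmea_get_company", "id": company_id})
    return f"{API_ENDPOINT}?{query}"

api_url = build_company_api_url("https://bkmea.com/bkmea-members/#/company/2523")
print(api_url)
```

The resulting URL is the one an HTTP client would request to get the JSON shown in the answer.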
Hi, I am writing a scraper for Yahoo Finance. I load the page's JSON and then look up the keys I need, e.g. ...
fwd_div_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']["summaryDetail"]['dividendYield']['raw']
The problem is that if a company doesn't pay a dividend, this raises a KeyError because there is no 'raw' key: instead of setting raw = 0, the key is simply left out. If a company does pay a dividend, it returns 'raw', 'fmt', etc.
I was wondering what the most efficient way of dealing with this is?
Another question: how would you access ...
[{'raw': 1595894400, 'fmt': '2020-07-28'}, {'raw': 1596412800, 'fmt': '2020-08-03'}]
My current solution is...
earnings_dates = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate'][0]['fmt']
earnings_datee = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate'][1]['fmt']
earnings_date = earnings_dates+", "+earnings_datee
To extract the dividend yield from the raw key and not get a KeyError when it's not there, do the following:
fwd_div_yield = data['context']['dispatcher']['stores']['QuoteSummaryStore']["summaryDetail"]['dividendYield'].get('raw', 0)
In the event raw is not there, the fwd_div_yield will be 0.
Then to retrieve each date from the list of dictionaries, you can use a list comprehension:
earnings_dates = data['context']['dispatcher']['stores']['QuoteSummaryStore']['calendarEvents']['earnings']['earningsDate']
fmt_dates = [date['fmt'] for date in earnings_dates]
Also, this data is available via url: https://query2.finance.yahoo.com/v10/finance/quoteSummary/aapl?modules=summaryDetail. Just replace aapl with the symbol you're scraping.
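As a rough sketch of how both points can be combined, the snippet below uses a small helper that walks nested keys and falls back to a default when any key is missing. The helper name dig and the sample dictionary are assumptions made up for illustration; the sample only mimics the shape of the QuoteSummaryStore described in the question.

```python
def dig(data, *keys, default=None):
    """Follow keys through nested dicts, returning default if any is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Made-up sample shaped like the QuoteSummaryStore in the question;
# 'dividendYield' is empty, as for a company that pays no dividend.
store = {
    "summaryDetail": {"dividendYield": {}},
    "calendarEvents": {"earnings": {"earningsDate": [
        {"raw": 1595894400, "fmt": "2020-07-28"},
        {"raw": 1596412800, "fmt": "2020-08-03"},
    ]}},
}

fwd_div_yield = dig(store, "summaryDetail", "dividendYield", "raw", default=0)
earnings_dates = dig(store, "calendarEvents", "earnings", "earningsDate", default=[])
# Join however many dates there are instead of indexing [0] and [1] by hand.
earnings_date = ", ".join(d["fmt"] for d in earnings_dates)
print(fwd_div_yield, earnings_date)
```

Compared with a single .get('raw', 0), the helper also covers the case where an intermediate key such as 'dividendYield' is absent.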
I would wrap whatever code checks whether the company pays a dividend in a try/except block.
def paysDividend(data):
    try:
        # Indexing raises KeyError when 'raw' is missing
        data['raw']
        return True
    except KeyError:
        return False
Without seeing any example code, this is a quick-fix solution.
For the second question...
If you are asking how to create [{'raw': 1234, 'fmt': '2020-07-28'}, ...]
based on the compiled list of companies that pay a dividend, create the list:
def dividendList(data):
    dividend_list = []
    for company in data:
        # 'path' and 'to' stand in for the real keys
        dividend_list.append({'raw': company['path']['to']['raw'], 'fmt': company['path']['to']['fmt']})
    return dividend_list
If you are trying to access each one after you have already created the list:
def accessDividend(dividend_data):
    for dividend in dividend_data:
        print(f"{dividend['raw']}, {dividend['fmt']}")
I created this method as a workaround.
import requests
import pandas as pd
from datetime import datetime as dt

def yfinanceDataframe(symbol, interval, _range):
    # Send a browser-like User-Agent with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    data = requests.get(f'https://query1.finance.yahoo.com/v8/finance/chart/{symbol}?interval={interval}&range={_range}', headers=headers).json()
    timestamp = data['chart']['result'][0]['timestamp']
    # Keep only the first quote block of the chart result
    data = data['chart']['result'][0]['indicators']['quote'][0]
    df = pd.DataFrame(data)
    df['Datetime'] = timestamp
    # Convert Unix timestamps to formatted local-time strings
    df['Datetime'] = df['Datetime'].apply(lambda x: dt.fromtimestamp(x).strftime('%m/%d/%Y %H:%M'))
    df.dropna(inplace=True)
    df.reset_index(inplace=True)
    df.rename(columns={'close': 'Close'}, inplace=True)
    return df
So today I was writing a Node.js app to get data out of a website's API. The API returns data in JSON. This is my code:
var processing = WooCommerce.get('orders?status='+type, function(err, data, res) {
    var result = res;
    JSON.stringify(result)
    console.log(result);
    result = result[0].meta_data;
    console.log(result);
});
And this is my console log (sorry for the mess):
[{"id":2977,"parent_id":0,"number":"2977","order_key":"wc_order_5a8bc4c350d54","created_via":"checkout","version":"3.0.5","status":"on-hold","currency":"INR","date_created":"2018-02-20T12:18:3
5","date_created_gmt":"2018-02-20T06:48:35","date_modified":"2018-02-20T12:18:41","date_modified_gmt":"2018-02-20T06:48:41","discount_total":"0.00","discount_tax":"0.00","shipping_total":"0.00
","shipping_tax":"0.00","cart_tax":"0.00","total":"40.00","total_tax":"0.00","prices_include_tax":false,"customer_id":342,"customer_ip_address":"103.104.77.159","customer_user_agent":"mozilla\
/5.0 (linux; android 6.0.1; le x526 build\/iixosop5801910121s) applewebkit\/537.36 (khtml, like gecko) chrome\/64.0.3282.137 mobile safari\/537.36","customer_note":"","billing":{"first_name":"
Fahad","last_name":"Khan","company":"","address_1":"","address_2":"","city":"Delhi","state":"DL","postcode":"","country":"IN","email":"shimail786#gmail.com","phone":"8745076002"},"shipping":{"
first_name":"","last_name":"","company":"","address_1":"","address_2":"","city":"","state":"","postcode":"","country":""},"payment_method":"paytm-qr","payment_method_title":"Pay with Paytm QR"
,"transaction_id":"","date_paid":null,"date_paid_gmt":null,"date_completed":null,"date_completed_gmt":null,"cart_hash":"0119311d11c4978ecc7bf6f59b53586f","meta_data":[{"id":91320,"key":"_billi
ng_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?partner=452464312&token=Gq27CMGc"},{"id":91321,"key":"billing_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?par
tner=452464312&token=Gq27CMGc"},{"id":91324,"key":"_woocs_order_rate","value":"1"},{"id":91325,"key":"_woocs_order_base_currency","value":"INR"},{"id":91326,"key":"_woocs_order_currency_change
d_mannualy","value":"0"}],"line_items":[{"id":1641,"name":"MAG-7 | Silver (Factory New)","product_id":2972,"variation_id":0,"quantity":1,"tax_class":"","subtotal":"40.00","subtotal_tax":"0.00"
,"total":"40.00","total_tax":"0.00","taxes":[],"meta_data":[],"sku":"","price":40}],"tax_lines":[],"shipping_lines":[],"fee_lines":[],"coupon_lines":[],"refunds":[],"_links":{"self":[{"href":"
https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders\/2977"}],"collection":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders"}],"customer":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/w
c\/v2\/customers\/342"}]}},{"id":2976,"parent_id":0,"number":"2976","order_key":"wc_order_5a8bc2fabf6d8","created_via":"checkout","version":"3.0.5","status":"on-hold","currency":"INR","date_cr
eated":"2018-02-20T12:10:58","date_created_gmt":"2018-02-20T06:40:58","date_modified":"2018-02-20T12:11:02","date_modified_gmt":"2018-02-20T06:41:02","discount_total":"0.00","discount_tax":"0.
00","shipping_total":"0.00","shipping_tax":"0.00","cart_tax":"0.00","total":"95.00","total_tax":"0.00","prices_include_tax":false,"customer_id":342,"customer_ip_address":"103.104.77.159","cust
omer_user_agent":"mozilla\/5.0 (linux; android 6.0.1; le x526 build\/iixosop5801910121s) applewebkit\/537.36 (khtml, like gecko) chrome\/64.0.3282.137 mobile safari\/537.36","customer_note":""
,"billing":{"first_name":"Fahad","last_name":"Khan","company":"","address_1":"","address_2":"","city":"Delhi","state":"DL","postcode":"","country":"IN","email":"shimail786#gmail.com","phone":"
8745076002"},"shipping":{"first_name":"","last_name":"","company":"","address_1":"","address_2":"","city":"","state":"","postcode":"","country":""},"payment_method":"paytm-qr","payment_method_
title":"Pay with Paytm QR","transaction_id":"","date_paid":null,"date_paid_gmt":null,"date_completed":null,"date_completed_gmt":null,"cart_hash":"ca6b663ea1f4b4c7ed65b9fd39acc2cb","meta_data":
[{"id":91268,"key":"_billing_stl","value":"https:\/\/steamcommunity.com\/tradeoffer\/new\/?partner=452464312&token=1m7SCUVf"},{"id":91269,"key":"billing_stl","value":"https:\/\/steamcommunity.
com\/tradeoffer\/new\/?partner=452464312&token=1m7SCUVf"},{"id":91272,"key":"_woocs_order_rate","value":"1"},{"id":91273,"key":"_woocs_order_base_currency","value":"INR"},{"id":91274,"key":"_w
oocs_order_currency_changed_mannualy","value":"0"}],"line_items":[{"id":1639,"name":"SG 553 | Tiger Moth (Field Tested)","product_id":911,"variation_id":0,"quantity":1,"tax_class":"","subtotal
":"42.00","subtotal_tax":"0.00","total":"42.00","total_tax":"0.00","taxes":[],"meta_data":[],"sku":"","price":42},{"id":1640,"name":"Glock-18 | Bunsen Burner (Factory New)","product_id":532,"v
ariation_id":0,"quantity":1,"tax_class":"","subtotal":"53.00","subtotal_tax":"0.00","total":"53.00","total_tax":"0.00","taxes":[],"meta_data":[{"id":4861,"key":"_woocs_order_rate","value":"1"}
,{"id":4862,"key":"_woocs_order_base_currency","value":"INR"},{"id":4863,"key":"_woocs_order_currency_changed_mannualy","value":"0"}],"sku":"","price":53}],"tax_lines":[],"shipping_lines":[],"
fee_lines":[],"coupon_lines":[],"refunds":[],"_links":{"self":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/orders\/2976"}],"collection":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\
/v2\/orders"}],"customer":[{"href":"https:\/\/ezpz-skins.com\/wp-json\/wc\/v2\/customers\/342"}]}}]
undefined
So I realize (after reading tons of questions on StackOverflow) that my data is an array. That's why I added result = result[0].meta_data; but that gives me undefined (notice it at the end of the log). Also, if I remove .meta_data, it just returns [, the very first character.
Where am I going wrong? I'm kinda new to all this and am still learning, so please explain :)
'res' is a string, not an object, so result[0] is only its first character ("[") and has no meta_data property. Instead of JSON.stringify(), use JSON.parse() so that the string is converted into a JavaScript object, then log the result as shown below and access meta_data after that.
var processing = WooCommerce.get('orders?status='+type, function(err, data, res) {
    var result = JSON.parse(res);
    console.log(result[0]);
    result = result[0].meta_data;
    console.log(result);
});
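The same distinction can be shown in a few lines of Python, offered here only as an analogy to the JavaScript above: json.loads plays the role of JSON.parse, and indexing the unparsed string returns a single character. The response text below is a shortened, made-up stand-in for the real one.

```python
import json

# Shortened, made-up stand-in for the response body in the question.
res = '[{"id": 2977, "meta_data": [{"id": 91320, "key": "_billing_stl"}]}]'

# Indexing the unparsed string yields its first character.
first_char = res[0]

# After parsing, index 0 is the first order object.
orders = json.loads(res)
meta_data = orders[0]["meta_data"]

print(first_char)
print(meta_data[0]["key"])
```

This mirrors what the question observed: before parsing, element 0 is just "[", and only the parsed value exposes meta_data.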
I'm Ahmed and I'm working on OpenSIPS.
I saw your questions on the forum, and I have a problem that I think you can answer.
I built a simple scenario to route calls between users registered on the OpenSIPS server, but when it comes to real IP phones (each with its own IP address), it doesn't work (trunk).
For example, my OpenSIPS address is 10.42.15.18
and my IP phone address is 10.42.13.82.
I think it is all about the SIP trunk.
I am stuck on this part and have searched a lot for a solution; maybe there is a detail that I have missed.
Which function is responsible for handling requests and responses with an IP phone?
I used this code:
# account only INVITEs
if ($rU=="49894614950666"){
    $rU = $tU;
    rewritehostport("10.42.13.82:5060");
    $du = "sip:49894614950666#10.42.13.82;user=phone";
    t_relay();
    xlog("reference to URI of 'To' header ====> $tu");
    xlog("reference to domain in URI of 'TO' header ====> $td");
    # route the call out based on RURI
    route(3);
}
route[3]{
    seturi("sip:49894614950666#10.42.13.82;user=phone");
    $du = "sip:49894614950666#10.42.13.82;user=phone";
    rewriteuri("sip:49894614950666#10.42.13.82;user=phone");
    xlog("route 2 : forwarding to $tU \n $ruri \n");
    xlog("Received $rm from $fu (callid: $ci)\n");
    forward();
    if (is_method("INVITE")) {
        t_on_branch("2");
        t_on_reply("2");
        t_on_failure("1");
    }
    if (!t_relay()) {
        sl_reply_error();
    };
    exit;
}
When I call the requested number from a softphone, the server sends an INVITE request as follows:
INVITE sip:49894614950666#10.42.15.18;transport=TCP SIP/2.0
Via: SIP/2.0/TCP 10.42.15.12:5060;branch=z9hG4bK-524287-1---dedd27ee7475c0f1
Max-Forwards: 70
Contact: <sip:test11#10.42.15.12:5060;transport=tcp>
To: <sip:49894614950666#10.42.15.18;transport=TCP>
From: <sip:test11#10.42.15.18;transport=TCP>;tag=2f025b44
Call-ID: tdO14DnlADH9Okx6Sr0p4A..
CSeq: 1 INVITE
Content-Type: application/sdp
User-Agent: Z 3.15.40006 rv2.8.20
Allow-Events: presence, kpml, talk
Content-Length: 237
and the target VM resends an INVITE request to the OpenSIPS server, but then the server starts sending messages to itself and does not respond to the target machine...
I wonder whether the "To" field in the INVITE message is wrong!
OpenSIPS only sends an INVITE to the IP phone and ignores the messages coming from it; it does not respond afterwards with any ACK.
What should I add or modify?
Thank you very much.
Why not just use the lookup function? It is intended exactly for cases like yours and will do all the dirty work of rewriting URIs automatically.
Something like this:
if (lookup("location","m")) {
    xlog("[INCOMINGCALL][$rU # $si:$sp ] Forward call call to <$ru> via <$du>\n");
    if (!t_relay()) {
        send_reply("503","Internal Error");
    };
    exit;
}
t_reply("404", "Not Found");
exit;
The advantage of this technique is that you will be able to change locations at run time using the 'opensipsctl address' command.
I'm using an Express server with RethinkDB, and I want to send multiple coordinates into my 'locations' table on RethinkDB and create an r.polygon(). I understand how to do the query via RethinkDB's data explorer, but I'm having trouble figuring out how to send it via JSON from the client to the server and insert it through my query there.
I basically want to do this:
r.db('places').table('locations').insert({
name: req.body.name,
bounds: r.polygon(req.body.bounds)
})
where req.body.bounds looks like this:
[long, lat],[long, lat], [long, lat]
I can't send it in as a string because then it gets read as one single input instead of three arrays. I'm sure there's a 'right in front of me' way, but I'm drawing a blank.
What's the best way to do this?
Edit: To clarify, my question is, what should my JSON look like and how should it be received on my server?
This is what RethinkDB wants in order to make a polygon:
r.polygon([lon1, lat1], [lon2, lat2], [lon3, lat3], ...) → polygon
As per the suggestion, I've added in r.args() to my code:
r.db('places').table('locations').insert({
name: req.body.name,
bounds: r.polygon(r.args(req.body.bounds))
})
Edit
Ok, I was dumb and had a typo in one of my coordinates!
Sending it as an array of arrays and wrapping it in r.args() on the server side works.
What you need is r.args to unpack the array into arguments for r.polygon. https://www.rethinkdb.com/api/javascript/args/
Assuming that req.body.bounds is:
[[long, lat],[long, lat], [long, lat]]
and that you are submitting a raw JSON string from the client,
you first need to decode the JSON payload, get the bounds field, and wrap it with r.args as follows:
// Only needed while req.body is still a raw JSON string
var body = JSON.parse(req.body)
r.db('places').table('locations').insert({
    name: body.name,
    bounds: r.polygon(r.args(body.bounds))
})
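To make the payload shape and the unpacking step concrete, here is a small Python sketch offered as an analogy only. The polygon function is a hypothetical stand-in for r.polygon (it does not talk to RethinkDB), the coordinates are made up, and Python's * operator plays the role that r.args plays in the query above.

```python
import json

def polygon(*points):
    """Hypothetical stand-in for r.polygon: takes one argument per point."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    return {"type": "Polygon", "points": [tuple(p) for p in points]}

# Example request body: bounds is an array of [longitude, latitude] pairs
# (made-up values).
payload = '{"name": "Example", "bounds": [[-122.4, 37.7], [-122.4, 37.8], [-122.3, 37.8]]}'

body = json.loads(payload)
# The * operator spreads the outer array into separate arguments,
# one per point, instead of passing the whole array as a single argument.
shape = polygon(*body["bounds"])
print(shape["points"])
```

The key point is that the client sends one JSON array whose elements are the coordinate pairs, and the server spreads that array into separate arguments.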