I'm an absolute beginner with GET/POST requests and MicroPython.
I'm programming my ESP8266 Wemos D1 mini as an HTTP server with MicroPython. My project uses a website to control the RGB values of a NeoPixel matrix hooked up to the D1 (all the code is on my GitHub here: https://github.com/julien123123/NeoLamp-Micro).
Basically, the website contains three sliders: one for red, one for green and one for blue. JavaScript code reads the value of each slider and sends it to the MicroPython code using the POST method, as follows:
getColors = function() {
    var rgb = new Array(slider1.value, slider2.value, slider3.value);
    return rgb;
};
postColors = function(rgb) {
    var xmlhttp = new XMLHttpRequest();
    var npxJSON = '{"R":' + rgb[0] + ', "G":' + rgb[1] + ', "B":' + rgb[2] + '}';
    xmlhttp.open('POST', 'http://' + window.location.hostname + '/npx', true);
    xmlhttp.setRequestHeader('Content-type', 'application/json');
    xmlhttp.send(npxJSON);
};
To receive the request in MicroPython, here's my code:
conn, addr = s.accept()
request = conn.recv(1024)
request = str(request)
print(request)
The request prints as follows:
b'POST /npx HTTP/1.1\r\nHost: 192.xxx.xxx.xxx\r\nConnection: keep-alive\r\nContent-Length: 27\r\nOrigin: http://192.168.0.110\r\nUser-Agent: Mozilla/5.0 (X11; CrOS x86_64 10323.46.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.107 Safari/537.36\r\nContent-type: application/json\r\nAccept: */*\r\nReferer: http://192.xxx.xxx.xxx/\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: fr,en;q=0.9,fr-CA;q=0.8\r\n\r\n{"R":114, "G":120, "B":236}'
The only important bit for me is at the end: {"R":114, "G":120, "B":236}. I want to use those values to change the color values of my NeoPixel object.
My question is: how do I process the request so that I keep only the dictionary containing the RGB values at the end?
Thanks in advance (I'm almost there!)
This is really a question about generic Python data types. The request is of type bytes, as indicated by the b prefix in b'POST /npx HTTP/1.1...\r\n{"R":114, "G":120, "B":236}'. Use decode() to convert it to a str; calling str() on it, as in your code, only produces the printable representation, b'...' wrapper included.
import json
request = b'POST /npx HTTP/1.1\r\nHost: 192.xxx.xxx.xxx\r\nConnection: keep-alive\r\nContent-Length: 27\r\nOrigin: http://192.168.0.110\r\nUser-Agent: Mozilla/5.0 (X11; CrOS x86_64 10323.46.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.107 Safari/537.36\r\nContent-type: application/json\r\nAccept: */*\r\nReferer: http://192.xxx.xxx.xxx/\r\nAccept-Encoding: gzip, deflate\r\nAccept-Language: fr,en;q=0.9,fr-CA;q=0.8\r\n\r\n{"R":114, "G":120, "B":236}'
data = request.decode()  # convert bytes to str
rgb = data.split('\r\n')[-1:]  # split the str and keep only the last piece, discarding the HTTP headers
for color in rgb:
    print(color, type(color))
    d = json.loads(color)
    print(d, type(d))
Here color is a str representation of a JSON object, while d is a Python dict that you can use for further manipulation:
{"R":114, "G":120, "B":236} <class 'str'>
{'R': 114, 'G': 120, 'B': 236} <class 'dict'>
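Taking the last `\r\n`-separated piece works for this request, but it would break if the body itself contained a line break. A minimal sketch of a slightly more defensive variant splits once at the blank line that ends the headers. The helper name `extract_json_body` is hypothetical, the sample request is shortened, and the sketch assumes the whole request arrived in a single `recv()` call. It is written for CPython; on older MicroPython firmware the `json` module may be named `ujson`.

```python
import json

def extract_json_body(raw_request):
    """Return the JSON body of a raw HTTP request (bytes) as a dict."""
    # Headers and body are separated by the first blank line (CRLF CRLF).
    parts = raw_request.split(b"\r\n\r\n", 1)
    if len(parts) != 2:
        raise ValueError("no header/body separator found")
    return json.loads(parts[1].decode("utf-8"))

# Shortened version of the request shown in the question.
raw = (b'POST /npx HTTP/1.1\r\n'
       b'Content-Length: 27\r\n'
       b'Content-type: application/json\r\n'
       b'\r\n'
       b'{"R":114, "G":120, "B":236}')
colors = extract_json_body(raw)
print(colors["R"], colors["G"], colors["B"])
```

The resulting dict can then be passed straight to whatever code sets the pixel colors.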
This was working, but it suddenly fails to extract the "event.body" JSON object passed into this AWS Lambda Node.js function:
exports.handler = function (event, context, callback) {
    console.log('Event: ' + JSON.stringify(event));
    console.log('Event.Body: ' + event.body);
    //console.log('Parsed Event: ' + JSON.parse(event));
    let body = event.body;
    console.log('Body: ' + body);
    const tgQueryName = body.queryName;
    const tgQueryParams = body.queryParams;
    console.log('tgQueryName: ' + tgQueryName);
    console.log('tgQueryParams: ' + tgQueryParams);
    ...
Both tgQueryName and tgQueryParams are 'undefined' - see CloudWatch log:
INFO Event: {"version":"2.0","routeKey":"POST /tg-query","rawPath":"/dev/tg-query","rawQueryString":"","headers":{"accept":"application/json, text/plain, */*","accept-encoding":"gzip, deflate","accept-language":"he-IL,he;q=0.9,en-US;q=0.8,en;q=0.7","cache-control":"no-cache","content-length":"51","content-type":"application/json; charset=UTF-8","host":"p6ilp2ts0g.execute-api.us-east-1.amazonaws.com","origin":"http://localhost","referer":"http://localhost/","sec-fetch-dest":"empty","sec-fetch-mode":"cors","sec-fetch-site":"cross-site","user-agent":"Mozilla/5.0 (Linux; Android 11; Redmi Note 8 Build/RKQ1.201004.002; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/101.0.4951.61 Mobile Safari/537.36","x-amzn-trace-id":"Root=1-629b960c-072e8fa475ad26f56893c6f9","x-forwarded-for":"89.139.32.60","x-forwarded-port":"443","x-forwarded-proto":"https","x-requested-with":"com.skillblaster.simplify.dev"},"requestContext":{"accountId":"140360121027","apiId":"p6ilp2ts0g","domainName":"p6ilp2ts0g.execute-api.us-east-1.amazonaws.com","domainPrefix":"p6ilp2ts0g","http":{"method":"POST","path":"/dev/tg-query","protocol":"HTTP/1.1","sourceIp":"89.139.32.60","userAgent":"Mozilla/5.0 (Linux; Android 11; Redmi Note 8 Build/RKQ1.201004.002; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/101.0.4951.61 Mobile Safari/537.36"},"requestId":"TNRh_gq-oAMESEw=","routeKey":"POST /tg-query","stage":"dev","time":"04/Jun/2022:17:27:40 +0000","timeEpoch":1654363660597},"body":"{\"queryName\":\"getActiveCountries\",\"queryParams\":{}}","isBase64Encoded":false}
INFO Event.Body: {"queryName":"getActiveCountries","queryParams":{}}
INFO Body: {"queryName":"getActiveCountries","queryParams":{}}
INFO tgQueryName: undefined
INFO tgQueryParams: undefined
I also tried: body["queryName"] - same result.
What am I missing?
Your body content is a string and you need to JSON.parse it:
let body = JSON.parse(event.body);
This only became clear once I pasted your initial event JSON into a JSON beautifier: the body value is wrapped in quotes with escaped inner quotes, so it is a string rather than an object.
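The same pitfall exists in any runtime that receives this event shape. As a rough illustration in Python (not the asker's Node.js code), the sketch below parses the body only when it arrives as a string. The function name `handler` and the trimmed-down `event` are assumptions made for the example; only the `body` and `isBase64Encoded` fields are taken from the log above.

```python
import json

def handler(event, context=None):
    """Hypothetical Python counterpart of the Node.js handler above."""
    body = event.get("body")
    # In the logged event the body is a JSON string, so it has to be
    # parsed before its fields can be read.
    if isinstance(body, str):
        body = json.loads(body)
    return body["queryName"], body["queryParams"]

# Trimmed-down event containing only the fields this sketch needs.
event = {
    "body": "{\"queryName\":\"getActiveCountries\",\"queryParams\":{}}",
    "isBase64Encoded": False,
}
print(handler(event))
```

Checking the type before parsing keeps the handler working if the body is ever delivered already parsed.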
I'm trying to find emails in HTML using a regex, but I have problems with some websites.
The main problem is that the regex call hangs the process and leaves the CPU overloaded.
import re
from urllib.request import urlopen, Request
email_regex = re.compile(r'([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})', re.IGNORECASE)
request = Request('http://www.serviciositvyecla.com')
request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
html = str(urlopen(request, timeout=5).read().decode("utf-8", "strict"))
email_regex.findall(html) ## here is where regex takes a long time
I have no problems with other websites, for example:
request = Request('https://www.velezmalaga.es/')
If someone knows how to solve this problem, or how to put a timeout on the regex call, I would appreciate it.
I use Windows.
I initially tried fiddling with your approach, but then I ditched it and resorted to BeautifulSoup. It worked.
Try this:
import re
import requests
from bs4 import BeautifulSoup
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
}
pages = ['http://www.serviciositvyecla.com', 'https://www.velezmalaga.es/']
emails_found = set()
for page in pages:
    html = requests.get(page, headers=headers).content
    soup = BeautifulSoup(html, "html.parser").select('a[href^=mailto]')
    for item in soup:
        try:
            emails_found.add(item['href'].split(":")[-1].strip())
        except ValueError:
            print("No email :(")
print('\n'.join(email for email in emails_found))
Output:
info@serviciositvyecla.com
oac@velezmalaga.es
EDIT:
One reason your approach doesn't work is, well, the regex itself. The other one is the size (I suspect) of the HTML returned.
See this:
import re
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
}
html = requests.get('https://www.velezmalaga.es/', headers=headers).text
op_regx = r'([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})'
simplified_regex = r'[\w\.-]+@[\w\.-]+\.\w+'
print(f"OP's regex results: {re.findall(op_regx, html)}")
print(f"Simplified regex results: {re.findall(simplified_regex, html)}")
This prints:
OP's regex results: []
Simplified regex results: ['oac@velezmalaga.es', 'oac@velezmalaga.es']
Finally, I found a solution that avoids consuming all the RAM during the regex search. In my case, getting an empty result even though there is an email on the page is acceptable, as long as the process does not block for lack of memory.
The HTML of the scraped page contained 5.5 million characters. 5.1 million of them carried no useful information, since they belong to a hidden div full of unintelligible characters.
I added a guard similar to:
if len(html) < 1000000: do whatever
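The size guard can be wrapped around the search itself. The sketch below is only one way to arrange it: the names `find_emails`, `EMAIL_RE` and `MAX_HTML_CHARS` are made up for the example, the pattern is in the spirit of the simplified regex from the earlier answer, and the one-million-character threshold is the figure quoted above rather than a tuned value.

```python
import re

# Email pattern similar to the simplified regex above, compiled once.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w+")
MAX_HTML_CHARS = 1_000_000  # threshold quoted in the answer above

def find_emails(html, max_chars=MAX_HTML_CHARS):
    """Return the set of emails in html, or an empty set if html is too large."""
    if len(html) >= max_chars:
        # Oversized page: accept an empty result rather than risk a stall.
        return set()
    return set(EMAIL_RE.findall(html))

print(find_emails('<a href="mailto:info@example.com">contact</a>'))
```

Returning an empty set for oversized pages matches the trade-off described above: a missed email is acceptable, a blocked process is not.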
I am working on my first website scraper and am trying to get the number 41,110 that is saved in a column on the webpage https://mcassessor.maricopa.gov/mcs.php?q=14014003N. Below is my code.
How can I get to this number and print it?
from bs4 import BeautifulSoup
import requests
web_page = 'https://mcassessor.maricopa.gov/mcs.php?q=14014003N'
web_header = {'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
response = requests.get(web_page,headers=web_header)
soup = BeautifulSoup(response.content,'html.parser')
for row in soup.findAll('table')[0].thread.tr.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    print(first_column)
A straightforward approach would involve getting the "improvements" table, getting the first non-header row and then the last cell in this row:
table = soup.find("table", id="improvements-table")
first_row = table.find_all("tr")[1] # skipping a header
last_cell = first_row.find_all("td")[-1]
print(last_cell.get_text()) # prints 41,110
A more generic approach would involve making a list of dictionaries out of this table where keys are header names:
table = soup.find("table", id="improvements-table")
headers = [th.get_text() for th in table('th')]
data = [dict(zip(headers, [td.get_text() for td in row('td')])) for row in table("tr")[1:]]
print(data)
print(data[0]['Sq Ft.'])
Prints:
[
{u'Imp #': u'000101', u'Description': u'Mini-Warehouse', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'41,110', u'CCI': u'C', u'Model': u'386'},
{u'Imp #': u'000201', u'Description': u'Site Improvements', u'Age': u'1', u'Rank': u'2', u'Sq Ft.': u'1', u'CCI': u'D', u'Model': u'163'}
]
41,110
For learning purposes I'm trying to reproduce Instagram internal API with Ruby and Faraday. However, the response's body I get when making a POST is somehow encoded instead of JSON:
What the response's body should look like:
{
"status": "ok",
"media": {
"page_info": {
"start_cursor": "1447303180937779444_4460593680",
"has_next_page": true,
"end_cursor": "1447303180937779444",
"has_previous_page": true
},
...
What I get:
#=> \x1F\x8B\b\x00#\x15\x9EX\x02\xFF...
Question:
Any idea (i) why I'm getting a response's body like that and (ii) how can I convert that to JSON?
Flow:
When you hit https://www.instagram.com/explore/locations/127963847/madrid-spain/ in your browser Instagram makes two requests (among others):
GET: https://www.instagram.com/explore/locations/127963847/madrid-spain/
POST: https://www.instagram.com/query/
I used Postman to intercept the requests and just copied the headers and parameters of the second (/query/) request. This is my implementation (I get status 200):
class IcTest
require 'faraday' # Faraday itself must be loaded before Faraday.new is called
require 'open-uri'
require "net/http"
require "uri"
def self.faraday
conn = Faraday.new(:url => 'https://www.instagram.com') do |faraday|
faraday.request :url_encoded # form-encode POST params
faraday.response :logger # log requests to STDOUT
faraday.adapter Faraday.default_adapter # make requests with Net::HTTP
end
res = conn.post do |req|
req.url '/query/'
req.headers['Origin'] = 'https://www.instagram.com'
req.headers['X-Instagram-AJAX'] = '1'
req.headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
req.headers['Content-Type'] = 'application/x-www-form-urlencoded'
# req.headers['Accept'] = '*/*'
req.headers['X-Requested-With'] = 'XMLHttpRequest'
req.headers['X-CSRFToken'] = 'SrxvROytxQHAesy1XcgcM2PWrEHHuQnD'
req.headers['Referer'] = 'https://www.instagram.com/explore/locations/127963847/madrid-spain/'
req.headers['Accept-Encoding'] = 'gzip, deflate, br'
req.headers['Accept-Language'] = 'es,en;q=0.8,gl;q=0.6,pt;q=0.4,pl;q=0.2'
req.headers['Cookie'] = 'mid=SJt50gAEAAE6KZ50GByVoStJKLUH; sessionid=IGSC514a2e9015f548b09176228f83ad5fe716f32e7143f6fe710c19a71c08b9828b%3Apc2KPxgwvokLyZhfZHcO1Qzfb2mpykG8%3A%7B%22_token%22%3A%2233263701%3Ai7HSIbxIMLj70AoUrCRjd0o1g7egHg79%3Acde5fe679ed6d86011d70b7291901998b8aae7d0aaaccdf02a2c5abeeaeb5908%22%2C%22asns%22%3A%7B%2283.34.38.249%22%3A3352%2C%22time%22%3A1486584547%7D%2C%22last_refreshed%22%3A1436584547.2838287%2C%22_platform%22%3A4%2C%22_token_ver%22%3A2%2C%22_auth_user_backend%22%3A%22accounts.backends.CaseInsensitiveModelBackend%22%2C%22_auth_user_id%22%3A33233701%2C%22_auth_user_hash%22%3A%22%22%7D; ds_user_id=31263701; csrftoken=sxvROytxQHAesy1XcgcM2PWrEHHuQnD; s_network=""; ig_vw=1440; ig_pr=2;'
req.body = { :q => "ig_location(127963847) { media.after('', 60) { count, nodes { caption, code, comments { count }, comments_disabled, date, dimensions { height, width }, display_src, id, is_video, likes { count }, owner { id }, thumbnail_src, video_views }, page_info} }",
:ref => "locations::show",
:query_id => "" }
end # conn.post block
end # def self.faraday
end # class IcTest
Thanks.
Josh's comment solved it! :-)
The body's content was gzip-compressed.
Solution here.
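The leading bytes `\x1F\x8B` in the output above are the gzip signature, which fits a request that advertises `Accept-Encoding: gzip, deflate, br`. As an illustration of the decoding step, here is a minimal Python sketch rather than the Ruby solution referred to above. The helper name `decode_body` and the sample payload are invented for the example.

```python
import gzip
import json

def decode_body(raw_body):
    """Decompress a response body if it starts with the gzip magic bytes."""
    if raw_body[:2] == b"\x1f\x8b":
        raw_body = gzip.decompress(raw_body)
    return json.loads(raw_body.decode("utf-8"))

# Stand-in for a compressed response; the payload is made up for illustration.
compressed = gzip.compress(json.dumps({"status": "ok"}).encode("utf-8"))
print(decode_body(compressed))
```

Not sending the `Accept-Encoding` header at all is another option that usually avoids receiving a compressed body in the first place.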
Sample code:
sub record_put :Private {
    my ( $self, $c, @args ) = @_;
    $c->log->info( join ', ', %{ $c->request->headers } ) ;
    $c->log->info( $c->request->body ) ;
    $c->response->body( $c->request->body ) ;
}
Here's the log data:
[info] user-agent, Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/28.0.1500.71 Chrome/28.0.1500.71 Safari/537.36, connection, keep-alive, accept, application/json, text/javascript, */*; q=0.01, accept-language, en-US,en;q=0.8, x-requested-with, XMLHttpRequest, origin, http://localhost:3000, accept-encoding, gzip,deflate,sdch, content-length, 125, host, localhost:3000, ::std_case, HASH(0xaec0ba0), content-type, application/json, referer, http://localhost:3000/test
[info] /tmp/PM2C6FXpcC
Here's a snippet of text from the Catalyst::Request document:
$req->body
Returns the message body of the request, as returned by HTTP::Body: a string, unless Content-Type is application/x-www-form-urlencoded, text/xml, or multipart/form-data, in which case a File::Temp object is returned.
The File::Temp manpage does not help. Even though the object overloads its stringification, I can't see how to extract the contents.
Here's what I used:
my $rbody = $c->req->body;
if ($rbody) {
    # Post requests are stored on the filesystem under certain obscure conditions,
    # in which case $rbody is a filehandle pointing to the temporary file
    if (ref $rbody) { # a filehandle
        $content = join "", readline($rbody);
        close $rbody;
        unlink "$rbody"; # filehandle stringifies to name of temp file
    } else { # a string
        $content = $rbody;
    }
}
The thing you get back from the body method represents a temporary file, and can be treated like a filehandle or like a string. If you treat it like a filehandle, it reads from the temporary file; if you use it like a string, its value is the name of the temporary file. I used the seldom-seen builtin function readline, which is the same as the more common <…> operator.
I don't expect the else path to ever be taken, but it's there defensively, because you never know.
Added 2014-06-09: You need the explicit close; otherwise the code has a file descriptor leak. Catalyst devs claim that it should be cleaning up the handle automatically, but it doesn't.
If you are just trying to parse JSON, the newest stable Catalyst has a method 'body_data' that does this for you (see: http://www.catalystframework.org/calendar/2013/6)