Google Page Speed API Path? - google-apis-explorer

I'm wondering if anyone can shed some light on the PageSpeed API results.
I am trying to access specific elements from the document. I have used the "selector", but that can return multiple elements. I am curious whether there is a way to take the path result and convert it to XPath?

I was recently working on pulling specific elements from PageSpeed results too, but I parsed the JSON response and extracted the data I wanted from there. This seemed more straightforward once I understood how to parse the JSON. I set up a loop because I had hundreds of URLs, but for simpler purposes the following may be helpful. This is what I used:
import requests

url = 'https://www.yourwebsite.com/'
API_Key = "yoUrKeY"
baseURL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url="
strategy = "mobile"

response_url = baseURL + url + '&key=' + API_Key + '&strategy=' + strategy
response = requests.get(response_url).json()

url_id = response['originLoadingExperience']['id']
overall_score = response['lighthouseResult']['categories']['performance']['score'] * 100  # 0-1 score scaled to 0-100
fcp_score = response['originLoadingExperience']['metrics']['FIRST_CONTENTFUL_PAINT_MS']['percentile'] / 1000  # ms to seconds
print(url_id, overall_score, fcp_score)
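For the multi-URL case mentioned above, here is a minimal sketch of the loop version (the list of pages is hypothetical; it assumes the same v5 endpoint and key):
import requests

urls = ['https://www.yourwebsite.com/', 'https://www.yourwebsite.com/about']  # hypothetical pages
API_Key = "yoUrKeY"
baseURL = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url="
strategy = "mobile"

for u in urls:
    response = requests.get(baseURL + u + '&key=' + API_Key + '&strategy=' + strategy).json()
    overall_score = response['lighthouseResult']['categories']['performance']['score'] * 100
    print(u, overall_score)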

Related

Apps script JSON.parse() returns unexpected result, how can I solve this?

I am currently working on an external app that uses Google Sheets and JSON for data transmission via the Fetch API. I decided to mock the scenario (for debugging purposes): a simple JSON payload comes from my external app through a prepared Code.gs to be posted to Google Sheets. The code snippet I run through Apps Script looks like this:
function _doPost(/* e */) {
  // const body = e.postData.contents;
  const bodyJSON = JSON.parse("{\"coords\" : \"123,456,789,112,113,114,115,116\"}" /* instead of: body */);
  const db = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  db.getRange("A1:A10").setValue(bodyJSON.coords).setNumberFormat("#"); // get range, set value, set text format
}
The problem is the result I get: 123,456,789,112,113,000,000,000. As you can see, starting from 114 onward it outputs 000,... instead. I thought, okay, I will explicitly specify the format so the value is saved as text; and indeed, if I check the selected range in the Google Sheets UI via Format -> Number, it shows Text.
However, something interesting happens if I update the body of the JSON to be parsed so that the sequence ends with 2-digit numbers instead of 3 (notice: these are parts of a string separated by commas, not true numbers), like "{\"coords\" : \"123,456,789,112,113,114,115,116,17,18\"}". It not only shows the response as expected but also brings back, that is, fixes, the "corrupted" values hidden under the 000,...: "{"coords" : "123,456,789,112,113,114,115,116,17,18"}".
Even Logger.log() returns the initial JSON input as expected. I really have no clue what is going on. I would appreciate any help solving this issue. Thank you.
You can try directly assigning an object literal to your bodyJSON variable instead of parsing a string with JSON.parse.
Part of your code should look like this:
const bodyJSON = {
  "coords": "123,456,789,112,113,114,115,116"
};
I found a simple workaround after all: just add a preceding pair of zeros 0,0, at the very beginning of coords (0,0,123,...). This prevents the culprit I described in my issue. If anyone is interested, the external app I am currently building is called Hotspot widget: play around with the DOM and append a marker whose coordinates (coords) are pushed through Apps Script and saved to Google Sheets. I am providing a link with instructions on how to set up your own copy of the app. It's a decent starting point for learning vanilla JavaScript basics, including a simple database approach on the fly. Thank you and good luck!
Hotspot widget on Github
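For what it's worth, the behavior is consistent with Sheets coercing the comma-separated string to a number: "123,456,789,112,113,114,115,116" matches the thousands-separator pattern, so it parses as one 24-digit number, which exceeds the roughly 15-16 significant digits an IEEE-754 double can hold; the trailing digits are lost and the display is padded with zeros. The 2-digit groups (17,18) and the leading 0,0 both break the thousands-separator pattern, so those strings stay text. This explanation is an assumption; a minimal Python sketch of the precision loss itself:
# The 24-digit value Sheets would get by stripping the separators:
n = 123456789112113114115116
print(len(str(n)))    # 24 significant digits
print(float(n) == n)  # False: a double cannot represent the value exactly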

interpreting a json string

I have an object in my database, following a file upload, that looks like this:
a:1:{s:4:"file";a:3:{s:7:"success";b:1;s:8:"file_url";a:2:{i:0;s:75:"http://landlordsplaces.com/wp-content/uploads/2021/01/23192643-threepersons.jpg";i:1;s:103:"http://landlordsplaces.com/wp-content/uploads/2021/01/364223-two-female-stick-figures.jpg";}s:9:"file_path";a:2:{i:0;s:93:"/var/www/vhosts/landlordsplaces.com/httpdocs/wp-content/uploads/2021/01/23192643-threepersons.jpg";i:1;s:121:"/var/www/vhosts/landlordsangel.com/httpdocs/wp-content/uploads/2021/01/364223-two-female-stick-figures.jpg";}}}
I am trying, with no success, to extract the two jpg URLs programmatically from the object so I can show the images on the site. I tried JSON.parse(object) but that isn't helping. I just need to get the URLs out.
Thank you in anticipation of any general direction.
What you're looking at is not a JSON string. It is a serialized PHP object. If this database entry was created by Forminator, you should use the Forminator API to retrieve the needed form entry. The aforementioned link points to the get_entry method, which I suspect is what you're looking for (I have never used Forminator), but in any case, you should look for a method that will return that database entry as a PHP object containing your needed URLs.
In case it is ever of any help to anyone, the answer to the question was based on John's input. The API has the classes to handle this without needing to understand the data structure.
Forminator_API::initialize();

$form_id  = 1449; // ID of the form
$entry_id = 3;    // ID of the entry
$entry = Forminator_API::get_entry( $form_id, $entry_id );

$file_url  = $entry->meta_data['upload-1']['value']['file']['file_url'];
$file_path = $entry->meta_data['upload-1']['value']['file']['file_path'];
var_dump($entry); // contains paths and URLs
Hope someone benefits.

How do I have beautiful soup read in the html fully? Possibly selenium issue?

I am trying to get some practice with Beautiful Soup, web scraping, and Python, but I am struggling to get data out of certain tags. I am trying to go through multiple pages of data on cars.com.
When I read in the HTML, the tags I need are
<cars-shop-srp-pagination>
</cars-shop-srp-pagination>
because the page number is between them, and in order to loop through the website's pages I need to know the maximum page count.
from bs4 import BeautifulSoup
import requests

url = 'https://www.cars.com/for-sale/searchresults.action/?dealerType=all&mkId=20089&page=1&perPage=20&prMx=25000&rd=99999&searchSource=GN_REFINEMENT&sort=relevance&stkTypId=28881&zc=21042'

source = requests.get(url).content
soup = BeautifulSoup(source, 'html.parser')
print(soup.prettify())

link = soup.find("cars-shop-srp-pagination")  # find the custom pagination tag by name
# linkNext = link.find('a')
print(link)
When I go through the output, the only thing I see for "cars-shop-srp-pagination" is
<cars-shop-srp-pagination>
</cars-shop-srp-pagination>
when I need to see all of the code inside of them; specifically, I want to get to:
<li ng-if="showLast"> <a class="js-last-page" ng-click="goToPage($event, numberOfPages)">50</a> </li>
Remember that BeautifulSoup only parses the HTML/XML code that you give it. If the page number isn't in your captured HTML code in the first place, then that's a problem with capturing the code properly, not with BeautifulSoup. Unfortunately, I think this data is dynamically generated.
I found a work-around, though. Notice that at the top of the search results, the page says "(some number of cars) matches near you". For example:
<div class="matchcount">
  <span class="filter-count">1,711</span>
  <span class="filter-text"> matches near you</span>
</div>
You could capture this number, then divide by the number of results displayed per page. In fact, this latter number can be passed into the URL. Note that you have to round up to the nearest integer to catch the search results that show up on the final page. Also, any commas in numbers over 999 have to be removed from the string before you can convert it with int().
from bs4 import BeautifulSoup
import urllib2  # Python 2; requests.get() was returning a 503 here (see note below)
import math

perpage = 100
url = 'https://www.cars.com/for-sale/searchresults.action/'
url += '?dealerType=all&mdId=58767&mkId=20089&page=1&perPage=%d' % perpage
url += '&prMx=25000&searchSource=PAGINATION&sort=relevance&zc=21042'

response = urllib2.urlopen(url)
source = response.read()
soup = BeautifulSoup(source, 'lxml')

count_tag = soup.find('span', {'class': 'filter-count'})
count = int(count_tag.text.replace(',', ''))   # strip thousands commas before int()
pages = int(math.ceil(1.0 * count / perpage))  # round up to catch the final partial page
print(pages)
One catch to this, however, is that if the search isn't refined enough, the website will say something like "Over 30 thousand matches", which is not an integer.
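If you do hit that case, a small guard (a sketch, reusing count_tag from the code above) keeps the int() call from raising:
text = count_tag.text.replace(',', '')
if text.isdigit():
    count = int(text)
else:
    count = None  # e.g. "Over 30 thousand matches": refine the search first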
Also, I was getting a 503 response from requests.get(), so I switched to using urllib2 to get the HTML.
All that info (number of results, number of pages, results per page) is stored in a JavaScript dictionary within the returned content. You can simply regex out the object and parse it with json. Note that the URL is a query string and you can alter the results-per-page count in it, so after an initial request to determine how many results there are, you can calculate what else to request. You may also be able to use json throughout rather than BeautifulSoup. There may be a limit (perhaps the 20) on what you can grab per page as shown below, so it is probably better to go with 100 results per page, make the initial request, regex out the info, and, if there are more than 100 results, loop over altered URLs to collect the rest.
I don't think, regardless of the number of pages indicated/calculated, that you can actually go beyond page 50.
import requests
import re
import json

# The page embeds its search metadata in a JS object: "digitalData = {...};"
p = re.compile(r'digitalData = (.*?);')
r = requests.get('https://www.cars.com/for-sale/searchresults.action/?dealerType=all&mkId=20089&page=1&perPage=20&prMx=25000&rd=99999&searchSource=GN_REFINEMENT&sort=relevance&stkTypId=28881&zc=21042')
data = json.loads(p.findall(r.text)[0])

num_results_returned = data['page']['search']['numResultsReturned']
total_num_pages = data['page']['search']['totalNumPages']
num_results_on_page = data['page']['search']['numResultsOnPage']
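Building on that, here is a sketch of the loop described above: request 100 results per page, read totalNumPages from the first response, then walk the remaining pages (capped at 50 per the note above). It assumes the same digitalData structure on every page; the per-page extraction is left as a placeholder.
import requests, re, json

p = re.compile(r'digitalData = (.*?);')
base = ('https://www.cars.com/for-sale/searchresults.action/'
        '?dealerType=all&mkId=20089&perPage=100&prMx=25000&rd=99999'
        '&searchSource=GN_REFINEMENT&sort=relevance&stkTypId=28881&zc=21042')

r = requests.get(base + '&page=1')
data = json.loads(p.findall(r.text)[0])
total_num_pages = data['page']['search']['totalNumPages']

for page in range(2, min(total_num_pages, 50) + 1):
    r = requests.get(base + '&page=%d' % page)
    data = json.loads(p.findall(r.text)[0])
    # ... pull whatever listing fields you need from data here ...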

is there a way to automate downloading of wikipedia articles using special:export?

I want to be able to download the full histories of a few thousand articles from http://en.wikipedia.org/wiki/Special:Export, and I am looking for a programmatic approach to automate it.
I started the following in Python, but it doesn't get any useful result:
import urllib

query = "http://en.wikipedia.org/w/index.api?title=Special:Export&pages=%s&history=1&action=submit" % 'Page_title_here'
f = urllib.urlopen(query)
s = f.read()
Any suggestions?
Drop the list of pages you want to download into the pages array and this should work. Run the script and it will print the XML file. Note that Wikipedia seems to block the urllib user agent, but I don't see anything on the pages that suggests automatic downloading is disallowed. Use at your own risk.
You can also add 'curonly':1 to the dictionary to fetch only the current version of each page.
#!/usr/bin/python
import urllib

class AppURLopener(urllib.FancyURLopener):
    version = "WikiDownloader"  # custom user agent, since the default urllib one is blocked

urllib._urlopener = AppURLopener()

query = "http://en.wikipedia.org/w/index.php?title=Special:Export&action=submit"
pages = ['Canada']
data = {'catname': '', 'wpDownload': 1, 'pages': "\n".join(pages)}
data = urllib.urlencode(data)
f = urllib.urlopen(query, data)
s = f.read()
print(s)
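As noted above, fetching only the current revision just needs one more key in the same dictionary:
data = {'catname': '', 'wpDownload': 1, 'curonly': 1, 'pages': "\n".join(pages)}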

Trouble finding DataGrid to implement for over 10,000 records with pagination, filtering, and clickable links for each row

I have tried using a few different data grids (FlexiGrid, ExtJS Grid, and YUI DataGrid) and have found YUI to work the best as far as documentation and available features. However, I am having difficulty setting up the data source. When I try to set it up using JSON, it takes too long or times out. I have already maxed out the memory limit in the php.ini file, and there will be many more records in the future as well.
I need to select data to populate the grid based on the user that is currently logged in. Once this information populates the grid, I need each id to be clickable, either taking me to a different page or populating information in a div on the same page.
Does anyone have suggestions for loading 25-50 records of dynamic data at a time? I have tried implementing the following example to do what I want: YUI Developer Example
I cannot get the data grid to show at all. I have changed the DataSource instance to the following.
// DataSource instance
var curDealerNumber = YAHOO.util.Dom.getElementsByClassName('dealer_number', 'input');
var ds_path = "lib/php/json_proxy.php?dealernumber='" + curDealerNumber + "'";
var myDataSource = new YAHOO.util.DataSource(ds_path);
myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;
myDataSource.responseSchema = {
    resultsList: "records",
    fields: [
        {key:"id", parser:"number"},
        {key:"user_dealername"},
        {key:"user_dealeraccttype"},
        {key:"workorder_num", parser:"number"},
        {key:"segment_num", parser:"number"},
        {key:"status"},
        {key:"claim_type"},
        {key:"created_at"},
        {key:"updated_at"}
    ],
    metaFields: {
        totalRecords: "totalRecords" // access to value in the server response
    }
};
Any help is greatly appreciated, and sorry if this seems similar to other posts, but I searched and still could not resolve my problem. Thank you!
It's hard to troubleshoot without a repro case, but I'd suggest turning on logging to see where the problem might be:
load datatable-debug file
load logger
either call YAHOO.widget.Logger.enableBrowserConsole() to output logs to your browser's JS console (e.g., Firebug), or call new YAHOO.widget.LogReader() to output logs to the screen.
Also make sure the XHR request and response are well-formed with Firebug or similar tool.
Finally, when working with large datasets, consider
pagination
enabling renderLoopSize (http://developer.yahoo.com/yui/datatable/#renderLoop)
chunking data loads into multiple requests (http://developer.yahoo.com/yui/examples/datatable/dt_xhrjson.html).
There is no one-size-fits-all solution for everyone, but hopefully you can find the right set of tweaks for your use case.
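To illustrate the pagination suggestion: the responseSchema in the question expects a "records" list plus a top-level "totalRecords", so the server only ever needs to return one page of rows while reporting the full count. A minimal sketch of that response shape (Python stand-in for the actual json_proxy.php; the function and data here are hypothetical):
import json

def page_response(all_rows, offset=0, limit=25):
    # Return one page of rows, but report the full count so the
    # DataTable's paginator knows how many pages exist.
    return json.dumps({
        "totalRecords": len(all_rows),
        "records": all_rows[offset:offset + limit],
    })

rows = [{"id": i, "status": "open"} for i in range(10000)]  # hypothetical data
print(page_response(rows, offset=0, limit=25)[:100])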