First off, I'm a scientist, NOT a coder. I haven't coded since my college days, so feel free to knock me around a bit. But I have a project for a non-profit that I'd like to help with.
I have the code to download the JSON file, a sample of which I'll provide below. For now, my goal is to search for and display the unique birds in each of the unique areas. I've spent about 10 days scouring the web and writing many hundreds of lines of code, all to no avail. I'm certain one of you will spend 3 minutes to write a one-line comprehension that'll do it perfectly. My hat's off in advance.
Here's a small sample extracted from the thousands of items in a downloaded JSON file:
{
    "speciesCode": "snogoo",
    "comName": "Snow Goose",
    "sciName": "Anser caerulescens",
    "locId": "L1415313",
    "locName": "Vacation Isle",
    "obsDt": "2023-02-15 15:28",
    "howMany": 3,
    "lat": 32.7750146,
    "lng": -117.2352583,
    "obsValid": false,
    "obsReviewed": false,
    "locationPrivate": false,
    "subId": "S128423924"
},
{
    "speciesCode": "gwfgoo",
    "comName": "Greater White-fronted Goose",
    "sciName": "Anser albifrons",
    "locId": "L1415313",
    "locName": "Vacation Isle",
    "obsDt": "2023-02-15 15:28",
    "howMany": 1,
    "lat": 32.7750146,
    "lng": -117.2352583,
    "obsValid": false,
    "obsReviewed": false,
    "locationPrivate": false,
    "subId": "S128423924"
},
{
    "speciesCode": "snogoo",
    "comName": "Snow Goose",
    "sciName": "Anser caerulescens",
    "locId": "L1415313",
    "locName": "Vacation Isle",
    "obsDt": "2023-02-15 15:28",
    "howMany": 3,
    "lat": 32.7750146,
    "lng": -117.2352583,
    "obsValid": false,
    "obsReviewed": false,
    "locationPrivate": false,
    "subId": "S128423922"
},
What I need to do is extract the unique "comName" species that have been seen at each "locName". So in the above extract there are two different records of a Snow Goose showing up at the same location; I only need one. I won't give you the giggles by offering my attempts. I can traverse the JSON fine and create a dict with various results, but selecting the uniques in the nested loops has bamboozled me. If I've violated any common rules here, please flog me gently. I really would like to be a good netizen.
Thank you much for any help you can provide.
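A minimal sketch of one way to do this in Python, assuming the downloaded file parses to a list of observation dicts (the filename here is a placeholder):

import json
from collections import defaultdict

# Load the downloaded file (assumed to be a JSON array of observations).
with open("observations.json") as f:
    observations = json.load(f)

# Map each location name to the set of unique common names seen there.
birds_by_location = defaultdict(set)
for obs in observations:
    birds_by_location[obs["locName"]].add(obs["comName"])

for loc, species in sorted(birds_by_location.items()):
    print(loc, sorted(species))

Using a set per location makes the deduplication automatic: adding "Snow Goose" a second time for "Vacation Isle" is a no-op, so no nested uniqueness checks are needed.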
I am currently having trouble with the Secomea Data Collection Module, and I was wondering if anyone here might be able to enlighten me.
I am collecting sensor data from the product Secomea 3529 through a portal called Secomea SiteManager. I can't seem to find any information about my two questions below; I hope someone knows the answer.
Information about the protocol used in this project:
"Protocol": "S7/TCP",
"S7Access": {
"S7Model": "S7-200",
"S7Rack": 0,
"S7Slot": 1
Data collection is configured using JSON, as seen below.
I was wondering if it is possible to somehow have more than one TriggerSample, and if so, how is it set up?
{
    "SampleName": "Sensor1",
    "SampleDescription": "Some Description",
    "SampleDataType": "bool",
    "SamplesSaved": 3600,
    "Aggregation": {
        "Function": "compute",
        "Expression": "Sensor2,1,/",
        "TriggerSample": "Sensor3"
    }
},
My other question: is it possible to have more than one S7Var?
{
    "SampleName": "ModeCheck",
    "SampleDescription": "Mode status",
    "SampleDataType": "int16",
    "SamplesSaved": 360,
    "S7Var": {
        "S7PLCVar": "LocationInMachineDB1",
        "S7SampleInterval": 5
    }
},
In my Angular project, I want to render an array of product objects.
I was able to render it as a JSON object:
<td>{{o.products |json}}</td>
And, for example, this is one of the outputs:
[ { "id": 4, "name": "Forever", "description": "Because you suffer a lot physically and morally, we will not let you suffer financially.\n• Lump sum payment: Up to US $500,000 paid immediately upon diagnosis of any covered 32 critical illnesses.\n• Worldwide coverage: Giving you the assistance you need even if you move to another country.\n• Telemedicine and e-counsultancy through World Care International: Access to free expert care from world-renowned medical centres in the US specialising in your condition.", "price": 300, "logo": "assets\\download(5).jpg", "category": 1, "image": "assets\\forever.jpg" } ]
Now, what if I only want to show the name attribute rather than the whole set of product attributes? How can I do that?
You should use the *ngFor directive to create a loop that iterates over all products and prints only the product name:
<td *ngFor="let product of o.products">{{product.name}}</td>
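Note that *ngFor repeats its host element, so this produces one <td> per product. If you want all the names inside a single cell instead, put the *ngFor on an inner element (such as a <span>) within one <td>.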
I would like to get recent fact checks using Google's Fact Check Tools. There is a search API here: https://developers.google.com/fact-check/tools/api/reference/rest/v1alpha1/claims.
I want to get the list of recent fact checks with no search query, just like the explorer here: https://toolbox.google.com/factcheck/explorer/search/list:recent;hl=en. The API only seems to support query-based searches, even though the explorer lets you browse recent fact checks. Is there a way to get the recent ones?
This link could give you all the information: https://developers.google.com/fact-check/tools/api/reference/rest/v1alpha1/claims/search
Enter a query, press EXECUTE, and you can examine the results by "selecting all" in the little box where the results are shown.
This request, https://developers.google.com/fact-check/tools/api/reference/rest/v1alpha1/claims/search?apix_params=%7B%22maxAgeDays%22%3A33%2C%22query%22%3A%22preexisiting%22%2C%22reviewPublisherSiteFilter%22%3A%22Washington%20Post%22%7D,
doesn't work because "Washington Post" is not a valid value and Google provides no list of valid "reviewPublisherSiteFilter" values.
Leave the API key box blank, set "maxAgeDays" to the number of days you want, and you should get the result you are after.
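If you would rather call the API directly than use the explorer, here is a minimal sketch using Python's requests library; the endpoint and parameter names come from the reference pages linked above, and the API key is a placeholder:

import requests

# Endpoint from the claims.search reference page linked above.
URL = "https://factchecktools.googleapis.com/v1alpha1/claims:search"

params = {
    "maxAgeDays": 33,        # only claims reviewed within the last N days
    "query": "preexisting",  # the raw API seems to require a query, unlike the explorer
    "key": "YOUR_API_KEY",   # placeholder: supply your own API key
}

resp = requests.get(URL, params=params)
resp.raise_for_status()
for claim in resp.json().get("claims", []):
    print(claim.get("claimDate"), claim.get("text", "")[:80])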
The response looks something like this:
{
    "claims": [
        {
            "text": "“We're going to be doing a health care plan, very strongly, and protect people with preexisting conditions… We have other alternatives to Obamacare that are 50% less expensive and that are actually better.”\n“We have run [Obamacare] so much better than Obama ran it.”\n“At the end of my first term, we're going to have close to 300, maybe over 300 new federal judges, including Court of Appeal, two Supreme Court justices.”\nStock Market is proof that Americans are “doing better than they were doing before the pandemic came.”\n“We want people to come into our country ... but we want them to come in through a legal system.”",
            "claimant": "#dwebbKHN",
            "claimDate": "2020-09-17T10:21:00Z",
            "claimReview": [
                {
                    "publisher": {
                        "name": "Misbar",
                        "site": "misbar.com"
                    },
                    "url": "https://misbar.com/factcheck/2020/09/17/trump-town-hall-special-%E2%80%93-other-topics",
                    "title": "Trump Town Hall Special – Other Topics | Fact Check",
                    "reviewDate": "2020-09-17T10:21:00Z",
                    "textualRating": "Fake",
                    "languageCode": "en"
                }
            ]
        },
        {
            "text": "Mr. Trump, who has not followed through on a pledge in July that he would have a health care plan ready and signed in two weeks, said his administration would not get rid of the preexisting conditions coverage that were implemented by the Affordable Care Act. He was responding to Ellesia Blaque, an assistant professor who lives in Philadelphia, who told him she's paying $7,000 a year for life-saving medicine because of a condition she was born with, sarcoidosis.",
            "claimant": "Donald Trump",
            "claimDate": "2020-09-16T00:00:00Z",
            "claimReview": [
                {
                    "publisher": {
                        "name": "CBS News",
                        "site": "cbsnews.com"
                    },
                    "url": "https://www.cbsnews.com/news/trump-town-hall-fact-check-health-care-covid-19/#preexisting",
                    "title": "Fact-checking Trump's town hall health care claims",
                    "reviewDate": "2020-09-16T00:00:00Z",
                    "textualRating": "Mostly False",
                    "languageCode": "en"
                }
            ]
        },
I am no RegEx expert. I am trying to understand whether I can use RegEx to find a block of data in a JSON file.
My Scenario:
I am using an AWS RDS instance with enhanced monitoring. The monitoring data is being sent to a CloudWatch log stream. I am trying to make the data posted in CloudWatch visible in the log management solution Loggly.
The ingestion is no problem; I can see the data in Loggly. However, the whole message is contained in one big blob field. The field content is a JSON document. I am trying to figure out whether I can use RegEx to extract only certain parts of the JSON document.
Here is a sample extract from the JSON payload I am using:
{
    "engine": "MySQL",
    "instanceID": "rds-mysql-test",
    "instanceResourceID": "db-XXXXXXXXXXXXXXXXXXXXXXXXX",
    "timestamp": "2017-02-13T09:49:50Z",
    "version": 1,
    "uptime": "0:05:36",
    "numVCPUs": 1,
    "cpuUtilization": {
        "guest": 0,
        "irq": 0.02,
        "system": 1.02,
        "wait": 7.52,
        "idle": 87.04,
        "user": 1.91,
        "total": 12.96,
        "steal": 2.42,
        "nice": 0.07
    },
    "loadAverageMinute": {
        "fifteen": 0.12,
        "five": 0.26,
        "one": 0.27
    },
    "memory": {
        "writeback": 0,
        "hugePagesFree": 0,
        "hugePagesRsvd": 0,
        "hugePagesSurp": 0,
        "cached": 505160,
        "hugePagesSize": 2048,
        "free": 2830972,
        "hugePagesTotal": 0,
        "inactive": 363904,
        "pageTables": 3652,
        "dirty": 64,
        "mapped": 26572,
        "active": 539432,
        "total": 3842628,
        "slab": 34020,
        "buffers": 16512
    },
My Question
My question is: can I use RegEx to extract, say, a subset of the document? For example, the cpuUtilization or memory sections? If that is possible, how do I write the RegEx? If so, I can then use it to drill down into the extracted document to get individual data elements as well.
Many thanks for your help.
First, I agree with Sebastian: a proper JSON parser is better.
Anyway, sometimes the dirty approach must be used. If your text layout will not change, then a regexp is simple:
E.g. "total": (\d+\.\d+) gets the CPU usage, and "total": (\d\d\d+) the total memory usage (match at least three digits so you don't hit the first "total"; memory will probably never be less than 100 :-).
If changes are to be expected, make it a bit more stable: ["']total["']\s*:\s*(\d+\.\d+).
It may also be possible to match across newlines like this: "cpuUtilization"\s*:\s*\{\s*\n.*\n\s*"irq"\s*:\s*(\d+\.\d+), making it a bit more stable (this time for the irq value).
And so on and so on.
You can see how fast you get into very complex expressions. That approach is very fragile!
P.S. Depending on the exact details of Loggly's regex dialect, the details may change. The examples above are based on Perl.
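To illustrate the two routes side by side, here is a small Python sketch; the payload is a trimmed stand-in for the CloudWatch message, and the regex mirrors the hardened pattern above:

import json
import re

# A trimmed stand-in for the CloudWatch message blob.
payload = '{"cpuUtilization": {"irq": 0.02, "total": 12.96}, "memory": {"total": 3842628}}'

# Robust route: parse the JSON and drill down directly.
doc = json.loads(payload)
print(doc["cpuUtilization"]["total"])  # 12.96

# Fragile route: the regex, tolerant of quote style and spacing.
m = re.search(r'["\']total["\']\s*:\s*(\d+\.\d+)', payload)
if m:
    print(m.group(1))  # 12.96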
I am attempting to scrape user reviews from Google Places reviews (the API only returns the 5 most helpful reviews). I am attempting to use BeautifulSoup to retrieve 4 pieces of information:
1) Name of the reviewer
2) When the review was written
3) Rating (out of 5)
4) Body of review
Inspecting each element, I can find the location of the information:
1) Name of reviewer:
<a class="_e8k" style="color:black;text-decoration:none" href="https://www.google.com/maps/contrib/103603482673238284204/reviews">Steve Fox</a>
2) When the review was written
<span style="color:#999;font-size:13px">3 months ago</span>
3) Rating (visible in the code, but doesn't show when using "run code snippet"):
<span class="_pxg _Jxg" aria-label="Rated 1.0 out of 5,"><span style="width:14px"></span></span>
4) Body of the review
<span jsl="$t t-uvHqeLvCkgA;$x 0;" class="r-i8GVQS_tBTbg">Don't go near this company. Must be the world's worst ISP. Threatened to set debt collection services on me when I refused to pay for a service that they had cut off through competence. They even spitefully managed to apply block on our internet connection after we moved to a new Isp. I hate this company.</span>
I am struggling with how to refer to the position of the information within the HTML. I see the last 3 pieces of information are in spans, so I attempted the following, but none of the relevant information was returned:
import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('https://www.google.co.nz/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=orcon&lrd=0x6d0d3833fefacf95:0x59fef608692d4541,1,').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
attempt1 = soup.find_all('span class')
for span in attempt1:
    print(span)
I assume I am not correctly/accurately referencing the 4 pieces of information within the HTML. Can someone point out what is wrong? Regards, Steve
To scrape the reviews of a place you'll need the place id. It looks like this: 0x89c259a61c75684f:0x79d31adb123348d2.
Then you need to make the request with the following URL, which contains the place id:
https://www.google.com/async/reviewDialog?hl=en&async=feature_id:0x89c259a61c75684f:0x79d31adb123348d2,sort_by:,next_page_token:,associated_topic:,_fmt:pc
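To try that URL quickly, here is a sketch with Python's requests library (note that Google may throttle, block, or captcha plain scripted requests):

import requests

# The reviewDialog URL from above, with the place id embedded in feature_id.
url = ("https://www.google.com/async/reviewDialog?hl=en&async="
       "feature_id:0x89c259a61c75684f:0x79d31adb123348d2,"
       "sort_by:,next_page_token:,associated_topic:,_fmt:pc")

# A browser-like User-Agent; without one Google is even more likely to refuse.
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
print(resp.status_code, len(resp.text))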
Alternatively you could use a third party solution like SerpApi. It's a paid API with a free trial. We handle proxies, solve captchas, and parse all the rich structured data for you.
Example Python code (also available in other languages):
from serpapi import GoogleSearch

params = {
    "engine": "google_maps_reviews",
    "place_id": "0x89c259a61c75684f:0x79d31adb123348d2",
    "hl": "en",
    "api_key": "secret_api_key"
}

search = GoogleSearch(params)
results = search.get_dict()
Example JSON output:
"reviews": [
{
"user": {
"name": "HerbertTomlinson O",
"link": "https://www.google.com/maps/contrib/100851257830988379503?hl=en-US&sa=X&ved=2ahUKEwiIlNzLtJrxAhVFWs0KHfclCwAQvvQBegQIARAy",
"thumbnail": "https://lh3.googleusercontent.com/a/AATXAJyjD5T8NEJSdOUAveA8IuMDTLXE9edBHDpFTvZ8=s40-c-c0x00000000-cc-rp-mo-br100",
"reviews": 2
},
"rating": 4,
"date": "2 months ago",
"snippet": "Finally, I found the best coffee shop today. Their choice of music is usually blasting from the past which was really relaxing and made me stay longer. There are tables for lovers and also for group of friends. The coffees and foods here are very affordable and well worth the money. You can't go wrong with this coffee shop. This is very worth to visit."
},
{
"user": {
"name": "Izaac Collier",
"link": "https://www.google.com/maps/contrib/116734781291082397423?hl=en-US&sa=X&ved=2ahUKEwiIlNzLtJrxAhVFWs0KHfclCwAQvvQBegQIARA-",
"thumbnail": "https://lh3.googleusercontent.com/a-/AOh14GgfhltPhiWrkTwe6swLUQRCWf_asuTfHPRnJCLc=s40-c-c0x00000000-cc-rp-mo-br100",
"reviews": 2
},
"rating": 5,
"date": "a month ago",
"snippet": "I am not into coffee but one of my friends invited me here. As I looked the menu, I was convinced, so I ordered one for me. The food was tasty and the staff were very friendly and accommodating. The ambience was very cosy and comfortable. The coffee was great and super tasty. I will recommend this and will visit again!"
},
...
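From there, the parsed reviews can be read straight out of the result dict; a small usage sketch, with field names taken from the sample output above:

for review in results.get("reviews", []):
    print(review["user"]["name"], review["rating"], review["date"])
    print(review["snippet"])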
Check out the documentation for more details.
Disclaimer: I work at SerpApi.