How to parse encoded HTML - html

I'm working on a digest email to send to users of my companies app. For this I'm going through each users emails and trying to find some basic information about each email (from, subject, timestamp, and, the aspect that's causing me difficulty, an image).
I assumed Nokogiri's search('img') function would be fine to pull out images. Unfortunately it looks like most emails have a lot of garbage embedded in the URLs of those images, like newlines ("\n"), escape characters ("\"), and the string "3D" for some reason. For example:
<img src=3D\"https://=\r\nd3ui957tjb5bqd.cloudfront.net/images/emails/1/logo.png\"
This is causing the search to only pull out pieces of the actual URLs/src's:
#(Element:0x3fd0c8e83b80 {
name = "img",
attributes = [
#(Attr:0x3fd0c8e82a28 { name = "src", value = "3D%22https://=" }),
#(Attr:0x3fd0c8e82a14 { name = "d3ui957tjb5bqd.cloudfront.net", value = "" }),
#(Attr:0x3fd0c8e82a00 { name = "width", value = "3D\"223\"" }),
#(Attr:0x3fd0c8e829ec { name = "heigh", value = "t=3D\"84\"" }),
#(Attr:0x3fd0c8e829d8 { name = "alt", value = "3D\"Creative" }),
#(Attr:0x3fd0c8e829c4 { name = "market", value = "" }),
#(Attr:0x3fd0c8e829b0 { name = "border", value = "3D\"0\"" })]
})
Does anyone have an idea why this is happening, and how to remove all this junk?
I'm getting decent results from lots of gsub's and safety checks but it feels pretty tacky.
I've also tried Sanitize.clean which doesn't work and the PermitScrubber mentioned in "How to sanitize html string except image url?".

The mail body is encoded as quoted printable. You will need to decode the body before you parse it with Nokogiri. You can do this fairly easily with Ruby using unpack:
decoded = encoded.unpack('M').first
You should check what the encoding is by looking at the mail headers before trying to decode, not all mail is encoded this way, and there are other types of encoding.

I am not a master in scraping, but you are able to get it through the CSS attribute
.at_css("img")['src']
For example:
require "open-uri"
require "nokogiri"
doc = open(url_link)
page = Nokogiri::HTML(doc)
page.css("div.col-xs-12.visible-xs.visible-sm div.school-image").each do |pic|
img = pic.at_css("img")['src'].downcase if pic.at_css("img")
end

Related

Trying to make a command to store and retreat info from a json file

Im trying to make a command that will store name,description and image of a character and another command to retrieve that data in an embed,but i have trouble working with json files
this is my code to add them:
#client.command()
async def addskillset(ctx):
await ctx.send("Let's add this skillset!")
questions = ["What is the monster name?","What is the monster description?","what is the monster image link?"]
answers = []
#code checking the questions results
embedkra = nextcord.Embed(title = f"{answers[0]}", description = f"{answers[1]}",color=ctx.author.color)
embedkra.set_image(url = f"{answers[2]}")
mess = await ctx.reply(embed=embedkra,mention_author=False)
await mess.add_reaction('✅')
await mess.add_reaction('❌')
def check(reaction, user):
return user == ctx.author and (str(reaction.emoji) == "✅" or "❌")
try:
reaction, user = await client.wait_for('reaction_add', timeout=1000.0, check=check)
except asyncio.TimeoutError:
#giving a message that the time is over
else:
if reaction.emoji == "✅":
monsters = await get_skillsets_data() #this data is added at the end
if str(monster_name) in monsters:
await ctx.reply("the monster is already added")
else:
monsters[str(monster_name)]["monster_name"] = {}
monsters[str(monster_name)]["monster_name"] = answers[0]
monsters[str(monster_name)]["monster_description"] = answers[1]
monsters[str(monster_name)]["monster_image"] = answers[2]
with open('skillsets.json','w') as f:
json.dump(monsters,f)
await mess.delete()
await ctx.reply(f"{answers[0]} successfully added to the list")
Code to get the embed with the asked info:
#client.command()
async def skilltest(ctx,*,monster_name):
data = open('skillsets.json').read()
data = json.loads(data)
if str(monster_name) in data:
name = data["monster_name"]
description = data["monster_description"]
link = data["monster_image"]
embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
embedkra.set_image(url = f"{link}")
await ctx.reply(embed=embedkra,mention_author=False)
else:
# otherwise, it is still None meaning we didn't find it
await ctx.reply("monster not found",mention_author=False)
and my json should look like this:
{"katufo": {"monster_name": "Katufo","Monster_description":"Katufo is the best","Monster_image":"#image_link"},
"armor claw":{"monster_name": "Armor Claw","Monster_description":"Armor claw is the best","Monster_image":#image_link}}
The get_skillsets_data used in first command:
async def get_skillsets_data():
with open('skillsets.json','r') as f:
monsters = json.load(f)
return monsters
Well, When you are trying to retrieve data from your json file try using name = data["katufo"]["monster_name"] now here it will only retrieve monster_name of key katufo. If You want to retrieve data for armor claw code must go like this name = data["armor claw"]["monster_name"]. So try this code :
#client.command()
async def skilltest(ctx,*,monster):
data = open('skillsets.json').read()
data = json.loads(data)
if str(monster) in data:
name = data[f"monster"]["monster_name"]
description = data[f"monster"]["Monster_description"]
link = data[f"monster"]["Monster_image"]
embedkra = nextcord.Embed(title = f"{name}", description = f"{description}",color=ctx.author.color)
embedkra.set_image(url = f"{link}")
await ctx.reply(embed=embedkra,mention_author=False)
else:
# otherwise, it is still None meaning we didn't find it
await ctx.reply("monster not found",mention_author=False)
Hope this works for you :)
If your json looks like what you showed above,
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
}
}
then there is no data["monster_name"] the two objects inside of your JSON are named katufo and armor_claw. To get one of them you can simply write data['katufo']['monster_name'] or data.katufo.monster_name.
Your problem stems from looking up the monster name like this:
if str(monster_name) in data:
name = data["monster_name"]
description = data["monster_description"]
link = data["monster_image"]
What you could do instead is loop through data, as it contains several monsters and then on each object, to the check that you do:
for monster in data:
if str(monster_name) in monster.values():
name = monster.monster_name
description = monster.Monster_description
link = monster.Monster_image
One thing to think about, the way the variables are named is not something I personally recommend. Don't be afraid of adding longer descriptive names so things make more sense for you in the code. Also, in the JSON you provided, there are certain attributes starting with a capital letter, something you should think about.
Edit:
Dicts in python are the equivalent of objects in Javascript and are initialized using the same syntax which we can see below:
monster_data = {}
But since you want a specific structure on these monsters we can go further and create a function called add_monster_object():
def add_monster_object(original_dict, new_monster):
new_monster = {
"monster_name": '',
"monster_description": '',
"monster_image": ''
}
#Now we have a new empty object with the correct names.
return original_dict.update(new_monster)
Now every time you run this function with a given name, in the dict there will be an object with that name. Example is if user writes armor_sword as the monster_name attribute, then we can call the function above as add_monster_object(original_dict, monster_name).
This will, if we take your initial dict as an example, return this:
{
"katufo":{
"monster_name":"Katufo",
"Monster_description":"Katufo is the best",
"Monster_image":"#image_link"
},
"armor claw":{
"monster_name":"Armor Claw",
"Monster_description":"Armor claw is the best",
"Monster_image":"#image_link"
},
"armor sword":{
"monster_name":"",
"monster_description":"",
"monster_image":""
}
}
Then you can populate them as you want, or update the function to take more parameters. The important part here is that you take a minute and figure out what you want to keep saved. Then make sure that you can read and write from file and you should have a somewhat simple structure going. Warning: This isn't a slap and dry method, you will also have to think about special cases, such as adding an object that already exists and soforth.
If you decide to go with Replit you could use their database to create similar functionality but you wouldn't have to worry about reading and writing to a file.
As it is right now, I still think you need to proceed with your bot, add some of the changes that I mentioned before the next actual problem arrives as there are many things that arent quite right. I also suggest you break everything into managing parts, 1 would be to read from a file. 2 would be to write. 3 to write a dict to a file. 4 to update a dict and soforth. Good luck!

Bad JSON escape sequence: \\1. Path 'sections[0].facts[0].value', line 1, position 196."

Am getting getting Bad JSON escape sequence: \\1. Path 'sections[0].facts[0].value', line 1, position 196." while parsing json string with string.Format to add data based on placeholder.
Below is the code :
string jsonFormatString = #"{{""title"": ""{0}"",""sections"": [{{""facts"": [{{""name"": ""TransactionNo"",""value"": ""{1}""}}]}}]}}";
string formattedJson= string.Format(jsonFormatString , message, transactionNo ?? "");
var result = JsonConvert.DeserializeObject<dynamic>(formattedJson);
The string values for placeholders were message = "Transaction Success", transactionNo = "1920\01\ABC";
Looks like escape character transactionNo field is creating problem.
I tried with string.Replace(#"\\", #"\") and others nothing helped in resolving the error.
If i added string.Replace(#"\", #"\\", it works but result json contains transactionNo as "1920\\01\\ABC" which is also wrong.
Am i missing something or should i add anything more
I have seen folks do this in the past and even had to maintain code that looks like this... and as you are experiencing: It is a huge PITA.
You can easily lay out your objects using anonymous types so they are simple to maintain and understand for the folks who may have the pleasure to work with your code in the future.
Take a look at this:
var myObj = new {
title = "<input value>",
sections = new[]
{
new
{
facts = new[]
{
new
{
name = "TransactionNo",
value = "<input value>"
}
}
}
}
};
var result = JsonConvert.SerializeObject(myObj, Formatting.Indented);
Result will look like this:
{
"title": "<input value>",
"sections": [
{
"facts": [
{
"name": "TransactionNo",
"value": "<input value>"
}
]
}
]
}
Now, keep in mind that this is easier to maintain than formatting a string, but at the end of the day it's better to just bite the bullet and create some concrete models.

Getting image URL's from amazon product page

I'm trying to scrape image URL's from Amazon products, for example, this link.
In the page source code, there is a section which contains all the urls for images of different sizes (large, medium, hirez, etc). I can get that part of the script by doing, with scrapy,
imagesString = (response.xpath('//script[contains(., "ImageBlockATF")]/text()').extract_first())
Which gives me a string that looks like this,
P.when('A').register("ImageBlockATF", function(A){
var data = {
'colorImages': { 'initial': [{"hiRes":"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SL1500_.jpg","thumb":"https://images-na.ssl-images-amazon.com/images/I/31HoKqtljqL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/31HoKqtljqL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX355_.jpg":[308,355],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX450_.jpg":[390,450],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX425_.jpg":[369,425],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX466_.jpg":[404,466],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX522_.jpg":[453,522],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX569_.jpg":[494,569],"https://images-na.ssl-images-amazon.com/images/I/81FED1p-sTL._SX679_.jpg":[589,679]},"variant":"MAIN","lowRes":null},{"hiRes":"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SL1500_.jpg","thumb":"https://images-na.ssl-images-amazon.com/images/I/31Y%2B8oE5DtL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/31Y%2B8oE5DtL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX355_.jpg":[308,355],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX450_.jpg":[390,450],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX425_.jpg":[369,425],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX466_.jpg":[404,466],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX522_.jpg":[453,522],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX569_.jpg":[494,569],"https://images-na.ssl-images-amazon.com/images/I/81e8905DlhL._SX679_.jpg":[589,679]},"variant":"PT01","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/51rORrvh0hL.jpg":[333,500]},"variant":"PT02","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/41L2OU5rPyL.jpg":[333,500]},"variant":"PT03","lowRes":null},{"hiRes":null,"thumb":"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SS40_.jpg","large":"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL.jpg","main":{"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX355_.jpg":[236,355],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX450_.jpg":[300,450],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX425_.jpg":[283,425],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL._SX466_.jpg":[310,466],"https://images-na.ssl-images-amazon.com/images/I/51%2BsCYjx6OL.jpg":[333,500]},"variant":"PT04","lowRes":null}]},
'colorToAsin': {'initial': {}},
'holderRatio': 1.0,
'holderMaxHeight': 700,
'heroImage': {'initial': []},
'heroVideo': {'initial': []},
'spin360ColorData': {'initial': {}},
'spin360ColorEnabled': {'initial': 0},
'spin360ConfigEnabled': false,
'spin360LazyLoadEnabled': false,
'playVideoInImmersiveView':'false',
'tabbedImmersiveViewTreatment':'T2',
'totalVideoCount':'0',
'videoIngressATFSlateThumbURL':'',
'mediaTypeCount':'0',
'atfEnhancedHoverOverlay' : true,
'winningAsin': 'B00XLSS79Y',
'weblabs' : {},
'aibExp3Layout' : 1,
'aibRuleName' : 'frank-powered',
'acEnabled' : false
};
A.trigger('P.AboveTheFold'); // trigger ATF event.
return data;
});
My goal is to get into a Json dictionary the data inside colorImages, so then I can easily get each URL.
I tried doing something like this:
m = re.search(r'^var data = ({.*};)', imagesString , re.S | re.M)
data = m.group()
jsonObj = json.loads(data[:-1].replace("'", '"'))
But it seems that imagesString does not work well with re.search, I keep getting errors regarding imagesString not being a string when it actually is.
I got similar data from an amazon page by using re.findall, something like this (script is a chunk of text i got from the page).
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
and then
variationValuesDict = json.loads(variationValues)
But my knowledge of regular expressions is not that great.
From the string I pasted above, I erased the start and end so only the data remained, so I was left with this:
https://jsoneditoronline.org/?id=9ea92643044f4ac88bcc3e76d98425fc
I can't figure out how to get colorImages with re.findall() (or the data in the json editor) so I can then load it into Json and use it like a dictionary, any ideas on how to achieve this?
You just need to initially convert the var data to the correct markup json. It is easy ))) Just replace all chars ' to " and delete SPACES. And you will get json object:
(It's your right json)

razor URL actionlink

I am trying to create this URL link:
mysite.com/Vote/2/Learn-to-code
Where
area = vote,
id = 2,
topicURL = Learn-to-code
In my routing, I have this to handle this URL pattern:
context.MapRoute(
"Topic",
"Vote/{id}/{topicURL}",
new { controller = "Topic", action = "TopicAnswers" },
new[] { "eus.UI.Areas.Vote.Controllers"}
);
But I am having trouble generating the URL link. Here's my attempt:
#Html.ActionLink("ViewBag.TopicTitle", "TopicAnswers", new { area = "Vote", controller = "Topic", id = ViewBag.TopicId, topicURL = #ViewBag.TopicURL })
First question is: How do I use ViewBag.TopicTitle? If I remove the quotes, it gives red squiggly error. I put the quotes in just so I could run the app to see what URL this generates.
It generates a monster URL.
mysite.com/Vote/Topic/TopicAnswers/2?url=Learn-to-code
However, the URL actually works. But I would really like to create my short and clean looking URL.
mysite.com/Vote/2/Learn-to-code
Any tips greatly appreciated, gracias.
Ok, I did this and it works. This is so simple to read and understand.
#ViewBag.TopicTitle
Is there any good reason why I should attempt to use #Html.ActionLink(...). That just feels like spaghetti.
For starters, why do they place the parameters ("text", action, controller) in this order? That is such a twisty path. This is far more natural to think of:
(controller, action, "text") ... is it not?

Make dynamic name text field in Postman

I'm using Postman to make REST API calls to a server. I want to make the name field dynamic so I can run the request with a unique name every time.
{
"location":
{
"name": "Testuser2", // this should be unique, eg. Testuser3, Testuser4, etc
"branding_domain_id": "52f9f8e2-72b7-0029-2dfa-84729e59dfee",
"parent_id": "52f9f8e2-731f-b2e1-2dfa-e901218d03d9"
}
}
In Postman you want to use Dynamic Variables.
The JSON you post would look like this:
{
"location":
{
"name": "{{$guid}}",
"branding_domain_id": "52f9f8e2-72b7-0029-2dfa-84729e59dfee",
"parent_id": "52f9f8e2-731f-b2e1-2dfa-e901218d03d9"
}
}
Note that this will give you a GUID (you also have the option to use ints or timestamps) and I'm not currently aware of a way to inject strings (say, from a test file or a data generation utility).
In Postman you can pass random integer which ranges from 0 to 1000, in your data you can use it as
{
"location":
{
"name": "Testuser{{$randomInt}}",
"branding_domain_id": "52f9f8e2-72b7-0029-2dfa-84729e59dfee",
"parent_id": "52f9f8e2-731f-b2e1-2dfa-e901218d03d9"
}
}
Just my 5 cents to this matter. When using randomInt there is always a chance that the number might eventually be present in the DB which can cause issues.
Solution (for me at least) is to use $timestamp instead.
Example:
{
"username": "test{{$timestamp}}",
"password": "test"
}
For anyone who's about to downvote me this post was made before the discussion in comments with the OP (see below). I'm leaving it in place so the comment from the OP which eventually described what he needs isn't removed from the question.
From what I understand you're looking for, here's a basic solution. It's assuming that:
you're developing some kind of script where you need test data
the name field should be unique each time it's run
If your question was more specific then I'd be able to give you a more specific answer, but this is the best I can do from what's there right now.
var counter = location.hash ? parseInt(location.hash.slice(1)) : 1; // get a unique counter from the URL
var unique_name = 'Testuser' + counter; // create a unique name
location.hash = ++counter; // increase the counter by 1
You can forcibly change the counter by looking in the address bar and changing the URL from ending in #1 to #5, etc.
You can then use the variable name when you build your object:
var location = {
name: unique_name,
branding_domain_id: 'however-you-currently-get-it',
parent_id: 'however-you-currently-get-it'
};
Add the below text in pre-req:
var myUUID = require('uuid').v4();
pm.environment.set('myUUID', myUUID);
and use the myUUID wherever you want
like
name: "{{myUUID}}"
It will generate a random unique GUID for every request
var uuid = require('uuid');
pm.globals.set('unique_name', 'testuser' + uuid.v4());
add above code to the pre-request tab.
this was you can reuse the unique name for subsequent api calls.
Dynamic variable like randomInt, or guid is dynamic ie : you donot know what was send in the request. there is no way to refer it again, unless it is send back in response. even if you store it in a variable,it will still be dynamic
another way is :
var allowed = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var shuffled_unique_str = allowed.split('').sort(function(){return 0.5-Math.random()}).join('');
courtsey refer this link for more options