Selenium and HTML Agility Pack drop HTML

I'm using the HTML Agility Pack and Selenium to crawl a site, find particular tables, and then parse those tables. Everything works fine individually, but when I run the app, it sometimes drops huge chunks of HTML from within the table. When I track down the page on the site with the data, the HTML is there. For whatever reason, it isn't there when the crawler is running.
Here's the code. The rows[r].InnerHtml is NOT the HTML from the page. Does anyone have any thoughts on what might be happening here?
public IMyInterface CreateObjectFromHtmlRow(HtmlNode rowNode)
{
    try
    {
        var columns = rowNode.SelectNodes("td");
        MyClass obj = new MyClass()
        {
            OnlineId = columns[0].InnerText.Trim(),
            FirstName = columns[1].InnerText.Trim(),
            MiddleInitial = columns[2].InnerText.Trim(),
            LastName = columns[3].InnerText.Trim(),
            Residence = columns[4].InnerText.Trim(),
        };
        return obj;
    }
    catch (Exception exc)
    {
        _logger.LogFormat("Error trying to parse row: {0}", exc.Message);
        return null;
    }
}

IMyInterface obj = null;
obj = _repository.CreateObjectFromHtmlRow(rows[r]);
if (obj == null)
{
    _logger.LogFormat("Unable to create object from this data: {0}", rows[r].InnerHtml);
}
else
{
    // Do something useful
}
Thanks for your help.
WW

TableContinuationToken not getting Deserialised from JSON correctly

I am having trouble retrieving large datasets from Azure Table Storage. After several attempts at getting everything in one go, I have given up and am now using the TableContinuationToken, which is not getting deserialized correctly. The object is created, but all the Next... values (NextRowKey, NextPartitionKey, etc.) are null, even though the stringresponse that gets created shows the values they should be populated with.
The class I am passing contains a list of objects and the token
public class FlorDataset
{
public List<FlorData> Flors { get; set; }
public TableContinuationToken Token { get; set; }
}
The controller code is not exactly rocket science either....
[HttpGet, Route("api/list/{token}")]
public IHttpActionResult FindAll(string token)
{
try
{
TableContinuationToken actualToken = token == "None"
? null
: new TableContinuationToken()
{
NextPartitionKey = NextPartition,
NextRowKey = token,
NextTableName = NextTableName
};
var x = Run(actualToken);
Flors = x.Flors;
actualToken = x.Token;
// x.Token can be null (the caller loops until it is), so guard the dereference.
NextTableName = actualToken?.NextTableName;
NextPartition = actualToken?.NextPartitionKey;
return Flors != null
? (IHttpActionResult)new IsoncOkResult<FlorDataset>(x, this)
: NotFound();
}
catch (Exception ex)
{
Trace.TraceError(ex.ToString());
return NotFound();
}
}
private FlorDataset Run(TableContinuationToken token)
{
return _repo.GetAllByYear("2016", token) as FlorDataset;
}
The calling code, which calls my fairly standard Web API 2 Controller is:
do
{
try
{
HttpResponseMessage response = null;
if (string.IsNullOrEmpty(token.NextRowKey))
{
response = await client.GetAsync("api/list/None");
}
else
{
response = await client.GetAsync($"api/list/{token.NextRowKey}");
}
if (response.IsSuccessStatusCode)
{
var stringresponse = await response.Content.ReadAsStringAsync();
var ds = JsonConvert.DeserializeObject<FlorDataset>(stringresponse);
token = ds.Token;
Flors.AddRange(ds.Flors);
}
else
{
token = null;
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
token = null;
}
} while (token != null);
Okay, this is not the greatest solution, but it's the only thing that works so far in case anyone else is trying the same and stumbling across my question....
In the calling code bit you do a horrible bit of string replacement before you do the deserialisation.... I actually feel dirty just posting this, so if anyone comes up with a better answer, please feel free to share.....
if (response.IsSuccessStatusCode)
{
    var stringresponse = await response.Content.ReadAsStringAsync();
    // Turn names like "<NextRowKey>k__BackingField" back into "NextRowKey".
    stringresponse = stringresponse.Replace(">k__BackingField", "");
    // Caution: this also removes every '<' that occurs in the data itself.
    stringresponse = stringresponse.Replace("<", "");
    var ds = JsonConvert.DeserializeObject<FlorDataset>(stringresponse);
    token = ds.Token;
    Flors.AddRange(ds.Flors);
}
Not nice, not pretty, but does work!!!! :-D Going to wash my fingers with bleach now!!!
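One weakness of the workaround above is that Replace("<", "") strips every < in the payload, including any that occur in the data. A minimal sketch of a narrower idea, renaming only the property names after parsing, written in JavaScript purely for illustration: the payload, the Name field and the stripBackingFieldNames helper are made up, and the real client is C#.

```javascript
// Hypothetical payload: compiler-generated backing-field names used as JSON keys.
var stringresponse = JSON.stringify({
  Flors: [{ Name: 'a < b' }],
  Token: {
    '<NextPartitionKey>k__BackingField': 'p1',
    '<NextRowKey>k__BackingField': 'r1'
  }
});

// Matches keys of the form "<PropertyName>k__BackingField".
var backingField = /^<(.+)>k__BackingField$/;

// Recursively rebuild the parsed value, renaming matching keys only.
function stripBackingFieldNames(value) {
  if (Array.isArray(value)) {
    return value.map(stripBackingFieldNames);
  }
  if (value === null || typeof value !== 'object') {
    return value; // strings, numbers and booleans are left untouched
  }
  var result = {};
  Object.keys(value).forEach(function (key) {
    var match = backingField.exec(key);
    result[match ? match[1] : key] = stripBackingFieldNames(value[key]);
  });
  return result;
}

var ds = stripBackingFieldNames(JSON.parse(stringresponse));
console.log(ds.Token.NextRowKey, ds.Flors[0].Name);
```

Because only keys are rewritten, a < inside a value survives. Backing-field names like these typically show up when a [Serializable] type is serialized field by field, so Json.NET's DefaultContractResolver.IgnoreSerializableAttribute setting on the server may also be worth checking; that is a suggestion, not something verified against this project.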

MVC View adding in quotations to html attribute

I am working on a small project and in the partial navigation view I am checking if a page is selected and highlighting the menu.
var controller = HttpContext.Current.Request.RequestContext.RouteData.Values["controller"].ToString().ToLower();
var home = string.Empty;
var content = string.Empty;
switch(controller) {
    case "home":
        home = "class=current";
        break;
    case "content":
        content = "class=current";
        break;
}
In the view I am then doing:
<li @home>Home</li>
Originally in my code I had
home = "class='current'";
Notice I had quotation marks around it, but when I executed the code the HTML source looks like
So when I remove the quotation marks and run it again, since it's adding them in by default, it works, even though the debugger looks like
So the project is working; my question is, why is it adding in the quotation marks by default?
I'm not certain that MVC is adding the quotes, that is probably the Chrome DevTools doing it. If you "View page source", I don't think you will see the quotes.
Just FYI, because of these kinds of things I usually don't include the attribute in such strings, just the value...
<li class="@home">
MVC doesn't add quotes for @home. If you decompile this page, you get code like the following:
public class _Page_Views_Home_Index_cshtml : WebViewPage<object>
{
// Methods
public override void Execute()
{
((dynamic) base.ViewBag).Title = "Home Page";
base.BeginContext("~/Views/Home/Index.cshtml", 0x27, 2, true);
this.WriteLiteral("\r\n");
base.EndContext("~/Views/Home/Index.cshtml", 0x27, 2, true);
string str = HttpContext.Current.Request.RequestContext.RouteData.Values["controller"].ToString();
string str2 = string.Empty;
string str3 = string.Empty;
string str4 = str;
if (str4 != null)
{
if (!(str4 == "Home"))
{
if (str4 == "content")
{
str3 = "class=current";
}
}
else
{
str2 = "class=current";
}
}
base.BeginContext("~/Views/Home/Index.cshtml", 0x1a6, 9, true);
this.WriteLiteral("\r\n\r\n<div ");
base.EndContext("~/Views/Home/Index.cshtml", 0x1a6, 9, true);
base.BeginContext("~/Views/Home/Index.cshtml", 0x1b0, 4, false);
this.Write(str2);
....
}
}
The Write methods ultimately call the WebUtility.HtmlEncode method, which replaces special characters with HTML entities but does not add quotes.
Hope this helps.

How to hide library source code in Google way?

For instance, I have a library and I would like to protect its source code from being viewed. The first method that comes to mind is to create public wrappers for private functions, like the following:
function executeMyCoolFunction(param1, param2, param3) {
return executeMyCoolFunction_(param1, param2, param3);
}
Only the public part of the code is visible this way. That is fine, but all Google service functions look like function abs() {/* */}. I am curious: is there an approach to hide library source code the way Google does?
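A minimal sketch of what the wrapper pattern exposes, in plain JavaScript rather than a real Apps Script library. The function names reuse the ones above; the body of the private function is a made-up stand-in.

```javascript
// Hypothetical private implementation; the trailing underscore follows the naming above.
function executeMyCoolFunction_(param1, param2, param3) {
  return param1 + param2 * param3; // stand-in for the logic to protect
}

// Public wrapper, as in the snippet above.
function executeMyCoolFunction(param1, param2, param3) {
  return executeMyCoolFunction_(param1, param2, param3);
}

// Anyone who can reach a function object can ask for its source text.
var wrapperSource = executeMyCoolFunction.toString();

// The wrapper's text names the private function but does not contain its body.
var leaksPrivateBody = wrapperSource.indexOf('param2 * param3') !== -1;

console.log(wrapperSource);
console.log('private body visible through wrapper:', leaksPrivateBody);
```

The wrapper's own text is still readable, so this hides only what sits behind functions the caller cannot reach directly.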
Edit 00: Do not "hide" library code by using another library, i.e. LibA with a known project key using LibB with an unknown project key. It is still possible to get the code of LibB's public functions and even execute them. The code is:
function exploreLib_(lib, libName) {
if (libName == null) {
for (var name in this) {
if (this[name] == lib) {
libName = name;
}
}
}
var res = [];
for (var entity in lib) {
var obj = lib[entity];
var code;
if (obj["toSource"] != null) {
code = obj.toSource();
}
else if (obj["toString"] != null) {
code = obj.toString();
}
else {
var nextLibCode = exploreLib_(obj, libName + "." + entity);
res = res.concat(nextLibCode);
}
if (code != null) {
res.push({ libraryName: libName, functionCode: code });
}
}
return res;
}
function explorerLibPublicFunctionsCode() {
var lstPublicFunctions = exploreLib_(LibA);
var password = LibA.LibB.getPassword();
}
I don't know what google does, but you could do something like this (not tested! just an idea):
function declarations:
var myApp = {
    foo: function () { /**/ },
    bar: function () { /**/ }
};
and then, in another place, an anonymous function writes foo() and bar():
(function(a) {
    a['\u0066\u006F\u006F'] = function(){
        // here code for foo
    };
    a['\u0062\u0061\u0072'] = function(){
        // here code for bar
    };
})(myApp);
You can pack or minify to obfuscate even more.
Edit: changed my answer to reflect the fact that an exception's stacktrace will contain the library project key.
In this example, MyLibraryB is a library included by MyLibraryA. Both are shared publicly to view (access controls) but only MyLibraryA's project key is made known. It appears it would be very difficult for an attacker to see the code in MyLibraryB:
//this function is in your MyLibraryA, and you share its project key
function executeMyCoolFunction(param1, param2, param3) {
for (var i = 0; i < 1000000; i++) {
debugger; //forces a breakpoint that the IDE cannot? step over
}
//... your code goes here
//don't share MyLibraryB project key
MyLibraryB.doSomething(args...);
}
But as per @megabyte1024's comments, if you were to cause an exception in MyLibraryB.doSomething(), the stacktrace would contain the project key to MyLibraryB.

Encompassing object attributes with HTML and return in JSON

Currently, I have written the following JSON search method.
[HttpPost]
public JsonResult Search(string videoTitle)
{
    var auth = new Authentication() { Email = "abc@smu.abc", Password = "abc" };
    var videoList = server.Search(auth, videoTitle);
    String html = "";
    foreach(var item in videoList){
        var video = (Video)item;
        html += "<b>"+video.Title+"</b>";
    }
    return Json(html, JsonRequestBehavior.AllowGet);
}
On screen, it returns this.
"\u003cb\u003eAge of Conan\u003c/b\u003e"
What should I do? The reason I want to do this is so that I can use CSS to style the tags, so that the items look better as they drop down from the search input.
Thanks
If you want to return pure HTML you shouldn't return JSON, you should rather use the ContentResult:
[HttpPost]
public ContentResult Search(string videoTitle)
{
    var auth = new Authentication() { Email = "smu@smu.com", Password = "test" };
    var videoList = server.Search(auth, videoTitle);
    String html = "";
    foreach(var item in videoList)
    {
        var video = (Video)item;
        html += "<b>"+video.Title+"</b>";
    }
    return Content(html, "text/html");
}
You can request that with standard jQuery.get() and insert directly into DOM.
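The \u003c and \u003e sequences in the output shown in the question are ordinary JSON string escapes for < and >, so a JSON parser on the client turns them back into markup. A minimal sketch, runnable in Node or a browser console, using the exact text from the question (the variable names are made up):

```javascript
// The text as it appeared on screen: a JSON string literal with < and > escaped.
var wireText = '"\\u003cb\\u003eAge of Conan\\u003c/b\\u003e"';

// Parsing the JSON decodes the escapes back into the original characters.
var html = JSON.parse(wireText);

console.log(html);
```

In the browser the decoded string could then be passed to jQuery's .html() on the target element. With the ContentResult approach above, no decoding step is needed.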

Multiple PushNotification Subscriptions some work properly and some don't

I tried posting this on the Exchange Development forum and didn't get any replies, so I will try here.
I have a Windows service that fires every fifteen minutes to see if there are any subscriptions that need to be created or updated. I am using the Managed API v1.1 against Exchange 2007 SP1. I have a table that stores all the users who want their mailbox monitored, so that when a notification comes in to the "Listening Service" I am able to look up the user and access the message to log it into the application we are building. In the table I have the following columns that store the subscription information:
SubscriptionId - VARCHAR(MAX)
Watermark - VARCHAR(MAX)
LastStatusUpdate - DATETIME
My service calls a function that queries the data needed (based on which function it is doing). If the user doesn't already have a subscription, the service will go and create one. I am using impersonation to access the mailboxes. Here is my "ActivateSubscription" method, which is fired when a user needs the subscription either created or updated.
private void ActivateSubscription(User user)
{
if (user.ADGUID.HasValue)
{
PrincipalContext ctx = new PrincipalContext(ContextType.Domain, Settings.ActiveDirectoryServerName, Settings.ActiveDirectoryRootContainer);
using (UserPrincipal up = UserPrincipal.FindByIdentity(ctx, IdentityType.Guid, user.ADGUID.Value.ToString()))
{
ewService.ImpersonatedUserId = new ImpersonatedUserId(ConnectingIdType.SID, up.Sid.Value);
}
}
else
{
ewService.ImpersonatedUserId = new ImpersonatedUserId(ConnectingIdType.SmtpAddress, user.EmailAddress);
}
PushSubscription pushSubscription = ewService.SubscribeToPushNotifications(
new FolderId[] { WellKnownFolderName.Inbox, WellKnownFolderName.SentItems },
Settings.ListenerService, 30, user.Watermark,
EventType.NewMail, EventType.Created);
user.Watermark = pushSubscription.Watermark;
user.SubscriptionID = pushSubscription.Id;
user.SubscriptionStatusDateTime = DateTime.Now.ToLocalTime();
_users.Update(user);
}
We also ran the following cmdlet to give the account we use to access EWS the ability to impersonate on the Exchange Server.
Get-ExchangeServer | where {$_.IsClientAccessServer -eq $TRUE} | ForEach-Object {Add-ADPermission -Identity $_.distinguishedname -User (Get-User -Identity mailmonitor | select-object).identity -extendedRight ms-Exch-EPI-Impersonation}
The "ActivateSubscription" code above works as expected. Or so I thought. When I was testing it I had it monitoring my mailbox and it worked great. The only problem I had to work around was that the subscription was firing twice when the item was a new mail in the inbox: I got a notification for the NewMail event and another for the Created event. I implemented a workaround that checks to make sure the message hasn't already been logged by my Listening Service. It all worked great.
Today, we started testing two mailboxes being monitored at the same time. The two mailboxes were mine and another developer's. We found the strangest behavior. My subscription worked as expected, but his didn't: the incoming part of his subscription worked properly, but the listening service was never sent a notification for any email he sent out. Looking at the mailbox properties on Exchange, I don't see any difference between his mailbox and mine. We even compared options and settings in Outlook. I can see no reason why it works on my mailbox and not on his.
Is there something that I am missing when creating the subscription? I didn't think there was, since my subscription works as expected.
My listening service code works perfectly well. I have placed the code below in case someone wants to see it to make sure it is not the issue.
Thanks in advance, Terry
Listening Service Code:
/// <summary>
/// Summary description for PushNotificationClient
/// </summary>
[WebService(Namespace = "http://tempuri.org/")]
[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
[System.ComponentModel.ToolboxItem(false)]
// To allow this Web Service to be called from script, using ASP.NET AJAX, uncomment the following line.
// [System.Web.Script.Services.ScriptService]
public class PushNotificationClient : System.Web.Services.WebService, INotificationServiceBinding
{
ExchangeService ewService = new ExchangeService(ExchangeVersion.Exchange2007_SP1);
public PushNotificationClient()
{
//todo: init the service.
SetupExchangeWebService();
}
private void SetupExchangeWebService()
{
ewService.Credentials = Settings.ServiceCreds;
try
{
ewService.AutodiscoverUrl(Settings.AutoDiscoverThisEmailAddress);
}
catch (AutodiscoverRemoteException e)
{
//log auto discovery failed
ewService.Url = Settings.ExchangeService;
}
}
public SendNotificationResultType SendNotification(SendNotificationResponseType SendNotification1)
{
using (var _users = new ExchangeUser(Settings.SqlConnectionString))
{
var result = new SendNotificationResultType();
var responseMessages = SendNotification1.ResponseMessages.Items;
foreach (var responseMessage in responseMessages)
{
if (responseMessage.ResponseCode != ResponseCodeType.NoError)
{
//log error and unsubscribe.
result.SubscriptionStatus = SubscriptionStatusType.Unsubscribe;
return result;
}
var sendNoficationResponse = responseMessage as SendNotificationResponseMessageType;
if (sendNoficationResponse == null)
{
result.SubscriptionStatus = SubscriptionStatusType.Unsubscribe;
return result;
}
var notificationType = sendNoficationResponse.Notification;
var subscriptionId = notificationType.SubscriptionId;
var previousWatermark = notificationType.PreviousWatermark;
User user = _users.GetById(subscriptionId);
if (user != null)
{
if (user.MonitorEmailYN == true)
{
BaseNotificationEventType[] baseNotifications = notificationType.Items;
for (int i = 0; i < notificationType.Items.Length; i++)
{
if (baseNotifications[i] is BaseObjectChangedEventType)
{
var bocet = baseNotifications[i] as BaseObjectChangedEventType;
AccessCreateDeleteNewMailEvent(bocet, ref user);
}
}
_PreviousItemId = null;
}
else
{
user.SubscriptionID = String.Empty;
user.SubscriptionStatusDateTime = null;
user.Watermark = String.Empty;
_users.Update(user);
result.SubscriptionStatus = SubscriptionStatusType.Unsubscribe;
return result;
}
user.SubscriptionStatusDateTime = DateTime.Now.ToLocalTime();
_users.Update(user);
}
else
{
result.SubscriptionStatus = SubscriptionStatusType.Unsubscribe;
return result;
}
}
result.SubscriptionStatus = SubscriptionStatusType.OK;
return result;
}
}
private string _PreviousItemId;
private void AccessCreateDeleteNewMailEvent(BaseObjectChangedEventType bocet, ref User user)
{
var watermark = bocet.Watermark;
var timestamp = bocet.TimeStamp.ToLocalTime();
var parentFolderId = bocet.ParentFolderId;
if (bocet.Item is ItemIdType)
{
var itemId = bocet.Item as ItemIdType;
if (itemId != null)
{
if (string.IsNullOrEmpty(_PreviousItemId) || (!string.IsNullOrEmpty(_PreviousItemId) && _PreviousItemId != itemId.Id))
{
ProcessItem(itemId, ref user);
_PreviousItemId = itemId.Id;
}
}
}
user.SubscriptionStatusDateTime = timestamp;
user.Watermark = watermark;
using (var _users = new ExchangeUser(Settings.SqlConnectionString))
{
_users.Update(user);
}
}
private void ProcessItem(ItemIdType itemId, ref User user)
{
try
{
ewService.ImpersonatedUserId = new ImpersonatedUserId(ConnectingIdType.SmtpAddress, user.EmailAddress);
EmailMessage email = EmailMessage.Bind(ewService, itemId.Id);
using (var _entity = new SalesAssistantEntityDataContext(Settings.SqlConnectionString))
{
var direction = EmailDirection.Incoming;
if (email.From.Address == user.EmailAddress)
{
direction = EmailDirection.Outgoing;
}
int? bodyType = (int)email.Body.BodyType;
var _HtmlToRtf = new HtmlToRtf();
var message = _HtmlToRtf.ConvertHtmlToText(email.Body.Text);
bool? IsIncoming = Convert.ToBoolean((int)direction);
if (IsIncoming.HasValue && IsIncoming.Value == false)
{
foreach (var emailTo in email.ToRecipients)
{
_entity.InsertMailMessage(email.From.Address, emailTo.Address, email.Subject, message, bodyType, IsIncoming);
}
}
else
{
if (email.ReceivedBy != null)
{
_entity.InsertMailMessage(email.From.Address, email.ReceivedBy.Address, email.Subject, message, bodyType, IsIncoming);
}
else
{
var emailToFind = user.EmailAddress;
if (email.ToRecipients.Any(x => x.Address == emailToFind))
{
_entity.InsertMailMessage(email.From.Address, emailToFind, email.Subject, message, bodyType, IsIncoming);
}
}
}
}
}
catch(Exception e)
{
//Log exception
using (var errorHandler = new ErrorHandler(Settings.SqlConnectionString))
{
errorHandler.LogException(e, user.UserID, user.SubscriptionID, user.Watermark, user.SubscriptionStatusDateTime);
}
throw; // rethrow without resetting the stack trace
}
}
}
I have two answers for you.
First, you will have to create one instance of ExchangeService per user. As I understand your code, you create just one instance and switch the impersonation, which is not supported. I developed a Windows service which is pretty similar to yours. Mine synchronises the mails between our CRM and Exchange. At startup I create an instance per user and cache it for as long as the application runs.
Now about cached mode. The difference between using cached mode and not using it is just a timing gap. In cached mode Outlook synchronizes from time to time; without it, the sync happens immediately. When you use cached mode and want the events on your Exchange server immediately, you can press the "Send/Receive" button in Outlook to force the sync.
Hope that helps you...