How to handle duplicates in emailaddreses when using multiple signup forms with MailChimp? - duplicates

I have various signup forms on my website, all adding to 1 list in MailChimp.
Each signup form has different purpose, e.g. free ebook, video etc.
By using a hidden source field I managed to provide the correct download urls etc.
But... How to handle a visitor who signs up on multiple forms?
I want to share the related materials with him, but MC marks the duplicate as error and requests to update profile... with no option to obtain the related material.

You'll probably need to use the API to make this work, a simple form isn't going to cut it.
There are some options. If there are only a handful of things, have them be interest groups. When someone signs up, say for a free ebook, mark the free ebook interest group and subscribe them if necessary.
You can then create an automation for each interest group to send the material.

I would consider creating multiple lists for this purpose, since a MC list can never contain any duplicates.

Related

make a URL displayed clickable without control of the website

I am doing some volunteer work for a charity that is using a couple online systems that store their donors and related data. I would like to find a way to store a URL as a custom field in such a way that they can put corresponding links between donors in one of the systems in order to quickly find the same donor in another system. The only built-in method in the products being used is to store a single value in a field labeled "website" which is originally intended to store a value for any website associated with the donor. I would like to avoid using this field if possible and instead create a custom field.
However, the rub is the custom fields only have a handful of options (clear text, date, currency, etc). There is no option to store a URL or something like rich text). I've thought of a couple less optimal ways to make the values stored in those fields clickable (a browser plugin or a proxy) however both of those have obvious drawbacks that I would like to avoid.
What I am wondering and hoping someone has a possible answer for, is if there are an ways of storing a value in a clear text field that might disrupt or escape the underlying html encoding such that the displayed link is clickable. I already control the values being put into these fields (users cannot enter their own values, they are essentially read-only), so security isn't much of a concern.
I have very limited access or influence to have any system level changes, however I would like to make this possible as it would help them a great deal (their users are all volunteers with limited time and education). I've tried a few tricks but havn't found anything that doesn't get converted to unicode or escaped (it could be that it's completely controlled for at output, i simply don't know).
My current attempts have been limited to using the built in forms submission, I may explore their import and/or API methods on the theory that might allow better low-level access to storing the actual values in the system, however I'm still not certain what to try other than adding .
I have also tried an inline script to add the corresponding tab, however that seems to break the form submission method (perhaps it'll work via csv import or via the API)
Does anyone have suggestions for other things I could try before I go any further? I'm a bit of a novice and feel like there may be something else obvious I haven't tried.

PowerApp Canvas App for survey/questionanire

I need to build a PowerApp which presents a daily list of tasks to a user such as number of surveys to complete; within which postcode area etc.
They then want to complete a questionnaire; and store submissions in the PowerApps own Dataverse.
My thoughts are;
Ideally i would want to hook up Microsoft Forms (standard not Pro) so the PowerApp could either link to the form and pre-populate a unique ID or similar? they complete the form; and a Logic App would process into the DataVerse by unique ID. However not sure this is possible?
Other option is i design the Canvas App to complete the survey (similar to the Employee Engagement survey template) - but likely will have 30/40 questions - this would obviously have a direct link to the Dataverse so no Logic App required
Other options?
Thanks in advance
Rich
Stay away from Microsoft Forms for a 30-40 question survey. The conditional logic, error handling, input validation, etc are not robust enough. You'd likely get too much free text to produce any meaningful reports from.
Caveat being: Do external users (outside your Active Directory) need to access the survey?
If so, then Forms is likely the only way to go (unless you want to somehow manage PowerApps licenses for external users).
PowerApps will suffice.

If I have a collection of random websites, how do I get specific information from each?

Say I have a collection of websites for accountants, like this:
http://www.johnvanderlyn.com
http://www.rubinassociatespa.com
http://www.taxestaxestaxes.com
http://janus-curran.com
http://ricksarassociates.com
http://www.condoaudits.com
http://www.krco-cpa.com
http://ci.boca-raton.fl.us
What I want to do is crawl each and get the names & emails of the partners. How should I approach this problem, at a high-level?
Assume I know how to actually crawl each site (and all subpages) & parse the HTML elements -- I am using Oga.
What I am struggling with is how to make sense of data that is presented in a wide variety of ways. For instance, the email address for the firm (and or partner) can be found in one of these ways:
On the About Us page, under the name of the partner.
On the About Us page, as a generic catch-all email.
On the Team page, under the name of the partner.
On the Contact Us page, as a generic catch-all email.
On a Partner's page, under the name of the partner.
Or it could be any other way.
One way I was thinking about approaching the email, is just to search for all mailto a tags and filter from there.
The obvious downside for this is that there is no guarantee that the email will be for the partner and not some other employee.
Another issue that is more obvious is detecting the partner(s) names just from the markup. I was initially thinking I could just pull all the header tags and text in them, but I have stumbled across a few sites that have the partner names in span tags.
I know SO is usually for specific programming questions, but I am not sure how to approach this and where to ask this. Is there another StackExchange site that this question is more appropriate for?
Any advice on specific direction you can give me would be great.
I looked at the http://ricksarassociates.com/ website and I cant find any partners at all so in my opinion you better stand to gain from this if not you better look for some other invention.
I have done similar datascraping from time to time, and in norway we have laws - or should I say "laws" - that you are not allowed to email people however you are allowed to email the company - so in a way the same problem from another angle.
I wish I knew maths and algorythms by heart because I am sure there is a fascinating sollution hidden in AI and machine learning, but in my mind the only sollution I can see is building a rule set that over time probably gets quite complex. Maby you could apply some bayesian filtering - it works very well for email.
But - to be a little more productive here. One thing i know is inmportant, you could start by creating the crawler environment and building the dataset. Have the database for URLS so you can add more at any time, and start the crawling on what you have already so that you do your testing querying your own data with a 100% copy. This will save you enormous time instead of live scraping while tweaking.
I did my own search engine some years ago, scraping all NO domains however I needed only the index file that time. Took over a week alone just to scrape it down and I think it was 8GB of data just for that single file, and I had to use several proxyservers aswell to make it work due to problems with to much DNS traffik. Lots of problems that needed being taken care of. I guess I am only saying - if you are crawling a large scale you might aswell start getting the data down if you want to work efficient with the parsing later.
Good luck, and do post if you get a sollution. I do not think it is posible without an algorythm or AI though - people design websites the way they like and they pull templates out of their arse so there are no rules to follow. You will end up with bad data.
Do you have funding for this? If so its simpler. Then you could just crawl each site, and make a profile for each site. You could employ someone cheap to manual go through the parsed data and remove all the errors. This is probably how most people does it, unless someone already have done it and the database is for sale / available from webservice so it can be scraped.
The links you provide are mainly US site, so I guess you are focusing on English names. In that case, instead of parsing from html tags, I would just search the whole webpage for name. (There are free database of first name and last name) This may also work if you are donig this for some other Europe company, but it would be a problem for company from some countries. Take Chinese as an example, while there is a fix set of last name, one may use basically any combination of Chinese character as first name, so this solution won't work for Chinese site.
It is easy to find email from a webpage as there is a fixed format of (username)#(domain name) with no space in between. Again I won't treat it as html tags but just as normal string so that the email can be found no matter it is in mailto tag or in plain text. Then, to determine what email is it:
Only one email in page?
Yes -> catch-all email.
No -> Is name found in that page as well?
No -> catch-all email (can have more than one catch-all email, maybe for different purpose like info + employment)
Yes -> Email should be attached to the name found right before it. It is normal that the name should appear before the email.
Then, it should be safe to assume the name appear first belongs to more important member, e.g. Chairman or partner.
I have done similar scraping for these types of pages, and it varies wildly from site to site. If you are trying to make one crawler to sort of auto find the information, it will be difficult. However, the high level looks something like this.
For each site you check, look for element patterns. Divs will often have labels, ID's, and classes which will easily let you grab information. Perhaps you find that many divs will have a particular class name. Check for this first.
It is often better to grab too much data from a particular page, and boil it down on your side afterwards. You could, perhaps, look for information which comes up on a screen by utilizing type (is link) or regex (is email) to look for formatted text. Names and occupation will be harder to find by this method, but might be related positionally on many pages to other well formatted items.
Names will often be affixed with honorifics (Mrs., Mr., Dr., JD, MD, etc.) You could come up with a bank of those, and check against them for any page you end up on.
Finally, if you really wanted to make this process general purpose, you could do some heuristics to improve your methods based off of expected information; names, for example, are most often within a particular list. If it was worth your time, you could check certain text for whether it matches a list of more common names.
What you mentioned in your initial question seems that you would have a lot of benefit with a general purpose Regular Expressions crawler, and you could make improvements on it as you know more about the sites which you interact with.
There are excellent posts on this topic with a lot of useful links throughout these webpages:
https://www.quora.com/What-is-a-good-web-scraper-for-pulling-emails-names-etc-even-if-the-contact-info-is-another-page-deep-a-browser-add-on-is-a-plus
http://www.hongkiat.com/blog/web-scraping-tools/
http://www.garethjames.net/a-guide-to-web-scraping-tools/
http://www.butleranalytics.com/15-web-scraping-tools/
Some of the examined applications are working in macOS.

How can I disable semantic notations in text areas in Semantic MediaWiki Forms?

I am working on a user-moderated database and settled on MediaWiki with Semantic MediaWiki as an engine. I installed Semantic Forms to force the end users to conform to a certain standard when creating or editing entries. The problem is that since a user can add a semantic notation to any form text input it can throw off the proper structure of the system, i.e. if it was an IMDB clone a user can add [[Directed by:Forest Gump]] which would then result in the movie "Forest Gump" showing up under a list of directors.
I doubt that there's any setting that can simply turn this off or on, but I've had one or two ideas as to how to get it working.
One, perhaps there's a way to disable semantic notation on specific namespaces and put the forms on those namespaces. I have a feeling that this will cause the forms to merely break.
Another idea is to modify the code. This is clearly the less ideal approach. To get started, I believe I would need to create some sort of filter on SFTextAreaInput which would disable semantic notations for the user inserted text, but alas I'm unsure as to how to get started on that.
Well, Semantic MediaWiki is still a Wiki. In your classical enterprise database, you restrict the users' input options as a means of ensuring data integrity. That isn't what wikis do; the thinking with a wiki is, yes, the user can enter incorrect information, but another user will amend it and let the first user know what was wrong.
I wouldn't try to coerce SMW into rigid data acquisition. I mean, you do have options such as removing the standard input fields in forms:
'''Free text:'''
{{{standard input|free text|rows=10}}}
If users are selecting a movie page when they should be selecting a director page, then you probably want to encourage correct selection by populating the form control from the Directors category, like:
{{{field|Director|input type=combobox|values from category=Directors}}}
Yes, they can still go very far out of their way to select "Forrest Gump", but if that happens then the fact that someone wilfully circumvented the preselected correct options is a more pressing concern than the fact that the system permits it.
Wikis work best when the system encourages rather than enforces valid knowledge.
My name is Wolfgang Fahl I am behind the smartMediaWiki approach. You might want to go the smartMediaWiki route
see
http://semantic-mediawiki.org/wiki/SMWCon_Spring_2015/smartMediaWiki
For a start don't go just by the property values but e.g. also by a category.
{{#ask: [[Category:Movie]] [[Directed by::+]]
|?Directed by
}}
will only show pages that have both the property set and are in the correct category.
In the smartMediaWiki approach you'd create a topic "Movie" and the entry of movies would be done via Forms. This is an elaboration of the SemanticForms and semantic PageSchemas idea that recently evolved. You can find out more about this at SMWCon Barcelona 2015 this fall.

How do I prevent my HTML from being exploitable while avoiding GUIDs?

I've recently inherited a ASP.NET MVC 4 code base. One problem I noted was the use of some database ids (ints) in the urls as well in html form submissions. The code in its present state is exploitable through both URL tinkering and creating custom HTML posts with different numbers.
Now while I can easily fix the URL problems by using session state or additional auth checks i'm less sure about the database ids that get embedded into the HTML that the site spits out (i.e. I give them a drop down to fill). When the ids come back in a post how can I be sure I put them there as valid options?
What is considered "best practice" in terms of addressing this problem?
While I appreciate I could just "GUID it up" I'm hesitant to do so because I find them a pain in the ass to work with when debugging databases.
Do I have a choice here? Must I GUID to prevent easy guessing of ids or is there some kind of DRY mechanism I can use to validate the usage of ids as they come back into the site?
UPDATE: A commenter asked about the exploits I'm expecting. Lets say I spit out a HTML form with a drop down list of all the locations one can import "treasure" from. The id of the locations that the user owns are 1,2 and 3, these are provided in the HTML. But the user examines the html, fiddles with it and decides to put together a POST with the id of 4 selected. 4 is not his location, its someone else's.
Validate the ID passed against the IDs the user can modify.
It may seem tedious, but this is really the only way to make sure the user has access to what they're trying to modify. Using GUIDs without validation is security by obscurity: sure guessing them is hard, but you can potentially guess them given enough resources.
You can do this at the top of the controller before you do anything else with the posted data. If there's a violation, just throw an exception and have your global exception handler deal with it; you don't need to handle it in a pretty way since you can safely assume that the user is tampering with data in an unsupported way.
The issue you describe is known as "insecure direct object references," and the OWASP group recommends two policies for dealing with this issue:
using session-based indirect object references, and
validating all accesses to object references.
An example of Suggestion #1 would be that instead of having dropdown options 1, 2, and 3, you assign each option a GUID that is associated with the original ID in a map in the user's session. When you get a POST from that user, you check to see what object the given ID was supposed to be tied to. OWASP's ESAPI has some libraries to help with this in various languages.
But in many cases Suggestion #1 is actually counterproductive. For example, in many cases you want to have URLs that can be copy/pasted from one user to another. Process #2 is generally seen as the most foolproof way to address this issue.
You are describing Broken Access Control with Insecure Ids. Once you've identified the threat and decided which Ids are owned by certain users, ensure checks are in place for this server side.