Most of our staff work remotely in different countries of the world.
Often several staff work (on different aspects) of the same case.
At the moment the person who initiates the cases has to email the office manager who has to inssue a case number which then has to be shared with different staff members to make sure they use the same Case number in their forms and correspondence.
I was wondering whether it would be possible to:
Have a page on our website (accesible to our staff only)
Where the person initating a case goes to
The staff member is asked to enter his initials (eg DH or RD)
Then automatically a code is generated (RD001, DH001, etc.), it will be helpful for other purposes if the number is always 5 characters long (e.g. RD001, RD025, RD234, etc...).
These numbers need to be sequential (so if RD got the number RD001 1 hr ago, or 1 day ago, he needs to get RD002 the next time he requests a number), so the page needs to remember the last number that was issued for that staff member (they need to be sequential per staff member).
That number is then emailed to the relevant staff members who need to be aware that this number has been issued
Is that possible?
Sure it's possible, but what you are asking is actually a complete solution development. You've to hire a developer, who will create a system with authentication AND authorization, cases management (new case, details of the case, etc...) and so on.
But overall, it's a trivial job : )
EDIT: If your question is exclusively considering only HTML, then I really don't think this is possible, since your "number" should be generated and accessed from anywhere. Then, you have to make it globally accessible.
Also, it's really important that only your staff, and only the ones with rights to do that, could access and/or generate new numbers, hence the authorization/authentication need.
EDIT 2: Another possibility is search for a already made solution. I believe that should exist even online services with your requirements, like some online CRM or something like that.
Related
Say I have a collection of websites for accountants, like this:
http://www.johnvanderlyn.com
http://www.rubinassociatespa.com
http://www.taxestaxestaxes.com
http://janus-curran.com
http://ricksarassociates.com
http://www.condoaudits.com
http://www.krco-cpa.com
http://ci.boca-raton.fl.us
What I want to do is crawl each and get the names & emails of the partners. How should I approach this problem, at a high-level?
Assume I know how to actually crawl each site (and all subpages) & parse the HTML elements -- I am using Oga.
What I am struggling with is how to make sense of data that is presented in a wide variety of ways. For instance, the email address for the firm (and or partner) can be found in one of these ways:
On the About Us page, under the name of the partner.
On the About Us page, as a generic catch-all email.
On the Team page, under the name of the partner.
On the Contact Us page, as a generic catch-all email.
On a Partner's page, under the name of the partner.
Or it could be any other way.
One way I was thinking about approaching the email, is just to search for all mailto a tags and filter from there.
The obvious downside for this is that there is no guarantee that the email will be for the partner and not some other employee.
Another issue that is more obvious is detecting the partner(s) names just from the markup. I was initially thinking I could just pull all the header tags and text in them, but I have stumbled across a few sites that have the partner names in span tags.
I know SO is usually for specific programming questions, but I am not sure how to approach this and where to ask this. Is there another StackExchange site that this question is more appropriate for?
Any advice on specific direction you can give me would be great.
I looked at the http://ricksarassociates.com/ website and I cant find any partners at all so in my opinion you better stand to gain from this if not you better look for some other invention.
I have done similar datascraping from time to time, and in norway we have laws - or should I say "laws" - that you are not allowed to email people however you are allowed to email the company - so in a way the same problem from another angle.
I wish I knew maths and algorythms by heart because I am sure there is a fascinating sollution hidden in AI and machine learning, but in my mind the only sollution I can see is building a rule set that over time probably gets quite complex. Maby you could apply some bayesian filtering - it works very well for email.
But - to be a little more productive here. One thing i know is inmportant, you could start by creating the crawler environment and building the dataset. Have the database for URLS so you can add more at any time, and start the crawling on what you have already so that you do your testing querying your own data with a 100% copy. This will save you enormous time instead of live scraping while tweaking.
I did my own search engine some years ago, scraping all NO domains however I needed only the index file that time. Took over a week alone just to scrape it down and I think it was 8GB of data just for that single file, and I had to use several proxyservers aswell to make it work due to problems with to much DNS traffik. Lots of problems that needed being taken care of. I guess I am only saying - if you are crawling a large scale you might aswell start getting the data down if you want to work efficient with the parsing later.
Good luck, and do post if you get a sollution. I do not think it is posible without an algorythm or AI though - people design websites the way they like and they pull templates out of their arse so there are no rules to follow. You will end up with bad data.
Do you have funding for this? If so its simpler. Then you could just crawl each site, and make a profile for each site. You could employ someone cheap to manual go through the parsed data and remove all the errors. This is probably how most people does it, unless someone already have done it and the database is for sale / available from webservice so it can be scraped.
The links you provide are mainly US site, so I guess you are focusing on English names. In that case, instead of parsing from html tags, I would just search the whole webpage for name. (There are free database of first name and last name) This may also work if you are donig this for some other Europe company, but it would be a problem for company from some countries. Take Chinese as an example, while there is a fix set of last name, one may use basically any combination of Chinese character as first name, so this solution won't work for Chinese site.
It is easy to find email from a webpage as there is a fixed format of (username)#(domain name) with no space in between. Again I won't treat it as html tags but just as normal string so that the email can be found no matter it is in mailto tag or in plain text. Then, to determine what email is it:
Only one email in page?
Yes -> catch-all email.
No -> Is name found in that page as well?
No -> catch-all email (can have more than one catch-all email, maybe for different purpose like info + employment)
Yes -> Email should be attached to the name found right before it. It is normal that the name should appear before the email.
Then, it should be safe to assume the name appear first belongs to more important member, e.g. Chairman or partner.
I have done similar scraping for these types of pages, and it varies wildly from site to site. If you are trying to make one crawler to sort of auto find the information, it will be difficult. However, the high level looks something like this.
For each site you check, look for element patterns. Divs will often have labels, ID's, and classes which will easily let you grab information. Perhaps you find that many divs will have a particular class name. Check for this first.
It is often better to grab too much data from a particular page, and boil it down on your side afterwards. You could, perhaps, look for information which comes up on a screen by utilizing type (is link) or regex (is email) to look for formatted text. Names and occupation will be harder to find by this method, but might be related positionally on many pages to other well formatted items.
Names will often be affixed with honorifics (Mrs., Mr., Dr., JD, MD, etc.) You could come up with a bank of those, and check against them for any page you end up on.
Finally, if you really wanted to make this process general purpose, you could do some heuristics to improve your methods based off of expected information; names, for example, are most often within a particular list. If it was worth your time, you could check certain text for whether it matches a list of more common names.
What you mentioned in your initial question seems that you would have a lot of benefit with a general purpose Regular Expressions crawler, and you could make improvements on it as you know more about the sites which you interact with.
There are excellent posts on this topic with a lot of useful links throughout these webpages:
https://www.quora.com/What-is-a-good-web-scraper-for-pulling-emails-names-etc-even-if-the-contact-info-is-another-page-deep-a-browser-add-on-is-a-plus
http://www.hongkiat.com/blog/web-scraping-tools/
http://www.garethjames.net/a-guide-to-web-scraping-tools/
http://www.butleranalytics.com/15-web-scraping-tools/
Some of the examined applications are working in macOS.
Is it possible to make a website with only one account per person?
Any suggestion is good.
Thanks
If you don't want people to go generating 100 accounts a minute, you'll need something like captcha, which is very easy to add on to your website.
You can do other things, like associate each account with an email address, and make the user verify that that email actually exists by sending a link out to that email address so that when they click on it, it verifies the connection.
To associate exactly one account per person, you are going to need to use some sort of official identity, and usually for smaller websites that doesn't make sense. By official identity I mean verify their credit card or government identification (social security?), but then you run into a lot of problems because people won't want to do this, and it is going to cost money to make sure that these identities are real. Also, if you really need something like this, you're going to have to beef up the security of your website.
An alternative is to require a user to put in a verification number which you send to them via SMS, and ensure the phone number they enter is unique, phones/simcards are relatively cheap these days, but most people wouldn't go through the effort of spending $5 on a new simcard to get a duplicate account on your site.
(and if they would, sell accounts for $3 and undercut the cost of a sim ;)
Ask for identifying information for example credit card data that you can verify and allow registrations only with this identifying information. Of course credit card data can be stolen, so you cannot be 100% sure about anything in internet or in world generally.
Whatever you do, people will still try to buck the system. If you use Email, then people can have multiple free emails, if you block free emails, lots of people dont have any other emails so you block them. If you use IP you block anyone with a shared IP such as ISPs who enforce proxy servers for their clients.
Unless you start asking for social security numbers or NI or whatever your country has, and you start alienating people as a rule because they would consider that irrelevant personal information.
You just have to hope people treat your site fairly, and know some wont.
One potential solution these days is to only create user accounts via federation of identity from a much stricter service provider. For example, verify your users via a facebook account using oauth. I believe facebook is pretty good at detecting shills/spammers and you can harness their resources in your service too.
I think you have 2 options, neither of which will solve the whole problem:
Log the IP address and prevent multiple logins from one IP. Not great, as this will probably backfire if college kids use your site and the dorm uses shared IPs.
Log the MAC address and prevent multiple logins from one MAC. Better, but it will prevent multiple people from the same household from using your site, and most houses have more than one computer/mobile device.
You could always combine the two options, but again with multiple mobile devices it's possible to circumnavigate that.
you could make a code that only lets 1 account be made per computer used. you have to be able to get the computers IP address and block it from making more than 1 account.
I'm designing a web application - prototyping and wireframing the main pages so I've got an idea of what it will do. I'm struggling on how to display my data to users.
We basically provide them with an email inbox, a phone message system and a fax system. This means three different types of data - one is textual, one is audio and one is visual. They share some common properties however, and the point of our service is to unify users communications, so it makes sense to combine them.
Mashing the data together in any way results in a very sparse summary, the only information they share is the sender and the date. So after spending 5 hours agonising over design decisions I thought I'd open it up. The options we're leaning to is
Show a 'unified inbox' with a link to view the full item details on a per line basis
Drop the idea of a dashboard and just have an individual inbox in the web interface for each service. We can display the number of new messages on the tab for the service so they know there are new messages
Show a very simple summary as the dashboard, merely showing the number of new 'communications' in each of the users inboxes (fax, email, voice).
What is best from a design perspective? We could conduct user testing, but it's a shoestring startup, so the costs of mocking up 3 complete UI's is prohibitive at this point.
I'm confused what the question is, should we suggest the UI layout? Or are you looking for ideas on how to prototype / play around with a look / solution?
I use Balsamic Mockups for all my UI designs, spend some time laying it out, and it is a great way to visualize what you want, and it adds a level of interactivity to it as well.
Hopefully thats somewhat along the lines of what you were asking ;).
Otherwise I would go with something like you mentioned above:
Show a summary / dashboard page showing say last 10 of the last messages (voice / email / fax).
Show # of new items per service, and go from there.
As I understand, your problem is that you can't show anything useful for Fax or voicemail?
Still, what would be gained by separate inboxes? If you want to unify these three services, separating by type is what you don't do. The most important search / access vectors are WHO and WHEN.
(There is of course the need to search for "the fax from Mr. Lyle", so filtering by type should be possible. But it's not the fundamental access filter)
My suggestions (I understand that some of these might be complicated):
Single inbox. Icon for type.
If possible, try using "natural times" such as "a few minutes ago", "yesterday, 12:31" (if you use it for minutes, you may need to do that ajaxy thing to refresh them).
e-mail: Include the title of the e-mail / text message. If you can, Add line of text - fill up from the body, omitting line breaks, untuil you reach a certain character count or the width of the panel.
Fax: it might help to show # of pages (not sure if this is possible) and mouseover for thumbnail. The first deals with people failing to send all pages at once, the second with people inserting them the wrong way around.
Audio: Allow to play right from the inbox. Duration might be helpful to filter out "oh, it's voice mail, I'll hang up" calls, it's also a good preview on how much time I need to "read" this message.
Don't add irrelevant data just because it's shared between the two (e.g. size).
Sort by time received (or time sent if available?).
If there are many unread messages in the inbox, and there are multiple messages from the same sender without other messages inbetween, you can collapse them (e.g. only show the first two of the sequence, and a "more messages from Joanna..." link. This helps against important single message drowned by communicators gone wild.
An option would be to group by sender, at least for selected Senders, so that it reads
From Joanna
|V| 5 min ago Hey again Joe, just ust wanted to say....
o<| 5 min ago Hello, it's me! Hmm it seems oyu areally are on a business..
72 new since yesterday ( |=| 5 o<| 52 |V| 15)
From Mr Lyle
|=| yesterday, 12:12 7 pages [show]
Other
|V| 1h ago gunk1243#523.com Cheap Torpedoes Your best source of cheap, ...
|V| 1h ago gunk563#523.com Torpedoes Mania ON SALE! GU 537! sinks any ...
12 new since yesterday ( |V| 12)
Mr. Lyle doesn't have an abstract since there is only one new message. Clicking an abstract would expand that list, clicking a user would show you messages (including old ones) only from this user.
Phew. Hope that helps.
We have a service where we literally give away free money.
Naturally said service is ripe for abuse. To defend against this we do the following:
log ip address
use unique email addresses (only 1 acct/email addy)
collect more info like st. address, phone number, etc.
use signup captcha
BHOs (I've seen poker rooms use these)
Now, let's get real here -- NONE of this will stop a determined user.
Obviously ip addresses can be changed via a proxy (which could be blacklisted via akismet) but change anyways if the user has a dynamic ip or if more than one user is behind a NAT'd network (can we say almost everyone?)
I can sign up for thousands of unique email addresses each hour -- this is no defense.
I can put in fake information taken from lists for street addresses and phone numbers.
I can buy captchas from captcha solving services (1k for $5).
bhos seem only effective for downloadable software -- this is a website
What are some other ways to prevent multiple users from abusing the service? How do all the PPC people control click fraud?
I know we could actually call the person but I don't think we are trying to do that anytime soon.
Thanks,
It's pretty difficult to generate lots of fake phone numbers that can send and receive SMS messages. SMS verification could go a long way towards cutting down on fraud. Of course, it also limits you to giving away free money to cell phone owners.
I think only way is to bind your users accounts to 'real world' information, like his/her passport number, for instance. Of course, you'll need to make sure that information is securely stored and to find some way to validate it.
Re: signing up for new email accounts...
A user doesn't even need to do that. Please feel free to send your mail to brian_s#mailinator.com, or feydr.asks.a.question#spamherelots.com, or stackoverflow#safetymail.info, or my_arbitrary_username#zippymail.info. I haven't registered any of those email addresses, but all of them will work.
Those domains are owned by ManyBrain, and they (and probably others as well) set the domain to accept any email user. ManyBrain in particular then makes the inboxes for those emails publicly accessible without any registration (stripping everything by text from the email and deleting old mail). Check it out: admin#mailinator.com's email inbox!
Others have mentioned ways to try and keep user identities unique. This is just one more reason to not trust email addresses.
First, I suppose (hope) that you don't literally give away free money but rather give it to use your service or something like that.
That matters as there is a big difference between users trying to just get free money from you they can spend on buying expensive cars vs only spending on your service which would be much more limited.
Obviously many more user will try to fool the system in the former than in the latter case.
Why it matters? Because it is all about the balance between your control vs your user annoyance. I see many answers concentrating on the control part, so let's go through annoyance, shall we?
Log IP address. What if I am the next guy on the computer in say internet shop and the guy before me already used that IP? The other guy left your hot page that I now see but I am screwed because the IP is blocked. Yes, I can go to another computer but it is annoyance and I may have other things to do.
Collecting physical Adresses. For what??? Are you going to visit me? Or start sending me spam letters? Let me guess, more often than not you get addresses with misprints at best and fake ones at worst. In fact, it is much less hassle for me to give you fake address and not dealing with whatever possible spam letters I'll have to recycle in environment-friendly way. :)
Collecting phone numbers. Again, why shall I trust your site? This is the real story. I gave my phone nr to obscure site, then later I started receiving occasional messages full of nonsense like "hit the fly". That I simply deleted. Only later and by accident to discover that I was actually charged 2 euros to receive each of those messages!!! Do I want to get those hassles? Obviously not! So no, buddy, sorry to disappoint but I will not give your site my phone number unless your company is called Facebook or Google. :)
Use signup captcha. I love that :). So what are we trying to achieve here? Will the user who is determined to abuse your service, have problems to type in a couple of captchas? I doubt it. But what about the "good user"? Are you aware how annoying captchas are for many users??? What about users with impaired vision? But even without it, most captchas are so bad that they make you feel like you have impaired vision! The best advice I can give - if you care about user experience, avoid captchas as plague! If you have any doubts, do your online research first!
See here more discussion about control vs annoyance and here some more thoughts about being user-friendly.
You have to bind their information to something that is 'real world', as Rubens says. Of course, you also need to be able to verify this information (I can just make up passport numbers all day if you don't check to make sure they're correct).
How do you deliver the money? Perhaps you can index this off the paypal account, mailing address, or whatever you're sending the money to?
Sometimes the only way to prevent people abusing a system is to not have the system in the first place.
If you're doing what you say you're doing, "giving away money to people", then surprise surprise, there will be tons of people with more time available to try to find ways to game the system than you will have to fix it.
I guess it will never be possible to have an identification system which identifies fake identities that is:
cheap to run (I think it's called "operational cost"?)
cheap to implement (ideally one time cost - how do you call that?)
has no Type-I/Type-II errors
is scalable
But I think you could prevent users from having too many (to say a quite random number: more than 50) accounts.
You might combine the following approaches:
IP address: can be bypassed with VPN
CAPTCHA: can be bypassed with human farms (see this article, for example - although they claim that their test can't be that easily passed to other humans, I doubt this is true)
Ability-based identification: can be faked when you know what is stored and how exactly the identification works by randomly (but with a given distribution) acting (example: brainauth.com)
Real-world interaction: Although this might be the best one, but I guess it is expensive and not many users will accept it. Also, for some users/countries it might not be possible. (example: Postident in Germany, where the Post wants to see your identity card. I guess this can only be faced in massive scale by the government.)
Other sites/resources: This basically transforms the problem for other sites. You can use services, where it is not allowed/uncommon/expensive to have much more than 1 account
Email
Phone number: e.g. by using SMS, see Multi-factor authentication
Bank account: PayPal; transfer not much money or ask them to transfer a random (small) amount to you (which you will send back).
Social based
When you take the social graph (vertices are people, edges are connections), you will expect some distribution. You know that you are a single human and you know some other people. So you have a "network of trust" (in quotes, because I think this might be used in other context as well). Now you might not trust people / networks how interact heavily with your service, but are either isolated (no connection) or who connect a large group with another large group ("articulation points"). You also might not trust fast growing, heavily interacting new, isolated graphs.
When a user provides content that is liked by many other users (who you trust), this might be an indicator that there is a real human creating it.
We had a similar issue recently on our website, it is really a hassle to solve this issue if you are providing a business over one time or monthly recurring free credits system.
We are using a fraud detection solution https://fraudradar.io for a while and that helped us a lot to clean out most of the spam activities. It is pretty customizable with:
IP checks
Email domain validity
Regex rules
Whitelisting options per IP, email domain etc.
Simple API to communicate through
I would suggest to check that out.
We have a customer who wants to go through their CRM database and somehow determine phone numbers which are valid, without actually having someone sit there and try calling them all.
Is there any way to do something akin to a "ping" on a phone number (including landlines)?
You will need to go through a third party. I have used Melissa data for address verification with good success, they also offer phone verification, but I have not used it
http://www.melissadata.com/listservices/resphoneverify.htm
If getting a 100% correct phone number is crucial, I'd look into a service which would actually call the number, give a verification code and make the user confirm that code with the site. It is a PIA from the users perspective, but that is the most complete route you can take. Doing a quick little googling came up with this site, http://www.phoneconfirm.com which seems to do what I mentioned. I am sure there are others though.
If you can't/don't want to go through a third party, I can't imagine writing something like this yourself would be impossible. Scaling it would be the biggest issue.
could always go with the good ole war dialer
I believe a CTI system using ISDN calling based service can quickly return a status code that the number is either valid/invalid before the destination begins to ring.
One vendor is Katalina systems, their product is called VoiceGuide and they have a dialling out module that may give you what you want. see www.voiceguide.com.
Just export the calling list to the dialler (csv file) and review the call status after processing.
If the list is very large, it may justify purchasing a system to do this. The rate of calling depends upon the number of lines installed/availble. You might require some custom modifications to abort the call after obtaining the status. Katalina should be able to help. I am not sure if VoIP trunks can give you full access to the line status.
I once did something like that. Yeah, for telemarketers. And yeah, it haunts my conscience to this day.
It was based on a module called app_amd.c (Answering Machine Detection) which was a third party add-on for Asterisk and, AFAIK, can be found in their main tree now. With an E1/T1, you can also distinguish between bad numbers, busy, and many other status codes. Look that up, it may help.