Web displays: Paging vs. long tables

It seems that the trend in web design is to provide paged output, where long tables are displayed a page at a time. My customers don't like that, and have requested that the web sites I design for them show all entries in long tables. The arguments for paging seem to be mostly based on the performance hit of displaying long tables, and this is less of a concern in a high-bandwidth corporate intranet. Arguments against paging include the ability to print the entire table, do string searches against the entire table, select arbitrary ranges from the entire table for copying, etc. I've pointed out that these features can easily be added to paged web designs (e.g. a print button that prints the entire table, or a button that creates a CSV file of the table), but the paged output still seems inconvenient to them. Our typical table is about 100 to 600 items. Obviously, tables that would be significantly larger would probably have to be paged.
Questions:
What is your experience with personal or customer preferences for paged vs. full output in long tables?
Web design tools seem to be pushing the paging paradigm. Are they out of touch, or are my customers unusual?
If you're thinking "It depends on the length of the table", what threshold would you use?

I love long one-page listings.
One of the few reasons I can see for paged listings is the performance issue you point out.
I think your customers are very usual and in touch.
The threshold would be about page loading times: when the server can't produce the full lists fast enough, or when the lists get so long that the browser slows down. (The latter can happen for quite short lists if you have non-a-tag hover stuff in your CSS and the browser is IE.)
Give the users a powerful search function and they'll narrow down their page lists themselves.

Why not simply make it a user-configurable option? It sounds like you plan to essentially implement both anyway.
To be honest, I think that no matter which you choose, someone will complain. At least with it being user-configurable, you have the ability to put it back on the user.

Provide a default page length, and a configurable parameter (e.g. in the query string for programmatic use, and/or a form on the webpage for interactive use) to control how many listings are in a page.
User flexibility is good. Texas Instruments has a parametric search tool for electrical engineers to find ICs that meet certain technical characteristics, and they include a link both to "show all" in a webpage and "download all" as a .csv file. That's a good model, kudos to TI. Ditto to flickr; their API lets you control (to a large extent) how many results show up on a web service call.
I personally HATE websites that default to 10 listings per page with no way to increase it. It takes FOREVER to browse them, & I'm willing to wait longer if I can get all the stuff at once.
If it's an interactive webpage, I would consider going to an AJAX solution that downloads 100 at a time so there's an indication of progress (and the user can stop it if there are 20000 results).
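For the AJAX route, here is a minimal sketch of that kind of incremental loading, assuming a hypothetical JSON endpoint (/items?offset=N&limit=100) and illustrative element IDs and field names:

// Fetch rows 100 at a time and append them to a table, with a visible
// progress count and a Stop button. The /items endpoint, row fields,
// and element IDs are hypothetical.
const tbody = document.querySelector('#results tbody');
const status = document.querySelector('#status');
let stopped = false;
document.querySelector('#stop').addEventListener('click', () => { stopped = true; });

async function loadAll() {
  const limit = 100;
  let offset = 0;
  while (!stopped) {
    const resp = await fetch(`/items?offset=${offset}&limit=${limit}`);
    const rows = await resp.json();        // assumed to be an array of row objects
    if (rows.length === 0) break;          // no more data
    for (const row of rows) {
      const tr = document.createElement('tr');
      tr.innerHTML = `<td>${row.name}</td><td>${row.value}</td>`;
      tbody.appendChild(tr);
    }
    offset += rows.length;
    status.textContent = offset + ' items loaded…';   // progress indication
  }
}
loadAll();

This keeps the page responsive and gives the user both progress feedback and a way to bail out on very large result sets.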
I agree with PEZ, it's all about responsiveness.

Best solution: Don't provide lists with more than 100 items.
Usually your user doesn't want to read more than 100 or even 600 items. They just don't care. They are searching for one (or possibly a few). Make sure that there's a way for them to get to those items without visual-grep-ing through the list.
And if your client insists on displaying all items, then provide paging with a configurable page size and let him enter "100000 items per page" if he wants to.
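As a small sketch of that configurable page size, assuming the items are already loaded client-side; the parameter names, the allItems array, and the renderRows function are hypothetical:

// Read page and pageSize from the query string, with sensible defaults,
// then show just that slice of the (hypothetical) allItems array.
const params = new URLSearchParams(window.location.search);
const pageSize = Math.max(1, parseInt(params.get('pageSize'), 10) || 100);
const page = Math.max(1, parseInt(params.get('page'), 10) || 1);

const start = (page - 1) * pageSize;
renderRows(allItems.slice(start, start + pageSize));   // render only the requested page

A user who really wants everything at once can then pass ?pageSize=100000 and get the full list.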

One of the seminal books on web design (sorry, I forget which one) used to say not to count on your users scrolling down because most of them don't know how or can't be bothered. I think a more recent update says that while this is true for the general public, certain sectors of more technical users can be expected to scroll down, and you can make pages that require scrolling IFF (if and only if) you know your users can handle it.

I can understand your situation extremely well; I have been in a similar situation. I moved a business workflow from being manually managed to an automated one. Initially it was carried out using Excel spreadsheets. The stakeholders for my software were in the 55+ age group, and they don't like anything Ajax-y or any of the UI patterns you are talking about. In such cases the data-retrieval logic can be optimized. Any table that touches the 1K mark, or has items like image blobs, should be shown in parts from a performance point of view.
Long outputs slow rendering and will be a performance leech.
Customers don't want change most of the time, and the customer is always right unless you can convince them.
I have put forth my threshold, but it also depends on the content of the rows.
Happy Coding!


Search HTML Tables on Multiple Pages

Hello Stack Overflow Community!
I am making a directory of many thousand custom mods for a game using HTML tables. When I started this project, I thought one HTML page would be slow, but adequate for the ~4k files I was expecting. As I progressed, I realized there are tens of thousands of files I need to have in these tables, and I need to let the user search through them to find what they are missing to load up a new scenario. Each entry has about 20 text entries and a small image (~3KB). I only need to be able to search through one column.
I'm thinking of dividing the tables across several pages on my website to help loading speeds and improve overall organization. But then a user would have to navigate to each page, and perform a search there. This could take a while and be very cumbersome.
I'm not great at website programming. Can someone advise a way to allow the user to search through several web pages and tables from one location? Ideally this would jump to the location in the table on the new webpage, or maybe highlight the entry like the browser's search function does.
You can see my current setup here: https://www.loco-dat-directory.site/
Hopefully someone can point me in the right direction, as I'm quite confused now :-)
These would be my steps:
Copy all my info into an Excel spreadsheet, then convert that to JSON, then make that an array in JavaScript (myarray). Then make an input field, and on click run an if statement like input == myarray[0].propertyName.
If you want something more than an exact match, you'd need https://lodash.com/ in your project.
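A minimal sketch of those steps, once the spreadsheet has been converted to JSON and loaded as a JavaScript array; the data, field names, and element IDs here are only illustrative:

// Hypothetical array converted from the spreadsheet/JSON.
const myarray = [
  { name: 'BR-Class-37-Pack', pack: 'UK Diesels Vol. 1' },
  { name: 'SD40-2-Repaint',   pack: 'US Freight Vol. 3' }
];

// Search on button click: exact match first, substring match as a fallback.
document.querySelector('#search-btn').addEventListener('click', () => {
  const input = document.querySelector('#search-box').value.trim().toLowerCase();
  const matches = myarray.filter(item =>
    item.name.toLowerCase() === input ||        // exact match
    item.name.toLowerCase().includes(input));   // partial match fallback
  document.querySelector('#result').textContent = matches.length
    ? matches.map(m => m.name + ' → ' + m.pack).join(', ')
    : 'No match found';
});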
Hacky Solution
There is a browser tool called TableCapture that captures data from HTML tables and loads it into Excel/spreadsheets - where you are basically deferring to spreadsheet software to manage the searching.
You would have to see if:
This type of tool would solve your problem - maybe you can pull each HTML page's contents manually, then merge these pages into a document with multiple "sheets", and then let people download the "spreadsheet" from your website.
If you do not take on the labor above and just tell other people to do it, then you'd have to see if you can teach people how to perform the search and do this method on their own, e.g. "download this plugin, use it on these pages, search".
Why your question is difficult to answer
The reason why it will be hard for people to answer you on stackoverflow.com (which usually deals in code solutions) is that you need a more complicated solution (in my opinion) than hard-coded tables and HTML/CSS/JavaScript.
This type of situation is exactly why people use databases and APIs to accept requests ("term": "something") for information and deliver responses ( "results": [...] ).
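As a rough illustration of that request/response shape from the browser side (the /api/search endpoint and field names are hypothetical):

// Hypothetical search API: the page sends a term, a server queries a
// database and returns matching rows as JSON.
async function search(term) {
  const resp = await fetch('/api/search?term=' + encodeURIComponent(term));
  const data = await resp.json();   // e.g. { "results": [ { "name": "…", "pack": "…" } ] }
  return data.results;
}

search('SD40').then(results => console.log(results));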
Thank you everyone for your great advice. I wasn't aware most of these potential solutions existed, and it was good to see how other people were tackling problems of similar scope.
I've decided to go with DataTables for their built-in sorting and filtering: https://datatables.net/
I'm also going to use a javascript array with an input field on the main page to allow users to search for which pack their mod is in. This will lead them to separate pages on my site, each with a unique datatable for a mod pack. Separate pages will load up much quicker than one gigantic page trying to show everything.
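For reference, a minimal DataTables setup for one of those per-pack pages might look like this; the table id is hypothetical, and the page needs the jQuery and DataTables JS/CSS includes:

// '#mod-table' is a hypothetical table id on one of the per-pack pages.
$(document).ready(function () {
  $('#mod-table').DataTable({
    pageLength: 100,        // show 100 rows per page
    order: [[0, 'asc']]     // sort by the first column initially
    // DataTables' built-in search box filters the rows as the user types
  });
});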

If I have a collection of random websites, how do I get specific information from each?

Say I have a collection of websites for accountants, like this:
http://www.johnvanderlyn.com
http://www.rubinassociatespa.com
http://www.taxestaxestaxes.com
http://janus-curran.com
http://ricksarassociates.com
http://www.condoaudits.com
http://www.krco-cpa.com
http://ci.boca-raton.fl.us
What I want to do is crawl each and get the names & emails of the partners. How should I approach this problem, at a high-level?
Assume I know how to actually crawl each site (and all subpages) & parse the HTML elements -- I am using Oga.
What I am struggling with is how to make sense of data that is presented in a wide variety of ways. For instance, the email address for the firm (and or partner) can be found in one of these ways:
On the About Us page, under the name of the partner.
On the About Us page, as a generic catch-all email.
On the Team page, under the name of the partner.
On the Contact Us page, as a generic catch-all email.
On a Partner's page, under the name of the partner.
Or it could be any other way.
One way I was thinking about approaching the email, is just to search for all mailto a tags and filter from there.
The obvious downside for this is that there is no guarantee that the email will be for the partner and not some other employee.
Another issue that is more obvious is detecting the partner(s) names just from the markup. I was initially thinking I could just pull all the header tags and text in them, but I have stumbled across a few sites that have the partner names in span tags.
I know SO is usually for specific programming questions, but I am not sure how to approach this and where to ask this. Is there another StackExchange site that this question is more appropriate for?
Any advice on specific direction you can give me would be great.
I looked at the http://ricksarassociates.com/ website and I can't find any partners at all, so in my opinion you should check whether you actually stand to gain from this; if not, you'd better look for some other venture.
I have done similar data scraping from time to time, and in Norway we have laws - or should I say "laws" - that say you are not allowed to email people, however you are allowed to email the company - so in a way it's the same problem from another angle.
I wish I knew maths and algorithms by heart, because I am sure there is a fascinating solution hidden in AI and machine learning, but in my mind the only solution I can see is building a rule set that over time probably gets quite complex. Maybe you could apply some Bayesian filtering - it works very well for email.
But - to be a little more productive here - one thing I know is important: you could start by creating the crawler environment and building the dataset. Have a database for URLs so you can add more at any time, and start crawling what you already have, so that you do your testing by querying your own 100% local copy of the data. This will save you enormous time compared to live scraping while tweaking.
I built my own search engine some years ago, scraping all .no domains, though I only needed the index file at the time. It took over a week just to scrape it down, and I think it was 8GB of data just for that single file, and I had to use several proxy servers as well to make it work due to problems with too much DNS traffic. Lots of problems needed taking care of. I guess I am only saying: if you are crawling at a large scale, you might as well start getting the data down if you want to work efficiently with the parsing later.
Good luck, and do post if you get a solution. I do not think it is possible without an algorithm or AI though - people design websites the way they like and pull templates out of their arse, so there are no rules to follow. You will end up with bad data.
Do you have funding for this? If so, it's simpler. Then you could just crawl each site and make a profile for each site. You could employ someone cheap to manually go through the parsed data and remove all the errors. This is probably how most people do it, unless someone has already done it and the database is for sale / available from a web service so it can be scraped.
The links you provide are mainly US sites, so I guess you are focusing on English names. In that case, instead of parsing HTML tags, I would just search the whole webpage for names. (There are free databases of first names and last names.) This may also work if you are doing this for other European companies, but it would be a problem for companies from some countries. Take Chinese as an example: while there is a fixed set of last names, one may use basically any combination of Chinese characters as a first name, so this solution won't work for Chinese sites.
It is easy to find an email address on a webpage, as there is a fixed format of (username)@(domain name) with no space in between. Again, I wouldn't treat it as HTML tags but just as a normal string, so that the email can be found whether it is in a mailto tag or in plain text. Then, to determine what kind of email it is:
Only one email on the page?
Yes -> catch-all email.
No -> Is a name found on that page as well?
  No -> catch-all email (there can be more than one catch-all email, maybe for different purposes like info + employment).
  Yes -> The email should be attached to the name found right before it. It is normal for the name to appear before the email.
Then, it should be safe to assume the name that appears first belongs to a more important member, e.g. chairman or partner.
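A minimal sketch of that string-based approach, assuming the page HTML has already been fetched as a string; the regexes and the "name right before the email" heuristic are rough approximations:

// Find email addresses anywhere in the page text (mailto or plain text),
// then look for a capitalized First Last pair shortly before each match.
const emailRe = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const nameRe  = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g;   // very rough "First Last" pattern

function extractContacts(pageText) {
  const contacts = [];
  for (const match of pageText.matchAll(emailRe)) {
    // Look at the text just before the email for a name-like token.
    const before = pageText.slice(Math.max(0, match.index - 200), match.index);
    const names = before.match(nameRe);
    contacts.push({
      email: match[0],
      name: names ? names[names.length - 1] : null   // null suggests a catch-all address
    });
  }
  return contacts;
}

// extractContacts('<p>Jane Doe, Partner - jane@example.com</p>')
//   => [{ email: 'jane@example.com', name: 'Jane Doe' }]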
I have done similar scraping for these types of pages, and it varies wildly from site to site. If you are trying to make one crawler to sort of auto find the information, it will be difficult. However, the high level looks something like this.
For each site you check, look for element patterns. Divs will often have labels, IDs, and classes which will easily let you grab information. Perhaps you find that many divs will have a particular class name. Check for this first.
It is often better to grab too much data from a particular page, and boil it down on your side afterwards. You could, perhaps, look for information which comes up on a screen by utilizing type (is link) or regex (is email) to look for formatted text. Names and occupation will be harder to find by this method, but might be related positionally on many pages to other well formatted items.
Names will often be affixed with honorifics (Mrs., Mr., Dr., JD, MD, etc.) You could come up with a bank of those, and check against them for any page you end up on.
Finally, if you really wanted to make this process general purpose, you could do some heuristics to improve your methods based off of expected information; names, for example, are most often within a particular list. If it was worth your time, you could check certain text for whether it matches a list of more common names.
What you mentioned in your initial question seems that you would have a lot of benefit with a general purpose Regular Expressions crawler, and you could make improvements on it as you know more about the sites which you interact with.
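As a rough sketch of the honorifics idea (the honorific and credential lists and the name pattern are only illustrative starting points):

// Look for names carrying a leading honorific or a trailing credential,
// a decent signal that the text is a person rather than a firm name.
const honorifics  = ['Mr\\.', 'Mrs\\.', 'Ms\\.', 'Dr\\.'];
const credentials = ['CPA', 'JD', 'MD', 'Esq\\.'];

const nameWithTitleRe = new RegExp(
  '(?:(?:' + honorifics.join('|') + ')\\s+)?' +   // optional leading honorific
  '([A-Z][a-z]+(?:\\s+[A-Z][a-z]+)+)' +           // capitalized name words (captured)
  '(?:,\\s*(?:' + credentials.join('|') + '))?',  // optional trailing credential
  'g');

function findLikelyNames(text) {
  const hits = [];
  for (const m of text.matchAll(nameWithTitleRe)) {
    // Keep only matches that actually carried an honorific or credential;
    // otherwise almost any pair of capitalized words would qualify.
    if (/^(Mr|Mrs|Ms|Dr)\./.test(m[0]) || /(CPA|JD|MD|Esq\.)$/.test(m[0])) {
      hits.push(m[1]);
    }
  }
  return hits;
}

// findLikelyNames('Dr. Alice Smith and Bob Jones, CPA lead the firm.')
//   => ['Alice Smith', 'Bob Jones']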
There are excellent posts on this topic with a lot of useful links throughout these webpages:
https://www.quora.com/What-is-a-good-web-scraper-for-pulling-emails-names-etc-even-if-the-contact-info-is-another-page-deep-a-browser-add-on-is-a-plus
http://www.hongkiat.com/blog/web-scraping-tools/
http://www.garethjames.net/a-guide-to-web-scraping-tools/
http://www.butleranalytics.com/15-web-scraping-tools/
Some of the applications examined there also work on macOS.

How can I summarize the updates to a table on a page I browse?

I am a student at a university. With the placement process going on, we have an internal placement website that shows updates and status about various companies I have applied to. Since the number of companies is so large, it becomes cumbersome to scroll through the complete list to find information. Sometimes I just miss some things. Now, to tackle this problem, here is what I want to do:
The data is in an HTML table. Each row shows information about one company: some dates, status (Not/Shortlisted/Applied), some yes/no options, etc., each in a different column. Once I open the page I want to be able to extract information about which companies I got shortlisted in, and in which ones I didn't make it.
What is the right technology to do this? I am thinking of writing a Greasemonkey user script (I have never actually written one, but how hard could it be?). What other options do I have?
Edit: I don't quite understand why this question has been voted to be closed.
I just described a use case for something general: on opening a web page, automatically extract information from the page and display it to the user. What is the easiest and sufficiently powerful way to achieve this?
Since you can't get access to the website's database, Greasemonkey would be your best automation approach. However, this task is likely to be over before you can get a decent script up from scratch.
Your best practical approach is to save the pages and/or copy and summarize the data in MS Excel, or equivalent.
~~~~~~~~~
Here at SO, we will not develop any but the simplest Greasemonkey scripts for you from scratch (unless they are fun somehow ;) ). But you can sometimes get such help in the "Script requests forum" at userscripts.org.
In order for someone to help you, they will need:
A clear idea of exactly what data gets manipulated, and how.
Access to the target site. Or access to saved snapshots of the target pages. GM scripts are extremely dependent on the details of the target page.
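That said, for reference, here is a minimal Greasemonkey-style sketch of the kind of thing being asked for; the @match URL, the status column index, and the "Shortlisted" text are all assumptions about the placement site and would need to match the real page:

// ==UserScript==
// @name   Placement status summary
// @match  https://placement.example.edu/*
// @grant  none
// ==/UserScript==
// The @match URL, column index, and status text below are assumptions.

(function () {
  const shortlisted = [];
  document.querySelectorAll('table tr').forEach(row => {
    const cells = row.querySelectorAll('td');
    // Assume the company name is in the first cell and the status in the third.
    if (cells.length > 2 && cells[2].textContent.trim().toLowerCase() === 'shortlisted') {
      shortlisted.push(cells[0].textContent.trim());
    }
  });

  // Show the summary in a simple box at the top of the page.
  const box = document.createElement('div');
  box.style.cssText = 'background:#ffc; border:1px solid #cc0; padding:8px; margin:8px;';
  box.textContent = shortlisted.length
    ? 'Shortlisted in: ' + shortlisted.join(', ')
    : 'No shortlists found.';
  document.body.prepend(box);
})();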
"other option":
ctrl + F
enter shortlisted
enter
ctrl + G <--repeat last search

How can I improve the subjective speed of my application?

Today my co-worker noticed that adding a decimal place to a progress indicator leads to the impression that the program is running faster than without it (i.e. instead of 1, 2, 3... it shows 1, 1.2, 1.4, 1.6, ...). I checked it and I was surprised that I got the same impression even though I knew it was faked.
That makes me wonder: What other things are there to create the impression of a fast application?
Of course the best way is to actually make the application faster, but from an algorithmic point of view often there's not much you can do. Additionally, I think making a user less frustrated is a good thing, even though it is more or less a psychological trick.
This effect can be very dramatic: doing relatively large amounts of work to give users a correct and often updating status of progress can of course slow down the actual running time of the application (screen updates, progress display needed calculations, etc) while still giving the user the feeling it takes less time.
Some of the things you could do in GUIs:
make sure your application remains responsive (resizing the forms remains possible, perhaps give a cancel button for the operation?) while background processing is occurring
be very consistent in showing status messages/hourglass cursors throughout the application
if you have something updating during an operation, make sure it updates often (like the almost ridiculous showing of filenames and registry keys during an install), or make sure there's an option to make it do this for users that like this behavior
Present some intermediate, interesting results first. "We've found 2,359 zetuyls matching your request, we're just calculating their future value".
I've seen transport reservation systems do that sort of thing quite nicely.
Showing details (such as the names of files being copied in an installation process) can often make things seem like they're going faster because there's constant, noticeable activity (as opposed to a slowly-creeping progress bar).
If your algorithm is such that it generates a list of results, and you have some way of displaying results as they're generated (as opposed to all at once at the end), do so - the sooner the user has something else to look at besides a spinner, the better.
Allow the user to do something else while your application is processing data or waiting for a result. Within the application's scope you could allow some refinement of the search query, or collect information for preparing the next steps. Or just present some other "work" that needs doing, or some hints, documentation, statistics, entertainment...
Use one of those animated progress bars which look like they are doing something even when they aren't progressing. Also, as peSHIr said - print each filename that you copy and update it really fast - you could even fake it by cycling through a large string array N times a second.
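A tiny sketch of that trick, assuming a list of already-known filenames and a status label; the names, element id, and update rate are arbitrary:

// Cycle a status label through filenames at a fast, fixed rate to create
// the impression of constant activity, independent of the real work's pace.
const files = ['config.xml', 'data_001.bin', 'data_002.bin', 'index.db'];   // illustrative
const label = document.querySelector('#status');

let i = 0;
const ticker = setInterval(() => {
  label.textContent = 'Processing ' + files[i % files.length] + '…';
  i++;
}, 50);   // roughly 20 updates per second

// When the real work finishes:
// clearInterval(ticker); label.textContent = 'Done.';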
I've read somewhere that if the process seems to be speeding up, it seems to be faster than when it's progressing at a steady pace. I can't find the reference right now, but it should be simple to implement.
(10 minutes later...)
A further look down Google lane unearthed the following references:
http://www.azarask.in/blog/post/hacking-memory/
http://blogs.msdn.com/time/
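A minimal sketch of that "speeding up" idea: map the real progress through a convex curve so the displayed bar moves slowly at first and faster toward the end, while still reaching 100% at the same moment (the exponent is just a tuning knob, and the element id is hypothetical):

// Displayed progress = real^2: it grows slowly early on and quickly near
// the end, so the bar appears to accelerate, yet both reach 1.0 together.
function displayedProgress(realFraction) {      // realFraction in [0, 1]
  return Math.pow(realFraction, 2);
}

const bar = document.querySelector('#progress');   // a <progress> element with max="1"
function onProgress(realFraction) {
  bar.value = displayedProgress(realFraction);
}
// e.g. real 25% shows as ~6%, real 50% as 25%, real 90% as 81%, real 100% as 100%.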
Here is an article about "Expressing time in your UI" and user perception of time. I do not know if it is exactly what you expect as an answer, but it is definitely worth the read.
Add a thread sleep at critical points. With each passing version, reduce the delay.

Advice on GUI layout for security product

We've developed a security product which identifies certain types of unauthorized traffic on a network. The interface for displaying the messages is a Java Servlet generated page.
At this point, the page is a glorified console log. There is a big text box with lines of text added as warnings and messages are generated. A couple of cool features: the page is updated automatically using reverse Ajax (DWR), and the latest messages go to the top of the display.
Is there a way to make it look cooler? Also, we would like to somehow highlight or otherwise emphasize certain more serious warnings.
Any thoughts are most welcome.
Well, you'll want some type of filtering system. Allow users to create filters to filter out certain messages (to ignore, highlight, etc.).
Advanced searching would be useful as well.
Add mouse interactions by letting users click words and search from there, or something similar.
Just my $.02.
Use established GUI ideas, particularly from AV suites. If you've any way of grading the analyses (from a 'good' state, through 'moderate' risk/danger, to 'high risk'), then use some form of colour to denote the grading. Ideally, and dependent on the increments, use something akin to
.all-well    { background-color: #0c0; }  /* not using #0f0 because it's a little too bright, for me */
.slight-risk { background-color: #f90; }
.danger-will-robinson { background-color: #c00; }  /* again #f00 is just too much for my eyes */
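Building on that, a small sketch of how the pushed messages could pick up those classes on the client side; the keyword-to-class mapping and the container id are assumptions:

// Assign one of the CSS classes above to each incoming log line based on
// simple keyword matching; call this for every message the server pushes.
function addMessage(text) {
  const line = document.createElement('div');
  line.textContent = text;
  if (/critical|intrusion/i.test(text))      line.className = 'danger-will-robinson';
  else if (/warning|suspicious/i.test(text)) line.className = 'slight-risk';
  else                                       line.className = 'all-well';
  const log = document.querySelector('#message-log');   // hypothetical message container
  log.insertBefore(line, log.firstChild);                // newest message on top
}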
Without details or screenshots of your app it's hard to suggest anything that isn't as basic and generic as above, sorry. If you add more info, I'll try to offer better suggestions.
I don’t know about “cool,” but a functional layout depends on how the users use the information. Here are some random suggestions, the appropriateness of each depending on your users, tasks, and limitations of the technical environment:
A table of messages, like your console, but with separate fields to allow the user to sort, query, filter, and scan the messages on various criteria (e.g., timestamp, IP address). This will allow users to group messages together in order to see patterns that indicate a single problem.
A table of incidents, where your app intelligently groups related messages into a single incident (e.g., a single intrusion) for the users, making the task much more manageable if there are a zillion messages. Users can click or double-click on an incident to see more details (e.g., lists of related messages).
A diagram of the network, with components highlighted or otherwise graphically coded if they have associated messages (or incidents). This may allow users to see relations among messages/incidents based on network location. Users can also intervene directly by interacting with the network through the diagram.
Whatever the layout, a means to “replay” a time period so the user can see with animation how an incident develops, and trace incidents back in time to their origin.
These options can be combined of course to support different tasks.
For highlighting more serious messages, it’s hard to beat color-coding (hue) for making certain things jump out from the crowd. However, you should redundantly code at least one other graphic attribute for accessibility and B&W printing purposes. I’d suggest brightness (e.g., white, amber, red for increasing levels of severity), size (especially if you can quantify the seriousness), or number (one to three exclamation marks with increasing severity). Incorporate this coding into a sortable field so users can sort by severity as well as by other fields. See http://www.zuschlogin.com/?p=51 for more.
Since your users appear to be network administrators, I’d focus on the professional IT versions of AV suites for other ideas, rather than AV programs for consumers/end users, who have very different issues and knowledge levels.