How to take a screenshot of a rendered HTML page - html

Our web analytics package includes detailed information about a user's activity within a page, and we show (click/scroll/interaction) visualizations in an overlay atop the web page. Currently this is an IFrame containing a live rendering of the page.
Since pages change over time, older data no longer corresponds to the current layout of the page. We would like to run a spider to occasionally take snapshots of the pages, allowing us to maintain a record of interactions with various versions of the page.
We have a working implementation of this (Linux), but the snapshot process is a hideous Python/JavaScript/HTML hack which opens a Firefox window, screenshotting and scrolling and merging and saving to a file. This requires us to install the X stack on our normally headless servers, and takes over a minute per page.
We would prefer a headless implementation with performance closer to that of the rendering time in a regular web browser, but haven't found anything.
There's some movement towards building something using Mozilla source as a starting point, but that seems like overkill to me, as well as a maintenance nightmare if we try to keep it up to date.
Suggestions?

An article on Digital Inspiration points towards CutyCapt, which is cross-platform and uses the WebKit rendering engine, as well as IECapt, which uses the present IE rendering engine and requires Windows, natch. Nothing comes to mind off the top of my head that uses Gecko, Firefox's rendering engine.
I doubt you're going to be able to get away from X, however. Since CutyCapt requires Qt, it requires either X or a Windows installation. And, similarly, IECapt will require Windows (or Wine, if you want to try to run it under Linux, and then you're back to needing X). I doubt you'll be able to find a rendering engine which doesn't require Qt, GTK, GDI, or Cocoa, and therefore a full install of display libraries.
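That said, the Puppeteer tool mentioned in a later question on this page drives headless Chromium and sidesteps the X requirement entirely. A minimal sketch of a full-page snapshot; the URL and output filename are placeholders, not anything from the question:

```typescript
// Minimal full-page snapshot sketch with Puppeteer (npm install puppeteer).
// The URL and output filename below are placeholders.
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch(); // headless by default; no X stack needed
  const page = await browser.newPage();
  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  await page.screenshot({ path: 'snapshot.png', fullPage: true }); // whole scroll height in one shot
  await browser.close();
})();
```

The fullPage option captures the entire scroll height in one go, which would replace the screenshot/scroll/merge dance described in the question.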

Why not store the HTML that is sent out to the client? You could then redisplay it in a web browser to show what the page looked like.
Using your web analytics data about user actions, you could then set the combo boxes, fields, etc. to the values the client would have had, and even change the CSS on buttons to mark them as being pushed.
As a benefit, you don't need the X stack, and you don't need to do any crawling or storing of images.
EDIT (Re Andrew Moore):
This is where you store the current CSS/images under a version number. Place an easily parsable version number in a comment in the HTML. If you change your CSS/images but keep the existing names, increment the version number in the HTML output sent out.
The system that stores the HTML will then know that it needs to grab a new copy of the assets and store them under a new number. When redisplaying, it simply uses the version number to determine which CSS/image set to use.
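As a sketch of that convention (the comment format and function names here are made up, just one way to do it):

```typescript
// Sketch of the version-comment idea; the comment format is hypothetical.
const ASSET_VERSION = 7; // bump whenever CSS/images change under the same names

// When emitting a page, stamp the current asset version into the HTML.
function stampVersion(html: string): string {
  return html.replace('<head>', `<head><!-- asset-version: ${ASSET_VERSION} -->`);
}

// When the archiving system stores a page, parse the stamp back out so it
// knows which CSS/image set this snapshot belongs to.
function readVersion(html: string): number {
  const m = html.match(/<!--\s*asset-version:\s*(\d+)\s*-->/);
  return m ? Number(m[1]) : 0; // 0 = pre-versioning markup
}
```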
We currently have a system here that uses a very similar scheme so we can track users' actions and provide better support when they call our help desk: support staff can bring up the user's session and follow what they did, even somewhat live.
You can even code it to auto-censor sensitive fields when the session is stored.

Depending on the specifics of your needs, perhaps you could get away with using one of the many free webpage thumbnail services? Snapcasa, for example, lets you generate thousands per month at no charge and with no advertising. (I've never used it; I just googled 'free thumbnail service' to find it.)
Just a thought.

Download complete website via CLI

I'm trying to reduce the size of my website, but to do that I need a reliable tool to measure the size of my pages.
I used to use Google Lighthouse; in the performance audits it reports the size, but it's not precise, and it's inconsistent with the network tab.
I tried several combinations of curl, but I can't make it crawl the website correctly.
I tried several combinations of wget, but it couldn't correctly handle gzip or brotli encoding.
I came to the conclusion that wget and curl are not the right tools, because they don't evaluate JS, so they can't handle conditional loading of assets.
I'm trying now with Puppeteer and PhantomJS, but I still haven't managed to do it.
Does anyone have a good solution for this?
How to Measure Size
Web browsers make a lot of decisions about what to download based on their particular context (for example, which compression algorithms they support). It's difficult to replicate those conditions in an external tool such as curl, so you'll want to use a tool that thinks like a browser (or is a browser).
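Since you already have Puppeteer in play, the gist there is to load the page once and total up what the browser actually fetched. A rough sketch (note it sums decoded bodies, so it over-counts bytes on the wire):

```typescript
// Rough page-weight sketch with Puppeteer: visit the page once and sum the
// body sizes of every response the browser fetched while loading it.
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  let totalBytes = 0;
  page.on('response', async (response) => {
    try {
      const body = await response.buffer(); // decoded body, i.e. after gzip/brotli
      totalBytes += body.length;
    } catch {
      // redirects and some cached responses have no retrievable body
    }
  });

  await page.goto('https://example.com', { waitUntil: 'networkidle0' });
  console.log(`~${(totalBytes / 1024).toFixed(1)} KiB of decoded content`);
  await browser.close();
})();
```

This only measures one visit in one context, which is exactly the limitation described next.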
The server can also choose to send different content based on visitor information (user agent, whether they're logged in, geolocation, etc.) or even completely arbitrary conditions (like a randomized image). So you'll want to look at more than one sample, preferably from many user agents and locations.
Most tools don't provide that kind of power.
The closest thing I can suggest is WebPageTest. It uses an actual web browser to visit your site and reports an analysis of that visit, including total page weight (even broken down by different page events). WebPageTest can be driven as an API and even run locally. Output is available as JSON, so you can parse it and do custom reporting with CLI apps.
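A sketch of driving it from a script, assuming you have a WebPageTest API key; the result field names here are from memory, so verify them against the JSON your instance actually returns:

```typescript
// Sketch of the WebPageTest REST API from a script (needs Node 18+ for fetch).
// Assumes an API key; result fields should be verified against real output.
const API_KEY = 'your-key-here'; // placeholder
const target = encodeURIComponent('https://example.com');

async function run() {
  // Submit the test and ask for a JSON response.
  const submit = await fetch(
    `https://www.webpagetest.org/runtest.php?url=${target}&k=${API_KEY}&f=json`
  );
  const { data } = await submit.json();

  // Poll the result URL until the test finishes (statusCode 200 = done).
  for (;;) {
    const res = await (await fetch(data.jsonUrl)).json();
    if (res.statusCode === 200) {
      console.log('bytes in (first view):', res.data.runs['1'].firstView.bytesIn);
      return;
    }
    await new Promise((r) => setTimeout(r, 10_000)); // wait 10s between polls
  }
}

run();
```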
How to Speed Up a Website
The technical question of "weighing" a website aside, there's a broader problem you're trying to tackle: how to speed up your website. There is a lot of information available for performance optimization.
Specifically, there's a lot of discussion about which metrics should be considered when evaluating a page's performance, how much weight should be given to each metric, and how to use that information to prioritize optimizations.
When considering page weight, I would highly recommend breaking it down by how many bytes are necessary to accomplish certain tasks. Google recommends thinking about resources in terms of the critical rendering path - the HTML, blocking JS, and non-deferred CSS necessary to construct a web page.
You may have a 1MB page where render-critical assets only make up 10KB of the page - that's a very fast site. Or you may have a 1MB page where 500KB are required for an initial render - not so fast. WebPageTest helps break down those weights by event for you.
I wish I could give more technical detail about using WebPageTest with CLI tools. It's something I plan to explore soon. But for now, hopefully this will give you a good start.
Have you tried PageSpeed Insights?
Analyze your website and read up on the optimization guidelines.

Can I make CNC Editor with HTML5?

I would like to make my own CNC Editor.
I am looking for some guidance. I don't know if it is even possible with HTML5, but it would be great if it is. If possible, please list what I should research and learn.
I don’t need it to be online accessible, I will only have it on my computer. I will be accessing it via local network from multiple different computers. I don’t want it accessing the internet, because it’s not always available.
Desired Features:
⁃ Read and Write files with different extensions (all files used are easily opened in notepad)
⁃ Store and retrieve data from a simple database file.
⁃ Make calculations
⁃ Have a text Editor window
⁃ Have a Display area for simple vector graphics depending on data loaded and provided by user.
It is possible, but it requires a lot of work. I would say these are the technologies you would need to master in order to pull this off:
Node.js (with Express) - for storing and retrieving data from the database and for reading/writing local files with the extensions you want (server-side); see the sketch after this list
Vue.js, Angular, or React - for building the frontend interface to manipulate your vector graphics; it can also do the calculations, and it's good with SVGs and that kind of thing
Electron (not mandatory) - wraps it all in a native-app-like experience; this framework gives you the ability to write desktop apps for any OS and architecture
So, as I said, it would be a lot of work, but it's possible in the end.
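For the read/write-files part specifically, a minimal Express sketch might look like the following; the route names, storage folder, and port are examples, not a prescription:

```typescript
// Minimal sketch of the server-side file piece with Express.
// Route names, the storage folder, and the port are examples only.
import express from 'express';
import fs from 'fs/promises';
import path from 'path';

const app = express();
app.use(express.text({ type: '*/*' })); // CNC files are plain text per the question

const STORE = path.resolve('programs'); // local folder served over the LAN

// Read a program file so the browser-based editor can display it.
app.get('/programs/:name', async (req, res) => {
  const file = path.join(STORE, path.basename(req.params.name)); // basename blocks path traversal
  try {
    res.type('text/plain').send(await fs.readFile(file, 'utf8'));
  } catch {
    res.sendStatus(404);
  }
});

// Write the edited program back to disk.
app.put('/programs/:name', async (req, res) => {
  await fs.mkdir(STORE, { recursive: true });
  await fs.writeFile(path.join(STORE, path.basename(req.params.name)), req.body, 'utf8');
  res.sendStatus(204);
});

app.listen(3000); // reachable from other machines on the local network
```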
A funny coincidence is that my brother is planning to build a CNC machine, so I might be doing this as well in the next couple of months. Feel free to contact me if you need any further help!
UPDATE: You can't do it with just HTML5. It would be like trying to build a wooden space shuttle.

windows tool to view website client content without browser

Per the title, I am looking for a tool, or some initiative already undertaken by other developers, that simply grabs data off of websites so one can navigate them without looking at them in a browser. I am fully aware of how most pages work, so what I would like to do is just look at the data being pulled from them, using Windows technology that has (hopefully) already been written. Does this make sense? Here is an example of what I would like to see in a tool:
a Windows interface that gives me data about a webpage (menus, submenus, button names/captions, etc.)
be able to execute transactions on those pages by specifying what to do through the tool's interface (click a button, download an image, etc.)
does anyone know of a tool out there to do such things?
The closest "program" that comes to mind is
WWW::Mechanize
Advertised as
Handy web browsing in a Perl object
This can in fact be used on Windows, however you
will need Perl.
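A small taste of what it looks like in use; the URL and the form field name below are invented for illustration:

```perl
# Sketch of driving a page with WWW::Mechanize; the URL and
# form field name below are invented for illustration.
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://example.com');

# Inspect the page's links ("menus, submenus, button names") as data.
for my $link ($mech->links) {
    printf "%s -> %s\n", $link->text // '', $link->url;
}

# Execute a transaction: fill in a form field and submit it.
$mech->submit_form(
    with_fields => { q => 'search term' },  # hypothetical field name
);
print $mech->content;
```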

How can web pages be coded so that search engines assign a higher page rank to the latest version?

I frequently use Google to search for .NET documentation, and invariably, the highest ranked pages are for old versions of the .NET framework.
For example, I just did a Google search for "c# extern".
The first result was for Visual Studio 2005.
The second result was for Visual Studio .NET 2003.
I went through several pages and never did come across the Visual Studio 2010 page.
Interestingly, I tried the same search on Bing, Microsoft's own search engine, and Visual Studio 2005 was still the first hit. However, the second hit was the one I was looking for (Visual Studio 2010).
I realize that many documentation pages on MSDN have a menu at the top that allows you to switch versions, but I don't think it should be necessary to do this. There should be an HTML way to convince search engines that two pages are very similar, and one is newer/more relevant than the other.
Is there anything that can be done in HTML to force a documentation page for a more recent version to get a higher page rank than an essentially equivalent page for an older version?
You can't tell Google which page is preferred
(That's basically the answer to your question)
If someone googles c# extern, that person will get the most relevant pages as calculated by Google's algorithms. The results will differ from user to user and by location, but they depend above all on how links all over the internet are directed. You cannot change this with on-page optimization.
The canonical addresses mentioned by Wander Nauta are not supposed to be used in this manner. We use canonical addresses basically when we wish to tell Google, or any other bot, that two or more pages are the same. This is not what you were asking for: it would remove the older versions from indexing entirely, in favor of the page addressed as the canonical address.
Quoted from http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Of all these pages with identical content, this page is the most useful. Please prioritize it in search results.
...
The rel="canonical" attribute should be used only to specify the preferred version of many pages with identical content...
To steer the client right I would use, as you already described, a good web interface on the page so that the client can easily find what he or she is looking for.
Google also offers sitelinks for your search results, which may or may not appear. I would say this is where you come closest to being able to direct your clients to the most relevant page, by your standards, from the search page.
Quoted from https://support.google.com/webmasters/bin/answer.py?hl=en&answer=47334
...sitelinks, are meant to help users navigate your site. Our systems analyze the link structure of your site to find shortcuts that will save users time and allow them to quickly find the information they're looking for.
In Google's Webmaster Tools you have an option to optimize these links, at least somewhat.
Quoted from Google's Webmaster Tools
Sitelinks are automatically generated links that may appear under your site's search results. ...
If you don't want a page to appear as a sitelink, you can demote it.
Update
You could theoretically specify which version something on your page is with "microdata" or similar. By doing this you have at least told the bots that there are two items on the site with the same name but in different versions. I don't think this will have any effect on the order in which your pages are listed in the search results, though. But we never know what the future holds, right?
If you check out schema.org, you'll see that CreativeWork has a property named "version" and SoftwareApplication has one named "softwareVersion".
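For example, marking two versions of the same document with the CreativeWork "version" property might look like this (purely illustrative markup):

```html
<!-- Illustrative only: two versions of the same article, distinguished
     with the schema.org "version" property via microdata. -->
<article itemscope itemtype="https://schema.org/CreativeWork">
  <h1 itemprop="name">C# extern (Visual Studio 2010)</h1>
  <meta itemprop="version" content="2010">
</article>

<article itemscope itemtype="https://schema.org/CreativeWork">
  <h1 itemprop="name">C# extern (Visual Studio 2005)</h1>
  <meta itemprop="version" content="2005">
</article>
```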
Google uses microdata to create rich snippets. I haven't heard that Google uses it for anything else, but that of course doesn't mean it isn't so.
Google allows you to specify a canonical address for a specific resource, i.e. the version of a given page you want Google to prioritize. It's pretty easy to use.
However, hints like these are always suggestions. That is, the search engine is free to ignore them, if they support them at all.
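For reference, it's a single link element in the head of each non-preferred page, pointing at the version you want prioritized; the URL here is a placeholder:

```html
<!-- In the <head> of the older page; the href is a placeholder. -->
<link rel="canonical" href="https://example.com/library/vs2010/csharp-extern">
```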
For that you would need to know the actual algorithms. I'm guessing that most search engines score how well the page matches the search, but then also take into account the number of hits a site gets. So say you have a 98% match with 1,000 hits and a 96% match with 5,000 hits; the second page may still be ranked higher.
As for what you can do: search engines are "blind", so use CSS and avoid tables for layout purposes. This will allow the engine to get a better match with your content. As a workaround for the old versions, you could redirect incoming traffic to the new version and then have a link to the old version on that page, essentially setting it up so that only following that link takes you to the old page.
Since at its core Google's search is based on links (the PageRank algorithm), it would certainly help if each page of the old version linked to its respective page in the new version. This might not solve the problem completely, but it would certainly help.

Adding text / input box rendering to Access for a guided user interface experience

For software used in a call centre guiding agents through a set script they must follow while on telephone calls, with the script branching depending on the answers given: my system uses an MS Access / VBA front end (it isn't web based, due to speed and phone integration), and 'call scripting' is coded in VBA when needed. But what if I want to move to a more complete solution?
Is hosting an HTML / MS web browser control the obvious platform to build call scripting on?
A manager view will also be needed, allowing managers to create scripts, divide them into parts, specify the routing rules that determine the path through the script, link input boxes (i.e. question answers) back to database fields, and specify validation rules as well.
Thinking about the complexities of building the manager view that translates the intended script into HTML/JavaScript: is creating my own simple document-description language, with tags for just the features I need and a 'rendering engine' in VBA, a solution you might consider for this?
I've thought about creating scripts out of standard Access controls, using a relational table structure to hold the information about which controls relate to which parts of scripts, validation, routing options, etc., but due to Access's lack of runtime control creation, I think this will be more painful than a rendering engine that takes a script written in my own document-description language and displays it.
What suggestions have you for the implementation of this?
The actual user interface requirements of the user-facing part of your system seem to be pretty minimal (ask question, get answer, branch to "next" question). I don't think there's any "obvious" platform to use. As always, a browser based system makes it easier for geographically scattered users to use a centralized system but will probably cost more in terms of development.
The manager-facing part of the application is more interesting and for that I'd probably suggest a desktop application rather than a browser based one. I can see this relying on a lot of drag-and-drop and line-drawing type functionality and that kind of stuff is still easier to do on the desktop, at least in my opinion.
Assuming that you can clearly define the kinds of question routing and decision points that a script has, and these decisions are relatively simple, I probably would write your own specification for how a survey is represented. The manager-facing application would create, edit, and save a specification and the user-facing application would read and step through one.
Given my personal skill set, I'd probably write both applications in Delphi, develop an XML format to represent the survey specification, and consider either XML or relational storage for the back end depending on what you actually wind up doing with the data.
I'm inclined to agree that a web interface is optimal for something like this, however WPF is a great alternative as well with many of the advantages of a web interface along with the power of a desktop application. Both web and WPF would give you considerable amount of control over how the application looks and feels, leveraged with all the power of the .NET framework. One drawback of a web app is that you have less ability to interact with the phone system directly, but I'm sure that's a problem that can be mitigated fairly easily with some AJAX. A big plus for the web platform option is that you'll have access to tons of client-side interactivity libraries like jQuery, which will allow you to polish the application with greater ease; with WPF you would likely find yourself paying for a lot of the fancy UI controls.
There are quite a few survey and question (test-creating) systems built in Access. I don't see any real issue in using Access as opposed to whatever other system.
The advantage of HTML or text based systems is they tend to support a more dynamic type of screens.
On the other hand, for questions and display of text, a great trick in Access is to place a subform control on a form, and then at runtime simply "set" which form is to be displayed in that subform (the SourceObject property). In fact, in Access 2010, the nav control does exactly that: it displays forms as a subform.
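In code, swapping which script step is shown becomes a one-liner; the control and form names here are made up:

```vba
' Swap which form is displayed inside the subform control at runtime.
' cmdNextStep, subScriptStep and frmStepBranchA are made-up names.
Private Sub cmdNextStep_Click()
    Me.subScriptStep.SourceObject = "frmStepBranchA"
End Sub
```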
Also, with Access 2010 we can create web-based applications. In this video you can see that at the halfway mark I switch to running the application in a standard browser.
http://www.youtube.com/watch?v=AU4mH0jPntI
However, the above means little here, as I'm not sure what you mean by some type of rendering engine. Each question + response is simply going to be some text on a screen, and thus you can simply display/change that text by changing the underlying recordset.
And if you want nicely formatted text with different fonts etc., Access 2007 now has support for rich text (markup text). So I don't think you really need dynamic screen creation. Between changing the record source to display whatever text you want, and being able to display different forms (templates) on the fly by changing the subform's SourceObject, you can easily change part of your screen to display different text boxes with very little code.
On the other hand, if you have all of the .NET tools and want to create a browser-based application, then you are free to do so. I suppose you could also wait for Access 2010 to create a browser application.
If you're willing to keep this simple, then Access is a great choice. If you need a browser-based application, then I don't think Access is the choice here.