Chrome Lighthouse CLS audit providing significantly different results for different users - google-chrome

In preparation for the next Core Web Vitals update, I am working on decreasing CLS across my company's site. When I run the Lighthouse audit in Chrome, the page I've updated gets a 0.05, under the 0.1 maximum Google recommends.
However, when my company's SEO specialist runs an audit, she gets scores > 0.3 on the same page. I thought maybe it could be a speed issue, but throttling my internet speed didn't change my results at all (at least where CLS is concerned). For that matter, the audit on her end is calling out high-level parent div elements as causing high CLS, rather than specific elements within the parent div.
Is there any reason why her scores would be so much worse than mine on the same page? Are there any known issues with CLS testing in Lighthouse I should be aware of?

If two different people are getting two different results, there are numerous things to check.
The first is to check that you are both testing at the same display size (this applies to desktop tests; all mobile tests use the same emulated size), as otherwise one of you may see CLS at a breakpoint that does not occur at the other's, or items that contribute to CLS may be on screen on one device and not the other.
The next thing to check is that the two computers have similar CPU power. Lighthouse applies a 4x CPU slowdown when running the mobile test, so an i3 versus an i9 can make quite a difference!
There is a useful document that explains where variability can come from when testing with Lighthouse.
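One way to rule out environment differences is for both of you to run Lighthouse programmatically with the emulation and throttling pinned to the same values. Below is a rough sketch, assuming the lighthouse and chrome-launcher npm packages in a Node ES module; the URL, screen size and slowdown multiplier are placeholder values you would both agree on:

import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });

// Pin the settings that most often differ between machines:
// form factor, screen emulation, and CPU throttling.
const result = await lighthouse('https://example.com/page-under-test', {
  port: chrome.port,
  onlyCategories: ['performance'],
}, {
  extends: 'lighthouse:default',
  settings: {
    formFactor: 'desktop',
    screenEmulation: { mobile: false, width: 1350, height: 940, deviceScaleFactor: 1, disabled: false },
    throttling: { cpuSlowdownMultiplier: 4 },
  },
});

console.log('CLS:', result?.lhr.audits['cumulative-layout-shift'].numericValue);
await chrome.kill();

If you both run something like this and still see different CLS numbers, the difference is more likely coming from the page itself (for example, responses that vary per user or session) than from the test environment.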

Related

CSS media queries: one file vs. separate files and impact on loading speed

For a site I currently use style.css and a bunch of other stylesheets, 960.css, etc., loaded like this:
<link rel="stylesheet" media="screen" href="css/style.css">
<link rel="stylesheet" media="only screen and (max-width: 960px)" href="css/960.css">
....
Now I am concerned about speed. I know I could combine the files into one big file, but that would also mean downloading irrelevant data.
So basically, my question is: What is a better approach, minimizing the amount of requests, or minimizing the amount of data passed to one user?
On a reasonable-speed link, the latency and overhead involved in the extra request will probably outweigh the gain from not downloading a small amount of (hopefully minified and gzipped) text that is nonetheless not required for that user to display the page at that resolution. See Ilya Grigorik's excellent post on latency for more details on how this is proving to be the primary performance constraint for many users.
The latency cost of the extra request will be especially significant for users on mobile devices (which power down their radios when not in use), and even more so on 2G or 3G connections, which have a relatively high cost for establishing connections (4G apparently improves substantially on this).
The key, as with all these things, is to test and measure - but I would almost certainly expect that bundling the styles would prove faster for your users. Don't forget each valid stylesheet (where the media query evaluates to true) will block the rendering of the page.
It is also worth noting that Ilya (who works for Google, so should know) cites that WebKit will still download stylesheets whose media queries evaluate to false, albeit with a low priority and in a non-blocking manner.
if the media query evaluates to false then the stylesheet is marked as NonBlocking and is given a very low download priority
and
The only caveat, as Scott indicates, is that the browser will download all enabled stylesheets, even though the screen on your device may not ever exceed the [cited] width
Looking briefly at the WebKit source, it does seem like this still happens, presumably to allow an instant response to screen rotation or window resizing.
// Load stylesheets that are not needed for the rendering immediately with low priority.
ResourceLoadPriority priority = isActive ? ResourceLoadPriorityUnresolved : ResourceLoadPriorityVeryLow;
CachedResourceRequest request(ResourceRequest(document().completeURL(url)), charset, priority);
request.setInitiator(this);
m_cachedSheet = document().cachedResourceLoader()->requestCSSStyleSheet(request);
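If you want to confirm this behaviour on your own pages rather than reading browser source, the Resource Timing API will show whether a non-matching stylesheet was fetched and when. A minimal TypeScript sketch (filtering on '.css' is just an assumption about how your URLs look):

// List every stylesheet the browser actually downloaded, with timings.
// Stylesheets whose media query is false should still appear here,
// just requested with lower priority.
const sheets = (performance.getEntriesByType('resource') as PerformanceResourceTiming[])
  .filter((entry) => entry.initiatorType === 'link' && entry.name.includes('.css'));

for (const entry of sheets) {
  console.log(entry.name,
    'start:', entry.startTime.toFixed(0), 'ms,',
    'duration:', entry.duration.toFixed(0), 'ms');
}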
For questions like this I can highly recommend High Performance Browser Networking which you can read online for free.
It is basically about the performance of your system.
Even on mobile devices, the best approach would be to minimize the number of requests, because of a (possibly) slow network connection and (possibly) slow resource handling.
If, in addition, your page runs in e.g. a Cordova context, this approach would definitely be the way to go, because the resources are installed directly on the device: multiple files => multiple handles => slower performance.
As for minimizing the amount of data passed to the user: IMHO the amount is much the same either way, because the link tag will still request the CSS file from the server and download and parse it. IMHO there is no relevant performance issue here. What you can do is generate a "non-redundant" CSS file, but that's not really pretty :)
So basically, my question is: What is a better approach, minimizing the amount of requests, or minimizing the amount of data passed to one user?
I would say both.
Note that the requests will be cached by the browser, so for returning visitors your concerns are largely irrelevant. In general, less data = quicker download times. The best approach would be to serve the minimal amount of data a given user needs.

HTML localStorage setItem and getItem performance near 5MB limit?

I was building out a little project that made use of HTML localStorage. While I was nowhere close to the 5MB limit for localStorage, I decided to do a stress test anyway.
Essentially, I loaded data objects into a single localStorage object until it was just slightly under that limit, and made requests to set and get various items.
I then timed the execution of setItem and getItem informally using the JavaScript Date object and event handlers (bound get and set to buttons in HTML and just clicked =P).
The performance was horrendous, with requests taking between 600ms and 5,000ms, and memory usage coming close to 200MB in the worse cases. This was in Google Chrome with a single extension (Google Speed Tracer), on Mac OS X.
In Safari, it's basically >4,000ms all the time.
Firefox was a surprise, having pretty much nothing over 150ms.
These were all done with the machine basically in an idle state - no YouTube (Flash) getting in the way, not many tabs (nothing but Gmail), and no applications open other than background processes + the browser. Once a memory-intensive task popped up, localStorage slowed down proportionately as well. FWIW, I'm running a late 2008 Mac -> 2.0GHz Duo Core with 2GB DDR3 RAM.
===
So the questions:
Has anyone done a benchmarking of sorts against localStorage get and set for various different key and value sizes, and on different browsers?
I'm assuming the large variance in latency and memory usage between Firefox and the rest is a Gecko vs WebKit issue. I know that the answer could be found by diving into those code bases, but I'd definitely like to know if anyone can explain the relevant details of the localStorage implementations in these two engines that account for the massive difference in efficiency and latency across browsers.
Unfortunately, I doubt we'll be able to get to solving it, but the closest one can get is at least understanding the limitations of the browser in its current state.
Thanks!
Browser and version become a major issue here. The thing is, while there are so-called "WebKit-based" browsers, they add their own patches as well. Sometimes these make it into the main WebKit repository, sometimes they do not. As for versions, browsers are always moving targets, so this benchmark could be completely different if you use a beta or nightly build.
Then there is the overall use case. If your use case is not the norm, the issues will not be as apparent, and they're less likely to get noticed and addressed. Even if there are patches, browser vendors have a lot of issues to address, so there is a chance the fix is slated for another build (and again, nightly builds might produce different results).
Honestly, the best course of action would be to discuss these results on the appropriate browser mailing list / forum if it hasn't been addressed already. People will be more likely to do testing and see if the results match.
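If you do take it to a mailing list, a small reproducible benchmark makes the comparison much easier. Here is a rough sketch using performance.now() rather than Date; the item sizes and counts are arbitrary, so scale them toward the limit (and reduce them if setItem throws a quota error):

// Fill localStorage with a configurable amount of data, then time get/set near the limit.
const ITEM_SIZE = 50 * 1024;   // characters per value (~100KB as UTF-16)
const ITEM_COUNT = 40;         // ~2M characters total; adjust toward your browser's quota
const payload = 'x'.repeat(ITEM_SIZE);

localStorage.clear();
for (let i = 0; i < ITEM_COUNT; i++) {
  localStorage.setItem(`bench-${i}`, payload);
}

// Time one more setItem while the store is nearly full.
let start = performance.now();
localStorage.setItem('bench-extra', payload);
console.log('setItem near limit:', (performance.now() - start).toFixed(2), 'ms');

// Time a getItem on an existing key.
start = performance.now();
localStorage.getItem('bench-20');
console.log('getItem near limit:', (performance.now() - start).toFixed(2), 'ms');

localStorage.clear();

Running the same script in Chrome, Safari and Firefox at least removes the informal button-clicking and Date-based timing as sources of variation.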

What is the best practice to write Selenium-based integration testing from zero for a complex application?

I am after some advice and pointers on integration testing for a web app. Our project has been running for a number of years, and it is reasonably complex. We are pretty well covered with unit tests, but we are missing a decent set of integration tests. We don't have documented use cases or even a reasonable set of test cases beyond our unit tests. 'Integration testing' today consists of the developer's knowledge of the likely impact of a change and manual, ad-hoc testing of the app. It is really not ideal - we now want to design and automate a solid set of tests to allow us to perform regression testing, and increase our confidence in the quality of the app.
We have finally built a platform (based on Selenium) to allow us to quickly author and automate the execution of the tests. The problem now: we don't have any tests, the page is well and truly blank. The system has around 30 classes which interact with each other and influence the UI. For a new user signing up, there are about 40 properties that can be set, each one impacting the experience. Over the user's lifetime they will generate even more states. Given so many variables and possible states, it is a daunting prospect to get started, which is probably why it has been neglected thus far.
The pain of not having a decent set of tests is now becoming destructive. I am dedicating time to get this problem fixed - I am after some practical advice on the authoring of the tests. How do you approach it? Do you have any links I may find useful? How can I stop my mind running away with the seemingly infinite number of states for a user's data? How can I flush out the edge cases which are failing (and which our users seem to be finding)?
If it is the sheer number of combinations that is holding you back from generating test cases, you should definitely take a look at all-pairs testing.
We have used PICT from Microsoft as a tool to successfully minimize the number of test cases while still being reasonably confident that most cases are covered.
the reasoning behind all-pairs testing is this: the simplest bugs in a program are generally triggered by a single input parameter. The next simplest category of bugs consists of those dependent on interactions between pairs of parameters, which can be caught with all-pairs testing. Bugs involving interactions between three or more parameters are progressively less common, whilst at the same time being progressively more expensive to find by exhaustive testing, which has as its limit the exhaustive testing of all possible inputs.
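To make the idea concrete, here is a toy sketch of pair coverage (an illustration of the concept only, not the PICT tool itself; the parameter names and values are hypothetical). It seeds each test case from a still-uncovered pair and fills the remaining parameters randomly, so every pair of values appears in at least one case while using far fewer cases than the full cartesian product:

// Hypothetical signup properties; a real model would list your ~40 properties.
const params: Record<string, string[]> = {
  accountType: ['free', 'trial', 'paid'],
  locale: ['en', 'de', 'jp'],
  newsletter: ['on', 'off'],
};

type TestCase = Record<string, string>;
const names = Object.keys(params);

// Every pair (paramA=valueA, paramB=valueB) that must appear in at least one test case.
const uncovered = new Set<string>();
for (let i = 0; i < names.length; i++) {
  for (let j = i + 1; j < names.length; j++) {
    for (const a of params[names[i]]) {
      for (const b of params[names[j]]) {
        uncovered.add(`${names[i]}=${a}|${names[j]}=${b}`);
      }
    }
  }
}

const cases: TestCase[] = [];
while (uncovered.size > 0) {
  // Seed a new case from one still-uncovered pair, fill the rest randomly.
  const [seed] = uncovered;
  const candidate: TestCase = {};
  for (const part of seed.split('|')) {
    const [name, value] = part.split('=');
    candidate[name] = value;
  }
  for (const name of names) {
    if (!(name in candidate)) {
      const values = params[name];
      candidate[name] = values[Math.floor(Math.random() * values.length)];
    }
  }
  cases.push(candidate);
  // Remove every pair this case happens to cover.
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      uncovered.delete(`${names[i]}=${candidate[names[i]]}|${names[j]}=${candidate[names[j]]}`);
    }
  }
}

console.log(`${cases.length} pairwise cases instead of ${3 * 3 * 2} full combinations`);
console.table(cases);

With three small parameters the saving is modest, but with dozens of signup properties it becomes dramatic, and a real tool like PICT produces near-minimal sets and also lets you add constraints for value combinations that cannot occur together.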

Website tactics: A question about how many SQL queries are enough

I've recently begun to unveil and slowly roll out a homemade CMS. The site allows a lot of customization, with movement towards internationalization and customization to a level that doesn't require touching source code. This is a personal project, and the entire intent was to see how far I can push my own programming limits (the question of distribution of a CMS that handles a blog, webcomic, and a small forum isn't one that I'm willing to consider, not until I clean it up and work on it some more -- as well, seeing as it's an amateur project, I doubt it has any gravity compared to other, more refined projects... but those are not topics that concern the question at hand).
I've instituted a bit of code that allows me to see how fast each page is generated and how many queries are run; on average I'm seeing somewhere between 9 and 13 MySQL queries performed per page. Average time to generate a page is somewhere between 10 and 20 ms. Now, not having any experience with professional design, what is the optimum that I should be striving for?
What are ways to reduce generation time (or, with an average of 15 ms/page, is this not even a concern)? And what are tactics for reducing the number of queries on a page where most of the content, including things like menu items, is loaded from a MySQL database?
Mind you, this is a very broad question; it isn't my intent to ask a general question or spark conversation, but to find out ways of reducing the load (if any) on a server that such a system could create.
Using a PHP opcode cache will dramatically cut down on the time taken to open and compile PHP scripts, by skipping the parsing and compilation into bytecode.
Turning on the MySQL query cache is generally (though not always) a good idea.
Rather than focusing on the number of queries, focus on reducing the time those queries take by optimising your queries. It is often much more efficient to have a larger number of small, optimised queries than to try and reduce the number of queries.
Use a profiler such as the one built into XDebug. Together with a viewer like KCacheGrind or WinCacheGrind, optimising code really helps when you know what to focus on. It's not worth optimising something that contributes only a negligible amount to your total execution time. It's worth getting to know what everything in *CacheGrind means.
My PHP content management system usually loads a page in about the same amount of time (down to a minimum of 8ms where everything is a cache hit). But very occasionally, when you do something complex, it may take over 500ms. When you are concerned about user experience the typical time is more important, not so much the outliers; but when you are concerned about server load the average time is more important, so those 500ms outliers are suddenly quite important.
If you are mainly developing these sites for small companies or other reasons where you don't predict or imagine a high traffic (i.e. nothing like Digg/Facebook/etc) then an average of 15ms should be fine.
May I ask what the 12 queries are for? I imagine they are for getting menu items, getting page content and the like. There are various methods of combining/optimising queries, so if you perhaps post a few, I (and other stackers) may be able to help you optimise them.
It depends... as it always does with performance questions. If the system currently meets your performance requirements, then don't worry too much.
Generally, if your page generation time is 15ms it will only be a fraction of the total click-to-glass time that the user experiences; see the Yahoo exceptional performance pages. There will be other things to look at in order to get the fastest possible page load time.
On the server, the chances are that the database is caching the results of nearly all, if not all, of the queries you are running, hence the very fast page timing. You might want to load up a larger set of data, if you haven't already done so, to test the app; you might find that performance degrades with the size of the data set.

How to prioritize bugs?

In my current company there isn't a clear understanding between the test and development teams as to how severe a bug should be. There are arguments which go back and forth about whether to reduce or increase the severity. We are not currently aware of any document which lays out the rules. The tester raises the bug and assigns a priority based on intuition. The developer then requests a change based on workload or some other factor.
How are severity/priority of bugs classified? Are there any standards which guide how software defect priorities need to be determined based on customer needs, timelines and other things?
Use priority levels that deliberately have nothing to do with severity or impact, and describe only the conceptual position of the bug in the schedule. This field will determine which bugs get worked on, so it will be a target for negotiation.
Use severity levels that deliberately have concrete, verifiable definitions not open to negotiation, that have nothing to do with scheduling or priority. I've worked successfully with the severity definitions used by the Debian BTS, generalised to apply to programming projects in general.
That way, the severity is much more a matter of verifiable fact, independent of a statement of priority. The priority is then free to be tweaked up and down by negotiation or whatever, without affecting the factual information in the severity field.
Attempting to conflate both “severity” and “priority” into a single field will lead to soul-draining arguments and wasted time. The bug reporter needs a firm guide of fact to determine how “bad” the bug is, and this needs to be easily agreed on by independent parties. The priority, on the other hand, is the correct target for negotiation and scheduling games.
I work on emergency control centre systems, so this set of bug levels is a little, well... extreme:
someone dies
total system failure requiring DR invocation
server failure requiring engineer response
failure involving loss of call continuity
failure involving loss of data
incorrect data recorded
application failure - non-recoverable
application failure - non-recoverable, but automatically restarted
does not meet requirement spec, no workaround
does not meet requirement spec, but has workaround
cosmetic - layout etc.
actually a feature request
That's off the top of my head. In case you were wondering, it's from most extreme to least :-)
Here is some stuff we used before. We split the defect rating into priority and severity.
Severity (set by submitter during submission of defect)
Highest (5): Data loss, hardware damage possible, or a security-related failure
High (4): Loss of functionality without any reasonable workaround
Medium (3): Loss of functionality with a reasonable workaround
Low (2): Partial loss of a function or a feature set (feature still hits the design requirements)
Lowest (1): A cosmetic error
Priority (adjusted by development, management and QA during defect evaluation)
Highest (5): The system is practically unusable with this defect.
High (4): The defect will have a serious impact on the company’s ability to sell and maintain this system.
Medium (3): The company will lose some money if this defect is in the system, but it might be more important to meet the schedule. Fix after release.
Low (2): Do not delay the release, but do fix this problem afterwards.
Lowest (1): Fix as time and resources allow.
Both numbers together create a risk priority number (RPN). Simply multiply severity by priority. A higher result means higher risk: 25 defines the ultimate defect bomb, while a 1 can be done during idle time or when someone is bored and needs something to do.
First goal: Defects with a rating of highest or high of any kind should be fixed before release.
Second goal: Defects with RPN > 8 should be fixed before releasing the product.
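For example, a medium severity (3) defect given medium priority (3) has RPN = 3 × 3 = 9, which is above the threshold of 8 and should be fixed before release, whereas a low severity (2) defect at the same priority scores 2 × 3 = 6 and can wait.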
This is of course a little bit artificial, but it helps to give all parties (Support, QA/Test, Engineering, and Product Managers) a tool to set priorities without blowing away the opinion of the other side.
Replace your bug tracking system with FogBugz and get rid of the severity field altogether.
See Priority vs Severity
"Been there done that".
I've had this discussion over and over again, on different projects. We've tried to combine priority with severity, but the lesson I've learned is: do not combine severity with priority!
We've had a lot of brainstorms and meetings which ended with the words "this is it". Multiple guideline documents have been created and spread between the different "parties", but after a while we discovered that it didn't work in the end. Different "parties" think differently about bugs: our helpdesk has a different understanding of priority than the development team or sales does.
Having both a severity and a priority level will very quickly become very confusing, because:
when using numbers (between 1 and 5) one will not know what each number means
what if an issue has the highest possible priority, but the lowest possible severity - and I'm sure that this will happen!
what if someone reduces a severity - do they need to reduce the priority as well?
"So what should you do then?":
Only use one kind of indicator for the 'level' of an issue: it doesn't matter what you call it.
Use numbers (e.g. 1-5, but it could be more or fewer depending on your needs) to clearly indicate the importance, but combine each number with a keyword so that it's clear what it means (e.g. 'nice to have', 'show stopper'). For some people priority 1 means the most important, for others 5 does -> therefore a keyword to indicate what a number means is necessary.
Make a distinction between a 'normal issue' and a 'red alert'. In our case a 'Red Alert' must be solved immediately and put into production immediately. A normal issue will follow the normal development-test-deployment flow. The priority/severity/whatever-you-call-it should only be set for normal issues and will be ignored for 'red alerts'.
In practice, a 'Red Alert' can become a 'Normal Issue': the support team discovered a major bug and created a 'Red Alert'. But after some investigation we discovered that data had become 'corrupt' in the database, since it was inserted there directly and not via the application.
Choose a good tool that allows you to customize the flow; most tools do.
As for a standard, there is the IEEE guide to classification for software anomalies (IEEE 1044.1-1995), although I am not sure how widely it is adopted.
One option is to have the product owner determine the priority of the bug. While there is some general intuition on how "bad" a bug is, it can be the responsibility of the owner of the product to set an order of precedence (i.e. bug A should be fixed before bug B, etc.).
The more clear and concise information that can be provided to the product owner, the easier it is for that individual to make those determinations (i.e. how many users have experienced the bug, what features are not available as a result of the bug, etc.).
Must be done now
Must be done before we ship
Minor annoyance (Doesn't prevent the user from exercising the functionality)
Edge case/Remote/Tester-from-Mordor scenario
Well, I just made that up... my point being that categorizing bugs should not be a weekly, hour-plus-long ritual.
IMHO, prioritizing according to a flowchart is wasted time. Fix bugs in categories 1 and 2 as quickly as they surface. If you find yourself swamped by bugs, slow down and reflect. Defer categories 3 and 4 if the schedule doesn't permit them or higher-priority items override.
The critical thing is that all of you have a shared understanding of this severity and the expected quality. Don't let compliance with the holy standards of X slow you down from delivering what the customer wants... working software.
Personally I favour the two-tier severity/priority model. I know the arguments for a single level, but in the places I've worked I've generally just seen a two-level hierarchy work better.
Severity is set by the support team (based on input from the client). Priority is set by the client (with input from the support team).
For severity I use:
1 - Blocker/show stopper
2 - Major functionality unavailable (or effectively unavailable), no practical work around possible
3 - Major functionality unavailable (or ...), work around possible
4 - Minor functionality unavailable (or effectively unavailable), no work around possible
5 - Minor functionality unavailable (or ...), work around possible
6 - Cosmetic or other trivial
Then for priority I just use High, Medium, Low but anything from 3 - 5 levels works (much more than that is just over the top).
I'd generally then order by priority first and then severity within that. The important thing about this is that the client has the most important say. If they say the way their logo prints on a report is their highest priority then that's what gets looked at, BUT it gets looked at after the other client's high-priority issue which is stopping them from logging in.
Generally speaking I wouldn't release with any high priority issues or any medium priority issues with severity 1 - 4. Obviously in an ideal world you'd fix everything but I've never been lucky enough to have that option.
The tester tells what is broken
The developer estimates how much work it will be to fix
The customer decides the business value, i.e. the priority.
Set the requirements of the project so you can base the priority of a fix on the priority of the requirements affected by the bug.
I had the same issue with one of our customers. In the end we set up a document together describing what kind of bugs would map to a certain severity. Aside from an occasional discussion, using this document as a guideline appears to work.
But be well aware that test teams and development teams may have very different opinions on what is a severe bug and what is not. From the point of view of the testers, a small layout bug can be high priority, while a developer would just say that no one will notice.
In our document those bugs can be high priority if they are "brand damaging", i.e. if the layout bug is in the logo or one of the products then it is severe - if it's just a paragraph on the page that is 2 pixels off then it's not.
I use the following categories both for features and bugs:
Showstopper, the program (or a major feature) will not work
Must have, a significant part of the customers will be bothered by this
Would have, some customers will be bothered
Nice to have, a few customers want this
Normally you plan to fix 1, 2 and 3, but 3 is often postponed to the next release due to time constraints.
I think this is the scale we used at a previous job:
Causes loss of files or system instability.
Crashes the program.
Feature doesn't work.
Feature doesn't work, but there are workarounds.
Cosmetic issue.
Request for enhancement.
Sometimes this was abused - if a feature was so poorly designed that someone couldn't figure out how to use it, that was classified as a 6, and it never got fixed.
I agree with the FogBugz folks that this should be kept super simple: http://fogbugz.stackexchange.com/questions/352/priority-vs-severity
I made up this scheme, which I find easy to remember:
pS: seconds matter, eg, server is on fire
pM: minutes matter, eg, something is broken
pH: hours matter, ie, don't go to bed till this is done
pd: days matter, ie, normal priority
pw: weeks matter, ie, lower priority
pm: months matter, ie, no hurry
py: years matter, ie, maybe/someday, ie, wishlist
It roughly parallels Debian's scheme: http://www.debian.org/Bugs/Developer#severities
I like it because it straightforwardly combines priority and severity into a single field that's easy to set a value for.
PS: You can also pick intermediate urgencies like "pMH" for in between "minutes matter" and "hours matter". Or "pHd" is in between "hours matter" and "days matter" -- roughly, "don't literally pull an all-nighter for it, but don't work on anything else till it's done".