Even where no frames are used, I see a minimum of 2 documents everywhere I look.
As I write this question I checked Stack Overflow: 12 documents. What constitutes a document here?
When looking at code duplication metrics over a long period of time (>10 years), are there guidelines or best practices for what level of code duplication is "normal" or "recommended"?
I have great difficulty with this question because, if the code quality were great, nobody would need to maintain it, so who would care? But, in general terms, are there references on what is "normal", say for a 10-line duplication threshold?
Is a duplication level of, say, X% unusual or normal? If normal, does that mean there are healthy, profitable projects out there with this level of duplication?
Perhaps the answer lies in a study that includes code duplication as one of the metrics measured against success / average / failure? Or perhaps people can share their experience of maintenance costs at a given level of code duplication?
In my opinion there is no general answer to this.
You should inspect every finding from the tool and decide whether it is a false positive, justified duplication (perhaps a standard coding pattern or generated code), or genuinely problematic copy-pasted code that should be refactored and extracted into its own function/module/whatever.
In the case of a false positive or intentional duplication, there should be a tool-specific way to suppress the warning for that occurrence, or more generally for a specific pattern.
That way, in future runs you should only get warnings when new duplicates are found (or when older, unfixed true positives remain).
For false positives, consider filing a bug report with the tool's author (crafting the smallest possible code configuration that still triggers the warning).
I apologize for the slightly obnoxious nature of my question.
A few years ago, I came across a game that could be played on Wikipedia. The goal is to start from a random page and try to get to the 'Adolf Hitler' page by following internal wikilinks within 5 clicks (6 degrees of separation). I found this to be an interesting take on the Small-world experiment and tried it a few times, and I was able to reach the target article within 5-6 clicks almost every time (however, there was no way for me to know whether that was the shortest path or not).
As a project, I want to find out the degree of separation between a few (maybe hundreds, or thousands, if feasible) random Wikipedia pages and Adolf Hitler's page in order to create a histogram of sorts. My intention is to do an exhaustive search in a DFS manner from the root page (restricting the 'depth' of the search to 10, to ensure that the search terminates in case it has chosen a bad path or is running in cycles). So the program would visit every article reachable within 10 clicks of the root article and find the shortest path by which the target article is reachable.
I realise that the algorithm described above would certainly take too long to run. I have ideas for optimizations which I will play around with.
Perhaps I will use a single-source shortest-path BFS-based approach, which seems more feasible considering that the degree of the graph would be quite high (mentioned later).
I will not describe all my algorithm ideas here, as they are not relevant to the question: in any possible implementation, I will have to query (either directly, by downloading the relevant tables to my machine, or through the API):
the pagelinks table, which contains information about all internal links on all pages, in the form of the 'id' of the page containing the wikilink and the 'title' of the page being linked to;
the page table, which contains the information needed to map a page 'title' to its 'id'. This mapping is needed because of the way data is stored in the pagelinks table.
Before I knew Wikipedia's schema, I naturally started exploring the Wikipedia API and quickly found that the following API query returns the list of all internal links on a given page, 500 at a time:
https://en.wikipedia.org/w/api.php?action=query&format=jsonfm&prop=links&titles=Mahatma%20Gandhi&pllimit=max
Running this in MediaWiki's API sandbox a few times gives a request time of about 30ms for 500 link results returned. This is not ideal, as even 100 queries of this nature would end up taking 3 seconds, which means I would certainly have to reduce the scope of my search somehow.
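For what it's worth, pulling a page's full link list through that endpoint is mostly a matter of following the continuation token the API hands back. Here is a rough sketch in TypeScript (the response shape assumes format=json with formatversion=2; treat it as an illustration rather than production code):

```typescript
// Sketch: collect all internal links for one page via the MediaWiki API,
// following the continuation token so the links arrive 500 at a time.
// Response shape assumes format=json&formatversion=2 and may need adjusting.
async function getLinks(title: string): Promise<string[]> {
  const links: string[] = [];
  let cont: Record<string, string> = {};

  do {
    const params = new URLSearchParams({
      action: "query",
      format: "json",
      formatversion: "2",
      prop: "links",
      titles: title,
      pllimit: "max",
      ...cont, // continuation values from the previous batch, if any
    });
    const resp = await fetch(`https://en.wikipedia.org/w/api.php?${params}`);
    const data = await resp.json();

    for (const page of data.query?.pages ?? []) {
      for (const link of page.links ?? []) {
        links.push(link.title);
      }
    }
    cont = data.continue ?? {}; // absent once the last batch has been returned
  } while (Object.keys(cont).length > 0);

  return links;
}
```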
On the other hand, I could download the relevant SQL dumps for the two tables here and use them with MySQL. English Wikipedia has around 5.5 million articles (which can be considered the vertices of the graph). The compressed sizes of the tables are around 5GB for the pagelinks table and 1GB for the page table. A single tuple of the page table is a lot bigger than one of the pagelinks table, however (maximum sizes of around 630 and 270 bytes respectively, by my estimate). Sorry, I cannot provide the number of tuples in each table, as I haven't yet downloaded and imported the database.
Downloading the database seems appealing because, since I would have the entire list of pages in the page table, I could do a single-source shortest-path BFS from Adolf Hitler's page by following all the internal backlinks. This would end up finding the degree of separation of every page in the database. I would also imagine that eliminating the bottleneck (the internet connection) would speed up the process.
Also, I would not be overusing the API.
However, I'm not sure that my desktop would be able to perform even on par with the API, considering the size of the database.
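If the dumps do fit on the desktop, the BFS itself is not the hard part. A minimal sketch, assuming the reversed link graph has already been built from the pagelinks and page dumps as a map from each page id to the ids of the pages that link to it (building that map is the expensive step and is not shown):

```typescript
// Sketch: BFS outward from the target page over the reversed link graph,
// recording each page's degree of separation from the target.
// `backlinks` maps a page id to the ids of pages that link TO it; building
// that map from the pagelinks/page dumps is assumed to have happened already.
function degreesFromTarget(
  targetId: number,
  backlinks: Map<number, number[]>
): Map<number, number> {
  const dist = new Map<number, number>([[targetId, 0]]);
  const queue: number[] = [targetId];
  let head = 0; // index pointer instead of shift() to keep dequeues O(1)

  while (head < queue.length) {
    const current = queue[head++];
    const d = dist.get(current)!;
    for (const neighbor of backlinks.get(current) ?? []) {
      if (!dist.has(neighbor)) {
        dist.set(neighbor, d + 1); // first visit is the shortest distance
        queue.push(neighbor);
      }
    }
  }
  return dist; // pages that never appear here are unreachable within the graph
}
```

The traversal touches each link at most once, so it is linear in the number of edges; holding the adjacency map for ~5.5 million pages in memory is the real constraint.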
What would be a better approach to pursue?
More generally, what kind of workload requires the use of an offline copy of the database rather than just the API?
While we are on the topic, in your experience, is such a problem even feasible without the use of supercomputing, or perhaps a server to run the SQL queries?
I am building an application which allows users to post content. This content can then be commented on.
Assume the following:
A document for the content is between 200KB and 3MB in size, depending on the text content.
Each comment is between 10KB and 100KB in size.
There could be 1 comment, or 1000. There are no limits.
My question is, when storing the content, should the individual comments be stored inside the same document or should they be broken up?
I'd certainly keep the post content and the comments separate, assuming there will be parts of the application where the posts would be previewed/used without comments.
For the comments themselves (assuming they are stored separately), I'd say many smaller documents would typically be better, if only because, for a post with hundreds or thousands of comments, you're not going to want all of them immediately. It makes sense to fetch the documents only as they are required (to be displayed), rather than loading MBs of comments only to use less than 100KB of what was fetched.
You might not be required to store comments individually, but provided your keys are logical and follow a predictable scheme, I see no reason not to. If you do want some grouping, however, I'd avoid putting more comments in a single document than would be typical for your usage.
If you are using views, there may be some benefit to grouping more comments into single documents, but even then it would depend on your use case.
There would be no software bottlenecks in Couchbase from storing lots of documents individually; the only constraints would be hardware constraints, and those would be much the same whether you had lots of small documents or fewer large ones.
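To illustrate the predictable-key idea above, here is a rough sketch using the Couchbase Node.js SDK (3.x-style API); the bucket name, credentials, and the post::<id>::comment::<seq> key scheme are assumptions for the example, not anything Couchbase prescribes:

```typescript
import { connect } from "couchbase";

// Sketch only: one document per comment, addressed by a predictable key.
// The bucket name, credentials, and the post::<id>::comment::<seq> scheme
// are illustrative assumptions, not anything Couchbase itself requires.
async function saveComment(postId: string, seq: number, body: string) {
  const cluster = await connect("couchbase://localhost", {
    username: "app_user",
    password: "app_password",
  });
  const comments = cluster.bucket("content").defaultCollection();

  await comments.upsert(`post::${postId}::comment::${seq}`, {
    type: "comment",
    postId,
    seq,
    body,
    createdAt: new Date().toISOString(),
  });

  await cluster.close();
}
```

With a per-post comment counter to generate seq, the most recent comments could then be fetched by direct key lookups rather than a view or query, which fits the fetch-only-what-you-display point above.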
I would go with a document for each comment, as Couchbase is brilliant at organizing what to keep in memory and what not. Usually an application doesn't display all comments when there are more than 25 (I think that's the number most applications use: display them in chunks of 25, with the newest at the top, for example). So if the latest 25 comments are kept in memory while older ones are automatically written out to disk, you keep an ideal balance between how you use your memory (always critical with Couchbase) and access time for your application. A perfect balance, I'd say. I had a similar decision to make in my application and it worked perfectly.
As mentioned in the title, I want to find out whether there is really a performance overhead when using a lot of id attributes in HTML.
I know the difference between classes and IDs, but I'm not sure about their performance. As far as I know, IDs have specific functionality for both JS and the browser: the browser stores them in memory somewhere, and JS uses that to access them much faster than traversing the whole document in search of a specific class.
So, if I don't need to access the IDs with JS or anything else, is it still reasonable to use them in the HTML markup?
The simple answer is yes, but usually not by much unless there are many hundreds or thousands.
The detailed answer to this question (as stated) is "it depends".
It depends on:
Your definition of 'a lot'.
Some folks would consider 100 a lot, others 1,000, others 10,000.
Which browser is being used, and which version of the browser.
The machine being used, how fast the CPU is, etc.
The OS being used.
The internet (regional/local) speed at the time, to download all the div tags.
Where the divs are, and whether any of the page can load without them.
In conclusion: given that we're talking about web apps and the many differences between user clients, keep the number of divs low if possible.
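If you'd rather measure than guess on your own target browsers, a throwaway sketch along these lines (run in the browser console; the element counts and id scheme are arbitrary) will show whether the numbers matter at your scale:

```typescript
// Throwaway measurement: insert n divs, each with a unique id, then look one
// up by id. Counts and the id scheme are arbitrary; results vary by browser,
// machine, and page, which is exactly the "it depends" point.
function timeIds(n: number): void {
  const container = document.createElement("div");

  const t0 = performance.now();
  for (let i = 0; i < n; i++) {
    const el = document.createElement("div");
    el.id = `item-${i}`;
    container.appendChild(el);
  }
  document.body.appendChild(container); // attach once so there is a single reflow
  const t1 = performance.now();

  document.getElementById(`item-${n - 1}`); // lookup against the browser's id map
  const t2 = performance.now();

  console.log(`${n} ids: insert ${(t1 - t0).toFixed(1)}ms, lookup ${(t2 - t1).toFixed(3)}ms`);
  container.remove();
}

timeIds(10_000);
```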
I have never noticed a degradation in performance as the number of IDs used in HTML tags grows.
The thing that would degrade performance, in my opinion, would be using client-side scripting to manipulate the HTML elements the IDs are assigned to en masse.
This is my opinion based on experience. No research completed on the subject.
I was building out a little project that made use of HTML localStorage. While I was nowhere close to the 5MB limit for localStorage, I decided to do a stress test anyway.
Essentially, I loaded data objects into a single localStorage object until it was just slightly under that limit, then made requests to set and get various items.
I then timed the execution of setItem and getItem informally using the JavaScript Date object and event handlers (I bound get and set to buttons in HTML and just clicked =P).
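For context, the informal test looked roughly like this (a reconstruction, not the exact code: the key names and chunk size are placeholders, and performance.now() is used here instead of Date for better resolution):

```typescript
// Reconstruction of the informal benchmark: fill localStorage to near its
// quota, then time individual setItem/getItem calls. Key names and the chunk
// size are placeholders; real numbers depend heavily on browser and machine.
const chunk = "x".repeat(1024 * 1024); // ~1M characters per stored value

function timeOp(label: string, op: () => void): void {
  const start = performance.now();
  op();
  console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms`);
}

let filled = 0;
try {
  for (;;) {
    localStorage.setItem(`filler-${filled}`, chunk); // throws once the quota is hit
    filled++;
  }
} catch {
  // Quota reached; free one slot so the probe writes below have room.
  if (filled > 0) localStorage.removeItem(`filler-${filled - 1}`);
}

timeOp("setItem", () => localStorage.setItem("probe", chunk));
timeOp("getItem", () => localStorage.getItem("probe"));
```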
The performance was horrendous, with requests taking between 600ms and 5,000ms, and memory usage coming close to 200MB in the worse cases. This was in Google Chrome with a single extension (Google Speed Tracer), on Mac OS X.
In Safari, it's basically >4,000ms all the time.
Firefox was a surprise, having pretty much nothing over 150ms.
These were all done with the system basically idle: no YouTube (Flash) getting in the way, not many tabs (nothing but Gmail), and no applications open other than background processes + the browser. Once a memory-intensive task popped up, localStorage slowed down proportionately as well. FWIW, I'm running a late-2008 Mac: 2.0GHz dual-core with 2GB DDR3 RAM.
===
So the questions:
Has anyone done any benchmarking of localStorage get and set for various key and value sizes, and on different browsers?
I'm assuming the large variance in latency and memory usage between Firefox and the rest is a Gecko vs. WebKit issue. I know the answer could be found by diving into those code bases, but I'd definitely like to know if anyone can explain relevant details about the implementation of localStorage in these two engines that would explain the massive difference in efficiency and latency across browsers.
Unfortunately, I doubt we'll be able to solve it, but the closest we can get is at least understanding the limitations of the browsers in their current state.
Thanks!
Browser and version become a major issue here. The thing is, while there are so-called "WebKit-based" browsers, they add their own patches as well. Sometimes those make it into the main WebKit repository, sometimes they do not. With regard to versions, browsers are always moving targets, so this benchmark could be completely different if you use a beta or nightly build.
Then there is the overall use case. If your use case is not the norm, the issues will not be as apparent, and they're less likely to get noticed and addressed. Even if there are patches, browser vendors have a lot of issues to address, so there's a chance the fix is slated for another build (again, nightly builds might produce different results).
Honestly, the best course of action would be to discuss these results on the appropriate browser mailing list / forum if they haven't been addressed already. People will be more likely to do testing and see whether their results match.