This question on wkhtmltopdf has a specific component and a more general component to it.
Generally: I am trying to convert a wide range of webpages into PDF files, and I want wkhtmltopdf to work in as many cases as possible. It's a pretty good tool, but I often run into webpages it can't convert. Do you have a go-to set of flags that you use with wkhtmltopdf?
Specifically: for example, a webpage that isn't anything far-out, but that I am having problems with, is http://gizmodo.com/microsoft-surface-book-review-so-good-i-might-switch-1737680767. When I run wkhtmltopdf without any flags (on Windows), I get the following:
>>wkhtmltopdf http://gizmodo.com/microsoft-surface-book-review-so-good-i-might-switch-1737680767 blah.pdf
Loading pages (1/6)
Error: Failed loading page http://gizmodo.com/microsoft-surface-book-review-so-good-i-might-switch-1737680767 (sometimes it will work just to ignore this error with --load-error-handling ignore)
Warning: A finished ResourceObject received a loading progress signal. This might be an indication of an iframe taking too long to load.
Warning: Received createRequest signal on a disposed ResourceObject's NetworkAccessManager. This might be an indication of an iframe taking too long to load.
Exit with code 1, due to unknown error.
If I follow the instructions and use the --load-error-handling ignore flag, the PDF file is generated, but it's empty. How do I get wkhtmltopdf to work with this webpage?
I tried to look at other tools such as PhantomJS with rasterize.js, but they have their own set of problems...
Thanks!
This happens when JavaScript is enabled but is too slow to complete. If you need the JavaScript to run in order to solve this problem, add:
--javascript-delay 100000
which adjusts the time wkhtmltopdf waits for JavaScript to complete (it's in milliseconds), so in the example above it waits for 100 seconds. Note that if you run a conversion of multiple documents at once, this setting applies to the whole run, not to each individual document. Therefore if, say, you convert some 100 input HTML files into a single PDF output, you may need a longer delay.
I also add to my scripts:
--no-stop-slow-scripts
which enables the documented behaviour "Do not Stop slow running javascripts", i.e. wkhtmltopdf keeps waiting instead of aborting slow scripts.
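Putting the two together with the error-handling flag from the question, a typical invocation might look like this (a sketch; tune the delay to your pages):
wkhtmltopdf --javascript-delay 100000 --no-stop-slow-scripts --load-error-handling ignore http://gizmodo.com/microsoft-surface-book-review-so-good-i-might-switch-1737680767 blah.pdf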
Turns out it's actually quite simple!
Simply use the -n flag! Works like a charm!
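For reference, -n is wkhtmltopdf's short form of --disable-javascript, so this works by turning scripting off entirely (at the cost of any script-rendered content):
wkhtmltopdf -n http://gizmodo.com/microsoft-surface-book-review-so-good-i-might-switch-1737680767 blah.pdf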
I am using PyScript to do image processing for a website I am making on Hamming and Reed–Solomon codes. It does, however, take a long time to run. I have a CSS loading animation, but it freezes while the Python image-processing code is running. Is there a way for me to run my Python scripts while still retaining the HTML animation and functionality?
I have looked into multiprocessing and threading. Both seem unavailable in the current state of PyScript, Pyodide and HTML. I am considering changing the CSS animation to a GIF, but this doesn't fix the other interactive elements on the website.
Python running in PyScript, just like JavaScript, is inherently limited to the single browser main thread (but see below). While the main thread is busy working on a synchronous (blocking) task, like running a single long function, all other interactivity on the page will be suspended.
The (current) way around this is to make your code asynchronous, writing it as a coroutine that periodically yields execution back to the browser's event loop. For example, you could take a multi-step process and break it up with await-able events like asyncio.sleep():
import asyncio

# doThing1/2/3 are placeholders for the steps of your long-running task
async def myLongFunction():
    doThing1("foo")
    await asyncio.sleep(0)  # yield to the browser's event loop so the page stays responsive
    doThing2("bar")
    await asyncio.sleep(0)
    return doThing3("baz")

asyncio.ensure_future(myLongFunction())
Simpler solutions may be available in the future. There's quite a bit of ongoing work on allowing PyScript to execute code in a web worker, which wouldn't block the main thread while executing, although results must be passed back and forth as structured messages.
Additionally, having access to pthreads in Pyodide (the primary runtime underpinning PyScript) is a long-standing goal, but one that's inching closer to being possible.
I'm running chromium as follows, making it output generated assembly code and loading a specified .html:
./chrome --js-flags="--print-code" ~/example.html
Is there a way (a command-line parameter?) to tell whether the page has finished loading, i.e. all assembly code has been output? Ideally by passing this information via stdout.
Thanks!
In short: no.
One reason is that the statement "the page has finished loading" is so vague that it is effectively not a definition at all. Many modern websites load in stages: at first some server-rendered static HTML is downloaded and displayed, then dynamic content is loaded asynchronously, often again in several steps (e.g., primary content first, ads when idle). What if there are several <iframe>s? What if there's a timer on the page that loads more things (e.g., a picture slideshow, notifications, new ads) after X seconds? What if user interaction (clicking, scrolling, etc) triggers more loads? With all these cases, the browser has no way of knowing that "this page has finished", and many pages are never "finished".
Another issue is that the statements "the page has finished loading" and "all assembly code has been generated" are very different things. In fact, V8 generally tries not to compile any optimized code while initial load is still in progress, because that would create slowdowns and jank without providing much benefit. Instead, optimized code is compiled later, for JavaScript functions that are observed to be run a lot. Since optimization depends on what the JavaScript code is doing, there's in general no way to predict whether code generation is finished. I've seen infinite-scroll websites where every scrolling event caused optimized compilation of some more code.
That said, for specific scenarios you can get some approximation: if you have control over example.html, you can emit a console.log("MY_MARKER") at a time of your choosing. If you then run Chrome with --enable-logging=stderr, you can find the console.log() statements (along with a bunch of other stuff) on stderr.
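As a minimal sketch (the marker string is arbitrary), example.html could contain something like:
<script>console.log("MY_MARKER");</script>
and you would then run:
./chrome --js-flags="--print-code" --enable-logging=stderr ~/example.html
and watch stderr for MY_MARKER among the other log output.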
I really want to inject my C++ program into another (compiled) program. The way I want to do this is to change the first bytes (where the program starts) to jump to the binary of my program (pasted into a code cave, for example) and, when it has finished running, jump back to where execution was before my injected program started.
Is this even possible? And if it is, is it a good/smart idea to do so?
Are there other methods of doing so?
For example:
I wrote a program that writes the current time to a file and then terminates. If I inject it into Internet Explorer and launch that, it will first write the current time to a file and then start Internet Explorer as normal.
In order to do this, you should start by reading the documentation for PE files, which you can download from Microsoft.
Doing this takes a lot of research and experimenting, which is beyond the scope of Stack Overflow. You should also be aware that the details depend heavily on the executable you try to patch: it may work with your version, but most likely not with another one. There are also countermeasures against this kind of attack, which may be built into the executable as well as into the OS.
Is it possible?
Yes, of course, but it's not trivial.
Is it smart?
Depends on what you do with it. Sometimes it may be the only way.
Some stats before I state the situation:
total JS code = 122 MB
minified = 36 MB
minified and gzip = 4 MB
I would like to get the entire 4 MB down in one shot (with a loading progress indicator on the page) and uncompress it, but not parse it yet. We don't want the code expanding in the browser's memory when a lot of it might not be required at this point. The parsing should happen when a script tag with the corresponding JS file name is encountered.
Intention: a faster one-shot download of the JS files, while keeping the behaviour unchanged from the browser's perspective.
Do any such solutions exist? Am I even thinking sane?
If yes: I know how to get the gzip; I would like to know how to keep the files in the browser cache so that when a script tag is encountered the browser doesn't fire an XMLHttpRequest for it again.
The trick is to leverage HTTP caching directives. For a starter take a look at this. You should only need to fetch your JS code once because you can safely set the cache directive to instruct the browser to hold on to the JS file indefinitely (subject to space). Indefinitely in this context typically means the year 2035.
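For example (a sketch; the exact configuration depends on your server and how aggressive you want to be), serving the JS with a response header like:
Cache-Control: public, max-age=31536000
tells the browser it may reuse its cached copy for a year without re-requesting it.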
When you're ready to update all your browser-side caches with a new version of the JS file, simply use a cache-busting query string. Any serial number or date and time will do, or a simple version number, e.g.:
<script src="/js/myfile.js?v2.1"></script>
Some minification frameworks handle the cache-busting for you. A good technique, for example, is to MD5 the file contents and use that hash as the cache-busting query string. That way, whenever your source JS changes, the browser will request the new version (because the query string is embedded in your HTML script tag) and then cache it for as long as possible again.
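So instead of a hand-maintained version number you end up with something like this (the hash here is made up):
<script src="/js/myfile.js?a3f5c91b2e774d0887d1aa41c3a5d9f0"></script>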
XMLHttpRequest will honour the caching primitives you set.
In the other part of your question, I believe what you're asking is whether you can download one combined script file and then refer to parts of it with individual script tags on the page. No, I don't believe you can do that. If you want to refer to individual files, each piece of gzipped content you want to use separately needs its own HTTP URL and caching directives. However, you might find this is just as performant as one big file, or even more so, depending on how much parallelisation you can achieve.
A neat trick here is to pre-load a lot of what you need. Google have been doing this on the home page for years. Basically, they pre-load stacks of resources (images certainly, but possibly also JS). So whilst you're thinking about what search query to enter, they are already loading the cache up with stuff you'll want on the subsequent page.
So you could use XMLHttpRequest to fetch your JS files (without parsing them) well before you need them. Then by the time your <script/> tag refers to them they'll already be downloaded and you just need to parse them.
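A minimal sketch of that pre-loading idea (the URL is a placeholder, and this assumes the caching headers above are in place so the later script tag is served from cache):
var xhr = new XMLHttpRequest();
xhr.open('GET', '/js/myfile.js', true); // same URL the <script> tag will use later
xhr.onload = function () {
    // the file is now in the browser cache; deliberately not eval'd or parsed here
};
xhr.send();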
In addition to cirrus's point about using HTTP caching, you could break that still-pretty-large 4 MB file down and only load the pieces when their functionality is required.
It's more HTTP requests, but 4 MB is a big hit in one go.
I'd suggest something like require.js to load the appropriate files in when they are needed:
http://requirejs.org/docs/start.html
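A minimal sketch of what that looks like (module and function names are placeholders):
// main.js, loaded via <script data-main="main" src="require.js"></script>
require(['imageTools'], function (imageTools) {
    // imageTools.js is only fetched and parsed when this require() call runs
    imageTools.process();
});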
When I am debugging broken code, after a while the browser announces that the Flash plugin has crashed, and I can't continue debugging my code. Can I prevent the browser from killing Flash?
I am using Firefox.
Going to the debugger on a breakpoint makes the plugin "freeze". This is intentional; it's a breakpoint, after all!
However, from the browser's perspective, the plugin seems to be stuck in some kind of infinite loop. The timeout value varies; my Firefox installation is set to 45 seconds.
To change the timeout value, enter about:config in the URL field and look for the setting dom.ipc.plugins.timeoutSecs. Increase this, or set it to -1 to disable the timeout altogether.
When the plugin crashes, it is in fact not because the browser is "killing" it; rather, the plugin terminates itself when a fatal error occurs. This is necessary to prevent the browser, or even your entire machine, from crashing: there is no way to tell what is going to happen after an error like that. And besides, after the first uncaught error your program will likely not be able to execute even correct code the way you intended, so you won't do any good by continuing a broken debugging session. So it is not a flaw; it is actually a good thing this happens!
However, you can do some things in order to work more effectively (and make your programs better). The most important I can think of right now are:
Learn to use good object-oriented programming techniques and get acquainted with design patterns, if you haven't already done so.
Take extra care to prevent error conditions from happening (e.g. test if an object is null before accessing its properties, give default values to variables when possible, etc.)
Use proper error handling to gracefully catch errors at runtime.
Use unit tests to thoroughly test your code for errors one piece at a time, before debugging in the browser. Getting to know FlexUnit is a good place to start.
EDIT
I should also have said this: a debugger is a useful tool for stepping through your code to find the source of an error, such as a variable not being properly initialized, or unexpected return values. It is not helpful for trying to find out what's happening after a fatal error has occurred, and that would not help you fix the code anyway.