HTML2PDF Conversion [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
We're developing software for both Linux and Windows that needs to generate PDF reports from CSV files. I've written a program in C that turns the CSV files into HTML files (td, tr, etc.), and under Linux I then convert the HTML to PS with html2ps and the PS to PDF with ps2pdf.
However, as mentioned above, we're also developing for Windows. While I'm aware that html2ps and ps2pdf are available under Windows, they have a few dependencies that are going to cause headaches for our clients (namely Perl and Ghostscript). Are there any native Windows console applications that convert HTML into PDF and can be distributed as single executable files with no major dependencies?
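For readers who want to picture the first stage of that pipeline, here is a minimal sketch of the CSV-to-HTML step. The asker's program is written in C; this is a Python illustration only, and the function name `csv_to_html_table` and the choice to treat the first row as a header are assumptions, not details from the question.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Convert CSV text into an HTML <table> string.

    Assumption: the first CSV row is a header row and becomes <th> cells.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"

    header, *body = rows
    lines = ["<table>"]
    # Escape every cell so that characters like & and < stay valid HTML.
    lines.append(
        "  <tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>"
    )
    for row in body:
        lines.append(
            "  <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_html_table("name,total\nA & B,3\n"))
```

The resulting HTML can then be handed to whichever HTML-to-PDF converter ends up being chosen.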

I would like to add a new entry that I've used recently: wkhtmltopdf. It is an open source project that uses WebKit to render, which means it supports modern features including CSS3 and SVG, and can even let JavaScript run before creating the PDF. It doesn't have the same level of polish as Prince XML, but it's the best FOSS solution I've found.
I haven't used it for multi-page documents yet, but I believe it does have support for CSS page breaks.
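As a rough sketch of how a program might drive wkhtmltopdf, the snippet below shells out to the command-line tool using its basic `wkhtmltopdf <input> <output>` form. The helper names and file names are hypothetical, and it assumes the executable is installed and on the PATH.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(html_path, pdf_path, executable="wkhtmltopdf"):
    """Return the argument list for the basic 'input output' invocation."""
    return [executable, html_path, pdf_path]


def html_to_pdf(html_path, pdf_path):
    """Run wkhtmltopdf; raises if the tool is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    subprocess.run(build_wkhtmltopdf_command(html_path, pdf_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_wkhtmltopdf_command("report.html", "report.pdf"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces, which matters on Windows.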

I've used Prince XML with Java and it is extremely powerful and easy to use, but it's also commercial.

I've used ExpertPDF's Html2Pdf converter component. It makes it easy to build a .NET app that does the conversion.

Check Pisa. It's written entirely in Python, so it should work on Windows and Unix. The license is GPL, but a commercial license is also available.

We've used Apache FOP in the past to convert XML docs to PDF. Perhaps not quite what you are looking for, but it might be an option?

I had lots of luck with HTMLDOC. It is open source and available on many platforms and has a commercial version if you want to pay for it.

I've used Aspose which has both .NET and Java libraries to use for PDF conversion. I think it's great to use, but there's definitely a cost involved.
Since you're looking for something open source, I might suggest iText. I hear it's good, but haven't used it myself.

PDFLib could be bundled with a native application written by yourselves to create PDFs. It's a great library. It used to be free, but it looks like you have to pay for it these days.
Another option, but it doesn't exactly fit your non-dependency requirement, is iTextSharp.

As Russell says wkhtmltopdf is probably the best bet. I've created a free online service to convert HTML to PDF files http://www.html2pdfrocket.com which uses wkhtmltopdf but makes the process easy and cross platform.
I've added examples on how to convert HTML to PDF for PHP, C#, Ruby and HTML. You could trigger it in JavaScript if you wish.
It's being used by the heart foundation and others to create PDF files in real time, for example PDFs of recipes, invoices, receipts etc - although you can download and cache the PDF output if you wish.
Hope you find it helpful, and please write to me there if you have feedback or if you need help getting wkhtmltopdf working in your own environment.

Related

Include Highcharts in open-source project [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 years ago.
I would like to include the Highcharts library in an open-source project, scala-notebook, and I'm not sure, whether it's allowed according to the Highcharts licence.
scala-notebook is a web-repl (read-eval-print-loop) or web-console (similar to IPython). One can create new notebooks and write code there that will be executed on the server side, and results would be rendered on the page (in the output section). I would like to give the user the ability to define chart data in the code and then it would be rendered as a chart using the Highcharts library. It's distributed under the Modified BSD License (also known as New or Revised BSD).
So my questions are:
Am I allowed to add the Highcharts library in the project distribution?
If yes, then what should I include in order to correctly cite Highcharts licencing information in my project (for example I can add it to the help section of the page and/or add the license to the root of the project's file tree, etc.)?
My project is open-source, so I can imagine, that someone will download it and will deploy it internally at his/her company. I also want to make sure this use-case is permitted (according to the license).

IF you are using it non-commercially, THEN you are allowed to use it according to the CC BY-NC.
As far as I can tell you should be allowed (since it doesn't have the share-alike clause) to redistribute under any license you please; whether this is morally justified is another question.
Note that the Creative Commons licenses are not aimed at software, so the waters with regard to linking, combining with other licenses, etc. are a bit murky.
It might be best to ask the people at Highcharts your question (or even send them a link to this Stack Overflow question).

You can use it with some open source projects, however you can not use it with Free Software.
Unfortunately it is a copyright violation to use Highcharts with GPL code as the commercial restriction violates the GPL.
This is a problem, unfortunately, as even the Highcharts website potentially violates the GPL by including Highcharts with the Joomla code, although there is an argument that as long as Highcharts does not distribute the code from its website it might be in the clear.
There's no conceivable way to use Highcharts with an AGPL website.
The problems with combining "not for commercial" and GPL code are explained here.
https://softwareengineering.stackexchange.com/questions/214904/is-free-for-non-commercial-use-license-compatible-with-gnu-gplv3-license

At the moment Highcharts offers an OEM License which "allows you to distribute Highcharts in your software or hardware product"; maybe they've added this after (and because of?) this question. The OEM license agreement will give you information on how to use it.
Anyway, I don't think any other license of the product will allow you to do this.

If project is open source do you bother looking at the sources? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
I was pretty surprised to find out that raw sources of my little open source project are getting downloaded more often than the compiled and ready to use library (jar file in this case, platform independent). I wonder what are the reasons behind that? Lack of trust? Curiosity? Compiling with custom settings? Attaching sources for debugging?
Personally I usually don't bother downloading and looking at sources unless something is not working or I don't understand how it works.

I often download sources just to see how other people have implemented certain things. Reading (and understanding) other people's source code is a good way of becoming a better programmer yourself.
As for the relatively high number of downloads, perhaps your library is included in other projects like a Linux distribution? Such projects usually download and build from source themselves so that they can properly package it.

The first reason would be for customizing applications.
Also, it's not good practice to download some code and use it straight away without looking at how the code works. There will be something for you to learn from the code.
Also, you might not need the whole functionality of the project. If the project is too big and you need only some of its functionality, it would be a great idea to trim the project to your needs and then use it.

For every piece of software of long term interest for my company, I look at the sources to assess the quality. The rationale behind it is that badly written software is usually also bad to use and maintain and thus a business risk in the long term.
Even with most commercial software like ERP systems it is no problem to get a look at the source. Only for COTS (say MS Office) it is hard to get the source.
I also check source for every hiring decision.
Another reason why you see so many source downloads might be automated build systems like FreeBSD Ports, which download and compile automatically.

I look at the source just to learn how the program works.
As silly as it might seem, open source software (such as open source CRMs) is notorious for its lack of documentation. The only way to find out how it works is to experiment with it. When even experimenting fails, it's time to fire up your IDE and read the source!

Maybe the answer will be disappointing, but the relatively high number of source downloads could mean that the application is packaged in a port-based distribution like Gentoo, FreeBSD or MacPorts where every package is downloaded and compiled on a local machine during installation.

If it's a framework, I always download sources. I use them for debugging and to see how they've implemented certain things. If it's a standalone application, I generally don't look at the source unless there is a problem or the application does something unique.

Since you say your binary is a jar, it sounds like a Java library (rather than an application). Developers often use the source by attaching it in the IDE so they can step into the library while debugging and look up certain functions. Many developers also include the sources in their build process so that the dependencies are compiled too. That may be an explanation.

The number one reason is compiler settings. You can't imagine the amount of pain caused by linking a static library compiled with incompatible settings. Compiling it yourself with known settings simplifies life greatly. Plus, when you decide to switch to a better compiler you don't need to keep the old static library; it will be compiled by the new compiler too.
The number two reason could be that people want to see how some things work inside. For example, they want the same or similar functionality in their commercial closed-source project and can't just borrow code because of the viral license. However, they can see how it works and get inspired; that's why they download the source and read it.

I have downloaded libraries and compiled them myself, but I have not actually looked at the code. When I use a library it is good to know that I can make changes and have the source on hand. On occasion I have taken just a file or two from a massive library when I only needed a single piece of functionality.

Some reasons could be:
Distrust of binary downloads due to trojans, etc
Taking a look at how you've implemented something
Checking out the quality of your code :)

Since this is a library, the need for comprehensive documentation is much higher than for a standalone app. I often find myself looking up the code of a library to figure out certain things sometimes left out of the docs, e.g. time/space complexity of certain functions.

We use some open source packages for our commercial application. I always download and build from source.
If our hosting platform changes in the future, it might change to something that does not have a precompiled binary. I want to be able to use the same package/version on the new platform.
If the package goes dormant or becomes unsupported, I want to be able to apply a change or fix if absolutely necessary.
If something is going wrong on the server (memory leak, CPU spike, etc.), I want to be able to add logging or instrumentation code to identify or eliminate the package as the source of the problem.

I can of course only answer for myself, but I often download the binaries (assuming I trust the project, which is usually the case) and then download the sources when I debug. I tend to delete the sources when I think I'm done with them, and since you are never really done, I might have to redownload them later, which pushes the source download count higher.

What is a good network graph library for language X? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 10 months ago.
I have noticed that a recurring question is: “What is a good network graph library for language X”. I have played with quite a few of the libraries and I can share my experiences with you.
Python:
NetworkX is a robust library which has built-in visualization but also has an interface to Graphviz using pyGraphviz. (pyGraphviz and NetworkX are written by the same author.) NetworkX is open source and very easy to use.
Perl:
Circos was developed to visualize genomes and other highly complex datasets. It will always use a circular layout, but that is often the most appropriate layout if your network is really large and its ‘modularity’ score is low. Circos is open source.
.Net:
NodeXL is developed by Microsoft Research and is both an add-on for Excel and a .Net 3.5 library. It’s pretty open (by Microsoft’s standards) and uses the Fruchterman-Reingold algorithm for visualization.
Java:
JUNG2 has recently been released and is also a robust library. It has extended visualization and key metrics support. JUNG2 is open source.
UbiGraph:
UbiGraph has interfaces to different languages including Python (and NetworkX has UbiGraph support), Ruby, PHP, Java, C, C++, C#, Haskell, and OCaml. It has very neat 3D visualization of network graphs using an XML-RPC server. The basic version is free; you have to pay for the professional version.
Standalone:
You can always use an off-the-shelf package such as: Graphviz (Win, Linux, OSX), Pajek (Win), UCINET (Win), or even Visio (Win).
I am sure there are many more packages, but these are the ones that I have used myself. What other libraries or packages are available?

You should add graph-tool to the python list. It is very complete, and it is implemented in C++, with the Boost Graph Library, making it orders of magnitude faster than python-only alternatives, such as NetworkX.
Disclaimer: I'm the author of graph-tool. :-)

For Clojure, there is loom. It's a work in progress but looks good.

The Stanford Network Analysis Project (SNAP) was written in C++ and designed with performance in mind to analyze large data sets. The project has been extended with a Python library, and it has comprehensive documentation.
Note also that the project is a good resource for empirical data sets from a variety of domains.

In Java, prefuse is by far the best graph drawing package. It has a very fast force-directed layout algorithm, and since you can tweak the parameters in real time and drag nodes around to get the graph looking the way you want, you’ll be able to explore and arrange much larger graphs than with any non-interactive system.
Try out this demo applet and you’ll fall in love with it too...
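To give a feel for what a force-directed layout does, here is a minimal pure-Python sketch in the spirit of the Fruchterman-Reingold scheme mentioned in the question. It is not prefuse's implementation or any library's API; the function name, the cooling factor and the iteration count are arbitrary assumptions chosen for illustration.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return {node: (x, y)} inside a size-by-size box.

    Simplified Fruchterman-Reingold-style sketch: all node pairs repel with
    magnitude k^2 / d, connected nodes attract with magnitude d^2 / k.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    if len(nodes) < 2:
        return {n: (p[0], p[1]) for n, p in pos.items()}

    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 10.0         # maximum move per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f

        # Attraction along each edge.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f

        # Move each node, limited by the temperature, and keep it in the box.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temperature)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / d * step))

        temperature *= 0.95  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


if __name__ == "__main__":
    layout = force_directed_layout("abcd", [("a", "b"), ("b", "c"), ("c", "d")])
    for name, (x, y) in layout.items():
        print(name, round(x, 3), round(y, 3))
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason dedicated packages use faster approximations for large graphs.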

If you like the examples on this page, take a look at Mathematica’s graph plotting capabilities. The author of the gallery page, Yifan Hu, used to work for Wolfram Research, where he developed graph drawing algorithms for enormous graphs. Those algorithms are now integrated into Mathematica. Depending on how you intend to use the graph drawings, you could get a huge benefit by being able to use Mathematica to analyse your graphs; see for example this blog post.

yFiles is a suite of layout algorithms that offers the broadest range of different automatic sophisticated layout styles. It's a commercial offering and is available for several popular platforms and languages: Javascript, Java, C#, and more.
There is an interactive online demo that shows many of the available algorithms and the libraries can be evaluated for free.
Disclaimer: I work for the company that creates these libraries, however on SO I do not represent my employer. This recommendation is based on my own opinion. I have seen many different layout suite implementations for the above languages in the last 15 years and I don't know of any other implementation available that is as complete and extensible as this one.

Super-fast screen scraping techniques? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I often find myself needing to do some simple screen scraping for internal purposes (i.e. a third party service I use only publishes reports via HTML). I have at least two or three cases of this now. I could use apache httpclient and create all the necessary screen scraping code but it takes a while. Here is my usual process:
Open up Charles Proxy on the web site and see what's going on.
Start writing some Java code using Apache HttpClient, dealing with cookies and multiple requests.
Use Jericho HTML to deal with parsing of the HTML.
I wish I could just "record my session" quickly and then parametrize the things that vary from session to session. Imagine just using Charles to grab all the request HTTP and then parametrize the relevant query string or post params. Voila I have a reusable http script.
Is there anything that does this already? I remember when I used to work at a big company there used to be a tool we used called Load Runner by Mercury Interactive that essentially had a nice way to record an http session and make it reusable (for testing purposes). That tool, unfortunately, is very expensive.
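The "record once, then parametrize" idea can be sketched with nothing but the Python standard library. Everything specific below is a placeholder: the URL, the parameter names and the `RECORDED` dictionary stand in for whatever a proxy such as Charles would have captured, and cookie handling is left out.

```python
from string import Template
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical "recorded" request. Values starting with $ are the parts that
# vary from session to session; everything else is replayed unchanged.
RECORDED = {
    "method": "POST",
    "url": "https://reports.example.com/export",
    "params": {
        "report": "$report_id",
        "from": "$start",
        "to": "$end",
        "format": "html",
    },
}


def build_request(recorded, **values):
    """Fill in the placeholders and return a ready-to-send urllib Request."""
    filled = {
        key: Template(value).substitute(values)
        for key, value in recorded["params"].items()
    }
    encoded = urlencode(filled)
    if recorded["method"] == "GET":
        return Request(recorded["url"] + "?" + encoded, method="GET")
    return Request(recorded["url"], data=encoded.encode("ascii"), method="POST")


if __name__ == "__main__":
    req = build_request(RECORDED, report_id="42", start="2024-01-01", end="2024-01-31")
    print(req.get_method(), req.full_url, req.data)
```

The returned object can be passed to `urllib.request.urlopen`; a missing placeholder value raises `KeyError`, which surfaces a forgotten parameter before any request is sent.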

HtmlUnit is a scriptable, headless browser written in Java. We use it for some extremely fault-heavy, complex web pages and it usually does a very good job.
To simplify things even more you can run it in Jython. The resultant program reads more like a transcript of how one might use a browser than hard work.

You don't mention what you want to use this for; One solution is to simply "script" your web browser using tools like Selenium if having a web browser repeat your actions is an acceptable solution. You can use the Selenium IDE to record what you do and then alter the parameters.

I wish I could just "record my session" quickly and then parametrize the things that vary from session to session.
If you have Visual Studio Test Edition, its web test function does exactly that. If you aren't using VS or want a standalone tool, I have had great success with OpenSpan. It is more than just web; it does Windows apps and Java too!

Selenium would be my 1st pick, as the IDE lets you do a lot of things the easy way by "recording" a session for you. But, if you're not happy with what it provides, you can also use the Python module called Beautiful Soup to programmatically walk through a website.

Python and Perl both have a module called Mechanize (WWW::Mechanize for Perl) that makes it easy to drive browser behavior programmatically (filling out forms, handling cookies, etc.).
So, Python + BeautifulSoup (great HTML/XML parser) + mechanize (browser functions) = super easy/fast scraper

I used DomInspector to manually inspect the site of interest and parametrize its structure. Then I used a simple Apache HttpClient and a hand-made parser driven by this parametrized structure. Basically I could extract any info from any site automatically with a little tweaking of parameters. It's similar to how a SAX parser works: all you need to tell it is at what sequence of tags you want to start grabbing the data. For example, Google has a pretty standard format for search results, so you just run to the third occurrence of 'tab' and start getting text from the first 'div' up until the end '/div'.
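A rough sketch of that tag-sequence idea, using the event-driven `html.parser` module from the Python standard library rather than the answerer's Java code. The class and function names are made up for illustration, and reading the answer's 'tab' as a `table` tag is my assumption.

```python
from html.parser import HTMLParser


class TagSequenceExtractor(HTMLParser):
    """Collect the text of the first <target> after the n-th <anchor> tag."""

    def __init__(self, anchor, occurrence, target):
        super().__init__()
        self.anchor = anchor
        self.occurrence = occurrence
        self.target = target
        self.seen = 0      # anchor start tags seen so far
        self.depth = 0     # nesting depth inside the target element
        self.done = False  # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            # Already inside the target: only track nested targets.
            if tag == self.target:
                self.depth += 1
        elif tag == self.anchor and self.seen < self.occurrence:
            self.seen += 1
        elif tag == self.target and self.seen >= self.occurrence:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.target:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html_text, anchor="table", occurrence=3, target="div"):
    """Return the text found at the configured tag sequence, or ''."""
    parser = TagSequenceExtractor(anchor, occurrence, target)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip()


if __name__ == "__main__":
    page = (
        "<table></table><div>skip</div><table></table>"
        "<table><tr><td><div>first <b>hit</b></div></td></tr></table>"
    )
    print(extract(page))
```

Only the three parameters change from site to site, which is the "little tweak of parameters" the answer describes; the parser itself stays the same.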

Internet Explorer supports Browser Helper Objects (BHOs). They can access IE's HWND (window handle) and it's easy to scrape the pixels from there. The IWebBrowser2 COM interface also gives you access to the HTTP requests, and you can get back the parsed HTML document via IWebBrowser2::Document = IHTMLDocument / IHTMLDocument2 / IHTMLDocument3

Using Firefox, it should be possible to implement much of it with its powerful support for add-ons and enhancements; however, that wouldn't really run "headless" but would be a real scripted browser. Also, I seem to recall having read that Google's Chrome browser uses a similar technique to do automated regression testing.

I can't personally vouch for it, but there is a free firefox plugin: DejaClick
I installed it the other day and did some remedial recording, playback, and script editing activities with it. It pulled them off without much of a learning curve. If your end goal is to show something in a web browser, then it should suffice.
They offer web transaction monitoring services, implying that you can export the scripts for other uses, but they may be too proprietary to use outside of your web browser / their paid service.
http://www.dejaclick.com/

Codeplex/Sourceforge for internal use [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'm looking for a free/open source collaborative project manager that can be deployed internally in my workplace and would act similarly to Codeplex or Sourceforge. Does anyone know of something like this, and if so, do you have experience with it?
Requirements:
Open Source or Free
Locally Deployable
Has the same types of features found in Sourceforge / Codeplex
Issue/Feature Tracking
Community Interaction (ie. Voting, Roles, etc.)
SCM Integration (Optional)
.NET/Windows Friendly (Optional)
Every business ends up having internal utilities, and domain specific apps that developers create to make life easier. Given the input of the internal developer community they have the potential to become much better (can you say GMail...), and I would simply like to foster such an environment internally by providing an easy place for that interaction to take place.
UPDATE:
So I like what I am seeing in both Trac and GForge, but both are heavily geared towards UNIX/Subversion environments. I should have specified this, but we are a MS shop from top to bottom. How practical do you think it is going to be to try and use these in a MS .NET environment? Would that be like trying to shove a square peg through a round hole?

I like redmine for this: http://www.redmine.org. The only thing it's missing from your criteria is voting, but there might even be a plugin for this.
Trac is also popular (http://trac.edgewall.org) but it lacks support for aggregation of data across projects.

Try GForge, it's a SourceForge fork and has most of its features.

I agree, Trac should work. IMHO setting up Subversion should be relatively easy on Windows too, since there are great Windows clients for it (TortoiseSVN), and Trac runs on Python, so it will work on Windows too.

Other advantages of Sourceforge Enterprise are these plugins. There are extra plugins for Visual Studio which can be found here and here.

SourceForge Enterprise Edition 4.4 is available for free for up to 15 users. We use it for our development team and another development team where I work.
It's been working great for us. It has subversion and cvs built in (whichever you wish to use). If you plan on accessing it over the internet you might want to enable HTTPS. I had to do a little finagling to get HTTPS to work correctly (finding the right CentOS packages to install). If you wanted to use this solution with HTTPS I wouldn't mind if you sent me a message asking for help.
It comes with a VM for VMWare Player:
http://www.collab.net/downloads/sfee/index4.4.html

Launchpad has support for Code Hosting and version control, Bug tracking, Blueprints, Answers, Polls, Translations, etc.
Launchpad is used by the Ubuntu Project.
A few weeks ago, Launchpad was released as open source.

I was just wondering the same thing: something like Trac but in .NET. After a quick Google search (I have never tried these tools) I found
sharpforge (This no longer looks free!)
I like how the site .netTiers looks.
They use screwturn wiki.
It is totally free if you fulfill all GPLv2 statements.

Assembla and BeanStalk are nice; both have things like wiki, discussion, alerts, chat, ticketing, Trac, Git and Subversion.

What about Trac? It's pretty simple, but does its job for a lot of open source projects.

I would concur on the Trac suggestion. I use it both for an open source project and for an internal project. It has decent issue tracking and integration with Subversion which allows links between tickets and subversion checkins. It also has an integrated wiki, which can be of some use for documentation. Although we do not use it for voting / community type features, I know there's a number of addons to it that might serve this purpose.
