Approach to developing an R package to convert PDF to HTML

I'm working on a project to convert PDF to HTML using R. As far as I know, there are no R packages that do that.
I would greatly appreciate any suggestions or approaches from experts. I have an approach that works with the help of Python, but I'm looking for a better way.

Two suggestions:
Have a look at an existing open-source tool that does this; you can learn from it: https://github.com/itext/i7j-pdfhtml
Don't reinvent the wheel. Use language bindings to call an existing library from R.
In https://darrenjw.wordpress.com/2011/01/01/calling-java-code-from-r/ the author explains how to call Java from R.
If you go for this approach, you could use iText pdfHTML.

Is this a job for Yeoman?

I use Yeoman, and I dig it.
Recently, however, I have been wanting more complex code generation tools. I know I can build custom generators, but I am wondering whether people think this is the job Yeoman was built to do.
Examples:
Generating a base REST API (in Node) from a JSON schema
Generating MySQL DB Schema from JSON schema etc.
Although I could bend Yeoman to do this - do people think this is a realistic direction?
Is there a better tool for the job?
(Currently I have a bunch of custom Node scripts that suffice).
My humble opinion:
Yeoman is first and foremost a front-end tool for creating web apps.
Your task seems to be back-end related.
You can still use Grunt to scaffold your project, though.
http://gruntjs.com/project-scaffolding
Cheers

JSON to JSON transformation (preferably inside Apache Camel)

I have a somewhat unique requirement that I have not been able to find an answer to so far. I need a JSON to JSON transformation. Preferably, if I could plug it into Apache Camel, that would be wonderful.
As a side note, I would also welcome any suggestion to optimally store the JSON to JSON mapping. Is there any XSLT-based way of achieving this?
Thanks!
Mario
Zorba with JSONiq: http://www.jsoniq.org/
It's a native library, but it offers high performance. There are examples on the web page.
There is a simple design here: https://rawgithub.com/chunqishi/edu.brandeis.cs.json2json/master/docs/design-2014-04-09.html
Maybe you can improve on it using the source code: https://github.com/chunqishi/edu.brandeis.cs.json2json
I know this is an old question, but to refresh the answers: starting from Camel 2.16 there is a new component for JOLT integration. It is very powerful.
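On the side question of how to store the JSON to JSON mapping, one option is to keep the mapping itself as JSON data. The following is only a minimal Python sketch of that idea; it is not JOLT, JSONiq, or a Camel component. The mapping format (target dotted path to source dotted path), the helper names get_path, set_path and transform, and the sample data are all assumptions made for illustration.

```python
import json


def get_path(doc, path):
    """Follow a dotted path such as 'user.name' through nested dicts."""
    current = doc
    for key in path.split("."):
        current = current[key]
    return current


def set_path(doc, path, value):
    """Create nested dicts as needed and store value at the dotted path."""
    keys = path.split(".")
    current = doc
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def transform(source, mapping):
    """Build a new document from 'target path -> source path' rules."""
    result = {}
    for target_path, source_path in mapping.items():
        set_path(result, target_path, get_path(source, source_path))
    return result


# The mapping is plain JSON, so it can live in a file or a database column.
# (Hypothetical mapping and input document.)
mapping = json.loads(
    '{"customer.fullName": "user.name", "customer.city": "user.address.city"}'
)
source = {"user": {"name": "Ada", "address": {"city": "Rome"}}}

print(json.dumps(transform(source, mapping)))
```

A dedicated tool handles arrays, defaults and wildcards, which this sketch deliberately leaves out.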

What is the best way to open and read an Excel file from a Flex application?

What is the best way to open and parse an Excel file from a Flex application built using Adobe Flash Builder 4.5? I have done tons of research, and most of it leads me to external libraries. I could deal with that if it is the best approach, but I would prefer a built-in method.
Thanks
There is no "built-in method" to read Excel files in Flash. If you've found some libraries that claim to do it, give them a shot; you really don't want to try doing it yourself.
Try this library.
http://code.google.com/p/as3xls/
If it's an option, it may be simpler to export the Excel file as a .CSV and then load and parse that. It's a much simpler file format, and if your Excel sheet contains simple data it is going to be much easier to do.
Otherwise, the library that rejo mentioned (http://code.google.com/p/as3xls/) will help. Excel is a complicated file format; it's best not to go it alone.
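To give a feel for how little work the CSV route involves, here is a minimal sketch. It is written in Python rather than ActionScript, purely to show the shape of the task (header row, then one record per line); the sample data and the parse_csv helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample: what a simple sheet looks like after "Save as CSV".
csv_text = "name,quantity\nwidget,3\ngadget,5\n"


def parse_csv(text):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_csv(csv_text)

# Every value arrives as a string, so convert numbers explicitly.
total = sum(int(row["quantity"]) for row in rows)
print(rows[0]["name"], total)
```

Formulas, formatting and multiple sheets are lost in the export, which is why this only suits sheets that contain simple data.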

Resources on how to design a framework

Are there any resources on how to design frameworks, i.e. tips and tricks, best practices, etc.?
For .NET there's
Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries
http://www.amazon.com/Framework-Design-Guidelines-Conventions-Libraries/dp/0321545613
You can also study frameworks like Spring.
The Google Tech Talk "How To Design A Good API and Why it Matters" provides many insights on how to design a good API.
With regard to PHP, here are some tips from me:
Use MVC as your framework type.
MVC (Model-View-Controller) is the best way to create a framework; keeping your logic and models separate from your views is the best way to get a clean application.
I believe that Stack Overflow uses an MVC pattern, though I'm not sure whether it's PHP or ASP.
Make your code as open as possible.
Meaning that practically any object is accessible throughout the application.
One way I achieve this is by creating a static class that has global scope, for example:
class Registry{....}
Registry::add('Database', new Database());
Registry::add('Input', new Input());
Registry::add('Output', new Output());
Then, anywhere throughout the application, you can easily get objects like so:
Registry::get('Database')->query('Select .... LIMIT 10')->fetchObject();
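Since the body of the Registry class is elided above, here is one way such a registry could look. This is a language-neutral sketch written in Python, not the answer's actual PHP implementation; the storage dict, the KeyError on a missing name, and the stand-in Database class are all assumptions.

```python
class Registry:
    """Class-level store so any part of the application can look objects up."""

    _objects = {}

    @classmethod
    def add(cls, name, obj):
        """Register obj under name, replacing any earlier entry."""
        cls._objects[name] = obj

    @classmethod
    def get(cls, name):
        """Return the registered object, or fail loudly if it is missing."""
        if name not in cls._objects:
            raise KeyError(f"No object registered under {name!r}")
        return cls._objects[name]


class Database:
    """Hypothetical stand-in for a real database wrapper."""

    def query(self, sql):
        return f"ran: {sql}"


Registry.add("Database", Database())
print(Registry.get("Database").query("SELECT 1"))
```

The trade-off is the usual one for global state: lookups are convenient everywhere, but dependencies become implicit and harder to swap out in tests.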
Do not use template engines
In my eyes, template engines are not the best choice, as PHP is itself a template engine. There's no need to write a lot of code to parse your templates and then have PHP parse the result again; it's only logical.
Instead, create a system where the user tells the View which template file to output, and have the View check the cache for it. If it's not in the cache, the View hands it to another object, say ViewLoader, which includes the PHP template file within its __construct and also offers other methods such as url() and escape(), so in template files you can then use:
$this->url('controller','method',$this->params);
Hope this helps you!

What's the project of choice for compiling GPB to AS3?

Inside a Java project I use Google Protocol Buffers (GPB) for serializing my objects. I can use the same .proto files in auxiliary Python code, which is great. Now I'm adding a Flex client to the whole thing and I'd like to use the same .proto files once more.
It seems there are a couple of projects out there that compile .proto files to ActionScript. From a few glances at the projects' homepages, it seems to me that protobuf-actionscript3 is actually the most advanced and most "alive" of these projects.
Has anybody had practical experience with GPB to AS3 compilers and which one(s) can you recommend (or recommend against)?
If you're sure you want to use GPB, then protobuf-actionscript3 is your best option. It builds on the semi-successful protocol-buffers-actionscript project: http://code.google.com/p/protocol-buffers-actionscript/
If you're open to looking at other formats, there's always Adobe's own AMF3. It seems to have a good amount of community support behind it.
The only choice now is https://code.google.com/p/protoc-gen-as3/. All the other Protobuf/AS3 projects are out of date and lack features.