Objective-C - Parsing a .csv, extracting and inserting information, then displaying the .csv as an interface for editing - html

This question has been troubling me for the past week. Below, I will list my issue, and the research I have put into it.
The scenario: I was given a .csv file with 5000 rows and three columns. The three columns are defined as:
Site ID|Site Name|Site URL
My task: To create an HTML interface for the designers of the company to rate each site on a scale of 1-5.
My plan of action: I am a new hire. I am getting accustomed to the language I was hired for, which was Objective-C.
My algorithm for the project was to:
Parse the .csv
Remove the "Site Name" variable
Create a new .csv that contains the below variables: Site ID|Site URL|Rating|Image
Display the new .csv (with all aforementioned items) as an HTML page where there are toggles for "Ratings", which when pressed, will log the rating into the .csv which it was imported (or loaded) from.
For the "Image" column, I will be using a piece of software called Paparazzi (on the Mac OS X operating system), which takes a fully formatted screenshot of a site's main page and saves it as a PNG file. I plan on taking the file URL of each PNG (which is stored locally) and loading it into the "Image" column, so that when the designer clicks on it, he is able to load the image that is stored locally.
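For comparison with the Python route mentioned further down, the parse/drop/re-emit steps above could be sketched as follows. This is only a rough sketch under assumptions: it treats the file as pipe-delimited (the column list above is written with "|", but the real file may use commas), and the screenshots directory and the "<Site ID>.png" naming are hypothetical, not something Paparazzi dictates.

```python
import csv
import io

def build_rating_rows(source, image_dir="screenshots"):
    """Yield [Site ID, Site URL, Rating, Image] rows, dropping Site Name."""
    reader = csv.DictReader(source, delimiter="|")  # assumption: pipe-delimited
    for row in reader:
        site_id = row["Site ID"].strip()
        # Hypothetical naming scheme: one PNG per site, named after its ID.
        image_path = f"{image_dir}/{site_id}.png"
        yield [site_id, row["Site URL"].strip(), "", image_path]

def convert(source, target, image_dir="screenshots"):
    """Write the new file with an empty Rating column to be filled in later."""
    writer = csv.writer(target, delimiter="|", lineterminator="\n")
    writer.writerow(["Site ID", "Site URL", "Rating", "Image"])
    writer.writerows(build_rating_rows(source, image_dir))

# Small in-memory example instead of the real 5000-row file.
sample = io.StringIO("Site ID|Site Name|Site URL\n1|Example|http://example.com\n")
out = io.StringIO()
convert(sample, out)
print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with newline="" as the csv module documentation recommends.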
My issue: As Objective-C is not entirely a scripting language, I am confused about which libraries I may need and/or which methods I can use to implement this. I have the algorithm, but I am wholly unsure about the implementation.
My questions: If you have done a project similar to this before with Objective-C, what tips can you provide for me? How does one load the .csv as an HTML interface where, upon edit, it will save this edit into the .csv? Will I need any servers for this, or is everything executable from just a machine? How do you grab an image (stored locally), extract its file URL, and load it into the .csv?
The most important question: Is this achievable through Objective-C? My reasoning behind it is, I want to advance my knowledge of OC through a task like this. Yes, using Python is easier, but is it possible to do this with Objective-C?
Thank you.

It certainly is achievable, but I doubt you'd really want to go this way. If I understand it correctly, you want to serve the HTML page to others via a web browser. That would mean either writing a (simple) HTTP daemon that would run on the server, or writing a CGI script that would communicate with a standard HTTP daemon. Python/PHP/Ruby do this for you readily, so there is much less room for possible errors.
As for
As Objective-C is not entirely a scripting language
I would perhaps rephrase it as
As Objective-C is entirely not a scripting language
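To illustrate how little the "simple HTTP daemon" part takes in a scripting language, here is a minimal Python sketch using only the standard library. The file name ratings.csv, the pipe delimiter, the form field names site_id and rating, and the port are all assumptions made for the example, and it does no locking, so it is not meant for several designers saving at the same time.

```python
import csv
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

CSV_PATH = "ratings.csv"  # hypothetical file: Site ID|Site URL|Rating|Image

def save_rating(csv_path, site_id, rating):
    """Rewrite the CSV with one site's rating updated; return True if found."""
    if rating not in {"1", "2", "3", "4", "5"}:
        raise ValueError("rating must be 1-5")
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f, delimiter="|"))
    found = False
    for row in rows[1:]:  # rows[0] is the header line
        if row and row[0] == site_id:
            row[2] = rating  # third column is Rating
            found = True
    with open(csv_path, "w", newline="") as f:
        csv.writer(f, delimiter="|").writerows(rows)
    return found

class RatingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects a form-encoded body such as: site_id=42&rating=4
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        try:
            ok = save_rating(CSV_PATH, form["site_id"][0], form["rating"][0])
        except (KeyError, ValueError):
            self.send_error(400, "bad site_id or rating")
            return
        self.send_response(204 if ok else 404)
        self.end_headers()

def serve(port=8000):
    """Blocks forever; call serve() from a script to start the daemon."""
    HTTPServer(("", port), RatingHandler).serve_forever()
```

The HTML page itself would then only need a form (or a small script) that posts site_id and rating to this server.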

Pretty-print Lua source-code in external file, without embedding it in the HTML file

Since my experience with HTML is fairly rudimentary (and pretty old), I am not sure if my requirement is realistic.
Let's say that I have quite a few files containing Lua source code, all of them with the ".lua" extension and available in a particular subdirectory. What I'd like to do is create a static index.html file which, when loaded in a browser, would show the list of the Lua source-code files in a drop-down. Once one of the source-code files is selected, I'd like the file to be loaded into an "area" on the same page and pretty-printed, i.e. with syntax highlighting in the browser. I was wondering if I could use something like google-code-prettify for the syntax-highlighting part? Also, I am not clear whether an external Lua source-code file can be loaded and displayed within a certain region of the HTML page as it is rendered. If yes, I would appreciate elaboration on the how part.
A tool like LDoc can be used to accomplish a lot of what you want, much as Doxygen would be used for a C language source kit.
Both are heavily driven by inclusion of specially formatted comments that carry documentation.
I know Doxygen can fold source code into the generated document set; I don't recall whether LDoc can. Both are actively under development.
It isn't necessarily a bad idea to use both tools on a project, especially if you have C source code implementing Lua modules. You could use Doxygen to build the overall document tree for your engine and C modules, and LDoc to build documentation of the Lua parts. It should be possible with a little care and configuration of both tools to get them to play well together.

HTML5: accessing large structured local data

Summary:
Are there good HTML5/javascript options for selectively reading chunks of data (let's say to be eventually converted to JSON) from a large local file?
Problem I am trying to solve:
Some existing program runs locally and outputs a ton of data. I want to provide a browser-based interactive viewer that will allow folks to browse through these results. I have control over how the data is written out. I can write it all out in one big file, but since it's quite large, I can't just read the whole thing into memory. Hence, I am looking for some kind of indexed or db-like access to this from my webapp.
Thoughts on solutions:
1. Brute-force: the HTML5 File API has a nice slice() method (on File/Blob objects, whose slices can then be read with FileReader) for random access. So I could write out some kind of an index at the beginning of the file, use it to look up positions of other stored objects, and read them whenever they're needed. I figured I'd ask if there are already javascript libraries that do something like this (or better) before trying to implement this ugly thing.
2. HTML5 local database. Essentially, I am looking for an analog of HTML5 openDatabase() call that would open (a read-only) connection to a database based on a user-specified local file. From what I understand, there's no way to specify a file with a pre-loaded database. Furthermore, even if there was such a hack, it's not clear whether the local file format would be the same across browsers. I've seen the phonegap solution that populates the browser local database from SQL statements. I can do that too, but the data I am talking about is quite large (5-10GB): it will take a while to load, and such duplication seems rather pointless.
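To make the index-at-the-front idea from option 1 concrete, here is a rough sketch of one possible file layout, written in Python because the writer side runs outside the browser anyway. The layout (an 8-byte length, a JSON index of [offset, length] pairs, then the payloads) and all function names are made up for illustration. A browser reader would do the same two steps with slice() on the File object instead of seek().

```python
import json
import struct

def write_indexed(path, records):
    """Write records (name -> JSON-serializable object) behind a leading index.

    Layout: 8-byte big-endian index length, JSON index, then the payloads.
    """
    blobs = {name: json.dumps(obj).encode("utf-8") for name, obj in records.items()}
    index, offset = {}, 0
    for name, blob in blobs.items():
        index[name] = [offset, len(blob)]  # offset is relative to end of index
        offset += len(blob)
    index_bytes = json.dumps(index).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack(">Q", len(index_bytes)))
        f.write(index_bytes)
        for blob in blobs.values():
            f.write(blob)

def read_index(f):
    """Read only the header and index; return (index, byte offset of payloads)."""
    (index_len,) = struct.unpack(">Q", f.read(8))
    index = json.loads(f.read(index_len))
    return index, 8 + index_len

def read_record(path, name):
    """Fetch one record by seeking straight to it, without reading the rest."""
    with open(path, "rb") as f:
        index, data_start = read_index(f)
        offset, length = index[name]
        f.seek(data_start + offset)
        return json.loads(f.read(length))
```

For a 5-10GB file the index itself may become large, so a real format would probably want a multi-level or sorted index. That refinement is left out of this sketch.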
HTML5 does not sound like the appropriate answer for your needs. HTML5's focus is on the client side, and based on your description you're asking a lot out of the browsers, most likely more than they can handle.
I would instead recommend you look at a server-based solution to deliver the desired goal/results to the client view, something like Splunk would be a good product to consider.

Extracting data from PDF or Word using PHP, Java

I need help on this...
Especially since I don't know where to start..
I am an IT undergraduate and, along with my groupmates, am now undergoing on-the-job training in a company.
SCENARIO:
The company asked us to create a program that will generate a report and store it in a database.
The database that will be used is MySQL.
As for what language to use, we are considering VB.Net, Java, PHP.
The program must be able to :
generate a report that will be sent through email to an office
store in a database
collect all reports, collate those reports
generate a new report which will then be sent to their main office
then store it in their own database...
For now,
we are still trying to determine how the program will run and what language will be used that has the capability of reading and extracting data from a text file (can either be a word document or a PDF file).
The company also wants the program to be online-ready for future expansion.
Now, our problem is
Is there a way to extract data from a PDF or Word file using either Java, PHP, VB then store it in the MySQL DB?
if there is, can it be implemented without using any 3rd party software?
the reason why we chose to use either a PDF or Word file type is that, the file should be printable for archive purposes.
What programming language can we easily use to be able to achieve our problem above?
I would like to apologize if the info I am giving is a bit messed up. I will be giving additional information once we are able to talk with the company this week.
If there is a problem with the way I posted this, please forgive me. I am just trying my best to provide you with the information the best I could.
I'll answer for Java as it is what I use at work.
You can easily extract text from Word files or build a new Word file with Apache POI
As for PDF, iText and PDFBox both do a pretty nice job.
Why can't you use 3rd party software? If you could, I would recommend something like the question "How to read PDF files using Java?".
Or, to read a .doc file: http://www.roseindia.net/tutorial/java/poi/readDocFile.html
Anyway, if you can't use 3rd party tools, why not read the specifications and figure out how to extract the text from PDF, DOC, and DOCX files?
Here you can find DOC specifications: http://msdn.microsoft.com/en-us/library/cc313118.aspx
Here you can find the PDF format specification: http://www.adobe.com/devnet/pdf/pdf_reference.html
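As a taste of what the no-third-party route looks like, DOCX is the friendliest of the three formats: a .docx file is a ZIP archive whose main text lives in word/document.xml. A minimal sketch in Python, using only the standard library, might look like the following. It handles plain paragraphs only (no tables, headers or footnotes get special treatment), and it does not apply to the older binary DOC format or to PDF, which are far more work.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(file):
    """Return the text of each paragraph in a .docx file (path or file object)."""
    with zipfile.ZipFile(file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):  # w:p is a paragraph
        # w:t elements hold the text runs; join them in document order.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        paragraphs.append(text)
    return paragraphs
```

The returned strings could then be inserted into MySQL with whatever database driver the chosen language provides.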
Good luck!

How can I create a well-formatted PDF?

I'm working on automating our company invoicing system. Currently all data is stored in our local MySQL database and someone manually updates an excel spreadsheet and then merges this data into a MS Word template. The goal is to automate this process so that the invoice can be generated from our intranet website as a PDF.
My original plan was to create a template in HTML/CSS and use wkhtmltopdf to generate the PDF but I ran into problems with getting a repeatable header and footer on each page. thead and tfoot aren't supported by Webkit and the fix suggested in this other question does not seem to work either.
So I then stumbled on using XML and XSL-FO, the latter I know nothing about. Is this the best path to take? Are there any libraries or utilities out there that will make converting my HTML+CSS into XML+XSL-FO easier? Are there any other alternatives I'm overlooking?
EDIT
Currently the server is CentOS Linux with a MySQL database. All other code is currently in PHP, but that may change as the whole system is being revamped. Linux and MySQL will almost certainly remain, though.
For your requirement, XSL-FO might just do the trick. It is much cleaner to produce the PDFs directly from the data than to go the cumbersome HTML path. If you need to display the HTML as well, you might consider converting from HTML to PDF, but it will always be messy.
You can get XML results from MySQL quite easily (mysql --xml) and then write one (or several) XSL-FO stylesheets for the data. Then you can produce not only PDFs but also PostScript or RTF files with some processors.
XSL-FO has its limitations, but for your situation it should suffice.
I admit, the learning curve can be steep, and maintaining xslt-stylesheets can get very tiring, but as you start knowing more about it, you end up writing less code.
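To give a feel for what an XSL-FO document looks like, here is a small sketch that reads mysql --xml output and builds the FO tree directly in Python rather than through an XSLT stylesheet (Python's standard library has no XSLT engine, so this swaps the stylesheet step for plain code). The page size, margins and row formatting are arbitrary choices for the example. The fo:static-content element bound to the region-before area is what gives a header repeated on every page. The resulting file still has to be rendered to PDF by an FO processor such as Apache FOP.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO}}}{tag}"

def rows_from_mysql_xml(xml_text):
    """Parse <resultset><row><field name="...">value</field>... into dicts."""
    root = ET.fromstring(xml_text)
    return [{f.get("name"): (f.text or "") for f in row.findall("field")}
            for row in root.findall("row")]

def build_fo(rows, title):
    """Return an XSL-FO document with a repeating header and one block per row."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page", "page-height": "29.7cm",
        "page-width": "21cm", "margin": "2cm"})
    ET.SubElement(page, fo("region-body"), {"margin-top": "1.5cm"})
    ET.SubElement(page, fo("region-before"), {"extent": "1cm"})
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    # Static content is repeated on every page of the sequence.
    header = ET.SubElement(seq, fo("static-content"),
                           {"flow-name": "xsl-region-before"})
    ET.SubElement(header, fo("block")).text = title
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for row in rows:
        line = ", ".join(f"{k}: {v}" for k, v in row.items())
        ET.SubElement(flow, fo("block")).text = line
    return ET.tostring(root, encoding="unicode")
```

A real invoice would use fo:table for the line items. Plain blocks are used here only to keep the sketch short.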
Another possibility is to do the whole thing in e.g. Java or C#: send select statements, loop over the results, and iteratively build the PDF using a library like iText.
You could try JODReports or Docmosis as less-code intensive options. You supply Word or OpenOffice Writer documents to act as templates and use these engines to manipulate/populate the templates then spit out the documents in the format(s) you require. This may mean your existing Word-templates can be used directly which should save you some effort/time.
iText is another library that will let you build and pump out PDFs from code. It's pretty good.
If you could use ASP.NET for the web side, you can use the free ReportViewer library and designer to automate publishing of PDFs.
Here are some references:
http://gotreportviewer.com
http://weblogs.asp.net/srkirkland/archive/2007/10/29/exporting-a-sql-server-reporting-services-2005-report-directly-to-pdf-or-excel.aspx
If you're OK using .NET and C#, you could use DotPdf from Atalasoft (obligatory disclaimer: I work for Atalasoft and wrote most of DotPdf). The Generating namespace is geared for exactly what you're trying to do: automate report generation. From the very basics, you could just create docs directly with the toolkit or you can create template documents that have unpopulated text fields that you can reload and fill later (see here and here for examples).

What are HTML applications?

I recently stumbled upon some files described as "HTML Applications" on my Win XP machine.
What are they?
Who would ever use them? Why do I have like 2 or 3 of them on my PC?
How do they generally work? I mean hey - HTML is for adding formatting to text - HTML Applications? What the? Microsoft?
HTAs are good for things like VB scripts that you want an interface for other than MsgBox or a console window.
Since it's HTML, you can use buttons, text areas, check boxes, etc to show information to the user and get input from them, and use CSS to style it all. Since HTAs run on the local machine, you have access to everything you can do with VBScript for computation and file access, WMI for system management, program automation with COM objects, data access with ADO, and so on.
I once wrote an HTA that installs, updates, and compares Word templates on a user's machine from a common folder. The user can see their template folder next to the common folder to know if they are up to date, and hit the Update button if not.
Another one manages and verifies the installation of a program on a user's computer, copying over the exe if necessary, making sure registry entries are set correctly, putting a shortcut on the desktop, letting the user test and see the results of the installation, and so on. It also logs all of this info to a common place for me to check on.
One of my biggest HTA projects was a Project Manager system. The interface showed me all of the Excel, Word or Access projects I had going on. It would open the selected project in its particular environment, and showed me all of the pieces of it. It allowed me to import and export code modules from a common library using VBE automation (the Visual Basic Editor COM interface).
I'm about to put one together to show current and "dead" printer drivers on a user's machine. With me coaching them over the phone, they will run the HTA which will list all of the installed printers. They will put a check mark next to the ones they want to keep, then hit a button to delete all of the others. Fairly easy for them, and saves me from going to each and every PC to fix this.
Many of these kinds of things only make sense in a Windows environment, but you can write some pretty general purpose stuff with it too. Anything you can express in VBScript or JScript (JavaScript) and want an HTML/CSS front on is a good candidate for an HTA. I also even wrote a basic network chat system in it at one point.
There are lots of little HTAs around for converting data from one format to another, say converting comma separated data to columnar, or adding or removing various kinds of formatting like quoted-printable escape codes, converting hex formatted text into plain text, and on and on. Copy text into one input text area, check a few options and press the Go button, then copy the converted data from the output text area. One I wrote was an SQL formatter. It would take SQL code and wrap it up as either a VB or Delphi string, and also go from wrapped back to plain SQL code, with basic indenting and "pretty printing" to clean it up.
I don't do as much with HTAs as I used to, but still think they are a pretty cool technology for the kinds of jobs that fit in that niche.
See here for Introduction to HTML Applications (HTAs).