GEDCOM to HTML and RDF

I was wondering if anyone knew of an application that would take a GEDCOM genealogy file and convert it to HTML for viewing and publishing on the web. I'd like separate HTML files for each individual, and perhaps additional files for other content as well. I know there are some tools out there, but I was wondering if anyone has used any of them and could advise. I'm also not sure what form to expect such applications to take: they could be Python or PHP scripts that one can edit, JavaScript (maybe), or just executable files.
The next issue might be appropriate for a topic in itself: export of GEDCOM to RDF. My interest here would be to align the information with specific vocabularies, such as BIO or REL, both of which extend FOAF.
Thanks,
Bruce

Like Rob Kam said, Ged2Html was the most popular such program for a long time.
GRAMPS can also create static HTML sites, and has the advantage of being free software with a native XML format which you could easily modify to fit your needs.

Several years ago, I created a simple Java program to turn GEDCOM into XML. I then used XSLT to generate HTML and RDF. The HTML I generate is pretty rudimentary, so it would probably be better to look elsewhere for that, but the RDF might be useful to you:
http://jay.askren.net/Projects/SemWeb/
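In case it's useful, the core of that GEDCOM-to-XML step is small enough to sketch here. This is a Python rendition (my program was Java), and the element naming is just illustrative rather than exactly what my converter emits:

# Minimal GEDCOM-to-XML sketch. GEDCOM lines look like
#   0 @I1@ INDI
#   1 NAME John /Smith/
#   2 DATE 1 JAN 1900
# i.e. "level [xref] tag [value]"; the level numbers imply a tree,
# which we rebuild as XML. (CONT/CONC continuation lines and other
# niceties are ignored in this sketch.)
import re
import sys
import xml.etree.ElementTree as ET

LINE = re.compile(r"^(\d+)\s+(@[^@]+@\s+)?(\S+)(?:\s(.*))?$")

def gedcom_to_xml(lines):
    root = ET.Element("gedcom")
    stack = [root]                     # stack[level] = parent at that level
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue
        level, xref, tag, value = m.groups()
        level = int(level)
        elem = ET.Element(tag.lower())
        if xref:
            elem.set("id", xref.strip().strip("@"))
        if value:
            elem.text = value
        del stack[level + 1:]          # close any deeper records
        stack[level].append(elem)
        stack.append(elem)
    return root

if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        ET.dump(gedcom_to_xml(f))

From there, XSLT (or anything else that consumes XML) can produce the HTML and RDF.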

There are a number of these. All listed at http://www.cyndislist.com/gedcom/gedcom-to-web-page-conversion/
Ged2html used to be the most popular and most versatile, but is no longer being developed. It's an executable, with output customisable through its own scripting syntax.

Family Historian http://www.family-historian.co.uk will create exactly what you are looking for, e.g. one file per person, using the built-in Web Site creator, as will a couple of the other major genealogy packages. I have not seen anything for the RDF part of your question.

I have since tried to produce a genealogy application using Semantic MediaWiki (MediaWiki is the software behind Wikipedia, and Semantic MediaWiki adds various extensions related to the Semantic Web). I thought it was very easy to use, with the forms and the ability to upload a GEDCOM, but feedback from people into genealogy was that it appeared too technical and didn't seem to offer anything new.
So now the issue is whether to stay with MediaWiki and make it more user-friendly, or to create an entirely new application that allows adding and updating data in a triple store as well as displaying it. I'm not sure how to generate a graphical family-tree view of the data, like on sites such as ancestry.com, where one can click on a box to see and update details about that person, or click a left or right arrow beside a box to navigate the tree. The data would come from SPARQL queries sent to the data set/triple store, both when displaying the initial view and when navigating the tree, where an Ajax call is needed to fetch more data.
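To make that concrete, each navigation step might boil down to something like this (a sketch only; the endpoint URL is hypothetical and the REL-style property modelling is an assumption rather than a finished schema):

# Fetch a person's immediate family from a SPARQL endpoint as JSON --
# the kind of call an Ajax handler would make per navigation step.
# The endpoint URL and the rel:/foaf: modelling are assumptions.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:3030/genealogy/sparql"   # hypothetical

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rel:  <http://purl.org/vocab/relationship/>
SELECT ?relative ?name ?kind WHERE {
  { <%(person)s> rel:parentOf ?relative . BIND("child" AS ?kind) }
  UNION
  { ?relative rel:parentOf <%(person)s> . BIND("parent" AS ?kind) }
  ?relative foaf:name ?name .
}
"""

def family_of(person_uri):
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": QUERY % {"person": person_uri}})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        bindings = json.load(resp)["results"]["bindings"]
    return [(b["kind"]["value"], b["name"]["value"], b["relative"]["value"])
            for b in bindings]

# e.g. family_of("http://example.org/people/I1")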
Bruce

Related

Newbie - How to display info from a .dat file?

I'm pretty new to this so I'm not sure if this is a simple request or not but here goes:
I am working on a school website, and under each program page is a list of course codes. What I'm looking for: when I click on a course code (e.g. HEL2106), a lightbox-type popup appears displaying program info about that course. What I have is a .dat file with all the course codes and descriptions in it, so I would like to use some sort of HTML/CSS/JS that will pop up and display the correct info about the clicked course from the .dat file.
I'm not 100% sure on how to go about this so if anyone has any suggestions at all, that would be really helpful.
If you need any other details from me, let me know.
Thanks,
(File info: the .dat file is pretty much just a notepad document with each course code and description in sequence.)
Just to let you know, you need to search and learn about a lot of things first.
For data access on a website, you need access to a database. If you don't know about SQL (or any other query language), queries, databases, tables, servers, and so on, then you should start there.
To read those databases, you need to write code (ASP.NET, PHP, etc.) that runs on a web server (Apache, IIS, etc.).
If you want to create a website, I recommend you start working with WordPress, Joomla, or another CMS (Content Management System), to familiarize yourself with a lot of things before jumping to the advanced stuff.
YouTube is a very good friend and teacher! :) Start by watching some tutorials there. Hope this will guide you to what you need.
I have no idea what your level of experience is based on your question, so I will assume you have a basic understanding of HTML, CSS, and JS. If not, then I would recommend Exel Gamboa's answer.
It sounds like you're looking for something like http://fancybox.net/
Of course, it is typically used for displaying images but it could be easily modified for your purpose.
Now, about your .dat file: when storing data for large websites, it's typically best to use a SQL database, which lets you access and store data in an organized manner.
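That said, if you want to stay with the .dat file for now, one workable approach is to convert it to JSON once (as a build step or server-side) so your page's JavaScript can fetch it and fill the popup. A rough sketch in Python, assuming the file alternates course-code lines and description lines; adjust the parsing to your actual layout:

# One-off converter: flat .dat file of course codes/descriptions
# -> courses.json for the page's script to fetch and display in a
# lightbox. Assumes alternating code and description lines.
import json
import re

COURSE_CODE = re.compile(r"^[A-Z]{3}\d{4}$")   # e.g. HEL2106 (assumed format)

def parse_dat(path):
    courses, current = {}, None
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if COURSE_CODE.match(line):
                current = line               # start a new course entry
                courses[current] = ""
            elif current:
                # fold continuation lines into the current description
                courses[current] = (courses[current] + " " + line).strip()
    return courses

if __name__ == "__main__":
    with open("courses.json", "w", encoding="utf-8") as out:
        json.dump(parse_dat("courses.dat"), out, indent=2)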
As a final recommendation, I'd take a look at using a CMS for your website (WordPress, WolfCMS, Perch, etc.).
Hope this helps.

How can I get started on programmatically analyzing web site content?

I've been looking for a new hobby programming project, and I think it would be interesting to dabble in ways to programmatically gather information from websites and then analyze that data to do things like aggregate or filter it. For example, I might want to write an application that could take Craigslist listings and then display only the ones matching a specific city, not just a geographical area. That's just a simple example, but you could go as advanced and sophisticated as the way Google analyzes a site's content to know how to rank it.
I know next to nothing about that subject and I think it would be fun to learn more about it, or hopefully do a very modest programming project in that topic. My problem is, I know so little that I don't even know how to find more information about the subject.
What are these types of programs called? What are some useful keywords to use when searching on Google? Where can I get some introductory reading material? Are there interesting papers I should read?
All I need is someone to disabuse me of my ignorance, so that I can do some research on my own.
cURL (http://en.wikipedia.org/wiki/CURL) is a good tool to fetch a website's contents and hand it off to a processor.
If you are proficient with a particular language, see if it supports cURL. If not, PHP (php.net) may be a good place to start.
When you have retrieved a website's content via cURL, you can use the language's text processing functionality to parse the data. You can use regular expressions (http://www.regular-expressions.info/) or functions such as PHP's strstr() to find and extract the particular data you seek.
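Here is the same fetch-and-extract flow sketched in Python rather than PHP, purely for illustration (the URL and the pattern are placeholders, and for anything beyond quick experiments an HTML parser is more robust than regular expressions):

# Fetch a page and pull data out of it with a regular expression.
import re
import urllib.request

def fetch(url):
    req = urllib.request.Request(url, headers={"User-Agent": "hobby-crawler/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

html = fetch("http://www.example.com/listings")   # placeholder URL
# e.g. grab every link target and its text
for href, text in re.findall(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', html, re.S):
    print(href, "->", text.strip())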
Programs that "scan" other sites are usually called web crawlers or spiders.
I recently completed a project using the Google Search Appliance, which basically crawls the whole .com domain of our web server.
The GSA is a very powerful tool that indexes pretty much all the URLs it encounters and serves up the results.
http://code.google.com/apis/searchappliance/documentation/60/xml_reference.html
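Querying it from code is just an HTTP GET that returns XML, roughly like this (the hostname, collection, and frontend names are examples that depend on how your appliance is configured; see the XML reference above for the full parameter list):

# Query a Google Search Appliance and list result titles and URLs.
# Host, site (collection), and client (frontend) values are examples.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def gsa_search(host, terms):
    params = urllib.parse.urlencode({
        "q": terms,
        "output": "xml_no_dtd",
        "client": "default_frontend",
        "site": "default_collection",
    })
    with urllib.request.urlopen("http://%s/search?%s" % (host, params)) as resp:
        root = ET.fromstring(resp.read())
    # each <R> result element carries <U> (URL) and <T> (title) children
    return [(r.findtext("T"), r.findtext("U")) for r in root.iter("R")]

# e.g. gsa_search("gsa.example.com", "course catalog")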

MySQL-based wiki that is suitable for custom applications?

I develop an online, Flash-based multiplayer game. It is a complex game, and requires a lot of documentation to fully explain it to our users. Ideally, I would like to find MySQL-based wiki software that can provide these editable documentation pages outside of Flash (in the HTML realm) but also within Flash, for convenience, so that players can refer to the information without interrupting their game or having to switch back and forth between browser tabs. I am expecting that I would need to do a lot of the work on the Flash side myself (formatting, for example), but I would like to feel comfortable querying the wiki's database to get info directly. I guess this means I need a wiki that is structured relatively "flat", or at least intuitively, so that I can do things like:
Run a MySQL query that returns a list of all the articles (their titles and IDs) in the wiki
For each article ID in the wiki, return the associated content
This may mean that I have to limit the kinds of formatting I put into the wiki -- things like tables would probably be omitted since they would be very difficult, if not impossible, for me to do on the Flash side. And that is fine!
Basically I am just looking for suggestions for wiki software that is pretty easy to use, but mostly is technically simple enough on the back-end that interfacing with it directly via MySQL is not difficult. When interfacing with the database directly, I only need to READ data. Any time the wiki would be edited or added to would be done via the wiki's actual front-end application.
Thanks for any suggestions!
MediaWiki is the best-known and best-supported MySQL-based Wiki, used for plenty of complex game documentation projects like MinecraftWiki. The database is not all that simple, but it's well documented and basic read operations aren't too hard. For example, here's how to fetch the current content of the page "MyPage":
SELECT old_text FROM page,revision,text WHERE page.page_title="MyPage" AND
page.page_id=revision.rev_page AND revision.rev_text_id=text.old_id;
(And yes, old_text is the current content of the page. Don't ask me why!)
Your main problem will be figuring out how to parse MediaWiki markup; there are plenty of parsers for it, but I'm not aware of one that would work in Flash.
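For what it's worth, here is that same read wrapped in a few lines of application code, using Python with the pymysql package purely to illustrate how simple the read side is. The credentials and any table-name prefix depend on your installation, and newer MediaWiki releases have moved revision text into other tables, so check your version's schema:

# Read-only fetch of a page's current wikitext straight from the
# MediaWiki database, using the query from the answer above.
import pymysql

def page_text(title):
    conn = pymysql.connect(host="localhost", user="wikiread",
                           password="secret", database="wikidb")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT old_text FROM page, revision, text"
                " WHERE page.page_title = %s"
                " AND page.page_id = revision.rev_page"
                " AND revision.rev_text_id = text.old_id",
                (title,))
            row = cur.fetchone()
            # old_text is stored as a blob, so it may come back as bytes
            return row[0] if row else None
    finally:
        conn.close()

print(page_text("MyPage"))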

Generate HTML and PDF

I'm thinking about how I'd generate a university newspaper both as a PDF and as HTML (a website), where every news item would contain one or more pictures, and I wonder if there are any tools to approach this problem declaratively, so that inexperienced users could prepare structured data (text plus pictures) and get a PDF and website as output on their own, with no programmers' intervention. I suspect this calls for some sort of XSL-FO or XML editing/processing software.
P.S. A free tool (or tools) would be the best solution.
Thank you.
For this, a very good approach would be to use DocBook to write your articles, then let the tools generate the HTML and PDFs you need, with just some tuning of the output's look and feel from your side.
For DocBook there are many available tools, but a very good one that is free for open source and academic use is XMLMind.
If your articles are more technically oriented, then DocBook is the quasi-standard (even many publishing houses, like O'Reilly, use it).
Of course, in the "pure" academic domain, LaTeX is quite the standard (and allows output in a lot of formats), but it takes quite a lot of learning, and there are no true WYSIWYG tools for writing the articles. If you intend to submit the articles to research journals too, they are very glad to accept LaTeX input.
We (Swansea University) used a content management system to achieve this: DotNetNuke in our case.
We wanted a multi-page newsletter, with a summary on the front page and a "more" link through to the fuller article. The content management system allowed normal users to construct the newsletter themselves; they simply created a new child site every month.
We also had the newsletter emailed out: we simply grabbed the HTML from the main page and sent it to a distribution list.
Something worth considering - cost = £0
The obvious way of doing it would be to generate all your content in XML, then use either a commercial XSL-FO processor or Apache FOP to generate the PDF, and whatever your XSLT processor is to generate the HTML.
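A sketch of that pipeline in Python, using the lxml package for the XSLT step and the Apache FOP command line for the PDF (all file names are placeholders):

# One source XML, two outputs: HTML via an XSLT stylesheet, and PDF
# via an XSL-FO stylesheet handed to Apache FOP.
import subprocess
from lxml import etree

doc = etree.parse("newspaper.xml")

# XML -> HTML
to_html = etree.XSLT(etree.parse("to-html.xsl"))
with open("newspaper.html", "wb") as out:
    out.write(etree.tostring(to_html(doc), pretty_print=True))

# XML -> PDF: FOP applies the XSL-FO stylesheet and renders the result
subprocess.run(["fop", "-xml", "newspaper.xml",
                "-xsl", "to-fo.xsl",
                "-pdf", "newspaper.pdf"], check=True)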
Very similar to Simon Thompson's answer:
You can use Drupal and its print module.

Using Semantic MediaWiki for tabular data

Am I completely off-track to think about using Semantic MediaWiki to store (and organise, report on, etc.) 'tabular' data such as financial transactions or weather readings that would usually live in a spreadsheet or database?
It seems that one would need a separate, tiny page for each tuple; but then, that's by design, and perhaps it's perfectly okay.
I ask, simply because SMW seems like such a quick and easy way to get a collaborative data repository up and running.
Semantic MediaWiki is better suited to keeping track of factual or encyclopedic data, where you can have pages about everything you need to know about a certain topic.
For tabular or numerical data such as measurements, financial transactions, or sensor readings, you would indeed need to create little pages about each data point, which is not practical in many cases.
However, there are extensions to MediaWiki that allow you to integrate external data sources (in MySQL databases or CSV files somewhere) with MediaWiki pages. This can give you the best of both worlds: dynamic access and querying of tabular data, plus semantic annotation of the pages around it.
Take a look at:
http://www.mediawiki.org/wiki/Extension:External_Data
No, I don't think it's such a bad idea.
Using Semantic Forms you could enter lots of little data pages quickly and easily (for example, an invoice might require additional pages for each line item, but they could all be entered from one form using the "multiple" feature of the "for template" form tag). So although I've never tried logging weather data in SMW, I think it would be pretty easy. I don't see what the problem would be with storing data across so many pages; it's easy enough to combine it in whatever format you require.
Give it a go and let us know how it goes!
You can use either the Semantic Internal Objects extension (SIO) or SMW's built-in subobjects (the former works well with the already-mentioned External Data extension) to store multiple semantic objects (which could be the rows of your spreadsheet) on one page.
However, unless you are really looking for a collaborative tool with semantic capabilities, I doubt SMW is the best suited piece of software for your task.
Edit (November 2015): since SMW version 1.9, there is nothing that SIO can do that the built-in subobjects can't, so I would recommend the latter.