Using Semantic MediaWiki for tabular data - mediawiki

Am I completely off-track to think about using Semantic MediaWiki to store (and organise, report on, etc.) 'tabular' data such as financial transactions or weather readings that would usually live in a spreadsheet or database?
It seems that one would need a separate, tiny, page for each tuple; but then, that's by design and perhaps it's perfectly okay.
I ask, simply because SMW seems like such a quick and easy way to get a collaborative data repository up and running.

Semantic MediaWiki is better suited for keeping track of Factual or Encyclopedic data, where you can have pages about everything you need to know about a certain topic.
For tabular or numerical data such as measurements, financial, sensor data, you would indeed need to create little pages about each data point, which is not practical in many cases.
However, there are extensions to Media Wiki that allow you to integrate external data sources (in MySQL databases or CSV files somewhere) with MediaWiki pages. This can allow you to have the best of both worlds - dynamic access and queries of tabular data and semantic annotations of pages around them.
Take a look at :
http://www.mediawiki.org/wiki/Extension:External_Data

No, I don't think it's such a bad idea.
Using SemanticForms you could enter lots of little data pages quickly and easily (for example, an invoice might require additional pages for each line item, but they could all be entered from one form using the 'multiple' feature of the for template form tag). So although I've never tried logging weather data in SMW, I think it would be pretty easy. I don't see what the problem would be with storing data across so many pages; it's easy enough to combine it in whatever formats you require.
Give it a go and let us know how it goes!

You can use either the Semantic Internal Objects extension (SIO), or SMW's built in subobjects (the former works well with the already mentioned External Data extension), to store multiple semantic objects (could be the rows of your spreadsheet) in one page.
However, unless you are really looking for a collaborative tool with semantic capabilities, I doubt SMW is the best suited piece of software for your task.
edit (november 2015): Since SMW version 1.9, there nothing that SIO can do that the built-in subobjects can't, so I would recommend the latter.

Related

How can I handle relational data in Meteor?

I am learning Meteor using the Discover Meteor book.
I come from a PHP and MySQL background, and the application I am thinking of doing as a side-project is a real-time Backgammon web application. While Meteor's reactivity is a very, very big plus, I am stumped on how I can handle relational data (e.g. games, users, tournaments, friends, teams, etc).
I have read a lot of answers (ranging from old to very old) on StackOverflow on how one can use MySQL with Meteor. My search has led me to numtel/meteor-mysql. However, when I look at the examples provided in that repository, it is nowhere as clean as Meteor's own implementation of MongoDB.
My options, as I understand them, are the following:
Use MongoDB, and rewrite a lot of the features present in RDBMS in Javascript
Use an RDBMS that is not as well-supported in Meteor as MongoDB
IMO, option two is much less work, and I think might lead to less problems in the future. Take the problem in the epilogue of Why You Should Never Use MongoDB, for example.
We could also model this data as a set of nested hashes. The set of information about a particular TV show is one big nested key/value data structure. Inside a TV show, there’s an array of seasons, each of which is also a hash. Within each season, an array of episodes, each of which is a hash, and so on. This is how MongoDB models the data. Each TV show is a document that contains all the information we need for one show.
But then, how would you query for the TV shows that someone has starred in?
Back to my original question: is there something I'm missing here? Handling relational data is something that a lot of applications will need to do, but I can't seem to find a clean solution
It will be much less work if you go with option 1 in my opinion.
It won't be difficult to learn to use MongoDB, and since MongoDB uses JSON objects and is supported natively by Meteor and all it's packages, it will be much less work.
I advise having a look at the aldeed packages: collection2 and simple-schema to structure your collections. I also advice using the collection-helpers package to help with joins.
If you have a posts collection with name, authorId and content fields, then to get the author of the post, you'd write Meteor.users.findOne(userId).
Hope that clears things up a bit and gets you on your way.

GEDCOM to HTML and RDF

I was wondering if anyone knew of an application that would take a GEDCOM genealogy file and convert it to HTML format for viewing and publishing on the web. I'd like to have separate html files for each individual and perhaps additional files for other content as well. I know there are some tools out there but I was wondering if anyone used any tools and could advise on this. I'm not sure what format to look for such applications. They could be Python or php files that one can edit, or even JavaScript (maybe) or just executable files.
The next issue might be appropriate for a topic in itself. Export of GEDCOM to RDF. My interest here would be to align the information with specific vocabularies, such as BIO or REL which both are extended from FOAF.
Thanks,
Bruce
Like Rob Kam said, Ged2Html was the most popular such program for a long time.
GRAMPS can also create static HTML sites and has the advantage of being free software and having a native XML format which you could easily modify to fit your needs.
Several years ago, I created a simple Java program to turn gedcom into xml. I then used xslt to generate html and rdf. The html I generate is pretty rudimentary, so it would probably be better to look elsewhere for that, but the rdf might be useful to you:
http://jay.askren.net/Projects/SemWeb/
There are a number of these. All listed at http://www.cyndislist.com/gedcom/gedcom-to-web-page-conversion/
Ged2html used to be the most popular and most versatile, but is now no longer being developed. It's an executable, with output customisable through its own scripting syntax.
Family Historian http://www.family-historian.co.uk will create exactly what you are looking for, eg one file per person using the built in Web Site creator. As will a couple of the other Major genealogy packages. I have not seen anything for the RDF part of your question.
I have since tried to produce a Genealogy application using Semantic MediaWiki - MediaWiki, the software behind Wikipedia, and Semantic MediaWiki includes various extensions related to the Semantic Web. I thought it is very easy to use with the forms and the ability to upload a GEDCOM but some feedback from people into genealogy said that it appeared too technical and didn't seem to offer anything new.
So, now the issue is whether to stay with MediaWiki and make it more user friendly or create an entirely new application that allows for adding and updating data in a triple store as well as displaying. I'm not sure how to generate a family tree graphical view of the data, like on sites like ancestry.com, where one can click on a box to see details about the person and update that info or one could click on a right or left arrow around a box to navigate the tree. The data comes from SPARQL queries sent to the data set/triple store both when displaying the initial view and when navigating the tree, where an Ajax call is needed to get more data.
Bruce

MySQL-based wiki that is suitable for custom applications?

I develop an online, Flash-based multiplayer game. It is a complex game, and requires a lot of documentation to fully explain it to our users. Ideally, I would like to find MySQL-based wiki software that can provide these editable documentation pages outside of Flash (in the HTML realm) but also within Flash for convenience, and so that players can refer to the information without interrupting their game or having to switch back-and-forth between browser tabs. I am expecting that I would need to do a lot of the work on the Flash side myself, as far as formatting, for example, but I would like to feel comfortable in querying the wiki's database to get info directly. I guess this means that I need a wiki that is structured relatively "flat" or intuitively so that I can do things like:
Run a MySQL query that returns a list of all the articles (their titles and IDs) in the wiki
For each article ID in the wiki, return the associated content
This may mean that I have to limit the kinds of formatting I put into the wiki -- things like tables would probably be omitted since they would be very difficult, if not impossible, for me to do on the Flash side. And that is fine!
Basically I am just looking for suggestions for wiki software that is pretty easy to use, but mostly is technically simple enough on the back-end that interfacing with it directly via MySQL is not difficult. When interfacing with the database directly, I only need to READ data. Any time the wiki would be edited or added to would be done via the wiki's actual front-end application.
Thanks for any suggestions!
MediaWiki is the best-known and best-supported MySQL-based Wiki, used for plenty of complex game documentation projects like MinecraftWiki. The database is not all that simple, but it's well documented and basic read operations aren't too hard. For example, here's how to fetch the current content of the page "MyPage":
SELECT old_text FROM page,revision,text WHERE page.page_title="MyPage" AND
page.page_id=revision.rev_page AND revision.rev_text_id=text.old_id;
(And yes, old_text is the current content of the page. Don't ask me why!)
Your main problem will be figuring out how to parse MediaWiki markup, there are plenty of parsers for it but I'm not aware of anything that would work in Flash.

What are the practical benefits of using microformats for every possible thing?

What practical benefits can my client get if I use microformats on his site for every possible thing?
How can I explain these benefits to a non-technical client?
Sometimes it seems like the practical benefits are hard to quantify.
Search engines already pick up and parse microformats (see e.g. https://support.google.com/webmasters/answer/99170). I believe hCard and hCalendar are fairly well supported--and if not, plenty of sites are using it, including places like MySpace.
It's the idea that adding CSS classes and specified IDs make your existing content easier to parse in a machine-readable manner.
hReview is starting to make some inroads, and hResume looks like it take off too.
I heavily use rel="nofollow" on uncontrolled links (3rd party sources) which is actually a microformat.
Check the microformats wiki for a decent starting point.
It just means your viewers can share a few generic "formats". You can generalize stylesheets, and parsing mechanisms. Rather than having a webpage consist of one "html document," you have a webpage that consist of "10 formatted micro-documents".
If you need a real world analog: think of it like attaching a formatted invoice, to a receipt, and a business card, rather than writing it all down on notebook paper with your left hand.
Overall the site becomes easier to digest for the rest of the internet. The data can be reused, combined, cross-referenced, and saved.
A simple example would be to have anywhere on the site a latitude and a longitude (geo). With Microformats, anybody that searches for that latitude and longitude can be easily referenced to their website, increasing traffic, awareness of that person / company, and allow users to easily save that information. (Although I've encountered little of this personally, this is more of 'the future' of things than it is current. But always good to stay up to date).
A second example would be a business card (hCard) where a browser can easily save and transfer it to an address book, so that just one visit to the site and the visitor has the information saved locally. Especially useful if they're getting hits from a cell phone.
I wouldn't recommend using microformats for "every possible thing". Use them for things where you get some benefit, in exchange for the effort of using them.
The main practical benefit I'm aware of is customised search engine results:
https://support.google.com/webmasters/answer/99170
Technically, Google now prefers this to be implemented using microdata (i.e. itemprop attributes) rather than microformats, but it's the same idea.
Having a micro-format can be better than no format since it lets you save every possible thing in the application.
A micro-format for every possible thing can be better than a standard format only because: it's quicker to create so it costs less and it take less space than some standard formats, like XML.
But all this depends on the context of the application and so you must explain it to the client in that context.
microformatting your content extends its reach in every, which way possible. using your sites structure as its "api" the possibilities are what you set your limits too

Best open source, extendable crawler to use for image crawling

We are in the starting phase of a project, and we are currently wondering
whether which crawler is the best choice for us.
Our project:
Basically, we're going to set up Hadoop and crawl the web for images.
We will then run our own indexing software on the images stored in HDFS
based on the Map/Reduce facility in Hadoop. We will not use other indexing
than our own.
Some particular questions:
Which crawler will handle crawling for images best?
Which crawler will best adapt to a distributed crawling system, in which we
use many servers conducting crawling together?
Right now these look like the 3 best options-
Nutch: Known to scale. Doesn't look like the best option because it seems that is it tied closely to their text searching software.
Heritrix: Also scales. This one currently looks like the best option.
Scrapy: Has not been used on a large scale (not sure though). I dont know if it has the basic stuff like URL canonicalization. I would like to use this one because it is a python framework (I like python more than java), but I don't know if they have implemented the advanced features of a web crawler.
Summary:
We need to get as many images as possible from the web. Which existing crawling framework is both scalable and efficient , but also the one which will be the easiest to modify to get only images?
Thanks!
http://lucene.apache.org/nutch/
I would think going with something with the broadest use and support (community support) would be the better approach.
Nutch may be a good option because you want to end up on HDFS. It may be useful to look into the HBase integration that are currently in the works (NUTCH-650).
You may be able to get the data you need by skipping the index step at the end and instead look at the segments themselves.
However for flexibility another option may be Droids: http://incubator.apache.org/droids/. It's still in the incubator phase at apache, but worth looking at.
You may get some ideas by looking at the SimpleRuntime example in the org.apache.droids.examples. Perhaps by replacing the Sysout handler with one that stores the images onto HDFS that may give you what you want.