While building a good ontology is certainly a big and mostly manual effort, it would be interesting to know whether there are any techniques or tools that automate the creation of vendor-specific, intermediate ontologies for an ETL process, given, say, rich-enough JSON examples combined with API documentation.
I am aware of Linked Data standards and techniques, but maybe there is something in the form of a library that can produce a draft RDFS+ ontology from API call responses?
For example, there are libraries that guess a JSON Schema from examples (I have even written a primitive one myself), so the task does not seem problematic in theory.
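To make the idea concrete, here is a rough sketch of the kind of drafting step I imagine; all names, the namespace, and the type guesses are hypothetical and deliberately crude:

    using System;
    using System.Text.Json;

    // Hypothetical sketch: derive a draft RDFS ontology (as Turtle text)
    // from one example JSON API response. A real tool would merge many
    // examples and consult the API documentation; this only shows the shape.
    class OntologyDrafter
    {
        static void Main()
        {
            string sample = @"{ ""order"": { ""id"": 42, ""customer"": { ""name"": ""Acme"" } } }";
            using JsonDocument doc = JsonDocument.Parse(sample);
            Console.WriteLine("@prefix v: <http://example.org/vendor#> .");
            Console.WriteLine("@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .");
            Console.WriteLine("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .");
            Console.WriteLine("@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .");
            Emit(doc.RootElement, "Root");
        }

        // Each JSON object becomes a tentative rdfs:Class; each key becomes
        // a property whose range is a guessed XSD type or a nested class.
        static void Emit(JsonElement obj, string className)
        {
            Console.WriteLine($"v:{className} a rdfs:Class .");
            foreach (JsonProperty prop in obj.EnumerateObject())
            {
                bool nested = prop.Value.ValueKind == JsonValueKind.Object;
                string range = prop.Value.ValueKind switch
                {
                    JsonValueKind.Object => "v:" + Capitalize(prop.Name),
                    JsonValueKind.Number => "xsd:decimal",
                    JsonValueKind.True or JsonValueKind.False => "xsd:boolean",
                    _ => "xsd:string",
                };
                Console.WriteLine($"v:{prop.Name} a rdf:Property ; rdfs:domain v:{className} ; rdfs:range {range} .");
                if (nested)
                    Emit(prop.Value, Capitalize(prop.Name));
            }
        }

        static string Capitalize(string s) => char.ToUpper(s[0]) + s.Substring(1);
    }

Merging many responses and reconciling the guessed ranges with the API documentation is where the real work would start; this only shows the mechanical part.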
Please do not treat this as a "software recommendation" question; I doubt any software exists for this, but I would welcome at least an educated guess at a direction I could take. I also believe this is very important for semantic / linked data projects, and I wonder why I cannot find any hints and need to ask the more experienced people here for help.
Related
I am writing a small game, and I now have 9 C# scripts that make it work. I have lost track of what exactly is happening and how. I want to know how things work from the moment the game starts: what's happening, how, and so on.
I am a beginner, and I have heard that writing down your program flow is called documenting it. How can I document it? Do I have to write comments everywhere in my code to explain the flow of the program?
Putting extensive comments into your code is not a good approach. Basically, you should try to make your code as self-explanatory as possible. You do this by carefully planning what belongs in a class or function and by using meaningful names for your classes, functions and variables. Comments are nothing but a last resort for when additional explanation is really required.
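A small illustrative example (the game types here are made up) of the difference:

    // Cryptic names force an explanatory comment:
    //   if (p.G >= i.C && p.L >= i.RL) { ... }   // check if player can buy item

    // Meaningful names carry the explanation themselves, no comment needed:
    class Player { public int Gold; public int Level; }
    class Item   { public int Cost; public int RequiredLevel; }

    static class Shop
    {
        public static bool CanAfford(Player player, Item item) =>
            player.Gold >= item.Cost && player.Level >= item.RequiredLevel;
    }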
In most cases you should also have some documents, in addition to the code, that explain certain aspects of your software:
Requirements document - what is the purpose of the software, how is it used
Architecture and design specification - what are the modules and classes of the software and how do they interact. Often this document mainly consists of one or more diagrams (UML or something else).
Build manual - how to compile and link the software
Installation instructions
User manual
This list is neither complete nor is it mandatory. If, for example, the user interface of your software is simple and self-explanatory, you probably won't need a user manual.
Sometimes diagrams make better documentation than text. There is a standard way of diagramming a control flow (whether it's of a program or a business process). They're called ... wait for it ... control-flow diagrams. But I don't think that's exactly what you're after.
There are also flow charts (often spelled as one word), which may be more suited to software than general control-flow diagrams. Flow charts can be useful for understanding an algorithm, but they generally don't give a good big-picture view.
With a complicated program, what might be more important to keep in mind is the data flow. For those we have ... can you guess? ... data-flow diagrams (DFDs).
DFDs can be drawn at varying levels of detail. You can have a high-level one that shows the major components of the system and how they fit together and low-level ones that show the nitty-gritty details for the portions of the system that require more detail.
DFDs can be used for a variety of analyses, including things like threat modeling. But I find them great for getting an overview of what's-what when I'm looking at a new project (or one I've forgotten about). You should be able to find some tutorials about DFDs online, and I think some drawing software (like Visio) has templates specifically for DFDs (and probably for the other types of diagrams I've mentioned).
Some might consider DFDs a bit old-school and prefer more rigorous systems like UML (Unified Modeling Language), which is capable of expressing many more concepts and of having a very direct mapping between your "model" and your code. I've never learned enough UML to get much use out of it. The diagrams in many books on software patterns are expressed in UML.
Different kinds of software offer different amounts of configuration/customization. Routers are among the most configurable software systems I know of. I want to know how routers handle configuration: how do they alter the code flow based on it?
One obvious way is to use the if..else constructs provided by most languages (let's assume we are using C).
So, is there any other programming method (or paradigm)?
The data-driven programming paradigm may be a viable one. Configuration can be thought of as one of the input sources, and so it can be used to alter the code flow.
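As a sketch of what I mean (the question assumes C, but the idea is language-independent; I'll use C# here, and all names are made up), the configured value can select a handler from a table instead of steering an if..else chain:

    using System;
    using System.Collections.Generic;

    // Hypothetical sketch of data-driven dispatch: the configured value
    // picks a handler out of a table instead of walking an if..else chain.
    class Router
    {
        static readonly Dictionary<string, Action<string>> ForwardingModes =
            new Dictionary<string, Action<string>>
            {
                ["static"]  = packet => Console.WriteLine($"static route for {packet}"),
                ["dynamic"] = packet => Console.WriteLine($"dynamic route for {packet}"),
                ["drop"]    = packet => Console.WriteLine($"dropping {packet}"),
            };

        static void Main()
        {
            string configuredMode = "dynamic"; // in reality, read from a config file
            if (ForwardingModes.TryGetValue(configuredMode, out var handle))
                handle("packet-1"); // the configuration selects the code path
            else
                Console.WriteLine($"unknown mode: {configuredMode}");
        }
    }

Adding a new mode then becomes a data change (one more table entry) rather than another branch in the code.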
What I need to know is whether there are any papers or references I can use to enrich my understanding, not just for routers but for any kind of software. If the question seems too vague, let me know and I will add more details.
I don't know anything about configuration of routers, but your question states you are interested in configuration for any kind of software, so the following might be of interest to you.
I am the author of Config4*, which provides C++ and Java parsers for a particular configuration syntax. I suggest you do the following. Skim Chapters 2 and 3 of the "Config4* Getting Started Guide" (HTML, PDF) to get an overview of the configuration syntax and API. Then take your time reading the "Config4* Practical Usage Guide" (HTML, PDF), which discusses the "how to" for a variety of different ways to use configuration. Although the discussion in that manual makes use of the Config4* syntax and API, the principles could be used with another syntax, for example, XML. If you focus on the principles discussed in that manual, rather than the syntax, then I suspect you will start to develop some insight into how a router handles its configuration.
I am finding it really tough to figure out how a social networking site (Facebook being the reference) manages comments and notifications for its users.
How would they actually store the comments data? And how would a notification be stored and sent to all the users it concerns? An example scenario: a friend comments on my status, and everyone who has liked my status, including me, gets a notification for it. Each user also has their own read/unread state, so I guess a notification reference is stored for each user. But then there would be a lot of redundancy of notification information. And if we use a separate table/collection to store these, with a reference to the actual notification, that would create real-time scalability issues. So how do you decide which way to trade off? My brain crashes when I think about all this: too much to figure out, with not a lot of help available on the web.
Now, how would each notification be sent to all the users who are supposed to receive it, and what would the data structure look like?
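The closest I have come up with myself is a fan-out-on-write shape (a hypothetical sketch, in-memory for brevity): store the notification body once and give each recipient a small receipt row that carries only the per-user state:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical sketch: one shared Notification row, plus one small
    // Receipt row per recipient holding only per-user state (read/unread).
    record Notification(Guid Id, string Text, DateTime CreatedAt);
    record Receipt(Guid NotificationId, int UserId, bool IsRead);

    class NotificationService
    {
        readonly List<Notification> notifications = new List<Notification>();
        readonly List<Receipt> receipts = new List<Receipt>();

        // Fan-out on write: the body is stored once, plus a cheap receipt per recipient.
        public void Notify(string text, IEnumerable<int> recipientIds)
        {
            var n = new Notification(Guid.NewGuid(), text, DateTime.UtcNow);
            notifications.Add(n);
            foreach (int userId in recipientIds)
                receipts.Add(new Receipt(n.Id, userId, IsRead: false));
        }

        public IEnumerable<string> UnreadFor(int userId) =>
            from r in receipts
            where r.UserId == userId && !r.IsRead
            join n in notifications on r.NotificationId equals n.Id
            select n.Text;
    }

    class Demo
    {
        static void Main()
        {
            var service = new NotificationService();
            service.Notify("A friend commented on your status", new[] { 1, 2, 3 });
            foreach (string text in service.UnreadFor(2))
                Console.WriteLine(text);
        }
    }

This bounds the redundancy to a couple of keys and a flag per recipient, but I don't know whether that shape holds up at scale, which leads to my next question.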
I have read about a lot of implementations that suggest using MySQL. My understanding was that, given the kind (and size) of data involved, it would be better to use a NoSQL store for scalability.
So how does MySQL work well for such use cases, and why is a NoSQL store like Mongo not suggested anywhere for such an implementation, when these are meant to be heavily scalable?
Well, I know: a lot of questions in one. But I am not looking for a complete answer here; insights on particular points would also be a great help as I build my own application.
The question is extremely broad, but I'll try to answer it to the best of my ability.
How would they actually store the comments data? And how would a notification be stored and sent to all the users it concerns?
I generally don't like answering questions like this because it appears as if you did very little research before coming to SO. It also seems like you're confused with application and database roles. I'll at least start you off with some material/ideas and let you decide on your own.
There is no "silver bullet" for a backend design, especially when it comes to databases. SQL databases are generally very good at most database functionality, and rightfully so; it's a technology that is very mature and has stood the test of time for a reason. Most NOSQL solutions are specialized for particular purposes. For instance: if you were logging a lot of information, you might want to look at Cassandra. If you were dealing with a lot of relational data, you would want to use something like Neo4j (or PostgreSQL/MySQL for an RDBMS). If you were dealing with a lot of real-time data, you might want to look at Redis.
It's dumb to ask NOSQL vs SQL for a few reasons:
NOSQL is a bad term in general. It doesn't mean "No SQL"; it means "Not Only SQL". Unfortunately, the term has come to encompass even the most polar opposites among databases.
Only you know your application's full functionality. Even if I knew the basics of what you wanted to achieve, I still couldn't give you a definitive answer. Nor can anyone else. It's highly subjective, and again, only YOU know EXACTLY what your application should do.
The biggest reason: It's 2014. Why one database? Ten years ago "DatabaseX vs DatabaseY" would have been a practical question. Now, you can configure many application frameworks to reliably use multiple databases in a matter of minutes. Moral of the story: Use each database for its specialized purpose. More on polyglot persistence here.
As far as Facebook goes: a five minute Google search reveals what backend technologies they've used in the past, and it's not that difficult to research some of their current backend solutions. You're not Facebook. You don't need to prepare for a billion users right now. Start with simple, proven technologies. This will let you naturally scale your application. When those technologies start to become a bottleneck, then be worried about scalability.
I hope this helped you with starting your coding journey, but please use Stack Overflow as a last resort if you're having trouble with code. Not an immediate go-to.
The standard way of working on a new API (library, class, whatever) usually looks like this:
you think about what methods the API user would need
you implement the API that you suspect the user will need
So basically you are trying to guess what your API should look like. This very often leads to over-engineering: huge APIs that you think the user will need, where it is very possible that a great part of your code won't be used at all.
Some time ago, maybe even a few years, I read an article that promoted writing the client code first. I don't remember where I found it, but the author pointed out several advantages, like a better understanding of how the API will be used, what it should provide, and what is basically superfluous. I think the idea was that it goes along with the Scrum methodology and user stories, but at the implementation level.
Just out of curiosity, for my latest private project I started not with the actual API (some kind of toolkit library) but with the client code that would use it. Of course my code is all in red, because the classes, methods and properties do not exist and I can forget about help from IntelliSense, but what I noticed is that after a few days of coding my application "has" all the basic functionality, and my library API "is" a lot smaller than I imagined when starting the project.
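To illustrate with a made-up toolkit, the "red" code I start from looks like this, and only afterwards do I add the smallest stubs that make it compile:

    // Written first, before ImageToolkit exists, so every name starts out "red":
    var toolkit = new ImageToolkit();
    var image = toolkit.Load("photo.png");
    image.Resize(width: 800, height: 600);
    toolkit.Save(image, "thumb.png");

    // Written second: only the members the client code actually touched,
    // which is what keeps the API surface small.
    class ImageToolkit
    {
        public Image Load(string path) => new Image();   // stub
        public void Save(Image image, string path) { }   // stub
    }

    class Image
    {
        public void Resize(int width, int height) { }    // stub
    }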
I am not saying that if somebody took my library and started using it, it wouldn't lack some features, but the exercise helped me realize that my idea of this API was somewhat flawed, because I usually try to cover all the bases and provide methods "just in case". And sometimes that bites me badly, because I make some stupid mistake in the basic functions while being more focused on code that somebody might conceivably need.
So what I would like to ask is: have you ever tried this approach when you needed to create a new API, and did it help you? Is it a recognized technique that has a name?
So basically you're trying to guess what your API should look like.
And that's the biggest problem with designing anything this way: there should be no (well, minimal) guesswork in software design. Designing an API based on assumptions rather than actual information is dangerous, for several reasons:
It's directly counter to the principle of YAGNI: in order to get anything done, you have to assume what the user is going to need, with no information to back up those assumptions.
When you're done, and you finally get around to using your API, you'll invariably find that it sucks to use (poor user experience), because you weren't thinking about how the library is used (UX), you were thinking about what the library must do (features).
An API, by definition, is an interface for users (i.e., developers). Designing it as anything else just makes for a bad design, without fail.
Writing sample code is like designing a GUI before writing the backend: a Good Thing. It forces you to think about user experience and practical effects of design decisions without getting bogged down in useless theorising and assumption.
And contrary to Gabriel's answer, this is not bottom-up design: it's top-down. Rather than design the concrete backend of your library and then force an abstract interface on top of it, you first design the interface and then worry about the implementation.
Generally speaking, the idea of designing the concrete parts first and abstracting from them afterwards is called bottom-up design. Test-Driven Development uses a similar principle to the one you describe to support better design. First you write a test, which is a use of the code you are going to write afterwards. It is important to proceed stepwise, because you have to prove that the API is implementable. An important part of each step is refactoring; this allows you to design a more concise API and to reuse parts of your code.
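For example (xUnit-style, with a made-up class under test), the first test is literally the first client of the API:

    using Xunit;

    // Hypothetical TDD example: the test is written first and acts as the
    // first client of the API, proving that the design is implementable.
    public class PriceCalculatorTests
    {
        [Fact]
        public void Discount_is_applied_to_the_base_price()
        {
            var calculator = new PriceCalculator();             // does not exist yet: red
            decimal total = calculator.WithDiscount(100m, 0.2m);
            Assert.Equal(80m, total);                           // minimal code makes it pass: green
        }
    }

    // The minimal implementation, written after the test; refactoring comes next.
    public class PriceCalculator
    {
        public decimal WithDiscount(decimal basePrice, decimal rate) =>
            basePrice * (1 - rate);
    }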
EDIT:
Wow, the initial response to this question was quite negative. I think I might have triggered some pretty strong emotions by using the word "best"; it seems like a few people latched onto that word and decided to dismiss my question right away.
Obviously, there are many, many situations in which no single approach is "best", or at least, what ends up being the best solution to one problem will often not be the best solution for other, even similar, problems. I get that. But now let me try to elaborate on the reasoning behind what I'm actually asking.
I tend to find it easiest to explain myself using analogies, so here goes. In my current job I work almost exclusively in .NET. .NET has a lot of functionality built into the framework. A prime example is the System.Collections.Generic namespace, which has a bunch of collection classes that (almost) no .NET developer in his/her right mind would bother re-developing from scratch, because very good implementations are already there. If I am working on a problem that requires a doubly linked list, I'm not going to decide, "Okay, time to write a doubly linked list class"; I'm just going to use the LinkedList<T> that's already there, or, at most, extend it or wrap it with my own class that adds some extra functionality.
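For instance, a wrapper of that sort might look like this (purely illustrative, not real code from any project):

    using System.Collections.Generic;

    // Illustrative only: wrapping .NET's LinkedList<T> to add one small
    // convenience instead of re-implementing a doubly linked list.
    public class BoundedLinkedList<T>
    {
        private readonly LinkedList<T> items = new LinkedList<T>();
        private readonly int capacity;

        public BoundedLinkedList(int capacity) => this.capacity = capacity;

        // The extra behaviour layered on top: evict the oldest item when full.
        public void Add(T item)
        {
            if (items.Count == capacity)
                items.RemoveFirst();
            items.AddLast(item);
        }

        public int Count => items.Count;
    }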
Am I saying the "best" version of a doubly linked list is LinkedList<T> from .NET? Of course not. That would be absurd. But I highly doubt .NET's implementation of LinkedList<T> is drastically different from most other established libraries' implementations of collections that are intended to serve the same purpose (that of a doubly linked list). On the other hand, I am relatively confident that if I were to write my own implementation from scratch, there'd be a considerable number of issues with it, in terms of robustness, performance, flexibility, etc. for one simple reason: not that I'm stupid, or lazy, or don't care about good code--simply that I'm one person, and I'm not an expert on linked lists, and I haven't thought of everything that needs to be taken into consideration when designing one.
But I happen to be a developer who does take an interest in how things are implemented internally. And so it would be nice if I could check out a page where some variant of a well thought-out design for a linked list--or for any fairly established concept for which robust, efficient implementations have been written--were available to view. (By the way, yes I am aware that the source code for .NET's LinkedList<T> is available. I'm just using that as an example; really I am talking about all problems with solutions for which good, working implementations exist.)
Now, I talked about this being something that is open; let me elaborate on that. I am not talking about sites like SourceForge.net, or CodePlex, or Google Code. These are all sites for hosting projects, i.e., applications or libraries tailored for some specific industry or field or otherwise categorizable purpose. What I'm talking about is something like this:
http://en.wikibooks.org/wiki/Category:Algorithms_and_data_structures
Maybe I should have just provided that link in the first place, as it probably illustrates what I'm getting at better than anything I've written so far. But I think the main point that differentiates what I'm asking about from any other site I've seen is that I was specifically wondering if there could be some way to work on a new problem--so, something for which there aren't necessarily any well-known, established implementations, again as in my linked list example--collaboratively, in a wiki-esque fashion, but not tied to any specific open-source project.
So, as a conclusion of sorts, I was kind of envisioning a situation like the following: I find myself faced with a new problem. Maybe it isn't common enough to be something that is addressed in a framework like .NET. But it's common enough that some developers here and there are independently working on it. If a website exists like what I'm imagining, maybe at some point one of those developers working on the problem could post an idea on that website, and over time others might discover it and suggest improvements/modifications, and given enough time and participation, a pretty darn good implementation might result from all this collaboration. And from there, eventually, something like this implementation might be considered fairly "standard", just like a linked list implementation, or a quicksort implementation, or, I don't know, some well-known pseudo-random number generator.
Does this make any more sense to anyone now? I feel quite confident that what I'm talking about is not absurd, but hey, if that's what people think, then maybe it is.
Open source projects are very popular. Some of these are libraries suited for specific purposes, the best of which include some very well-written code.
However, if you're interested in contributing to an open source project, finding a project that is well-suited to your skills can be quite a task. At the same time, if you're interested in using an open source project in your own work, finding a project that is well-suited to your needs can also be difficult, especially when, for example, open-source library X has a lot of functionality you could use, as does library Y, and these two libraries' capabilities overlap so that integrating both into your code could be messy.
We've all seen questions, here on Stack Overflow and elsewhere on the web, posted by one developer: "How would I implement this idea?" and answered by others, often accompanied by a plethora of example code. Sometimes these answers link to an open source project/library that provides functionality similar to what the poster is asking about.
My question is: are there any well-known websites or other sources that are open in nature and provide "best-known implementations" for common (or even not-so-common) programming problems, but not associated with any particular open source project?
As a generic example, suppose I have a need for some algorithm that does X. I post a question on SO or some other site requesting ideas, asking for suggestions on how best to implement it. One person points me to project P1, which contains some code that performs something very similar to this algorithm. Another person points me to project P2. Someone else writes some sample code and says, "maybe you could do it like this."
It seems to me, if there are all these different versions of this idea floating around out in the world, it would make sense for there to be a site, somewhat in the vein of Wikipedia, where a quasi-"official" implementation ("official" is not the right word; I'm just having trouble thinking of a better one right now) could be published and modified as improvements are developed/discovered.
I feel like I have stumbled across a few different sites like this in the past, but I'm interested to know if anyone else has found any resources like what I'm describing.
The very idea is absurd. It means that there's one, single opinion on "best-known implementations" with no changes based on other people having better ideas.
It implies that best practices are static and can be accumulated into a single repository.
If they could be collected, then Google would have them and would simply charge for access.
Interestingly, they don't have all the best practices. Interestingly, they have to expend mountains of computing power looking for more information. Then people (like you) have to read and think and judge and decide.
The read-think-judge-decide is really hard to eliminate. Unless, of course, you want someone to think for you. In which case, there are many companies who have a single solution that requires less thinking. Call Microsoft or Oracle or IBM. They have solutions that are all in one place, unified best practices, no reading, no thinking, no judging, no deciding required.
Open -- by definition -- means it's impossible to have a single authoritative source.
Here is something, though maybe not the best implementations: the book Design Patterns contains what many programmers consider some of the best patterns to follow!