Does anybody know a good extensible open-source web crawler? [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
The crawler needs an extensible architecture that allows changing the internal process, for example by implementing new steps (pre-parser, parser, etc.).
I found the Heritrix project (http://crawler.archive.org/).
Are there other good projects like it?
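To make the "extensible architecture" requirement concrete, here is a minimal Python sketch of a crawler whose processing is an ordered list of pluggable steps (a pre-parser, a parser, and anything you add later). It is not the API of Heritrix, Nutch, Scrapy, or Abot; the names `Page`, `Crawler`, `pre_parser`, and `link_parser` are hypothetical, the page fetcher is injected as a plain function, and it deliberately omits robots.txt handling, politeness delays, and error handling that a real crawler needs.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


@dataclass
class Page:
    """State handed from one pipeline step to the next."""
    url: str
    html: str
    text: str = ""
    links: List[str] = field(default_factory=list)


# A step is any callable that takes a Page and returns a Page.
Step = Callable[[Page], Page]


class _LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pre_parser(page: Page) -> Page:
    """Example pre-parsing step: normalise Windows line endings."""
    page.html = page.html.replace("\r\n", "\n")
    return page


def link_parser(page: Page) -> Page:
    """Example parsing step: resolve every link to an absolute URL."""
    extractor = _LinkExtractor()
    extractor.feed(page.html)
    page.links = [urljoin(page.url, href) for href in extractor.hrefs]
    return page


class Crawler:
    def __init__(self, fetch: Callable[[str], str], steps: List[Step]) -> None:
        self.fetch = fetch        # injected: real HTTP client, cache, or test stub
        self.steps = list(steps)  # ordered, replaceable processing steps

    def add_step(self, step: Step, index: Optional[int] = None) -> None:
        """Append a step, or insert it at a given position in the pipeline."""
        if index is None:
            self.steps.append(step)
        else:
            self.steps.insert(index, step)

    def crawl(self, seed: str, max_pages: int = 10) -> List[Page]:
        """Breadth-first crawl from seed, running each page through all steps."""
        seen = set()
        queue = deque([seed])
        results: List[Page] = []
        while queue and len(results) < max_pages:
            url = queue.popleft()
            if url in seen:
                continue
            seen.add(url)
            page = Page(url=url, html=self.fetch(url))
            for step in self.steps:
                page = step(page)
            results.append(page)
            queue.extend(link for link in page.links if link not in seen)
        return results
```

Because each step is just a function from `Page` to `Page`, changing the internal process means inserting, removing, or reordering entries in `steps` rather than editing the crawl loop; the projects mentioned here offer this kind of pluggability in a much richer form.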

Nutch is the best you can do when it comes to a free crawler. It is built on Lucene (scaled up for enterprise use) and is backed by Hadoop, which uses MapReduce (similar to Google's) for large-scale data processing. Great products! I am currently reading all about Hadoop in the new (not yet released) Hadoop in Action from Manning. If you go this route, I suggest getting onto their technical review team to get an early copy of this title!
These are all Java-based. If you are a .NET guy (like me!!), then you might be more interested in Lucene.NET, Nutch.NET, and Hadoop.NET, which are all class-by-class and API-by-API ports to C#.

You may also want to try Scrapy (http://scrapy.org/).
It is really easy to specify and run your crawlers.

Abot is a good extensible web crawler. Every part of the architecture is pluggable, giving you complete control over its behavior. It's open source, free for commercial and personal use, and written in C#.
https://github.com/sjdirect/abot

I recently discovered one called Nutch.

If you're not tied down to a platform, I've had very good experiences with Nutch in the past.
It's written in Java and goes hand in hand with the Lucene indexer.

Related

How well does Sitecore 7 lend itself to presenting external JSON data? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm asking because I am not sure what kind of person we'll need to hire (ASP? Sitecore? Angular? jQuery?) to implement the following for us:
Our school is looking to make data on courses (JSON format, about 600 courses) available as an “online catalog.” The static info (programs information, resources, etc.) will be hosted in Sitecore 7.
We’d like to see the online course catalog closely integrated with the rest of the site, so we’re looking for best approaches on how to do that.
Some manipulation of the JSON data is required: course detail pages should be simple enough, but we'll also need course listings (not necessarily displaying all 600 courses at once in one long list, but segmented by program, class format, location, etc.) as well as a "course search" function.
Would Sitecore do that well enough out of the box, or would it be better/easier to go with something like AngularJS on top of Sitecore?
Please ask me for additional info if I have left something important out or if anything is unclear.
I agree with Dijkgraaf's comment, but to give you an answer: Sitecore is suitable for your requirements, but it is a framework, which means it won't meet them out of the box. You will need a developer who knows Sitecore and, by extension, .NET (Sitecore is built on .NET).
These developers will also know how to work with JSON, most likely serving it from Sitecore via a .NET technology called Web API. The JSON can then be manipulated with JavaScript or AngularJS. It is less common, however, for Sitecore developers to be familiar with AngularJS.
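The course-listing and course-search requirements from the question mostly come down to grouping and filtering the JSON records. The answer suggests doing this in JavaScript/AngularJS (with the data served via Web API); the sketch below shows the same logic in Python purely for illustration. The field names (`id`, `title`, `program`, `format`, `location`) and the sample records are assumptions, since the question does not show the real data, and the substring search is deliberately naive.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical sample: the question does not show the real catalog's field names.
COURSES_JSON = """
[
  {"id": "BIO101", "title": "Intro to Biology", "program": "Science",
   "format": "Online", "location": "Main Campus"},
  {"id": "CHM201", "title": "Organic Chemistry", "program": "Science",
   "format": "In person", "location": "Main Campus"},
  {"id": "ART110", "title": "Drawing Fundamentals", "program": "Arts",
   "format": "In person", "location": "Downtown"}
]
"""

Course = Dict[str, str]


def group_by(courses: List[Course], key: str) -> Dict[str, List[Course]]:
    """Segment courses into listings, e.g. one listing per program."""
    groups: Dict[str, List[Course]] = defaultdict(list)
    for course in courses:
        groups[course.get(key, "Unspecified")].append(course)
    return dict(groups)


def filter_courses(courses: List[Course], **criteria: str) -> List[Course]:
    """Keep courses whose fields match every criterion exactly."""
    return [c for c in courses if all(c.get(k) == v for k, v in criteria.items())]


def search(courses: List[Course], query: str) -> List[Course]:
    """Naive search: every query word must appear in the title or course id."""
    terms = query.lower().split()
    return [
        c for c in courses
        if all(t in f"{c.get('title', '')} {c.get('id', '')}".lower() for t in terms)
    ]


courses = json.loads(COURSES_JSON)
```

For 600 records this kind of in-memory filtering is usually fast enough on either side; a real "course search" may still warrant a proper search index.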

Why is MongoDB preferred over MySQL for Node.js development? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Please read before you answer: I don't need any opinion-based answers or "nosql vs sql which is better" debates on the subject, just facts.
I want to slowly convert a PHP + MySQL website I wrote with Symfony2 into a real-time application using Backbone with Node.js + WebSockets.
I want to make a slow transition by changing single features, since I don't want to break a fully functional site.
I have been educating myself about Node.js by reading books and watching tutorials, and I noticed one thing: I own more than five Node.js books, and none of them use MySQL, although it's fully supported by Node.
They all use MongoDB.
Here is my situation:
1. My website is already integrated with MySQL (Doctrine).
2. My MySQL setup is fully functional and needs no improvements so far.
I'm really frustrated and I have a few questions:
Why is MySQL not preferred, although it's a more mature piece of technology?
What are the advantages of moving to MongoDB over MySQL for the purpose of having a real-time application?
I've seen people choose Node/Mongo development because of the simplicity of the all-JavaScript stack, I've seen people choose Mongo because it's the New Hotness, and I've seen people choose Mongo because it's actually the right tool for the job: they have a large amount of document-like, unstructured data and/or they want to take advantage of Mongo's support for horizontal scaling, among other differences between MySQL and Mongo.
I'm not sure it's possible to answer this question in a non-opinion-based manner and without touching on SQL vs. NoSQL. Mongo is simply a tool, and it happens to be free and commonly used in the field with Node. If I were writing a Node tutorial, I'd probably choose Mongo too, because it's common and it's cool.
If Mongo is the right tool for your site's use cases, then the transition is probably worth it. If MySQL is the right tool for your site, then congratulations! You've just saved a bunch of time rewriting your DB in Mongo.
As an aside: if your question uses the word "preferred", I can't really think of a way for it to not be opinion-based, by definition.

A framework to create web forms with little code that can send form data to a database, email address, or other APIs [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Does anyone know of a framework for creating web forms that requires little code and can send form data to a database, an email address, or another API such as Microsoft CRM? I am looking for a framework where I can define the form and fields in a database, which then creates the form and web endpoint automatically. Ideally, I would like the form data to be stored in a normalized database. A .NET solution would be best.
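The metadata-driven approach described in the question (form and field definitions stored in a database, submissions stored in normalized tables) can be sketched in a few dozen lines. This is a hypothetical Python/SQLite illustration, not LightSwitch, InfoPath, or ASP.NET Web Forms: the table layout, the function names, and the `handlers` hook (where forwarding to an email address or an API such as Microsoft CRM would plug in) are all assumptions.

```python
import sqlite3
from html import escape

# Hypothetical normalized schema: definitions and submitted values in separate tables.
SCHEMA = """
CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE field (
    id INTEGER PRIMARY KEY,
    form_id INTEGER NOT NULL REFERENCES form(id),
    name TEXT NOT NULL,
    label TEXT NOT NULL,
    input_type TEXT NOT NULL DEFAULT 'text',
    required INTEGER NOT NULL DEFAULT 0,
    position INTEGER NOT NULL,
    UNIQUE (form_id, name)
);
CREATE TABLE submission (id INTEGER PRIMARY KEY, form_id INTEGER NOT NULL REFERENCES form(id));
CREATE TABLE submission_value (
    submission_id INTEGER NOT NULL REFERENCES submission(id),
    field_id INTEGER NOT NULL REFERENCES field(id),
    value TEXT,
    PRIMARY KEY (submission_id, field_id)
);
"""


def define_form(conn, name, fields):
    """Store a form definition; fields are (name, label, input_type, required) tuples."""
    form_id = conn.execute("INSERT INTO form (name) VALUES (?)", (name,)).lastrowid
    for pos, (fname, label, input_type, required) in enumerate(fields):
        conn.execute(
            "INSERT INTO field (form_id, name, label, input_type, required, position) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (form_id, fname, label, input_type, int(required), pos),
        )
    conn.commit()
    return form_id


def _fields(conn, form_id):
    return conn.execute(
        "SELECT id, name, label, input_type, required FROM field "
        "WHERE form_id = ? ORDER BY position",
        (form_id,),
    ).fetchall()


def render_form(conn, form_id, action):
    """Generate the HTML form from the stored definition."""
    parts = [f'<form method="post" action="{escape(action)}">']
    for _id, name, label, input_type, required in _fields(conn, form_id):
        req = " required" if required else ""
        parts.append(f'<label>{escape(label)} <input type="{escape(input_type)}" '
                     f'name="{escape(name)}"{req}></label>')
    parts.append('<button type="submit">Submit</button></form>')
    return "\n".join(parts)


def submit(conn, form_id, data, handlers=()):
    """Validate, store one row per submitted field, then call each handler(form_id, data)."""
    fields = _fields(conn, form_id)
    missing = [name for _id, name, _label, _type, required in fields
               if required and not data.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    sub_id = conn.execute("INSERT INTO submission (form_id) VALUES (?)", (form_id,)).lastrowid
    for field_id, name, *_rest in fields:
        if name in data:
            conn.execute(
                "INSERT INTO submission_value (submission_id, field_id, value) VALUES (?, ?, ?)",
                (sub_id, field_id, str(data[name])),
            )
    conn.commit()
    for handler in handlers:  # hypothetical hook: email, CRM, or other API forwarding
        handler(form_id, data)
    return sub_id


def submission_values(conn, submission_id):
    rows = conn.execute(
        "SELECT f.name, v.value FROM submission_value v "
        "JOIN field f ON f.id = v.field_id WHERE v.submission_id = ?",
        (submission_id,),
    ).fetchall()
    return dict(rows)
```

Adding a field to a form is then an `INSERT` into `field` rather than a code change, which is the trade-off the off-the-shelf tools below make for you.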
It sounds like you're looking for an off-the-shelf solution, of which there are many. David indicated in his comment that Microsoft LightSwitch and InfoPath are viable options, and I'd agree. These offer the path of least resistance: minimal setup hassle, and forms can more or less be created without the user writing any code.
If you're feeling adventurous and want a more capable framework, Microsoft ASP.NET Web Forms is another technology that does exactly what you want, and does it really well. It is considerably more dynamic/open/complex than LightSwitch/InfoPath, but on the whole it is an excellent option for someone who just needs to get editable forms/grids up and running as quickly as possible.
ASP.NET's drag-and-drop components are easy to use and configure: you can literally wire up forms/grids to a database without writing any code at all. Another great thing about Web Forms is that you get the support of the entire .NET Framework, which has hundreds(?) of components you can take advantage of (grids, widgets, charts, etc.).
.NET itself is a framework. PHP also has a lot of options along the lines of what you are looking for :)

Desktop app GUI design: best tool [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm working on a project: an office information management system.
The database is a MySQL database, and now I have to build the front-end GUI.
I have already drawn a model of one example screen in PowerPoint (out of 15 screens in my system).
My question is: how should I build the GUI? Which tool/language is simple and easy to learn?
I thought about C++, but I have no experience with it...
The information has to be retrieved from the DB: reads, writes, queries, and so on.
I'd be happy to read your thoughts.
[Image: PowerPoint initial model]
The implementation that should come to your mind is one in a programming language you know. You can program this in many languages:
Visual Basic. If you already know it, this can be the fastest. Start the IDE and put together a forms project for your DB app.
Java. Many people know Java; you can pick it up in a matter of days, and you're likely to have use for it in several projects. A Swing or AWT project built in Eclipse or NetBeans, with the appropriate DB driver for the connection, will work.
Python is also a popular choice. You can use the tkinter library to make quick GUIs.
C/C++ will also work. But if you don't already know C/C++, you might want to build the GUI with a higher level of abstraction.
A web application with CSS/JavaScript, using some JavaScript framework to do the DB I/O. But from your question it definitely seems that you want a desktop app.
Use this project to learn a new language. If you don't know Lua, Haskell, Clojure, Scala, Kotlin, Fantom, Erlang, or some other language, and don't know how to connect it to MySQL, this project will be good practice.
Any of the above will work, and if I faced this project I would use one of the tools above.
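As an example of the Python/tkinter option, here is a minimal sketch of a window that lists and adds records. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a database server; the `contact` table and all function names are hypothetical. With MySQL you would swap in a DB-API driver such as mysql-connector-python, which uses `%s` placeholders instead of `?`.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) contact table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contact "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
    )
    return conn


def add_contact(conn, name, phone):
    conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", (name, phone))
    conn.commit()


def list_contacts(conn):
    return conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()


def run_app(conn):
    """Build and show the window; needs a display."""
    import tkinter as tk  # imported here so the data functions work without a GUI

    root = tk.Tk()
    root.title("Office contacts")
    name_var = tk.StringVar()
    phone_var = tk.StringVar()

    tk.Label(root, text="Name").grid(row=0, column=0, sticky="w")
    tk.Entry(root, textvariable=name_var).grid(row=0, column=1)
    tk.Label(root, text="Phone").grid(row=1, column=0, sticky="w")
    tk.Entry(root, textvariable=phone_var).grid(row=1, column=1)
    listbox = tk.Listbox(root, width=40)
    listbox.grid(row=3, column=0, columnspan=2)

    def refresh():
        # Reload the list from the database after every change.
        listbox.delete(0, tk.END)
        for name, phone in list_contacts(conn):
            listbox.insert(tk.END, f"{name}  {phone or ''}")

    def on_add():
        name = name_var.get().strip()
        if name:
            add_contact(conn, name, phone_var.get().strip())
            name_var.set("")
            phone_var.set("")
            refresh()

    tk.Button(root, text="Add", command=on_add).grid(row=2, column=1, sticky="e")
    refresh()
    root.mainloop()


# To launch the window on a machine with a display:
# run_app(open_db("office.db"))
```

Keeping the SQL in plain functions, separate from the widget code, means the data layer can be tested without a display and the database can be changed later without touching the GUI.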

Open source web development framework [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I am a C++ developer. I want to develop a website that will include:
User accounts
User groups
Alerts based on user preferences
Can anybody suggest the best open-source framework I can use to create this website? I hope the framework will provide basic underlying infrastructure such as session management. In short, what would you suggest for building such a website?
Thanks in advance.
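Whichever framework you choose, it will typically provide the user accounts, groups, and session management; the "alerts based on user preferences" part is application logic you would write on top of it. A framework-independent Python sketch of that matching step follows. The `User` and `Event` classes, the topic-based preference format, and the group-visibility rule are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class User:
    username: str
    groups: Set[str] = field(default_factory=set)
    # Hypothetical preference format: the topics this user wants alerts for.
    alert_topics: Set[str] = field(default_factory=set)


@dataclass
class Event:
    topic: str
    message: str
    # Optional restriction: if non-empty, only members of these groups are alerted.
    visible_to_groups: Set[str] = field(default_factory=set)


def recipients(event: Event, users: List[User]) -> List[str]:
    """Users who opted into the event's topic and are allowed to see it."""
    result = []
    for user in users:
        if event.topic not in user.alert_topics:
            continue
        if event.visible_to_groups and not (user.groups & event.visible_to_groups):
            continue
        result.append(user.username)
    return result


def build_alerts(event: Event, users: List[User]) -> Dict[str, str]:
    """Map each recipient to the alert text they should receive."""
    return {name: f"[{event.topic}] {event.message}" for name in recipients(event, users)}
```

In a framework such as Django, `groups` would come from its built-in auth system and `alert_topics` from a preferences model you add yourself.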
I'm mainly a PHP web developer, so I will talk about PHP frameworks.
I like ExpressionEngine, as it provides a fully functional back end with user/group management and is easily extensible (a lot of plugins exist). With it, you just have to program the public interface and can use the back end to manage everything else.
If you want to do it your own way, you can try Symfony, CodeIgniter (ExpressionEngine is built on CodeIgniter), or Zend Framework. All of them provide similar tools (MVC, DB abstraction, etc.).
I'm mainly a Java developer, so I'll recommend... a Python framework ;)
For your use case, I can highly recommend Django.
It has a built-in auth system, which consists of:
Users
Permissions
Groups
Also important for your use case: you can easily extend the built-in User model to include the user's preferences.
It's also very easy to get started.
Some other nice features:
dynamic admin interface (~ scaffolding)
flexible templating system
rapid development
it comes with an object-relational mapper in which you describe your database layout in Python code (no need to write SQL yourself)
MVC-like
open source, of course
I'm fond of Catalyst; it has an excellent plugin system, which includes things like Catalyst::ActionRole::ACL (which should cover your user groups requirement).
Though the framework is not actually open source, I would suggest working with the .NET Framework. You don't have to shell out for Microsoft's IDE; check out SharpDevelop, for instance...