What are some good data cleanup tools? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I am parsing large amounts of complex files (mostly CSV files, but some are not) and I need to structure/parse them into some standard formats. This involves not only row-wise cleanup of data but also some simple logic on individual cells. I want a tool that a non-programmer can also use, so that a business team member can write simple drag-and-drop logic and not take up engineering time. So far, I have looked at Google Refine and Data Wrangler, and the latter looks great. Are there any other such tools out there?

ETL tools are oriented more towards relational databases, but also have support for XML and CSV file input/output. Examples:
http://www.talendforge.org/
http://kettle.pentaho.com/
These could easily be too complicated for your requirements, though. Also, see this similar question on SO (with additional links): What software is availible for data quality checking.
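If a small script turns out to be an acceptable fallback, the row-wise cleanup plus per-cell logic described in the question can be sketched with Python's standard csv module. This is only a minimal sketch: the column names (name, email, amount), the sample data, and the rules in clean_cell are all made up for illustration and are not taken from any of the tools mentioned above.

```python
import csv
import io


def clean_cell(column, value):
    """Hypothetical per-cell rules; replace with your own business logic."""
    value = value.strip()
    if column == "email":
        return value.lower()
    if column == "amount":
        # Keep only digits, the decimal point and a minus sign.
        return "".join(ch for ch in value if ch.isdigit() or ch in ".-")
    return value


def clean_rows(reader):
    """Row-wise cleanup: drop rows whose cells are all blank, then clean each cell."""
    for row in reader:
        if not any((value or "").strip() for value in row.values()):
            continue
        yield {column: clean_cell(column, value or "") for column, value in row.items()}


# Made-up sample input; a real run would open a file instead.
raw = (
    "name,email,amount\n"
    ' Alice ,ALICE@Example.com,"$1,200.50"\n'
    ",,\n"
    "Bob,bob@example.com, 30 \n"
)
cleaned = list(clean_rows(csv.DictReader(io.StringIO(raw))))
for row in cleaned:
    print(row)
```

Keeping the cell rules in one function means a non-programmer only has to look at (or ask for changes to) that one place.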

Related

Browser based SQL tool for end-user data manipulation? [closed]

Closed 10 years ago.
I am looking for a browser based tool (or a rapid development environment which could allow us to build a tool) which would allow users to edit data in MySQL tables. We would like to allow users to insert/delete rows, edit cells. Usability features like column sorting, or limiting entry in cells to a list of choices shall be possible.
It would be awesome if the tool allowed customization (via PHP, JavaScript, Python, etc.), user permissions, and DB version control (or backups).
We are looking for this sort of tool as we lack good db programming expertise.
Edit: users will not be able to create/delete tables, but only enter/remove data.
phpMyAdmin is probably the best there is for MySQL, but it is designed for DB admins, not for end users.

What are the performance considerations of storing files in MySQL? [closed]

Closed 9 years ago.
I'm creating an application where users can edit their "files" for various purposes. Each user will have his / her own sandbox of files. The question is whether these files should actually exist on a drive or as long pieces of text in a MySQL DB?
Every time I face this problem, it turns out that storing files in the filesystem (or S3) is the better solution. But SharePoint, for example, stores all files in the DB, so it depends on your project. You could also take a look at MongoDB, but I haven't tried it yet.
Okay, based on my research, here's what I found...
Based on these two articles mainly (and other research):
http://sietch.net/ViewNewsItem.aspx?NewsItemID=124
http://blog.druva.com/2009/01/25/file-systems-vs-databases/
I think a DB would be better than a file system. The DB is optimized for fast reads and writes and is relational so lookups are QUICK. Space is cheap, so it growing fast isn't a HUGE concern.
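For reference, the "files as long pieces of text in a DB" option can be sketched as a single table keyed by user and file name. This is a rough sketch only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the table and function names (user_files, save_file, load_file) are invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the schema idea carries over.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_files (
        user_id INTEGER NOT NULL,
        name    TEXT    NOT NULL,
        body    TEXT    NOT NULL,
        PRIMARY KEY (user_id, name)
    )
    """
)


def save_file(conn, user_id, name, body):
    """Create or overwrite one file in a user's sandbox."""
    # INSERT OR REPLACE is SQLite syntax; MySQL spells the upsert differently.
    conn.execute(
        "INSERT OR REPLACE INTO user_files (user_id, name, body) VALUES (?, ?, ?)",
        (user_id, name, body),
    )
    conn.commit()


def load_file(conn, user_id, name):
    """Return the file body, or None if this user has no such file."""
    row = conn.execute(
        "SELECT body FROM user_files WHERE user_id = ? AND name = ?",
        (user_id, name),
    ).fetchone()
    return row[0] if row else None


save_file(conn, 1, "notes.txt", "first draft")
save_file(conn, 1, "notes.txt", "second draft")
print(load_file(conn, 1, "notes.txt"))
print(load_file(conn, 2, "notes.txt"))
```

The composite primary key is what gives each user a separate sandbox: the same file name under two user ids is two different rows.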

Open source libraries for generating automated summaries [closed]

Closed 10 years ago.
I am looking for an open source library for generating automated summaries out of a few words. For example: if two qualities of a person are given, a) good thinking skills and b) bad handwriting, I need to generate a sentence like "Bob has good thinking skills however needs to improve on his handwriting". I need to know if any open source library could help me achieve this, even partially.
Thanks for help!
-- Mohit
You could start with MEAD. Not sure what sort of mileage you'll get with single-sentence summarization, but you may be able to do some post-processing on the output and manage it.
It would take a bit of work, but you could also construct something out of NLTK and one or more of the associated databases (e.g. WordNet). Python, open source.
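Before reaching for a library, it may be worth seeing how far plain templates get you. The following is a minimal sketch in plain Python, not MEAD or NLTK: the NEGATIVE word list, the function names, and the rule "first word is the polarity adjective" are all assumptions made for the example.

```python
# Hypothetical list of adjectives that mark a quality as a weakness.
NEGATIVE = {"bad", "poor", "weak"}


def join_phrases(phrases):
    """Join phrases as 'a', 'a and b', or 'a, b and c'."""
    if len(phrases) <= 1:
        return "".join(phrases)
    return ", ".join(phrases[:-1]) + " and " + phrases[-1]


def summarize(name, qualities, possessive="his"):
    """Build one sentence from short quality phrases using fixed templates."""
    strengths, weaknesses = [], []
    for quality in qualities:
        adjective, _, trait = quality.partition(" ")
        if adjective.lower() in NEGATIVE and trait:
            # Drop the negative adjective; the template supplies the wording.
            weaknesses.append(trait)
        else:
            strengths.append(quality)
    clauses = []
    if strengths:
        clauses.append("has " + join_phrases(strengths))
    if weaknesses:
        clauses.append("needs to improve on " + possessive + " " + join_phrases(weaknesses))
    if not clauses:
        return ""
    return name + " " + " however ".join(clauses) + "."


print(summarize("Bob", ["good thinking skills", "bad handwriting"]))
```

A template approach like this breaks down quickly once the input phrases stop following a fixed shape, which is where something like NLTK with WordNet would have to take over.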

Do you know a free test data generator for mysql database? [closed]

Closed 10 years ago.
Do you know of a free test data generator for a MySQL database?
Or do the native tools perhaps allow you to generate test data?
You could try GenerateData.com. It lets you quickly generate large volumes of custom data (name, address, phone number, random number, random text...) in a variety of formats (CSV, XML, Excel, MySQL, Oracle).
Edit: If you don't want to install it, there is an online generator that allows you to generate data up to 5000 rows.
I've come across this one generator.
May be useful:
http://sourceforge.net/projects/spawner/
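If installing a tool is overkill, a few lines of script can also produce throwaway test rows. Below is a minimal sketch using only the Python standard library; it is not how GenerateData.com or Spawner work, and the column set (id, name, phone, score) and the name lists are invented for the example. The output is CSV text, which you would then import into your table with whatever import mechanism you normally use.

```python
import csv
import io
import random

# Made-up value pools; extend or replace as needed.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia"]
COLUMNS = ["id", "name", "phone", "score"]


def make_rows(count, seed=0):
    """Generate `count` fake rows; the same seed always gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for row_id in range(1, count + 1):
        rows.append(
            {
                "id": row_id,
                "name": rng.choice(FIRST_NAMES) + " " + rng.choice(LAST_NAMES),
                "phone": "555-" + "".join(rng.choice("0123456789") for _ in range(4)),
                "score": rng.randint(0, 100),
            }
        )
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(make_rows(5)))
```

Seeding the generator keeps test runs reproducible, which is usually what you want from test data.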

Open Source Data Mining Software [closed]

Closed 10 years ago.
I was wondering: what is the best open source software that I can use for non-binary association rule generation? I need a non-binary implementation because converting my currently non-binary data to binary data would not give the desired results.
Thanks, and I can't wait to hear your comments!
Also take a look at Weka
Check out:
RapidMiner
and
R with Rattle
Try the Orange data mining toolkit.
http://www.ailab.si/orange/
Try Data Mining SDK.
These days I like Knime. See http://knime.org.
You could even try another one called Tanagra: http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html
It's mainly for research purposes, but it works well and has good tutorials here:
http://data-mining-tutorials.blogspot.com
I have written an open-source package named SPMF with more than 130 algorithms related to association rule mining, frequent itemset mining, sequential rule mining and sequential pattern mining. You can check my webpage for more details and to download it:
It is Java source code. It has a simple graphical user interface. It also has many specialized algorithms that you will not find in other data mining software.