Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I am looking for a corpus of text to run some trial full-text searches against. It could be either something I can download or a system that generates it. Something a bit more random would be better, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).
Any ideas or suggestions?
Project Gutenberg has 32000 books available.
Edit:
As of now (17.06.16), there are 52,284 free ebooks available to download as UTF-8 plain-text files, covering a wide variety of topics (from science to religion).
They are also available in EPUB, Kindle, and HTML formats.
Check here Project Gutenberg
Why not use a Wikipedia dump?
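To get from a dump to the (id, text) layout the question asks for, something like the following might work. It is only a minimal sketch: it assumes the usual MediaWiki XML export layout (`<page>` elements containing an `<id>` and a `<revision>` with a `<text>` child), and `iter_pages` and the sample data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (page_id, text) for each <page> in a MediaWiki XML export.

    Hypothetical helper: streams the file so the whole dump never has to
    be parsed into one tree at once.
    """
    page_id, text = None, ""
    in_revision = False
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "revision":
                in_revision = True
            continue
        if name == "revision":
            in_revision = False
        elif name == "id" and not in_revision and page_id is None:
            # The page id comes before <revision>; ids inside the revision
            # (revision id, contributor id) are skipped.
            page_id = int(elem.text)
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield page_id, text
            page_id, text = None, ""
            elem.clear()  # drop the page's children to limit memory use


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Alpha</title>
    <id>10</id>
    <revision>
      <id>999</id>
      <contributor><username>someone</username><id>5</id></contributor>
      <text>First article body.</text>
    </revision>
  </page>
  <page>
    <title>Beta</title>
    <id>11</id>
    <revision>
      <id>1000</id>
      <text>Second article body.</text>
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_pages(io.BytesIO(SAMPLE)))
for page_id, text in rows:
    print(page_id, text)
```

For a real dump you would pass an open file (or a decompressing file object) instead of the `BytesIO` sample and feed the tuples to your database's bulk insert.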
I'll throw this out there since I'm familiar with it - Prosper.com makes their member loan listings available for analysis through an XML export. The export would have about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).
Closed 5 years ago.
I need a sample database dump for testing the performance of MySQL's full-text search feature. I need around 1-10 million rows.
Would this be available anywhere? If not, what is the simplest way of generating this database?
I also tested various full text solutions. See Full Text Search Throwdown.
I used the StackOverflow data dump to test.
https://blog.stackoverflow.com/2014/01/stack-exchange-cc-data-now-hosted-by-the-internet-archive/
It's in XML format, but it wasn't too difficult to write a script to turn the XML into SQL to load it into a database for testing.
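A script along those lines could look roughly like this. It is a sketch, not the script I actually used: it assumes the dump stores each post as a `<row>` element with `Id` and `Body` attributes, and it uses an in-memory SQLite database as a stand-in so it runs without a server. For MySQL you would swap in a MySQL driver and its placeholder style. `load_posts` and the sample rows are made up for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def load_posts(source, conn, batch_size=1000):
    """Stream <row> elements into a two-column (id, text) table.

    Hypothetical helper; returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, text TEXT)"
    )
    insert = "INSERT INTO posts (id, text) VALUES (?, ?)"
    batch, total = [], 0
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "row":
            continue
        # Assumed attribute names: Id and Body.
        batch.append((int(elem.attrib["Id"]), elem.attrib.get("Body", "")))
        elem.clear()
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch = []
    if batch:
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total


# Made-up sample in the assumed layout.
SAMPLE = (
    b'<posts>'
    b'<row Id="1" Body="&lt;p&gt;How do I index text?&lt;/p&gt;" />'
    b'<row Id="2" Body="&lt;p&gt;Use a full-text index.&lt;/p&gt;" />'
    b'</posts>'
)

conn = sqlite3.connect(":memory:")
inserted = load_posts(io.BytesIO(SAMPLE), conn)
second = conn.execute("SELECT text FROM posts WHERE id = 2").fetchone()[0]
print(inserted, second)
```

Inserting in batches keeps memory flat and is usually much faster than one statement per row when you are loading millions of posts.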
You can download some StackOverflow real data. Check the following link for more details.
You can check the following link for more details about how to restore it.
Closed 8 years ago.
I'm trying to build an application that allows users to look up a specific university and see data about it (admission rate, SAT scores, size, etc.). However, I can't find an API or database that I can use; none of the sources I've found seem to offer a REST API that's accessible via a GET request. I've seen many apps that have all this information, but I can't find any relevant API.
Does anyone know a way I could access this information? Thanks!
Just figured it out. Turns out you can use the IPEDS database to get this info, but it doesn't offer an API to do it. Go to IPEDS and create a group of the institutions you want (e.g. all 4-year, degree-granting universities in the United States), select the variables you want (address, admission rate, etc.), then export the data as CSV. If you want the data in a less terrible format, just convert it to JSON or whatever you'd like.
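The CSV-to-JSON step can be done with the standard library alone. Here is a minimal sketch; the column names and values below are placeholders I made up, not the real IPEDS variable names, so adjust them to whatever your export contains.

```python
import csv
import io
import json


def csv_to_json(csv_file):
    """Convert a CSV file object with a header row to a JSON array of objects."""
    reader = csv.DictReader(csv_file)
    return json.dumps([dict(row) for row in reader], indent=2)


# Made-up sample; a real export will have different columns.
SAMPLE = (
    "institution_name,admission_rate\n"
    "Example University,0.5\n"
    "Sample College,0.25\n"
)

result = csv_to_json(io.StringIO(SAMPLE))
print(result)
```

Note that `csv.DictReader` leaves every value as a string, so numeric fields such as rates need an explicit `float(...)` conversion if you want real numbers in the JSON.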
An extensive collection of data in json, csv, pdf, and other formats is at:
http://www.ed.gov/developer
One funny thing: if you take one of their download links:
https://inventory.data.gov/dataset/032e19b4-5a90-41dc-83ff-6e4cd234f565/resource/38625c3d-5388-4c16-a30f-d105432553a4/download/userssharedsdfpostscndryunivsrvy2010dirinfo.csv
and you change the .csv to .json, you still get a csv file.
Closed 6 years ago.
I need to export an XSD into a CSV file in order to build a mapping document. The CSV should list every tag defined in the XSD, in the format path,type,cardinality. Something like:
tag1/tag2/tag3,E,0..n
tag1/tag2/tag3#attr1,A,0..n
tag1/tag2/tag4,E,1..1
XSD may import schemas. Is there a tool to accomplish this task? Thanks.
I've posted a possible solution here. If it is something you're willing to try, then download the tool and the sample files; please follow the document for step by step guidance. If you run into any issues, send me an email using the contact info (support) on the web site.
The cardinality problem, again, is very tricky. The sample I've prepared for you (all the download links are in the document) is one of the test cases I was using, except that I had to come up with a specific template for your file layout. One issue that seems to be subject to debate is how to calculate the value for XPaths that, from an XML Schema perspective, traverse choice compositors. Another, not so controversial maybe, is how to calculate the cardinality for particles under repeating compositors, etc.
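To show the general shape of the problem (this is not how the tool above works), here is a deliberately limited sketch. It assumes a schema that uses only inline anonymous complex types with `xs:sequence`, and it ignores imports, named types, `ref`, and choice compositors entirely. It also reports each particle's own `minOccurs..maxOccurs` rather than a cardinality accumulated along the path, which is why the attribute comes out as 0..1 here instead of the 0..n in the question's example. All function names are made up.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def occurs(elem):
    """Format an element's own minOccurs/maxOccurs as 'min..max'."""
    low = elem.get("minOccurs", "1")
    high = elem.get("maxOccurs", "1")
    return f"{low}..{'n' if high == 'unbounded' else high}"


def walk(element, prefix, rows):
    """Record one element, its attributes, then its sequence children."""
    name = element.get("name")
    path = f"{prefix}/{name}" if prefix else name
    rows.append((path, "E", occurs(element)))
    complex_type = element.find(f"{XS}complexType")
    if complex_type is None:
        return
    for attr in complex_type.findall(f"{XS}attribute"):
        card = "1..1" if attr.get("use") == "required" else "0..1"
        rows.append((f"{path}#{attr.get('name')}", "A", card))
    for child in complex_type.findall(f"{XS}sequence/{XS}element"):
        walk(child, path, rows)


def xsd_to_rows(xsd_text):
    """Return (path, type, cardinality) tuples for all global elements."""
    root = ET.fromstring(xsd_text)
    rows = []
    for top in root.findall(f"{XS}element"):
        walk(top, "", rows)
    return rows


# Made-up schema mirroring the tag names in the question.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="tag1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="tag2">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="tag3" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="attr1"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="tag4"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

rows = xsd_to_rows(SAMPLE_XSD)
for row in rows:
    print(",".join(row))
```

Anything beyond this toy case (imported schemas, named types, choices, repeating compositors) runs straight into the cardinality questions described above.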
Closed 7 years ago.
I'm looking for an open source dictionary project for a language (you have probably never heard of it) that has not been "digitized". The dictionary will translate from this language to several others, and from those others back to it. Since the language has not been "digitized", I need the following features in addition to word search:
1 - Add your own translation to existing words/phrases
2 - Add a new word/phrase and add translation
3 - Request a word/phrase to be translated
4 - Rate the translation (like/dislike or a score within a range); contributors earn points depending on the "correctness" rating
5 - Possibly relate words (especially nouns) with pictures
6 - Easy to build a mobile version of
I guess it's more of a "collaboration site" than a dictionary, so the project I'm looking for may not be called a "dictionary".
I know it's possible to design and write this from scratch, but it would be good to start with something in hand, especially when you are spending your own time and effort on a non-profit project.
I've been looking around for such a project but haven't found anything useful. In the meantime, I'm designing the architecture in my head.
If you could share some open source projects, it would be really great.
Thanks.
I am unsure what exactly you need, but would Wiktionary be of any help? There are a lot of localized variations to support different languages and there will probably be a way to ask them to support your language of interest, if it is not already there.
Closed 4 years ago.
I'm trying to find a free downloadable dictionary (or corpus might be the better word) that I can import into MySQL. I need the words to have their type (noun, verb, adjective) associated with them. Any tips on where I can find one? I found one several years ago that worked nicely, but I no longer have it around.
Thanks!
Chris
Project Gutenberg has public domain books you can download.
This includes 'The Gutenberg Webster's Unabridged Dictionary', but nothing modern, and not in a format immediately suitable for import into a MySQL database.
Not without some work, anyway. What was the one you found "years ago"?
Kevin's Word List Page includes a part of speech database.
Wiktionary
The 1913 edition of Webster's Dictionary, now in the public domain
http://www.desiquintans.com/nounlist looks pretty good... I searched 'nouns.txt' and it came right up.
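Loading a list like that into a (word, type) table is short. This sketch assumes the file has one word per line, which I have not verified for that download, and it uses in-memory SQLite as a stand-in for MySQL so it runs anywhere. `load_words` and the sample words are made up.

```python
import io
import sqlite3


def load_words(lines, conn, word_type):
    """Insert one word per line with the given type; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "word TEXT, type TEXT, PRIMARY KEY (word, type))"
    )
    # Skip blank lines and normalize case before inserting.
    rows = [(w.strip().lower(), word_type) for w in lines if w.strip()]
    # OR IGNORE is SQLite syntax for skipping duplicates.
    conn.executemany(
        "INSERT OR IGNORE INTO words (word, type) VALUES (?, ?)", rows
    )
    conn.commit()
    query = "SELECT COUNT(*) FROM words WHERE type = ?"
    return conn.execute(query, (word_type,)).fetchone()[0]


# Made-up sample standing in for nouns.txt.
SAMPLE = io.StringIO("apple\nbridge\n\nApple\n")

conn = sqlite3.connect(":memory:")
count = load_words(SAMPLE, conn, "noun")
print(count)
```

Repeating the call with a verb or adjective list and a different `word_type` would fill out the other parts of speech in the same table.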