I am trying to write a webapp where one of the functionalities is exchanging messages, and I am trying to understand how to store these messages. I do not want to store them in a database. If I have to store them in a file, how do I separate the messages?
Any links to documentation would be greatly appreciated. I have googled a lot but could not find any reference.
You should think about storing the messages in XML format and using your webapp to load and parse those XML files into message objects. Why do you not want to store the messages in the database? There are serious drawbacks to storing in the file system rather than the database (or even system memory).
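If files and XML are the route you take, here is a minimal sketch of the idea in Python, using only the standard library; the file name and fields are hypothetical. One message element per message also answers the question of how to separate messages within a single file:

    import xml.etree.ElementTree as ET

    def append_message(path, sender, body):
        # Load the existing conversation file, or start a new one.
        try:
            tree = ET.parse(path)
            root = tree.getroot()
        except FileNotFoundError:
            root = ET.Element("messages")
            tree = ET.ElementTree(root)
        # Each message is its own element, so no custom delimiter is needed.
        msg = ET.SubElement(root, "message", sender=sender)
        msg.text = body
        tree.write(path, encoding="utf-8", xml_declaration=True)

    append_message("messages.xml", "alice", "Hello!")
    append_message("messages.xml", "bob", "Hi back.")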
A file system is a database, just not a relational database.
It's often faster than a relational database, but it has significantly less flexibility for indexing on multiple fields.
Parsing XML is going to be painful whether the XML comes from a database or a file.
Instead, you should cache rendered HTML pages, or HTML fragments, to the file system.
I am trying to make a web page that gets information about books and places that information into a database for later use. Any idea how to take the information from the Open Library website and store it in a database?
Here is the link to the API if needed:
https://openlibrary.org/developers/api
Thanks in advance.
If PostgreSQL and Python are a viable option, LibrariesHacked has a ready-made solution on GitHub for importing and searching Open Library data.
GitHub: LibrariesHacked / openlibrary-search
Using a PostgreSQL database, it should be possible to import the data directly into tables and then do complex searches with SQL.
Unfortunately, the downloads provided are a bit messy. The Open Library file always errors out, as the number of columns provided seems to vary. Cleaning it up is difficult, as the text file for editions alone is 25 GB.
That means another Python script to clean up the data. The file openlibrary-data-process.py simply reads in the CSV (Python is a little more forgiving about dodgy data) and writes it out again, but only when a row has exactly 5 columns.
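This is not the actual openlibrary-data-process.py, but a minimal sketch of the row filter it is described as performing, assuming a tab-delimited dump and hypothetical file names:

    import csv

    # The embedded JSON column can exceed csv's default field size limit.
    csv.field_size_limit(10**8)

    with open("ol_dump_editions.txt", newline="") as src, \
         open("ol_dump_editions_clean.txt", "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        for row in reader:
            # Keep only well-formed rows; malformed ones break the import.
            if len(row) == 5:
                writer.writerow(row)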
I have some XBRL files converted into PDF. Now I want to develop a project that would automatically extract all the data from these files. The project would be developed in Java. I am unable to get any lead. Any suggestions on how to start the project would be much appreciated, as there is very limited information on the internet about this.
I would recommend trying to get the original XBRL (or iXBRL) files rather than using the generated PDFs.
XBRL was designed in the first place to be easily machine-readable and to avoid having to reverse-engineer printed documents or PDFs. Attempting to read PDFs means not leveraging the potential of XBRL and may lead to imprecision and errors.
Then, if you can get these source files, I recommend using an XBRL processor that will take care of all the complexity for you. This will save a lot of time compared to using a raw XML processor. There are likely XBRL libraries written for Java.
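To illustrate the difference, here is a minimal sketch in Python (the idea carries over to Java) of what raw XML parsing of an XBRL instance looks like; the file name is hypothetical. A dedicated XBRL processor resolves contexts, units, dimensions, and taxonomy linkbases for you, none of which this sketch handles:

    import xml.etree.ElementTree as ET

    def extract_facts(path):
        # In XBRL 2.1 instance documents, reported facts are elements
        # carrying a contextRef attribute.
        facts = []
        for elem in ET.parse(path).getroot().iter():
            if "contextRef" in elem.attrib:
                name = elem.tag.split("}")[-1]  # strip the namespace URI
                facts.append((name, elem.attrib["contextRef"], elem.text))
        return facts

    for name, context, value in extract_facts("filing.xbrl"):
        print(name, context, value)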
I am sorry not to be able to give you a better answer, but I hope this helps you get started.
I'm building an application with instant messaging functionality.
The application will allow the users to send files/images as well as normal text messages.
I decided to take the approach with storing the files on the filesystem and write only the file paths to the database. There will be no updates to the files (only insertions and deletions).
Which database would be best for storing a large number of file paths and would make it easy to query for a given user's files?
I would go with MongoDB. My experience is that a document-based approach using a single Messages collection would be best. Each message document then contains all of its file paths. This eliminates joins and better supports potential future changes to functional requirements.
MongoDB also provides great ways to deal with old messages, such as TTL indexes.
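As a minimal sketch with pymongo, assuming a local MongoDB instance and hypothetical database, field, and path names:

    import datetime
    from pymongo import MongoClient, ASCENDING

    messages = MongoClient("mongodb://localhost:27017").chat.messages

    # TTL index: documents expire 30 days after their createdAt timestamp.
    messages.create_index([("createdAt", ASCENDING)],
                          expireAfterSeconds=30 * 24 * 3600)

    messages.insert_one({
        "sender": "alice",
        "recipient": "bob",
        "text": "Here are the photos",
        "filePaths": ["/data/uploads/alice/img_001.jpg",
                      "/data/uploads/alice/img_002.jpg"],
        "createdAt": datetime.datetime.utcnow(),
    })

    # All file paths a given user has sent: one indexed lookup, no joins.
    for doc in messages.find({"sender": "alice"}, {"filePaths": 1, "_id": 0}):
        print(doc["filePaths"])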
I have CSV files with different columns and a few common columns. We are currently using Excel to remove unwanted rows, clean the data, and generate reports. I am thinking of using Elasticsearch as a solution for data storage, transformation, load, and reporting.
Is Elasticsearch a good choice for this use case?
Elasticsearch is, as the name indicates, designed for quick search. It is built upon Apache Lucene and is similar to Solr, another Lucene-based project.
If you want to query the raw data or do some simple aggregations on it, it works well. You can also use Kibana to build a GUI so your audience can interact with the data, and even put together a dashboard to present the basics. However, it is not a replacement for a database.
If you need updates or joins, you are better off with a database: SQL or MongoDB, or Hive for big data.
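As a minimal sketch of the kind of work Elasticsearch is suited for, using the official Python client against a local node; the index and field names are hypothetical, and "category.keyword" assumes the default dynamic mapping:

    import csv
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Index each CSV row as one document.
    with open("report.csv", newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            es.index(index="reports", id=i, document=row)

    # A simple terms aggregation: row counts per category.
    resp = es.search(index="reports", size=0,
                     aggs={"by_category": {"terms": {"field": "category.keyword"}}})
    for bucket in resp["aggregations"]["by_category"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])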
I am using an email service API (can use either JSON or PHP) to send emails, and I want to store the replies in either MySQL or MongoDB.
What do you think would be better for storing large amounts of email messages, MySQL or MongoDB?
It sort of depends on what you are doing, and what kind of metadata you want to store.
I have used both, but I have recently preferred to use MongoDB just because the way you store data (document-centric) is more conducive to the type of applications I work with than relational databases.
If what you want to store is a list of emails, and the replies to that email, then MongoDB might be a good way to go, since you could create a document structure like this:
{
    "sender": "me@me.com",
    "subject": "Hello world!",
    "to": ["tom@me.com", "dick@me.com", "harry@me.com"],
    "body": "Hello, World, and stuff.",
    "replies": [
        {
            "from": "tom@me.com",
            "subject": "re: Hello World!",
            "body": "Please stop sending me SPAM"
        },
        { ... next reply ... }
    ]
}
This makes it very easy to query for specific messages and their responses. However, if you need to query by individual users, by the email addresses used, etc., and your primary use case is going to be ad hoc queries from different directions, then MySQL might be better, since MongoDB doesn't really have support for joins.
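For instance, a minimal pymongo sketch against the structure above (the database and collection names are hypothetical) shows how embedded replies are queried without any join:

    from pymongo import MongoClient

    emails = MongoClient("mongodb://localhost:27017").mail.emails

    # Dot notation reaches into the embedded replies array.
    for doc in emails.find({"replies.from": "tom@me.com"},
                           {"subject": 1, "_id": 0}):
        print(doc["subject"])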