How to analyze information from the comments of users on my site? - data-analysis

Can anybody suggest a way to process and analyze the comments that users post on an article on my website?
Specifically, I want to process the comments as follows.
Example: an article on computerization might get the following comments:
I love computerization as it makes the work easier.
Computerization is spreading unemployment as 1 computer can work better than 4 people.
How I process this information:
I take the comments and try to recognize some predefined (and extensible) keywords in them.

Assuming that you are trying to extract useful information from the comments, you could apply machine learning to classify or categorize them, for example by topic or by sentiment.
There are a number of different types of learning you can apply to text, but I personally recommend a support vector machine or a naive Bayes classifier to categorize and analyze the comments. You could also use clustering, but whichever solution you choose needs an element of natural language processing. There are a number of libraries that implement either approach, e.g. svmlight, javaml, etc. I have personally used javaml and it is a good library.
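As a rough illustration of the naive Bayes suggestion, here is a minimal sketch in plain Python (not javaml or svmlight). It is a multinomial naive Bayes with add-one smoothing; the labels, the training comments and the class name `NaiveBayes` are all made up for the example, and a real site would need far more labelled comments.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # comments seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, labelled_comments):
        for text, label in labelled_comments:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical training data, loosely based on the example comments.
training = [
    ("I love computerization as it makes the work easier", "positive"),
    ("Computers make my job faster and easier", "positive"),
    ("Computerization is spreading unemployment", "negative"),
    ("One computer can replace four people, which causes unemployment", "negative"),
]

model = NaiveBayes()
model.train(training)
print(model.classify("computerization makes everything easier"))
print(model.classify("more unemployment because of computers"))
```

The predefined keywords from the question could be used as the vocabulary, so that only those words contribute to the score.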

How to create a people tracking with reidentification model?

I am currently working on a project where I want to build a model that can detect and track people, each with a unique ID. The main issue is when a person leaves the frame and comes back after some time. Currently, I am using YOLOv4 and DeepSORT to detect and track, but this setup fails in that situation.
Please suggest an approach for detection, re-identification and tracking of people, cars or any other object.
Thank you :)
Although YOLOv4 can detect people in an image/video stream, I think it might be too general in your case. When a person leaves the frame and comes back, ideally the model should remember seeing that person before.
One way to tackle this is to train on images of the people you want to detect.
E.g. in a system like yours, you could take multiple images of the people you want to track from different angles and label them with their unique identifiers. Afterwards you could train the model on this data (for your downstream task). Ideally this gives more specific results for detecting and tracking people by their unique identifiers, as opposed to the general person detection you get from YOLOv4 as is.
That said, I understand that taking lots of images of people may not be practical in certain scenarios. In that case you may want to look at techniques that produce accurate results with minimal data, such as domain adaptation (https://arxiv.org/abs/1812.11806). However, in an application for tracking and detecting people, I'm assuming you want minimal misclassifications, so it's always a tradeoff.
You can find out more about dealing with lack of data in this article: (https://www.kdnuggets.com/2019/06/5-ways-lack-data-machine-learning.html)
However, I think this is a better place to start for a re-identification model: (https://github.com/KaiyangZhou/deep-person-reid)
It has ample documentation to get you started.
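To make "remembering" a person concrete, here is a minimal sketch of the matching step only. It assumes you already get an appearance embedding (a list of floats) for each detection from some re-identification model; the class `IdentityGallery`, the threshold of 0.8 and the toy three-dimensional vectors are hypothetical, not part of YOLOv4, DeepSORT or deep-person-reid.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class IdentityGallery:
    """Remembers one appearance embedding per known ID."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical value; tune on real data
        self.embeddings = {}        # person ID -> embedding vector
        self.next_id = 1

    def identify(self, embedding):
        """Return the ID of the best match, or register a new ID."""
        best_id, best_score = None, -1.0
        for person_id, known in self.embeddings.items():
            score = cosine_similarity(embedding, known)
            if score > best_score:
                best_id, best_score = person_id, score
        if best_id is not None and best_score >= self.threshold:
            return best_id
        new_id = self.next_id
        self.next_id += 1
        self.embeddings[new_id] = embedding
        return new_id


# Toy embeddings; real ones would come from a re-ID network.
gallery = IdentityGallery()
first = gallery.identify([0.9, 0.1, 0.2])        # person A enters
second = gallery.identify([0.1, 0.8, 0.5])       # person B enters
returned = gallery.identify([0.88, 0.12, 0.25])  # person A comes back
print(first, second, returned)
```

Because the gallery outlives any single track, a person who leaves the frame and returns can be matched to the old ID instead of being assigned a new one.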

How can I handle relational data in Meteor?

I am learning Meteor using the Discover Meteor book.
I come from a PHP and MySQL background, and the application I am thinking of doing as a side-project is a real-time Backgammon web application. While Meteor's reactivity is a very, very big plus, I am stumped on how I can handle relational data (e.g. games, users, tournaments, friends, teams, etc).
I have read a lot of answers (ranging from old to very old) on StackOverflow on how one can use MySQL with Meteor. My search has led me to numtel/meteor-mysql. However, when I look at the examples provided in that repository, it is nowhere as clean as Meteor's own implementation of MongoDB.
My options, as I understand them, are the following:
Option 1: Use MongoDB, and rewrite a lot of the features present in an RDBMS in JavaScript
Option 2: Use an RDBMS that is not as well supported in Meteor as MongoDB
IMO, option two is much less work, and I think it might lead to fewer problems in the future. Take the problem in the epilogue of Why You Should Never Use MongoDB, for example.
We could also model this data as a set of nested hashes. The set of information about a particular TV show is one big nested key/value data structure. Inside a TV show, there’s an array of seasons, each of which is also a hash. Within each season, an array of episodes, each of which is a hash, and so on. This is how MongoDB models the data. Each TV show is a document that contains all the information we need for one show.
But then, how would you query for the TV shows that someone has starred in?
Back to my original question: is there something I'm missing here? Handling relational data is something that a lot of applications will need to do, but I can't seem to find a clean solution.
In my opinion, it will be much less work if you go with option 1.
MongoDB isn't difficult to learn, and it uses JSON objects and is supported natively by Meteor and all its packages.
I advise having a look at the aldeed packages collection2 and simple-schema to structure your collections. I also advise using the collection-helpers package to help with joins.
If you have a posts collection with name, authorId and content fields, then to get the author of a post, you'd write Meteor.users.findOne(post.authorId).
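The lookup by authorId is an application-level join: one document stores the other's ID and you follow the reference in code. Here is a minimal sketch of the idea, written in Python rather than Meteor's JavaScript, with in-memory lists standing in for the collections. The data and the helper names `find_one`, `author_of` and `posts_by` are hypothetical.

```python
# In-memory stand-ins for two MongoDB collections (hypothetical data).
users = [
    {"_id": "u1", "username": "alice"},
    {"_id": "u2", "username": "bob"},
]
posts = [
    {"_id": "p1", "name": "Opening moves", "authorId": "u1", "content": "..."},
    {"_id": "p2", "name": "Doubling cube", "authorId": "u2", "content": "..."},
    {"_id": "p3", "name": "Bearing off", "authorId": "u1", "content": "..."},
]


def find_one(collection, doc_id):
    """Return the document with the given _id, or None (similar to findOne)."""
    return next((doc for doc in collection if doc["_id"] == doc_id), None)


def author_of(post):
    """Follow the post's authorId reference into the users collection."""
    return find_one(users, post["authorId"])


def posts_by(user_id):
    """Reverse direction: every post that references this user."""
    return [post for post in posts if post["authorId"] == user_id]


print(author_of(posts[0])["username"])
print([post["name"] for post in posts_by("u1")])
```

Keeping references in separate collections, instead of nesting everything in one document, is what makes the reverse query possible without scanning nested arrays.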
Hope that clears things up a bit and gets you on your way.

What are the practical benefits of using microformats for every possible thing?

What practical benefits can my client get if I use microformats on his site for every possible thing?
How can I explain these benefits to a non-technical client?
Sometimes it seems like the practical benefits are hard to quantify.
Search engines already pick up and parse microformats (see e.g. https://support.google.com/webmasters/answer/99170). I believe hCard and hCalendar are fairly well supported, and even where they aren't, plenty of sites are using them, including places like MySpace.
The idea is that adding agreed-upon CSS class names and IDs makes your existing content easier to parse in a machine-readable manner.
hReview is starting to make some inroads, and hResume looks like it may take off too.
I heavily use rel="nofollow" on uncontrolled links (3rd party sources), which is actually a microformat.
Check the microformats wiki for a decent starting point.
It just means your viewers can share a few generic "formats". You can generalize stylesheets and parsing mechanisms. Rather than having a webpage consist of one "html document", you have a webpage that consists of "10 formatted micro-documents".
If you need a real-world analogy: think of it like attaching a formatted invoice to a receipt and a business card, rather than writing it all down on notebook paper with your left hand.
Overall the site becomes easier to digest for the rest of the internet. The data can be reused, combined, cross-referenced, and saved.
A simple example would be to have a latitude and a longitude (geo) anywhere on the site. With microformats, anybody who searches for that latitude and longitude can easily be pointed to the website, which increases traffic and awareness of that person or company and lets users easily save that information. (I've encountered little of this personally; it is more 'the future' of things than current practice. But it's always good to stay up to date.)
A second example would be a business card (hCard) that a browser can easily save and transfer to an address book, so that after just one visit to the site the visitor has the information saved locally. Especially useful if they're getting hits from a cell phone.
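To show what "easier to parse" means in practice, here is a minimal sketch that pulls contact details out of hCard markup using only Python's standard html.parser module. The class names (vcard, fn, org, tel) come from the hCard microformat; the contact details and the `HCardParser` class are invented for the example, and the parser deliberately ignores nested properties.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up as an hCard.
PAGE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Widgets</span>
  <span class="tel">+1-555-0100</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects the text of elements whose class is a known hCard property."""

    PROPERTIES = {"fn", "org", "tel"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self.current = None  # property whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self.current = name

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self.current = None


parser = HCardParser()
parser.feed(PAGE)
print(parser.card)
```

The same few lines work on any site that uses the hCard class names, which is the whole point: the markup doubles as a small, shared data format.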
I wouldn't recommend using microformats for "every possible thing". Use them for things where you get some benefit, in exchange for the effort of using them.
The main practical benefit I'm aware of is customised search engine results:
https://support.google.com/webmasters/answer/99170
Technically, Google now prefers this to be implemented using microdata (i.e. itemprop attributes) rather than microformats, but it's the same idea.
Having a micro-format can be better than no format since it lets you save every possible thing in the application.
A micro-format for every possible thing can be better than a standard format only because it's quicker to create, so it costs less, and it takes less space than some standard formats, like XML.
But all this depends on the context of the application and so you must explain it to the client in that context.
Microformatting your content extends its reach in every way possible. It uses your site's structure as its "API"; the possibilities are limited only by the limits you set.

How to partition a problem into smaller understandable portions?

I'm not sure if it's possible to give general advice on this topic, but please try. It's hard to explain my case because it's too complex to explain. And that's exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?
Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
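To show how close such definitions are to code, here is a minimal sketch that turns the three glossary entries into Python dataclasses. The field names, the `billet_on` helper and the sample data are my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class GradeAndStep:
    """Row (grade) and column (step) in the federal general schedule."""
    grade: int
    step: int


@dataclass(frozen=True)
class Billet:
    """A term of service, from start date to end date, at one grade and step."""
    start: date
    end: date
    grade_and_step: GradeAndStep


@dataclass
class Employee:
    """A series of billets associated with a particular SSN."""
    ssn: str
    billets: list = field(default_factory=list)

    def billet_on(self, day):
        """Return the billet in force on the given day, or None."""
        for billet in self.billets:
            if billet.start <= day <= billet.end:
                return billet
        return None


# Invented example data.
employee = Employee(ssn="000-00-0000")
employee.billets.append(Billet(date(2020, 1, 1), date(2020, 12, 31), GradeAndStep(9, 1)))
employee.billets.append(Billet(date(2021, 1, 1), date(2021, 12, 31), GradeAndStep(9, 2)))
print(employee.billet_on(date(2021, 6, 1)).grade_and_step)
```

Each docstring is the glossary definition word for word, so a vague definition shows up immediately as a class you cannot write.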
Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside interacting intensively with the partition and knowing too much about its inner workings, it probably belongs inside. The typical example is found in OO code written in procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects.
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
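The two goals can be sketched with a small Python example. The first pair of classes shows the "dumb data object plus manager" pattern; the second version moves the behavior into the object and exposes a narrow API. The bank-account domain and all names here are hypothetical.

```python
# Procedural style: a "dumb" data object plus a manager that knows its insides.
class AccountData:
    def __init__(self, balance):
        self.balance = balance


class AccountManager:
    def withdraw(self, account, amount):
        if amount > account.balance:
            raise ValueError("insufficient funds")
        account.balance -= amount


# Cohesive version: the rule about overdrafts lives with the data it guards.
class Account:
    def __init__(self, balance):
        self._balance = balance

    @property
    def balance(self):
        return self._balance

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


def pay_invoice(account, amount):
    """Client code depends only on the narrow API: withdraw() and balance."""
    account.withdraw(amount)
    return account.balance


print(pay_invoice(Account(100), 30))
```

In the second version nothing outside Account can change the balance directly, so the partition boundary is the two public members rather than the object's internals.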
It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.
When I find myself copying and pasting chunks of code with minimal adjustments, I realize that's a "partition" and then create a class, method, function, or whatever.
Actually, that's what the whole object-oriented approach is all about. Try thinking of your application as tangible things that do stuff. Write pseudocode describing what the things are and what they do; I find lots of "partitions" this way.
Here's a try, kind of a wild guess.
People usually underestimate how long it will take them to do the work. If your project is large, then most likely you'll need several people to work on it, so you can try planning with that in mind. A person can be expected to hold just one area in his head, so you'll need to explain to him exactly what kind of task he's supposed to do.
So I'd say you should try to write a job description that encompasses as much as one person can seriously concentrate on. Repeat until you have broken your project into parts. As a benefit, you're ready to assemble your team. And if the parts turn out to be small, maybe you'll still be able to do it all yourself.

How else can one present an architecture document besides as a series of views?

Most, if not all architecture documents I've seen (and developed) have been presented as a series of views (Logical, Physical, Use-case etc). Is this the preferred layout? What other styles are there?
Since it's complex, it's hard to do otherwise.
I like to start with the one-paragraph summary of the overall requirements. If there isn't a one-paragraph summary, that's -- perhaps -- the most important thing to build.
Once the summary is out of the way, there's an overview of architectural features. And after that, no one will read a single word.
It isn't a novel. There's no story arc. No drama. No conflict. No characters. At least, I can't find a way to make an architecture readable.
The best you can hope for is a reference work with enough indexes, cross references, overviews and sidebars that people use it.
Indeed, it's the pull-outs that matter. The pictures are all anyone will ever use. And those will get put into PPTs for presentations, internally and externally.
So, don't waste a lot of time on writing. Invest time in overviews, summaries, feature lists and pictures people want to use every day.
This may be WAY off topic, but is there any way to apply Joel's ideas on making specifications 'fun' in this realm?