Show the dominant topic for each document under the Document Influence Model (LDA)

I am working on DIM (Document Influence Model) and wonder how to show the dominant topic ID for each document. I see no off-the-peg API for it. Can anyone help?
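A minimal sketch, assuming you can get the per-document topic weights out of your DIM run as a documents × topics matrix, such as the gamma values the model infers. If you use gensim 3.x's `DtmModel` wrapper with `model='fixed'` (its DIM mode), I believe these are exposed as `gamma_`, but verify that against your version. The dominant topic is then simply the column with the largest weight in each row; `dominant_topics` is a hypothetical helper, not part of any library.

```python
from typing import List, Sequence, Tuple


def dominant_topics(doc_topic: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (topic_id, share) of the heaviest topic for each document row.

    Assumes each row holds non-negative topic weights for one document
    (e.g. unnormalized gamma values); rows are normalized to get the share.
    """
    result = []
    for row in doc_topic:
        total = sum(row)
        if total <= 0:
            raise ValueError("each document needs positive topic weights")
        # Index of the largest weight; max() keeps the first, so ties go to the lowest topic id.
        best = max(range(len(row)), key=lambda k: row[k])
        result.append((best, row[best] / total))
    return result


# Hypothetical weights for three documents over three topics.
gamma = [
    [0.2, 5.0, 0.8],
    [3.0, 1.0, 0.0],
    [1.0, 1.0, 2.0],
]

for doc_id, (topic, share) in enumerate(dominant_topics(gamma)):
    print(f"doc {doc_id}: topic {topic} ({share:.2f})")
```

Reporting the share alongside the ID helps spot documents whose "dominant" topic barely beats the others.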


Using JSON-LD for on-site reviews

I read the article The Complete Guide to Creating On-Site Reviews + Testimonials Pages. I would like to build my own solution for collecting reviews on our website in a way that Google can find. I'm not 100% sure if I understand this correctly.
So I would create a form with appropriate inputs and take that user input and create a JSON-LD object in a <script> tag and place that in the head of our /reviews/ page. So each review listed on our /reviews/ page would be in an array of JSON-LD objects, and that's how Google can find it?
Is it as simple as that? Placing the JSON-LD in the <head> with the correct data?
This site was used as an example in the article I linked. They use a third-party service that does basically what I am setting out to do. I don't see the data in the head when viewing the source, but I guess it's good practice to hide the JSON-LD somewhere? I see a JSON-LD script, but it's empty.
Can someone help me understand this better?
The idea is to provide machine-readable structured data about the reviews, using the vocabulary Schema.org. Three syntaxes are supported: JSON-LD, Microdata, RDFa.
See a comparison. With Microdata and RDFa, you would add HTML attributes to the existing markup for the reviews. With JSON-LD, you would add the structured data in a separate script element and leave the review markup untouched.
This script element can be in the head or in the body. By default, it’s visually hidden no matter where it’s placed.
If you provide such structured data, consumers (like Google Search) may make use of it. For example, Google Search offers the Review rich result feature. Their documentation describes which Schema.org types/properties are needed to qualify for it.
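As a sketch of the JSON-LD route: assuming your form submissions end up as simple dicts with hypothetical `author`, `rating` and `body` keys, and that the reviewed item is a Schema.org `Product` (swap in whatever type actually fits your business), the server can serialize them into one `<script type="application/ld+json">` element. Which properties Google actually requires for the Review rich result is defined in their documentation, so check against that rather than this example.

```python
import json


def reviews_jsonld(item_name, reviews):
    """Build a JSON-LD <script> element holding one Schema.org Review per submission.

    Assumption: each review is a dict with 'author', 'rating' (1-5) and 'body'.
    """
    data = [
        {
            "@context": "https://schema.org",
            "@type": "Review",
            # Hypothetical choice of type for the reviewed item.
            "itemReviewed": {"@type": "Product", "name": item_name},
            "author": {"@type": "Person", "name": r["author"]},
            "reviewRating": {"@type": "Rating", "ratingValue": r["rating"], "bestRating": 5},
            "reviewBody": r["body"],
        }
        for r in reviews
    ]
    # Escape "</" so user-supplied text cannot close the script element early;
    # "<\/" is still valid JSON and decodes back to "</".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(reviews_jsonld("Acme Widget", [
    {"author": "Ann", "rating": 5, "body": "Works great."},
    {"author": "Bob", "rating": 4, "body": "Good value."},
]))
```

The resulting element can go in the head or the body of /reviews/; the visible review markup stays as it is.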

Detecting the changing areas in a web page

I'm trying to write a crawler that fetches raw HTML, extracts fields such as title, price, update date and photo, and writes them to a database. This is a classic, old way to crawl data.
I think I can do this job another way.
If I crawl all the pages on the site (maybe more than 1,000) and compare them all, I can find the specific areas.
I mean, the HTML tags will always be the same. Only specific areas such as the title and image will change.
So, what is the best way to determine the changed areas?
"compare them all I can find the specific areas"
"what is the best way to determine changed areas?"
In your question you propose a scraping/crawling approach that compares parts of pages and extracts the data of specific areas. This smells of a regex approach. Don't use it; it is very inefficient. Use XPath instead, operating on XML structures.
So, keep it simple:
Get the HTML
Parse it into a DOM
Turn the DOM into valid XML
Apply XPath queries to the XML
Believe me, XML libraries are well able to handle huge structures (including HTML tags that carry no useful data) and traverse them. A classic example of using XPath is in this post of mine.
To determine the paths to the data nodes, just use the browser's web inspector tools (F12 in Chrome and IE, Ctrl+Shift+I in Firefox) to see which HTML tags contain the useful information.
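A minimal sketch of the "apply XPath" step using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset. It assumes the page has already been turned into well-formed XML (real-world HTML usually needs a tolerant HTML parser or a tidy step first); the markup, class names and field names below are hypothetical and would come from inspecting the real pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, already well-formed XML.
PAGE = """<html><body>
  <div class="item">
    <h1 class="title">Blue bicycle</h1>
    <span class="price">120.00</span>
    <span class="updated">2015-03-01</span>
    <img class="photo" src="/img/bike.jpg"/>
  </div>
</body></html>"""


def extract_fields(page_xml):
    """Pull the fields out of one page with (limited) XPath expressions."""
    root = ET.fromstring(page_xml)
    item = root.find(".//div[@class='item']")
    if item is None:
        return {}
    photo = item.find("img[@class='photo']")
    return {
        "title": item.findtext("h1[@class='title']"),
        "price": item.findtext("span[@class='price']"),
        "updated": item.findtext("span[@class='updated']"),
        "photo": photo.get("src") if photo is not None else None,
    }


print(extract_fields(PAGE))
```

Because the site's templates are the same on every page, the same paths work across all 1,000+ pages, which avoids diffing the pages against each other.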

XBRL element not present in definition linkbase

While going through an XBRL instance file, I saw an element. But that element is not present in the corresponding 'extended link' in the definition linkbase in the taxonomy (though it is present in the XSD). There is another, similarly named element in the definition linkbase in that extended link, but it's not used in the XBRL instance.
I usually compare the financial report and the definition linkbase (section by section) to identify the XBRL elements to be used. Am I following the right approach? How can I identify which XBRL elements to use for a particular section of a financial report?
There's nothing in the spec that says which elements can and can't be used. The closest you'd get is a requires-element relationship, which describes 'if x is present, then y must also be present' semantics.
If an element exists in the discovered schemas, then it's valid in the instance.
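To make the requires-element semantics concrete, here is a sketch that reads requires-element arcs from a definition linkbase and reports which required concepts are missing from the set of concepts an instance uses. It is deliberately simplified: it assumes each `xlink:label` identifies a single locator and identifies concepts by the fragment of the locator's `xlink:href`; the `ex_` concepts and `ex.xsd` schema are made up.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
LINK = "http://www.xbrl.org/2003/linkbase"
REQUIRES = "http://www.xbrl.org/2003/arcrole/requires-element"

# Hypothetical definition linkbase: Revenue requires CostOfSales.
LINKBASE = f"""<link:linkbase xmlns:link="{LINK}" xmlns:xlink="{XLINK}">
  <link:definitionLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:href="ex.xsd#ex_Revenue" xlink:label="rev"/>
    <link:loc xlink:type="locator" xlink:href="ex.xsd#ex_CostOfSales" xlink:label="cos"/>
    <link:definitionArc xlink:type="arc" xlink:arcrole="{REQUIRES}" xlink:from="rev" xlink:to="cos"/>
  </link:definitionLink>
</link:linkbase>"""


def requires_pairs(linkbase_xml):
    """Return (concept, required_concept) pairs from requires-element arcs."""
    root = ET.fromstring(linkbase_xml)
    pairs = []
    for dlink in root.iter(f"{{{LINK}}}definitionLink"):
        # Map each locator label to the concept id after '#' in its href.
        locs = {
            loc.get(f"{{{XLINK}}}label"): loc.get(f"{{{XLINK}}}href").split("#", 1)[1]
            for loc in dlink.iter(f"{{{LINK}}}loc")
        }
        for arc in dlink.iter(f"{{{LINK}}}definitionArc"):
            if arc.get(f"{{{XLINK}}}arcrole") == REQUIRES:
                pairs.append((locs[arc.get(f"{{{XLINK}}}from")], locs[arc.get(f"{{{XLINK}}}to")]))
    return pairs


def missing_required(present, pairs):
    """'If x is present then y must be present': list the violated pairs."""
    return [(x, y) for x, y in pairs if x in present and y not in present]


pairs = requires_pairs(LINKBASE)
print(missing_required({"ex_Revenue"}, pairs))
```

Note that this checks only the requires-element constraint; whether an element may appear at all still depends only on its being declared in the discovered schemas.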

User Interface Markup Language (UIML) render to html

I am currently investigating user interface (UI) generation through meta-languages such as UIML. The language seems to be well standardized, and it is one of the pioneers in that sense. (Here is a list of other similar languages: http://en.wikipedia.org/wiki/User_interface_markup_language)
There are several UIML implementations, in particular one in .NET and one in Java that I am aware of (both are publicly available for download). But I am looking for a way to bring UIML to a browser and possibly combine it with XForms (which would be a good complement to UIML). The question is: has anyone had experience with something similar? Are you aware of any such project?
From some papers, I learned that a company called Harmonia used to have a UIML-to-HTML renderer, but apparently not anymore. Besides, the official UIML website (www.uiml.org) is down, and information can only be found on the OASIS committee's website (https://www.oasis-open.org/committees/download.php/28457/uiml-4.0-cd01.pdf).
Correct me if I'm wrong, but I think these are the only approaches:
XSL transformation of the UIML document into XHTML (possibly with XForms). However, I feel this approach is not the 'native' way to handle UIML (due to its vocabulary abstraction).
Implementing my own UIML-to-HTML renderer in a language of choice, with its specific vocabulary and a 'transformation engine' that outputs an HTML file in the end.
Hopefully someone has done some work or research in that direction. I would really appreciate any comments, guides, advice or experiences! :)
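To illustrate the second approach (your own renderer with a vocabulary mapping), here is a toy sketch in Python. It handles only a tiny, namespace-free subset of UIML: parts nested under `interface/structure` and `property` elements with `part-name` and `name` attributes under `style`. The `VOCABULARY` table and the part classes in it are hypothetical; a real renderer would need the full vocabulary, behavior rules and so on.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical vocabulary: maps UIML part classes to HTML tags.
VOCABULARY = {"Container": "div", "Label": "span", "TextField": "input", "Button": "button"}

SAMPLE = """<uiml>
  <interface>
    <structure>
      <part id="login" class="Container">
        <part id="userLabel" class="Label"/>
        <part id="user" class="TextField"/>
        <part id="ok" class="Button"/>
      </part>
    </structure>
    <style>
      <property part-name="userLabel" name="label">User Id</property>
      <property part-name="user" name="label">your id</property>
      <property part-name="ok" name="label">Log in</property>
    </style>
  </interface>
</uiml>"""


def render(uiml_text):
    """Render the structure tree to an HTML string using the vocabulary table."""
    root = ET.fromstring(uiml_text)
    # Collect style properties: part id -> {property name: value}.
    props = {}
    for prop in root.iter("property"):
        props.setdefault(prop.get("part-name"), {})[prop.get("name")] = prop.text or ""

    def emit(part):
        tag = VOCABULARY.get(part.get("class"), "div")  # unknown classes fall back to <div>
        pid = part.get("id")
        label = props.get(pid, {}).get("label", "")
        if tag == "input":  # void element: the label becomes a placeholder
            return f'<input id="{escape(pid)}" placeholder="{escape(label)}">'
        children = "".join(emit(child) for child in part.findall("part"))
        return f'<{tag} id="{escape(pid)}">{escape(label)}{children}</{tag}>'

    structure = root.find("interface/structure")
    return "".join(emit(part) for part in structure.findall("part"))


print(render(SAMPLE))
```

Keeping the class-to-tag mapping in a table is what preserves UIML's vocabulary abstraction: a different target (say, XForms controls) would swap the table and the `emit` rules rather than the UIML document.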
Here are some options:
PyUIML, which is using the second approach listed in the question
KUIMLRenderer, which natively parses UIML for QtWebKit browsers such as QupZilla
Android-UIML, a UIML renderer for Android
Harmonia LiquidWeb, which uses desktop UIML to render HTML
References
PyUIML
uiml.net
KUIMLRenderer
Android-UIML
Harmonia LiquidWeb
We have augmented UIML and developed a Java-based renderer that has been in use for quite some time (6-7 years). In fact, we also thought of providing some sort of "Forms" support, and we plan to do it by adding a special class "Form" with the semantics generally expected of forms:
<part id = "Login Form" class = "FORM" ...>
<part id = "User Id" class = "Text Field" ..>
....

Is a browser obliged to use a DOM to render an HTML page?

I was reading the page about the Document Object Model on Wikipedia.
One sentence caught my interest; it says:
A Web browser is not obliged to use DOM in order to render an HTML document.
You can find the entire context on the page right here.
I don't understand: is there any alternative way to render an HTML document? What exactly does this sentence mean?
Strictly speaking, IE (at least before IE9) does not use a DOM to render an HTML document. It uses its own internal object model (which is not always a pure tree structure).
The DOM is an API, and IE maps the API methods and properties onto actions on its internal model. Since the DOM assumes a tree structure, the mapping is not always perfect, which accounts for a number of oddities when accessing the document via the DOM in IE.
The primary job of a browser is to display HTML. Most browsers use a DOM; they parse the HTML, create a DOM structure from it (which can also be used in JavaScript) and render the page based on that DOM.
But if a browser chooses not to, it is free to do so. I wouldn't know why, and I certainly don't understand why this line is explicitly mentioned in the Wikipedia article.
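To illustrate that rendering doesn't strictly require a tree, here is a toy Python renderer that turns HTML into plain text directly from the events of the standard library's `html.parser`, without ever building a DOM. It is only a sketch: the set of block-level tags is an arbitrary subset, and real browsers do far more (and, as said above, most of them do build a DOM).

```python
from html.parser import HTMLParser


class StreamingTextRenderer(HTMLParser):
    """Renders HTML to plain text straight from parser events; no tree is built."""

    BLOCK = {"p", "div", "h1", "h2", "li", "br"}  # arbitrary subset of block-level tags

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0  # nesting depth inside <script>/<style>, whose text is not shown

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.BLOCK:
            self.out.append("\n")  # start block content on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.out.append(data)


def render_text(html_source):
    renderer = StreamingTextRenderer()
    renderer.feed(html_source)
    renderer.close()
    return "".join(renderer.out).strip()


print(render_text("<h1>Title</h1><p>Hello <b>world</b></p><script>var x=1;</script>"))
```

The trade-off is that nothing can be queried or modified after parsing, which is exactly what the DOM API (and JavaScript) needs.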