I would like to retrieve metadata beyond the technical metadata associated with an image (so not the EXIF data, size, versions, etc.), by which I mean the "non-technical" metadata, e.g. image title (not filename), description, artist/creator (not uploader), creation date (of the work, not the file), etc.
For example, the items listed under "Summary" on this File page.
Is that information accessible via the API or is it considered human created page content?
This information is available on wikis using the CommonsMetadata extension, provided users took care to fill in the templates it reads. Commons should have this information for most images; see the extension page for API examples.
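For instance, a query of the following shape returns the extended metadata (description, artist, creation date, license, etc.) as the extmetadata property of imageinfo; File:Example.jpg is just a placeholder title:
https://commons.wikimedia.org/w/api.php?action=query&titles=File:Example.jpg&prop=imageinfo&iiprop=extmetadata&format=json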
Related
I wish to determine whether a given Wikipedia page belongs to a certain Wikipedia Portal using the MediaWiki API. So far, I have been experimenting with the page properties of the API but I cannot seem to find a way to derive what Portal a given page belongs to.
As an example, on the Wikipedia page for Cake, at the very bottom of the page, I can press Show on the section Cakes, and a bunch of links to different cake pages show up. There I can also see that all of these belong to the Food portal. It is that information that I wish to extract from a given page using the MediaWiki API.
As far as I know, there is actually no formal definition of "belonging to a portal" in Wikipedia. Unlike categories, which are part of the MediaWiki software, portals are custom pages on Wikipedia aimed at making it easier to explore a topic.
Instead of a formal definition, though, you can use a heuristic and determine the connection between the page and some portal based on one of them linking to the other. There are API endpoints for both:
(Note: 100 is the id of the `Portal` namespace)
Which portal pages are linked from the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&titles=Cake%7CPizza&plnamespace=100
Which portal pages link to the page "Cake" or "Pizza"
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=linkshere&titles=Cake%7CPizza&lhnamespace=100
(though as you can see, many unrelated portals link to "Cake" and none link to "Pizza")
A combined query for both directions
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links%7Clinkshere&titles=Cake%7CPizza&plnamespace=100&lhnamespace=100
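If you want to script this, here is a minimal sketch, assuming a browser or Node 18+ environment where fetch is available (origin=* enables anonymous cross-origin requests, and the limit parameters raise the default cap of 10 results):

    // Sketch: collect portal pages linked from and linking to an article.
    async function portalsFor(title) {
      const url = 'https://en.wikipedia.org/w/api.php' +
        '?action=query&format=json&origin=*' +
        '&prop=links%7Clinkshere&plnamespace=100&lhnamespace=100' +
        '&pllimit=max&lhlimit=max' +
        '&titles=' + encodeURIComponent(title);
      const data = await (await fetch(url)).json();
      const page = Object.values(data.query.pages)[0];
      return {
        linksTo: (page.links || []).map(l => l.title),      // portals the page links to
        linkedBy: (page.linkshere || []).map(l => l.title), // portals linking to the page
      };
    }

    portalsFor('Cake').then(console.log);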
So through some more investigation I found the answer:
I ended up using the Revisions property in the API. This allows me to give a series of page titles that I want to investigate, and have the content of each page returned to me in JSON format (with rvprop=content this is the page's wikitext rather than HTML). Then I can just search for lines containing Portal and figure out what portal (if any) the page belongs to.
If anyone is in a similar situation, here is an example query to the API:
https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Bread|Bubble_tea|Pizza&format=json&redirects&rvprop=content&rvslots=main
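For completeness, a sketch of the search step; this assumes an environment with fetch and top-level await (e.g. an ES module on Node 18+), and relies on portal references appearing in the wikitext as {{Portal|Food}} templates or [[Portal:Food]] links:

    // Sketch: fetch each page's wikitext and scan it for portal references.
    const url = 'https://en.wikipedia.org/w/api.php' +
      '?action=query&format=json&origin=*&redirects' +
      '&prop=revisions&rvprop=content&rvslots=main' +
      '&titles=Bread%7CBubble_tea%7CPizza';

    const data = await (await fetch(url)).json();
    for (const page of Object.values(data.query.pages)) {
      const wikitext = page.revisions[0].slots.main['*'];
      // Match both {{Portal|Food}} templates and [[Portal:Food]] links.
      const portals = wikitext.match(/\{\{Portal\|[^}]*\}\}|\[\[Portal:[^\]|]*/gi) || [];
      console.log(page.title, portals);
    }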
I have a webapp that lets users place dots on a sitemap and link them to images.
The web app uses Javascript, CSS, and HTML.
Phase 1
While the user is subscribed, he uses a rich set of functionalities to:
add dots on the sitemap and link them to images
edit the dots: move, delete, link multiple images, etc.
This is done via the website that hosts the webapp.
Phase 2
When the user ends the subscription, he gets a .zip file with the information that he created (sitemap, images, links between the sitemap and the images, etc..).
The user can then connect to the website that hosts the webapp without signing in, and gets a subset of the functionalities (e.g. he can only click on the dots and see the linked images, but he can no longer edit the dots or add images).
I want to change phase2.
Instead of interacting with the webapp on the website, I want to "freeze" the webapp into an interactive PDF or an .h5p page that can be played independently, without the webapp.
There are multiple reasons motivating this:
the webapp is complex, so engaging with it is prone to more errors.
If the small subset of functionality needed for the final data, which boils down to showing the image when clicking on the hyperlink, can be done via H5P browsing, then the risk of runtime errors is greatly reduced.
the interactive PDF or .h5p file can be browsed with a variety of tools, potentially even offline.
the end product can be redesigned to appear simpler.
My questions:
is it possible to programmatically convert the JavaScript, CSS, and HTML content into an interactive PDF or .h5p page?
Every end product will be different (e.g. by the number of dots and their locations on the sitemap), so having to manually create the .h5p page every time is not practical.
are there mobile apps (e.g. on Apple Store, or Google Play) that can read .h5p content locally, e.g. when the device is offline?
Thanks
EDIT:
Oliver Tacke, thank you for replying.
Up to a few days ago, while looking for a solution to my problem, I had not heard about H5P at all.
When looking into H5P, I see that:
many comments related to H5P are a bit old, from ~5-6 years ago
H5P is frequently discussed in the context of education (e.g. Moodle)
when I filed the question I could not even find a tag for 'h5p'
I could not find forums for h5p in mainstream channels like Discourse or Slack
So I want to know if I'm in the right direction at all.
Is H5P a new thing that just takes time to pick up, or is it something that started a while ago and dwindled down?
Or maybe I'm wrong and it is currently more active than I think (I'm aware of h5p.org and I do see activity there).
Basically, I want to create interactive content that can work
ideally offline, or
online but with a mainstream browser/tool/website (i.e. without needing my special website)
In the design industry, I know there are interactive catalogues.
But I don't know if the user can download them and somehow (e.g. with an EPUB reader) read them.
Thanks
I don't know anything about creating PDFs programmatically, so I can only offer a partial answer for the H5P related part. Given the broad scope of your question, this may be acceptable as a comment.
H5P content follows a specification that is documented at https://h5p.org/documentation/developers/h5p-specification.
You would basically have to implement an H5P content type library (file) from the files that you are given by the service. I assume that the JavaScript and CSS files are always the same; those could then be reused directly (but potentially not legally). You would also have to add some more JavaScript that takes parameters and generates the HTML output that you get from the service. You would then have to model semantics.json to suit the parameters, and at that point you essentially have an H5P content type.

You don't have to use the then available form-based editor (which probably wouldn't make sense), but you could create the content.json file programmatically and put it into the H5P content file archive. To create that file programmatically, you'd have to create a converter that identifies the parameters in the HTML file generated by that service and transforms them into the H5P semantics/content format; see the sketch below.

I'm not sure if it would make more sense to rather create an editor widget for H5P, so you wouldn't have to depend on the other service at all.
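As a rough illustration of that converter's output step, content.json could be assembled along these lines; the parameter names and structure here are entirely hypothetical and would have to mirror whatever you define in semantics.json:

    // Sketch (Node): build content.json for a hypothetical "dots on a
    // sitemap" content type. The dot data would come from the converter
    // that parses the service's exported files.
    const fs = require('fs');

    const dots = [
      { x: 12.5, y: 40.0, image: 'images/kitchen.jpg' },
      { x: 63.0, y: 22.5, image: 'images/roof.jpg' },
    ];

    const content = {
      sitemap: { path: 'images/sitemap.png' },
      dots: dots.map(d => ({
        position: { x: d.x, y: d.y },  // percent coordinates on the sitemap
        linkedImage: { path: d.image },
      })),
    };

    fs.writeFileSync('content/content.json', JSON.stringify(content, null, 2));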
There are currently no known mobile apps that allow you to load and run H5P content. They are on the roadmap of the H5P core team, but I wouldn't expect them to work on those any time soon. There's the Moodle app for the Moodle LMS that allows using H5P content offline, but the content needs to be fetched from a Moodle instance. There's Lumi, which allows running H5P content locally on Windows, macOS and Linux, but not on Android or iOS. However, Lumi also allows creating single standalone HTML files from H5P content containing all the content and logic ready to play, so that would allow offline use on Android and iOS.
Okay, that title might sound confusing, but it's hard to describe in a title.
Basically, when you display an image online you typically reference it by URL, as the image must be hosted somewhere. So for example, this image's URL is:
https://www.gettyimages.ie/gi-resources/images/Homepage/Hero/UK/CMS_Creative_164657191_Kingfisher.jpg
So what I assume is happening is that the link tells the webpage where to find the data, and the hosting server then replies with the image.
My question is
If a web page can interpret a link with information such as
www.example.com/__?id=01&user=ExampleName&email=exampleEmail
and use that information (id, user, and email) to generate an image, could it return the data without actually hosting the image? As in, it's just receiving a request and replying?
The goal of this is to have a page with an image that is a QR code generated by an external webapp.
Yes.
Implementation details would of course differ depending on the chosen server-side language, but in the end you're going to send a response which contains the generated image data (in your case the QR code) and the appropriate headers, so that the browser (or whoever requested the resource) can properly interpret the response.
An example of what you're asking for would be placeholder.com, which responds with an image depending on certain parameters you provide in the URL.
You can also check for example the requests being sent on https://www.qr-code-generator.com (and probably also other QR generator sites) and see that the codes there are generated similarly to your idea.
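As a minimal sketch of such an endpoint, assuming Node with the express and qrcode npm packages (any server-side stack with a QR library follows the same pattern: generate the image bytes, set the Content-Type header, send the response):

    // Sketch: respond to an image request with a QR code generated on the fly.
    // Assumes `npm install express qrcode`.
    const express = require('express');
    const QRCode = require('qrcode');
    const app = express();

    app.get('/qr.png', async (req, res) => {
      const { id, user, email } = req.query;       // e.g. /qr.png?id=01&user=ExampleName
      const payload = JSON.stringify({ id, user, email });
      const png = await QRCode.toBuffer(payload);  // PNG bytes, never written to disk
      res.set('Content-Type', 'image/png');        // lets the browser treat it as an image
      res.send(png);
    });

    app.listen(3000);

An <img> tag pointing at that route (e.g. src="/qr.png?id=01&user=ExampleName") then renders the code without any image file ever being hosted.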
This question is perhaps too philosophical for SO but I wanted to give it a try.
We are using a tool called Google Tag Manager to track website events. GTM listens for clicks, and you can filter based on e.g. element class or id.
Consider this scenario: we have several PDFs on a particular page and want to know how popular they are. The PDFs can either be downloaded or viewed online. The PDFs each have a name. So I need to get the following data:
whether the PDF was downloaded or viewed online
the name of the pdf
For the name we can presumably use the name or title attribute. But whether it was viewed online or downloaded is more custom.
I am currently going to ask our developers to edit the html in such a way that I can configure GTM to listen for these clicks and pass the data back into our web reporting tool.
I can tell them to add a class to all of them, e.g. class="pdf_clicks". What would be a conventional means of adding the other two data points? A custom attribute? I know I can use title for the name. What about whether or not it was viewed online or downloaded? Is there a generic metadata-type attribute for this?
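For example, something along these lines is what I imagine asking them for; the data-* attribute names below are invented, but any attribute of that form is valid HTML and can be read from GTM's click-element variables:

    <!-- Hypothetical markup: data-* attributes carry the custom metadata. -->
    <a href="/docs/annual-report.pdf" class="pdf_clicks"
       title="Annual Report" data-pdf-name="Annual Report"
       data-pdf-action="download">Download</a>

    <a href="/viewer?file=annual-report.pdf" class="pdf_clicks"
       title="Annual Report" data-pdf-name="Annual Report"
       data-pdf-action="view">View online</a>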
Is it possible to make JSON data readable by a Google spider?
Say for instance that I have a JSON feed that contains the data for an e-commerce site. This JSON data is used to populate a human-readable page in the user's browser. (I.e. the translation from JSON data to the displayed page happens inside the user's browser; not my choice, just what I've been given to work with, it's an old legacy CGI application and not an actual server-side scripting language.)
My concern here is that the Google spiders will not be able to pick up/directly link to the item in question when a user clicks on it in Google, the user instead being presented with an index page full of all the items rather than being linked directly to the item they clicked on.
Is there any way of "informing" the Google spider in the JSON that it should feed the user a different link?
While Google does crawl and index JavaScript in some circumstances, it's still best to serve "normal" (X)HTML content if at all possible. In this case, it would help to know the rest of the site's setup, in particular: is the JSON content just used to create a feed of links to the product pages (with static content), or are all product pages also generated by JSON feeds?

If the feed is only used to point to the actual product pages (which are static), then one way to make the product pages discoverable could be to create an HTML sitemap page or some other alternate form of navigation. An XML Sitemap file can also help, but I would recommend not using it as the sole way of making the product pages discoverable.
If all of the content is only accessible through JSON feeds, then I think you will have to make some bigger changes if you want that content to be accessible through search results.
One way to handle it could also be to use the new JavaScript crawling/indexing proposal, which would basically result in a headless browser being set up between your site and Google: http://code.google.com/web/ajaxcrawling/ (whether setting this up or revamping the rest of the site is easier is hard to say :-))
You should make a wrapper page in server-side code around the JSON data, and respond to requests with either the wrapper or the regular version depending on the User-Agent.
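A rough sketch of that idea, assuming a Node/Express layer in front of the legacy CGI app; the bot regex and the lookup/render helpers below are placeholders for your real implementation:

    // Sketch: crawlers get a server-rendered HTML wrapper, everyone else
    // gets the regular JSON-driven page. Assumes `npm install express`.
    const express = require('express');
    const app = express();

    const BOTS = /googlebot|bingbot|slurp/i;

    // Hypothetical stand-ins for your real data access and templating:
    function lookupItem(id) { return { id, name: 'Example item ' + id }; }
    function renderHtml(item) {
      return '<!DOCTYPE html><html><head><title>' + item.name +
             '</title></head><body><h1>' + item.name + '</h1></body></html>';
    }

    app.get('/item/:id', (req, res) => {
      const item = lookupItem(req.params.id);
      if (BOTS.test(req.get('User-Agent') || '')) {
        res.send(renderHtml(item));                      // crawler: static wrapper
      } else {
        res.sendFile('index.html', { root: __dirname }); // regular client-side app
      }
    });

    app.listen(3000);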