How do people create unofficial api's (which provide json info) for big websites such as imdb? - json

Such as this api here: http://www.omdbapi.com/
Do they just parse the whole website's HTML and save fields on their own database?
What is a good design, programming-wise?
My simple java developer mentality says this:
1 - Use jsoup (or any other html parsing library) and save the data frequently.
2 - Create restful services that return json, such as "searchByMovieName()", "searchByActor"
3 - Make the services public
Is it simple as that?

It can be, yes.
You can also fetch the pages and scrape the data in real-time (as people call your API). It'll be a little slower but there's less overhead for you and you don't need to worry about stale data.

IMDB also offers files you can use directly: Alternative Interfaces

Related

is there a way to send data to prismic.io?

I need to push an object to prismic.io. For example, I want to send a simple object like a string to the CMS.
I don't see any hints in the documentation that can help me achieve this. I'm using angular 7.
I already made my custom type, I can get the data I created with the CMS with prismatic predicates but can't send any data.
(full disclosure I'm working at Prismic ;) ) So the answer is: you can import data into Prismic. But it's not through a write API.
We want to introduce this at some point, but we also believe we can bring better solutions then just a write API for specific cases (like importing legacy content, connecting Prismic with an external data base, or with a translation service, etc.)
So our plan is to solve the most common use-cases of a "write API" in a more adapted/cleaner way, and then provide a write API afterwards.
Two solutions were already implemented:
1/ If you're looking to migrate existing content from another system into Prismic, you should check out our Import/Export feature: https://intercom.help/prismicio/import-export
2/ If you have content that lives in another system or in a custom API, it is possible to integrate this with Prismic using our Integration Fields feature: https://intercom.help/prismicio/integration-fields
3/ If what you want to do is not covered by these two solutions please let me know, that would be super helpful!
no you can not do a write api. Its a read system, the only way to input is via their writing room or import (which is in writing room as well)

What is the difference between MessagePack, Protobuf and JSON ? Can anyone tell me which one to use when

I need to understand the difference between
- message pack
- protocol buffers
- JSON
Without having jumped in deeply into the matter I'd say the following:
All three are data formats that help you serialize information in a structured form so you can easily exchange it between software components (for example client and server).
While I'm not too familiar with the other two, JSON is currently a quasi-standard due to the fact that it is practically built into JavaScript - it's not a coincidence it is called JavaScript Object Notation. The other two seem to require additional libraries on both ends to create the required format.
So when to use which? Use JSON for REST services, for example if you want to publish your API or need different clients to access it. JSON seems to have the broadest acceptance.

Can CouchDB natively convert plain text to json format?

I'm aware that there are python and powershell methods to convert plain text files, csv's etc.... into json format for upload into NoSQL DBs such as CouchDB.
But according to the CouchDB definitive guide, it makes it seems like there is a native built in way of doing this kind of conversion, without the need for a 3rd party tool.
This older thread appears to hint at this:
Filter and update functions in CouchDB?
This part in particular:
There are other design document functions that are being introduced at the >time of this writing, including _update and _filter that we aren’t covering in >depth here. Filter functions are covered in Chapter 20, Change Notifications. >Imagine a web service that POSTs an XML blob at a URL of your choosing when >particular events occur. PayPal’s instant payment notification is one of >these. With an _update handler, you can POST these directly in CouchDB and it >can parse the XML into a JSON document and save it. The same goes for CSV, >multi-part form, or any other format.
But when I dig deeper I don't find anything concrete.
The supporting wiki link is not clear to me (a beginner with json/NoSQL/curl stuff: http://wiki.apache.org/couchdb/Document_Update_Handlers
Hopefully this is a simple yes/no. And any links to help on this that is better than the above link also appreciated, thank you!
CouchDB supports transforming the internal documents/views into many other formats through the use of show and list functions. It's not a "native" transformation, as you define the transformation yourself, it's not magical.
That being said, there is not a similar mechanism for the reverse (ie: converting some arbitrary format into JSON documents) but you're much better off scripting that with a full-featured language/script and using the bulk docs API to do your imports in batch.

What is the most efficient way to send OData payloads over the wire? "Dense JSON?"

I'm designing a distributed application that will consist of a variety of REST services. Lately I've been going back and forth about whether to implement my REST services using the ASP.NET MVC 4 Web API or OData. Web API seems like it will some day be what I need but right now it's only half baked. Specifically, it only has a partial implementation of OData-style URI querying and doesn't do hypermedia out-of-the-box.
So this forces me to take another long hard look at OData. I really like the URI querying capability and structural hypermedia for lazy loading; I think I will use these features a lot in my application. However, the Atom Pub specification appears to be grossly inefficient.
I recently read a post about an efficient format for OData which mentions "dense JSON" but such a thing does not appear to actually exist. Is this true? And even if there's no such thing as dense JSON, regular JSON is still much more efficient than Atom Pub, correct?
Is there any situation where I would want to use Atom Pub over JSON?
There should be very little difference between ATOM and JSON on the semantic level with OData. Also most OData servers (WCF Data Services for sure) support both, so it's a choice of the client which one to use. As the blog post from Pablo mentions, to get the best payload size you should enable HTTP compression. It works great on both ATOM and JSON.
Reading JSON tends to be faster (XML parsing is kind of expensive), but that's if you're concerned with CPU consumption on the client. If I remember correctly, last time I saw the numbers, the compressed payload size for ATOM and JSON is not that different.
ATOM PUB is usually easier to consume in client which has available good XML or ATOM libraries and not JSON. And vice versa. But other than that, there should not be much of a difference.

Why use XML(SOAP) when JSON so simple and easy to handle?

Receiving and sending data with JSON is done with simple HTTP requests. Whereas in SOAP, we need to take care of a lot of things. Parsing XML is also, sometimes, hard. Even Facebook uses JSON in Graph API. I still wonder why one should still use SOAP? Is there any reason or area where SOAP is still a better option? (Despite the data format)
Also, in simple client-server apps (like Mobile apps connected with a server), can SOAP give any advantage over JSON?
I will be very thankful if someone can enlist the major/prominent differences between JSON and SOAP considering the information I have provided(If there are any).
I found the following on advantages of SOAP:
There is one big reason everyone sticks with SOAP instead of using JSON. With every JSON setup, you're always coming up with your own data structure for each project. I don't mean how the data is encoded and passed, but how the data formatted format is defined, the data model.
SOAP has an industry-mature way of specifying that data will be in a certain format: e.g. "Cart is a collection of Products and each Product can have these attributes, etc." A well put together WSDL document really has this nailed. See W3C specification: Web Services Description Language
JSON has similar ways of specifying this data structure — a JavaScript class comes to mind as the most common way of doing this — but a JavaScript class isn't really a data structure used for this purpose in any kind of agnostic, well established, widely used way.
In short, SOAP has a way of specifying the data structure in a maturely formatted document (WSDL). JSON doesn't have a standard way of doing this.
If you are creating a client application and your server implementation is done with SOAP then you have to use SOAP in client side.
Also, see: Why use SOAP over JSON and custom data format in an “ENTERPRISE” application? [closed]
Nowadays SOAP is a complete overkill, IMHO. It was nice to use it, nice to learn it, and it is beautiful we can use JSON now.
The only difference between SOAP and REST services (no matter whether using JSON) is that SOAP WS always has it's own WSDL document that could be easily transformed into a self-descriptive documentation while within REST you have to write the documentation for yourself (at least to document the data structures). Here are my cons'&'pros for both:
REST
Pros
lightweight (in all means: no server- nor client-side extensions needed, no big chunks of XML are needed to be transfered here and there)
free choice of the data format - it's up on you to decide whether you can use plain TXT, JSON, XML, or even create you own format of data
most of the current data formats (and even if used XML) ensures that only the really required amount of data is transfered over HTTP while with SOAP for 5 bytes of data you need 1 kB of XML junk (exaggerated, ofc, but you got the point)
Cons
even there are tools that could generate the documentation from docblock comments there is need to write such comments in very descriptive way if one wants to achieve a good documentation as well
SOAP
Pros
has a WSDL that could be generated from even basic docblock comments (in many languages even without them) that works well as a documentation
even there are tools that could work with WSDL to give an enhanced try this request interface (while I do not know about any such tool for REST)
strict data structure
Cons
strict data structure
uses an XML (only!) for data transfers while each request contains a lot of junk and the response contains five times more junk of information
the need for external libraries (for client and/or server, though nowadays there are such libraries already a native part of many languages yet people always tend to use some third-party ones)
To conclude, I do not see a big reason to prefer SOAP over REST (and JSON). Both can do the same, there is a native support for JSON encoding and decoding in almost every popular web programming language and with JSON you have more freedom and the HTTP transfers are cleansed from lot of useless information junk. If I were to build any API now I would use REST with JSON.
I disagree a bit on the trend of JSON I see here. Although JSON is an order maginitude easier, I'd venture to say it's quite limited. For example, SOAP WS is not the last thing. Indeed, between soap client/server you now have enterprise services bus, authentification scheme based on crypto, user management, timestamping requests/replies, etc. For all of this, there're some huge software platforms that provide services around SOAP (well, "web services") and will inject stuff in your XML. So although JSON is probably enough for small projects and an order of magnitude easier there, I think it becomes quite limited if you have decoupled transmission control and content (ie. you develop the content stuff, the actual server, but all the transmission is managed by another team, the authentification by one more team, deployment by yet another team). I don't know if my experience at a big corp is relevant, but I'd say that JSON won't survive there. There are too many constraints on top of the basic need of data representation. So the problem is not JSON RPC itself, the problem is it misses the additional tools to manage the complexity that arises in complex applications (not to say that what you do is not complex, it's just that the software reflects the complexity of the company that produces it)
I think there is a lot of basic misinformation on this thread. SOAP, REST, XML, and JSON concepts seem to be mixed up in the responses.
Here is some clarification -
XML and JSON (an others) are encodings of information.
SOAP is a communications protocol
REST is an (Architecture) style
each is used for something different although you might use more than one of these things together.
Lets start with encoding data structures as XML vs JSON:
Everything JSON currently supports can be done in XML, but not the other way around. JSON will eventually adopt all the features that XML has, but its proponents haven't encountered all of the problems yet, once they get more experience things will be added on to close the gap. for example JSON didn't start out with Schemas and binary formats.
SOAP is a communication protocol for calling an operation. It runs on top of things like, HTTP, SMTP, etc. Aside from many other features, SOAP messages can span multiple "application" layer protocols. i.e. i can sent a SOAP message by HTTP to a service endpoint which then puts it on a message queue for another system. SOAP solves the problem of maintaining authentication, message authenticity, etc. as the requested moved between different parts of a distributed system.
JSON and other data formats canbe sent via SOAP. I work with some systems that sent binary fixed-width encoded objects via SOAP, its not a problem.
The analogy is that - if only the postman is allowed to send you a letter, then it is just HTTP, but if anyone can send you a letter, then you want SOAP. (i.e. message transport security vs message content security)
the 6 REST constraints are architectural style. Interestingly the first several years of REST the examples were in SOAP. (there is no such thing as REST or SOAP they are not opposites)
A "heavyweight bloated, etc.etc." SOA SOAP system might have monoliths with operations like GET, PUT, POST instances of a single entity. SOAP doesn't have those operations predefined, but that is typically how it is used.
Consider that if you built a "REST" service on HTTP alone with an SSL/TLS terminating proxy, then you may have violated the 4th constraint of REST.
So for your software development today, you wouldn't normally interact with any of these directly. Just as if you were written a graphics program you wouldn't directly work with HDMI vs. DisplayPort typically.
The question is do you understand architecturally what your system needs to do and configure it to use the mechanism that does that job. (for example, all the challenges of applying today's microservices to general systems are old problems previously solved by SOAP, CORBA and the old protocols)
I have spent several years writing SOAP web services (with JAX WS). They are not hard to write. And I love the idea of a single endpoint and single HTTP method (POST). For me, REST is too verbose.
But as a data container, JSON is simpler, smaller, more readable, more flexible, looks closer to programming languages.
So, I reinvented the wheel and created my own approach to writing backends for AJAX requests. In comparison:
REST:
get user: method GET https://example.com/users/{id}
update user: method POST https://example.com/users/ (JSON with User object in request body)
RPC:
get user: method GET https://example.com/getUser?id=1
update user: method POST https://example.com/updateUser (JSON with User object in the request body)
My way (the proposed name is JOH - JSON over HTTP):
get user: method POST https://example.com/ (JSON specifies both user ID and class/method responsible for handling request)
update user: method POST https://example.com/ (JSON specifies both user object and class/method responsible for handling request)