Is there a possibility to scrape an entire job offering database?


I am trying to automatically retrieve all active job listings from Deutsche Telekom from the following website: https://telekom.jobs/global-careers
At the moment, there are 1,813 job offerings.
There seems to be an API that returns JSON (https://telekom.jobs/globaljobboard_api/v3/search/), but the results are limited to 10 (SearchResultCount).
Is there any way to pass parameters to the API so that it returns a JSON file with all 1,800 current job offerings?
Thanks
I have tried to add some parameters to the URL, but I was not successful:
https://telekom.jobs/globaljobboard_api/v3/search/%7B%22JobadID%22:%22%22,%22LanguageCode%22:%222%22,%22SearchParameters%22:%7B%22FirstItem%22:1,%22CountItem%22:10000,%22Sort%22:[%7B%22Criterion%22:%22FavoriteJobIndicator%22,%22Direction%22:%22DESC%22%7D],%22MatchedObjectDescriptor%22:[%22ID%22,%22PositionTitle%22,%22ParentOrganization%22,%22ParentOrganizationName%22,%22PositionURI%22,%22PositionLocation.CountryName%22,%22PositionLocation.CountrySubDivisionName%22,%22PositionLocation.CityName%22,%22PositionLocation.Longitude%22,%22PositionLocation.Latitude%22,%22PositionBenefit.Code%22,%22PositionBenefit.Name%22,%22FavoriteJobIndicator%22,%22FavoriteJobIndicatorName%22]%7D,%22SearchCriteria%22:[%7B%22CriterionName%22:%22PositionLocation.Latitude%22,%22CriterionValue%22:[%2250.73743%22]%7D,%7B%22CriterionName%22:%22PositionLocation.Longitude%22,%22CriterionValue%22:[%227.098206800000071%22]%7D,%7B%22CriterionName%22:%22PositionLocation.Distance%22,%22CriterionValue%22:[%229.013064227023515%22]%7D,%7B%22CriterionName%22:%22PositionLocation.CountryCode%22,%22CriterionValue%22:[%22DE%22]%7D,%7B%22CriterionName%22:%22PositionLocation.AreaCode%22,%22CriterionValue%22:[%22DE%22]%7D]
My expected results are a JSON file of the whole database, not only the ten most recent entries.

You need to send a POST request with the body {"SearchParameters":{"CountItem":1813}}.
Here is an example using curl:
curl 'https://telekom.jobs/globaljobboard_api/v3/search/' -H 'Accept: application/json' --compressed -H 'Content-Type: application/json' --data '{"SearchParameters":{"CountItem":1813}}'
Note that this is specific to this API; it will not necessarily work for others.
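Because the number of listings changes over time, a paging loop avoids hardcoding 1813. The sketch below assumes, without confirmation from the answer, that the API honours FirstItem (1-based, as in the question's URL) together with CountItem; the function names and the default page size of 500 are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the search API accepts FirstItem (1-based) alongside
# CountItem, as the URL in the question suggests. Function names are hypothetical.

# Print the JSON request body for one page of results.
telekom_payload() {
  local first=$1 count=$2
  printf '{"SearchParameters":{"FirstItem":%d,"CountItem":%d}}' "$first" "$count"
}

# Print the 1-based start index of every page needed to cover TOTAL items.
page_starts() {
  local total=$1 size=$2 first=1
  while [ "$first" -le "$total" ]; do
    printf '%d\n' "$first"
    first=$((first + size))
  done
}

# POST one request per page and save each response as page_<start>.json.
fetch_all_pages() {
  local total=$1 size=${2:-500} first
  for first in $(page_starts "$total" "$size"); do
    curl -s 'https://telekom.jobs/globaljobboard_api/v3/search/' \
      -H 'Accept: application/json' \
      -H 'Content-Type: application/json' \
      --compressed \
      --data "$(telekom_payload "$first" "$size")" \
      -o "page_${first}.json"
  done
}
```

For example, fetch_all_pages 1813 500 would make four requests, starting at items 1, 501, 1001 and 1501. Whether the server really pages this way is the assumption to check first, for instance by comparing the IDs in page_1.json and page_501.json.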

Related

How do I save a historical (past) dataset in Foundry

I want to save a version of my dataset called order_clean, but not the current version: the version I want to save is from the past. I understand this is a historical transaction that still exists within my retention policy window. How would I be able to do this?

This can be done via an API call to a specific endpoint.
This answer assumes a working knowledge of authorization tokens and curl requests:
1. Obtain the desired dataset’s RID (example in screenshot below), as well as the transaction ID of the transaction corresponding to the version of the dataset you want.
2. Create a new branch in your specific dataset by running a curl request populated as follows:
curl -X POST -H "Content-type: application/json" -H "Authorization: Bearer YOUR_AUTH_TOKEN" "STACK_URL/foundry-catalog/api/catalog/datasets/YOUR_DATASET_RID/branchesUnrestricted2/NEW_BRANCH_NAME" -d '{"parentRef": "YOUR_TRANSACTION_ID", "parentBranchId": "master"}'
Replace YOUR_AUTH_TOKEN, STACK_URL, YOUR_DATASET_RID, NEW_BRANCH_NAME and YOUR_TRANSACTION_ID with appropriate values. A short-lived authentication token can be generated by following these instructions.
3. Save the branch into a new dataset with a transform in a Code Repo, specifying the Input with the path to the dataset as well as the new branch’s name from step 2.
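The branch-creation call can be wrapped in a small shell function so that the placeholders become arguments. This is a sketch, not official Foundry tooling: the function names and the FOUNDRY_TOKEN and STACK_URL environment variables are assumptions for illustration, while the endpoint path and JSON body are copied from the curl command in the answer.

```shell
#!/usr/bin/env bash
# Sketch of the branch-creation step. FOUNDRY_TOKEN and STACK_URL are
# placeholder environment variables chosen for this example.

# Print the JSON body that points the new branch at a past transaction.
foundry_branch_payload() {
  local transaction_id=$1
  printf '{"parentRef": "%s", "parentBranchId": "master"}' "$transaction_id"
}

# Create NEW_BRANCH on DATASET_RID, starting from TRANSACTION_ID.
create_foundry_branch() {
  local dataset_rid=$1 new_branch=$2 transaction_id=$3
  # Abort with a message if either variable is unset or empty.
  : "${FOUNDRY_TOKEN:?set FOUNDRY_TOKEN to a valid auth token}"
  : "${STACK_URL:?set STACK_URL to your Foundry stack URL}"
  curl -X POST \
    -H "Content-type: application/json" \
    -H "Authorization: Bearer ${FOUNDRY_TOKEN}" \
    "${STACK_URL}/foundry-catalog/api/catalog/datasets/${dataset_rid}/branchesUnrestricted2/${new_branch}" \
    -d "$(foundry_branch_payload "$transaction_id")"
}
```

Usage: create_foundry_branch YOUR_DATASET_RID NEW_BRANCH_NAME YOUR_TRANSACTION_ID, after exporting the two variables.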

Get JSON data from Parse.com database using HTTP request

I am new to Parse.com and have some data stored in my database class (musical_data).
Is it possible to send a GET request in my browser and have it return JSON to me? I have tried to retrieve the data, but can't figure out how to display the database class data as JSON in my browser.
Thanks
For example, this curl command works, but I want to type it into the browser's URL bar:
curl -X GET \
-H "X-Parse-Application-Id: wfwefwefwef" \
-H "X-Parse-REST-API-Key: wefwefwe123" \
https://api.parse.com/1/classes/musical_data

This format should work in the browser:
https://APPID:javascript-key=JSKEY@api.parse.com/1/classes/musical_data
Replace APPID with your application ID and JSKEY with your JavaScript key.
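In that URL, APPID sits in the user-name position and javascript-key=JSKEY in the password position, and browsers send user:password@host credentials as HTTP basic authentication. Assuming the server accepts them that way, the same request can be made from the command line with curl's -u option. The sketch below builds both forms; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the browser URL embeds credentials as user:password@host, which the
# browser sends as HTTP basic authentication. Function names are hypothetical.

# Print the browser-friendly URL for a Parse class.
parse_browser_url() {
  local app_id=$1 js_key=$2 class_name=$3
  printf 'https://%s:javascript-key=%s@api.parse.com/1/classes/%s' \
    "$app_id" "$js_key" "$class_name"
}

# Make the same request with curl, passing the credentials via -u instead.
parse_get_class() {
  local app_id=$1 js_key=$2 class_name=$3
  curl -s -u "${app_id}:javascript-key=${js_key}" \
    "https://api.parse.com/1/classes/${class_name}"
}
```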

trying to retrieve one object based on conditions values using Android REST API

I’m using Parse.com and trying to retrieve one object based on condition values using the Android REST API.
Here is a snippet from the Parse documentation for the REST API:
curl -X GET \
-H "X-Parse-Application-Id: MYAPPID" \
-H "X-Parse-REST-API-Key: MYRESTKEY" \
-G \
--data-urlencode 'where={"$relatedTo":{"object":{"__type":"Pointer","className":"Post","objectId":"8TOXdXf3tz"},"key":"likes"}}' \
https://api.parse.com/1/users
How can I achieve this in android?

The documentation shows the request as a curl command, so you have to pass all of its parameters explicitly.
I recommend first testing the API with a client application such as Postman.
-X sets the HTTP method, such as GET, POST, PUT or DELETE.
-H adds a header, written as a key and a value separated by a ":" sign.
-G makes curl send the data given with -d or --data-urlencode in an HTTP GET request instead of the POST request it would otherwise use. The data is appended to the URL after a '?' separator.
Pass all these parameters and test the request; once it works, implement it in your application.
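To see exactly what the request looks like, it can help to build the final URL by hand: --data-urlencode 'where=...' percent-encodes the part after the first "=", and -G appends the result to the URL after a '?'. The pure-Bash sketch below mimics that for ASCII input; the function names are hypothetical, and on Android you would do the same encoding with your HTTP library's URL encoder rather than with this code.

```shell
#!/usr/bin/env bash
# Sketch (Bash, ASCII input only): mimics what
#   curl -G --data-urlencode 'name=value' URL
# requests. Function names are hypothetical.

# Percent-encode every character except A-Z a-z 0-9 - . _ ~ (unreserved set).
urlencode() {
  local s=$1 out='' c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      # "'$c" makes printf use the character's numeric code.
      *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;
    esac
  done
  printf '%s' "$out"
}

# Build the GET URL: -G appends the encoded data after a '?' separator.
build_get_url() {
  local url=$1 name=$2 value=$3
  printf '%s?%s=%s' "$url" "$name" "$(urlencode "$value")"
}
```

Passing https://api.parse.com/1/users, where, and the JSON condition from the question prints the URL that the curl command requests; an Android client would request that URL with the same two X-Parse headers.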

How to upload multiple documents with multiple JSON files to Cloudant DB via cURL?

Currently I am able to PUT a single JSON file to a document in Cloudant using this:
curl -X PUT 'https://username.cloudant.com/dummydb/doc3' -H "Content-Type: application/json" -d @numbers.json
I have many JSON files to be uploaded as different documents in the same DB. How can it be done?

So you definitely want to use Cloudant's _bulk_docs API endpoint in this scenario. It's more efficient (and cost-effective) if you're doing a bunch of writes. You POST a single JSON object whose "docs" field is an array containing all your JSON docs. Here's the documentation on it: https://docs.cloudant.com/document.html#bulk-operations
Going one step further, so long as you've structured your JSON file properly (as a {"docs": [...]} object), you can just upload the file to _bulk_docs. In cURL, that would look something like this: curl -X POST -d @file.json <domain>/db/_bulk_docs ... (plus the content type and all that other verbose stuff).
One step up from that would be using the ccurl (CouchDB/Cloudant cURL) tool that wraps your cURL statements to Cloudant and makes them less verbose. See https://developer.ibm.com/clouddataservices/2015/10/19/command-line-tools-for-cloudant-and-couchdb/ from https://stackoverflow.com/users/4264864/glynn-bird for more.
Happy Couching!
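If the documents live in separate files, as in the question, they first have to be combined into that {"docs": [...]} body. A minimal sketch, assuming each file holds exactly one JSON object; the function names are hypothetical, and --data-binary @- makes curl read the body from standard input unchanged.

```shell
#!/usr/bin/env bash
# Sketch: wrap several single-document JSON files into one _bulk_docs body.
# Assumes each file holds exactly one JSON object. Function names are hypothetical.

# Print {"docs":[<file1>,<file2>,...]} built from the given files.
build_bulk_body() {
  local first=1 f
  printf '{"docs":['
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      first=0
    else
      printf ','
    fi
    cat "$f"
  done
  printf ']}'
}

# POST the combined body to DB_URL/_bulk_docs, reading it from stdin.
post_bulk() {
  local db_url=$1
  shift
  build_bulk_body "$@" |
    curl -s -X POST "${db_url}/_bulk_docs" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}
```

For example: post_bulk 'https://username:password@username.cloudant.com/dummydb' ./docs/*.json. Keeping the input files in their own directory stops the glob from picking up unrelated JSON files.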

You can write a for loop that creates a document from each JSON file.
For example, in the command below I have 4 JSON files in my directory and I create 4 documents in my people database:
for file in *.json
do
  curl -d @"$file" https://username:password@myinstance.cloudant.com/people/ -H "Content-Type:application/json"
done
{"ok":true,"id":"763a28122dad6c96572e585d56c28ebd","rev":"1-08814eea6977b2e5f2afb9960d50862d"}
{"ok":true,"id":"763a28122dad6c96572e585d56c292da","rev":"1-5965ef49d3a7650c5d0013981c90c129"}
{"ok":true,"id":"763a28122dad6c96572e585d56c2b49c","rev":"1-fcb732999a4d99ab9dc5462593068bed"}
{"ok":true,"id":"e944282beaedf14418fb111b0ac1f537","rev":"1-b20bcc6cddcc8007ef1cfb8867c2de81"}

How to send headers from YQL in order to return JSON format when querying opendatabc API?

I'm wondering if there is a way to send headers from YQL (or the YQL console) like there is in cURL.
I would like to return JSON by specifying the header Accept: application/json.
I am able to return JSON in with cURL and the command line like this:
curl -H 'Accept: application/json' 'http://www.opendatabc.ca/data?=births'
but I can't figure out how to set the header when sending YQL.

You can do this with YQL Open Data Tables.
Here is a simple demonstration.
You can find the gist for the example Open Data Table XML file here: https://gist.github.com/2042904 (Check out the documentation here.)
You will notice in my example XML that I'm using y.xmlToJson on the response object received from the get() request. This is because YQL converts JSON taken from web services into E4X. More about that in a question of mine.

When querying YQL through its API, use the format parameter instead of headers: either format=json or format=xml.
JSON example:
curl -G --data-urlencode 'q=SELECT * FROM html WHERE url = "example.com"' 'http://query.yahooapis.com/v1/public/yql?format=json'
XML example:
curl -G --data-urlencode 'q=SELECT * FROM html WHERE url = "example.com"' 'http://query.yahooapis.com/v1/public/yql?format=xml'

use "jq", if have not installed it yet, run this command first
sudo apt-get install jq
then you can curl your url like this
curl -H 'Accept: application/json' http://www.opendatabc.ca/data?=births | jq '.'
