I got this error when serving a model on Databricks using MLflow:
Unrecognized content type parameters: format. IMPORTANT: The MLflow Model scoring protocol has changed in MLflow version 2.0. If
you are seeing this error, you are likely using an outdated scoring
request format. To resolve the error, either update your request
format or adjust your MLflow Model's requirements file to specify an
older version of MLflow (for example, change the 'mlflow' requirement
specifier to 'mlflow==1.30.0'). If you are making a request using the
MLflow client (e.g. via mlflow.pyfunc.spark_udf()), upgrade your
MLflow client to a version >= 2.0 in order to use the new request
format. For more information about the updated MLflow Model scoring
protocol in MLflow 2.0, see
https://mlflow.org/docs/latest/models.html#deploy-mlflow-models.
I'm looking for the right format to use for my JSON input. The format I am currently using looks like this example:
[
{
"input1":12,
"input2":290.0,
"input3":"red"
}
]
I don't know whether this is related to my MLflow version (I'm currently using mlflow==1.24.0). I cannot update the version because I do not have the required privileges.
I also have tried the solution suggested here and got :
TypeError:spark_udf() got an unexpected keyword argument 'env_manager'
I have not found any documentation that solves this issue so far.
Thank you for your help, in advance.
When you log the model, your MLflow version is 1.24, but when you serve it as an API in Databricks, a new environment is created for it, and that environment installs a 2.0+ version of MLflow. As the error message suggests, you can either specify the MLflow version or update the request format.
If you are using Classic Model Serving, you should specify the version; if you are using Serverless Model Serving, you should update the request format. If you must use Classic Model Serving and do not want to specify the MLflow version, scroll to the bottom.
Specify the MLflow version
When logging the model, you can specify a new Conda environment or add additional pip requirements that are used when the model is being served.
pip
# log model with mlflow==1.* specified
mlflow.<flavor>.log_model(..., extra_pip_requirements=["mlflow==1.*"])
Conda
# get default conda env
conda_env = mlflow.<flavor>.get_default_conda_env()
print(conda_env)
# specify mlflow==1.*
conda_env = {
"channels": ["conda-forge"],
"dependencies": [
"python=3.9.5",
"pip<=21.2.4",
{"pip": ["mlflow==1.*", "cloudpickle==2.0.0"]},
],
"name": "mlflow-env",
}
# log model with new conda_env
mlflow.<flavor>.log_model(..., conda_env=conda_env)
Update the request
An alternative is to update the JSON request format, but this will only work if you are using Databricks Serverless Model Serving.
In the MLflow docs link at the end of the error message, you can see all the formats. From the data you provided, I would suggest using dataframe_split or dataframe_records.
{
"dataframe_split": {
"columns": ["input1", "input2", "input3"],
"data": [[1, 2, "red"]]
}
}
{
"dataframe_records": [
{
"input1": 12,
"input2": 290,
"input3": "red"
}
]
}
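As a minimal sketch, the two payload shapes above can be built from a list of row dictionaries using only the standard library. The helper names to_dataframe_split and to_dataframe_records are hypothetical (they are not part of the MLflow API), and the sample row reuses the values from the question.

```python
import json


def to_dataframe_records(rows):
    """Wrap a list of row dicts in the 'dataframe_records' envelope."""
    return {"dataframe_records": list(rows)}


def to_dataframe_split(rows):
    """Convert row dicts to the 'dataframe_split' envelope (columns + data)."""
    rows = list(rows)
    # Assumption: every row has the same keys as the first row.
    columns = list(rows[0].keys()) if rows else []
    data = [[row[col] for col in columns] for row in rows]
    return {"dataframe_split": {"columns": columns, "data": data}}


rows = [{"input1": 12, "input2": 290.0, "input3": "red"}]
print(json.dumps(to_dataframe_split(rows)))
print(json.dumps(to_dataframe_records(rows)))
```

Either result can be serialized with json.dumps and used as the request body.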
Classic model serving with MLflow 2.0+
If you are using Classic Model Serving, don't want to specify the MLflow version, and want to use the UI for inference, DO NOT log an input_example when you log the model. I know this does not follow "best practice" for MLflow, but after some investigating, I believe there is an issue with Databricks when you do this.
When you log an input_example, MLflow logs information about the example, including type and pandas_orient. This information is used to generate the inference recipe. As you can see in the generated curl command, it sets format=pandas-records (the JSON is not generated). But this returns the Unrecognized content type... error.
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json; format=pandas-records" \
-d '{
"dataframe_split": {
"columns": ["input1", "input2", "input3"],
"data": [[12, 290, 3]]
}
}' \
https://<url>/model/<model>/<version>/invocations
For me, once I removed format=pandas-records entirely, everything worked as expected. Because of this, I believe that if you log an example and use the UI, Databricks adds this format to the request for you, which results in an error even if you did everything correctly. In Serverless, by contrast, the generated curl does not include this parameter at all.
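A rough Python equivalent of the curl call, with the format parameter left out of the Content-Type header, might look like the sketch below. The endpoint URL, the token value, and the helper name build_scoring_request are placeholders rather than real values, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_scoring_request(url, token, payload):
    """Build (but do not send) a POST request for a model serving endpoint."""
    # Mirrors `curl -u token:$DATABRICKS_TOKEN`: HTTP basic auth with user "token".
    credentials = base64.b64encode(f"token:{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Plain JSON content type, with no "; format=pandas-records" parameter.
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


payload = {
    "dataframe_split": {
        "columns": ["input1", "input2", "input3"],
        "data": [[12, 290, 3]],
    }
}
# Placeholder URL and token; substitute your own workspace values.
request = build_scoring_request(
    "https://example.invalid/model/my-model/1/invocations", "dummy-token", payload
)
# To actually send it: urllib.request.urlopen(request)
```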
I'm building an OTA update for my custom Android 10 build as follows:
./build/make/tools/releasetools/ota_from_target_files \
--output_metadata_path metadata.txt \
target-files.zip \
ota.zip
The resulting ota.zip can be applied by extracting the payload.bin and payload_properties.txt according to the android documentation for update_engine_client.
update_engine_client --payload=file:///<wherever>/payload.bin \
--update \
--headers=<Contents of payload_properties.txt>
This all works so I'm pretty sure from this result that I've created the OTA correctly, however, I'd like to be able to download the metadata and verify that the payload can be applied before having the client download the entire payload.
Looking at the update_engine_client --help options, it appears one can verify the metadata as follows:
update_engine_client --verify --metadata=<path to metadata.txt from above>
This is where I'm failing to achieve the desired result, though. I get an error that says it failed to parse the payload header. It fails with kDownloadInvalidMetadataMagicString, which, from reading the source, relates to the magic string in the first 4 bytes of the metadata. Apparently the metadata.txt I created isn't in the right format for the verification tool.
So I'm hoping someone can point me in the right direction to either generate the metadata correctly or tell me how to use the tool correctly.
Turns out the metadata generated by the ota tool is in human readable format. The verify method expects a binary file. That file is not part of the zip contents as a unique file. Instead, it's prepended to the payload.bin. So the first bytes of payload.bin are actually payload_metadata.bin, and those bytes will work correctly with the verify method of update_engine_client to determine if the payload is applicable.
I'm extracting the payload_metadata.bin in a makefile as follows:
$(DEST)/%.meta: $(DEST)/%.zip
unzip $< -d /tmp META-INF/com/android/metadata
python -c 'import re; meta=open("/tmp/META-INF/com/android/metadata").read(); \
m=re.match(".*payload_metadata.bin:([0-9]*):([0-9]*)", meta); \
s=int(m.groups()[0]); l=int(m.groups()[1]); \
z=open("$<","rb").read(); \
open("$@","wb").write(z[s:s+l])'
rm -rf /tmp/META-INF
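The same extraction can be written as a standalone Python script instead of an inline one-liner. This is only a sketch of the approach above: the function names are hypothetical, and the b'CrAU' magic check is an assumption about the update_engine payload header rather than something stated in this answer.

```python
import re
import zipfile

METADATA_ENTRY = "META-INF/com/android/metadata"


def find_property_range(metadata_text, name="payload_metadata.bin"):
    """Return (offset, length) for `name` from the 'name:offset:length' entries."""
    match = re.search(re.escape(name) + r":(\d+):(\d+)", metadata_text)
    if match is None:
        raise ValueError(f"{name} not found in OTA metadata")
    return int(match.group(1)), int(match.group(2))


def extract_payload_metadata(ota_zip_path, out_path):
    """Copy the payload_metadata.bin byte range out of the OTA zip file."""
    with zipfile.ZipFile(ota_zip_path) as archive:
        metadata_text = archive.read(METADATA_ENTRY).decode("utf-8")
    offset, length = find_property_range(metadata_text)
    # As in the makefile rule, the offsets index into the zip file itself.
    with open(ota_zip_path, "rb") as ota:
        ota.seek(offset)
        blob = ota.read(length)
    with open(out_path, "wb") as out:
        out.write(blob)
    return blob


def has_payload_magic(blob):
    """Assumption: update_engine payloads start with the magic bytes b'CrAU'."""
    return blob[:4] == b"CrAU"
```

Calling extract_payload_metadata("ota.zip", "ota.meta") would produce a file that can be passed to update_engine_client --verify --metadata=..., and has_payload_magic gives a quick sanity check before doing so.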
I'm currently triggering my Jenkins builds through a GitHub webhook. How would I parse the JSON payload? If I try to parameterize my build and use the $payload variable, the GitHub webhook fails with the following error:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 400 This page expects a form submission</title>
</head>
<body><h2>HTTP ERROR 400</h2>
<p>Problem accessing /job/Jumph-CycleTest/build. Reason:
<pre> This page expects a form submission</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
</body>
</html>
How can I get my GitHub webhook to work with a parameterized Jenkins build, and how could I then parse the webhook payload to use certain lines, such as the username of the committer, as conditionals in the build?
There are a few tricks to get this to work, and I found the (now defunct) chloky.com blog post to be helpful for most of it. Since it sounds like you've gotten the webhook communicating with your Jenkins instance at least, I'll skip over those steps for now. But, if you want more detail, just scroll past the end of my answer to see the content I was able to salvage from chloky.com - I do not know the original author and the information might be out of date but I did find it helpful.
So to summarize, you can do the following to deal with the payload:
Set up a string parameter called "payload" in your Jenkins job. If you are planning on manually running the build, it might be a good idea to give it a default JSON document at some point but you don't need one right now. This parameter name appears to be case-sensitive (I'm using Linux so that's no surprise...)
Set up the webhook in GitHub to use the buildWithParameters endpoint instead of the build endpoint, i.e.
http://<<yourserver>>/job/<<yourjob>>/buildWithParameters?token=<<yourtoken>>
Configure your webhook to use application/x-www-form-urlencoded instead of application/json. The former approach packs the JSON data in a form variable called "payload", which is presumably how Jenkins can assign it to an environment variable. The application/json approach just POSTs raw JSON, which does not seem to be mappable to anything (I couldn't get it to work). You can see the difference by pointing your webhook to something like requestbin and inspecting the results.
At this point, you should get your $payload variable when you kick off the build. To parse the JSON, I highly recommend installing jq on your Jenkins server and trying out some of the parsing syntax here. jq is especially nice because it's cross-platform.
From here, just parse what you need from the JSON into other environment variables. Combined with conditional build steps, this could give you a lot of flexibility.
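If jq is not an option, the same parsing can be sketched in Python with the standard json module (a swapped-in alternative, not what I actually used). The function name summarize_push is hypothetical, and the field paths are taken from the sample push payload quoted further down in this answer.

```python
import json
import os


def summarize_push(payload_text):
    """Pull a few fields out of a GitHub push payload (JSON text)."""
    event = json.loads(payload_text)
    head = event.get("head_commit") or {}
    return {
        "ref": event.get("ref"),
        "pusher": (event.get("pusher") or {}).get("name"),
        "committer": (head.get("committer") or {}).get("username"),
        "message": head.get("message"),
    }


if __name__ == "__main__":
    # The "payload" build parameter is read from the environment here.
    payload_text = os.environ.get("payload", "{}")
    for key, value in summarize_push(payload_text).items():
        print(f"{key}={value}")
```

The printed key=value lines could then be fed into whatever mechanism you use to set environment variables for later build steps.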
Hope this helps!
EDIT here's what I could grab from the original blog posts at http://chloky.com/tag/jenkins/, which has been dead for a while.
Hopefully this content is also useful for someone.
Post #1 - July 2012
Github provides a nice way to fire off notifications to a CI system like jenkins whenever a commit is made against a repository. This is really useful for kicking off build jobs in jenkins to test the commits that were just made on the repo. You simply need to go to the administration section of the repository, click on service hooks on the left, click ‘webhook URLs’ at the top of the list, and then enter the URL of the webhook that jenkins is expecting (look at this jenkins plugin for setting up jenkins to receive these hooks from github).
Recently though, I was looking for a way to make a webhook fire when a pull request is made against a repo, rather than when a commit is made to the repo. This is so that we could have jenkins run a bunch of tests on the pull request, before deciding whether to merge the pull request in – useful for when you have a lot of developers working on their own forks and regularly submitting pull requests to the main repo.
It turns out that this is not as obvious as one would hope, and requires a bit of messing about with the github API.
By default, when you configure a github webhook, it is configured to only fire when a commit is made against a repo. There is no easy way to see, or change, this in the github web interface when you set up the webhook. In order to manipulate the webhook in any way, you need to use the API.
To make changes on a repo via the github API, we need to authorize ourselves. We’re going to use curl, so if we wanted to we could pass our username and password each time, like this:
# curl https://api.github.com/users/mancdaz --user 'mancdaz'
Enter host password for user 'mancdaz':
Or, and this is a much better option if you want to script any of this stuff, we can grab an oauth token and use it in subsequent requests to save having to keep entering our password. This is what we’re going to do in our example. First we need to create an oauth authorization and grab the token:
curl https://api.github.com/authorizations --user "mancdaz" \
--data '{"scopes":["repo"]}' -X POST
You will be returned something like the following:
{
"app":{
"name":"GitHub API",
"url":"http://developer.github.com/v3/oauth/#oauth-authorizations-api"
},
"token":"b2067d190ab94698a592878075d59bb13e4f5e96",
"scopes":[
"repo"
],
"created_at":"2012-07-12T12:55:26Z",
"updated_at":"2012-07-12T12:55:26Z",
"note_url":null,
"note":null,
"id":498182,
"url":"https://api.github.com/authorizations/498182"
}
Now we can use this token in subsequent requests for manipulating our github account via the API. So let’s query our repo and find the webhook we set up in the web interface earlier:
# curl https://api.github.com/repos/mancdaz/mygithubrepo/hooks?access_token=b2067d190ab94698a592878075d59bb13e4f5e96
[
{
"created_at": "2012-07-12T11:18:16Z",
"updated_at": "2012-07-12T11:18:16Z",
"events": [
"push"
],
"last_response": {
"status": "unused",
"message": null,
"code": null
},
"name": "web",
"config": {
"insecure_ssl": "1",
"content_type": "form",
"url": "http://jenkins-server.chloky.com/post-hook"
},
"id": 341673,
"active": true,
"url": "https://api.github.com/repos/mancdaz/mygithubrepo/hooks/341673"
}
]
Note the important bit from that json output:
"events": [
"push"
]
This basically says that this webhook will only trigger when a commit (push) is made to the repo. The github API documentation describes numerous different event types that can be added to this list – for our purposes we want to add pull_request, and this is how we do it (note that we get the id of the webhook from the json output above. If you have multiple hooks defined, your output will contain all these hooks so be sure to get the right ID):
# curl https://api.github.com/repos/mancdaz/mygithubrepo/hooks/341673?access_token=b2067d190ab94698a592878075d59bb13e4f5e96 -X PATCH --data '{"events": ["push", "pull_request"]}'
{
"created_at": "2012-07-12T11:18:16Z",
"updated_at": "2012-07-12T16:03:21Z",
"last_response": {
"status": "unused",
"message": null,
"code": null
},
"events": [
"push",
"pull_request"
],
"name": "web",
"config": {
"insecure_ssl": "1",
"content_type": "form",
"url": "http://jenkins-server.chloky.com/post-hook"
},
"id": 341673,
"active": true,
"url": "https://api.github.com/repos/mancdaz/mygithubrepo/hooks/341673"
}
See!
"events": [
"push",
"pull_request"
],
This webhook will now trigger whenever either a commit OR a pull request is made against our repo. Exactly what you do in your jenkins/with this webhook is up to you. We use it to kick off a bunch of integration tests in jenkins to test the proposed patch, and then actually merge and close (again using the API) the pull request automatically. Pretty sweet.
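For scripting, the PATCH call above can be sketched in Python. The helper name build_hook_events_request and the token value are placeholders. One deliberate difference from the 2012 curl commands: the token is sent in an Authorization header, on the assumption that the ?access_token= query parameter is no longer accepted by the GitHub API. The request is only built here, not sent.

```python
import json
import urllib.request


def build_hook_events_request(owner, repo, hook_id, token, events):
    """Build (but do not send) a PATCH request that sets a webhook's event list."""
    url = f"https://api.github.com/repos/{owner}/{repo}/hooks/{hook_id}"
    return urllib.request.Request(
        url,
        data=json.dumps({"events": list(events)}).encode("utf-8"),
        method="PATCH",
        headers={
            # Assumption: token passed in a header instead of ?access_token=...
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


request = build_hook_events_request(
    "mancdaz", "mygithubrepo", 341673, "dummy-token", ["push", "pull_request"]
)
# To actually send it: urllib.request.urlopen(request)
```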
Post #2 - September 2012
In an earlier post, I talked about configuring the github webhook to fire on a pull request, rather than just a commit. As mentioned, there are many events that happen on a github repo, and as per the github documentation, a lot of these can be used to trigger the webhook.
Regardless of what event you decide to trigger on, when the webhook fires from github, it essentially makes a POST to the URL configured in the webhook, including a json payload in the body. The json payload contains various details about the event that caused the webhook to fire. An example payload that fired on a simple commit can be seen here:
payload
{
"after":"c04a2b2af96a5331bbee0f11fe12965902f5f571",
"before":"78d414a69db29cdd790659924eb9b27baac67f60",
"commits":[
{
"added":[
"afile"
],
"author":{
"email":"myemailaddress@mydomain.com",
"name":"Darren Birkett",
"username":"mancdaz"
},
"committer":{
"email":"myemailaddress@mydomain.com",
"name":"Darren Birkett",
"username":"mancdaz"
},
"distinct":true,
"id":"c04a2b2af96a5331bbee0f11fe12965902f5f571",
"message":"adding afile",
"modified":[
],
"removed":[
],
"timestamp":"2012-09-03T02:35:59-07:00",
"url":"https://github.com/mancdaz/mygithubrepo/commit/c04a2b2af96a5331bbee0f11fe12965902f5f571"
}
],
"compare":"https://github.com/mancdaz/mygithubrepo/compare/78d414a69db2...c04a2b2af96a",
"created":false,
"deleted":false,
"forced":false,
"head_commit":{
"added":[
"afile"
],
"author":{
"email":"myemailaddress@mydomain.com",
"name":"Darren Birkett",
"username":"mancdaz"
},
"committer":{
"email":"myemailaddress@mydomain.com",
"name":"Darren Birkett",
"username":"mancdaz"
},
"distinct":true,
"id":"c04a2b2af96a5331bbee0f11fe12965902f5f571",
"message":"adding afile",
"modified":[
],
"removed":[
],
"timestamp":"2012-09-03T02:35:59-07:00",
"url":"https://github.com/mancdaz/mygithubrepo/commit/c04a2b2af96a5331bbee0f11fe12965902f5f571"
},
"pusher":{
"email":"myemailaddress@mydomain.com",
"name":"mancdaz"
},
"ref":"refs/heads/master",
"repository":{
"created_at":"2012-07-12T04:17:51-07:00",
"description":"",
"fork":false,
"forks":1,
"has_downloads":true,
"has_issues":true,
"has_wiki":true,
"name":"mygithubrepo",
"open_issues":0,
"owner":{
"email":"myemailaddress@mydomain.com",
"name":"mancdaz"
},
"private":false,
"pushed_at":"2012-09-03T02:36:06-07:00",
"size":124,
"stargazers":1,
"url":"https://github.com/mancdaz/mygithubrepo",
"watchers":1
}
}
This entire payload gets passed in the POST requests as a single parameter, with the imaginative title payload. It contains a ton of information about the event that just happened, all or any of which can be used by jenkins when we build jobs after the trigger. In order to use this payload in Jenkins, we have a couple of options. I discuss one below.
Getting the $payload
In jenkins, when creating a new build job, we have the option of specifying the names of parameters that we expect to pass to the job in the POST that triggers the build. In this case, we would pass a single parameter payload, as seen here:
(Screenshot: passing parameters to a jenkins build job)
Further down in the job configuration, we can specify that we would like to be able to trigger the build remotely (ie. that we want to allow github to trigger the build by posting to our URL with the payload):
Then, when we set up the webhook in our github repo (as described in the first post), we give it the URL that jenkins tells us to:
You can’t see it all in the screencap, but the URL I specified for the webhook was the one that jenkins told me to:
http://jenkins-server.chloky.com:8080/job/mytestbuild//buildWithParameters?token=asecuretoken
Now, when I built my new job in jenkins, for the purposes of this test I simply told it to echo out the contents of the ‘payload’ parameter (which is available in parameterized builds as a shell variable of the same name), using a simple script:
#!/bin/bash
echo "the build worked! The payload is $payload"
Now to test the whole thing we simply have to make a commit to our repo, and then pop over to jenkins to look at the job that was triggered:
mancdaz@chloky$ (git::master) touch myfile
mancdaz@chloky$ (git::master) git add myfile
mancdaz@chloky$ (git::master) git commit -m 'added my file'
[master 4810490] added my file
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 myfile
mancdaz@chloky$ (git::master) git push
Counting objects: 3, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 232 bytes, done.
Total 2 (delta 1), reused 0 (delta 0)
To git@github.com:mancdaz/mygithubrepo.git
c7ecafa..4810490 master -> master
And over in our jenkins server, we can look at the console output of the job that was triggered, and lo and behold there is our ‘payload’ contained in the $payload variable and available for us to consume:
So great, all the info about our github event is here, and fully available in our jenkins job! True enough, it’s in a big json blob, but with a bit of crafty bash you should be good to go.
Of course, this example used a simple commit to demonstrate the principles of getting at the payload inside jenkins. As we discussed in the earlier post, a commit is one of many events on a repo that can trigger a webhook. What you do inside jenkins once you’ve triggered is up to you, but the real fun comes when you start interacting with github to take further actions on the repo (post comments, merge pull requests, reject commits etc) based on the results of your build jobs that got triggered by the initial event.
Look out for a subsequent post where I tie it all together and show you how to process, run tests for, and finally merge a pull request if successful – all automatically inside jenkins. Automation is fun!
There is a Generic Webhook Trigger plugin that can contribute values from the post content to the build.
If the post content is:
{
"app":{
"name":"GitHub API",
"url":"http://developer.github.com/v3/oauth/#oauth-authorizations-api"
}
}
You can configure it like this:
And when triggering with some post content:
curl -v -H "Content-Type: application/json" -X POST -d '{ "app":{ "name":"GitHub API", "url":"http://developer.github.com/v3/oauth/" }}' http://localhost:8080/jenkins/generic-webhook-trigger/invoke?token=sometoken
It will resolve variables and make them available in the build job.
{
"status":"ok",
"data":{
"triggerResults":{
"free":{
"id":2,
"regexpFilterExpression":"",
"regexpFilterText":"",
"resolvedVariables":{
"app_name":"GitHub API",
"everything_app_url":"http://developer.github.com/v3/oauth/",
"everything":"{\"app\":{\"name\":\"GitHub API\",\"url\":\"http://developer.github.com/v3/oauth/\"}}",
"everything_app_name":"GitHub API"
},
"searchName":"",
"searchUrl":"",
"triggered":true,
"url":"queue/item/2/"
}
}
}
}
For use in an external script, we need the list of repositories hosted on a git repository server. We also have GitWeb enabled on the server.
Does anyone know if GitWeb exposes some API through which we can get the list of repositories, like the GitBlit RPC (http://gitblit.com/rpc.html, e.g. https://your.glitblit.url/rpc?req=LIST_REPOSITORIES)?
Thanks.
No, from what I can see of the gitweb.cgi (from gitweb/gitweb.perl) implementation, there is no RPC API with JSON return messages.
This is only visible through the web page.
In the bottom right corner there is a small button that reads: TXT
You can get the list of projects there, for example:
For sourceware, the gitweb page: https://sourceware.org/git/
The TXT button links here: https://sourceware.org/git/?a=project_index
It should return a list of projects which are basically
<name of the git repository> <owner>
in plain text, perfectly parseable by script.
But if you want JSON, you'd have to convert it with something like this:
$ wget -q -O- "https://sourceware.org/git/?a=project_index" \
| jq -R -n '[ inputs | split(" ")[0:2] | {"project": .[0], "owner": .[1]} ]'
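If the consuming script is written in Python, the plain-text index can also be parsed directly, without jq. This is a minimal sketch: the function name parse_project_index and the sample lines are made up for illustration, and the URL-decoding step assumes gitweb percent-encodes special characters and writes spaces as "+".

```python
from urllib.parse import unquote_plus


def parse_project_index(text):
    """Parse gitweb's plain-text project index into a list of dicts."""
    projects = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<repository path> <owner>"; the owner may be missing.
        path, _, owner = line.partition(" ")
        projects.append({
            # Assumption: fields are URL-encoded, with "+" standing for a space.
            "project": unquote_plus(path),
            "owner": unquote_plus(owner),
        })
    return projects


# Hypothetical sample input, not real output from any server.
sample = "binutils-gdb.git Some+Owner\nglibc.git\n"
print(parse_project_index(sample))
```

The text itself can be fetched with urllib.request.urlopen on the ?a=project_index URL and decoded before being passed to the function.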