I have an EL5 system with "tcl" and "expect", which I use to talk to Asterisk Call Manager/1.0: the script opens a telnet connection and sends messages (using VGSMII, with the vgsm_sms_tx command).
When I open the telnet connection by hand and type the commands and the message text, there is no problem. When the script does the same thing, Asterisk Call Manager is unable to send messages containing special characters (for example: € èé).
The operating system uses en_US.UTF-8 as its encoding.
Tcl should be using ISO-8859 (if I'm not wrong).
I tried setting
set var1 [encoding convertto utf-8 $var0]
but nothing seems to change.
I also tried the gsm0338 encoding.
Thanks
Normally, you want Tcl and the program at the far end to use the same encoding; it's pretty rare for programs in between (such as ssh or telnet) to do much other than carry the majority of bytes through unchanged. If the other side expects UTF-8, Tcl should be told to use UTF-8 on that channel. In theory, you can put the channel into binary mode and use encoding convertto utf-8 to generate the bytes to write, but that's horrible and easy to get wrong, so it should be avoided unless you're doing something complicated.
It's not very well documented, but Expect's spawn IDs are (a special type of) Tcl channels. That means you can, after the spawn, do this once:
# Assuming you're not in a procedure; use $::spawn_id otherwise
fconfigure $spawn_id -encoding utf-8
and everything should Just Work™ from there on.
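To see why the channel encoding matters, here is a small illustration in Python (used only for demonstration; it is not part of the Tcl/Expect script). It prints the bytes that the same characters produce under UTF-8 and under ISO-8859-1. The sample characters are taken from the question; everything else is an assumption for the sake of the example.

```python
# Illustration only: the same text yields different bytes depending on
# the encoding applied when it is written to the channel.
text = "èé"

utf8_bytes = text.encode("utf-8")         # two bytes per accented letter
latin1_bytes = text.encode("iso-8859-1")  # one byte per accented letter

print(utf8_bytes.hex(" "))    # c3 a8 c3 a9
print(latin1_bytes.hex(" "))  # e8 e9

# The euro sign does not exist in ISO-8859-1 at all; in UTF-8 it is 3 bytes.
euro_utf8 = "€".encode("utf-8")
print(euro_utf8.hex(" "))     # e2 82 ac
```

If the far end decodes the stream as UTF-8 but receives the one-byte ISO-8859-1 form, the accented characters arrive as invalid sequences, which would be consistent with the symptom described in the question.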
In the end, I convert only the message text to base64, and with that it works fine.
I followed this page (in view mode it doesn't show anything, so the link points to the edit view):
http://open.voismart.it/index.php?title=VGSM_Manager_Interface&action=edit
=vgsm_sms_tx Action=
The vGSM Asterisk's channel driver provides a manager action to send Short Messages (SMS). The action is named '''vgsm_sms_tx'''.
If the message contains characters that are not in the [http://www.dreamfabric.com/sms/default_alphabet.html GSM default alphabet], the message will be sent as UCS2, but the number of available characters will be reduced to 80.
In pre-0.21.0 releases the action was named '''VGSMsmstx'''. This name is now deprecated and vgsm_sms_tx will be supported starting from 0.21.0.
== Parameters ==
{| class="wikitable"
!Header
!Usage
!Description
|-
|'''To'''
|Mandatory
|The phone number to which to send the SMS. It may be in national format (347123456) or international format (+39347123456). The 00 or other operator-specific prefixes are not supported.
|-
|'''X-SMS-ME'''
|Optional
|Specifies the interface on which the SMS is sent. If not specified the SMS is sent on the first available interface. Huntgroups are supported using the <tt>huntgroup:name</tt> syntax, but sending will not currently be retried if there is a failure on the chosen module. Also, only sequential hunting is supported.
|-
|'''X-SMS-SMCC-Number'''
|Optional
|If present, forces the use of a specific Service Center.
|-
|'''X-SMS-Reject-Duplicates'''
|Optional
|Maps to TP-Reject-Duplicates (TP-RD), Ref. TS 100 901, §9.2.3.27
|-
|'''X-SMS-Reply-Path'''
|Optional
|Maps to TP-Reply-Path (TP-RP), Ref. TS 100 901, §9.2.3.17
|-
|'''X-SMS-Status-Report-Request'''
|Optional
|Maps to TP-Status-Report-Request (TP-SRR), Ref. TS 100 901, §9.2.3.5
|-
|'''X-SMS-Message-Reference'''
|Optional
|Maps to TP-Message-Reference (TP-MR), Ref. TS 100 901, §9.2.3.6
|-
|'''X-SMS-Validity-Period'''
|Optional
|Maps to TP-Validity-Period (TP-VP), Ref. TS 100 901, §9.2.3.12, specifies for how much time (in seconds, starting from now) the SMS message is valid and delivery should be attempted. If not specified the default value is 4 days.
|-
|'''X-SMS-Class'''
|Optional
|If specified sets the SMS class. Class 0 is used for flash SMSes, class 3 is used for normal messages. The use of other classes has to be evaluated.
|-
|'''X-SMS-Concatenate-RefID'''
|Optional
|In UDH Concatenate IE, specifies the Reference Id of the split message
|-
|'''X-SMS-Concatenate-Total-Messages'''
|Optional
|In UDH Concatenate IE, specifies the number of messages in which the main message is split
|-
|'''X-SMS-Concatenate-Sequence-Number'''
|Optional
|In UDH Concatenate IE, specifies the sequence number of this message
|-
|'''Content-Type'''
|Optional
|Defines the content type; Only ''text/plain'' is currently supported
|-
|'''Content-Transfer-Encoding'''
|Optional
|Defines the content encoding, valid values are:
* ''7bit'': 7-bit ASCII text
* ''hex'': Hex-Encoded text
* ''base64'': Base64-encoded text
* ''quoted-printable'': Quoted-printable escaping
|-
|'''Content'''
'''Content2'''
'''ContentN'''
|Mandatory
|The SMS body in the encoding specified in the Content-Transfer-Encoding or 7-bit ASCII if that header is missing.
|-
|}
''IMPORTANT'': Asterisk Manager Interface does NOT support line lengths greater than 80 characters, including the header name. It is therefore mandatory to split the '''Content''' header into several headers of at most 65 characters each. Unfortunately, the splitting is supported starting from vstuff 1.0.0, which is not yet released. Please use a snapshot in the meantime.
''FIXME!!'' This statement needs to be confirmed. Asterisk version 1.4.14 allows a message of 160 char in one single Content: line. Will a message of size > 160 trigger more than 1 SMS?
===Response statuses===
=====Success=====
* 201 Message Sent
=====Temporary failures =====
* 400 Network out of order
* 401 Module is not ready
* 402 Module is not registered
* 403 Module is already sending a message
* 404 Cannot find an available module
* 405 Cannot allocate message
* 406 Out of memory
=====Permanent failures=====
* 501 Cannot find module
* 502 Services Center number not set
* 503 Cannot open iconv context
* 504 Invalid Content-Type
* 505 Unsupported content-Type
* 506 Unsupported Content-Transfer-Encoding
* 507 Charset conversion error
* 508 Cannot find huntgroup
* 509 Content: header missing
* 510 To: header missing
* 511 Message too big
* 512 Unspecified message preparation error
== Example of a SMS sending session ==
===Authentication===
[root#voismart-4-000000 chan_vgsm]# telnet localhost 5038
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Asterisk Call Manager/1.0
Action: login
Username: sms
Secret: sms
Response: Success
Message: Authentication accepted
===Simple ASCII message===
Action: vgsm_sms_tx
To: +393471234567
Content-type: text/plain; charset=ASCII
Content: Ciao, questo e' un SMS. Niente caratteri 8-bit, qui.
Status: 201
X-SMS-Reference: 22
Response: Success
Message: Message sent
===UTF-8 encoded message with characters in the GSM alphabet===
Action: vgsm_sms_tx
To: +393471234567
X-SMS-ME: vodafone
X-SMS-Class: 3
X-SMS-SMCC-Number: +393492000200
Content-type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64
Content: VGVzdCBVVEYtOCBlbmNvZGluZyB3aXRoIGNoYXJhY3RlcnMgaW4gdGhlIEdTTSBhbHBoY
Content2: WJldC4gQWNjZW50czogw6DDqMOsw7LDuSwgR3JlZWsgTGV0dGVyczogzqbOk86bzqnOo
Content3: M6ozqPOmM6eLCBPdGhlcjogwqXCo8OHw5jDuMOFw6XigqzDhsOmw5/DicKkCg==
===UTF-8 encoded message with characters outside the GSM alphabet===
Action: vgsm_sms_tx
To: +393471234567
X-SMS-ME: vodafone
X-SMS-Class: 3
X-SMS-SMCC-Number: +393492000200
Content-type: text/plain; charset=UTF-8
Content-Transfer-Encoding: base64
Content: Q2hhcnMgb3V0c2lkZSBHU00gY2hhcnNldC4gQXJhYjog27Hbstuz27Tbtdu227fbuNu5L
Content2: CBIZWI6INeQ15HXkteT15TXldeW15fXmNeZCg==
===Concatenated messages===
Action: vgsm_sms_tx
To: +393471234567
X-SMS-ME: vodafone
X-SMS-Concatenate-RefID: 58
X-SMS-Concatenate-Total-Messages: 2
X-SMS-Concatenate-Sequence-Number: 1
Content-type: text/plain; charset=ASCII
Content: This is message part 1 of 2, that will be followed by part 2 of 2 which
Content2: will be sent later.
Action: vgsm_sms_tx
To: +393471234567
X-SMS-ME: vodafone
X-SMS-Concatenate-RefID: 58
X-SMS-Concatenate-Total-Messages: 2
X-SMS-Concatenate-Sequence-Number: 2
Content-type: text/plain; charset=ASCII
Content: This is part 2 of 2, the message is now complete. We can thus send up
Content2: to 255 parts for a total of 40,800 charaters (for just 38 Euros!)
===Sending message using a huntgroup===
Action: vgsm_sms_tx
To: +34600123456
X-SMS-ME: huntgroup:safaricom
Content-type: text/plain; charset=ASCII
Content: This message is sent using one of the several ME in the group safaricom
=vgsm_sms_rx Event=
On reception of an inbound SMS (SMS-DELIVERY) the message will also be reported as a manager event, however, acknowledgment still relies on SMS spooler to handle the message. This event is generated starting from 0.21.0
Here follows an example of a received SMS via the manager interface:
Event: vgsm_sms_rx
Privilege: call,all
Received: from GSM module vodafone2, registered on 22210 (Vodafone, Italy); Wed, 20 Jun 2007 19:40:14 +0200
From: <+393471234567#sms.voismart.it>
Subject: SMS message
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: base64
Date: Wed, 20 Jun 2007 19:39:25 +0200
X-SMS-Message-Type: SMS-DELIVER
X-SMS-Sender-NP: ISDN telephony
X-SMS-Sender-TON: International
X-SMS-Sender-Number: +393471234567
X-SMS-SMCC-NP: ISDN telephony
X-SMS-SMCC-TON: International
X-SMS-SMCC-Number: +393492000429
X-SMS-More-Messages-To-Send: yes
X-SMS-Reply-Path: no
X-SMS-User-Data-Header-Indicator: no
X-SMS-Status-Report-Indication: no
Content: SG8gY2hpYW1hdG8gYWxsZSAxOTozOSBkZWwgMjAvMDYvMDcuIEluZm9ybWF6aW9uZSBncmF0dWl0YSBkZWwgc2Vydml6aW8gQ0hJQU1BTUkgZGkgVm9kYWZvbmUu
=vgsm_me_state Event=
Whenever a ME (GSM module) changes working state, an event is generated. Here is an example of such type of events:
Event: vgsm_me_state
Privilege: call,all
X-vGSM-ME-State: POWERING_OFF
X-vGSM-ME-Old-State: READY
X-vGSM-ME-State-Change-Reason: Asterisk shutdown
The currently implemented states are:
CLOSED
OFF
POWERING_ON
POWERING_OFF
RESETTING
WAITING_INITIALIZATION
INITIALIZING
READY
WAITING_SIM
WAITING_PIN
FAILED
=vgsm_net_state Event=
Whenever the registration status of a GSM module changes, a '''vgsm_net_state''' event is generated. This event is available starting from 0.21.0.
Here follows an example of such event:
Event: vgsm_net_state
Privilege: call,all
X-vGSM-GSM-Registration: REGISTERED_HOME
Valid registration states are:
* NOT_SEARCHING
* NOT_REGISTERED
* REGISTERED_HOME
* UNKNOWN
* REGISTRATION_DENIED
* REGISTERED_ROAMING
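The base64 approach mentioned at the top of this answer (Content-Transfer-Encoding: base64, with the body split over Content, Content2, ... headers) can be sketched as follows. This is a minimal sketch in Python rather than Tcl; the function name `build_sms_action` is hypothetical, the 65-character chunk size is taken from the note in the page above, and the CRLF line endings plus the terminating blank line are an assumption about how the manager interface expects an action to be framed.

```python
import base64

def build_sms_action(to, text, chunk=65):
    """Build a vgsm_sms_tx action with a base64 body split over ContentN headers."""
    # Encode the text as UTF-8 first, then base64, so only ASCII goes on the wire.
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    parts = [encoded[i:i + chunk] for i in range(0, len(encoded), chunk)]

    lines = [
        "Action: vgsm_sms_tx",
        f"To: {to}",
        "Content-type: text/plain; charset=UTF-8",
        "Content-Transfer-Encoding: base64",
    ]
    for n, part in enumerate(parts, start=1):
        # First header is "Content", the following ones "Content2", "Content3", ...
        name = "Content" if n == 1 else f"Content{n}"
        lines.append(f"{name}: {part}")

    # Assumption: each line ends with CRLF and an empty line ends the action.
    return "\r\n".join(lines) + "\r\n\r\n"

print(build_sms_action("+393471234567", "Test: € èé"))
```

Because the payload on the wire is plain ASCII, the result no longer depends on the encoding of the channel between the script and Asterisk.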
So I'm trying to use an Arduino Due and a SIM7600 LTE Shield to send a GET request to a server. I've tried multiple servers to no avail and I'm not really sure what I'm doing wrong. Below are my AT commands:
19:34:00.607 -> AT+CHTTPACT="website.co.uk",80
19:34:00.710 -> +CHTTPACT: REQUEST
19:34:07.533 -> GET website.co.uk/4gtest.php HTTP/1.0
19:34:12.302 -> Host: website.co.uk
19:34:19.101 -> Content-Length: 42
19:34:28.000 ->
19:34:28.000 -> OK
And below is the response:
19:34:28.581 -> +CHTTPACT: DATA,295
19:34:28.581 -> http/1.1 400 bad request
19:34:28.581 -> server: nginx
19:34:28.581 -> date: tue, 04 feb 2020 19:34:27 gmt
19:34:28.581 -> content-type: text/html
19:34:28.581 -> content-length: 150
19:34:28.581 -> connection: close
19:34:28.581 ->
19:34:28.581 -> <html>
19:34:28.581 -> <head><title>400 Bad Request</title></head>
19:34:28.615 -> <body>
19:34:28.615 -> <center><h1>400 Bad Request</h1></center>
19:34:28.615 -> <hr><center>nginx</center>
19:34:28.615 -> </body>
19:34:28.615 -> </html>
19:34:28.648 ->
19:34:28.648 -> +CHTTPACT: 0
There is definitely an internet connection as it returns custom error pages from the servers but I'm not sure why it can't get the pages I want.
Any help would really be appreciated
Thanks
I've been able to get a server response with the following request:
GET /4gtest.php HTTP/1.1<CR><LF>
Host: website.co.uk<CR><LF>
<CR><LF>
<CR><LF>
I got a 404 error response, probably because it was a test page that had since been removed. In any case it is not a 400 error response, which means that, at least, the request is not malformed.
Some description about the request:
After GET command, only the path is expected (as correctly suggested by user #juraj's comment)
Since your error response specified HTTP/1.1, I used the same version for my request
The hostname is passed in the line Host: website.co.uk
Please note that each line is terminated with the <CR> + <LF> pair of characters (carriage return, ASCII 0x0D, decimal 13, and line feed, ASCII 0x0A, decimal 10). It is important to specify this because in your question it is not clear how the lines are terminated
Also note that after the last header line, an additional empty line (a second <CR> + <LF> pair) terminates the header section of the request
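The points above can be sketched as a small Python helper that assembles the request bytes. This is only an illustration of the line termination, not code for the Arduino or the modem; the function name `build_get_request` is hypothetical and the host and path are the placeholders from the question.

```python
def build_get_request(host, path):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # only the path follows GET, not the host
        f"Host: {host}",         # the host name goes in its own header
        "",                      # empty line: end of the header section
        "",
    ]
    # Joining with CR (0x0D) + LF (0x0A) terminates every line and
    # leaves exactly one empty line after the last header.
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("website.co.uk", "/4gtest.php")
print(request)
```

Printing the bytes makes the terminators visible as `\r\n`, which is a convenient way to check what is actually being sent after the +CHTTPACT: REQUEST prompt.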
About Content-Length
I have omitted the Content-Length field in my request. As brilliantly explained in the answers to this question, this field specifies the number of octets contained in the message body, after the header. But:
The AT command log in your question did not mention any body after the header
As explained in this answer, request bodies in GET requests are not explicitly forbidden, but they are not recommended
Yes, you can send a request body with GET but it should not have any
meaning. If you give it meaning by parsing it on the server and
changing your response based on its contents, then you are ignoring
this recommendation in the HTTP/1.1 spec, section 4.3
I'm running into a problem as I attempt to automate an API process into BigQuery.
The issue is that I need the data to be in a newline delimited JSON format to go into my BigQuery database but the data I'm pulling does not do that, so I need to parse it out.
Here is a link to pastebin so you can get an idea of what the data looks like, but also, here it is just because:
{"type":"user.list","users":[{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test#gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}},{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test#mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United 
States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}],"scroll_param":"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"}
The two problems are the first line:
{"type":"user.list","users":
And the final piece at the bottom:
,"scroll_param":"24bd0rac-b2f9-46b2-944a-9zz543dcd1b1"}
If you eliminate those two, you are simply left with the necessary data needed, and I know what filter is needed to parse it out to put it in newline delimited format.
You can see for yourself by playing around with this tool, but if you only copy and paste everything from that first open bracket to the close bracket on the final line, set it to "Compact Output" and apply the filter:
.[]
The result will be in a nice and neat newline-delimited format like you see here. In case the link is unavailable, here it is as well:
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test#gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}}
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test#mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}
So what I need is a filter I can apply in the same manner I used .[] that strips out all the text before the first open bracket (as I highlighted above) as well as all the text after the closing bracket at the end.
But here's where the final problem comes in. While I need that final piece of text out of the equation, I still need that string of letters and numbers known as the scroll parameter. This is because, in order to fully capture all the data I need from the API, I need to keep using the new scroll parameter it generates in the next command-line call until all the data is in.
The initial call looks as such:
$ curl -s https://api.program.io/users/scroll -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json'
But in order to get all the info in, I need that scroll parameter for a separate call that looks like:
curl -s https://api.intercom.io/users/scroll?scroll_param=foo -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json' >scroll.json
So while I need to get rid of the text in the blob that contains the parameter in order to put it in newline-delimited format, I still need to extract whatever that parameter is to loop back into another script that will continue to run until it is empty.
Would love to hear any advice in working around this!
Like others who have posted comments, I won't pretend to understand the details of the specific question, but if the general question is how to use jq to emit newline-delimited JSON (that is, ensure that each JSON text is followed by a newline, and that no other (raw) newlines are added), the answer is simple: use jq with the -c option, and without the -r option.
From a cursory examination of your data, the filter
.users[]
will give you just the user data to load and the filter
.scroll_param
will return just the scroll parameter. If you put your data in a file you could invoke jq once for each filter but if you have to stream the data you could simply use the , operator to return one value after another. e.g.
.scroll_param
, .users[]
If you use that filter along with the -c option jq will generate output like
"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr",...
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf",...
Presumably the script that reads the output from jq could capture the first line for use in the next curl invocation and put the rest of the data into the file you load.
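For comparison, the same split can be sketched in Python instead of jq. This is a minimal sketch assuming the response has the shape shown in the question (a top-level "users" array and a "scroll_param" string); the function name `split_scroll_page` and the tiny sample input are hypothetical.

```python
import json

def split_scroll_page(raw):
    """Return (scroll_param, newline-delimited JSON of the users)."""
    page = json.loads(raw)
    # One compact JSON text per user, one per line, as BigQuery expects.
    ndjson = "\n".join(
        json.dumps(user, separators=(",", ":")) for user in page["users"]
    )
    return page.get("scroll_param"), ndjson

sample = '{"type":"user.list","users":[{"id":"a"},{"id":"b"}],"scroll_param":"xyz"}'
scroll_param, ndjson = split_scroll_page(sample)
print(scroll_param)
print(ndjson)
```

The returned scroll parameter can then be fed into the next request, and the loop can stop once the "users" array comes back empty.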
Hope this helps.
I have a JSON file with the format given below. I want to modify the file so as to add another key-value pair to it. The key should be url and the value should be www.mywebsite.co.nz, extracted from the message given below. What is the easiest way to do this?
{"Timestamp":"Mon Mar 16 21:37:22 EDT 2015","Event":"Reporting Time","Message":"load for http://xxx.xx.xx.xx:1xxxx/operations&proxy=www.mywebsite.co.nz&send=https://xxx.xx.xx.xx:xxxx/operations?event took 9426 ms (X Time: 306 ms, Y Time: 1923 ms)
StatusCode: Unknown<br>Cookies: nzh_weatherlocation=12; dax_ppv=11|NZH:home|NZH:home|NZH:home|9|undefined; _ga=GA1.4.1415798036.1426208630; _gat=1<br>Links: 225<br>Images: 24<br>Forms: 10<br>Browser: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/41.0.2272.76 Chrome/41.0.2272.76 Safari/537.36<br>CPUs: 2<br>Language: en-GB","UserInfo":"Reporting Time"}
As a combination of jq and sed:
jq ".url = \"$(jq '.Message' input.json | sed 's/.*proxy=\([^&]*\).*/\1/')\"" input.json > output.json
This consists of three steps:
jq '.Message' input.json
extracts the message part from the input JSON,
sed 's/.*proxy=\([^&]*\).*/\1/'
extracts the domain from the message, and
jq ".url = \"domainname\"" input.json > output.json
sets the .url attribute of the input json to the extracted domain name, writing the result to output.json.
I feel compelled to point out, by the way, that a domain name by itself is not technically a URL, so you may want to rethink that attribute name.
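The same three steps (read the message, pull out the proxy= value, set the new attribute) can also be sketched in Python. This is a minimal sketch that works on an already-parsed record; the function name `add_url` is hypothetical, and the sample record is a shortened stand-in for the one in the question.

```python
import json
import re

def add_url(record):
    """Copy the value of proxy=... from record["Message"] into record["url"]."""
    # Same pattern as the sed expression: everything after "proxy=" up to "&".
    match = re.search(r"proxy=([^&]*)", record.get("Message", ""))
    if match:
        record["url"] = match.group(1)
    return record

record = {
    "Event": "Reporting Time",
    "Message": "load for http://host/operations&proxy=www.mywebsite.co.nz&send=x",
}
print(json.dumps(add_url(record)))
```

Records whose message has no proxy= part are returned unchanged, which avoids writing an empty url attribute.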
For perl users, using ojo:
perl -Mojo -E '$j=j(b("input.file")->slurp);if($j->{Message}=~m/proxy=(.*?)&/){$j->{url}=$1;say j($j)}'
decomposed:
b()->slurp - reads the input.file
j() - converts the json to perl data
if the Message contains "proxy=site&" - get the site
add to the data the url => site
j() convert to json string
and print it.
Why is the space character URL encoded to %20?
I don't see a reason why space is considered to be a reserved character.
Because space is used as a separator in a lot of cases (programs with arguments, HTTP commands, etc.), it often has to be escaped: with a \ on a Unix command line, with surrounding " on a Windows command line, with %20 in URLs, etc.
In the HTTP protocol, when you try to reach http://www.foo.com, your browser opens a connection to the server www.foo.com on port 80 and sends the commands:
GET http://www.foo.com HTTP/1.0
Accept : text/html
The syntax is "METHOD URL HTTPVERSION"
If you tried to request http://www.foo.com/my page.html instead of http://www.foo.com/my%20page.html, the server would think "page.html" is the HTTPVersion you're looking for...
See RFC 3986 Section 2.3:
2.3. Unreserved Characters
Characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include uppercase and lowercase
letters, decimal digits, hyphen, period, underscore, and tilde.
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Because the Request-Line of an HTTP request is defined as:
Method (Space) Request-URI (Space) HTTP-Version CRLF
Naive HTTP servers that strictly adhere to the spec will do something like this:
splitInput = requestLine.Split(' ')
method = splitInput[0]
requestUri = splitInput[1]
httpVersion = splitInput[2]
That will break if you allow spaces in a URL.
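A runnable version of that pseudocode, as a minimal Python sketch, shows the failure and the percent-encoded alternative. The function name `parse_request_line` and the sample paths are assumptions for illustration; `urllib.parse.quote` is the standard-library call for percent-encoding.

```python
from urllib.parse import quote

def parse_request_line(line):
    """Naive parser: expects exactly three space-separated fields."""
    method, request_uri, http_version = line.split(" ")
    return method, request_uri, http_version

# A percent-encoded path splits cleanly into three fields.
print(parse_request_line("GET /my%20page.html HTTP/1.1"))

# A raw space in the path yields four fields, so unpacking fails.
try:
    parse_request_line("GET /my page.html HTTP/1.1")
except ValueError as err:
    print("malformed request line:", err)

# Percent-encoding the path before sending avoids the ambiguity.
print(quote("/my page.html"))
```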