I have the following output:
MysqlResult = {selected,["id","first_name","last_name"],
[{1,"Matt","Williamson"},
{2,"Matt","Williamson2"}]}
How can I make it look like this:
XML = "
<result id='1'>
<first_name>Matt</first_name>
<last_name>Williamson</last_name>
</result>
<result id='2'>
<first_name>Matt</first_name>
<last_name>Williamson2</last_name>
</result>"
I am looking for a smart way to place it into an IQ (ejabberd):
IQ#iq{type = result, sub_el =
[{xmlelement, "result",
[{"xmlns", ?NS_NAMES}],
[{xmlelement, "userinfo", [],
[{xmlcdata, "???"}]}]}]}
First extract the results element from the tuple:
{selected, _Columns, Results} = MysqlResult.
Then convert it to ejabberd's internal XML format with a list comprehension:
XML = [{xmlelement, "result", [{"id", integer_to_list(Id)}],
[{xmlelement, "first_name", [], [{xmlcdata, FirstName}]},
{xmlelement, "last_name", [], [{xmlcdata, LastName}]}]}
|| {Id, FirstName, LastName} <- Results].
And insert it into your IQ record:
IQ#iq{type = result, sub_el =
[{xmlelement, "result",
[{"xmlns", ?NS_NAMES}],
[{xmlelement, "userinfo", [],
XML}]}]}
(assuming that you want the <result/> elements as children of the <userinfo/> element)
Use xmerl to create XML in Erlang:
1> MysqlResult = {selected,["id","first_name","last_name"],
1> [{1,"Matt","Williamson"},
1> {2,"Matt","Williamson2"}]}.
{selected,["id","first_name","last_name"],
[{1,"Matt","Williamson"},{2,"Matt","Williamson2"}]}
2> {selected, _Columns, Results} = MysqlResult.
{selected,["id","first_name","last_name"],
[{1,"Matt","Williamson"},{2,"Matt","Williamson2"}]}
3> Content = [{result, [{id, Id}], [{first_name, [First]}, {last_name, [Last]}]} || {Id, First, Last} <- Results].
[{result,[{id,1}],
[{first_name,["Matt"]},{last_name,["Williamson"]}]},
{result,[{id,2}],
[{first_name,["Matt"]},{last_name,["Williamson2"]}]}]
4> xmerl:export_simple(Content, xmerl_xml).
["<?xml version=\"1.0\"?>",
[[["<","result",[[" ","id","=\"","1","\""]],">"],
[[["<","first_name",">"],["Matt"],["</","first_name",">"]],
[["<","last_name",">"],
["Williamson"],
["</","last_name",">"]]],
["</","result",">"]],
[["<","result",[[" ","id","=\"","2","\""]],">"],
[[["<","first_name",">"],["Matt"],["</","first_name",">"]],
[["<","last_name",">"],
["Williamson2"],
["</","last_name",">"]]],
["</","result",">"]]]]
5> io:format("~s", [v(-1)]).
<?xml version="1.0"?><result id="1"><first_name>Matt</first_name><last_name>Williamson</last_name></result><result id="2"><first_name>Matt</first_name><last_name>Williamson2</last_name></result>ok
Alternatively, try the --xml and --execute options of the mysql command-line client.
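For example, something along these lines (database, table, and credentials are illustrative):
mysql --xml --execute="SELECT id, first_name, last_name FROM users" -u root -p mydb
This prints a <resultset> document with one <row> element per record, which you would still need to reshape into the <result/> elements shown above.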
The xmerl solution is absolutely fine, and probably the way to go if this is a one-off type thing.
However, if you are writing an XMPP client, even a simple one, consider using exmpp - https://github.com/processone/exmpp . You can use some of the same tactics to extract data and generate XML, but in general the helper functions (most likely in the exmpp_iq and exmpp_stanza modules) will be very handy.
exmpp isn't going anywhere either -- the alpha of ejabberd 3 is using it internally (finally).
Given the following json:
Full file here: https://pastebin.com/Hzt9bq2a
{
"name": "Visma Public",
"domains": [
"accountsettings.connect.identity.stagaws.visma.com",
"admin.stage.vismaonline.com",
"api.home.stag.visma.com",
"api.workbox.dk",
"app.workbox.dk",
"app.workbox.co.uk",
"authz.workbox.dk",
"connect.identity.stagaws.visma.com",
"eaccounting.stage.vismaonline.com",
"eaccountingprinting.stage.vismaonline.com",
"http://myservices-api.stage.vismaonline.com/",
"identity.stage.vismaonline.com",
"myservices.stage.vismaonline.com"
]
}
How can I transform the data into the form below? That is, identify the domains present in the site.SLD.TLD format and then remove the duplicates (not including the subdomains, protocols, or paths, as illustrated below).
{
"name": "Visma Public",
"domains": [
"workbox.co.uk",
"workbox.dk",
"visma.com",
"vismaonline.com"
]
}
I would like to do this in jq, as that is what I've used to wrangle the data into this format so far, but at this stage any solution I can run on Debian (I'm using bash), ideally without extraneous tooling, would be fine.
I'm aware that regex can be used within jq, so I assume the best way is to regex out the domain and then pipe to unique. However, I've been unable to get anything working so far. I'm currently trying the version below, which seems to need only the text-transformation stage added in somehow, either during the jq processing or with a pass over the output afterwards with something like awk:
jq '[.[] | {name: .name, domain: [.domains[]] | unique}]' testfile.json
This appears to be useful: https://github.com/stedolan/jq/issues/537
One solution was offered there which regex-matches the last two dot-separated strings and calls the unique function on the result. It works up to a point, but doesn't cover the case where the suffix itself has two parts: google.co.uk, for example, would return only co.uk with this jq:
jq '.domains |= (map(capture("(?<x>[[:alpha:]]+).(?<z>[[:alpha:]]+)(.?)$") | join(".")) | unique)'
A programming language is much more expressive than jq.
Try the following snippet with python3.
import json
import pprint
import urllib.request
import os

def get_tlds():
    f = urllib.request.urlopen("https://publicsuffix.org/list/effective_tld_names.dat")
    content = f.read()
    lines = content.decode('utf-8').split("\n")
    # remove comments and blank lines
    tlds = [line for line in lines if not line.startswith("//") and not line == ""]
    return tlds

def extract_domain(url, tlds):
    # strip the protocol and any path to get the bare hostname
    url = url.replace("http://", "").replace("https://", "")
    url = url.split("/")[0]
    # split into labels and take the last one or two as candidate suffixes
    parts = url.split(".")
    suffix1 = parts[-1]
    sld1 = parts[-2]
    if len(parts) > 2:
        suffix2 = ".".join(parts[-2:])
        sld2 = parts[-3]
    else:
        suffix2 = suffix1
        sld2 = sld1
    # try the longer suffix first
    if suffix2 in tlds:
        tld = suffix2
        sld = sld2
    else:
        tld = suffix1
        sld = sld1
    return sld + "." + tld

def clean(site, tlds):
    site["domains"] = list(set([extract_domain(url, tlds) for url in site["domains"]]))
    return site

if __name__ == "__main__":
    filename = "Hzt9bq2a.json"
    cache_path = "tlds.json"
    if os.path.exists(cache_path):
        with open(cache_path, "r") as f:
            tlds = json.load(f)
    else:
        tlds = get_tlds()
        with open(cache_path, "w") as f:
            json.dump(tlds, f)
    with open(filename) as f:
        d = json.load(f)
    d = [clean(site, tlds) for site in d]
    pprint.pprint(d)
    with open("clean.json", "w") as f:
        json.dump(d, f)
May I offer a way of achieving the same query with jtc? The same could be done in other languages (and of course in jq); the query is mostly a matter of coming up with the regex to satisfy your ask:
bash $ <file.json jtc -w'<domains>l:>((?:[a-z0-9]+\.)?[a-z0-9]+\.[a-z0-9]+)[^.]*$<R:' -u'{{$1}}' /\
-ppw'<domains>l:><q:' -w'[domains]:<[]>j:' -w'<name>l:'
{
"domains": [
"stagaws.visma.com",
"stage.vismaonline.com",
"stag.visma.com",
"api.workbox.dk",
"app.workbox.dk",
"workbox.co.uk",
"authz.workbox.dk"
],
"name": "Visma Public"
}
bash $
Note: it extracts only DOMAIN.TLD, as per your original ask. If you'd like to extract DOMAIN.SLD.TLD, the task becomes a bit less trivial.
Update:
The solution has been modified as per the comment: it extracts domain.sld.tld where there are 3 or more levels, and domain.tld where there are only 2.
PS. I'm the creator of jtc, a JSON processing utility. This disclaimer is an SO requirement.
One of the solutions presented on this page offers that:
A programming language is much more expressive than jq.
It may therefore be worthwhile pointing out that jq is an expressive, Turing-complete programming language, and that it would be as straightforward (and as tedious) to capture all the intricacies of the "Public Suffix List" using jq as any other programming language that does not already provide support for this list.
It may be useful to illustrate an approach to the problem that passes the (revised) test presented in the Q. This approach could easily be extended in any one of a number of ways:
def extract:
sub("^[^:]*://";"")
| sub("/.*$";"")
| split(".")
| (if (.[-1]|length) == 2 and (.[-2]|length) <= 3
then -3 else -2 end) as $ix
| .[$ix : ]
| join(".") ;
{name, domain: (.domains | map(extract) | unique)}
Output
{
"name": "Visma Public",
"domain": [
"visma.com",
"vismaonline.com",
"workbox.co.uk",
"workbox.dk"
]
}
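For reference, one way to run it from bash (assuming the filter above is saved to a file, hypothetically named extract.jq):
jq -f extract.jq testfile.json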
Judging from your example, you don't actually want top-level domains (just one component, e.g. ".com"), and you probably don't really want second-level domains (last two components) either, because some domain registries don't operate at the TLD level. Given www.foo.com.br, you presumably want to find out about foo.com.br, not com.br.
To do that, you need to consult the Public Suffix List. The file format isn't too complicated, but it has support for wildcards and exceptions. I dare say that jq isn't the ideal language to use here — pick one that has a URL-parsing module (for extracting hostnames) and an existing Public Suffix List module (for extracting the domain parts from those hostnames).
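For instance, in Python the third-party tldextract package bundles the Public Suffix List and does the URL parsing too; a minimal sketch, assuming pip install tldextract and the input saved as testfile.json:
import json
import tldextract  # third-party; ships with a copy of the Public Suffix List

with open("testfile.json") as f:
    site = json.load(f)

# registered_domain joins the registrable label with its public suffix,
# e.g. "app.workbox.co.uk" -> "workbox.co.uk"
site["domains"] = sorted({tldextract.extract(u).registered_domain
                          for u in site["domains"]})
print(json.dumps(site, indent=2))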
I searched online and read through the documentation, but have not been able to find an answer. I am fairly new to Ruby, and as part of learning it I wanted to write the script below.
The script essentially does a carrier lookup on a list of numbers that are provided through a CSV file. The CSV file has just one column, with the header "number".
Everything runs fine UNTIL the API gives me an output that is different from the others. In this example, it tells me that one of the numbers in my file is not a valid US number. This then causes my script to stop running.
I am looking to see if there is a way to either ignore it (I read about begin/rescue, but was not able to get it to work) or, ideally, either create a separate file with those errors or just put the data into the main file.
Any help would be much appreciated. Thank you.
Ruby Code:
require 'csv'
require 'uri'
require 'net/http'
require 'json'
number = 0
CSV.foreach('data1.csv', headers: true) do |row|
  number = row['number'].to_i
  uri = URI("https://api.message360.com/api/v3/carrier/lookup.json?PhoneNumber=#{number}")
  req = Net::HTTP::Post.new(uri)
  req.basic_auth 'XXX', 'XXX'
  res = Net::HTTP.start(uri.hostname, uri.port, :use_ssl => true) { |http|
    http.request(req)
  }
  json = JSON.parse(res.body)
  new = json["Message360"]["Carrier"].values
  CSV.open("new.csv", "ab") do |csv|
    csv << new
  end
end
File Data:
number
5556667777
9998887777
Example of a good response (shown as the parsed Ruby hash):
{"Message360"=>{"ResponseStatus"=>1, "Carrier"=>{"ApiVersion"=>"3", "CarrierSid"=>"XXX", "AccountSid"=>"XXX", "PhoneNumber"=>"+19495554444", "Network"=>"Cellco Partnership dba Verizon Wireless - CA", "Wireless"=>"true", "ZipCode"=>"92604", "City"=>"Irvine", "Price"=>0.0003, "Status"=>"success", "DateCreated"=>"2018-05-15 23:05:15"}}}
The response that causes the script to stop:
{
"Message360": {
"ResponseStatus": 0,
"Errors": {
"Error": [
{
"Code": "ER-M360-CAR-111",
"Message": "Allowed Only Valid E164 North American Numbers.",
"MoreInfo": []
}
]
}
}
}
It would appear you can just check json["Message360"]["ResponseStatus"] first for a 0 or 1 to indicate failure or success.
I'd probably add a rescue to help catch any other errors (malformed JSON, network issue, etc.)
CSV.foreach('data1.csv', headers: true) do |row|
  number = row['number'].to_i
  # ... request setup as before ...
  json = JSON.parse(res.body)

  if json["Message360"]["ResponseStatus"] == 1
    new = json["Message360"]["Carrier"].values
    CSV.open("new.csv", "ab") do |csv|
      csv << new
    end
  else
    # handle bad response
  end
rescue StandardError => e
  # request failed for some reason, log e and the number?
end
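If you also want the "separate file with those errors" part of your question, the bad-response branch could append to a second CSV. A sketch, where the file name and the error-message path are assumptions based on your sample responses:
# inside the else branch: record the number and the API's error message
error_msg = json.dig("Message360", "Errors", "Error", 0, "Message") || "unknown error"
CSV.open("errors.csv", "ab") do |csv|
  csv << [number, error_msg]
end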
I have been experimenting with Plumber in R recently, and am having success when I pass the following data using a POST request:
{"Gender": "F", "State": "AZ"}
This allows me to write a function like the following to return the data.
#* @post /score
score <- function(Gender, State){
  data <- list(
    Gender = as.factor(Gender)
    , State = as.factor(State))
  return(data)
}
However, when I try to POST an array of JSON objects, I can't seem to access the data through the function
[{"Gender":"F","State":"AZ"},{"Gender":"F","State":"NY"},{"Gender":"M","State":"DC"}]
I get the following error
{
"error": [
"500 - Internal server error"
],
"message": [
"Error in is.factor(x): argument \"Gender\" is missing, with no default\n"
]
}
Does anyone have an idea of how Plumber parses JSON? I'm not sure how to access and assign the fields to vectors to score the data.
Thanks in advance
I see two possible solutions here. The first would be a command line based approach which I assume you were attempting. I tested this on a Windows OS and used column based data.frame encoding which I prefer due to shorter JSON string lengths. Make sure to escape quotation marks correctly to avoid 'argument "..." is missing, with no default' errors:
curl -H "Content-Type: application/json" --data "{\"Gender\":[\"F\",\"F\",\"M\"],\"State\":[\"AZ\",\"NY\",\"DC\"]}" http://localhost:8000/score
# [["F","F","M"],["AZ","NY","DC"]]
The second approach is R native and has the advantage of having everything in one place:
library(jsonlite)
library(httr)
## sample data
lst = list(
Gender = c("F", "F", "M")
, State = c("AZ", "NY", "DC")
)
## jsonify
jsn = lapply(
lst
, toJSON
)
## query
request = POST(
url = "http://localhost:8000/score?"
, query = jsn # values must be length 1
)
response = content(
request
, as = "text"
, encoding = "UTF-8"
)
fromJSON(
response
)
# [,1]
# [1,] "[\"F\",\"F\",\"M\"]"
# [2,] "[\"AZ\",\"NY\",\"DC\"]"
Be aware that httr::POST() expects a list of length-1 values as query input, so the array data should be jsonified beforehand. If you want to avoid the additional package imports altogether, some system(), sprintf(), etc. magic should do the trick.
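For completeness, an untested sketch of that dependency-free route, mirroring the curl example above (shQuote() takes care of the quoting):
## build the JSON body and shell out to curl
body <- '{"Gender":["F","F","M"],"State":["AZ","NY","DC"]}'
cmd <- sprintf(
  "curl -s -H %s --data %s http://localhost:8000/score",
  shQuote("Content-Type: application/json"),
  shQuote(body)
)
system(cmd, intern = TRUE)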
Finally, here is my plumber endpoint (living in R/plumber.R and condensed a little bit):
#* @post /score
score = function(Gender, State){
  lapply(
    list(Gender, State)
    , as.factor
  )
}
and code to fire up the API:
pr = plumber::plumb("R/plumber.R")
pr$run(port = 8000)
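As an aside, if you would rather POST the original array of objects directly, one option is to parse the raw request body yourself inside the endpoint (plumber hands you the request object when the function takes a req argument; jsonlite is assumed, and /score2 is an illustrative route name):
#* @post /score2
score2 <- function(req) {
  d <- jsonlite::fromJSON(req$postBody)  # array of objects -> data.frame
  lapply(d, as.factor)
}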
I am executing a statement in Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements, with the following body:
{
"code": "spark.sql(\"select * from test_table limit 10\")"
}
I would like an answer in the following format
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
This is the toString() representation of the DataFrame.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
Judging by the comments, it seems that Livy does not and will not support such a feature.
I recommend using the built-in (albeit poorly documented) %json and %table magics:
%json

import json
import textwrap

import requests  # third-party

# `host` and `headers` are assumed to be defined elsewhere
session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
        val e = d.collect
        %json e
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
%table

session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val x = List((1, "a", 0.12), (3, "b", 0.63))
        %table x
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print r.json()
Related: Apache Livy: query Spark SQL via REST: possible?
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell, and the output will be a string with the actual result that would be shown by a shell. So, with that in mind, I can think of a way to emulate the result you want, but it may not be the best way to do it:
{
"code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then, you will have a JSON wrapped in a string, so your client could parse it.
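On the client side, a rough sketch of unwrapping that in Python (statement_url and headers are assumed to point at the finished statement; the exact output layout can vary by Livy version):
import json
import requests

resp = requests.get(statement_url, headers=headers)
text = resp.json()["output"]["data"]["text/plain"]  # the println'd JSON string
rows = json.loads(text)                             # now a list of dicts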
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, you could have your code read it after the script is done.
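For example, a hypothetical sketch of that inside the Livy statement (the table name is illustrative; pick or generate one your reader knows about):
val d = spark.sql("select * from test_table limit 10")
d.write.mode("overwrite").saveAsTable("livy_result_tmp")  // read this table back once the statement finishes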
I am trying to get erlang-mysql-driver working. I managed to set it up and make queries, but there are two things I cannot do (https://code.google.com/archive/p/erlang-mysql-driver/issues).
(BTW, I am new to Erlang)
So here is my code to connect to MySQL:
<erl>
out(Arg) ->
    mysql:start_link(p1, "127.0.0.1", "root", "azzkikr", "MyDB"),
    {data, Result} = mysql:fetch(p1, "SELECT * FROM messages").
</erl>
1. I cannot get data from the table.
mysql.erl doesn't contain any specific information on how to fetch table data, but this is the farthest I could go:
{A,B} = mysql:get_result_rows(Result),
B.
And the result was this:
ERROR erlang code threw an uncaught exception:
File: /Users/{username}/Sites/Yaws/index.yaws:1
Class: error
Exception: {badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
Req: {http_request,'GET',{abs_path,"/"},{1,1}}
Stack: [{m181,out,1,
[{file,"/Users/{username}/.yaws/yaws/default/m181.erl"},
{line,18}]},
{yaws_server,deliver_dyn_part,8,
[{file,"yaws_server.erl"},{line,2818}]},
{yaws_server,aloop,4,[{file,"yaws_server.erl"},{line,1232}]},
{yaws_server,acceptor0,2,[{file,"yaws_server.erl"},{line,1068}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]
I understand that somehow I need to get the second element and iterate over it to get each row, but the strings are returned in a different format: the string I stored is Success, yet the returned string is <<"Success">>.
{badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
My first question is: how do I get data from the table?
2. How do I insert values into a table using variables?
I can insert data into the table using this method:
Msg = "Hello World",
mysql:prepare(add_message,<<"INSERT INTO messages (`message`) VALUES (?)">>),
mysql:execute(p1, add_message, [Msg]).
But there are two things I am having trouble with:
1. I am inserting the data without the << and >> delimiters, because when I write Msg = << ++ "Hello World" >>, Erlang throws an exception (I think I am doing something wrong). I don't know whether they are required, but without them I am able to insert data into the table, except that this error bothers me after execution:
yaws code at /Users/{username}/Yaws/index.yaws:1 crashed or ret bad val:{updated,
{mysql_result,
[],
[],
1,
[]}}
Req: {http_request,'GET',{abs_path,"/"},{1,1}}
The returned atom is updated, even though what I asked for was an insert.
My second question is: how do I insert data into the table properly?
Error:
{badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
tells you that the returned value is:
[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]
This is a plain list of rows, which obviously can't match either {data, Data} or {A, B}. You can obtain your data as follows:
<erl>
out(Arg) ->
    mysql:start_link(p1, "127.0.0.1", "root", "azzkikr", "MyDB"),
    {data, MysqlRes} = mysql:fetch(p1, "SELECT * FROM messages"),
    Rows = mysql:get_result_rows(MysqlRes),
    {ehtml,
     [{table, [{border, "1"}],
       [{tr, [],
         [{td, [],
           case Val of
               _ when is_binary(Val) -> yaws_api:htmlize(Val);
               _ when is_integer(Val) -> integer_to_binary(Val)
           end}
          || Val <- Row]}
        || Row <- Rows]}]}.
</erl>
</erl>